A Brief Introduction to DNS

December 26, 2024 · 11 min read

In 1983, Paul Mockapetris proposed a Domain Name System architecture in RFC 882 and RFC 883.

If you have any questions, feel free to comment below.

In my view, DNS is the key to the Internet. I have always believed that whoever controls DNS controls the Internet. So let's get to know DNS.

Hosts #

Long ago, to reach a computer you needed to know its IP address. But IP addresses are hard to remember, and whenever a computer's IP changed, everyone else had to be notified.

So people kept a list that maps each computer's name to its IP address. Every computer stored a copy and updated it from one designated computer that maintained the master list. This list is called hosts (e.g. timerring: 88.88.88.88), and it is how names were resolved on the early ARPANET.
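For comparison, real hosts files (such as /etc/hosts on Linux and macOS) put the IP address first, followed by one or more names, with `#` starting a comment. The sketch below parses that layout into a name-to-IP table; the `timerring.local` alias is a hypothetical example, not from the original list.

```python
def parse_hosts(text: str) -> dict:
    """Build a name -> IP table from hosts-file text ("IP  name [alias ...]")."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # keep the first mapping seen for a name
    return table


hosts = """
127.0.0.1    localhost
88.88.88.88  timerring  timerring.local   # the article's example (alias is hypothetical)
"""
table = parse_hosts(hosts)
print(table["timerring"])  # 88.88.88.88
```

Every machine had to keep a file like this in sync, which is exactly the scaling problem described next.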

But as the Internet grew, the number of hosts kept increasing, the hosts file became too large, and names started to conflict. That is why DNS was created.

DNS #

Paul Mockapetris proposed the Domain Name System in 1983, which is a distributed database that maps domain names to IP addresses.

So every time we want to visit a website, we just ask a DNS server for the IP address that corresponds to the website's domain name.

We no longer need to keep every mapping in a local hosts file. (On most computers today, the hosts file holds little more than entries such as localhost.)

DHCP #

The IP address of your DNS server may be dynamic (typical for residential broadband): every time you go online, the gateway assigns it to you through DHCP (Dynamic Host Configuration Protocol).

It can also be configured as a fixed address. On Linux, you can check which DNS servers are in use in /etc/resolv.conf.
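/etc/resolv.conf is a plain-text file in which each DNS server appears on a `nameserver <ip>` line. A minimal sketch that extracts those addresses; the sample contents are hypothetical:

```python
def nameservers(resolv_conf: str) -> list:
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Commented-out lines start with '#' or ';', so their first field never equals "nameserver"
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


sample = """# hypothetical file written by a DHCP client
nameserver 192.168.1.1
nameserver 8.8.8.8
search lan
"""
print(nameservers(sample))  # ['192.168.1.1', '8.8.8.8']
```

On a real machine you would pass it `open("/etc/resolv.conf").read()`. On systems running systemd-resolved, the file often lists only the local stub resolver 127.0.0.53.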

DNS protocol #

First, we need a rule that keeps every domain name unique. Just as a postal address goes from the street up to the city, a domain name goes from the most specific part to the most general, e.g. home.google.com.

The levels of a domain #

So how can a DNS server know the IP of every domain? The answer is hierarchical queries. Take math.stackexchange.com. as an example: note the dot . at the end, which stands for the root domain and is usually omitted. The levels are as follows:

root domain: .
top-level domain (TLD): such as .com.
second-level domain (SLD): the level that users can normally register, such as stackexchange.com.
host: a name the user assigns inside their domain, such as www.
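To make the hierarchy concrete, this sketch lists the domains a name belongs to, from the root down. It simply splits on dots. It is an illustration of the naming levels only, and it does not know where real zone boundaries (delegations) are.

```python
def domain_levels(fqdn: str) -> list:
    """Return the chain of domains from the root down to the full name."""
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    levels = ["."]  # the root domain
    for i in range(len(labels) - 1, -1, -1):
        levels.append(".".join(labels[i:]) + ".")  # add one more label on the left
    return levels


print(domain_levels("math.stackexchange.com."))
# ['.', 'com.', 'stackexchange.com.', 'math.stackexchange.com.']
```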

Domain Name Resource Record #

Every entry in DNS is stored as a resource record with the following format:

Domain_name Time_to_live Class Type Value

Domain_name: The domain name.
Time_to_live: how long (in seconds) the record may be cached.
Class: almost always IN (Internet).
Type: The type of the record.
Value: The value of the record.
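Here is a minimal sketch that splits one record line, in the five-field layout above, into its parts. It assumes all five fields are present (zone files are allowed to omit TTL and class), and it keeps everything after the type as the value, because some values, such as an MX record's "priority host", contain spaces.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    name: str
    ttl: int     # seconds
    rclass: str  # almost always "IN"
    rtype: str   # A, AAAA, NS, MX, CNAME, PTR, ...
    value: str


def parse_record(line: str) -> ResourceRecord:
    # Split on any whitespace, at most 4 times, so multi-word values stay intact
    name, ttl, rclass, rtype, value = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rclass, rtype, value.strip())


rr = parse_record("baidu.com.\t\t6\tIN\tA\t198.18.28.63")  # line taken from the dig output in this post
print(rr.ttl, rr.rtype, rr.value)  # 6 A 198.18.28.63
```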

The type of the record #

The common record types are:

A: address record, returns the IPv4 address of the domain.
AAAA: address record, returns the IPv6 address of the domain.
NS: name server record. Every level of domain has its own NS records, which point to the servers responsible for that level. These servers (authoritative name servers) hold the records of the domains beneath it.
MX: mail exchange record, returns the address of the server that receives email for the domain.
CNAME: canonical name record, returns another domain name, i.e. this domain is an alias that points to another domain.
PTR: pointer record, maps an IP address back to a domain name (reverse lookup). It is often used to check whether an IP really owns the domain it claims.
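Inside a DNS message, the type is not sent as text but as a 16-bit number. The values below are the standard codes for the types listed above (RFC 1035, plus RFC 3596 for AAAA); wrapping them in an `IntEnum` is just one convenient way to write the table down.

```python
from enum import IntEnum


class RRType(IntEnum):
    """Numeric TYPE codes carried in DNS messages."""
    A = 1
    NS = 2
    CNAME = 5
    PTR = 12
    MX = 15
    AAAA = 28  # defined later, in RFC 3596


print(RRType["MX"].value)  # 15
print(RRType(28).name)     # AAAA
```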

DNS server #

Second, now that domain names are unique, sending every query to a single DNS server is not realistic. So the namespace is split into DNS zones, for example one per top-level domain (TLD), each served by its own servers.


Each zone has a primary (master) server and several secondary (slave) servers for backup and faster queries. These servers are called authoritative name servers.

They store two types of records:

This zone’s domain name resource records.
This zone’s parent DNS and sub-DNS server records(mainly NS records).

Top-level zones have no parent zone of their own, so how do they find each other? The answer is the root DNS servers (.).

You can find the root DNS server here: root DNS server.

Domain Name Resolution #

Every time you connect to a network, you get a default DNS server (provided by your operator), or you can use a public DNS server instead. Your queries go to this server, the so-called local DNS server.

If the local DNS server already has the record (because it is authoritative for that zone or has it cached), it returns the record directly.
Otherwise, the local DNS server asks a root DNS server (root server addresses are fixed and built into DNS server software as "root hints"), which refers it to the servers for the next level down. The local server keeps following these referrals until it reaches the authoritative answer (e.g. www.google.com: . -> com -> google.com -> www.google.com -> xx.xx.xx.xx). Your computer asks only once; the local server does this step-by-step (iterative) work on your behalf.
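The referral chain can be simulated with a toy, in-memory model. Everything here is hypothetical: the server names, the dictionary layout, and the address 192.0.2.1 (a documentation-only address). Each "server" knows only its own zone's answers and the NS referrals for zones delegated below it, which is the split described above.

```python
# Hypothetical zone data: answers ("A") and delegations ("NS") per server
SERVERS = {
    "root": {"NS": {"com.": "com-server"}},
    "com-server": {"NS": {"google.com.": "google-server"}},
    "google-server": {"A": {"www.google.com.": "192.0.2.1"}},
}


def resolve(name: str) -> tuple:
    """Follow referrals from the root until some server has an A record."""
    name = name if name.endswith(".") else name + "."
    server, path = "root", ["root"]
    while True:
        zone = SERVERS[server]
        answer = zone.get("A", {}).get(name)
        if answer is not None:
            return answer, path
        # Pick the delegated zone that the queried name falls under
        referral = next(
            (ns for sub, ns in zone.get("NS", {}).items()
             if name == sub or name.endswith("." + sub)),
            None,
        )
        if referral is None:
            raise LookupError(f"{name}: no answer and no referral at {server}")
        server = referral
        path.append(server)


print(resolve("www.google.com"))
# ('192.0.2.1', ['root', 'com-server', 'google-server'])
```

A real resolver also caches each referral, so the next query under .com can skip the root entirely.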

Caching Mechanism #

Most of our visits go to a small set of websites (the familiar "20%"), or to different pages of the same website, so the browser and the OS cache these records to speed up lookups.

The Time_to_live field determines how long a record may be cached. Stable domains such as google.com use a large value, while domains whose resolution results change often use a small one.

Meanwhile, DNS servers also cache records, just as our computers do.
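A minimal sketch of TTL-based caching: each entry stores its expiry time, and a lookup after that time counts as a miss. The class is hypothetical, and the clock is injectable so the example can advance time by hand instead of sleeping.

```python
import time


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expiry time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # expired: the caller has to query DNS again
            return None
        return value


now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.put("baidu.com.", "198.18.28.63", ttl=6)  # TTL 6, as in the dig output in this post
now[0] = 5
print(cache.get("baidu.com."))  # 198.18.28.63
now[0] = 6
print(cache.get("baidu.com."))  # None
```

This is why a small TTL lets a domain owner move to a new IP quickly: stale copies age out within seconds instead of hours.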

DNS leaking #

Check whether your DNS is leaking: https://ipleak.net/ It makes your browser resolve random subdomains of a domain it controls, so its authoritative DNS server can record which DNS servers finally forwarded those queries.
Also check https://browserleaks.com/webrtc Avoid WebRTC: it can leak your public IP because it sends STUN packets over UDP, which an HTTP proxy cannot carry. Even with a SOCKS5 proxy, the IP may still leak, because browsers such as Chrome do not hand these UDP packets to the proxy. (If you use TUN mode or a proxy on the router, you still need to check that UDP proxying is enabled for your node in the configuration.)

The root cause of DNS leaks is that DNS requests are sent in plaintext, so every server along the path can see what you are looking up. If a website matches a domain rule, the client can connect directly or proxy it without any DNS request. But if it matches no domain rule, the client sends a DNS request locally, only so that it can match the resulting IP against the rules (redir-host mode). That local request is the leak, and it is why this mode was abandoned by the official client.

Methods to avoid DNS leaking:

DoH: DNS over HTTPS, usually HTTP/2 on port 443 (RFC 8484)
DoT: DNS over TLS, TCP port 853 (RFC 7858)
Encrypt all requests, send them to the remote node, and let the remote node handle all resolution. (In fake-IP mode, the client already has the TCP connection and the domain name, so it can easily wrap them in SOCKS5 or another protocol and send them to the remote node.)
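For DoH, the ordinary binary DNS message is carried inside HTTPS. With the GET form of RFC 8484, the message is base64url-encoded with padding removed and passed as the `dns` query parameter. The sketch below reuses the example query from RFC 8484 (www.example.com, type A, ID 0); the endpoint URL is only an example of a public DoH server.

```python
import base64


def doh_get_url(endpoint: str, message: bytes) -> str:
    """Encode a DNS wire-format message for an RFC 8484 GET request."""
    # base64url alphabet, trailing '=' padding stripped as RFC 8484 requires
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


query = bytes.fromhex(
    "000001000001000000000000"            # header: ID=0, RD=1, QDCOUNT=1
    "03777777076578616d706c6503636f6d00"  # QNAME: 3 www 7 example 3 com 0
    "00010001"                            # QTYPE=A (1), QCLASS=IN (1)
)
print(doh_get_url("https://dns.google/dns-query", query))
# https://dns.google/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

The request should carry `Accept: application/dns-message`. Alternatively, a POST sends the raw bytes as the body with `Content-Type: application/dns-message`. Either way, an on-path observer sees only a TLS connection to the DoH server, not the name being queried.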

For more details, you can refer to my blog Understanding Clash Through Configuration.
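To make the fake-IP idea concrete, here is a toy allocator. It answers each DNS query with a unique address from a reserved pool, then maps the address back to the domain when the application connects. Assumptions: the class and method names are hypothetical, and the pool is 198.18.0.0/16, part of the 198.18.0.0/15 range that RFC 2544 reserves for benchmarking and that proxy clients often use for this; the real range is a client setting. (The 198.18.x.x answers in the dig output later in this post fall in this range, which suggests such a client was running.)

```python
import ipaddress
from typing import Optional


class FakeIPPool:
    """Hand out one fake address per domain and remember the reverse mapping."""

    def __init__(self, cidr: str = "198.18.0.0/16"):
        self._hosts = ipaddress.ip_network(cidr).hosts()  # iterator over usable addresses
        self._by_domain = {}
        self._by_ip = {}

    def lookup(self, domain: str) -> str:
        """Answer a DNS query with a fake IP instead of resolving it locally."""
        if domain not in self._by_domain:
            ip = str(next(self._hosts))
            self._by_domain[domain] = ip
            self._by_ip[ip] = domain
        return self._by_domain[domain]

    def domain_for(self, ip: str) -> Optional[str]:
        """When an app connects to a fake IP, recover the domain to send to the proxy."""
        return self._by_ip.get(ip)


pool = FakeIPPool()
print(pool.lookup("baidu.com"))       # 198.18.0.1
print(pool.domain_for("198.18.0.1"))  # baidu.com
```

No real DNS query leaves the machine at lookup time; the remote node resolves the domain later, when the connection is proxied.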

Example process #

To begin with, you should know the OSI model. I have drawn a good image to explain it. You can also read my previous blog Real Computer Network.
OSI model

Now imagine going online: you enter google.com in your browser and get the page. What happens in between?

In the common case, when you buy broadband, the operator provides a fiber-optic modem, and you buy a router. The router connects to the internet via PPPoE and gets a WAN IP (often still a private address inside the operator's network) and the IPs of the DNS servers (usually two). The router is the gateway of your local network, with its own local IP, and it assigns IPs and the DNS server IP to your devices through DHCP (commonly both the DNS IP and the gateway IP are the router's local IP).

Your browser first checks its own cache for the IP of google.com, then the OS cache (including the hosts file, if it has a mapping). If neither has it, the OS sends a DNS request (e.g. "tell me the IP of google.com").
In the transport layer, the source port is a random ephemeral port (e.g. 52222) and the destination port is the default DNS port 53 (usually over UDP).
In the network layer, the source IP is your computer's IP 192.168.1.10 and the destination IP is the DNS server's IP 8.8.8.8. Since 8.8.8.8 is not on your local network, the packet must go to the gateway first. Within the same network, delivery uses MAC addresses, which is the job of the data link layer.
In the data link layer, the source MAC address AA-AA-AA-AA is your computer's, and the destination MAC address CC-CC-CC-CC (learned via ARP) is the gateway's.
Then the frame is sent through the NIC onto the cable.
The switch receives the frame and forwards it to the gateway (e.g. the router).
In the router's data link layer, the router sees that the destination MAC is its own, unwraps the frame, and passes the packet up to the network layer. It has no specific route for 8.8.8.8 in its routing table, so it sends the packet via its default route to the upstream router in the public network.
Before the packet leaves for the public network, the router uses NAT to replace the private source IP with its public IP (the router's WAN IP).
In the public network, each router rewrites the MAC addresses and forwards the packet to the next hop.
Then the DNS server receives and unpacks the packet. In the transport layer it sees destination port 53, so it knows this is a DNS request. It resolves the IP of google.com and returns it.
The reply travels back in a similar way. Finally, your computer receives the IP of google.com and requests the page from that IP, following a similar process.
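The DNS request in the steps above is a small binary message sent over UDP to port 53. A minimal sketch of building and sending one with only the standard library; it skips validation such as the 63-byte label limit and does not parse the answer.

```python
import secrets
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Wire-format DNS query (RFC 1035): 12-byte header + one question."""
    if query_id is None:
        query_id = secrets.randbits(16)  # random IDs make spoofed replies harder to match
    # Header: ID, flags 0x0100 (RD = recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!6H", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by the zero-length root label
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split("."))
    return header + qname + b"\x00" + struct.pack("!2H", qtype, 1)  # QTYPE, QCLASS=IN


def udp_query(name: str, server: str = "8.8.8.8", timeout: float = 2.0) -> bytes:
    """Send one query to server:53 over UDP and return the raw reply."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))  # the OS picks an ephemeral source port
        reply, _ = sock.recvfrom(512)     # 512 bytes: classic DNS-over-UDP limit without EDNS
    if reply[:2] != query[:2]:
        raise ValueError("reply ID does not match the query ID")
    return reply
```

With network access, `udp_query("google.com")` returns the raw reply. Its first 12 bytes are the header that dig summarizes in its HEADER and flags lines. Everything below the UDP layer (IP routing, ARP, NAT) is handled by the OS and the network, exactly as in the steps above.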

But sometimes the process is not so smooth. When the DNS server is overseas, the traffic must pass through the international exit gateway (unless it uses an ISP's IPLC private line), where every packet can be inspected. This can cause DNS pollution (a forged, nonexistent IP is returned), TCP resets (an RST packet is sent early to kill the connection), IP blocking, or active probing.

dig #

The dig command is a powerful tool used for querying Domain Name System (DNS) servers.

For example, dig baidu.com is equivalent to dig a baidu.com, because the record type defaults to A.

# You will see the query parameters and statistics.
; <<>> DiG 9.10.6 <<>> baidu.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17961
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

# This is the question section. `A` stands for address record.
;; QUESTION SECTION:
;baidu.com.			IN	A

# This is the answer from the server. `6` is the TTL (time to live), in seconds.
;; ANSWER SECTION:
baidu.com.		6	IN	A	198.18.28.63

# This is the summary of the query.
# SERVER shows the DNS server your machine used. #53 is the port, the default DNS port.
;; Query time: 1 msec
;; SERVER: 198.19.0.3#53(198.19.0.3)
;; WHEN: Thu Dec 26 22:50:14 CST 2024
;; MSG SIZE  rcvd: 43
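The HEADER and flags lines above are dig's rendering of the 12-byte header at the start of every DNS message. This sketch decodes such a header. The test input is rebuilt by hand from the values dig printed (id 17961, flags qr rd ra, one question, one answer), not captured from the wire.

```python
import struct


def parse_header(msg: bytes) -> dict:
    """Decode the 12-byte DNS header into the fields dig prints."""
    qid, flags, qd, an, ns, ar = struct.unpack("!6H", msg[:12])
    bits = {"qr": 15, "aa": 10, "tc": 9, "rd": 8, "ra": 7}  # bit positions in the flags word
    return {
        "id": qid,
        "opcode": (flags >> 11) & 0xF,  # 0 = QUERY
        "flags": [name for name, bit in bits.items() if (flags >> bit) & 1],
        "rcode": flags & 0xF,           # 0 = NOERROR, 3 = NXDOMAIN
        "counts": {"QUERY": qd, "ANSWER": an, "AUTHORITY": ns, "ADDITIONAL": ar},
    }


hdr = struct.pack("!6H", 17961, 0x8180, 1, 1, 0, 0)
print(parse_header(hdr)["flags"])  # ['qr', 'rd', 'ra']
```

`qr` marks a response, `rd` echoes that recursion was desired, and `ra` says the server offers recursion.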

dig +short baidu.com: show only the brief answer (just the IP).
dig @4.2.2.2 baidu.com: query a specific DNS server with @server, such as Google's 8.8.8.8 or Level3's 4.2.2.2.
dig ns com: query each level of the domain separately (here, the NS records of com).
dig mx github.com: query the mail servers of the domain.
dig -x ip: query the PTR record of the IP (reverse lookup).
dig +trace baidu.com: show the trace of the query.
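- It first lists all the root servers, then queries them for the top-level servers, then those for the lower-level servers.
- In practice the output is often shorter. Only the root NS list comes from your local resolver; after that, dig queries the servers itself. If something on the path answers DNS on their behalf (for example a proxy client in fake-IP mode), the trace collapses, as in the output below, where baidu.com's A record appears to come straight from c.root-servers.net. A real root server would never return that; it would only refer you to the com servers. Without such interception you can see the whole chain, from the root to the TLD to the SLD to the host. The collapsed output looks like this: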

; <<>> DiG 9.10.6 <<>> +trace baidu.com
;; global options: +cmd
.			221	IN	NS	m.root-servers.net.
.			221	IN	NS	d.root-servers.net.
.			221	IN	NS	h.root-servers.net.
.			221	IN	NS	l.root-servers.net.
.			221	IN	NS	f.root-servers.net.
.			221	IN	NS	g.root-servers.net.
.			221	IN	NS	a.root-servers.net.
.			221	IN	NS	j.root-servers.net.
.			221	IN	NS	b.root-servers.net.
.			221	IN	NS	k.root-servers.net.
.			221	IN	NS	i.root-servers.net.
.			221	IN	NS	c.root-servers.net.
.			221	IN	NS	e.root-servers.net.
;; Received 239 bytes from 198.19.0.3#53(198.19.0.3) in 9 ms

baidu.com.		6	IN	A	198.18.28.63
;; Received 43 bytes from 198.18.29.158#53(c.root-servers.net) in 0 ms

Above are the thirteen root name servers, named A.ROOT-SERVERS.NET through M.ROOT-SERVERS.NET. Each of them is operated as many anycast instances spread around the world.
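For the `dig -x ip` option listed earlier, dig does not query the IP itself. It builds a special reverse name: the IPv4 octets reversed under in-addr.arpa, or the IPv6 nibbles reversed under ip6.arpa. Then it asks for that name's PTR record. Python's standard `ipaddress` module can derive that name; the `ptr_name` wrapper is a hypothetical helper that adds the trailing root dot.

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the name that a PTR (reverse) lookup queries for this IP."""
    return ipaddress.ip_address(ip).reverse_pointer + "."


print(ptr_name("103.245.222.133"))  # 133.222.245.103.in-addr.arpa.
print(ptr_name("2001:db8::1"))      # 32 reversed nibbles under ip6.arpa.
```

So `dig -x 103.245.222.133` is roughly the same as `dig ptr 133.222.245.103.in-addr.arpa`.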

dig cname facebook.github.io: CNAME is mainly used to redirect one domain name to another behind the scenes, which makes server configuration more convenient. This is transparent to users.

...

;; ANSWER SECTION:
facebook.github.io. 3370    IN  CNAME   github.map.fastly.net.
github.map.fastly.net.  600 IN  A   103.245.222.133

We can see that facebook.github.io is a CNAME for github.map.fastly.net, and the IP of github.map.fastly.net is returned. So when the IP needs to change, we only change the records of github.map.fastly.net; facebook.github.io does not need to change at all.

Once a name has a CNAME, you cannot set any other records on it. This avoids conflicts, because a CNAME means the name is an alias for another one. (E.g. if test.com were a CNAME for try.com but both had their own MX records with different values, it would be unclear which to use.)
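Following a CNAME is a loop: look for an A record at the current name, and if there is only a CNAME, jump to its target and try again. The sketch below uses the two records from the dig output above as a hypothetical in-memory table (real answers change over time). It caps the number of hops so that a CNAME loop cannot spin forever.

```python
RECORDS = {
    ("facebook.github.io.", "CNAME"): "github.map.fastly.net.",
    ("github.map.fastly.net.", "A"): "103.245.222.133",
}


def resolve_a(name: str, records: dict = RECORDS, max_hops: int = 8) -> tuple:
    """Return (IPv4 address, chain of names visited), following CNAMEs."""
    chain = [name]
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")], chain
        target = records.get((name, "CNAME"))
        if target is None:
            raise LookupError(f"no A or CNAME record for {name}")
        name = target
        chain.append(name)
    raise LookupError("CNAME chain too long (possible loop)")


print(resolve_a("facebook.github.io."))
# ('103.245.222.133', ['facebook.github.io.', 'github.map.fastly.net.'])
```

Changing the A record of github.map.fastly.net. changes the result for every alias pointing at it, which is the convenience described above.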
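The "CNAME stands alone" rule is easy to check mechanically. This sketch flags names that carry a CNAME next to any other record type. It takes a hypothetical input format of (name, type) pairs, and it is simplified: DNSSEC records such as RRSIG and NSEC are in fact allowed beside a CNAME, and this check ignores that exception.

```python
from collections import defaultdict


def cname_conflicts(records) -> list:
    """records: iterable of (name, type) pairs. Return names with CNAME plus another type."""
    types = defaultdict(set)
    for name, rtype in records:
        types[name.lower()].add(rtype.upper())
    return sorted(n for n, ts in types.items() if "CNAME" in ts and len(ts) > 1)


zone = [("test.com", "CNAME"), ("test.com", "MX"), ("try.com", "MX"), ("www.try.com", "CNAME")]
print(cname_conflicts(zone))  # ['test.com']
```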

whois #

Look up the registration information of a domain:

whois github.com

Don't confuse it with the similar-looking command whoami, which prints the current user name.

Reference: