22

In Linux, how do /etc/hosts and DNS work together to resolve hostnames to IP addresses?

  1. If a hostname can be resolved from /etc/hosts, is DNS still applied afterwards, either to resolve the hostname again or to treat the IP address returned by /etc/hosts as a "hostname" to resolve recursively?
  2. In my browsers (Firefox and Google Chrome), when I add to /etc/hosts:

    127.0.0.1 google.com www.google.com
    

    typing www.google.com into the address bar of the browsers and hitting Enter won't connect to the website. After I remove that line from /etc/hosts, I can connect to the website. Does it mean that /etc/hosts overrides DNS for resolving hostnames?

    After I re-add the line to /etc/hosts, I can still connect to the website, even after refreshing the webpage. Why doesn't /etc/hosts apply again, so that I can't connect to the website?

Thanks.

heemayl
  • 56,300
Tim
  • 101,790
  • 11
    Beware that many Web browsers implement their own DNS servers and DNS cache and do not consult any name-lookup mechanisms that have been configured on the system. In other words, some Web browsers completely ignore /etc/hosts and the locally-defined name servers. It's quite confusing to witness the first time around. (Looking at you, Chromium-based browsers!) – Christopher Feb 10 '19 at 16:25
  • @Christopher I was coming here to say the same thing. Related https://unix.stackexchange.com/questions/363498/why-does-chromium-not-cache-dns-for-more-than-a-minute/363501#363501 – Rui F Ribeiro Feb 10 '19 at 17:46
  • @Christopher After I re-add the line to /etc/hosts, I can still connect to the website, even after refreshing the webpage. Why doesn't /etc/hosts apply again, so that I can't connect to the website? Is it because of DNS cache of Firefox? – Tim Feb 10 '19 at 22:40
  • @RuiFRibeiro This Chromium build seems to respect /etc/hosts and the system-defined DNS servers: (https://github.com/Eloston/ungoogled-chromium). Installation on macOS with Homebrew: brew cask install eloston-chromium. – Christopher Feb 28 '19 at 14:20

3 Answers

32

This is dictated by the NSS (Name Service Switch) configuration, i.e. the hosts line of the /etc/nsswitch.conf file. For example, on my system:

hosts:    files mdns4_minimal [NOTFOUND=return] dns

Here, files refers to the /etc/hosts file, and dns refers to the DNS system. The sources are tried from left to right, and the first one that returns an answer wins.
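To see the order on a given machine without reading the file by eye, something like the following minimal Python sketch could be used. The function name hosts_sources and the sample text are my own, for illustration only; the sketch just strips comments and bracketed action items such as [NOTFOUND=return] and returns the remaining source names in order.

```python
import re

# Sample text mirroring the hosts line quoted above (an illustration, not read from disk).
SAMPLE_NSSWITCH = """\
passwd:   files
hosts:    files mdns4_minimal [NOTFOUND=return] dns   # lookup order for hostnames
"""

def hosts_sources(nsswitch_text):
    """Return the sources on the 'hosts:' line, in the order they are tried."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)  # drop [STATUS=ACTION] items
            return rest.split()
    return []

if __name__ == "__main__":
    print(hosts_sources(SAMPLE_NSSWITCH))
```

On a real system the text would come from open("/etc/nsswitch.conf").read() instead of the sample string.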

Also, see man 5 nsswitch.conf for more details.


As an aside, to perform a lookup that follows the NSS host resolution order, use getent with hosts as the database, e.g.:

getent hosts example.com
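A program can do a similar lookup by going through the C library resolver. Here is a minimal Python sketch, assuming CPython on Linux, where socket.getaddrinfo() calls the libc getaddrinfo() and therefore follows the NSS order; the helper name system_resolve is made up for this example.

```python
import socket

def system_resolve(hostname):
    """Resolve a name through the system resolver; return unique addresses in order."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []                      # name could not be resolved
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses

if __name__ == "__main__":
    print(system_resolve("localhost"))
```

If /etc/hosts maps a name and files comes first in nsswitch.conf, this should print the address from /etc/hosts rather than the one from DNS.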
heemayl
  • 56,300
  • 1
    Thanks. In my part 2, is it because my web browser's DNS server does not work, but web browser's DNS cache works? – Tim Feb 10 '19 at 23:41
  • How does systemd-resolved affect resolution? Where do NIS and LDAP fit into the resolution system? What order does a macOS or a Windows system follow? –  Feb 11 '19 at 00:40
  • @Tim Yes, your browser is fetching the data from cache. – heemayl Feb 11 '19 at 08:41
10

To answer just your last question: /etc/hosts doesn't apply again immediately because Firefox is caching the last address it got for google.com; if you want it to always look the name up again, you'll have to set network.dnsCacheExpiration to 0 in about:config. More info (though a bit outdated) here. Sorry if this is off-topic.


As a sidenote, many programs don't use the standard resolver (getaddrinfo(3), getnameinfo(3) [1]) because it sucks.

First, the interface is not asynchronous; any moderately complex program will have to spawn a separate thread doing just the getaddrinfo() and then invent its own protocol to communicate with it (and let's not even get into getaddrinfo_a(), which sends a signal upon completion, so it's even worse).
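To illustrate the "separate thread" workaround, here is a minimal Python sketch. It is not how any particular browser does it; the helper names blocking_lookup and lookup_in_background are invented for the example. The blocking call is pushed onto a worker thread and the caller gets a future back.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def blocking_lookup(hostname):
    """The synchronous call: blocks the calling thread until the resolver answers."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

def lookup_in_background(executor, hostname):
    """Run the blocking lookup on a worker thread and return a Future."""
    return executor.submit(blocking_lookup, hostname)

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        future = lookup_in_background(pool, "localhost")
        # The main thread is free to do other work here.
        print(future.result(timeout=10))
```

The future plays the role of the home-grown "protocol" between the main thread and the resolver thread.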

Second, the resolver implementation in glibc (the standard C library on Linux) is horrible, expecting you to let it pull random dynamic objects into the address space via dlopen() behind your back, and making it impossible to contain it in any way or use it in statically linked executables.

Since many programs don't use the standard resolver directly, they also don't bother to replicate its behavior exactly, and ignore some or all of /etc/resolv.conf, /etc/hosts, /etc/nsswitch.conf or /etc/gai.conf.

[1] and don't even mention the non-reentrant, IPv4-only gethostbyname(), which has been deprecated for ages.

  • Thanks. What do you mean "non-reentrant"? – Tim Feb 11 '19 at 11:42
  • 2
    It means that if you're doing a google = GHBN("google.com"); facebook = GHBN("facebook.com") you may end up with both google and facebook containing the address of facebook.com. When the two calls are done in different threads, it's even funnier: you may end with an address which is half google and half facebook or complete garbage. –  Feb 11 '19 at 11:58
  • What has replaced gethostbyname() now? – Tim Feb 11 '19 at 12:01
  • 1
    getaddrinfo is the standard function for that, but is itself broken, as I already explained, so it's not used as is by browsers or other real-life apps. –  Feb 11 '19 at 13:07
  • Indeed Firefox and Chrome use their own resolvers, for instance. Thanks for the insightful notes. – Rui F Ribeiro Jul 31 '19 at 13:30
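The non-reentrancy problem described in the comments above can be imitated in a few lines. This is only a Python analogy, not glibc code: fake_gethostbyname, the lookup table and the addresses (taken from the 192.0.2.0/24 documentation range) are all made up. The point is that every call hands back the same shared buffer.

```python
# One shared result object, standing in for the static buffer used by gethostbyname().
_static_result = {"name": None, "address": None}

# Made-up addresses from the documentation range, not the real ones.
FAKE_TABLE = {"google.com": "192.0.2.1", "facebook.com": "192.0.2.2"}

def fake_gethostbyname(name):
    """Overwrite the shared buffer and return a reference to it, like the C function."""
    _static_result["name"] = name
    _static_result["address"] = FAKE_TABLE[name]
    return _static_result

def demo():
    google = fake_gethostbyname("google.com")
    facebook = fake_gethostbyname("facebook.com")   # silently overwrites 'google'
    return google["address"], facebook["address"], google is facebook

if __name__ == "__main__":
    print(demo())
```

Both variables end up pointing at the facebook.com entry, which is the first failure described in the comment; the threaded case is worse because the overwrite can happen halfway through.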
8

The file /etc/hosts and the DNS don't work together. They provide independent resolutions of names (network names).

The glue that links them is /etc/nsswitch.conf on Linux systems. On AIX servers it is /etc/netsvc.conf, on Windows it is built into the system, and on macOS it can be listed with lookupd -configuration (search for LookupOrder, similar to: Cache FF DNS NI DS).

The actual order can become convoluted, because each name resolution service may (and often does) look inside other levels of resolution itself. For example, dnsmasq (a lightweight DNS server, generally listening on 127.0.0.1:53, ::1:53, or both) usually reads the /etc/hosts file and serves its contents. Similarly, systemd-resolved resolves single-label (un-dotted) names like mycomputer locally, for example via LLMNR, and calls DNS directly for dotted names (mycomputer.here.dev.) under some conditions.

In general, services are called in order, and the first one that returns an answer wins: its result is accepted as the correct address. The usual basic order is: /etc/hosts (files), mDNS (.local names), DNS, NIS, NIS+, LDAP. On some Linux systems there is a last-resort resolution of the computer's own hostname by the myhostname service.

For example, on this system (from cat /etc/nsswitch.conf):

hosts:          files mdns4_minimal [NOTFOUND=return] dns myhostname
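The "first answer wins" rule and the effect of [NOTFOUND=return] can be sketched with a toy model in Python. Everything here is hypothetical: the three in-memory tables stand in for /etc/hosts, mDNS and DNS, the addresses are from the documentation range, and mdns4_minimal is assumed to report UNAVAIL for names outside .local so that the search moves on to dns.

```python
SUCCESS, NOTFOUND, UNAVAIL = "SUCCESS", "NOTFOUND", "UNAVAIL"

# Made-up data standing in for the real sources.
HOSTS_FILE = {"google.com": "127.0.0.1"}
MDNS = {"printer.local": "192.0.2.10"}
DNS = {"google.com": "192.0.2.20", "example.com": "192.0.2.30"}

def files(name):
    return (SUCCESS, HOSTS_FILE[name]) if name in HOSTS_FILE else (NOTFOUND, None)

def mdns4_minimal(name):
    if not name.endswith(".local"):
        return (UNAVAIL, None)          # assumption: not handled, so keep searching
    return (SUCCESS, MDNS[name]) if name in MDNS else (NOTFOUND, None)

def dns(name):
    return (SUCCESS, DNS[name]) if name in DNS else (NOTFOUND, None)

# Each source with the statuses that stop the search, mirroring
# "files mdns4_minimal [NOTFOUND=return] dns" (SUCCESS=return is the default).
CHAIN = [(files, {SUCCESS}), (mdns4_minimal, {SUCCESS, NOTFOUND}), (dns, {SUCCESS})]

def resolve(name):
    """Walk the chain in order; return (source name, address) of the deciding source."""
    for source, stop_on in CHAIN:
        status, address = source(name)
        if status in stop_on:
            return source.__name__, address
    return None, None

if __name__ == "__main__":
    for name in ("google.com", "example.com", "printer.local", "missing.local"):
        print(name, resolve(name))
```

In this model google.com is answered by files even though dns also knows it, and missing.local stops at mdns4_minimal without ever reaching dns.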

Note that the very old (glibc 2.4 and earlier) order entry set in /etc/host.conf as:

order hosts,bind,nis

only applies to the files (file /etc/hosts) name service.

The effects on this (linux) client computer related to NIS and LDAP are (usually) controlled by the DNS server used (bind, unbound, etc.).

So:

  1. If a hostname can be resolved in /etc/hosts, does DNS apply after /etc/hosts to resolve the hostname or treat the resolved IP address by /etc/hosts as a "hostname" to resolve recursively?

Neither.

If a hostname can be resolved in /etc/hosts, DNS is not consulted (provided files comes before dns), nor is the resolved IP address treated as a "hostname".

It is simply the resolved address.
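For reference, the /etc/hosts format is just "address name [alias ...]" per line, so checking whether a name would be answered from the file is simple. A minimal Python sketch follows; parse_hosts and the sample text are my own, and it assumes the first entry for a name is the one that counts.

```python
# Sample text in /etc/hosts format, including the line from the question.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
127.0.0.1 google.com www.google.com   # the line from the question
"""

def parse_hosts(text):
    """Map each hostname or alias to its address; the first entry for a name is kept."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)
    return table

if __name__ == "__main__":
    table = parse_hosts(SAMPLE_HOSTS)
    print(table.get("www.google.com"))
    print(table.get("example.com"))
```

A name that is found here is returned as is; a name that is missing (None above) is what gets passed on to the next source in nsswitch.conf.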

browser

A browser could use any method to resolve a name (after it has checked its cache of resolved names). The order given above applies only if it uses a system-provided method. The browser, like any program, could choose to contact any DNS server directly.

If the system order has /etc/hosts before DNS, an entry in that file takes precedence over the DNS resolution service.

So:

  2. ... Does it mean that /etc/hosts overrides DNS for resolving hostnames?

Yes (if the browser uses the system-provided resolution).

Why doesn't /etc/hosts apply again, so that I can't connect to the website?

Only once the browser's internal cache is cleared for that specific name (or the entry times out) is the name looked up outside of the browser again.

If the browser has a name resolved in its cache, the browser uses it again.

Use this to clear the cache.

Or simply close the browser, wait a while, and restart it.
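The caching behaviour described above can be modelled with a small time-limited cache. This is only a Python sketch of the idea, not the actual browser code; the class name DnsCache, its parameters and the addresses are invented for illustration.

```python
import time

class DnsCache:
    """Remember answers for ttl_seconds before asking the real lookup again."""

    def __init__(self, lookup, ttl_seconds, clock=time.monotonic):
        self._lookup = lookup          # function: name -> address (e.g. the system resolver)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # name -> (address, expiry time)

    def resolve(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]            # still fresh: outside sources are not consulted
        address = self._lookup(name)
        self._entries[name] = (address, now + self._ttl)
        return address

    def clear(self):
        self._entries.clear()

if __name__ == "__main__":
    answers = {"www.google.com": "192.0.2.20"}        # made-up "DNS" answer
    cache = DnsCache(lambda name: answers[name], ttl_seconds=60)
    print(cache.resolve("www.google.com"))
    answers["www.google.com"] = "127.0.0.1"           # like re-adding the /etc/hosts line
    print(cache.resolve("www.google.com"))            # still the old cached address
    cache.clear()
    print(cache.resolve("www.google.com"))            # now the new address is seen
```

With ttl_seconds=0 every call goes to the real lookup, which is roughly the effect of the Firefox setting mentioned in the other answer.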