2

(Edit: This applies to much more than postfix; it's just where I noticed/debugged it)

I've installed postfix, but when it starts up and creates its chroot, it gets an empty copy of /etc/resolv.conf which means it can't resolve any domains.

I added some logging in various network scripts to see when resolv.conf is being wiped/re-populated and when postfix is starting...

Here's the log from a boot:

Sun Mar 27 19:12:30 UTC 2016
  EXECUTE: root + /sbin/resolvconf2 -d eth0 -f

Sun 27 Mar 19:12:31 UTC 2016
  Postfix startup script

Sun Mar 27 19:12:37 UTC 2016
  EXECUTE: root + /sbin/resolvconf2 -a eth0

Note there are 7 seconds between resolvconf being called to wipe the config and then re-populate it. For this period, /etc/resolv.conf is effectively empty. It's between these calls that postfix (and many other services) starts up.

It seems strange that services are being started in this huge gap between resolvconf being cleared/recreated.

This is a clean install of Raspbian with Postfix installed and no other changes.
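
For anyone wanting to reproduce this kind of trace, a wrapper along these lines would produce similar log entries. It is a sketch under assumptions, not the script used above: it assumes the real binary was renamed to /sbin/resolvconf2 and the wrapper installed in its place as /sbin/resolvconf, and both the log path and the log_call name are hypothetical.

```shell
#!/bin/sh
# Sketch of a tracing wrapper. Assumption: the real binary was renamed to
# /sbin/resolvconf2 and this script was installed as /sbin/resolvconf.
# TRACE_LOG is a hypothetical path, not taken from the original post.
TRACE_LOG="${TRACE_LOG:-/var/log/resolvconf-trace.log}"

# Append a UTC timestamp and the command about to run to the trace log.
log_call() {
    {
        date -u
        printf '  EXECUTE: %s + %s\n\n' "$(id -un)" "$*"
    } >> "$TRACE_LOG"
}

# Only behave as the wrapper when invoked under the wrapped name.
case "$0" in
    */resolvconf)
        log_call /sbin/resolvconf2 "$@"
        exec /sbin/resolvconf2 "$@"
        ;;
esac
```

Adding a similar log_call line to the Postfix init script puts both events in the same file, which makes the ordering easy to read off.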

EDIT: Looking in syslog, there are actually tons of things failing due to no DNS in the period between dhcpcd starting and finishing. Seems flawed that other services are trying to start concurrently?

terdon
  • What happens if instead of on if up you run it on if post-up? – Rui F Ribeiro Mar 27 '16 at 16:10
  • Should not make much difference, I just notice the dhcp client is running in pre-up. – Rui F Ribeiro Mar 27 '16 at 16:14
  • @RuiFRibeiro Possibly that'll fix it by fluke (just being a bit later), but I'm actually quite interested in understanding what's happening more than just fixing it. What's happening seems illogical? =D – Danny Tuppeny Mar 27 '16 at 16:32
  • Maybe not, we could be getting into a race condition again. Try inserting an strace in the postfix invocation and redirecting stderr to a file – Rui F Ribeiro Mar 27 '16 at 16:36
  • 1
    @RuiFRibeiro All figured out; see my answer. It's down to a crazy default setting on the Pi... Wish I'd found it sooner! – Danny Tuppeny Mar 27 '16 at 21:20

3 Answers

5

Ok, after many wasted hours, I found this in raspi-config...

[Screenshot: raspi-config]

So it seems this is broken-by-design. The default of "fast boot" comes at the expense of random failure. On a clean Raspbian install even without postfix installed, my syslog contains lots of DNS errors from various scripts during the DHCP process.

So, the fix is to set this to "Slow" boot, which creates a script that waits for the network at boot. Edit: You can script the call to raspi-config like this:

sudo raspi-config nonint do_wait_for_network Slow

This fixes both the postfix issue that I noticed, and also clears up a ton of DNS-related errors normally written to syslog at boot.
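
To confirm the change took effect after a reboot, a quick check along these lines may help. It is only a sketch: has_nameserver and check_resolv_copies are hypothetical helper names, and the chroot path assumes the Debian/Raspbian default Postfix queue directory of /var/spool/postfix.

```shell
#!/bin/sh
# Succeeds if the given resolv.conf-style file has at least one
# nameserver entry; fails for empty, comment-only, or missing files.
has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1" 2>/dev/null
}

# Report the state of the system file and of the Postfix chroot copy.
# Assumption: Postfix is chrooted under /var/spool/postfix.
check_resolv_copies() {
    for f in /etc/resolv.conf /var/spool/postfix/etc/resolv.conf; do
        if has_nameserver "$f"; then
            echo "ok:    $f"
        else
            echo "empty: $f"
        fi
    done
}
```

Running check_resolv_copies shortly after boot should report "ok" for both files once the network wait is in place. An "empty" result for the chroot copy would suggest Postfix still started before DHCP finished.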

I think as default behaviour, this is crazy. I've posted feedback on GitHub.

  • congrats on the investigation work! Now that you talk about it, I also undid a setup on my ARM where the dhcp client went ahead without waiting for the IP...or was it on NetBSD? Cannot remember, check it out. – Rui F Ribeiro Mar 28 '16 at 09:52
  • 1
    There's some justification for their decision in the GitHub issue (they agree neither option is ideal) which makes sense. Interestingly, Raspbian Lite (the GUI-less version) defaults the other way, presumably assuming you're less likely to run without a network. I might switch to that if I can't get the GUI stuff I wanted working anyway! :) – Danny Tuppeny Mar 28 '16 at 10:55
  • I clearly remember I also set up my NetBSD ARM to wait for the IP address and not go on with the rest of the boot process, as I wanted to use it as a server. I know I modified things in my DHCP / network services in Armbian (Debian based), but I cannot remember whether waiting to get an IP address was one of them. I "bricked" mine yesterday while playing with the new RTC DS3231 chip; once I get it running again I will check it out. – Rui F Ribeiro Mar 28 '16 at 11:51
1

There is another thing to take into account regarding service ordering on a server that gets its IP address from DHCP.

dhclient by default does not wait for the IP address. While you might not be interested in changing that on a workstation, on a server it creates ordering problems.

dhclient is invoked by default with the -nw option, which according to the manual:

-nw Become a daemon process immediately (nowait) rather than waiting until an IP address has been acquired.

This ordering was giving me some problems (e.g. BIND did not start correctly, even with a restart in the DHCP exit hooks).

I have changed my eth0 from:

iface eth0 inet dhcp

to

iface eth0 inet manual
   pre-up /sbin/dhclient -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
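
To see whether dhclient is actually running with -nw on a given system, something like the following may help. It is a sketch: has_nowait_flag and report_dhclient_flags are hypothetical helper names, and pgrep -a assumes the procps-ng version of pgrep.

```shell
#!/bin/sh
# Succeeds if a dhclient command line contains the standalone -nw option.
has_nowait_flag() {
    case " $1 " in
        *" -nw "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print each running dhclient PID and whether it was started with -nw.
# Assumption: pgrep supports -a (list the full command line).
report_dhclient_flags() {
    pgrep -a dhclient | while read -r pid cmdline; do
        if has_nowait_flag "$cmdline"; then
            echo "$pid: nowait (-nw)"
        else
            echo "$pid: waits for a lease"
        fi
    done
}
```

With the pre-up line above in place, report_dhclient_flags should report that the client waits for a lease.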
Rui F Ribeiro
  • On Raspbian it's all a bit weird; interfaces doesn't have "dhcp" but "manual". There's something in the forums about why (part of moving from one dhcp client to another?) but it was over my head.

    Interestingly, Raspbian Jessie Lite already has the wait.conf file I pasted a script for so it doesn't have this problem!

    – Danny Tuppeny Mar 29 '16 at 17:20
  • Interesting. I do not use systemd. – Rui F Ribeiro Mar 29 '16 at 17:41
1

Another way to handle this is to recopy resolv.conf after the network is fully up. With systemd, place the following into a file in /etc/systemd/system named something like fixpostfix.service, then run sudo systemctl enable fixpostfix.service. After each reboot, once the network is fully online, this copies the properly populated /etc/resolv.conf into the Postfix chroot:

[Unit]
Description=Fix poorly copied resolv.conf for postfix
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/cp /etc/resolv.conf /var/spool/postfix/etc/resolv.conf

[Install]
WantedBy=multi-user.target
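
If a blind copy feels risky, ExecStart could point at a small script instead. The following is a sketch, not part of the unit above: sync_resolv is a hypothetical helper that refuses to overwrite the chroot copy when the source has no nameserver line, which would guard against copying an empty file if the unit somehow ran too early.

```shell
#!/bin/sh
# sync_resolv SRC DST: copy SRC over DST only when SRC lists a nameserver
# and the two files differ. Returns 1 (and copies nothing) if SRC has no
# nameserver line or cannot be read.
sync_resolv() {
    src="$1"
    dst="$2"
    grep -q '^nameserver' "$src" 2>/dev/null || return 1
    # cmp -s is silent; it fails when the files differ or DST is missing.
    cmp -s "$src" "$dst" || cp "$src" "$dst"
}
```

Called as sync_resolv /etc/resolv.conf /var/spool/postfix/etc/resolv.conf, it does the same job as the cp in the unit, but leaves the existing copy alone when there is nothing useful to copy.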
rsjaffe
  • I used the above solution on an Ubuntu-based openvpn server setup which I automated with Ansible. See https://github.com/ksylvan/vpn-server – Kayvan Sylvan Apr 17 '17 at 01:28