44

I have a Linux-based process controller that occasionally locks up to the point where it can no longer be pinged (i.e., it responds to pings, then stops responding without any changes to its network settings).

I'm curious, what process/system is responsible for actually responding to pings? It appears that this process is crashing.

Eliah Kagan
Izzo
  • Can you still ssh into it while it's not responding to pings? Or do existing SSH sessions lock up? – Peter Cordes Apr 24 '18 at 22:18
  • @PeterCordes The entire system locks up and is essentially a brick until forcing a reboot. – Izzo Apr 24 '18 at 22:49
  • 3
    Ok, that's normally the only way a machine will stop responding to pings. It would be weird if pings stopped working but other stuff kept working, because ping handling works even if user-space is hosed and everything is blocked on disk I/O to a dead disk or NFS mount or whatever. Try connecting a monitor to your system and see if there's a console message as it locks up. (And see if you can use the magic SysRq keyboard sequences to dump info, or to remount read-only, force-sync the disks, and reboot.) – Peter Cordes Apr 24 '18 at 22:56
  • 2
    While your question is interesting, ping isn't the source of your system's problems, but rather a consequence of an unstable system. Check the logs to understand what's wrong. – Pedro Lobito Apr 25 '18 at 15:02
  • @PedroLobito What logs specifically? – Izzo May 02 '18 at 14:08
  • /var/log/messages, /var/log/boot.log - https://www.cyberciti.biz/faq/linux-log-files-location-and-how-do-i-view-logs-files/ – Pedro Lobito May 02 '18 at 14:59

4 Answers

60

The kernel's network stack handles ICMP messages, which are what the ping command sends.

If you do not get replies, and you have ruled out network problems and filtering (including host-based filtering, rate limiting, black-holing, etc.), the machine is probably either overloaded by something, which can be transient, or its kernel has crashed. A kernel crash is rare but can happen (faulty hardware, etc.), and not necessarily because of the ICMP traffic itself (although trying to overload a server with such traffic can be a good test early in its life to see how it holds up). In the latter case, a kernel crash, you should find ample information in the log files or on the console.

Also note that ping is almost always the wrong tool to check whether a service is online, for various reasons but mostly because, by definition, it does not mimic real application traffic. For example, to check that a web server is still alive, you should instead send it an HTTP query (TCP port 80 or 443); to check a mail server, an SMTP query (TCP port 25); to check a DNS server, both a UDP and a TCP query to port 53; and so on.
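
As a rough illustration of checking a web server with real application traffic instead of ping, here is a minimal Python sketch. The function name `http_service_healthy`, the use of a `HEAD` request, and the choice to treat any status below 500 as "alive" are assumptions made for this example, not an established convention.

```python
import http.client


def http_service_healthy(host: str, port: int = 80, path: str = "/",
                         timeout: float = 3.0) -> bool:
    """Return True if the server answers an HTTP HEAD request.

    Hypothetical helper: "healthy" here means any status below 500,
    which is an assumption chosen for this sketch.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # An actual HTTP exchange exercises the user-space server,
        # not just the kernel's network stack.
        conn.request("HEAD", path)
        return conn.getresponse().status < 500
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, unreachable hosts, garbled replies.
        return False
    finally:
        conn.close()
```

Something like `http_service_healthy("192.0.2.10")` (a placeholder documentation address) would then report whether the web server itself answered. A plain TCP connect would be a weaker check, since the kernel can complete the handshake on an open port even while the application is stuck.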

  • 4
    @Outurnate any other application service test would fail or time out, so the end result observed will be the same. I never miss an opportunity to lecture against using ping, as this creates far too many false positives in troubleshooting, so I think users who do not know exactly what ping does and how it can give misleading results should stick to something else. – Patrick Mevzek Apr 24 '18 at 17:35
  • 3
    In most overload situations the only things which still respond are those done by the kernel. That means a machine will usually respond to ping regardless of how overloaded it is. Attempts to reach a closed port will respond with RST for TCP and an ICMP error in case of UDP. And the first few attempts to reach an open TCP port will complete a handshake. A disk failure can lead to pretty much the same symptoms. – kasperd Apr 24 '18 at 22:03
  • @kasperd I have seen (very) overloaded servers (swapping ones specifically) not replying to ICMP requests either. And of course to nothing else also. The kernel did not crash, it was just busy in disk I/O stuff. – Patrick Mevzek Apr 24 '18 at 22:31
  • This is astounding to me. It's handled by the kernel? I must have a lot to learn about computer architecture yet. And/or networking. – Nacht Apr 24 '18 at 23:57
  • 2
    @Nacht Yup. A network interface is a HW device; as such there's a kernel driver to interface with it. A second layer then provides generic management/communication APIs. (This isn't unique to networking: there's ALSA for audio devs, video outs use the KMS API, USB has {U,E,X}HCI, then usb_storage, usbhid, etc.) Network routing tables, firewall rules (via iptables), handshaking, packet assembly, retransmits, etc. are all in-kernel. Since ICMP is a protocol unto itself, with no payload and no processing beyond "respond or don't", the kernel handles ICMP responses directly for minimal overhead. – FeRD Apr 25 '18 at 07:18
  • 5
    @Nacht: It's not really about fundamental computer architecture; it's an implementation choice. Microkernels will handle ICMP in an OS process. – MSalters Apr 25 '18 at 08:11
14

There is no userland process responsible for responding to pings. Ping is just a utility that sends ICMP echo packets. These are received and processed by the kernel's networking stack.
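
To make "ICMP echo packet" a bit more concrete, here is a small Python sketch that builds the bytes of an echo request and of the matching echo reply. The layout (type, code, checksum, identifier, sequence number, payload) and the checksum follow the standard ICMP echo format; the function names are made up for this illustration, and nothing is sent on the wire (raw sockets would normally need root).

```python
import struct

ICMP_ECHO_REQUEST = 8  # type used by ping's outgoing packets
ICMP_ECHO_REPLY = 0    # type the receiving kernel answers with


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        # Fold carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp_echo(icmp_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo message (hypothetical helper for illustration)."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", icmp_type, 0, checksum, ident, seq) + payload
```

The reply carries the same identifier, sequence number, and payload as the request; only the type (and therefore the checksum) differs, which is why the kernel can answer without involving any user process.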

Outurnate
11

The kernel itself (not any user process) is responsible for sending ICMP echo reply messages in response to ICMP echo request messages. So, if a host stops responding to pings, it is usually for one of the following reasons:

  • Network connectivity between you and the host being pinged might have been severed. There are many possible causes: physical damage to the cables, noise in the case of wireless, broken routing tables, a DDoS attack, problematic routers/switches in between, etc. You'd start troubleshooting in this case by using ethtool(8), iwconfig(8), route(8), ping(8) against its router, tcpdump(8), etc. on the target host.

  • The firewall settings on the target host (or on any router/firewall between you and the target host) may be limiting the number of pings (or the amount of traffic). It could also be due to tools like fail2ban(8) firewalling things on demand. See iptables(8) to check.

  • There has been a software/hardware malfunction on the target host. The network kernel module on the target host might have OOPSed and/or become confused, or the whole kernel might even have PANICked. You'll see messages about it in dmesg(8) on the target host, or as screen output on the physical console (if physical access is impractical, another machine with a serial console can help). If a kernel OOPS/PANIC is the problem, a newer kernel with better drivers might help, or you could kludge around the system lockups with watchdog(8) and helper drivers. Or you can replace hardware parts.
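
For the last case, a quick way to sift through kernel log text is to search it for the usual trouble markers. The Python sketch below is only an illustration: the function name is hypothetical and the pattern list is a small, non-exhaustive selection of strings that commonly appear in Linux kernel logs.

```python
import re

# Illustrative, non-exhaustive markers of kernel trouble.
_KERNEL_TROUBLE = re.compile(
    r"Kernel panic|Oops|BUG:|Call Trace|Out of memory|soft lockup"
    r"|blocked for more than"
)


def find_kernel_trouble(log_lines):
    """Return the lines that contain one of the markers above."""
    return [line.rstrip("\n") for line in log_lines
            if _KERNEL_TROUBLE.search(line)]
```

You could feed it the lines of a saved log file such as /var/log/messages. Keep in mind that dmesg only shows the current boot, so after a forced reboot the interesting messages will only be found in logs that were written to disk (or captured on a console) before the lockup.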

Matija Nalis
1

One fun way to convince yourself that the Linux kernel handles it directly is to observe the kernel settings for it with:

tail /proc/sys/net/ipv4/icmp*

which might output something like:

==> /proc/sys/net/ipv4/icmp_echo_enable_probe <==
0

==> /proc/sys/net/ipv4/icmp_echo_ignore_all <==
0

==> /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts <==
1

==> /proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr <==
0

==> /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses <==
1

==> /proc/sys/net/ipv4/icmp_msgs_burst <==
50

==> /proc/sys/net/ipv4/icmp_msgs_per_sec <==
1000

==> /proc/sys/net/ipv4/icmp_ratelimit <==
1000

==> /proc/sys/net/ipv4/icmp_ratemask <==
6168

This already indicates that ICMP is handled by the kernel, since /proc/sys is a special filesystem used to configure and inspect kernel state (see "What is in /dev, /proc and /sys?").

Next, if you connect 2 computers on the same LAN behind a typical home modem router, and then you manage to ping from computer A to computer B with something like:

ping 192.168.1.102

you can then go on computer B and turn off ping replies with:

echo 1 | sudo tee /proc/sys/net/ipv4/icmp_echo_ignore_all

and as soon as you do that, ping from computer A will start to fail. And to re-enable:

echo 0 | sudo tee /proc/sys/net/ipv4/icmp_echo_ignore_all

So we were able to directly control ICMP just by talking to the kernel.
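
The same setting can also be read from a script. Here is a minimal Python sketch, assuming the file contains a single integer as in the output above; the function name is made up for this example, and the path parameter exists only so the function can be pointed at a different file.

```python
from pathlib import Path

ICMP_ECHO_IGNORE_ALL = Path("/proc/sys/net/ipv4/icmp_echo_ignore_all")


def kernel_ignores_pings(sysctl_path: Path = ICMP_ECHO_IGNORE_ALL) -> bool:
    """Return True if the kernel is set to ignore all ICMP echo requests."""
    # The file holds "0" (reply to pings) or a non-zero value (ignore them).
    return sysctl_path.read_text().strip() != "0"
```

Reading the file needs no special privileges; only writing to it, as in the `sudo tee` commands above, requires root.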

References to icmp_echo_ignore_all can be seen in the kernel code pointed to by Ruslan in a comment https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/icmp.c?id=24cac7009cb1b211f1c793ecb6a462c03dc35818#n935

static bool icmp_echo(struct sk_buff *skb)
{
    struct net *net;

    net = dev_net(skb_dst(skb)->dev);
    if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
        struct icmp_bxm icmp_param;

        icmp_param.data.icmph      = *icmp_hdr(skb);
        icmp_param.data.icmph.type = ICMP_ECHOREPLY;
        icmp_param.skb             = skb;
        icmp_param.offset          = 0;
        icmp_param.data_len        = skb->len;
        icmp_param.head_len        = sizeof(struct icmphdr);
        icmp_reply(&icmp_param, skb);
    }
    /* should there be an ICMP stat for ignored echos? */
    return true;
}

Tested with two Ubuntu 23.10 machines running Linux 6.5.0 connected on a home wireless LAN.

Ciro Santilli OurBigBook.com