I am using a CentOS 7.5 instance in AWS with 16 CPUs and 32 GB of memory. When I run the following command, the whole system becomes unresponsive: I cannot run any more commands on it and cannot even establish a new SSH session (although it still responds to ping). I do not see the OOM killer triggered at all; the system just seems to hang forever.
stress --vm 1 --vm-bytes 29800M --vm-hang 0
However, if I run
stress --vm 1 --vm-bytes 29850M --vm-hang 0
to consume a bit more memory (50 MB more), the OOM killer is triggered successfully (I can see it in dmesg). And if I run stress to consume less memory than 29800 MB, e.g.
stress --vm 1 --vm-bytes 29700M --vm-hang 0
the system stays responsive (with no OOM kill) and I can run commands as usual.
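To make the three runs above concrete, here is a rough Python approximation of what a single stress vm worker does with these flags. It assumes stress's documented defaults: --vm-stride 4096 (write one byte every 4096 bytes) and --vm-hang 0 (sleep forever instead of freeing). It also assumes the M suffix means MiB. The function names are hypothetical and this is not stress's source code; an anonymous mmap stands in for stress's large malloc, since both are backed by physical pages only when written.

```python
import mmap
import time


def touch_memory(nbytes: int, stride: int = 4096) -> mmap.mmap:
    """Map nbytes of anonymous memory and write one byte every `stride` bytes.

    Anonymous mmap pages are allocated lazily, so each write below faults in
    one page (stride 4096 matches the usual x86_64 page size).
    """
    buf = mmap.mmap(-1, nbytes)
    for off in range(0, nbytes, stride):
        buf[off] = ord("Z")
    return buf


def vm_worker(vm_bytes_mib: int, hang_seconds: float = 0) -> None:
    """Hypothetical stand-in for `stress --vm 1 --vm-bytes <N>M --vm-hang <S>`."""
    buf = touch_memory(vm_bytes_mib * 1024 * 1024)
    if hang_seconds == 0:
        # --vm-hang 0: keep every touched page resident forever
        while True:
            time.sleep(1024)
    time.sleep(hang_seconds)
    buf.close()  # stress would free here and start the next cycle
```

The point of the sketch is that the worker's resident size grows to roughly the full --vm-bytes value and then stays there, so the only difference between the three runs is how much memory is left for everything else.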
So 29800 MB seems to be a "magic number" for this instance: if stress uses more memory than that, it gets OOM killed; if it uses less, everything is fine; but if it uses exactly 29800 MB, the whole system becomes unresponsive. I have also observed the same behavior on Linux hosts with other specs; for example, on a CentOS 7.5 instance with 72 CPUs and 144 GB of memory, the "magic number" is 137600 MB.
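One way to compare the two "magic numbers" across instances is to look at how much of the kernel-reported MemTotal each one leaves unallocated, rather than comparing them to the nominal 32 GB and 144 GB. The sketch below parses /proc/meminfo-style text (values are reported in kB) and assumes stress's M suffix means MiB; the function names are illustrative only.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # kB for most fields
    return info


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse the live meminfo file (Linux only)."""
    return parse_meminfo(Path(path).read_text())


def headroom_mib(meminfo: dict[str, int], vm_bytes_mib: int) -> float:
    """MemTotal minus the stress allocation, both expressed in MiB."""
    return meminfo["MemTotal"] / 1024 - vm_bytes_mib
```

On the affected instance, headroom_mib(read_meminfo(), 29800) would show how much memory is left for the kernel and all other processes at the point where the hang occurs, which can then be compared with the 137600 MB case on the larger host.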
My question is: why isn't the OOM killer triggered when exactly this "magic number" of memory is used?