34

I am using Debian sid with a hard drive formatted as ext4, running Linux 3.1.

I remember that on previous Linux versions (maybe before 3.0), if I ran out of memory and swap was not enabled, programs would usually crash. This is perfect for my environment: simple web browsing with no critical operations. That is, if I accidentally run across a bad website that uses up too much memory, the offending program just crashes without rendering my terminal unusable.

But in my current setup, the computer hangs with violent I/O throughput in the background. iotop reveals kswapd0 to be the culprit, which means it is due to swapping. After using swapon -s to list any enabled swap areas, I used swapoff -a to disable them all, and ran swapon -s again to confirm that all swaps were disabled.
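
For reference, the state checks described here can be scripted. This sketch only inspects the current state; the actual disabling step needs root, so it is shown commented out:

```shell
#!/bin/sh
# List active swap areas; only the header line remains when swap is off.
cat /proc/swaps

# SwapTotal in /proc/meminfo drops to "0 kB" once all areas are disabled.
grep SwapTotal /proc/meminfo

# The disabling step itself (requires root):
# swapoff -a
```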

Then I tried maximizing my memory usage again. Alas, the behavior I expected didn't happen. Instead, kswapd0 tries over and over to swap out the RAM and fails, as there is no swap space. Because it never gives up, my computer is locked in an eternal I/O-heavy freeze, which is bad for my disk's health.

Am I doing something wrong in trying to swapoff -a? Why is the behavior different than what it used to be (probably pre-3.0 times)?

Anthony Ananich
syockit
  • That doesn't really make sense. Doing the swapoff -a itself, if there was stuff in the swap, will generate a lot of I/O (and can result in processes getting killed if there is not enough real RAM available). Are you sure it's not the swapoff -a that caused the I/O "storm"? – Mat Nov 15 '11 at 11:40
  • I suppose it is enough to comment the fstab line about swap. Try if the behavior is the same. – enzotib Nov 15 '11 at 11:48
  • @Mat swapoff -a should disable swap permanently, meaning it should stay disabled after next reboot. I confirmed this. Yet, I/O "storm" still happens during the session after next reboot. For the record, I/O "storm" didn't happen at the moment I did swapoff -a because swap was 0 at that time. – syockit Nov 15 '11 at 11:48
  • @enzotib I have no swap in my fstab. – syockit Nov 15 '11 at 11:49
  • @syockit: swapoff -a is not permanent. – Mat Nov 15 '11 at 11:51
  • @Mat hmm, then it was permanent probably due to deletion of that line in fstab. My mistake! – syockit Nov 15 '11 at 11:54
  • kswapd0 always does work, even without any dedicated swap partition/file, because your RAM contains mapped files (e.g. the executable binaries of running programs) which have copies in the fs. Hence in "almost-no-RAM-left" situations, even without any swap partition, kswapd0 will temporarily make room in RAM by dropping those mapped-file copies. Assuming this is a bug, or a forgotten setting that left swap enabled, is the wrong place to start. – humanityANDpeace Nov 23 '18 at 18:20
  • See also: https://askubuntu.com/q/432809/50254 – Mikko Rantalainen Jan 08 '19 at 13:17
  • it's better to enable zram instead of disabling swap – phuclv Feb 17 '19 at 08:58
  • Database loading got to about 15% in 14 hours. Turned off swap, and on the next attempt it got to 40% in 4 hours. Admittedly, the server is under-powered and low on RAM, but without swap turned on, OpenSuSE works much faster for this one process. The OS's opinion of "better" and mine differ dramatically during a simple mysql db load. Commented out the swap drive in /etc/fstab and rebooted. – TheSatinKnight Jun 23 '19 at 16:13
  • https://unix.stackexchange.com/questions/28678/how-to-limit-available-virtual-memory-per-process – Andrew Jun 21 '20 at 21:38

6 Answers

19

Disabling swap won't do what you want. You will still get violent I/O throughput, but it will be of clean pages rather than dirty ones.

With no swap, the system will compress the cache of clean (unmodified) pages to near zero, because those are the only pages it can evict from physical memory. It can only evict dirty (modified) pages from memory by writing them to swap; with no swap, it has no way to evict dirty pages.

As you run low on physical memory, each process will have to load its code pages from disk as it evicts the previous process's code pages. The result will be violent thrashing and excessive work done by the swap subsystem.
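
One way to observe this kind of thrashing (a diagnostic sketch, not part of the original answer): the pgmajfault counter in /proc/vmstat counts major page faults, i.e. faults that had to read from disk, and it climbs rapidly while evicted code pages are being re-read.

```shell
#!/bin/sh
# Sample the major page fault counter twice, one second apart.
# A large delta while memory is tight means code pages are being
# re-read from disk -- exactly the "swapping of clean pages" above.
a=$(awk '/^pgmajfault / {print $2}' /proc/vmstat)
sleep 1
b=$(awk '/^pgmajfault / {print $2}' /proc/vmstat)
echo "major faults in the last second: $((b - a))"
```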

This is a special case of a very important principle: For a well-designed system, you can't make it run better by reducing its choices. Linux is a well-designed system. Removing swap just gives it fewer choices, so it's not surprising that it behaves worse.

  • This is only true if you allocate just short of all memory. A runaway process will usually be trying to allocate much more, so it will be killed early, freeing up that memory, as opposed to swapping the system to death trying to accommodate more allocations. Hence, disabling swap can be helpful when you only max out your RAM usage from a runaway process. – psusi Nov 15 '11 at 18:39
  • Just short of all memory will pretty much always be allocated. Linux is specifically tuned this way. Do a cat /proc/meminfo on any typical Linux box after a few hours of load. – David Schwartz Nov 15 '11 at 18:43
  • By allocated I (and most people) mean allocated to something other than the page cache. – psusi Nov 16 '11 at 03:45
  • As far as performance is concerned, it doesn't matter if memory is allocated to the page cache or something else, particularly if it's dirty, particularly if you have no swap space. – David Schwartz Nov 16 '11 at 04:11
  • Okay, so can I disable paging instead? Like psusi said, in usual usage my RAM only maxes out due to a runaway process. Even if I enable swap, the runaway process will manage to fill all of swap and still make the computer unusable in the end (it happens when I don't carefully monitor what's going on in the web browser's background). – syockit Nov 16 '11 at 10:45
  • If it is allocated to the page cache, then it is available to be allocated to processes for their private use. All pages in the page cache are backed by files on disk, so dirty pages in the page cache can be flushed to their file and then freed. Swap is used as the disk store for anonymous memory allocated by processes. The point here is that when you have no swap, the runaway process is killed, but when you have swap, the runaway process keeps allocating more memory, which causes more swapping, which grinds the entire system to a halt. – psusi Nov 16 '11 at 14:18
  • @syockit If you disable paging, you can't run any programs. Paging is the mechanism by which files are read in when mapped into memory. – David Schwartz Nov 16 '11 at 16:39
  • @psusi : When you have no swap, clean pages are evicted from cache because dirty pages cannot be evicted fast enough (if at all). This is what causes the violent thrashing. Long before any processes are killed, the cache will be squeezed to near zero. This means everything that's not a dirty, anonymous page will be gone. That's what causes the thrashing: the majority of code pages are not dirty, anonymous pages, so almost every code page faults. – David Schwartz Nov 26 '11 at 04:29
  • No, violent thrashing is the result of pages being constantly written out to swap, and read back in from swap. When you have no swap, then there is nowhere for dirty anonymous pages to be written to, so they must be kept in memory. This means the system quickly runs out of memory entirely, and kills the runaway process, without doing any swapping. Clean pages in the cache are reduced to a minimum, which will cause more cache misses on IO, but this also happens when you have swap and are thrashing. – psusi Nov 28 '11 at 04:23
  • @psusi : Clean pages will not be reduced to a minimum when you have swap. It will instead swap out dirty, anonymous pages that haven't been recently used. Of course, either way you'll get violent thrashing eventually if the working set exceeds physical memory. The point is, with or without swap, you will get lots of violent thrashing before you actually run out of memory. The difference is, with swap the violent thrashing will be swapping (dirty pages, write and read). Without swap, the violent thrashing will be code faults (clean pages, read only). – David Schwartz Nov 28 '11 at 04:57
  • You're missing the point; this will only happen without swap in a very narrow sweet spot where 99% of memory is allocated. As soon as it hits 100% (which likely happens pretty fast), the runaway process is killed, freeing up lots of memory. With swap, you thrash heavily for a very long time before you exhaust both RAM and swap, and only then is the process killed. – psusi Nov 28 '11 at 15:17
  • @psusi The sweet spot is not narrow. Usage will stay at 95+% for a very, very long time as the contents of memory gradually shifts from mostly clean pages to mostly dirty pages. All the while, the cache of clean pages will get squeezed harder and harder and performance and thrashing will get worse and worse. No process will get killed until the cache of clean pages is squeezed to nearly zero. Likely you will give up long before you get there. You generally only see OOM kills before violent thrashing in the very different case where there's a small number of large allocations. – David Schwartz Feb 02 '12 at 10:20
  • @DavidSchwartz, the narrow sweet spot is the 95%+ usage window. A runaway process will quickly grow to 100% and be killed. So yes, you will purge your disk cache, but the runaway process is killed quickly and the system returns to having plenty of free memory. This is much better than when you have swap enabled, in which case the system runs at 95% and keeps moving more and more out to swap, hammering away at the disk the whole time, and only gives up and kills the runaway process once swap is also exhausted. – psusi Feb 02 '12 at 23:27
  • @psusi: You are correct if the concern is a runaway process that rapidly blows up in memory consumption. But that's not what the OP is talking about, which is a process that consumes excessive, but not unbounded or massively excessive, memory. As it grows through the large sweet spot (where the cache is squeezed) it will grow more and more slowly as the system thrashes. – David Schwartz Feb 02 '12 at 23:44
  • "For a well-designed system, you can't make it run better by reducing its choices." Bull. – Andrew Jun 21 '20 at 19:55
  • @DavidSchwartz Quote from your answer: "With no swap, the system will compress the cache of clean (unmodified) pages to near zero, because those are the only pages it can evict from physical memory. It can only evict dirty (modified) pages from memory by writing them to swap, with no swap, it has no way to evict dirty pages." Why doesn't it compress the dirty pages? And can swapping reduce fragmentation? – John Jun 24 '20 at 03:31
15

A better solution than turning off swap, which will at best cause random processes to be killed when memory runs low, is to set the per-process data segment limit for processes that pull stuff off the net. This way a runaway browser will hit the limit and die, rather than cause the whole system to become unusable. For example, from the shell:

(ulimit -d 400000; firefox) &

The number after -d is in kilobytes. You should experiment with this on your system to choose the best value for your browsing habits. The parentheses cause a subshell to be created; the ulimit command only affects that shell and its children, isolating its effects from the parent shell.
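
A quick way to convince yourself of the scoping (400000 is just the example value from above): the subshell reports the new limit, while the parent shell keeps its own.

```shell
#!/bin/sh
# Inside the subshell the data segment limit is capped...
(ulimit -d 400000; ulimit -d)   # prints: 400000

# ...but the parent shell's limit is untouched (usually "unlimited").
ulimit -d
```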

Kyle Jones
  • Will this work for chromium, say, where we have a bunch of chromium processes using small chunks of memory? – jberryman Jun 25 '15 at 16:37
  • @jberryman No, the memory limits are per-process rather than per-user. – Kyle Jones Jun 26 '15 at 14:36
  • Is there a way to send it a specified signal (e.g., SIGHUP) when it reaches the memory limit? – Geremia Feb 24 '17 at 16:16
  • @Geremia No. The brk and sbrk system calls stop working, which will make most things curl up and die. – Kyle Jones Feb 24 '17 at 16:23
  • If you want to go with manual tuning, I would suggest using memory cgroup instead of ulimit because with memory cgroup you can set a limit for the whole process group and can configure the memory-allocating process to stop, and your user mode policy process can decide what to do (e.g. send some signals, select a process to be killed, raise the memory limit on the fly). See https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt and https://www.kernel.org/doc/Documentation/cgroup-v2.txt for details. – Mikko Rantalainen Jan 08 '19 at 13:04
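
A configuration sketch of the cgroup route mentioned in the comment above, using the cgroup v2 filesystem. The group name browser and the 800M cap are made-up example values, and every step needs root:

```shell
# Create a group, cap its memory, then start the browser inside it.
# Unlike ulimit -d, the cap covers the whole process tree, so
# multi-process browsers are limited as a unit.
mkdir /sys/fs/cgroup/browser
echo 800M > /sys/fs/cgroup/browser/memory.max
echo $$   > /sys/fs/cgroup/browser/cgroup.procs
firefox &
```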
5

To make sure that swap is not used, you'd be better off preventing any swap being added at boot. This can be done, depending on the system, by disabling the swap boot service or just commenting out the swap entry in /etc/fstab.

As far as your hangup is concerned, the stop() function in /etc/init.d/swap might give a clue:

stop()
{
       ebegin "Deactivating swap devices"

       # Try to unmount all tmpfs filesystems not in use, else a deadlock may
       # occure. As $RC_SVCDIR may also be tmpfs we cd to it to lock it
       cd "$RC_SVCDIR"
       umount -a -t tmpfs 2>/dev/null

       case "$RC_UNAME" in
               NetBSD|OpenBSD) swapctl -U -t noblk >/dev/null;;
               *)              swapoff -a >/dev/null;;
       esac
       eend 0
}

Notice the part about deadlock. You can try doing umount -a -t tmpfs yourself before turning swap off.


Edit:

You might also be able to achieve your goal by modifying sysctl settings (see this question).
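
The sysctl route usually means tightening overcommit accounting, so that allocations fail early (letting the offending process die) instead of letting the system thrash first. A sketch: the write commands need root and are shown commented out, and the ratio of 100 is an example value that depends on your workload.

```shell
#!/bin/sh
# Current policy: 0 = heuristic overcommit (default), 1 = always allow,
# 2 = strict accounting against swap + overcommit_ratio% of RAM.
cat /proc/sys/vm/overcommit_memory

# To make allocations fail instead of thrashing when memory runs out
# (requires root; add to /etc/sysctl.conf to persist across reboots):
# sysctl -w vm.overcommit_memory=2
# sysctl -w vm.overcommit_ratio=100
```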

  • I don't have swap in init.d, nor do I have it on fstab, but I do have /etc/init.d/mountoverflowtmp that mounts tmpfs for emergency log writes. Does the swap daemon use tmpfs too? – syockit Nov 15 '11 at 11:58
  • You might have it enabled elsewhere - do grep -RF swap /etc/ if you wish to find it. But to disable a service, you'd use a command like service (IIRC; I don't use Debian myself). – rozcietrzewiacz Nov 15 '11 at 12:02
  • Swap itself does not use tmpfs, because tmpfs is an in-memory (RAM) filesystem. But other services/programs that use tmpfs might rely on swap in a special manner. I don't really know, but it might have something to do with caching or a special way in which the tmpfs driver claims access to swap space. – rozcietrzewiacz Nov 15 '11 at 12:04
  • There's something about how Linux handles virtual memory that I don't understand. I've disabled swap in most ways possible: via swapoff, and via vm.swappiness=0. Yet kswapd0 still runs! I wonder if this is a regression from the 2.4 days… – syockit Nov 15 '11 at 15:36
  • @syockit It's expected behavior. The system is still swapping clean pages (pages that contain copies of file data). It requires no swap space to swap clean pages, since they can be read back from sources other than swap. – David Schwartz Nov 15 '11 at 16:58
4

On my system (Debian sid, 2016-11-15), I did this:

  1. disable the swap now:

    swapoff -a
    
  2. comment out the swap partition line in /etc/fstab (you may not need this; step 3 alone may work for you)

    #UUID=c6ddbc95-3bb5-49e1-ab25-b1c505e5360c none            swap    sw              0       0
    
  3. disable the mounting of swap in systemd (Note, wrap the unit name in quotes in case the unit name has backslash characters):

    systemctl --type swap
    systemctl stop "dev-\x2da821.swap"
    systemctl mask "dev-\x2da821.swap"
    

That seems to do the trick.

trusktr
4

It is better to comment out the swap partition entry in /etc/fstab than to run swapoff -a after each boot.

I have the same issue with kswapd0 on my hardware.

Tuning the vm.swappiness sysctl parameter did not help in my case:

sysctl -w vm.swappiness=0

I googled and read a lot of posts and mailing lists, and now I think this is a kernel bug.

When there is no active swap partition and free memory drops below some threshold (about 300 MB in my case), the system becomes unresponsive due to kswapd0 madness.

It probably only reproduces under particular configurations and conditions.

For some it was solved by reinstalling the system with re-partitioning; for others, by building a custom kernel with kswapd0 disabled.

Anthon
humkins
  • If kswapd0 goes mad and you don't have swap activated, you're out of RAM. Your choices are the OOM Killer or kswapd0. Linux goes with kswapd0 because the kernel assumes that it's more important to finish slowly than to abort the process. For casual humans, the threshold where the kernel thinks enough forward progress is still happening is already glacially slow, and nearly anybody would rather select the OOM Killer. – Mikko Rantalainen Aug 31 '18 at 10:35
1

the computer hangs with violent I/O throughput in the background. iotop reveals kswapd0 to be the culprit

I've found one way (so far) to avoid that. If you want to test it and see how it does on your system, see the kernel patch inside this question. Basically, it doesn't evict Active(file) pages (at least) when under memory pressure, so the disk thrashing (constant reading) is reduced to almost nothing, and the OOM-killer is allowed to trigger within 1 second instead of freezing the OS for what seems like forever (or at least for many minutes). I am hoping that actual programmers (which I'm not) will improve the patch and turn it into an actual solution, now that they can see that what it does works for these situations.

  • is this kernel patch mainlined already? – humanityANDpeace Nov 23 '18 at 18:22
  • @humanityANDpeace probably not, because it's not that good (as I am not a programmer). However, I did run into some issues with it, such as: sometimes, depending on workload, with this patch you can run out of memory in cases in which without this patch you wouldn't have, and thus the OOM-killer will kill Xorg and xfwm4, UNLESS I run echo 1 | sudo tee /proc/sys/vm/drop_caches when Active(file): (of /proc/meminfo) is over 2GB (on a 16G RAM system) – it can go to max 4G –  Nov 28 '18 at 19:13