56

I have a Debian (Buster) laptop with 8 GB RAM and 16 GB swap. I'm running a very long-running task, which means my laptop has been left on for the past six days while it churns through the work.

While doing this I periodically need to use my laptop as a laptop. This shouldn't be a problem: the long-running task is I/O bound, working through files on a USB hard disk, and doesn't take much RAM (<200 MB) or CPU (<4%).

The problem is that when I come back to my laptop after a few hours, it is very sluggish and can take 30 minutes to return to normal. This is so bad that crash monitors flag their respective applications as frozen (especially browser windows) and things start crashing out when they shouldn't.

Looking at the system monitor, around half of the ~2.5 GB in use gets shifted into swap. I've confirmed this is the problem by removing the swap space (swapoff /dev/sda8). Without swap, the laptop comes back to life almost instantly even after 24 hours away. With swap, it's practically a brick for the first five minutes after being left for only six hours. I've confirmed that memory usage never exceeds 3 GB even while I'm away.

I have tried reducing the swappiness (see also: Wikipedia) to values of 10 and 0, but the problem still persists. It seems that after a day of inactivity the kernel believes the entire GUI is no longer needed and wipes it from RAM (swaps it to disk). The long-running task is reading through a vast file tree and reading every file, so the kernel may be tricked into thinking that caching would help. But on a single sweep of a 2 TB USB HD with ~1 billion file names, an extra GB of RAM isn't going to help performance much. This is a cheap laptop with a sluggish hard drive; it simply can't load data back into RAM fast enough.
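For reference, a sketch of how swappiness is typically inspected and changed (the write needs root; 60 is the Debian default):

```shell
# Current swappiness (readable without root):
val=$(cat /proc/sys/vm/swappiness)
echo "vm.swappiness = $val"
# To lower it for the running system (as root):   sysctl vm.swappiness=10
# To persist across reboots, add "vm.swappiness=10" to /etc/sysctl.conf
```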

How can I tell Linux to only use swap space in an emergency? I don't want to run without swap: if something unexpected happens and the OS suddenly needs an extra few GB, I don't want tasks to get killed and would prefer it to start using swap. But at the moment, if I leave swap enabled, my laptop just can't be used when I need it.

The precise definition of an "emergency" might be a matter for debate. But to clarify what I mean: an emergency would be where the system is left with no option other than to swap or kill processes.


What is an emergency? Do you really have to ask?... I hope you never find yourself in a burning building!

It's not possible for me to define everything that might constitute an emergency in this question. But for example, an emergency might be when the kernel is so pushed for memory that it has to start killing processes with the OOM Killer. An emergency is NOT when the kernel thinks it can improve performance by using swap.


Final Edit: I've accepted an answer which does precisely what I've asked for at the operating system level. Future readers should also take note of the answers offering application level solutions.

  • 12
    Define "emergency" and say something about how this is different from any ordinary situation when swap would be used. – Kusalananda Feb 08 '19 at 14:32
  • 4
    I wanted to know if you wanted to somehow define a special type of out-of-bounds "emergency event" that would allow the kernel to use swap, but that swap would otherwise not be used. AFAIK paging out memory is something that is slow and only ever done "in emergencies" anyway, and the "swappiness" thing is the only thing that you are able to use to adjust this behaviour (but I'm no Linux user). – Kusalananda Feb 08 '19 at 14:58
  • 2
    No, that's not correct. It's not only done in emergencies. At the very least, I thought my question made it clear I've only used 3GB out of 8GB... That's hardly an emergency, but the kernel is swapping anyway. I suggest you read up on swappiness and surrounding topics; there is quite a bit of discussion over the various reasons for swapping. It is plausible I'm asking for a concept that doesn't exist in the kernel, but my reasons for asking for it are reasonably well justified. – Philip Couling Feb 08 '19 at 15:02
  • 1
    Here's another discussion on the same subject. I noticed this issue about 19 years ago and still haven't seen a solution, but unfortunately have received a few sarcastic comments any time I brought it up. – X Tian Feb 08 '19 at 15:17
  • 5
    I recognise the advice has always been "never run without swap". But memory sizes have outscaled hard drive (HDD, not SSD) read/write speeds, meaning that swap is increasingly a bad idea. It feels like some believe 8GB RAM + 8GB swap will outperform 16GB RAM + 0 swap. If it truly does, then something is very wrong with the Linux kernel. – Philip Couling Feb 08 '19 at 15:25
  • Are you ...sure... You have excluded actual RAM use of your app? Is it regularly allocating then deallocating RAM? Any way you can malloc a single block of RAM and have that be your workspace, essentially doing your own in-house memory management? – Harper - Reinstate Monica Feb 09 '19 at 01:52
  • 8
    @Philip Couling: No, the point is that 16 GB RAM + 16 GB swap will outperform 16 GB and 0 swap - especially when your code happens to need 17 GB of memory :-) – jamesqf Feb 09 '19 at 04:19
  • 2
    What kind of disk do you use? Is it an SSD? – jpmc26 Feb 09 '19 at 05:01
  • Have you tried, to confirm your theory, to run your device without swap for a couple of days ? – Robert Riedl Feb 09 '19 at 11:12
    Oh sorry, must have skipped that! – Robert Riedl Feb 09 '19 at 12:01
  • 2
    You can also try swappiness 1, since 0 seems to be a special value and 10 is not very aggressive. – eckes Feb 09 '19 at 13:44
  • 1
    This is not an answer because it doesn't reduce swapping, but recovery can be sped up by echo 1 > /sys/kernel/mm/swap/vma_ra_enabled – jpa Feb 09 '19 at 14:21
  • 2
    @Philip Couling: Who said the 15 GB program will run faster on a 16 GB machine with swap? Not me: I said the 17 GB program will not crash, even if it runs slower. Or FTM the 15 GB program when you want to use a browser or something for a bit. Now whether the RAM/swap use is handled sensibly is another question entirely... – jamesqf Feb 09 '19 at 22:51
  • 1
    Is it possible your long running process is writing a lot to /tmp ? – james Feb 10 '19 at 02:33
  • Using swapoff on my SSD and using 12 GB of DDR3L RAM ( Gnome requires about 1 GB and 5 for Inkscape and 4 GB for Virtual Machines and the extra 2 GB just to make sure of any "emergency" ) btw I'm using 4 Gen i7 with dual monitor setup. – Kushagra Karira Feb 10 '19 at 17:36
    Can you confirm that you saw this behavior with swappiness set to exactly 0 for several days, and not just a low value like 10? Specifically, this parameter sets the probability of swapping out used pages instead of dropping cache (100 = equal probability, 0 = always cache). A low value will still cause all idle application pages to be swapped out; it'll just take longer. – that other guy Feb 10 '19 at 21:40
  • @Philip Couling: That depends on what you mean by performance. As I understood it, the machine is just getting slow, rather than crashing because it runs out of memory, no? Now I would agree that the swap algorithm on your machine could be improved. My guess, though, is that the problem might be certain browsers' tendency to grab as much memory as possible. (Right now, Firefox has 25% of my memory, with one open tab displaying this page.) How does the swap algorithm know that it probably doesn't really need near as much as it's asking for? – jamesqf Feb 11 '19 at 18:04
  • @jamesqf your comments suggest you've simply not read the question and / or my comments carefully enough. – Philip Couling Feb 14 '19 at 15:41
  • 1
    @Philip Couling: Or perhaps you just haven't worded the question clearly enough. It seems that you don't like the way the current swap algorithm behaves, and would like to change it. (And from your description, I would too :-)) Or perhaps there's a problem with your long-running process not being able to access the full amount of RAM? In any case, saying you're having problems does not mean that everyone would, or that it's inherent in swap. – jamesqf Feb 15 '19 at 04:14
  • OMG, people that have no direct experience with the problem described in the question should not be even commenting, WTF... The "are you sure" guys, or "have you tried" Andy LOL – Winampah May 17 '21 at 19:19
  • Hey Philip, I just want to add something interesting: you mention using Debian. I'm also having problems with swapping in Debian. I have found by accident that this same slowness and unresponsiveness DOES NOT happen at all in Manjaro. When the OOM Killer is triggered in Manjaro, the system doesn't become a brick like that. I'm investigating more into why would that happen, and how to fine-tune Debian into becoming better at this. Seems to me like Debian is still using swapping techniques from when computers needed swap, obsolete approach. – Winampah May 17 '21 at 19:21

6 Answers

29

One fix is to make sure the memory cgroup controller is enabled (I think it is by default in even half-recent kernels, otherwise you'll need to add cgroup_enable=memory to the kernel command line). Then you can run your I/O intensive task in a cgroup with a memory limit, which also limits the amount of cache it can consume.

If you're using systemd, you can set MemoryAccounting=yes and either MemoryHigh=/MemoryMax= (cgroup v2) or MemoryLimit= (cgroup v1) in the unit, or in a slice containing it. If it's a slice, you can use systemd-run to run the program in the slice.

Full example from one of my systems for running Firefox with a memory limit. Note this uses cgroups v2 and is set up as my user, not root (one of the advantages of v2 over v1 is that delegating this to non-root is safe, so systemd does it).

$ systemctl --user cat mozilla.slice 
# /home/anthony/.config/systemd/user/mozilla.slice
[Unit]
Description=Slice for Mozilla apps
Before=slices.target

[Slice]
MemoryAccounting=yes
MemoryHigh=5G
MemoryMax=6G

$ systemd-run --user --slice mozilla.slice --scope -- /usr/bin/firefox &
$ systemd-run --user --slice mozilla.slice --scope -- /usr/bin/thunderbird &

I found that to get this working for the user instance I had to use a slice. For the system instance it works just by putting the options in the service file (or using systemctl set-property on the service).
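As a sketch of the drop-in route for a system service (the unit name long-task.service is hypothetical), the same options could go in /etc/systemd/system/long-task.service.d/memory.conf, applied with systemctl daemon-reload:

```ini
[Service]
MemoryAccounting=yes
MemoryHigh=1G
MemoryMax=2G
```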

Here is an example service (using cgroup v1), note the last two lines. This is part of the system (pid=1) instance.

[Unit]
Description=mount S3QL filesystem
Requires=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=forking
User=s3ql-user
Group=s3ql-user
LimitNOFILE=20000
ExecStartPre=+/bin/sh -c 'printf "S3QL_CACHE_SIZE=%%i\n" $(stat -c "%%a*%%S*.90/1024" -f /srv/s3ql-cache/ | bc) > /run/local-s3ql-env'
ExecStartPre=/usr/bin/fsck.s3ql  --cachedir /srv/s3ql-cache/fs1 --authfile /etc/s3ql-authinfo  --log none «REDACTED»
EnvironmentFile=-/run/local-s3ql-env
ExecStart=/usr/bin/mount.s3ql --keep-cache --cachedir /srv/s3ql-cache/fs1 --authfile /etc/s3ql-authinfo --cachesize ${S3QL_CACHE_SIZE} --threads 4
ExecStop=/usr/bin/umount.s3ql /mnt/S3QL/
TimeoutStopSec=2m
MemoryAccounting=yes
MemoryLimit=1G

Documentation is in systemd.resource-control(5).

derobert
  • 109,670
  • 1
    Can’t you do something comparable and portable by just using ulimit? – Old Pro Feb 10 '19 at 19:53
  • 1
    @OldPro not really. First, there isn't AFAIK a ulimit on total memory usage including page cache (which is the usage that is becoming excessive here). Second, ulimit for memory is per-process, cgroups work even if the long-running task forks. – derobert Feb 10 '19 at 21:35
  • I thought the reason memory accounting is enabled by default on newer systems is due to a change in systemd version 238. – sourcejedi Feb 11 '19 at 11:41
  • 1
    @sourcejedi that's relatively recent. When the memory controller was first introduced, just having it available (not even in use) had a large enough performance cost that some distros at least disabled it by default and you had to pass that kernel command line argument to enable it. The performance problems were fixed, so that changed, and more recently systemd activates it too by default. – derobert Feb 11 '19 at 14:46
20

Having such a huge swap nowadays is often a bad idea. By the time the OS has swapped just a few GB of memory out, your system will already have slowed to a crawl (like what you saw).

It's better to use zram with a small backing swap partition. Many OSes like ChromeOS, Android and various Linux distros (Lubuntu, Fedora) have enabled zram by default for years, especially for systems with less RAM. It's much faster than swap on an HDD, and you can clearly feel the difference in system responsiveness. It helps less on an SSD, but according to the benchmark results here it still seems faster, even with the default lzo algorithm. You can change to lz4 for even better performance with a slightly lower compression ratio; its decoding speed is nearly 5 times faster than lzo, based on the official benchmark.

In fact, Windows 10 and macOS also use similar page-file compression techniques by default.

There's also zswap, although I've never used it. It's probably worth trying both and comparing which one is better for your use cases.
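As a sketch of the zram route: the sizes, the lz4 choice, and the /dev/sda8 fallback below are assumptions to adapt to your system, and everything needs root, so the commands are written to a small script here rather than run directly:

```shell
# Write the zram setup to a script; run it as root after reviewing it.
cat > setup-zram.sh <<'EOF'
#!/bin/sh
set -e
modprobe zram                                 # creates /dev/zram0
echo lz4 > /sys/block/zram0/comp_algorithm    # must be set before disksize
echo 2G  > /sys/block/zram0/disksize          # uncompressed capacity
mkswap /dev/zram0
swapon -p 100 /dev/zram0                      # high priority: filled first
swapon -p 10  /dev/sda8                       # disk swap as low-priority fallback
EOF
chmod +x setup-zram.sh
```

With these priorities, the kernel fills the compressed RAM device first and only touches the HDD partition once zram is exhausted.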

After that, another suggestion is to reduce the priority of those I/O-bound processes, and possibly to leave a terminal running at higher priority so that you can run commands on it right away even when the system is under heavy load.
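A minimal sketch of the priority part (./scan-usb-disk.sh is a hypothetical stand-in for the real job; note ionice's idle class only has full effect with schedulers like CFQ/BFQ):

```shell
# Lowest CPU priority (nice 19) plus idle I/O class (ionice -c3) for the scanner:
#   nice -n 19 ionice -c3 ./scan-usb-disk.sh
# Demonstrated here with a harmless command so the invocation can be checked:
out=$(nice -n 19 sh -c 'echo low-priority ok')
echo "$out"
```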

Further reading

phuclv
  • 2,086
  • Just so that I understand, you are saying that I can create a zram block device, use it as swap, with a lower priority swap as the HDD partition? – Philip Couling Feb 09 '19 at 13:52
    @PhilipCouling if you're using an HDD then yes, you should definitely use zram or a similar solution. The priority of the disk swap should be lower than zram's, so that Linux tries to fill the zram first and only then considers the disk swap. If you use Ubuntu then the zram-config package already takes care of the priority settings for you – phuclv Feb 09 '19 at 13:55
  • 3
    I'm accepting this answer because it appears to do exactly what I've asked for. If I still have my 16GB swap enabled at a reduced priority, then the kernel will only use it when zram has been exhausted, i.e. "in an emergency". Note that on debian-buster this is very easy to set up, simply by installing zram-tools. – Philip Couling Feb 15 '19 at 13:02
15

It seems that after a day of inactivity the kernel believes the entire GUI is no longer needed and wipes it from RAM (swaps it to disk).

The kernel is doing The Right Thing™ by believing that. Why would it keep unused1 memory in RAM, essentially wasting it, instead of using it as cache or something?

I don't think the Linux kernel gratuitously or preemptively swaps out pages, so if it does, that must be in order to store something else in RAM, thus improving the performance of your long-running task, or at least with that goal.

If you know when you'll need to reuse your laptop in advance, you might use the at command (or crontab) to schedule a swap cleanup (swapoff -a;swapon -a).
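A sketch of the crontab route (the time is an example, and installing the file must be done as root, since swapoff requires it):

```shell
# Write a crontab fragment that flushes swap back into RAM at 07:30 every day.
# Install it as root with:  crontab swap-flush.cron
cat > swap-flush.cron <<'EOF'
30 7 * * * /sbin/swapoff -a && /sbin/swapon -a
EOF
cat swap-flush.cron
```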

As cleaning the swap might be overkill, and might even trigger the OOM killer if, for some reason, not everything fits in RAM, you might just "unswap"2 everything related to the running applications you want to revive.

One way to do it would be to attach a debugger like gdb to each of the affected processes and trigger a core dump generation:

# gdb -p <pid>
...
generate-core-dump /dev/null
...
quit

As you wrote, your long-running application is not reusing the data it reads after the initial pass, so you are in a specific case where long-term caching is not useful. Bypassing the cache by using direct I/O, as suggested by Will Crawford, should then be a good workaround.

Alternatively, you might just regularly flush the file cache by echoing 1 or 3 to the /proc/sys/vm/drop_caches pseudo-file before the OS thinks it's a good idea to swap out your GUI applications and environment.

See How do you empty the buffers and cache on a Linux system? for details.

1Unused in the sense: not actively used for a significant period of time, the memory still being relevant to its owners.
2Put back in RAM pages stored on the swap area.

jlliagre
  • 61,204
  • 2
    Thanks for the thought on possible causes. I've added a little to the question since it might be relevant. I wonder if there's a way to lower the priority of caching against application's own memory. – Philip Couling Feb 08 '19 at 15:49
  • 6
    "I don't think the Linux kernel is gratuitously or anticipatory swapping out pages so if it does it, that must be to store something else on RAM, thus improving performance." – I think this wording is a bit ambiguous. The kernel will definitely write pages to swap, whenever it has the chance (e.g. there is little disk I/O). It will, however, not remove them from RAM. That way, you have the best of both worlds: if you quickly need those pages again, they are already in RAM, and there is nothing to do. If an emergency (as the OP put it) arises, you simply need free those pages in RAM, because – Jörg W Mittag Feb 09 '19 at 04:56
  • 3
    … they are already in swap. And that is precisely why you do not want to use swap "only in emergencies", because during an emergency, the system is already under stress and the last thing you want is add large amounts of disk I/O to that. – Jörg W Mittag Feb 09 '19 at 04:58
  • 2
    The thing causing it to swap out is likely the long running process: it's accessing files on disk. Those files in memory will have been more recently used than the GUI's memory. – jpmc26 Feb 09 '19 at 05:00
  • 3
    @JörgWMittag Do you have evidence the Linux kernel is, when the I/O usage is low, preemptively writing pages to the swap area "just in case", i.e. without freeing them from the RAM? – jlliagre Feb 09 '19 at 21:07
    Why would it keep in RAM unused memory and so essentially waste it instead of using it as cache or something? Maybe because RAM is faster than swap space? By a lot. In fact, by heaps. Though rereading your comment I wonder if I misinterpreted your statement. Even so, my point stands either way. I have seen serious performance hits from swap space in recent years, though, and I ended up (with some reservations) disabling swap on that box. – Pryftan Feb 10 '19 at 14:14
  • @Pryftan I fully agree that RAM is faster than hard disks or SSDs, but, unless you have an "unlimited" amount of RAM, it makes no sense to keep data in it if that data stays unused for a significant period of time. When this happens, RAM that could have been better used with storing "hot" data is wasted, so the overall performance is degraded. Just like it makes more sense to store non moving vehicles in a parking lot than to keep them randomly parked on the freeway – jlliagre Feb 10 '19 at 16:26
  • @jlliagre Yes.. If there are things that could be stored in RAM, at least. And perhaps that's what you were saying. In which case - well you know. Even so as noted I have seen performance hits though I wasn't sure why; it seemed rather odd to me but thinking on it it probably was that a lot of swap was being used as (I dimly recall) the box in question had not much RAM. Upgrading RAM helped tremendously but I believe I might have changed swap settings too. – Pryftan Feb 10 '19 at 19:03
  • 1
    "preemptively writing pages to the swap area "just in case", i.e. without freeing them from the RAM?" I asked a specific question on this, and no-one found any evidence for it. It's a myth, though it seems to be controversial for some weird reason. https://unix.stackexchange.com/questions/533739/does-linux-perform-opportunistic-swapping-or-is-it-a-myth/ – sourcejedi Nov 02 '19 at 11:57
  • Re: whether Linux will write pages to swap without evicting / reclaiming the RAM when I/O is done: Does Linux perform "opportunistic swapping", or is it a myth? says no, Linux doesn't do this. https://www.kernel.org/doc/gorman/html/understand/understand014.html says Linux 2.6 does do something like this, as the "swap cache", but only(?) as part of swapping out shared anonymous pages that have multiple references. I had thought that Linux did get some pages written to swap ready to reclaim if needed under light mem pressure, but perhaps not. – Peter Cordes Sep 04 '22 at 22:02
10

Is the process you're running something you've created yourself?

If so, it might be worth tweaking your code to open the files using the O_DIRECT flag, which, to quote the manual page:

Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See NOTES below for further discussion.
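If the task is a shell script rather than your own C code, dd can request the same behaviour (it takes care of O_DIRECT's alignment requirements for you). A small self-contained demonstration; note that some filesystems, such as tmpfs, don't support O_DIRECT at all:

```shell
# Create a scratch file, then read it back while bypassing the page cache.
tmpfile=$(mktemp /var/tmp/direct-demo.XXXXXX 2>/dev/null || mktemp)
dd if=/dev/zero of="$tmpfile" bs=1M count=4 status=none
if dd if="$tmpfile" of=/dev/null bs=1M iflag=direct status=none 2>/dev/null; then
    echo "direct (uncached) read ok"
else
    echo "this filesystem does not support O_DIRECT (e.g. tmpfs)"
fi
rm -f "$tmpfile"
```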

  • 1
    Another similar (but probably easier) approach, as I'm pretty sure O_DIRECT has alignment restrictions and you'll kill performance if your reads aren't large, is fadvise: telling the kernel you won't need that data again, flushing it from the page cache. (On phone or I would provide links, sorry) – derobert Feb 10 '19 at 21:39
  • 1
    @derobert For one, the nocache command is a convenient hack to do this. (It uses LD_PRELOAD to hijack some libc calls). – sourcejedi Feb 11 '19 at 11:46
6

Here's an idea, which I haven't tried myself (and I'm sorry I haven't the time right now to experiment with this).

Suppose you create a small VM with only 512 MB of memory for your background process (I'm not sure if you'd want it to have any swap; your call), and switch off swap on your host system.

X Tian
  • 10,463
3

Remove swap, or diminish it by about 20% (this may vary between systems): recent OSs no longer use swap the way they did a few years ago. The following will probably answer your question:

--> official redhat.com

Some of the Red Hat info is quoted below:

In the past, some application vendors recommended swap of a size equal to the RAM, or even twice the RAM. Now let us imagine the above-mentioned system with 2GB of RAM and 2GB of swap. A database on the system was by mistake configured for a system with 5GB of RAM. Once the physical memory is used up, the swap gets used. As the swap disk is much slower than RAM, the performance goes down, and thrashing occurs. At this point, even logins into the system might become impossible. As more and more memory gets written to, eventually both physical- and swap memory are completely exhausted and the OOM killer kicks in, killing one or more processes. In our case, quite a lot of swap is available, so the time of poor performance is long.

and

https://wiki.debian.org/Swap

A portion of the Debian link above:

Information and considerations related to the amount of swap to use:

"The recommended amount of swap space has traditionally been double the amount of system memory. This has changed over time to one and half times system memory, both answers are decent baselines but are becoming less and less useful answers to the question as time passes. There are many variables about your system and intended use that will determine the available system swap you will want to have."

You may try:

"Best way to disable swap in Linux"


***Personal note:***

I have 6 GB of RAM, and in all my recent Linux OSs I've never seen any indication of swap being used. I decided I had to turn it off, either to reclaim the space (a few extra gigabytes) or because it sometimes slowed my system down.

  • 2
    In the past, some application vendors recommended swap of a size equal to the RAM, or even twice the RAM. I feel much older seeing that, somehow... Even though I still have one of the HDDs at the ~528MB barrier and also a 2.5GB, somehow that quote, well, it's something from so very long ago... Interesting quote though, and it might explain why I saw similar problems a few years ago. I believe I used sysctl to fix it, but I don't remember exactly what setting, or if that was even it. – Pryftan Feb 10 '19 at 14:10