How to fine tune Linux cache/swap ratio?

Question

I've got a problem with my PC that makes the GUI almost unusable: it becomes very sluggish and eventually freezes.

My analysis so far suggests that it is caused by the cache/buff growing so large that it forces too much swapping. Is there any way to fine-tune these settings?

Use case: simply read or write large amounts of data from/to any hard drive (not SSD), for example using dd or f3read/f3write. After about a minute the cache or buff gets so large that Linux starts swapping heavily.
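
To watch the cache grow while such a transfer runs, a minimal Python sketch along these lines could be used. It only parses the standard Cached, Buffers, Dirty, SwapTotal and SwapFree fields of /proc/meminfo; the function names parse_meminfo and cache_summary are made up for this illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integers.

    Most fields are reported in kB; a few (e.g. HugePages_Total)
    are plain counts and are stored unchanged.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary(fields):
    """Pick out the values relevant to the cache-versus-swap question."""
    swap_used = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return {
        "cached_kb": fields.get("Cached", 0),
        "buffers_kb": fields.get("Buffers", 0),
        "dirty_kb": fields.get("Dirty", 0),
        "swap_used_kb": swap_used,
    }
```

Calling cache_summary(parse_meminfo(open("/proc/meminfo").read())) every few seconds during the dd run shows whether swap usage rises together with the cache.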

You can see this in the PAG row (swout field) of this atop snippet:

MEM |  tot    15.5G |  free    3.5G  | cache   7.8G  |  buff   96.1M |  slab  394.5M |  vmbal   0.0M  | hptot   0.0M  |
SWP |  tot     1.0G |  free  634.7M  |               |               |               |  vmcom   8.5G  | vmlim   8.8G  |
PAG |  scan  156637 |  steal 156616  | stall      0  |               |               |  swin       0  | swout  11814  |
PSI |  cs     0/0/2 |  ms     5/2/2  | mf     5/2/1  |  is  50/24/15 |  if  50/24/15 |                |               |
DSK |           sdb |  busy     56%  | read      61  |  write   1312 |  MBr/s    0.0 |  MBw/s  147.8  | avio 3.95 ms  |
DSK |           sda |  busy     24%  | read     100  |  write  11803 |  MBr/s    0.2 |  MBw/s    4.6  | avio 0.20 ms  |

I don't fully understand the meaning of the fields, but I tried the same on my laptop: everything is similar except that the swout value is far lower and the system does not suffer.
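
For comparing two machines without atop, here is a rough Python sketch that samples the swap counters directly. It assumes that atop's swin/swout values are derived from the pswpin/pswpout counters in /proc/vmstat, which I have not verified against the atop source; the function names are hypothetical.

```python
import time
from pathlib import Path


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two counter snapshots."""
    return {
        "swin": after.get("pswpin", 0) - before.get("pswpin", 0),
        "swout": after.get("pswpout", 0) - before.get("pswpout", 0),
    }


def sample_swap_activity(interval=10.0, path="/proc/vmstat"):
    """Take two snapshots `interval` seconds apart and return the delta."""
    before = parse_vmstat(Path(path).read_text())
    time.sleep(interval)
    after = parse_vmstat(Path(path).read_text())
    return swap_delta(before, after)
```

Running sample_swap_activity() on both machines during the same dd or f3write run gives numbers that can be compared directly.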

Both computers run Ubuntu 19.10 with kernel 5.3.0-19-generic. Swap is on an SSD. According to atop, the SSD is between 20 and 50% busy, mostly from swapping.

I already tried lowering /proc/sys/vm/swappiness from 60 to 10, which does not help. I also lowered vfs_cache_pressure from 100 to 50, but this did not help either.
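
For reference, a minimal Python sketch of how these two settings can be inspected and changed through /proc/sys/vm. The helper names are made up for illustration; writing requires root, and the change lasts only until the next reboot unless it is also put into a sysctl configuration file.

```python
from pathlib import Path


def read_vm_tunables(names=("swappiness", "vfs_cache_pressure"),
                     base="/proc/sys/vm"):
    """Return the current integer value of each named vm tunable."""
    values = {}
    for name in names:
        values[name] = int((Path(base) / name).read_text().strip())
    return values


def write_vm_tunable(name, value, base="/proc/sys/vm"):
    """Set one vm tunable (needs root when base is /proc/sys/vm)."""
    (Path(base) / name).write_text(f"{value}\n")
```

For example, write_vm_tunable("swappiness", 10) followed by read_vm_tunables() confirms that the new value was accepted.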

Could it be that the cause lies somewhere else? I did have problems with SATA, which should be solved now. And I had a GPU HANG once (on Intel), which I believe was caused by the swapping problem...

When I first saw this problem (before I did a thorough analysis) I added swap (I did not have any before) because kswapd kept running amok. Adding swap at least prevents kswapd from using 100% CPU.

Any ideas?

I ran into a big problem like this. I'm not certain, but when I used dd if=/dev/sda of=/dev/null bs=512k, let it fill 1/2 of ram with cache, and then restarted the command, I got an unusable GUI. I think because cache that is read a second time is moved from the "inactive" list, to the "active" list, and then the kernel starts trimming the active list. So my Q: do you have the same problem if you make sure to use drop_caches before you start the dd command? https://unix.stackexchange.com/questions/518868/during-disk-read-tests-gui-becomes-unresponsive-for-10s-of-seconds-this-includ — sourcejedi, Nov 01 '19 at 18:08
Also mentioned in the link above - if you have problems because of using dd, there is an option to avoid the cache - iflag=direct / oflag=direct. — sourcejedi, Nov 01 '19 at 18:10
@sourcejedi nice to see you have the same problem. No, it's not related to dd. The same happens with Dolphin file copy, f3write etc. Dropping caches does not help either, since the caches are filled again quickly. iflag=direct actually helps, but this is a workaround that cannot be used everywhere. — JPT, Nov 01 '19 at 18:50
I described why I asked about drop_caches; it was not to avoid filling the cache. — sourcejedi, Nov 01 '19 at 19:48
AFAIK there's core Linux magic here. It maybe varies a bit in effects depending on the speeds and sizes etc, but there's a problem on all systems. Maybe there are some things that make it much worse, but I think it is already pretty bad from the core. I found switching to BFQ helped in some tests, I think particularly with swapping, but not to the point where I can fully trust the system. — sourcejedi, Nov 01 '19 at 19:54
Hmm. FWIW I was testing on HDD, and in the linked test I disabled swap. I overlooked what you said about this happening when operating on a HDD but not an SSD. That is a very interesting suggestion. — sourcejedi, Nov 01 '19 at 19:57
This sounds very similar to a problem I had. See here for answers I received: https://unix.stackexchange.com/questions/499485/how-do-i-use-swap-space-for-emergencies-only — Philip Couling, Nov 01 '19 at 21:32
@PhilipCouling Your question is interesting, but it's not the same. What you describe happens after hours of work, which ... well ... has been a Linux problem for decades. On my PC this happens after one or two minutes, and it doesn't happen on my laptop with an almost identical setup but less RAM. On my PC the bottleneck is swapping OUT, on yours it's swapping IN. — JPT, Nov 02 '19 at 14:49
Actually now I reread, one detail makes this very different. In my case removing swap fixed it. In your case you had no swap to start. — Philip Couling, Nov 02 '19 at 20:22
@sourcejedi could you please try with a 4.x kernel? I tried with linux-modules-4.15.0-1050-oem, the only 4.x kernel that is delivered with Eoan. The problem is gone. It doesn't swap at all. The mouse doesn't lag. The GUI is only slowed down a little bit. — JPT, Nov 03 '19 at 10:00
@PhilipCouling I do have swap and removing the swap fixed it, too (tried this after I wrote the question). But removing swap is a dirty workaround, not a fix. — JPT, Nov 03 '19 at 10:00
Sorry, I don't have time to dig into this at the moment. I was mostly trying to send you the link, that I was hitting a similar problem, and at the time it appeared to be more an artefact of how I had been (re) running my test, than the realistic scenarios I was thinking about. — sourcejedi, Nov 03 '19 at 14:35
@JPT oh I see. I agree it's a dirty workaround. With my problem I did find it effectively impossible to prevent Linux from swapping for cache purposes. Zram swap allowed Linux to think it was swapping without thrashing the system in the process. — Philip Couling, Nov 03 '19 at 15:23
@PhilipCouling Yes, for your problem this is the only solution, I agree. — JPT, Nov 05 '19 at 12:52

score 0 · Answer 1 · answered Nov 05 '19 at 12:50

I found a workaround: use a 4.x kernel

In Ubuntu 19.10 Eoan this means installing kernel linux-image-4.15.0-1050-oem, the only 4.x kernel that is available from the repository. I tried kernels from 19.04 or older; they don't work in Ubuntu 19.10.

I will try to find the real cause and will publish the solution here if I find it.

Until then, I think this might be a possible direction: https://bugs.freedesktop.org/show_bug.cgi?id=111790
