
I have a large system with many disk-bound services. They perform much better when they can use the block cache.

Besides these, a backup process is also running.

I know exactly how the backup should use the block cache: it should not use it at all.

The backup copies one block device to another with a buffer command. The probability that any of this data would benefit from caching is practically zero.

However, while the backup runs, the ordinary services get worse. Giving it a low ionice priority does not help much: the problem is not its I/O priority, but that it overwrites the block cache with data that is not needed.

Can I somehow set up this buffer command so that it does not use the block cache at all?

It copies LVM volumes to one another, if that matters.
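
Roughly, the copy pipeline looks like this (the volume names and block size below are only illustrative):

$ dd if=/dev/vg0/lv_data bs=4M | buffer | dd of=/dev/vg1/lv_data_backup bs=4M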

peterh

2 Answers


I've found the nocache tool for the task.

In general, this is not possible in Linux: there is no option, flag, or per-process setting that turns the block cache off.

However, the posix_fadvise(...) call can be used to advise the block/buffer cache subsystem about the expected access pattern, for example when a long sequential read/write is coming. POSIX_FADV_DONTNEED gives the kernel the "extra information" that the data does not need to be kept in the cache, because it will not be re-read in the near future.
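
The same advice can also be issued from the shell without any extra tool; for example, GNU dd has a nocache flag that asks the kernel to drop cached pages for the data it processes (the device paths below are only placeholders):

# GNU dd: request that cached pages for both the input and the output be dropped
$ dd if=/dev/vg0/lv_data of=/dev/vg1/lv_backup bs=4M iflag=nocache oflag=nocache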

nocache intercepts all the important file operations and adds the posix_fadvise(...) calls, through a shared library injected via the LD_PRELOAD environment variable.
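
In practice this just means prefixing the backup command with the wrapper, which sets LD_PRELOAD for the child process (the copy command and volume names are only an example):

$ nocache dd if=/dev/vg0/lv_data of=/dev/vg1/lv_backup bs=4M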

As the name fadvise suggests, it is only advice; however, my experiments show a huge performance improvement: the other important services can run in parallel with the backups in the background, without a visible performance decrease for the end users.

peterh

Tools like nocache are actually not the appropriate solution. To cite nocache's source:

What this tool is not good for:

  • Controlling how your page cache is used
    • Why do you think some random tool you found on GitHub can do better than the Linux Kernel?
  • Defending against cache thrashing
    • Use cgroups to bound the amount of memory a process has. See below or search the internet, this is widely known, works reliably, and does not introduce performance penalties or potentially dangerous behavior like this tool does.

So, use cgroups (to be precise, in 2023 definitely cgroups v2 whenever possible) to bound the amount of cache your process can use (and thereby the amount of cache it can evict):

How to run a process and its children in a memory-bounded cgroup

Do this if you e.g. want to run a backup but don’t want your system to slow down due to page cache thrashing.

If you use systemd

If your distro uses systemd, this is very easy. systemd allows you to run a process (and its subprocesses) in a “scope”, which is a cgroup, and you can specify parameters that get translated into cgroup limits.

When I run my backups, I do:

$ systemd-run --scope --property=MemoryLimit=500M -- backup command 

(MemoryLimit is the cgroup v1 property name; with cgroup v2, use MemoryMax instead.)
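
For reference, on a cgroup v2 system the same invocation would be (with the same placeholder backup command):

$ systemd-run --scope --property=MemoryMax=500M -- backup command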

The effect is that the extra cache usage stays bounded to at most 500 MiB:

Before:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.5G        2.4G        1.3G        1.0G        3.7G        3.7G
Swap:          9.7G         23M        9.7G

During (notice how buff/cache only goes up by ~300MiB):

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.5G        2.5G        1.0G        1.1G        4.0G        3.6G
Swap:          9.7G         23M        9.7G

How does this work?

Use systemd-cgls to list the cgroups systemd creates. On my system, the above command creates a group called run-u467.scope in the system.slice parent group; you can inspect its memory settings like this:

$ mount | grep cgroup | grep memory
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)

$ cat /sys/fs/cgroup/memory/system.slice/run-u467.scope/memory.limit_in_bytes
524288000
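
These paths are from a cgroup v1 (legacy) hierarchy; on a unified cgroup v2 hierarchy the scope sits directly under /sys/fs/cgroup and the limit lives in memory.max. The same bounding can also be done by hand, without systemd, roughly like this (the group name "backup" is made up for the example, and it assumes the memory controller is enabled for the root cgroup, as it typically is under systemd):

$ sudo mkdir /sys/fs/cgroup/backup
$ echo 500M | sudo tee /sys/fs/cgroup/backup/memory.max
$ echo $$ | sudo tee /sys/fs/cgroup/backup/cgroup.procs   # move the current shell into the group
$ backup command                                          # child processes inherit the cgroup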

  • With it, you bound the memory that the process can use. The OP wants a different thing: he wanted the process to not use the block cache at all (i.e. its block reads should bypass the read cache and its block writes the write cache). I think your answer is NAA. – peterh Jul 06 '23 at 21:38
  • @peterh but you're mistaken there. The memory that gets bounded includes the caches. (Note that this text isn't by me, it's literally by the nocache author, hence the full quotation; I do agree with them, that's what bounding the memory in cgroups does.) Also note that you do not want to completely disable disk caches for things like a backup or an antivirus scan: read-ahead caching is very useful in exactly that scenario. – Marcus Müller Jul 06 '23 at 21:50
  • Wow, I am the OP! I only noticed it just now. :-) Right, if you have a reasonable upper bound for the process (a tar or dd likely does), then it is a good partial solution, although it does not turn off the cache entirely, it only comes close to that. I am really sorry, I do not understand why the nocache source says this, but I think the quoted statements in the source are blatantly wrong. I am sorry to say this about a statement made about a piece of software by its own author, but I am sure. Yes, his tool is good at turning off the block cache to avoid cache thrashing, even if he does not admit it. – peterh Jul 16 '23 at 05:37
  • I know this because I have experienced it, and I know it exactly because I trust the Linux kernel: his tool merely advises the kernel's page cache so it can do its job better. – peterh Jul 16 '23 at 05:39
  • But I believe cgroup v2 is the right direction here in the long term, and now I would try your solution first. – peterh Jul 16 '23 at 05:43