
I have a problem with my Linux machine where the system now seems to run out of RAM easily (and trigger the OOM killer) under loads it normally handles just fine. Inspecting free -tm shows that buff/cache is eating a lot of RAM. Normally this would be fine because I want to cache disk I/O, but it now seems that the kernel cannot release this memory even when the system is running out of RAM.

The system currently looks like this:

              total        used        free      shared  buff/cache   available
Mem:          31807       15550        1053       14361       15203        1707
Swap:           993         993           0
Total:        32801       16543        1053

but when I try to force the cache to be released I get this:

$ grep -E "^MemTotal|^Cached|^Committed_AS" /proc/meminfo 
MemTotal:       32570668 kB
Cached:         15257208 kB
Committed_AS:   47130080 kB

$ time sync

real    0m0.770s
user    0m0.000s
sys     0m0.002s

$ time echo 3 | sudo tee /proc/sys/vm/drop_caches
3

real    0m3.587s
user    0m0.008s
sys     0m0.680s

$ grep -E "^MemTotal|^Cached|^Committed_AS" /proc/meminfo
MemTotal:       32570668 kB
Cached:         15086932 kB
Committed_AS:   47130052 kB

So writing all dirty pages to disk and dropping all caches was only able to release about 130 MB out of the 15 GB cache? As you can see, I'm already running pretty heavy overcommit, so I really cannot waste 15 GB of RAM on a non-working cache.

slabtop also claims that the kernel slab caches use less than 600 MB:

$ sudo slabtop -sc -o | head
 Active / Total Objects (% used)    : 1825203 / 2131873 (85.6%)
 Active / Total Slabs (% used)      : 57745 / 57745 (100.0%)
 Active / Total Caches (% used)     : 112 / 172 (65.1%)
 Active / Total Size (% used)       : 421975.55K / 575762.55K (73.3%)
 Minimum / Average / Maximum Object : 0.01K / 0.27K / 16.69K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
247219  94755   0%    0.57K   8836       28    141376K radix_tree_node
118864 118494   0%    0.69K   5168       23     82688K xfrm_state
133112 125733   0%    0.56K   4754       28     76064K ecryptfs_key_record_cache

$ cat /proc/version_signature
Ubuntu 5.4.0-80.90~18.04.1-lowlatency 5.4.124

$ cat /proc/meminfo
MemTotal:       32570668 kB
MemFree:         1009224 kB
MemAvailable:          0 kB
Buffers:           36816 kB
Cached:         15151936 kB
SwapCached:          760 kB
Active:         13647104 kB
Inactive:       15189688 kB
Active(anon):   13472248 kB
Inactive(anon): 14889144 kB
Active(file):     174856 kB
Inactive(file):   300544 kB
Unevictable:      117868 kB
Mlocked:           26420 kB
SwapTotal:       1017824 kB
SwapFree:            696 kB
Dirty:               200 kB
Writeback:             0 kB
AnonPages:      13765260 kB
Mapped:           879960 kB
Shmem:          14707664 kB
KReclaimable:     263184 kB
Slab:             601400 kB
SReclaimable:     263184 kB
SUnreclaim:       338216 kB
KernelStack:       34200 kB
PageTables:       198116 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    17303156 kB
Committed_AS:   47106156 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       67036 kB
VmallocChunk:          0 kB
Percpu:             1840 kB
HardwareCorrupted:     0 kB
AnonHugePages:    122880 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     9838288 kB
DirectMap2M:    23394304 kB

Can you suggest any explanation for what could be causing Cached in /proc/meminfo to take about 50% of the system RAM without the ability to release it? I know that PostgreSQL shared_buffers with huge pages enabled would show up as Cached, but I'm not running PostgreSQL on this machine. I see that Shmem in meminfo looks suspiciously big, but how do I figure out which processes are using that?

I guess it could be some misbehaving program, but how can I query the system to figure out which process is holding that RAM? I currently have 452 processes / 2144 threads, so investigating all of those manually would be a huge task.
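
One starting point (a sketch of my own, not part of the original question) is to sum the per-process RssShmem counters that the kernel exposes in /proc/*/status on kernels 4.5 and newer. Note that it only counts shared memory currently mapped and resident in each process, so deleted-but-open tmpfs files that nothing maps won't show up here:

$ for p in /proc/[0-9]*; do
    # RssShmem (kernel >= 4.5) = resident shared/tmpfs pages mapped by this process;
    # the same pages can be counted for several processes.
    rss=$(awk '/^RssShmem:/ {print $2}' "$p/status" 2>/dev/null)
    [ -n "$rss" ] && [ "$rss" -gt 0 ] && printf '%10d kB  %s\n' "$rss" "$(cat "$p/comm" 2>/dev/null)"
  done | sort -rn | head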

I also checked that the cause of this RAM usage is not (only?) System V shared memory:

$ ipcs -m | awk 'BEGIN{ sum=0 } { sum += $5 } END{print sum}'
1137593612

While the total reported by ipcs is big, it's still "only" about 1.1 GB.

I also found a similar question, https://askubuntu.com/questions/762717/high-shmem-memory-usage, where high Shmem usage was caused by leftover files in a tmpfs-mounted directory. However, that doesn't seem to be the problem with my system either, as tmpfs is using only 221 MB:

$ df -h -B1M | grep tmpfs
tmpfs                    3181       3      3179   1% /run
tmpfs                   15904     215     15689   2% /dev/shm
tmpfs                       5       1         5   1% /run/lock
tmpfs                   15904       0     15904   0% /sys/fs/cgroup
tmpfs                    3181       1      3181   1% /run/user/1000
tmpfs                    3181       1      3181   1% /run/user/1001

I found another answer explaining that files that used to live on a tmpfs filesystem and have already been deleted, but still have open file handles, don't show up in the df output yet still eat RAM. I found out that Google Chrome wastes about 1.6 GB on deleted files that it has seemingly forgotten to close:

$ sudo lsof -n | grep "/dev/shm" | grep deleted | grep -o 'REG.*' | awk 'BEGIN{sum=0}{sum+=$3}END{print sum}'
1667847810

(Yeah, the above doesn't filter for chrome, but I also tested with filtering and it's pretty much just Google Chrome wasting my RAM via deleted files with open file handles.)
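
To see which commands own those deleted /dev/shm files, the same lsof output can be grouped per command (a sketch; it assumes the default lsof column layout, where TYPE is the 5th and SIZE/OFF the 7th field):

$ sudo lsof -n | grep '/dev/shm' | grep deleted \
    | awk '$5 == "REG" {sum[$1] += $7}
           END {for (cmd in sum) printf "%10.1f MB  %s\n", sum[cmd] / 1024 / 1024, cmd}' \
    | sort -rn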

Update: It seems that the real culprit is Shmem: 14707664 kB. Deleted files in tmpfs explain 1.6 GB, System V shared memory explains 1.1 GB, and existing files in tmpfs about 220 MB. So I'm still missing about 11.8 GB somewhere.

At least with Linux kernel 5.4.124 it appears that Cached includes all of Shmem, which explains why echo 3 > /proc/sys/vm/drop_caches cannot zero the Cached field even though it does free the actual disk cache.
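
As a cross-check (my own sketch, not part of the original measurements), the file-backed part of the page cache can also be estimated from the LRU counters in /proc/meminfo, because tmpfs/Shmem pages live on the anonymous LRU lists and are therefore excluded from the (file) counters:

$ awk '/^Buffers:|^Active\(file\):|^Inactive\(file\):/ {sum += $2}
       END {printf "file-backed cache: about %.0f MiB\n", sum / 1024}' /proc/meminfo

For the meminfo output above this gives roughly 500 MiB, which matches how little memory drop_caches was able to free.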

So the real question is why Shmem is taking over 10 GB of RAM when I wasn't expecting any?

Update: I checked top and found that the fields RSan ("RES Anonymous") and RSsh ("RES Shared") pointed to Thunderbird and Eclipse. Closing Thunderbird didn't release any cached memory, but closing Eclipse freed 3.9 GB of Cached. I'm running Eclipse with the JVM flag -Xmx4000m, so it seems that JVM memory usage may appear as Cached! I'd still prefer to find a method to map memory usage to processes instead of randomly closing processes and checking whether that freed any memory.
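
For a quick per-process overview without closing anything, top can also be run non-interactively and sorted by the SHR column (my suggestion, not from the original debugging session; note that SHR counts all resident shared pages, including plain file-backed mappings, so it over-reports pure Shmem usage):

$ top -b -n 1 -o SHR | head -n 20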

Update: File systems that use tmpfs behind the scenes could also cause Shmem to increase. I tested it like this:

$ df --output=used,source,fstype -B1M | grep -v '/dev/sd' | grep -v ecryptfs | tail -n +2 | awk 'BEGIN{sum=0}{sum+=$1}END{print sum}'
4664

So it seems that even if I only exclude filesystems backed by real block devices (my ecryptfs is mounted on those block devices, too), I can only explain about 4.7 GB of lost memory. And 4.3 GB of that is explained by the squashfs mounts created by snapd, which to my knowledge do not use Shmem.
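
A possibly more robust variant of the same check (a sketch) is to filter by filesystem type instead of excluding device names, so tmpfs mounts are counted no matter what their source is called; this still misses deleted-but-open tmpfs files:

$ df -t tmpfs --output=used,target -B1M | tail -n +2 \
    | awk '{sum += $1} END {print sum " MB used on tmpfs mounts"}'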

Update: For some people, the explanation has been GEM objects reserved by the GPU driver. There doesn't seem to be any standard interface to query these, but for my Intel integrated graphics I get the following results:

$ sudo sh -c 'cat /sys/kernel/debug/dri/*/i915_gem_objects' | perl -npe 's#([0-9]+) bytes#sprintf("%.1f", $1/1024/1024)." MB"#e'
1166 shrinkable [0 free] objects, 776.8 MB

Xorg: 114144 objects, 815.9 MB (38268928 active, 166658048 inactive, 537980928 unbound, 0 closed)
calibre-paralle: 1 objects, 0.0 MB (0 active, 0 inactive, 32768 unbound, 0 closed)
Xorg: 595 objects, 1329.9 MB (0 active, 19566592 inactive, 1360146432 unbound, 0 closed)
chrome: 174 objects, 63.2 MB (0 active, 0 inactive, 66322432 unbound, 0 closed)
chrome: 174 objects, 63.2 MB (0 active, 0 inactive, 66322432 unbound, 0 closed)
chrome: 20 objects, 1.2 MB (0 active, 0 inactive, 1241088 unbound, 0 closed)
firefox: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
GLXVsyncThread: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
Renderer: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
[the exact same Renderer line repeated dozens of times]
chrome: 1100 objects, 635.1 MB (0 active, 0 inactive, 180224 unbound, 0 closed)
chrome: 1100 objects, 635.1 MB (0 active, 665772032 inactive, 180224 unbound, 0 closed)
chrome: 20 objects, 1.2 MB (0 active, 0 inactive, 1241088 unbound, 0 closed)
[k]contexts: 3 objects, 0.0 MB (0 active, 40960 inactive, 0 unbound, 0 closed)

Those results do not seem sensible to me. If each of those lines were an actual memory allocation, the total would be in the hundreds of gigabytes!

Even if I assume that the GPU driver just reports some lines multiple times, I get this:

$ sudo sh -c 'cat /sys/kernel/debug/dri/*/i915_gem_objects' | sort | uniq | perl -npe 's#([0-9]+) bytes#sprintf("%.1f", $1/1024/1024)." MB"#e'

1218 shrinkable [0 free] objects, 797.6 MB
calibre-paralle: 1 objects, 0.0 MB (0 active, 0 inactive, 32768 unbound, 0 closed)
chrome: 1134 objects, 645.0 MB (0 active, 0 inactive, 163840 unbound, 0 closed)
chrome: 1134 objects, 645.0 MB (0 active, 676122624 inactive, 163840 unbound, 0 closed)
chrome: 174 objects, 63.2 MB (0 active, 0 inactive, 66322432 unbound, 0 closed)
chrome: 20 objects, 1.2 MB (0 active, 0 inactive, 1241088 unbound, 0 closed)
firefox: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
GLXVsyncThread: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
[k]contexts: 2 objects, 0.0 MB (0 active, 24576 inactive, 0 unbound, 0 closed)
Renderer: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
Xorg: 114162 objects, 826.8 MB (0 active, 216350720 inactive, 537980928 unbound, 0 closed)
Xorg: 594 objects, 1329.8 MB (14794752 active, 4739072 inactive, 1360146432 unbound, 0 closed)

That's still way over the expected totals, which should be in the 4-8 GB range. (The system currently has two seats logged in, so I'm expecting to see two Xorg processes.)

Update: Looking at the GPU debug output a bit more, I now think that those unbound numbers mean virtual blocks without any actual RAM used. If I do this, I get more sensible numbers for GPU memory usage:

$ sudo sh -c 'cat /sys/kernel/debug/dri/*/i915_gem_objects' | perl -npe 's#^(.*?): .*?([0-9]+) bytes.*?([0-9]+) unbound.*#sprintf("%s: %.1f", $1, ($2-$3)/1024/1024)." MB"#eg' | grep -v '0.0 MB'
1292 shrinkable [0 free] objects, 848957440 bytes

Xorg: 303.1 MB
Xorg: 32.7 MB
chrome: 667.5 MB
chrome: 667.5 MB

That could explain about 1.5 GB of RAM, which seems normal for the data I'm handling. I'm still missing multiple gigabytes somewhere!

Update: I'm currently thinking that the problem is actually caused by deleted files backed by RAM. These could be caused by broken software that leaks an open file handle after deleting/discarding the file. When I run

$ sudo lsof -n | grep -Ev ' /home/| /tmp/| /lib/| /usr/' | grep deleted | grep -o " REG .*" | awk 'BEGIN{sum=0}{sum+=$3}END{print sum / 1024 / 1024 " MB"}'
4560.65 MB

(The manually collected list of path prefixes contains only paths that are actually backed by real block devices. Since my root is backed by a real block device, I cannot simply list all block-device mount points here. A cleverer script could list all non-mount-point directories under root and also all block-device mounts deeper than just /; see the sketch below.)
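
Here's one possible sketch of such a script, under my own assumptions: lsof +L1 lists only files with link count zero (deleted but still open), and findmnt supplies the current tmpfs mount points to filter the NAME column against. With +L1 an extra NLINK column is added, but SIZE/OFF is still the 7th field:

$ tmpfs_re="^($(findmnt -rn -t tmpfs -o TARGET | paste -sd'|'))/"
$ sudo lsof -n +L1 2>/dev/null \
    | awk -v re="$tmpfs_re" '$5 == "REG" && $NF ~ re {sum += $7}
          END {printf "%.1f MB in deleted-but-open tmpfs files\n", sum / 1024 / 1024}'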

The lsof output above explains nearly 4.6 GB of lost RAM. Combined with the output from ipcs, the GPU RAM (with the assumption about unbound memory) and the tmpfs usage, I'm still missing about 4 GB of Shmem somewhere!

  • Why do I get a feeling of Déjà vu. – Bib Aug 29 '21 at 19:20
  • This seems to provide somewhat similar info but doesn't explain my case: https://unix.stackexchange.com/questions/56879/tracking-down-missing-memory-usage-in-linux – Mikko Rantalainen Aug 29 '21 at 19:22
  • Here's a good article for somewhat similar cases: https://cdn.ttgtmedia.com/searchEnterpriseLinux/downloads/Linux_Toolbox.pdf – Mikko Rantalainen Aug 29 '21 at 19:55
  • tmpfs is a common cause and | grep tmpfs is not a good way to check for it as they're not necessarily called that. Deleted but open files would still show as used in df -h though. Lazy umounts with any open filehandle would still use space but not show up in df at all. If there is a tool that covers all possibilities then I'd like to know too, these can be so difficult to track down sometimes. – frostschutz Aug 29 '21 at 21:05
  • I re-checked all the mounts and df cannot account for the missing RAM in my case. The more I look at it, the more sure I am that the problem is caused by Shmem usage. Also note that lsof shows more deleted-file usage than df reports for the same tmpfs! As a result, I would assume that df does not report deleted files that are still open. – Mikko Rantalainen Aug 30 '21 at 06:44
  • See also: https://unix.stackexchange.com/a/483006/20336 – Mikko Rantalainen Aug 30 '21 at 07:58
  • If you're feeling lucky, you could just manually close open files to free the RAM: https://superuser.com/a/963680/100154 – obviously, it's hard to know if a deleted file with an open file handle is a leak or actually used and if you close the file behind the process that's using it, you may experience any random behavior when that process tries to use that file again. – Mikko Rantalainen Aug 30 '21 at 09:09
  • If you have lots of leaking apps, the age-old solution is to add enough swap to hold the leaked memory and trust that the kernel can figure out which part of the RAM is actually used. As far as I know, even Shmem can be swapped out. – Mikko Rantalainen Aug 30 '21 at 09:16
  • See also: https://lwn.net/Articles/634978/ – Mikko Rantalainen Aug 30 '21 at 14:45
  • We had an unexpected power outage, so I can no longer debug the same system because it got restarted as a result. After the reboot it hasn't shown similar behavior, so I think this was caused by some bug (either in user-mode or kernel-mode code). I'm still looking for hints on how to debug this kind of situation in the future. – Mikko Rantalainen Sep 03 '21 at 07:50

2 Answers


I'm the author of the question above, and even though a full answer hasn't surfaced so far, here's the best known explanation to date:

  • With modern Linux kernels, the Cached value in /proc/meminfo no longer describes the amount of disk cache. However, the kernel developers consider that changing this at this point is already too late.

  • In practice, to measure the amount of disk cache actually in use, you should compute Cached - Shmem to estimate it. If you take the numbers from the original question, you get 15151936 − 14707664 (kiB) (from the output of /proc/meminfo), or 444272 kiB, so the system actually had only about 433 MiB of disk cache. In that case, it should be obvious that dropping all disk cache wouldn't free a lot of memory (even if all disk cache were dropped, the Cached field would have decreased by only 3%). See the sketch after this list.
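
A minimal one-liner for that estimate (my sketch of the computation described above, not an official metric):

$ awk '/^Cached:/ {cached = $2} /^Shmem:/ {shmem = $2}
       END {printf "approx. disk cache: %.0f MiB\n", (cached - shmem) / 1024}' /proc/meminfo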

So the best guess is that some user-mode software was using a lot of shared memory (typically tmpfs or shared memory maps), and that was causing Cached to show high values even though the system actually had very little disk cache, which suggests it was close to running into an out-of-memory condition. I think Committed_AS being way higher than MemTotal supports this theory.


Here's a (shortened) copy of the conclusion from the linux-mm thread where this was discussed, in case the original thread is hard to find in the future:

Subject: Re: Why is Shmem included in Cached in /proc/meminfo?
From: Vlastimil Babka @ 2021-08-30 16:05 UTC

On 8/30/21 12:44 AM, Mikko Rantalainen wrote:

It's not immediately obvious from fs/proc/meminfo.c function meminfo_proc_show() but the output of Cached: field seems to always include all of Shmem: field, too

However, if we change it now, we might create even larger confusion. People looking at the output for the first time (and IIRC also the 'free' command uses it) on a new kernel wouldn't be misled anymore. But people working with both old and new kernels will now have to take into account that it changed at some point... not good.

From: Khalid Aziz @ 2021-08-30 19:38 UTC

On Mon, 2021-08-30 at 20:26 +0300, Mikko Rantalainen wrote:

Of course one possible solution is to keep "Cached" as is and introduce "Cache" with the real cache semantics (that is, it includes sum of (Cached - Shmem) and memory backed RAM). That way system administrators would at least see two different fields with unique values and look for the documentation.

I would recommend adding a new field. There is likely to be a good number of tools/scripts out there that already interpret the data from /proc/meminfo and possibly take actions based upon that data. Those tools will break if we change the sense of existing data. A new field has the downside of expanding the output further, but it also doesn't break existing tools.


I'm currently writing a tool to help diagnose memory issues, based on the information in a document from Red Hat that includes some formulas.

About disk cache/tmpfs, what I understand is:

cache = disk cache - swap cache - tmpfs ram usage

tmpfs can reside in swap, so we must compute the real memory usage of tmpfs first.

Simple solution:

shmem = shared memory segments + tmpfs ram

However, shared memory segments can also be in swap, and it seems that shmem does not include huge-page shared memory segments (tested on kernels 5.4 and 5.15).

More precise solution:

shmem = "4k pages sysvipc shm rss" + tmpfs ram usage

"4k sysvipc shm rss" is sum of memory used by shared memory segments with standard page size (4k), so no huge pages.

You can get the RSS usage of memory segments under /proc/sysvipc/shm.
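
For example, summing the resident size of all SysV segments could look like this (a sketch; the rss and swap columns in /proc/sysvipc/shm are reported in bytes):

$ awk 'NR == 1 {for (i = 1; i <= NF; i++) if ($i == "rss") col = i; next}
       {sum += $col}
       END {printf "%.1f MiB resident in SysV shm segments\n", sum / 1048576}' /proc/sysvipc/shm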

Whether an shm segment is using 4k or 2M pages doesn't seem to be exposed under /proc, but it is possible to get that information by attaching to a shared memory segment and scanning the physical pages (/proc/kpageflags). I used that to add the number of shared memory pages to the output:

sudo ./memstats groups
[...]
Scanning shm...
Shared memory segments (MiB):
         key           id       Size        RSS         4k/2M        SWAP   USED%        SID
============================================================================================
           0            3          9          9       2442/0            0  100.02           
           0            2          9         10          0/5            0  104.86           
[...]
Toby Speight