
I have a problem with my Linux machine where the system now seems to run out of RAM easily (and trigger the OOM killer) under loads it normally handles just fine. Inspecting free -tm shows that buff/cache is eating a lot of RAM. Normally this would be fine because I want to cache disk I/O, but it now seems that the kernel cannot release this memory even when the system is running out of RAM.

The system currently looks like this:

              total        used        free      shared  buff/cache   available
Mem:          31807       15550        1053       14361       15203        1707
Swap:           993         993           0
Total:        32801       16543        1053

but when I try to force the cache to be released I get this:

$ grep -E "^MemTotal|^Cached|^Committed_AS" /proc/meminfo 
MemTotal:       32570668 kB
Cached:         15257208 kB
Committed_AS:   47130080 kB

$ time sync

real    0m0.770s
user    0m0.000s
sys     0m0.002s

$ time echo 3 | sudo tee /proc/sys/vm/drop_caches
3

real    0m3.587s
user    0m0.008s
sys     0m0.680s

$ grep -E "^MemTotal|^Cached|^Committed_AS" /proc/meminfo
MemTotal:       32570668 kB
Cached:         15086932 kB
Committed_AS:   47130052 kB

So writing all dirty pages to disk and dropping all caches was only able to release about 130 MB out of the 15 GB cache? As you can see, I'm already running pretty heavy overcommit, so I really cannot waste 15 GB of RAM on a non-working cache.

slabtop also claims that the kernel slab caches use less than 600 MB:

$ sudo slabtop -sc -o | head
 Active / Total Objects (% used)    : 1825203 / 2131873 (85.6%)
 Active / Total Slabs (% used)      : 57745 / 57745 (100.0%)
 Active / Total Caches (% used)     : 112 / 172 (65.1%)
 Active / Total Size (% used)       : 421975.55K / 575762.55K (73.3%)
 Minimum / Average / Maximum Object : 0.01K / 0.27K / 16.69K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
247219  94755   0%    0.57K   8836       28    141376K radix_tree_node
118864 118494   0%    0.69K   5168       23     82688K xfrm_state
133112 125733   0%    0.56K   4754       28     76064K ecryptfs_key_record_cache

$ cat /proc/version_signature
Ubuntu 5.4.0-80.90~18.04.1-lowlatency 5.4.124

$ cat /proc/meminfo
MemTotal:       32570668 kB
MemFree:         1009224 kB
MemAvailable:          0 kB
Buffers:           36816 kB
Cached:         15151936 kB
SwapCached:          760 kB
Active:         13647104 kB
Inactive:       15189688 kB
Active(anon):   13472248 kB
Inactive(anon): 14889144 kB
Active(file):     174856 kB
Inactive(file):   300544 kB
Unevictable:      117868 kB
Mlocked:           26420 kB
SwapTotal:       1017824 kB
SwapFree:            696 kB
Dirty:               200 kB
Writeback:             0 kB
AnonPages:      13765260 kB
Mapped:           879960 kB
Shmem:          14707664 kB
KReclaimable:     263184 kB
Slab:             601400 kB
SReclaimable:     263184 kB
SUnreclaim:       338216 kB
KernelStack:       34200 kB
PageTables:       198116 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    17303156 kB
Committed_AS:   47106156 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       67036 kB
VmallocChunk:          0 kB
Percpu:             1840 kB
HardwareCorrupted:     0 kB
AnonHugePages:    122880 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     9838288 kB
DirectMap2M:    23394304 kB

Can you suggest any explanation for what could be causing Cached in /proc/meminfo to take about 50% of the system RAM without the ability to release it? I know that PostgreSQL shared_buffers with huge pages enabled would show up as Cached, but I'm not running PostgreSQL on this machine. I see that Shmem in meminfo looks suspiciously big, but how do I figure out which processes are using that?

I guess it could be some misbehaving program, but how can I query the system to figure out which process is holding that RAM? I currently have 452 processes / 2144 threads, so investigating all of those manually would be a huge task.
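
One starting point (a sketch of my own, not part of the original question) is to sum the per-process RssShmem counters that the kernel exposes in /proc/*/status on kernels 4.5 and newer. Note that it only counts shared memory currently mapped and resident in each process, so deleted-but-open tmpfs files that nothing maps won't show up here:

$ for p in /proc/[0-9]*; do
    # RssShmem (kernel >= 4.5) = resident shared/tmpfs pages mapped by this process;
    # the same pages can be counted for several processes.
    rss=$(awk '/^RssShmem:/ {print $2}' "$p/status" 2>/dev/null)
    [ -n "$rss" ] && [ "$rss" -gt 0 ] && printf '%10d kB  %s\n' "$rss" "$(cat "$p/comm" 2>/dev/null)"
  done | sort -rn | head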

I also checked that the cause of this RAM usage is not (only?) System V shared memory:

$ ipcs -m | awk 'BEGIN{ sum=0 } { sum += $5 } END{print sum}'
1137593612

While the total reported by ipcs is big, it's still "only" about 1.1 GB.

I also found a similar question, https://askubuntu.com/questions/762717/high-shmem-memory-usage, where high Shmem usage was caused by leftover files in a tmpfs-mounted directory. However, that doesn't seem to be the problem with my system either, as tmpfs is using only 221 MB:

$ df -h -B1M | grep tmpfs
tmpfs                    3181       3      3179   1% /run
tmpfs                   15904     215     15689   2% /dev/shm
tmpfs                       5       1         5   1% /run/lock
tmpfs                   15904       0     15904   0% /sys/fs/cgroup
tmpfs                    3181       1      3181   1% /run/user/1000
tmpfs                    3181       1      3181   1% /run/user/1001

I found another answer explaining that files that used to live on a tmpfs filesystem and have already been deleted, but still have open file handles, don't show up in the df output yet still eat RAM. I found out that Google Chrome wastes about 1.6 GB on deleted files that it has seemingly forgotten to close:

$ sudo lsof -n | grep "/dev/shm" | grep deleted | grep -o 'REG.*' | awk 'BEGIN{sum=0}{sum+=$3}END{print sum}'
1667847810

(Yeah, the above doesn't filter for chrome, but I also tested with filtering and it's pretty much just Google Chrome wasting my RAM via deleted files with open file handles.)
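
To see which commands own those deleted /dev/shm files, the same lsof output can be grouped per command (a sketch; it assumes the default lsof column layout, where TYPE is the 5th and SIZE/OFF the 7th field):

$ sudo lsof -n | grep '/dev/shm' | grep deleted \
    | awk '$5 == "REG" {sum[$1] += $7}
           END {for (cmd in sum) printf "%10.1f MB  %s\n", sum[cmd] / 1024 / 1024, cmd}' \
    | sort -rn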

Update: It seems that the real culprit is Shmem: 14707664 kB. Deleted files in tmpfs explain 1.6 GB, System V shared memory explains 1.1 GB, and existing files in tmpfs about 220 MB. So I'm still missing about 11.8 GB somewhere.

At least with Linux kernel 5.4.124 it appears that Cached includes all of Shmem, which explains why echo 3 > /proc/sys/vm/drop_caches cannot zero the Cached field even though it does free the actual disk cache.
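
As a cross-check (my own sketch, not part of the original measurements), the file-backed part of the page cache can also be estimated from the LRU counters in /proc/meminfo, because tmpfs/Shmem pages live on the anonymous LRU lists and are therefore excluded from the (file) counters:

$ awk '/^Buffers:|^Active\(file\):|^Inactive\(file\):/ {sum += $2}
       END {printf "file-backed cache: about %.0f MiB\n", sum / 1024}' /proc/meminfo

For the meminfo output above this gives roughly 500 MiB, which matches how little memory drop_caches was able to free.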

So the real question is why Shmem is taking over 10 GB of RAM when I wasn't expecting any?

Update: I checked top and found that the fields RSan ("RES Anonymous") and RSsh ("RES Shared") pointed to Thunderbird and Eclipse. Closing Thunderbird didn't release any cached memory, but closing Eclipse freed 3.9 GB of Cached. I'm running Eclipse with the JVM flag -Xmx4000m, so it seems that JVM memory usage may appear as Cached! I'd still prefer to find a method to map memory usage to processes instead of randomly closing processes and checking whether that freed any memory.
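
For a quick per-process overview without closing anything, top can also be run non-interactively and sorted by the SHR column (my suggestion, not from the original debugging session; note that SHR counts all resident shared pages, including plain file-backed mappings, so it over-reports pure Shmem usage):

$ top -b -n 1 -o SHR | head -n 20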

Update: File systems that use tmpfs behind the scenes could also cause Shmem to increase. I tested it like this:

$ df --output=used,source,fstype -B1M | grep -v '/dev/sd' | grep -v ecryptfs | tail -n +2 | awk 'BEGIN{sum=0}{sum+=$1}END{print sum}'
4664

So it seems that even if I only exclude filesystems backed by real block devices (my ecryptfs is mounted on those block devices, too), I can only explain about 4.7 GB of lost memory. And 4.3 GB of that is explained by the squashfs mounts created by snapd, which to my knowledge do not use Shmem.
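
A possibly more robust variant of the same check (a sketch) is to filter by filesystem type instead of excluding device names, so tmpfs mounts are counted no matter what their source is called; this still misses deleted-but-open tmpfs files:

$ df -t tmpfs --output=used,target -B1M | tail -n +2 \
    | awk '{sum += $1} END {print sum " MB used on tmpfs mounts"}'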

Update: For some people, the explanation has been GEM objects reserved by the GPU driver. There doesn't seem to be any standard interface to query these, but for my Intel integrated graphics I get the following results:

$ sudo sh -c 'cat /sys/kernel/debug/dri/*/i915_gem_objects' | perl -npe 's#([0-9]+) bytes#sprintf("%.1f", $1/1024/1024)." MB"#e'
1166 shrinkable [0 free] objects, 776.8 MB

Xorg: 114144 objects, 815.9 MB (38268928 active, 166658048 inactive, 537980928 unbound, 0 closed)
calibre-paralle: 1 objects, 0.0 MB (0 active, 0 inactive, 32768 unbound, 0 closed)
Xorg: 595 objects, 1329.9 MB (0 active, 19566592 inactive, 1360146432 unbound, 0 closed)
chrome: 174 objects, 63.2 MB (0 active, 0 inactive, 66322432 unbound, 0 closed)
chrome: 174 objects, 63.2 MB (0 active, 0 inactive, 66322432 unbound, 0 closed)
chrome: 20 objects, 1.2 MB (0 active, 0 inactive, 1241088 unbound, 0 closed)
firefox: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
GLXVsyncThread: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
Renderer: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
[the exact same Renderer line repeated dozens of times]
chrome: 1100 objects, 635.1 MB (0 active, 0 inactive, 180224 unbound, 0 closed)
chrome: 1100 objects, 635.1 MB (0 active, 665772032 inactive, 180224 unbound, 0 closed)
chrome: 20 objects, 1.2 MB (0 active, 0 inactive, 1241088 unbound, 0 closed)
[k]contexts: 3 objects, 0.0 MB (0 active, 40960 inactive, 0 unbound, 0 closed)

Those results do not seem sensible to me. If each of those lines were an actual memory allocation, the total would be in the hundreds of gigabytes!

Even if I assume that the GPU driver just reports some lines multiple times, I get this:

$ sudo sh -c 'cat /sys/kernel/debug/dri/*/i915_gem_objects' | sort | uniq | perl -npe 's#([0-9]+) bytes#sprintf("%.1f", $1/1024/1024)." MB"#e'

1218 shrinkable [0 free] objects, 797.6 MB
calibre-paralle: 1 objects, 0.0 MB (0 active, 0 inactive, 32768 unbound, 0 closed)
chrome: 1134 objects, 645.0 MB (0 active, 0 inactive, 163840 unbound, 0 closed)
chrome: 1134 objects, 645.0 MB (0 active, 676122624 inactive, 163840 unbound, 0 closed)
chrome: 174 objects, 63.2 MB (0 active, 0 inactive, 66322432 unbound, 0 closed)
chrome: 20 objects, 1.2 MB (0 active, 0 inactive, 1241088 unbound, 0 closed)
firefox: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
GLXVsyncThread: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
[k]contexts: 2 objects, 0.0 MB (0 active, 24576 inactive, 0 unbound, 0 closed)
Renderer: 4844 objects, 7994.5 MB (0 active, 0 inactive, 8382816256 unbound, 0 closed)
Xorg: 114162 objects, 826.8 MB (0 active, 216350720 inactive, 537980928 unbound, 0 closed)
Xorg: 594 objects, 1329.8 MB (14794752 active, 4739072 inactive, 1360146432 unbound, 0 closed)

That's still way over the expected totals, which should be in the 4-8 GB range. (The system currently has two seats logged in, so I'm expecting to see two Xorg processes.)

Update: Looking at the GPU debug output a bit more, I now think that those unbound numbers mean virtual blocks without any actual RAM used. If I do this, I get more sensible numbers for GPU memory usage:

$ sudo sh -c 'cat /sys/kernel/debug/dri/*/i915_gem_objects' | perl -npe 's#^(.*?): .*?([0-9]+) bytes.*?([0-9]+) unbound.*#sprintf("%s: %.1f", $1, ($2-$3)/1024/1024)." MB"#eg' | grep -v '0.0 MB'
1292 shrinkable [0 free] objects, 848957440 bytes

Xorg: 303.1 MB
Xorg: 32.7 MB
chrome: 667.5 MB
chrome: 667.5 MB

That could explain about 1.5 GB of RAM, which seems normal for the data I'm handling. I'm still missing multiple gigabytes somewhere!

Update: I'm currently thinking that the problem is actually caused by deleted files backed by RAM. These could be caused by broken software that leaks an open file handle after deleting/discarding the file. When I run

$ sudo lsof -n | grep -Ev ' /home/| /tmp/| /lib/| /usr/' | grep deleted | grep -o " REG .*" | awk 'BEGIN{sum=0}{sum+=$3}END{print sum / 1024 / 1024 " MB"}'
4560.65 MB

(The manually collected list of path prefixes contains only paths that are actually backed by real block devices. Since my root is backed by a real block device, I cannot simply list all block-device mount points here. A cleverer script could list all non-mount-point directories under root and also all block-device mounts deeper than just /; see the sketch below.)
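
Here's one possible sketch of such a script, under my own assumptions: lsof +L1 lists only files with link count zero (deleted but still open), and findmnt supplies the current tmpfs mount points to filter the NAME column against. With +L1 an extra NLINK column is added, but SIZE/OFF is still the 7th field:

$ tmpfs_re="^($(findmnt -rn -t tmpfs -o TARGET | paste -sd'|'))/"
$ sudo lsof -n +L1 2>/dev/null \
    | awk -v re="$tmpfs_re" '$5 == "REG" && $NF ~ re {sum += $7}
          END {printf "%.1f MB in deleted-but-open tmpfs files\n", sum / 1024 / 1024}'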

The lsof output above explains nearly 4.6 GB of lost RAM. Combined with the output from ipcs, the GPU RAM (with the assumption about unbound memory) and the tmpfs usage, I'm still missing about 4 GB of Shmem somewhere!

  • Why do I get a feeling of Déjà vu. – Bib Aug 29 '21 at 19:20
  • This seems to provide somewhat similar info but doesn't explain my case: https://unix.stackexchange.com/questions/56879/tracking-down-missing-memory-usage-in-linux – Mikko Rantalainen Aug 29 '21 at 19:22
  • Here's a good article for somewhat similar cases: https://cdn.ttgtmedia.com/searchEnterpriseLinux/downloads/Linux_Toolbox.pdf – Mikko Rantalainen Aug 29 '21 at 19:55
  • tmpfs is a common cause and | grep tmpfs is not a good way to check for it as they're not necessarily called that. Deleted but open files would still show as used in df -h though. Lazy umounts with any open filehandle would still use space but not show up in df at all. If there is a tool that covers all possibilities then I'd like to know too, these can be so difficult to track down sometimes. – frostschutz Aug 29 '21 at 21:05
  • I re-checked all the mounts and df cannot account for the missing RAM in my case. The more I look at it, the more sure I am that the problem is caused by Shmem usage. Also note that lsof shows more deleted-file usage than df reports for the same tmpfs! As a result, I would assume that df does not report deleted files that are still open. – Mikko Rantalainen Aug 30 '21 at 06:44
  • See also: https://unix.stackexchange.com/a/483006/20336 – Mikko Rantalainen Aug 30 '21 at 07:58
  • If you're feeling lucky, you could just manually close open files to free the RAM: https://superuser.com/a/963680/100154 – obviously, it's hard to know if a deleted file with an open file handle is a leak or actually used and if you close the file behind the process that's using it, you may experience any random behavior when that process tries to use that file again. – Mikko Rantalainen Aug 30 '21 at 09:09
  • If you have lots of leaking apps, the age-old solution is to add enough swap to hold the leaked memory and trust that the kernel can figure out which part of the RAM is actually used. As far as I know, even Shmem can be swapped out. – Mikko Rantalainen Aug 30 '21 at 09:16
  • See also: https://lwn.net/Articles/634978/ – Mikko Rantalainen Aug 30 '21 at 14:45
  • We had an unexpected power outage, so I can no longer debug the same system because it got restarted as a result. After the reboot it hasn't shown similar behavior, so I think this was caused by some bug (either in user-mode or kernel-mode code). I'm still looking for hints on how to debug this kind of situation in the future. – Mikko Rantalainen Sep 03 '21 at 07:50

2 Answers


I'm the author of the question above, and even though a full answer hasn't surfaced so far, here's the best known explanation to date:

  • With modern Linux kernels, the Cached value in /proc/meminfo no longer describes the amount of disk cache. However, the kernel developers consider that changing this at this point is already too late.

  • In practice, to measure the amount of disk cache actually in use, you should compute Cached - Shmem to estimate it. If you take the numbers from the original question, you get 15151936 − 14707664 (kiB) (from the output of /proc/meminfo), or 444272 kiB, so the system actually had only about 433 MiB of disk cache. In that case, it should be obvious that dropping all disk cache wouldn't free a lot of memory (even if all disk cache were dropped, the Cached field would have decreased by only 3%). See the sketch after this list.
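
A minimal one-liner for that estimate (my sketch of the computation described above, not an official metric):

$ awk '/^Cached:/ {cached = $2} /^Shmem:/ {shmem = $2}
       END {printf "approx. disk cache: %.0f MiB\n", (cached - shmem) / 1024}' /proc/meminfo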

So the best guess is that some user-mode software was using a lot of shared memory (typically tmpfs or shared memory maps), and that was causing Cached to show high values even though the system actually had very little disk cache, which suggests it was close to running into an out-of-memory condition. I think Committed_AS being way higher than MemTotal supports this theory.


Here's a (shortened) copy of the conclusion from the linux-mm thread where this was discussed, in case the original thread is hard to find in the future:

Subject: Re: Why is Shmem included in Cached in /proc/meminfo?
From: Vlastimil Babka @ 2021-08-30 16:05 UTC

On 8/30/21 12:44 AM, Mikko Rantalainen wrote:

It's not immediately obvious from fs/proc/meminfo.c function meminfo_proc_show() but the output of Cached: field seems to always include all of Shmem: field, too

However, if we change it now, we might create even larger confusion. People looking at the output for the first time (and IIRC also the 'free' command uses it) on a new kernel wouldn't be misled anymore. But people working with both old and new kernels will now have to take into account that it changed at some point... not good.

From: Khalid Aziz @ 2021-08-30 19:38 UTC

On Mon, 2021-08-30 at 20:26 +0300, Mikko Rantalainen wrote:

Of course one possible solution is to keep "Cached" as is and introduce "Cache" with the real cache semantics (that is, it includes sum of (Cached - Shmem) and memory backed RAM). That way system administrators would at least see two different fields with unique values and look for the documentation.

I would recommend adding a new field. There is likely to be a good number of tools/scripts out there that already interpret the data from /proc/meminfo and possibly take actions based upon that data. Those tools will break if we change the sense of existing data. A new field has the downside of expanding the output further, but it also doesn't break existing tools.


I'm currently writing a tool to help diagnose memory issues, based on the information in a document from Red Hat that includes some formulas.

About disk cache/tmpfs, what I understand is:

cache = disk cache - swap cache - tmpfs ram usage

tmpfs can reside in swap, so we must compute the real memory usage of tmpfs first.

Simple solution:

shmem = shared memory segments + tmpfs ram

However, shared memory segments can also be in swap, and it seems that shmem does not include huge-page shared memory segments (tested on kernels 5.4 and 5.15).

More precise solution:

shmem = "4k pages sysvipc shm rss" + tmpfs ram usage

"4k sysvipc shm rss" is sum of memory used by shared memory segments with standard page size (4k), so no huge pages.

You can get the RSS usage of memory segments under /proc/sysvipc/shm.
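
For example, summing the resident size of all SysV segments could look like this (a sketch; the rss and swap columns in /proc/sysvipc/shm are reported in bytes):

$ awk 'NR == 1 {for (i = 1; i <= NF; i++) if ($i == "rss") col = i; next}
       {sum += $col}
       END {printf "%.1f MiB resident in SysV shm segments\n", sum / 1048576}' /proc/sysvipc/shm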

Whether an shm segment is using 4k or 2M pages doesn't seem to be exposed under /proc, but it is possible to get that information by attaching to a shared memory segment and scanning the physical pages (/proc/kpageflags). I used that to add the number of shared memory pages to the output:

sudo ./memstats groups
[...]
Scanning shm...
Shared memory segments (MiB):
         key           id       Size        RSS         4k/2M        SWAP   USED%        SID
============================================================================================
           0            3          9          9       2442/0            0  100.02           
           0            2          9         10          0/5            0  104.86           
[...]
Toby Speight