
So, I'm trying to find out where the swap usage comes from on a system with high swap usage:

# free
             total       used       free     shared    buffers     cached
Mem:        515324     508800       6524          0       4852      27576
-/+ buffers/cache:     476372      38952
Swap:       983032     503328     479704

Adding up swap used per process:

# for proc in /proc/*; do cat $proc/smaps 2>/dev/null | awk '/Swap/{swap+=$2}END{print swap "\t'`readlink $proc/exe`'"}'; done | sort -n | awk '{total+=$1}/[0-9]/;END{print total "\tTotal"}'
0       /bin/gawk
0       /bin/sort
0       /usr/bin/readlink
28      /sbin/xxxxxxxx
52      /sbin/mingetty
52      /sbin/mingetty
52      /sbin/mingetty
52      /sbin/mingetty
56      /sbin/mingetty
56      /sbin/mingetty
60      /xxxxxxxxxxx
60      /usr/sbin/xxx
84      /usr/sbin/xxx
108     /usr/bin/xxx
168     /bin/bash
220     /sbin/init
256     /sbin/rsyslogd
352     /bin/bash
356     /bin/bash
360     /usr/sbin/sshd
496     /usr/sbin/crond
672     /usr/sbin/sshd
12972   /opt/jdk1.6.0_22/bin/java
80392   /usr/libexec/mysqld
311876  /opt/jdk1.6.0_22/bin/java
408780  Total

This gives a lower total (408780 kB) than the swap used reported by free (503328 kB). Where is the remaining used swap space? Is it vmalloc()'ed memory inside the kernel? Something else? How can I identify it?

Output of meminfo:

# cat /proc/meminfo 
MemTotal:       515324 kB
MemFree:          6696 kB
Buffers:          5084 kB
Cached:          28056 kB
SwapCached:     157512 kB
Active:         429372 kB
Inactive:        65068 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       515324 kB
LowFree:          6696 kB
SwapTotal:      983032 kB
SwapFree:       478712 kB
Dirty:             100 kB
Writeback:           0 kB
AnonPages:      399456 kB
Mapped:           8792 kB
Slab:             7744 kB
PageTables:       1820 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   1240692 kB
Committed_AS:  1743904 kB
VmallocTotal:   507896 kB
VmallocUsed:      3088 kB
VmallocChunk:   504288 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     4096 kB
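For comparison, the swap in use according to the kernel is SwapTotal minus SwapFree. A minimal sketch (swap_in_use is a made-up helper name) that reads a meminfo-style file, defaulting to /proc/meminfo:

```shell
# swap_in_use [FILE]: hypothetical helper that prints SwapTotal - SwapFree in kB
# from a meminfo-style file (defaults to /proc/meminfo).
swap_in_use() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}
```

For the snapshot above this gives 983032 - 478712 = 504320 kB, so roughly 504320 - 408780 = 95540 kB (about 93 MB) is missing from the per-process sum. free reported 503328 kB used; the small difference is because free and cat /proc/meminfo were run at different moments.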
ninj
  • Buffers and cache are included and they are not associated with any process. – goldilocks Apr 08 '13 at 15:24
  • @goldilocks: nope, those are in physical memory. Also, they don't add up. – ninj Apr 08 '13 at 15:26
  • You're right, I guess caching stuff to swap would be sort of pointless. However, I think stuff that is swapped out may be left there and tracked even after the process that owns it is defunct, as long as that swap space is not otherwise needed; this saves time later if a process loads the same page and that page then has to be swapped out again -- it's already there in swap. Google "swap cache" http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314 This parallels how the actual "cache cache" comes to be (it's stuff saved in memory from now-defunct processes). – goldilocks Apr 08 '13 at 15:41
  • ...meaning, lol, that "caching stuff in swap" is not so pointless, just that it doesn't get there by swapping the RAM cache out. – goldilocks Apr 08 '13 at 15:48
  • @goldilocks: that would be the SwapCached part in /proc/meminfo, which again, doesn't add up. It's also my understanding that SwapFree is updated whenever a swap page is gotten/released, regardless of whether it still exists as a swap_cache page. – ninj Apr 08 '13 at 15:58
  • See mm/swapfile.c: free_swap_and_cache() calls swap_entry_free() which increments nr_swap_pages (SwapFree), but only calls delete_from_swap_cache() (which decrements total_swapcache_pages - SwapCached) when the swap page is not still mapped elsewhere or the swap is full. – ninj Apr 08 '13 at 16:06
  • Look at this http://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system – innocent-world Aug 23 '13 at 15:05
  • Isn't the answer just that the kernel can swap, and that's not included in your processing? Particularly since the kernel has a whole heap of "user space" processes nowadays... Just a considered guess tho. – iain Oct 02 '13 at 22:23
  • AFAIK SwapCached counts pages that were in swap and have been brought back into memory, most probably for reading, so once the read is done the kernel can just reclaim those memory pages without touching swap. I think Iain is probably right: the extra 100MB is swapped kernel data (e.g. modules which are currently not being actively used). However, I have no clue how to verify this. – Huygens Oct 15 '13 at 11:22
  • It might be that you have a RAM-backed directory mounted somewhere. – strugee Oct 19 '13 at 06:16
  • Just a thought: some processes have a task directory containing a few more running tasks, which also consume swap space. Try adding them too, e.g. /proc/12345/task/12346/smaps – SHW Oct 21 '13 at 07:27
  • The issue you are observing isn't actually due to swap space being unaccounted for. The "(deleted)" that the kernel sometimes appends to /proc/*/exe links is output by readlink and is causing parse errors in your awk script, and you are effectively not counting processes whose binaries are no longer present in your total. The ongoing discussion about swap/caching/etc. going on here is a red herring. The more accurate question would have been "Why is my measurement of total swap usage different than what free reports?". – Jason C Oct 23 '13 at 00:55
  • That's my question too - care to comment?: http://superuser.com/questions/603655/on-centos-6-3-why-do-several-swap-measuring-methods-return-different-results. – Chris Card Dec 02 '13 at 08:43

2 Answers


The difference you are observing isn't actually due to swap space being unaccounted for. The "(deleted)" that the kernel sometimes appends to /proc/*/exe links is output by readlink and is causing parse errors in your awk script, and you are effectively not counting processes whose binaries are no longer present in your total.

Some kernels append the word "(deleted)" to /proc/*/exe symlink targets when the original executable for the process is no longer around.

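That is why your command shows a lower total. The output of readlink on such links will be something like "/path/to/bin (deleted)". Because the backtick substitution sits outside the single quotes and is not quoted itself, the shell splits that output at the space: awk receives /Swap/{swap+=$2}END{print swap "\t/path/to/bin as its program text (a string constant with no closing quote) and (deleted)"} as a stray extra argument, so the program fails to parse. For example, do this: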

for a in /proc/*/exe ; do readlink $a ; done | grep deleted

And you will see a few entries with "(deleted)" appended. If you looked at the swap usage for these entries, their total would match the discrepancy you see, as the resulting awk errors prevent their totals from being calculated and included in the final total.
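One way to check this is to add up swap only for processes whose exe link ends in " (deleted)". A rough sketch (deleted_exe_swap is a hypothetical name), reading the same smaps files as the original loop (so run it as root) but passing the name to awk with -v, so it never becomes part of the awk program:

```shell
# deleted_exe_swap: hypothetical helper; sums the Swap: lines of /proc/PID/smaps
# for processes whose /proc/PID/exe target ends in " (deleted)".
deleted_exe_swap() {
    for proc in /proc/[0-9]*; do
        exe=$(readlink "$proc/exe" 2>/dev/null) || continue
        case $exe in
            *' (deleted)')
                # The name is passed as data (-v), not spliced into the program text
                awk -v exe="$exe" '/^Swap:/ { s += $2 } END { print s + 0 "\t" exe }' \
                    "$proc/smaps" 2>/dev/null ;;
        esac
    done | awk -F '\t' '{ total += $1; print } END { print total + 0 "\tTotal" }'
}
```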

If you run your original command without redirecting stderr anywhere, you will probably notice a few "runaway string constant" errors. Those errors are a result of the above and you should not have ignored them.
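You can see the splitting without involving awk at all. In this sketch fake_readlink is a hypothetical stand-in that prints what readlink returns for a deleted binary, and printf shows each word the shell would hand to awk, in brackets:

```shell
# fake_readlink: hypothetical stand-in for readlink on a deleted executable
fake_readlink() { echo '/path/to/bin (deleted)'; }

# Same quoting as the original command: the unquoted `...` result is split at the space
printf '[%s]\n' '/Swap/{swap+=$2}END{print swap "\t'`fake_readlink`'"}'
```

This prints two words: [/Swap/{swap+=$2}END{print swap "\t/path/to/bin], an awk program whose string constant is never closed (hence "runaway string constant"), and [(deleted)"}], which awk would have treated as an input file name.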

Ignoring other potential improvements to your original command, you could modify it by removing the " (deleted)", like this (note |awk '{print $1}' added to readlink output):

for proc in /proc/*; \
  do cat $proc/smaps 2>/dev/null | awk '/Swap/{swap+=$2}END{print swap "\t'`readlink $proc/exe|awk '{print $1}' `'" }'; \
done | sort -n | awk '{total+=$1}/[0-9]/;END{print total "\tTotal"}'

This use of awk to fix the output of readlink may break if the name contains spaces -- you can use sed or whatever method you prefer.
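A sketch of one such alternative, keeping the original's smaps-based counting: strip only a trailing " (deleted)" with sed, and hand the name to awk with -v so spaces in it no longer matter. It also anchors the pattern as ^Swap:, because on newer kernels smaps also contains a SwapPss: line that /Swap/ would match as well. swap_by_exe is a made-up name.

```shell
# swap_by_exe: hypothetical helper; per-process swap from smaps plus a total.
swap_by_exe() {
    for proc in /proc/[0-9]*; do
        # Drop only a trailing " (deleted)"; names with spaces stay intact
        exe=$(readlink "$proc/exe" 2>/dev/null | sed 's/ (deleted)$//')
        # ^Swap: matches the Swap line only, not SwapPss on newer kernels
        awk -v exe="$exe" '/^Swap:/ { s += $2 } END { print s + 0 "\t" exe }' \
            "$proc/smaps" 2>/dev/null
    done | sort -n | awk -F '\t' '{ total += $1; print } END { print total + 0 "\tTotal" }'
}
```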

Bonus Info

By the way, you could just use smem -t. The "Swap" column displays what you want.

As for calculating it yourself, though, you can also get this information more directly from the VmSwap field in /proc/*/status (smaps requires some kernel support and isn't always available), and avoid having to redirect error output by using a proper filename pattern that avoids the errors to begin with:

for proc in /proc/[0-9]*; do \
  awk '/VmSwap/ { print $2 "\t'`readlink $proc/exe | awk '{ print $1 }'`'" }' $proc/status; \
done | sort -n | awk '{ total += $1 ; print $0 } END { print total "\tTotal" }'

If you don't need the actual binary and can deal with just having the process name, you can get everything from status:

for a in /proc/*/status ; do \
  awk '/VmSwap|Name/ { printf "%s ", $2 } END { print "" }' "$a" ; \
done | awk '{ total+=$2 ; print $0 } END { print "Total " total }'

And finally, if just having the PIDs suffices, you can just do it all with awk:

awk '/VmSwap/ { total += $2; print $2 "\t" FILENAME } END { print total "\tTotal" }' /proc/*/status

Note:

Now this isn't to say that there aren't differences between free and smem (the latter being the same as your script). There are plenty (see, for example, https://www.google.com/search?q=smem+free, which has more than enough results on the first page to answer your questions about memory usage). But without a proper test, your specific situation cannot be addressed.

Jason C
  • smem -t incorrectly computes swap usage, too. For example, if you run Apache with lots of children for a long time, smem -t may report each child using e.g. 40 MB of swap, and the total swap reported exceeds the available swap space. In reality, Apache has swapped ~40 MB and every forked child is reported to use the same swap area. Basically the problem is similar to Rss vs Pss data in /proc/*/smaps. – Mikko Rantalainen May 08 '20 at 07:50
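On newer kernels that provide a SwapPss: line in smaps, summing SwapPss instead of Swap charges each shared swapped page proportionally to the processes that map it, which should avoid the double counting described above. A minimal sketch; sum_swap_pss is a hypothetical helper that reads smaps-format text on stdin:

```shell
# sum_swap_pss: hypothetical helper; reads smaps-format text on stdin and
# prints the total of all SwapPss: lines in kB (0 if there are none).
sum_swap_pss() {
    awk '/^SwapPss:/ { total += $2 } END { print total + 0 }'
}

# Usage (as root, on a kernel whose smaps has SwapPss:):
#   cat /proc/[0-9]*/smaps 2>/dev/null | sum_swap_pss
```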

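Swap is also used by tmpfs: tmpfs pages can be swapped out when the kernel needs more free RAM, or simply because they have been unused for some time, so any tmpfs usage might consume swap. Pages of tmpfs files that no process has mapped are not attributed to any process, so they won't show up in a per-process sum.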
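To see whether tmpfs could account for the gap, you can check how much data the mounted tmpfs filesystems hold. A sketch using df (tmpfs_used_kb is a made-up name); as far as I know the Used figure covers tmpfs pages both in RAM and in swap, so treat it as an upper bound on what tmpfs can be holding in swap, not the swapped amount itself:

```shell
# tmpfs_used_kb: hypothetical helper; total "Used" (1K blocks) over tmpfs mounts.
tmpfs_used_kb() {
    # -P: one line per filesystem; -k: 1024-byte units; -t tmpfs: tmpfs mounts only
    df -P -k -t tmpfs 2>/dev/null | awk 'NR > 1 { used += $3 } END { print used + 0 }'
}
```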

higuita