I develop an x86_64-based system that runs Linux (Ubuntu 14.04.3) and has several pieces of custom-built hardware connected to it. I have written the drivers and control software for the custom hardware.
After running tests on this system over an extended period (several days), I noticed that the amount of free memory (as reported by cat /proc/meminfo and the free command) was steadily declining. The amount of memory used by my user-space processes was well within reason. I cleared the page cache, but it had little effect on the amount of free memory.
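For reference, "clearing the page cache" here means the usual drop_caches sequence; a minimal sketch (run as root):

```
# Flush dirty data to disk, then drop the page cache plus reclaimable
# dentry/inode caches (echo 1 would drop the page cache only).
sync
echo 3 > /proc/sys/vm/drop_caches

# Check the effect on free memory.
free -k
grep -E 'MemFree|MemAvailable' /proc/meminfo
```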
I rebooted the system and then wrote a script that runs a simple test in a loop 500 times, clears the page cache, and writes the output of /proc/meminfo to a file (a simplified sketch of the script appears after the data below). I then ran this script over several days to collect data points. After analyzing the resulting data I noticed that the amount of free memory (both MemFree and MemAvailable) declines steadily and linearly, while the other categories remain roughly flat. I collected far too much data to post here, but here are the first and last data points:
                          Run 1        Run 650
MemTotal: 65738276 65738276
MemFree: 65182220 30881420
MemAvailable: 65124632 30824008
Buffers: 2292 2064
Cached: 101204 100816
SwapCached: 0 0
Active: 174772 195008
Inactive: 82924 82444
Active(anon): 154304 174712
Inactive(anon): 66624 66628
Active(file): 20468 20296
Inactive(file): 16300 15816
Unevictable: 0 0
Mlocked: 0 0
SwapTotal: 66978812 66978812
SwapFree: 66978812 66978812
Dirty: 188 184
Writeback: 0 0
AnonPages: 154296 180848
Mapped: 99760 99360
Shmem: 66672 66676
Slab: 46836 48352
SReclaimable: 17000 18008
SUnreclaim: 29836 30344
KernelStack: 4176 4128
PageTables: 7244 6680
NFS_Unstable: 0 0
Bounce: 0 0
WritebackTmp: 0 0
CommitLimit: 99847948 99847948
Committed_AS: 433008 417576
VmallocTotal: 34359738367 34359738367
VmallocUsed: 1886988 1886956
VmallocChunk: 34357817344 34357817344
HardwareCorrupted: 0 0
AnonHugePages: 106496 126976
CmaTotal: 0 0
CmaFree: 0 0
HugePages_Total: 0 0
HugePages_Free: 0 0
HugePages_Rsvd: 0 0
HugePages_Surp: 0 0
Hugepagesize: 2048 2048
DirectMap4k: 93872 93872
DirectMap2M: 1894400 1894400
DirectMap1G: 67108864 67108864
Notice that slightly more than half of the system's 64 GB of memory has been consumed somehow, while no other category shows a significant increase.
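For reference, the collection script was essentially the following (a simplified sketch; ./hw_test and meminfo.log are placeholder names for the actual test binary and log file):

```
#!/bin/bash
# Simplified sketch of the data-collection script. One invocation of this
# script corresponds to one "Run" column in the data above; it was launched
# repeatedly over several days.

# Run the simple hardware test 500 times.
for i in $(seq 1 500); do
    ./hw_test
done

# Clear the page cache so cached file data doesn't mask the decline in MemFree.
sync
echo 3 > /proc/sys/vm/drop_caches

# Append a snapshot of /proc/meminfo to the log.
echo "=== $(date) ===" >> meminfo.log
cat /proc/meminfo >> meminfo.log
```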
Here are some of the things I've done to try to figure out the issue:
- I immediately suspected a kernel memory leak, probably caused by one of my drivers, so I audited all of my driver code. I didn't find any obvious problems.
- I also enabled the kernel's kmemleak checker and reran the test script described above. It did not find any leaks.
- Most recently I enabled the kernel's page_owner debug feature and reran my script while collecting periodic output. After sorting and then diffing the outputs I see a few deltas (by my calculation, the total delta is 10,872 pages, or ~42 MB), but nothing anywhere near the ~30 GB that is being consumed. (The kmemleak and page_owner commands are sketched after this list.)
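Roughly, the kmemleak and page_owner steps looked like this (a sketch, assuming debugfs is mounted at /sys/kernel/debug, the kernel was built with CONFIG_DEBUG_KMEMLEAK and CONFIG_PAGE_OWNER, and the system was booted with page_owner=on; page_owner_sort is the helper built from tools/vm/ in the kernel source):

```
# kmemleak: trigger a memory scan, then read back any reported leaks.
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak

# page_owner: dump per-page allocation stacks before and after a test run.
cat /sys/kernel/debug/page_owner > page_owner_before.txt
# ... run the test script here ...
cat /sys/kernel/debug/page_owner > page_owner_after.txt

# Aggregate/sort the dumps with the kernel's page_owner_sort tool,
# then diff to look for allocation sites that keep growing.
./page_owner_sort page_owner_before.txt sorted_before.txt
./page_owner_sort page_owner_after.txt sorted_after.txt
diff sorted_before.txt sorted_after.txt
```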
I am thoroughly stumped and fast running out of ideas.
Does anyone have any idea what is going on, or suggestions on how to figure it out?
Comments:

- free && sync && echo 3 > /proc/sys/vm/drop_caches && free, see http://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system – magor Jan 25 '17 at 18:36
- top sorted by memory (M) or ps aux --sort -rss | head -2, you should see what process is using the memory I guess. – magor Jan 25 '17 at 19:02
- […] the /proc/meminfo output comparison above. – Dave Jan 25 '17 at 19:14
- free and top sorted by memory (M key)? – magor Jan 26 '17 at 10:18