
I develop a system (x86_64-based) that runs Linux (Ubuntu 14.04.3) and has several pieces of custom-built hardware connected. I have written drivers and control software for the custom hardware.

After running tests on this system over an extended period (several days), I noticed that the amount of free memory (as shown by cat /proc/meminfo and the free command) was steadily declining. The amount of memory used by my user-space processes was well within reason, and clearing the page cache had little effect on the amount of free memory.
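For reference, clearing the cache and checking the result looked roughly like this (a sketch of typical commands rather than the exact invocation; writing to drop_caches requires root):

    # flush dirty data, then drop the (clean) page cache, dentries and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # check how much memory is free afterwards
    free -m
    grep -E 'MemFree|MemAvailable|^Cached' /proc/meminfo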

I rebooted the system and then wrote a script that runs a simple test in a loop 500 times, clears the page cache, and writes the output of /proc/meminfo to a file (a rough sketch of the script appears after the data below). I then ran this script over several days to collect data points. After analyzing the resulting data I noticed that the amount of free memory (MemFree, and MemAvailable as well) declines steadily and roughly linearly, while the other categories remain roughly flat. I collected far too much data to post here, but here are the first and last data points:

                       _Run 1_     _Run 650_
MemTotal:              65738276    65738276
MemFree:               65182220    30881420
MemAvailable:          65124632    30824008
Buffers:                   2292        2064
Cached:                  101204      100816
SwapCached:                   0           0
Active:                  174772      195008
Inactive:                 82924       82444
Active(anon):            154304      174712
Inactive(anon):           66624       66628
Active(file):             20468       20296
Inactive(file):           16300       15816
Unevictable:                  0           0
Mlocked:                      0           0
SwapTotal:             66978812    66978812
SwapFree:              66978812    66978812
Dirty:                      188         184
Writeback:                    0           0
AnonPages:               154296      180848
Mapped:                   99760       99360
Shmem:                    66672       66676
Slab:                     46836       48352
SReclaimable:             17000       18008
SUnreclaim:               29836       30344
KernelStack:               4176        4128
PageTables:                7244        6680
NFS_Unstable:                 0           0
Bounce:                       0           0
WritebackTmp:                 0           0
CommitLimit:           99847948    99847948
Committed_AS:            433008      417576
VmallocTotal:       34359738367 34359738367
VmallocUsed:            1886988     1886956
VmallocChunk:       34357817344 34357817344
HardwareCorrupted:            0           0
AnonHugePages:           106496      126976
CmaTotal:                     0           0
CmaFree:                      0           0
HugePages_Total:              0           0
HugePages_Free:               0           0
HugePages_Rsvd:               0           0
HugePages_Surp:               0           0
Hugepagesize:              2048        2048
DirectMap4k:              93872       93872
DirectMap2M:            1894400     1894400
DirectMap1G:           67108864    67108864

Notice that a bit more than half of the system's 64 GB of memory has been consumed somehow, while there does not appear to be a significant increase in any other category.
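For reference, the collection script was essentially the following (a rough sketch; ./run_test and the output file name are placeholders for the real test command and paths):

    #!/bin/bash
    # one cycle: run the test 500 times, drop the page cache, snapshot /proc/meminfo
    run_number=$1

    for i in $(seq 1 500); do
        ./run_test                        # placeholder for the actual hardware test
    done

    sync
    echo 3 > /proc/sys/vm/drop_caches     # clear the page cache (requires root)

    cat /proc/meminfo > "meminfo_${run_number}.txt"

Comparing any two data points is then just a diff (or a side-by-side paste) of the corresponding snapshot files.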

Here are some of the things I've done to try to figure out the issue:

  • I immediately suspected a kernel memory leak, probably caused by one of my drivers, so I audited all of my driver code. I didn't find any obvious problems.
  • I also enabled the kernel's kmemleak checker and reran the previously mentioned test script. It did not find any leaks.
  • Most recently I enabled the page_owner debug feature of the kernel and reran my script while collecting periodic output. After sorting and then diffing the outputs I see a few deltas (by my calculation, the total delta is 10,872 pages, or ~42 MB), but nothing anywhere near the ~30 GB that is consumed. (The commands behind these last two checks are sketched after this list.)
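A sketch of those commands, assuming a kernel built with CONFIG_DEBUG_KMEMLEAK and CONFIG_PAGE_OWNER, booted with page_owner=on, and with debugfs mounted:

    # kmemleak: trigger a scan and read back any reported leaks
    echo scan > /sys/kernel/debug/kmemleak
    cat /sys/kernel/debug/kmemleak

    # page_owner: snapshot the per-page allocation stacks before and after a run,
    # sort both snapshots with the kernel's tools/vm/page_owner_sort, then diff them
    cat /sys/kernel/debug/page_owner > page_owner_before.txt
    # ... run the test script ...
    cat /sys/kernel/debug/page_owner > page_owner_after.txt
    ./page_owner_sort page_owner_before.txt sorted_before.txt
    ./page_owner_sort page_owner_after.txt sorted_after.txt
    diff sorted_before.txt sorted_after.txt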

I am thoroughly stumped and fast running out of ideas.

Does anyone have any idea as to what is going on and/or suggestions as to how to figure it out?

Dave
  • That's just how Linux manages memory; please read this article: http://linuxatemyram.com . You can free up the memory with free && sync && echo 3 > /proc/sys/vm/drop_caches && free ; see http://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system – magor Jan 25 '17 at 18:36
  • Did you read my post? As I mentioned above, this is not caused by the page cache; clearing the page cache has no effect. – Dave Jan 25 '17 at 18:41
  • Yeah, I've read it. If you cleared the cache and memory is still occupied, then top sorted by memory (M) or ps aux --sort -rss | head -2 should show which process is using the memory, I guess. – magor Jan 25 '17 at 19:02
  • Neither of those show anything consuming anywhere near 30 GB of memory... as you can see from the /proc/meminfo output comparison above. – Dave Jan 25 '17 at 19:14
  • Is it possible to test with removing custom drivers / custom hardware? – George Vasiliou Jan 25 '17 at 19:38
  • Unfortunately not; the tests utilize that hardware. I should also mention that stopping all of my software and removing all of the drivers has no effect on the amount of free memory either. – Dave Jan 25 '17 at 19:45
  • Can you add the output of free and of top sorted by memory (the M key)? – magor Jan 26 '17 at 10:18

0 Answers