I'm trying to debug the issue that's periodically taking down my work machine. It's an OOM-type problem, and based on this /proc/meminfo output:
MemTotal: 32902608 kB
MemFree: 4054100 kB
Buffers: 72696 kB
Cached: 18126492 kB
SwapCached: 57124 kB
Active: 4351400 kB
Inactive: 17526856 kB
Active(anon): 3737244 kB
Inactive(anon): 17065796 kB
Active(file): 614156 kB
Inactive(file): 461060 kB
Unevictable: 64 kB
Mlocked: 64 kB
SwapTotal: 11717628 kB
SwapFree: 4202224 kB
Dirty: 680 kB
Writeback: 0 kB
AnonPages: 3622052 kB
Mapped: 457156 kB
Shmem: 17123968 kB
Slab: 4184848 kB
SReclaimable: 1822044 kB
SUnreclaim: 2362804 kB
KernelStack: 7032 kB
PageTables: 55828 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 28168932 kB
Committed_AS: 37689696 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 153644 kB
VmallocChunk: 34359467188 kB
HardwareCorrupted: 0 kB
AnonHugePages: 2105344 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 387924 kB
DirectMap2M: 33120256 kB
the thing that sticks out is the 17GB of Shmem usage, at least as far as I can tell. It grows erratically and is somehow associated with Chrome usage: closing tabs seems to stop the fairly rapid growth (maybe ~300MB/min), but reopening the same tabs does not seem to restart it (inconclusive; I need to catch a misbehaving tab in the act first).
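To put a number on the growth instead of eyeballing it, something like the following could be left running while opening and closing tabs. It is only a sketch: the function names, the 10-second interval and the sample count are my own choices, and it simply averages the change in the Shmem line of /proc/meminfo between the first and last sample.

```python
import time


def read_shmem_kb(path="/proc/meminfo"):
    """Return the Shmem value (in kB) from a meminfo-style file."""
    with open(path) as f:
        for line in f:
            # Match "Shmem:" exactly so fields like ShmemHugePages are skipped.
            if line.startswith("Shmem:"):
                return int(line.split()[1])
    raise ValueError("no Shmem line found in " + path)


def growth_kb_per_min(samples):
    """Average growth in kB/min between the first and last sample.

    samples is a list of (unix_seconds, shmem_kb) pairs.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (v1 - v0) * 60.0 / (t1 - t0)


def watch(interval_s=10, count=6):
    """Sample Shmem `count` times, `interval_s` seconds apart (hypothetical defaults)."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read_shmem_kb()))
        if i < count - 1:
            time.sleep(interval_s)
    return growth_kb_per_min(samples)
```

Calling `watch()` right after closing a suspect tab, and again after reopening it, would give two rates to compare.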
To my understanding, Shmem includes tmpfs usage and GEM object allocations, but as far as I can tell tmpfs usage isn't very high (there are large pulse-shm-#### files, but they only total about 1GB).
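The accounting I'm attempting by hand can be sketched as below. It assumes (my understanding, not verified) that tmpfs files and SysV shared memory segments are both counted under Shmem, so whatever is left after subtracting them needs another explanation. The function names are made up for illustration, and the tmpfs and SysV totals have to be supplied from df and ipcs -m.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> [kB]" lines; ignore anything else.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def unexplained_shmem_kb(meminfo, tmpfs_used_kb, sysv_bytes):
    """Shmem (kB) left over after subtracting known tmpfs and SysV shm usage."""
    return meminfo["Shmem"] - tmpfs_used_kb - sysv_bytes // 1024
```

A large positive remainder would point away from tmpfs files and SysV segments as the source.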
In terms of evidence, there's also this suspicious snippet from dmesg:
[320918.676580] [TTM] Out of kernel memory
[320918.676587] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (196608, 2, 4096, -12)
[320918.678451] [TTM] Out of kernel memory
[320918.678454] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)
[339871.343917] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (393216, 2, 4096, -12)
[339873.207965] chrome[12513]: segfault at 10 ip 00007f48d1f385ea sp 00007ffc1bd73ab0 error 4 in chrome[7f48cfd7b000+5a2d000]
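For reference, the failure lines above can be pulled apart with a short script. This is a sketch that assumes the four numbers in parentheses are size, initial domain, alignment and return code, which is my reading of the radeon error message rather than something I have confirmed; the dictionary keys are my own labels.

```python
import errno
import re

# Matches e.g. "Failed to allocate GEM object (196608, 2, 4096, -12)"
GEM_FAIL = re.compile(
    r"Failed to allocate GEM object \((-?\d+), (-?\d+), (-?\d+), (-?\d+)\)"
)


def parse_gem_failure(line):
    """Return the decoded fields of a GEM allocation failure line, or None."""
    m = GEM_FAIL.search(line)
    if m is None:
        return None
    size, domain, alignment, ret = (int(g) for g in m.groups())
    return {
        "size": size,            # assumed: requested size in bytes
        "domain": domain,        # assumed: initial memory domain flag
        "alignment": alignment,  # assumed: requested alignment in bytes
        # Kernel return codes are negative errno values; map back to a name.
        "errno": errno.errorcode.get(-ret, str(ret)),
    }
```

Under that assumption the -12 decodes to ENOMEM, and the failing requests are small (4096 to 393216 bytes).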
Based on the circumstances, I'm guessing it has something to do with Chrome and perhaps a graphics memory leak of some kind (Radeon driver, Ubuntu trusty), but from here I'm lost on how to debug this (or at least how to bisect, to first figure out whether or not it is a GEM leak).
EDIT: Forgot to mention, but ipcs -m also seems fairly boring:
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 0 postgres 600 48 5
0x00000000 360449 htung 600 393216 2 dest
0x00000000 294914 htung 600 393216 2 dest
0x00000000 1114115 htung 600 524288 2 dest
0x00000000 655364 htung 600 524288 2 dest
0x00000000 2850821 htung 600 524288 2 dest
0x00000000 983046 htung 600 33554432 2 dest
0x00000000 1015815 htung 600 524288 2 dest
0x00000000 15663112 htung 600 393216 2 dest
0x00000000 15040521 htung 600 33554432 2 dest
0x00000000 1277962 htung 600 393216 2 dest
0x00000000 1507339 htung 600 1048576 2 dest
0x00000000 2752524 htung 700 153216 2 dest
0x00000000 1867789 htung 600 393216 2 dest
0x00000000 1638414 htung 600 393216 2 dest
0x00000000 2719759 htung 700 7680000 2 dest
0x00000000 2293776 htung 700 7680000 2 dest
0x00000000 1966097 htung 600 1048576 2 dest
0x00000000 2326546 htung 700 7680000 2 dest
0x00000000 2359315 htung 700 7680000 2 dest
0x00000000 2392084 htung 700 7680000 2 dest
0x00000000 2555925 htung 700 220576 2 dest
0x00000000 2588694 htung 700 40848 2 dest
0x00000000 2621463 htung 700 40848 2 dest
0x00000000 2654232 htung 700 40848 2 dest
0x00000000 2687001 htung 700 7680000 2 dest
0x00000000 15826970 htung 600 524288 2 dest
0x00000000 3735579 htung 600 393216 2 dest
0x00000000 3506205 htung 600 393216 2 dest
0x00000000 3932190 htung 600 393216 2 dest
0x00000000 15106079 htung 600 131664 2 dest
0x00000000 3866656 htung 600 1048576 2 dest
0x00000000 3899425 htung 600 393216 2 dest
0x00000000 4030498 htung 600 524288 2 dest
0x00000000 4063267 htung 600 393216 2 dest
0x00000000 8978468 htung 600 1048576 2 dest
0x00000000 6094885 htung 600 393216 2 dest
0x00000000 35651622 htung 600 4011000 2 dest
0x00000000 6127657 htung 600 393216 2 dest
0x00000000 6488106 htung 600 7206400 2 dest
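To check that "boring" holds up numerically, the bytes column of the listing above can be summed with a small helper. This is a minimal sketch that assumes the column order shown in the header (key, shmid, owner, perms, bytes, nattch, status); the function name is mine.

```python
def total_shm_bytes(ipcs_output):
    """Sum the `bytes` column of `ipcs -m` output, skipping header lines."""
    total = 0
    for line in ipcs_output.splitlines():
        cols = line.split()
        # Data rows look like: key shmid owner perms bytes nattch [status]
        if len(cols) >= 6 and cols[0].startswith("0x") and cols[4].isdigit():
            total += int(cols[4])
    return total
```

Feeding it the text of `ipcs -m` gives a total far short of 17GB, so SysV segments do not appear to explain the Shmem figure.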