I'm trying to debug the issue that's periodically taking down my work machine. It's an OOM-type problem, and based on this /proc/meminfo output:

MemTotal:       32902608 kB
MemFree:         4054100 kB
Buffers:           72696 kB
Cached:         18126492 kB
SwapCached:        57124 kB
Active:          4351400 kB
Inactive:       17526856 kB
Active(anon):    3737244 kB
Inactive(anon): 17065796 kB
Active(file):     614156 kB
Inactive(file):   461060 kB
Unevictable:          64 kB
Mlocked:              64 kB
SwapTotal:      11717628 kB
SwapFree:        4202224 kB
Dirty:               680 kB
Writeback:             0 kB
AnonPages:       3622052 kB
Mapped:           457156 kB
Shmem:          17123968 kB
Slab:            4184848 kB
SReclaimable:    1822044 kB
SUnreclaim:      2362804 kB
KernelStack:        7032 kB
PageTables:        55828 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    28168932 kB
Committed_AS:   37689696 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      153644 kB
VmallocChunk:   34359467188 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2105344 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      387924 kB
DirectMap2M:    33120256 kB

the thing that sticks out is the 17GB of Shmem usage, at least as far as I can tell. It grows erratically and is somehow associated with Chrome usage: closing tabs seems to stop the fairly rapid growth (maybe ~300MB/min), but reopening the same tabs does not seem to restart it (inconclusive, I need to catch a misbehaving tab in the act first).
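To put a number on that growth instead of eyeballing it, something like the following could be used. This is only a sketch: the function names `parse_meminfo`, `growth_mb_per_min` and `watch_shmem` are made up for illustration, and the one-minute sampling interval is an arbitrary choice.

```python
import time


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB or a count)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def growth_mb_per_min(kb_before, kb_after, seconds):
    """Convert a change in kB over `seconds` into MB per minute."""
    return (kb_after - kb_before) / 1024.0 / (seconds / 60.0)


def watch_shmem(interval=60, samples=10, path="/proc/meminfo"):
    """Print the Shmem value and its growth rate once per interval (hypothetical helper)."""
    previous = None
    for _ in range(samples):
        with open(path) as f:
            current = parse_meminfo(f.read())["Shmem"]
        if previous is not None:
            rate = growth_mb_per_min(previous, current, interval)
            print("Shmem %d kB (%+.1f MB/min)" % (current, rate))
        previous = current
        time.sleep(interval)
```

Calling `watch_shmem()` in a terminal while opening and closing tabs should make it easier to tie a change in growth rate to a specific action.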

To my understanding, Shmem includes tmpfs usage and GEM object allocation, but as far as I can tell the tmpfs usage isn't very high (large pulse-shm-#### files, but only totaling 1GB).
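For reference, the tmpfs total can be measured with a small walk over a mount point such as /dev/shm. This is a rough sketch (the helper name `dir_usage_bytes` is invented here). It only sees files that still have a directory entry, so a file that was unlinked but is still held open by some process would keep using tmpfs memory without showing up in this total.

```python
import os
import stat


def dir_usage_bytes(root):
    """Return (apparent_bytes, allocated_bytes) for regular files under root.

    Hard-linked files are counted once. Deleted-but-open files are invisible here.
    """
    seen = set()
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512  # st_blocks is in 512-byte units
    return apparent, allocated
```

Running `dir_usage_bytes("/dev/shm")` and comparing the result with the Shmem line gives a rough idea of how much shared memory is not accounted for by visible tmpfs files.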

In terms of evidence, there's also this suspicious snippet from dmesg:

[320918.676580] [TTM] Out of kernel memory
[320918.676587] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (196608, 2, 4096, -12)
[320918.678451] [TTM] Out of kernel memory
[320918.678454] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (4096, 2, 4096, -12)
[339871.343917] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (393216, 2, 4096, -12)
[339873.207965] chrome[12513]: segfault at 10 ip 00007f48d1f385ea sp 00007ffc1bd73ab0 error 4 in chrome[7f48cfd7b000+5a2d000]

Based on the circumstances, I'm guessing it has something to do with Chrome and perhaps a graphics memory leak of some kind (Radeon driver, Ubuntu trusty), but from here I'm lost on how to debug it (or at least bisect, to first figure out whether or not it is a GEM leak).
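One possible way to narrow this down is to check how much shared memory each process actually has mapped, by grouping the lines of /proc/PID/maps by pathname. The sketch below is an assumption-laden starting point, not a known fix: the function name `shared_mapping_sizes` is invented, the path prefixes are a guess at what is interesting, and it reports mapped address-space size rather than resident memory. Leaked objects that are not mapped into any process would not show up at all.

```python
def shared_mapping_sizes(maps_text, prefixes=("/dev/shm/", "/SYSV")):
    """Sum mapped bytes per pathname for mappings whose path starts with a prefix.

    maps_text is the content of a /proc/PID/maps file.
    """
    totals = {}
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional pathname
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping without a pathname
        addr_range = parts[0]
        path = parts[5].strip()
        if not path.startswith(prefixes):
            continue
        start, _, end = addr_range.partition("-")
        totals[path] = totals.get(path, 0) + int(end, 16) - int(start, 16)
    return totals
```

Feeding it the contents of `/proc/<pid>/maps` for each Chrome process and for Xorg would show whether any of them maps anything close to the missing 17GB.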

EDIT: Forgot to mention, but ipcs -m also seems fairly boring:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x0052e2c1 0          postgres   600        48         5                       
0x00000000 360449     htung      600        393216     2          dest         
0x00000000 294914     htung      600        393216     2          dest         
0x00000000 1114115    htung      600        524288     2          dest         
0x00000000 655364     htung      600        524288     2          dest         
0x00000000 2850821    htung      600        524288     2          dest         
0x00000000 983046     htung      600        33554432   2          dest         
0x00000000 1015815    htung      600        524288     2          dest         
0x00000000 15663112   htung      600        393216     2          dest         
0x00000000 15040521   htung      600        33554432   2          dest         
0x00000000 1277962    htung      600        393216     2          dest         
0x00000000 1507339    htung      600        1048576    2          dest         
0x00000000 2752524    htung      700        153216     2          dest         
0x00000000 1867789    htung      600        393216     2          dest         
0x00000000 1638414    htung      600        393216     2          dest         
0x00000000 2719759    htung      700        7680000    2          dest         
0x00000000 2293776    htung      700        7680000    2          dest         
0x00000000 1966097    htung      600        1048576    2          dest         
0x00000000 2326546    htung      700        7680000    2          dest         
0x00000000 2359315    htung      700        7680000    2          dest         
0x00000000 2392084    htung      700        7680000    2          dest         
0x00000000 2555925    htung      700        220576     2          dest         
0x00000000 2588694    htung      700        40848      2          dest         
0x00000000 2621463    htung      700        40848      2          dest         
0x00000000 2654232    htung      700        40848      2          dest         
0x00000000 2687001    htung      700        7680000    2          dest         
0x00000000 15826970   htung      600        524288     2          dest         
0x00000000 3735579    htung      600        393216     2          dest         
0x00000000 3506205    htung      600        393216     2          dest         
0x00000000 3932190    htung      600        393216     2          dest         
0x00000000 15106079   htung      600        131664     2          dest         
0x00000000 3866656    htung      600        1048576    2          dest         
0x00000000 3899425    htung      600        393216     2          dest         
0x00000000 4030498    htung      600        524288     2          dest         
0x00000000 4063267    htung      600        393216     2          dest         
0x00000000 8978468    htung      600        1048576    2          dest         
0x00000000 6094885    htung      600        393216     2          dest         
0x00000000 35651622   htung      600        4011000    2          dest         
0x00000000 6127657    htung      600        393216     2          dest         
0x00000000 6488106    htung      600        7206400    2          dest   
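To check that these segments really are too small to matter, the bytes column can be summed. A minimal sketch, assuming the column layout shown above (key, shmid, owner, perms, bytes, nattch, status); the helper name `sysv_shm_total` is made up:

```python
def sysv_shm_total(ipcs_text):
    """Sum the 'bytes' column of `ipcs -m` output, skipping header lines."""
    total = 0
    for line in ipcs_text.splitlines():
        cols = line.split()
        # Data rows start with a hexadecimal key such as 0x00000000
        if len(cols) >= 5 and cols[0].startswith("0x") and cols[4].isdigit():
            total += int(cols[4])
    return total
```

The text can come from something like `subprocess.check_output(["ipcs", "-m"]).decode()`. If the total is in the tens or hundreds of megabytes, SysV shared memory cannot explain a multi-gigabyte Shmem figure.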
  • Well, behavior is largely unchanged, but it definitely seems like the leak is Chrome-related and not tied to a specific tab. I have reproduced the leak-in-progress with only one inconsequential tab open, and closing that last tab (and thus Chrome) ends the leak. Reopening Chrome with that same tab does not restart the leak, so Chrome seems to have a rogue thread doing naughty things :( – user508633 Mar 08 '16 at 05:03
  • Did you ever figure out how to debug this any better than just killing random processes until the problem goes away? I have a very similar problem here: https://unix.stackexchange.com/q/666762/20336 – Mikko Rantalainen Aug 29 '21 at 20:10
