
I have a Fortran program that I compiled myself. I have run the executable hundreds of times without recompiling, but now it crashes instantly with a segmentation fault when I start it. Three other instances of the program are running right now. top outputs the following:

top - 15:37:06 up 5 days,  1:06,  2 users,  load average: 3,00, 3,01, 3,06
Tasks: 290 total,   4 running, 285 sleeping,   0 stopped,   1 zombie
%Cpu(s): 24,4 us,  0,0 sy,  0,0 ni, 75,5 id,  0,1 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem :  8058952 total,  2409096 free,  2964692 used,  2685164 buff/cache
KiB Swap:  8263676 total,  8263676 free,        0 used.  4614096 avail Mem 

PID USER   PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                      
1230 user  20   0 12,329g 675720   3080 R 100,0  8,4  14:17.45 tetramer                                                                    
1236 user  20   0 12,329g 675688   3052 R 100,0  8,4  13:58.96 tetramer                                                                    
1234 user  20   0 12,329g 675800   3168 R 100,0  8,4  14:02.23 tetramer                                                                                                                                                     

It does use a lot of memory (virtual memory, at least), but until now I could run several instances at the same time as long as the actual memory usage was low enough. The Fortran code in question follows; it crashes before reaching the write statement.

      IMPLICIT REAL*8(A-H,O-Z)
c
      PARAMETER ( np = 220 )
c
      PARAMETER ( ndim = 25000)
      PARAMETER ( ndim2 = ndim*(ndim+1)/2 )
C
      DIMENSION  array(np,6,6),array2(np)
c
      DIMENSION  vector(50), vector2(50)
      DIMENSION  v1(159,30001),v2(159,30001),v3(159,30001)
C
      COMMON /PARM/com1(99000) ,com2(0:8,0:8,99000)
     1        ,com3(0:8),com4(0:8,0:8,0:8),nmax,mmax
     1        ,com5(0:8,0:8)
C
      COMMON /SET/ AX(0:4,-4:4,50),AY(0:4,-4:4,50),AZ(0:4,-4:4,50)
     1          ,DD(0:4,-4:4,50), dd2(0:4,-4:4), nmax0(0:4,-4:4)
C
      DIMENSION  AH( ndim2 ),AF( ndim2 ),AF2(ndim2)
      DIMENSION  E( ndim ),VEC( ndim,ndim)
      DIMENSION   AH2(ndim,ndim),TEMP(ndim,ndim)
      dimension nbarray(6)
C
      CHARACTER*1   PARI
C
      write(6,*) ' ######   ##### '

I really have no idea why I am getting a segmentation fault all of a sudden. As far as I can tell, I am not even accessing any memory in the program yet (just allocating), so how can I get a segmentation fault?

Also, when piping the output of the program into a Perl script, I got a SIGPIPE for some reason, although it was the Fortran program that crashed, not the Perl script.

Does anyone have any idea what might be happening here and how I can fix it?

I am running Ubuntu 16.04, if that's relevant.

Edit: requested outputs are:

~$ ldd ./tetramer 
    not a dynamic executable

~$ strace ./tetramer 
    execve("./tetramer", ["./tetramer"], [/* 32 vars */]) = -1 ENOMEM (Cannot allocate memory)
    --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
    +++ killed by SIGSEGV +++
    Segmentation fault (core dumped)

I also did some testing, and it is always the fourth instance of the program that crashes with a segmentation fault. I recently did a reinstall (I wiped an older Ubuntu and installed 16.04), and it may be that under 16.04 I could only ever run three at a time and didn't notice. The times when I am absolutely sure that more than three instances were running were all before the reinstall.

I think it may have to do with the fact that the program tries to allocate about 12 GB of virtual memory when total memory plus swap is only about 16 GB. With the parameters I am using right now, though, it only really needs around 1 GB (see the RES column), so I don't see why I can't run more than three instances.
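A rough back-of-the-envelope check of the numbers above, sketched in Python. It only uses figures copied from the top output. The comparison it makes is an assumption, not something confirmed in this thread: it supposes that the kernel weighs a new process's virtual size against roughly free memory plus buffers/cache plus free swap. The exact rule is kernel-specific and also subtracts reserves that are not modelled here, so treat the result as an estimate.

```python
# Figures copied from the top output above, all in KiB.
MEM_TOTAL_KIB = 8058952
MEM_FREE_KIB = 2409096
BUFF_CACHE_KIB = 2685164
SWAP_TOTAL_KIB = 8263676
SWAP_FREE_KIB = 8263676

# VIRT column of one tetramer instance, in GiB.
VIRT_PER_INSTANCE_GIB = 12.329

KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / KIB_PER_GIB


# Total RAM plus total swap: the "16 GB" mentioned above.
total_gib = kib_to_gib(MEM_TOTAL_KIB + SWAP_TOTAL_KIB)

# ASSUMPTION: a crude stand-in for what the kernel may count as available
# when it decides whether to admit another large process.
headroom_gib = kib_to_gib(MEM_FREE_KIB + BUFF_CACHE_KIB + SWAP_FREE_KIB)

# How much is left over if one more 12.329 GiB request is admitted.
margin_gib = headroom_gib - VIRT_PER_INSTANCE_GIB

print(f"RAM + swap in total:      {total_gib:.2f} GiB")
print(f"free + cache + free swap: {headroom_gib:.2f} GiB")
print(f"margin over one request:  {margin_gib:.2f} GiB")
```

With three instances running, the estimated headroom exceeds the virtual size of one more instance by only about 0.4 GiB, so any reserves the kernel holds back could plausibly tip a fourth start into ENOMEM.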

fifaltra
  • Did you update any packages recently? – fpmurphy Jun 13 '16 at 14:18
  • Any useful information from 'dmesg' command or in /var/log/syslog (or similar files)? You might see OOM ("out of memory") messages. – Stephen Harris Jun 13 '16 at 14:24
  • That's what I updated today: bsdutils grep libblkid1 libfdisk1 libmetacity-private3a libmount1 libsmartcols1 libuuid1 lshw metacity-common mount mtr-tiny thermald util-linux uuid-runtime. I am not sure whether the problem occurred before today (maybe it first happened sometime since Thursday, but I can't really check), and I also don't know how to check when and what I updated recently. – fifaltra Jun 13 '16 at 14:25
  • @StephenHarris: there's no OOM in dmesg, that's the only message today: do_general_protection: 33 callbacks suppressed (plus some messages about random time, whatever that is) – fifaltra Jun 13 '16 at 14:32
  • @StephenHarris: there's also nothing in /var/log/syslog that seems to be connected to this – fifaltra Jun 13 '16 at 14:35
  • @fpmurphy1: see above, not sure if you get notified if I don't include your name... – fifaltra Jun 13 '16 at 14:36
  • To find out what libraries it uses, run ldd /path/to/your/fortran/program. Then use apt-file to find out which packages the libs belong to, and then something like zgrep ' upgrade PACKAGENAME ' /var/log/dpkg.log* to find out when they were last upgraded. Also run your program with strace - that will tell you exactly where it is crashing...add the last 10 or 20 lines of the strace output to your question. – cas Jun 14 '16 at 07:03
  • Try adding, e.g., 12 or 24 GB of swap file and see if it will run. If what you say about using only 1GB RES is correct, it'll never be used but it may allow you to run a few more simultaneous instances. – cas Jun 15 '16 at 00:10
  • @cas: Thanks for the tip, I just added a 16GB swap file and now I can run 6 instances simultaneously. – fifaltra Jun 15 '16 at 14:38
