
My application, which consists of two Java processes that exchange data over an HTTP connection, runs out of file descriptors and produces this error message:

Aug 14 11:27:40 server sender[8301]: java.io.IOException: Too many open files
Aug 14 11:27:40 server sender[8301]: at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
Aug 14 11:27:40 server sender[8301]: at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
Aug 14 11:27:40 server sender[8301]: at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
Aug 14 11:27:40 server sender[8301]: at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:455)
Aug 14 11:27:40 server sender[8301]: at java.lang.Thread.run(Thread.java:748)

Both processes are managed by systemd. I checked the processes using cat /proc/5882/limits; the limits are defined like this:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             63434                63434                processes
Max open files            4096                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       63434                63434                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

When I run lsof | grep <pid> | wc -l I get fewer than 2000 entries. (I count with lsof this way because of the information in "Discrepancy with lsof command when trying to get the count of open files per process"; see the example below.)
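For reference, this is the exact pipeline, with the PID 8301 from the log above substituted for the <pid> placeholder:

$ lsof | grep 8301 | wc -l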

I don't have the slightest idea what I could check or increase further.

Marged
  • Linux has a variety of knobs (https://unix.stackexchange.com/questions/84227); have you tuned them all? – thrig Aug 14 '18 at 13:55
  • Wrap your method in try/catch and dump the number of file handles when an exception is trapped. This is a question for Stack Overflow, not Unix SE. – ajeh Aug 14 '18 at 16:49
  • @ajeh I'm looking for a way to display the correct number of file handles at the Linux level; that's why I didn't post the question on SO. – Marged Aug 14 '18 at 17:06
  • You linked to the post about the discrepancy with lsof when trying to get the count of open files per process, but why not use the solution shown there? lsof -aKp "$pid" – Wildcard Aug 15 '18 at 05:17
  • @Wildcard Both approaches effectively return the same result; one of them is slightly faster. My approach lets me grep for the path I know my files are stored in and possibly find other PIDs using them too. – Marged Aug 15 '18 at 05:28

1 Answer


The best way to tell how many open file descriptors your process has is to use:

$ ls /proc/8301/fd/ | wc -l

(Assuming PID 8301, like in your log.)

Running lsof traverses the whole /proc tree and tries to resolve the names of all open files (these are pseudo-symlinks, each requiring a readlink call to resolve), so it can take a long time, depending on how busy your machine is; by the time you look at the result, everything may have changed already. Using ls /proc/${pid}/fd/ is quick (a single readdir call), so it is much more likely to capture something close to the current situation.
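For instance, to sample the count every second while the service is under load (a sketch; 8301 is the PID from your log, substitute your own):

$ watch -n 1 'ls /proc/8301/fd | wc -l'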

Regarding solving the problem: consider increasing the number of file descriptors allowed to your service, which you can do by setting the LimitNOFILE= directive in your systemd unit file.
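As a sketch, assuming the unit is named sender.service (inferred from the sender[8301] log prefix; adjust to your actual unit name), a drop-in override would look like this:

# /etc/systemd/system/sender.service.d/limits.conf
[Service]
LimitNOFILE=32768

After creating the file, run systemctl daemon-reload and restart the service so the new limit takes effect.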

filbranden
  • The limit is 4096 and lsof shows fewer than 2000, so I assume raising the limit would not solve the problem. – Marged Aug 15 '18 at 05:23
  • @Marged It's hard to tell; depending on how busy your service is, files might be coming and going faster than you can monitor them... Try monitoring it with ls /proc/${pid}/fd/ | wc -l, which gives a much better estimate of the current number of open files than running lsof on all processes of the machine. Furthermore, there's really not much of a downside to increasing the open-file limit, so bumping it up to 8192 or even something like 32768 might be just fine, regardless... Good luck! – filbranden Aug 15 '18 at 05:30