
My code needs to go through the files in a directory, picking only those that are currently open (for writing) by any other process on the system.

The ideal solution would apply to all Unixes, but I'll settle for a Linux-only one.

The program is written in Python, but I can add a custom C function if I have to -- I just need to know what API is available for this...

One suggestion I found was to go through all file descriptors under Linux's /proc, resolving their links to see if they point at the file of interest. But that seems rather heavy...

I know, for example, that opening a file increases its reference count -- the filesystem will not deallocate the blocks of an open file, even if it is deleted, until it is closed -- the feature relied upon by tmpfile(3).
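
For illustration, here is the unlink-while-open behaviour I mean, as a minimal Python sketch (the file name is arbitrary):

import os

with open("scratch.tmp", "w+") as f:
    os.unlink("scratch.tmp")   # directory entry gone, but the blocks stay allocated
    f.write("still here")
    f.seek(0)
    print(f.read())            # prints "still here"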

Perhaps a user process can get access to these records in the kernel?

  • lsof does this. Download the source for lsof and read it. – waltinator May 08 '21 at 18:06
  • lsof does this and probably does it by reading /proc :) – ilkkachu May 08 '21 at 18:06
  • Yeah, lsof -- and fuser -- scan /proc. But that yields more information than I need -- I don't care which processes have the file open, I just want to know whether any such exist. Perhaps this information can be obtained more cheaply than a /proc rescan? – Mikhail T. May 08 '21 at 18:23
  • The advantage of scanning /proc is that it is backed by direct kernel calls, not a physical file system. That gives /proc a huge performance advantage over opening and reading a directory, even just to find the names. – Paul_Pedant May 08 '21 at 18:34
  • The advantage of scanning /proc is that it is the only way to get the information without modifying the kernel. – symcbean May 08 '21 at 23:25
  • Why can't you just use lsof on the file and then analyze the return code of said command? – cutrightjm May 09 '21 at 01:50
  • @cutrightjm, exec-ing lsof is quite hideous, but, yes, I could do the same scan of /proc that lsof appears to be performing. But I'm looking for something less expensive than checking every process to see whether it has the file open. I know the kernel already has the count inside somewhere. – Mikhail T. May 09 '21 at 03:36

2 Answers

1

On Linux, /proc/<pid>/fd/ contains a list of symlinks to files held open by <pid>. This means you can quickly and easily build a list of files open at this moment in time by checking what they link to.

This isn't as "heavy" as you think. For example, even running the bash while/read loop below took only about 1.5 seconds on my ancient AMD Phenom II 1090T with ~1000 processes currently running.

In bash, you could build an associative array with something like:

declare -A openfiles

# the find output is NUL-delimited, so read with -d ''; skip sockets, pipes and anonymous inodes
while IFS= read -r -d '' l; do openfiles[$l]=1; done < <(find /proc/*/fd/ -type l -printf '%l\0' 2>/dev/null | grep -zvE '^(socket|pipe|anon_inode):' | sort -zu)

(This is just a simple example, completely unoptimised. It wouldn't be hard at all to optimise it)

and then check whether a file is open with:

if [ "${openfiles[full-path-to-file]}" == 1 ] ; then .... ; fi

In Python, you could use os.walk() and os.readlink() to build a dict, or use the proc, procfs or psutil modules.
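
For example, a rough sketch of that /proc scan in plain Python (using os.listdir() rather than os.walk(), but the idea is the same; the helper name is just for illustration, and it silently skips processes that vanish or that your user isn't allowed to inspect):

import os

def files_open_on_system():
    """Return the set of absolute paths currently open by any (visible) process."""
    open_paths = set()
    for pid in filter(str.isdigit, os.listdir('/proc')):
        fd_dir = os.path.join('/proc', pid, 'fd')
        try:
            fds = os.listdir(fd_dir)
        except OSError:                 # process exited, or permission denied
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.startswith('/'):  # skip socket:[...], pipe:[...], anon_inode:[...]
                open_paths.add(target)
    return open_paths

Then "/full/path/to/file" in files_open_on_system() is the test you want. If you need "open for writing" specifically, the flags: line of /proc/<pid>/fdinfo/<fd> can be read in the same loop.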

psutil is cross-platform and has an open_files() method which seems like it would be useful here.
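
Something along these lines, assuming psutil is installed (an untested sketch -- under the hood it still iterates over every process, much like the /proc scan above):

import psutil

def is_open_by_any_process(path):
    """True if any process on the system currently has `path` open."""
    for proc in psutil.process_iter():
        try:
            if any(f.path == path for f in proc.open_files()):
                return True
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return False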

Note: you could do this with lsof, but lsof is extraordinarily slow. It does a lot more than what you need for this job.

cas
  • Everyone seems to suggest this, and I guess this is what I'll have to do. The thing is, such a scan gives a lot more information than I need -- it will tell me which processes have "my" file open, for example -- whereas I'm only looking for whether any such exist... – Mikhail T. May 09 '21 at 19:35
  • @MikhailT. you don't need to use the extra info. The example bash algorithm above doesn't; all it does is note the fact that a filename is opened by something, openfiles[$l]=1, so all it ends up with is a list of open files. – cas May 10 '21 at 01:04
  • Though I don't need to use it, I'm paying the price of collecting it. The kernel already has the information I'm looking for -- for each opened file there is a reference count, so that the filesystem does not reuse the underlying blocks even if the file is deleted... I just can't figure out how to access this information. – Mikhail T. May 10 '21 at 01:07
  • So, take a pragmatic approach -- implement the obvious method you know now, and replace it with something better (or more aesthetically pleasing) later, when you've figured out if it's possible and how to do it. BTW, there's no guarantee that the kernel or any filesystem driver even makes that count available to external callers for each individual file (you can get a total open-files count from /proc/sys/fs/file-nr, but not a list of files). And the answer might be different for each different fs (zfs, for example, doesn't seem to export any info to /sys or /proc/sys/fs). – cas May 10 '21 at 01:33
  • Such a list may not be available because no one has ever requested it -- most people, for example, want to know about open files in the context of a particular process, e.g. which pid(s) are preventing them from unmounting a filesystem. And it might not be worth implementing, because anyone who wants such a list can always trawl through /proc like lsof does (or like my answer above, which does much less but is faster: 1.5 seconds on my system for my bash code, 15 seconds for lsof -- and the same algorithm would be even faster in perl or python or anything that wasn't shell). – cas May 10 '21 at 01:47

-3

fcntl is what you are looking for. Its man page is quite elaborate. It tells you whether an open file is read-only or writable, and it returns -1 for closed file descriptors. It is powerful and can give many more details, such as the (flawed) locking mechanism.

#include <fcntl.h>

int fcntl(int fd, int cmd, ...);

For your purpose, use the F_GETFL command, as in:

int fd = 0;                                  /* e.g. standard input -- any descriptor of this process */
int r = fcntl(fd, F_GETFL);

if (r == -1) printf("File descriptor %d is not open.\n", fd);
else         printf("File descriptor %d is open.\n", fd);

If the return value is -1, fd is not an open file descriptor. Otherwise fd describes an open file; note that the return value can legitimately be 0 (O_RDONLY is 0), so test for -1 rather than for a positive result.
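
Since the question mentions Python, the same check is available through the standard fcntl module -- a minimal sketch (the path is just an example, and, as the comments below point out, this only inspects a descriptor belonging to the calling process):

import fcntl, os

fd = os.open("/etc/hostname", os.O_RDONLY)   # some descriptor of this process
flags = fcntl.fcntl(fd, fcntl.F_GETFL)       # raises OSError (EBADF) if fd is not open
print("fd %d is open, access mode %d" % (fd, flags & os.O_ACCMODE))
os.close(fd)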

  • fcntl on Linux does, indeed, have lots of additional capabilities (compared to other Unixes), but none of them seem to help me here... Leases and locks come closest, but not close enough -- did you have something else in mind? – Mikhail T. May 08 '21 at 20:25
  • I updated my answer with more details to address your problem. – Jona Engel May 10 '21 at 07:15
  • I think he's looking to find out if a file is open, probably in another process, and not if a file descriptor is open in the current process – ilkkachu May 10 '21 at 08:34
  • I see, I misunderstood. In that case there are already answers here: https://unix.stackexchange.com/questions/333186/how-to-list-the-open-file-descriptors-and-the-files-they-refer-to-in-my-curren and here: https://unix.stackexchange.com/questions/66235/how-to-display-open-file-descriptors-but-not-using-lsof-command – Jona Engel May 10 '21 at 11:59
  • No, @2419, the questions you referenced aren't "mine". The first one, again, lists file descriptors opened by the current bash -- I'm looking for other processes having the specified file open for writing. The second question is about the total number of opened files... – Mikhail T. May 10 '21 at 14:27