
So, I've been able to figure out bits of this myself but am having trouble piecing them together. I have a task I need to automate - I have folders filled with gigabytes of obsolete files, and I want to purge them if they meet two criteria.

  1. Files must not have been modified in the past 14 days - for this, I'm using find -

find /dir/* -type f -mtime +14

  2. And the files cannot be in use, which can be determined by

lsof /dir/*

I don't know bash quite well enough yet to figure out how to combine these commands. Help would be appreciated. I think I essentially want to loop through each line of output from find, check if it is present in output from lsof, and if not, rm -f -- however, I am open to alternative methodologies if they accomplish the goal!

3 Answers


The following should work:

for x in `find <dir> -type f -mtime +14`; do lsof "$x" >/dev/null && echo "$x in use" || echo "$x not in use" ; done

Instead of the echo "$x not in use" command, you can place your rm "$x" command.

How does it work:

  • find files, last modified 14 days or longer ago:

find <dir> -type f -mtime +14

  • loop over items in a list:

for x in <list>; do <command>; done

  • execute command 1 if the exit status of lsof is 0, otherwise execute command 2:

lsof "$x" && <command 1> || <command 2>

This relies on the short-circuit evaluation of && and || in the shell to select command 1 or command 2.
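The short-circuit behaviour can be seen with a pair of toy commands standing in for lsof (with the usual caveat that if command 1 itself fails, command 2 runs as well; harmless here, since both are plain echo):

```shell
# a && b || c: b runs when a exits 0; c runs when a (or b) exits non-zero.
true  && echo "in use path" || echo "not in use path"   # prints "in use path"
false && echo "in use path" || echo "not in use path"   # prints "not in use path"
```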

On my system (Ubuntu 14.04) this works with file names with spaces in them and even for file names with ? and * in them.
This is however no guarantee that it will work with every shell on any system. Please test before replacing the echo command with the rm command.
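If you do need to handle arbitrary file names, here is a sketch of a whitespace-safe variant (my example, not the answer's original code): it reads find's output one line at a time instead of relying on word splitting, so spaces, ? and * in names are handled. File names containing newlines are still unsafe.

```shell
# Sketch: loop over old files one line at a time; spaces, ? and *
# in names are handled, names containing newlines are not.
check_old_files() {
  find "$1" -type f -mtime +14 | while IFS= read -r x; do
    if lsof "$x" >/dev/null 2>&1; then
      echo "$x in use"
    else
      echo "$x not in use"   # replace this echo with: rm -f -- "$x"
    fi
  done
}
```

Call it as check_old_files /dir, and only swap the echo for rm -f after a dry run.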

NZD
  • Just so I understand - where you've used do lsof $x >/dev/null - this is basically the unix way of saying if it's in use, do nothing by sending output to /dev/null - and if not, do this other command (in my case, rm -f) – tastyCIDR Nov 14 '15 at 15:59
  • @starseed In this case I'm only interested in the exit code of lsof to select between command 1 and command 2. The output of lsof would only clutter the output of the for-loop, so that's why I redirect it to the 'sink'. – NZD Nov 14 '15 at 19:52
  • @starseed If you run lsof on a file and the file is in use, its exit code is zero. You can check this by running command echo $? immediately after the lsof command. That will print the exit code of the previous command. If the file is not in use, its exit code is one. You can again check this by running command echo $? immediately after the lsof command. – NZD Nov 14 '15 at 19:56
  • @starseed Beware that the code in this answer breaks down completely if any file name contains whitespace or wildcards. – Gilles 'SO- stop being evil' Nov 14 '15 at 22:55
  • Good to know, if it won't work with whitespace then I'd best not use it - not very airtight =\

    Thanks!

    – tastyCIDR Nov 15 '15 at 06:18
  • @starseed @Gilles If you replace $x with "$x" it will handle file names with spaces and on my system (Ubuntu 14.04) also with ? and * in file names. It treats ? and * as normal characters. – NZD Nov 15 '15 at 21:07
  • Ended up going with this. – tastyCIDR Nov 16 '15 at 14:39

Use the -exec action in find to execute a command for each file. This executes a program with arguments; if you need a more complex command (with variable expansion, conditionals, etc.) then you need to invoke a shell explicitly:

find /dir/* -type f -mtime +14 -exec sh -c '
  if …; then
    rm "$0"
  fi
' {} \;

To test whether a file is currently open, the most straightforward way is to call fuser.

find /dir/* -type f -mtime +14 -exec sh -c '
  if ! fuser "$0" >/dev/null 2>/dev/null; then
    rm "$0"
  fi
' {} \;

Beware that just because a file hasn't been modified in a long time and isn't currently open doesn't mean that it isn't useful. I recommend at least testing that the file hasn't been read in a while; this can be tested with the access time, but beware that Linux systems don't update the access time reliably. (Whether they do, and how often, depends on the kernel version, on the mount options, and on how the access time compares with the modification time.)

I recommend reviewing the files before deleting them.

find /dir/* -type f -mtime +14 -atime +14 -exec sh -c '
  if ! fuser "$0" >/dev/null 2>/dev/null; then
    echo "$0"
  fi
' {} \; >files-to-delete-potentially.txt

Review the file names, and erase the ones you want to keep. Then to remove them all, assuming none of the file names contains a newline character, you can use

<files-to-delete-potentially.txt tr '\n' '\0' | xargs -0 rm
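Before pointing that pipeline at rm, you can dry-run it with echo rm to see exactly what would be executed; the sample list below is purely illustrative:

```shell
# Build a small sample review file (illustrative names only), then run
# the same newline-to-NUL pipeline with 'echo rm' instead of 'rm'.
printf '%s\n' '/srv/data/old report.pdf' '/srv/data/cache?.tmp' > sample-list.txt
<sample-list.txt tr '\n' '\0' | xargs -0 echo rm
rm -f sample-list.txt
```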
  • Reviewing the files to be deleted isn't really possible for my use case - I'm trying to automate this process, which is why it's important I get it right! This is going to run on quite a few servers on a regular basis. – tastyCIDR Nov 15 '15 at 15:27
  • can you explain this line? if fuser "$0" >/dev/null 2>/dev/null; I think I know what it's trying to accomplish but I do not quite understand how. – tastyCIDR Nov 15 '15 at 16:18
  • @starseed fuser returns 0 if the file is open by at least one process (and lists the processes but we don't care about that), and 1 otherwise. Oops, the test should be the other way round. – Gilles 'SO- stop being evil' Nov 15 '15 at 16:27

Instead of using lsof, which has a nightmare tangle of options and output that is interesting to parse, I suggest using the -atime or -amin options to find. These let you specify the file access time in days or minutes, respectively.

Instead of using another process to find out whether a file is currently "in use", you can check whether it was accessed within the last N minutes or days.

The following command lists all files which were modified more than 14 days ago and accessed less than 60 minutes ago.

find "$dir" -type f -mtime +14 -amin -60

To remove the files that match these criteria, you can use find's -exec command ; action. The unusual thing about it is that you specify each argument of the command separately to find and terminate it with a semicolon (;). If {} appears in any argument, it is replaced by the name of the file being processed.

This command removes all files modified more than 14 days ago and not accessed within the last 60 minutes:

find "$dir" -type f -mtime +14 ! -amin -60 -exec rm '{}' \;
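You can check the predicates on a scratch directory before trusting them with rm. A sketch (note that touch -t back-dates both the mtime and the atime of the file):

```shell
# Create a scratch file whose mtime and atime are far in the past,
# then confirm the find predicates select it.
tmp=$(mktemp -d)
touch "$tmp/old.log"
touch -t 200001010000 "$tmp/old.log"        # sets mtime and atime to Jan 2000
find "$tmp" -type f -mtime +14 ! -amin -60  # lists old.log
rm -rf "$tmp"
```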
RobertL
  • I'm having an issue with this that I don't quite understand - testing this on OSX (not Ubuntu, the target environment, in case it's relevant). I have a folder I'm using to test with some PDFs in it - half of which were last modified more than two days ago. I have one open in Preview to make sure it is being accessed. However:

    find ~/dir/ -type f ! -mtime -2 ! -amin -10

    Which should return a list of every file not modified within the past two days and which has not been accessed within the past ten minutes - is returning the PDF I have open. I think -amin only catches when it was opened?

    – tastyCIDR Nov 14 '15 at 15:49
  • Did you open the PDF more than 10 minutes ago? Test lsof on the file also. If the PDF reader opens the file, reads it, and then closes, then that will be the last access time. lsof will not be able to detect this either. If you are only concerned about removing a file that's currently open by another process, you should know that if a process has the file open and you rm the file, the file continues to exist on disk (without the name in the file system), until the last process closes the file. – RobertL Nov 14 '15 at 19:09
  • Beware that many Linux systems don't update the atime. (It depends on your kernel version and mount options, and some combinations can cause some files to be updated but not others depending on access patterns.) – Gilles 'SO- stop being evil' Nov 14 '15 at 22:55
  • @Gilles I came across another stack question that indicated this - I suspected that was my problem. This forces me to use lsof for this use case. Thanks! – tastyCIDR Nov 15 '15 at 15:13
  • @RobertL that's good to know, but yeah I think I'm going to go with something still using lsof to make sure it can't be a problem - I fear edge cases when this is going to run on so many servers (dozens at the moment, ever expanding) – tastyCIDR Nov 15 '15 at 15:15