
I have a directory containing a huge number of very small files that I want to remove, and simply removing the directory with rm -rf /path/to/the/dir is already taking multiple days.

It might sound strange that this is so slow, but the directory is not on a regular filesystem: it is on a Lustre filesystem on a cluster.

I'm running the rm command on node A of the cluster, which has the Lustre filesystem mounted, but the backend of the Lustre filesystem consists of two ZFS filesystems, one on node B and one on node C, so all the network traffic might be the cause of rm being slow.

Does anyone know a faster way to remove the directory?

U880D
    If you need to wait until the directory has been removed so that you can populate it with new files, one workaround might be mv /path/to/the/dir /path/to/the/dir.DELETEME && nohup rm -rf /path/to/the/dir.DELETEME & – Chris Davies Jun 12 '16 at 13:03
    This is not a duplicate. In this case rm is slow because the filesystem is remote. POSIX doesn't provide any help beyond rm -rf but a Lustre-specific technique for deleting the files faster would be handy. – Celada Jun 12 '16 at 15:17
  • Using a list of files from find/lfs find is good because it avoids LDLM lock ping-pong between filename generation and unlinking the names from the directory. However, using a separate 'rm' for each file can be inefficient. Also, GNU 'rm' is inefficient because it will 'stat()' each file before unlinking it. It is better to use 'unlink' (which is more efficient than 'rm' but can only delete a single file), or use 'lfs find /path/to/dir -print0 | xargs -0 rm' to unlink many files per 'rm' invocation. You can also 'cat rmlist | xargs rm' if the files have no spaces in them. – LustreOne May 10 '18 at 16:13
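LustreOne's batching advice can be sketched as follows. This is a minimal example, not the poster's exact setup: /path/to/dir is a placeholder, and plain find stands in for lfs find on machines without the Lustre tools.

```shell
# Build the file list once, NUL-separated so unusual filenames survive,
# and let xargs pass many names to each rm invocation instead of
# spawning one rm per file.
# On a Lustre client, replace `find` with `lfs find` to avoid extra
# metadata traffic during the walk.
find /path/to/dir -type f -print0 | xargs -0 rm -f
```

This avoids both the per-file process overhead and the stat() that GNU rm performs when given a single name at a time interactively.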

3 Answers


Several GNU commands, such as tar and rm, are inefficient when operating on a large number of files on Lustre. For example, with millions of files, rm -rf * may take days and have a considerable impact on Lustre for other users.
The reason lies in the time it takes to expand the wildcard.

A better way is to generate a list of the files to be removed (or tar-ed) and to act on them one at a time, or in small batches.

A good way to review files before they are deleted is the following:

$ lfs find <dir> -t f > rmlist.txt  
$ vi rmlist.txt  
$ sed -e 's:^:/bin/rm :' rmlist.txt > rmlist.sh  
$ sh rmlist.sh    

# the directory structure will remain, but unless there are many directories, we can simply delete it:  
$ rm -rf <dir>  
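The "small batches" idea above can be sketched with split. The batch size and the rmbatch. filename prefix here are illustrative choices, not part of the original recipe:

```shell
# Split the reviewed list into batches of 1000 names and delete one
# batch per rm invocation, so no single command line grows too large
# and progress can be observed batch by batch.
# Assumes the listed filenames contain no spaces or newlines.
split -l 1000 rmlist.txt rmbatch.
for batch in rmbatch.*; do
    xargs rm -f < "$batch"
    rm "$batch"
done
```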

Some of the useful references for Lustre IO:
1. https://www.nics.tennessee.edu/computing-resources/file-systems/io-lustre-tips
2. https://www.rc.colorado.edu/support/examples-and-tutorials/parallel-io-on-janus-lustre.html

Thanks!


Use munlink:

find -P $dir -type f -o -type l -print0 | xargs -0 munlink

... and remove the empty directories:

find -P $dir -depth -type d -empty -delete

I updated the find command with more arguments. Reference: https://support.pawsey.org.au/documentation/display/US/Deleting+Large+Numbers+of+Files+on+Lustre+Filesystems
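A minimal wrapper around this approach might look like the following sketch. The fallback to plain rm is my addition for hosts where Lustre's munlink is not installed, and the type tests are grouped in parentheses so -print0 applies to both:

```shell
#!/bin/sh
# Delete every regular file and symlink under the given directory,
# then prune the now-empty directories (including the top one).
dir=$1
if command -v munlink >/dev/null 2>&1; then
    # munlink unlinks without the extra stat() that rm performs
    find -P "$dir" \( -type f -o -type l \) -print0 | xargs -0 munlink
else
    # fallback for non-Lustre systems
    find -P "$dir" \( -type f -o -type l \) -print0 | xargs -0 rm -f
fi
find -P "$dir" -depth -type d -empty -delete
```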

Atisom

Since I don't have enough reputation to comment on @Atisom's solution, here is a new answer instead:

The find command shown in Atisom's solution does not work as intended: because the implicit -and binds more tightly than -o, find matches either -type f or -type l -print0, so -print0 is applied only to symbolic links and regular files are never printed.

To make it work, group the type tests with parentheses:

find -P $dir \( -type f -o -type l \) -print0 | xargs -0 munlink
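The precedence issue can be verified with a quick experiment on any system (no munlink required); the temporary directory and its two entries are purely illustrative:

```shell
# Count the NUL-terminated entries each variant emits.
tmp=$(mktemp -d)
touch "$tmp/file"
ln -s /nonexistent "$tmp/link"

# Without parentheses, -print0 binds only to "-type l":
without=$(find -P "$tmp" -type f -o -type l -print0 | tr -dc '\0' | wc -c)
# With parentheses, -print0 applies to both type tests:
with=$(find -P "$tmp" \( -type f -o -type l \) -print0 | tr -dc '\0' | wc -c)
# without is 1 (only the symlink); with is 2 (file and symlink)

rm -rf "$tmp"
```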
AdminBee