Remove directories sequentially vs. simultaneously in background

Question

Part of a script I am writing utilizes rm -rf to completely remove two directories.

These directories are relatively large and can take a considerable amount of time to remove.

Currently the directories are removed sequentially:

rm -rf dir1
rm -rf dir2

Would it be any faster to remove them simultaneously in the background via:

rm -rf dir1 &
rm -rf dir2 &
wait

If so, why?

@Rahul I am aware of that method, but does that produce different results than my second method above? — John, Jun 23 '16 at 19:55
@Rahul That is more or less the same as two invocations of rm in sequence. — Kusalananda, Jun 23 '16 at 19:55

score 3 · Answer 1 · answered Jun 23 '16 at 20:05

It depends.

If the files being removed are on the same file system and hardware device, the removals will end up being serialized anyway, because the operating system has to wait for the physical resource to carry out the actual operations in hardware. Running two instances of rm does keep operations queued, so one is ready as soon as the other finishes, but don't expect a big improvement from this.

If the rm commands operate on files that are on two different file systems / hardware devices, the removals will effectively run in parallel and asynchronously, so the pair can finish up to about twice as fast.
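As a rough illustration of that distinction, a script could check where the two directories live before deciding how to remove them. This is only a sketch: `fs_of` and `remove_dirs` are hypothetical helper names, and comparing `df -P` output identifies the file system, not the physical device (two file systems can share one disk), so treat it as a heuristic.

```shell
# Hypothetical helper: identify the file system holding a path, using the
# "Filesystem" and "Mounted on" columns of POSIX `df -P` output.
# Assumption: neither column contains whitespace.
fs_of() {
    df -P -- "$1" | awk 'NR==2 {print $1, $NF}'
}

# Hypothetical helper: remove two directories, in the background only when
# they appear to be on different file systems.
remove_dirs() {
    if [ "$(fs_of "$1")" = "$(fs_of "$2")" ]; then
        # Same file system: the work is likely serialized anyway.
        rm -rf -- "$1"
        rm -rf -- "$2"
    else
        # Different file systems: let both removals run concurrently.
        rm -rf -- "$1" &
        rm -rf -- "$2" &
        wait    # block until both background jobs have finished
    fi
}
```

With these defined, `remove_dirs dir1 dir2` would replace the two `rm -rf` lines in the question.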

score 1 · Answer 2 · answered Jun 23 '16 at 21:38

Like alx741 said, I don't think you're going to get any real benefit one way or the other unless they're on separate file systems. I did some testing with a 700MB file. Here are my tests that back up my thoughts. I don't have multiple local partitions to play with, so I can't test that.

Here it is as one command sequentially.

time rm -f test.dat1 test.dat2
real    0m0.297s
user    0m0.000s
sys     0m0.295s

Here it is simultaneously

time rm -f test.dat1 &
time rm -f test.dat2 &
real    0m0.145s
user    0m0.000s
sys     0m0.144s

real    0m0.150s
user    0m0.000s
sys     0m0.150s

Here they are in sequence again as two separate commands.

time rm -f test.dat1
time rm -f test.dat2
real    0m0.146s
user    0m0.000s
sys     0m0.146s

real    0m0.153s
user    0m0.000s
sys     0m0.152s
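The timings above report each rm on its own, so they do not directly show the elapsed time for the pair. One way to compare the two approaches as a whole is to wrap each in a function and time the function, so the `wait` is included in the measurement. The sketch below assumes scratch directories rather than real data; `make_tree`, `remove_parallel` and `remove_sequential` are hypothetical names, not part of the original tests.

```shell
# Hypothetical helper: fill directory $1 with $2 empty files to delete.
make_tree() {
    mkdir -p "$1"
    i=0
    while [ "$i" -lt "$2" ]; do
        : > "$1/file$i"
        i=$((i + 1))
    done
}

# Remove both trees concurrently; returns only after both rm jobs finish.
remove_parallel() {
    rm -rf -- "$1" &
    rm -rf -- "$2" &
    wait
}

# Remove both trees one after the other with a single rm invocation.
remove_sequential() {
    rm -rf -- "$1" "$2"
}
```

In bash, where `time` is a shell keyword and can time a function, something like `make_tree scratch/a 100000; make_tree scratch/b 100000; time remove_parallel scratch/a scratch/b` gives one "real" figure for the whole batch, which can then be compared with `time remove_sequential scratch/a scratch/b` on freshly created trees. Results will vary with the file system and with what is already cached, so several runs are advisable.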

score 0 · Answer 3 · answered Jun 24 '16 at 08:40

It's theoretically possible that the simultaneous (concurrent/parallel) execution might be slower. It's conceivable that the operating system might keep each directory clustered. I.e., the contents of dir1 might have inode numbers that are close to each other, and use data blocks that are close to each other. And the same might be true for dir2. But the contents of dir1 might not be close to the contents of dir2. (This may depend on the operating system version, the file system type, and the history of how the directories were created.) If this is the case (the two directories are not close together), and the filesystem is on a disk (HDD) that requires physically seeking I/O heads, then the simultaneous execution might require more seeking (→ thrashing) than the sequential execution.
