I have a backup disk which contains hundreds of backups of the same machine from different dates. The backups were made with rsync and hardlinks, i.e. if a file doesn't change, the backup script just creates a hardlink to the file in an older backup. So if a file never changes, there is essentially one copy on the backup disk, but say 100 hardlinks to it, one in each directory representing the backup of a given date (say `back-1`, `back-2`, ... `back-n`). If I want to thin the backups out, I delete a subset of them, but not all. Suppose I want to delete `back_5`, `back_6`, ... `back_10` (just as an example; in my real scenario there are many more). Then I try to parallelize the deletion via:
`echo back_5 back_6 back_10 | xargs -n 1 -P 0 rm -rf`
This takes multiple hours. So is there any faster way to do this?
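For comparison, here is a minimal sketch of the sequential alternative. The helper name is mine, and the underlying assumption — that `unlink()` traffic on a single spinning disk is seek-bound, so parallel `rm` processes compete for the head rather than helping — holds for rotational media, not necessarily for SSDs:

```shell
# delete_backups DIR... : remove the given backup directories one at a time.
# Hypothetical helper. On a rotational disk, deletion is dominated by
# metadata (directory entry + inode) updates, which are seek-bound; a
# sequential loop keeps the head walking one directory tree at a time
# instead of thrashing between several.
delete_backups() {
    for d in "$@"; do
        rm -rf -- "$d"
    done
}
```

Note that removing a hardlinked file only deletes one directory entry and decrements the inode's link count; the data blocks are freed only when the last link disappears. So even though the backups share data, thinning them out still means an enormous number of small metadata writes.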
Does `-P 0` parallelisation only apply to threads? I wonder what the bottleneck is here… perhaps it's the read/write speed of the drive? If so, then perhaps parallelising it is having a negative effect. – Sparhawk May 21 '16 at 06:31

… (`df -h`) just 300GB. But I am not sure if `df -h` or `du -hs` report correct numbers because of the hardlinks. – student May 21 '16 at 07:02

… `/dev/urandom`, which should take (0.100/300) × 24 × 60 × 60 ≈ 30 seconds to delete. Then test it with additional hardlinks still present. Then test it parallelised. – Sparhawk May 21 '16 at 08:12

`unlink()` is supposed to work. Speeding up deletion depends on many circumstances. https://unix.stackexchange.com/q/37329/13746 offers a number of good candidates to try. – xebeche Jun 14 '20 at 09:59
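On whether `df -h` or `du -hs` report correct numbers: `df` reports filesystem-level block usage, so hardlinks cannot confuse it, while `du` counts each inode at most once *per invocation*. A small sketch (the helper name is mine) that exploits this to measure the combined real usage of several backup directories:

```shell
# true_usage DIR... : print the combined on-disk usage of the given
# directories in KiB. Because du deduplicates inodes within a single
# invocation, files hardlinked between the listed directories are
# counted once, not once per link.
true_usage() {
    du -s -c -k -- "$@" | tail -n 1 | cut -f1
}
```

Summing the results of separate `du -hs` runs, one per backup directory, would instead charge every hardlinked file to each backup it appears in, overstating the usage by roughly the number of retained backups.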