I'm working on a bash script to copy files from a single USB drive to multiple others.

I'm currently using rsync to copy from the source to a single destination, looping through all of the output drives one at a time:

for line in $(cat output_drives_list); do
    rsync -ah --progress --delete mountpoints/SOURCE/ "mountpoints/$line/"
done

I'm trying to optimize the process to make maximum use of the USB bandwidth, avoiding the bottleneck of a single drive's write speed.

Is it possible to do something like rsync, but with multiple output directories, that will write to all output drives at once while reading from the input only once?

I guess that some of this is already taken care of by the system cache, but that only optimizes for read.

If I run multiple rsync processes in parallel, this might optimize the write speed, but I'm also afraid it'll butcher the read speed.

Do I need to care about single-read when copying in parallel?
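To illustrate the kind of single-read, multi-write pipeline I have in mind, here's a rough sketch using tar and tee with bash process substitution (DRIVE1 and DRIVE2 are placeholder mountpoints, and this loses rsync's delta transfer and --delete semantics):

# Read the source once; tee duplicates the stream to every destination
tar -C mountpoints/SOURCE -cf - . \
  | tee >(tar -C mountpoints/DRIVE1 -xf -) \
        >(tar -C mountpoints/DRIVE2 -xf -) \
  > /dev/null
sync # the extracting tars run asynchronously, so flush before unplugging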

unfa

2 Answers


I can't test it, but if you start more processes in the background it might be the solution:

START=$(date +%s)
for line in $(cat output_drives_list); do
    rsync -ah --progress --delete mountpoints/SOURCE/ "mountpoints/$line/" &
done
jobs # get a list of running jobs
wait # wait for all processes to complete
sync
echo "It took: $(( $(date +%s) - START )) seconds"

Edit: Added date-stuff after 'benchmarking' was mentioned.
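If the USB bus can't sustain all drives at once, a variant that caps the number of concurrent rsyncs might help. A sketch, assuming bash 4.3+ for wait -n; MAX_JOBS=4 is an arbitrary starting point:

MAX_JOBS=4
while read -r line; do
    rsync -ah --delete mountpoints/SOURCE/ "mountpoints/$line/" &
    # Throttle: once MAX_JOBS copies are running, wait for one to exit
    while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
        wait -n
    done
done < output_drives_list
wait # wait for the remaining copies to complete
sync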

hschou
  • I'm going to try this approach and benchmark it against "sequential" rsync operation. – unfa Mar 09 '17 at 14:13
  • Benchmarks:

    Copying 7.5 GB of /dev/urandom data to 4 USB devices took 74 minutes sequentially and 22 minutes in parallel.

    Writing to 14 devices in parallel, it looks like only 7 are active at any given moment; that's the USB 2.0 bottleneck. Still, it completed in 28 minutes.

    – unfa Mar 09 '17 at 16:08
  • @unfa Regarding the "bottleneck": I think you should add more USB controllers to the motherboard (not USB hubs) to speed things up, like this "2-Port USB 3.0 PCI Controller" (http://www.addonics.com/products/ad2u3pci.php), and then only use one of the USB ports on each. Note that /dev/urandom is really slow; you should test with /dev/zero. – hschou Mar 09 '17 at 16:52

Read speed is going to be your biggest bottleneck for the destination write.

Depending on how big the source disk is, how about creating a RAM disk on the copying machine, caching your files in there, and then copying from there to the multiple destinations using concurrent processes as @hschou demonstrates above?

How to create "real" RAM disk that reserves memory

RAM reads are always going to be quicker than multiple random accesses to flash or SSD, even if some of the RAM disk ends up being swapped to the local physical disk.
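A minimal sketch of that staging approach, assuming a tmpfs RAM disk at /mnt/ramdisk with an 8G size (both are placeholders; adjust to your data and available RAM):

# Stage the source on a RAM disk (tmpfs pages may be swapped out under memory pressure)
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=8G tmpfs /mnt/ramdisk
rsync -ah mountpoints/SOURCE/ /mnt/ramdisk/stage/

# Fan out from RAM to all destinations in parallel
for line in $(cat output_drives_list); do
    rsync -ah --progress --delete /mnt/ramdisk/stage/ "mountpoints/$line/" &
done
wait
sync
sudo umount /mnt/ramdisk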

Alan
  • That sounds like a nice solution, but I will be copying more data than the machine's RAM can hold, so a complete RAM disk copy of the input is not an option. – unfa Mar 09 '17 at 14:12
  • If you could hold more than 50% it would still be worth doing it. – Alan Mar 09 '17 at 14:32
  • I was also going to suggest creating a pool of available read devices, starting with just your source, and adding each disk to the pool as it reports completion. For example, do the first 10 from the one, then use those 10 to do the next 100, and so on... – Alan Mar 09 '17 at 14:34