
I'm using du to continuously monitor the amount of data written to USB drives that I'm duplicating.

I compare disk usage of source and target drives and display copying progress to the user.

The problem is that du reports 100% data present on the target drive, even though I see lots of data is still in the system cache, the drive's LED is blinking, and the drives are not ready to be removed.

I run rsync, sync and umount in sequence to ensure the data is really there before letting the user remove the target drive. However, I can't monitor the progress of sync, so the user sees 100% long before the drives are really synced.

I'd love to be able to monitor the "real" copying progress, as that's what matters. There's no point in seeing rsync finish copying a 1 GB file in 25 seconds if I then have to wait another 5 minutes while sync flushes it to the drive (I'm exaggerating, but you get the idea).

This is how I monitor rsync progress in a loop for each drive:

PROGRESS="$(echo "$(du -s "/MEDIA/TARGET" 2>/dev/null  | cut -f 1) / $(du -s "/MEDIA/SOURCE" 2>/dev/null | cut -f 1) " | bc -l)"

$PROGRESS is a float between 0 and 1: the target drive's usage divided by the source drive's usage.
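As a minimal sketch of the same calculation, the one-liner can be wrapped in a function that also avoids a division by zero when du prints nothing (for example because the source is not mounted yet). The function name usage_ratio is hypothetical, and awk stands in for bc here; like the original, it still measures what du sees, not what has reached the drive.

```shell
#!/bin/sh
# usage_ratio TARGET SOURCE
# Prints TARGET's disk usage divided by SOURCE's disk usage (hypothetical helper).
usage_ratio() {
    # du -sk reports the total in 1 KiB units; cut keeps only the number.
    target_kb=$(du -sk "$1" 2>/dev/null | cut -f 1)
    source_kb=$(du -sk "$2" 2>/dev/null | cut -f 1)
    # Fall back to 0 when du printed nothing, and never divide by zero.
    awk -v t="${target_kb:-0}" -v s="${source_kb:-0}" \
        'BEGIN { if (s > 0) printf "%.4f\n", t / s; else print "0.0000" }'
}
```

It would be called as PROGRESS="$(usage_ratio /MEDIA/TARGET /MEDIA/SOURCE)". Because du counts allocated blocks, the ratio may not land on exactly 1 when the two filesystems use different block sizes.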

How can I modify this so it'll consider only data that is already synced to drive, and not just waiting in system cache?

Edit:

I found that dd can perform writes omitting the system cache. I made a test and indeed copying a file this way makes du report actual values, and my progress indications would finally be accurate:

dd if=/media/SOURCE/file of=/media/TARGET/file bs=4M oflag=direct

This uses the read cache but disables the write cache, making the progress easier to track without performing excessive reads. The problem is that to use dd instead of rsync I need to manually recreate the directory structure. I don't need to take care of the file attributes or modification dates.

I guess I could use a combination of find, mkdir and dd to first recreate the directory tree and then copy the files one by one. Are there any downsides to this approach?

unfa
  • You can in fact monitor sync progress. – NarūnasK Apr 26 '17 at 09:58
  • Yes, but only as a summary for all drives present in the system - I'd like to be able to do this for each individual block device. – unfa Apr 26 '17 at 10:35
  • atop and iostat give I/O activity by block device, so I always assumed this ignores internal caches and only measures "real" I/O. – dirkt Apr 26 '17 at 12:36
  • Even if atop and iotop gave physical write and read speeds, they are still just speeds. I need to know the amount of data that has been written; I can calculate the transfer speed later if I need it.

    For a cached write, atop reports a steady write speed of around 9.7 MB/s. That makes me think it is the physical write speed, not the caching speed, as I can see there are still several hundred MB in the cache (which the system normally considers already written, which is not true).

    – unfa Apr 27 '17 at 08:08

1 Answer


It looks like the best way to handle this is to use direct file output, so that writes bypass the write cache. This way the du readings will be much more accurate.

Unfortunately, only dd allows that, so we need to work around two problems:

  1. dd doesn't know what to do with directories
  2. dd can only copy one file at a time

First let's define input and output directories:

SOURCE="/media/source-dir"
TARGET="/media/target-dir"

Now let's cd into the source directory so that find reports relative paths (each starting with ./) that we can easily manipulate:

cd "$SOURCE"

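Duplicate the directory tree from $SOURCE to $TARGET (note the / after $TARGET: find prints paths like ./dir, so without it the result would be /media/target-dir./dir):

find . -type d -exec mkdir -p "$TARGET/{}" \;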

Duplicate files from $SOURCE to $TARGET, bypassing the write cache (but still using the read cache), again with a / between $TARGET and the relative path:

find . -type f -exec dd if={} of="$TARGET/{}" bs=8M oflag=direct \;

This won't preserve file modification times, ownership and other attributes - but for me that's ok.
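The steps above could be combined into one function, roughly as sketched below. The name copy_tree_direct and its optional third argument are hypothetical; the sketch assumes GNU dd (for oflag and status=none) and an absolute target path, since the function changes into the source directory first. Passing the paths to a small sh -c script avoids relying on {} being expanded inside a larger argument, which not every find implementation does.

```shell
#!/bin/sh
# copy_tree_direct SRC DST [OFLAG]
# Recreates SRC's directory tree under DST, then copies each regular file with dd.
# DST must be an absolute path. OFLAG defaults to "direct" (bypass the write cache).
copy_tree_direct() {
    src=$1
    dst=$2
    oflag=${3:-direct}
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "$src" || exit 1

        # Step 1: recreate every directory under the target.
        find . -type d -exec sh -c '
            dst=$1; shift
            for d in "$@"; do
                mkdir -p "$dst/$d" || exit 1
            done
        ' sh "$dst" {} + || exit 1

        # Step 2: copy the files one by one; dd handles a single file per call.
        find . -type f -exec sh -c '
            dst=$1; oflag=$2; shift 2
            for f in "$@"; do
                dd if="$f" of="$dst/$f" bs=8M oflag="$oflag" status=none || exit 1
            done
        ' sh "$dst" "$oflag" {} +
    )
}
```

With the variables defined earlier this would be called as copy_tree_direct "$SOURCE" "$TARGET". Some filesystems reject direct writes, in which case dd fails and the function returns a non-zero status; passing sync as the third argument is one alternative that still forces each write out to the device before dd continues.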

unfa