
The command I am currently using to back up one HDD to another (locally, not remotely) is:

rsync \
    --info=PROGRESS2,BACKUP,DEL \
    -ab \
    --human-readable \
    --inplace \
    --delete-after \
    --debug=NONE \
    --log-file=/media/blueray/WDPurple/rsync.log \
    --backup-dir=red_rsync_bak.$(date +"%d-%m-%y_%I-%M-%S%P") \
    --log-file-format='%t %f %o %M' \
    --exclude='lost+found' \
    --exclude='.Trash-1000' \
    /media/blueray/WDRed \
    /media/blueray/WDPurple

If I use --delete-after, rsync treats moved directories as deleted directories plus newly created ones.

As a result, when I move directories in the source, rsync deletes them from the destination and then copies them again from the source. This often takes a long time, because I sometimes move large directories in the source.

I found a few solutions to this problem.

  1. Patch rsync.

  2. Without a patch.

  3. Use BorgBackup or bup.

  4. Use --fuzzy --delay-updates --delete-delay.

However, each has its own issues.

The patch was created long ago, and I am not sure whether it will work with modern rsync. Moreover, maintaining a patch is difficult for me.

Option 2 creates a mess on my HDD. Moreover, I use many more rsync options and am not sure whether it would be safe with them.

As far as option 3 is concerned, I have invested a lot of time in rsync and do not want to move to a new tool now. Moreover, those tools have their own issues.

Regarding option 4, renaming /test/10GBfile to /test/otherdir/10GBfile_newname would still resend the data even with --fuzzy --delay-updates --delete-delay, since the file is no longer in the same directory. It has other issues too; for example, --delay-updates conflicts with --inplace.

So, the solution I am looking for is to use --itemize-changes with --dry-run to get the list of moved directories, then first run mv in the destination, and then run the rsync command shown at the top. It would be great if it showed a prompt such as "x will be moved to a/x in destination, y will be moved to b/y in destination, c/z will be moved to z in destination. Do you want to continue?". I am willing to treat directories with the same name and size as the same directory.

Suppose the directory tree looks like:

.
├── dest
│   ├── test
│   │   └── empty-asciidoc-document.adoc
│   ├── test2
│   │   └── empty-asciidoc-document.adoc
│   └── test3
│       └── empty-asciidoc-document.adoc
├── src
│   ├── grandpartest1
│   │   └── partest
│   │       └── test1
│   │           └── empty-asciidoc-document.adoc
│   ├── grandpartest2
│   │   └── partest2
│   │       └── test2
│   │           └── empty-asciidoc-document.adoc
│   └── grandpartest3
│       └── partest3
│           └── test3
│               └── empty-asciidoc-document.adoc

I noticed that if I move directories the --itemize-changes output looks like:

% rsync --dry-run -ai --inplace --delete-after /home/blueray/Downloads/src/ /home/blueray/Downloads/dest/
.d..t...... ./
cd+++++++++ grandpartest/
cd+++++++++ grandpartest/partest/
cd+++++++++ grandpartest/partest/test/
>f+++++++++ grandpartest/partest/test/empty-asciidoc-document.adoc
cd+++++++++ grandpartest2/
cd+++++++++ grandpartest2/partest2/
cd+++++++++ grandpartest2/partest2/test2/
>f+++++++++ grandpartest2/partest2/test2/empty-asciidoc-document.adoc
cd+++++++++ grandpartest3/
cd+++++++++ grandpartest3/partest3/
cd+++++++++ grandpartest3/partest3/test3/
>f+++++++++ grandpartest3/partest3/test3/empty-asciidoc-document.adoc
*deleting   test3/empty-asciidoc-document.adoc
*deleting   test3/
*deleting   test2/empty-asciidoc-document.adoc
*deleting   test2/
*deleting   test/empty-asciidoc-document.adoc
*deleting   test/

We can get the deleted directories using:

% echo "$dryrunoutput" | grep "*deleting.*/$" | awk '{print $2}' | while read spo; do echo ${spo%?}; done
test3
test2
test

Added directories using:

% echo "$dryrunoutput" | grep "cd++.*/$" | awk '{print $2}' | while read spo; do echo ${spo%?}; done | while read spo; do echo ${spo##*/}; done
grandpartest
partest
test
grandpartest2
partest2
test2
grandpartest3
partest3
test3
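
The awk '{print $2}' step in the pipelines above splits on whitespace, so a directory name containing a space would be cut short. A minimal sketch of an alternative, assuming the itemized line format shown in the dry-run output above (a change string, one or more spaces, then the path): the function names deleted_dirs and created_dirs are made up for illustration, and a path that itself begins with spaces would still lose them.

```bash
#!/bin/bash
# Both functions read "rsync --dry-run --itemize-changes" output on stdin.

# Directories rsync would delete from the destination (trailing slash removed).
deleted_dirs() {
    sed -n 's|^\*deleting \{1,\}\(.*\)/$|\1|p'
}

# Directories rsync would create in the destination (trailing slash removed).
# Unlike the pipeline above, this keeps the full relative path.
created_dirs() {
    sed -n 's|^cd+\{1,\} \{1,\}\(.*\)/$|\1|p'
}

# Example (paths are placeholders):
#   dryrunoutput=$(rsync --dry-run -ai --inplace --delete-after "$source" "$destination")
#   printf '%s\n' "$dryrunoutput" | deleted_dirs
#   printf '%s\n' "$dryrunoutput" | created_dirs
```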

Directories that were both added and deleted using:

$ sort  <(echo "$deletedirectories") <(echo "$addeddirectoriesvalue") | uniq -d
test
test2
test3
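
The intersection above yields only the directory names, so the full old and new paths needed for mv are lost. A rough sketch of one way to keep them, assuming the deleted and created paths have been saved one per line in two files (full relative paths, no trailing slash): pair_moves is a hypothetical helper, it only pairs names that occur exactly once on each side, and it skips subdirectories of a directory that is already being moved.

```bash
#!/bin/bash
# pair_moves DELETED_LIST CREATED_LIST
# Prints "deleted<TAB>created" for every directory whose name appears exactly
# once in each list. Paths containing tabs or newlines are not supported.
pair_moves() {
    # Sorting in the C locale puts every parent directory before its children.
    LC_ALL=C sort "$1" | awk '
        # Last path component
        function base(p,   k, parts) { k = split(p, parts, "/"); return parts[k] }
        # True when an ancestor of p has already been paired
        function inside_moved(p) {
            while (sub(/\/[^\/]*$/, "", p)) if (p in moved) return 1
            return 0
        }
        # First input (the file): created paths, indexed by directory name
        NR == FNR { b = base($0); created[b] = $0; ncreated[b]++; next }
        # Second input (stdin): deleted paths
        { deleted[++count] = $0; ndeleted[base($0)]++ }
        END {
            for (i = 1; i <= count; i++) {
                p = deleted[i]; b = base(p)
                if (ncreated[b] != 1 || ndeleted[b] != 1 || inside_moved(p)) continue
                moved[p] = 1
                print p "\t" created[b]
            }
        }
    ' "$2" -
}
```

Names that occur more than once on either side are left for rsync to handle as usual, which seems safer than guessing which copy moved where.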

Directory sizes in bytes, to check whether both are the same directory (approximately; this is good enough for me), using:

% /usr/bin/du -sb "/home/blueray/Documents/src/test2/test" | grep -oh "^\S*"
4096
% /usr/bin/du -sb "/home/blueray/Documents/dest/test" | grep -oh "^\S*"
4096
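
Note that du -sb also counts the directory entries themselves, which is why a directory holding only an empty file reports 4096 here. As a possible variation, the sketch below sums only the sizes of regular files. It assumes GNU find (for -printf), and the names tree_bytes and same_tree_size are invented for this example.

```bash
#!/bin/bash
# Total size in bytes of all regular files below a directory (GNU find).
tree_bytes() {
    find "$1" -type f -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Succeed when both directories hold the same number of file bytes.
same_tree_size() {
    [ "$(tree_bytes "$1")" = "$(tree_bytes "$2")" ]
}

# Example (paths are placeholders):
#   same_tree_size "$source/grandpartest2/partest2/test2" "$destination/test2" && echo same
```

Equal totals are still only a heuristic; two different directories can add up to the same number of bytes.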

The script I came up with so far is:

#!/bin/bash

source="/media/blueray/WDRed/_working/_scripts/_rsync-test/src/"
destination="/media/blueray/WDRed/_working/_scripts/_rsync-test/dest/"

dryrunoutput=$(rsync --dry-run -ai --inplace --delete-after "$source" "$destination")

# Deleted directories, trailing slash stripped
deletedirectories=$( echo "$dryrunoutput" | grep "*deleting.*/$" | awk '{print $2}' | while read spo; do echo ${spo%?}; done )

# Created directories: full relative path (key) and last path component (value)
addeddirectorieskey=$( echo "$dryrunoutput" | grep "cd++.*/$" | awk '{print $2}' | while read spo; do echo ${spo%?}; done )
addeddirectoriesvalue=$( echo "$dryrunoutput" | grep "cd++.*/$" | awk '{print $2}' | while read spo; do echo ${spo%?}; done | while read spo; do echo ${spo##*/}; done )

intersection=$( sort <(echo "$deletedirectories") <(echo "$addeddirectoriesvalue") | uniq -d )

sourcesize=$(/usr/bin/du -sb "${source}test2/test" | grep -oh "^\S*")
destsize=$(/usr/bin/du -sb "${destination}test" | grep -oh "^\S*")

if [[ "$destsize" == "$sourcesize" ]]
then
    mv "${destination}test/" "$destination$addeddirectories"
fi

Note that in mv "${destination}test/" "$destination$addeddirectories", part of the path is hard-coded. The script has other issues as well; for example, it only works for a single directory.

P.S. I know that the same name and size do not guarantee that two directories are the same, but in my case it will work. Directories are my main problem; files are not. So I am not really worried about file move detection, only about directory move detection.
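
The prompt-then-mv step described in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: apply_moves is a made-up name, the detected moves are expected in a file with one "old<TAB>new" pair per line (both relative to the destination root), and nothing is moved unless the answer starts with y.

```bash
#!/bin/bash
# apply_moves DEST PAIRS_FILE
# PAIRS_FILE holds one "old<TAB>new" pair per line, both relative to DEST.
# The confirmation is read from standard input, so the pairs come from a file.
apply_moves() {
    local dest="${1%/}" pairs="$2" old new answer tab
    tab=$(printf '\t')

    if [ ! -s "$pairs" ]; then
        echo "No directory moves detected." >&2
        return 0
    fi

    # Show the whole plan before touching anything.
    while IFS=$tab read -r old new; do
        printf '%s will be moved to %s in destination\n' "$old" "$new"
    done < "$pairs"

    printf 'Do you want to continue? [y/N] '
    read -r answer || answer=
    case $answer in
        [yY]*) ;;
        *) echo "Aborted." >&2; return 1 ;;
    esac

    while IFS=$tab read -r old new; do
        # Never overwrite something that already exists at the new location.
        if [ -e "$dest/$new" ]; then
            echo "Skipping $old: $new already exists" >&2
            continue
        fi
        mkdir -p -- "$(dirname -- "$dest/$new")"
        mv -- "$dest/$old" "$dest/$new"
    done < "$pairs"
}
```

Once the moves have been applied, the usual rsync command can be run afterwards and should find the directories already in place, leaving only genuine changes to transfer.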

amphetamachine
Ahmad Ismail
  • You might be interested in lsyncd, which looks at a tree with inotify, so it can detect if you actually just move files/dirs around. – Ulrich Schwarz Jan 08 '21 at 14:18
  • https://unix.stackexchange.com/questions/6411/any-way-to-sync-directory-structure-when-the-files-are-already-on-both-sides/6511#6511 proposes non-rsync solutions. I recommend using one of them rather than trying to reinvent the wheel. – Gilles 'SO- stop being evil' Jan 08 '21 at 16:47
  • @Gilles'SO-stopbeingevil' I actually decided to go with borgbackup. – Ahmad Ismail Jan 08 '21 at 20:42
  • You may also want to consider filesystems like zfs (or maybe btrfs) which can snapshot file systems and send streams representing the difference between two snapshots to replicate somewhere (zfs send/zfs recv). – Stéphane Chazelas Jan 09 '21 at 07:50
  • I closed this as the issue seems to have resolved itself. – Kusalananda Jan 11 '21 at 10:53
  • In a previous comment you stated that you've decided to go with borgbackup. This sounds like a good resolution to me. – Kusalananda Jan 11 '21 at 10:58
  • I decided but I am hearing bad things about the other solutions https://www.reddit.com/r/DataHoarder/comments/f40fa2/restic_alternative_for_backups/ – Ahmad Ismail Jan 11 '21 at 11:01
  • Depending on who you listen to, you will hear good or bad things about each and every solution you have listed, and any other alternative solution you can come up with. It's up to you to implement and test something that suits you. Using borgbackup or something like restic seems reasonable since they are actually backup software. Rsync is a file copy utility. – Kusalananda Jan 11 '21 at 11:06
  • Recently I got a heavy bill from a data recovery service provider for recovering a 4TB HDD.

    They consider themselves surgeons and charge likewise. They even refer to the HDD that they will need to take the parts from as the donor HDD. When I talked to them about my data privacy, they said it is like visiting a doctor: you have to open up and show everything. I do not want to go back to them anymore.

    – Ahmad Ismail Jan 11 '21 at 11:13
  • Is --remove-source-files the same as --delete-after? – alecxs Jan 13 '21 at 19:34
  • @alecxs No. --remove-source-files removes files from the source; --delete-after deletes from the destination. – Ahmad Ismail Jan 13 '21 at 20:17

1 Answer

You could use this as a basis for your backups. It requires that the source and destination filesystems can handle hard-linked files, and that you don't mind the destination files remaining hard-linked into a working directory between runs. There is a dependency on GNU's version of find for the -printf option that writes out the file's inode and relative path.
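
As I read the approach, it relies on a file keeping its inode number when it (or its directory) is moved within the same filesystem, so a hard link named after the inode gives a path that stays stable across moves. A small throwaway demonstration of that property, assuming GNU stat (for -c) and using a temporary directory and names made up for the example:

```bash
#!/bin/bash
# Demo in a scratch directory; nothing here touches real data.
inode_of()   { stat -c %i -- "$1"; }    # inode number (GNU stat)
links_of()   { stat -c %h -- "$1"; }    # hard-link count (GNU stat)

demo=$(mktemp -d)
mkdir -p "$demo/src/test" "$demo/src/.inodes" "$demo/src/moved"
echo data > "$demo/src/test/file"

# Index the file under its inode number, as a hard link.
before=$(inode_of "$demo/src/test/file")
ln -f "$demo/src/test/file" "$demo/src/.inodes/$before"

# Move the directory that contains the file.
mv "$demo/src/test" "$demo/src/moved/test"
after=$(inode_of "$demo/src/moved/test/file")

echo "inode before move: $before, after move: $after"
echo "link count: $(links_of "$demo/src/moved/test/file")"
```

Because the index entry keeps the same name after the move, a later rsync run with -H should be able to recreate the moved path on the destination as a hard link to the already-transferred index file instead of sending the data again.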

#!/bin/bash
# Usage: [<rsync_args...>] <src> <dst> 
#
args=("$@")
src="${args[-2]}"          # '.'
dst="${args[-1]}"          # eg 'remote:/tmp/dest'
unset args[-1] args[-1]    # Yes really

# Create the working set space
#
temp=".inodes"
mkdir -p "$src/$temp"

# Build the set of files indexed by inode
#
echo Create inodes >&2
find "$src" -path "$src/$temp" -prune -o -type f -printf "%i\t%P\0" |
    while IFS= read -d '' -r line
    do
        inode="${line%%$'\t'*}"     # everything before the first tab
        file="${line#*$'\t'}"       # everything after the first tab
        ln -f "$src/$file" "$src/$temp/$inode"
    done

# Copy the index and then the full tree
#
echo Copy inodes >&2
rsync -avPR "${args[@]}" "$src/./$temp/" "$dst/"

echo Copy structure >&2
rsync -avHPR --delete-after "${args[@]}" "$src/./$temp/" "$src/./" "$dst/"

# Remove the working set on the source (not essential but you may prefer it)
#
echo Tidyup >&2
rm -rf "$src/$temp"

If you call it dsync and put it into your path, you could use it like this

dsync /media/blueray/WDRed /media/blueray/WDPurple

or potentially

dsync --info=PROGRESS2,BACKUP,DEL --backup --human-readable --inplace --delete-after --log-file=/media/blueray/WDPurple/rsync.log --backup-dir=red_rsync_bak.$(date +"%d-%m-%y_%I-%M-%S%P") --log-file-format='%t %f %o %M' --exclude='lost+found' --exclude='.Trash-1000' /media/blueray/WDRed /media/blueray/WDPurple
Chris Davies
  • 95% done. The only problem is that it still creates the backup directories within the new directory, e.g. red_rsync_bak.15-01-21_09-02-30am inside red_rsync_bak.15-01-21_09-03-04am instead of beside it. – Ahmad Ismail Jan 14 '21 at 23:04
  • Now I understand what is going on. --delete-after deletes the old red_rsync_bak.15-01-21_09-02-30am, which is why it shows up inside red_rsync_bak.15-01-21_09-03-04am. One solution may be to use an absolute path for --backup-dir= (though I have not tested it yet). However, this script does not work at all with --delete-delay. I am awarding the bounty, but please fix these issues at your convenience, as others might find your script useful. – Ahmad Ismail Jan 15 '21 at 04:33