5

I have a lot of lossless music on my computer. Unfortunately, sometimes (now quite rarely) files got corrupted and I had to replace them from my backup on an external HDD.

The backup was done manually and it was of course tedious. As my music collection grew, it became more and more difficult to find by hand which albums weren't archived.

What would be the optimal solution for this case? Simply cp-ing? Some rsync usage? I'd rather not update existing files - they hardly change and I don't want to overwrite good files with corrupt ones.

marmistrz
  • 1
    If those files are hardly ever changed, isn't it wise to use some kind of version control? Say 1 in 1000 files gets changed over time, then it's hardly a problem to have two or three versions available. Sometimes the original is corrupt (has happened to me), and then you keep that instead of the good updated one. Unless you take the effort to delete the corrupt version in the backup manually. – SPRBRN Jul 24 '15 at 08:56
  • 2
    I'm recommending Borg Backup quite a lot these days. Here is one more recommendation. This is one of the modern deduplicating backup programs. See a slightly more detailed answer here. – Faheem Mitha Mar 12 '16 at 21:05

2 Answers

6

I wholeheartedly recommend rsync. It automatically detects which files are missing on the destination compared to the source and copies only those. IIUC, it's the best solution for your use case. Plain cp is a poor fit here because by default it copies every file again on each run, which is much slower than rsync. Once you understand how rsync works, it will prove to be a good tool for copying large sets of data in general.

Just note that if you decide to use rsync over the network in the future, it must be installed on both the source and destination machines. This is the only drawback of rsync known to me.

EDIT:

rsync usage is very simple. Usually it comes down to the following command:

$ rsync -avz <SOURCE> <DESTINATION>

-a means archive mode: copy directories recursively and preserve symlinks, permissions, modification times, group and owner. -v means verbose, and -z means compress data during the transfer.

During a transfer, temporary file names are prefixed with a dot (.). The next time the same command is run, only files that are new or have changed in SOURCE are copied to DESTINATION.

It's easy to use rsync over ssh:

$ rsync -avz -e ssh hosting:/home1/rkumvbrh/mail/drabczyk.org/arkadiusz .

-e ssh can be omitted because ssh is the default:

$ rsync -avz hosting:/home1/rkumvbrh/mail/drabczyk.org/arkadiusz .

When using rsync over the network, it must be installed on both ends:

$ rsync -avz -e "ssh" router:<FILE> .
ash: rsync: not found
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: remote command not found (code 127) at io.c(226) [Receiver=3.1.0]

EDIT:

OK, at first I thought that you wanted to replace all files that got corrupted on your local disk with files from the external disk and copy all new files from the external disk. But if you want to copy only new files from your internal disk to the external disk, add the --ignore-existing option. New files will then be copied to the external disk, while files that already exist there are left alone, so corrupted local copies never overwrite the backup:

$ rsync -avz --ignore-existing <PATH/TO/INTERNAL_HDD> <PATH/TO/EXTERNAL_HDD>

From man rsync:

--ignore-existing This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.

   This option is a transfer rule, not an exclude, so it
   doesn't affect the data that goes into the file-lists, and
   thus it doesn't affect deletions. It just limits the files
   that the receiver requests to be transferred.

   This option can be useful for those doing backups using the
   --link-dest option when they need to continue a backup run that
   got interrupted. Since a --link-dest run is copied into a new
   directory hierarchy (when it is used properly), using
   --ignore-existing will ensure that the already-handled files don't
   get tweaked (which avoids a change in permissions on the
   hard-linked files). This does mean that this option is only
   looking at the existing files in the destination hierarchy
   itself.
  • Would you recommend any particular tutorial on rsync (in Polish, English, German or Russian) or will simply any first found by duckduckgo fit? :) – marmistrz Jul 24 '15 at 09:47
  • 1
    I can't recommend any tutorial because I have never read any as rsync usage is well described in its manpage. Wait a second, I will add an example usage to my answer. – Arkadiusz Drabczyk Jul 24 '15 at 09:50
  • 1
    @marmistrz: see updated answer – Arkadiusz Drabczyk Jul 24 '15 at 10:04
  • Yes, but the local files may be damaged. I want to copy a local file only if no remote backup (i.e. on the external drive) exists. Besides, I'd rather have a one-directional copy - sometimes I have a second, different version of an album on my backup drive which should stay there (and not get onto the local drive). Will the same syntax work with local filesystems? – marmistrz Jul 24 '15 at 10:18
  • @marmistrz: do you want to copy files from an external HDD or the other way around? – Arkadiusz Drabczyk Jul 24 '15 at 10:26
  • from the internal HDD to the external HDD – marmistrz Jul 24 '15 at 10:29
  • Does it make sense to compress the data if we are working over two HDDs? Besides, it seems important to have trailing slashes. I created a small playground and rsync -avz a b creates the directory b/a – marmistrz Jul 24 '15 at 10:44
  • 1
    @marmistrz: just a bit of warning - don't run these commands right away on your real data, play around with them first to check whether they do what you want – Arkadiusz Drabczyk Jul 24 '15 at 10:47
  • @marmistrz: and about the compression - I don't know. Measure how much time it takes with and without compression, using the time command for example. – Arkadiusz Drabczyk Jul 24 '15 at 10:49
  • If most of your files are music, they will most likely already be compressed. In this case, I would not expect much speed-up with -z, not even over a slow connection. – klimpergeist Jul 24 '15 at 11:13
0

Adding -c to the rsync command makes it compare files by checksum instead of by size and modification time, so any file whose content differs between the two sides is detected.