1

This lists all files in two backups, sorted by size:

tar tvf backup1.tar.bz2 |sort -k3 -n >backup1_files.txt
tar tvf backup2.tar.bz2 |sort -k3 -n >backup2_files.txt

I'd like to list all files present in backup2.tar.bz2 but not present in backup1.tar.bz2, sorted by size.

How to do this?


NB:

  • Doing a diff of these .txt files won't work because the modification dates of some files won't be the same. Thus this question is not a duplicate of "Is there a tool to get the lines in one file that are not in another?"

  • Dropping the v flag would remove the modification dates, but also the file sizes, so it's not an option: the files could no longer be sorted by size.

Basj
  • 2,519

2 Answers

0

If you have AWK, you can use a one-liner like this:

awk '{if (NR==FNR) { arr[$6]=1 } else { if (! arr[$6]) { print } } }' backup1_files.txt backup2_files.txt

This builds an AWK array with the file names of backup 1 and then checks whether each file name of backup 2 is present in that array. If not, it prints the line. Since backup2_files.txt is already sorted by size, the output stays sorted by size.

EDIT: Here's an improved version that's more robust to files with whitespace in the name and doesn't need any temporary files:

awk '{ key=""; for (i = 6; i <= NF; i++) { key=key " " $i }; if (NR == FNR) { arr[key]=1 } else { if (! arr[key]) { print } } }' <(tar tvf backup1.tar.bz2 |sort -k3 -n) <(tar tvf backup2.tar.bz2 |sort -k3 -n)

You can write the awk code into a file like intersect.awk and re-use it like:

awk -f intersect.awk <(tar tvf backup1.tar.bz2 |sort -k3 -n) <(tar tvf backup2.tar.bz2 |sort -k3 -n)
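
For reuse, the same idea can be wrapped in a small shell function. This is only a sketch: the name only_in_second is made up for illustration, and it assumes GNU tar's verbose layout, where the file name starts at field 6. It takes the listing of the older archive first and prints the lines of the second listing whose names do not occur in the first. It tests FILENAME instead of NR==FNR so that an empty first listing is handled too.

```shell
# Hypothetical helper: print the lines of listing "$2" whose file names
# do not occur in listing "$1". Both are "tar tvf"-style listings.
# Assumption: the name starts at field 6 (GNU tar verbose output).
only_in_second() {
    awk '
        {
            # Rebuild the name from field 6 onward so that names
            # containing spaces are compared as a whole.
            key = $6
            for (i = 7; i <= NF; i++) key = key " " $i
        }
        # First listing: remember every name, print nothing.
        FILENAME == ARGV[1] { seen[key] = 1; next }
        # Second listing: print lines whose name was never seen.
        !(key in seen)
    ' "$1" "$2"
}
```

With bash or zsh it could be called as only_in_second <(tar tvf backup1.tar.bz2) <(tar tvf backup2.tar.bz2 |sort -k3 -n). Runs of several spaces inside a name are collapsed to one when the key is rebuilt, and for symlinks the key also includes the "-> target" part, so treat the result as approximate for such entries.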
  • Thank you for your answer. Would there be a direct solution without using these temporary .txt files, by piping tar ... directly into awk? – Basj Nov 01 '19 at 09:02
  • You can use a bash/zsh process substitution: awk '{if (NR==FNR) { arr[$6]=1 } else { if (! arr[$6]) { print } } }' <(tar tvf backup1.tar.bz2) <(tar tvf backup2.tar.bz2)

    One caveat: be careful with spaces in file names. By default awk splits on whitespace, so some matches might be wrong.

    – Bastian Schiffthaler Nov 01 '19 at 09:06
0

The proposed methods from other answers do not work since tar will print:

name123 symbolic link to namexyz

if there are symlinks in the archive and similar messages for hardlinks.

So the only way to deal with that is to use star:

star -t -tpath < archive.tar.bz2 > somename

Do this for all archives, sort the output and then use the well-known methods to compare the resulting files.

The option -tpath tells star to print only the file name, one per line.

star is part of the schilytools.

BTW: If a filename contains a newline character, this method will confuse the comparing tools.
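
As a rough sketch of that comparison step, assuming each list file holds one path per line: the function name names_only_in_second is hypothetical, and comm is just one of the usual tools for this. Note that name-only lists carry no sizes, so this alone does not give the size ordering the question asks for.

```shell
# Hypothetical helper: print the names that appear in list "$2" but not
# in list "$1". Each input file holds one path per line.
# The body runs in a subshell so the temporary variables stay local.
names_only_in_second() (
    a=$(mktemp) || exit 1
    b=$(mktemp) || exit 1
    # comm requires both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    # -1 hides lines unique to the first list, -3 hides common lines,
    # leaving only the lines unique to the second list.
    LC_ALL=C comm -13 "$a" "$b"
    rm -f "$a" "$b"
)
```

For example, names_only_in_second names1.txt names2.txt, where names1.txt and names2.txt are placeholder names for the lists produced from each archive.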

schily
  • 19,173