
I'm trying to create a backup script. I've managed to get this script working fine on a CentOS 6.7 machine and am now trying to get it working properly on Debian 7.

I am running into a problem I can't seem to solve with Google or any of the information found on this site. I'll try to explain my situation before getting into the problem.

On CentOS, I use the following command to find files that have been changed in the past 24 hours in $SOURCEDIR and use xargs to put only these files into $ARCHIVE. If no files are found a message pops up.

find $SOURCEDIR -mtime -1 -print | xargs -r tar rcvf $ARCHIVE || { echo "No files have been changed in the past 24 hours. Exiting script ..." ; exit 1; }

I am aware that using tar rcvf can invoke the following error message:

You may not specify more than one '-Acdtrux' or '--test-label' option

This, however, does not seem to happen on the CentOS machine. It does on the Debian machine, so I've removed the r option from the tar command. The reason I added it in the first place is that I want to avoid the archive being overwritten if find returns more than 100 results.

Now onto the actual problem. Whenever I run

find $SOURCEDIR -mtime -1 -print

I get a list of the files that have been changed in $SOURCEDIR in the past 24 hours, as expected. However, whenever I run the complete command including the pipe symbol and the xargs command like this:

find $SOURCEDIR -mtime -1 -print | xargs -r tar cvf $ARCHIVE || { echo "No files have been changed in the past 24 hours. Exiting script ..." ; exit 1; }

I actually see the find command print all files from $SOURCEDIR before I end up with an archive including all the files from $SOURCEDIR, and I do not understand why. Any help would be greatly appreciated.

Anthon
  • does the output of find $SOURCEDIR -mtime -1 -print also include . as one of the output? – Sundeep May 12 '16 at 13:44
  • No, there is no . included in the output of the find command. The output includes the name of the $SOURCEDIR and all the files that have been changed in the past 24 hours. – Seroczynski May 12 '16 at 13:48
  • and I guess $SOURCEDIR doesn't show up on CentOS? you just need to remove that before piping to tar.. – Sundeep May 12 '16 at 13:52
  • I get the exact same result on CentOS, the $SOURCEDIR is the first result. I've edited the original post to include the fact that I get to see all files listed by find when I run the full command. – Seroczynski May 12 '16 at 13:57
  • can you try this? find $SOURCEDIR -mtime -1 -print | grep -xv "$SOURCEDIR" | xargs -r tar cvf $ARCHIVE || { echo "No files have been changed in the past 24 hours. Exiting script ..." ; exit 1; } – Sundeep May 12 '16 at 13:57
  • That actually worked perfectly, but I still do not understand why? – Seroczynski May 12 '16 at 14:00
  • I'll put it as an answer with explanation – Sundeep May 12 '16 at 14:01
  • Give find a -type f argument if you only want it to print pathnames of files. Otherwise, it'll print directory pathnames, too. – Mark Plotnick May 12 '16 at 14:21

4 Answers

find $SOURCEDIR -mtime -1

also includes $SOURCEDIR itself in the output. tar archives a directory argument recursively, so that entry needs to be removed before further processing.

Using grep -vx, one can exclude a line that matches exactly:

find $SOURCEDIR -mtime -1 -print | grep -xv "$SOURCEDIR" | xargs -r tar cvf $ARCHIVE || { echo "No files have been changed in the past 24 hours. Exiting script ..." ; exit 1; }
Sundeep
  • Simpler: find "$SOURCEDIR" -mindepth 1 -mtime -1. But going further, the same problem will occur with subdirectories, so -type f is probably the right solution here (unless directory metadata matters). – Gilles 'SO- stop being evil' May 12 '16 at 22:10

Not only will you run into problems if xargs invokes tar twice, you will also have problems if your file names contain special characters such as newlines.

You should drop the use of xargs and tar and use find with cpio:

find $SOURCEDIR -mtime -1 -print0 | cpio --create -0 --verbose \
     --format=ustar -O $ARCHIVE

ustar provides you with a POSIX.1 compliant tar file in $ARCHIVE.
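The newline problem mentioned above can be seen in a scratch directory. This is only a sketch: the paths and the file name are made up, and the last step assumes GNU tar, which can read a NUL-delimited list from standard input with --null -T - (an alternative to cpio, not the command recommended in this answer).

```shell
#!/bin/sh
# Scratch area; every path below is hypothetical.
work=$(mktemp -d)
SOURCEDIR="$work/src"
mkdir -p "$SOURCEDIR"

# One file whose name contains a newline.
touch "$SOURCEDIR/two
lines.txt"

# Newline-delimited output: the single name is split over two lines.
nl_count=$(find "$SOURCEDIR" -type f -print | wc -l)

# NUL-delimited output: count the terminators, one per file name.
nul_count=$(find "$SOURCEDIR" -type f -print0 | tr -cd '\0' | wc -c)

echo "lines with -print: $nl_count, names with -print0: $nul_count"

# Assumption: GNU tar. --null must come before -T so the list is read
# as NUL-delimited; --no-recursion leaves the recursion to find.
find "$SOURCEDIR" -type f -print0 |
    tar -cf "$work/backup.tar" --null --no-recursion -T -
```

Anything that consumes the -print output line by line sees two names here, while the -print0 variant keeps the name intact.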

Anthon

As others have identified, the problem with your command is that it includes directories, and tar archives them recursively. If a directory has been modified recently, all the files in it and its subdirectories get included, whether they have been modified or not.
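The effect can be reproduced in a scratch directory. The following is a minimal sketch, assuming GNU touch (for -d '3 days ago') and GNU tar; the directory layout and file names are invented for the demonstration.

```shell
#!/bin/sh
# Scratch area; every path below is hypothetical.
work=$(mktemp -d)
SOURCEDIR="$work/src"
mkdir -p "$SOURCEDIR/sub"
echo old > "$SOURCEDIR/sub/old.txt"
echo new > "$SOURCEDIR/sub/new.txt"

# Backdate one file so that only new.txt counts as recently modified.
# The directories keep a recent mtime because files were just created in them.
touch -d '3 days ago' "$SOURCEDIR/sub/old.txt"

# Directories are passed to tar, which recurses and picks up old.txt as well.
find "$SOURCEDIR" -mtime -1 -print | xargs -r tar -cf "$work/all.tar"

# Restricting find to regular files keeps old.txt out of the archive.
find "$SOURCEDIR" -mtime -1 -type f -print | xargs -r tar -cf "$work/files.tar"

all_list=$(tar -tf "$work/all.tar")
files_list=$(tar -tf "$work/files.tar")
printf 'with directories:\n%s\n\nfiles only:\n%s\n' "$all_list" "$files_list"
```

The first listing contains old.txt even though find never printed it; the second one does not.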

If you don't care to back up directory metadata, then just tell find not to print directory names. It isn't enough to omit the root: the same thing can happen with subdirectories too.

find "$SOURCEDIR" -mtime -1 ! -type d -print | xargs -r tar -rcf "$ARCHIVE"

Using xargs fails with file names containing spaces and some other special characters. This is easy to fix: use -exec instead of xargs.

find "$SOURCEDIR" -mtime -1 ! -type d -exec tar -rcf "$ARCHIVE" {} +

If you want to back up directory metadata, let find print everything and instead tell tar not to recurse into subdirectories. Since find is doing the recursion, tar doesn't need to.

find "$SOURCEDIR" -mtime -1 -exec tar -rcf "$ARCHIVE" --no-recursion {} +

With this approach, you can avoid the use of tar -rc and instead solve the problem of repeated tar invocations by first creating an archive with only the root directory, and then appending to it in batches. (Why the root directory? Because GNU tar is afraid of creating an empty archive.)

tar -cf "$ARCHIVE" --no-recursion "$SOURCEDIR"
find "$SOURCEDIR" -mindepth 1 -mtime -1 -exec tar -rf "$ARCHIVE" --no-recursion {} +
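Put together as a script, the two commands above might look like the sketch below. The function name backup_recent and its argument order are hypothetical, and GNU find and GNU tar are assumed. The early check is an addition: with xargs -r, empty input still exits successfully, so the || { echo ...; } from the question would not fire when nothing has changed.

```shell
#!/bin/sh
# backup_recent SRC ARCHIVE  (hypothetical helper, not part of any tool)
backup_recent() {
    src=$1
    archive=$2

    # Stop early when nothing below $src changed in the past 24 hours.
    if [ -z "$(find "$src" -mindepth 1 -mtime -1 -print | head -n 1)" ]; then
        echo "No files have been changed in the past 24 hours. Exiting script ..." >&2
        return 1
    fi

    # Start a fresh archive that holds only the root directory entry.
    tar -cf "$archive" --no-recursion "$src" || return 1

    # Append the changed entries in batches; find does the recursion.
    find "$src" -mindepth 1 -mtime -1 \
        -exec tar -rf "$archive" --no-recursion {} +
}
```

A call such as backup_recent "$SOURCEDIR" "$ARCHIVE" || exit 1 would then replace the one-liner from the question.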
  • very well explained with incremental thought process, so much to learn here – Sundeep May 13 '16 at 02:15
  • Thanks for the detailed explanation. But what does the {} + at the end of the commands do? – Seroczynski May 13 '16 at 08:11
  • @Seroczynski It's the syntax of the -exec action of find. {} is replaced by the list of file names, as many as fit in the command line length limit. For example -exec command arg1 arg2 {} + executes command arg1 arg2 file1 file2 … fileN1 then command arg1 arg2 fileN1+1 … fileN1+N2 and so on. There's also -exec … {} … \; which executes the command for each file, one at a time. – Gilles 'SO- stop being evil' May 13 '16 at 11:41
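The difference between the two -exec forms described in the comment above can be seen with echo standing in for the real command. A small sketch; the scratch directory and the word "got" are arbitrary.

```shell
#!/bin/sh
# Scratch directory with three empty files (hypothetical names).
work=$(mktemp -d)
touch "$work/a" "$work/b" "$work/c"

# With +, find hands all three names to a single echo invocation.
batched=$(find "$work" -type f -exec echo got {} + | wc -l)

# With \;, echo runs once per file.
single=$(find "$work" -type f -exec echo got {} \; | wc -l)

echo "invocations with +: $batched, with \\;: $single"
```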

When you execute find $SOURCEDIR -mtime -1 -print, the first result is the folder $SOURCEDIR itself. tar archives that folder recursively, which is why everything is included.
You have to exclude the first result, that is, $SOURCEDIR.

magor