5

Possible Duplicate:
Rsync filter: copying one pattern only

I would like to use rsync to transfer all files from a server (or server-via-ssh) which have a specific ending string, such as a file type extension like tar.gz. I want them all regardless of how deep they might go, now or in the future. But this can be more than just the extension. In one case I want to get all files ending in -server-cloudimg-amd64-root.tar.gz. Equally important, I do not want to get any other files at all, even if they are at shallower paths.

The simple case of --include="**/*-server-cloudimg-amd64-root.tar.gz" does not get them. I know rsync include/exclude is not simple, and I did not expect that to work. I know the directories also have to be specified for transfer. But rsync's logic has been a perpetual mystery to me (not like any path ACL rules I've ever seen) because it also requires matching parent directories separately. I think what is needed is simply an option whose semantics are "if this file matches, include all directories necessary to make it transfer, without implying anything else matches", in much the same way that the command mkdir -p ${DIRNAME} creates the parents of the named directory as needed. I see no such option in rsync. Is there some straightforward way to do this in one pass?

Skaperen
  • 716

2 Answers

11

Since you cannot run commands on the remote server, you will have to work with rsync's includes and excludes. There is an excellent post here on SE (see the featured answer):

Rsync filter: copying one pattern only

I think that you will need something like

rsync -am --include='*.tar.gz' --include='*/' --exclude='*' SOURCE DESTINATION
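
To see why this combination works, it may help to try it on a throwaway local tree. rsync tests each file and directory against the filter rules in order and the first matching rule wins, so --include='*/' lets rsync descend into every directory, --exclude='*' drops every file not matched earlier, and -m (--prune-empty-dirs) removes directories that end up with nothing to copy. The sketch below assumes rsync is installed locally; the directory and file names are made up for illustration.

```shell
# Build a scratch source tree (hypothetical names, for illustration only).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/a/b/c" "$src/empty/deeper" "$src/other"
touch "$src/a/b/c/x-server-cloudimg-amd64-root.tar.gz"   # should be copied
touch "$src/a/readme.txt" "$src/other/y.tar.bz2"          # should be skipped

# Rules are tested in order and the first match wins:
#   1. include files matching the pattern
#   2. include every directory so rsync can descend into it
#   3. exclude everything else
# -m (--prune-empty-dirs) then drops directories left with nothing to copy.
rsync -am \
  --include='*-server-cloudimg-amd64-root.tar.gz' \
  --include='*/' \
  --exclude='*' \
  "$src/" "$dst/"

# List what arrived at the destination.
(cd "$dst" && find . | sort)
```

Only a/b/c and the matching file should show up in the listing; empty/ and other/ should not be created. Running the same command without -m should show the empty directories reappearing.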
  • That seems to work (with my pattern in place of the example pattern). But I don't understand why. I'd still want to test it more thoroughly to be sure no "weird" site would end up having things that confuse it. – Skaperen Sep 08 '12 at 20:28
  • OK, I commented too quick. It is pulling down other directories that do not have the matching files. Though they are empty in the target, they are getting created. On some very large trees that can be a problem requiring cleanup every time rsync is run. – Skaperen Sep 08 '12 at 20:32
  • I see from the linked answers, specifically the answer beginning with the word "Judging", that the culprit is --include='*/'. It matches all directories. This is not what I want. I only want the directories needed for the files that match. – Skaperen Sep 08 '12 at 20:38
  • -m option will do the job, I have edited my answer. – Paweł Rumian Sep 08 '12 at 20:44
  • OK, that's definitely looking better. I guess I needed the right combination of -m and --include='*/' to do this. I was looking too hard for one magic option. – Skaperen Sep 08 '12 at 21:10
  • I'm glad I was able to help. – Paweł Rumian Sep 08 '12 at 21:12
0

You want to use find:

find /top/level/path -name '*.tar.gz' -exec sh -c 'rsync "$@" DESTINATION' sh {} +

/top/level/path is the top-level directory; the search covers all of its subdirectories. You can use the -maxdepth or -mindepth options to narrow the search, and wildcards such as ? or * with -name.

You can of course add additional options to rsync, such as rsync -av. The final part, {} +, passes rsync the list of files found by the find command.

If you want to see the list of files that would be passed to rsync, you can test it by substituting echo for rsync:

find /top/level/path -name '*.tar.gz' -exec echo {} +
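
Where find can run on the source side, one possible way to keep the directory layout and still run rsync only once is to pipe the matches into rsync's --files-from option. This is a sketch on a local scratch tree, assuming rsync is installed; the paths and file names are hypothetical.

```shell
# Scratch source and destination trees (hypothetical names).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/one/two" "$src/three"
touch "$src/one/two/a.tar.gz" "$src/three/b.tar.gz" "$src/three/notes.txt"

# find prints NUL-terminated paths relative to $src; a single rsync reads
# them from stdin (--files-from=-, --from0) and recreates only the parent
# directories those files need.
(cd "$src" && find . -name '*.tar.gz' -print0) |
  rsync -a --from0 --files-from=- "$src/" "$dst/"

# Show the files that were copied.
(cd "$dst" && find . -type f | sort)
```

Because the file list is relative to the source directory, each match lands under the same relative path in the destination, and non-matching files such as notes.txt are never considered.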
  • Unfortunately I do not have access at the server side to run something like "find". I am already using "rsync -a" and the like. I can download the whole tree or specific subdirectories OK. I could download a list with "rsync -r rsync://sitename/module/" and have done so but that is what I want to avoid. Also executing rsync multiple times is "just wrong" (there could be hundreds or thousands of matched files among a million that do not match). – Skaperen Sep 08 '12 at 20:10
  • But this is a good example showing that find does its matching in a simple way that works. The problem is that even though rsync does consider the file matched in a similar way, it will refuse to transfer that file if any parent directory didn't also get a match somehow. – Skaperen Sep 08 '12 at 20:13
  • Ah, OK, I have misunderstood you. I have created a second answer. – Paweł Rumian Sep 08 '12 at 20:21