I need to transfer a lot of web directories to another machine. The directory structure looks like this:

/var/www/
    site1/
        cgi-bin/ ...
        logs/ ...
        index.html
        images/ ...
        ...
    site2/
        cgi-bin/
        logs/
        ...
    site3/
    ...

To pack the files I'd like to use tar. I need to pack all files except the "cgi-bin" and "logs" directories at exactly the paths I provide, not at any other location whose path merely looks similar!

These directories should be excluded from packing because they should not appear in the destination and they can be quite large, which slows down the transfer. So I do not want to include them only to delete them at the destination.

I tried several combinations of this:

cd /var/www
tar cfz ~/web.tgz site* --exclude-from excludes.list

excludes.list is a file that contains lines such as those shown here (the example mixes different path styles; in practice I used one style consistently for the entire file, but no variant worked):

site1/cgi-bin
site1/logs
./site2/cgi-bin
/var/www/site2/logs

The "cgi-bin" directories may occur in other subdirectories (I list them all with a find command), the "logs" directories that I want to exclude are all directly in each "sitex" directory. Other "logs" directories must be included.

I could only get to two results:

  1. No files were excluded at all
  2. All directories that partially matched an exclude pattern were excluded, including e.g. /var/www/site2/bla/site1/logs/. This is not acceptable as it's excluding too much.

Is there a way to make tar exclude exactly the absolute paths provided and nothing else that partially looks like a provided exclude pattern?

Kusalananda
ygoe

1 Answer

You could use a process substitution and find to build the list of directories you want to exclude.
I'm assuming your find implementation supports the -maxdepth option:

cd /var/www
tar cfz ~/web.tgz --exclude-from=<(
  find site* -maxdepth 1 -type d -name 'logs'
  find site* -type d -name 'cgi-bin'
) site* 

Test setup:

site1/
├── cgi-bin
│   └── file1
├── images
│   ├── cgi-bin
│   │   └── file2
│   └── logs
│       └── file3
├── index.html
└── logs
    └── file4

Output:

$ tar cvfz ~/web.tgz --exclude-from=<(
  find site* -maxdepth 1 -type d -name 'logs'
  find site* -type d -name 'cgi-bin'
) site*
site1/
site1/images/
site1/images/logs/
site1/images/logs/file3
site1/index.html

Excluded directories (output of both find commands):

site1/logs
site1/cgi-bin
site1/images/cgi-bin
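
For the over-matching described in the question (result 2), one thing worth trying is GNU tar's --anchored option. By default GNU tar treats exclude patterns as unanchored, so a pattern such as site1/logs can also match after any "/" deeper in the tree, for example site2/bla/site1/logs. The following is only a sketch under assumptions: it needs GNU tar, it uses a scratch directory created with mktemp instead of /var/www, and the file names (file1 to file5) and the nested site2/bla/site1/logs path are made up for illustration. It writes the find output to excludes.list rather than using process substitution, so it also runs in a plain sh.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar and a find that supports -maxdepth.
set -eu

work=$(mktemp -d)   # hypothetical scratch directory standing in for /var/www
cd "$work"

# Recreate the test setup, plus the nested look-alike path from the question.
mkdir -p site1/cgi-bin site1/logs site1/images/cgi-bin site1/images/logs \
         site2/bla/site1/logs
touch site1/index.html site1/cgi-bin/file1 site1/images/cgi-bin/file2 \
      site1/images/logs/file3 site1/logs/file4 site2/bla/site1/logs/file5

# Same two find commands as above, written to a file instead of <( ... ).
{
  find site* -maxdepth 1 -type d -name 'logs'
  find site* -type d -name 'cgi-bin'
} > excludes.list

# --anchored is placed before --exclude-from because GNU tar applies
# pattern-matching options to the exclude patterns that follow them.
tar -czf web.tgz --anchored --exclude-from=excludes.list site*

# List what ended up in the archive.
tar -tzf web.tgz | sort > listing.txt
cat listing.txt
```

If this behaves as intended, site1/logs and both cgi-bin directories are missing from the listing, while site1/images/logs/ and the look-alike site2/bla/site1/logs/ are still packed. Note that anchored patterns have to be written relative to the directory tar runs in (site1/logs, not ./site1/logs or /var/www/site1/logs), because they are compared against the start of the member name.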
Freddy
  • That's about what I already did, just using a separate file instead of piping the find result directly into the tar process. What are the lines your find produces? Are they different to mine? (I showed them above.) If not, this is not an answer but still my question. – ygoe Dec 06 '20 at 15:14
  • Added output of my test setup, I don't know why it doesn't work for you. – Freddy Dec 06 '20 at 15:32
  • --exclude-from is a non-portable GNU tar extension. Other tar implementations use a different option name for this feature. Modern tar implementations use libfind and thus support using a find(1)-like command line directly. See e.g. http://schilytools.sourceforge.net/man/man1/star.1.html – schily Dec 29 '20 at 09:52
  • @schily It is clear that this question is specifically about GNU tar, and that the answer addresses GNU tar. – Kusalananda Dec 30 '20 at 11:54
  • @Kusalananda Why didn't you change the tag from tar to gnu-tar if you believe this is obvious to all readers? – schily Dec 30 '20 at 23:01
  • @schily Done. Thanks for the suggestion. – Kusalananda Dec 30 '20 at 23:09