131

I need to back up a fairly large directory, but I am limited by the size of individual files. I'd like to create a tar.(gz|bz2) archive that is split into pieces of at most 200MB each. Clonezilla does something similar, splitting image backups into files named like so:

sda1.backup.tar.gz.aa
sda1.backup.tar.gz.ab
sda1.backup.tar.gz.ac

Is there a way I can do this in one command? I understand how to use the split command, but I'd like to not have to create one giant archive, then split it into smaller archives, as this would double the disk space I'd need in order to initially create the archive.

Naftuli Kay
  • 39,676

7 Answers

167

You can pipe tar to the split command:

tar cvzf - dir/ | split --bytes=200MB - sda1.backup.tar.gz.

On some *nix systems (like OS X) you may get the following error:

split: illegal option -- -

In that case try this (note the -b 200m):

tar cvzf - dir/ | split -b 200m - sda1.backup.tar.gz.

If you happen to be trying to split the file to fit on a FAT32 formatted drive, use a byte limit of 4294967295. For example:

tar cvzf - /Applications/Install\ macOS\ Sierra.app/ | \
split -b 4294967295 - /Volumes/UNTITLED/install_macos_sierra.tgz.

To extract the files, use the following command (as @Naftuli Kay suggested in the comments):

cat sda1.backup.tar.gz.* | tar xzvf -
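
A minimal round-trip sketch of the commands above, assuming a POSIX shell with mktemp, dd and cmp available. The names demo and blob.bin, the scratch directory layout and the 100k chunk size are made up for illustration; for the real backup you would use your own directory and 200m.

```shell
#!/bin/sh
# Round-trip demo: archive a directory into fixed-size pieces, then restore it.
work=$(mktemp -d)                      # scratch area (hypothetical layout)
mkdir "$work/demo" "$work/parts" "$work/restore"

# 300 KiB of random data, which gzip cannot shrink, so several pieces result.
dd if=/dev/urandom of="$work/demo/blob.bin" bs=1024 count=300 2>/dev/null

# Create: tar writes the compressed archive to stdout, split cuts it up.
tar czf - -C "$work" demo | split -b 100k - "$work/parts/demo.tar.gz."

# Count the pieces (demo.tar.gz.aa, demo.tar.gz.ab, ...).
pieces=$(ls "$work/parts" | wc -l | tr -d ' ')

# Restore: the glob expands in sorted order, so cat rebuilds the stream.
cat "$work/parts"/demo.tar.gz.* | tar xzf - -C "$work/restore"

# Compare the restored file with the original.
if cmp -s "$work/demo/blob.bin" "$work/restore/demo/blob.bin"; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "pieces: $pieces, round trip: $roundtrip"
rm -rf "$work"
```

Nothing larger than one piece is ever written to disk on the way in, which is the point of piping rather than creating the archive first.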
jordanm
  • 42,678
  • 27
    Will cat sda1.backup.tar.gz.* | tar xzvf - do the job? – Naftuli Kay Jan 18 '13 at 19:45
  • 4
    Yes, it should. By default, split names the files so that, when sorted according to the locale (which is what shell globbing does), they come out in the correct order. – jordanm Jan 18 '13 at 19:47
  • @NaftuliTzviKay Using cat on the command line works fine. But when I do the same in a shell script, I get an error saying that file.tar.gz.* was not found. – Vinay Dec 08 '14 at 17:33
  • 4
    Without verbose output, just run tar czf ... without the v, and merge with cat backup.tar.gz.* | tar xzf - without the v. I see no benefit in the verbose output here. – Léo Léopold Hertz 준영 Jul 09 '16 at 12:15
  • how does it work for .bz2.aa or .bz2.bb uncompressing ? – Praneeth Oct 24 '17 at 20:23
  • @Praneeth exactly the same. cat foo.bz2.* | tar xvf - – jordanm Oct 24 '17 at 21:35
  • 1
    Just helped a friend by packing Xcode onto a FAT32 formatted flash drive with: tar cvzf - Xcode.app/ | split -b 2000m - /Volumes/PH/xcode/xcode.tgz (used from cd /Applications/) Thank you very much :) – ecth Dec 08 '17 at 12:06
22

tar split archive

I found this to be the best solution for a few reasons:

  • It creates parts without interaction, automatically naming parts
  • You can use any compression you want, usual tar options
  • Requires no external commands for splitting or joining
  • Uses no extra disk space (intermediate)
  • Any dearchiver can handle the parts easily, as each archive is self-contained
  • Increased safety, as each archive is self-contained and files do not span multiple archives

This command creates chunks of about 2GB without compression:

tar -cv --tape-length=2097000 --file=my_archive-{0..50}.tar file1 file2 dir3
  • c for create
  • v for verbose, to list files added to the archive
  • --tape-length is the chunk size: you can add a unit suffix; if you omit it, units of 1024 bytes are assumed (hence about 2 million for 2 gigabytes)
  • --file is where we magically create names for the chunks: we arbitrarily give 50, but you may use any big enough number; only the volumes that are needed will be created
  • list of files and directories to be included in archives

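Similarly, this command is meant to create 2GB chunks with gzip compression (note that GNU tar may reject it with "Cannot use multi-volume compressed archives", as reported in the comments below):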

tar -czv --tape-length=2097000 --file=my_archive-{0..50}.tar.gz file1 file2 dir3
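
A small sketch of creating and then extracting an uncompressed multi-volume set, assuming GNU tar (other tar implementations may not support -M). The scratch directory, the vol-N.tar names and the 100 KiB volume size are made up for illustration. As the comments below point out, a file can continue from one volume into the next, so the volumes are passed to tar together, in order.

```shell
#!/bin/sh
# Multi-volume demo. Requires GNU tar; skipped when another tar is installed.
if ! tar --version 2>/dev/null | grep -q 'GNU tar'; then
    echo "GNU tar not found, skipping"
    status=skipped
else
    work=$(mktemp -d)                  # scratch area (hypothetical layout)
    mkdir "$work/data" "$work/out"
    dd if=/dev/urandom of="$work/data/blob.bin" bs=1024 count=250 2>/dev/null

    # Build the --file=... arguments by hand; this has the same effect as
    # the {0..9} brace expansion, which plain sh does not support.
    set --
    for i in 0 1 2 3 4 5 6 7 8 9; do
        set -- "$@" "--file=$work/vol-$i.tar"
    done
    # --tape-length=100 means 100 KiB per volume.
    tar -c -M --tape-length=100 "$@" -C "$work" data

    # Extract: pass only the volumes that were actually written, in order.
    set --
    for f in "$work"/vol-*.tar; do
        set -- "$@" "--file=$f"
    done
    tar -x -M "$@" -C "$work/out"

    # Compare the restored file with the original.
    if cmp -s "$work/data/blob.bin" "$work/out/data/blob.bin"; then
        status=identical
    else
        status=different
    fi
    rm -rf "$work"
fi
echo "multi-volume round trip: $status"
```

With single-digit volume numbers the glob sorts correctly; with more than ten volumes, zero-padded names (00, 01, ...) keep the order right.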
MacMladen
  • 321
  • 2
    Note that (I'm pretty sure) the tar file won't be broken at file boundaries, meaning half a file can be in one tar archive, and the other half is in the next tar archive. At least, that seems to be the case from the errors I'm seeing when trying to extract a single tar archive ("Unexpected EOF in archive"). Just mentioning this to help others in my situation. Please correct me if I'm wrong. – joe Feb 23 '22 at 13:35
  • 4
    I would use --file=my_archive.tar.gz.{00..50} instead, for two reasons: First, placing the number at the end indicates that the file is just a part of a larger archive. Second, using fixed-width numbers will sort the files correctly when using cat to recombine the pieces. – Greg Barrett Apr 01 '22 at 17:10
  • 1
    @MacMladen When I try to use compression option I get an error that can't use compression. any idea on how to add compression? – Sruly Jul 03 '22 at 12:37
  • @Sruly the command typed exactly should not yield any error. Can you post the exact error? (copy/paste output in terminal) – MacMladen Aug 23 '22 at 13:43
  • 2
    @MacMladen This is the results. tar -cz --tape-length=209700 --file=my-ubuntu22-fs-{00..50}.tar.gz ubuntu22-fs tar: Cannot use multi-volume compressed archives Try 'tar --help' or 'tar --usage' for more information. – Sruly Aug 25 '22 at 22:36
  • The resulting archives are not self-contained in my case. tar -tvf ./my_archive-3.tar lists <somefile>--Continued at byte 148992-- at the top and tar: Unexpected EOF in archive at the bottom – vdi Feb 21 '23 at 08:28
  • 5
    how do you unpack the files afterwards? – banan3'14 Mar 05 '23 at 19:39
  • 1
    After much trial and error I found the best way to unpack the archive afterwards was: tarcat my_archive-*.tar | tar -xf - using the tarcat script from GNU.org. – PicoutputCls May 15 '23 at 15:46
21

On macOS, the split command works slightly differently:

$ tar cvzf - foo | split -b 2500m - foo.tgz.
5

Just to add: since the maximum allowed file size on vfat/FAT32 is 2^32 minus 1 (4294967295 bytes), the split command with the largest piece size that such a file system allows is:

split -b4294967295 -d my_input_file my_output_file_splitted
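
The arithmetic behind that limit can be checked in the shell. This is only an illustrative sketch, assuming a shell with 64-bit arithmetic; the variable names are made up.

```shell
#!/bin/sh
# Largest file size FAT32 can store: 2^32 - 1 bytes.
fat32_max=$(( (1 << 32) - 1 ))
# The same limit rounded down to whole MiB, for use with split -b <n>m.
whole_mib=$(( fat32_max / 1024 / 1024 ))
echo "limit: $fat32_max bytes, or $whole_mib whole MiB"
```

So split -b 4095m is a rounder piece size that still fits under the limit.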
Aydin K.
  • 165
4
serega@serega-sv:~$ tar -c  -M --tape-length=1024 --file /tmp/pseudo-tape.tar --new-volume-script=/tmp/new-volume.sh --volno-file=/tmp/volno /tmp/stuff-to-archive 
tar: Removing leading `/' from member names
moving /tmp/pseudo-tape.tar to /tmp/archive.1
moving /tmp/pseudo-tape.tar to /tmp/archive.2
moving /tmp/pseudo-tape.tar to /tmp/archive.3

You'll need a script that automatically moves the pseudo-tape.tar file to a new name each time a volume fills up:

serega@serega-sv:~$ cat /tmp/new-volume.sh 
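#!/bin/sh
# Run by tar (via --new-volume-script) each time a volume fills up.
dir="/tmp"
base_name="pseudo-tape.tar"
# Read the volume number that tar records in the --volno-file.
next_volume_name="archive.$(cat "$dir/volno")"
echo "moving $dir/$base_name to $dir/$next_volume_name"
mv "$dir/$base_name" "$dir/$next_volume_name"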
  • I haven't downvoted your answer, because I am happy to see one that uses -M --tape-length. However, this answer does ignore the OP's request for a solution that uses gzip or bzip2 compression. –  Mar 15 '18 at 13:28
  • 2
    Caution: you are not talking about tar but rather about a tar clone called gtar (GNU tar). This tar clone supports creating multi-volume archives, but with noticeable probability it is unable or unwilling to extract from them, as it incorrectly claims that a follow-up volume is not the right continuation part. – schily Sep 01 '18 at 11:32
2

Just to throw in my own contribution, I wrote an app recently that splits up tarballs along file boundaries, which you may find useful:

https://github.com/dmuth/tarsplit

Douglas Muth
  • 121
  • 2
1

Instead of tar, I'd use 7zip or some other archiver that can natively split an archive on file boundaries.

With the split command you may have trouble recovering a faulty archive when just one part of the series gets damaged.

7z and some other archivers can additionally add a recovery record to the archives, or even have an option to add a recovery volume, which saves your day when you lose or damage an entire part.