I always assumed that tar was a compression utility, but I'm not sure. Does it actually compress files, or is it like an ISO file, just a file that holds other files?

TheDoctor

4 Answers

Tar is an archiving tool (Tape ARchive): it only collects files and their metadata together into one file. If you want to compress that file afterwards, you can use gzip, bzip2 or xz. For convenience, tar provides options that compress the archive for you automatically. Check out the tar man page for more details.
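
A minimal shell sketch of the point above, assuming GNU tar and gzip are on the PATH: tar copies a highly compressible 100,000-byte file into the archive unchanged, and only the separate gzip step shrinks it. The file names (data.txt, data.tar, data.tar.gz) are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: tar stores file contents as-is; compression is a separate step.
# Assumes GNU tar and gzip are installed.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

size() { wc -c < "$1" | tr -d ' '; }   # byte count without padding spaces

# A highly compressible 100,000-byte file: the letter 'a' repeated.
head -c 100000 /dev/zero | tr '\0' 'a' > data.txt

tar -cf data.tar data.txt         # archive only: contents copied verbatim
gzip -c data.tar > data.tar.gz    # compression happens here, as a second step

txt_size=$(size data.txt)
tar_size=$(size data.tar)
gz_size=$(size data.tar.gz)

echo "data.txt:    $txt_size bytes"
echo "data.tar:    $tar_size bytes"
echo "data.tar.gz: $gz_size bytes"
```

The tar file comes out slightly larger than the input (headers plus padding), while the gzipped copy is a tiny fraction of it.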

cjm
0xAF
  • A slight clarification on the answer: it is GNU tar that provides those extra compression arguments. For example, Solaris tar does not provide arguments for compression. – Tero Kilkanen Apr 29 '14 at 22:20
  • oooh, that's why I keep seeing thing.tar.7z – Mooing Duck Apr 30 '14 at 00:34
  • BSD tar provides an argument for compression as well, though it only accepts z and determines the compression method based on the extension, whereas GNU tar has separate zZjJ arguments for the different compression methods. – wingedsubmariner Apr 30 '14 at 00:59
  • @wingedsubmariner The BSD tar manpage doesn't say it supports -j, but it (at least on mac) does. – Kevin Apr 30 '14 at 01:13
  • @wingedsubmariner: I don't know if the BSD tar on Mac is modified by Apple or not, but it supports zZjJ as well. Even though the man page does not mention the -J flag, it actually accepts -J and outputs an xz file. – Siyuan Ren Apr 30 '14 at 02:58
  • Just read the BSD tar manpage, and it turns out I was mistaken: BSD tar uses separate zZjJ options for compression, just like GNU tar. However, it does detect the compression automatically when decompressing, whereas GNU tar expects zZjJ there too. – wingedsubmariner Apr 30 '14 at 03:10
  • @wingedsubmariner: no; modern-ish versions of GNU tar decompress automatically without requiring the -zZjJ options. – Jonathan Leffler Apr 30 '14 at 04:02
  • @JonathanLeffler: My tar does – Engineer2021 Apr 30 '14 at 16:03
  • @staticx: Which version of GNU tar are you running, and on which platform? – Jonathan Leffler Apr 30 '14 at 16:04
  • @JonathanLeffler: RHEL 5. tar (GNU tar) 1.23 Copyright (C) 2010 Free Software Foundation, Inc. – Engineer2021 Apr 30 '14 at 16:05
  • @JonathanLeffler: I did tar cvfz test.tar.gz test.c ; tar xvf test.tar.gz and got test.c back – Engineer2021 Apr 30 '14 at 16:07
  • @staticx: curious! GNU tar 1.26 on Ubuntu 12.04 doesn't, but I'm tolerably certain I'd have to go back further than 2010 to find a version that doesn't decompress at least some file types automatically. The gzip automatic decompression has been around a long time, AFAICR (meaning, mostly, I don't remember when it was added, but it was quite a long time ago). Periodically, new compression formats were released (.bz2, .lz, .xz, .7z) and for a while I needed to hold tar's hand with --use-compress-program=whatever as an option. The set of compression formats evolves, therefore. – Jonathan Leffler Apr 30 '14 at 16:11
  • @staticx: OK; that's consistent with 'decompresses automatically'. You do have to tell it which 'compress' to use (either by flag or possibly by file extension); that won't change. – Jonathan Leffler Apr 30 '14 at 16:12
  • @JonathanLeffler: Yes, sorry I may have misconstrued your sentence. I thought you were implying that you had to use xvfz when in fact it will detect the file extension and try that. – Engineer2021 Apr 30 '14 at 16:12
  • @JonathanLeffler: This also works: tar cvfz test.tar ; tar xvf test.tar. – Engineer2021 Apr 30 '14 at 16:19
  • @staticx: as a point of detail, it works by content rather than extension (or as well as extension). Try: tar -czf /tmp/junk.tar.bz2 *.*, then file /tmp/junk.tar.bz2, and tar -tvf /tmp/junk.tar.bz2. – Jonathan Leffler Apr 30 '14 at 16:19
  • @JonathanLeffler: Right, I figured there is a header that it reads to determine the type since relying on the .gz, .bz2, etc is unreliable. So it will decompress automatically – Engineer2021 Apr 30 '14 at 16:20
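
Jonathan Leffler's experiment can be reproduced roughly as follows. This is a sketch assuming a reasonably recent GNU tar (older releases may not auto-detect every format): the archive is gzip-compressed but deliberately given a .bz2 suffix, its first two bytes are still gzip's magic number 1f 8b, and tar -tf lists it without any -z or -j option because tar inspects the content when reading.

```shell
#!/bin/sh
# Sketch: content, not the suffix, tells tar how to read an archive.
# Assumes a reasonably recent GNU tar and gzip.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
printf 'int main(void) { return 0; }\n' > test.c

# Misleading name on purpose: -z forces gzip even though the suffix says bzip2.
tar -czf junk.tar.bz2 test.c

# gzip output begins with 1f 8b; bzip2 output would begin with "BZh" (42 5a 68).
magic=$(od -An -tx1 -N2 junk.tar.bz2 | tr -d ' \n')

# No -z or -j here: tar sniffs the compression format when reading.
listing=$(tar -tf junk.tar.bz2)

echo "magic: $magic"
echo "members: $listing"
```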

tar produces archives; compression is a separate function. However, tar alone can reduce space usage when applied to a large number of files that are smaller than the filesystem's cluster size. If a filesystem uses 1 KB clusters, even a file that contains a single byte will consume 1 KB (plus an inode). A tar archive does not pay this per-file cluster rounding; it has its own, finer granularity instead (a 512-byte header per file, with data padded to 512-byte blocks), so the saving grows with the cluster size.
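
A rough check of that arithmetic, as a sketch assuming GNU tar with its default 10,240-byte record size: archive 100 one-byte files and compare the result with what they would occupy if each took one 4 KiB filesystem block (a typical modern block size, assumed here rather than the 1 KB used in the answer).

```shell
#!/bin/sh
# Sketch: per-file overhead of a tar archive holding many tiny files.
# Assumes GNU tar with its default 10,240-byte record size.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Create 100 one-byte files named f001 .. f100.
i=1
while [ "$i" -le 100 ]; do
  printf 'x' > "$(printf 'f%03d' "$i")"
  i=$((i + 1))
done

tar -cf small.tar f*
tar_bytes=$(wc -c < small.tar | tr -d ' ')

# Each member: 512-byte header + 1 byte of data padded to a 512-byte block.
# The archive then ends with two zero blocks and is padded to a whole record.
echo "tar archive:            $tar_bytes bytes"
# Hypothetical comparison: each file occupying one 4 KiB filesystem block.
echo "100 files x 4 KiB each: $((100 * 4096)) bytes"
```

With GNU tar's traditional gnu format the archive should be 112,640 bytes: 100 × 1,024 = 102,400, plus 1,024 bytes of end-of-archive blocks, rounded up to eleven 10,240-byte records. Formats that add an extended header per file (such as pax) come out larger, but still well below the 409,600-byte figure.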

BTW, an ISO file is not really "a file to hold files" - it's actually an image of an entire filesystem (one originally designed to be used on CDs) and thus its structure is considerably more complex.

  • Actually an empty file will not consume 1kb. A 1-1023 byte file will. – psusi Apr 30 '14 at 03:28
  • @psusi So a file of 1 to 1023 bytes always consumes 1024 bytes, wasting between 1 and 1023 bytes. – Shiplu Mokaddim May 14 '19 at 13:36
  • tar has significant alignment / block size overhead, due to its origin as a Tape Archiver. If a is an empty file, tar -cf a.tar a will create a 10240-byte file a.tar. You can use a hex editor or od to verify that most of the file is NUL (zero) bytes. – Clement Cherlin Sep 12 '22 at 15:59
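
Clement Cherlin's observation can be checked with a short sketch, assuming GNU tar with its default blocking factor of 20 (20 × 512 = 10,240-byte records). Instead of reading the bytes in a hex editor or od, it deletes every NUL byte and counts what remains.

```shell
#!/bin/sh
# Sketch: archive an empty file and see how much of the result is padding.
# Assumes GNU tar with the default blocking factor of 20 (20 x 512 = 10,240 bytes).
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

: > a                 # create the empty file "a"
tar -cf a.tar a

size=$(wc -c < a.tar | tr -d ' ')
# Delete every NUL byte and count what is left: essentially the member header.
nonzero=$(tr -d '\000' < a.tar | wc -c | tr -d ' ')

echo "a.tar: $size bytes, of which $nonzero are not NUL"
```

The archive should be exactly 10,240 bytes, with only a few hundred of them non-NUL.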

The original UNIX tar command did not compress archives. As was mentioned in a comment, Solaris tar doesn't compress. Nor does HP-UX tar, nor AIX tar, FWIW. By convention, uncompressed archives end in .tar.

On GNU/Linux you get GNU tar (you can also install GNU tar on other UNIX systems). By default it does not compress; however, if you supply -z it compresses the resulting archive with gzip (also a GNU tool). The conventional suffix for gzipped files is .gz, so you'll often see tarballs (slang for a tar archive, usually implying it has been compressed) that end in .tar.gz. That ending implies tar was run and its output then passed through gzip, e.g. tar cf - . | gzip -9v > archive.tar.gz. You'll also find archives ending in .tgz; tar czf archive.tgz . creates one from the current directory.
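
The two spellings can be compared directly in a sketch assuming GNU tar and gzip (the -v from the pipeline is dropped to keep the output quiet). Both archives are written to the parent directory so that tar never tries to include its own output, and both should list the same members.

```shell
#!/bin/sh
# Sketch: the two-step pipeline and tar's -z produce equivalent archives.
# Assumes GNU tar and gzip.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/src"
printf 'one\n' > "$work/src/one.txt"
printf 'two\n' > "$work/src/two.txt"
cd "$work/src"

# Output goes to the parent directory so tar never archives its own output.
tar cf - . | gzip -9 > ../archive.tar.gz   # tar to stdout, gzip compresses
tar czf ../archive.tgz .                   # tar runs the gzip step itself

cd "$work"
list1=$(tar tzf archive.tar.gz | sort)
list2=$(tar tzf archive.tgz | sort)
printf '%s\n' "$list1"
```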

Edit: www.linfo.org/tar.html reminded me that GNU tar can do more than compress with gzip, and that the suffixes are more than plain conventions: they have built-in semantics. GNU tar also supports bzip2 (-j, for .bz2) and the old compress (-Z, for .Z). Then I looked at the man page and was reminded that -a (--auto-compress) picks the compression method from the archive's suffix.
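
A short sketch of -a, assuming GNU tar and gzip: the same command produces an uncompressed archive for a .tar name and a gzip-compressed one for .tgz. The checks read magic bytes: an uncompressed tar header carries ustar at byte offset 257, while gzip output starts with 1f 8b. Suffixes such as .tar.bz2 or .tar.xz should select bzip2 or xz the same way, provided those programs are installed.

```shell
#!/bin/sh
# Sketch: -a (--auto-compress) chooses the compressor from the archive suffix.
# Assumes GNU tar and gzip.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
printf 'hello\n' > hello.txt

tar -caf plain.tar hello.txt     # .tar  -> no compression
tar -caf packed.tgz hello.txt    # .tgz  -> gzip

# An uncompressed tar header has "ustar" at byte offset 257;
# a gzip stream starts with the bytes 1f 8b.
plain_magic=$(dd if=plain.tar bs=1 skip=257 count=5 2>/dev/null)
gz_magic=$(od -An -tx1 -N2 packed.tgz | tr -d ' \n')

echo "plain.tar  -> $plain_magic"
echo "packed.tgz -> $gz_magic"
```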

One other nit: as the Linux tar man page says, GNU documents its tools primarily in info pages rather than man pages, so to learn all about GNU tar, run info tar.

tbc0
  • GNU tar still doesn't handle compression by itself; it just pipes to/from gzip, bzip2, compress and others. – ott-- Aug 06 '15 at 20:03
  • I had a look at the source. GNU tar handles compression! The implementation takes advantage of code reuse and sound UNIX user space architectural principles. "Just pipes" is understating the way compression is tightly integrated into the tool. The fact that it happens to fork helper programs is a technicality. If you want to defend "just pipes," then cite file names and line numbers and let's see which side the community takes. – tbc0 Aug 06 '15 at 21:15
  • It will be a few days before I can check that source. – ott-- Aug 06 '15 at 21:24
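
Whichever way one labels it, the delegation is easy to observe with --use-compress-program, the option Jonathan Leffler mentions in an earlier comment. This is a sketch assuming GNU tar and gzip; mygzip is a hypothetical wrapper that logs each call and then runs gzip. GNU tar should start it once with no arguments to compress while creating, and once with -d to decompress while extracting.

```shell
#!/bin/sh
# Sketch: observe tar delegating compression to an external program.
# Assumes GNU tar and gzip; "mygzip" is a hypothetical wrapper written below.
set -eu
work=$(mktemp -d)   # assumed to contain no spaces (tar may run the program via sh -c)
trap 'rm -rf "$work"' EXIT
cd "$work"
printf 'hello\n' > hello.txt

# Wrapper: log the arguments tar passes, then hand over to gzip unchanged.
cat > mygzip <<'EOF'
#!/bin/sh
echo "called with: $*" >> "$(dirname "$0")/calls.log"
exec gzip "$@"
EOF
chmod +x mygzip

# Create (tar pipes the archive through mygzip) ...
tar --use-compress-program="$work/mygzip" -cf hello.tar.gz hello.txt
# ... then extract one member to stdout (tar runs "mygzip -d").
out=$(tar --use-compress-program="$work/mygzip" -xOf hello.tar.gz hello.txt)

echo "extracted: $out"
cat calls.log
```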

The tar utility does not compress unless you pass an argument telling it to, such as -z (for example, tar -czf archive.tar.gz file).

A J