64

I've always used GNU tar. However, all GNU/Linux distributions that I've seen ship bsdtar in their repositories. I've even seen it installed by default in some, IIRC. I know for sure that Arch GNU/Linux requires it as a part of base-devel (maybe base, but I'm not sure), as I've seen it in PKGBUILDs.

Why would you want to use bsdtar instead of GNU tar? What are the advantages?

Note that I am the person who asked What are the main differences between BSD and GNU/Linux userland?.

strugee
  • 14,951

6 Answers

40

The Ubuntu bsdtar is actually the tar implementation bundled with libarchive, and that should be differentiated from classical bsdtar. Some BSD variants do use libarchive for their tar implementation, e.g. FreeBSD.

GNUtar does support the other tar variants and automatic compression detection.

As visualication pasted the blurb from Ubuntu, there are a few things in there that are specific to libarchive:

  1. libarchive is by definition a library, and different from both classical bsdtar and GNUtar in that way.
  2. libarchive cannot read some older obscure GNU tar variations, most notable was encoding of some headers in base64, so that the tar file would be 7-bit clean ASCII (this was the case for 1.13.6-1.13.11 and changed in 1.13.12, that code was only officially in tar for 2 weeks)
  3. libarchive's bsdtar will read non-tar files (eg zip, iso9660, cpio), but classical bsdtar will not.

Now that we've gotten libarchive out of the way, it mostly comes down to what is supported in classical bsdtar.

You can see the manpages yourself here:

In your original question, you asked what the advantages of classical bsdtar are, and I'm not sure there really are any. The only time it really matters is if you're trying to write shell scripts that need to work on all systems; you need to make sure that what you pass to tar is actually valid in all variants.
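For the portable-script case, a rough sketch of how a script might check which tar it is talking to before picking options. The helper name tar_flavor is made up for this example, and the matched strings are assumptions about what each implementation typically prints for --version, so treat them as a starting point rather than a complete list:

```shell
# tar_flavor is a hypothetical helper: it classifies a tar implementation
# from the first line of its --version output. The patterns are assumptions
# about typical version banners, not an exhaustive list.
tar_flavor() {
    case "$1" in
        *"GNU tar"*)          echo gnu ;;
        *libarchive*)         echo libarchive ;;
        *busybox*|*BusyBox*)  echo busybox ;;
        *)                    echo unknown ;;
    esac
}

# Ask whichever "tar" comes first in PATH. If the option is not understood,
# the line stays empty and the flavor is reported as unknown.
version_line=$(tar --version 2>/dev/null | head -n 1)
echo "tar flavor: $(tar_flavor "$version_line")"
```

A script could then branch on the result and only pass options that the detected variant is known to accept.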

GNUtar, libarchive's bsdtar, classical bsdtar, star and BusyBox's tar are certainly the tar implementations that you'll run into most of the time, but I'm certain there are others out there (early QNX for example). libarchive/GNUtar/star are the most feature-packed, but in many ways they have long deviated from the original standards (possibly for the better).

Kusalananda
  • 333,661
robbat2
  • 3,639
26

BSDTAR vs TAR plus much more

Here is one benefit!!

I'm going to go into 5 topics here (and go way off topic, but it will cover what you want as well):

  1. bsdtar vs tar
  2. sparse files vs not
  3. thick and thin files/luns with btrfs
  4. thick and thin files/luns without btrfs
  5. diff between thick and thin and how it doesn't apply to just luns

bsdtar handles sparse files better than regular tar

  • bsdtar will take all of the zeros and just metadata them up
  • tar will actually process every zero

*example: imagine a 20 tb sparse file (called biglun) with 10 megs of data throughout the 20 tb sparsefile (biglun)... now since this is a sparse file it will only take up 10 megs on the drive.

How to make a sparse file:

Sparse File - how to make it - detect it - everything

Sparse files are like "thin" luns (if you were to use one for a lun). "Thick" luns would be a different story.
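A minimal sketch of making and detecting a sparse file, assuming GNU coreutils (truncate and stat -c) and a filesystem that supports holes. The file name biglun is reused from the example above, and the size is scaled down to 100 MiB so it is quick to try:

```shell
workdir=$(mktemp -d)
sparse="$workdir/biglun"

# Create a 100 MiB file without writing any data blocks.
truncate -s 100M "$sparse"

# Apparent size in bytes (what ls -l reports).
apparent=$(stat -c %s "$sparse")
# Space actually allocated: number of blocks times the size of one block unit.
allocated=$(( $(stat -c %b "$sparse") * $(stat -c %B "$sparse") ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

# A file whose allocated space is smaller than its apparent size has holes.
if [ "$allocated" -lt "$apparent" ]; then
    echo "sparse"
else
    echo "not sparse"
fi

rm -rf "$workdir"
```

Comparing du --apparent-size with plain du on the same file gives the same information in a more human-readable form.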

*back to topic:

  • tarring up the biglun will make tar go through all of the 10 megs along with all of the ~20tb worth of zeroes spread across the lun... it will take some time I presume, and the tar file will be pretty big. Also -- extracting it -- I've never done an extract of a tar file of a sparse file, but it might not be pretty; I might be wrong here.

  • bsdtarring the biglun will just process the 10 megs of data, and make small metadata for the ~20tb of zeros.
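To see the archive-size difference for yourself, here is a small sketch. It assumes that tar is GNU tar, where -S (--sparse) asks for holes to be recorded instead of stored, and it only runs bsdtar if it happens to be installed. File names and the 64 MiB size are just for illustration:

```shell
workdir=$(mktemp -d)
truncate -s 64M "$workdir/biglun"   # scaled-down stand-in for the 20 tb lun

# Without -S every zero byte is stored; with -S the holes are recorded.
tar -C "$workdir" -cf  "$workdir/plain.tar"  biglun
tar -C "$workdir" -cSf "$workdir/sparse.tar" biglun

plain_size=$(stat -c %s "$workdir/plain.tar")
sparse_size=$(stat -c %s "$workdir/sparse.tar")
echo "tar:    $plain_size bytes"
echo "tar -S: $sparse_size bytes"

# Optional comparison, only when bsdtar is available on this machine.
if command -v bsdtar >/dev/null 2>&1; then
    bsdtar -C "$workdir" -cf "$workdir/bsd.tar" biglun
    echo "bsdtar: $(stat -c %s "$workdir/bsd.tar") bytes"
fi

rm -rf "$workdir"
```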

Benefit? Well lots of them; I just wrote some above.

It's similar to rsync vs cp

  • Also, if you rsync a giant sparse file, it will behave like tar
  • If you cp a giant file, it will behave automatically like bsdtar (you can change cp's behaviour to go over the zeroes, or not go over the zeroes)
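The cp behaviour mentioned above is controlled by --sparse in GNU cp. A quick sketch, assuming GNU coreutils; allocated_bytes is a helper made up for this example, and the rsync line only runs if rsync is installed:

```shell
workdir=$(mktemp -d)

# Hypothetical helper: space actually allocated to a file, in bytes.
allocated_bytes() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

truncate -s 64M "$workdir/biglun"

# Keep the holes in the copy.
cp --sparse=always "$workdir/biglun" "$workdir/copy-thin"
# Write out every zero in the copy.
cp --sparse=never  "$workdir/biglun" "$workdir/copy-thick"

thin_apparent=$(stat -c %s "$workdir/copy-thin")
thin_alloc=$(allocated_bytes "$workdir/copy-thin")
thick_apparent=$(stat -c %s "$workdir/copy-thick")
echo "copy-thin:  $thin_alloc bytes allocated"
echo "copy-thick: $(allocated_bytes "$workdir/copy-thick") bytes allocated"

# rsync needs to be told explicitly to recreate holes.
if command -v rsync >/dev/null 2>&1; then
    rsync --sparse "$workdir/biglun" "$workdir/copy-rsync"
    echo "copy-rsync: $(allocated_bytes "$workdir/copy-rsync") bytes allocated"
fi

rm -rf "$workdir"
```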

Personally, I like to imagine sparse files like thin luns, and regular files like thick luns...

Next topic is BTRFS thin vs thick luns:

  • With filesystems like BTRFS, thin luns are sparse files (make it with truncate, like in the wiki doc).

     truncate -s <size in bytes, or with a K/M/G suffix> filename
    

    tip: backup with bsdtar, copy with cp

  • thick luns are regular files with the +C attribute (+C makes the file non-COW, i.e. no copy-on-write, so that writes stay where the file was originally allocated and overwrites are done in place rather than written to new locations - research COW and BTRFS). Instead of making the file with truncate, make it with fallocate -l

    # set +C while the file is still empty, then preallocate
    touch filename
    chattr +C filename
    fallocate -l <size in bytes, or with a K/M/G suffix> filename
    

    tip: backup with bsdtar or tar, copy with rsync or cp

next topic is EXT thin vs thick luns:

  • thin luns which are sparse

    truncate -s <size in bytes, or with a K/M/G suffix> filename
    

    tip: backup with bsdtar, copy with cp

  • thick luns are regular files that are fully preallocated. The +C (non-COW) attribute described for BTRFS is not needed here, because ext filesystems are not copy-on-write. Instead of making the file with truncate, make it with fallocate -l

    touch filename
    fallocate -l <size in bytes, or with a K/M/G suffix> filename
    

    tip: backup with bsdtar or tar, copy with rsync or cp

What's a thick vs thin file

  • thin luns/files: start out using 0 space and grow up to the size allotted; metadata pretends where the 0s are. As you write data, the space used fills up
  • thick luns/files: fill up their data at the start with 0s or whatever (lazy zero or eager zero) - these set reservations (or, as ZFS likes to call them, refreservations)
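The difference can be seen side by side with a small sketch, assuming GNU coreutils and util-linux fallocate. The file names thin.img and thick.img and the helper allocated_bytes are made up for the example, and fallocate is allowed to fail because not every filesystem supports preallocation:

```shell
workdir=$(mktemp -d)

# Hypothetical helper: space actually allocated to a file, in bytes.
allocated_bytes() {
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

# Thin: only the size is recorded, no blocks are reserved.
truncate -s 16M "$workdir/thin.img"
thin_alloc=$(allocated_bytes "$workdir/thin.img")

# Thick: ask the filesystem to reserve the blocks up front.
if fallocate -l 16M "$workdir/thick.img" 2>/dev/null; then
    thick_alloc=$(allocated_bytes "$workdir/thick.img")
else
    thick_alloc="unsupported on this filesystem"
fi

echo "thin.img:  $thin_alloc"
echo "thick.img: $thick_alloc"

rm -rf "$workdir"
```

Both files report the same size in ls -l; only the allocated space differs.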

VMWARE ARTICLE HERE describes lazy vs eager zero with thick luns/files: https://communities.vmware.com/message/2199576

tip

remember thick and thin doesn't just apply to luns, it can also be on files, zfs filesystems (shares/volumes/luns), and I'm sure other things (just look at zfs).

Jeff Schaller
  • 67,283
  • 35
  • 116
  • 255
kobbsoss
  • 369
  • 2
    Nice and thorough. Welcome to the site... – eyoung100 Nov 10 '14 at 21:31
  • 2
  • Sparse with any tar: Just pass -S to most tar implementations, they've all supported it for a long time.
  • Sparse with rsync: again, pass --sparse, it works.
  • The downside to using any sparse detection is that the tool has to actually read the blocks more, which can introduce a lot of CPU (esp in cases of alternating zero/non-zero runs).

    – robbat2 Jun 08 '15 at 23:11
  • It's still better to use bsdtar, even though GNU tar supports the sparse flag, because bsdtar knows how to skip over sparse holes without processing them (e.g. if you have a 1 TB sparse file with only 1k of data, bsdtar will process 1k of data; GNU tar will process 1 TB). – moveaway00 Sep 07 '15 at 17:24
  • @moveaway00 if bsdtar only processes 1k in that case, then it copies the ideas from star, as star is the first implementation that uses SEEK_DATA/SEEK_HOLE, since that method was invented as a common idea from star and the ZFS guys. – schily Aug 18 '21 at 06:28
  • I learned quite a bit of workflow-altering information reading this, thank you. – Matt Alexander Nov 06 '21 at 17:49