What are the differences between bsdtar and GNU tar?

Question

I've always used GNU tar. However, all GNU/Linux distributions that I've seen ship bsdtar in their repositories. I've even seen it installed by default in some, IIRC. I know for sure that Arch GNU/Linux requires it as a part of base-devel (maybe base, but I'm not sure), as I've seen it in PKGBUILDs.

Why would you want to use bsdtar instead of GNU tar? What are the advantages?

Note that I am the person who asked "What are the main differences between BSD and GNU/Linux userland?".

This answer to Why is extracting this tgz throwing an error on my Mac but not on Linux? on Apple.SE is also relevant to this question. — Adam Liter, Jul 29 '15 at 04:26
GNU tar options include --sort, --mtime, and so on, whereas bsdtar does not appear to support those options. This is important e.g. in the context of reproducible builds. — djvg, Nov 16 '23 at 10:37
Also see https://www.gnu.org/software/tar/manual/tar.html#Reproducibility — djvg, Nov 16 '23 at 11:27
On Windows 10, BSD tar 3.5.2 does not properly handle paths with non-ascii characters (regardless of codepage settings and pax format), whereas GNU tar 1.35 does. E.g. an archive created with tar -cf Ł.tar ends up as L.tar. Reproduced in cmd, powershell, and (git-)bash. On Ubuntu (bash) I do not see this issue (gnu tar 1.34 and bsdtar 3.6.0). — djvg, Mar 15 '24 at 10:25

score 40 · Accepted Answer · edited Jun 27 '18 at 20:13

The Ubuntu bsdtar is actually the tar implementation bundled with libarchive, and that should be differentiated from classical bsdtar. Some BSD variants do use libarchive for their tar implementation, e.g. FreeBSD.

GNUtar does support the other tar variants and automatic compression detection.

As visualication pasted the blurb from Ubuntu, there are a few things in there that are specific to libarchive:

libarchive is by definition a library, and different from both classical bsdtar and GNUtar in that way.
libarchive cannot read some older, obscure GNU tar variations. The most notable is the encoding of some headers in base64 so that the tar file would be 7-bit clean ASCII. This was the case for 1.13.6 through 1.13.11 and changed in 1.13.12; that code was officially in tar for only 2 weeks.
libarchive's bsdtar will read non-tar files (e.g. zip, iso9660, cpio), but classical bsdtar will not.

Now that we've gotten libarchive out of the way, it mostly comes down to what is supported in classical bsdtar.

You can see the manpages yourself here:

GNU tar(1)
FreeBSD tar(1) - libarchive-based
NetBSD tar(1)
OpenBSD tar(1)
Standard/Schily tar(1) - the oldest free tar implementation, not derived from any other implementation
busybox(1) - mini tar implementation for BusyBox, common in embedded systems

In your original question, you asked what the advantages of classical bsdtar are, and I'm not sure there really are any. The only time it really matters is if you're trying to write shell scripts that need to work on all systems; you need to make sure that what you pass to tar is actually valid in all variants.

GNUtar, libarchive's bsdtar, classical bsdtar, star and BusyBox's tar are certainly the tar implementations that you'll run into most of the time, but I'm certain there are others out there (early QNX for example). libarchive/GNUtar/star are the most feature-packed, but in many ways they have long deviated from the original standards (possibly for the better).
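
For scripts that have to run against several of these implementations, one option is to look at the banner printed by tar --version before choosing flags. The sketch below is only illustrative: the function names tar_flavor and detect_tar are made up for this example, and the banner substrings it matches are assumptions based on commonly seen output, so check them on the systems you care about. Implementations that reject --version (or print something else) simply come back as unknown.

```shell
# Classify a "tar --version" banner line (hypothetical helper).
# The substrings matched here are assumptions about typical banners.
tar_flavor() {
    case $1 in
        *"GNU tar"*) echo gnu ;;
        *bsdtar*)    echo libarchive ;;
        *busybox*)   echo busybox ;;
        *)           echo unknown ;;
    esac
}

# Ask whatever "tar" is first in PATH and classify its first output line.
detect_tar() {
    tar_flavor "$(tar --version 2>&1 | head -n 1)"
}

echo "tar on this system looks like: $(detect_tar)"
```

A script could then branch on the result and stick to the plain c/x/t/f options whenever the answer is unknown.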

score 26 · Answer 2 · edited Feb 24 '18 at 00:23

BSDTAR vs TAR plus much more

Here is one benefit!!

I'm going to go into 5 topics here (and go way off topic, but it will cover what you want as well):

bsdtar vs tar
sparse files vs not
thick and thin files/luns with btrfs
thick and thin files/luns without btrfs
diff between thick and thin and how it doesn't apply to just luns

bsdtar handles sparse files better than regular tar

bsdtar will take all of the zeros and just record them as metadata
tar will actually process every zero

*example: imagine a 20 TB sparse file (called biglun) with 10 megs of data spread throughout it. Since this is a sparse file, it will only take up 10 megs on the drive.

How to make a sparse file:

Sparse File - how to make it - detect it - everything

Sparse files are like "thin" luns (if you were to use one for a lun). "thick" luns would be a different story.
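
Here is a scaled-down sketch of the biglun idea: a 100 MiB sparse file with a few bytes of real data in the middle. It assumes truncate and dd are available and that the filesystem supports holes; apparent_bytes and allocated_kib are helper names invented for this example. Comparing the two numbers is a simple way to detect that a file is sparse.

```shell
# Size the file claims to have, in bytes.
apparent_bytes() { wc -c < "$1" | tr -d ' '; }

# Space actually allocated on disk, in KiB.
allocated_kib() { du -k "$1" | cut -f1; }

workdir=$(mktemp -d)

# Create a 100 MiB file that consists entirely of a hole.
truncate -s 100M "$workdir/biglun"

# Write a few bytes of real data at offset 50,000,000 without truncating.
printf 'some data' | dd of="$workdir/biglun" bs=1 seek=50000000 conv=notrunc 2>/dev/null

echo "apparent size: $(apparent_bytes "$workdir/biglun") bytes"
echo "on disk:       $(allocated_kib "$workdir/biglun") KiB"

rm -rf "$workdir"
```

On a filesystem with sparse-file support, the on-disk figure should be far smaller than the apparent size.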

*back to topic:

tarring up the biglun will make tar go through all 10 megs of data along with the ~20 TB worth of zeroes spread across the lun. I presume it will take some time, and the tar file will be pretty big. As for extracting it: I've never extracted a tar file of a sparse file, but it might not be pretty; I might be wrong here.
bsdtarring the biglun will just process the 10 megs of data, and make a small piece of metadata for the ~20 TB of zeros.
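
To see the effect on archive size, here is a small sketch using GNU tar, which (as a comment further down points out) has a sparse flag of its own. It assumes that tar on the PATH is GNU tar and that the filesystem supports holes; compare_sparse_archives is a name made up for this example, and the 64 MiB size is just a stand-in for the 20 TB lun.

```shell
# Prints "<plain archive bytes> <sparse archive bytes>" for a 64 MiB
# sparse file that holds only 7 bytes of real data.
compare_sparse_archives() {
    dir=$(mktemp -d) || return 1
    truncate -s 64M "$dir/biglun"
    printf 'payload' | dd of="$dir/biglun" bs=1 seek=1000000 conv=notrunc 2>/dev/null

    # Without --sparse, every zero in the file is stored in the archive.
    tar -cf "$dir/plain.tar" -C "$dir" biglun
    # With --sparse, GNU tar records the holes instead of storing zeros.
    tar --sparse -cf "$dir/sparse.tar" -C "$dir" biglun

    plain=$(wc -c < "$dir/plain.tar")
    sparse=$(wc -c < "$dir/sparse.tar")
    echo "$((plain)) $((sparse))"
    rm -rf "$dir"
}

compare_sparse_archives
```

The first number should be a bit over 64 MiB and the second only a few KiB.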

Benefit? Well lots of them; I just wrote some above.

It's similar to rsync vs cp

Also, if you rsync a giant sparse file, it will behave like tar
If you cp a giant file, it will behave automatically like bsdtar (you can change cp's behaviour to go over the zeroes, or not go over the zeroes)

Personally, I like to imagine sparse files like thin luns, and regular files like thick luns...

Next topic is BTRFS thin vs thick luns:

With filesystems like BTRFS, thin luns are sparse files (make it with truncate, like in the wiki doc).
```
truncate -s <size> filename   # size in bytes; suffixes such as K, M, G are accepted
```
tip: backup with bsdtar, copy with cp
thick luns are regular files with the +C attribute (+C makes the file non-COW, i.e. no copy-on-write, so that overwrites land in the blocks already allocated to the file instead of being written to new locations; research COW and BTRFS). Instead of making the file with truncate, make it with fallocate -l
```
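touch filename                 # create the file empty first
chattr +C filename             # on btrfs, +C should be set while the file is still empty
fallocate -l <size> filename   # size in bytes; suffixes such as K, M, G are accepted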
```
tip: backup with bsdtar or tar, copy with rsync or cp

next topic is EXT thin vs thick luns:

thin luns which are sparse
```
truncate -s <size> filename   # size in bytes; suffixes such as K, M, G are accepted
```
tip: backup with bsdtar, copy with cp
thick luns are regular, fully allocated files. ext filesystems are not copy-on-write, so the +C attribute used for BTRFS above is not needed. Instead of making the file with truncate, make it with fallocate -l
```
touch filename
fallocate -l <size> filename   # size in bytes; suffixes such as K, M, G are accepted
```
tip: backup with bsdtar or tar, copy with rsync or cp

what's a thick vs thin file

thin luns/files: their data grows from 0 up to the size allotted, and metadata stands in for the zeros. as you write data, real blocks get filled in
thick luns/files: fill up their data at the start with 0s or whatever (lazy zero or eager zero) - these set reservations (or, as ZFS likes to call them, refreservations)
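
The difference shows up directly in how much space each kind of file occupies right after creation. This is a rough sketch, assuming util-linux fallocate and a filesystem that supports preallocation; make_thin, make_thick and allocated_kib are names invented for the example.

```shell
# Thin: only the size is recorded, no blocks are allocated yet.
make_thin()  { truncate -s "$2" "$1"; }

# Thick: blocks for the whole length are reserved up front.
make_thick() { fallocate -l "$2" "$1"; }

# Space actually allocated on disk, in KiB.
allocated_kib() { du -k "$1" | cut -f1; }

dir=$(mktemp -d)
make_thin  "$dir/thin.img"  8M
make_thick "$dir/thick.img" 8M

echo "thin:  $(allocated_kib "$dir/thin.img") KiB allocated"
echo "thick: $(allocated_kib "$dir/thick.img") KiB allocated"
rm -rf "$dir"
```

Both files report the same length to ls -l; only the allocated figure differs.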

VMWARE ARTICLE HERE describes lazy vs eager zero with thick luns/files: https://communities.vmware.com/message/2199576

tip

remember thick and thin doesn't just apply to luns, it can also be on files, zfs filesystems (shares/volumes/luns), and I'm sure other things (just look at zfs).

It's still better to use bsdtar, even though gnu tar supports the sparse flag, because bsdtar knows how to skip over sparse holes, without processing them (e.g. if you have a 1 TB sparse file with only 1k of data, bsdtar will process 1k of data. Gnu tar will process 1TB. — moveaway00, Sep 07 '15 at 17:24
@moveaway00 if bsdtar only processes 1k in that case, then it copies the ideas from star, as star is the first implementation that uses SEEK_DATA/SEEK_HOLE, since that method was invented as a common idea from star and the ZFS guys. — schily, Aug 18 '21 at 06:28
I learned quite a bit of workflow-altering information reading this, thank you. — Matt Alexander, Nov 06 '21 at 17:49

score 14 · Answer 3 · edited Jun 01 '20 at 06:00

From the Ubuntu package description:

The bsdtar program has a number of advantages over previous tar implementations:

Library. Since the core functionality is in a library, it can be used by other tools, such as pkg_add.

Automatic format detection. Libarchive automatically detects the compression (none/gzip/bzip2) and format (old tar, ustar, gnutar, pax, cpio, iso9660, zip) when reading archives. It does this for any data source.

Pax Interchange Format Support. This is a POSIX/SUSv3 extension to the old "ustar" tar format that adds arbitrary extended attributes to each entry. Does everything that GNU tar format does, only better.

Handles file flags, ACLs, arbitrary pathnames, etc. Pax interchange format supports key/value attributes using an easily-extensible technique. Arbitrary pathnames, group names, user names, file sizes are part of the POSIX standard; libarchive extends this with support for file flags, ACLs, and arbitrary device numbers.

GNU tar support. Libarchive reads most GNU tar archives. If there is demand, this can be improved further.

score 2 · Answer 4 · answered Jan 05 '14 at 10:00

The following is based on reading, not experience -- I am just starting out with FreeBSD so I have almost no real experience with it (I'm coming from mostly Linux). I apologize (and humbly solicit correction) if I've missed something important and what I say here is rubbish ...

From my reading of the manual pages (most recently the one ref'd above http://www.freebsd.org/cgi/man.cgi?query=tar&sektion=1 ) the FreeBSD tar lacks the (-d, --diff, --compare) capability. This is not surprising, as the authors of FreeBSD dump/restore don't seem to have provided anything like this either.

I do not know for certain whether GNU tar will incorporate all the UFS metadata as FreeBSD tar is said to do, and this is an important issue. But for my taste, I can NEVER consider a dump to be completed until I have stored an MD5 sum of the output file, AND THEN compared the dump file against the data I've just supposedly dumped. Various problems can lead to the dumped data being different from what is on disk. (Not just file changes, but disk errors, memory errors, machine faults, and so on. All of which have actually happened to me.)

In my own opinion, this makes GNU tar the only option I've so far found for creating true backups on a stock FreeBSD system.

I would dearly love to learn otherwise, FWIW. I'd prefer to use the native utilities at least for partition cloning and hard-recovery backups. But if one can't verify the correctness of a dump I don't see the point in bothering to create one.
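
The dump-then-verify routine described above might look roughly like this with GNU tar. It is a sketch under assumptions: tar is GNU tar (for -d/--compare), md5sum comes from GNU coreutils (stock FreeBSD ships md5 instead), and backup_and_verify is a hypothetical helper name. The archive should live outside the directory being backed up.

```shell
# backup_and_verify SRC_DIR ARCHIVE
# Creates ARCHIVE from SRC_DIR, stores an MD5 sum next to it, then
# compares the archive contents against what is on disk.
backup_and_verify() {
    src_dir=$1
    archive=$2

    tar -cf "$archive" -C "$src_dir" . || return 1
    md5sum "$archive" > "$archive.md5" || return 1

    # Exits non-zero if any archived file differs from the file on disk.
    tar -df "$archive" -C "$src_dir"
}
```

Usage would be something like backup_and_verify /home/me /mnt/backup/home.tar, with both paths being placeholders.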

score 1 · Answer 5 · answered Nov 10 '17 at 02:03


bsdtar can read members from other archives and add them to the archive it is writing, using the @archive syntax
GNU tar has the --delete option, though I recently found that it may corrupt the archive.
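
A sketch of the @archive syntax, assuming libarchive's bsdtar is installed under the name bsdtar. The helper name merge_two_archives and the file names in the usage line are hypothetical.

```shell
# merge_two_archives OUT FIRST SECOND
# Writes OUT containing the members of FIRST followed by those of SECOND.
merge_two_archives() {
    if ! command -v bsdtar >/dev/null 2>&1; then
        echo "bsdtar not found" >&2
        return 127
    fi
    # Each @file argument tells bsdtar to copy entries from that archive.
    bsdtar -cf "$1" "@$2" "@$3"
}
```

For example, merge_two_archives merged.tar a.tar b.tar. Plain file names can be mixed in with @archive arguments on the same command line.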

answered Nov 10 '17 at 02:03 by bart

I'm using the --delete option with GNU tar often. Could you provide an example where it would corrupt the archive so that I can prevent this from happening? – josch Jan 13 '20 at 20:38

score 0 · Answer 6 · edited Feb 25 '21 at 15:21


From experience (albeit on a Mac, which is a Unix-based system), bsdtar can automatically detect the compression type of an archive read from standard input (when used as bsdtar xf -), while tar/gtar in that case requires the user to specify the type of compression present in the file.

answered Feb 25 '21 at 15:07 by Matthew Barclay · edited Feb 25 '21 at 15:21 by AdminBee
