0

I need to extract a specific folder from a .tar.bz2 archive (34 GB). The problem is that it takes 1 hour. I guess this is due to the compression, and that extracting a specific folder from an uncompressed archive would be faster.

Hence, is it possible to obtain a .tar from a .tar.bz2?

pmor

3 Answers

5

If your question is whether it is possible to decompress only the part of the archive that holds the specific folder: bzip2 doesn't index its compressed data, so there is no way to jump directly to a specific byte of the decompressed data without processing what comes before it. tar is also a sequential format without a central index.


If, however, your question is whether you can amortise the work beforehand by decompressing once and extracting a single folder multiple times, then yes, that is possible using the bzip2 command:

bzip2 -d foo.tar.bz2

This decompresses the archive to foo.tar. By default bzip2 removes foo.tar.bz2 afterwards; pass -k if you want to keep it.
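A minimal sketch of that "decompress once, extract many times" workflow, run against a tiny throwaway archive built in a temporary directory. The names demo.tar.bz2, docs and data are made up for illustration; substitute your own archive and folder paths.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Build a small stand-in archive with two folders (hypothetical names).
mkdir -p src/docs src/data
echo "hello" > src/docs/readme.txt
echo "1,2,3" > src/data/values.csv
tar -cjf demo.tar.bz2 -C src docs data

# Pay the decompression cost once; -k keeps demo.tar.bz2 around.
bzip2 -dk demo.tar.bz2        # produces demo.tar

# Later extractions read the plain tar and skip bzip2 entirely.
mkdir out1 out2
tar -xf demo.tar -C out1 docs
tar -xf demo.tar -C out2 data
```

The trade-off is disk space: the uncompressed .tar has to live somewhere for as long as you want the faster extractions.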

Chris Down
  • And this is not limited to bzip2. – Artem S. Tashkinov Dec 27 '23 at 16:00
  • 1
    Bzip2 does use independent blocks for compression, see https://sourceware.org/bzip2/manual/manual.html#recovering – Stephen Kitt Dec 27 '23 at 16:22
  • 1
    But that doesn't help, because tar has no index up front. To find the part you want to decompress, you need to decompress the whole compressed file: everything before the file to know where it starts, and everything after the file, because tar allows for keeping the same file around multiple times, so that the "later" copy of the same file overwrites the earlier one. Only once you have decompressed the last file can you be sure there are no further copies of the file. – Marcus Müller Dec 27 '23 at 18:12
  • you can also do bzcat foo.tar.bz2 | tar -xvf - – Paige Thompson Dec 27 '23 at 22:41
  • @StephenKitt Thanks! Not sure why I mistakenly thought that bzip2 has interblock dependencies. I'll update the answer. – Chris Down Dec 28 '23 at 00:01
2

I guess that w/o compression extraction of a specific folder will be faster.

Sadly, that's impossible, due to the nature of compression and the tar file format. To know where a file is, you need to decompress the whole archive: everything before the file, to know where it starts, and everything after it, because tar allows the same file to appear multiple times, with the "later" copy overwriting the earlier one. Only once you have decompressed the last member can you be sure there are no further copies of the file.
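To illustrate the "later copy wins" behaviour, here is a small sketch that appends a second copy of a member to an uncompressed tar and then extracts it. The file names (demo.tar, dir/file.txt) are made up for the example, and it assumes a tar that supports -r for appending, as GNU tar and bsdtar do.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dir out
echo "v1" > dir/file.txt
tar -cf demo.tar dir/file.txt      # first copy

echo "v2" > dir/file.txt
tar -rf demo.tar dir/file.txt      # append a second copy under the same name

# The listing shows the member twice; no index says which copy is final.
copies=$(tar -tf demo.tar | grep -c '^dir/file.txt$')

# On extraction the later copy overwrites the earlier one.
tar -xf demo.tar -C out
result=$(cat out/dir/file.txt)
echo "copies=$copies result=$result"
```

This is why tar keeps reading to the end of the archive even after it has found the path you asked for.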

So, the only thing you can do is decompress faster, using the parallel bzip2 implementation pbzip2 (you might need to install that first!)

pbzip2 -d -c large.tar.bz2 | tar xf - path/to/specific/folder
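Since pbzip2 may not be installed everywhere, a script might fall back to plain bzip2 when it is missing. The following is only a sketch under that assumption: it builds a tiny stand-in for large.tar.bz2 in a temporary directory, and the folder names are placeholders.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Tiny stand-in for large.tar.bz2 (placeholder folder names).
mkdir -p src/path/to/specific/folder src/other out
echo "wanted" > src/path/to/specific/folder/a.txt
echo "unwanted" > src/other/b.txt
tar -cjf large.tar.bz2 -C src path other

# Prefer pbzip2 when it is installed, otherwise fall back to bzip2.
if command -v pbzip2 >/dev/null 2>&1; then
    decompressor=pbzip2
else
    decompressor=bzip2
fi

# Stream the decompressed tar into tar and keep only the requested folder.
"$decompressor" -d -c large.tar.bz2 | tar -xf - -C out path/to/specific/folder
```

Both tools accept -d -c, so the rest of the pipeline stays the same whichever one is picked.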

For future archiving: there are formats that compress as well or nearly as well as bzip2 and allow for much faster decompression. So, if this problem comes up often, it might make sense to re-archive the whole thing using something that decompresses faster and supports selective extraction without having to decompress the whole archive; something like

pbzip2 -d -c large.tar.bz2 | sqfstar -comp zstd -xattrs -Xcompression-level 8 large.sqsh

(In addition to pbzip2, you'll need sqfstar, which on most systems (Fedora-based, Debian-based) is part of the squashfs-tools package.)

Bonus: these archives can directly be mounted, but you can also use a command line tool to get individual files from them.

udisksctl loop-setup -f large.sqsh # note the displayed block device name
udisksctl mount -b /dev/loop1234   # only if not automounted by previous command
0

If you want to extract a specific folder from file.tar.bz2 and put it in your own target directory:

tar -C /own/target_path/ -xvf file.tar.bz2 path/specific_folder_from_file_tar_bz2

The specific folder will be extracted into /own/target_path.
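A small sketch of the same command against a throwaway archive, to show two details: the target directory has to exist before tar -C can change into it, and the member keeps its full path below the target. The paths are placeholders, and it assumes a tar that detects bzip2 compression on its own when reading, as GNU tar and bsdtar do.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Throwaway archive with one wanted and one unwanted folder.
mkdir -p src/path/specific_folder src/path/other_folder
echo "keep" > src/path/specific_folder/a.txt
echo "skip" > src/path/other_folder/b.txt
tar -cjf file.tar.bz2 -C src path

# tar -C needs an existing directory, so create it first.
target="$work/own/target_path"
mkdir -p "$target"

# Extract only the named folder; it lands at $target/path/specific_folder.
tar -C "$target" -xf file.tar.bz2 path/specific_folder
```

This still reads and decompresses the whole archive, so it does not shorten the extraction time for a large .tar.bz2.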

Regards...