bzip2: Check file's decompressed size without actually decompressing it

Question

I have a big bzip2-compressed file and I need to check its decompressed size without actually decompressing it (similar to gzip -l file.gz or xz -l file.xz). How can this be done using bzip2?

Is your concern the time it would take to decompress, or the space? There does not seem to be an explicit size stored in the file itself, and even if it was, it could be forged. (Caveat emptor: I only looked at the wikipedia page for about two minutes.) — Ulrich Schwarz, Oct 12 '19 at 11:34
Is there a way to determine the decompressed size of a .bz2 file?. TL;DR "No". — Chris Davies, Oct 12 '19 at 11:39

frostschutz · Accepted Answer · 2019-10-12T16:34:18.617

As mentioned in the comments and the linked answer, the only reliable way is to decompress (into a pipe) and count the bytes.

$ bzcat file.bz2 | wc -c
1234

Alternatively, use a tool that reports the size without the superfluous pipe (this could be slightly more efficient):

$ 7z t file.bz2
[...]
Everything is Ok
Size:       1234

This also applies to gzip and other formats. Although gzip -l file.gz prints a size, the result can be wrong. The gzip format stores the uncompressed size in a 32-bit field (that is, modulo 2^32), so once the file is past 4 GiB, you get output like:

$ gzip --list foobar.gz 
         compressed        uncompressed  ratio uncompressed_name
           97894400            58835168 -66.4% foobar
$ gzip --list foobar.gz 
         compressed        uncompressed  ratio uncompressed_name
         4796137936                   0   0.0% foobar

Or if the file was concatenated or simply not created correctly:

$ truncate -s 1234 foobar
$ gzip foobar
$ cat foobar.gz foobar.gz > barfoo.gz
$ gzip -l barfoo.gz 
         compressed        uncompressed  ratio uncompressed_name
                 74                1234  96.0% barfoo
$ zcat barfoo.gz | wc -c
2468

The sizes do not match, so gzip -l is not reliable in any way.
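Both failure modes seem to have the same cause. As far as I understand the gzip format (RFC 1952), every gzip member ends with a 4-byte little-endian field holding the uncompressed size modulo 2^32, and gzip -l has traditionally just reported the field found at the very end of the file. The sketch below reads that field by hand; gzip_isize and wrapped_size are names made up for this illustration, not existing tools.

```shell
# gzip_isize FILE: print the 32-bit size field stored in the last 4 bytes of FILE.
# (Hypothetical helper; it only looks at the end of the file.)
gzip_isize() {
    # od prints the four trailing bytes as unsigned decimals, lowest byte first.
    set -- $(tail -c 4 -- "$1" | od -An -tu1)
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# wrapped_size BYTES: the value a 32-bit field ends up holding
# for a true size of BYTES.
wrapped_size() {
    echo $(( $1 % 4294967296 ))
}
```

For the concatenated barfoo.gz above, gzip_isize barfoo.gz prints 1234 (the last member only), while zcat | wc -c gives 2468. For a hypothetical 5 GiB input, wrapped_size 5368709120 prints 1073741824, i.e. 1 GiB.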

Sometimes you can cheat, depending on what's inside the archive. For example, if it's a compressed filesystem image with a metadata header at the start, you could decompress just that header and then read the total filesystem size from it.

$ truncate -s 1234M foobar.img
$ mkfs.ext2 foobar.img
$ bzip2 foobar.img
$ bzcat foobar.img.bz2 | head -c 1M > header.img
$ tune2fs -l header.img
tune2fs 1.45.4 (23-Sep-2019)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          95b64880-c4a7-4bea-9b63-6fdcc86d0914
[...]
Block count:              315904
Block size:               4096

So by extracting a tiny part you learn that this is 315904 blocks of 4096 bytes, which comes out as 1234 MiB.
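To avoid doing that multiplication by hand, the two fields can be pulled out of the tune2fs output with awk. This is only a sketch: fs_bytes is a hypothetical helper name, and it assumes the "Block count:" and "Block size:" lines look exactly like the ones shown above.

```shell
# fs_bytes: read `tune2fs -l` style output on stdin and print
# block count * block size in bytes.
fs_bytes() {
    awk -F: '
        /^Block count:/ { count = $2 }   # e.g. "Block count:   315904"
        /^Block size:/  { size  = $2 }   # e.g. "Block size:    4096"
        END { printf "%.0f\n", count * size }
    '
}
```

Used as tune2fs -l header.img | fs_bytes, this prints 1293942784 for the values above, which is 1234 * 1024 * 1024.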

There's no guarantee that this is the actual size of the decompressed file (it could be larger or smaller), but assuming no weird stuff, it's more trustworthy than gzip -l in any case.

Last but not least, if you create those files yourself in the first place, just record the size.
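One way to do that, as a minimal sketch: write the size to a small sidecar file at compression time. The function name bzip2_with_size and the ".size" suffix are conventions invented for this example; bzip2 itself knows nothing about them.

```shell
# bzip2_with_size FILE: compress FILE to FILE.bz2 and store the original
# byte count in FILE.bz2.size (a made-up sidecar naming convention).
bzip2_with_size() {
    size=$(wc -c < "$1") &&      # measure before bzip2 removes the original
    bzip2 -- "$1" &&             # only record the size if compression worked
    printf '%s\n' "$((size))" > "$1.bz2.size"
}
```

After bzip2_with_size foobar.img, the size can later be read with cat foobar.img.bz2.size, with no decompression at all. It is only as trustworthy as whoever wrote the sidecar file.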

score 2 · Answer 2 · answered Oct 12 '19 at 12:27

This question has already been answered here. Pasted below:

As noted by others, bzip2 doesn't provide much information. But this technique works -- you will have to decompress the file, but you won't have to write the decompressed data to disk, which may be a "good enough" solution for you:

$ ls -l foo.bz2
-rw-r--r-- 1 quack quack 2364418 Jul  4 11:15 foo.bz2

$ bzcat foo.bz2 | wc -c         # bzcat decompresses to stdout, wc -c counts bytes
2928640                         # number of bytes of decompressed data

You can pipe that output into something else to give you a human-readable form:

$ ls -lh foo.bz2
-rw-r--r-- 1 quack quack 2.3M Jul  4 11:15 foo.bz2

$ bzcat foo.bz2 | wc -c | perl -lne 'printf("%.2fM\n", $_/1024/1024)'
2.79M
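If perl is not installed, the same conversion can be done with awk. A small sketch, where to_mib is a hypothetical helper name:

```shell
# to_mib: read a byte count on stdin and print it in MiB with two decimals.
to_mib() {
    awk '{ printf "%.2fM\n", $1 / 1024 / 1024 }'
}
```

It drops into the same pipeline, bzcat foo.bz2 | wc -c | to_mib, and prints 2.79M for the 2928640 bytes above.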
