I have a big bzip2-compressed file and I need to check its decompressed size without actually decompressing it (similar to gzip -l file.gz or xz -l file.xz). How can this be done with bzip2?

Is your concern the time it would take to decompress, or the space? There does not seem to be an explicit size stored in the file itself, and even if it was, it could be forged. (Caveat emptor: I only looked at the wikipedia page for about two minutes.) – Ulrich Schwarz Oct 12 '19 at 11:34
Is there a way to determine the decompressed size of a .bz2 file? TL;DR "No". – Chris Davies Oct 12 '19 at 11:39
2 Answers
As mentioned in the comments and the linked answer, the only reliable way is to decompress (in a pipe) and count the bytes.
$ bzcat file.bz2 | wc -c
1234
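One caveat: wc -c prints a count even if bzcat dies partway through a corrupt file. In a script you may want the pipeline's failure status as well; a minimal sketch using bash's pipefail option (not part of the original answer):

```shell
#!/usr/bin/env bash
# pipefail makes the pipeline's exit status reflect a bzcat failure,
# not just the (always-successful) wc at the end
set -o pipefail
if size=$(bzcat file.bz2 | wc -c); then
    echo "uncompressed size: $size bytes"
else
    echo "decompression failed, count is unreliable" >&2
fi
```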
Alternatively, find a tool that does it without the extra pipe, which could be slightly more efficient:
$ 7z t file.bz2
[...]
Everything is Ok
Size: 1234
This also applies to gzip and other formats. Although gzip -l file.gz prints a size, it can be wrong: gzip only stores the uncompressed size modulo 2^32, so once the file is past a certain size you get results like:
$ gzip --list foobar.gz
         compressed        uncompressed  ratio uncompressed_name
           97894400            58835168 -66.4% foobar
$ gzip --list foobar.gz
         compressed        uncompressed  ratio uncompressed_name
         4796137936                   0   0.0% foobar
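The underlying reason is that the gzip format keeps the uncompressed size modulo 2^32 in its trailing 4-byte ISIZE field, so gzip -l has nothing better to report. You can read that field directly; this is just a sketch of the same unreliable number (it wraps at 4 GiB, covers only the last member of a concatenated file, and the od invocation assumes a little-endian machine):

```shell
# ISIZE: last 4 bytes of a gzip file, little-endian,
# holding the uncompressed size modulo 2^32
tail -c4 foobar.gz | od -An -tu4
```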
Or if the file was concatenated or simply not created correctly:
$ truncate -s 1234 foobar
$ gzip foobar
$ cat foobar.gz foobar.gz > barfoo.gz
$ gzip -l barfoo.gz
         compressed        uncompressed  ratio uncompressed_name
                 74                1234  96.0% barfoo
$ zcat barfoo.gz | wc -c
2468
The size does not match, so this is not reliable in any way.
Sometimes you can cheat, depending on what's inside the archive. For example, if it's a compressed filesystem image with a metadata header at the start, you can decompress just that header and read the total filesystem size from it.
$ truncate -s 1234M foobar.img
$ mkfs.ext2 foobar.img
$ bzip2 foobar.img
$ bzcat foobar.img.bz2 | head -c 1M > header.img
$ tune2fs -l header.img
tune2fs 1.45.4 (23-Sep-2019)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 95b64880-c4a7-4bea-9b63-6fdcc86d0914
[...]
Block count: 315904
Block size: 4096
So by extracting a tiny part you learn that this is 315904 blocks of 4096 bytes, which comes out as 1234 MiB.
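The arithmetic, spelled out in shell:

```shell
# 315904 blocks of 4096 bytes each, from the tune2fs output above
echo $(( 315904 * 4096 ))                  # total bytes: 1293942784
echo $(( 315904 * 4096 / 1024 / 1024 ))   # in MiB: 1234
```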
There's no guarantee that this is the actual uncompressed size (the real data could be larger or smaller), but assuming no weird stuff, it's more trustworthy than gzip -l in any case.
Last but not least, if you create those files in the first place, just record the size.
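For example, a hypothetical workflow (file names are illustrative, not from the original answer) that stores the byte count in a sidecar file at compression time:

```shell
# record the uncompressed size before compressing
wc -c < data.img > data.img.size    # or: stat -c %s data.img (GNU stat)
bzip2 data.img
# later, the size is one read away, no decompression needed
cat data.img.size
```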

This question has already been answered here. Pasted below:
As noted by others, bzip2 doesn't provide much information. But this technique works -- you will have to decompress the file, but you won't have to write the decompressed data to disk, which may be a "good enough" solution for you:
$ ls -l foo.bz2
-rw-r--r-- 1 quack quack 2364418 Jul 4 11:15 foo.bz2
$ bzcat foo.bz2 | wc -c # bzcat decompresses to stdout, wc -c counts bytes
2928640 # number of bytes of decompressed data
You can pipe that output into something else to give you a human-readable form:
$ ls -lh foo.bz2
-rw-r--r-- 1 quack quack 2.3M Jul 4 11:15 foo.bz2
$ bzcat foo.bz2 | wc -c | perl -lne 'printf("%.2fM\n", $_/1024/1024)'
2.79M
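If GNU coreutils is available, numfmt does the same conversion without the perl one-liner (an alternative, not from the original answer):

```shell
# --to=iec scales to binary prefixes (K, M, G, ... with a 1024 base)
bzcat foo.bz2 | wc -c | numfmt --to=iec
```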
