4

I have a large, repetitive text file. It compresses very well - it's about 3MB compressed. However, if decompressed, it takes 1.7GB. Since it's repetitive, I only need a fraction of the output to check the contents of the file.

It was compressed using gzip. Does gunzip provide any way to decompress only the first few megabytes of a file?

Vitor Py
  • Duplicate, but on another Stack Exchange site: https://stackoverflow.com/questions/23676116/is-partial-gz-decompression-possible –  Jun 01 '17 at 19:37

1 Answer

7

You could decompress to standard output and feed it through something like head to only capture a bit of it:

gunzip -c file.gz | head -c 20M >file.part

The 20M size suffix to head -c is understood by the head implementation in GNU coreutils; other implementations may require the count in plain bytes (20971520).

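dd may also be used. Because dd reads from a pipe here, a read can return less than a full block, so GNU dd's iflag=fullblock is needed to make count=20 mean 20 full 1 MiB blocks:

gunzip -c file.gz | dd of=file.part bs=1M count=20 iflag=fullblock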

Both of these pipelines will copy the first 20 MiB of the uncompressed file to file.part.
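
To try the head approach without touching a real file, here is a minimal sketch that builds its own small repetitive sample. The names sample.gz and sample.part, the sample text, and the 1 MiB limit are assumptions made up for this illustration, not part of the question. The byte count is written out in full so the example does not depend on GNU size suffixes.

```shell
#!/bin/sh
# Hypothetical demo: file names, sample text and sizes are illustrative only.
set -eu

# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Build a repetitive sample (about 4.3 MB uncompressed) and compress it.
yes 'the same line, over and over' | head -n 150000 | gzip > sample.gz

# Keep only the first 1 MiB (1048576 bytes) of the decompressed stream.
# Once head has read enough it exits, and gunzip stops when the pipe closes,
# so the rest of the archive is never written anywhere.
gunzip -c sample.gz | head -c 1048576 > sample.part

# Report how many bytes were captured.
part_size=$(wc -c < sample.part)
echo "captured $part_size bytes"
```

The same pattern should apply to the real file by swapping in its name and the number of bytes wanted.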

Kusalananda
  • What if I wanted to skip decompressing the first x bytes and decompress e.g. the last 20M to check that the tail looks fine? I could just use tail -c 20M, but that would end up decompressing the entire file and just throwing away everything except the last 20M. – Timo May 26 '21 at 16:59
  • @Timo You can't generally start decompressing a compressed file in the middle. If this is something that is really important to you, you may want to ask a separate question about it. – Kusalananda May 26 '21 at 17:59
  • Yeah, so it seems. Perhaps not worth a separate question, as things look rather grim in this department. Gotta compress in smaller chunks to begin with or ditch compression altogether. Thanks anyways. – Timo May 26 '21 at 18:33