14

I have a job on a batch system that runs for an extremely long time and produces tons of output, so much that I have to pipe the standard output through gzip to keep the batch node from filling its work area and subsequently crashing.

longscript | gzip -9 > log.gz

Now, I would like to investigate the output of the job while it is still running. So I do this:

gunzip log.gz

This takes a very long time, as it is a huge file (several GB). I can see the output file being created while gunzip is running and can look at it while it is being built.

tail log
> some-line-of-the-log-file
tail log
> some-other-line-of-the-log-file

However, ultimately, gzip encounters the end of the gzipped file. Since the job is still running and gzip is still writing the file, there is no proper footer yet, so this happens:

gzip: log.gz: unexpected end of file

After this, the extracted log file is deleted, as gzip thinks that the corrupted extracted data is of no use to me. I, however, disagree - even if the last couple of lines are scrambled, the output is still highly interesting to me.

How can I convince gzip to let me keep the "corrupted" file?

carsten
  • 355

3 Answers

12

Apart from the very end of the file, you will be able to see the uncompressed data with zcat (or gzip -dc, or gunzip -c):

zcat log.gz | tail

or

zcat log.gz | less

or

zless log.gz

gzip will do buffering for obvious reasons (it needs to compress the data in chunks), so even though the program may have outputted some data, that data may not yet be in the log.gz file.

You may also store the uncompressed log with

zcat log.gz > log

... but that would be silly since there's obviously a reason why you compress the output in the first place.
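The reason this works where plain gunzip does not: when gzip writes to standard output, the shell owns the redirected file, so gzip cannot delete it after hitting the missing trailer. Here is a minimal sketch that simulates a still-growing log.gz by truncating a complete archive. The file names (full.gz, partial.gz, partial.log) and the temporary directory are made up for the demonstration, and it assumes seq and head -c are available:

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a complete gzip file, then cut it in half to mimic a log.gz
# that is still being written (no trailer with CRC and length yet)
seq 1 100000 | gzip -9 > full.gz
size=$(wc -c < full.gz)
head -c $((size / 2)) full.gz > partial.gz

# Decompress to stdout: gzip complains about the unexpected end of file
# and exits non-zero, but everything written so far stays in partial.log
status=0
gzip -dc partial.gz > partial.log 2>/dev/null || status=$?

lines=$(wc -l < partial.log)
echo "gzip exit status: $status"
echo "recovered lines: $lines"
```

The exit status is non-zero, so a script should not treat it as a fatal error in this situation. The last recovered line may be cut off mid-line.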

Kusalananda
  • 333,661
2

If I understand correctly, you'd like to do something like tail -f with the still growing gzip file: I've developed gztool which can do that (among other things):

$ gztool -T log.gz

and it will output to console continuously, waiting for new data when it is necessary.

Note that gztool will also create an index file (log.gzi in this case) that will make future tails or other random accesses to the gzip data with gztool almost instantaneous. If you do not want to create an index (even though it is only 0.3% of the gzip file's size and does not increase processing time), you can use -W to not create it.

circulosmeos
  • 281
0

You can try to split the output into pieces and gzip each piece separately: https://stackoverflow.com/a/2016918/3090950
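One possible way to do the splitting on the fly, not necessarily what the linked answer does, is the --filter option of GNU split. This is a sketch under that assumption (it will not work with a non-GNU split). The longscript function is only a stand-in for the real job, and the chunk size of 10 lines and the log.part. prefix are arbitrary choices for the demonstration:

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the long-running job: prints 25 lines
longscript() { seq 1 25; }

# GNU split starts a fresh gzip process for every 10 lines of input;
# split sets $FILE to the current chunk name (log.part.aa, log.part.ab, ...)
longscript | split -l 10 --filter='gzip -9 > "$FILE.gz"' - log.part.

# List the chunks that were produced
ls log.part.*.gz

# A finished chunk is a complete gzip file and decompresses without errors
gzip -dc log.part.aa.gz | tail -n 3
```

With a realistic chunk size (say, a few million lines), every chunk except the one currently being written has a proper trailer, so it can be inspected with zcat or zless without gzip complaining.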

Anyway, could you run the command in verbose mode? This will provide you more information.

Neil
  • 121