2

I need to delete the last line of a .gz file without uncompressing it. The file has 500 lines.

How can I do that?

I have tried:

 gzip -dc "$files" | tail -500 | gzip -c > "$files".tmp

But it doesn't work.

  • Do it step by step and then you can automate it in a single command. tail -500 will get you the last 500 lines, which is obviously not what you're asking for. Perhaps head would be more suitable. Note that your question also supposes that you don't uncompress it, but you do (and have to). – Julie Pelletier Jul 08 '16 at 17:21
  • zcat myfile.dat.gz | head -499 myfile.dat.gz > mynewokfile.dat.gz

    It's not working.

    – ANOUK_prog Jul 08 '16 at 17:30
  • In your 'I have tried' code, gzip -d actually does decompress, but your question says you cannot decompress. You can tell by looking at man gzip what the -d option is doing. – 700 Software Jul 08 '16 at 17:51

4 Answers

6

Assuming from your example that decompressing to a stream is fine, but you want to avoid decompressing to a file, you should be able to run

gzip -cd "$files" | sed -e '$d' | gzip > "$files".tmp

This uses sed to go to the last line ($) and delete it (d).

Eric Renouf
  • Hello, decompressing is not fine. – ANOUK_prog Jul 08 '16 at 17:28
  • So your example is not on the path to an acceptable solution? I think you're going to have a tough time modifying a gzip file without decompressing it. – Eric Renouf Jul 08 '16 at 17:30
  • @ANOUK_prog: What's the problem with the proposed solution, which is similar to what you tried and deletes the last line as requested? – Julie Pelletier Jul 08 '16 at 17:35
  • The only problems I see with decompressing (gzip -d), as @Eric and @ANOUK have shown in their code, are that it could be slow for a large file, and that any special metadata (surely not) in the original .gz file would be lost by the following gzip command. – 700 Software Jul 08 '16 at 17:53
  • @GeorgeBailey yeah, I only went with this approach because the original question suggested that decompressing the file, operating on it and recompressing was not acceptable. If preserving the metadata is important, this will not do. – Eric Renouf Jul 08 '16 at 17:56
  • Because of the way gzip formats the compressed data, a portion (page?) of the .gz file would have to be decompressed before removing the last line. Perhaps it is possible to do this without decompressing the entire file, but I am not sure. I would be interested to know if there is a utility that is capable of doing this. I don't know whether metadata even exists in a .gz file, but if so there should be an easy way to copy it. – 700 Software Jul 08 '16 at 17:58
  • @GeorgeBailey At least a little metadata is stored. The man page says "By default, gzip keeps the original file name and timestamp in the compressed file." But yes, I would think something has to decompress at least part of the file to modify it, and I'm not (yet) aware of anything that will modify the compressed file in place without full decompression. – Eric Renouf Jul 08 '16 at 18:04
6

You can't modify a compressed file without decompressing it.

At the very least, to delete all text after the 499th line, you have to decompress the first 499 lines to find where the 499th line ends. If you want to delete the last line regardless of how many lines there are, you need to decompress the whole file to identify where the last line starts.
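To make this concrete, here is a small sketch that computes the uncompressed byte offset at which the last line starts. The function name last_line_offset is hypothetical. Both measurements have to read the entire decompressed stream; nothing in the compressed bytes alone tells you where that line begins.

```shell
#!/bin/sh
# Hypothetical helper: print the uncompressed byte offset where the last
# line of gzip file $1 starts. Each pipeline decompresses the whole file.
last_line_offset() {
  total=$(gzip -cd -- "$1" | wc -c)               # size of the whole text
  last=$(gzip -cd -- "$1" | tail -n 1 | wc -c)    # size of the last line, newline included
  echo $(( $total - $last ))
}
```

For a file containing the three lines a, b and c, this prints 4: the last line starts at the fifth byte of the uncompressed text.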

There is no shortcut because the file is compressed. The encoding of a character depends on all the previous characters — the basic principle of gzip compression is to use shorter bit sequences for character sequences that have been encountered previously, and slightly longer bit sequences for character sequences that haven't been encountered yet, thus yielding a smaller file when character sequences are repeated. There's no way to determine that a particular character is a line break without examining all the previous characters.

Your attempt, which decompresses the file, works on the decompressed stream, and recompresses to another file, is on the right track. You just need the correct command to truncate the file: tail -500 keeps the last 500 lines, which isn't what you want. Use head -n 499 to keep the first 499 lines, or head -n -1 to remove the last line. Not all systems support a negative argument for head; if yours doesn't, you can use sed '$d' instead.

gunzip <"$file" | head -n -1 | gzip >"$file".tmp
mv -- "$file".tmp "$file"
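If a script has to run on systems whose head accepts negative counts as well as on systems whose head doesn't, a small wrapper can probe head once and fall back to sed '$d' otherwise. This is a sketch; the function name drop_last is made up:

```shell
#!/bin/sh
# Hypothetical filter: copy stdin to stdout without its last line.
# Uses head -n -1 where head accepts a negative count, else sed '$d'.
drop_last() {
  if printf 'x\n' | head -n -1 >/dev/null 2>&1; then
    head -n -1
  else
    sed '$d'
  fi
}
```

It slots into the pipeline above in place of head: gunzip <"$file" | drop_last | gzip >"$file".tmp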

Note that you can't write directly to the original file: in gunzip <"$file" | … | gzip >"$file", the shell truncates "$file" when it sets up the output redirection, possibly before gunzip has read it, because the commands in a pipeline are executed in parallel. While it's possible to avoid creating a temporary file, it's a bad idea, because any way of doing so would result in a truncated file if the command is interrupted, so I won't discuss how to do it.
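Along the same lines, here is a sketch of a more defensive version. The function name drop_last_line_gz is hypothetical, and it assumes a mktemp command (not in older POSIX editions, but available on GNU, BSD and macOS systems). It only renames the temporary file over the original after the new file passes gzip -t, so an interruption leaves at worst a stray temporary file, never a truncated original.

```shell
#!/bin/sh
# Hypothetical helper: remove the last line of gzip file $1, replacing the
# original only once the new compressed file has been written and tested.
drop_last_line_gz() {
  f=$1
  gzip -t -- "$f" || return 1              # refuse to touch a corrupt input
  tmp=$(mktemp "$f.XXXXXX") || return 1    # temp file next to the original (same filesystem)
  if gzip -cd -- "$f" | sed '$d' | gzip -c >"$tmp" && gzip -t -- "$tmp"; then
    mv -- "$tmp" "$f"                      # rename replaces the original atomically
  else
    rm -f -- "$tmp"                        # clean up after a failed run
    return 1
  fi
}
```

One side effect: mktemp creates the temporary file with mode 0600, so the replacement ends up with those permissions; run chmod afterwards if the original's mode matters.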

In theory, it would be possible to truncate a gzipped file by:

  1. uncompressing it in memory to determine the position where you want to truncate it;
  2. truncating the file to remove all data after the last character to keep;
  3. rewriting the last few bytes so that the compressed data correctly ends with the last character;
  4. writing a new 8-byte trailer at the end of the file with the CRC-32 and size of the remaining uncompressed data.

However, this can't be done with standard tools: it would take some custom programming, and it would leave an invalid file if it were interrupted.
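The size such a program would have to update lives in the gzip trailer: per RFC 1952, each gzip member ends with 8 bytes, a CRC-32 followed by ISIZE (the uncompressed length modulo 2^32), both little-endian. The sketch below reads ISIZE with standard tools; the function name isize is hypothetical, and it assumes a single-member gzip file and a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: print ISIZE from the last 4 bytes of gzip file $1,
# decoded as a little-endian unsigned integer (uncompressed size mod 2^32).
# Assumes the file holds a single gzip member.
isize() {
  # od prints the four bytes as decimal numbers, lowest-order byte first
  set -- $(tail -c 4 -- "$1" | od -An -v -tu1)
  echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}
```

For a file compressed from the 6-byte text a, b, c (one per line), this prints 6, the same figure gzip -l reports in its "uncompressed" column.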

2

You can use zcat.

zcat <file> | head -n <lines>

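This decompresses only roughly as much of the file as is needed to output those n lines: once head has printed them it exits, and zcat stops as soon as it can no longer write to the pipe.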
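The same head -n approach can drop the last line when you don't know the line count in advance and your head doesn't accept negative counts: count the lines first, then keep one fewer and recompress. This sketch (the function name keep_all_but_last is hypothetical; gzip -cd is used as the equivalent of zcat) decompresses the file twice, trading speed for portability. It assumes the text ends with a newline and has at least two lines.

```shell
#!/bin/sh
# Hypothetical helper: write all but the last line of gzip file $1,
# recompressed, to $2. Decompresses $1 twice: once to count, once to copy.
# Assumes newline-terminated text with at least two lines.
keep_all_but_last() {
  n=$(gzip -cd -- "$1" | wc -l)                               # number of lines
  gzip -cd -- "$1" | head -n "$(( $n - 1 ))" | gzip -c >"$2"  # keep all but the last
}
```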

Further reading: http://www.thegeekstuff.com/2009/05/zcat-zless-zgrep-zdiff-zcmp-zmore-gzip-file-operations-on-the-compressed-files/

1

Building on @Eric Renouf's answer (sorry, this is too long for a comment): to keep the original timestamp and file name metadata in the compressed file, wrap it like this:

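# "$file" is the compressed file, e.g. data.txt.gz
base="${file%.gz}"                         # uncompressed name, e.g. data.txt
gzip -cd "$file" | sed -e '$d' > "$base"   # decompress and drop the last line
touch -r "$file" "$base"                   # copy the original timestamp
# optionally keep the old file
# mv "$file" "$file.old"
gzip -f "$base"                            # writes "$base.gz" (i.e. "$file"), storing name and timestamp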

Or, since there's an uncompressed file just sitting there, you could recompress it with xz instead of gzip. It usually compresses better, though it is slower.

cas