I need to delete the last line of a gz file without uncompressing it. The file has 500 lines.
How can I do that?
I have tried:
gzip -dc "$files" | tail -500 | gzip -c > "$files".tmp
But it doesn't work.
Assuming from your example that decompressing to a stream is fine, but you want to avoid decompressing to a file, you should be able to use:
gzip -cd "$files" | sed -e '$d' | gzip > "$files".tmp
This uses sed to go to the last line and delete it.
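As a quick check of the pipeline above, here is a minimal sketch that builds a throwaway 500-line sample and confirms that the rewritten archive has one line fewer. The scratch directory and the name sample.gz are assumptions made for illustration; they are not part of the question.

```shell
#!/bin/sh
# Hypothetical sample: a 500-line gzip file in a scratch directory.
workdir=$(mktemp -d)
files="$workdir/sample.gz"
seq 1 500 | gzip > "$files"

# The pipeline from the answer: decompress to a stream,
# drop the last line with sed, recompress into a new file.
gzip -cd "$files" | sed -e '$d' | gzip > "$files".tmp

# Compare line counts and look at the new last line.
before=$(gzip -cd "$files" | wc -l)
after=$(gzip -cd "$files".tmp | wc -l)
last=$(gzip -cd "$files".tmp | tail -n 1)
echo "lines before: $before, after: $after, new last line: $last"
```

The original file is left alone here; only the .tmp copy is shortened.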
… (gzip -d) as @Eric and @ANOUK have shown in their code is that for a large file this could be slow, or if there were special metadata (surely not) in the original .gz file that would be excluded from the following gzip command.
– 700 Software
Jul 08 '16 at 17:53
… gzip formats the compressed data, a portion (page?) of the .gz file would have to be decompressed before removing the last line. Perhaps it is possible to do this without decompressing the entire file, but I am not sure. I would be interested to know if there is a utility that is capable of doing this. I don't know whether metadata even exists in a .gz file, but if so there should be an easy way to copy it.
– 700 Software
Jul 08 '16 at 17:58
You can't modify a compressed file without decompressing it.
At the very least, to delete all text after the 499th line, you have to decompress the first 499 lines to find where the 499th line ends. If you want to delete the last line regardless of how many lines there are, you need to decompress the whole file to identify where the last line starts.
There is no shortcut because the file is compressed. The encoding of a character depends on all the previous characters — the basic principle of gzip compression is to use shorter bit sequences for character sequences that have been encountered previously, and slightly longer bit sequences for character sequences that haven't been encountered yet, thus yielding a smaller file when character sequences are repeated. There's no way to determine that a particular character is a line break without examining all the previous characters.
Your attempt, which decompresses the file, works on the decompressed stream, and recompresses to another file, is on the right track. You just need the correct command to truncate the file: tail -500 keeps the last 500 lines, which isn't what you want. Use head -n 499 to keep the first 499 lines, or head -n -1 to remove the last line. Not all systems support a negative argument for head; if yours doesn't, you can use sed '$d' instead.
gunzip <"$file" | head -n -1 | gzip >"$file".tmp
mv -- "$file".tmp "$file"
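To make the difference between these commands concrete, here is a small sketch run on a plain 500-line stream (no compression involved, since the commands only ever see the decompressed text). The helper name count_and_ends is made up for this example. head -n -1 is left out of the runnable part because, as noted, not every head supports it.

```shell
#!/bin/sh
# Hypothetical helper: print "<line count> <first line> <last line>" for stdin.
count_and_ends() {
    awk 'NR == 1 { first = $0 } { last = $0 } END { print NR, first, last }'
}

# Each input line holds its own line number, 1 through 500.
r_tail=$(seq 1 500 | tail -n 500 | count_and_ends)  # keeps the last 500 lines: nothing is removed
r_head=$(seq 1 500 | head -n 499 | count_and_ends)  # keeps the first 499 lines
r_sed=$(seq 1 500 | sed '$d' | count_and_ends)      # deletes the last line, whatever the length

echo "tail -n 500: $r_tail"
echo "head -n 499: $r_head"
echo "sed '\$d':    $r_sed"
```

head -n 499 only gives the right answer when the file really has 500 lines, whereas sed '$d' does not need to know the line count.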
Note that you can't write directly to the file itself: gunzip <"$file" | … | gzip >"$file" would start overwriting the file while gunzip is still reading it. The commands in a pipeline are executed in parallel. While it's possible to avoid creating a temporary file, it's a bad idea, because any way to do that would result in a truncated file if the command is interrupted, so I won't discuss how to do it.
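One way to package the temporary-file approach is sketched below. It assumes bash (for local and pipefail), and drop_last_line_gz is a hypothetical helper name, not a standard tool. The original is replaced only if every stage of the pipeline succeeded.

```shell
#!/bin/bash
# Hypothetical helper: remove the last line of a .gz file via a temporary file.
drop_last_line_gz() {
    local file=$1
    local tmp="$file.tmp"
    # pipefail (a bash feature) makes the pipeline fail if any stage fails,
    # for example when gzip -cd hits a corrupt input. It is set in a subshell
    # so the caller's shell options are not changed.
    if ( set -o pipefail; gzip -cd -- "$file" | sed '$d' | gzip > "$tmp" ); then
        mv -- "$tmp" "$file"
    else
        # Leave the original untouched and clean up the partial output.
        rm -f -- "$tmp"
        return 1
    fi
}

# Usage on a throwaway 500-line sample (path is an assumption for the demo).
workdir=$(mktemp -d)
seq 1 500 | gzip > "$workdir/sample.gz"
drop_last_line_gz "$workdir/sample.gz"
remaining=$(gzip -cd "$workdir/sample.gz" | wc -l)
echo "lines remaining: $remaining"
```

Because mv within one filesystem is a rename, an interruption leaves either the old file or the new one in place, never a half-written one.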
In theory, it would be possible to truncate a gzipped file by:
However, this can't be done with standard tools; it would take some custom programming, and it would leave an invalid file if it was interrupted.
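A small sketch of why ordinary byte-level tools do not help here: chopping bytes off the end of the compressed file does not remove a line of text, it removes the gzip trailer (checksum and length) and part of the compressed stream, so the result fails the integrity check. The file names are made up for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
seq 1 500 | gzip > "$workdir/sample.gz"

# Chop the last 10 bytes off the *compressed* file.
size=$(wc -c < "$workdir/sample.gz")
head -c "$((size - 10))" "$workdir/sample.gz" > "$workdir/chopped.gz"

# gzip -t tests integrity without writing any output.
if gzip -t "$workdir/chopped.gz" 2>/dev/null; then
    verdict="still valid"
else
    verdict="invalid"
fi
echo "chopped.gz is $verdict"
```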
You can use zcat:
zcat <file> | head -n <lines>
This only decompresses enough to stream those n lines.
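A minimal sketch of that approach on a throwaway sample (the path is an assumption for the demo). It uses gzip -cd in place of zcat, which should be equivalent here; on some systems zcat expects a .Z suffix.

```shell
#!/bin/sh
workdir=$(mktemp -d)
seq 1 500 | gzip > "$workdir/sample.gz"

# Print only the first 3 lines. head exits after reading them; on a large
# file the decompressor then stops when its output pipe is closed, rather
# than decompressing everything.
first3=$(gzip -cd "$workdir/sample.gz" | head -n 3)
printf '%s\n' "$first3"
```

Note that this keeps the first n lines, so using it to drop the last line still requires knowing the line count in advance.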
Further reading: http://www.thegeekstuff.com/2009/05/zcat-zless-zgrep-zdiff-zcmp-zmore-gzip-file-operations-on-the-compressed-files/
Building on @Eric Renouf's answer (sorry, this is too long for a comment): to keep the original timestamp and filename metadata in the file, wrap it with:
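# "$file" is the .gz path; "$plain" is the same name without the suffix,
# because gzip refuses to compress a file that already ends in .gz
plain="${file%.gz}"
gzip -cd "$file" | sed -e '$d' > "$plain.tmp"
# copy the original modification time onto the uncompressed copy
touch -r "$file" "$plain.tmp"
# optionally keep the old file
# mv "$file" "$file.old"
mv "$plain.tmp" "$plain"
# -f replaces the existing "$file" without prompting
gzip -f "$plain"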
Or, since there's an uncompressed file just sitting there, use xz instead of gzip to recompress it. Better compression, and often faster.
tail -500 will get you the last 500 lines, which is obviously not what you're asking to do. Perhaps head would be more suitable. Note that your question also supposes that you don't uncompress it, but you do (and have to).
– Julie Pelletier
Jul 08 '16 at 17:21
It's not working
– ANOUK_prog
Jul 08 '16 at 17:30
gzip -d actually does decompress, but your question says you cannot decompress. You can tell by looking at man gzip what the -d option is doing.
– 700 Software
Jul 08 '16 at 17:51