When I do:
# gzip -c foo > foo1.gz
# gzip < foo > foo2.gz
Why does foo2.gz end up being smaller in size than foo1.gz?
Because in the first command gzip opens foo itself and saves its filename and timestamp in the output, so that it can try to restore both when you decompress later. In your second example the shell opens foo and hands it to gzip via stdin, so gzip never learns the filename and cannot store it.
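If you want to confirm this yourself, here is a minimal sketch that looks at the header flag byte of both outputs. It assumes a gzip that follows RFC 1952, where the fourth byte is the FLG field and bit 0x08 (FNAME) marks a stored filename. The scratch directory and sample content are made up for illustration.

```shell
# Work in a scratch directory so nothing existing is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'hello hello hello\n' > foo

gzip -c foo > foo1.gz    # gzip opens foo itself, so it knows the name
gzip < foo > foo2.gz     # the shell opens foo; gzip only sees stdin

# Offset 3 of a gzip stream is the FLG byte; 0x08 means a filename follows the header.
flg1=$(od -An -tx1 -j3 -N1 foo1.gz | tr -d ' \n')
flg2=$(od -An -tx1 -j3 -N1 foo2.gz | tr -d ' \n')

echo "foo1.gz FLG=$flg1"
echo "foo2.gz FLG=$flg2"
```

With GNU gzip defaults I'd expect the FNAME bit to be set only for foo1.gz.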
From the manpage:
-n --no-name
When compressing, do not save the original file name and time stamp by default. (The original name is always saved if the name had
to be truncated.) When decompressing, do not restore the original file name if present (remove only the gzip suffix from the
compressed file name) and do not restore the original time stamp if present (copy it from the compressed file). This option is the
default when decompressing.
-N --name
When compressing, always save the original file name and time stamp; this is the default. When decompressing, restore the original
file name and time stamp if present. This option is useful on systems which have a limit on file name length or when the time
stamp has been lost after a file transfer.
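To see what -N does on the decompression side, here is a small sketch. The file names (original.txt, renamed.gz, copy.gz) are hypothetical, and it assumes your gzip supports -N when decompressing, as GNU gzip does.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'some sample text\n' > original.txt

# The name "original.txt" ends up in the header even though the archive is called renamed.gz.
gzip -c original.txt > renamed.gz
rm original.txt                 # get it out of the way so nothing needs overwriting
cp renamed.gz copy.gz

gzip -d copy.gz                 # default: just strip the .gz suffix
gzip -dN renamed.gz             # -N: use the name stored in the header instead

ls
```

The default run should leave a file called copy, while the -N run should bring back original.txt.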
I've recreated the issue here:
[root@xxx601 ~]# cat /etc/fstab > file.txt
[root@xxx601 ~]# gzip < file.txt > file.txt.gz
[root@xxx601 ~]# gzip -c file.txt > file2.txt.gz
[root@xxx601 ~]# ll -h file*
-rw-r--r--. 1 root root 465 May 17 19:35 file2.txt.gz
-rw-r--r--. 1 root root 1.2K May 17 19:34 file.txt
-rw-r--r--. 1 root root 456 May 17 19:34 file.txt.gz
In my example, file.txt.gz is the equivalent of your foo2.gz, and file2.txt.gz is the equivalent of your foo1.gz. Using the -n option stops gzip from saving the name and timestamp even when it would otherwise have access to them:
[root@xxx601 ~]# gzip -nc file.txt > file3.txt.gz
[root@xxx601 ~]# ll -h file*
-rw-r--r--. 1 root root 465 May 17 19:35 file2.txt.gz
-rw-r--r--. 1 root root 456 May 17 19:43 file3.txt.gz
-rw-r--r--. 1 root root 1.2K May 17 19:34 file.txt
-rw-r--r--. 1 root root 456 May 17 19:34 file.txt.gz
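As you can see above, the file sizes for file.txt.gz and file3.txt.gz match, since neither of them stores the filename. The 9-byte difference from file2.txt.gz (465 vs. 456) is the stored name file.txt plus its terminating NUL byte; the timestamp sits in a fixed 4-byte header field whether or not it is set, so it does not change the size.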
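Here is a rough sketch of the size arithmetic, assuming the only thing that differs between the outputs is the optional name field. The sample content and output file names are invented for the example.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
name=file.txt
printf '%s\n' alpha beta gamma alpha beta gamma > "$name"

gzip -c  "$name" > with_name.gz     # name stored
gzip -nc "$name" > no_name.gz       # name suppressed with -n
gzip <   "$name" > from_stdin.gz    # name unknown to gzip

with_name=$(wc -c < with_name.gz | tr -d ' ')
no_name=$(wc -c < no_name.gz | tr -d ' ')
from_stdin=$(wc -c < from_stdin.gz | tr -d ' ')

# The stored name costs its length plus one terminating NUL byte.
name_cost=$(( ${#name} + 1 ))

echo "with name:  $with_name bytes"
echo "with -n:    $no_name bytes"
echo "from stdin: $from_stdin bytes"
echo "name cost:  $name_cost bytes"
```

The -n and stdin sizes should agree, and the named archive should be larger by the name cost.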