Why does gzipping a file on stdin yield a smaller output than the same file given as an argument?

Question

When I do:

# gzip -c foo > foo1.gz 
# gzip < foo > foo2.gz

Why does foo2.gz end up being smaller in size than foo1.gz?

Bratchley · Accepted Answer · 2017-04-04T03:00:29.063

Because gzip saves the original filename and timestamp in the compressed file so that it can try to restore both when you decompress it later. Since foo reaches gzip via stdin in your second example, gzip never sees the filename, so it can't store it.
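You can see the difference directly in the header. Per RFC 1952, every gzip file starts with a fixed 10-byte header whose byte at offset 3 is FLG. Bit 3 of that byte (value 8, FNAME) means a NUL-terminated original file name follows the header. A minimal sketch, assuming GNU gzip and a POSIX `od`; `foo` is just a throwaway test file created in a scratch directory:

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)" || exit 1
printf 'hello, world\n' > foo

gzip -c foo > foo1.gz   # name given as an argument
gzip < foo > foo2.gz    # same data on stdin: gzip never sees the name

# Print the FLG byte (offset 3) of each file as an unsigned decimal.
od -An -tu1 -j3 -N1 foo1.gz   # FNAME bit set: 8
od -An -tu1 -j3 -N1 foo2.gz   # no flags set: 0
```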

From the manpage:

   -n --no-name
          When compressing, do not save the original file name and time stamp by default. (The original name is always saved if the name had
          to be truncated.) When decompressing, do not restore the original file name if present (remove only the gzip suffix from the
          compressed file name) and do not restore the original time stamp if present (copy it from the compressed file). This option is the
          default when decompressing.

   -N --name
          When compressing, always save the original file name and time stamp; this is the default. When decompressing, restore the original
          file name and time stamp if present. This option is useful on systems which have a limit on file name length or when the time
          stamp has been lost after a file transfer.
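The decompression side of `-N` is easy to try: if the compressed file has been renamed, `gzip -dN` writes the output under the name stored in the header, while the default (`-n` when decompressing) just strips the `.gz` suffix. A sketch assuming GNU gzip; `foo`, `a.gz` and `b.gz` are throwaway names:

```shell
cd "$(mktemp -d)" || exit 1
printf 'hello, world\n' > foo

gzip -c foo > a.gz   # header records the original name "foo"
cp a.gz b.gz
rm foo

gzip -d a.gz         # default when decompressing: output file is "a"
gzip -dN b.gz        # -N: restores the stored name, output file is "foo"
ls
```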

I've recreated the issue here:

[root@xxx601 ~]# cat /etc/fstab > file.txt
[root@xxx601 ~]# gzip < file.txt > file.txt.gz
[root@xxx601 ~]# gzip -c file.txt > file2.txt.gz
[root@xxx601 ~]# ll -h file*
-rw-r--r--. 1 root root  465 May 17 19:35 file2.txt.gz
-rw-r--r--. 1 root root 1.2K May 17 19:34 file.txt
-rw-r--r--. 1 root root  456 May 17 19:34 file.txt.gz

In my example, file.txt.gz is the equivalent of your foo2.gz. Using the -n option disables this behavior even when gzip does have access to the information:

[root@xxx601 ~]# gzip -nc file.txt > file3.txt.gz
[root@xxx601 ~]# ll -h file*
-rw-r--r--. 1 root root  465 May 17 19:35 file2.txt.gz
-rw-r--r--. 1 root root  456 May 17 19:43 file3.txt.gz
-rw-r--r--. 1 root root 1.2K May 17 19:34 file.txt
-rw-r--r--. 1 root root  456 May 17 19:34 file.txt.gz

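As you can see above, file.txt.gz and file3.txt.gz are now the same size, since neither stores the original name. The 9 extra bytes in file2.txt.gz are the stored name file.txt plus its terminating NUL byte. The timestamp doesn't change the size, because every gzip header has a fixed four-byte MTIME field (set to zero when no time is recorded).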
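To check the byte counts without relying on the `ll` alias, a small sketch along these lines compares sizes with `wc -c`. It assumes GNU gzip and generates its own input instead of copying /etc/fstab, so the absolute sizes will differ from the listing above; only the difference matters:

```shell
cd "$(mktemp -d)" || exit 1
# Any text will do; generate some so the sketch is self-contained.
awk 'BEGIN { for (i = 1; i <= 200; i++) print "line", i }' > file.txt

gzip < file.txt > file.txt.gz      # stdin: gzip never sees the name
gzip -c file.txt > file2.txt.gz    # argument: "file.txt" is stored in the header
gzip -nc file.txt > file3.txt.gz   # argument with -n: name not stored

stdin_size=$(wc -c < file.txt.gz)
named_size=$(wc -c < file2.txt.gz)
noname_size=$(wc -c < file3.txt.gz)

echo "stdin: $stdin_size, named: $named_size, -n: $noname_size"
# Expect 9: the 8 bytes of "file.txt" plus the terminating NUL.
echo "extra bytes for the name: $((named_size - stdin_size))"
```

The compressed data and the 8-byte trailer are identical in all three files; only the optional name field in the header differs.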
