
Today (31 Jan 2019) I downloaded a random source.tar.gz and extracted it. When I run stat on the resulting folder, I see:

Access: 2019-01-31 10:37:26.308991485 +0100
Modify: 2018-11-18 00:02:35.000000000 +0100
Change: 2019-01-31 10:36:03.881185889 +0100
 Birth: -

The Modify date is in the past, before I even owned my laptop. Could someone explain how this timestamp is useful?

It's very unintuitive: I would expect all the times (Change/Access/Modify) to fall within my file system's lifetime. (Also, one of the most useful dates, the time at which a particular file or folder first appeared, is missing: Birth is empty on ext4. Yes, I know there are ways to get it.)

dgan

2 Answers


I am basing this answer on GNU tar and quoting from the GNU tar manual.

When an archive is created, tar records each file's modification time (mtime) in the archive, and by default it restores that mtime when extracting. This allows tar to perform certain operations that depend on a file's timestamp.
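To see this without GNU tar, here is a minimal sketch that uses Python's standard tarfile module as a stand-in (the file name hello.txt and the helper roundtrip are made up for illustration). It archives a file backdated to the question's Modify time, extracts it, and checks that the extracted copy gets the archived mtime back while its ctime is simply the moment of extraction, which is exactly the pattern in the question's stat output. It assumes Linux/POSIX, where st_ctime is the inode change time.

```python
import os
import tarfile
import tempfile
import time
from datetime import datetime, timedelta, timezone

# The "Modify" time from the question: 2018-11-18 00:02:35 +0100.
OLD_MTIME = int(datetime(2018, 11, 18, 0, 2, 35,
                         tzinfo=timezone(timedelta(hours=1))).timestamp())


def roundtrip(workdir: str) -> tuple[int, int, float]:
    """Archive a file whose mtime is OLD_MTIME, extract it into a fresh
    directory, and return (mtime stored in the archive, mtime of the
    extracted file, ctime of the extracted file)."""
    src = os.path.join(workdir, "hello.txt")
    with open(src, "w") as f:
        f.write("hello\n")
    os.utime(src, (OLD_MTIME, OLD_MTIME))  # pretend it was last edited in 2018

    archive = os.path.join(workdir, "source.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="hello.txt")  # the member header records st_mtime

    dest = os.path.join(workdir, "extracted")
    with tarfile.open(archive, "r:gz") as tar:
        stored = tar.getmember("hello.txt").mtime
        tar.extractall(dest)  # writes the data, then restores the recorded mtime

    st = os.stat(os.path.join(dest, "hello.txt"))
    return int(stored), int(st.st_mtime), st.st_ctime


with tempfile.TemporaryDirectory() as d:
    started = time.time()
    stored, mtime, ctime = roundtrip(d)
    print("stored in archive:", datetime.fromtimestamp(stored, timezone.utc))
    print("extracted mtime:  ", datetime.fromtimestamp(mtime, timezone.utc))
    # ctime is "now"; the 1 s slack allows for the kernel's coarse clock.
    print("ctime is from this run:", ctime >= started - 1)
```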

  1. The first use is for the 'update' operation:

-u, --update

Append files which are newer than the corresponding copy in the archive.

From the manual:

The --update operation updates a tar archive by comparing the date of the specified archive members against the date of the file with the same name. If the file has been modified more recently than the archive member, then the newer version of the file is added to the archive (as with --append).

Note that the update operation actually appends the newer file rather than overwriting the archived copy. This is for historical reasons: it is difficult to rewrite the middle of a tape.

  2. A second use is when extracting an archive over files that already exist on disk. If the existing file is newer, you can ask tar to keep it with the --keep-newer-files option:

--keep-newer-files

Don't replace existing files that are newer than their archive copies.

  3. tar archives were originally used to store files conveniently on magnetic tapes. From a long-term storage perspective, a tar archive represents the state of a set of files at a specific point in time, so it should logically include each file's timestamp as well. The modification time is a reliable record of when a file's contents last changed, as opposed to atime (which changes when the file is read) or ctime. The kernel sets ctime to the current time whenever a file's metadata changes, for example when tar restores the permissions or the mtime of an extracted file, and programs cannot set it directly. That is why Change in the stat output above shows the extraction time.

  4. Because the modification time is stored inside the archive, you can also compare it against the file on the existing file system using the -d (--compare, --diff) option:

The --compare (-d), or --diff operation compares specified archive members against files with the same names, and then reports differences in file size, mode, owner, modification date and contents.
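All of the mtime-based operations above boil down to comparing the mtime stored for an archive member with the st_mtime of the file on disk. Here is a rough sketch of that logic using Python's stdlib tarfile module. It is not how GNU tar is implemented, the function names are hypothetical, it compares whole seconds, and it only handles regular files from a trusted archive. update needs an uncompressed .tar because tarfile's append mode does not support compressed archives.

```python
import os
import tarfile
import tempfile


def archived_mtimes(archive: str) -> dict[str, int]:
    """Map each regular-file member name to its mtime in whole seconds.
    Later duplicates overwrite earlier ones, just as the last copy of a
    file wins when the archive is extracted."""
    with tarfile.open(archive, "r:*") as tar:
        return {m.name: int(m.mtime) for m in tar.getmembers() if m.isfile()}


def update(archive: str, names: list[str]) -> list[str]:
    """Rough analogue of `tar -uf archive names...`: append each file that is
    newer than its archived copy (or not archived yet). Returns the names."""
    archived = archived_mtimes(archive)
    appended = []
    with tarfile.open(archive, "a") as tar:  # "a" works on uncompressed .tar only
        for name in names:
            if name not in archived or int(os.stat(name).st_mtime) > archived[name]:
                tar.add(name)  # goes at the end; the old copy stays in place
                appended.append(name)
    return appended


def extract_keep_newer(archive: str, dest: str) -> list[str]:
    """Rough analogue of `tar -xf archive --keep-newer-files`: skip regular
    files whose on-disk copy is newer than the archived one. Returns the
    skipped names. No path sanitising: trusted archives only."""
    skipped = []
    with tarfile.open(archive, "r:*") as tar:
        for member in tar.getmembers():
            target = os.path.join(dest, member.name)
            if (member.isfile() and os.path.exists(target)
                    and int(os.stat(target).st_mtime) > int(member.mtime)):
                skipped.append(member.name)
            else:
                tar.extract(member, dest)  # also restores the archived mtime
    return skipped


def diff(archive: str, root: str) -> list[tuple[str, str]]:
    """Rough analogue of `tar -d`, limited to existence, size and mtime of
    regular files (GNU tar also compares mode, owner and contents)."""
    report = []
    with tarfile.open(archive, "r:*") as tar:
        for m in tar.getmembers():
            if not m.isfile():
                continue
            path = os.path.join(root, m.name)
            if not os.path.exists(path):
                report.append((m.name, "missing"))
                continue
            st = os.stat(path)
            if st.st_size != m.size:
                report.append((m.name, "size differs"))
            if int(st.st_mtime) != int(m.mtime):
                report.append((m.name, "mod time differs"))
    return report


# Demo in a scratch directory (notes.txt and backup.tar are made-up names).
with tempfile.TemporaryDirectory() as d:
    prev = os.getcwd()
    os.chdir(d)
    try:
        with open("notes.txt", "w") as f:
            f.write("v1\n")
        os.utime("notes.txt", (1_500_000_000, 1_500_000_000))
        with tarfile.open("backup.tar", "w") as tar:
            tar.add("notes.txt")
        print(update("backup.tar", ["notes.txt"]))    # nothing is newer yet
        os.utime("notes.txt", (1_600_000_000, 1_600_000_000))
        print(diff("backup.tar", "."))                 # reports the mtime change
        print(update("backup.tar", ["notes.txt"]))    # now appended
        print(extract_keep_newer("backup.tar", "."))  # older copy is skipped
    finally:
        os.chdir(prev)
```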

Finally, if you really need to ignore the modification timestamp when extracting an archive, the -m/--touch option can do that:

-m, --touch

Do not extract data modification time. When this option is used, tar leaves the data modification times of the files it extracts as the times when the files were extracted, instead of setting it to the times recorded in the archive.
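For comparison, a rough Python approximation of --touch, again with the stdlib tarfile module standing in for GNU tar (extract_touch is a hypothetical helper): extract normally, then reset each extracted path's atime and mtime to the current time. GNU tar instead just skips restoring the mtime, but the resulting mtime is effectively the same.

```python
import os
import tarfile
import tempfile
import time


def extract_touch(archive: str, dest: str) -> None:
    """Approximation of `tar -xf archive --touch` (hypothetical helper):
    extract as usual, then set each extracted path's atime and mtime to now."""
    with tarfile.open(archive, "r:*") as tar:
        members = tar.getmembers()
        tar.extractall(dest, members=members)
    for member in members:
        if member.issym():
            continue  # os.utime() would follow the link and touch its target
        os.utime(os.path.join(dest, member.name))  # times=None means "now"


# Demo: a file last modified in September 2001 comes out with a fresh mtime.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "old.txt")
    with open(src, "w") as f:
        f.write("data\n")
    os.utime(src, (1_000_000_000, 1_000_000_000))
    archive = os.path.join(d, "old.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="old.txt")
    out = os.path.join(d, "out")
    extract_touch(archive, out)
    age = time.time() - os.stat(os.path.join(out, "old.txt")).st_mtime
    print(f"extracted file's mtime is {age:.1f} s old")
```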

Haxiel

With GNU tar you can extract the contents of an archive and set the modification time to the current date/time.

tar --touch -xvf source.tar.gz

This should allow you to achieve what you seek.

That said, the default behavior makes sense. I would always want a file's mtime to reflect the instant of its last update, regardless of which server it came from and regardless of when the containing filesystem was created. Utilities like rsync rely (in part) on being able to compare timestamps between servers and filesystems.
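To illustrate why rsync cares, here is a simplified Python sketch of its default "quick check", which treats a destination file as up to date when its size and modification time already match the source. The helper name is hypothetical, and comparing whole seconds is a simplification of rsync's actual rules. A copy that preserves mtime passes the check; once the copy's mtime is reset to "now" (as --touch effectively does), it fails, so rsync would have to examine the file again.

```python
import os
import shutil
import tempfile


def quick_check_says_unchanged(src: str, dst: str) -> bool:
    """Simplified rsync-style quick check: dst counts as up to date when its
    size and mtime (whole seconds) match src. Missing dst is never up to date."""
    if not os.path.exists(dst):
        return False
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src.txt")
    dst = os.path.join(d, "dst.txt")
    with open(src, "w") as f:
        f.write("same content\n")
    os.utime(src, (1_500_000_000, 1_500_000_000))
    shutil.copy2(src, dst)  # copies the data and preserves the mtime
    print(quick_check_says_unchanged(src, dst))  # True
    os.utime(dst)  # like --touch: the mtime becomes "now"
    print(quick_check_says_unchanged(src, dst))  # False
```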

JRFerguson