26

Does the hash of a file change if the filename, path, timestamp, or permissions change?

$ echo some contents > testfile
$ shasum testfile 
3a2be7b07a1a19072bf54c95a8c4a3fe0cdb35d4  testfile
tarabyte
  • 3
    If you need it to, then you can zip it. – ctrl-alt-delor Aug 03 '15 at 22:48
  • SHA-1 and MD5 are broken but can still be used; in practice you will very rarely see two different files with the same hash. Most people prefer at least the SHA-2 family. https://crypto.stackexchange.com/questions/1434/are-there-two-known-strings-which-have-the-same-md5-hash-value https://www.hacksandsecurity.org/posts/two-images-have-same-md5-hash-md5-collision-example – P Satish Patro Nov 05 '19 at 10:55
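To illustrate the comment about zipping: archive formats store metadata next to the contents, so the hash of the archive does react to metadata changes. A minimal sketch, assuming sha256sum (coreutils) and tar are installed; tar is swapped in for zip here only because it is more widely available, and the scratch directory and timestamps are arbitrary choices.

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)"
echo some contents > testfile
touch -t 200001010000 testfile

# tar writes the name, mode and modification time into the archive header.
h1=$(tar -cf - testfile | sha256sum)

# Same contents, different timestamp.
touch -t 200101010000 testfile
h2=$(tar -cf - testfile | sha256sum)

[ "$h1" != "$h2" ] && echo "archive hash changed"
```

Hashing the file directly in both states would give the same value; only the archive, which embeds the timestamp, differs.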

2 Answers

33

The hash of a file is the hash of its contents. Metadata such as the file name, timestamps, permissions, etc. have no influence on the hash.

Assuming a non-broken cryptographic hash, two files have the same hash if and only if they have the same contents. The most common such hashes are the SHA-2 family (SHA-256, SHA-384, SHA-512) and the SHA-3 family. This does not include MD5 or SHA-1, which are broken, nor a CRC such as the one computed by cksum, which is not a cryptographic hash.
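A quick way to check this yourself: change the name, permissions and timestamp of a file and hash it again. This is only a sketch, assuming sha256sum from coreutils (on systems without it, shasum -a 256 should behave the same); the file names and the timestamp are arbitrary.

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)"
echo some contents > testfile

# Reading from stdin keeps the file name out of the output line.
before=$(sha256sum < testfile)

mv testfile renamed              # change the name
chmod 600 renamed                # change the permissions
touch -t 200001010000 renamed    # change the modification time

after=$(sha256sum < renamed)

echo "$before"
echo "$after"
[ "$before" = "$after" ] && echo "hash unchanged"
```

Both echo lines show the same digest, because only the bytes of the file are fed to the hash.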

  • 7
    In general, all hashes have collisions. A non-broken cryptographic hash implies that there is no efficient way to generate a collision. – Tyson Williams Jan 23 '19 at 15:29
  • @TysonWilliams That's true but irrelevant. If you have two files with the same hash, and the hash is a non-broken cryptographic hash, then the two files have the same hash if and only if they have the same contents. If you could find a collision, it wouldn't be a non-broken cryptographic hash. – Gilles 'SO- stop being evil' Jan 23 '19 at 22:23
  • 4
    What you said is false. Every (practical) hash function, even a non-broken cryptographic one, has collisions. There are more inputs than outputs, so by the pigeonhole principle, there must be a collision. – Tyson Williams Jan 23 '19 at 22:29
  • @TysonWilliams But I never claimed that there were no collisions. Of course there are collisions. But 1. it's extremely rare to find collisions for what is generally considered to be a non-broken cryptographic hash function, and 2. if someone did find a collision then that function would no longer be a non-broken cryptographic hash function. In practical terms, if you have two files with the same SHA-256 hash, they have identical contents. – Gilles 'SO- stop being evil' Jan 23 '19 at 22:33
  • 2
    The statement "If you have two files with the same hash, and the hash is a non-broken cryptographic hash, then the two files have the same hash if and only if they have the same contents." is equivalent to "non-broken cryptographic hash [functions] have no collisions". I agree that "In practical terms, if you have two files with the same SHA-256 hash, they have identical contents." – Tyson Williams Jan 23 '19 at 22:36
  • @Gilles'SO-stopbeingevil' Your second paragraph is superfluous and not correct. MD5 works just as your first paragraph describes. – scottlittle Dec 09 '20 at 02:12
  • @scottlittle What do you think is incorrect? It is possible (and easy) to create two files with the same MD5 hash. – Gilles 'SO- stop being evil' Dec 09 '20 at 08:53
  • @Gilles'SO-stopbeingevil' I stated what is incorrect: "MD5 works just as your first paragraph describes." Meaning that even if MD5 is "broken" as you say, it does not invalidate the fact that it is a hash of the file's contents. In practicality, MD5 is used because it is readily available on Unix/Linux (md5sum) and quick. Also, MD5 is commonly used to validate integrity of files sent between two companies in a professional setting as of 2020. – scottlittle Dec 09 '20 at 15:03
  • @scottlittle MD5 is a hash of the file's content, true. This does not contradict anything I say. An MD5 sum does not, in general, guarantee that the file has a specific content in practice. This is only true if the file was not deliberately crafted to allow later changes in the content. Using MD5 or SHA-1 to validate the integrity of files is not good practice, and makes you vulnerable to some attacks. SHA2 is as widely available and the performance difference is negligible. – Gilles 'SO- stop being evil' Dec 09 '20 at 15:34
  • @Gilles'SO-stopbeingevil' It's true that MD5 cannot guarantee that a file has specific content, but neither can SHA1 or SHA2; it's just a difference in probability. Also, the way that companies use MD5 to verify integrity does not pose a security risk because the underlying file being transferred is itself encrypted with GPG, for example. – scottlittle Dec 09 '20 at 16:27
  • @scottlittle It's not just a difference in probability. All are suitable to defend against accidental changes, because the probability that an accidental change to a file would result in the same hash is negligible. SHA2 is suitable to defend against deliberate modification after the fact, but MD5 and SHA-1 are not. Encryption is completely irrelevant. The use of MD5 to verify integrity has led to severe vulnerabilities. – Gilles 'SO- stop being evil' Dec 09 '20 at 19:57
  • Encryption is not irrelevant; I was pointing out a way companies use MD5 without compromising security. Let's say an attacker uncovers the underlying file through a vulnerability in MD5: they would still have to break the GPG encryption! My point is that MD5 as a checksum is perfectly safe for most use cases of a general-purpose hash. Your answer is deceptive for unnecessarily pointing people away from using MD5. Plus, it does nothing to answer the original question. – scottlittle Dec 09 '20 at 20:10
  • @scottlittle If you aren't concerned about the security of MD5, you don't understand the security. An attack would come from whoever prepared the file, not from a man-in-the-middle during the communication. Using MD5 allows the preparer to craft a file, then switch it to another file with the same MD5 later without being detected. This is a type of attack that people tend to forget about, but that can have devastating consequences. Using SHA-256 or SHA-512 avoids this potential attack. There is no excuse for relying on MD5 in 2020. – Gilles 'SO- stop being evil' Dec 09 '20 at 22:11
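The pigeonhole argument from the comments can be seen on a toy scale. The sketch below is not an attack on SHA-256: it deliberately cuts the digest down to its first two hex digits, so only 256 values are possible, and then hashes 257 different inputs. The truncation length and the inputs (the strings "0" to "256") are arbitrary choices for illustration; sha256sum and seq are assumed to be available.

```shell
# Keep only the first 2 hex digits of each digest: 256 possible values.
# With 257 distinct inputs, at least one truncated value must repeat.
dups=$(for i in $(seq 0 256); do
  printf '%s' "$i" | sha256sum | cut -c 1-2
done | sort | uniq -d)

echo "truncated values seen more than once:"
echo "$dups"
```

With the full 256-bit output the same argument still says collisions exist, but no practical way to find one is known, which is the distinction the thread is arguing about.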
18

Not as far as I can tell after a simple test.

$ echo some contents > testfile
$ shasum testfile 
3a2be7b07a1a19072bf54c95a8c4a3fe0cdb35d4  testfile
$ mv testfile newfile
$ shasum newfile 
3a2be7b07a1a19072bf54c95a8c4a3fe0cdb35d4  newfile
tarabyte
  • 3
    But note that if you blindly compare the outputs of shasum, they will not match since the output includes the filename/path (as shown in your example). A good workaround is to do something like shasum - < testfile. – DoxyLover Aug 03 '15 at 22:53
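The point in the comment above can be sketched as follows: compare only the digest field, not the whole output line. This assumes sha256sum from coreutils (shasum prints the same two-column layout); the file names are just the ones from the answer.

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)"
echo some contents > testfile
cp testfile newfile

# The full output lines differ because each one ends with its file name.
[ "$(sha256sum testfile)" = "$(sha256sum newfile)" ] || echo "full lines differ"

# Keep only the first field (the digest) before comparing.
a=$(sha256sum testfile | cut -d ' ' -f 1)
b=$(sha256sum newfile  | cut -d ' ' -f 1)
[ "$a" = "$b" ] && echo "digests match"
```

Reading from standard input, as the comment suggests, gets the same effect because the name column is then "-" for every file.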