Does the MD5SUM for a file change when I make a copy of the file?

Example:

$ md5sum file01.dmp
$ cp file01.dmp file02.dmp
$ md5sum file02.dmp

Shouldn't the two MD5SUMs match?

Jeff Schaller
oradbanj
  • The md5 sums must match. Can you give more details? Note also that the filename part of the output will of course not match. – Archemar Mar 04 '16 at 16:16
  • If cp was successful, the answer is 'yes'. Is that your actual question, though? Did something go wrong, or is this a theoretical question? – Jeff Schaller Mar 04 '16 at 16:21
  • @Jeff - the cp did not report errors. I had a directory with database dump files. I made a copy of the entire directory, and when I compared the MD5SUMs I got different values for the same filename (in a different directory). Thinking the directory could somehow be a factor, I then made a copy of one of the files in the same directory and again got a different MD5SUM. That was the reason behind my question. – oradbanj Mar 04 '16 at 16:42

2 Answers

md5sum (and sha1sum and sha256sum etc.) compute a hash of the file's contents. This does not include the filename or any other metadata (like modification times). If two files have the same contents, then md5sum will generate the same hash for each. (Note that the output of md5sum consists of the hash and the filename. The hash will not change if you rename the file, but of course the filename portion of the output will change.)
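
To illustrate that only the contents matter, here is a minimal sketch, assuming GNU coreutils md5sum is installed. The working directory and the name renamed.dmp are made up for the illustration; the sketch copies a file, renames the copy, and compares the hash field of each output line.

```shell
# Throwaway directory so nothing real is touched (hypothetical file names).
workdir=$(mktemp -d)

printf 'example dump contents\n' > "$workdir/file01.dmp"
cp "$workdir/file01.dmp" "$workdir/file02.dmp"

# Keep only the first field of the output, which is the hash.
hash_orig=$(md5sum "$workdir/file01.dmp" | cut -d ' ' -f1)
hash_copy=$(md5sum "$workdir/file02.dmp" | cut -d ' ' -f1)

# Renaming changes the filename column of the output, not the hash.
mv "$workdir/file02.dmp" "$workdir/renamed.dmp"
hash_renamed=$(md5sum "$workdir/renamed.dmp" | cut -d ' ' -f1)

echo "original: $hash_orig"
echo "copy:     $hash_copy"
echo "renamed:  $hash_renamed"
```

All three lines should show the same 32-character hash.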

It is always true that two files that generate different hashes have different contents. If a copied file generates a different hash, then the copy failed in some way, or one of the files was modified after the copy.
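
For a whole directory of dump files, one way to check the copies is to record the hashes on the source side and verify them on the destination side with md5sum -c. This is only a sketch, assuming GNU md5sum; the directories, file names, and checksums.md5 are invented for the example, and the append at the end simulates a file being modified after the copy.

```shell
# Hypothetical source and destination directories.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'dump one\n' > "$src/file01.dmp"
printf 'dump two\n' > "$src/file02.dmp"
cp "$src"/*.dmp "$dst"/

# Record hashes with relative names so the list can be checked in another directory.
(cd "$src" && md5sum -- *.dmp) > "$dst/checksums.md5"

# Verify the copies; md5sum -c exits non-zero if any file fails the check.
if (cd "$dst" && md5sum -c checksums.md5); then
    copy_status=ok
else
    copy_status=mismatch
fi

# Simulate a modification after the copy, then verify again.
printf 'extra bytes\n' >> "$dst/file02.dmp"
if (cd "$dst" && md5sum -c --quiet checksums.md5); then
    modified_status=ok
else
    modified_status=mismatch
fi

echo "after copy: $copy_status, after modification: $modified_status"
```

When a check fails, cmp on the two files reports the first byte at which they differ, which can help tell a truncated copy from a file that was changed later.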

However, it is not necessarily true that two files that generate the same hash have the same contents. Since hashes are a fixed size, there are a large number of different files that will all generate the same hash. That is called a collision. Finding a collision is not supposed to be easy. (For MD5 it is easy, though, which is why MD5 is no longer considered secure. It is still plenty good enough for detecting accidental file corruption, just not malicious modification.)

cjm

The md5sum output line will differ between the two files because its last field is the filename; the hash itself stays the same. To strip the filename so that only the hash is output, use awk or cut:

md5sum filename | cut -d ' ' -f1
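
The awk form mentioned above, and a variant that reads from standard input, might look like the sketch below. It assumes GNU md5sum and a POSIX awk, and uses a temporary file created only for the example.

```shell
# Temporary file standing in for "filename" (hypothetical content).
tmpfile=$(mktemp)
printf 'example contents\n' > "$tmpfile"

# cut: split on a single space and keep the first field.
hash_cut=$(md5sum "$tmpfile" | cut -d ' ' -f1)

# awk: print the first whitespace-separated field.
hash_awk=$(md5sum "$tmpfile" | awk '{ print $1 }')

# Reading from standard input: md5sum prints "-" where the filename would be.
hash_stdin=$(md5sum < "$tmpfile" | cut -d ' ' -f1)

echo "$hash_cut"
echo "$hash_awk"
echo "$hash_stdin"
```

All three commands should print the same hash, since each one only drops the filename field.
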
Otheus