1

Suppose I want to find out whether I have a file that matches the SHA-256 checksum generated from the file test1.txt using the command:

sha256sum -b test1.txt

I get as output:

e3d748fdf10adca15c96d77a38aa0447fa87af9c297cb0b75e314cc313367daf *test1.txt

So, I want to find the files that match the checksum generated instead of using the name.

Is this possible?

Zanna
  • 3,571
MarianoM
  • 161
  • @GAD3R You're right, the answers are related. But people have also contributed ideas and good answers here. Can't the commands be merged and adapted so as not to create confusion? Marking this as resolved so that these answers are no longer seen is, I think, disrespectful towards those who have spent their time responding. – MarianoM Nov 09 '18 at 08:37
  • Welcome! You can vote up and accept the best answer. – GAD3R Nov 09 '18 at 08:40
  • @GAD3R Thank you! If only I had the reputation ... anyway, I will mark the accepted answer later, in case someone else wants to contribute more information. – MarianoM Nov 09 '18 at 08:50

3 Answers

2
find . -type f -exec sha256sum -b {} + | 
grep -F 'e3d748fdf10adca15c96d77a38aa0447fa87af9c297cb0b75e314cc313367daf'

This would calculate the SHA256 checksum for each and every file in or under the current directory. The grep at the end would extract the results of the calculations that match the checksum that you are looking for.
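As a minimal sketch of the same idea, the pipeline can be wrapped in a function that prints only the matching pathnames. The function name `find_by_sha256` is hypothetical, and the sketch assumes GNU `sha256sum` and filenames without newlines or backslashes (GNU `sha256sum` escapes those in its output, so such files would be missed here).

```shell
# Hypothetical helper: print the paths under directory $1 whose
# SHA-256 checksum equals the hex digest given as $2.
find_by_sha256() {
    find "$1" -type f -exec sha256sum -b {} + |
        # Each output line is: 64 hex digits, a space, a mode character, the path.
        # substr() yields a string, so the comparison is never done numerically.
        awk -v want="$2" 'substr($0, 1, 64) == want { print substr($0, 67) }'
}
```

For example, `find_by_sha256 . e3d748fdf10adca15c96d77a38aa0447fa87af9c297cb0b75e314cc313367daf` would list every file under the current directory with that checksum. Unlike `grep -F`, this only matches the digest at the start of the line, so a file whose name happens to contain the digest is not reported.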

If the output of the find operation was redirected to a file, that file could serve as a "database" for doing multiple lookups with grep. With some extra logic, you could make a cron job that periodically refreshed this file with information from new and updated files and removed outdated entries (this was not really what this question was about, so I'm leaving any code out for the time being). With not much extra effort, you could even do this against a simple SQLite database.
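A rough sketch of the "database file" idea, under the assumption that the index is stored outside the directory being scanned (otherwise the index would be checksummed while it is being written). The function names `build_sha256_index` and `lookup_sha256` are hypothetical.

```shell
# Hypothetical helper: checksum every regular file under directory $1
# and write the results to the index file $2, replacing its contents.
build_sha256_index() {
    find "$1" -type f -exec sha256sum -b {} + > "$2"
}

# Hypothetical helper: print the index lines in file $1 that contain
# the hex digest $2.
lookup_sha256() {
    grep -F -- "$2" "$1"
}
```

After one run of `build_sha256_index /some/dir ~/sha256.index`, any number of lookups such as `lookup_sha256 ~/sha256.index e3d748fd...` can be answered without reading the files again, at the cost of the index going stale when files change.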

Related to the syntax of the find command: Understanding the -exec option of `find`

Kusalananda
  • 333,661
  • Excellent! It works perfectly. Now, I do not quite understand the use of the braces {}. I read a bit more and found that "it can be used as a placeholder for each file that locates the search command", but what does that mean? Does it refer to the coloring of the text or something else? I tried inserting a path (/test) and it was accepted. This confuses me even more. It's just curiosity to learn more about the parameters used. – MarianoM Nov 09 '18 at 08:33
  • @MarianoM It has nothing to do with colouring. The {} will be replaced by pathnames to found files. See the updated answer (with link to Understanding the -exec option of `find`) – Kusalananda Nov 09 '18 at 08:56
  • Excellent guide! Obviously the command has many uses, I'll have to do some practices. No more questions, Thank you! – MarianoM Nov 09 '18 at 09:13
1

Normally you won't have a database containing the sha256 sum of every file, so the only way would involve calculating the sha256 sum of every file (stopping if you find a match). That's a very heavy and time-consuming operation, so for practical purposes the answer in most cases is no.
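To illustrate the "stopping if you find a match" part, here is a minimal sketch that checksums files one at a time and quits at the first hit. It assumes a `find` that supports the non-POSIX `-quit` action (GNU and FreeBSD `find` do); the function name `find_first_by_sha256` is hypothetical.

```shell
# Hypothetical helper: print the first file under directory $1 whose
# SHA-256 checksum equals the hex digest $2, then stop searching.
find_first_by_sha256() {
    # The inline sh script succeeds only when the file's digest equals $2,
    # so -print and -quit are reached only for a matching file.
    find "$1" -type f -exec sh -c '
        [ "$(sha256sum -b < "$1" | cut -d" " -f1)" = "$2" ]
    ' sh {} "$2" \; -print -quit
}
```

This starts one `sha256sum` process per file, so it is slower per file than the `-exec ... {} +` form; it only pays off when a match is likely to turn up early.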

  • Yes, I agree with you, but anyway if you want to keep the files complete, I suppose that is the only way to do it. Anyway, it's not to be doing this every day, I understand that. – MarianoM Nov 09 '18 at 07:19
1

Yes, this is possible, but only really in a brute-force way, by checksumming all the files in your system and comparing them to your signature.

(This is, in fact, how file de-duplicators work, by checksumming all the files and looking for matches, which are strong candidates for files with identical contents.)
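The de-duplication idea can be sketched in a few lines, assuming GNU `uniq` (for its `-w` and `-D` options) and filenames without newlines or backslashes. The function name `list_duplicate_candidates` is hypothetical, and real de-duplicators typically do more (for instance, comparing file sizes first).

```shell
# Hypothetical helper: list files under directory $1 that share a
# SHA-256 checksum with at least one other file.
list_duplicate_candidates() {
    find "$1" -type f -exec sha256sum -b {} + |
        LC_ALL=C sort |      # bring identical digests next to each other
        uniq -w 64 -D        # compare only the 64-digit digest, print all repeats
}
```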

If you're considering looking up files by their checksums often, you might want to create an index mapping checksums to paths, which might save you the job of having to recalculate those checksums often. If you implement this index cleverly, you might be able to do incremental updates, only having to checksum new files or files that have been updated since the previous scan.
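One possible shape for such an incremental update, as a sketch only: keep a timestamp file next to the index and, on each run, re-checksum only the files modified since the previous run. The function name and the three arguments are hypothetical; the sketch assumes GNU `sha256sum`, filenames without newlines or backslashes, and an index and stamp file stored outside the scanned directory. It relies on modification times, so a file moved in with an old timestamp would be missed.

```shell
# Hypothetical helper: update_sha256_index DIR INDEX STAMP
# Runs in a subshell (note the parentheses) so its variables do not leak.
update_sha256_index() (
    dir=$1 idx=$2 stamp=$3
    changed=$(mktemp) || exit 1
    # Take the new timestamp before scanning, so that files modified
    # during the scan are picked up again on the next run.
    touch "$stamp.new"
    if [ -e "$stamp" ] && [ -e "$idx" ]; then
        find "$dir" -type f -newer "$stamp" -exec sha256sum -b {} + > "$changed"
    else
        # First run: checksum everything and start from an empty index.
        find "$dir" -type f -exec sha256sum -b {} + > "$changed"
        : > "$idx"
    fi
    {
        # Keep old entries unless the file was re-checksummed or has vanished.
        # The path starts at column 67 (64 digits, a space, a mode character).
        awk 'FILENAME == ARGV[1] { seen[substr($0, 67)] = 1; next }
             !(substr($0, 67) in seen)' "$changed" "$idx" |
        while IFS= read -r line; do
            path=${line#* ?}    # strip digest, space and mode character
            if [ -e "$path" ]; then printf '%s\n' "$line"; fi
        done
        cat "$changed"
    } > "$idx.new"
    mv "$idx.new" "$idx"
    mv "$stamp.new" "$stamp"
    rm -f "$changed"
)
```

Paths are stored exactly as `find` prints them, so passing an absolute directory keeps the existence check independent of the current working directory.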

filbranden
  • 21,751
  • Interesting idea! I was thinking of asking a new question, but I think this is the right place for it. It occurred to me that, to speed up the process and reduce the load on the disk, I would have to add the modification dates of all the files to the list. That way, to keep the sha256 list up to date, the system would only need to compare the dates and, for the files that differ, generate the new sha256 and add it to the list. This would allow me to have a constantly updated list. Do you think this is possible? – MarianoM Nov 09 '18 at 07:16