Grep, but for binaries

Question

I've got a fragment of an image file produced by data-recovery software. I suspect the complete original is somewhere on my home fileserver.

If this were a fragment of a text file, I could just grab a unique-looking fragment, run grep -r -l -F, and come back in a few hours for the answer. However, since this is a binary file, it's got all sorts of things that grep doesn't like (such as null bytes), and even if I can get past that, I don't know how to give grep input that isn't valid UTF-8.

How can I search for the original, preferably without writing my own search program?

(This is not a duplicate of this question: despite the likely-sounding title, that one is about finding strings in binary files, where I'm looking for binary data in binary files.)

The approach in this answer to "How to know if a text file is a subset of another" also works for binary files. — Stéphane Chazelas, Mar 01 '23 at 08:36

Gilles Quénot · Answer 1 · 2023-02-28T23:07:06.250


What I would do:

grep -a -r -l -F <fixed string> .

-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.

or

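# grep reads from a pipe here, so -l would only print "(standard input)";
# print the file name explicitly instead.
find . -type f -exec sh -c '
    strings "$1" | grep -qF <fixed pattern> && printf "%s\n" "$1"
' sh {} \;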

strings - print the sequences of printable characters in files

answered Feb 28 '23 at 22:54 by Gilles Quénot, edited Feb 28 '23 at 23:07

In the general case, how would I get the "fixed pattern"? For this particular file, strings produces some fragments that look unique (as well as several chapters of an ebook -- data recovery does funny things sometimes), but I can't count on that happening with any arbitrary file. – Mark Feb 28 '23 at 23:09
I dunno what exactly you are trying to do. The commands I gave are meant to search binary files as requested – Gilles Quénot Feb 28 '23 at 23:14
The commands are half of what I need: how to search. The other half is "what to search for": how do I pull a piece out of the file I've got and tell grep "search for this"? – Mark Mar 01 '23 at 00:37
open the recovered fragment file with a text editor ... copy a section ... confirm that grep finds the recovered fragment file ... then search for the original file ... actually Notepad++ can do what you are asking – jsotola Mar 01 '23 at 01:08
@jsotola, the whole point of this question is that I'm dealing with a binary file, not a text file. Opening it in a text editor will give me nothing useful, what with all the nulls, control characters, and other non-text things in the file. – Mark Mar 01 '23 at 01:35
@Mark you may be confusing a text editor with a word processor ... download Notepadqq text editor ... use it to open a binary file ... search for \0 ... that should find a null – jsotola Mar 01 '23 at 03:12
What does Notepad++ have to do with this? Related: https://unix.stackexchange.com/questions/80270/unix-linux-undelete-recover-deleted-files/98700#98700 – Gilles Quénot Mar 01 '23 at 07:05
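One possible answer to the "what to search for" question in the comments, as a minimal sketch assuming a grep that supports -r and -a (GNU or BSD) plus the usual tail, head, tr and wc: cut a slice out of the fragment by offset and length, write it to a temporary pattern file, and pass it to grep -F -f under LC_ALL=C so the bytes are compared as raw bytes rather than as UTF-8 text. The function name grep_chunk and its argument order are hypothetical. Because grep treats each line of a pattern file as a separate pattern, and NUL bytes may not be accepted in patterns, the sketch refuses slices that contain newline or NUL bytes; if that happens, try a different offset.

```sh
# Hypothetical helper: grep_chunk FRAGMENT OFFSET LENGTH [DIR]
# Lists files under DIR (default .) that contain the LENGTH bytes found at
# byte OFFSET (0-based) of FRAGMENT.
grep_chunk() {
  chunk=$(mktemp) || return
  # Copy the requested slice of the fragment into the pattern file.
  tail -c +"$(($2 + 1))" "$1" | head -c "$3" > "$chunk"
  all=$(($(wc -c < "$chunk")))
  clean=$(($(tr -d '\000\n' < "$chunk" | wc -c)))
  # Each line of a -f file is a separate pattern and NUL bytes may not be
  # usable in patterns, so reject empty slices and slices containing either.
  if [ "$all" -eq 0 ] || [ "$all" -ne "$clean" ]; then
    echo "chunk at offset $2 is empty or contains NUL/newline bytes" >&2
    rm -f "$chunk"
    return 2
  fi
  # LC_ALL=C: match raw bytes, so non-UTF-8 data is fine; -a: search
  # binary files as if they were text.
  LC_ALL=C grep -a -r -l -F -f "$chunk" "${4:-.}"
  status=$?
  rm -f "$chunk"
  return "$status"
}
```

For example, grep_chunk recovered.bin 8192 64 /path/to/share (placeholder names) would look for the 64 bytes starting at offset 8192. Picking an offset well into the fragment avoids the formulaic headers that many files of the same type share.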

MC68020 · Answer 2 · 2023-03-01T08:31:00.903

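You could first dump the binary file using od.

I suggest using the -x and -w256 options to reduce the size of the dump and the number of lines, which makes grep more efficient, plus the -A n option (needed) to drop the offset column, which would otherwise differ between the fragment and the original. Add -v as well: without it, od replaces runs of identical lines with a single "*", and those lines would no longer match:

od -v -x -A n -w256 yourbinary_fragment > pattern.txt

You could also make aggressive use of the -j, -N and -w options, or edit pattern.txt by hand, to cut the number of lines down to a strict minimum, which significantly eases grep's work. Keep distinctive lines: grep -f reports a file as soon as any one pattern line matches, so generic lines such as all zeros lead to false positives.

Then find the files whose own dumps match those patterns:

find . -type f -exec sh -c '
    od -v -x -A n -w256 "$1" | grep -qFf pattern.txt && printf "%s\n" "$1"
' sh {} \;

(grep reads from a pipe here, so with -l it would print "(standard input)" instead of the file name; hence -q and an explicit printf.) The dump lines only line up if the fragment starts at an offset in the original that is a multiple of 256 bytes, which is usually, though not always, the case for fragments recovered from sector-aligned data.

If you are using the machine for other things meanwhile, I'd suggest running the search under the SCHED_BATCH scheduling policy, for example by prefixing the find command with chrt -b 0.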
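A variant of the od idea that does not depend on how od splits its output into lines, as a sketch under stated assumptions: dump each file as a single line in which every byte is a space followed by two hex digits (od -t x1), then search for the whole fragment's dump with grep -F. The space before each byte keeps matches byte-aligned, so NULs, newlines and non-UTF-8 bytes are all handled. find_fragment is a hypothetical name. Each file's dump is one line about three times the file's size, and grep has to hold that line in memory, so very large files are expensive; if the fragment itself lives under the searched directory, it will be listed too.

```sh
# Hypothetical helper: find_fragment FRAGMENT [DIR]
# Prints each regular file under DIR (default .) whose bytes contain all of
# FRAGMENT's bytes, by comparing single-line hex dumps with grep -F.
find_fragment() {
  pat=$(mktemp) || return
  # Dump the fragment as " hh hh hh ..." on one line; -v stops od from
  # collapsing repeated lines into "*".
  { printf ' '; od -An -v -tx1 "$1"; } | tr '\n' ' ' | tr -s ' ' > "$pat"
  echo >> "$pat"
  find "${2:-.}" -type f -exec sh -c '
    pat=$1; shift
    for f in "$@"; do
      # Same single-line dump for the candidate, searched as a fixed string.
      { printf " "; od -An -v -tx1 "$f"; } | tr "\n" " " | tr -s " " |
        grep -qF -f "$pat" && printf "%s\n" "$f"
    done
  ' sh "$pat" {} +
  rm -f "$pat"
}
```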

score 0 · Answer 3 · answered Mar 01 '23 at 08:46

With perl and the Sys::Mmap module (in libsys-mmap-perl package on Debian):

fragment=/path/to/your/fragment
size=$(( $(wc -c < "$fragment") - 1 ))
find . -type f -size "+${size}c" -print0 | 
  perl -MSys::Mmap -l -0sne '
    BEGIN {
      open N, "<", $needle or die "$needle: $!\n";
      mmap($n, 0, PROT_READ, MAP_SHARED, N);
    }
    if (open H, "<", $_) {
      mmap($h, 0, PROT_READ, MAP_SHARED, H);
      print if index($h, $n) >=0;
    } else {
      warn "$_: $!\n";
    }' -- -needle="$fragment"

score 0 · Answer 4 · answered Mar 01 '23 at 09:35


If you suspect one file is the first part of a different file, you could take the first few bytes from both files and compare these:

# Omit or change the bytes arguments as needed, see `man head`
head --bytes=1032 file1.bin > /tmp/file1.head.bin
head --bytes=1032 file2.bin > /tmp/file2.head.bin
diff --text /tmp/file1.head.bin /tmp/file2.head.bin

You could also inspect the files visually with xxd /tmp/file1.head.bin. Finally, programs like Meld or Beyond Compare show you visual side-by-side comparisons of the files.
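To run the same prefix comparison against every file under a directory rather than one pair at a time, a sketch using only find, head -c and cmp (cmp accepts - for standard input) could look like the following. prefix_matches is a hypothetical name. It treats the whole recovered fragment as the suspected prefix instead of a fixed 1032 bytes, which makes a match on a shared, formulaic header alone less likely; the fragment itself will also match if it lives under the searched directory.

```sh
# Hypothetical helper: prefix_matches FRAGMENT [DIR]
# Prints each regular file under DIR (default .) that begins with exactly
# the bytes of FRAGMENT.
prefix_matches() {
  n=$(($(wc -c < "$1")))   # fragment length in bytes
  find "${2:-.}" -type f -exec sh -c '
    frag=$1 n=$2; shift 2
    for f in "$@"; do
      # Compare the first n bytes of the candidate with the whole fragment;
      # a file shorter than the fragment simply fails the comparison.
      head -c "$n" "$f" | cmp -s - "$frag" && printf "%s\n" "$f"
    done
  ' sh "$1" "$n" {} +
}
```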

answered Mar 01 '23 at 09:35 by royarisse

The problem with this is that the beginnings of binary files tend to be very formulaic. For example, indexed Windows Bitmap files of a given size will tend to be identical for the first 1078 bytes. – Mark Mar 01 '23 at 21:50
For binaries you're indeed right, though it shouldn't be a problem for image files. Otherwise, it's simply a matter of grabbing the first 10k or so. – royarisse Mar 03 '23 at 15:27
