bgrep
I keep coming back to this random repo from time to time: https://github.com/tmbinc/bgrep
"Installation" as per README:
curl -L 'https://github.com/tmbinc/bgrep/raw/master/bgrep.c' | gcc -O2 -x c -o $HOME/.local/bin/bgrep -
Sample usage on a minimal example:
printf '\x01\x02abcdabcd' > myfile.bin
bgrep -B2 -A2 6263 myfile.bin
outputs:
myfile.bin: 00000003
\x02abc
myfile.bin: 00000007
dabc
because 6263 is bc in ASCII, and that two-byte sequence has matches at zero-indexed positions 3 and 7.
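Those offsets can be cross-checked without bgrep. The following is a minimal sketch, assuming GNU grep is available; grep_offsets is a made-up helper name, not part of bgrep. It prints offsets in decimal, whereas bgrep prints them in hex:

```shell
# Hypothetical helper (not part of bgrep): list the decimal byte offsets
# of a plain-text pattern in a file, using GNU grep.
#   -o  print only the matched part
#   -b  prefix each match with its byte offset
#   -a  treat binary input as text
#   -F  take the pattern as a fixed string, not a regex
grep_offsets() {
  LC_ALL=C grep -obaF -- "$1" "$2" | cut -d: -f1
}

# Same bytes as the sample file, written with portable octal escapes.
tmp=$(mktemp)
printf '\001\002abcdabcd' > "$tmp"
grep_offsets bc "$tmp"   # prints 3 and 7, one per line
rm -f "$tmp"
```

This only confirms the small example. It says nothing about how grep behaves on files larger than memory.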
Let's see if it works on a large file that does not fit in the 32 GB of memory of my Lenovo ThinkPad P51, tested on my SSD:
dd count=100M if=/dev/zero of=myfile.bin
printf '\x01\x02abcdabcd' >> myfile.bin
time bgrep -B2 -A2 6263 myfile.bin
output:
myfile.bin: c80000003
\x02abc
myfile.bin: c80000007
dabc
real 11m26.898s
user 1m32.763s
sys 9m53.756s
So it took a while but worked.
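The reported offsets can be sanity-checked by hand: dd defaults to 512-byte blocks, so count=100M writes 100 * 1024 * 1024 blocks of 512 bytes, i.e. 50 GiB of zeros, before the appended bytes. A small sketch of that arithmetic (padded_offset is a name made up for illustration):

```shell
# Hypothetical helper: hex offset of a match that sits `$2` bytes into the
# data appended after `$1` dd blocks of 512 bytes (dd's default block size).
padded_offset() {
  printf '%x\n' "$(( $1 * 512 + $2 ))"
}

blocks=$((100 * 1024 * 1024))   # count=100M
padded_offset "$blocks" 3       # c80000003
padded_offset "$blocks" 7       # c80000007
```

Both values agree with the bgrep output, so the matches are the same positions 3 and 7 as before, shifted by the zero padding.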
It is a bit annoying that it doesn't support searching for plaintext characters directly: you have to give it a hex string. But we can convert as per https://stackoverflow.com/questions/2614764/how-to-create-a-hex-dump-of-file-containing-only-the-hex-characters-without-spac
bgrep `printf %s bc | od -t x1 -An -v | tr -d '\n '` myfile.bin
so a Bash function would help:
bgrepa() {
  # First argument is the plaintext pattern; the rest is passed on to bgrep.
  local pat=$1
  shift
  bgrep "$(printf %s "$pat" | od -t x1 -An -v | tr -d '\n ')" "$@"
}
bgrepa bc -B2 -A2 myfile.bin
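To see what the conversion step produces on its own, here is a small sketch that isolates it. to_hex is a hypothetical helper name, and the pipeline is the same od/tr one used above:

```shell
# Hypothetical helper: print the bytes of a string as one contiguous hex string.
#   od -t x1   one two-digit hex group per byte
#      -An     no address column
#      -v      do not collapse repeated output lines into '*'
#   tr -d      strip the spaces and newlines od puts between groups
to_hex() {
  printf %s "$1" | od -t x1 -An -v | tr -d '\n '
}

to_hex bc; echo                 # 6263

# 36 identical bytes span three od output lines; without -v the repeated
# line would be replaced by '*' and the hex string would be corrupted.
long=$(head -c 36 /dev/zero | tr '\0' a)
to_hex "$long" | wc -c          # 72 hex digits
```

The -v flag only matters for patterns longer than 16 bytes with repeated content, but keeping it costs nothing.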
Regular expressions are not supported.
Tested on Ubuntu 23.04, bgrep 28029c9203d54f4fc9332d094927cd82154331f2.
grep --mmap ... already? Real programmers would do that with vim -R -b 400gbfile and then /pattern. – ott-- Aug 13 '15 at 20:52
grep --only-matching --byte-offset --binary. The --only-matching option can be implemented without buffering the whole line, but I don't know if your implementation does take advantage of this to actually save memory. --byte-offset will indicate where the matching sequence starts in the binary data stream or blob. – Totor Oct 12 '21 at 13:19
--mmap at all. It's an option in BSD grep – phuclv Oct 16 '23 at 05:06