Say I've got two binary files, each roughly 50MB, that may contain a digitally identical portion. Is there an easy way to compare these two files, find the longest identical portion, and save that to a file?
(If it matters, the files in question are stream-capture AAC files. They've got some audio boilerplate that sounds identical, but it may or may not be digitally identical. So I've got a two-part task: first determine whether the boilerplate is digitally identical, then extract the identical portion.)
rsync has at its core a clever technique for locating the unchanged and changed regions of two (usually large) files using only a small amount of information about one of them, which is accessed across a potentially costly network; see https://en.wikipedia.org/wiki/Rsync#Algorithm. The program has accreted options for syncing whole trees of files across many kinds of transports and platforms, but you might be able to extract the core algorithm and use it. Or go back to Tridgell's original work and build forward. – dave_thompson_085 Sep 29 '15 at 07:08
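For what it's worth, here is a rough Python sketch of that block-matching idea, not rsync's actual code. It indexes fixed-size aligned blocks of one file, slides a window over the other, and grows each hit in both directions. The names `longest_common_run` and `extract_to_file` and the `block` size are made up for illustration. Two caveats: it only guarantees finding shared runs of at least `2 * block - 1` bytes, and it re-slices every window instead of using rsync's rolling checksum, so on 50MB inputs it will be slow.

```python
def longest_common_run(a: bytes, b: bytes, block: int = 4096):
    """Return (start_in_a, start_in_b, length) of the longest shared run found.

    Assumption: the shared run is at least 2*block - 1 bytes long, so it is
    guaranteed to contain one whole block-aligned chunk of `a`.
    """
    # Index every block-aligned chunk of `a` by its content.
    index = {}
    for i in range(0, len(a) - block + 1, block):
        index.setdefault(a[i:i + block], []).append(i)

    best = (0, 0, 0)
    covered = {}  # diagonal (i - j) -> end offset in `b` already explored
    for j in range(0, len(b) - block + 1):
        for i in index.get(b[j:j + block], ()):
            diag = i - j
            if covered.get(diag, 0) > j:
                continue  # this window lies inside a run we already measured
            # Grow the match backwards, then forwards, one byte at a time.
            s_a, s_b = i, j
            while s_a > 0 and s_b > 0 and a[s_a - 1] == b[s_b - 1]:
                s_a -= 1
                s_b -= 1
            e_a, e_b = i + block, j + block
            while e_a < len(a) and e_b < len(b) and a[e_a] == b[e_b]:
                e_a += 1
                e_b += 1
            covered[diag] = e_b
            if e_a - s_a > best[2]:
                best = (s_a, s_b, e_a - s_a)
    return best


def extract_to_file(path_a: str, path_b: str, out_path: str, block: int = 4096) -> int:
    """Write the longest shared run to `out_path`; return its length in bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    start_a, _, length = longest_common_run(a, b, block)
    with open(out_path, "wb") as out:
        out.write(a[start_a:start_a + length])
    return length
```

A returned length of 0 would mean no shared block of that size was found, which answers the first half of the question (whether the boilerplate is digitally identical at all). A smaller `block` catches shorter runs at the cost of more false hits to extend.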