I am in the process of learning bash, and I need to compare two almost identical text files (only a few bits are flipped) and output the number of bits that are the same. In other words, I need to compare bits, not characters. Reading through the bash documentation, I came across the comm and diff commands, but they seem to compare the files line by line rather than bit by bit. Any help would be greatly appreciated.

3 Answers
Assuming you mean bytes and not bits, you can use cmp (from man cmp):
NAME
cmp - compare two files byte by byte
Using these two files as an example:
$ cat file1
The quick brown fox jumped over the lazy dog.
$ cat file2
The quick flown fax jumped over the hazy log.
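You can do the following (-l prints the 1-based offset and the octal values of every differing byte, and -b also prints the bytes themselves):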
$ cmp -lb file1 file2
11 142 b 146 f
12 162 r 154 l
18 157 o 141 a
37 154 l 150 h
42 144 d 154 l
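Since the question asks for a count of what is the same rather than a list of differences, here is a minimal sketch that turns the cmp -l listing into that count. It assumes both files have the same length (cmp stops at the end of the shorter file and reports EOF on stderr), and same_bytes is a hypothetical helper name, not a standard command:

```shell
#!/usr/bin/env bash
# same_bytes (hypothetical helper): print how many byte positions hold the
# same value in both files. Assumes the two files have the same length.
same_bytes() {
  local total differing
  # Total number of bytes in the first file.
  total=$(wc -c < "$1")
  # cmp -l prints one line per differing byte. cmp exits with status 1 when
  # the files differ; that is expected here and does not affect the count.
  differing=$(cmp -l "$1" "$2" | wc -l)
  echo $(( total - differing ))
}
```

With file1 and file2 above (46 bytes each, including the trailing newline), same_bytes file1 file2 should print 41, since cmp lists 5 differing bytes.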
Alternatively, you could use fold to print one byte per line and pass that to diff:
$ diff <(fold -b1 file1) <(fold -b1 file2)
11,12c11,12
< b
< r
---
> f
> l
18c18
< o
---
> a
37c37
< l
---
> h
42c42
< d
---
> l

Terdon, I want to thank you for taking the time to read and reply to my post. I emailed my professor asking for further clarification but she did not reply (6 days ago). I decided to skip this question rather than answer incorrectly as you lose double the points if it is incorrect (yes, she is sadistic). – lostintranslation Nov 20 '21 at 00:38
Sample files:
$ cat tmp1 tmp2
unix and linux
uniq and lynux
You can use xxd with diff as follows:
$ diff <(xxd -c1 tmp1) <(xxd -c1 tmp2)
4c4
< 00000003: 78 x
---
> 00000003: 71 q
11c11
< 0000000a: 69 i
---
> 0000000a: 79 y
With the -b option, xxd prints each byte in binary, so you can see exactly which bits differ:
$ diff <(xxd -c1 -b tmp1) <(xxd -c1 -b tmp2)
4c4
< 00000003: 01111000 x
---
> 00000003: 01110001 q
11c11
< 0000000a: 01101001 i
---
> 0000000a: 01111001 y
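To get from this listing to the number of bits that are the same, which is what the question asks for, one option is to keep only the binary column of xxd -b -c1 for each file, put the two columns side by side, and compare them digit by digit. A minimal sketch, assuming both files have the same length; same_bits is a hypothetical helper name, and the awk comparison step is an illustrative addition rather than part of the answer above:

```shell
#!/usr/bin/env bash
# same_bits (hypothetical helper): count the bits that are identical at the
# same position in two files. Assumes both files have the same length.
same_bits() {
  # Keep only the second column of each xxd line: the byte as 8 binary digits.
  paste -d ' ' <(xxd -b -c1 "$1" | awk '{ print $2 }') \
               <(xxd -b -c1 "$2" | awk '{ print $2 }') |
  awk '{
    # Compare the two 8-digit strings position by position.
    for (i = 1; i <= 8; i++)
      if (substr($1, i, 1) == substr($2, i, 1)) same++
  } END { print same + 0 }'
}
```

With the tmp1 and tmp2 samples above (15 bytes, so 120 bits each), same_bits tmp1 tmp2 should print 117: x and q differ in 2 bits, i and y in 1.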

Pandya, I want to thank you for taking the time to read and reply to my post. I emailed my professor asking for further clarification but she did not reply (6 days ago). I decided to skip this question rather than answer incorrectly as you lose double the points if it is incorrect (yes, she is sadistic). – lostintranslation Nov 20 '21 at 00:38
This perl one-liner:
perl -lne 'BEGIN{$/=\8192}; print for split "", unpack("b*", $_)'
prints each bit of a file (starting with the least significant bit in each byte) as 0 and 1 characters, one per line.
Once you have that output for each file, you can paste the two side by side with paste -d '\0', for instance. Then count the lines that are 00 or 11 with grep -xce 00 -e 11; that is the number of bits the files have in common.
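# Print each bit of file $1 (least significant bit of each byte first), one per line.
bits() {
  perl -lne 'BEGIN{$/=\8192}; print for split "", unpack("b*", $_)' "$1"
}
# Join the two bit streams line by line and count the positions where they agree.
paste -d '\0' <(bits file1) <(bits file2) | grep -xce 00 -e 11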

Stephane, I want to thank you for taking the time to read and reply to my post. I emailed my professor asking for further clarification but she did not reply (6 days ago). I decided to skip this question rather than answer incorrectly as you lose double the points if it is incorrect (yes, she is sadistic). – lostintranslation Nov 20 '21 at 00:38
comm and diff in the bash configurations, those have nothing to do with bash. – Stéphane Chazelas Nov 11 '21 at 16:52
comm and diff are not in any way connected to or related to bash. Do you need to do this in bash, for some reason, or are you open to external tools like comm and diff? Also, what kind of files are you comparing? Text? Binary? Something else? ASCII? Unicode? Can we assume one bit per character? – terdon Nov 11 '21 at 17:03