
I am in the process of learning bash, and I need to compare two almost identical text files (only a few bits are flipped) and output the amount of bits that are the same. In other words, I need to compare bits, not characters. Reading through the bash documentation, I came across the comm and diff commands, but they seem to compare the files line by line and not bit by bit. Any help would be greatly appreciated.

ilkkachu
  • If a single bit is flipped, a character would be different between the files. Comparing characters seems then to be the first step. – Kusalananda Nov 11 '21 at 16:50
  • Not sure how you could find comm and diff in the bash configurations, those have nothing to do with bash. – Stéphane Chazelas Nov 11 '21 at 16:52
  • As Stéphane pointed out, comm and diff are not in any way connected to or related to bash. Do you need to do this in bash, for some reason, or are you open to external tools like comm and diff? Also, what kind of files are you comparing? Text? Binary? Something else? ASCII? Unicode? Can we assume one bit per character? – terdon Nov 11 '21 at 17:03
  • Not clear on "output the amount of bits that are the same". Do you just want to count the bits that are the same and/or different? Or do you want to show the places where the bits are different? – Paul_Pedant Nov 11 '21 at 17:46
  • What is required when the files are of different lengths: are the excess bytes all counted as bit-flips (compared to zero)? – Paul_Pedant Nov 13 '21 at 00:24
  • I want to thank everyone who took the time to read and reply to my post. What I posted was a straight copy-paste of the instructions. I emailed my professor asking for further clarification but she did not reply (6 days ago). I decided to skip this question rather than answer incorrectly as you lose double the points if it is incorrect (yes, she is sadistic). – lostintranslation Nov 20 '21 at 00:36

3 Answers


Assuming you mean bytes and not bits, you can use cmp (from man cmp):

NAME

cmp - compare two files byte by byte

Using these two files as an example:

$ cat file1
The quick brown fox jumped over the lazy dog.

$ cat file2
The quick flown fax jumped over the hazy log.

You can do:

$ cmp -lb file1 file2
11 142 b    146 f
12 162 r    154 l
18 157 o    141 a
37 154 l    150 h
42 144 d    154 l
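To turn this listing into the bit count the question asks for, one option is to XOR the two octal byte values on each plain `cmp -l` line (without `-b`, each line is just offset, byte 1, byte 2) and count the 1 bits in the result; every byte that cmp does not list contributes 8 matching bits. A minimal bash sketch, assuming both files have the same length (cmp stops at the end of the shorter one and reports EOF on stderr). The function names `diffbits` and `samebits` are made up for illustration:

```shell
# Hypothetical helpers; assume bash and two files of equal length.

# diffbits FILE1 FILE2: number of bit positions that differ.
# Plain `cmp -l` prints "offset octal1 octal2" for each differing byte.
diffbits() {
  local off a b x n=0
  while read -r off a b; do
    x=$(( 8#$a ^ 8#$b ))        # XOR: a 1 marks every flipped bit
    while (( x > 0 )); do       # count the 1 bits one at a time
      n=$(( n + (x & 1) ))
      x=$(( x >> 1 ))
    done
  done < <(cmp -l -- "$1" "$2")
  echo "$n"
}

# samebits FILE1 FILE2: total bits in FILE1 minus the differing ones.
samebits() {
  echo $(( $(wc -c < "$1") * 8 - $(diffbits "$1" "$2") ))
}
```

For the two example files, `diffbits file1 file2` should print 10 (1 + 4 + 3 + 1 + 1 bits across the five differing bytes), and `samebits file1 file2` should print 358 if each file is the 45-character sentence plus a trailing newline.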

Alternatively, you could use fold to print one byte per line and pass that to diff:

$ diff <(fold -b1 file1) <(fold -b1 file2)
11,12c11,12
< b
< r
---
> f
> l
18c18
< o
---
> a
37c37
< l
---
> h
42c42
< d
---
> l
terdon
  • Terdon, I want to thank you for taking the time to read and reply to my post. I emailed my professor asking for further clarification but she did not reply (6 days ago). I decided to skip this question rather than answer incorrectly as you lose double the points if it is incorrect (yes, she is sadistic). – lostintranslation Nov 20 '21 at 00:38

Sample files:

$ cat tmp1 tmp2
unix and linux
uniq and lynux

You can use xxd with diff as follows:

$ diff <(xxd -c1 tmp1) <(xxd -c1 tmp2)
4c4
< 00000003: 78  x
---
> 00000003: 71  q
11c11
< 0000000a: 69  i
---
> 0000000a: 79  y

With the -b option, xxd shows each byte in binary, so you can see exactly which bits changed:

$ diff <(xxd -c1 -b tmp1) <(xxd -c1 -b tmp2)
4c4
< 00000003: 01111000  x
---
> 00000003: 01110001  q
11c11
< 0000000a: 01101001  i
---
> 0000000a: 01111001  y
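The `xxd -b` output can also be turned into a count without perl: print the binary column (field 2 of each `xxd -b -c1` line) one digit per line, glue bit N of both files into a two-character line with `paste`, and count the matching pairs with `grep`. A sketch, assuming xxd is installed (it ships with Vim) and that the files have equal length (for the excess bits of a longer file, paste produces single-digit lines that match neither pattern). `xbits`, `bitpairs`, `same_bits` and `diff_bits` are hypothetical names:

```shell
# Hypothetical helpers built on `xxd -b -c1`, whose lines look like
#   00000003: 01111000  x
# Field 2 holds the byte's 8 bits, most significant bit first.
xbits() {
  xxd -b -c1 "$1" | awk '{ print $2 }' | fold -w1   # one bit per line
}

# Pair bit N of both files as a two-character line, e.g. "01".
bitpairs() {
  paste -d '\0' <(xbits "$1") <(xbits "$2")
}

same_bits() { bitpairs "$1" "$2" | grep -xce 00 -e 11; }
diff_bits() { bitpairs "$1" "$2" | grep -xce 01 -e 10; }
```

With tmp1 and tmp2 above (15 bytes each, including the newline), `diff_bits tmp1 tmp2` should print 3 (two bits in x/q, one in i/y) and `same_bits tmp1 tmp2` should print 117.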
Pandya
  • Pandya, I want to thank you for taking the time to read and reply to my post. I emailed my professor asking for further clarification but she did not reply (6 days ago). I decided to skip this question rather than answer incorrectly as you lose double the points if it is incorrect (yes, she is sadistic). – lostintranslation Nov 20 '21 at 00:38

perl -lne 'BEGIN{$/=\8192}; print for split "", unpack("b*", $_)' can print each bit of a file (starting with the least significant bit in each byte) as 0 and 1 characters, one per line.

Once you have that for each file, you can paste them side by side with paste -d '\0' (an empty delimiter), for instance. Then you can just count the lines that read 00 or 11 with grep -xce 00 -e 11 to get the number of bits the files have in common.

bits() {
  perl -lne 'BEGIN{$/=\8192}; print for split "", unpack("b*", $_)' "$1"
}
paste -d '\0' <(bits file1) <(bits file2) | grep -xce 00 -e 11
  • Stephane, I want to thank you for taking the time to read and reply to my post. I emailed my professor asking for further clarification but she did not reply (6 days ago). I decided to skip this question rather than answer incorrectly as you lose double the points if it is incorrect (yes, she is sadistic). – lostintranslation Nov 20 '21 at 00:38