How to count the number of bytes in a file, grouping the same bytes?

Question

Example: I have the file "mybinaryfile", and the contents in hex are:

A0 01 00 FF 77 01 77 01 A0

I need to know how many A0 bytes there are in this file, how many 01, and so on. The result could be:

A0: 2
01: 3
00: 1
FF: 1
77: 2

Is there some way to make this count directly in shell or do I need to write a program in whatever language to do this specific task?

Looking at answers, this seems to be a worthy codegolf ;) – val - disappointed in SE, Jun 29 '19 at 15:31

Stephen Kitt · Accepted Answer · 2019-06-28T17:51:55.853

score 19

This uses od to show one hex value per line, then sorts and counts:

od -t x1 -w1 -v -An mybinaryfile | sort | uniq -c

(-w1 is an extension, it’s not mandated by POSIX.)

edited Jun 28 '19 at 17:51

answered Jun 28 '19 at 17:00


Alternatives to od are: xxd -c1 -p file and/or hexdump -v -e '/1 "%02X \n"' file. – Jun 29 '19 at 21:00

score 5 · Answer 2 · edited Jun 28 '19 at 18:22

This uses Perl to unpack the slurped file into a byte array, then counts each byte value with a hash:

printf '\xA0\x01\x00\xFF\x77\x01\x77\x01\xA0' | 
  perl -0777 -nE '
    @bytes = unpack("C*",$_) 
    }{ 
    $counts{$_}++ for @bytes; 
    for $k (sort { $a <=> $b } keys %counts) {
      printf "%02X: %d\n", $k, $counts{$k}
    }
 '
00: 1
01: 3
77: 2
A0: 2
FF: 1

If a sufficiently recent version of List::MoreUtils is available, you may be able to simplify the counting by using its frequency function.

score 1 · Answer 3 · edited Jun 11 '20 at 14:16

Quick Python solution:

#!/usr/bin/env python3
import sys, itertools, collections
print(
    *itertools.starmap(
        "{:02X}: {:d}".format,
        collections.Counter(sys.stdin.detach().read()).items()),
    sep="\n")

One-liner:

python3 -c 'import sys, itertools, collections; print(*itertools.starmap("{:02X}: {:d}".format, collections.Counter(sys.stdin.detach().read()).items()), sep="\n")' \
    < input.bin

Options and caveats

If you want the output sorted by frequency in descending order, replace .items() with .most_common(). For other sorting schemes, use the built-in sorted() function or post-process the output with the sort(1) program.
In its current state, the program slurps the entire standard input into a byte buffer, which is fine for relatively small files. For larger files, the program needs to be rewritten to read the input in chunks.
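A chunked variant could look like the sketch below. The helper name count_bytes and the 64 KiB default chunk size are assumptions made for illustration; they are not part of the answer above. The demo feeds it the sample bytes from the question through io.BytesIO with a deliberately tiny chunk size, so that several reads are needed.

```python
#!/usr/bin/env python3
import collections
import io


def count_bytes(stream, chunk_size=64 * 1024):
    """Count byte values in a binary stream, reading chunk_size bytes at a time.

    count_bytes and the default chunk size are hypothetical choices for this sketch.
    """
    counts = collections.Counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of input
            break
        counts.update(chunk)  # iterating over bytes yields ints in 0..255
    return counts


# Demo with the sample data from the question (A0 01 00 FF 77 01 77 01 A0).
sample = io.BytesIO(bytes.fromhex("A00100FF77017701A0"))
for value, count in sorted(count_bytes(sample, chunk_size=4).items()):
    print("{:02X}: {:d}".format(value, count))
```

To process real input, pass sys.stdin.buffer (the binary layer of standard input) or a file opened with mode "rb" instead of the BytesIO object. Only one chunk and the counter are held in memory at a time, and the counter never has more than 256 entries.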

score 1 · Answer 4 · answered Jun 29 '19 at 16:17

< my_binary_file xxd -p | fold -w 2 | sort | uniq -c
```
   1 00
   3 01
   2 77
   2 a0
   1 ff
```
< my_binary_file xxd -p | fold -w 2 | sort | uniq -c | awk '{print $2": "$1}'
```
00: 1
01: 3
77: 2
a0: 2
ff: 1
```

Explanation

< my_binary_file redirects the contents of my_binary_file to the standard input of the xxd command.
xxd -p converts the data read from its standard input into a hexadecimal dump; the -p (plain) option tells it to output only the hex digits, without offsets or a textual representation.
fold -w 2 inserts a newline character every two characters (-w 2), converting the input stream to a newline-separated list of bytes.
sort, as the name suggests, sorts the lines, which places identical byte values next to each other.
uniq -c counts the occurrences of each value in the sorted data.
Optionally, a bit of awk magic converts the output to the format requested in the original post.
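For readers who find the pipeline easier to follow as code, the steps above can be mirrored one by one in Python. This is only an illustrative sketch: the function name hex_pair_counts is made up here, and itertools.groupby stands in for uniq -c because both count runs of identical adjacent items.

```python
import itertools


def hex_pair_counts(data):
    """Mimic `xxd -p | fold -w 2 | sort | uniq -c` on a bytes object.

    hex_pair_counts is a hypothetical helper written for this sketch.
    """
    hex_digits = data.hex()  # like xxd -p: plain lowercase hex digits
    # like fold -w 2: cut the digit string into two-character items
    pairs = [hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2)]
    pairs.sort()  # like sort: identical values become adjacent
    # like uniq -c: count each run of identical adjacent items
    return [(key, len(list(run))) for key, run in itertools.groupby(pairs)]


# Sample data from the question (A0 01 00 FF 77 01 77 01 A0).
sample = bytes.fromhex("A00100FF77017701A0")
for value, count in hex_pair_counts(sample):
    print("{}: {}".format(value, count))
```

As with uniq, groupby only merges adjacent equal items, which is why the sort step cannot be skipped.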

score 1 · Answer 5 · answered Mar 31 '20 at 11:56

If the file is very large, you can count as you go with awk, so that sort only has to order the distinct byte values (at most 256 lines):

od -t x1 -w1 -v -An binaryfile |
    awk '{h[$1]++} END {for (v in h) {printf "%d\t%s\n", h[v], v} }' |
    sort -k2

If you need a POSIX solution (without od's -w1 extension), use tr to put each value on its own line:

od -t x1 -v -An binaryfile |
    tr ' ' '\n' |
    awk '$1 > "" { h[$1]++ } END { for (v in h) {printf "%d\t%s\n", h[v], v} }' |
    sort -k2
