5

Example: I have the file "mybinaryfile", and the contents in hex are:

A0 01 00 FF 77 01 77 01 A0

I need to know how many A0 bytes there are in this file, how many 01, and so on. The result could be:

A0: 2
01: 3
00: 1
FF: 1
77: 2

Is there some way to do this count directly in the shell, or do I need to write a program in some language for this specific task?

Lawrence
  • 329

5 Answers

19

This uses od to show one hex value per line, then sorts and counts:

od -t x1 -w1 -v -An mybinaryfile | sort | uniq -c

(-w1 is an extension; it is not mandated by POSIX.)

Stephen Kitt
  • 434,908
  • Alternatives to od are: xxd -c1 -p file and/or hexdump -v -e '/1 "%02X \n"' file. –  Jun 29 '19 at 21:00
5

This uses Perl to unpack the slurped file into a byte array, then a hash to count the occurrences of each byte value:

printf '\xA0\x01\x00\xFF\x77\x01\x77\x01\xA0' | 
  perl -0777 -nE '
    @bytes = unpack("C*",$_) 
    }{ 
    $counts{$_}++ for @bytes; 
    for $k (sort { $a <=> $b } keys %counts) {
      printf "%02X: %d\n", $k, $counts{$k}
    }
 '
00: 1
01: 3
77: 2
A0: 2
FF: 1

If a sufficiently recent version of List::MoreUtils is available, you may be able to simplify the counting by using its frequency function.

steeldriver
  • 81,074
1

Quick Python solution:

#!/usr/bin/env python3
import sys, itertools, collections
print(
    *itertools.starmap(
        "{:02X}: {:d}".format,
        collections.Counter(sys.stdin.detach().read()).items()),
    sep="\n")

One-liner:

python3 -c 'import sys, itertools, collections; print(*itertools.starmap("{:02X}: {:d}".format, collections.Counter(sys.stdin.detach().read()).items()), sep="\n")' \
    < input.bin

Options and caveats

  • If you want the output sorted by frequency in descending order, replace .items() with .most_common(). Alternatively, or for other sorting schemes, use the built-in sorted() function or post-process the output with the sort(1) program.

  • In its current state, the program slurps the entire standard input into a byte buffer, which is fine for relatively small files. For larger files, the program needs to be rewritten to read the input in chunks.
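
The chunked reading mentioned in the last caveat could look roughly like this. It is a minimal sketch, not part of the original program: the function name count_bytes and the default chunk size are assumptions made for illustration, and the demo feeds in the nine bytes from the question rather than a real file.

```python
#!/usr/bin/env python3
import collections
import io


def count_bytes(stream, chunk_size=64 * 1024):
    """Count byte values in a binary stream, reading chunk_size bytes at a time.

    count_bytes and the 64 KiB default are illustrative assumptions.
    """
    counts = collections.Counter()
    # iter() with a sentinel calls read() repeatedly until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        counts.update(chunk)  # iterating over bytes yields ints in 0..255
    return counts


# Demo on the bytes from the question, with a tiny chunk size to force
# several reads.
sample = io.BytesIO(bytes.fromhex("A00100FF77017701A0"))
for value, count in sorted(count_bytes(sample, chunk_size=4).items()):
    print("{:02X}: {:d}".format(value, count))
```

To count a real file, pass sys.stdin.buffer or a file opened with open(path, "rb") instead of the BytesIO object. Memory use then depends on the chunk size, not the file size, because the counter never holds more than 256 entries.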

1
  • < my_binary_file xxd -p | fold -w 2 | sort | uniq -c

       1 00
       3 01
       2 77
       2 a0
       1 ff
    
  • < my_binary_file xxd -p | fold -w 2 | sort | uniq -c | awk '{print $2": "$1}'

    00: 1
    01: 3
    77: 2
    a0: 2
    ff: 1
    

Explanation

  • < my_binary_file passes the contents of my_binary_file to the standard input of the xxd command.
  • xxd -p converts the data read from its standard input into a hexadecimal dump; the -p (plain) option tells the program to output only the hex digits, without offsets or a textual representation.
  • fold -w 2 inserts a newline character every two characters (-w 2), converting the input stream to a newline-separated byte list.
  • sort, as the name suggests, sorts the lines so that identical byte values end up next to each other, which uniq needs.
  • uniq -c counts the occurrences of each value in the input data.
  • Optionally, a bit of awk magic converts the output to the format requested in the original post.
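
For comparison, the stages explained above can be mirrored step by step in Python. This is only an illustrative sketch of the pipeline's logic, not a replacement for it: the helper name hex_pairs_count is an assumption, and the sample data is the byte sequence from the question.

```python
import collections


def hex_pairs_count(data):
    """Mimic xxd -p | fold -w 2 | sort | uniq -c | awk on a bytes object.

    hex_pairs_count is a hypothetical helper named for this example.
    """
    hex_text = data.hex()  # like xxd -p: plain lowercase hex digits
    # like fold -w 2: split the digit stream into one two-digit item per byte
    pairs = [hex_text[i:i + 2] for i in range(0, len(hex_text), 2)]
    # like uniq -c: occurrences per value (a Counter does not need sorted input)
    counts = collections.Counter(pairs)
    # like sort plus the awk step: order by value, format as "value: count"
    return ["{}: {}".format(pair, counts[pair]) for pair in sorted(counts)]


sample = bytes.fromhex("A00100FF77017701A0")  # bytes from the question
print("\n".join(hex_pairs_count(sample)))
```

Unlike the shell pipeline, nothing has to be sorted before counting here; the sort is applied only to the distinct values when the result is formatted.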
0x2b3bfa0
  • 257
1

If the file is very large, you can count as you go in awk, so that sort only has to order the distinct byte values (256 at most):

od -t x1 -w1 -v -An binaryfile |
    awk '{h[$1]++} END {for (v in h) {printf "%d\t%s\n", h[v], v} }' |
    sort -k2

If you need a POSIX solution:

od -t x1 -v -An binaryfile |
    tr ' ' '\n' |
    awk '$1 > "" { h[$1]++ } END { for (v in h) {printf "%d\t%s\n", h[v], v} }' |
    sort -k2
Chris Davies
  • 116,213