
Hexdump's canonical format displays an ASCII translation of whatever it's looking at in the right column. I have a binary file, containing non-ASCII strings, for which I know some (not all) of the character set. Is there a way to tell hexdump to use a custom character set when producing its text column?

Andrew



For Debian, hexdump is part of bsdmainutils, whose source is in git. With the -c option, the program attempts (with some unspecified limitations) to display multibyte characters, e.g., UTF-8.

The source for that is in conv.c, which notes:

/*
 * Multibyte characters are disabled for hexdump(1) for backwards
 * compatibility and consistency (none of its other output formats
 * recognize them correctly).
 */

However, the code attempts multibyte display only if odmode is set, which happens only when the executable is invoked as od. Debian's actual od is the GNU version, which does not do this. You can get the multibyte feature by copying hexdump to a file named od (preferably not in /usr/bin) and running that, e.g.,

~/bin/od -bc foo
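Copying works because the decision is made from the name the program was invoked under, not from anything in the binary. A minimal sketch of such a check follows. It is hypothetical: invoked_as_od is not a function from the bsdmainutils source, and comparing only the basename of argv[0] is an assumption. A caller would do something like odmode = invoked_as_od(argv[0]); at the top of main.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not the bsdmainutils code): returns 1 when the
   basename of the invocation name argv[0] is exactly "od", else 0. */
int invoked_as_od(const char *argv0)
{
    assert(argv0 != NULL);
    const char *base = strrchr(argv0, '/');  /* last path separator, if any */
    base = base ? base + 1 : argv0;          /* skip it to get the basename */
    return strcmp(base, "od") == 0;
}
```

Under this logic ~/bin/od and plain od both select od mode, while /usr/bin/hexdump does not.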

As an example, bulgarian.txt from ncurses-examples contains

Показване на помощна информация -- 1
Създаване на дялове             -- 2
Избор на дял и форматиране      -- 3
Записване в избрания дял        -- 4
Инсталиране на LILO             -- 5
Изход от програмата             -- 6

displays in the GNU version as

0000000 320 237 320 276 320 272 320 260 320 267 320 262 320 260 320 275 
        320 237 320 276 320 272 320 260 320 267 320 262 320 260 320 275 
0000020 320 265 040 320 275 320 260 040 320 277 320 276 320 274 320 276 
        320 265     320 275 320 260     320 277 320 276 320 274 320 276 
0000040 321 211 320 275 320 260 040 320 270 320 275 321 204 320 276 321 
        321 211 320 275 320 260     320 270 320 275 321 204 320 276 321 
0000060 200 320 274 320 260 321 206 320 270 321 217 040 055 055 040 061
        200 320 274 320 260 321 206 320 270 321 217       -   -       1 
0000100 012 320 241 321 212 320 267 320 264 320 260 320 262 320 260 320 
         \n 320 241 321 212 320 267 320 264 320 260 320 262 320 260 320 
0000120 275 320 265 040 320 275 320 260 040 320 264 321 217 320 273 320
        275 320 265     320 275 320 260     320 264 321 217 320 273 320 
0000140 276 320 262 320 265 040 040 040 040 040 040 040 040 040 040 040 
        276 320 262 320 265
...

and in the BSD version as

0000000    П  **   о  **   к  **   а  **   з  **   в  **   а  **   н  **
0000020    е  **       н  **   а  **       п  **   о  **   м  **   о  **
0000040    щ  **   н  **   а  **       и  **   н  **   ф  **   о  **   р
0000060   **   м  **   а  **   ц  **   и  **   я  **       -   -       1
0000100   \n   С  **   ъ  **   з  **   д  **   а  **   в  **   а  **   н
0000120   **   е  **       н  **   а  **       д  **   я  **   л  **   о
0000140   **   в  **   е  **
....
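The GNU rows are all octal because GNU od -c renders each byte on its own. A rough sketch of that per-byte rule in the C locale: printable ASCII as itself, the usual control characters as backslash escapes, and every other byte (including each byte of a UTF-8 sequence) as three octal digits, right-aligned in a 4-column field. The name od_c_field is hypothetical, and this approximates the visible -c output rather than reproducing GNU's source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of how od -c renders one byte in the C locale.
   field receives a 4-column, right-aligned, NUL-terminated string. */
void od_c_field(unsigned char b, char field[5])
{
    const char *esc = NULL;
    assert(field != NULL);
    switch (b) {                       /* backslash escapes used by -c */
    case '\0': esc = "\\0"; break;
    case '\a': esc = "\\a"; break;
    case '\b': esc = "\\b"; break;
    case '\f': esc = "\\f"; break;
    case '\n': esc = "\\n"; break;
    case '\r': esc = "\\r"; break;
    case '\t': esc = "\\t"; break;
    case '\v': esc = "\\v"; break;
    default: break;
    }
    if (esc != NULL)
        snprintf(field, 5, "%4s", esc);             /* escape sequence */
    else if (b >= 0x20 && b < 0x7F)
        snprintf(field, 5, "%4c", b);               /* printable ASCII */
    else
        snprintf(field, 5, " %03o", (unsigned)b);   /* any other byte */
}
```

Applied to 320 and 237 (the two UTF-8 bytes of П), it yields two octal fields, which is why the GNU rows never show a Cyrillic letter.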

The reason for the compatibility/consistency comment is that hexdump's side-by-side format does not allow for double-width characters, and multibyte characters can be double-width. The format used for od does allow for this; as the example shows, it only attempts to display those characters, marking each remaining byte of a character with **.
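The BSD layout can be sketched the same way: a valid UTF-8 sequence is printed in its lead byte's 4-column field, and each continuation byte's field shows **. This is a hypothetical illustration, not conv.c: the names bsd_c_fields and utf8_seq_len are invented, and falling back to the single-byte rules for invalid sequences is an assumption. The three-space padding also assumes every character is one column wide, which holds for Cyrillic; a double-width character would push the rest of the row out of alignment, the very problem the conv.c comment guards against.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of the UTF-8 sequence started by lead byte b, or 0 if b
   cannot start a multibyte sequence (ASCII, continuation, invalid). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Append len bytes of s to out, keeping it NUL-terminated. */
static void append(char *out, size_t outsz, size_t *pos, const char *s, size_t len)
{
    assert(*pos + len < outsz);   /* caller must supply enough room */
    memcpy(out + *pos, s, len);
    *pos += len;
    out[*pos] = '\0';
}

/* Render n bytes as 4-column -c fields in the BSD style shown above.
   An output buffer of 5 * n + 1 bytes is always large enough. */
void bsd_c_fields(const unsigned char *buf, size_t n, char *out, size_t outsz)
{
    size_t pos = 0, i = 0;
    assert(outsz > 0);
    out[0] = '\0';
    while (i < n) {
        size_t len = utf8_seq_len(buf[i]);
        int valid = len > 0 && i + len <= n;
        for (size_t k = 1; valid && k < len; k++)
            valid = (buf[i + k] & 0xC0) == 0x80;   /* must be 10xxxxxx */
        if (valid) {
            /* Three spaces plus the character: assumes one display column. */
            append(out, outsz, &pos, "   ", 3);
            append(out, outsz, &pos, (const char *)buf + i, len);
            for (size_t k = 1; k < len; k++)
                append(out, outsz, &pos, "  **", 4);  /* continuation byte */
            i += len;
        } else {
            /* Assumed single-byte fallback: ASCII, \n, or 3-digit octal. */
            char field[8];
            unsigned char b = buf[i];
            if (b == '\n')
                snprintf(field, sizeof field, "%4s", "\\n");
            else if (b >= 0x20 && b < 0x7F)
                snprintf(field, sizeof field, "%4c", b);
            else
                snprintf(field, sizeof field, " %03o", (unsigned)b);
            append(out, outsz, &pos, field, 4);
            i++;
        }
    }
}
```

Fed the bytes 320 237 320 265 040 055 012, it produces the same field pattern as the start of the BSD rows: П, **, е, **, a blank, -, and \n.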

Thomas Dickey