Hexdump's canonical format displays an ASCII translation of whatever it's looking at in the right column. I have a binary file, containing non-ASCII strings, for which I know some (not all) of the character set. Is there a way to tell hexdump to use a custom character set when producing its text column?
1 Answer
For Debian, that's part of bsdmainutils, whose source is in git. The program attempts (there are some unspecified limitations) to display multibyte characters (e.g., UTF-8) for the -c option.
The source for that is in conv.c, which notes:
/*
* Multibyte characters are disabled for hexdump(1) for backwards
* compatibility and consistency (none of its other output formats
* recognize them correctly).
*/
However, the code displays multibyte characters only if odmode is set, and that happens only when the executable is invoked as od. The actual od in Debian is the GNU version, which does not do this. You can get the multibyte feature by copying hexdump to a file named od (preferably somewhere other than /usr/bin) and running that, e.g.,
~/bin/od -bc foo
As an example, from ncurses-examples, bulgarian.txt is
Показване на помощна информация -- 1
Създаване на дялове -- 2
Избор на дял и форматиране -- 3
Записване в избрания дял -- 4
Инсталиране на LILO -- 5
Изход от програмата -- 6
which displays in the GNU version as
0000000 320 237 320 276 320 272 320 260 320 267 320 262 320 260 320 275
320 237 320 276 320 272 320 260 320 267 320 262 320 260 320 275
0000020 320 265 040 320 275 320 260 040 320 277 320 276 320 274 320 276
320 265 320 275 320 260 320 277 320 276 320 274 320 276
0000040 321 211 320 275 320 260 040 320 270 320 275 321 204 320 276 321
321 211 320 275 320 260 320 270 320 275 321 204 320 276 321
0000060 200 320 274 320 260 321 206 320 270 321 217 040 055 055 040 061
200 320 274 320 260 321 206 320 270 321 217 - - 1
0000100 012 320 241 321 212 320 267 320 264 320 260 320 262 320 260 320
\n 320 241 321 212 320 267 320 264 320 260 320 262 320 260 320
0000120 275 320 265 040 320 275 320 260 040 320 264 321 217 320 273 320
275 320 265 320 275 320 260 320 264 321 217 320 273 320
0000140 276 320 262 320 265 040 040 040 040 040 040 040 040 040 040 040
276 320 262 320 265
...
and in the BSD version as
0000000 П ** о ** к ** а ** з ** в ** а ** н **
0000020 е ** н ** а ** п ** о ** м ** о **
0000040 щ ** н ** а ** и ** н ** ф ** о ** р
0000060 ** м ** а ** ц ** и ** я ** - - 1
0000100 \n С ** ъ ** з ** д ** а ** в ** а ** н
0000120 ** е ** н ** а ** д ** я ** л ** о
0000140 ** в ** е **
....
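The BSD-style rows above appear to follow a simple rule: the cell for the first byte of a multibyte character shows the character itself, and each remaining byte of that character shows **. Here is a minimal Python sketch of that rule. It is not the code from conv.c; the names utf8_len, cells, and dump are made up for illustration, and it assumes UTF-8 input, printing any byte it cannot decode as octal.

```python
# Illustrative sketch only: mimics the "character, then ** per extra byte"
# layout seen in the BSD od -c output. All names here are hypothetical.

ESCAPES = {0: "\\0", 7: "\\a", 8: "\\b", 9: "\\t",
           10: "\\n", 11: "\\v", 12: "\\f", 13: "\\r"}


def utf8_len(lead: int) -> int:
    """Length of the UTF-8 sequence a lead byte announces, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or a byte that never starts a sequence


def cells(data: bytes) -> list:
    """Return one display cell per input byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        n = utf8_len(b)
        ch = None
        if n and i + n <= len(data):
            try:
                ch = data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                ch = None
        if b in ESCAPES:
            out.append(ESCAPES[b])
            i += 1
        elif ch is not None and ch.isprintable():
            out.append(ch)                # cell for the lead byte
            out.extend(["**"] * (n - 1))  # one placeholder per extra byte
            i += n
        else:
            out.append(f"{b:03o}")        # undecodable: fall back to octal
            i += 1
    return out


def dump(data: bytes, width: int = 16) -> str:
    """Format the cells in rows of `width` bytes with an octal offset."""
    c = cells(data)
    lines = []
    for off in range(0, len(c), width):
        row = "".join(f"{x:>4}" for x in c[off:off + width])
        lines.append(f"{off:07o}{row}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dump("Показване на помощна информация -- 1\n".encode("utf-8")))
```

Because every byte keeps its own cell, a character that straddles a row boundary simply leaves its ** at the start of the next row, as in the output above.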
The reason for the compatibility/consistency comment is that hexdump's side-by-side format doesn't allow for double-width characters, and multibyte characters can be double-width. The format used for od allows this (and you can see from the example that it only attempts to display those characters).
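If the custom character set maps each byte to a single, single-width character, the side-by-side layout stays aligned, and a small script can produce it. The following Python sketch approximates the canonical (-C) layout with a caller-supplied table. It is an alternative to hexdump, not a hexdump feature; canonical_dump, render, and the example table are hypothetical names and values chosen for illustration.

```python
# Illustrative sketch only: a canonical-style dump whose text column uses a
# caller-supplied byte-to-character table. All names here are hypothetical.


def render(b: int, table: dict) -> str:
    """Pick the text-column character for one byte."""
    if b in table:
        return table[b]       # custom mapping wins
    if 0x20 <= b <= 0x7E:
        return chr(b)         # printable ASCII as usual
    return "."                # everything else is shown as a dot


def canonical_dump(data: bytes, table: dict, width: int = 16) -> str:
    """Hex bytes on the left, mapped text between | | on the right."""
    lines = []
    half = width // 2
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        left = " ".join(f"{b:02x}" for b in chunk[:half])
        right = " ".join(f"{b:02x}" for b in chunk[half:])
        pad = half * 3 - 1    # width of one fully populated half
        text = "".join(render(b, table) for b in chunk)
        lines.append(f"{off:08x}  {left:<{pad}}  {right:<{pad}}  |{text}|")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical mapping: only the bytes you know need to be listed.
    custom = {0x80: "é"}
    print(canonical_dump(b"caf\x80\n", custom))
```

Bytes missing from the table fall back to the usual ASCII-or-dot rule, so a partially known character set can be filled in incrementally.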
