$ echo "hello" | od
0000000 062550 066154 005157
0000006

I know that the first column represents the byte offset, but I don't see how the other numbers are formed. According to the man page, the above should be "octal bytes". However, the option -b is also supposed to "select octal bytes", and it prints something different:

$ echo "hello" | od -b
0000000 150 145 154 154 157 012
0000006

EDIT: This is, by the way, what I would expect to appear, i.e. the ASCII values of all the characters in 'hello\n', which is what I would expect to be called "octal bytes".
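For reference, explicitly requesting one-byte octal units via the POSIX -t option gives the same per-byte output as -b (assuming a POSIX-style od; GNU od documents -b as a shorthand for -t o1):

$ echo "hello" | od -A n -t o1
 150 145 154 154 157 012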

1 Answer


od doesn't show bytes by default; it shows two-byte words in octal. This may not be quite intuitive, but don't forget od is a very old command :-) I'll use a somewhat simpler example than yours:

$ echo -en '\01\02' | od
0000000 001001
0000002
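To make the word-versus-byte distinction concrete, here is a small sketch assuming an od that supports the POSIX -t option (GNU coreutils od does); the same two bytes are dumped first as two-byte octal words and then as single octal bytes:

$ echo -en '\01\02' | od -t o2
0000000 001001
0000002
$ echo -en '\01\02' | od -t o1
0000000 001 002
0000002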

As Intel uses a little-endian architecture, the two bytes \01\02 are read as the 16-bit word 00000010 00000001 in binary: the byte stored second becomes the most significant byte.
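A quick sanity check of that reading, using nothing but shell arithmetic: the little-endian word value is the second byte times 256 plus the first byte, and printing it as six octal digits reproduces od's output:

$ printf '%06o\n' $(( 2 * 256 + 1 ))
001001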

As octal digits each represent 3 bits, we can group that number like this:

(0)(000)(001)(000)(000)(001)

So the octal representation of those 2 bytes is:

001001
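Equivalently, asking od for unsigned decimal two-byte units (POSIX -t u2; the exact column spacing may differ) shows the same word as 513, which is 2 × 256 + 1:

$ echo -en '\01\02' | od -A n -t u2
   513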

For day to day use this is pretty useless; perhaps back in the day it was handy for manually debugging memory dumps :-)

Your hello\n example is:

h = 01101000
e = 01100101
l = 01101100
l = 01101100
o = 01101111
\n= 00001010
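Those binary values are just the ASCII codes of the characters; a hex byte dump (POSIX -t x1) makes them easy to read off, e.g. 68 hex = 01101000 binary for h:

$ echo "hello" | od -A n -t x1
 68 65 6c 6c 6f 0a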

It's a bit more complicated now, because octal digits represent 3 bits but bytes are 8 bits, so padding is added :-( Each two-byte word is 16 bits, padded with 2 leading bits to make 18 bits, i.e. six octal digits. The result, symbolically, is:

PehPllP\no

Remember, each pair of bytes is swapped due to the endianness. The P is a padding of 2 zero bits. The result in binary is (using a slash as separator):

00/01100101/01101000/00/01101100/01101100/00/00001010/01101111

Now in octal groups of 3 bits:

000 110 010 101 101 000 000 110 110 001 101 100 000 000 101 001 101 111

Translated into octal digits:

062550066154005157

This matches your result.
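You can verify each word directly with the multiply-by-256 trick, taking the second byte of each pair as the high byte; this is plain shell arithmetic, not an od feature:

$ printf '%06o %06o %06o\n' $(( 0x65 * 256 + 0x68 )) $(( 0x6C * 256 + 0x6C )) $(( 0x0A * 256 + 0x6F ))
062550 066154 005157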

In conclusion, you've probably learnt that od without options is worse than useless :-)

wurtel
  • Can you explain to me why this swap of bytes happens with little-endianness? Also, I don't understand why the padding is added only every two bytes. It would be more intuitive for me if there were one bit of padding for each byte. – user2820379 Nov 18 '14 at 14:53
  • The padding is added for every word, where a word consists of two bytes. Little endian means that the least significant byte comes first, so to calculate the real value of a word you have to multiply the second byte by 256 and then add the first byte. Multiplying by 256 is the same as shifting the bits to the left by 8 positions, so the result is the same as swapping the bytes (there's a small demonstration of this after the comments). I'm sure the Wikipedia link can explain it better than I can. – wurtel Nov 19 '14 at 08:11
  • Okay, I think I understand it a lot better now. Another question: why does \01\02 equal only two bytes while 0101 equals four bytes? (The latter makes sense to me.) – user2820379 Nov 19 '14 at 16:45
  • The \01 is octal notation for one byte; it can range from \0 to \0377. I don't know why you think 0101 is equal to four bytes though. – wurtel Nov 24 '14 at 07:39
  • I thought because of ASCII code? 1 byte = 1 char? – user2820379 Nov 24 '14 at 09:23
  • Only if it's a string, and not a number. That wasn't clear from your comment. – wurtel Nov 24 '14 at 09:30
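As mentioned in the comments, multiplying by 256 is the same as shifting left by 8 bits; a tiny shell sketch with the \01\02 example makes that concrete:

$ printf '%d %d\n' $(( (2 << 8) | 1 )) $(( 2 * 256 + 1 ))
513 513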