109

From what I understand, a compiler makes a binary file that consists of 1's and 0's that a CPU can read. I have a binary file but how do I open it to see the 1's and 0's that are there? A text editor says it can't open it...

P.S. I have an assembly compiled binary that should be plain binary code of 1's and 0's?

Martin Zeltin
  • 2
    when you display a binary file, you will see it as ascii characters – magor May 10 '16 at 10:27
  • 2
    duplicate issue http://stackoverflow.com/questions/1765311/viewing-file-in-binary-in-terminal – magor May 10 '16 at 10:39
  • no - OP specified "assembly compiled binary". That does not address the question. For instance, it's not a music file, and it has structure. Without OP providing additional information, an unstructured tool is the place to start. – Thomas Dickey May 10 '16 at 11:11
  • 1
    See my answer. And be warned that the term binary is used in two ways that are totally different in practice: "a binary file" means a file whose content is not pure ASCII text; "a binary number" means a number written in its binary form. – Pierre-Olivier Vares May 10 '16 at 12:38
  • @mazs ASCII? I think UTF-8 is more likely, or some code page if the program thinks it seems to be encoded that way through heuristics. – JDługosz May 11 '16 at 05:17
  • It has been a long time since I did assembly programming, but at that time the compiled file contained characters from the ASCII table (characters from 0 to 255). www.asciitable.com – magor May 11 '16 at 08:14
  • 1
    You're not going to get what you're asking for. On a hard drive, the file is represented by magnetic changes. When the file is read, it's turned into electrical pulses, which in turn are turned into binary by the processor. Binary itself is only a representation of a number as stated in the answers below. These numbers are then interpreted by the individual programs as the format expected by them, whether this is text or images or something different. You won't see 1s and 0s by opening a binary file in nano. – James Hyde May 12 '16 at 07:59

11 Answers

185

According to this answer by tyranid:

hexdump -C yourfile.bin 

That is, unless you want to edit the file, of course; hexdump only displays it. Most Linux distros ship hexdump by default (though not all).


Update

According to this answer by Emilio Bool:

xxd does both binary and hexadecimal.

For binary:

xxd -b file

For hex:

xxd file
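
For readers curious what xxd -b is doing, here is a minimal Python sketch of the same idea. It is an illustration only, not xxd's actual implementation: the helper name bit_dump is hypothetical, the 6-bytes-per-line default is chosen to resemble xxd -b, and the ASCII column that xxd prints on the right is omitted.

```python
def bit_dump(data: bytes, cols: int = 6) -> str:
    """Return a listing of offset + each byte written as 8 binary digits."""
    lines = []
    for offset in range(0, len(data), cols):
        chunk = data[offset:offset + cols]
        # format(byte, "08b") pads each byte to exactly 8 binary digits
        bits = " ".join(format(byte, "08b") for byte in chunk)
        lines.append(f"{offset:08x}: {bits}")
    return "\n".join(lines)


# Two sample bytes: '#' (35) and a space (32)
print(bit_dump(b"# "))  # 00000000: 00100011 00100000
```

To dump a real file, read it in binary mode first, e.g. open("yourfile.bin", "rb").read(), and pass the result to bit_dump.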
Michael Mrozek
Rahul
56

Various people have answered some aspects of the query, but not all.

All files on computers are stored as 1's and 0's. Images, text files, music, executable applications, object files, etc.

They are all 0's and 1's. The only difference is that they are interpreted differently depending upon what opens them.

When you view a text file using cat, the executable (cat in this case) reads all the 1's and 0's and it presents them to you by converting them into characters from your relevant alphabet or language.

When you view a file using an image viewer, it takes all the 1's and 0's and turns them into an image, depending on the format of the file and some logic to work it all out.

Compiled binary files are no different: they are stored as 1's and 0's.

arzyfex's answer gives you the tools to view those files in different ways. Reading a file as binary works for any file on a computer, as does viewing it as octal, hex, or indeed ASCII; it just might not make sense in each of those formats.

If you want to understand what an executable binary file does, you need to view it in a way that shows you the assembly language (as a start), which you can do using:

objdump -d /path/to/binary

objdump is a disassembler: it takes the binary content and converts it back into assembly language (a very low-level programming language). It is not always installed by default, so you may need to install it, depending on your Linux environment.

Some external reading.

NB: as @Wildcard points out, it's important to note that the files don't contain the characters 1 and 0 (as you see them on the screen); they contain actual numeric data, individual bits of information which are either on (1) or off (0). Even that description is only an approximation of the truth. The key point is that if you do find a viewer which shows you the 1's and 0's, even that is still interpreting the data from the file and then showing you the ASCII characters for 0 and 1. The data is stored in a binary format (see the Binary number link above). Pierre-Olivier's community wiki entry covers this in more detail.
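
To make the "same data, different interpretation" point concrete, here is a small Python sketch. The three sample bytes are an arbitrary choice for illustration; nothing in the data itself says whether it is text, numbers, or anything else.

```python
data = bytes([72, 105, 33])  # three raw bytes

as_text = data.decode("ascii")                      # interpreted as characters
as_numbers = list(data)                             # interpreted as integers
as_hex = data.hex()                                 # written in base 16
as_bits = " ".join(format(b, "08b") for b in data)  # written in base 2

print(as_text)     # Hi!
print(as_numbers)  # [72, 105, 33]
print(as_hex)      # 486921
print(as_bits)     # 01001000 01101001 00100001

# The "0" and "1" shown on screen are themselves characters with their own
# byte values (48 and 49 in ASCII), not the bits stored in the file.
print(list("01".encode("ascii")))  # [48, 49]
```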

EightBitTony
  • 1
    Good exposé. You may want to add that the characters that you see in a line of text as "1" or "0" are not stored as a single "1" or "0" by the computer; the OP seems to have a confusion about that. – Wildcard May 12 '16 at 00:19
  • 1
    I would quibble (i.e., disagree) with your statement, "When you view a text file using cat, the executable (cat in this case) reads all the 1's and 0's and it presents them to you by converting them into characters from your relevant alphabet or language." cat doesn't do that; all cat does is write bytes to the standard output (unless you're using the "harmful" options).  The terminal program (and/or the terminal hardware, if applicable, i.e., its firmware) determines how to render bytes as characters, possibly with an assist from the TTY driver. – G-Man Says 'Reinstate Monica' May 19 '16 at 08:35
  • I don't disagree, but at some point, all simple descriptions break down, the question is how far down the rabbit hole you go before you stop describing things simply. – EightBitTony May 19 '16 at 09:58
17

At the lowest level, a file is encoded as a sequence of 0's and 1's.

But even programmers rarely go there in practice.

First (and more important than this story of 0's and 1's), you have to understand that anything that the computer manipulates is encoded with numbers.

  • A character is coded with a number, using character set tables. For example, the letter 'A' has a value of 65 when coded using ASCII. See http://www.asciitable.com

  • A pixel is coded with one or more numbers (there are many graphics formats). For example, in the common RGB format, a yellow pixel is encoded as: 255 for Red, 255 for Green, 0 for Blue. See http://www.quackit.com/css/css_color_codes.cfm (choose a color and see the R, G & B cells)

  • A binary executable file contains machine instructions, which is what an assembler produces from assembly source; each instruction is coded as numbers. For example, the assembly instruction MOVB $0x61,%al is coded by two numbers: 176, 97. See http://www.sparksandflames.com/files/x86InstructionChart.html (each instruction has an associated number from 00 to FF, because hexadecimal notation is used, see below)
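
The three examples above can be sketched in Python as plain sequences of numbers. The variable names are hypothetical and chosen for illustration; the byte values are the ones given in the list.

```python
letter_a = "A".encode("ascii")      # a character, coded using the ASCII table
yellow_pixel = bytes([255, 255, 0]) # a pixel: red, green, blue
mov_al_0x61 = bytes([0xB0, 0x61])   # x86 machine code for: movb $0x61, %al

print(list(letter_a))      # [65]
print(list(yellow_pixel))  # [255, 255, 0]
print(list(mov_al_0x61))   # [176, 97]

# The same number means different things depending on who reads it:
# 97 is an operand of the MOV instruction here, but it is also the
# ASCII code of the letter 'a'.
print(chr(97))  # a
```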

Secondly : each number can have multiple representations or notations.

Say I have 23 apples.

  • If I make groups of ten apples, I will get 2 groups of ten and 3 lone apples. That's exactly what we mean when we write 23: a 2 (tens), then a 3 (units).
  • But I can also make groups of 16 apples. Then I get one group of 16 and 7 lone apples. In hexadecimal notation (the name for base 16), I write 17 (16 + 7). To distinguish it from decimal notation, hexadecimal is generally marked with a prefix or a suffix: 17h, #17 or $17. But how do we represent more than 9 groups of 16, or more than 9 lone apples? Simple: we use the letters A (10) to F (15). The number 31 (as in 31 apples) is written #1F in hexadecimal.

  • In the same way, we can make groups of two apples (and groups of two groups of two, i.e. groups of 2x2 apples, and so on). Then 23 is: 1 group of 2x2x2x2 apples, 0 groups of 2x2x2 apples, 1 group of 2x2 apples, 1 group of 2 apples, and 1 lone apple, which is written 10111 in binary.

(See https://en.wikipedia.org/wiki/Radix)
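
The apple grouping above is just repeated division with remainder. A minimal Python sketch, assuming a hypothetical helper named to_digits (Python's built-in format() does the same job; the loop is spelled out here only to mirror the grouping):

```python
def to_digits(n: int, base: int) -> str:
    """Write a non-negative integer in the given base (2 to 16)."""
    symbols = "0123456789ABCDEF"
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        # remainder = lone apples left over after making groups of `base`;
        # n becomes the number of groups, which are grouped again next round
        n, remainder = divmod(n, base)
        digits.append(symbols[remainder])
    return "".join(reversed(digits))


print(to_digits(23, 10))  # 23
print(to_digits(23, 16))  # 17
print(to_digits(31, 16))  # 1F
print(to_digits(23, 2))   # 10111
```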

Physically, mechanisms with two states (switches) are easy to build, both on disk and in memory.

That's why data and programs, seen as numbers, are stored and manipulated in their binary form.

They are then translated, depending on the data type, into their appropriate form (letter A, yellow pixel) or executed (MOV instruction).

hexdump lists the numbers coding the data (or the machine code of the program) in their hexadecimal form. You can then use a calculator to get the corresponding binary form.
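
Instead of a calculator, the hexadecimal-to-binary step can be sketched in a few lines of Python. The helper name hex_to_bits is hypothetical; the sample input is the MOV instruction from above (176, 97), written in hex as b0 61.

```python
def hex_to_bits(hex_text: str) -> str:
    """Convert hex byte pairs (as shown by a hex dump) to 8-bit binary groups."""
    data = bytes.fromhex(hex_text)  # whitespace between byte pairs is ignored
    return " ".join(format(b, "08b") for b in data)


print(hex_to_bits("b0 61"))  # 10110000 01100001
```

Each hexadecimal digit corresponds to exactly four binary digits (b is 1011, 0 is 0000), which is why hex is the usual compact way to show binary data.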

10

I would start with od (octal dump); depending on the system, you may also find tools such as objdump useful.

Thomas Dickey
6

You could open it in a hex editor, which shows it as a series of hexadecimal values, or dump it with xxd:

xxd file

What are you trying to accomplish?

theblazehen
  • But I thought computer can only read 1's and 0's. Can I see those? I'm trying to understand how computers work – Martin Zeltin May 10 '16 at 10:30
  • 3
    That alone won't help you much. If you want to learn how exactly it works, then on a Linux box have a look at the ELF file format, and https://en.wikipedia.org/wiki/X86_instruction_listings. If you just want to see the code that gets generated by the compiler have a look at running it with gdb. Since you want to get more "low level" check out nand2tetris.org as well. For assembly language I hear that 6502 and mips assembly is a lot nicer than x86_64 / x86 assembly – theblazehen May 10 '16 at 10:33
  • @theblazehen Modern x86 family assembler is a beast. 8086 was manageable, and I think just about any CPU from around that era (late 1970s to first half of the 1980s) should be tolerable as far as assembler goes. – user May 11 '16 at 15:13
6

The Linux strings command prints the strings of printable characters in files, e.g.:

$ strings /usr/bin/gnome-open 
/lib64/ld-linux-x86-64.so.2
3;o:)
libgnome-2.so.0
_ITM_deregisterTMCloneTable
g_object_unref
gmon_start__
g_dgettext
_Jv_RegisterClasses
g_strdup
_ITM_registerTMCloneTable
g_error_free
gnome_program_init
libgnome_module_info_get
libgio-2.0.so.0
g_ascii_strncasecmp

And so on. This is considerably more readable than raw binary.
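
As a rough idea of how such a tool can work, here is a Python sketch that extracts runs of printable ASCII from raw bytes. It is an approximation under stated assumptions, not the real strings implementation: the function name printable_strings is hypothetical, "printable" is taken to mean bytes 0x20 to 0x7e plus tab, and the minimum length of 4 matches the usual default.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII characters."""
    # Character class: space through tilde, plus tab; {N,} means N or more
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up sample mixing non-printable bytes with text
sample = b"\x7fELF\x02\x01\x00/lib64/ld-linux-x86-64.so.2\x00\x03;o\x00g_strdup\x00"
print(printable_strings(sample))
# ['/lib64/ld-linux-x86-64.so.2', 'g_strdup']
```

Short runs such as "ELF" (3 characters) are dropped because they fall under the minimum length, which is also why most of the bytes in a binary never show up in this kind of output.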

  • The OP asked how do I open it to see the 1's and 0's that are there? but the strings command will strip off most of the bytes he wants to see. – jlliagre May 12 '16 at 12:02
  • @jlliagre - while you are correct, the strings command - especially with a longer length like strings -n 6 - really helps figuring out what a binary file has in it if it contains any string constants, etc.. This answer should have been a comment, then it would have been fine. – Joe May 14 '16 at 01:23
  • @Joe Yes, I do not question the strings command usefulness, just the fact it doesn't answer the OP question here. – jlliagre May 14 '16 at 07:38
5

An important part about which you still seem confused: Hexadecimal values are just a different representation of binary values. Most hex editors or hexdumps will display values in the hexadecimal base, because it's more readable than in the binary base.

E.g.:

Binary:

xxd -b README.md                                                                
00000000: 00100011 00100000

Which is 35 and 32 in decimal.

Hexadecimal:

xxd README.md
00000000: 2320

Also 35 and 32 in decimal (hex 23 and 20).

4

bvi is a Binary VIsual editor with vim keybindings. It's available on most Linux systems.


2

You can view the file in binary in vim, by:

  • Opening the file in vim
  • Entering :% !xxd -b

The xxd command can be tweaked further, for example:

  • Adding -g4 groups the output into 4-byte (32-bit) units
  • Adding -c4 formats the output with 4 bytes per line

Adding both of the flags above, will give you one 32-bit integer per line.
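
What that combined output looks like can be sketched in Python. This only mimics the layout (offset, then 32 bits per line) under the assumption that the ASCII column is left out; the function name words_as_bits is hypothetical.

```python
def words_as_bits(data: bytes, word_size: int = 4) -> list:
    """Return one line per word_size-byte group, bytes shown in file order."""
    lines = []
    for offset in range(0, len(data), word_size):
        word = data[offset:offset + word_size]
        bits = "".join(format(b, "08b") for b in word)
        lines.append(f"{offset:08x}: {bits}")
    return lines


for line in words_as_bits(b"\x00\x00\x00\x01\xff\x00\x00\x00"):
    print(line)
# 00000000: 00000000000000000000000000000001
# 00000004: 11111111000000000000000000000000
```

The bytes are printed in the order they appear in the file. Whether a group reads as the integer you expect depends on the byte order (endianness) the file was written with.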

Leandros
2

GHex is your friend :)

You can install it from the command line:

  • Ubuntu:
    sudo apt-get install ghex
    
  • Fedora:
    sudo yum install ghex
    
AdminBee
craken
1

You can do it with e.g., this ruby one-liner:

$ ruby -e 'while c=STDIN.read(1); printf "%08b" % c.bytes.first; end' < yourfile.bin

The one-liner reads from standard input, so redirect the file into it as shown. Traditional C-based systems have poor support for outputting data in binary, AFAIK. It's usually not very useful anyway, since binary is much harder to read than a hexadecimal dump.

Petr Skocik