
I am trying to understand the Linux syscall mechanism. I am reading a book, and the book says that the exit function looks like this (in gdb):

mov $0x0,%ebx
mov $0x1,%eax
int $0x80

I understand that this is a syscall to exit, but on my Debian system it looks like this:

jmp    *0x8049698
push   $0x8
jmp    0x80482c0

Can someone explain why it's not the same? When I try to run disas on 0x80482c0, gdb prints:

No function contains specified address.

Thanks!

  • The strace program is a valuable tool for examining system calls. Run strace ls to see which system call ls terminates with. – ott-- Jan 21 '16 at 19:54
  • The syscall implementation (even on x86, i.e., 32 bits) has changed a lot in Linux. Take a peek at a description of doing it from assembly. – vonbrand Jan 23 '16 at 00:13

1 Answer


Your code's exit() call ends up getting linked to the C library (libc) function exit(), which may not actually do the int $0x80.

Your code's call to exit() is actually compiled as a call instruction into the Procedure Linkage Table, or PLT. The run-time dynamic linker takes care of mapping the file /usr/lib/libc.so.6 into memory. That's the C library. The run-time dynamic linker also fixes up entries in the PLT so that they eventually end up calling the code mapped in from /usr/lib/libc.so.6.

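As near as I can tell (I'm using Arch Linux), the three instructions in your second listing are the PLT entry, which gdb calls "exit@plt" when I single-step into it. The jmp 0x80482c0 jumps to the start of the PLT, which hands control to the dynamic linker the first time exit() is called and finally ends up in the libc.so code. That address has no symbol attached, which is why disas refuses it; x/3i 0x80482c0 will still show the instructions there.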

You can demonstrate this to yourself with a fairly complicated exercise. First, you've got the address of the PLT entry: whatever gdb tells you is the address of the jmp *0x8049698 is the address of "exit@plt". On my x86 Arch Linux box:

(gdb) disassemble 0x8048310,+20
Dump of assembler code from 0x8048310 to 0x8048324:
   0x08048310 <exit@plt+0>:     jmp    *0x80496e8
   0x08048316 <exit@plt+6>:     push   $0x10
   0x0804831b <exit@plt+11>:    jmp    0x80482e0
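
The numbers in that stub can be cross-checked with a little arithmetic. This is only a sketch: it assumes the usual 32-bit x86 layout, where the pushed value is a byte offset into the PLT relocation table, relocation entries are 8 bytes, and PLT slots are 16 bytes. The helper names are made up for illustration.

```python
REL_ENTRY_SIZE = 8    # assumed: size of one Elf32_Rel entry on i386
PLT_SLOT_SIZE = 16    # assumed: size of one PLT slot on i386


def reloc_index(pushed_offset):
    """Index of the relocation entry named by the `push` in a PLT stub."""
    return pushed_offset // REL_ENTRY_SIZE


def plt_slot(stub_addr, plt_start):
    """Slot number of a stub inside the PLT; slot 0 is the shared resolver stub."""
    return (stub_addr - plt_start) // PLT_SLOT_SIZE


if __name__ == "__main__":
    # Values from the Arch listing: push $0x10, stub at 0x8048310,
    # final jmp to 0x80482e0 (taken here as the start of the PLT).
    print("relocation index:", reloc_index(0x10))
    print("PLT slot:", plt_slot(0x8048310, 0x80482e0))
    # Value from the question's Debian listing: push $0x8.
    print("relocation index (question):", reloc_index(0x8))
```

Under those assumptions the slot number is one more than the relocation index, because slot 0 is reserved for the resolver stub that every entry jumps to.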

Then do readelf -e _program_ > elf.headers. Look in the file elf.headers. You will find a line of text that says "Section Headers:" Somewhere in Section Headers, you'll see something like this:

  [ 9] .rel.dyn          REL             08048290 000290 000008 08   A  5   0  4
  [10] .rel.plt          REL             08048298 000298 000020 08  AI  5  12  4
  [11] .init             PROGBITS        080482b8 0002b8 000023 00  AX  0   0  4
  [12] .plt              PROGBITS        080482e0 0002e0 000050 04  AX  0   0 16
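
"exit@plt" is at address 0x8048310. That falls inside the ".plt" section, which starts at 0x80482e0 and is 0x50 bytes long, so it ends at 0x8048330. The ".rel.plt" section just above it holds the relocation entries the dynamic linker uses to fill in the PLT's targets; the name stands for "relocations for the procedure linkage table". Note that the jmp 0x80482e0 at the end of "exit@plt" goes to the very start of ".plt".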

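To check which section an address belongs to, compare it against each section's start address and size. A minimal sketch, using only the four rows quoted from readelf above (the function name is hypothetical):

```python
# (name, start address, size) copied from the readelf rows quoted above
SECTIONS = [
    (".rel.dyn", 0x08048290, 0x000008),
    (".rel.plt", 0x08048298, 0x000020),
    (".init",    0x080482B8, 0x000023),
    (".plt",     0x080482E0, 0x000050),
]


def section_containing(addr, sections=SECTIONS):
    """Return the name of the section whose [start, start+size) range holds addr."""
    for name, start, size in sections:
        if start <= addr < start + size:
            return name
    return None  # address is outside every listed section


if __name__ == "__main__":
    print(section_containing(0x8048310))  # address of exit@plt
    print(section_containing(0x80482E0))  # target of the stub's final jmp
```
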
Now we get to the part where the int $0x80 may not even exist. Do ldd _program_. Again, Arch linux x86 says this:

linux-gate.so.1 (0xb77d9000)
libc.so.6 => /usr/lib/libc.so.6 (0xb7603000)
/lib/ld-linux.so.2 (0xb77da000)

See that "linux-gate.so.1"? That contains the actual code that does the system call. It might be int $0x80, or it might be a sysenter instruction, or it might be something else. The Linux kernel puts a "small shared library" (the vDSO) in each process's address space with the actual code, and then hands off the address of that small shared library in the ELF "auxiliary vector". Do man vdso for some details. The dynamic linker, /lib/ld-linux.so.2, knows the details of the ELF auxiliary vector, and ultimately makes the address of the code in linux-gate.so.1 available to libc, so actual C function calls can end up making efficient system calls.
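
The auxiliary vector is just a list of (key, value) pairs of machine words, and Linux exposes a process's own copy as /proc/self/auxv. The sketch below decodes it and looks up the vDSO base address. The key numbers are the AT_* constants from the kernel's auxvec.h headers; the function name is made up, and the native word size of the running Python is assumed to match the file.

```python
import struct

AT_NULL = 0            # terminator entry
AT_SYSINFO = 32        # system-call entry point (32-bit x86 only)
AT_SYSINFO_EHDR = 33   # base address of the vDSO ELF image


def parse_auxv(data, word_size=struct.calcsize("P")):
    """Decode raw auxiliary-vector bytes into a {key: value} dict."""
    fmt = "=II" if word_size == 4 else "=QQ"
    pair_size = 2 * word_size
    usable = len(data) - len(data) % pair_size  # drop any trailing partial pair
    entries = {}
    for key, value in struct.iter_unpack(fmt, data[:usable]):
        if key == AT_NULL:
            break
        entries[key] = value
    return entries


if __name__ == "__main__":
    try:
        with open("/proc/self/auxv", "rb") as f:
            aux = parse_auxv(f.read())
    except OSError:
        aux = {}  # not Linux, or /proc is not mounted
    base = aux.get(AT_SYSINFO_EHDR)
    print("vDSO base:", hex(base) if base is not None else "not available")
```
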

If you do multiple invocations of ldd _program_, you will see that the address of linux-gate.so.1 isn't the same from invocation to invocation. The kernel deliberately randomizes where it places the stack, the shared libraries and linux-gate.so.1 on every run (address space layout randomization) to confuse malware that needs to know those locations to get its own code executed.
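
The same effect can be seen without ldd by reading /proc/self/maps in a few fresh processes and comparing where the "[stack]" and "[vdso]" mappings start. This is a Linux-only sketch with hypothetical helper names. The addresses will normally differ between runs, but they will not if randomization has been turned off on your system.

```python
import subprocess
import sys


def mapping_start(maps_text, label):
    """Start address of the first /proc/<pid>/maps line whose name is `label`."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5] == label:
            return int(fields[0].split("-")[0], 16)
    return None


def child_maps():
    """Run a fresh Python process and return the text of its /proc/self/maps."""
    code = "print(open('/proc/self/maps').read())"
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, check=True)
    return result.stdout


def show(addr):
    return hex(addr) if addr is not None else "n/a"


if __name__ == "__main__":
    try:
        for _ in range(3):
            maps = child_maps()
            print("stack:", show(mapping_start(maps, "[stack]")),
                  "vdso:", show(mapping_start(maps, "[vdso]")))
    except (OSError, subprocess.CalledProcessError):
        print("could not read /proc/self/maps; this sketch is Linux-only")
```
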

  • Thank you very much! Your answer is very helpful! I am coming from the Windows world, and I am an expert in Windows internals, but really new to Linux, so I am trying to explore it. Do you have any good reference on Linux internals? (For Windows we have Windows Internals.) – gtalst12 Jan 22 '16 at 14:09
  • @gtalst12 - Only some of the above material traditionally counts as "internals" since the dynamic linking is considered a libc function. If you look at the Musl C library (http://www.musl-libc.org/) it does dynamic linking somewhat differently, I believe. Since Linux and GNU libc are open source, there's less emphasis on "internals" and more on "interfaces" (or "APIs" if you will). The attitude seems to be that you can work it out on your own, if you dare to do something completely non-standard and non-portable. You have to glean info from blog posts, man pages and stackexchange. –  Jan 22 '16 at 14:13