21

Unfortunately I lost my source code, and all I have is the output file that was built with gcc on Linux. I don't have any access to my PC now. Is there any way to convert the output file back to a source file (in C, under Linux)?

mahsa
  • 211
  • 1
    What you want is called a decompiler. You might find some help with this answer: http://stackoverflow.com/questions/193896/whats-a-good-c-decompiler – Eric Renouf Sep 15 '15 at 12:18
  • IDA Pro with the decompiler module is the only practical solution that actually works with large executables. – fpmurphy Feb 21 '17 at 23:28
  • @fpmurphy1 You have got Hopper, which is comparable in quality to IDA Pro and whose license costs a fraction of the price. – Rui F Ribeiro Jan 19 '18 at 22:33
  • @fpmurphy1 I have not yet managed to see the quality of the code generated by Avast... who uses Intel 32-bit platforms anymore? Besides, I have not used Wintel for decades now. See https://unix.stackexchange.com/questions/418354/understanding-what-a-linux-binary-is-doing/418357 The difference in price is quite significant, however: Hex-Rays/IDA Pro starts from 1500 USD for a personal license and goes up to some extortionate values for commercial licenses, like 5000 USD or more AFAIK, while Hopper is 100 USD for a single user and 130 for a single computer. – Rui F Ribeiro Jan 22 '18 at 02:15
  • @RuiFRibeiro. A hell of a lot of malware that I examine is still 32-bit. – fpmurphy Jan 24 '18 at 04:27
  • @fpmurphy1 It is clearly a platform in decline, especially in the Intel world. Small wonder Avast released it. It is legacy code as we speak. I do not question you; I do, however, prefer to have something that can deal with both 32 and 64 bits. I haven't had a 32-bit VM at work for a good few years now. – Rui F Ribeiro Jan 24 '18 at 11:42

3 Answers

45

So you had a cow, but you inadvertently converted it to hamburger, and now you want your cow back.

Sorry, it just doesn't work that way.

Simply restore the source file from your backups.

Ah, you didn't have backups. Unfortunately, the universe doesn't give you a break for that.

You can decompile the binary. That won't give you your source code, but it'll give you some source code with the same behavior. You won't get the variable names unless it was a debug binary. You won't get the exact same logic unless you compiled without optimizations. Obviously, you won't get comments.
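
To make that concrete, here is a hand-written illustration, not actual output from Boomerang or any other decompiler. The first function is how an author might have written something; the second is the kind of equivalent C a decompiler could plausibly hand back from a stripped, optimized binary. Every name in it (`count_vowels`, `sub_401136`, `a1`, `v1`, `v2`) is made up for the example.

```c
#include <stddef.h>

/* Original source: meaningful names, a comment, readable control flow. */
size_t count_vowels(const char *text)
{
    size_t vowels = 0;

    /* Walk the string until the terminating NUL. */
    for (; *text != '\0'; text++) {
        switch (*text) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            vowels++;
            break;
        default:
            break;
        }
    }
    return vowels;
}

/*
 * Hypothetical decompiler-style reconstruction of the same function.
 * Same behavior, but the names are placeholders, the comment is gone,
 * the switch has become a chain of comparisons, and the character
 * literals have turned into their ASCII codes.
 */
size_t sub_401136(const char *a1)
{
    size_t v1 = 0;
    char v2;

    while ((v2 = *a1) != 0) {
        if (v2 == 97 || v2 == 101 || v2 == 105 || v2 == 111 || v2 == 117)
            ++v1;
        ++a1;
    }
    return v1;
}
```

Both functions return the same count for any input string, which is all a decompiler can promise: equivalent behavior, not your original text.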

I've used Boomerang to decompile some programs, and the result was more readable than the machine code. I don't know if it's the best tool out there. Anyway, don't expect miracles.

  • 1
    Boomerang looks rather neat; shame the documentation references gcc -O4 since that does absolutely nothing (beyond -O3) if memory serves me right. Your last sentence of course is extremely valid as well as your first five sentences. That's not to say the rest isn't valid so much as you're making a very strong point about the importance of backing up regularly. +1 – Pryftan Jan 28 '18 at 02:16
8

Several tools are common in reverse engineering an executable.

  1. The command "file", which takes the file path as its first parameter, so you can determine (in most cases) what type of executable you have.
  2. Disassemblers, which show EXACTLY what the executable does, but whose output is difficult to read for those who don't write assembly code on that specific architecture or have no experience with disassembly.
  3. Decompilers like Boomerang, Hex-rays, and Snowman can provide some greater readability but they do not recover the actual variable names or syntax of the original program and they are not 100% reliable, especially in cases where the engineers that created the executable tested with these packages and tried to obfuscate the security further.
  4. Data flow diagrams or tables. I know of no free tool to do this automatically, but a Python or Bash script over the top of a text parser of the assembly output (which can be written in sed or Perl) can be helpful.
  5. Pencil and paper, believe it or not, for jotting flows and ideas.
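
As a rough idea of what step 1 involves for Linux binaries, the sketch below checks the ELF magic bytes and the 32/64-bit class byte at the start of a file. It is a minimal illustration only, not how "file" itself is implemented (that tool uses a much larger database of signatures), and the names `classify_elf`, `classify_file`, and the `elf_kind` enum are invented for this example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Result codes for the helpers below; the names are illustrative. */
enum elf_kind { NOT_ELF = 0, ELF_32BIT = 1, ELF_64BIT = 2, ELF_UNKNOWN_CLASS = 3 };

/* Inspect the first bytes of a file image that is already in memory. */
enum elf_kind classify_elf(const unsigned char *hdr, size_t len)
{
    /* Every ELF file starts with 0x7f 'E' 'L' 'F'. */
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 5 || memcmp(hdr, magic, sizeof magic) != 0)
        return NOT_ELF;

    /* Byte 4 is e_ident[EI_CLASS]: 1 means 32-bit, 2 means 64-bit. */
    switch (hdr[4]) {
    case 1:  return ELF_32BIT;
    case 2:  return ELF_64BIT;
    default: return ELF_UNKNOWN_CLASS;
    }
}

/* Convenience wrapper: read the start of a file and classify it. */
enum elf_kind classify_file(const char *path)
{
    unsigned char hdr[16];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NOT_ELF;   /* unreadable files are simply reported as non-ELF */
    n = fread(hdr, 1, sizeof hdr, fp);
    fclose(fp);
    return classify_elf(hdr, n);
}
```

Calling `classify_file("./a.out")` on a typical gcc output file on a 64-bit Linux system would be expected to return `ELF_64BIT`; knowing the class and architecture tells you which disassembler or decompiler settings to use next.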

In most cases I've seen, the code needed to be rewritten from scratch, maintained as an assembly language program, or reconstituted by re-applying change requests to an older version.

  • 2
    #1: True although it has its faults too. #3: I guess those are commercial? I'm just curious academically (I have redundant backups so no need for that type of thing). #4: cflow (though that uses the source there are some that work on the binary - with some caveats of course) comes to mind. There are others out there, depending on what you are after. As for graphical output I can't help there as I don't like or need graphical output for that type of thing (I'd find it more distracting actually). #5: very true. You can also use a text file here, of course. – Pryftan Jan 28 '18 at 02:06
3

What you want to do is called "decompiling". There are many decompilers out there and it's not practical to cover them all here.

However, as a general remark: The conversion from C source to executable machine code is lossy. For instance:

  • Comments are irreversibly lost
  • Variable names are gone
  • Sometimes loops are unrolled for performance
  • Functions may be rearranged
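
As a hand-written sketch of the loop unrolling point (this is not real compiler output, and whether a given compiler unrolls a given loop depends on its version and flags), the two functions below compute the same sum. The names `sum_rolled` and `sum_unrolled` are made up; a decompiler looking at an unrolled binary would have to work back from something shaped like the second function, with no record that the first ever existed.

```c
#include <stddef.h>

/* The loop as the programmer wrote it. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/*
 * Roughly the shape an optimizer might give it: four elements per
 * iteration, followed by a clean-up loop for the leftover elements.
 */
long sum_unrolled(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++)   /* remainder: at most three elements */
        total += v[i];
    return total;
}
```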

It is rare for code to be compiled as written. Most compilers these days will drastically change your code to optimize it. So when you decompile, the decompiler can only guess at what the source code must have looked like. It has no way of knowing what your code was, because that information is gone. If the decompiler is good, the code you get will at least be compilable back into an equivalent executable, and then you can start slowly refactoring it to be readable. But most likely the decompiler will produce absolutely unreadable spaghetti code, and it will be a huge headache to decipher it. Sometimes, it might end up being less work to just rewrite the program from scratch.

Bagalaw
  • 945
  • On the subject of comments, something I recently noticed (and I have no idea if this would allow comments to be read by a decompiler, nor do I expect decompilers to even look for this type of thing) is this: "-C Do not discard comments. All comments are passed through to the output file, except for comments in processed directives, which are deleted along with the directive." It highlights side-effects as well as that of the -CC option (this is for gcc, though probably the cpp instead). Not that I expect it to apply to the OP, but it may be of interest to some. – Pryftan Jan 28 '18 at 02:27