Reverse-engineering a binary-only Linux library and executable

I’ve been trying to reverse-engineer a particular proprietary binary-only Linux library to learn what algorithms it uses. Unsurprisingly, the vendor ships both the library and the application that calls it with symbols stripped and without debugging information. However, because the application links to the library dynamically, the names of the library entry points are still present. Better yet, many of the library entry points are C++ methods, so the mangled names encode the argument profile. Initially I had a hard time setting breakpoints on the C++ methods in gdb because I was trying to use the mangled names, but then I discovered that once the library is loaded, you can type a partial command like

break 'Class::Method

then hit tab and get autocompletion. The single quote is important. If there are multiple matches, either because you didn’t type the full method name or because several methods share a name but differ in argument profile, gdb lists the possible choices.

I wanted to start debugging only after a specific file had been opened, but many other files are opened first. It took me a little while to come up with a suitable conditional breakpoint command for the open() system call:

br open if strcmp(*((char**)($rsp+48)),"/lib64/libm.so.6")==0

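The 48 is the offset of the path pointer in the stack frame. It is specific to this platform and library build, and will probably be different on other architectures. On x86-64 the calling convention passes the first argument in the rdi register, so a condition that tests $rdi instead may be more portable.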

I haven’t yet figured out much detail of the data structures used by the library, but I found which library function does the specific transformation in which I’m interested. The arguments don’t seem to point directly to the data I care about, so they probably point to objects that in turn point to the data. Eventually I’ll have to puzzle it out, but for now I’ve found a way to identify the areas of memory which the function alters. (There may be an easier way.)

I break on entry to the function, then use gdb’s gcore command to save a core dump. Then I use the finish command, which will run the function and break again when it returns. I do a second gcore, create hex dumps of the two core dumps using objdump -s, and diff the two dumps.
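The comparison step can also be done directly on the binary files instead of on hex dumps. Below is a minimal Python sketch, not the procedure I actually used; the function names and the file names core.before and core.after are made up for illustration. It reports byte ranges that differ between two dumps. Note that these are file offsets, not memory addresses: mapping them back to addresses means consulting the core file's program headers (for example with readelf -l), and the sketch assumes both dumps have the same layout. A pure-Python byte loop will also be slow on large core files.

```python
def changed_ranges(before: bytes, after: bytes):
    """Return a list of (start, end) file-offset pairs, end exclusive,
    covering every run of bytes that differs between the two buffers.
    Only the common prefix length of the two buffers is compared."""
    ranges = []
    start = None  # offset where the current run of differences began
    limit = min(len(before), len(after))
    for i in range(limit):
        if before[i] != after[i]:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, limit))
    return ranges


def diff_files(path_before: str, path_after: str):
    """Read two dump files fully into memory and compare them."""
    with open(path_before, "rb") as f:
        before = f.read()
    with open(path_after, "rb") as f:
        after = f.read()
    return changed_ranges(before, after)


# Hypothetical usage, with the two gcore outputs saved under these names:
#   for start, end in diff_files("core.before", "core.after"):
#       print(f"{start:#x}-{end:#x} ({end - start} bytes)")
```

Reporting ranges rather than individual bytes keeps the output short when the function rewrites a whole buffer.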

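It would be really nice to have a gdb command to search memory. I haven’t yet found such a command, though the idea was discussed on a gdb mailing list back in 2001. (More recent gdb releases do include one, find, which searches a range of memory for a sequence of bytes.)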
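The saved core dumps can also be searched offline. Here is a minimal Python sketch of that idea; the function name find_all and the file name core.before are hypothetical, and as with any search of a core file the results are file offsets rather than memory addresses.

```python
def find_all(data: bytes, pattern: bytes):
    """Return every offset at which pattern occurs in data,
    including overlapping occurrences."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        # Advance by one byte so overlapping matches are not skipped.
        pos = data.find(pattern, pos + 1)
    return offsets


# Hypothetical usage: look for a 32-bit little-endian value in a dump.
#   import struct
#   with open("core.before", "rb") as f:
#       dump = f.read()
#   print(find_all(dump, struct.pack("<I", 0x12345678)))
```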
