I need to run a proprietary C++ application under Linux and I need to understand if it contains any functions outside advertised features.

Is there a way to list all the API calls that the application makes granted it doesn't use Linux syscalls and only uses standard stdc++ functions?

A similar question was asked almost a decade ago and the answer isn't satisfactory to me at all. I need something completely automated.

  • If it runs on Linux, and does anything like access files, communicate over the network, etc., it uses system calls; whether that’s through stdc++ or some other library doesn’t matter particularly in the end. – Stephen Kitt Feb 13 '24 at 20:05
  • Just to confuse the issue a bit more, are you positive that the code base is pure C++ and that they did not pull in any C, FORTRAN, etc. code and statically link it in? – doneal24 Feb 13 '24 at 21:29
  • I'm positive it's pure C++. – Artem S. Tashkinov Feb 14 '24 at 10:10

1 Answer

TL;DR: you can hook regular function calls across shared object boundaries with bpftrace. However, the functions exported by the shared libstdc++ object are only a small part of what we consider the C++ standard library. And even if you could trace all of these function calls, that would give you no guarantees whatsoever.


There's no general way to know which functions get called: you cannot know which parts of the standard library were statically linked into the program, and most of it will even be inlined, by design! In C++, an enormous amount of standard functionality lives in templates, so it ends up in the compilation units of your program, not in libstdc++. Even worse, even when the program only uses functionality that most definitely comes from a shared library, a "function call" simply isn't something "special"; it's just setting up the registers you need to pass arguments and jumping to an address. For shared objects, that address is usually resolved by the standard dynamic linker at startup time, but there's absolutely no guarantee that this is the only thing that happens. Especially if a program loads libraries at runtime (anything that does plugins, for example, or scripting languages, or…), it's not as easy as just listing the table of functions to be imported.

You can trace your userland program and insert a hook for every function in the C++ standard library, and log that. bpftrace would probably be the tool of choice here. Install bpftrace, and look into its example tools, typically in /usr/share/bpftrace/tools, especially bashreadline.bt:

#!/usr/bin/bpftrace
/*
 * bashreadline    Print entered bash commands from all running shells.
 *                 For Linux, uses bpftrace and eBPF.
 *
 * This works by tracing the readline() function using a uretprobe (uprobes).
 *
 * USAGE: bashreadline.bt
 *
 * This is a bpftrace version of the bcc tool of the same name.
 *
 * Copyright 2018 Netflix, Inc.
 * Licensed under the Apache License, Version 2.0 (the "License")
 *
 * 06-Sep-2018  Brendan Gregg   Created this.
 */

BEGIN
{
	printf("Tracing bash commands... Hit Ctrl-C to end.\n");
	printf("%-9s %-6s %s\n", "TIME", "PID", "COMMAND");
}

uretprobe:/bin/bash:readline
{
	time("%H:%M:%S ");
	printf("%-6d %s\n", pid, str(retval));
}

Write a small awk, Python, Perl or PL/I program that generates such a uretprobe:<path to binary>:<function name> probe for each entry in the output of objdump -T /lib64/libstdc++.so (or whatever library you think the program might be using; you can figure that out via strace, looking for open/openat calls). This is all very scriptable!
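A minimal Python sketch of such a generator, under a few assumptions: the function names (libraries_from_strace, exported_functions, bpftrace_script, probes_for_library) are made up for illustration, the parsing assumes the usual GNU objdump -T and strace line layouts, and only plain identifier-like symbol names are emitted to keep the probe names simple. It is a starting point, not a complete tool; bpftrace may also limit how many probes one script can attach, so the output might need to be split into batches.

```python
import re
import subprocess

# Assumption: only identifier-like (e.g. mangled C++) names are turned into probes.
_SYMBOL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Successful open()/openat() calls in strace output whose path ends in .so or .so.N
_OPEN_RE = re.compile(r'open(?:at)?\(.*?"([^"]+\.so(?:\.\d+)*)".*\)\s*=\s*\d+')


def libraries_from_strace(strace_log):
    """Return shared-object paths that were opened successfully, in first-seen order."""
    found = {}
    for line in strace_log.splitlines():
        match = _OPEN_RE.search(line)
        if match:
            found.setdefault(match.group(1), None)
    return list(found)


def exported_functions(objdump_output):
    """Extract names of functions defined in .text from `objdump -T` output."""
    names = {}
    for line in objdump_output.splitlines():
        fields = line.split()
        # Defined dynamic functions carry the "DF" flag and sit in the .text section;
        # undefined imports show "*UND*" instead and are skipped.
        if "DF" in fields and ".text" in fields:
            name = fields[-1]
            if _SYMBOL_RE.match(name):
                names.setdefault(name, None)
    return list(names)


def bpftrace_script(library, names, kind="uretprobe"):
    """Emit one probe per function that prints the PID and the probe's own name."""
    lines = [
        rf'{kind}:{library}:{name} {{ printf("%d %s\n", pid, probe); }}'
        for name in names
    ]
    return "\n".join(lines) + "\n"


def probes_for_library(library, kind="uretprobe"):
    """Run objdump -T on a library and return the generated bpftrace script."""
    result = subprocess.run(
        ["objdump", "-T", library], capture_output=True, text=True, check=True
    )
    return bpftrace_script(library, exported_functions(result.stdout), kind)
```

Something like print(probes_for_library("/lib64/libstdc++.so")) redirected into a .bt file would then give a script to run with bpftrace; passing kind="uprobe" logs at function entry instead of at return. The logged names are mangled, so piping the log through c++filt makes it readable.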

"I need to run a proprietary C++ application under Linux and I need to understand if it contains any functions outside advertised features."

Yeah, that won't give you any guarantees. Best case, you see which addresses get called. Whether the program has prepared a ROP trampoline to make the external library do whatever it wants in the right situation is not something you can determine this way. Generally, you can only observe what your program does during a normal run. But unless you read all of its disassembly, you couldn't tell whether it does something different when run on a Friday the 13th, or when UID=1234, or when the CPUID ends in 7, or … And: any undesirable functionality in an external library that you don't want the program to call could just as well have been included in the program itself by its programmers (either by copying the functionality, or simply through static linking).

That's why operating systems, and Unix-likes especially, separate privileges at the kernel/userland boundary: no guarantees can be made about the behaviour of any given program, but Linux can guarantee (to a satisfactory degree of certainty) that a program that's not allowed to access a file cannot access it, and the program cannot grant itself that access.