How do I verify a Linux distro against its public source code?

Question

This question is not about how to trace build artifacts and signatures back to their "trusted source", the OS maintainers. I'm asking with the deepin Linux distro in mind: some users have legitimate worries about downloading and installing an ISO image built and distributed by a China-based company.

Even if security experts are able to inspect and clear the vast quantity of code of an OS distro, even ignoring all the trusted upstream dependencies, how can I be sure that the maintainers haven't slipped in something nefarious outside of the public code base during the build? Even for reputable distros like Ubuntu, it's still good to have a "trust but verify" mentality.

The typical answer I've seen is to just "Build it yourself!". For a small open source app or library, that may not be difficult, but for a full Linux distro, that process would be quite an undertaking, and is definitely out of the question for the Linux beginners that deepin targets.

Ideally, every release of critical open source software would be built from identical commits and pinned dependencies by independent parties, and there would be a tool that compares, at an instruction level for executables, all artifacts from those builds to ensure they are truly "identical" (ignoring differences due to timestamps).

So, does such a tool exist?

https://reproducible-builds.org/? – muru, Jun 25 '20 at 05:49

Stephen Kitt · Answer 1 · 2020-06-25T06:43:25.160

Unfortunately, it is currently not possible to automatically verify a distribution release against its source code. There are, however, intermediate results which can be used to some extent; the Reproducible Builds project is the main driver.

Reproducibility is the ability, given the same inputs and the same tools, to produce the same output artifacts. Various projects participating in the Reproducible Builds effort thus allow some level of reproducibility; for example:

many Debian packages are reproducible, so given the same source package, and the same build environment (compiler, build dependencies etc.), you will get bit-for-bit identical packages;
Debian’s official Docker images are reproducible, so given the same reference packages (not source code), and the same build tools, you will get bit-for-bit identical root file systems.

End-user verification of the artifacts produced above isn’t yet possible, notably because the files which record the build environment aren’t published (as far as I’m aware). They are preserved though, so at some point this will become possible.
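Once two parties can build from the same inputs in the same environment, the comparison itself is the simple part: bit-for-bit identical artifacts have identical cryptographic digests. The following is a minimal sketch of that check, not a tool any distribution ships; the function names (`sha256_of`, `tree_digests`, `compare_builds`) and the idea of comparing two unpacked build output directories are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_builds(build_a: Path, build_b: Path) -> list[str]:
    """List relative paths that are missing from one build or differ in content."""
    a, b = tree_digests(build_a), tree_digests(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `compare_builds` means the two trees are byte-identical. A non-empty list only says which files differ, not why; answering that is what the analysis tools published by the Reproducible Builds project are for.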

I don’t think any “large” distribution supports reproducibility all the way from the source code to the distribution media (ISO etc.), let alone verification. There are a number of obstacles in the way:

not all software can be built reproducibly, yet;
reproducible artifacts embed “knowledge” of their build environment, and that is fluid over the duration of a distribution’s preparation (so verifying a distribution would involve reconstructing its history);

and probably others I’m not thinking of just now. Tails does publish fully-reproducible distribution media, so this is technically possible, but it requires the distribution to be made reproducible; a distribution can’t be verified by external actors if it isn’t published in a reproducible manner. (This doesn’t stop external audits from existing, and sometimes being useful.)
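To make the first obstacle concrete: a common reason software does not build reproducibly is that its artifacts embed timestamps, file ownership, or directory listing order. The sketch below shows one way an archive step can be made deterministic, assuming the `SOURCE_DATE_EPOCH` environment variable convention from the Reproducible Builds project as the fixed timestamp. The function name `deterministic_tar` is hypothetical, it handles regular files only, and it is not a description of how Debian or Tails actually build their images.

```python
import io
import os
import tarfile
from pathlib import Path


def deterministic_tar(root: Path) -> bytes:
    """Pack the regular files under root into a tar whose bytes depend only
    on relative file names and file contents."""
    # Fixed timestamp: taken from SOURCE_DATE_EPOCH if set, otherwise 0.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        # Sort entries so the result does not depend on directory listing order.
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            data = path.read_bytes()
            # A fresh TarInfo has uid/gid 0 and empty owner names, so nothing
            # about the build machine's user leaks into the archive.
            info = tarfile.TarInfo(name=path.relative_to(root).as_posix())
            info.size = len(data)
            info.mtime = epoch
            info.mode = 0o644  # normalised permissions (an assumption of this sketch)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Two trees with the same file names and contents then produce identical bytes regardless of when or by whom the files were written, which is the property a verifier needs before any comparison is meaningful.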

Look at the various projects referenced on the Reproducible Builds site; some of them focus on making all this simpler, for example in-toto. The Reproducible Builds project itself publishes a number of tools which can help produce and analyse reproducible artifacts.

One point in your question is already largely addressed, even without reproducibility and verifiability:

“how can I be sure that the maintainers haven't slipped in something nefarious outside of the public code base during the build?”

Most distributions (I don’t know about Deepin) only publish binary artifacts which have been built on their own build infrastructure, so maintainers can’t slip anything in outside of the public code base (public as in, recorded with the artifact). This doesn’t remove the trust you implicitly place in the distribution itself, which is your main concern here, but at least it removes individual maintainers’ ability to publish malicious artifacts without also publishing the corresponding source code.

See also Ken Thompson’s classic “Reflections on Trusting Trust”, and subsequent research on the topic of trusting compilers.