[Python-Dev] buildbot

Brian Warner warner at lothar.com
Wed Jan 11 10:24:29 CET 2006


> The reason I want static pages is for security concerns. It is not easy to tell whether buildbot can be trusted to have no security flaws, which might allow people to start new processes on the master, or (perhaps worse) on any of the slaves.

These are excellent points. While it would take a complete security audit to satisfy the kind of paranoia I tend to carry around, I can mention a couple of points about the buildbot's security design that might help you make some useful decisions about it:

The buildmaster "owns" (spelled "pwns" or "0wnz0red" these days, according to my "leet-speak phrasebook" :) the buildslaves. It can make them run whatever shell commands it likes, therefore it has full control of the buildslave accounts. It is appropriate to give the buildslaves their own account, with limited privileges.

The codebase "owns" the buildslaves too: most build commands will wind up running './configure' or 'make' or something which executes commands that are provided by the checked out source tree.

Nobody is supposed to "own" the buildmaster: it performs build scheduling and status reporting according to its own design and configuration file. A compromised codebase cannot make the buildmaster do anything unusual, nor can a compromised buildslave. The worst that a buildslave can do is cause a DoS attack by sending overly large command-status messages, which can prevent the buildmaster from doing anything useful (in the worst case causing it to run out of memory), but cannot make it do anything it isn't supposed to.

The top-level IRC functions can be audited by inspecting the command_ methods, as you've already seen.

The HTTP status page can be audited similarly, once you know how Twisted-Web works: there is a hierarchy of Resource objects; each component of the URL path uses Resource.getChild to obtain the child node in this tree; and once the final child is retrieved, its 'render' method is called to produce HTML. The Waterfall resource and all its children get their capabilities from two objects: Status (which provides read-only status information about all builds) and Control (which is the piece that allows things like "force build"). The knob that disables the "Force Build" button does so by creating the Waterfall instance with control=None. If you can verify that the code doesn't go out of its way to acquire a Control reference through some private-use-only attribute, then you can be reasonably confident that it isn't possible to make the web server do anything to trigger a build. It's not restricted-execution mode or anything, but it's designed with capability-based security in mind, and that may help someone who wishes to audit it.
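To make the control=None idea concrete, here is a rough sketch of the pattern. It is a toy model, not the real twisted.web or buildbot code: the class names mirror the ones above, but every method name (put_child, get_child, last_result, force_build, force, locate) is made up for illustration.

```python
# Toy model only: these classes imitate the shape of Twisted-Web resources
# and are NOT the real twisted.web or buildbot classes.

class Resource:
    def __init__(self):
        self.children = {}

    def put_child(self, name, child):
        self.children[name] = child

    def get_child(self, name):
        # One URL path component selects one edge in the tree.
        return self.children.get(name)

    def render(self):
        return "<html></html>"


class Status:
    """Read-only view of build results (hypothetical interface)."""
    def __init__(self, builds):
        self._builds = dict(builds)

    def last_result(self, builder):
        return self._builds.get(builder, "unknown")


class Control:
    """Authority to change things (hypothetical interface)."""
    def __init__(self):
        self.forced = []

    def force_build(self, builder):
        self.forced.append(builder)


class BuilderPage(Resource):
    def __init__(self, name, status, control=None):
        super().__init__()
        self.name, self.status, self.control = name, status, control

    def render(self):
        page = f"<p>{self.name}: {self.status.last_result(self.name)}</p>"
        if self.control is not None:
            page += "<button>Force Build</button>"
        return page

    def force(self):
        # Without a Control reference there is nothing to call.
        if self.control is None:
            raise PermissionError("no Control capability was granted")
        self.control.force_build(self.name)


class Waterfall(Resource):
    def __init__(self, status, builders, control=None):
        super().__init__()
        # Children only ever receive what the parent was handed.
        for name in builders:
            self.put_child(name, BuilderPage(name, status, control))


def locate(root, path):
    """Walk the tree one path component at a time."""
    node = root
    for segment in [s for s in path.split("/") if s]:
        node = node.get_child(segment)
        if node is None:
            return None
    return node


status = Status({"linux": "success"})
control = Control()
read_only = Waterfall(status, ["linux"], control=None)
read_write = Waterfall(status, ["linux"], control=control)
```

In this sketch, auditing the read-only tree comes down to checking that nothing reachable from read_only holds a Control instance.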

The PBListener status interface is similar: PB guarantees that only remote_* methods can be invoked by a remote client, and the PBListener object only has a reference to the top-level Status object.

The slave->master connection (via the 'slaveport') uses PB, so it can be audited the same way. Only the remote_* (and perspective_*) methods of objects which are provided to the buildslave can be invoked. The buildslaves are allowed to do two things to the top-level buildmaster: force a build that is run on their own machine, and invoke an empty 'keepalive' method. During a build, they can send remote_update and remote_complete messages to the current BuildStep: this is how they deliver status information (specifically the output of shell commands). By inspecting buildbot.process.step.RemoteCommand.remote_update, you can verify that the update is appended to a logfile and nothing else.
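The remote_* rule can be illustrated with a small stand-in dispatcher. This is not PB itself: RemoteDispatcher, StepLog, and delete_everything are hypothetical names. It only shows why an auditor can restrict attention to methods carrying the prefix.

```python
# Illustrative stand-in for prefix-based dispatch; NOT Twisted's PB code.

class RemoteDispatcher:
    """Exposes only the target's remote_* methods to a caller."""

    def __init__(self, target):
        self._target = target

    def call(self, name, *args):
        # The prefix is always prepended, so a caller cannot name a
        # method that the author did not explicitly mark as remote.
        method = getattr(self._target, "remote_" + name, None)
        if not callable(method):
            raise AttributeError(f"no remote method {name!r}")
        return method(*args)


class StepLog:
    """Hypothetical status sink, loosely modelled on the description above."""

    def __init__(self):
        self.chunks = []
        self.finished = False

    def remote_update(self, data):
        # Append to the log and nothing else.
        self.chunks.append(str(data))

    def remote_complete(self):
        self.finished = True

    def delete_everything(self):
        # No remote_ prefix, so the dispatcher can never reach this.
        raise RuntimeError("never reachable remotely")


log = StepLog()
slave_view = RemoteDispatcher(log)
slave_view.call("update", "make: ok\n")
slave_view.call("complete")
```

The audit surface in this sketch is exactly two methods, remote_update and remote_complete, no matter what else the object defines.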

PB's serialization is designed specifically to be safe (in explicit contrast to pickle). Arbitrary classes cannot be sent over the wire. The worst-case attack is DoS, specifically memory exhaustion.

Any application which can talk to the outside world is a security concern. The tools that we have to ensure that these applications only do what we intended them to do are not as fully developed as we would like (I hang out with the developers of E, and would love to implement the buildbot in a capability-secure form of Python, but such a beast is not available right now, and I'm spending too much time writing Buildbot code to get around to writing a more securable language too). So we write our programs in as clear a way as possible, and take advantage of tools that have been developed or inspected by people we respect.

These days my paranoia tells me to trust a webserver written in Python more than one written in C. Buffer overruns are the obvious thing, but another important distinction is how Twisted's web server architecture treats the URL as a path of edges in a tree of Resource instances rather than as a pathname to a file on the disk. I don't need to worry about what kind of URLs might give access to the master.cfg file (which could contain debugging passwords or something), as long as I can tell that none of the Resource instances give access to it. This also makes preventing things like http://foo/../../oops.txt much, much easier.
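The difference between the two URL models can be sketched in a few lines. Everything here is hypothetical (filesystem_lookup, tree_lookup, Node, and the /srv/www docroot), and the file-based function is deliberately naive. It does not represent Apache or any real server.

```python
import posixpath

def filesystem_lookup(docroot, url_path):
    # Deliberately naive file-based model: the URL is glued onto a
    # directory name, so ".." segments change which file is selected.
    return posixpath.normpath(posixpath.join(docroot, url_path.lstrip("/")))


class Node:
    """A page in a toy resource tree (hypothetical)."""
    def __init__(self, body, children=None):
        self.body = body
        self.children = children or {}


def tree_lookup(root, url_path):
    # Tree model: each segment is just a dictionary key. ".." has no
    # special meaning; it is simply a child name that does not exist.
    node = root
    for segment in url_path.split("/"):
        if segment == "":
            continue
        node = node.children.get(segment)
        if node is None:
            return None
    return node


root = Node("index", {"waterfall": Node("waterfall page")})

escaped = filesystem_lookup("/srv/www", "/../../oops.txt")
blocked = tree_lookup(root, "/../../oops.txt")
config = tree_lookup(root, "/master.cfg")
```

In the tree model, master.cfg is unreachable because no node was ever registered under that name, not because a filter caught the request.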

Preferring a Twisted web server over Apache reveals my bias, both in favor of Python and the developers of Twisted and Twisted's web server, and I quite understand if you don't share that bias. I think it would be quite possible to create a 'StaticWaterfall' status class, which would write HTML to a tree of files each time something changed. There are a number of status event delivery APIs in the buildbot which could cause a method to be called each time a Step was started or finished, and these could just write new HTML to a file. It would consume a bit more disk space, but would allow an external webserver to provide truly read-only access to build status. If you'd like me to spend some cycles on this, please let me know; perhaps others would prefer this style of status delivery too.
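A very rough sketch of what such a class might look like follows. The callback names step_started and step_finished, their arguments, and the one-file-per-builder layout are all assumptions made for illustration. They are not the buildbot's actual status API.

```python
import html
import os
import tempfile

class StaticWaterfall:
    """Rewrites a static HTML page each time a step changes state.

    Hypothetical sketch: the callback names and signatures are invented,
    and builder names are assumed to come from trusted configuration.
    """

    def __init__(self, outdir):
        self.outdir = outdir
        self.steps = {}  # builder name -> list of [step name, state]
        os.makedirs(outdir, exist_ok=True)

    def step_started(self, builder, step):
        self.steps.setdefault(builder, []).append([step, "running"])
        self._write(builder)

    def step_finished(self, builder, step, result):
        for entry in self.steps.get(builder, []):
            if entry[0] == step and entry[1] == "running":
                entry[1] = result
        self._write(builder)

    def _write(self, builder):
        rows = "".join(
            f"<li>{html.escape(s)}: {html.escape(r)}</li>"
            for s, r in self.steps.get(builder, [])
        )
        page = (
            f"<html><body><h1>{html.escape(builder)}</h1>"
            f"<ul>{rows}</ul></body></html>"
        )
        path = os.path.join(self.outdir, builder + ".html")
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(page)
        # Rename over the old page so a reader never sees a partial file.
        os.replace(tmp, path)


outdir = tempfile.mkdtemp()
wf = StaticWaterfall(outdir)
wf.step_started("linux", "compile")
wf.step_finished("linux", "compile", "success")
```

An external web server would then only need read access to the output directory, and would have no connection to the buildmaster process.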

cheers, -Brian


