on moving to buildbot for reals

People are often confused about where Mozilla stands on Tinderbox versus Buildbot. Both are continuous integration systems, so you'd think that jumping over wholesale would be easier than the unholy marriage I've described in the past.

The big distinctions are these:

  • server vs. client - Buildbot clients and server are tightly coupled, and communicate through an active TCP connection (managed by Twisted). Tinderbox clients simply send email to the server, one for build start and one for build stop (build stop has the status specified, which changes color on Tinderbox server). The logfile for the build may be attached to the "end" email.
  • Tinderbox server vs. Buildbot server - tinderbox.mozilla.org puts up with a lot of load, and the Buildbot server probably can't handle that much. Also, Tinderbox server has a bunch of features that Mozilla developers depend on, like setting status, etc.
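To make the email-based reporting in the first bullet concrete, here is a minimal sketch of the "start" and "end" mails a client might send. The body field names, status strings, and addresses are placeholders made up for illustration; this is not the real Tinderbox mail format, just the shape of the idea (two mails per build, with the log optionally attached to the second).

```python
from email.message import EmailMessage


def tinderbox_mail(tree, build_name, status, log_text=None,
                   sender="builder@example.invalid",
                   recipient="tinderbox@example.invalid"):
    """Build one status mail. Field names and addresses are hypothetical."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{tree} {build_name}: {status}"
    # The server would parse the status out of the body and color the column.
    body = [f"tree: {tree}", f"build: {build_name}", f"status: {status}"]
    msg.set_content("\n".join(body) + "\n")
    if log_text is not None:
        # Only the "end" mail carries the log, as an attachment.
        msg.add_attachment(log_text, filename="build.log")
    return msg


# One mail when the build starts, one when it stops.
start = tinderbox_mail("Firefox", "linux-nightly", "building")
end = tinderbox_mail("Firefox", "linux-nightly", "success",
                     log_text="make: done\n")
```

Nothing here needs a live connection to the server, which is the whole contrast with Buildbot's always-on TCP link. Actually sending the messages (for example with smtplib) is left out on purpose.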

Personally I feel that Tinderbox is the wrong way to visualize what developers actually need, but I'll save that for a later and more productive post :) For now, suffice to say that Tinderbox server does a lot more and can handle way more load than Buildbot server.

However, Buildbot server does have some very nice qualities, like being able to see the log in real-time, and being able to stop and force builds. So, an interim solution is to have Buildbot server send email to Tinderbox server on behalf of its clients, so you get Buildbot as an administrative, developer-only interface, and Tinderbox server as the general, public interface.

The 1.8 and 1.9 nightly builders are already exposed to nightly users; there are a couple of kinks to work out, so I won't link to it right now (I'll let the people who are actually maintaining it do that :P), but the glorious future is that developers can stop and kick builds as well as see real-time logs.

So, that's all well and good, and I think fairly well understood. Now here's the hairy part - the 1.8 and 1.9 nightly Buildbot clients are turning around and calling Tinderbox! WTF! (note that the unittest and moz2 buildbots do not do this, only the 1.8/1.9 nightly boxes). This is because Tinderbox client contains code to do a bunch of things:

  • mozilla-specific build process
  • performance testing
  • create updates
  • publish updates (nightly AUS only)
  • rebooting windows 9x between builds (not joking)
  • support for a bajillion products and platforms (mostly through huge "if" blocks)
  • support for hybrid depend/clobber builders
  • support for uploading to various locations on FTP
  • much, much more

Some of these features are very useful and not available elsewhere, and some are obviously not useful anymore. The error and log handling leaves a lot to be desired; it's not something trivially fixable, unfortunately (lots of people have tried, resulting in not one but two attempted rewrites).

Getting all of the useful bits of this into Buildbot has been a real challenge, but Ben Hearsum has all of the important bits worked out for moz2. I'm hoping to spend some time packaging that up as a BuildFactory, to make it easy to reuse this code for other branches and products (mostly because I'd really like to see bug 421586 get fixed), strictly as a community member of course :)

You can read more about Buildbot's process-specific factories (the one for GNU Autoconf-style projects, which ships with Buildbot, is a nice example), but suffice to say a factory is a way of encapsulating the basic build process so you don't need to copy and paste "cvs co client.mk" and "make -f client.mk MOZ_CO_PROJECT=blah" for each builder in your Buildbot master.cfg.
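As a rough sketch of what that encapsulation buys you, here is the idea in plain Python with no Buildbot imports. The function name, the builder names, and the "browser"/"mail" project values are my own placeholders; in a real master.cfg each command list would become a shell step added to a BuildFactory rather than a bare list.

```python
def mozilla_build_commands(project):
    """Return the command sequence for one builder (hypothetical helper).

    The two commands are the ones quoted above; only the project varies.
    """
    return [
        ["cvs", "co", "client.mk"],
        ["make", "-f", "client.mk", f"MOZ_CO_PROJECT={project}"],
    ]


# Each builder reuses the same recipe instead of a pasted copy of it.
builders = {
    name: mozilla_build_commands(project)
    for name, project in [
        ("firefox-nightly", "browser"),    # placeholder builder/project names
        ("thunderbird-nightly", "mail"),
    ]
}
```

Adding a new branch or product then means adding one entry to the list, and a fix to the build recipe lands in every builder at once.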

This brings up the other big missing piece, which is that Buildbot's awesome Source class can't be used because it doesn't understand that it can't just update the whole "mozilla" CVS module, but needs to use the client.mk instead. This means that built-in clobber support and the built-in "tryserver" support can't be used (the current Mozilla implementations have a lot of custom code).

Bug 414031 suggests a possible way to implement support for it. Although it's kind of a pain to implement, using a driver script like this is fairly common in Java projects, so I think some kind of generic support might be feasible.

If you're not sure what I'm talking about here and why Source can't be used out of the box, the client.mk only does a partial checkout of the "mozilla" CVS module depending on which MOZ_CO_PROJECT is specified. Also, it can and does check out different versions of subdirectories, such as NSPR and NSS.
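Here is a toy model of that difference. The directory names and tags below are placeholders picked for illustration, not the real contents of client.mk; the point is only that the checkout is a project-dependent list of (path, tag) pairs, which a single "update this module at this revision" step can't express.

```python
# Hypothetical checkout plans: paths and tags are illustrative placeholders.
CHECKOUT_PLANS = {
    "browser": [
        ("mozilla/browser", "HEAD"),
        ("mozilla/nsprpub", "NSPR_TAG_PLACEHOLDER"),
        ("mozilla/security/nss", "NSS_TAG_PLACEHOLDER"),
    ],
    "mail": [
        ("mozilla/mail", "HEAD"),
        ("mozilla/nsprpub", "NSPR_TAG_PLACEHOLDER"),
        ("mozilla/security/nss", "NSS_TAG_PLACEHOLDER"),
    ],
}


def plain_module_update():
    """What a generic source step assumes: one module, one revision."""
    return [("mozilla", "HEAD")]


def client_mk_style_checkout(project):
    """What the driver script does: a partial, mixed-tag checkout."""
    return CHECKOUT_PLANS[project]


def tags_used(plan):
    """Collect the distinct tags a checkout plan touches."""
    return {tag for _path, tag in plan}
```

With this data, tags_used(client_mk_style_checkout("browser")) contains three different tags while the plain update only ever knows about one, which is roughly why the stock Source step has nothing to hook into.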

In other words, this is not your typical "checkout module && ./configure && make" project, although it is deceptively close in some ways :) It'd probably be better to have basic support for this flow, just based on the principle of least surprise. I think that it has a material effect on tool support and new developers, too.
