[Scummvm-devel] The new Buildbot: what’s happening, what needs to happen

Colin Snover scummvm-devel at zetafleet.com
Sun Dec 17 04:08:15 CET 2017


Team,

One of my goals for the next release of ScummVM is to automate the
process of building and generating release packages. My hope is that
this will allow us to have a faster release cycle, more consistent
release quality, and free up manpower to focus on the many parts of
ScummVM which cannot be so easily automated (like adding more engines! :)).

The first two major parts of this project are nearly complete: the
upgrade of Buildbot, and the introduction of infrastructure management
using Ansible.

Here’s an overview of these changes.

Buildbot:

* Each compiler is now in its own Docker container, and the steps used
to build it are recorded in Git
* Buildbot workers can now be securely distributed across servers
* Obsolete workers in the old Buildbot have been replaced by new workers
in the new Buildbot
* All compilers have been updated; the oldest is now GCC 4.9.1
* Buildbot itself has been upgraded to version 0.9
* New packages are generated immediately upon successful build, instead
of only once nightly
* Login for manually controlling builds is integrated with GitHub
* Manual builds using different configure flags can be triggered from
the admin panel (see the sketch after this list)
* Failed workers run first on the next run, instead of whenever Buildbot
feels like it
* Debug symbols are split into separate packages when using the default
packager
* New build downloads are linked directly from the Buildbot interface
* [Policy change] Pending approval, porters will be responsible for
maintaining their own containers moving forward
* [Policy change] Pending approval, ports in the master branch will need
to have a Buildbot worker
* [Request for access] I need consent and admin access to the GitHub org
to transfer the new Buildbot repository to the org
* [Request for access] I need consent and admin access to the GitHub org
to set up OAuth for the new Buildbot, or someone to create and send me
tokens
* [Request for access] When it is time to deploy, I would like to
have access to, or have someone available with access to, vm2’s
management panel, so that if there is a big explosion (unlikely, but
non-zero since the kernel needs updating) I can boot into recovery mode
and fix it
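
For anyone curious what the GitHub login and configure-flag bits look
like, here is a rough sketch of the relevant master.cfg fragments in
Buildbot 0.9 (which is configured in Python). Every name in it (the
worker, the OAuth client ID and secret, the admin email, the
configure_args property) is a placeholder for illustration, not our
actual configuration:

    # Rough sketch only; all names, IDs, and secrets are placeholders.
    from buildbot.plugins import schedulers, util, worker

    c = BuildmasterConfig = {}

    # One containerised worker per platform.
    c['workers'] = [worker.Worker("debian-x86_64", "example-password")]
    c['protocols'] = {'pb': {'port': 9989}}

    # Web UI: log in with GitHub, and only let people with the admin role
    # push the buttons.
    c['www'] = dict(
        port=8010,
        auth=util.GitHubAuth("github-client-id", "github-client-secret"),
        authz=util.Authz(
            allowRules=[util.AnyControlEndpointMatcher(role="admins")],
            roleMatchers=[util.RolesFromEmails(admins=["admin@example.com"])],
        ),
    )

    # Manual builds with extra ./configure flags, triggered from the web UI.
    c['schedulers'] = [
        schedulers.ForceScheduler(
            name="force",
            builderNames=["debian-x86_64"],
            properties=[
                util.StringParameter(name="configure_args",
                                     label="Extra configure flags",
                                     default=""),
            ],
        ),
    ]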

Infrastructure:

* Installation, configuration, and management of new server software
(e.g. new Buildbot) and users is now recorded in Ansible playbooks in Git
* Server timezones are set to UTC to follow international standards
* Firewalls are a thing now
* User management is a thing now, using SSH keys pulled automatically
from GitHub (see the sketch after this list)
* [Policy change] Pending approval, server changes will be managed
through Ansible instead of directly installing and modifying software on
servers going forward
* [Request for access] I need consent and admin access to the GitHub org
to transfer the Ansible repository to the org
* [Request for information] vm.scummvm.org needs to have its final
remaining services evaluated and moved away (subversion, planet, and
doxygen), and then needs to be wiped and reloaded with an up-to-date
64-bit OS. Who can do this?
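
To make the key-pulling bullet concrete: GitHub serves every account’s
public SSH keys as plain text at https://github.com/<username>.keys, so
an authorized_keys file can be assembled automatically from a list of
team members (Ansible’s authorized_key module can take such a URL
directly). A minimal sketch, with made-up usernames:

    # Illustration only: the usernames are made up; the real list lives
    # in the Ansible playbooks.
    from urllib.request import urlopen

    TEAM = ["example-porter", "example-admin"]  # hypothetical GitHub accounts

    def authorized_keys(users):
        """Build an authorized_keys blob from each user's GitHub keys."""
        lines = []
        for user in users:
            # GitHub publishes each account's public SSH keys at this URL.
            with urlopen("https://github.com/%s.keys" % user) as response:
                keys = response.read().decode("utf-8").strip()
            for key in keys.splitlines():
                lines.append("%s %s@github" % (key, user))  # tag key with owner
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        print(authorized_keys(TEAM))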

Please send any feedback about the questions and policy changes above,
but please read this entire email first, since some questions may be
answered by the extra information below.

The remaining timeline for this work is:

1. Run full builds of ScummVM on all of the workers and address any
remaining compilation failures (I’ve fixed most of these already);
2. Once everything is building, a PR for all the necessary build patches
will be created;
3. Make any other necessary tweaks to the deployment configuration in
Ansible so everything critical will continue to be backed up appropriately;
4. Back up the old Buildbot stuff somewhere else in case everything
explodes horribly and needs to be rolled back;
5. Once the PR has landed, and any outstanding access issues or other
concerns have been resolved, deploy the new Buildbot.

Beyond this point, I hope that other people will take over and do what’s
needed so that we can do things like create proper packages/installers
for the various platforms (currently only the Windows worker generates
proper installers), code sign the releases properly, publish to app
stores using e.g. fastlane, etc. This has been a big distraction for me
for the last month or more and I just want to get back to working on
ScummVM itself.

Here’s more information on these changes:

The Buildbot upgrade
--------------------

Currently, Buildbot consists of a big lump of binaries built in an
unknown manner, all of which must run against the same host system. This
is neither maintainable nor upgradable, and it prevents the host system
from being updated without running the risk of breaking Buildbot completely.

The new Buildbot is containerised using Docker, so each platform has its
own separate environment which can be managed and upgraded independently
from the host system and from the other workers. The steps used to
create each environment are written in Dockerfiles, which are saved in a
Git repository, so it is never a mystery how the compilers and libraries
were generated, and newer images can be regenerated from this source
information to incorporate new library or compiler updates. Workers can
be easily moved around to different servers as necessary in order to
balance load and disk space, and developers can trivially create copies
of part or all of the Buildbot system to run locally.

(Shameless plug: “<rsn8887> the new buildbot is awesome. I can built
*ANY* binary for *ANY* system using the same commands without any hassle”)
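
To make “run locally” concrete, here is a minimal sketch using the
Docker SDK for Python: build one worker image from its Dockerfile and
start it pointing at a locally running master. The path, tag, and
environment variable are placeholders, and plain docker build / docker
run does the same job:

    # Illustration only; the directory, tag, and environment variable
    # below are hypothetical, not the repository's actual layout.
    import docker

    client = docker.from_env()

    # Build the worker image from its Dockerfile (one directory per platform).
    client.images.build(path="workers/debian-x86_64",
                        tag="scummvm-buildbot/debian-x86_64")

    # Start the worker, telling it where to find a locally running master.
    container = client.containers.run(
        "scummvm-buildbot/debian-x86_64",
        environment={"BUILDBOT_MASTER": "localhost:9989"},  # hypothetical variable
        detach=True,
    )
    print("worker container started:", container.short_id)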

The new Buildbot was used to create ScummVM 2.0 for Windows and is
successfully generating binaries[1] for every active port listed on our
Platforms page in the wiki.

Shortly, I will be opening a pull request containing various patches
related to this work. Some of these patches will break the current
Buildbot, as well as builds for users still building certain ports with
older toolchains and libraries. As such, Buildbot will be out of
commission for a short time so that the old Buildbot data can be
archived away, in case we need to roll back for some reason, and the new
Buildbot installed in its place.

An ongoing problem with the current Buildbot has been disk space
exhaustion, and while this is in the process of being resolved by _sev
and our generous hosting provider, it is possible that some workers will
not be enabled initially in order to ensure there is enough available
space on the main server over the near term. The hope is that
vm.scummvm.org can be paved over with a fresh OS and then will be able
to receive any remaining workers, which will also reduce the amount of
time it takes to run builds.

The loadout of ports built on the new Buildbot has changed to reflect
the up-to-date list of actively maintained ports on the wiki. These are
the changes:

Added              | Removed      | Updated
------------------ | ------------ | ---------------------------------
FreeMiNT (GCC 7.2) | Android-MIPS | AmigaOS (5.4)
Haiku (5.4)        | Android-x86  | Android-ARM (NDK r15c / Clang 5)
Maemo (5.1)        | Dingux       | Debian (6.3)
Raspberry Pi (6.3) | DS           | Dreamcast (7.2)
                   | GameCube     | GCW0 (4.9.1)
                   | GP2X         | Haiku (5.4)
                   | GP2XWiz      | iOS 7+ (Clang 3.8.1)
                   | iOS 3-6      | macOS (Clang 3.8.1)
                   | N64          | PS3 (7.2)
                   | OpenPandora  | PSP (4.9.3)
                   | Ouya         | Vita (7.2)
                   | PS2          | Windows (6.3)
                   | WebOS        |

I did try to add SamsungTV and OS/2 workers, since these are listed as
active ports, but was unsuccessful. For SamsungTV, I was not able to
create a working compilation of glibc for its ancient kernel. For OS/2,
I could not find any cross-compiler, although there are apparently still
up-to-date GCCs being produced for that platform.

Once the new Buildbot is up and running, I would like to make it a
requirement for all ports in the master branch to have a Buildbot worker
image maintained by the porter for that platform. This will save
everyone time over the long run, improve the experience for our users
since the release process will be automated, and free core ScummVM team
members from being wholly responsible for maintaining port
infrastructure on top of everything else they already need to do.

The repository for this work is currently at
<https://github.com/csnover/scummvm-buildbot>.

Infrastructure management
-------------------------

As with the old Buildbot, our server management is currently a mystery:
changes are made directly on the servers without any way to reproduce
deployments or record change history.

In figuring out how to deploy the new Buildbot to ScummVM servers, bgK
mentioned having used Ansible to perform deployments at work, and others
I know who work in sysops shared their positive experiences managing
systems using this tool.

Over the last week, I created an initial deployment plan for the new
Buildbot using Ansible which I hope can be used (along with Docker) for
all service deployments moving forward in order to make maintaining our
server infrastructure easier, safer, and more transparent. Just as
Docker allows anyone to create an identical copy of any of our
application environments locally, Ansible allows anyone to create an
identical[2] copy of our host server environments by running Ansible
against any other machine running the same base OS.
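
As a concrete (hypothetical) example of what that means in practice, a
maintainer can point the playbooks at a scratch VM and do a dry run
before ever touching production; the inventory and playbook names below
are placeholders:

    # Illustration only: dry-run the playbooks against a test VM.
    # "inventory", "test-vm", and "site.yml" are placeholder names.
    import subprocess

    subprocess.run(
        [
            "ansible-playbook",
            "-i", "inventory",     # placeholder inventory file
            "--limit", "test-vm",  # only touch the scratch machine
            "--check", "--diff",   # report what would change; change nothing
            "site.yml",            # placeholder top-level playbook
        ],
        check=True,
    )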

This is a less user-facing change than the Buildbot change and mostly
only affects those responsible for maintaining our services, though it
will probably also result in some short downtime for the main site, as
some badly deferred server updates need to be applied, such as a kernel
update to prevent service failures when restarting containers. Since we
are near release time, I do not plan on deploying these changes until at
least a few weeks after the release, to avoid unnecessary downtime at
this critical moment.

Since I needed to set up VM environments for testing the deployment
anyway, I also introduced some playbooks for securing the servers:
ingress firewalls will now exist, and accounts for people who are no
longer team members, or who do not need access to maintain server
infrastructure, will not.

At this point I’ve deployed the site playbook to some fresh VMs and
everything seems to be working correctly. There is still a chance that
some additional cleanup will be needed on the actual production server,
since none of the current services have been copied over into playbooks
except for the bits and pieces needed for the new Buildbot. Any issues
with this can hopefully be addressed on an ad-hoc basis, and then those
services can be put into playbooks too, so that deployments to whatever
servers may exist in the future are covered as well.

The repository for this work is currently at
<https://github.com/csnover/scummvm-infrastructure>.

---

[1] Since I don’t have actual hardware to test all the binaries, I can
only say that the Debian, macOS, Maemo, PSP, Raspberry Pi, Vita, and
Windows binaries have been verified as working. Testing (and probably
tweaking) will be needed for the remaining ports.

[2] Obviously, our security keys are encrypted, so they are not
available for just anyone to install. I don’t necessarily expect this
repository to remain public, since there isn’t much reason for the
general public to duplicate our infrastructure; it exists mostly to let
current and future server maintainers easily set up a local testing
environment and to make it unburdensome to deploy our services to new
servers in the future.

That’s all from me for now. Looking forward to your feedback. Thanks for
reading this long, long email!

-- 
Colin Snover
https://zetafleet.com



