Cutting our Scala CI workflow runtime in half without changing any code

Keeping CI workflows short is ideal. This post details the results of an experiment we ran to speed up the CI workflow for a Scala project integrated with CodePreview.

Instead of preparing a dummy project to run the experiments, we’ll do everything on a real-world project, the Scala-webapp-template, which is the base for many of our company’s projects.

This experiment aims to see how fast we can get without changing any code in the project, an approach that most Scala projects should be able to follow.

Summary

For the impatient, this is a summary:

Our workflow commonly takes 17 minutes to execute on a GitHub-hosted runner (example).
There are two obvious bottlenecks in our workflow: the Compile phase, which takes ~6 minutes, and the CodePreview step, which takes ~9 minutes.
By optimizing the workflow on the GitHub-hosted runner, we were able to reduce the CI runtime to 15 minutes, with the Compile phase going down to only 2 minutes! The rest of the time is spent on the CodePreview phase (still ~8 minutes) and on the setup/post-setup actions, which take the remaining ~3 minutes.
By setting up a self-hosted runner, we were able to lower the Compile phase to 1 minute! The complete workflow execution went down to 6 minutes (less than half of the initial runtime).
For some reason, DigitalOcean droplets outperform AWS EC2 instances, but the margin isn’t that big.

The improvements are outstanding given the required effort to set this up.

Context

The project’s tech stack is:

Scala (backend).
Scala.js (frontend, requires Node.js and ScalablyTyped).
Ansible (deployment scripts).
GitHub Actions (the CI).
setup-everything-scala (a convenient action that sets up jdk/scala/node + coursier cache).

To avoid repeating the same actions multiple times, the CI workflow is composed of a single job that does everything (compile -> deploy preview).

The commit history is in this PR, where you can see the iterative experiments described in this post.

For the purpose of this post, we’re running the CI workflows with the existing coursier cache, which has most of the required JVM libraries already downloaded. The setup-everything-scala action already includes this cache, with no extra effort required on our side.

CodePreview

CodePreview is a service that launches a new preview environment with every Pull Request, including the backend, the frontend, and their necessary dependencies. In the case of this project, we are deploying a Scala backend app + 2 Scala.js frontend apps. The only dependency is a Postgres database, which is re-created on every deployment.

Unlike other alternatives, CodePreview uses neither Kubernetes nor Docker, allowing its customers to pay a single price for unlimited users.

Current CI process

These are the steps required to create a preview environment (execution flow):

From here, we can see that the bottlenecks are the Compile phase and the CodePreview step.

Let’s start with the first bottleneck, the Compile phase, which is taking ~6 minutes (with cached dependencies).

Referenced commit

GitHub runner

Let’s see how fast we can get by taking advantage of GitHub-hosted runners.

GitHub runner - 1st try - Cache ScalablyTyped generated jars

ScalablyTyped is one of the most costly steps in our Scala.js modules because we depend on a few huge js libraries. By caching ~/.ivy2/local, we cache the artifacts generated by ScalablyTyped. Still, the huge node_modules/ directory isn’t cached by this (execution flow):

The Compile phase is still taking ~6m, so there is no apparent improvement so far. Let’s run the workflow again to take advantage of the recently cached jars (execution flow):

We finally start making progress: the Compile phase has improved considerably, going from 6m18s to 3m39s.

Referenced commit
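
As a rough illustration of the caching described in this section, a cache step for the ScalablyTyped artifacts could look like the sketch below. It is not copied from the referenced commit: the step name, the cache key, and the files passed to hashFiles are assumptions. Only the ~/.ivy2/local path comes from the text above.

```yaml
# Hypothetical step, placed before the compile step in the job.
- name: Cache ScalablyTyped artifacts
  uses: actions/cache@v4
  with:
    # Directory named in the post as holding the ScalablyTyped generated jars.
    path: ~/.ivy2/local
    # Assumed key: start a new cache entry whenever an sbt build file changes.
    key: ${{ runner.os }}-scalablytyped-${{ hashFiles('**/*.sbt') }}
    # Fall back to the most recent cache with the same prefix.
    restore-keys: |
      ${{ runner.os }}-scalablytyped-
```

On a cache miss, the cache action saves the directory at the end of the job, which is consistent with the behavior above where only the second run got faster.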

GitHub runner - 2nd try - Cache **/target directory

In Scala, the target/ directories include the compiled and the auto-generated code. Think about this: when you pull code from a repository on your own machine, sbt uses incremental compilation, recompiling only the changed files. Caching the target/ directories lets the CI do the same (execution flow):

The compile time has been reduced from 6m18s to 2m19s!

Let’s run the same workflow again. This time, the Compile phase took 2m04s, a very similar result (execution flow):

Referenced commit
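
A minimal sketch of what caching the target/ directories could look like, assuming the same cache action as the rest of the workflow. The step name and the key scheme are assumptions rather than the contents of the referenced commit. The **/target pattern is the one named in this section's title.

```yaml
# Hypothetical step, placed before the compile step in the job.
- name: Cache target directories
  uses: actions/cache@v4
  with:
    # Glob covering the target/ directory of every sbt module.
    # Quoted because a leading * has a special meaning in YAML.
    path: '**/target'
    # Assumed key: one entry per commit, so every run stores a fresh cache.
    key: ${{ runner.os }}-sbt-target-${{ github.sha }}
    # Fall back to the most recent cache saved from an earlier commit.
    restore-keys: |
      ${{ runner.os }}-sbt-target-
```

Keying on the commit SHA is one possible design choice: each run starts from the most recent compiled output and saves its own result for the next run, at the cost of storing more cache entries.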

GitHub runner summary

We have been able to speed up one of the slowest CI steps from ~6 minutes to only ~2 minutes! This is a considerable improvement, and you can easily take advantage of it in any other Scala project.

Self-hosted runner

From now on, we’ll see how fast we can get the CodePreview step. The Compile phase will likely be affected too, but it isn’t a bottleneck anymore.

GitHub Actions provides VMs with 4 CPUs and 8 GB of memory (ref).

We can make the workflow faster by increasing the CI resources, which requires a self-hosted runner. In that case, our budget defines how fast we can get.

The self-hosted runner can also help reduce the overall runtime by pre-installing the necessary tools on our VM. For example, these are the steps we could skip:

Installing Ansible takes ~30s.
The cache action takes ~30s.
Setting up Scala takes ~50s.
Extra steps, each requiring a few seconds, take ~30s in total.

This means that we could reduce most steps to negligible times, leaving two steps to improve:

Compile phase taking ~2m.
Create preview env taking ~8m.

Self-hosted runner - 1st try

Let’s create a t2.large VM on AWS, configured with all the dependencies needed to complete the CodePreview workflow, and set it up as a self-hosted runner on GitHub. I have selected this instance because it is the most similar to the VM provided by GitHub.
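
For illustration, pointing the job at such a runner could look like the fragment below. This is only a sketch: the job and step names are hypothetical, and it assumes the tools are already installed on the VM so that the setup steps can be dropped.

```yaml
# Hypothetical workflow fragment; job and step names are assumptions.
jobs:
  preview:
    # Route the job to a self-hosted runner instead of a GitHub-hosted VM.
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      # The JDK, sbt, Node.js and Ansible are assumed to be pre-installed
      # on the VM, so the setup and install steps are left out.
      - name: Compile
        run: sbt compile
```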

Run the CI workflow a few times to take advantage of the cache (workflow execution):

The results are amazing! Almost all steps take a negligible time now. The only slow step is the CodePreview one, which is mostly composed of Ansible scripts:

Compile phase is now taking only ~20 seconds!
The CodePreview step is still taking ~7 minutes, a minor improvement over previous tries.
The whole workflow is taking ~8 minutes, which is about half of where we started.

At this stage, increasing the self-hosted runner resources seems to be the only way forward.

Referenced commit

Self-hosted runner - 2nd try

Let’s increase the resources. We’ll follow the same steps as in the 1st try but use a t2.xlarge VM from AWS.

Unfortunately, we didn’t get a noticeable improvement. The Compile phase is still taking ~30 seconds and the CodePreview step is taking slightly less than 7 minutes (workflow execution):

Self-hosted runner - 3rd try

Let’s increase the resources once more. We’ll follow the same steps as in the 1st try but use a t2.2xlarge VM from AWS.

The Compile phase stays the same while the CodePreview step is now running in 6 minutes, a decent improvement (workflow execution):

Self-hosted runner - 4th try (DigitalOcean)

A few weeks ago, I tried running this optimization with a self-hosted runner on a DigitalOcean Droplet. The overall runtime got as low as 6 minutes, which I find very reasonable given that the workflow involves a lot of expensive steps.

NOTE: Unfortunately, DigitalOcean is facing an incident right now, which has prevented me from creating such a Droplet again. Given that my earlier workflow was executed in a private repository, I can’t share access to it.

Further optimizations

After the changes covered in this post, the CodePreview step is now the bottleneck. There are some ways to optimize it:

Given that the job startup/cleanup times are negligible, we can create many jobs to deploy all the apps in parallel (backend/web/admin). My hypothesis is that we can get the step to run in half of its time (3 minutes), which would be an outstanding result, but I’m yet to try this out.
The Ansible scripts haven’t been optimized. We can likely save some seconds here, but the gains are unlikely to be as large as those from running the deployments concurrently.
The Ansible scripts’ execution time also depends on the server hosting the previews. Increasing those resources would likely make the workflow faster, saving a few extra seconds.
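
The parallel deployment idea from the first item could be sketched with a job matrix, as below. This is an untested illustration: the job name and the playbook path are placeholders, not the project's real layout. Only the three app names come from the text above.

```yaml
# Hypothetical fragment; the job name and playbook path are placeholders.
jobs:
  deploy-preview:
    runs-on: self-hosted
    strategy:
      matrix:
        # One job per app mentioned above.
        app: [backend, web, admin]
    steps:
      - uses: actions/checkout@v4
      - name: Deploy ${{ matrix.app }}
        # Placeholder path; replace it with the real deployment playbook.
        run: ansible-playbook infra/${{ matrix.app }}.yml
```

A self-hosted runner process handles one job at a time, so this sketch also assumes that several runners are registered. Otherwise the matrix jobs would queue up and run one after another.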

Conclusion

If your CI process involves only compiling the code without running any tests, you can easily get the runtime down to 1 minute, which is amazing! Even including a step to prepare a production build would only slow this down to less than 2 minutes (yes, there are companies with projects that have no automated tests at all).

In another post, I’ll explore how to optimize a different workflow that executes many integration tests and currently takes 28 minutes. I wonder how much we can improve this. I have the feeling that we could get it to execute in less than 10 minutes without much effort.

What alternative approaches have you taken, and what have your results been?