10th May 2023
Keeping CI workflows fast is ideal. This post details the results of an experiment we ran to speed up the CI workflow for a Scala project integrated with CodePreview.
Instead of preparing a dummy project to run the experiments, we'll do everything on a real-world project - the Scala-webapp-template - which is the base for many of our company's projects.
This experiment aims to see how fast we can get without changing any code in the project, which is something that most Scala projects should be able to replicate.
For the impatient, this is a summary:
Before: the bottlenecks were the Compile phase, which was taking ~6 minutes, and the CodePreview step, which was taking ~9 minutes.
After: the Compile phase went down to only 2 minutes! The rest of the time is spent on the CodePreview phase (still ~8 minutes), with the setup/post-setup actions taking the remaining ~3 minutes.
The improvements are outstanding given the required effort to set this up.
The project's tech stack is:
To avoid repeating the same actions multiple times, the CI workflow consists of a single job that does everything (compile -> deploy preview).
The commit history is at this PR where you can see the iterative experiments described in this post.
For the purpose of this post, we're running the CI workflows with the existing coursier cache, which already has most of the required JVM libraries downloaded. The setup-everything-scala action includes this cache with no extra effort required from our side.
CodePreview is a service that launches a new preview environment for every Pull Request, including the backend, the frontend, and their necessary dependencies. For this project, we are deploying a Scala backend app plus 2 Scala.js frontend apps. The only dependency is a Postgres database, which is re-created on every deployment.
Unlike other alternatives, CodePreview does not use Kubernetes or Docker, allowing its customers to pay a single price for unlimited users.
These are the steps required to create a preview environment (execution flow):
From here, we can see that the bottlenecks are the Compile phase and the CodePreview step.
Let's start with the first bottleneck, the Compile phase, which is taking ~6 minutes (with cached dependencies).
Let's see how fast we can get by taking advantage of GitHub runners.
ScalablyTyped is one of the most costly steps in our Scala.js modules because we depend on a few huge JS libraries. By caching ~/.ivy2/local, we cache the artifacts generated by ScalablyTyped. Still, the huge node_modules/ directory isn't covered by this cache (execution flow).
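A minimal sketch of what such a cache step could look like, assuming the standard actions/cache action. The step name, the key layout, and the choice of files that feed the hash are illustrative assumptions, not the exact configuration used in the experiment:

```yaml
# Fragment of a job's `steps:` list in a GitHub Actions workflow.
steps:
  # Hypothetical step: persist the artifacts ScalablyTyped publishes locally.
  - name: Cache ScalablyTyped artifacts
    uses: actions/cache@v3
    with:
      path: ~/.ivy2/local
      # Assumption: the generated artifacts only change when the build
      # definition changes, so the sbt files drive the cache key.
      key: ${{ runner.os }}-scalablytyped-${{ hashFiles('**/*.sbt', 'project/build.properties') }}
      # Fall back to the most recent cache when the exact key is missing.
      restore-keys: |
        ${{ runner.os }}-scalablytyped-
```

The cache is only saved at the end of a successful run, so the first run after adding this step pays the full cost and later runs benefit from it.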
The Compile phase is still taking ~6m, so there is no apparent improvement so far. Let's run the workflow again to take advantage of the recently cached jars (execution flow):
We are finally making progress: the Compile phase has improved considerably, going from 6m18s to 3m39s.
In Scala, the target/ directories contain the compiled and auto-generated code. Think about it: when you pull code from a repository on your own machine, sbt uses incremental compilation and recompiles only the changed files. Caching these directories lets the CI workflow do the same (execution flow).
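A sketch of how the target/ directories could be cached, again assuming the standard actions/cache action. The step name, the path list, and the key scheme are assumptions for illustration; adjust the paths to the module layout of your build:

```yaml
# Fragment of a job's `steps:` list in a GitHub Actions workflow.
steps:
  # Hypothetical step: keep sbt's compiled output between workflow runs.
  - name: Cache sbt target directories
    uses: actions/cache@v3
    with:
      path: |
        target
        project/target
        **/target
      # A unique key per commit makes every run save a fresh cache.
      key: ${{ runner.os }}-sbt-target-${{ github.sha }}
      # The prefix restores the latest cache from a previous commit,
      # so sbt can compile incrementally on top of it.
      restore-keys: |
        ${{ runner.os }}-sbt-target-
```

Keying on the commit SHA trades storage for freshness: each run starts from the closest previous build output instead of a stale one.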
The compile time has been reduced from 6m18s to 2m19s!
Let's run the same workflow again. This time, the Compile phase took 2m04s, a very similar result (execution flow):
We have been able to speed up one of the slowest CI steps from ~6 minutes to only ~2 minutes! This is a considerable improvement, and you can easily take advantage of it in any other Scala project.
From now on, we'll see how fast we can get the CodePreview step. The Compile phase will likely be affected too, but it isn't a bottleneck anymore.
GitHub Actions provides VMs with 4 CPUs and 8 GB of memory (ref).
We can reduce the runtime by increasing the CI resources, which requires a self-hosted runner. In that case, our budget defines how fast we can get.
A self-hosted runner can also help reduce the overall runtime by pre-installing the necessary tools on our VM, for example:
Install ansible takes ~30s.
Cache action takes ~30s.
Setup Scala takes ~50s.
This means that we could reduce most steps to negligible times, leaving two steps to improve:
Compile phase taking ~2m.
Create preview env taking ~8m.
Let's create a t2.large VM on AWS, configure it with all the dependencies needed to complete the CodePreview workflow, and set it up as a self-hosted runner on GitHub. I have selected this instance because it is the most similar to the VM provided by GitHub.
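Once the VM is registered as a runner, the workflow only needs to target it. The fragment below is a sketch under assumptions: the job name and the compile command are hypothetical, and the tools (sbt, Ansible, and so on) are assumed to be pre-installed on the VM, so the setup steps disappear:

```yaml
# Fragment of a GitHub Actions workflow file.
jobs:
  ci: # hypothetical job name
    # Route the job to the registered self-hosted runner instead of a
    # GitHub-hosted VM.
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v3
      # No setup steps: the tools are assumed to be installed on the VM.
      - name: Compile
        run: sbt compile # assumption: the project's actual command may differ
```

Because the runner's disk persists between runs, files such as downloaded libraries and target/ directories can stay in place without a separate cache step, depending on how the workspace is cleaned.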
Run the CI workflow a few times to take advantage of the cache (workflow execution):
The results are amazing! Almost all steps take a negligible time now. The only slow step is the CodePreview one, which is mostly composed of Ansible scripts:
The Compile phase is now taking only ~20 seconds!
At this stage, increasing the self-hosted runner resources seems to be the only way forward.
Let's increase the resources. We'll follow the same steps as in the first try but use a t2.xlarge VM from AWS:
Unfortunately, we didn't get a noticeable improvement. The Compile phase is still taking ~30 seconds and the CodePreview step is taking slightly less than 7 minutes (workflow execution):
Let's increase the resources once more. We'll follow the same steps as in the first try but use a t2.2xlarge VM from AWS:
The Compile phase stays the same, while the CodePreview step is now running in 6 minutes, a decent improvement (workflow execution):
A few weeks ago, I tried running this optimization with a self-hosted runner on a DigitalOcean Droplet. The overall runtime got as low as 6 minutes, which I find very reasonable given that the workflow involves a lot of expensive steps:
NOTE: Unfortunately, DigitalOcean is facing an incident right now, which has prevented me from creating such a Droplet. Given that my earlier workflow was executed in a private repository, I can't share access to it.
This post has covered the CodePreview step, which is now the bottleneck. There are some ways to optimize it further:
If your CI process only compiles the code without running any tests, you can easily get the runtime down to 1 minute, which is amazing! Even with an extra step to prepare a production build, the runtime would stay under 2 minutes (yes, there are companies with projects that have no automated tests at all):
In another post, I'll explore how to optimize a different workflow that executes many integration tests and currently takes 28 minutes. I wonder how much we can improve this. I have the feeling that we could get it to run in less than 10 minutes without much effort.
What alternative approaches have you taken? And what have your results been?