marian-dev builds take more than 30 minutes. When I first tried to build marian-dev to edit something in sentencepiece on my personal laptop, a Lenovo ThinkPad X1 Carbon, it took ages. Often I had to remove the built files and run a clean build all over again. Sometimes I had to build Release, other times Debug. These days I develop on an 80-core Intel Xeon Phi, so build times are not as much of an issue. But every now and then some noob tries to build the project on their local machine without the know-how, and the build takes very, very long to finish.

The same was the case across Windows, Linux and MacOS, and for cross-compilation targeting WebAssembly via emscripten, when I started working for the bergamot-project. All of these had a CI build running. bergamot-translator uses a fork of marian-dev and the situation is pretty much the same.

My code is compiling

This was a point of frustration when I started. Over weekends, outside officially assigned tasks, I have managed to bring down the build time for each of these, one by one.

ccache

ccache speeds up compilation by reusing the results of previous compilations. The principle is quite simple: each compilation unit can be associated with the triple (source-file, compiler, compilation-args). If we hash all three and store the compiled result under that hash, we can safely reuse it in future compilations.
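To make the principle concrete, here is a minimal Python sketch of a content-addressed compile cache. It is only an illustration of the idea described above, not how ccache is implemented: real ccache does considerably more (for example, it also accounts for included headers). The names `cache_key`, `CompileCache` and `fake_compile` are made up for this example.

```python
import hashlib
from typing import Callable, Dict, Sequence


def cache_key(source: bytes, compiler_id: str, args: Sequence[str]) -> str:
    """Hash the (source, compiler, args) triple into one hex digest."""
    h = hashlib.sha256()
    for part in (source, compiler_id.encode(), "\0".join(args).encode()):
        # Length-prefix each part so that field boundaries cannot be confused.
        h.update(len(part).to_bytes(8, "little"))
        h.update(part)
    return h.hexdigest()


class CompileCache:
    """Toy in-memory cache; a real one would persist objects on disk."""

    def __init__(self, compile_fn: Callable[[bytes, Sequence[str]], bytes]):
        self._compile = compile_fn
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def compile(self, source: bytes, compiler_id: str, args: Sequence[str]) -> bytes:
        key = cache_key(source, compiler_id, args)
        if key in self._store:
            self.hits += 1  # Same triple seen before: reuse the stored object.
            return self._store[key]
        self.misses += 1  # New triple: run the (slow) compiler and remember the result.
        obj = self._compile(source, args)
        self._store[key] = obj
        return obj


def fake_compile(source: bytes, args: Sequence[str]) -> bytes:
    """Hypothetical stand-in for an actual compiler invocation."""
    return b"obj:" + source


if __name__ == "__main__":
    cache = CompileCache(fake_compile)
    cache.compile(b"int main() {}", "gcc-11", ["-O2"])
    cache.compile(b"int main() {}", "gcc-11", ["-O2"])  # served from the cache
    cache.compile(b"int main() {}", "gcc-11", ["-O3"])  # different args, recompiled
    print(cache.hits, cache.misses)
```

Changing any one element of the triple (the source, the compiler, or a flag) produces a different key, which is why a cache hit is safe to reuse.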

ccache, at the time of writing this post, supports gcc/clang on most Linux systems and AppleClang on MacOS. It recently gained support for MSVC on Windows, although bergamot-translator still uses the fork that ships a release (we should switch soon, when I have free time). The emscripten compiler (emcc), running on any platform, has some form of support. That’s pretty much all our builds, so all that was left was to add support slowly, one by one.

Local builds

ccache is quite easy to set up for local builds. Chances are ccache is available in your operating system’s official package manager. The following example works with Ubuntu.

$ sudo apt-get install ccache 
$ cmake $BUILD_DIRECTORY                        \
    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache        \
    -DCMAKE_C_COMPILER_LAUNCHER=ccache          \
    -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache  

From then on, compilation results are cached and we can rely on ccache. Most of my development happens on a Linux system, so I’m sorted. The library, however, is intended to be cross-platform (Windows, Mac, Linux, now Android). Due to Mozilla’s decisions, we also have a WebAssembly target. There’s no way I’m building everything while testing locally, unless I am testing parts relevant to a specific platform. For that, we have GitHub CI.

GitHub Actions

bergamot-translator uses GitHub Actions for CI. Not much documentation for GitHub Actions existed when I started using it for bergamot-translator, although I found the integrated offering quite convenient. The repository was originally developed in private, but setting up CI exhausted the private-repository minutes (the MacOS runners are more expensive) in under two days. The solution was to make the development public. No worries, it was meant to be open-sourced anyway. But we were still using more resources than necessary, and adding more items to the matrix would have been difficult.

bergamot-translator compiles with -march=native for performance reasons. This makes the effective compiler flags a function of the hardware, and therefore rather fragmented. This is not a problem on my development machine, where the hardware remains the same. But GitHub runners, we’ve discovered, are not uniform: some have avx512 capabilities while others only have avx2.
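One way to keep caches from differently capable runners apart would be to fold a coarse instruction-set tag into the cache key. The Python sketch below is an assumption of mine for illustration, not something the workflow in this post does: `isa_tag` and `linux_cpu_flags` are hypothetical helpers, and the flag names (`avx512f`, `avx2`) are the ones Linux reports in `/proc/cpuinfo` on x86.

```python
from pathlib import Path


def isa_tag(cpu_flags: str) -> str:
    """Map a whitespace-separated CPU flag list to a coarse ISA tag."""
    flags = set(cpu_flags.lower().split())
    if "avx512f" in flags:  # AVX-512 Foundation implies the avx512 family
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "generic"


def linux_cpu_flags(cpuinfo: str = "/proc/cpuinfo") -> str:
    """Return the first 'flags' line of /proc/cpuinfo, or '' if unavailable."""
    path = Path(cpuinfo)
    if not path.exists():  # not Linux, or procfs is not mounted
        return ""
    for line in path.read_text().splitlines():
        if line.startswith("flags"):
            # Lines look like "flags : fpu vme ..."; keep what follows the colon.
            return line.partition(":")[2]
    return ""


if __name__ == "__main__":
    # A hypothetical cache-key fragment that differs between avx2 and avx512 runners.
    print(f"ccache-linux-{isa_tag(linux_cpu_flags())}")
```

With a tag like this in the key, an object compiled on an avx512 runner would never be restored onto an avx2-only one, at the cost of keeping one cache per hardware class.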

The general skeleton for optimizing build turnaround time with ccache on GitHub Actions is the same across platforms. I use the workflow-level env to store a bunch of variables, which are later exported to $GITHUB_ENV. I also want to be able to reuse the same variables in a matrix, so the structure looks like the following:

env:
  ccache_basedir: ${{ github.workspace }}
  ccache_dir: "${{ github.workspace }}/.ccache"
  ccache_compilercheck: content
  ccache_compress: 'true'
  ccache_compresslevel: 9
  ccache_maxsize: 200M
  ccache_cmake: -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache

$CCACHE_DIR has to be stored on GitHub and persist across builds. The following step generates the variables used to order cache lookups by recency on the working PR.

- name: Generate ccache_vars for ccache based on machine
  shell: bash
  id: ccache_vars
  run: |-
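    # Step outputs are written to the file named by $GITHUB_OUTPUT
    # (the older ::set-output workflow command is deprecated).
    echo "hash=${{ env.ccache_compilercheck }}" >> "$GITHUB_OUTPUT"
    echo "timestamp=$(date '+%Y-%m-%dT%H.%M.%S')" >> "$GITHUB_OUTPUT"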

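On the first commit of a PR there is no cache for that ref yet, so the restore-keys below fall back to less specific prefixes, which lets the lookup land on, for example, the last cache built on the main branch.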

- name: Cache-op for build-cache through ccache
  uses: actions/cache@v2
  with:
    path: ${{ env.ccache_dir }}
    key: ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}-${{ github.ref }}-${{ steps.ccache_vars.outputs.timestamp }}
    restore-keys: |-
      ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}-${{ github.ref }}
      ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}
      ccache-${{ matrix.identifier }}
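The lookup order implied by the key and restore-keys can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not GitHub's implementation: it only models "exact key first, then each restore-key as a prefix, newest match wins", and it ignores branch scoping by assuming `caches` already contains only the entries visible to the run. The function name `restore` and the sample keys are made up.

```python
from typing import Dict, Optional, Sequence


def restore(caches: Dict[str, float], key: str, restore_keys: Sequence[str]) -> Optional[str]:
    """Pick the cache entry to restore.

    `caches` maps an existing cache key to its creation time.
    """
    if key in caches:  # exact hit
        return key
    for prefix in restore_keys:  # tried in the order listed in the workflow
        matches = [k for k in caches if k.startswith(prefix)]
        if matches:
            # Several partial matches: take the most recently created one.
            return max(matches, key=lambda k: caches[k])
    return None  # cold start, nothing to restore


if __name__ == "__main__":
    visible = {
        "ccache-linux-content-refs/heads/main-2022-01-01T10.00.00": 1.0,
        "ccache-linux-content-refs/heads/main-2022-01-02T10.00.00": 2.0,
    }
    # First build of a new PR: nothing exists for its ref, so the second
    # restore-key falls back to the newest main-branch cache.
    print(restore(
        visible,
        "ccache-linux-content-refs/pull/7/merge-2022-01-03T09.00.00",
        [
            "ccache-linux-content-refs/pull/7/merge",
            "ccache-linux-content",
            "ccache-linux",
        ],
    ))
```

Because the timestamp makes the full key unique on every run, the exact match never hits. The restore always goes through the prefixes, and a fresh cache is saved under the new key after the build.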

The following is redundant and over-engineered, but I like to keep things this way for swappability of env.var and matrix.var.

- name: ccache environment setup
  run: |-
    echo "CCACHE_COMPILER_CHECK=${{ env.ccache_compilercheck }}" >> $GITHUB_ENV
    echo "CCACHE_BASEDIR=${{ env.ccache_basedir }}" >> $GITHUB_ENV
    echo "CCACHE_COMPRESS=${{ env.ccache_compress }}" >> $GITHUB_ENV
    echo "CCACHE_COMPRESSLEVEL=${{ env.ccache_compresslevel }}" >> $GITHUB_ENV
    echo "CCACHE_DIR=${{ env.ccache_dir }}" >> $GITHUB_ENV
    echo "CCACHE_MAXSIZE=${{ env.ccache_maxsize }}" >> $GITHUB_ENV

I often leave a prolog and an epilog step to diagnose on CI whether the cache is working as intended.

- name: ccache prolog
  run: |-
    ccache -s # Print current cache stats
    ccache -z # Zero the statistics counters

# Build commands go here.

- name: ccache epilog
  run: |
    ccache -s # Print current cache stats

With the above skeleton, it turns out to be quite easy to set it all up. That is, of course, ignoring the countless hours spent debugging a container rolled out on a machine somewhere, with absurdly slow feedback turnaround, until the cache started working. Now I get to copy-paste the above and speed up compilations across my projects.

Linux / MacOS

Linux and MacOS both worked pretty much out of the box with the above setup, and both had the bash shell.

Python

The Python shared library built via pybind11 uses gcc or clang under Linux, so getting this one working was as simple as copying over the Linux YAML lines and adding a bunch of Python keys.

Android cross-compilation

Android cross-compilation is used as an “it builds” check on CI for the ARM backend, which I’m pursuing at the time of writing this post. Since CMake has nice integrations, as visible above, cross-compiling with a toolchain allowed me to use ccache with minimal changes.

Windows

Windows was the odd one. Compiling things on Windows, especially with MSVC, has never been a fun experience. I don’t think many developers enjoy it either.

Most of the implementation followed Speeding up C++ GitHub Actions using ccache. It took some time, searching, and trial and error to get it to work, and the functionality is integrated now: bergamot-translator#308. Because bash wasn’t available, cmake was used to generate the timestamps and other required values.

- name: Download ccache
  shell: cmake -P {0}
  run: |
    set(ccache_url "https://github.com/cristianadam/ccache/releases/download/v${{ env.ccache_version }}/${{ runner.os }}.tar.xz")
    file(DOWNLOAD "${ccache_url}" ./ccache.tar.xz SHOW_PROGRESS)
    # Capture the exit status so that the check below actually sees it.
    execute_process(COMMAND ${CMAKE_COMMAND} -E tar xvf ./ccache.tar.xz RESULT_VARIABLE ret)
    if(ret AND NOT ret EQUAL 0)
      message( FATAL_ERROR "Bad exit status")
    endif()
- name: Generate ccache_vars for ccache based on machine
  shell: cmake -P {0}
  id: ccache_vars
  run: |-
    string(TIMESTAMP current_date "%Y-%m-%d-%H;%M;%S" UTC)
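    # Step outputs are appended to the file named by the GITHUB_OUTPUT
    # environment variable (the older ::set-output command is deprecated).
    file(APPEND "$ENV{GITHUB_OUTPUT}" "timestamp=${current_date}\n")
    file(APPEND "$ENV{GITHUB_OUTPUT}" "hash=${{ env.ccache_compilercheck }}\n")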
- name: ccache prolog
  run: |-
    ${{github.workspace}}\ccache.exe -sv # Print current cache stats
    ${{github.workspace}}\ccache.exe -z # Zero the statistics counters

# Insert build command here.

- name: ccache epilog
  run: |-
    ${{github.workspace}}\ccache.exe -sv # Print current cache stats

Some MSVC flags like /Zi were unfriendly to the cache, so I had to get rid of them (/Zi emits debug information into a separate PDB file, which is most likely why).

A few dependencies (pcre2, protobuf) come via vcpkg and are slower than I’d want at the moment. We will look into speeding this up eventually.

emscripten

The emscripten ccache setup mostly follows the pyodide implementation. Weird flex, but emcc uses a ccache that is itself compiled to the WebAssembly target and then used in further compilation. Since WebAssembly is intended to be a portable target, I chose to keep the ccache build itself cached.

Further optimizations

Originally marian-dev provided builds with debug info (-DCMAKE_BUILD_TYPE=RelWithDebInfo), which bergamot-translator inherited. This meant the compiled units carried information mapping instructions back to source lines, which increases their size on disk. Larger object files took longer to compile and also got us into trouble with GitHub’s free limits.

Outcome

Compilation turnaround times were reduced as follows (in minutes):

  1. Linux: 25m ➔ 5m
  2. MacOS: 30m ➔ 6m
  3. WebAssembly: 15m ➔ 5m (2m if optimized further)
  4. Python: 30m ➔ 6m
  5. Windows: 30m ➔ 10m (depending on vcpkg being nice).
It has indeed saved time in the long run.
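For a rough sense of scale, the before and after figures from the list can be turned into speedup factors with a few lines of Python. The timings are copied from the list above (using the unoptimized 5m figure for WebAssembly); the helper name `speedup` is just for this example.

```python
# (before, after) build times in minutes, as listed above.
timings = {
    "Linux": (25, 5),
    "MacOS": (30, 6),
    "WebAssembly": (15, 5),
    "Python": (30, 6),
    "Windows": (30, 10),
}


def speedup(before: float, after: float) -> float:
    """How many times faster the cached build is."""
    return before / after


if __name__ == "__main__":
    for name, (before, after) in timings.items():
        print(f"{name}: {speedup(before, after):.1f}x faster, {before - after}m saved")
    total_before = sum(b for b, _ in timings.values())
    total_after = sum(a for _, a in timings.values())
    print(f"One full matrix run: {total_before}m -> {total_after}m")
```

Summed over the five builds, one full matrix run goes from 130 to 32 minutes of compile time, assuming every build hits a warm cache.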

Good enough to be picked up by downstream repositories as well, turns out: XapaJIaMnu/translateLocally@a4e3e3b.

While this has served to reduce the compute footprint and the turnaround time for developers, the capacity freed up by ccache has also encouraged me to add more builds, most certainly an instance of Jevons Paradox.

Cache invalidation is a potential problem. If at some point in the future some bug corrupts a cache entry, builds can fail. The assumption is that this does not happen often. Even if it does, we can just edit a flag to force a recache, and the builds will go back to work.

Functional ccache builds for all these can be found in browsermt/bergamot-translator.