Engineering

TLDR: By porting some of our microservices from Java to Go, we reduced their resident memory footprint by up to two orders of magnitude.

In the beginning there was Java


The AeroFS Appliance is composed of many microservices, and a vast majority of them were written in Java. Historically, this has not caused us any problems, as the appliance serves thousands of users across our customers’ deployments without any performance issues.

However, after our move to Docker, we noticed a sharp increase in the appliance’s memory footprint. We considered a handful of fancy docker monitoring tools before settling on this low-tech but incredibly helpful script:

# For each running container, print its cgroup-accounted resident memory (in MB)
# followed by the container name, sorted by memory usage.
for line in `docker ps | awk '{print $1}' | grep -v CONTAINER`; do \
    echo $(( `cat /sys/fs/cgroup/memory/docker/$line*/memory.usage_in_bytes` / 1024 / 1024 ))MB \
        $(docker ps | grep $line | awk '{printf $NF" "}') ; \
done | sort -n

This outputs a list of running containers, sorted by amount of resident memory, as accounted for by Linux control groups. A sample output might look like:

46MB web
66MB verification
74MB openid
82MB havre
105MB logcollection
146MB sp
181MB sparta

This investigation uncovered a number of Java services with a surprisingly high memory footprint, sometimes wildly out of proportion to their complexity, or lack thereof. We identified several major factors behind these symptoms:

  1. an increase in the number of running JVMs, as each tomcat servlet was placed into a separate container
  2. reduced opportunity for the many JVMs to share read-only memory: the JVM itself, all the shared libraries it depends on, and of course the many JARs used by multiple services
  3. memory isolation could in some cases confuse the JVM’s sizing heuristics, which led some services to allocate larger caches than necessary

Having cut my teeth writing Z80 assembly on devices with as many as 64 kilobytes of memory, I was quite excited at the thought of reclaiming hundreds of megabytes of precious RAM. In a fortunate turn of events, our next hackathon was a few days away, which provided a perfect opportunity to focus on the issue and a good excuse to explore new tools.

Codename: Greasefire

My main objective for this hackathon project was to burn through layers of metaphorical fat to reduce the overall footprint of the AeroFS appliance.

Specifically, my criteria for success were:

  • CPU usage should not noticeably increase
  • Stability and memory safety should be preserved
  • Resident memory usage should decrease by a factor of 2 or more

In keeping with the spirit of the hackathon, I also wanted to explore new languages and tools, so simply tweaking the existing services was not an acceptable approach.

To maximize the likelihood of the hackathon producing a demoable, and maybe even deployable, result, it was important to pick a reasonably sized target. The obvious choice was the TeamServer probe (team-servers in the appliance status page), a small tomcat servlet with a single HTTP route and very straightforward internal logic.

The goal was thus to produce a server:

  1. with a compatible API
  2. packaged as a docker image

Trying new tools

The CPU and memory requirements pointed towards compiled languages designed for systems programming. Although a hackathon project is not bound by any expectation that the result will be immediately usable, I kept maintainability in mind and steered clear of the more obscure alternatives.

I quickly narrowed down the pool of strong contenders to Go and Rust. Both could fairly easily be compiled into small static binaries perfectly suited to run in minimal containers. Both promised adequate performance, memory safety, good concurrency support and, crucially, a smaller memory footprint than the JVM.

Rust’s intricate type system looked especially intriguing. However, it was far less mature than Go, having not yet reached the 1.0 milestone at the time. Rust was also hampered by its lack of good libraries for HTTP and lower-level networking.

We had previously tried porting one of our services to Go, back in 2013, but at the time we had faced memory leaks and quickly stopped the experiment. Two years later, Go looked a lot more mature and was deemed suitable for this experiment.

Go is similar enough to languages of the C family to be easy to pick up, but has enough peculiarities to require frequent back-and-forth between code and documentation during the first few days. Fortunately, the documentation is well-written and conveniently linked to the source of the standard library, which is tremendously helpful for clearing up confusion and getting exposed to idiomatic code.

I was also particularly delighted by the existence of a standard language coding style, the magical gofmt tool that enforces it, and how easily it integrates with a text editor such as vim (full disclosure: this hackathon was also my first time using vim for anything larger than a single-line edit).

Enter the gopher

Results

It took about a day to familiarize myself with Go and port the simple service I picked for this hackathon. The results were extremely promising:

  • Code size was reduced by almost half, from 175 lines down to 96.
  • Resident memory usage dropped from 87MB down to a mere 3MB, a 29x reduction!
  • The resulting docker image shrank from 668MB to 4.3MB, a 155x reduction!
    Granted, the largest layers were shared by multiple services, so the actual
    reduction of the appliance’s disk footprint will be much smaller as long as
    there is at least one Java service. Still, this number feels extremely
    satisfying. (A sketch of what such a small service looks like follows this
    list.)
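
For concreteness, here is a minimal sketch of what a single-route Go service of this shape can look like. It is an illustration only, not the actual probe: the /status route and its payload are hypothetical placeholders.

package main

import (
    "encoding/json"
    "log"
    "net/http"
)

// probeStatus is a hypothetical response payload.
type probeStatus struct {
    Healthy bool `json:"healthy"`
}

func main() {
    // A single HTTP route, served entirely by the standard library.
    http.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(probeStatus{Healthy: true})
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Built with CGO_ENABLED=0, a program like this compiles to a single self-contained static binary, which is what makes it practical to ship in a nearly empty docker image.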

With almost an entire day left in the hackathon, I turned my attention to the Certificate Authority (ca in the appliance status page). This service receives Certificate Signing Requests from internal services and desktop clients and returns signed certificates used to secure both peer-to-peer content transfer between clients and client-server communications.
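
The core of such a service maps almost directly onto Go’s standard crypto/x509 package. The following is a sketch of that core, not the actual AeroFS implementation; the serial-number scheme, validity period, and key usages are placeholder choices, and the HTTP plumbing around it is elided.

package ca

import (
    "crypto"
    "crypto/rand"
    "crypto/x509"
    "encoding/pem"
    "errors"
    "math/big"
    "time"
)

// signCSR parses a PEM-encoded Certificate Signing Request, verifies its
// signature, and returns a PEM-encoded certificate signed by the CA.
func signCSR(csrPEM []byte, caCert *x509.Certificate, caKey crypto.Signer) ([]byte, error) {
    block, _ := pem.Decode(csrPEM)
    if block == nil || block.Type != "CERTIFICATE REQUEST" {
        return nil, errors.New("not a PEM-encoded CSR")
    }
    csr, err := x509.ParseCertificateRequest(block.Bytes)
    if err != nil {
        return nil, err
    }
    // Reject requests whose signature does not match their public key.
    if err := csr.CheckSignature(); err != nil {
        return nil, err
    }
    tmpl := &x509.Certificate{
        SerialNumber: big.NewInt(time.Now().UnixNano()), // placeholder serial scheme
        Subject:      csr.Subject,
        NotBefore:    time.Now(),
        NotAfter:     time.Now().AddDate(1, 0, 0), // arbitrary one-year validity
        KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
    }
    der, err := x509.CreateCertificate(rand.Reader, tmpl, caCert, csr.PublicKey, caKey)
    if err != nil {
        return nil, err
    }
    return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}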

When this new CA finally replaced its Java equivalent, a few days after the end of the hackathon, it brought the resident memory of the CA down by a whopping 100x!

This project won the “Technical Amazingness” award at the hackathon and started an ongoing effort to trim the resource footprint of the appliance.

As of version 1.1.5, four more services have been ported to Go, bringing the total count to six and the cumulative reduction in memory footprint to almost 1 gigabyte. In every case we’ve seen a similar drop in code size, and in some cases even a significant reduction in CPU usage or noticeable bandwidth improvements.

— Hugues & the AeroFS Team.