Mold, a modern drop-in replacement for current Unix linkers, has reached version 1.0. Written by the original creator of the LLVM lld linker, mold aims to be several times faster than its predecessor.
According to its author, Rui Ueyama, mold lets developers increase their productivity by significantly speeding up the debug-edit-rebuild cycle:
Concretely speaking, I want to use the linker to link a Chromium executable (1.8 GiB in size) just in 1 second. LLVM’s lld, the fastest open-source linker which I originally created a few years ago, takes about 12 seconds to link Chromium on my machine. So the goal is 12x performance bump over lld. Compared to GNU gold, it’s more than 50x.
Ueyama published a benchmark showing that mold can already link Chrome, 2 GB in size, in slightly over 2 seconds, a 5x speed-up over lld and more than 25 times faster than GNU gold. While this is still more than double the 1-second target, it makes for very impressive credentials.
While the Chrome benchmark is particularly promising, it must also be noted that results vary widely across programs. For example, mold is only twice as fast as lld when linking Clang 13, while still being 30+ times faster than GNU gold at the same task.
Mold achieves its performance by aggressively leveraging parallelism on multi-core CPUs. As Ueyama notes, existing linkers aren’t doing a great job at scaling with available cores.
As you can see, mold uses all available cores throughout its execution and finishes quickly. On the other hand, lld failed to use available cores most of the time. In this demo, the maximum parallelism is artificially capped to 16 so that the bars fit in the GIF.
Ueyama’s benchmarks also show that mold takes roughly twice as long to link a program as cp takes to copy it to a different location, which suggests the real goal here is to bring link time close to that of a plain file copy.
To this aim, mold explores a number of alternative design choices, including preloading input object files from disk and processing them as soon as possible, that is, before all the relevant input object files are ready. Linking is thus split into two steps: the first speculatively parses and preprocesses input files and does not require all of them to be ready. By executing this first step as early as possible, the second step can complete sooner as input files progressively become ready.
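The two-step idea can be illustrated with a minimal Python sketch. It is not mold's implementation: the names parse_object, link, and build_and_link are hypothetical, the "object files" are plain strings with one symbol per line, and a thread pool stands in for mold's own scheduling. It only shows per-file work starting as each input arrives, with the final step waiting for all of them.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_object(name, data):
    """Step 1 (hypothetical): per-file work that needs no other input file."""
    # Assumption for illustration: each non-empty line defines one symbol.
    return name, [line for line in data.splitlines() if line]


def link(parsed):
    """Step 2 (hypothetical): requires every parsed input to be available."""
    symbols = {}
    for name, defs in parsed:
        for sym in defs:
            symbols.setdefault(sym, name)  # the first definition wins
    return symbols


def build_and_link(arrivals):
    """`arrivals` yields (name, data) pairs as each input becomes ready."""
    with ThreadPoolExecutor() as pool:
        # Submit each file for parsing as soon as it arrives,
        # without waiting for the remaining inputs.
        futures = [pool.submit(parse_object, n, d) for n, d in arrivals]
        # Collect results in submission order so the outcome is deterministic.
        parsed = [f.result() for f in futures]
    return link(parsed)


inputs = [("a.o", "main\nhelper\n"), ("b.o", "helper\nutil\n")]
result = build_and_link(iter(inputs))
print(result)
```

In this toy setup the second step only has to merge already-parsed symbol lists, which is the part that cannot start until every input is known.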
Further speedup comes from performing some computationally intensive tasks in the first step, such as resolving symbols using string interning and merging string sections.
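As a rough illustration of these two techniques, the sketch below interns symbol names into small integer IDs and merges NUL-terminated string sections so that each distinct string is stored once. The Interner class and merge_string_sections function are hypothetical names for this example; mold's actual data structures are not described in this article and are certainly more elaborate.

```python
class Interner:
    """Maps each distinct string to a small integer ID."""

    def __init__(self):
        self._ids = {}
        self._names = []

    def intern(self, name):
        idx = self._ids.get(name)
        if idx is None:
            idx = len(self._names)
            self._ids[name] = idx
            self._names.append(name)
        return idx

    def name(self, idx):
        return self._names[idx]


def merge_string_sections(sections):
    """Merge NUL-terminated string sections, keeping each string once.

    Returns the merged bytes and a map from string to its offset.
    """
    offsets = {}
    merged = bytearray()
    for section in sections:
        for s in section.split(b"\0"):
            if s and s not in offsets:
                offsets[s] = len(merged)
                merged += s + b"\0"
    return bytes(merged), offsets


interner = Interner()
a = interner.intern("printf")
b = interner.intern("main")
c = interner.intern("printf")  # same ID as `a`; no second copy is stored

merged, offsets = merge_string_sections([b"hello\0world\0", b"world\0mold\0"])
print(a, b, c, merged, offsets)
```

Once names are interned, comparing two symbols is an integer comparison rather than a string comparison, which is why this kind of work is worth doing early.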
There are also approaches that Ueyama analyzed and finally rejected, such as adopting incremental linking, creating a new file format alternative to ELF, or using inotify to watch for object files.
Many other low-level details affect a linker’s performance and become relevant when the goal is to shave off every millisecond, such as forking so that the main linker process can exit faster once the output file has been written to disk. Besides mold’s design document, this Hacker News thread, where Ueyama answers a number of questions, makes for an interesting read.
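The fork trick can be sketched as follows for POSIX systems. This is an assumption-laden illustration, not mold's code: link_with_early_exit is a hypothetical function, the "linking" is just writing a few bytes, and a short sleep stands in for slow process teardown. The child does the work and signals over a pipe once the output is on disk, so the parent can return (or exit) without waiting for the child's cleanup.

```python
import os
import tempfile
import time


def link_with_early_exit(output_path, payload):
    """Fork a worker; return as soon as it reports the output is written."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: performs the actual work.
        os.close(read_fd)
        with open(output_path, "wb") as f:
            f.write(payload)
        # The file is closed at this point; tell the parent it is complete.
        os.write(write_fd, b"1")
        os.close(write_fd)
        time.sleep(0.1)  # stand-in for slow teardown of a large process
        os._exit(0)
    # Parent: block only until the child signals completion.
    os.close(write_fd)
    done = os.read(read_fd, 1)
    os.close(read_fd)
    return done == b"1"


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "a.out")
    ok = link_with_early_exit(out, b"\x7fELF")
    with open(out, "rb") as f:
        written = f.read()
print(ok, written)
```

In a real tool the parent would exit right after the signal, so the build system sees the link as finished while the child is still releasing its resources in the background.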
Mold is still a very young project and, in spite of its 1.0 versioning, not yet ready to replace lld in production, according to the author. Still, if you want to try it out with clang, that is really easy using the --ld-path command line argument. gcc instead requires you to use the -B flag to specify the directory where your custom ld is to be found.