Optimizer

This doc page is specific to features shipped in Scala 2, which have either been removed in Scala 3 or replaced by an alternative. Unless otherwise stated, all the code examples in this page assume you are using Scala 2.

Lukas Rytz (2018)

Andrew Marki (2022)

The Scala 2.12 / 2.13 Inliner and Optimizer

In Brief

Read more to learn more.

Intro

The Scala compiler has included an inliner since version 2.0. Closure elimination and dead code elimination were added in 2.1. That was the first Scala optimizer, written and maintained by Iulian Dragos. He continued to improve these features over time and consolidated them under the -optimise flag (later Americanized to -optimize), which remained available through Scala 2.11.

The optimizer was re-written for Scala 2.12 to become more reliable and powerful, and to side-step the spelling issue by calling the new flag -opt. This post describes how to use the optimizer in Scala 2.12 and 2.13: what it does, how it works, and what its limitations are.

The options were simplified for 2.13.9. This page uses the simplified forms.

Motivation

Why does the Scala compiler even have a JVM bytecode optimizer? After all, the JVM is a highly optimized runtime with a just-in-time (JIT) compiler that benefits from over two decades of tuning. The reason is that there are certain well-known code patterns that the JVM fails to optimize properly. These patterns are common in functional languages such as Scala. (Increasingly, Java code with lambdas is catching up and showing the same performance issues at run-time.)

The two most important such patterns are “megamorphic dispatch” (also called “the inlining problem”) and value boxing. If you’d like to learn more about these problems in the context of Scala, you could watch the part of my Scala Days 2015 talk (starting at 26:13).
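
To make "megamorphic dispatch" concrete, here is a minimal sketch (the object and method names are made up for illustration). The call f(...) inside sumWith is a single callsite in bytecode, yet it is reached with three different function classes. JIT compilers typically inline a virtual call only when they have seen one or two receiver classes at a callsite, so a shared callsite like this one tends to stay a real interface call.

```scala
object MegamorphicSketch {
  // One shared loop: `f(xs(i))` compiles to a single Function1 callsite.
  def sumWith(xs: Array[Int])(f: Int => Int): Int = {
    var acc = 0
    var i = 0
    while (i < xs.length) {
      acc += f(xs(i)) // sees every lambda that is ever passed to sumWith
      i += 1
    }
    acc
  }

  val xs: Array[Int] = Array(1, 2, 3)

  // Three callers, three distinct lambda classes, one callsite inside sumWith.
  val plusOne: Int = sumWith(xs)(_ + 1)     // 2 + 3 + 4
  val doubled: Int = sumWith(xs)(_ * 2)     // 2 + 4 + 6
  val squared: Int = sumWith(xs)(x => x * x) // 1 + 4 + 9
}
```

If sumWith were copied into each caller at compile time, every copy of the loop would have its own callsite that only ever sees one function, which is the situation the JVM handles well.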

The goal of the Scala optimizer is to produce bytecode that the JVM can execute fast. It is also a goal to avoid performing any optimizations that the JVM can already do well.

This means that the Scala optimizer may become obsolete in the future, if the JIT compiler is improved to handle these patterns better. In fact, with the arrival of GraalVM, that future might be nearer than you think! But for now, we dive into some details about the Scala optimizer.

Constraints and assumptions

The Scala optimizer has to make its improvements within fairly narrow constraints:

However, even when staying within these constraints, some changes performed by the optimizer can be observed at run-time:

Binary compatibility

Scala minor releases are binary compatible with each other, for example, 2.12.6 and 2.12.7. The same is true for many libraries in the Scala ecosystem. These binary compatibility promises are the main reason for the Scala optimizer not to be enabled everywhere.

The reason is that inlining a method from one class into another changes the (binary) interface that is accessed:

class C {
  private[this] var x = 0
  @inline final def inc(): Int = { x += 1; x }
}

When inlining a callsite c.inc(), the resulting code no longer calls inc, but instead accesses the field x directly. Since that field is private (also in bytecode), inlining inc is only allowed within the class C itself. Trying to access x from any other class would cause an IllegalAccessError at run-time.

However, there are many cases where implementation details in Scala source code become public in bytecode:

class C {
  private def x = 0
  @inline final def m: Int = x
}
object C {
  def t(c: C) = c.x
}

Scala allows accessing the private method x in the companion object C. In bytecode, however, the classfile for the companion C$ is not allowed to access a private method of C. For that reason, the Scala compiler “mangles” the name of x to C$$x and makes the method public.
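
One way to check this on your own build is to list the methods of C through Java reflection. The sketch below assumes Scala 2.12 or 2.13; ShowMembers is a hypothetical helper, and the exact mangled name depends on the package and enclosing scope of C, so no output is shown here.

```scala
class C {
  private def x = 0
  @inline final def m: Int = x
}
object C {
  def t(c: C) = c.x // legal in Scala: the companion may access private members
}

object ShowMembers {
  // Names of the methods declared in C's classfile, as seen by Java reflection.
  def declaredMethodNames: List[String] =
    classOf[C].getDeclaredMethods.toList.map(_.getName).sorted

  def main(args: Array[String]): Unit =
    declaredMethodNames.foreach(println)
}
```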

This means that m can be inlined into classes other than C, since the resulting code invokes C.C$$x instead of C.m. Unfortunately this breaks Scala’s binary compatibility promise: the fact that the public method m calls a private method x is considered to be an implementation detail that can change in a minor release of the library defining C.

Even more trivially, assume that method m was buggy and is changed to def m = if (fullMoon) 1 else x in a minor release. Normally, it would be enough for a user to put the new version on the classpath. However, if the old version of c.m was inlined at compile-time, having the new version of C on the run-time classpath would not fix the bug.

In order to safely use the Scala optimizer, users need to make sure that the compile-time and run-time classpaths are identical. This has a far-reaching consequence for library developers: libraries that are published to be consumed by other projects should not inline code from the classpath. The inliner can be configured to inline code from the library itself using -opt:inline:my.package.**.

The reason for this restriction is that dependency management tools like sbt will often pick newer versions of transitive dependencies. For example, if library A depends on core-1.1.1, B depends on core-1.1.2 and the application depends on both A and B, the build tool will put core-1.1.2 on the classpath. If code from core-1.1.1 was inlined into A at compile-time, it might break at run-time due to a binary incompatibility.
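
For a library built with sbt, the recommendation above could look roughly like the following build.sbt fragment. This is a sketch: my.package is a placeholder for the library's own root package, and the flag spelling assumes the simplified options of 2.13.9 or later.

```scala
// build.sbt (sketch; replace my.package with the library's own root package)
scalacOptions ++= Seq(
  "-opt:inline:my.package.**", // allow inlining only from code in this library
  "-Wopt"                      // report callsites that could not be inlined
)
```

Because the pattern covers only the library's own packages, nothing from the compile-time classpath is copied into the published classfiles.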

Using and interacting with the optimizer

The compiler flag for enabling the optimizer is -opt. Running scalac -opt:help shows how to use the flag.

By default (without any compiler flags, or with -opt:default), the Scala compiler eliminates unreachable code, but does not run any other optimizations.

-opt:local enables all method-local optimizations, for example:

Individual optimizations can be disabled. For example, -opt:local,-nullness-tracking disables nullness optimizations.

Method-local optimizations alone typically don’t have any positive effect on performance, because source code usually doesn’t have unnecessary boxing or null checks. However, local optimizations can often be applied after inlining, so it’s really the combination of inlining and local optimizations that can improve program performance.
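
As an illustration of that interplay, consider the following sketch (all names are invented for this example). The source of viaGeneric contains no visible boxing, but the call to the generic method pick implies it. Only once pick is inlined do the box and unbox operations end up in the same method, where a local optimization can remove them. Whether the optimizer actually does this for a given callsite depends on its settings and heuristics.

```scala
object BoxingSketch {
  // Generic in A: after erasure the parameters and the result are Objects,
  // so calling this with Int arguments boxes them and unboxes the result.
  @inline final def pick[A](cond: Boolean, a: A, b: A): A =
    if (cond) a else b

  // As written in source: the boxing is hidden behind the call to `pick`.
  def viaGeneric(i: Int): Int = pick(i > 0, i, -i)

  // Hand-written picture of the intended result after inlining `pick`
  // and eliminating the box/unbox pairs: plain Int arithmetic only.
  def byHand(i: Int): Int = if (i > 0) i else -i
}
```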

-opt:inline enables inlining in addition to method-local optimizations. However, to avoid unexpected binary compatibility issues, we also need to tell the compiler which code it is allowed to inline. This is done by specifying a pattern after the option to select packages, classes, and methods for inlining. Examples:

Running scalac -opt:help explains how to use the compiler flag.

Inliner heuristics and @inline

When the inliner is enabled, it automatically selects callsites for inlining according to a heuristic.

As mentioned in the introduction, the main goal of the Scala optimizer is to eliminate megamorphic dispatch and value boxing. In order to keep this post from growing too long, a followup post will include the analysis of concrete examples that motivate which callsites are selected by the inliner heuristic.

Nevertheless, it is useful to have an intuition of how the heuristic works, so here is an overview:

To prevent methods from exceeding the JVM’s method size limit, the inliner has size limits. Inlining into a method stops when the number of instructions exceeds a certain threshold.

As you can see in the list above, the @inline and @noinline annotations are the only way for programmers to influence inlining decisions. In general, our recommendation is to avoid using these annotations. If you observe issues with the inliner heuristic that can be fixed by annotating methods, we are very keen to hear about them, for example in the form of a bug report.

A related anecdote: in the Scala compiler and standard library (which are built with the optimizer enabled), there are roughly 330 @inline-annotated methods. Removing all of these annotations and re-building the project has no effect on the compiler’s performance. So the annotations are well-intended and benign, but in reality unnecessary.

For expert users, @inline annotations can be used to hand-tune performance-critical code without reducing abstraction. If you have a project that falls into this category, please let us know; we're interested to learn more!

Finally, note that the @inline annotation only has an effect when the inliner is enabled, which is not the case by default. The reason is to avoid introducing accidental binary incompatibilities, as explained above.
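
For reference, this is what the two annotations look like on method definitions. The sketch uses a made-up Counter class; the annotations themselves are scala.inline and scala.noinline from the standard library. Compiled without -opt:inline, the code behaves the same and the annotations are simply ignored.

```scala
object AnnotationSketch {
  final class Counter {
    private[this] var n = 0

    // Request inlining; `final` means the callee is statically known.
    @inline final def inc(): Int = { n += 1; n }

    // Ask the inliner to leave callsites of this method alone.
    @noinline final def describe(): String = s"Counter($n)"
  }

  def demo(): (Int, String) = {
    val c = new Counter
    c.inc()
    (c.inc(), c.describe())
  }
}
```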

Inliner warnings

The inliner can issue warnings when callsites cannot be inlined. By default, these warnings are not issued individually, but only as a summary at the end of compilation (similar to deprecation warnings).

$> scalac Test.scala '-opt:inline:**'
warning: there was one inliner warning; re-run enabling -Wopt for details, or try -help
one warning found

$> scalac Test.scala '-opt:inline:**' -Wopt
Test.scala:3: warning: C::f()I is annotated @inline but could not be inlined:
The method is not final and may be overridden.
  def t = f
          ^
one warning found

By default, the inliner issues warnings for invocations of methods annotated @inline that cannot be inlined. Here is the source code that was compiled in the commands above:

class C {
  @inline def f = 1
  def t = f           // cannot inline: C.f is not final
}
object T extends C {
  override def t = f  // can inline: T.f is final
}

The -Wopt flag has more configurations. With -Wopt:_, a warning is issued for every callsite that is selected by the heuristic but cannot be inlined. See also -Wopt:help.
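
Going by the warning text above ("The method is not final and may be overridden"), one way to resolve it is to mark the annotated method final. A minimal sketch, wrapped in a hypothetical FinalSketch object to keep it self-contained:

```scala
object FinalSketch {
  class C {
    @inline final def f = 1 // final: f cannot be overridden in subclasses
    def t = f               // the callee is now statically known
  }
  object T extends C {
    override def t = f
  }
}
```

This only works if no subclass needs to override f; otherwise removing the @inline annotation is the more honest fix.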

Inliner log

If you’re curious (or maybe even skeptical) about what the inliner is doing to your code, you can use the -Vinline verbose flag to produce a trace of the inliner’s work:

package my.project
class C {
  def f(a: Array[Int]) = a.map(_ + 1)
}

$> scalac Test.scala '-opt:inline:**' -Vinline my/project/C.f
Inlining into my/project/C.f
 inlined scala/Predef$.intArrayOps (the callee is annotated `@inline`). Before: 15 ins, after: 30 ins.
 inlined scala/collection/ArrayOps$.map$extension (the callee is a higher-order method, the argument for parameter (evidence$6: Function1) is a function literal). Before: 30 ins, after: 94 ins.
  inlined scala/runtime/ScalaRunTime$.array_length (the callee is annotated `@inline`). Before: 94 ins, after: 110 ins.
  [...]
  rewrote invocations of closure allocated in my/project/C.f with body $anonfun$f$1: INVOKEINTERFACE scala/Function1.apply (Ljava/lang/Object;)Ljava/lang/Object; (itf)
 inlined my/project/C.$anonfun$f$1 (the callee is a synthetic forwarder method). Before: 654 ins, after: 666 ins.
 inlined scala/runtime/BoxesRunTime.boxToInteger (the callee is a forwarder method with boxing adaptation). Before: 666 ins, after: 674 ins.