KS2012: ARM: A big.LITTLE update

The ARM big.LITTLE architecture is an asymmetric multi-processor platform, with powerful, power-hungry processors coupled to less powerful (in both senses) CPUs that share the same instruction set. That design presents some challenges for the Linux scheduler. Paul McKenney gave a readout of the status of big.LITTLE support at the ARM minisummit; he meant the talk mostly as an "advertisement" for the scheduling micro-conference at the Linux Plumbers Conference that started the next day.

The idea behind big.LITTLE is to do frequency and voltage scaling by other means, he said. Because of limitations imposed by physics, there is a floor to frequency and voltage scaling on any given processor, but that can be worked around by adding another processor with fewer transistors. That's what has been done with big.LITTLE.
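
The standard first-order model for dynamic CPU power (a textbook approximation, not something spelled out in the talk) shows why:

    P_{\mathrm{dyn}} \approx C \, V^2 f

Here C is the effective switched capacitance (which grows with the number of transistors), V is the supply voltage, and f is the clock frequency. Once V has been pushed down to its floor, the remaining knob is C, and a core with fewer transistors is precisely a way of turning that knob.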

There are basically two ways to expose a big.LITTLE system to Linux. The first is to treat each pair as a single CPU, switching between them "almost transparently". That approach has the advantage of requiring almost no changes to the kernel; applications don't know that anything has changed. But there is a delay involved in making the switch, which isn't taken into account by the power-management code, so the power savings aren't as large as they could be. In addition, that approach requires paired CPUs (i.e. one of each size), but some vendors are interested in having one little and many big CPUs in their big.LITTLE systems.
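
As a rough model of the paired-CPU approach (invented for illustration; this is not any vendor's actual switcher code, and the frequency is made up), the pair can be treated as one logical CPU whose upper operating points live on the big core:

    enum core { CORE_LITTLE, CORE_BIG };

    #define LITTLE_MAX_KHZ 1000000    /* hypothetical little-core maximum */

    struct virtual_cpu {
        enum core active;             /* which physical core is running */
        unsigned int cur_khz;
    };

    /* Map a requested operating point on the logical CPU to a core. */
    static void set_virtual_freq(struct virtual_cpu *vcpu, unsigned int khz)
    {
        enum core want = (khz > LITTLE_MAX_KHZ) ? CORE_BIG : CORE_LITTLE;

        if (want != vcpu->active) {
            /*
             * Switching means migrating processor state between the
             * cores; it is this delay that the power-management code
             * does not currently take into account.
             */
            vcpu->active = want;
        }
        vcpu->cur_khz = khz;
    }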

The other way to handle big.LITTLE is to expose all of the processors to Linux, so that the scheduler can choose where to run its tasks. That requires more knowledge of the behavior of processes, so Paul Turner has a patch set that gathers that kind of information. Turner said that the scheduler currently takes averages on a per-CPU basis, but, when processes move between CPUs, some information is lost. His changes cause the load average to move with the processes, which will allow the scheduler to make better decisions.
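
To make the idea concrete, here is a minimal sketch of per-entity tracking. The kernel code uses fixed-point arithmetic and precomputed tables; the floating point below is purely for clarity, though the decay matches the patches' design point of load from 32 periods ago counting half as much:

    #include <stdio.h>

    /*
     * Each ~1ms period in which an entity was runnable contributes to
     * a geometrically decayed sum, so a task's history travels with it
     * when it migrates.  DECAY is y with y^32 == 0.5.
     */
    #define DECAY 0.978572

    struct entity_load {
        double runnable_avg;          /* decayed sum of runnable periods */
    };

    /* Account one elapsed period; runnable is 1 if the task could run. */
    static void update_load(struct entity_load *e, int runnable)
    {
        e->runnable_avg = e->runnable_avg * DECAY + (runnable ? 1.0 : 0.0);
    }

    int main(void)
    {
        struct entity_load e = { 0.0 };
        int i;

        for (i = 0; i < 100; i++)     /* run for 100 periods... */
            update_load(&e, 1);
        printf("after running: %.2f\n", e.runnable_avg);

        for (i = 0; i < 32; i++)      /* ...then sleep for 32 */
            update_load(&e, 0);
        printf("after 32 idle periods: %.2f (halved)\n", e.runnable_avg);
        return 0;
    }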

Turner's patches are on their third revision, and have been "baking on our systems at Google" for a few months. There are no real to-dos outstanding, he said. Peter Zijlstra said that he had wanted to merge the previous revision, but that there was "some funky math" in the patches, which has since been changed. Turner said that he measured a 3-4% performance increase using the patches, which means we get "more accurate tracking at lower cost". It seems likely that the patches will be merged soon.

McKenney said that Turner's patches have been adapted by Morten Rasmussen for use on big.LITTLE systems. The measurements are used to try to determine where a task should be run. Over time, though, a task's behavior can change, so the scheduler checks to see whether that has happened and whether the placement still makes sense. There are still questions about when "racing to idle" makes more sense than spreading tasks around, and there have been some related discussions on the linux-kernel mailing list recently.
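
A much-simplified sketch of what such placement logic might look like (the thresholds and the hysteresis here are invented for illustration; they are not Rasmussen's actual values or code):

    enum cluster { CLUSTER_LITTLE, CLUSTER_BIG };

    #define UP_THRESHOLD   80    /* % of little-CPU capacity; made up */
    #define DOWN_THRESHOLD 20    /* hysteresis to avoid bouncing; made up */

    /* Revisit a task's placement as its tracked load estimate changes. */
    static enum cluster select_cluster(int load_pct, enum cluster current)
    {
        if (load_pct > UP_THRESHOLD)
            return CLUSTER_BIG;       /* too heavy for a little CPU */
        if (load_pct < DOWN_THRESHOLD)
            return CLUSTER_LITTLE;    /* light enough to save power */
        return current;               /* in between: stay put */
    }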

Currently, CPU hotplug is a less-than-ideal mechanism for removing CPUs that have gone idle, but Thomas Gleixner is reworking things to "make hotplug suck less", McKenney said. For heavy workloads, the process of offlining a processor can take multiple seconds; after Gleixner's rework, that drops to 300ms, an order-of-magnitude decrease. Part of the solution is to remove stop_machine() calls from the offlining path. There are multiple reasons for making hotplug work better, McKenney said, including improving read-copy update (RCU), reducing realtime disruption, and providing a low-cost way to clear things off of a CPU for a short time. He also noted that it is not an ARM-only problem that is being solved here; x86 suffers from significant hotplug delays too.
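
The operation in question is reachable from user space through the standard sysfs interface; a small program like the following (which needs root, and assumes cpu1 exists and is online) is one way to observe that offlining latency:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /*
     * Write "0" or "1" to a CPU's sysfs online file to offline or
     * online it; the multi-second (soon ~300ms) latency discussed
     * above is the time this write takes to complete.
     */
    static int set_cpu_online(int cpu, int online)
    {
        char path[64];
        int fd, ret;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/online", cpu);
        fd = open(path, O_WRONLY);
        if (fd < 0) {
            perror("open");
            return -1;
        }
        ret = (write(fd, online ? "1" : "0", 1) == 1) ? 0 : -1;
        close(fd);
        return ret;
    }

    int main(void)
    {
        if (set_cpu_online(1, 0))     /* take cpu1 down... */
            return 1;
        sleep(1);
        return set_cpu_online(1, 1);  /* ...and bring it back */
    }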

The session finished up with a brief discussion of how to describe the architecture of a big.LITTLE system to the kernel. Currently, each platform has its own way of describing the processors and caches in its header files, but a more general way, perhaps using device tree or some kind of runtime detection mechanism, is desired.
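
For the runtime-detection option, the obvious source of truth on ARM is the Main ID Register (MIDR), whose part-number field distinguishes, for example, a Cortex-A15 from a Cortex-A7. A sketch of the idea (kernel-mode code, since the CP15 read is privileged, and it must run on the CPU being probed):

    /* Part-number field of the ARM Main ID Register (MIDR[15:4]). */
    #define MIDR_PARTNUM(midr)   (((midr) >> 4) & 0xfff)
    #define PART_CORTEX_A15      0xc0f    /* "big" */
    #define PART_CORTEX_A7       0xc07    /* "LITTLE" */

    static unsigned int read_midr(void)
    {
        unsigned int midr;

        /* MIDR lives in CP15 c0; this read is privileged on ARMv7. */
        asm("mrc p15, 0, %0, c0, c0, 0" : "=r" (midr));
        return midr;
    }

    /* Classify the CPU this code is running on. */
    static int cpu_is_big(void)
    {
        return MIDR_PARTNUM(read_midr()) == PART_CORTEX_A15;
    }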
