Pass by Const Reference or Value (original) (raw)
ISO/IEC JTC1 SC22 WG21 N3538 - 2013-03-06
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Introduction
Background
Problem
Approaches
Solution
Either Or
Undefined Behavior
Compiler Dealiasing
Discussion
Revision History
References
Introduction
Efficiency and expressiveness are hallmarks of C++, but sometimes those hallmarks conflict in ways that force programmers to compromise on one or the other. The progammer's choice of passing a given argument by const reference or by value is one of those compromises. That compromise has become more of a problem with modern calling conventions.
In this paper, we describe the problem, discuss possible approaches to reduce the problem, and explore a solution that introduces a new language feature.
Background
Consider the following code for some class type
.
`extern type va1(type input); extern type va2(type input);
void vf1(type& output, type input) { output += va1(input); output += va2(input); }`
Passing class types by value is expensive, as the compiler must often
- allocate a temporary local variable of the type (increasing cache pressure),
- copy the bytes of the argument to the temporary,
- pass a pointer to the temporary into the function,
- access the bytes of the parameter indirectly, and
- deallocate the temporary on return.
To avoid this expense, the traditional response in C is to pass the type "by pointer". However, in C++, the type can be passed by const reference, which provides both the syntactic convenience of pass-by-value and the efficiency of pass-indirectly.
The performance difference, coupled with the convenience, has resulted in an automatic tendency of programmers to pass classes by const reference. The class template complex
(26.4 Complex numbers [complex.numbers]) is a prime example of this tendency.
Unfortunately, the semantics are not the same, as there is now the potential for a parameter to alias with other objects. In the following rewrite of the above example, the input parameter may alias with the output parameter and so the first statement may modify the input parameter, thus introducing an error into the code at the second statement.
`extern type ra1(const type& input); extern type ra2(const type& input);
void rf1(type& output, const type& input) { output += ra1(input); output += ra2(input); }`
Programmers generally have three responses.
Ignore aliasing.
As actual aliasing is uncommon, programmers often forget to consider it, or decide that it will "never happen".
Document aliasing.
The programmer of a function may document when arguments may not alias. In C, the
restrict
type qualifier[C11restrict]provides this documentation. In essence, this approach gives up on trying to match the operational semantics of primitive scalar types.Overcome aliasing.
Programmer of a function may write extra code to overcome aliasing. This response eases the burden on the calling programmer at the expense of some loss in efficiency.
The simplest technique to overcoming aliasing is to copy the potentially aliasing parameter.
void rf2(type& output, const type& input) { type temp = input; output += ra1(temp); output += ra2(temp); }
This technique is both more complex and less efficient than simply passing the parameter by value. While the technique can be useful when dealing with legacy interfaces, it should not be a primary technique.
The next technique is to detect the aliasing and copy the parameter only when necessary.
void rf3(type& output, const type& input) { if ( &output == &input ) { type temp = input; output += ra1(temp); output += ra2(temp); } else { output += ra1(input); output += ra2(input); } }
This technique introduces a comparison and may introduce an instruction pipeline bubble. For small classes, this overhead may exceed the cost of simply copying the class.
The next technique is to write the function in two phases. The first phase reads from parameters and writes only to temporaries, not potentially aliasable objects. The second phase writes results. This technique may not be possible if subordinate function calls mutate arguments.
void rf4(type& output, const type& input) { type temp1 = ra1(input); type temp2 = ra2(input); output += temp1; output += temp2; }
This technique requires allocating more local storage, which puts pressure on the memory cache.
One can also mix the last two techniques.
void rf5(type& output, const type& input) { if ( &output == &input ) { type temp1 = ra1(input); type temp2 = ra2(input); output += temp1; output += temp2; } else { output += ra1(input); output += ra2(input); } }
The value in this technique depends on several attributes of the platform.
Finally, even when programmers properly avoid aliased writes, there is additional expense in reading the parameters because the compiler may be unable to determine that there are no aliases, and hence keep values in memory rather than in registers.
Problem
The latest generation of calling conventions will pass small classes in registers, provided that those classes have a trivial copy constructor and trivial destructor. (This convention was often introduced with the introduction of 64-bit addresses, such as AMD64 [AMD64abi], IA-64 [IA64abi], and SPARC V9 [SparcV9abi].) Thus small classes can have value parameter performance near that of primitive scalar types. Just as importantly, aliasing for such parameters is not possible, and thus they are safer.
The problem is that a programmer writing portable software (or defining widely used interfaces) cannot know whether the software will run on an old calling convention or on a new one. Furthermore, the programmer cannot know what constitutes 'small'. Thus, the programmer has no choice but to pessimize some platforms.
Small classes are quite common. They are used for handles, numbers, coordinates, etc. These classes are often critical to overall application performance. Being unable to write efficient portable programs for these classes is a significant problem.
Approaches
There are at least three approaches to reducing the problem.
Pass small classes by value.
We can change our habits to pass trivially copyable classes of sixteen bytes (two
double
s) or less by value. With this approach, complex numbers would primarily be passed by value, not by reference as they are in the class templatecomplex
(26.4 Complex numbers [complex.numbers]). This approach avoids aliasing issues, enables removing some indirect references when accessing parameters, but may introduce copying on older platforms. The net performance change is unclear.Add the
restrict
qualifier to C++.The
restrict
qualifier would at least enable the compiler to remove some indirect references when accessing reference parameters. This approach effectively requires the calling programmer to avoid aliasing. It also does not take full advantage of modern calling conventions.Add another language feature.
Adding a more carefully targeted language feature may make code less sensitive to the platform and enable compilers to better optimize the program.
As the first two approaches are relatively well understood, we consider only the third approach in detail.
Solution
Our solution for the third approach is to introduce a syntax for 'input' parameters. These parameters give the compiler the choice of passing the parameter by reference or by value. The Ada programming languages uses this solution for its parameter modes[AdaLRMparam] [AdaRDparam].
Pragmatically, the choice of how to pass such parameters will be defined by some combination of the platform ABI[AMD64abi] [IA64abi] [SparcV9abi]and the C++ ABI[ItaniumCXXabi] [SparcCXXabi].
As a strawman, we propose a ptr-operator of a declaratorof the form
ptr-operator::
|
attribute-specifier-seqoptcv-qualifier-seqopt
but with the additional constraints that the const
qualifier must be present and that the operator may only be used on parameters.
Example vf1 becomes:
`extern type oa1(const type| input); extern type oa2(const type| input);
void of1(type& output, const type| input) { output += oa1(input); output += oa2(input); }`
This example works well when the 'input' parameter is actually passed by value. However, when passed by reference, aliasing issues arise and the feature needs detailed semantics for aliasing. We discuss various choices in these semantics below.
Either Or
One semantics choice is that the parameter is either exactly const reference or exactly (const) value. The programmer may assume only the intersection of guarantees.
- Because the parameter might be a const reference, the programmer must assume the parameter may alias.
- Because the parameter might be a value, the programmer must assume the parameter may be a distinct object from all others.
- Because the parameter might be a value, the programmer may not assume a pointer to the parameter will persist beyond the lifetime of the function.
- More importantly, value parameters are sliced and const reference parameters are not, so the virtual semantics of such parameters would be significantly different. So, it might be unwise to use this parameter mode with polymorphic classes.
Under this choice, overcoming aliasing with the technique used in rf3has the virtue that if pass-by-value is chosen, then the condition becomes statically false, and the condition and corresponding dead code may be eliminated. There is no loss in efficiency for the fast case.
Undefined Behavior
Another choice is to make aliasing undefined behavior. This approach has been used, for example, in Fortran66[Fortran66args], Ada83[AdaLRMparam] [AdaRDparam], and C11[C11restrict]. Taking this approach in C++ would not be novel.
In practice, read-read aliasing causes no trouble, so we only need to make write-write and read-write aliasing undefined. Going further, (non-concurrent) writes to mutable members that do not change the abstract state of an object produce well-defined behavior.
Fortran, Ada, and C make the prohibition on aliases a global program property, which makes it an incomputable property. In the worst case, this approach could lead to many latent bugs. However, the approach has proven workable in practice because programmers need only follow a few rules to avoid the problem. First, calling programmers ensure that arguments do not alias each other. Second, called programmers ensure that the functions do not access objects accessible by callers.
While in this choice programmer are ultimately responsible for avoiding aliasing, static analysis tools and runtime checks can help programmers substantially.
Compiler Dealiasing
We can choose to require that the compiler dealias the arguments. This dealiasing is really only feasible for aliasing between parameters. We must rely on programmers to avoid aliases on non-local objects.
Only this choice tries to preserve the illusion that operations on classes are the same as operations on primitive types.
There are a few strategies in implementing the dealiasing. In all cases the alias checking is potentially O(n2) in the number of arguments.
Detect in Callers
The compiler does alias analysis at the call site and copies arguments that it cannot prove are unaliased. Many aliases can be eliminated a-priori because temporary values cannot be aliased. On the other hand, the compiler will not know which parameters are potentially conflicting within the function implementation.
Detect in Callees
The compiler does conflict analysis in the function implementation and copies parameters that alias with another. This approach limits dealiasing to the function body, rather than the more numerous function call sites. There is no argument information available to avoid some of the checks.
Leak Information
The compiler can leak information about conflicts from the callee to the caller, enabling a joint detection in both callers and callee. Then the caller can make minimal copies, only those at the intersection of aliases and conflicts. While this strategy works well in source-based environments, it works less well on systems exploiting ABIs. (Thanks to Chandler Carruth.)
Multiple Implementations
The compiler can produce multiple instances of the function, each representing protecting against some subset of conflicts. At the call site, the compiler matches the aliases against the versions of the callee and chooses to call the one needing the least number of dealiasing copies. (Thanks to Chandler Carruth.)
Discussion
Suppose we choose to adopt both of the first two approaches above — pass small classes by value and add the restrict
qualifier. What do we then loose by not adopting the 'input' parameter solution presented?
- We lose the effective prohibition on argument references persisting pass the function return.
- We lose the illusion of class operations behaving like primitive type operations.
- We lose the documentation or enforcement of intent that the parameter is not intended to be polymorphic.
Revision History
This paper revises N3445 = 12-0135 - 2012-09-23 as follows.
- Make editorial corrections.
- Add a 'Revision History' section.
References
[AdaLRMparam]
Ada '83 Language Reference Manual, Section 6.2 Formal Parameter Modes,http://archive.adaic.com/standards/83lrm/html/lrm-06-02.html#6.2
[AdaRDparam]
Rationale for the Design of the Ada Programming Language, Section 8.2 Parameter Modeshttp://archive.adaic.com/standards/83rat/html/ratl-08-02.html#8.2
[AMD64abi]
System V Application Binary Interface, AMD64 Architecture Processor Supplement, Draft Version 0.99.6,http://www.x86-64.org/documentation/abi.pdf, Michael Matz, Jan Hubička, Andreas Jaeger, Mark Mitchell, July 2012, Chapter 3: Low-Level System Information, 3.2 Function Calling Sequence
[C11restrict]
ISO/IEC 9899:2011 Programming languages -- C,http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1570.pdf, Section 6.7.3.1 Formal definition of restrict
[Fortran66args]
USA Standard FORTRAN (USAS X3.9-1966), (also known as Fortran 66),ftp://ftp.nag.co.uk/sc22wg5/ARCHIVE/Fortran66.pdf, section 8.3.2 Referencing External Functions, section 8.4.2 Referencing Subroutines
[IA64abi]
Itanium Software Conventions and Runtime Architecture Guide,http://www.intel.com/content/dam/www/public/us/en/documents/guides/itanium-software-runtime-architecture-guide.pdf, Section 8.5 Parameter Passing
[ItaniumCXXabi]
Itanium C++ ABI,http://mentorembedded.github.com/cxx-abi/abi.html
[SparcCXXabi]
The C++ Applicatio Binary Interface, SPARC Processor SupplementSun Microsystems, Inc., December 1995, Section 3.3: Function Calling Sequence
[SparcV9abi]
SPARC Compliance Definition 2.4.1 http://www.sparc.org/standards/64.psabi.1.34.ps.Z, SPARC Internatlional, Inc., July 1999, Chapter 3: Low-Level System Information, Low-Level System Information (64-bit psABI), Function Calling Sequence