Runtime-sized arrays with automatic storage duration (original) (raw)
N3467=12-0157
revision of N3412=12-0102
Jens Maurer
2012-10-29
Status
The proposal presented in this paper was approved by EWG and handed over to CWG during the Portland meeting of WG21.
Changes compared to N3366
- discuss array of runtime bound in range-based for statement
- multidimensional arrays with multiple runtime bounds is an open issue
- adjust grammar in 8 dcl.decl
- for "bound is 0", range-checking of bound, and "too many initializers", behave as-if in array-new
Changes compared to N3412
- removed open issues after EWG discussion: no support for range-based for, only have the outermost dimension as a non-constant
- rename exception from std::bad_array_new_length to std::bad_array_length
- permitted allocation on heap
- change 4.2 conv.array to accomodate arrays with no elements
Motivation
Sometimes, a user wishes to allocate a local array whose size is not known at compile-time, but at runtime only. Nonetheless, the array's size will remain unchanged during the lifetime of the array.
Examples are
- interfacing with the readv/writev POSIX system calls
- small runtime-sized data structure to apply STL algorithms calls
- ...
This paper proposes to add local runtime-sized arrays with automatic storage duration to C++, for example:
void f(std::size_t n) { int a[n]; for (std::size_t i = 0; i < n; ++i) a[i] = 2*i; std::sort(a, a+n); }
Traditionally, the array bound "n" had to be a constant expression (see 8.3.4 dcl.array). For local arrays with automatic storage duration, this paper proposes to lift that restriction. The syntax is intended to be the same as that used for C99 variable length arrays (VLAs).
As a design guideline, the same rules should apply to "new T[n]
" and a local array "T a[n]
".
There is well-established existing practice with gcc, Clang, and Intel C++ all implementing a similar, if not identical feature. In fact, Douglas Gregor reported in c++std-ext-12553 on 2012-01-30:
Users really seem to want this feature. It's a fairly common extension, and when we tried to ban it out of principle (in Clang), our users reacted *very* strongly.
Scope
This paper does not propose to add all features of C99 variable length arrays to C++. In particular, the following features are explicitly excluded:
- multidimensional arrays, where other than the top level has a runtime bound (in analogy, array-new doesn't support that either)
- modifications to the function declarator syntax
- sizeof(a) being a runtime-evaluated expression returning the size of a
- "typedef int a[n];" evaluating "n" and passing that through the typedef Examples:
void f(std::size_t n) { int a[n]; unsigned int x = sizeof(a); // ill-formed const std::type_info& ti = typeid(a); // ill-formed typedef int t[n]; // ill-formed }
Discussion
Data structures that allocate from the heap access, by design, a global resource that is often highly contended in a multi-threaded program. Therefore, avoiding heap allocations is usually advantageous for performance. Allocating such data from the stack is much more efficient, because the stack is local to each thread and bytes on the stack are often cached locally. (Since each thread has a separate stack, it is unlikely that another thread on another CPU accesses the same data, thereby causing more expensive cache invalidations.)
The syntax does not require additional keywords. Instead, a restriction on the existing array declaration syntax is lifted in certain circumstances.
There is no reason to limit the feature to PODs as array element types, thus such a limitation is not proposed.
Stack overflow becomes more likely, in particular if the size depends on external input and is not properly checked. Some environments might therefore prohibit the use of the feature. Such a prohibition can be easily enforced with a static analysis tool.
There is no longer an upper bound on the size of a function's stack frame. This makes static analysis of stack usage harder.
This proposal now allows
template<int ...N> void f() { int x[] = { N ... }; }
to be well-formed even for f<>()
, although "x" will become an array of runtime bound for the zero-element case.
Alternatives
The following alternatives were considered:
- std::dynarray as presented in N2648 "C++ Dynamic Arrays" by Lawrence Crowl and Matt Austern: stack allocation is left as QoI, requiring "compiler detection" when using dynarray locally; also usable with static storage duration
- alloca(): not type-safe, no support for C++ constructors/destructors
- std::vector: allocates from heap, allows space "reservation", thus requiring more management data
- std::valarray: allocates from heap; additional unrelated requirements on contained type (see 26.2 numeric.requirements)
- "auto_buffer" (fixed-size local buffer plus overflow to the heap): may waste memory for fixed-size buffer; introduces additional layer of pointer indirection
Changes to the Working Draft
Change in 3.9 basic.types paragraph 10:
A type is a literal type if it is:
- ...
- an array of literal type other than an array of runtime bound; or
- ...
Change in 3.9.2 basic.compound paragraph 2 (the same wording is added by the proposed resolution of core issue 1464):
These methods of constructing types can be applied recursively; restrictions are mentioned in 8.3.1 dcl.ptr, 8.3.4 dcl.array, 8.3.5 dcl.fct, and 8.3.2 dcl.ref.Constructing a type such that the number of bytes in its object representation exceeds the maximum value representable in the type
std::size_t
(18.2 support.types) is ill-formed.
Change in 4.2 conv.array paragraph 1:
An
lvalue or rvalueexpression of type "array of N T", "array of runtime bound of T", or "array of unknown bound of T" can be converted to a prvalue of type "pointer to T". The result is a pointer to the first element of the array; if the array has no elements, the result is undefined.
Insert a new paragraph before 5.2.8 expr.typeid paragraph 2:
The
typeid
operator shall not be applied to an array of runtime bound.When
typeid
is applied to a glvalue expression ...
Change in 5.3.3 expr.sizeof paragraph 1:
... The
sizeof
operator shall not be applied to an expression that has function or incomplete type, to an enumeration type whose underlying type is not fixed before all its enumerators have been declared,to an array of runtime bound,to the parenthesized name of such types, or to an lvalue that designates a bit-field. ...
Drafting note: 5.3.7 expr.unary.noexcept does not need to be changed, because the declaration of an array of runtime bound cannot be lexically part of the operand of a noexcept; see also 5.1.2p2 expr.prim.lambda.
Change in 6.5.4 stmt.ranged paragraph 1:
- if
_RangeT
is an array type, begin-expr and_end-expr_ are__range
and__range + __bound
, respectively, where__bound
is the array bound. If_RangeT
is an array of unknownsizebound, an array of runtime bound, or an array of incomplete type, the program is ill-formed;- ...
Insert a new paragraph before 7.1.3 dcl.typedef paragraph 3:
A typedef-name shall not name an array of runtime bound.
In a given non-class scope, a
typedef
specifier can be used to redefine the name of any type declared in that scope to refer to the type to which it already refers. [ Example: ... ]
Change in 7.1.6.2 dcl.type.simple paragraph 3:
The type denoted by decltype(e) is defined as follows:
- if
e
has type "array of runtime bound", the program is ill-formed;- otherwise, if
e
is an unparenthesized ...
Change in 8 dcl.decl paragraph 4:
noptr-declarator: declarator-id attribute-specifier-seqopt noptr-declarator parameters-and-qualifiers noptr-declarator [
_constant-expression_opt_expression_opt ] _attribute-specifier-seq_opt ( ptr-declarator )
Drafting note: Section 8.1 [dcl.name] defining the grammar termtype-id is intentionally unchanged. Thus, constructing an array of runtime bound in a type-id is ill-formed, because the grammar continues to require all constant-expressions in array bounds.
Change in 8.3.1 dcl.ptr paragraph 1:
... Similarly, the optional attribute-specifier-seq (7.6.1) appertains to the pointer and not to the object pointed to. There shall be no pointers to arrays of runtime bound.
Change in 8.3.2 dcl.ref paragraph 5:
There shall be no references to references, no references to arrays of runtime bound, no arrays of references, and no pointers to references. ...
Change in 8.3.4 dcl.array paragraph 1 (partly taken from Mike Miller's drafting for core issue 1464):
In a declaration T D where D has the form
D1 [ ~~constant-expressionopt~~ expressionopt ] attribute-specifier-seqopt
and the type of the identifier in the declaration T D1 is "derived-declarator-type-list T", then the type of the identifier of D is an array type; if the type of the identifier of D contains the
auto
type-specifier, the program is ill-formed. T is called the array element type; this type shall not be a reference type, the (possibly cv-qualified) typevoid
, a function type, an array of unknown or runtime bound, or an abstract class type.Except as noted below, if the expression is omitted, the type of the identifier of D is "derived-declarator-type-list array of unknown bound of T". If the expression is present, it is implicitly converted tostd::size_t
. The expression is erroneous if:
- its value before converting to
std::size_t
or, in the case of an expression of class type, before application of the second standard conversion (13.3.3.1.2 over.ics.user) is less than zero;- its value is such that the size of the allocated object would exceed the implementation-defined limit (annex B implimits);
- the initializer of the object is a_braced-init-list_ whose number of (top-level)_initializer-clause_s exceeds the number of elements to initialize; or
- the object is initialized with a string literal and there are more initializers than there are array elements. If the expression, after converting to
std::size_t
, is a core constant expression and the_expression_ is erroneous, the program is ill-formed.- If the expression, after converting to
std::size_t
, is a core constant expression whose valueN
is greater than zero, the type of the identifier of D is "derived-declarator-type-list array of N T".- Otherwise, the type of the identifier of D is "derived-declarator-type-list array of runtime bound of T" and the value of the expression designates the number of elements
N
in the array. If the expression is erroneous, an exception of a type that would match a handler (15.3 except.handle) of typestd::bad_array_length
(18.6.2.2 xxx) is thrown.If the constant-expression (5.19 expr.const) is present, it shall be an integral constant expression and its value shall be greater than zero. The constant expression specifies the bound of (number of elements in) the array. If the value of the constant expression is N, the array has N elements numbered 0 to N-1, and the type of the identifier of D is "derived-declarator-type-list array of N T". An object of array typeIfN
is zero, an object of array type has no elements. Otherwise, itcontains a contiguously allocated non-empty set of N subobjects of type T.Except as noted below, if the constant expression is omitted, the type of the identifier of D is "derived-declarator-type-list array of unknown bound of T", an incomplete object type.The type "derived-declarator-type-list array of N T" is a different type from the type "_derived-declarator-type-list_array of unknown bound of T", see 3.9 basic.types. Any type of the form "cv-qualifier-seq array of N T" is adjusted to "array of N cv-qualifier-seq T", and similarly for "array of unknown bound of T". The optional attribute-specifier-seq appertains to the array. [ Example:typedef int A[5], AA[2][3]; typedef const A CA; // type is "array of 5 const int" typedef const AA CAA; // type is "array of 2 array of 3 const int"
void f(unsigned int n) { int a[n]; // type of "a" is "array of runtime bound of int" }
-- end example ] [ Note: ... ]
Change in 8.3.4 dcl.array paragraph 3:
When several "array of" specifications are adjacent, a multidimensional array is created
; only the first of the constant expressions that specify the bounds of the arrays may be omitted. In addition to ...
Add a new paragraph before 8.3.4 dcl.array paragraph 4:
An array of runtime bound shall only be used as the type of a local object with automatic storage duration. If the size of the array exceeds the size of the memory available for objects with automatic storage duration, the behavior is undefined [ Footnote: Implementations that detect this case are encouraged to throw an exception that would match a handler (15.3 except.handle) of type std::bad_array_length (18.6.2.2 xxx). ] It is unspecified whether a global allocation function (3.7.4 basic.stc.dynamic) is invoked to obtain storage for the array. If it is invoked, the corresponding global deallocation function is invoked to release the storage after the lifetime of the array ended. [ Footnote: Alternatively, an implementation could allocate such an array on the usual stack or obtain storage via
malloc
(20.6.13 c.malloc). ]
Change in 8.3.5 dcl.fct paragraph 8:
If the type of a parameter includes a type of the form "array of runtime bound of T", "pointer to array of unknown bound of T", or "reference to array of unknown bound of T," the program is ill-formed. [ Footnote: ... ] Functions shall not have a return type of type array or function, although they may have a return type of type pointer or reference to such things.
Change in 8.5.1 dcl.init.aggr paragraph 2:
... An empty initializer list
{}
shall not be used as theinitializer-clause for an array of unknown bound with static or thread storage duration.[ Footnote: The syntax provides for emptyinitializer-lists, but nonetheless C++ does not have zero length arrays. ]
Change in 8.5.1 dcl.init.aggr paragraph 6:
For types other than arrays of runtime bound (8.3.4 dcl.array), an
Aninitializer-list is ill-formed if the number ofinitializer-clauses exceeds the number of members or elements to initialize. [ Example: ... ]
Change in 8.5.2 dcl.init.string paragraph 2:
[ Note: There cannot be more initializers than there are array elements; see 8.3.4 dcl.array. [ Example:
char cv[4] = "asdf"; // error
is ill-formed since there is no space for the implied trailing '\0'. -- end example ] -- end note ]
Change in 9.2 class.mem paragraph 10:
Non-staticA non-static (9.4 class.static) datamembersmember shall not have incompletetypes.type or type "array of runtime bound". [ Note: In particular, a class C shall not contain a non-static member of class C, but it can contain a pointer or reference to an object of class C. ]
Change in 14.1 temp.param paragraph 7:
A non-type template-parameter shall not be declared to have floating point, class, array of runtime bound, or void type. [ Example: ... ]
Add a new section just before 18.6.2.2 new.badlength:
Class
bad_array_length
namespace std { class bad_array_length : public bad_alloc { public: bad_array_length() noexcept; }; }
The class
bad_array_length
defines the type of objects thrown as exceptions by the implementation to report an attempt to allocate an array of runtime bound with a size less than zero or greater than an implementation-defined limit (8.3.4 dcl.array).bad_array_length() noexcept;
Effects: constructs an object of class
bad_array_length
.
Remarks: the result of callingwhat()
on the newly constructed object is implementation-defined.