Array descriptors (original) (raw)

Tobias Burnus

unread,

Oct 10, 2008, 4:12:03 AM10/10/08

Hello all,

this is more an implementation question than a Fortran standard
question, but it relates to the latter due to the upcoming TR 29113
(further interoperability with C).

(An old proposal, which was the starting point for the TR, can be
found at http://j3-fortran.org/doc/year/06/06-171.txt which is new
enough for the issue below.)

There seem to be two ways of implementing the array descriptors ("dope
vectors") internally:

a) base_address

and

b) base_address
offset
where the base_address plus the offset defines the first memory
location (start) of the array.

[Additionally there are of cause the array bounds, some flags
(allocatable, pointer), length-kind parameters etc., but they are not
relevant for my question.]

(a) is used by Pathscale [1] and in the current draft of TR 29113,
(b) is used by Intel [2], gfortran and g95

Off-hand I don't see the advantage of using an offset, but I'm
positive there are advantage. Does anyone know -- off hand or with
thinking -- the pros and cons for either solution?

Tobias

PS: My motivation is twofold: (a) I want to see an optimal TR and (b)
gfortran's array descriptors need to be modified to fix some issues
and the idea was to use the TR-format also internally and not only for
C interoperability; especially as the current draft-TR has no "offset"
while gfortran does, one needs to decide whether one keeps the offset
(and maybe even adds it in the TR) or whether one gets rid of it in
gfortran.

[1] http://www.pathscale.com/ws/docs/UserGuide.pdf
[2] http://www.intel.com/software/products/compilers/docs/flin/main_for/mergedProjects/bldaps_for/common/bldaps_hndl_arrdesc.htm

glen herrmannsfeldt

unread,

Oct 10, 2008, 5:39:10 AM10/10/08

Tobias Burnus wrote:

> this is more an implementation question than a Fortran standard
> question, but it relates to the latter due to the upcoming TR 29113
> (further interoperability with C).

(snip)

> There seem to be two ways of implementing the array descriptors ("dope
> vectors") internally:

> a) base_address
>
> and

> b) base_address
> offset

I am not so sure I understand the difference, but there was a discussion
some time ago (and maybe in a different newsgroup) on the VAX descriptor.

(The calling convention was part of the VAX architecture such that it
would be OS and language independent.)

The VAX descriptor has the address of the beginning of the allocated
space, and also the virtual origin, that is, the address where element
(0,0,0,...) would be, whether or not it is part of the array.

That seems redundant, but is actually needed such that calls within
and between PL/I and Fortran, not to mention other languages with
array descriptors. (And in 1976, before Fortran had dynamic allocation.)

In PL/I, the called routine gets the upper and lower limits from
the calling routine. In Fortran, the called routine uses a lower
limit of 1, or a different one it specifies. PL/I can use the
virtual origin and the appropriate multipliers. Fortran can use
the real origin and the appropriate multipliers after subtracting
the (local) lower limit for each dimension.

Maybe that helps, I am not sure which offset you are describing.

-- glen

robert....@sun.com

unread,

Oct 10, 2008, 6:01:27 AM10/10/08

On Oct 10, 1:12 am, Tobias Burnus <bur...@net-b.de> wrote:
> Hello all,
>
> this is more an implementation question than a Fortran standard
> question, but it relates to the latter due to the upcoming TR 29113
> (further interoperability with C).
>
> (An old proposal, which was the starting point for the TR, can be

> found at http://j3-fortran.org/doc/year/06/06-171.txtwhich is new

> [2]http://www.intel.com/software/products/compilers/docs/flin/main_for/m...

Sun Fortran uses a third option. Sun Fortran uses an actual
origin, your base address, and a virtual origin, the address
of the possibly hypothetical array element indexed by
subscripts all of whose values are zero. The virtual origin
serves the same purpose as the offset you described, but it
saves an addition or subtraction (depending on which way the
offset goes).

The virtual origin provides a more efficient way of finding
the address of an array element given a sequence of
subscripts. For example, the canonical method for
computing the postition of an array element in a rank 3
array given in Section 6.5.3.2 of CD 1539-1 is

1 + (s1 - j1) + (s2 - j2)*d1 + (s3 - j3)*d2*d1

where s1, s2, and s3 are the subscript values, j1, j2, and
j3 are the lower bounds and d1 and d2 are the extent of the
firsrt and second dimensions. The extents are invariant as
long as the array is allocated, and the lower bounds are
usually fixed for relatively long times. Therefore,
regrouping the computation as

(1 + s1 + s2*d1 + s3*d2*d1) - (j1 + j2*d1 + j3*d2*d1)

allows the value of the second term to be cached and reused
as needed in subscript computations, which avoids the need
for three subtractions in the subscript caluculation. The
value of that term or its negation is the offset to which
you referred.

The offset (or vitual origin) is unnecessary, it is merely
an artifice to gain a bit of efficiency.

Bob Corbett

paul.rich...@gmail.com

unread,

Oct 10, 2008, 7:17:07 AM10/10/08

Dear All,

> The offset (or vitual origin) is unnecessary, it is merely
> an artifice to gain a bit of efficiency.

I guess that one reason to have the base address, which necessitates
the offset or virtual origin, is the need to be able to free or to
reallocate the memory for pointers and allocatable arrays in scopes
other than the parent scope of the array.

A related question is that of the representation of the 'dimension'
structure:

gfortran and g95 use; stride, lower_bound and upper_bound, largely
because the data is represented as an array of the element type and
referencing is done modulo the number of bytes for the alignment of
the type. If I have understood correctly, this takes maximum
advantage of the information held in the backend about the
architecture of the target.

Other compilers and the proposed TR use; lower_bound, extent (number
of elements along the dimension) and sm (distance between elements
in bytes along the dimension.)

One of our motives for changing the gfortran descriptor was to
progress from the first representation to the second. However, it is
not not evident to me that it matters. Constant folding in the
backend of the compiler should result in the same arithmetic being
done at runtime.

Thus, other than the not-inconsiderable work being good for our
immortal souls, does any body know of a benefit that would be gained
from changing the representation? Providing the necessary interface
for c_interop is relatively trivial in either case. The addition of
the missing flags can be done without much disruption.

Thanks in advance for your thoughts.

Paul

PS For your reference:
The Gfortran Array Descriptor

typedef struct descriptor_dimension
{
index_type stride;
index_type lbound;
index_type ubound;
}
descriptor_dimension;

#define GFC_ARRAY_DESCRIPTOR(r, type) \
struct {\
type *data;\
size_t offset;\
index_type dtype;\
descriptor_dimension dim[r];\
}

For a 32 bit machine;

• The first longword (bytes 0 to 3) contains the base address. The
base address plus the offset defines the first memory location (start)
of the array.
• The second longword (bytes 4 to 7) contains the offset. The offset
is added to the base address to define the start of the array.
• Third longword (bytes 8 to 11) contains bits 1-3 'rank' , bits 4-6
'type' , bits 7-32 'size' bytes
• The remaining longwords (bytes 12 to 95) contain information about
each dimension (up to seven). Each dimension is described by three
additional longwords:
o The stride
o The lower bound
o The upper bound

robert....@sun.com

unread,

Oct 11, 2008, 1:08:17 AM10/11/08

> > The offset (or virtual origin) is unnecessary, it is merely

> > an artifice to gain a bit of efficiency.
>
> I guess that one reason to have the base address, which necessitates
> the offset or virtual origin, is the need to be able to free or to
> reallocate the memory for pointers and allocatable arrays in scopes
> other than the parent scope of the array.

Just as the virtual origin can be computed from the actual
origin (base address), the strides, and the lower bounds,
the actual origin can be computed from the virtual origin,
the strides and the lower bounds.

> A related question is that of the representation of the 'dimension'
> structure:
>
> gfortran and g95 use; stride, lower_bound and upper_bound, largely
> because the data is represented as an array of the element type and
> referencing is done modulo the number of bytes for the alignment of
> the type. If I have understood correctly, this takes maximum
> advantage of the information held in the backend about the
> architecture of the target.
>
> Other compilers and the proposed TR use; lower_bound, extent (number
> of elements along the dimension) and sm (distance between elements
> in bytes along the dimension.)
>
> One of our motives for changing the gfortran descriptor was to
> progress from the first representation to the second. However, it is
> not not evident to me that it matters. Constant folding in the
> backend of the compiler should result in the same arithmetic being
> done at runtime.
>
> Thus, other than the not-inconsiderable work being good for our
> immortal souls, does any body know of a benefit that would be gained
> from changing the representation?

Yes. array operations can be coded more efficiently
using extents rather than lower and upper bounds.
When extents are used, array operations scalarize into
nested loops from the extent down to zero or, for some
machines, from the negation of the extent up to zero.
For such loops, it is usually better to use the actual
origin than the virtual origin. The virtual origin
tends to be better when explicit subscripts are involved.
For array operations, where array indexing is implicit,
the actual origin tends to be better.

BTW, Sun Fortran almost never references the lower bounds
that are contained in an array pointer. The lower bounds
are not needed for either explicit subscripting or array
operations. The lower bounds are needed to implement
LBOUND and UBOUND, and that is about it. For that reason,
Sun Fortran puts the lower bounds at the end of the array
pointer.

Some people argued against including the virtual origin as
part of an array pointer. They argued that the time spent
computing the virtual origin was unlikely to be recouped
by the savings in time for array indexing. The argument is
particularly string for rank 1 array pointers.

Bob Corbett

paul.rich...@gmail.com

unread,

Oct 11, 2008, 3:37:38 AM10/11/08

Bob,

> Yes. array operations can be coded more efficiently
> using extents rather than lower and upper bounds.
> When extents are used, array operations scalarize into
> nested loops from the extent down to zero or, for some
> machines, from the negation of the extent up to zero.
> For such loops, it is usually better to use the actual
> origin than the virtual origin. The virtual origin
> tends to be better when explicit subscripts are involved.
> For array operations, where array indexing is implicit,
> the actual origin tends to be better.

Ah yes. Quite a lot of the gfortran scalarizer is devoted to sorting
out the loop ranges:-) I don't think that we take a runtime
performance hit though.

Cheers

Paul

robert....@sun.com

unread,

Oct 11, 2008, 6:12:10 AM10/11/08

The hit might be small, especially if the array operation
being done is slow, but I suspect gfortran does take a
performance hit.

Bob Corbett

glen herrmannsfeldt

unread,

Oct 11, 2008, 11:56:29 PM10/11/08

robert....@sun.com wrote:
(snip)

> Sun Fortran uses a third option. Sun Fortran uses an actual
> origin, your base address, and a virtual origin, the address
> of the possibly hypothetical array element indexed by
> subscripts all of whose values are zero. The virtual origin
> serves the same purpose as the offset you described, but it
> saves an addition or subtraction (depending on which way the
> offset goes).

> The virtual origin provides a more efficient way of finding
> the address of an array element given a sequence of
> subscripts. For example, the canonical method for
> computing the postition of an array element in a rank 3
> array given in Section 6.5.3.2 of CD 1539-1 is

> 1 + (s1 - j1) + (s2 - j2)*d1 + (s3 - j3)*d2*d1

I thought for Fortran, where the called routine doesn't
necessarily use the same virtual origin, it wouldn't be
so useful. If you want to allow calls between Fortran
and PL/I, though, it helps to have both.

> where s1, s2, and s3 are the subscript values, j1, j2, and
> j3 are the lower bounds and d1 and d2 are the extent of the
> firsrt and second dimensions. The extents are invariant as
> long as the array is allocated, and the lower bounds are
> usually fixed for relatively long times. Therefore,
> regrouping the computation as

> (1 + s1 + s2*d1 + s3*d2*d1) - (j1 + j2*d1 + j3*d2*d1)

> allows the value of the second term to be cached and reused
> as needed in subscript computations, which avoids the need
> for three subtractions in the subscript caluculation. The
> value of that term or its negation is the offset to which
> you referred.

-- glen

Jugoslav Dujic

unread,

Oct 13, 2008, 11:14:23 AM10/13/08

Tobias Burnus wrote:
> b) base_address
> offset
> where the base_address plus the offset defines the first memory
> location (start) of the array.

> [Additionally there are of cause the array bounds, some flags
> (allocatable, pointer), length-kind parameters etc., but they are not
> relevant for my question.]

(snip)

> (a) is used by Pathscale [1] and in the current draft of TR 29113,
> (b) is used by Intel [2], gfortran and g95

(snip)

I'm in a hurry now, but I think you misinterpreted, at least in the
Intel case, (and possibly even for the gnu family) -- the statement

"the base_address plus the offset defines the first memory

location (start) of the array." is false, (If I understood you well)

For Intel descriptors, "base_address" of 1-D array X means

LOC(X(LBOUND(X,1)))

that is, LOC(X(1)) for 1-based arrays.

As Bob Corbett pointed out, offset is used only as optimization sugar.
It corresponds to the "virtual" location of X(0), (or X(0,0,...) for
multi-dimensional arrays); For 1-d array X, it is typically negative:

offset = LOC(X(0)) - LOC(X(LBOUND(X,1)))

Thus, by examining the descriptor, you can always tell the location
of the first element, without looking at the offset.

--
Jugoslav
www.xeffort.com
Please reply to the newsgroup.
You can find my real e-mail on my home page above.