Support index size != pointer width · Issue #65473 · rust-lang/rust

Preliminaries

usize is the pointer-sized unsigned integer type [1].
It is also Rust's index type for slices and loops; this definition works well when pointer size corresponds to the space of indexable objects (most targets today). Informally, uintptr_t == size_t.

Note that the target pointer width is indisputably set by the LLVM data layout string.
It is fair to say that usize currently cannot differ from target_pointer_width without breaking numerous assumptions in rustc [2, 3].
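To make the coupling concrete, here is a minimal sketch of how it shows up in ordinary code on today's targets. The function names are mine and purely illustrative; the sketch only relies on size_of, usize::BITS and the target_pointer_width cfg key.

```rust
use std::mem::size_of;

/// Pointer width the compiler was configured with, read through
/// cfg(target_pointer_width). Returns 0 for a width this sketch does not list.
pub const fn configured_pointer_width() -> u32 {
    if cfg!(target_pointer_width = "16") {
        16
    } else if cfg!(target_pointer_width = "32") {
        32
    } else if cfg!(target_pointer_width = "64") {
        64
    } else {
        0
    }
}

/// Sizes in bytes of usize and of a thin raw pointer on the current target.
pub fn index_and_pointer_bytes() -> (usize, usize) {
    (size_of::<usize>(), size_of::<*const u8>())
}

/// True when all three quantities agree, as they do on mainstream targets:
/// the width of usize, the width of a thin pointer, and target_pointer_width.
pub fn usize_tracks_pointer_width() -> bool {
    let (index_bytes, pointer_bytes) = index_and_pointer_bytes();
    index_bytes == pointer_bytes && usize::BITS == configured_pointer_width()
}
```

On a mainstream 32-bit or 64-bit target I would expect usize_tracks_pointer_width() to return true. A target with 128-bit capability pointers and 64-bit indices is exactly where one of these equalities would have to give.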

Unfortunately, uintptr_t == size_t doesn't hold for all architectures. For context, I have worked on (though am not currently active in) compiling Rust for MIPS/CHERI (CHERI128) [4]. This target has 128-bit capability pointers (as declared in the data layout string), but a 64-bit processor and address space.

I also assume that we don't want programmers messing with pointers in Safe Rust, and that they shouldn't have to care how a pointer (or reference) is represented/manipulated by an architecture.

Problem

I think that more than one type is necessary here, to distinguish between the "index" or "size" component of a pointer (a la size_t), and the space required to contain a pointer (uintptr_t).
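As a rough illustration of the distinction, the sketch below models the two roles as separate types for a CHERI128-like target. IndexWord and PointerWord are hypothetical names invented for this example, not existing or proposed Rust types, and the bit layout is an assumption made only to keep the example small.

```rust
use std::mem::size_of;

/// Hypothetical size_t analogue: wide enough for any object size or index
/// in a 64-bit address space.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct IndexWord(pub u64);

/// Hypothetical uintptr_t analogue: wide enough to hold a whole 128-bit
/// capability, i.e. the address plus its metadata.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PointerWord(pub u128);

impl PointerWord {
    /// Extracts the address part.
    /// Assumption for illustration only: the address sits in the low 64 bits.
    /// Real CHERI capability encodings are more involved.
    pub fn address(self) -> IndexWord {
        IndexWord(self.0 as u64)
    }
}

/// Sizes in bytes of the index type and the pointer-holding type.
pub fn word_sizes() -> (usize, usize) {
    (size_of::<IndexWord>(), size_of::<PointerWord>())
}
```

Here word_sizes() reports 8 bytes for the index type and 16 for the pointer-holding type, which is the mismatch a single usize cannot express. Going from PointerWord to IndexWord is lossy by design, and the sketch deliberately offers no conversion in the other direction.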

To me, the ideal solution is to change usize to be in line with size_t and not uintptr_t. As @briansmith notes, this would be a breaking semantic change. I claim that this is only problematic on architectures where uintptr_t != size_t. As such, code breakage from changing this assumption is constrained to targets where the code was already broken.

Why not have a 128-bit usize? This is technically feasible, and it's the basis of my compilation of Rust for CHERI. But:

It may not be necessary to define and expose a uintptr_t type. It's optionally defined in C; I'm not sure programmers want to use such a type, and it could be relegated to the compiler. I haven't thought about this seriously, though.

The key issue is the conflict between index size and pointer width. How can we resolve this conflict, and support architectures with index size != pointer width? (or: why isn't this a problem at all?)

Other questions

Is this a better kind of broken? I don't know; that's what this issue is for. It seems likely that a lot of libc-using code depends on usize == uintptr_t == size_t, and that this code will break in either case.

Is provenance a problem? From my experience with the Rust compiler, no [6]. Integers (usize) are never cast back to pointers and dereferenced. We already know this at some level (rust-lang/unsafe-code-guidelines#52). This suggests no fundamental link between indexing (i.e. usize) and pointer width.
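The pattern I have in mind looks roughly like the sketch below: the usize is only ever an offset applied to an existing pointer, and is never turned back into a pointer itself. read_at is a hypothetical helper written for this example, not code taken from rustc or libcore.

```rust
/// Reads element `index` by offsetting the slice's base pointer.
/// The usize serves purely as an element count; the pointer that gets
/// dereferenced is derived from `values`, not rebuilt from an integer.
pub fn read_at(values: &[u32], index: usize) -> Option<u32> {
    if index >= values.len() {
        return None;
    }
    let base: *const u32 = values.as_ptr();
    // SAFETY: index < values.len(), so base.add(index) points at an
    // initialized element inside the slice's allocation.
    Some(unsafe { *base.add(index) })
}
```

For example, read_at(&[10, 20, 30], 1) yields Some(20) and an out-of-range index yields None. Nothing here needs usize to be as wide as a pointer, only wide enough to count elements.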

Will we really see 128-bit pointers in our lifetime? I don't speak with authority on CHERI, but 64 bits definitely isn't enough for the "usual" 48-bit address space there [7].

But CHERI breaks the C specification; how can we discuss this issue in terms of C types? This issue really isn't about CHERI [8], or C. I won't speculate on the C specification or whether it's helpful for Rust. I use C types as the people likely to engage with this issue are familiar with them.

What about LLVM address spaces? This is a whole new can of worms. I believe rustc will only use one LLVM address space, and in particular won't support two address spaces with different pointer widths. This is an issue for CHERI in hybrid capability mode, but also for supporting any architecture with multiple address spaces. AVR-Rust probably cares about address spaces and may have some expertise here.

Notes

[1] From https://doc.rust-lang.org/std/primitive.usize.html
[2] As remarked by @gnzlbg in rust-lang/libc#1400 (comment); this related problem is a bit subtle and quite complex.
[3] It isn't clear (to me!) whether this is primarily a compiler implementation problem or a semantic problem, but that is not the subject of this issue.
[4] This issue does not motivate support of a particular architecture, though there has been community interest in CHERI.
[5] This is relevant when finding out the size of an object, for example. While generating instructions to extend or truncate the integers is possible, this seems a silly use of cycles at compile time (and possibly runtime).
[6] My experience is limited to rustc (c. 1.35 nightly), libcompiler_builtins, libcore, and liballoc. Some modification was needed to make this work, but I found no egregious violations.
[7] See CHERI Concentrate for an overview of the considerations.
[8] In particular I'm not asking for help in porting Rust to CHERI, or any other platform. However, I would like support for other architectures to be technically possible.

(edits because I accidentally posted early)