Support index size != pointer width · Issue #65473 · rust-lang/rust
Preliminaries
usize is the pointer-sized unsigned integer type [1].
It is also Rust's index type for slices and loops; this definition works well when pointer size corresponds to the space of indexable objects (most targets today). Informally, uintptr_t == size_t.
Note that the target pointer width is indisputably set by the LLVM data layout string.
It would be correct to say that it is currently impossible for the width of usize to differ from target_pointer_width without breaking numerous assumptions in rustc [2, 3].
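The coupling described above can be observed on any target rustc supports today. The following is a minimal sketch, not compiler code: the function names are made up for illustration, and only std::mem::size_of and the built-in target_pointer_width cfg are assumed.

```rust
use std::mem::size_of;

/// Width of `usize` in bits on the compilation target.
pub fn usize_bits() -> usize {
    size_of::<usize>() * 8
}

/// Width of a thin raw pointer in bits on the compilation target.
pub fn pointer_bits() -> usize {
    size_of::<*const u8>() * 8
}

/// The value rustc exposes as `target_pointer_width`, read via `cfg!`.
pub fn configured_pointer_width() -> usize {
    if cfg!(target_pointer_width = "16") {
        16
    } else if cfg!(target_pointer_width = "32") {
        32
    } else if cfg!(target_pointer_width = "64") {
        64
    } else {
        // Hypothetical fallthrough: any other width, such as a
        // 128-bit capability target, is not modelled in this sketch.
        0
    }
}
```

On current targets all three functions should return the same number, which is exactly the assumption this issue questions.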
Unfortunately, uintptr_t == size_t doesn't hold for all architectures. For context, I have worked on compiling Rust for MIPS/CHERI (CHERI128), though that work is not currently active [4]. This target has 128-bit capability pointers (as declared in the layout string), but a 64-bit processor and address space.
I also assume that we don't want programmers messing with pointers in Safe Rust, and that they shouldn't have to care how a pointer (or reference) is represented/manipulated by an architecture.
Problem
I think that more than one type is necessary here, to distinguish between the "index" or "size" component of a pointer (a la size_t), and the space required to contain a pointer (uintptr_t).
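To make the distinction concrete, here is a rough sketch of the two roles as separate Rust types. Everything in it is hypothetical: Capability, IndexInt, and PtrBitsInt are illustrative names, and packing the metadata into the upper 64 bits is a deliberate simplification, not the real CHERI encoding.

```rust
/// "size_t"-like: wide enough to index any object in the address space.
pub type IndexInt = u64;

/// "uintptr_t"-like: wide enough to hold the whole pointer representation.
pub type PtrBitsInt = u128;

/// Hypothetical model of a capability pointer: 128 bits of storage,
/// of which only 64 bits are the address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Capability {
    metadata: u64, // bounds and permissions, treated as opaque here
    address: u64,
}

impl Capability {
    pub fn new(metadata: u64, address: u64) -> Self {
        Capability { metadata, address }
    }

    /// The indexable part of the pointer: what an index type must cover.
    pub fn address(self) -> IndexInt {
        self.address
    }

    /// The full representation: what a pointer-width type must cover.
    pub fn to_bits(self) -> PtrBitsInt {
        (u128::from(self.metadata) << 64) | u128::from(self.address)
    }

    /// Indexing only ever touches the 64-bit address component.
    pub fn offset(self, by: IndexInt) -> Self {
        Capability {
            metadata: self.metadata,
            address: self.address.wrapping_add(by),
        }
    }
}
```

In this model, offsets and lengths never need more than IndexInt, while only code that stores or copies the raw pointer needs PtrBitsInt.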
To me, the ideal solution is to change usize to be in line with size_t and not uintptr_t. As @briansmith notes, this would be a breaking semantic change. I claim that this is only problematic on architectures where uintptr_t != size_t. As such, code breakage from changing this assumption is constrained to targets where the code was already broken.
Why not have a 128-bit usize? This is technically feasible, and it's the basis of my compilation of Rust for CHERI. But:
- Bounds checks explode from 2 instructions to 7. Yes, this occurs with optimisation on, but no, I haven't profiled it on real-world applications.
- rustc tries to index into LLVM intrinsics such as memcpy with 128-bit integers. This isn't defined in the backend, and arguably shouldn't be defined. I will not be the last person to wonder why memcpy doesn't generate any instructions.
- The address space is 64 bits. ptr as int gives an LLVM i64, which can't be cast/isn't comparable to an i128; again there is no good reason to manipulate 128-bit integers here. Likewise when calling inttoptr, which is a valid instruction even if the result can't be dereferenced [5].
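For readers who want to inspect the bounds-check cost themselves, the two functions below perform the same checked read with a 64-bit and a 128-bit index. This is only a sketch for comparing generated code (for example with rustc's --emit asm option); the function names are illustrative, and it makes no claim about instruction counts on any particular target.

```rust
/// Bounds-checked read using a 64-bit index.
pub fn get_u64_index(data: &[u8], index: u64) -> Option<u8> {
    if index < data.len() as u64 {
        // In range, so the index is known to fit in `usize`.
        Some(data[index as usize])
    } else {
        None
    }
}

/// The same read using a 128-bit index, as a 128-bit `usize` would imply.
/// On 64-bit hardware the comparison spans two machine words.
pub fn get_u128_index(data: &[u8], index: u128) -> Option<u8> {
    if index < data.len() as u128 {
        // In range, so the index is known to fit in `usize`.
        Some(data[index as usize])
    } else {
        None
    }
}
```

Both functions return the same results for any index that fits in 64 bits; the difference is only in how wide the comparison has to be.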
It may not be necessary to define and expose a uintptr_t type. It's optionally defined in C; I'm not sure programmers want to use such a type, and it could be relegated to the compiler. I haven't thought about this seriously, though.
The key issue is the conflict between index size and pointer width. How can we resolve this conflict, and support architectures with index size != pointer width? (or: why isn't this a problem at all?)
Other questions
Is this a better kind of broken? I don't know, that's what this issue is for. What is certain is that lots of libc-using code probably depends on usize == uintptr_t == size_t and that this code will break in either case.
Is provenance a problem? From my experience with the Rust compiler, no [6]. Integers (usize) are never cast back to pointers and dereferenced. We already know this at some level (rust-lang/unsafe-code-guidelines#52). This suggests no fundamental link between indexing (i.e. usize) and pointer width.
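As an illustration of that pattern, the sketch below converts pointers to integers purely to do address arithmetic and never turns the result back into a pointer. The function offset_in_slice is a made-up example, not code taken from rustc or the standard library.

```rust
/// Returns the index of `elem` within `slice` if it points into it,
/// using integer addresses only. No integer is cast back to a pointer.
pub fn offset_in_slice(slice: &[u32], elem: &u32) -> Option<usize> {
    let base = slice.as_ptr() as usize;
    let addr = elem as *const u32 as usize;
    let size = std::mem::size_of::<u32>();

    // `elem` lies before the slice if the subtraction would underflow.
    let byte_offset = addr.checked_sub(base)?;
    if byte_offset % size != 0 {
        return None;
    }

    let index = byte_offset / size;
    if index < slice.len() {
        Some(index)
    } else {
        None
    }
}
```

Only the address component matters here, so an integer as wide as the address space would suffice even if the pointer representation were wider.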
Will we really see 128-bit pointers in our lifetime? I don't speak with authority on CHERI, but 64 bits definitely isn't enough for the "usual" 48-bit address space there [7].
But CHERI breaks the C specification; how can we discuss this issue in terms of C types? This issue really isn't about CHERI [8], or C. I won't speculate on the C specification or whether it's helpful for Rust. I use C types as the people likely to engage with this issue are familiar with them.
What about LLVM address spaces? This is a whole new can of worms. I believe rustc will only use one LLVM address space, and in particular won't support two address spaces with different pointer widths. This is an issue for CHERI in hybrid capability mode, but also for supporting any architecture with multiple address spaces. AVR-Rust probably cares about address spaces and may have some expertise here.
Related
- The question of whether usize == uintptr_t (Deprecate pointer-width integer aliases libc#1400)
- Assuming that usize == size_t will break C FFI code (Are raw pointers to sized types usable in C FFI ? unsafe-code-guidelines#99). This isn't a problem per se, but we almost encourage wrong assumptions in unsafe code.
- The problem of usize being linked to the bitness of the architecture (What about: volatile, concurrency, and interaction with untrusted threads unsafe-code-guidelines#152)
- This very fragile code to print out pointer width demonstrates the level of assumption in the Rust codebase (Support 16-bit targets in get_pointer_width #56567); also related: Policy for assumptions about the size of usize rfcs#1748
Notes
[1] From https://doc.rust-lang.org/std/primitive.usize.html
[2] As remarked by @gnzlbg in rust-lang/libc#1400 (comment); this related problem is a bit subtle and quite complex.
[3] It isn't clear (to me!) whether this is primarily a compiler implementation problem or a semantic problem, but that is not the subject of this issue.
[4] This issue does not motivate support of a particular architecture, though there has been community interest in CHERI.
[5] This is relevant when finding out the size of an object, for example. While generating instructions to extend or truncate the integers is possible, this seems a silly use of cycles at compile time (and possibly runtime).
[6] My experience is limited to rustc (c. 1.35 nightly), libcompiler_builtins, libcore, and liballoc. Some modification was needed to make this work, but no egregious violations.
[7] See CHERI Concentrate for an overview of the considerations.
[8] In particular I'm not asking for help in porting Rust to CHERI, or any other platform. However, I would like support for other architectures to be technically possible.
(edits because I accidentally posted early)