Efficient integer formatting into fixed-size buffer · Issue #546 · rust-lang/libs-team
Proposal
Problem statement
The standard library provides highly optimized implementations of integer-to-decimal-string conversion, but these are only accessible via the core::fmt machinery, which forces 1-2 layers of dynamic dispatch between user code and the actual formatting logic. Benchmarks in the itoa crate demonstrate that sidestepping fmt makes formatting much more efficient. Currently, any Rust user who wants that performance has to use third-party crates like itoa or lexical(-core), which essentially duplicate standard library functionality.
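For comparison, here is a minimal sketch of the route available today without third-party crates: formatting through core::fmt into a fixed-size stack buffer. `StackBuf` is a hypothetical helper type, not a std API. In the usual case, `write!` ends up in `core::fmt::write`, which writes through a `&mut dyn fmt::Write`, and the integer's `Display` impl is reached through a type-erased function pointer; that is where the dynamic dispatch mentioned above comes from.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-capacity, allocation-free buffer implementing
/// `fmt::Write`. Not a std type.
struct StackBuf<const N: usize> {
    bytes: [u8; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    fn new() -> Self {
        StackBuf { bytes: [0; N], len: 0 }
    }

    fn as_str(&self) -> &str {
        // `write_str` only ever appends whole `&str`s, so the filled prefix
        // is valid UTF-8 and this `unwrap` cannot fail.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap()
    }
}

impl<const N: usize> Write for StackBuf<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > N {
            // Reject the whole piece rather than truncating it mid-character.
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

fn main() {
    let mut buf = StackBuf::<20>::new();
    // `write!` calls `Write::write_fmt`, which (in current std) reaches
    // `core::fmt::write` and writes to `buf` through `&mut dyn fmt::Write`.
    write!(buf, "{}", u64::MAX).unwrap();
    assert_eq!(buf.as_str(), "18446744073709551615");
}
```

The buffer has to be sized by hand (20 bytes covers `u64::MAX`), which is exactly the kind of worst-case constant the proposal would expose as `MAX_STR_LEN`.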
Motivating examples or use cases
- serde-json
- http: integer to HTTP header values
- maud: rendering integers in HTML templates
- arrow-json: converting data from the Arrow format to JSON
- ... and basically everything else that depends on itoa (the lexical* crates also include float formatting and integer/float parsing, so their list of dependents is less illustrative).
- rustix used to use itoa for putting integers (like file descriptors) into paths without intermediate allocations, but recently switched to a simplified homebrew implementation.
- compact_str uses itoa for 128-bit integers, but for smaller integers they vendor the standard library code and modify it to write directly into their custom string type's buffer. However, in this case the buffer size is based on the magnitude of the integer, not the worst-case size.
Solution sketch
```rust
impl {iN, uN, usize, isize} {
    const MAX_STR_LEN: usize;
    fn format_into(self, buf: &mut [MaybeUninit<u8>; Self::MAX_STR_LEN]) -> &str;
}
```
This can be used from safe code, though it's a little noisier than the itoa API, since MaybeUninit::uninit_array is slated for removal:
```rust
// itoa today:
use_str(itoa::Buffer::new().format(n));
// Proposed API, spelling out the buffer length:
use_str(n.format_into(&mut [const { MaybeUninit::uninit() }; TypeOfN::MAX_STR_LEN]));
// With uninit_array the length could be inferred:
use_str(n.format_into(&mut MaybeUninit::uninit_array()));
```
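A minimal, runnable sketch of the proposed shape for u32 only, written as free functions because inherent items can't be added to u32 outside std. `format_u32_into`, `U32_MAX_STR_LEN`, and `uninit_array` are hypothetical names, and the digit loop is a plain division loop, not the optimized std implementation the proposal would expose. It exercises both call styles: the spelled-out `[const { MaybeUninit::uninit() }; N]` buffer and an inferred-length helper.

```rust
use core::mem::MaybeUninit;

/// Hypothetical stand-in for the proposed `u32::MAX_STR_LEN`:
/// u32::MAX is 4294967295, which has 10 decimal digits.
const U32_MAX_STR_LEN: usize = 10;

/// Hypothetical free-function version of the proposed `u32::format_into`.
/// Digits are written right-to-left, so the result is the initialized
/// suffix of `buf`, returned as a `&str` borrowing from it.
fn format_u32_into(mut n: u32, buf: &mut [MaybeUninit<u8>; U32_MAX_STR_LEN]) -> &str {
    let mut pos = buf.len();
    loop {
        pos -= 1;
        buf[pos].write(b'0' + (n % 10) as u8);
        n /= 10;
        if n == 0 {
            break;
        }
    }
    // SAFETY: `buf[pos..]` was fully initialized by the loop above, and every
    // byte is an ASCII digit, so the slice is valid UTF-8.
    unsafe {
        let init = core::slice::from_raw_parts(buf.as_ptr().add(pos).cast::<u8>(), buf.len() - pos);
        core::str::from_utf8_unchecked(init)
    }
}

/// Hypothetical helper mirroring the unstable `MaybeUninit::uninit_array`:
/// `N` is a const generic, so the compiler can infer it from the call site.
fn uninit_array<const N: usize>() -> [MaybeUninit<u8>; N] {
    [const { MaybeUninit::uninit() }; N]
}

fn main() {
    // Spelled-out buffer length, as in the second call above.
    let mut buf: [MaybeUninit<u8>; U32_MAX_STR_LEN] =
        [const { MaybeUninit::uninit() }; U32_MAX_STR_LEN];
    assert_eq!(format_u32_into(4294967295, &mut buf), "4294967295");

    // Length inferred from `format_u32_into`'s parameter type.
    assert_eq!(format_u32_into(0, &mut uninit_array()), "0");
}
```

Because the buffer type carries its length, the function never needs a bounds check against the caller's buffer and cannot fail for any u32.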
Alternatively, unsafe code can write directly into whatever buffer it wants; for example, the itoa usage in http could write directly into the spare_capacity_mut() of the BytesMut it creates (edit: not so simple, see first reply). I believe it could also replace the homebrew integer formatting in rustix::DecInt.
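A sketch of the unsafe route, assuming a `Vec<u8>` in place of `BytesMut` and using the magnitude-based sizing described for compact_str: count the digits first, reserve exactly that much, write into `spare_capacity_mut()`, then call `set_len`. `decimal_len` and `append_u32` are hypothetical helpers. Counting first means the digits can be written right-to-left directly in their final position, with no copy afterwards.

```rust
use core::mem::MaybeUninit;

/// Number of decimal digits in `n` (1 for `n == 0`).
fn decimal_len(mut n: u32) -> usize {
    let mut len = 1;
    while n >= 10 {
        n /= 10;
        len += 1;
    }
    len
}

/// Hypothetical helper: appends the decimal form of `n` to `out` by writing
/// straight into the vector's spare capacity, sized by `n`'s magnitude.
fn append_u32(out: &mut Vec<u8>, mut n: u32) {
    let len = decimal_len(n);
    out.reserve(len);
    let spare: &mut [MaybeUninit<u8>] = &mut out.spare_capacity_mut()[..len];
    // Fill right-to-left so the most significant digit ends up first.
    for slot in spare.iter_mut().rev() {
        slot.write(b'0' + (n % 10) as u8);
        n /= 10;
    }
    // SAFETY: `reserve` guaranteed room for `len` more bytes, and the loop
    // above initialized exactly those `len` bytes.
    unsafe { out.set_len(out.len() + len) };
}

fn main() {
    // E.g. building a path containing a file descriptor number.
    let mut path = b"/proc/self/fd/".to_vec();
    append_u32(&mut path, 42);
    assert_eq!(path, b"/proc/self/fd/42");
}
```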
Alternatives
The obvious option would be to adopt the API of itoa directly (a single Buffer type with fn format(&mut self, n: impl SealedTrait) -> &str), since it's already widely used. However:
- Formatting only into a dedicated buffer, rather than directly into part of a buffer you own, is insufficient for some users, who end up vendoring their own integer formatting code (e.g., rustix (edit: not so simple, see first reply) and compact_str, as mentioned under motivation).
- If Rust later adds const-generic u<N> and i<N> types with generous limits on N (e.g., Generic Integers V2: It's Time rfcs#3686), then a one-size-fits-all buffer may become excessively large: even with a limit of N <= 4096, it would be well over a kilobyte. Even though the buffer doesn't have to be initialized, it still increases the stack frame size, which can have undesirable side effects on code generation (stack probes are generally inserted for frames larger than one page, and SP-relative loads and stores need larger offsets that may no longer fit into an immediate operand).
- The trait would be extra API surface. While exposing it would be useful (for bounds and for accessing the associated constant MAX_STR_LEN), the general trend in the standard library is to have inherent associated functions and constants on every integer type, not traits implemented for every integer type.
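To make the buffer-size argument concrete, here is a sketch that computes the worst-case decimal length of an N-bit integer exactly, using big-number doubling rather than floating-point logarithms. `max_str_len` and `decimal_digits_of_pow2` are hypothetical helpers; for N = 4096 the result is 1234 bytes for both signed and unsigned, consistent with "well over a kilobyte".

```rust
/// Number of decimal digits in 2^exp, computed exactly with a little-endian
/// vector of base-10 digits (no floating point, so no rounding concerns).
fn decimal_digits_of_pow2(exp: u32) -> usize {
    let mut digits: Vec<u8> = vec![1]; // 2^0 = 1
    for _ in 0..exp {
        let mut carry = 0;
        for d in digits.iter_mut() {
            let v = *d * 2 + carry; // at most 9 * 2 + 1 = 19, fits in u8
            *d = v % 10;
            carry = v / 10;
        }
        if carry > 0 {
            digits.push(carry);
        }
    }
    digits.len()
}

/// Hypothetical helper: worst-case decimal length of a `bits`-bit integer.
fn max_str_len(bits: u32, signed: bool) -> usize {
    assert!(bits >= 1, "zero-width integers are not considered here");
    if signed {
        // Longest value is MIN = -2^(bits-1): a '-' plus the digits of 2^(bits-1).
        1 + decimal_digits_of_pow2(bits - 1)
    } else {
        // Longest value is 2^bits - 1, which has as many digits as 2^bits
        // because 2^bits (bits >= 1) is never a power of ten.
        decimal_digits_of_pow2(bits)
    }
}

fn main() {
    // Cross-check against the existing std integer types.
    assert_eq!(max_str_len(8, false), u8::MAX.to_string().len());
    assert_eq!(max_str_len(8, true), i8::MIN.to_string().len());
    assert_eq!(max_str_len(64, true), i64::MIN.to_string().len());
    assert_eq!(max_str_len(128, false), u128::MAX.to_string().len());
    assert_eq!(max_str_len(128, true), i128::MIN.to_string().len());

    // The hypothetical u4096 / i4096 case from the argument above.
    println!("u4096: {} bytes", max_str_len(4096, false));
    println!("i4096: {} bytes", max_str_len(4096, true));
}
```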
Other alternatives:
- lexical-core has fn write(n: impl SomeTrait, buf: &mut [u8]) -> &[u8], but this requires unsafe to get a string out of it, and even if the return type were changed to &str, it can panic if the buffer is too small for the given n, and it requires a fully initialized buffer.
- lexical has fn to_string(n: impl SomeTrait) -> String, but this requires alloc (not just core) and does an unnecessary heap allocation when the result is immediately copied into another buffer.
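A sketch of the drawbacks of a signature shaped like lexical-core's `write` as described above, using a hypothetical `write_u32` that is not lexical-core's actual code: the caller must supply an initialized `&mut [u8]`, the function panics if the slice is too short, and getting a `&str` back safely costs a redundant UTF-8 check via `core::str::from_utf8`.

```rust
/// Number of decimal digits in `n` (1 for `n == 0`).
fn decimal_len(mut n: u32) -> usize {
    let mut len = 1;
    while n >= 10 {
        n /= 10;
        len += 1;
    }
    len
}

/// Hypothetical function with the shape `fn write(n, buf: &mut [u8]) -> &[u8]`;
/// this is NOT lexical-core's actual implementation.
fn write_u32(mut n: u32, buf: &mut [u8]) -> &[u8] {
    let len = decimal_len(n);
    // Panics with an out-of-bounds slice error if `buf` is too short for `n`.
    let out = &mut buf[..len];
    for b in out.iter_mut().rev() {
        *b = b'0' + (n % 10) as u8;
        n /= 10;
    }
    out
}

fn main() {
    // The caller must initialize the buffer up front, even though every
    // byte that matters will be overwritten.
    let mut buf = [0u8; 10];
    let bytes = write_u32(1234, &mut buf);

    // Getting a `&str` safely costs a UTF-8 validation pass that can't fail
    // here; skipping it requires `unsafe { core::str::from_utf8_unchecked(..) }`.
    let s = core::str::from_utf8(bytes).unwrap();
    assert_eq!(s, "1234");

    // `write_u32(1234, &mut [0u8; 2])` would panic: the slice is too short.
}
```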
Links and related work
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.