Define UB in float-to-int casts to saturate by Mark-Simulacrum · Pull Request #71269 · rust-lang/rust (original) (raw)

I've also done some research and summarized the state AIUI (though I would be happy for folks to update this comment if there's details I missed):

This was briefly discussed in the last language team meeting, and we agreed that it seems time to write up a comment about the current state and the plan to (finally!) close this soundness hole.

The current state is:

The support for the unchecked casts via unsafe methods from floats to integers of all sizes was stabilized during this cycle, and will hit stable June 4th.
-Zsaturating-float-casts currently implements the following semantics, quoting from the code:

Semantically, the mathematical value of the input is rounded towards zero to the next mathematical integer, and then the result is clamped into the range of the destination integer type. Positive and negative infinity are mapped to the maximum and minimum value of the destination integer type. NaN is mapped to 0.

We gathered benchmark results as to the performance of the saturating code, and those resulted in the following:

Most use cases have essentially no impact
Notably, bench_rgb from the image crate has a 22% slowdown;
- There is a scattering of similar regressions elsewhere.

There has been some commentary throughout this thread that the current routine implemented in the compiler is not as optimized as possible, but AFAICT, exploiting most of the wins requires platform-specific code, likely written in inline assembly due to insufficient support on LLVM's side.

Some background on what other languages do:

Julia defines the following functions:
- unsafe_trunc(Int64, x) maps directly to the corresponding LLVM intrinsic fptosi
- trunc(Int64, x) throws an exception for out of range values
- convert(x) throws an exception for out of range or non-integer values
Java (look for "narrowing conversion of a floating-point number"):
- NaN is converted to 0
- Finite floating point numbers are rounded to integer (toward zero), either i64 or i32, and then truncated into the final type (essentially, in pseudo-Rust syntax, I think this would be f64 as i64 as iN).
- Infinities are sent to iN::{MIN, MAX}.
- Floating point values that cannot be represented in a iN are also saturated to iN::{MIN, MAX}.
- Note that Java does not have unsigned integer primitives.
C++ (stripped down presuming Rust supports only IEEE arithmetic, but see the link for full details):
- if the value can be represented exactly by the target type, it is unchanged
- if the value can be represented, but cannot be represented exactly, the result is the nearest value. It is unspecified whether FE_INEXACT is raised in this case.
- if the value cannot be represented, FE_INVALID is raised and the result value is unspecified.
WASM
- Has two opcode sets (for the matrix of {i,u}N by f{32,64})
  * saturating which implement:
  * infinities go to MAX/MIN, NaN goes to 0
  * trapping which trap on the infinities and NaNs.
Go (look for "Conversions between numeric types")
- Truncates toward zero if representable,
- Non-representable: result is implementation-dependent (I have not tested)
C#
- Checked context:
  * NaN or infinity, throw exception
  * Rounded towards zero
  * If cannot be represented, throw exception
- Unchecked context:
  * Exceptions are instead unspecified values

Hard to pull out anything completely conclusive from this, seems to be a mix of options. But seems like overall rounding to zero and saturation is the common case. It's also what -Zsaturating-float-casts does today in Rust.

It seems clear that the saturating (and NaN → 0) behavior isn't a bad option, and given that it's at least shared by some other languages and already implemented in Rust, I'm inclined to recommend that it's the behavior we stabilize for Rust itself.

In particular, we would define f{32,64} as {u,i}N to behave as follows, defining the undefined behavior:

Round to zero, and representable values cast directly.
NaN goes to 0
Values beyond the limits of the type are saturated to the "nearest value" (essentially rounding to zero, in some sense) in the integral type, so e.g. f32::INFINITY would go to {u,i}N::MAX.

To my knowledge, there has not been opposition to this definition, beyond perhaps wanting to leave it even more open (e.g. stating that the values are unspecified rather than defined to be these). I think in practice we try to avoid that sort of lack of specificity, and beyond potential for performance wins on some targets for other behavior, there doesn't seem to be much point — in most cases the unchecked cast functions should be sufficient, or we can provide intrinsics in the future which are "fast but less nice."

We know that this behavior is a fairly sizable performance regression for some crates, in which case where possible the recommendation is to switch to the unchecked casts. Obviously, that implies that the code never encounters NaN or otherwise non-representable values, but AFAICT, that's true of the cases we know of.

We can separately work on improving the performance of the saturating casts, but I don't think that would be a blocker for stabilization.