Add __divtf3
by tgross35 · Pull Request #622 · rust-lang/compiler-builtins (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation12 Commits5 Checks24 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Division for f128
. Still pretty buggy
Unfortunately this algorithm has an exponent rounding error with subnormals for f128
, llvm/llvm-project#93401.
__divtf3(0x00000000000000000000000000000001, 0x0001ffffffffffffffffffffffffffff): std: 0x3f8e0000000000000000000000000001, builtins: 0x3f8f0000000000000000000000000001
The f128 issue is proving pretty tricky to fix. To help slim down the surface area, I am going to split the generic refactor to another PR.
f128 division is working with full iterations 🎉 I need to figure out what is wrong with the half-width iterations, and do some more documentation.
It looks like there are some platform-specific issues with the system libraries:
- Incorrect NaN handling on i686 Linux,
__divtf3(0x00000000000000000000000000000000, 0x80000000000000000000000000000000): std: 0x00000000000000000000000000000000, builtins: 0x7fff8000000000000000000000000000
. 0/-0 should be NaN, the system symbols are reporting 0. - Incorrect handling of subnormals on aarch64 Linux,
__divtf3(0x00000000000000000000000000000001, 0x0001ffffffffffffffffffffffffffff): std: 0x3f8f0000000000000000000000000001, builtins: 0x3f8e0000000000000000000000000001
. Builtins is correct here, the system symbol gets the exponent wrong.
The i686 issue appears to be an ABI issue: GCC aligns f128
arguments to 16 bytes on the stack, whereas Clang/LLVM does not. Looking at the 32-bit x86 System V ABI specification, GCC's behaviour appears to be correct:
Padding may be needed to increase the size of each parameter to enforce alignment according to the values in Table 2.1.
I believe the relevant LLVM issue is llvm/llvm-project#77401.
Interesting, I am surprised that doesn't show up for the other functions.
I suspect that the aarch64 division issue is another case of llvm/llvm-project#91840 that is still waiting on your fix to be merged.
I think this is ready for a look when you get the chance @Amanieu.
f128
seems to require one more half-width iteration than expected, and this still isn't as tidy as I would like. However, tests pass so as-is seems reasonable enough to unblock division in std
as well as #614. I plan to cycle back for the other fixes.
tgross35 marked this pull request as ready for review
Even with the extra iteration performance isn't bad. Not sure why we are that much faster here.
div_f128/compiler-builtins
time: [261.10 µs 262.32 µs 264.06 µs]
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) high mild
9 (9.00%) high severe
div_f128/system time: [329.93 µs 330.71 µs 331.47 µs]
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
f32
and f64
are essentially unchanged.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, though I didn't check the algorithm in detail.
Float division requires some shift operations on big integers; implement right shift here.
Add some bounds to integer types that allow our function trait bounds to be slightly less verbose. Also clarify documentation and remove a redundant operation.
Float division currently has a separate div32
and div64
for f32
and f64
, respectively. Combine these to make use of generics. This
will make it easier to support f128
division, and reduces a lot of
redundant code.
This includes a simplification of division tests.
Use the new generic division algorithm to expose __divtf3
and
__divkf3
.
Thanks for taking a look! I'll have some more improvements to do here but at least this gets us off the ground.
tgross35 added a commit to tgross35/rust that referenced this pull request
tgross35 added a commit to tgross35/rust that referenced this pull request
tgross35 added a commit to tgross35/rust that referenced this pull request
tgross35 added a commit to tgross35/rust that referenced this pull request
bors added a commit to rust-lang-ci/rust that referenced this pull request
cberner pushed a commit to cberner/rust that referenced this pull request
RalfJung pushed a commit to RalfJung/miri that referenced this pull request
cberner pushed a commit to cberner/rust that referenced this pull request