Add `__divtf3` by tgross35 · Pull Request #622 · rust-lang/compiler-builtins (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

Division for f128. Still pretty buggy

Unfortunately this algorithm has an exponent rounding error with subnormals for f128, llvm/llvm-project#93401.

__divtf3(0x00000000000000000000000000000001, 0x0001ffffffffffffffffffffffffffff): std: 0x3f8e0000000000000000000000000001, builtins: 0x3f8f0000000000000000000000000001

The f128 issue is proving pretty tricky to fix. To help slim down the surface area, ~~I am going to split the generic refactor to another PR.~~

f128 division is working with full iterations 🎉 I need to figure out what is wrong with the half-width iterations, and do some more documentation.

It looks like there are some platform-specific issues with the system libraries:

Incorrect NaN handling on i686 Linux, __divtf3(0x00000000000000000000000000000000, 0x80000000000000000000000000000000): std: 0x00000000000000000000000000000000, builtins: 0x7fff8000000000000000000000000000. 0/-0 should be NaN, the system symbols are reporting 0.
Incorrect handling of subnormals on aarch64 Linux, __divtf3(0x00000000000000000000000000000001, 0x0001ffffffffffffffffffffffffffff): std: 0x3f8f0000000000000000000000000001, builtins: 0x3f8e0000000000000000000000000001. Builtins is correct here, the system symbol gets the exponent wrong.

The i686 issue appears to be an ABI issue: GCC aligns f128 arguments to 16 bytes on the stack, whereas Clang/LLVM does not. Looking at the 32-bit x86 System V ABI specification, GCC's behaviour appears to be correct:

Padding may be needed to increase the size of each parameter to enforce alignment according to the values in Table 2.1.

I believe the relevant LLVM issue is llvm/llvm-project#77401.

Interesting, I am surprised that doesn't show up for the other functions.

I suspect that the aarch64 division issue is another case of llvm/llvm-project#91840 that is still waiting on your fix to be merged.

I think this is ready for a look when you get the chance @Amanieu.

f128 seems to require one more half-width iteration than expected, and this still isn't as tidy as I would like. However, tests pass so as-is seems reasonable enough to unblock division in std as well as #614. I plan to cycle back for the other fixes.

tgross35 marked this pull request as ready for review

August 19, 2024 21:16

Even with the extra iteration performance isn't bad. Not sure why we are that much faster here.

div_f128/compiler-builtins
                        time:   [261.10 µs 262.32 µs 264.06 µs]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe
div_f128/system         time:   [329.93 µs 330.71 µs 331.47 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

f32 and f64 are essentially unchanged.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, though I didn't check the algorithm in detail.

Float division requires some shift operations on big integers; implement right shift here.

Add some bounds to integer types that allow our function trait bounds to be slightly less verbose. Also clarify documentation and remove a redundant operation.

Float division currently has a separate div32 and div64 for f32 and f64, respectively. Combine these to make use of generics. This will make it easier to support f128 division, and reduces a lot of redundant code.

This includes a simplification of division tests.

Use the new generic division algorithm to expose __divtf3 and __divkf3.

Thanks for taking a look! I'll have some more improvements to do here but at least this gets us off the ground.

tgross35 added a commit to tgross35/rust that referenced this pull request

Sep 24, 2024

tgross35 added a commit to tgross35/rust that referenced this pull request

Sep 27, 2024

tgross35 added a commit to tgross35/rust that referenced this pull request

Sep 28, 2024

tgross35 added a commit to tgross35/rust that referenced this pull request

Sep 28, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request