NVPTX support for new asm! by westernmagic · Pull Request #72439 · rust-lang/rust (original) (raw)

This is false, I checked on NVPTX and values are not zero-extended. The upper bits are undefined. https://rust.godbolt.org/z/9yU64A

They are, implicitly:

A destination register wider than the specified type may be used. The value loaded is sign-extended to the destination register width for signed integers, and is zero-extended to the destination register width for unsigned and bit-size types. See Table 25 for a description of these relaxed type-checking rules.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-ld
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#operand-size-exceeding-instruction-type-size

Since clang gives the registers bit sizes, they are all zero extended in this implementation.

Here's a small clang experiment as proof:

#include #include

global void foo(uint32_t x, uint64_t * y) { asm volatile ( "mov.u64 %0, %1;" : "=l"(*y) : "l"(x) ); }

host int main() { uint64_t y = UINT64_MAX; uint64_t * yd; cudaMalloc(&yd, sizeof(uint64_t)); cudaMemcpy(yd, &y, sizeof(uint64_t), cudaMemcpyHostToDevice); foo<<<1, 1>>>(0xBEEF, yd); cudaMemcpy(&y, yd, sizeof(uint64_t), cudaMemcpyDeviceToHost); cudaFree(yd); printf("%lx\n", y); }

If you wish, I can add the links I referenced to the documentation