
Thanks for the details, Matt. Now I think I can suggest an alternative or provide a more reasonable explanation of why what you’re doing is right!

Assuming we stick to your current approach, it makes sense to mark the operation legal for vgpr because, from RBS’s point of view, the boundaries of the expansion of the instruction are indeed vgpr.

Now, if we want to leverage the infrastructure provided by RBS for this case, a possible alternative would require unrolling the loop (or using a pseudo that would fake the loop unrolling).

Here is what it would look like:

Essentially we would replace:

```
= use vgpr
```

with

```
sgpr1, …, sgprN = extract_reg vgpr
= use sM sgpr1
…
= use sM sgprN
```

In terms of mapping, you would describe vgpr as being broken down into N sgpr elements. The applyMapping would insert the `extract_reg` for repairing and expand the vector use into the scalar uses (or one pseudo with N sgprs as input).
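To make the shape of that rewrite concrete, here is a small sketch that only generates the pseudo-instruction text used above for a given N. The function name `unroll_vgpr_use` is made up for illustration, and this is not the RBS API or real applyMapping code; it just separates the single repairing instruction from the N scalar uses.

```python
def unroll_vgpr_use(n_lanes, scalar_ty="sM"):
    """Return the textual expansion of `= use vgpr` into n_lanes scalar uses.

    Hypothetical helper: it mirrors the notation of the example above and
    does not model real MachineInstrs or register banks.
    """
    if n_lanes < 1:
        raise ValueError("n_lanes must be at least 1")
    # One sgpr per element of the vgpr, numbered from 1 as in the example.
    sgprs = [f"sgpr{i}" for i in range(1, n_lanes + 1)]
    # Repairing code: a single extract_reg defining all the sgprs.
    repair = f"{', '.join(sgprs)} = extract_reg vgpr"
    # Expanded uses: one scalar use per extracted sgpr.
    uses = [f"= use {scalar_ty} {sgpr}" for sgpr in sgprs]
    return [repair] + uses


if __name__ == "__main__":
    for line in unroll_vgpr_use(3):
        print(line)
```

The point of keeping the `extract_reg` as its own instruction is that it is the part RBS could later place next to the definition or share between several uses.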

Honestly, I don’t know how much benefit you would get from exposing these details to RBS today.

The advantages I see moving forward are:

- At some point I wanted to teach RBS to materialize the repairing code next to the definition (as opposed to next to the use, like we do right now) so that we can reuse that repairing code for something else (which is what you pointed out in your previous reply).

- In the future, RBS is supposed to be smart enough to decide that something needs to be scalarized because its uses are cheaper that way. I.e., in your case, the definition of the vgpr could be scalarized from the start and we wouldn’t have to insert this repairing code (in other words, RBS would have chosen to scalarize the def of the vgpr into sgprs instead of repairing the use of the vgpr into sgprs).

Therefore, if we choose not to expose these details to RBS, we shut the door on potential improvements in how the repairing points are inserted/shared and in how the cost model chooses the best instructions based on their uses.

Obviously, I haven’t spent a lot of time thinking about your case, hence this could be completely bogus!

Cheers,

-Quentin

On Feb 26, 2019, at 4:58 PM, Matt Arsenault <arsenm2@gmail.com> wrote:

On Feb 26, 2019, at 7:46 PM, Quentin Colombet <qcolombet@apple.com> wrote:

The only use I would have for the copy is as a means of passing which registers were already created for the new mapping, after which point I would need to delete it.

Could you describe in pseudo code what the expansion of vgpr into sgpr looks like?
e.g., = use vgpr
And you only support = use sgpr

It’s serializing the vector operation. There’s an additional optimization to reduce the number of loop iterations when multiple work items/lanes/threads hold the same value, which happens in practice, but essentially it does:

Save Execution Mask
For (Lane : Wavefront/Warp) {
  Enable Lane, Disable all other lanes
  SGPR = read SGPR value for current lane from VGPR
  VGPRResult[Lane] = use_op SGPR
}
Restore Execution Mask
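
As a rough illustration of the loop above, including the optimization that handles all lanes holding the same value in one iteration, here is a small software model. It is only a sketch under simplifying assumptions: the wavefront is a Python list, the execution mask is a list of booleans, `use_op` is an arbitrary callable taking only the scalar value, and the name `waterfall` is chosen for this example. It is not the code the backend emits.

```python
def waterfall(vgpr, exec_mask, use_op):
    """Serialize a scalar-only operation over the active lanes of a wavefront.

    vgpr:      per-lane values (one entry per lane)
    exec_mask: per-lane booleans, True for lanes that are enabled
    use_op:    hypothetical operation that only accepts a scalar operand
    Returns (per-lane results, number of loop iterations executed).
    """
    saved_exec = list(exec_mask)      # Save Execution Mask
    remaining = list(exec_mask)       # lanes that still need to be handled
    result = [None] * len(vgpr)       # inactive lanes keep None
    iterations = 0

    while any(remaining):
        # Read the scalar value from the first lane still waiting.
        first = remaining.index(True)
        sgpr = vgpr[first]
        # Enable every waiting lane that holds the same value, disable the rest.
        active = [r and v == sgpr for r, v in zip(remaining, vgpr)]
        value = use_op(sgpr)
        for lane, enabled in enumerate(active):
            if enabled:
                result[lane] = value
        # Lanes handled in this iteration drop out of the loop.
        remaining = [r and not a for r, a in zip(remaining, active)]
        iterations += 1

    exec_mask[:] = saved_exec         # Restore Execution Mask
    return result, iterations


if __name__ == "__main__":
    print(waterfall([7, 7, 3, 7], [True, True, True, False], lambda s: s * 2))
```

In this model the iteration count equals the number of distinct values among the active lanes, so a uniform input takes a single iteration.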

Eventually it might be nice to have optimizations to only emit one of these loops when multiple consecutive instructions need the same register handled (which I suspect will happen frequently with image samplers), but I haven’t really thought about what that should look like yet.

-Matt