[llvm-dev] how to allocate consecutive register?

Ruiling Song via llvm-dev llvm-dev at lists.llvm.org
Mon Sep 12 05:48:25 PDT 2016


It seems the ARM target uses reg_sequence to form a register tuple and lets the store instruction accept that register tuple. Did I understand that correctly? What if the address is 64-bit while the value is 32-bit? Is there a simple way to handle that? reg_sequence looks like it only accepts sub-registers of the same type.

But the real difficulty for me is that I have already run out of lanemask bits. I gave a brief introduction to the Intel GPU registers in this thread: http://lists.llvm.org/pipermail/llvm-dev/2016-August/103953.html

In a later trial, I hit the problem of running out of lanemask bits: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104017.html

After that I chose to define all register tuples using only Rw0 to Rw2047 and subw0 to subw31, and with those I could reach RegQ_SIMD8 at most! Some pieces of RegisterInfo.td are listed below:

foreach Index = 0-31 in {
  def subw#Index : SubRegIndex<16, !shl(Index, 4)>;
}

class IntelGPUReg<string n, bits<13> regIdx> : Register<n> {
  bits<1> regFile;

  let Namespace = "IntelGPU";
  let HWEncoding{12-0} = regIdx;
  let HWEncoding{15} = regFile;
}
foreach Index = 0-2047 in {
  def Rw#Index : IntelGPUReg <"Rw"#Index, !shl(Index, 1)> {
    let regFile = 0;
  }
}

// b-->byte w-->word d-->dword q-->qword

def gpr_w : RegisterClass<"IntelGPU", [i16], 16,
                          (sequence "Rw%u", 0, 2047)> {
  let AllocationPriority = 1;
}
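As a quick sanity check of the arithmetic in these definitions, here is a small Python sketch that recomputes the values TableGen would derive from the two `!shl` expressions. The helper names are hypothetical and only mirror the expressions in the listing. Reading `!shl(Index, 1)` as a byte offset is an assumption, not something stated in the .td file.

```python
WORD_BITS = 16  # the size argument in SubRegIndex<16, ...>

def subw_offset_bits(index: int) -> int:
    """Bit offset of subw#index, mirroring !shl(Index, 4)."""
    return index << 4

def rw_reg_idx(index: int) -> int:
    """regIdx of Rw#index, mirroring !shl(Index, 1)."""
    return index << 1

# The 32 sub-register indices tile consecutive 16-bit lanes.
offsets = [subw_offset_bits(i) for i in range(32)]
contiguous = all(b - a == WORD_BITS for a, b in zip(offsets, offsets[1:]))

# Total width covered by subw0..subw31, in bits.
covered_bits = offsets[-1] + WORD_BITS

# The largest regIdx must fit in the bits<13> field.
max_reg_idx = rw_reg_idx(2047)
fits_in_13_bits = max_reg_idx < (1 << 13)
```

Under these definitions the 32 word lanes cover 512 bits, which matches a SIMD8 vector of 64-bit elements.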

def gpr_q_simd8 : RegisterTuples<[subw0, subw1, subw2, subw3, subw4, subw5, subw6, subw7,
                                  subw8, subw9, subw10, subw11, subw12, subw13, subw14, subw15,
                                  subw16, subw17, subw18, subw19, subw20, subw21, subw22, subw23,
                                  subw24, subw25, subw26, subw27, subw28, subw29, subw30, subw31],
                                 [(add (decimate gpr_w, 16)),
                                  (add (decimate (shl gpr_w, 1), 16)),
                                  (add (decimate (shl gpr_w, 2), 16)),
                                  (add (decimate (shl gpr_w, 3), 16)),
                                  (add (decimate (shl gpr_w, 4), 16)),
                                  (add (decimate (shl gpr_w, 5), 16)),
                                  (add (decimate (shl gpr_w, 6), 16)),
                                  ....
                                  (add (decimate (shl gpr_w, 30), 16)),
                                  (add (decimate (shl gpr_w, 31), 16))]>;

def RegQ_SIMD8 : RegisterClass<"IntelGPU", [i64, f64], 64, (add gpr_q_simd8)>;
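To see which registers each generated tuple contains, the set operations can be modelled in a few lines of Python. This is only a sketch: it assumes that `shl` drops the first N elements of a set, that `decimate` keeps every N-th element starting with the first, and that RegisterTuples zips the sub-register lists and stops at the shortest one. The Python functions are stand-ins, not LLVM APIs.

```python
def shl(seq, n):
    """Model of (shl S, n): drop the first n elements."""
    return seq[n:]

def decimate(seq, n):
    """Model of (decimate S, n): keep every n-th element, starting with the first."""
    return seq[::n]

# Model of gpr_w: Rw0 .. Rw2047.
gpr_w = [f"Rw{i}" for i in range(2048)]

# One column per sub-register index subw0 .. subw31.
columns = [decimate(shl(gpr_w, k), 16) for k in range(32)]

# zip() stops at the shortest column, like the assumed tuple generation.
tuples = list(zip(*columns))

first = tuples[0]    # starts at Rw0
second = tuples[1]   # starts at Rw16
last = tuples[-1]    # ends at Rw2047
```

In this model each tuple is 32 consecutive word registers, tuples start every 16 words, and neighbouring tuples overlap by 16 words.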

If I introduce larger register tuples, I need more lanemask bits. Maybe I need to find some other way, or greatly increase the number of lanemask bits. But for now that is hard for me, as I am not very familiar with the LLVM register allocator. Any suggestions? If I have not stated the problem clearly, please feel free to drop me a mail.
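To make the constraint concrete, here is a back-of-the-envelope sketch in Python. It assumes one lanemask bit per 16-bit leaf sub-register index, as with the subw indices above, and a 32-bit lanemask. The 32-bit budget is an assumption; it is consistent with RegQ_SIMD8 being the largest class reached. The function names are hypothetical.

```python
LANE_MASK_BITS = 32  # assumed lanemask width
WORD_BITS = 16       # each subw index covers one 16-bit word

def words_needed(elem_bits: int, simd_width: int) -> int:
    """Number of 16-bit word lanes a SIMD register tuple spans."""
    return elem_bits * simd_width // WORD_BITS

def fits_lane_mask(elem_bits: int, simd_width: int) -> bool:
    """True if one lanemask bit per word lane stays within the budget."""
    return words_needed(elem_bits, simd_width) <= LANE_MASK_BITS

qword_simd8 = words_needed(64, 8)    # 64-bit elements, 8 lanes
qword_simd16 = words_needed(64, 16)  # 64-bit elements, 16 lanes
dword_simd16 = words_needed(32, 16)  # 32-bit elements, 16 lanes
```

Under these assumptions a SIMD8 tuple of qwords or a SIMD16 tuple of dwords uses the whole budget, and a SIMD16 tuple of qwords would need twice as many bits.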

2016-09-11 14:50 GMT+08:00 Tim Northover <t.p.northover at gmail.com>:

On 9 September 2016 at 21:19, Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Make the store instruction takes only one operand, a tuple register.
> You have examples of tuple registers in the ARM backend.

The difficult bit will be if there are loads with the same property. I don't think you can easily encode the fact that one half of a register is read and the other written.

Tim.


