methods with scalarized arguments
John Rose john.r.rose at oracle.com
Fri May 18 20:42:22 UTC 2018
- Previous message (by thread): methods with scalarized arguments
- Next message (by thread): methods with scalarized arguments
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On May 18, 2018, at 8:07 AM, Roland Westrelin <rwestrel at redhat.com> wrote:
Hi John,
Are you imagining a single nmethod with two entry points? Currently, nmethods do have two entry points for distinct calling sequences. This might add two more: <VEP, UEP> x <Buffered, Scalarized>.
Thanks for the details. There are some more tricks we could play, maybe. To be clear, I'm not proposing a solution, just throwing out ideas.
The way the calling convention is implemented in MVT, scalarized arguments can be in registers or on the stack. There are cases where the scalarized calling convention needs more stack space for arguments than the buffered calling convention. Something like:
m(v1, v2, v3, v4, v5): a buffered call would have all 5 arguments in registers, but a scalarized call could require stack space for some of the arguments (say, if all 5 values have 4 integer fields).
Suppose the An are argument registers. We can neglect FP and vector registers for now. For large enough n, An is really a special stack location, not a register, but that doesn't change the logic of what I'm talking about. Then the buffered calling sequence would probably be:
m(A0=v1, A1=v2, A2=v3, A3=v4, A4=v5)
The scalarized calling sequence could be:
m(A0=v1.f1, A1=v1.f2, A2=v1.f3, A3=v1.f4, A4=v2.f1, A5=v2.f2, A6=v2.f3, A7=v2.f4, A8=v3.f1, A9=v3.f2, A10=v3.f3, A11=v3.f4, A12=v4.f1, A13=v4.f2, A14=v4.f3, A15=v4.f4, A16=v5.f1, A17=v5.f2, A18=v5.f3, A19=v5.f4)
Clearly many of those An will be on the stack. Also, it is clear that there is a need for more stack here than for the previous calling sequence. Is this close to what you are describing?
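The arithmetic behind this example can be sketched in a few lines of Java. The constants here (five values, four int fields each, six argument registers) are just the assumptions of the example above, not real HotSpot parameters:

```java
public class SlotCount {
    static final int NUM_VALUES = 5;        // v1..v5
    static final int FIELDS_PER_VALUE = 4;  // each value has 4 integer fields
    static final int NUM_ARG_REGS = 6;      // hypothetical register count

    public static void main(String[] args) {
        // Buffered: one reference (one slot) per value argument.
        int bufferedSlots = NUM_VALUES;
        // Scalarized: one slot per field of every value argument.
        int scalarizedSlots = NUM_VALUES * FIELDS_PER_VALUE;
        System.out.println("buffered:   " + bufferedSlots + " slots, "
                + Math.max(0, bufferedSlots - NUM_ARG_REGS) + " on stack");
        System.out.println("scalarized: " + scalarizedSlots + " slots, "
                + Math.max(0, scalarizedSlots - NUM_ARG_REGS) + " on stack");
        // buffered:   5 slots, 0 on stack
        // scalarized: 20 slots, 14 on stack
    }
}
```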
So if m() has 2 entry points, the buffered entry point will need to extend the stack area for arguments so it can put scalarized arguments on the stack before it jumps to the scalarized entry point. That stack space is in the caller's frame, so that wouldn't play well with stack walking.
Yes; callers allocate stack area for all arguments. The callee is allowed to use this area for any purpose whatever. The callee must not touch any other area of the caller's stack frame, nor may it attempt to deallocate the caller-allocated argument area. The caller will eventually deallocate this area.
Note that callees are free to shuffle data around inside the area allocated by the caller. This means that if the caller somehow "knows" to allocate lots of space for the callee, the callee can use it as scratch. This is the trick that the SPARC ABI uses to support varargs. All callers allocate enough stack memory to store a varargs dump area, even if nobody is using varargs. The cost is tiny: You just have some extra stack area, contiguous with any stacked arguments, to hold the arguments in registers. How much extra area? Six words, since SPARC has six argument registers. In this way, a varargs method can dump all six argument registers to their dump area, and from that point on all arguments are in a linear array on stack (caller-allocated stack!). It's as if the caller passed arguments scalarized in registers, but the callee converted the call to buffered on stack. Not as complex as what we are dealing with for value types, but some interesting parallels IMO.
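For a source-level feel of that linearization, Java's own varargs give a loose analogy (this is the language feature, not the SPARC ABI mechanism): the callee sees one contiguous sequence of arguments no matter how the caller supplied them.

```java
public class VarargsDemo {
    // The callee receives all arguments as a single linear array --
    // loosely analogous to the dump area making register arguments
    // contiguous with stacked arguments.
    static int sum(int... args) {
        int total = 0;
        for (int a : args) total += a;
        return total;
    }

    public static void main(String[] argv) {
        System.out.println(sum(1, 2, 3, 4, 5, 6)); // prints 21
    }
}
```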
I think it would be possible (not necessarily desirable, just brainstorming here) for compiled-code callers which pass buffered value types to also allocate enough outgoing argument space in their stack frames to allow the callee to de-buffer everything. That would give us frameless adapters, wouldn't it?
There would have to be some bookkeeping to remember which items are value types and which aren't, and calling sequences couldn't be invalidated by suddenly loading new value types that were (up until now) just unknown types. But that's not a practical problem in the JIT, I think. Value types are loaded and known, mostly, by the time the JIT sets up calls. There are corner cases where nothing is known; in those cases there should be a slower handshake of some sort which prevents reformatting of arguments. Idea: Just like the Linux ABI passes a vector count in rax (low byte), we could contrive to pass an indication of how prepared the caller is for the callee to unpack the arguments. We would only want to do that for calls which are potentially problematic, not all calls, unless the indication could be smuggled into the code stream of the caller. (SPARC V8 ABI does the code stream trick also, for struct returns, but it's ugly.)
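A crude sketch of that "preparedness" indication (the enum and all names are made up for illustration; the real mechanism would live at the ABI level, in a register or the code stream, not in Java source):

```java
public class PrepFlag {
    // Hypothetical "preparedness" indication, analogous to the varargs
    // vector count the Linux x86-64 ABI passes in the low byte of rax.
    enum CallerPrep { BUFFERED_ONLY, SCALARIZE_OK }

    // The callee unpacks (scalarizes) arguments only when the caller has
    // signaled that it set up enough argument space for that.
    static String enter(CallerPrep prep) {
        return prep == CallerPrep.SCALARIZE_OK ? "scalarized path" : "buffered path";
    }

    public static void main(String[] args) {
        System.out.println(enter(CallerPrep.SCALARIZE_OK));  // scalarized path
        System.out.println(enter(CallerPrep.BUFFERED_ONLY)); // buffered path
    }
}
```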
Tobias suggested 2 entry points where one calls the other: the buffered entry point allocates stack space, shuffles arguments, pushes some on the stack, and calls the scalarized entry point. That would solve the stack space problem but quite likely introduces other challenges (do we emit the call from one entry to the other at compile time or call into the runtime and resolve it?
If everything is in one nmethod, then there's no need for resolution. A call (or jump, if frameless) would transfer from the argument shuffling code to the real entry point. Its offset would be location independent and assigned by the branch resolver in output.cpp.
Is the buffered entry point apparent in C2 IR or is it a custom generated blob of assembly?
IR, I suppose. There's already C2 code for converting between buffered and scalarized views of values.
How does this affect stack walking?
If frameless, not very much. If frame-ful, then there would be repetitions of methods in the walk, unless they were suppressed. To suppress them, we'd want to mark the PC ranges of the argument shuffling code specially, so the stack walker could see when an nmethod was in that state. The stack walker already makes some small distinctions between states in a code blob (the frame pointer is not set up before a particular PC). If the argument shuffling code were put into a distinct nmethod section, distinct from the main code section, then a simple range check could tell the stack walker which state the frame was in.
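The range check itself is trivial; the sketch below uses made-up section boundaries to show the sort of test the stack walker would perform (the layout and names are hypothetical, not HotSpot's actual API):

```java
public class FrameStateCheck {
    // Hypothetical nmethod layout: [shuffleBegin, shuffleEnd) holds the
    // argument-shuffling code in its own section; main code follows.
    static boolean inArgumentShuffle(long pc, long shuffleBegin, long shuffleEnd) {
        return pc >= shuffleBegin && pc < shuffleEnd;
    }

    public static void main(String[] args) {
        long begin = 0x1000, end = 0x1040; // made-up section bounds
        // A frame whose PC falls in the shuffle section gets suppressed
        // from the walk; any other PC is reported normally.
        System.out.println(inArgumentShuffle(0x1010, begin, end)); // true
        System.out.println(inArgumentShuffle(0x2000, begin, end)); // false
    }
}
```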
Do we want to filter one of the activations of method m() from the stack traces that are reported on exceptions, etc.?)
Yes.
Or we compile 2 separate methods, which sounds like a waste of resources: it requires runtime code to keep track of 2 separate nmethods for 1 method, runtime logic for dispatching, and a compilation policy change to trigger compilation of either one of the nmethods.
That sounds less desirable. Although we take this ugly path with OSR method versions.
If we go with the 2 entry point solution, then null values are never allowed in compiled code.
To me that is a feature not a bug!
I'm not closing the door to nullable value types, but I am saying that we want each method to know clearly, from local information, which value types are nullable, and to expect that this occurs (for now) only in legacy code. In new code, value types are never nullable (in today's designs; the future can wait). What I particularly want to avoid is a thought process like "nullability isn't important, because legacy code might throw us a null, and we might want to do something with it, so all value types are mostly-not-null-but-maybe-sometimes".
Most VT methods will be new code, and null has no overlap with values in such code. If a legacy method passes a null to new code expecting a value, there must be an NPE. If we try to work around the null, instead of throwing, we are hurting the optimization of 99.9% of all future value type code, on the grounds that legacy code must be given permission to infect arbitrary new code with null values.
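The intended behavior at a legacy-to-new boundary might be sketched like this (MyValue and the explicit check are stand-ins; with real value types the check would be implicit in the calling convention):

```java
public class NullGate {
    // Stand-in for a value type; real value types would get this check
    // implicitly at the calling-convention boundary.
    static final class MyValue {
        final int x;
        MyValue(int x) { this.x = x; }
    }

    // New code expecting a value throws NPE immediately on a legacy null,
    // rather than letting the null propagate inward.
    static int use(MyValue v) {
        if (v == null) throw new NullPointerException("null passed for value type");
        return v.x * 2;
    }

    public static void main(String[] args) {
        System.out.println(use(new MyValue(21))); // prints 42
        try {
            use(null);
        } catch (NullPointerException e) {
            System.out.println("NPE as expected");
        }
    }
}
```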
So, let's stay with one view on nulls per method, not two.
With the 2-nmethods solution, the buffered nmethod could support running with null values at full speed (i.e. null values would not have to be gated to keep them from entering the method). And it doesn't matter whether m() is legacy or not. If it's legacy and nulls are passed around, then only the buffered nmethod would ever be compiled and executed. If it's legacy and nulls are not passed, then only the scalarized nmethod would ever be compiled and executed.
That's a nice story, except for my objection above. You sketch four states: (nullable VT, clean VT) x (legacy, new). I want to reject the (nullable VT, new) combination totally. If in the future we do nullable value types, the fourth state might be useful, but in that case the user is writing nulls on purpose (instead of legacy code playing badly by accident), and it seems likely that other techniques will apply in that world.
— John