RFR 8209138: Symbol constructor uses u1 as the element type of its name argument (original) (raw)

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Wed Oct 3 12:38:12 UTC 2018


Harold, this is nicely done!  Thank you for perservering in this cleanup and following through on the issues raised.  I agree not to change more in the SA if it works. Thanks, Coleen

On 10/2/18 4:51 PM, Harold David Seigel wrote:

Hi David,

Thanks for looking at this again. Please see this latest webrev and also in-line comments below: http://cr.openjdk.java.net/~hseigel/bug8209138.4/webrev/

On 10/1/2018 9:45 PM, David Holmes wrote: Hi Harold,

Thanks for persevering with this! Thanks! On 2/10/2018 6:23 AM, Harold David Seigel wrote: Hi Coleen,

Please review this updated webrev: http://cr.openjdk.java.net/~hseigel/bug8209138.3/webrev/index.html It uses u1 for Symbol instead of char.  I also added a "char charat(int index)" function to class Symbol.  It's for callers that want a char from a Symbol instead of a u1. I was in two minds about this as we really are comparing the i'th byte with a given char value. But I suppose if I read charat(i) as byteascharat(i) then it's just a type-conversion operation. Though I am curious what byteat usages remain after this change? There were a few uses of byteat() in heapDumper.cpp, jvmciCompilerToVM.cpp, and jvmtiTagMap.cpp.  I changed these calls to charat() because the function results were being treated as chars.  Since there were no more uses of Symbol::byteat() and ciSymbol::byteat(), I removed them. src/hotspot/share/ci/ciSymbol.hpp Can you humour me and please change this comment:  // Return the i-th char, where i < utf8length_ _to:_  _// Return the i-th utf byte as a char, where i < utf8length_ _Done_ _---_ _src/hotspot/share/oops/symbol.hpp_ _char charat(int index) const {_ _assert(index >=0 && index < length(), "symbol index overflow"); return base()[index]; } I'm surprised this doesn't need a (char) cast to keep the compiler happy. Some compilers may need it I think and it doesn't hurt to add it. I added the cast. --- src/hotspot/share/runtime/vmStructs.cpp There's a comment in: ./jdk.hotspot.agent/share/classes/sun/jvm/hotspot/types/Field.java that needs updating:  FIXME: among other things, this interface is not sufficient to  describe fields which are themselves arrays (like Symbol's  jbyte body[1]). s/jbyte/u1/ Done Also: ./jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Symbol.java accesses the array as jbytes - that may need adjusting. The serviceability agent builds okay and passes all its tests. So, I'd prefer not to open this potential can of worms. ---- Otherwise all seems fine. Thanks! Harold Thanks, David -----

Thanks! Harold

On 9/27/2018 10:03 AM, coleen.phillimore at oracle.com wrote: Hi Harold,  I think we've decided to make u1 the type we carry around in Symbol instead of char, since that's more accurate to represent utf8.  Since the goal is to make the types consistent, can you try to change it all to u1 instead. Keeping the name getbyte() and see how bad the casting situation is? thank you! Coleen

On 9/27/18 9:17 AM, Harold David Seigel wrote: Hi David, Please review this updated webrev: http://cr.openjdk.java.net/~hseigel/bug8209138.2/webrev/index.html It contains additional changes to use char for class Symbol. Hopefully, your concerns w.r.t. signed and unsigned types were resolved by our offline discussions. Thanks, Harold

On 8/21/2018 2:24 AM, David Holmes wrote: On 21/08/2018 3:19 PM, Kim Barrett wrote: On Aug 20, 2018, at 8:53 PM, David Holmes <david.holmes at oracle.com> wrote:

Hi Harold, On 21/08/2018 12:43 AM, Harold David Seigel wrote: Hi, Please review this change for bug JDK-8209138. The fix changes class Symbol in symbol.hpp to use type char instead of types u1 and jbyte and renames relevant functions by replacing 'byte' with 'char'. For example, 'Symbol::byteatput()' is now 'Symbol::charatput()'. It's not clear to me exactly what all these things should be. In a lot of places we seem to be dealing with UTF8 character representations, which to me should be u1 (afterall that is how they are specified in the classfile format!) which is just an unsigned "byte". But char is signed (in our build). Not sure what’s meant by “in our build” but char is unsigned when building HotSpot for aarch64 / arm. See, for example, JDK-8209586. Ah! We have always set -fsigned-char for the Oracle ARM and ARM64 builds (and previously PPC32), but now I see that, as you say, Aarch64 does not set this (nor PPC64). Interesting and unfortunate. I'm somewhat surprised that we haven't encountered more issues due to this. In any case we are still converting from an unsigned type to a signed type in many cases with this change. David -----

The current change just seems to push out the boundary where we convert between things. For example ciSymbol now has a char base()** but still has a byteat() method - should that not know be charat() for consistency? Especially given byteat() returns getsymbol()->char-at() ** Granted the existing jbyte seems the wrong choice too. I think we have a lot of type confusion in our code, but it isn't clear to me that this particular change is necessarily changing things in the right direction. Cheers, David

Open Webrev: http://cr.openjdk.java.net/~hseigel/bug8209138/webrev/index.html JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8209138 The change was tested by running Mach5 tiers 1 and 2 tests and builds on Linux-x64, Windows, and Mac OS X, running tiers 3-5 tests on Linux-x64, and by running JCK-11 API, Lang and VM tests on Linux-x64. Thanks, Harold



More information about the hotspot-runtime-dev mailing list