Improving performance and reducing object allocations of java.util.UUID to/from string (original) (raw)

Steven Schlansker stevenschlansker at gmail.com
Wed Jan 9 17:51:47 UTC 2013


Hello again,

I sent this email a week ago and have received no replies. Is there any step I have missed necessary to contribute to the JDK libraries?

I am very interested in making your lives easier, so please let me know if I am in the wrong place or are otherwise misguided.

Thanks! Steven

On Dec 29, 2012, at 9:25 AM, Steven Schlansker <stevenschlansker at gmail.com> wrote:

Hello core-libs-dev,

My company uses UUIDs throughout our software. We recently identified that the java.util.UUID implementations of fromString and toString are horridly inefficient. An incomplete list of the inefficiencies: * fromString uses a regular expression that is not even precompiled * fromString uses a regular expression to parse out the "-" markers, requiring the allocation of many String and String[] objects, when a simple linear scan works just fine * fromString unnecessarily allocates multiple Long objects * toString creates a char[64], a String object which copies that, and a sub-String object for each of the 5 hex digit segments * toString produces a fixed-length result but involves multiple StringBuilder concatenations Everyone that I have shown the relevant code to has been horrified by the complete lack of care towards object allocations. While I am a big fan of the "object allocation is free until otherwise proven" philosophy, we traced multiple production problems down to these hotspots. But instead of just whining about it, I wish to contribute a patch which I believe fixes the problem. This is my first attempt to submit anything to the JDK, so please be nice :-) My initial attempt has yielded micro benchmarks with very favorable outcomes. By taking the appropriate code from java.lang.Long, modifying it to work on a single pre-allocated array+offset rather than returning a String, and scanning for dashes instead of regex splitting I reduced times 3-6x and object allocations to the low single digits. My Google Caliper benchmark is available here, to ensure that I have not committed any benchmark sins: https://gist.github.com/4356621 benchmark instances Bytes ns linear runtime JdkUuidFromString 51.00 1544.0 608.2 ============================== NewUuidFromString 2.00 72.0 179.1 ======== JdkUuidToString 31.00 1720.0 321.4 =============== NewUuidToString 3.00 200.0 51.5 == In particular, the reduction of object allocations from ~1.5KB to 72/200 bytes dramatically reduces GC pressure when you sit in a tight loop deserializing millions of UUIDs from strings. I believe the same patch can (and should?) apply to jdk7u and jdk8, as the code does not seem to have changed. I would appreciate any feedback on the code style, efficiency, or correctness that you can offer. I have run the "java/util/UUID" jtreg suite against this and it passes. We stole the more thorough Apache Harmony tests for this and they pass as well: https://github.com/stevenschlansker/components-ness-core/blob/master/src/test/java/com/nesscomputing/uuid/TestNessUUID.java I have not yet secured a CLA, but my company is in the process of signing one. The rest of the message is a "hg export" of my change set, which is current to the tip of jdk7. Happy holidays, and thank you for your time! Steven Schlansker

# HG changeset patch # User Steven Schlansker <steven at nesscomputing.com> # Date 1356737383 0 # Node ID a812c963b96f08112f6805cbc380b12af196f788 # Parent 9b8c96f96a0f9a5801b55530a387fefbe50482a3 java.util.UUID: improve performance of UUID#toString and UUID#fromString by avoiding many unnecessary object allocations. benchmark instances Bytes ns linear runtime JdkUuidFromString 51.00 1544.0 608.2 ============================== NewUuidFromString 2.00 72.0 179.1 ======== JdkUuidToString 31.00 1720.0 321.4 =============== NewUuidToString 3.00 200.0 51.5 == diff -r 9b8c96f96a0f -r a812c963b96f src/share/classes/java/util/UUID.java --- a/src/share/classes/java/util/UUID.java Mon Jun 27 13:21:34 2011 -0700 +++ b/src/share/classes/java/util/UUID.java Fri Dec 28 23:29:43 2012 +0000 @@ -189,26 +189,79 @@ * described in {@link #toString} * */ - public static UUID fromString(String name) { - String[] components = name.split("-"); - if (components.length != 5) - throw new IllegalArgumentException("Invalid UUID string: "+name); - for (int i=0; i<5; i++)_ _- components[i] = "0x"+components[i];_ _+ public static UUID fromString(String str) {_ _+ int dashCount = 4;_ _+ int [] dashPos = new int [6];_ _+ dashPos[0] = -1;_ _+ dashPos[5] = str.length();_ _- long mostSigBits = Long.decode(components[0]).longValue();_ _+ for (int i = str.length()-1; i >= 0; i--) { + if (str.charAt(i) == '-') { + if (dashCount == 0) { + throw new IllegalArgumentException("Invalid UUID string: " + str); + } + dashPos[dashCount--] = i; + } + } + + if (dashCount > 0) { + throw new IllegalArgumentException("Invalid UUID string: " + str); + } + + long mostSigBits = decode(str, dashPos, 0) & 0xffffffffL; mostSigBits <<= 16;_ _- mostSigBits |= Long.decode(components[1]).longValue();_ _+ mostSigBits |= decode(str, dashPos, 1) & 0xffffL;_ _mostSigBits <<= 16;_ _- mostSigBits |= Long.decode(components[2]).longValue();_ _+ mostSigBits |= decode(str, dashPos, 2) & 0xffff);_ _- long leastSigBits = Long.decode(components[3]).longValue();_ _+ long leastSigBits = decode(str, dashPos, 3) & 0xffffL;_ _leastSigBits <<= 48;_ _- leastSigBits |= Long.decode(components[4]).longValue();_ _+ leastSigBits |= decode(str, dashPos, 4) & 0xffffffffffffL;_ _return new UUID(mostSigBits, leastSigBits);_ _}_ _+ private static long decode(final String str, final int [] dashPos, final int field) {_ _+ int start = dashPos[field]+1;_ _+ int end = dashPos[field+1];_ _+ if (start >= end) { + throw new IllegalArgumentException("Invalid UUID string: " + str); + } + long curr = 0; + for (int i = start; i < end; i++) {_ _+ int x = getNibbleFromChar(str.charAt(i));_ _+ curr <<= 4;_ _+ if (curr < 0) {_ _+ throw new NumberFormatException("long overflow");_ _+ }_ _+ curr |= x;_ _+ }_ _+ return curr;_ _+ }_ _+_ _+ private static final int NUMALPHADIFF = 'A' - '9' - 1;_ _+ private static final int LOWERUPPERDIFF = 'a' - 'A';_ _+_ _+ private static int getNibbleFromChar(final char c)_ _+ {_ _+ int x = c - '0';_ _+ if (x > 9) { + x -= NUMALPHADIFF; // difference between '9' and 'A' + if (x > 15) { + x -= LOWERUPPERDIFF; // difference between 'a' and 'A' + } + if (x < 10) {_ _+ throw new IllegalArgumentException(c + " is not a valid character for an UUID string");_ _+ }_ _+ }_ _+_ _+ if (x < 0 || x > 15) { + throw new IllegalArgumentException(c + " is not a valid character for an UUID string"); + } + + return x; + } + // Field Accessor Methods /** @@ -372,19 +425,46 @@ * @return A string representation of this {@code UUID} */ public String toString() { - return (digits(mostSigBits >> 32, 8) + "-" + - digits(mostSigBits >> 16, 4) + "-" + - digits(mostSigBits, 4) + "-" + - digits(leastSigBits >> 48, 4) + "-" + - digits(leastSigBits, 12)); + return toString(getMostSignificantBits(), getLeastSignificantBits()); } - /** Returns val represented by the specified number of hex digits. */ - private static String digits(long val, int digits) { - long hi = 1L << (digits * 4);_ _- return Long.toHexString(hi | (val & (hi - 1))).substring(1);_ _+ public static String toString(long msb, long lsb) {_ _+ char[] uuidChars = new char[36];_ _+_ _+ hexDigits(uuidChars, 0, 8, msb >> 32); + uuidChars[8] = '-'; + hexDigits(uuidChars, 9, 4, msb >> 16); + uuidChars[13] = '-'; + hexDigits(uuidChars, 14, 4, msb); + uuidChars[18] = '-'; + hexDigits(uuidChars, 19, 4, lsb >> 48); + uuidChars[23] = '-'; + hexDigits(uuidChars, 24, 12, lsb); + + return new String(uuidChars); } + private static void hexDigits(char[] dest, int offset, int digits, long val) { + long hi = 1L << (digits * 4);_ _+ toUnsignedString(dest, offset, digits, hi | (val & (hi - 1)), 4);_ _+ }_ _+_ _+ private final static char[] HEXDIGITS = {_ _+ '0' , '1' , '2' , '3' , '4' , '5' ,_ _+ '6' , '7' , '8' , '9' , 'a' , 'b' ,_ _+ 'c' , 'd' , 'e' , 'f'_ _+ };_ _+_ _+ private static void toUnsignedString(char[] dest, int offset, int len, long i, int shift) {_ _+ int charPos = len;_ _+ int radix = 1 << shift;_ _+ long mask = radix - 1;_ _+ do {_ _+ dest[offset + --charPos] = HEXDIGITS[(int)(i & mask)];_ _+ i >>>= shift; + } while (i != 0 && charPos > 0); + } + /** * Returns a hash code for this {@code UUID}. *



More information about the core-libs-dev mailing list