Optimizing UUID#fromString(String)

Jon Chambers jon.chambers at gmail.com
Sat Jan 27 16:05:09 UTC 2018

Previous message: JDK-8196298: Add null Reader and Writer
Next message: Optimizing UUID#fromString(String)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hello!

I've recently had reason to take a look at performance around parsing and stringifying UUIDs. In exploring the space, I identified some opportunities to optimize the implementation of UUID#fromString as it currently exists ( http://hg.openjdk.java.net/jdk/jdk/file/fd237da7a113/src/java.base/share/classes/java/util/UUID.java#l196 ).

Because UUID strings are of a known structure and length (32 hexadecimal digits and four dashes) and because UUIDs are exactly 128 bits in length, we know exactly how each character in a UUID string maps to bits in the parsed UUID. We always know, for example, that the first character in a UUID string maps to the four highest bits in the UUID, the second character maps to the four bits below that, and so on.
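The fixed mapping from character position to bit position can be sketched as follows. This is a minimal illustration, not the linked FastUUID code: the class name `FixedLayoutUuidSketch` and the helper `hexValue` are hypothetical, and a simple loop is used for clarity rather than speed. It assumes the strict 8-4-4-4-12 layout and rejects anything else.

```java
import java.util.UUID;

// Hypothetical sketch: parses only the strict 36-character 8-4-4-4-12 form.
final class FixedLayoutUuidSketch {

    // Hypothetical helper: maps one ASCII hex character to its 4-bit value.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("Not a hex digit: " + c);
    }

    static UUID parse(String s) {
        if (s.length() != 36
                || s.charAt(8) != '-' || s.charAt(13) != '-'
                || s.charAt(18) != '-' || s.charAt(23) != '-') {
            throw new IllegalArgumentException("Invalid UUID string: " + s);
        }

        long mostSigBits = 0;
        long leastSigBits = 0;

        for (int i = 0; i < 36; i++) {
            if (i == 8 || i == 13 || i == 18 || i == 23) {
                continue; // dash positions carry no bits
            }
            int nibble = hexValue(s.charAt(i));
            if (i < 18) {
                // Characters 0-7, 9-12 and 14-17 fill the high 64 bits,
                // highest nibble first.
                mostSigBits = (mostSigBits << 4) | nibble;
            } else {
                // Characters 19-22 and 24-35 fill the low 64 bits.
                leastSigBits = (leastSigBits << 4) | nibble;
            }
        }
        return new UUID(mostSigBits, leastSigBits);
    }
}
```

Because every position is known in advance, the loop never has to search for delimiters or allocate substrings; a tuned version could unroll it entirely.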

With that knowledge, we can cut out a lot of the generality and bounds-checking we'd normally expect of a string-to-number parser. I've built an implementation with that in mind ( https://github.com/jchambers/fast-uuid/blob/master/src/main/java/com/eatthepath/uuid/FastUUID.java#L108 ). In benchmarks ( https://github.com/jchambers/fast-uuid/blob/master/benchmark/src/main/java/com/eatthepath/UUIDBenchmark.java#L55-L63 ), this implementation is about six times faster than the current JDK implementation (9.0.4+11) and 14 times faster than the implementation in 1.8.

The experimental implementation is stricter about UUID format (the current JDK implementation allows for variable-length blocks of hex digits between dashes while the experimental one doesn't), and I'll defer to you folks as to whether its handling of technically malformed UUID strings is acceptable. As discussed via Twitter ( https://twitter.com/cl4es/status/956308599277486080 ), we might consider using the fixed-length parsing approach if we know the UUID string is exactly 36 characters long and fall back to the looser parser otherwise. I also recognize that this is partially reinventing the wheel when it comes to parsing hex strings, and the tradeoff between consistency and performance is certainly worthy of consideration.
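One way the "fast path with fallback" idea could look is sketched below. The names `HybridUuidSketch`, `tryParseStrict` and `NIBBLES` are hypothetical, and the choice to fall back on any fast-path mismatch (not only on a length other than 36) is an assumption made here so that the looser parser still decides what is accepted or rejected.

```java
import java.util.Arrays;
import java.util.UUID;

// Hypothetical sketch: strict fast path, lenient JDK parser as fallback.
final class HybridUuidSketch {

    // Lookup table from ASCII code to hex value; -1 marks non-hex characters.
    private static final byte[] NIBBLES = new byte[128];

    static {
        Arrays.fill(NIBBLES, (byte) -1);
        for (int i = 0; i < 10; i++) {
            NIBBLES['0' + i] = (byte) i;
        }
        for (int i = 0; i < 6; i++) {
            NIBBLES['a' + i] = (byte) (10 + i);
            NIBBLES['A' + i] = (byte) (10 + i);
        }
    }

    // Returns null (rather than throwing) when the input is not in the
    // strict 8-4-4-4-12 form, so the caller can try the lenient parser.
    private static UUID tryParseStrict(String s) {
        if (s.length() != 36) {
            return null;
        }
        long mostSigBits = 0;
        long leastSigBits = 0;
        for (int i = 0; i < 36; i++) {
            char c = s.charAt(i);
            if (i == 8 || i == 13 || i == 18 || i == 23) {
                if (c != '-') {
                    return null;
                }
                continue;
            }
            int nibble = c < 128 ? NIBBLES[c] : -1;
            if (nibble < 0) {
                return null;
            }
            if (i < 18) {
                mostSigBits = (mostSigBits << 4) | nibble;
            } else {
                leastSigBits = (leastSigBits << 4) | nibble;
            }
        }
        return new UUID(mostSigBits, leastSigBits);
    }

    static UUID fromString(String s) {
        UUID fast = tryParseStrict(s);
        // Anything the strict path cannot handle is left to the JDK parser,
        // which keeps its existing accept/reject behavior.
        return fast != null ? fast : UUID.fromString(s);
    }
}
```

With this shape, well-formed 36-character strings never reach the general parser, while inputs such as "1-2-3-4-5" are handled exactly as before.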

Regardless, I wanted to call this optimization opportunity to your attention, and would be happy to offer a proper patch if this seems like a worthwhile change.

Cheers, and thank you for your consideration!

-Jon


More information about the core-libs-dev mailing list