RFR: 8013395 StringBuffer.toString performance regression impacting embedded benchmarks (original) (raw)

David Holmes david.holmes at oracle.com
Fri May 10 06:47:42 UTC 2013


Hi David,

On 10/05/2013 4:39 PM, David Schlosnagle wrote:

One other quick comment, if the toStringCache is non-null during invocation of toString(), that seems to imply that the StringBuffer/StringBuilder has not been mutated since the last invocation of toString(), is there any reason to still use the String copy constructor? This would address the http://bugs.sun.com/bugdatabase/viewbug.do?bugid=6239985 portion of http://bugs.sun.com/bugdatabase/viewbug.do?bugid=6219959.

For example, I'd envisage in AbstractStringBuilder (assuming previous comments of pushing toStringCache from StringBuffer to AbstractStringBuilder): @Override public String toString() { if (toStringCache == null) { toStringCache = new String(value, 0, count); } return toStringCache; }

Right this was our first thought too, but the specification for toString states that a new String is created. So to go this path we'd also have to push through a spec change for StringBuilder/StringBuffer. Given this is an obscure corner case I wanted to go the most minimal route possible. The copy-constructor doesn't copy the array which is what 6239985 was complaining about.

Thanks, David

- Dave

On Fri, May 10, 2013 at 2:25 AM, David Schlosnagle <schlosna at gmail.com_ _<mailto:schlosna at gmail.com>> wrote: Hi David, Would it be beneficial to push the toStringCache up to AbstractStringBuilder so that StringBuilder.toString() benefits from this cache as well? It seems like this would affect both StringBuilder and StringBuffer for repeated calls to toString(), although StringBuffer would obviously have the synchronization overhead as well (assuming the locking is not elided away by HotSpot). Thanks, Dave On Fri, May 10, 2013 at 2:03 AM, David Holmes <david.holmes at oracle.com <mailto:david.holmes at oracle.com>> wrote: Short version: Cache the value returned by toString and use it to copy-construct a new String on subsequent calls to toString(). Clear the cache on any mutating operation. webrev: http://cr.openjdk.java.net/~_dholmes/8013395/webrev.v2/ <http://cr.openjdk.java.net/~dholmes/8013395/webrev.v2/> Testing: microbenchmark for toString performance; new regression test for correctness; JPRT testset core as a sanity check Still TBD - full SE benchmark (?) Thanks, David --------- Long version: One of the goals for JDK8 is to provide a path from Java ME CDC to Java SE (or SE Embedded). In the embedded space some pretty old benchmarks still get used for doing comparisons between JRE's. One of which makes heavy use of StringBuffer.toString, without modifying the StringBuffer in between. Up to Java 1.4.2 a StringBuffer and a String could share the underlying char[]. This meant that toString simply needed to create a new String that referenced the StringBuffer's char[] with no copying of the array needed. In Java 5 the String/StringBuffer implementations were completely revised: StringBuilder was introduced for non-synchronized use, and a new AbstractStringBuilder base class added for it and StringBuffer. In that implementation toString now has to copy the StringBuffer's char[]. This resulted in a significant performance regression for toString() and a bug - 6219959 - was opened. There is quite an elaborate evaluation in that bug report but bottom line was that "real code doesn't depend on this - won't fix". At some stage ME also updated to the new Java 5 code and they also noticed the problem. As a result CDC6 included a variation of the caching strategy that is proposed here. Going forward because we want people to be able to compare ME and SE with their familiar benchmarks, we would like to address this corner case and fix it using the caching strategy outlined. As a data point an 8K StringBuffer that takes ~1ms to be converted to a String initially, can process subsequent toString() calls in a few microseconds. So that performance issue is addressed. However we've added a write to a field in all the mutating methods which obviously adds some additional computational effort - though I have no doubt it is lost in the noise for all but the smallest of mutating methods. Even so this should be run against regular SE benchmarks to ensure there are no performance regressions there - so if anyone has a suggestion as to the best benchmark to run to exercise StringBuffer (if it exists), please let me know. Thanks for reading this far :)



More information about the core-libs-dev mailing list