8058779: Faster implementation of String.replace(CharSequence, CharSequence) (original) (raw)
Ivan Gerasimov ivan.gerasimov at oracle.com
Wed May 27 14:27:28 UTC 2015
- Previous message: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
- Next message: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Thank you Peter for looking at this!
On 27.05.2015 16:15, Peter Levart wrote:
Hi Ivan,
How does this implementation compare to your's two regarding speed: http://cr.openjdk.java.net/~plevart/jdk9-dev/String.replace/webrev.01/
You need to check for possible overflow here: 2287 resLen += repLen + ss.length();
It is not much shorter than your's first, but more object oriented ;-) Well, the main difference is that you store the positions of substrings in a linked-list instead of an array. But arrays are more efficient.
On small data the performance is almost the same as for the 'array' version: MyBenchmark.test thrpt 40 1'046'137.437 ± 13961.121 ops/s
But on huge data the performance is much worse, comparing to even current implementation. Here are the results of running the stress test, I had posted earlier:
time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)
- 100663296
real 0m12.945s user 0m15.670s sys 0m7.189s
It took 6 times more time to complete, comparing to the current implementation, which uses regexp engine.
Sincerely yours, Ivan
Regards, Peter
On 05/27/2015 11:36 AM, Ivan Gerasimov wrote: Hi Sherman!
Please take a look at my other webrev, that I provided in the first message. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ I used StringJoiner there, which in some aspects seems to fit better here, comparing to StringBuilder. It does reduce the code, but of course runs slower. In that version every part of the source string had to be converted to a separate String, which add another redundant copying. I still prefer the version, which deals with arrays. I don't think it's too complex. It surely adds some complexity to the original code, but it's only 70 lines of code in total. Everything else are comments and empty lines. Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan,
It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/
In the check at 2300-2301 and 2351-2352 I replaced MAXARRAYSIZE with Integer.MAXVALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. ---------------------------------------- public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAXVALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } ---------------------------------------- $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real 0m4.525s user 0m4.402s sys 0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim201505231925-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim201505231925-b00, mixed mode) 26) 201326592 real 0m2.139s user 0m1.960s sys 0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody!
I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/
Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan
- Previous message: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
- Next message: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]