StringBuilder version of java.util.regex.Matcher.append* (original) (raw)
Jeremy Manson jeremymanson at google.com
Tue Mar 25 21:07:59 UTC 2014
- Previous message: StringBuilder version of java.util.regex.Matcher.append*
- Next message: Review request for 8035807: Convert use of sun.misc.BASE64Encoder/Decoder with java.util.Base64
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Okay. Thanks, Sherman. Here's an updated version.
I've diverged a bit from Peter's version. In this version, appendExpandedReplacement takes a StringBuilder. The implications is that In the StringBuilder case, it saves creating a new StringBuilder object. In the StringBuffer case, it creates a new StringBuilder, but it doesn't have to acquire and release all of those locks.
I also noticed a redundant cast to (int), which I removed.
Jeremy
On Fri, Mar 21, 2014 at 7:34 PM, Xueming Shen <xueming.shen at oracle.com>wrote:
let's add the StringBuilder method(s), if you can provide an updated version, I can run the rest (since it's to add new api, there is an internal CCC process need to go through).
-Sherman
On 3/21/14 5:18 PM, Jeremy Manson wrote: So, this is all a little opaque to me. How do we make the go/no-go decision on something like this? Everyone who has chimed in seems to think it is a good idea. Jeremy On Thu, Mar 20, 2014 at 10:38 AM, Jeremy Manson <jeremymanson at google.com>wrote: Sherman,
If you had released it then (which you wouldn't have been able to do, because you would have to wait another two years for Java 7), you would have found that it improved performance even with C2. It is only post-escape-analysis that the performance in C2 equalized. Anyway, I think adding the StringBuilder variant and deferring / dealing with the Appendable differently is the right approach, FWIW. Jeremy
On Thu, Mar 20, 2014 at 10:25 AM, Xueming Shen <xueming.shen at oracle.com>wrote: 2009? I do have something similar back to 2009 :-) http://cr.openjdk.java.net/~sherman/regexreplace/webrev/ Then the ball was dropped around the discussion of whether or not the IOE should be thrown. But if we are going to/have to have explicit StringBuilder/Buffer pair anyway, then we can keep the Appendable version as private for now and deal with the StringBuilder and Appendable as two separate issues. -Sherman
On 03/20/2014 09:52 AM, Jeremy Manson wrote: That's definitely an improvement - I think that when I wrote this (circa 2009), I didn't think about Appendable. I take it my argument convinced someone? :) Jeremy
On Thu, Mar 20, 2014 at 1:32 AM, Peter Levart<peter.levart at gmail.com_ _>wrote: On 03/19/2014 06:51 PM, Jeremy Manson wrote: I'm told that the diff didn't make it. I've put it in a Google drive folder... https://drive.google.com/file/d/0BGaXa6O4K5LY3Y0aHpranM3aEU/ edit?usp=sharing Jeremy Hi Jeremy, Your factoring-out of expandReplacement() method exposed an opportunity to further optimize the code. Instead of creating intermediate StringBuilder instance for each expandReplacement() call, this method could append directly to resulting StringBuffer/StringBuilder, like in the following: http://cr.openjdk.java.net/~plevart/jdk9-dev/MatcherWithStringBuilder/ webrev.01/
Regards, Peter
On Wed, Mar 19, 2014 at 10:41 AM, Jeremy Manson<_ _jeremymanson at google.com> wrote: Hi folks, We've had this internally for a while, and I keep meaning to bring it up here. The Matcher class has a few public methods that take StringBuffers, and we've found it useful to add similar versions that take StringBuilders. It has two benefits: - Users don't have to convert from one to the other when they want to use the method in question. The symmetry is nice. - The StringBuilder variants are faster (if lock optimizations don't kick in, which happens in the interpreter and the client compiler). For interpreted / client-compiled code, we saw something like a 25% speedup on String.replaceAll(), which calls into this code. Any interest? Diff attached. Jeremy
-------------- next part -------------- diff --git a/src/share/classes/java/util/regex/Matcher.java b/src/share/classes/java/util/regex/Matcher.java --- a/src/share/classes/java/util/regex/Matcher.java +++ b/src/share/classes/java/util/regex/Matcher.java @@ -65,7 +65,8 @@
- new strings whose contents can, if desired, be computed from the match
- result. The {@link #appendReplacement appendReplacement} and {@link
- #appendTail appendTail} methods can be used in tandem in order to collect
- the result into an existing string buffer, or the more convenient {@link
- the result into an existing string buffer or
- string builder. Alternatively, the more convenient {@link
- #replaceAll replaceAll} method can be used to create a string in which every
- matching subsequence in the input sequence is replaced.
- @@ -792,14 +793,118 @@
that does not exist in the pattern
*/ public Matcher appendReplacement(StringBuffer sb, String replacement) {
// If no match, return error if (first < 0) throw new IllegalStateException("No match available");
// Process substitution string to replace group references with groups
StringBuilder result = new StringBuilder();
appendExpandedReplacement(replacement, result);
// Append the intervening text
sb.append(text, lastAppendPosition, first);
// Append the match substitution
sb.append(result);
lastAppendPosition = last;
return this;
- }
- /**
* Implements a non-terminal append-and-replace step.
*
* <p> This method performs the following actions: </p>
*
* <ol>
*
* <li><p> It reads characters from the input sequence, starting at the
* append position, and appends them to the given string builder. It
* stops after reading the last character preceding the previous match,
* that is, the character at index {@link
* #start()} <tt>-</tt> <tt>1</tt>. </p></li>
*
* <li><p> It appends the given replacement string to the string builder.
* </p></li>
*
* <li><p> It sets the append position of this matcher to the index of
* the last character matched, plus one, that is, to {@link #end()}.
* </p></li>
*
* </ol>
*
* <p> The replacement string may contain references to subsequences
* captured during the previous match: Each occurrence of
* <tt>$</tt><i>g</i><tt></tt> will be replaced by the result of
* evaluating {@link #group(int) group}<tt>(</tt><i>g</i><tt>)</tt>.
* The first number after the <tt>$</tt> is always treated as part of
* the group reference. Subsequent numbers are incorporated into g if
* they would form a legal group reference. Only the numerals '0'
* through '9' are considered as potential components of the group
* reference. If the second group matched the string <tt>"foo"</tt>, for
* example, then passing the replacement string <tt>"$2bar"</tt> would
* cause <tt>"foobar"</tt> to be appended to the string builder. A dollar
* sign (<tt>$</tt>) may be included as a literal in the replacement
* string by preceding it with a backslash (<tt>\$</tt>).
*
* <p> Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
* the replacement string may cause the results to be different than if it
* were being treated as a literal replacement string. Dollar signs may be
* treated as references to captured subsequences as described above, and
* backslashes are used to escape literal characters in the replacement
* string.
*
* <p> This method is intended to be used in a loop together with the
* {@link #appendTail appendTail} and {@link #find find} methods. The
* following code, for example, writes <tt>one dog two dogs in the
* yard</tt> to the standard-output stream: </p>
*
* <blockquote><pre>
* Pattern p = Pattern.compile("cat");
* Matcher m = p.matcher("one cat two cats in the yard");
* StringBuilder sb = new StringBuilder();
* while (m.find()) {
* m.appendReplacement(sb, "dog");
* }
* m.appendTail(sb);
* System.out.println(sb.toString());</pre></blockquote>
*
* @param sb
* The target string builder
*
* @param replacement
* The replacement string
*
* @return This matcher
*
* @throws IllegalStateException
* If no match has yet been attempted,
* or if the previous match operation failed
*
* @throws IndexOutOfBoundsException
* If the replacement string refers to a capturing group
* that does not exist in the pattern
*/
- public Matcher appendReplacement(StringBuilder sb, String replacement) {
// If no match, return error
if (first < 0)
throw new IllegalStateException("No match available");
// Append the intervening text
sb.append(text, lastAppendPosition, first);
// Append the match substitution
appendExpandedReplacement(replacement, sb);
lastAppendPosition = last;
return this;
- }
- /**
* Processes replacement string to replace group references with
* groups.
*/
- private StringBuilder appendExpandedReplacement(
String replacement, StringBuilder result) { int cursor = 0;
StringBuilder result = new StringBuilder(); while (cursor < replacement.length()) { char nextChar = replacement.charAt(cursor);
@@ -852,8 +957,8 @@ cursor++; } else { // The first number is always a group - refNum = (int)nextChar - '0'; - if ((refNum < 0)||(refNum > 9)) + refNum = nextChar - '0'; + if ((refNum < 0) || (refNum > 9)) throw new IllegalArgumentException( "Illegal group reference"); cursor++; @@ -864,7 +969,7 @@ break; } int nextDigit = replacement.charAt(cursor) - '0'; - if ((nextDigit < 0)||(nextDigit > 9)) { // not a number + if ((nextDigit < 0) || (nextDigit > 9)) { // not a number break; } int newRefNum = (refNum * 10) + nextDigit; @@ -884,13 +989,7 @@ cursor++; } } - // Append the intervening text - sb.append(text, lastAppendPosition, first); - // Append the match substitution - sb.append(result);
lastAppendPosition = last;
return this;
return result;
}
/**
@@ -913,6 +1012,25 @@ }
/**
* Implements a terminal append-and-replace step.
*
* <p> This method reads characters from the input sequence, starting at
* the append position, and appends them to the given string builder. It is
* intended to be invoked after one or more invocations of the {@link
* #appendReplacement appendReplacement} method in order to copy the
* remainder of the input sequence. </p>
*
* @param sb
* The target string builder
*
* @return The target string builder
*/
- public StringBuilder appendTail(StringBuilder sb) {
sb.append(text, lastAppendPosition, getTextLength());
return sb;
- }
- /**
- Replaces every subsequence of the input sequence that matches the
- pattern with the given replacement string.
- @@ -950,7 +1068,7 @@ reset(); boolean result = find(); if (result) {
StringBuffer sb = new StringBuffer();
StringBuilder sb = new StringBuilder(); do { appendReplacement(sb, replacement); result = find();
@@ -1000,7 +1118,7 @@ reset(); if (!find()) return text.toString();
StringBuffer sb = new StringBuffer();
StringBuilder sb = new StringBuilder(); appendReplacement(sb, replacement); appendTail(sb); return sb.toString();
- Previous message: StringBuilder version of java.util.regex.Matcher.append*
- Next message: Review request for 8035807: Convert use of sun.misc.BASE64Encoder/Decoder with java.util.Base64
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]