String.contains(CharSequence) calls toString on argument (original) (raw)
Ulf Zibis Ulf.Zibis at CoSoCo.de
Thu May 28 10:35:41 UTC 2015
- Previous message: String.contains(CharSequence) calls toString on argument
- Next message: String.contains(CharSequence) calls toString on argument
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,
I more would like:
2199 return (s instanceof String) 2200 ? indexOf((String) s) >= 0; 2201 : indexOf(value, 0, value.length, s, 0, s.length(), 0) >= 0;
or:
2199 return (s instanceof String 2200 ? indexOf((String) s) 2201 : indexOf(value, 0, value.length, s, 0, s.length(), 0)) 2202 >= 0;
or:
2199 int index = (s instanceof String) 2200 ? indexOf((String) s) 2201 : indexOf(value, 0, value.length, s, 0, s.length(), 0); 2202return index >= 0;
-Ulf
Am 28.05.2015 um 12:06 schrieb Tomasz Kowalczewski:
Hello all,
encouraged by your responses I did a simple implementation and performed some JMH experiments. Details below. Webrev [1] contains my changes. I deliberately wanted to do minimal code changes. The gist of it is implementing indexOf that works on CharSequence instead of String's internal char array. Both versions are almost identical (I did not change existing implementation in place to not impact all other String-only paths that use it). In my first attempt I just inlined (and simplified) indexOf implementation into String::contains, but in the end decided to create general purpose (package private) version. This might prove useful for others[2] JMH test project is here [3]. All benchmarks were run using "java -jar benchmarks.jar -wi 10 -i 10 -f -prof gc" on Amazon EC2 instance (c4.xlarge, 4 virtual cores). I have run it against three java builds: a. stock java 8u40 b. my own build from clean jdk9 sources c. my own build from modified jdk9 sources Results with gc profiler enabled are included in benchmark repo. This gist contains summary results [4]. I know that this test covers only very narrow space of possible performance regressions (ideas form more comprehensive tests are welcome). The results show that String only path is not impacted by my changes. When using actual CharSequence (StringBuilder in this case) we are mostly winning, exception is case when CharSequence is of medium size and has many partial matches in searched string. I hope this exercise will be useful to someone and that this change might be considered for inclusion in OpenJDK. If not then maybe at least tests for String::indexOf? [1] http://openjdk.s3-website-us-east-1.amazonaws.com/ [2] Changeset from http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-May/033678.html uses indexOf that could accept CharSequence instead of String. [3] https://github.com/tkowalcz/openjdk-jmh [4] https://gist.github.com/tkowalcz/d8f5b9435e184f65fd2a Regards, Tomasz Kowalczewski On Sat, Mar 21, 2015 at 9:21 AM, Tomasz Kowalczewski <_ _tomasz.kowalczewski at gmail.com> wrote:
There are other places which accept CharSequence and iterate over it not caring about possible concurrent modification (the only sane way IMHO). Examples are String.contentEquals and String.join that uses StringJoiner which iterates its argument char by char.
It looks like contains is the exception in this regard. On 20 mar 2015, at 17:05, Vitaly Davidovich <vitalyd at gmail.com> wrote:
If CharSequence is mutable and thread-safe, I would expect toString() implementation to provide the atomic snapshot (e.g. do the synchronization). Therefore, I find Sherman's argument interesting. That's my point about "it ought to be up to the caller" -- they're in a position to make this call, but String.contains() is playing defensive because it has no clue of the context. On Fri, Mar 20, 2015 at 11:55 AM, Aleksey Shipilev <_ _aleksey.shipilev at oracle.com> wrote:
If CharSequence is mutable and thread-safe, I would expect toString() implementation to provide the atomic snapshot (e.g. do the synchronization). Therefore, I find Sherman's argument interesting.
On the other hand, we don't seem to be having any details/requirements for contains(CharSequence) w.r.t. it's own argument. What if String.contains pulls the values one charAt at a time? The concurrency requirements are not enforceable in CharSequence alone then, unless you call the supposed-to-be-atomic toString() first. -Aleksey.
On 03/20/2015 06:48 PM, Vitaly Davidovich wrote: Yes, but that ought to be for the caller to decide. Also, although the resulting String is immutable, toString() itself may observe mutation.
On Fri, Mar 20, 2015 at 11:40 AM, Xueming Shen <_ _xueming.shen at oracle.com> wrote:
On 03/20/2015 02:34 AM, Tomasz Kowalczewski wrote:
Hello! Current implementation of String.contains that accepts CharSequence calls toString on it and passes resulting string to indexOf(String). This IMO defeats the purpose of using CharSequences (that is to have a mutable character buffer and not allocate unnecessary objects). It is arguable that cs.toString() may serve the purpose of taking a snapshot of an otherwise "mutable" character buffer? -Sherman
Is changing this a desirable development? It seems pretty straightforward to port indexOf(String) to use CharSequence. If all you need is patch then I can work on it (I have signed OCA) just wanted to make sure it is not a futile work.
- Previous message: String.contains(CharSequence) calls toString on argument
- Next message: String.contains(CharSequence) calls toString on argument
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]