RFR: 8007395 StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters
Xueming Shen xueming.shen at oracle.com
Fri Apr 26 17:25:13 UTC 2013
Hi
Please help review the proposed fix for
8007395: StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters
http://cr.openjdk.java.net/~sherman/8007395/webrev
The root cause is that the "iterative optimization" class GroupCurly fails to backtrack correctly when matching/finding fails, if the matches from the previous iterations have different sizes (for example, a CharProperty regex construct that can match both BMP and non-BMP characters, as in this case). The existing implementation does have a mechanism to deal with "different-sized" matching results per iteration (see ln#4451): it "recursively" enters a new layer of match0. However, when the "next" match fails, it incorrectly uses the latest matched size to backtrack all the way back to "cmin". So in this case it backs off by two chars per iteration all the way back to "cmin", when in fact it should back off by 2 only for the last surrogate pair and then by 1 for the rest. Each match0() really should only backtrack to its starting iteration count, and leave the rest to its "invoker". The fix is an easy two-line change that makes sure backtracking backs off with the appropriate matching size.
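For reviewers who want to see the shape of the problem, here is a minimal sketch, not taken from the webrev. The class name, the helper names, the pattern and the input string are all illustrative assumptions. The pattern is assumed to be compiled into a GroupCurly node: a quantified capturing group around a single-character construct, followed by something that cannot match, so the loop has to back off. The input ends in a surrogate pair, so the last iteration consumes 2 chars while the earlier ones consume 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SurrogateBacktrackDemo {

    // Returns true if the regex is found anywhere in the input.
    // On an affected JDK, find() is where the exception would surface.
    static boolean findsMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find();
    }

    // Number of UTF-16 chars each code point occupies: 1 for BMP,
    // 2 for a supplementary character (surrogate pair).
    static int[] charWidths(String s) {
        return s.codePoints().map(Character::charCount).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical input: BMP text followed by one non-BMP character.
        String input = "test this as \uD83D\uDE0D";

        // Hypothetical pattern: "(.)+" must give characters back one
        // iteration at a time because "@..." never matches this input.
        String regex = "test(.)+(@[a-zA-Z.]+)";

        // The per-iteration sizes differ, which is what the backoff
        // has to account for.
        int[] widths = charWidths(input);
        System.out.println("last width = " + widths[widths.length - 1]);

        // With correct backtracking this simply reports no match.
        System.out.println("found = " + findsMatch(regex, input));
    }
}
```

With a correct backoff the search should just report that nothing was found. Backing off by a fixed 2 chars for every iteration would step the index past the start of the region matched by the group, which is consistent with the exception in the bug title.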
Thanks, -Sherman