RFR: 8007395 StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters

Xueming Shen xueming.shen at oracle.com
Fri Apr 26 17:25:13 UTC 2013


Hi

Please help review the proposed fix for

8007395: StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters

http://cr.openjdk.java.net/~sherman/8007395/webrev

The root cause is that the "iterative optimization" class GroupCurly fails to backtrack correctly when matching/finding fails, if the previously matched iterations have different sizes (for example, a CharProperty regex construct that can match both BMP and non-BMP characters, as in this case). The existing implementation does have a mechanism to deal with "different sized" matching results for each iteration (see ln#4451): it "recursively" enters a new layer of match0. However, it incorrectly uses the latest matched size to backtrack all the way back to "cmin" when the "next" match fails. So in this case it backtracks by two chars all the way back to "cmin", when in fact it should back off by 2 only for the last surrogate pair and then by 1 for the rest. Each match0() really should only backtrack to its starting iteration count, and leave the rest to its "invoker". The fix is an easy two-line change that makes sure backtracking backs off with the appropriate matching size.
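To make the arithmetic concrete, here is a small toy model of the two back-off strategies. It is only an illustration, not the GroupCurly code from the webrev: the class name, the helper methods, the sample input and the regexes are all assumptions made up for this sketch, and the regexes have not been confirmed to reproduce the reported exception on an affected JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of java.util.regex.
public class SurrogateBacktrackSketch {

    // U+1D400 lies outside the BMP, so it occupies two chars (a surrogate pair).
    // Three BMP chars followed by one supplementary code point: 5 chars total.
    static final String INPUT = "abc" + new String(Character.toChars(0x1D400));

    // Width in chars of each code point: 1 for BMP, 2 for supplementary.
    static int[] widths(String s) {
        return s.codePoints().map(Character::charCount).toArray();
    }

    // Toy model of the faulty back-off: every step reuses the width of the
    // LAST matched iteration, regardless of what earlier iterations consumed.
    static List<Integer> naiveBackoff(int[] widths) {
        List<Integer> positions = new ArrayList<>();
        if (widths.length == 0) {
            return positions;
        }
        int pos = 0;
        for (int w : widths) {
            pos += w;
        }
        int last = widths[widths.length - 1];
        for (int i = 0; i < widths.length; i++) {
            pos -= last;
            positions.add(pos);
        }
        return positions;
    }

    // Toy model of the intended back-off: undo each iteration by its own width.
    static List<Integer> perIterationBackoff(int[] widths) {
        List<Integer> positions = new ArrayList<>();
        int pos = 0;
        for (int w : widths) {
            pos += w;
        }
        for (int i = widths.length - 1; i >= 0; i--) {
            pos -= widths[i];
            positions.add(pos);
        }
        return positions;
    }

    // Thin wrapper around the real regex API, used to probe the engine.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        int[] w = widths(INPUT);
        System.out.println("naive:         " + naiveBackoff(w));
        System.out.println("per-iteration: " + perIterationBackoff(w));
        // Assumed probe patterns: a greedy group over a one-code-point
        // construct, followed by something that forces backtracking.
        System.out.println(finds("(.)*c", INPUT));
        System.out.println(finds("(.)*z", INPUT));
    }
}
```

In this model the naive strategy steps to negative indices after two back-offs, which is the kind of out-of-range position that would surface as a StringIndexOutOfBoundsException, while the per-iteration strategy lands exactly on code point boundaries back to 0. On a JDK that contains the fix, I would expect the two probes to simply report a match and a non-match respectively.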

Thanks, -Sherman



More information about the core-libs-dev mailing list