RFR JDK-8013254: Constructor \w need update to add the support of \p{Join_Control}
Xueming Shen xueming.shen at oracle.com
Tue Apr 30 21:01:12 UTC 2013
My apologies; the webrev is at
http://cr.openjdk.java.net/~sherman/8013254/webrev/
-Sherman
On 04/30/2013 10:01 AM, Xueming Shen wrote:
Hi,
It appears we dropped the ball on U+200C and U+200D when we updated the "simple word boundaries" support back in JDK 7 [1]. You can find most of the related discussion here [2]. These two code points were listed among the issues we were trying to fix, but the final doc and implementation obviously don't address them. This is mainly because \p{Join_Control} was not explicitly listed in the "compatibility" section of the earlier version of TR#18 in use back then [3], even though the two code points are explicitly mentioned in section RL1.4 Simple Word Boundaries [4]. \p{Join_Control} (U+200C and U+200D) has since been added to the "compatibility" section in the latest version of TR#18 [5].

The proposed change here is to
(1) add these two code points back to the collection of \w,
(2) list them explicitly in the \w definition as \p{Join_Control}, and
(3) list Join_Control as one of the supported binary properties.

http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000381.html

The webrev for RegExTest.java above includes the change for 8013252, which is being reviewed as well; for convenience I did not separate them out. The regression/unit tests may not be that "direct", so here is a direct version to verify the fix:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Matcher wordU = Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS).matcher("");
    // With the fix, both lines print true
    System.out.println(wordU.reset("\u200c").find());
    System.out.println(wordU.reset("\u200d").find());

thanks
-Sherman

[1] http://ccc.us.oracle.com/7039066
[2] http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000381.html
[3] http://www.unicode.org/reports/tr18/tr18-13.html#CompatibilityProperties
[4] http://www.unicode.org/reports/tr18/tr18-13.html#SimpleWordBoundaries
[5] http://www.unicode.org/reports/tr18/#CompatibilityProperties
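A slightly broader check than the direct test in the message, offered as a sketch and not as part of the webrev: it compares \w in UNICODE_CHARACTER_CLASS mode against an explicit character class spelling out the TR#18 compatibility definition (Alphabetic, the mark categories Mn/Me/Mc, decimal digits, connector punctuation, and Join_Control) over every code point, and shows the practical effect on a Persian word that contains U+200C. The class name UnicodeWordCheck and its helper methods are hypothetical, and the exact spelling of the explicit class is an assumption modeled on the java.util.regex.Pattern documentation. It assumes a JDK that already includes this change (JDK 8 or later), since \p{IsJoin_Control} is one of the properties being added here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class UnicodeWordCheck {

    // \w in Unicode mode, i.e. the definition this change updates.
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    // Default \w is ASCII-only: [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w");

    // Assumed explicit spelling of the Unicode-mode \w set: Alphabetic,
    // marks (Mn/Me/Mc), decimal digits, connector punctuation, and the
    // Join_Control binary property (ZERO WIDTH NON-JOINER / ZERO WIDTH JOINER).
    static final Pattern EXPLICIT_WORD = Pattern.compile(
            "[\\p{IsAlphabetic}\\p{Mn}\\p{Me}\\p{Mc}\\p{IsDigit}\\p{Pc}\\p{IsJoin_Control}]");

    // The binary property on its own.
    static final Pattern JOIN_CONTROL = Pattern.compile("\\p{IsJoin_Control}");

    // True if the pattern matches the single code point cp in full.
    static boolean matchesOne(Pattern p, int cp) {
        return p.matcher(new String(Character.toChars(cp))).matches();
    }

    // Counts code points on which Unicode-mode \w and EXPLICIT_WORD disagree.
    static int countMismatches() {
        Matcher w = UNICODE_WORD.matcher("");
        Matcher e = EXPLICIT_WORD.matcher("");
        int mismatches = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            // Skip the surrogate range: lone surrogates are not valid text.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            String s = new String(Character.toChars(cp));
            if (w.reset(s).matches() != e.reset(s).matches()) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Collects the maximal runs of Unicode-mode \w characters in text.
    static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println("ZWNJ in \\w: " + matchesOne(UNICODE_WORD, 0x200C));
        System.out.println("ZWJ in \\w: " + matchesOne(UNICODE_WORD, 0x200D));
        System.out.println("ZWNJ is Join_Control: " + matchesOne(JOIN_CONTROL, 0x200C));
        System.out.println("ZWNJ in ASCII \\w: " + matchesOne(ASCII_WORD, 0x200C));
        System.out.println("mismatches: " + countMismatches());
        // Persian "mi" + ZWNJ + "khaham": the ZWNJ sits inside one word.
        String persian = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645";
        System.out.println("tokens: " + wordTokens(persian));
    }
}
```

If the explicit class is a faithful spelling of the definition, countMismatches() should return 0. The Persian example shows why the two code points matter in practice: with U+200C inside \w, the word comes back as a single token, whereas before this change the ZWNJ was a non-word character and would have split it in two.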