[7] Review request for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair (original) (raw)
Vladislav Karnaukhov Vladislav.Karnaukhov at oracle.com
Mon Aug 27 14:51:51 UTC 2012
- Previous message: How to search for a JRE 7 Mac Bug?
- Next message: [7] Review request for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello,
could you please review a new version? Please find webrev here: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.04/
Regards,
- Vlad
On 6/29/2012 2:09 AM, Phil Race wrote:
That would work but I think its cleaner to just move it all into mapNumericReference as below.
-phil. diff --git a/src/share/classes/javax/swing/text/html/parser/Parser.java b/src/share/classes/javax/swing/text/html/parser/Parser.java --- a/src/share/classes/javax/swing/text/html/parser/Parser.java +++ b/src/share/classes/javax/swing/text/html/parser/Parser.java @@ -952,7 +952,7 @@ ch = readCh(); break; } - char data[] = {mapNumericReference((char) n)}; + char data[] = mapNumericReference(n); return data; } addString('#'); @@ -1021,7 +1021,7 @@ } /** - * Converts numeric character reference to Unicode character. + * Converts numeric character reference to char array. * * Normally the code in a reference should be always converted * to the Unicode character with the same code, but due to @@ -1030,13 +1030,21 @@ * to displayable characters with other codes. * * @param c the code of numeric character reference. - * @return the character corresponding to the reference code. + * @return a char array corresponding to the reference code. */ - private char mapNumericReference(char c) { - if (c < 130 || c > 159) { - return c; + private char[] mapNumericReference(int c) { + char[] data; + if (c >= 0xffff) { // outside unicode BMP. + try { + data = Character.toChars(c); + } catch (IllegalArgumentException e) { + data = new char[0]; } - return cp1252Map[c - 130]; + } else { + data = new char[1]; + data[0] = (c < 130 || c > 159) ? (char)c : cp1252Map[c - 130]; + } + return data; }
On 06/28/12 07:58 AM, Vladislav Karnaukhov wrote: Hello Phil, Pavel,
thank you for your comments. I've reworked fix and testcase, please find new webrev here: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.03/ Regards, - Vlad On 6/27/2012 10:13 PM, Phil Race wrote: Well its not only unnecessary but is likely wrong .. I don't think you looked at what mapNumericReference() does.
-phil. On 6/27/2012 11:03 AM, Vladislav Karnaukhov wrote: Hello Phil,
I used Character.toChars() in both branches because I wanted to delegate code point conversion to char or surrogate pair entirely to Character class... Regards, - Vlad On 6/26/2012 9:49 PM, Phil Race wrote: I don't understand why you call Character.toChars() if you've just determined you don't need to ?
ie what was wrong with data = ( n >>> 16 == 0) ? {mapNumericReference((char) n)} : Character.toChars(n); ? In the case of an invalid supplementary pair, maybe it would be safer to return { ' ' } ?
One thing I see in the parsing code that is not new or changed here, that may bear examination, is that there's a loop that keeps on reading so long so long there are new digits. I am not sure its wise to keep going once you overflow. -phil. On 6/26/2012 12:37 AM, Vladislav Karnaukhov wrote: Hello Pavel, I can provide you with the link to 6u19, but this is direct forward-port and no code changes were made. I'll make changes as you've pointed out in 1) and 2) About 3) - is it a requirement to use "? :" operator? I personally prefer single-line if-else, but I don't want to argue over code style, and surely I'll follow code design practices. Regards, - Vlad On 6/25/2012 6:43 PM, Pavel Porvatov wrote: Hi Vladislav, Do you have a link to the fix for 6u19? I didn't investigate the fix deeply, but 1. private final int MAXBMPBOUND = 65535; should be static (otherwise variable name should be in lower case) 2. Add a space in single line comments 3. + char data[]; + if (n <= MAXBMPBOUND) { + data = Character.toChars(mapNumericReference((char) n)); + } else { + data = Character.toChars(n); + } + return data; can be written in one line via "? :" operator and looks more readable for me Thanks, Pavel Hello,
please review the fix for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair. This is a forward port from JDK6 (fixed escalated issue, fix integrated) to JDK7. The issue is a defect in Swing HTML Parser: if the codepoint is outside BMP (Unicode Plain 0), Parser incorrectly decodes codepoint into surrogate pair. The fix is to use Character.toChars() method if codepoint value is greater than upper bound of BMP. Webrev: http://cr.openjdk.java.net/~vkarnauk/6836089/webrev.00/ Bug description: http://bugs.sun.com/bugdatabase/viewbug.do?bugid=6836089 Regards, - Vlad
- Previous message: How to search for a JRE 7 Mac Bug?
- Next message: [7] Review request for 6836089: Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]