IDN.toASCII("示例.com", IDN.USE_STD3_ASCII_RULES) throws:
Exception ... java.lang.IllegalArgumentException: Contains non-LDH characters
at java.net.IDN.toASCIIInternal(IDN.java:275)
at java.net.IDN.toASCII(IDN.java:118)
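A minimal reproducer for the call above might look like the following sketch. The class name IdnStd3Repro and the helper tryToAscii are hypothetical names used only for illustration. The two CJK characters of the label are written as Unicode escapes so the file compiles regardless of source encoding. On an affected build the call is expected to be rejected; other builds may return the ACE form instead.

```java
import java.net.IDN;

public final class IdnStd3Repro {
    private IdnStd3Repro() {}

    // Hypothetical helper: returns the converted name, or a message
    // describing the rejection if IDN.toASCII throws.
    public static String tryToAscii(String domain) {
        try {
            return IDN.toASCII(domain, IDN.USE_STD3_ASCII_RULES);
        } catch (IllegalArgumentException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // U+793A U+4F8B followed by ".com", the domain from this report.
        System.out.println(tryToAscii("\u793a\u4f8b.com"));
        // A plain LDH ASCII name is returned unchanged.
        System.out.println(tryToAscii("example.com"));
    }
}
```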
Per step 3, section 4.1, RFC 3490:
3. If the UseSTD3ASCIIRules flag is set, then perform these checks:
(a) Verify the absence of non-LDH ASCII code points; that is, the
absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
(b) Verify the absence of leading and trailing hyphen-minus; that
is, the absence of U+002D at the beginning and end of the
sequence.
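Read literally, checks (a) and (b) restrict only ASCII code points. In RFC 3490 section 4.1 this step runs before the Punycode encoding step, so non-ASCII code points can still be present in the label and are not mentioned by either check. The following is a minimal sketch of that reading, not the JDK implementation; the class Std3Check and its method names are hypothetical.

```java
public final class Std3Check {
    private Std3Check() {}

    // Check (a): true for the ASCII ranges the RFC lists as non-LDH,
    // i.e. 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
    public static boolean isNonLdhAscii(int cp) {
        return (cp >= 0x00 && cp <= 0x2C)
            || cp == 0x2E || cp == 0x2F
            || (cp >= 0x3A && cp <= 0x40)
            || (cp >= 0x5B && cp <= 0x60)
            || (cp >= 0x7B && cp <= 0x7F);
    }

    // Applies checks (a) and (b) to a single label. Code points above
    // 0x7F are outside the listed ranges and therefore pass.
    public static boolean passesStd3(String label) {
        boolean noNonLdhAscii = label.codePoints().noneMatch(Std3Check::isNonLdhAscii);
        // Check (b): no leading or trailing hyphen-minus (U+002D).
        boolean noEdgeHyphen = !label.startsWith("-") && !label.endsWith("-");
        return noNonLdhAscii && noEdgeHyphen;
    }

    public static void main(String[] args) {
        System.out.println(passesStd3("\u793a\u4f8b")); // true: no ASCII code points at all
        System.out.println(passesStd3("\u3041"));       // true: U+3041 is not in any listed range
        System.out.println(passesStd3("a_b"));          // false: U+005F falls in 5B..60
        System.out.println(passesStd3("-abc"));         // false: leading hyphen-minus
    }
}
```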
However, the IDN implementation checks far more strictly than the rules above:
private static String toASCIIInternal(String label, int flag)
...
if (useSTD3ASCIIRules) {
for (int i = 0; i < dest.length(); i++) {
int c = dest.charAt(i);
if (!isLDHChar(c)) {
throw new IllegalArgumentException(
"Contains non-LDH characters");
}
}
...
}
private static boolean isLDHChar(int ch){
// high runner case
if(ch > 0x007A){
return false;
}
...
}
isLDHChar() rejects every code point greater than 0x007A. For example,
U+3041 ("あ") is rejected. This is too strict for converting Unicode labels with IDN.toASCII().
I ran a simple test on Linux with idn, a command-line tool for
Internationalized Domain Names:
$ idn --usestd3asciirules www.示例.com
www.xn--fsq092h.com
This shows that the idn tool's ToASCII conversion accepts Unicode input even when
UseSTD3ASCIIRules is set.