[XERCESJ-1156] MalformedURLException occurs when the SYSTEMID value is a relative URL containing Korean characters

If an XML file has a DOCTYPE whose SYSTEMID is a relative URL containing Korean characters,
a MalformedURLException occurs during parsing.
I expect the same problem occurs if Japanese or Chinese characters are used in the SYSTEMID.

This does not occur with JDK 1.4.2 and its built-in Crimson parser.
I found the problem in Xerces 2.6.2, Xerces 2.8.0, and JDK 1.5.

This is related to XERCESJ-391, which, in my tests, appears to be fixed in versions newer than Xerces 2.6.2.

A test case (XML and DTD files) and a patch for Xerces 2.6.2 will be uploaded.
The patch is simple: a SYSTEMID value containing non-US-ASCII characters needs to be escaped.
The escaping logic is taken from XMLEntityManager#getUserDir().
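A minimal sketch of the escaping step, assuming only non-US-ASCII characters need to be escaped. Each such character is encoded as UTF-8 and every byte is written as %HH, the same convention XML 1.0 describes for system identifiers. The class and method names (SystemIdEscaper, escapeNonAscii) are hypothetical and not part of the Xerces API. Unlike getUserDir(), this sketch leaves reserved ASCII characters (space, '#', '%', etc.) untouched. It requires Java 7+ for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Xerces API: illustrates escaping a SYSTEMID
// in the spirit of XMLEntityManager#getUserDir(), restricted to non-ASCII.
public final class SystemIdEscaper {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private SystemIdEscaper() {}

    /**
     * Percent-encodes every non-US-ASCII character of a system identifier
     * as its UTF-8 bytes (%HH each); US-ASCII characters are copied as-is.
     */
    public static String escapeNonAscii(String systemId) {
        StringBuilder sb = new StringBuilder(systemId.length());
        int i = 0;
        while (i < systemId.length()) {
            int cp = systemId.codePointAt(i);      // handles surrogate pairs
            int len = Character.charCount(cp);
            if (cp < 0x80) {
                sb.append((char) cp);              // US-ASCII: keep unchanged
            } else {
                byte[] utf8 = systemId.substring(i, i + len)
                                      .getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    int v = b & 0xFF;              // treat byte as unsigned
                    sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
                }
            }
            i += len;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\uD55C\uAE00" is the Korean word "Hangul"; escapes avoid source-encoding issues.
        System.out.println(escapeNonAscii("dtd/\uD55C\uAE00.dtd"));
        // prints "dtd/%ED%95%9C%EA%B8%80.dtd"
    }
}
```

The escaped value is plain US-ASCII, so a strict URI parser can resolve it against the document's base URI like any other relative reference.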

I think Xerces 2.8.0 can be patched in the same way.