A bug in filesystem bootstrap (unix/ linux) prevents

Xueming Shen xueming.shen at oracle.com
Thu Jul 5 18:02:11 UTC 2012


On 07/05/2012 01:40 AM, Dawid Weiss wrote:

export LC_ALL=en_US.UTF-8; java Foo

Not really, the shell won't let you use a multibyte locale (because of issues with null-terminated strings). And multibyte (with BOM) is most fun when you're trying to find buggy code ;)

Encodings for the Chinese, Japanese, and Korean locales are all "multibyte". UTF-8 is a multibyte encoding, and most recent Unix/Linux platforms should have no problem working with a "multibyte" locale. In fact, UTF-16 is normally not categorized as multibyte (mb) but as wide char (wc). There are reasons why you (normally) only see UTF-8 locales and no UTF-16 locales on Unix/Linux-based platforms, and why you have "W" and "A" versions of APIs (and even a "T" version for some APIs) on the Windows platform.
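To make the multibyte vs. wide-char distinction concrete, here is a minimal Java sketch. The class and helper names (MultibyteDemo, utf8Length, utf16beLength, hasZeroByte) are made up for illustration and do not come from the thread. It shows that UTF-8 uses a variable number of bytes per character and never emits a zero byte for non-NUL characters, while UTF-16 encodes ASCII characters with an embedded 0x00 byte, which is a plausible reason it does not mix well with null-terminated C strings.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: class and method names are hypothetical.
class MultibyteDemo {
    // Number of bytes the string occupies in UTF-8 (variable width, 1 to 4 bytes per code point).
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 big-endian without a BOM (2 bytes per code unit).
    static int utf16beLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // True if the encoded form contains a 0x00 byte, which would end a C string early.
    static boolean hasZeroByte(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String cjk = "\u4e2d"; // a single CJK character

        System.out.println(utf8Length(ascii));     // 1
        System.out.println(utf8Length(cjk));       // 3
        System.out.println(utf16beLength(ascii));  // 2
        System.out.println(utf16beLength(cjk));    // 2

        // "A" in UTF-16BE is 0x00 0x41, so it contains a zero byte; in UTF-8 it does not.
        System.out.println(hasZeroByte(ascii.getBytes(StandardCharsets.UTF_16BE))); // true
        System.out.println(hasZeroByte(ascii.getBytes(StandardCharsets.UTF_8)));    // false

        // Java's "UTF-16" charset prepends a byte-order mark when encoding.
        System.out.println(ascii.getBytes(StandardCharsets.UTF_16).length); // 4
    }
}
```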

I agree it might be helpful to have a mechanism for changing the "default charset" used by various Java APIs, similar to what you do with Locale.setDefault(). With the introduction of sun.jnu.encoding (which takes over responsibility for the encoding the JVM uses to talk to the underlying OS APIs), it might be possible to reduce the scope of the system property file.encoding to only the default encoding of "file content" and do something here, but it is not on the priority list for now.
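As a rough illustration of the asymmetry described above, the following Java sketch contrasts the default locale, which can be changed at runtime with Locale.setDefault(), with the default charset, which can be inspected through Charset.defaultCharset() but has no public setter. The class name DefaultCharsetDemo and the helpers encodeExplicit and decodeExplicit are hypothetical. Passing an explicit charset, as the helpers do, is one way for application code to avoid depending on the platform default at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative only: class and method names are hypothetical.
class DefaultCharsetDemo {
    // Encode with an explicit charset instead of relying on the platform default.
    static byte[] encodeExplicit(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The default charset is derived from the environment at JVM startup.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());

        // The default locale, by contrast, can be replaced at runtime.
        Locale previous = Locale.getDefault();
        Locale.setDefault(Locale.JAPAN);
        System.out.println("locale now     = " + Locale.getDefault());
        Locale.setDefault(previous); // restore the original default

        // There is no public Charset.setDefault(); explicit charsets sidestep the question.
        byte[] encoded = encodeExplicit("\u00e9"); // e with acute accent
        System.out.println(encoded.length);        // 2
        System.out.println(decodeExplicit(encoded).equals("\u00e9")); // true
    }
}
```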

-Sherman

Btw, I need to make it clear here that sun.jnu.encoding is purely an implementation detail; applications are not supposed to use it for any purpose.

I would assume there is no en_US.UTF-16 locale there :-)

I wish there were. It'd make people care more ;)

Dawid



More information about the core-libs-dev mailing list