A bug in filesystem bootstrap (unix/ linux) prevents
Xueming Shen xueming.shen at oracle.com
Thu Jul 5 08:38:27 UTC 2012
Given that you are passing -Dfile.encoding=xyz on the command line, I guess you would have no problem doing something like this instead (in bash, for example):

    export LC_ALL=en_US.UTF-8; java Foo
    export LC_ALL=ja_JP.SJIS; java Foo

You might need to install all those supported locales on your system, but isn't that exactly what you are trying to test: actual users running in actual user environments? You only need to test the locales (with their various supported encodings) that really exist on your platform. In fact, -Dfile.encoding is not a good testing methodology here.
I would assume there is no en_US.UTF-16 locale there :-)
-Sherman
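For a run under each locale to reveal anything, the program has to exercise the default charset. The following is a minimal sketch of such a probe, not something posted in this thread: the class name `DefaultCharsetProbe` and the helper `hex` are hypothetical, and the sample string is arbitrary. It stands in for the `Foo` placeholder used in the commands above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical probe class; stands in for the "Foo" placeholder.
class DefaultCharsetProbe {

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9"; // contains one non-ASCII character

        // Charset-less call: the result depends on the JVM's default charset.
        System.out.println("default charset: " + Charset.defaultCharset().name());
        System.out.println("getBytes():      " + hex(sample.getBytes()));

        // Explicit charset: the result is the same in every environment.
        System.out.println("getBytes(UTF-8): "
                + hex(sample.getBytes(StandardCharsets.UTF_8)));
    }
}
```

On the JDKs discussed in this thread, the first two lines are expected to vary with LC_ALL, while the third stays fixed. Newer JDKs (18 and later) default to UTF-8 regardless of the locale, so the difference may not show up there.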
On 7/5/2012 1:04 AM, Dawid Weiss wrote:
Thanks for the explanation, it helps.
Maybe I should follow up with the rationale for why we want to override it in the first place. We randomize file.encoding over a small subset of JVM-supported encodings to make sure there are no hidden encoding-specific issues in Lucene/Solr. This is actually a great way of catching calls to String.getBytes() and the like, which typically work fine but start to break once moved to an actual user environment because of different encodings.

Maybe I'm missing something, but -Dfile.encoding seems to be the only way to change the defaults used for String.getBytes(), new String(byte[]) and such, correct? Sure, these are legacy APIs that have been broken/problematic from the very beginning, but they're there and there should be a way to externally say "use this and that charset for the default"... Yes, we could do static analysis to catch these, but the problem remains.

Dawid

On Thu, Jul 5, 2012 at 9:52 AM, Xueming Shen<xueming.shen at oracle.com> wrote:

The code cited is a little shortcut. It would matter if some locale out there really used UTF-16, or any encoding that needs to switch/shift into ASCII (or its single-byte charset area) with a shift-in/shift-out character. So far I'm not aware of any such locale on any of our supported platforms. Historically, this kind of assumption might run into trouble when ported to another platform, such as an EBCDIC-based system (but I don't think that is a problem in this case). Ideally, the code should probably be able to deal with a multi-byte form of "/", but obviously it was decided to take the shortcut for better performance here.

"We" have been taking the stand that file.encoding is an informative/read-only system property for a long time, mainly for two reasons.

First, this property is really defined/implemented/used as the default encoding that the JVM uses to communicate with the underlying platform for locale/encoding-sensitive stuff: the default encoding of file content, the encoding of file paths, and the "text" encoding used when calling platform APIs, for example. It's like a "contract" between the JVM and the underlying platform; it needs to be understood and agreed on by both. So it needs to be set based on what your underlying system is using, not to something you choose via either -D or System.setProperty. If your underlying locale is not UTF-16, I don't think you should expect the JVM to work correctly if it keeps "talking" UTF-16 to the underlying system, for example passing in a file name in UTF-16 when you are running in a UTF-8 locale. (It is more complicated on Windows, where you have a system locale and a user locale, and historically file.encoding was used for both; consider what happens if your system locale and user locale are set differently...) The property sun.jnu.encoding, introduced in JDK 6 (mainly to address the issue we have with file.encoding on Windows, though), somewhat helps remove some "pressure" from file.encoding. So in theory file.encoding should be used only for the encoding of "file content", and sun.jnu.encoding should be used when you need the encoding to talk to those platform APIs, so something might be done here (currently file.encoding and sun.jnu.encoding are set to the same thing on non-Windows platforms).

The other reason is the timing of how file.encoding is initialized and how it is used during the "complicated" system initialization stage; almost everyone who has touched System.initializeSystemClass() got burned here and there in the past :-) So sometimes you have to ask whether it is worth the risk of changing something that works for the sake of a usage scenario that is not "supported". That said, as I said above, something might be done to address this issue, but it is obviously not a priority for now.

-Sherman

If you want to do -Dfile.encoding=xyz, you are on your own; it might work, it might not.
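To see how the two properties mentioned above relate on a given machine, a small report like the following may help. This is only a sketch: the class name `EncodingProperties` and the method `describe` are hypothetical, and sun.jnu.encoding is an implementation-specific property that may be absent on some JVMs (in which case it prints as null).

```java
import java.nio.charset.Charset;

// Hypothetical helper; reads the properties without modifying them.
class EncodingProperties {

    // Build a three-line report of the encoding-related settings.
    static String describe() {
        return "file.encoding    = " + System.getProperty("file.encoding") + "\n"
             + "sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding") + "\n"
             + "default charset  = " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once under the platform locale and once with -Dfile.encoding overridden shows which of the values follow the override on that particular JDK.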
On 7/4/2012 11:00 PM, Dawid Weiss wrote:

Well, what's the "right" way to enforce an initial encoding for charset-less string-to-byte conversions and legacy streams? I still think that snippet of code is buggy, no matter whether file.encoding is or isn't a supported settable property. Besides, from what I see in the JDK code base, everything seems to be coded in a way that allows external definition of file.encoding (comments inside System.c, for example). Where is it stated that file.encoding is read-only?

Dawid

On Thu, Jul 5, 2012 at 3:09 AM, Xueming Shen<xueming.shen at oracle.com> wrote:

-Dfile.encoding=xyz is NOT a supported configuration. file.encoding is supposed to be a read-only informative system property.
-Sherman
On 7/4/2012 1:21 PM, Dawid Weiss wrote:

There is a similar bug: Bug 6795536 - No system start for file.encoding=x-SJIS0213

Yeah... I looked at the sources in that package and there is at least one more place which converts a String to bytes using getBytes(). This seems to be a trivial fix in UnixFileSystem though. Anyway, bug ID for this is: http://bugs.sun.com/bugdatabase/viewbug.do?bugid=7181721

Dawid

In this case on Windows.

-Ulf
On 04.07.2012 14:43, Dawid Weiss wrote:

Hi folks. Run the following with -Dfile.encoding=UTF-16:

    import java.util.TimeZone;

    public class TestBlah {
        public static void main(String[] args) throws Exception {
            TimeZone.getDefault();
        }
    }

On Linux (and any unixish system, I think) this will result in:

    java.lang.ExceptionInInitializerError
        at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
        at sun.util.calendar.ZoneInfoFile$1.run(ZoneInfoFile.java:482)
        at sun.util.calendar.ZoneInfoFile$1.run(ZoneInfoFile.java:477)
        ...

There is an encoding-sensitive part that calls getBytes() on the initial path (and this is what breaks it):

    // package-private
    UnixFileSystem(UnixFileSystemProvider provider, String dir) {
        this.provider = provider;
        this.defaultDirectory = UnixPath.normalizeAndCheck(dir).getBytes();
        if (this.defaultDirectory[0] != '/') {
            throw new RuntimeException("default directory must be absolute");
        }

I filed a bug for this but don't have the ID yet.

Dawid
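The failure can be reproduced in isolation without overriding file.encoding, by passing the charset explicitly. The sketch below is not the JDK code: the class name `LeadingSlashCheck` and the method `firstByteIsSlash` are hypothetical and only mirror the shape of the check in the snippet above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone illustration of the first-byte check.
class LeadingSlashCheck {

    // Encode the path with the given charset and test whether the
    // first encoded byte is the ASCII slash, as the quoted check does.
    static boolean firstByteIsSlash(String dir, Charset cs) {
        byte[] bytes = dir.getBytes(cs);
        return bytes.length > 0 && bytes[0] == '/';
    }

    public static void main(String[] args) {
        String dir = "/tmp"; // an absolute path, chosen arbitrarily
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_16
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": first byte is '/'? "
                    + firstByteIsSlash(dir, cs));
        }
    }
}
```

With Java's UTF-16 charset the encoded form begins with a byte-order mark (0xFE 0xFF), so the first byte is never '/' even though the path is absolute. That is consistent with the "default directory must be absolute" exception surfacing as an ExceptionInInitializerError.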