Add support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs on Windows platforms
David Holmes david.holmes at oracle.com
Mon May 8 23:50:57 UTC 2017
Hi John,
Responding back on the mailing lists. There are people on the mailing lists who are in a better position to evaluate the merits of the proposal. I searched the bug database and could not see this issue being raised in the past.
On 9/05/2017 8:46 AM, John Platts wrote:
The real reasons to add UTF-16 versions of these APIs are the following:
* The arguments passed into the wmain and wWinMain functions use UTF-16-encoded strings instead of UTF-8 strings
* The arguments passed into the main and WinMain functions on Windows platforms are in the ANSI character encoding instead of the UTF-8 character encoding
* The NewString and GetStringChars APIs in the JNI already use UTF-16-encoded strings
Yes, you are right: the String functions already support UTF-16, as that is the format for char[] and therefore for java.lang.String.
* Unicode APIs on Windows normally use UTF-16-encoded strings
* The C11 and C++11 standards support UTF-16 strings through the char16_t type and support for UTF-16 character literals with a u prefix
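The last point can be illustrated with a minimal C11 sketch. It assumes a compiler that encodes u-prefixed literals as UTF-16 (one that defines __STDC_UTF_16__, as GCC and Clang do). The names jchar_like, utf16_length and kOption are hypothetical and are not part of jni.h; jchar_like only stands in for the current jchar typedef (unsigned short).

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>   /* C11: char16_t */

/* Hypothetical stand-in for jchar as currently defined in jni.h. */
typedef unsigned short jchar_like;

/* Assumption: both types are 16 bits wide, as on mainstream platforms. */
static_assert(sizeof(char16_t) == sizeof(jchar_like),
              "char16_t and jchar_like are expected to have the same size");

/* Count UTF-16 code units up to, but not including, the terminating zero. */
static size_t utf16_length(const char16_t *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/*
 * A u-prefixed literal is an array of char16_t. U+00E9 takes one code unit;
 * U+1F600 lies outside the BMP and takes a surrogate pair (two code units).
 */
static const char16_t kOption[] = u"-Dname=\u00e9\U0001F600";
```

With these definitions, utf16_length(kOption) counts ten code units: seven for "-Dname=", one for U+00E9 and two for the surrogate pair. Because char16_t is a distinct type in C++11, passing such a literal where a jchar pointer is expected would still need a cast there, which is the mismatch the proposal is trying to remove.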
Thanks for the additional input.
David
------------------------------------------------------------------------
From: David Holmes <david.holmes at oracle.com>
Sent: Sunday, May 7, 2017 7:47 PM
To: John Platts
Cc: hotspot-dev developers; core-libs-dev Libs
Subject: Re: Add support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs on Windows platforms

Added back jdk10-dev as a bcc.

Added hotspot-dev and core-libs-dev (for launcher) for follow up discussions.

Hi John,

On 8/05/2017 10:33 AM, John Platts wrote:

I actually did a search through the code that implements JNI_CreateJavaVM, and I found that the conversion of the strings is done using java_lang_String::create_from_platform_dependent_str, which converts from the platform-default encoding to Unicode. In the case of Windows-based platforms, the conversion is done based on the ANSI character encoding instead of UTF-8 or Modified UTF-8.
The platform encoding detection logic on Windows is implemented in java_props_md.c, which can be found at jdk/src/windows/native/java/lang/java_props_md.c in releases prior to JDK 9 and at src/java.base/windows/native/libjava/java_props_md.c in JDK 9 and later. The encoding used for command-line arguments passed into the JNI invocation API is Cp1252 for English locales on Windows platforms, and not Modified UTF-8 or UTF-8.

The documentation found at http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html also states that the strings passed into JNI_CreateJavaVM are in the platform-default encoding.

Thanks for the additional details. I assume you are referring to:

typedef struct JavaVMOption {
    char *optionString; /* the option as a string in the default platform encoding */

that comment should not form part of the specification as it is non-normative text. If the intent is truly to use the platform default encoding and not UTF-8 then that should be very clearly spelt out in the spec! That said, the implementation is following this so it is a limitation. I suspect this is historical.

A version of JNI_CreateJavaVM that takes UTF-16-encoded strings should be added to the JNI Invocation API. The java.exe launchers and javaw.exe launchers should also be updated to use the UTF-16 version of the JNI_CreateJavaVM function on Windows platforms and to use wmain and wWinMain instead of main and WinMain.

Why versions for UTF-16 instead of the missing UTF-8 variants? As I said the whole spec is intended to be based around UTF-8 so we would not want to throw in just a couple of UTF-16 based usages.
Thanks,
David

A few files in HotSpot would need to be changed in order to implement the UTF-16 version of JNI_CreateJavaVM, but the change would improve consistency across different locales on Windows platforms and allow arguments that contain Unicode characters that are not available in the platform-default encoding to be passed into the JVM on the command line.

The UTF-16-based version of JNI_CreateJavaVM also makes it easier to allocate string objects that contain non-ASCII characters as the strings are already in UTF-16 format, at least in cases where the strings contain Unicode characters that are not in Latin-1 or on VMs that do not support compact Latin-1 strings.

The UTF-16-based version of JNI_CreateJavaVM should probably be implemented as a separate function so that the solution could be backported to JDK 8 and JDK 9 updates and so that backwards compatibility with the current JNI_CreateJavaVM implementation is maintained.

Here is what the new UTF-16-based API might look like:

typedef struct JavaVMOptionUTF16 {
    const jchar *optionString; /* the option as a string in UTF-16 encoding */
    void *extraInfo;
} JavaVMOptionUTF16;

typedef struct JavaVMInitArgsUTF16 {
    jint version;
    jint nOptions;
    JavaVMOptionUTF16 *options;
    jboolean ignoreUnrecognized;
} JavaVMInitArgsUTF16;

/* vm_args is a pointer to a JavaVMInitArgsUTF16 structure */
jint JNI_CreateJavaVMUTF16(JavaVM **p_vm, void **p_env, void *vm_args);

/* vm_args is a pointer to a JavaVMInitArgsUTF16 structure */
jint JNI_GetDefaultJavaVMInitArgsUTF16(void *vm_args);

------------------------------------------------------------------------
From: David Holmes <david.holmes at oracle.com>
Sent: Thursday, May 4, 2017 11:07 PM
To: John Platts; jdk10-dev at openjdk.java.net
Subject: Re: Add support for Unicode versions of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs on Windows platforms

Hi John,

The JNI is defined to use Modified UTF-8 format for strings, so any Unicode character should be handled if passed in in the right format. Updating the JNI specification and implementation to accept UTF-16 directly would be a major undertaking.

Is the issue here that you want a tool, like the java launcher, to accept arbitrary Unicode strings in an end-user-friendly manner and then have it perform the modified UTF-8 conversion when invoking the VM?

Can you give a concrete example of what you would like to be able to pass as arguments to the JVM?

Thanks,
David

On 5/05/2017 1:04 PM, John Platts wrote:

The JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods in the JNI invocation API expect ANSI strings on Windows platforms instead of Unicode-encoded strings. This is an issue on Windows-based platforms since some of the option strings that are passed into JNI_CreateJavaVM might contain Unicode characters that are not in the ANSI encoding on Windows platforms.
There is support for UTF-16 literals on Windows platforms with wchar_t and wide character literals prefixed with the L prefix, and on platforms that support C11 and C++11 with char16_t and UTF-16 character literals that are prefixed with the u prefix.

jchar is currently defined to be a typedef for unsigned short on all platforms, but char16_t is a separate type and not a typedef for unsigned short or jchar in C++11 and later. jchar should be changed to be a typedef for wchar_t on Windows platforms and to be a typedef for char16_t on non-Windows platforms that support the char16_t type. This change will make it possible to define jchar character and string literals on Windows platforms and on non-Windows platforms that support the C11 or C++11 standard.

The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on Windows:

#define JCHAR_LITERAL(x) L ## x

The JCHAR_LITERAL macro should be added to the JNI header and defined as follows on non-Windows platforms:

#define JCHAR_LITERAL(x) u ## x

Here is how the Unicode version of JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs could be defined:

typedef struct JavaVMUnicodeOption {
    const jchar *optionString; /* the option as a string in UTF-16 encoding */
    void *extraInfo;
} JavaVMUnicodeOption;

typedef struct JavaVMUnicodeInitArgs {
    jint version;
    jint nOptions;
    JavaVMUnicodeOption *options;
    jboolean ignoreUnrecognized;
} JavaVMUnicodeInitArgs;

jint JNI_CreateJavaVMUnicode(JavaVM **pvm, void **penv, void *args);
jint JNI_GetDefaultJavaVMInitArgsUnicode(void *args);

The java.exe wrapper should use wmain instead of main on Windows platforms, and the javaw.exe wrapper should use wWinMain instead of WinMain on Windows platforms. This change, along with the support for Unicode-enabled versions of the JNI_CreateJavaVM and JNI_GetDefaultJavaVMInitArgs methods, would allow the JVM to be launched with arguments that contain Unicode characters that are not in the platform-default encoding.
All of the Windows platforms that Java SE 10 and later VMs would be supported on do support Unicode. Adding support for Unicode versions of JNICreateJavaVM and JNIGetDefaultJavaVMInitArgs will allow Unicode characters that are not in the platform-default encoding on Windows platforms to be supported in command-line arguments that are passed to the JVM.
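
The alternative raised earlier in the thread, having a launcher accept UTF-16 arguments and convert them to Modified UTF-8 itself, could look roughly like the sketch below. It is only an illustration of the encoding rules, not code from the JDK launcher: the function name utf16_to_modified_utf8 is hypothetical, and uint16_t is used in place of jchar so the example does not depend on jni.h. Modified UTF-8 differs from standard UTF-8 in that U+0000 is written as the two bytes C0 80 and each surrogate code unit is encoded separately as three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode `len` UTF-16 code units as Modified UTF-8.
 * Writes a NUL-terminated result into `out` (capacity `out_cap` bytes).
 * Returns the number of bytes written, not counting the terminator,
 * or (size_t)-1 if the buffer is too small.
 */
static size_t utf16_to_modified_utf8(const uint16_t *in, size_t len,
                                     char *out, size_t out_cap)
{
    size_t o = 0;

    assert(out != NULL);
    if (out_cap == 0) {
        return (size_t)-1;
    }
    for (size_t i = 0; i < len; i++) {
        uint16_t c = in[i];
        /* U+0001..U+007F: 1 byte; U+0000 and U+0080..U+07FF: 2 bytes;
         * everything else, including lone or paired surrogates: 3 bytes. */
        size_t need = (c >= 0x0001 && c <= 0x007F) ? 1 : (c <= 0x07FF) ? 2 : 3;

        /* Always leave room for the terminating NUL. */
        if (o + need + 1 > out_cap) {
            return (size_t)-1;
        }
        if (need == 1) {
            out[o++] = (char)c;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (char)(0xE0 | (c >> 12));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

A launcher built around wmain could run each argument through a helper of this kind before filling in JavaVMOption.optionString, but that would only help if the VM decoded option strings as Modified UTF-8 rather than in the platform-default encoding, which is the behaviour questioned in this thread.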