JNI UTF-8 encoding bug with some characters
Ariel Weisberg ariel at weisberg.ws
Tue Jun 5 15:38:04 UTC 2012
Hi all,
Not sure what list this should go to.
I found an issue with JNI's GetStringUTFChars, which is supposed to return a Java string in UTF-8 encoding. There is an attached test case. I tested on Ubuntu 12.04 (Linux aweisberg-desktop 2.6.32-41-generic #89-Ubuntu SMP Fri Apr 27 22:18:56 UTC 2012 x86_64 GNU/Linux) and CentOS 5 (Linux volt3b 2.6.18-308.4.1.el5 #1 SMP Tue Apr 17 17:08:00 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux) with JDK 6 update 32 and JDK 7 update 4.
For the following string "â🀲x一xxéyyԱ" I find that the first character is encoded correctly, but the second character (U+1F032, http://www.fileformat.info/info/unicode/char/1f032/index.htm) comes out with an invalid code point.
The result of String.getBytes("UTF-8") is c3a2f09f80b278e4b8807878c3a97979d4b1 and this matches the output I get from defining the string as a constant in C++.
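The Java side of that comparison can be sketched as follows. This is only an illustration, not the code from the attached test case: the class name Utf8HexDump and the hex helper are made up here, and the string is written with escapes (reconstructed from the bytes quoted above) so that the source file encoding does not affect the result.

```java
import java.nio.charset.Charset;

public class Utf8HexDump {
    // The test string, written with escapes. The two chars D83C DC32 are the
    // UTF-16 surrogate pair that Java uses internally for U+1F032.
    static final String SAMPLE = "\u00e2\uD83C\uDC32x\u4e00xx\u00e9yy\u0531";

    // Hypothetical helper: lower-case hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Standard UTF-8: the supplementary character becomes one 4-byte sequence.
        byte[] utf8 = SAMPLE.getBytes(Charset.forName("UTF-8"));
        System.out.println(hex(utf8));
        // prints c3a2f09f80b278e4b8807878c3a97979d4b1
    }
}
```

The f09f80b2 run in the output is the single 4-byte encoding of U+1F032.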
The result of GetStringUTFChars is c3a2eda0bcedb0b278e4b8.
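One possible explanation, offered as a sketch rather than a confirmed diagnosis: the JNI specification documents GetStringUTFChars as returning modified UTF-8, in which each half of a UTF-16 surrogate pair is encoded separately as a 3-byte sequence. The bytes eda0bc and edb0b2 decode to D83C and DC32, the surrogate pair for U+1F032. The same byte pattern can be reproduced in pure Java with DataOutputStream.writeUTF, which also writes modified UTF-8. The class name ModifiedUtf8Demo and its helpers are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Same test string as in the message, written with escapes.
    static final String SAMPLE = "\u00e2\uD83C\uDC32x\u4e00xx\u00e9yy\u0531";

    // Encode a string as modified UTF-8 using DataOutputStream.writeUTF,
    // then drop the 2-byte length prefix that writeUTF puts in front.
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            // Not expected for an in-memory stream and a short string.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: lower-case hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(modifiedUtf8(SAMPLE)));
        // prints c3a2eda0bcedb0b278e4b8807878c3a97979d4b1
    }
}
```

The output starts with the same bytes as the GetStringUTFChars result quoted above (c3a2eda0bcedb0b278e4b8), which is consistent with the JNI function emitting modified UTF-8 rather than standard UTF-8 for this character.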
See this test case (https://s3.amazonaws.com/com.voltdb.aweisberg/utf8_encoding_bug.tgz) for a reproducer and how I displayed the values.
Thanks, Ariel