RFR: JDK-8039751: UTF-8 decoder fails to handle some edge cases correctly

Mark Thomas markt at apache.org
Wed Apr 9 23:43:23 UTC 2014


On 09/04/2014 15:51, Xueming Shen wrote:

Hi,

Please help review the fix for JDK-8039751.
Issue: https://bugs.openjdk.java.net/browse/JDK-8039751
webrev: http://cr.openjdk.java.net/~sherman/8039751/webrev/

This is the corner case (in 4-byte sequences) that we missed when fixing 7096080 [1]. The UTF-8 decoder correctly returns the malformed length for some illegal 4-byte sequences (via Decoder.malformedN(...)), but it fails to do so when there are not enough bytes (fewer than 4) in the input buffer (via isMalfromed42(...)). The proposed change fixes these corner cases.

Hey Mark, my reading of Tomcat's test case suggests that "malformed 4-byte sequence" is the only issue left after the JDK 8 fix, right?
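The truncated-buffer case can be observed through the public `CharsetDecoder` API by feeding it only the first two bytes of a 4-byte sequence and passing `endOfInput = false`, so the decoder must decide from those two bytes alone. The sketch below assumes the example prefix `F4 90` (chosen for illustration, not taken from the bug report); `TruncatedFourByteDemo` and `decodePartial` are hypothetical names. On a JDK that includes this fix one would expect a malformed result of length 1, while a valid prefix such as `F0 90` should still yield an underflow (the decoder waits for more input).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class TruncatedFourByteDemo {
    // Decodes the given bytes with a fresh UTF-8 decoder, telling it that
    // more input may still follow (endOfInput = false).
    static CoderResult decodePartial(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(8);
        return decoder.decode(in, out, false);
    }

    public static void main(String[] args) {
        // F4 90: every completion of this prefix is above U+10FFFF, so a decoder
        // can report it as malformed without waiting for the remaining bytes.
        CoderResult r = decodePartial(new byte[] {(byte) 0xF4, (byte) 0x90});
        if (r.isMalformed()) {
            System.out.println("malformed, length " + r.length());
        } else {
            // An unfixed decoder may instead report UNDERFLOW here.
            System.out.println(r);
        }
    }
}
```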

Thanks for such a quick response.

I agree with your reading of the Tomcat test case. There are two slightly different edge cases here.

The first is the one I explained in detail in the bug report: you know from the first two bytes that, whatever the next two bytes are, the result is going to be larger than the largest valid code point (U+10FFFF).
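The arithmetic behind this first case: a 4-byte sequence carries 3 payload bits in the lead byte and 6 in each continuation byte. Filling the two unknown continuation bytes with their smallest value (`0x80`, which contributes zero payload bits) gives the lowest code point any completion can reach; if even that exceeds U+10FFFF, the prefix is already malformed. The class and method names below are hypothetical, for illustration only.

```java
class FourBytePrefixBounds {
    // Largest valid Unicode code point.
    static final int MAX_CODE_POINT = 0x10FFFF;

    // Assembles the 21-bit value of 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    static int decode4(int b1, int b2, int b3, int b4) {
        return ((b1 & 0x07) << 18) | ((b2 & 0x3F) << 12) | ((b3 & 0x3F) << 6) | (b4 & 0x3F);
    }

    // Smallest value any completion of b1 b2 can produce: both remaining
    // continuation bytes are 0x80 and add no payload bits.
    static int minCompletion(int b1, int b2) {
        return decode4(b1, b2, 0x80, 0x80);
    }

    public static void main(String[] args) {
        int min = minCompletion(0xF4, 0x90);
        // prints: F4 90 xx xx >= U+110000, exceeds max: true
        System.out.printf("F4 90 xx xx >= U+%X, exceeds max: %b%n", min, min > MAX_CODE_POINT);
    }
}
```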

The second is where you know from the first two bytes that, whatever the next two bytes are, the code point should have been encoded in fewer bytes (i.e. the sequence is an overlong encoding).
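The second case is the mirror image: filling the unknown continuation bytes with their largest value (`0xBF`, all six payload bits set) gives the highest code point any completion can reach. If that is still below U+10000, the smallest code point that actually needs four bytes, the sequence must be overlong. `OverlongPrefixCheck` and `maxCompletion` are hypothetical names used only for this sketch.

```java
class OverlongPrefixCheck {
    // Smallest code point that genuinely requires a 4-byte UTF-8 encoding.
    static final int MIN_FOUR_BYTE = 0x10000;

    // Largest value any completion of b1 b2 can produce: both remaining
    // continuation bytes are 0xBF and contribute all-ones payload bits.
    static int maxCompletion(int b1, int b2) {
        return ((b1 & 0x07) << 18) | ((b2 & 0x3F) << 12) | (0x3F << 6) | 0x3F;
    }

    public static void main(String[] args) {
        int max = maxCompletion(0xF0, 0x8F);
        // F0 8F: even the largest completion fits in 3 bytes, so it is overlong.
        // prints: F0 8F xx xx <= U+FFFF, overlong: true
        System.out.printf("F0 8F xx xx <= U+%X, overlong: %b%n", max, max < MIN_FOUR_BYTE);
    }
}
```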

If I am reading your additional test cases correctly, you have both of these covered.
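Both cases can be folded into a single check on the first two bytes using the second-byte ranges for 4-byte sequences given in Table 3-7 of the Unicode Standard: `F0` requires `90..BF` (lower values are overlong), `F1..F3` accept any continuation byte `80..BF`, and `F4` requires `80..8F` (higher values exceed U+10FFFF). This is a minimal sketch of that table lookup, not the code in the webrev; `FourBytePrefixValidator` and `isMalformedPrefix` are hypothetical names.

```java
class FourBytePrefixValidator {
    // Returns true if no completion of lead byte b1 followed by b2 can be
    // well-formed UTF-8 (ranges per Unicode Table 3-7).
    static boolean isMalformedPrefix(int b1, int b2) {
        // Accept either sign-extended Java bytes or values in 0..255.
        b1 &= 0xFF;
        b2 &= 0xFF;
        switch (b1) {
            case 0xF0:
                return b2 < 0x90 || b2 > 0xBF; // below 0x90 would be overlong
            case 0xF1: case 0xF2: case 0xF3:
                return b2 < 0x80 || b2 > 0xBF; // any ordinary continuation byte
            case 0xF4:
                return b2 < 0x80 || b2 > 0x8F; // above 0x8F would exceed U+10FFFF
            default:
                return true; // not a valid 4-byte lead byte
        }
    }

    public static void main(String[] args) {
        int[][] prefixes = {{0xF0, 0x8F}, {0xF0, 0x90}, {0xF4, 0x8F}, {0xF4, 0x90}, {0xF5, 0x80}};
        for (int[] p : prefixes) {
            System.out.printf("%02X %02X -> %s%n", p[0], p[1],
                    isMalformedPrefix(p[0], p[1]) ? "malformed" : "may be valid");
        }
    }
}
```

A table-driven check like this decides both edge cases from two bytes, which is exactly what a decoder needs when fewer than four bytes are available.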

Many thanks,

Mark

Thanks!
-Sherman

[1] https://bugs.openjdk.java.net/browse/JDK-7096080



More information about the core-libs-dev mailing list