Warnings Cleanup in java.util. (more from hack day) (original) (raw)

Martijn Verburg martijnverburg at gmail.com
Sat Dec 3 03:05:23 PST 2011


Hi Sherman,

In order to keep this change within the scope of the intentions of the exercise I'm going to revert that section to what it was (I'll re-spin a patch). At this stage I won't add a @SuppressWarnings as I think this should be avoidable once it's looked at again in a little more depth.

The rest follows in-line. If this is starting to chew up your time then please don't feel obliged to answer, this is more for my own curiosities sake. This also isn't a suggestion to change the patch again, just theorising here :-)

On Friday, 2 December 2011, Xueming Shen <xueming.shen at oracle.com> wrote:

Martijn,

The proposed change is incorrect. + value = new String(vb, 0, 0, StandardCharsets.UTF8); First, shouldn't it at least be value = new String(vb, StandardCharsets.UTF8); or value = new String(vb, 0, vb.length, StandardCharsets.UTF8); Second, the "value" will be written out via dos.writeBytes(String) later, which only takes the low-byte of a each char of the target String object.

Right, so the need for the hi-byte constructor can be avoided, but by me ignoring the vb.length I introduced a bug, gotcha.

So the code was:

String value = e.getKey(); if (value != null) { byte[] vb = value.getBytes("utf8"); value = new String(vb, 0, 0, vb.length); }

My proposed (bad) change was:

String value = e.getKey(); if (value != null) { byte[] vb = value.getBytes(StandardCharsets.UTF_8); value = new String(vb, 0, 0, StandardCharsets.UTF_8); }

As you say, a better change would have been:

String value = e.getKey(); if (value != null) { byte[] vb = value.getBytes(StandardCharsets.UTF_8); value = new String(vb, 0, vb.length, StandardCharsets.UTF_8); }

Or in a more performing manner:

String value = e.getKey(); if (value != null) { // As at jdk8 build X - we don't use StandardCharsets.UTF_8 in getBytes for performance reasons byte[] vb = value.getBytes("utf8"); value = new String(vb, 0, vb.length, StandardCharsets.UTF_8); }

That "not properly" conversion is actually what we need here to make sure the output will be correctly in utf8. The only "reliable" replacement might be to use "iso8859-1" (not utf8), but I would recommend keep it un-touched.

Fair enough.

Thanks for clearing that up and apologies for causing pain :(

Cheers, Martijn



More information about the jdk8-dev mailing list