JDK 9 RFR of 8039474: sun.misc.CharacterDecoder.decodeBuffer should use getBytes(iso8859-1)

Xueming Shen xueming.shen at oracle.com
Thu Apr 10 19:17:22 UTC 2014


On 04/10/2014 12:03 PM, Chris Hegarty wrote:
> On 10 Apr 2014, at 19:50, Xueming Shen <xueming.shen at oracle.com> wrote:
>> On 04/10/2014 11:38 AM, Mike Duigou wrote:
>>> On Apr 10 2014, at 11:08, Chris Hegarty <chris.hegarty at oracle.com> wrote:
>>>> On 10 Apr 2014, at 18:40, Mike Duigou <mike.duigou at oracle.com> wrote:
>>>>> On Apr 10 2014, at 03:21, Chris Hegarty <chris.hegarty at oracle.com> wrote:
>>>>>> On 10 Apr 2014, at 11:03, Ulf Zibis <Ulf.Zibis at CoSoCo.de> wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> On 10.04.2014 11:04, Chris Hegarty wrote:
>>>>>>>> Trivially, you could (but of course do not have to) use java.nio.charset.StandardCharsets.ISO_8859_1 to avoid the cost of the String-to-Charset lookup.
>>>>>>> In earlier tests, Sherman and I found that the cost of initializing a new charset object is higher than the lookup of an existing object in the cache. And it's even better to use the same String instance for the lookup that was used to cache the charset.
>>>>>> Interesting… thanks for letting me know. Presumably, there is an assumption that StandardCharsets is not initialized elsewhere, by another dependency.
>>>>> Generally it's safe to assume that StandardCharsets will already be initialized. If it isn't initialized we should consider it an amortized cost.
>>>> In which case, why would the String version be more performant than the version that already takes the Charset? Doesn't the String version need to do a lookup?
>>> There is a cache in StringCoding that is only used in the byte[] getBytes(String charsetName) case but not in the byte[] getBytes(Charset charset) case. The rationale in StringCoding::decode(Charset cs, byte[] ba, int off, int len) may need to be revisited, as it is certainly surprising that using the string constant charset name is faster than using the Charset constant.
>> It is surprising :-) In theory you can't cache the de/encoder of a charset from the external world, as the same charset might return a different de/encoder next time. So it was decided back then not to cache the de/encoder for an incoming charset. It might be reasonable to cache those from StandardCharsets, though.
> I would say that it is more than reasonable. ;-) And it is surprising to me too that this usage is not as fast as a constant string.
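For anyone following along, here is a minimal sketch of the two call forms being compared. The class and method names (GetBytesVariants, byName, byCharset) are hypothetical and exist only for illustration. Both forms should produce the same bytes; what differs is the lookup and caching path taken inside the JDK, and the relative cost is likely to vary between JDK versions, so the sketch makes no timing claim.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK or of the patch under review.
class GetBytesVariants {

    // String-name form: resolves the charset by name on each call.
    static byte[] byName(String s) {
        try {
            return s.getBytes("ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is one of the charsets every Java platform must support.
            throw new AssertionError(e);
        }
    }

    // Charset-constant form: no name lookup and no checked exception.
    static byte[] byCharset(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";
        // Both calls encode the same four characters to the same four bytes.
        System.out.println(Arrays.equals(byName(s), byCharset(s))); // prints "true"
    }
}
```

Any real comparison of the two paths would need a proper benchmark harness rather than a one-off run like this.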

Charset.equals() does explicitly mention "same canonical name", as shown below:

 /**
  * Tells whether or not this object is equal to another.
  *
  * <p> Two charsets are equal if, and only if, they have the same canonical
  * names.  A charset is never equal to any other type of object. </p>
  *
  * @return <tt>true</tt> if, and only if, this charset is equal to the
  *          given object
  */

But it is very reasonable :-) to assume someone might pass in a home-made charset implementation with the same canonical name as the one in our/jdk charset repository. Then we have another debate on which one should be used in this case.
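To make that scenario concrete, here is a rough sketch under stated assumptions. HomeMadeLatin1 and CharsetEqualsDemo are hypothetical names invented for this example, and handing out UTF-8 coders is just one arbitrary way a home-made charset could differ from the JDK's. The sketch shows such a charset comparing equal to the JDK's ISO-8859-1 while encoding differently, which is why a cache keyed on equals() or on the canonical name alone would be unsafe.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical home-made charset: it claims the canonical name "ISO-8859-1"
// but hands out UTF-8 coders.
class HomeMadeLatin1 extends Charset {
    HomeMadeLatin1() {
        super("ISO-8859-1", null); // null means "no aliases"
    }

    @Override
    public boolean contains(Charset cs) {
        return cs.equals(this);
    }

    @Override
    public CharsetDecoder newDecoder() {
        return StandardCharsets.UTF_8.newDecoder();
    }

    @Override
    public CharsetEncoder newEncoder() {
        return StandardCharsets.UTF_8.newEncoder();
    }
}

// Hypothetical driver class for the illustration.
class CharsetEqualsDemo {

    // Number of bytes the given charset produces for the given string.
    static int encodedLength(Charset cs, String s) {
        return cs.encode(s).remaining();
    }

    public static void main(String[] args) {
        Charset homeMade = new HomeMadeLatin1();
        Charset jdk = StandardCharsets.ISO_8859_1;

        System.out.println(homeMade.equals(jdk));               // true: same canonical name
        System.out.println(homeMade == jdk);                    // false: different objects
        System.out.println(encodedLength(jdk, "\u00e9"));       // 1: one Latin-1 byte
        System.out.println(encodedLength(homeMade, "\u00e9"));  // 2: two UTF-8 bytes
    }
}
```

An identity check (==) against the StandardCharsets constants would be one way to tell the two apart, though whether that is the right policy is exactly the debate mentioned above.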

-Sherman


More information about the core-libs-dev mailing list