Request for review: Race conditions in java.nio.charset.Charset
Martin Buchholz martinrb at google.com
Thu Oct 8 04:40:39 UTC 2009
If you can show that a simple test program that appears to access only 2 charsets in fact causes accesses to 3 or 4, that is a serious problem with the 2-element cache.
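To make the concern concrete, here is a minimal sketch of how a two-entry cache behaves once a third name enters the rotation. The class `TwoEntryCache` is hypothetical and only models a "most recent, second most recent" policy; it is not the code in java.nio.charset.Charset.

```java
/** Hypothetical model of a two-entry, most-recently-used name cache. */
final class TwoEntryCache {
    private String first;    // most recently used name
    private String second;   // second most recently used name
    private int misses;

    /** Returns true on a hit; on a miss, inserts the name and evicts the older entry. */
    boolean lookup(String name) {
        if (name.equals(first)) {
            return true;
        }
        if (name.equals(second)) {
            // Promote the hit to the front so the other entry becomes the eviction candidate.
            second = first;
            first = name;
            return true;
        }
        misses++;
        second = first;
        first = name;
        return false;
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        TwoEntryCache two = new TwoEntryCache();
        TwoEntryCache three = new TwoEntryCache();
        String[] names = {"ISO-8859-1", "IBM037", "UTF-8"};
        for (int i = 0; i < 9; i++) {
            two.lookup(names[i % 2]);     // alternate between two names
            three.lookup(names[i % 3]);   // cycle through three names
        }
        System.out.println("two names, misses:   " + two.misses());
        System.out.println("three names, misses: " + three.misses());
    }
}
```

With two names only the first two lookups miss. With three names in rotation every lookup misses in this model, because each name is evicted just before it is requested again.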
People at Google are working on better caches, but I don't think they are quite ready today.
Perhaps, instead of a small charset cache, we could cache all the charsets. For the large charsets like GB18030, the charset implementation itself could hold its large data tables through a soft reference and recompute them as needed. Then most of the static memory used by an unused charset could be reclaimed.
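A rough sketch of the soft-reference idea, under stated assumptions: `SoftTable` is a hypothetical helper, not an existing JDK class, and the table builder is a stand-in for whatever a charset would do to recompute its mapping data. It uses `java.util.function.Supplier` for brevity, which postdates this thread.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical holder: keeps a large table softly reachable and rebuilds it on demand. */
final class SoftTable<T> {
    private final Supplier<T> builder;   // recomputes the table from scratch
    private volatile SoftReference<T> ref = new SoftReference<>(null);

    SoftTable(Supplier<T> builder) {
        this.builder = builder;
    }

    T get() {
        T table = ref.get();
        if (table == null) {
            // Either never built, or cleared by the collector under memory pressure.
            // Two threads may both rebuild here; that wastes work but either result is valid.
            table = builder.get();
            ref = new SoftReference<>(table);
        }
        return table;
    }
}

final class SoftTableDemo {
    public static void main(String[] args) {
        // Stand-in for an expensive decode table; the size is an arbitrary assumption.
        SoftTable<char[]> decodeTable = new SoftTable<>(() -> new char[1 << 16]);
        char[] table = decodeTable.get();
        System.out.println("table length: " + table.length);
    }
}
```

The Charset object itself stays small and can be cached indefinitely; only the table is eligible for reclamation, and callers pay the rebuild cost only after the collector has actually cleared it.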
In general, high quality caching is hard, much harder than it looks.
Martin
On Wed, Oct 7, 2009 at 15:58, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
I don't think it's worth a point fix here unless an actual wrong result can be demonstrated. I do think a more sophisticated charset cache would be good, but hard to get right.
The other point is the size of the cache, see http://bugs.sun.com/bugdatabase/viewbug.do?bugid=6795535. I have logged the calls to Charset.lookup() from a simple test that itself only requested ISO-8859-1 and IBM037. As you can see, UTF-8 and Cp1252 (the default encoding on German Windows) are frequently requested by the VM, so IMO size 2 is too restrictive (note the different aliases UTF-8, utf-8 and UTF8):

UTF-8 utf-8 UTF-8 Cp1252 UTF-8 UTF-8 UTF-8 UTF-8 UTF-8 UTF-8 UTF8 UTF8
Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252
UTF-8 IBM037 UTF-8 UTF-8 utf-8 ISO-8859-1 UTF-8

-Ulf
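For readers who want to check the numbers, the logged names can be tallied with a few lines of Java. This is only a convenience sketch: `LookupTally` is a hypothetical class, and it counts the raw strings exactly as logged, without resolving aliases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LookupTally {
    // The lookup names exactly as logged in the message, in order.
    static final String LOGGED =
        "UTF-8 utf-8 UTF-8 Cp1252 UTF-8 UTF-8 UTF-8 UTF-8 UTF-8 UTF-8 UTF8 UTF8 "
      + "Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 Cp1252 "
      + "UTF-8 IBM037 UTF-8 UTF-8 utf-8 ISO-8859-1 UTF-8";

    /** Counts how often each name occurs, preserving first-seen order. */
    static Map<String, Integer> tally(String log) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : log.trim().split("\\s+")) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = tally(LOGGED);
        counts.forEach((name, n) -> System.out.println(name + " " + n));
        System.out.println("distinct names: " + counts.size());
    }
}
```

The log holds 31 lookups over 6 distinct strings, of which only two lookups (IBM037 and ISO-8859-1) are for the names the test itself requested.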