Proposal: Large arrays (original) (raw)

Joshua Bloch jjb at google.com
Tue Mar 24 11:03:29 PDT 2009


Much as like the idea of 63-bit addressable arrays (hell even 32 would be nice, as opposed to the 31 we have now), I believe the source impact is large. Consider our old friend Arrays.binarySearch:

public static int binarySearch(char[] a, char key)

Searches the specified array of chars for the specified value using the binary search algorithm. The array must be sorted (as by the sort(char[])<http://java.sun.com/javase/6/docs/api/java/util/Arrays.html#sort%28char%5B%5D%29>method) prior to making this call. If it is not sorted, the results are undefined. If the array contains multiple elements with the specified value, there is no guarantee which one will be found.

*Parameters:*a - the array to be searchedkey - the value to be searched for *Returns:*index of the search key, if it is contained in the array; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the array: the index of the first element greater than the key, or a.length if all elements in the array are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found. I don't see how we could change the return value to long--it's binary incompatible. That means we need a second version of the call that can tolerate big arrays, and people have to remember to call it.

And what do we do if the original version is given an array that's longer than Integer.MAX_VALUE locations? I suppose we have to throw an IllegalArgumentException. So how much damage will this cause to the libraries? How many methods will need twins that tolerate large arrays? How much damage will it cause to user code? I don't think these are easy questions to answer.

          Josh

On Tue, Mar 24, 2009 at 10:52 AM, Joseph D. Darcy <Joe.Darcy at sun.com> wrote:

james lowden wrote: > Proposal: Large arrays > > AUTHOR(S): > > J. Lowden > > VERSION: > > 1.0 Initial version. > > > OVERVIEW > > FEATURE SUMMARY: > > Java arrays are accessed via 32-bit ints, resulting in a maximum theoretical array size of 2147483647 elements. While > > this is enough for most uses, some applications that need to handle large sequential data sets in memory would benefit > > from the ability to work with arrays indexed using 64-bit indices. As "simply" changing the current array syntax to use > > longs as indices rather than ints would have the potential to break a large amount of existing code,

I don't think the source compatibility impact is that large actually. Let's say all arrays can potentially be 64-bit now. The existing code that uses an int expression to index into the array will still index into the same element and negative indexes will still cause an exception. Now, how should the length field be set? I'd guess it should saturate to Integer.MAXVALUE for oversize arrays and a new "long size()" method could be added to return the full length. The JVM would need to be modified to handle the long indexes of course; perhaps the wide bytecode prefix could be put to use here. [snip] > ALTERNATIVES: > > Build your own class for storing long-indexes sequences, possibly as an array-or-arrays. > This would be eased if the bracket operator could be used for user-defined types too since today functionally an array is just a fixed-length map from int to some other type. -Joe



More information about the coin-dev mailing list