Proposal: Large arrays (original) (raw)

james lowden jl0235 at yahoo.com
Tue Mar 24 10:05:26 PDT 2009


Not terribly relevant to the content of the proposal, but the fibonacci example should read:

long [[]] fib = new long [[3000000000L]]; fib [0] = 1; fib [1] = 1; for (long i = 2; i < fib.length; i++) fib [i] = fib [i - 1] + fib [i - 2];

--- On Tue, 3/24/09, james lowden <jl0235 at yahoo.com> wrote:

From: james lowden <jl0235 at yahoo.com> Subject: Proposal: Large arrays To: coin-dev at openjdk.java.net Date: Tuesday, March 24, 2009, 11:35 AM Proposal: Large arrays

AUTHOR(S): J. Lowden VERSION: 1.0 Initial version.

OVERVIEW FEATURE SUMMARY: Java arrays are accessed via 32-bit ints, resulting in a maximum theoretical array size of 2147483647 elements. While this is enough for most uses, some applications that need to handle large sequential data sets in memory would benefit from the ability to work with arrays indexed using 64-bit indices. As "simply" changing the current array syntax to use longs as indices rather than ints would have the potential to break a large amount of existing code, I'm not proposing doing any such thing. Instead, I'd like to see a separate but parallel array feature for creating and accessing both types of arrays. A rather boring example, which simply fills a byte array with hexadecimal "FF" values, is shown below: //int-indexed array byte [] foo = new byte [200000]; for (int i = 0; i < foo.length; i++)_ _foo [i] = (byte) 0xff;_ _//long-indexed array_ _byte [[]] foo2 = new byte [[20000000000L]];_ _for (long i = 0; i < foo2.length; i++)_ _foo2 [i] = (byte) 0xff;_ _Syntax for declaration, instantation, instanceof and_ _casting would use doubled square brackets to differentiate_ _it from_ _current array access. Single brackets can still be used_ _for access/mutation as the compiler should have sufficient_ _information to decide whether it can treat a specific_ _reference as an int-indexed or long-indexed_ _("large") array. The length field would be a long_ _rather than an int. Like standard arrays, large arrays_ _would derive directly from_ _java.lang.Object._ _MAJOR ADVANTAGE:_ _Java gains the ability to easily work with large sequential_ _data sets._ _MAJOR BENEFIT:_ _Declaring extremely large arrays simply isn't possible_ _right now; in cases where such a structure is desirable,_ _something_ _has to be hacked together. For applications dealing with_ _large data sets, the current upper limit on array size is_ _constricting, and will only grow more so as 64-bit systems_ _continue to become more common and RAM less expensive._ _MAJOR DISADVANTAGE:_ _This can't be implemented well solely via a compiler_ _patch; existing VMs would likely need to be changed to_ _support this_ _concept natively; new VM instructions parallel to the_ _existing array instructions would likely be required. Also,_ _a class_ _parallel to java.util.Arrays for performing standard_ _operations on large arrays would likely be desirable_ _(although_ _presumably fairly simple to implement.)_ _ALTERNATIVES:_ _Build your own class for storing long-indexes sequences,_ _possibly as an array-or-arrays._ _EXAMPLES_ _SIMPLE EXAMPLE:_ _// calculate the first 3 billion fibonacci numbers and_ _store them in an array_ _long [[]] fib = new long [[3000000000L]];_ _fib [0] = 0;_ _fib [1] = 0;_ _for (long i = 2; i < fib.length; i++)_ _fib [i] = fib [i - 1] + fib [i - 2];_ _ADVANCED EXAMPLE:_ _// this doesn't really do anything particular useful,_ _but does serve to show how casting and instanceof are_ _handled_ _byte [] foo = new byte [400000];_ _byte [[]] blah = new byte [[40000000000L]];_ _Object temp = Math.random () < 0.5 ? foo : blah;_ _if (foo instanceof byte [[]])_ _System.out.println (((byte [[]]) foo).length);_ _else_ _System.out.println (((byte []) foo).length);_ _DETAILS_ _SPECIFICATION:_ _Syntax-wise, the following expressions become legal:_ _SomeType [[]] foo; // declares foo as a long-indexed_ _"large" array of SomeType,_ _// where sometype is either a primitive or reference_ _type_ _foo = new SomeType [[somelongvalue]]; // instantiates a_ _large array of length_ _// somelongvalue, which is either a long_ _// or some type that can upconvert or unbox_ _// to a long_ _long somelong = foo.length; // large arrays will have a_ _"length" field, which is_ _// a long indicating the number of elements in the_ _array_ _foo instanceof SomeType [[]] // returns true if foo is a_ _large array_ _// of SomeType, false otherwise_ _(SomeType [[]]) foo // casts foo to a large array of_ _SomeType. If foo isn't_ _// a large array of SomeType or a subclass of SomeType,_ _// throws a ClassCastException_ _Large arrays are not castable or assignable to or from_ _int-indexed arrays of the same element type._ _int [[]] p = new int [40000]; // compiler error_ _int [] p = new int [[760000]]; // compiler error_ _int [] foo = (int []) (new int [[4040404040L]]); // throws_ _ClassCastException_ _int [[]] foo = (int [[]]) (new int [4040]); // throws_ _ClassCastException_ _Canonical class names for large arrays would be generated_ _in a similar fashion to those for standard arrays, but_ _using the opposite square bracket. The following table_ _shows some examples._ _declared type class name_ _int [[]] ]I_ _int [[]][[]] ]]I_ _Object [[]] ]Ljava.lang.Object;_ _The same set of indicator character (i.e., 'I' for_ _int, 'L' for reference types, etc.) as are currently_ _used for arrays_ _would be used for large arrays._ _A set of additional VM instructions parallel to the current_ _array instructions would be added to support this feature._ _The following table shows the current instruction, the_ _proposed instruction for large arrays, and any semantic_ _differences (reference_ _http://java.sun.com/docs/books/jvms/secondedition/html/Instructions2.doc.html):_ _array instruction proposed instruction differences_ _aaload lxaaload index value_ _becomes a long_ _aastore lxaastore index value_ _becomes a long_ _anewarray lxanewarray count value_ _becomes a long_ _arraylength lxarraylength return value_ _becomes a long_ _baload lxbaload index value_ _becomes a long_ _bastore lxbastore index value_ _becomes a long_ _caload lxcaload index value_ _becomes a long_ _castore lxcastore index value_ _becomes a long_ _daload lxdaload index value_ _becomes a long_ _dastore lxdastore index value_ _becomes a long_ _faload lxfaload index value_ _becomes a long_ _fastore lxfastore index value_ _becomes a long_ _iaload lxiaload index value_ _becomes a long_ _iastore lxiastore index value_ _becomes a long_ _laload lxlaload index value_ _becomes a long_ _lastore lxlastore index value_ _becomes a long_ _multianewarray lxmultianewarray countN values_ _become longs_ _newarray lxnewarray count value_ _becomes a long_ _saload lxsaload index value_ _becomes a long_ _sastore lxsastore index value_ _becomes a long_ _COMPILATION:_ _Compilation would be parallel to the present mechanism for_ _compiling arrays, but would output the lx* VM instructions_ _listed above for large arrays instead of the current array_ _instructions._ _TESTING:_ _Any test sufficient to test support for current arrays_ _should work for this as well._ _LIBRARY SUPPORT:_ _It would probably be desirable to create large array_ _versions of the various java.util.Arrays methods, whether_ _that be done by adding methods to the existing_ _java.util.Arrays class or putting them somewhere new. At_ _some point in the future, large java.util.List might be a desirable library feature, but that is beyond the scope of this proposal. REFLECTIVE APIS: An additional class (LargeArray or something of that sort) would need to be added to the java.lang.reflect package. This would have a similar set of methods and constructors to java.lang.reflect.Array, but with long "index" and "length" parameters where appropriate. OTHER CHANGES: VM needs to support the above-listed large array instructions. MIGRATION: Existing "hacks" to provide this kind of behavior could be replaced with large arrays. COMPATIBILITY BREAKING CHANGES: None. This feature simple doesn't exist; the proposed declaration/instantation syntax currently results in a compiler error. EXISTING PROGRAMS: No changes. REFERENCES EXISTING BUGS: 4880587 (64-Bit indexing for arrays) URL FOR PROTOTYPE (optional): None.



More information about the coin-dev mailing list