PROPOSAL: Binary Literals

james lowden jl0235 at yahoo.com
Wed Mar 25 12:07:22 PDT 2009


Actually, that's a good idea in general for long numeric constants. 9_000_000_000_000_000L is easier to parse than 9000000000000000L.
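
A minimal sketch of the comparison, assuming a compiler that accepts both the proposed "0b" prefix and '_' between digits (the class and constant names are made up for illustration):

```java
// Sketch only: assumes the compiler accepts '_' between digits in any radix
// and the proposed "0b" binary prefix.
class SeparatorDemo {
    // The same long value, written without and with digit grouping.
    static final long PLAIN = 9000000000000000L;
    static final long GROUPED = 9_000_000_000_000_000L;

    // The same 32-bit pattern, written without and with byte-sized groups.
    static final int BITS_PLAIN = 0b10100001010001011010000101000101;
    static final int BITS_GROUPED = 0b10100001_01000101_10100001_01000101;

    public static void main(String[] args) {
        // The separators are purely visual and do not change the value.
        System.out.println(PLAIN == GROUPED);           // prints "true"
        System.out.println(BITS_PLAIN == BITS_GROUPED); // prints "true"
    }
}
```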

--- On Wed, 3/25/09, Stephen Colebourne <scolebourne at joda.org> wrote:

From: Stephen Colebourne <scolebourne at joda.org>
Subject: Re: PROPOSAL: Binary Literals
To: coin-dev at openjdk.java.net
Date: Wednesday, March 25, 2009, 1:50 PM

See
http://www.jroller.com/scolebourne/entry/changingjavaaddingsimplerprimitive
for my take on this from long ago.

In particular, I'd suggest allowing a character to separate long binary
strings:

int anInt1 = 0b10100001_01000101_10100001_01000101;

much more readable.

Stephen

2009/3/25 Derek Foster <vapor1 at teleport.com>:
> Hmm. Second try at sending to the list. Let's see if this works. (In the
> meantime, I noticed that Bruce Chapman has mentioned something similar in
> another of his proposals, so I think we are in agreement on this. This
> proposal should not be taken as competing with his similar proposal: I'd
> quite like to see type suffixes for bytes, shorts, etc. added to Java, in
> addition to binary literals.) Anyway...
>
>
> Add binary literals to Java.
>
> AUTHOR(S): Derek Foster
>
> OVERVIEW
>
> In some programming domains, use of binary numbers (typically as bitmasks,
> bit-shifts, etc.) is very common. However, Java code, due to its C heritage,
> has traditionally forced programmers to represent numbers in only decimal,
> octal, or hexadecimal. (In practice, octal is rarely used, and is present
> mostly for backwards compatibility with C.)
>
> When the data being dealt with is fundamentally bit-oriented, however, using
> hexadecimal to represent ranges of bits requires an extra degree of
> translation for the programmer, and this can often become a source of errors.
> For instance, if a technical specification lists specific values of interest
> in binary (for example, in a compression encoding algorithm or in the
> specifications for a network protocol, or for communicating with a bitmapped
> hardware device) then a programmer coding to that specification must
> translate each such value from its binary representation into hexadecimal.
> Checking to see if this translation has been done correctly is accomplished
> by back-translating the numbers. In most cases, programmers do these
> translations in their heads, and HOPEFULLY get them right. However, errors
> can easily creep in, and re-verifying the results is not straightforward
> enough to be done frequently.
>
> Furthermore, in many cases, the binary representation of a number makes it
> much clearer what is actually intended than the hexadecimal one does. For
> instance, this:
>
> private static final int BITMASK = 0x1E;
>
> does not immediately make it clear that the bitmask being declared comprises
> a single contiguous range of four bits.
>
> In many cases, it would be more natural for the programmer to be able to
> write the numbers in binary in the source code, eliminating the need for
> manual translation to hexadecimal entirely.
>
>
> FEATURE SUMMARY:
>
> In addition to the existing "1" (decimal), "01" (octal) and "0x1"
> (hexadecimal) forms of specifying numeric literals, a new form "0b1" (binary)
> would be added.
>
> Note that this is the same syntax as has been used as an extension by the GCC
> C/C++ compilers for many years, and also is used in the Ruby language, as
> well as in the Python language.
>
>
> MAJOR ADVANTAGE:
>
> It is no longer necessary for programmers to translate binary numbers to and
> from hexadecimal in order to use them in Java programs.
>
>
> MAJOR BENEFIT:
>
> Code using bitwise operations is more readable and easier to verify against
> technical specifications that use binary numbers to specify constants.
>
> Routines that are bit-oriented are easier to understand when an artificial
> translation to hexadecimal is not required in order to fulfill the
> constraints of the language.
>
> MAJOR DISADVANTAGE:
>
> Someone might incorrectly think that "0b1" represented the same value as
> hexadecimal number "0xB1". However, note that this problem has existed for
> octal/decimal for many years (confusion between "050" and "50") and does not
> seem to be a major issue.
>
>
> ALTERNATIVES:
>
> Users could continue to write the numbers as decimal, octal, or hexadecimal,
> and would continue to have the problems observed in this document.
>
> Another alternative would be for code to translate at runtime from binary
> strings, such as:
>
>   int BITMASK = Integer.parseInt("00001110", 2);
>
> Besides the obvious extra verbosity, there are several problems with this:
>
> * Calling a method such as Integer.parseInt at runtime will typically make it
> impossible for the compiler to inline the value of this constant, since its
> value has been taken from a runtime method call. Inlining is important,
> because code that does bitwise parsing is often very low-level code in tight
> loops that must execute quickly. (This is particularly the case for mobile
> applications and other applications that run in severely resource-constrained
> environments, which is one of the cases where binary numbers would be most
> valuable, since talking to low-level hardware is one of the primary use cases
> for this feature.)
>
> * Constants such as the above cannot be used as selectors in 'switch'
> statements.
>
> * Any errors in the string to be parsed (for instance, an extra space) will
> result in runtime exceptions, rather than compile-time errors as would have
> occurred in normal parsing. If such a value is declared 'static', this will
> result in some very ugly exceptions at runtime.
>
>
> EXAMPLES:
>
> // An 8-bit 'byte' literal.
> byte aByte = (byte)0b00100001;
>
> // A 16-bit 'short' literal.
> short aShort = (short)0b1010000101000101;
>
> // Some 32-bit 'int' literals.
> int anInt1 = 0b10100001010001011010000101000101;
> int anInt2 = 0b101;
> // The B can be upper or lower case, as with the x in "0x45".
> int anInt3 = 0B101;
>
> // A 64-bit 'long' literal. Note the "L" suffix, as would also be used
> // for a long in decimal, hexadecimal, or octal.
> long aLong =
>   0b01010000101000101101000010100010110100001010001011010000101000101L;
>
> SIMPLE EXAMPLE:
>
> class Foo {
>   public static void main(String[] args) {
>     System.out.println("The value 10100001 in decimal is " + 0b10100001);
>   }
> }
>
>
> ADVANCED EXAMPLE:
>
> // Binary constants could be used in code that needs to be
> // easily checkable against a specification document, such
> // as this simulator for a hypothetical 8-bit microprocessor:
>
> public State decodeInstruction(int instruction, State state) {
>   if ((instruction & 0b11100000) == 0b00000000) {
>     final int register = instruction & 0b00001111;
>     switch (instruction & 0b11110000) {
>       case 0b00000000: return state.nop();
>       case 0b00010000: return state.copyAccumTo(register);
>       case 0b00100000: return state.addToAccum(register);
>       case 0b00110000: return state.subFromAccum(register);
>       case 0b01000000: return state.multiplyAccumBy(register);
>       case 0b01010000: return state.divideAccumBy(register);
>       case 0b01100000: return state.setAccumFrom(register);
>       case 0b01110000: return state.returnFromCall();
>       default: throw new IllegalArgumentException();
>     }
>   } else {
>     final int address = instruction & 0b00011111;
>     switch (instruction & 0b11100000) {
>       case 0b00100000: return state.jumpTo(address);
>       case 0b01000000: return state.jumpIfAccumZeroTo(address);
>       case 0b10000000: return state.jumpIfAccumNonzeroTo(address);
>       case 0b01100000: return state.setAccumFromMemory(address);
>       case 0b10100000: return state.writeAccumToMemory(address);
>       case 0b11000000: return state.callTo(address);
>       default: throw new IllegalArgumentException();
>     }
>   }
> }
>
> // Binary literals can be used to make a bitmap more readable:
>
> public static final short[] HAPPYFACE = {
>   (short)0b0000011111100000,
>   (short)0b0000100000010000,
>   (short)0b0001000000001000,
>   (short)0b0010000000000100,
>   (short)0b0100000000000010,
>   (short)0b1000011001100001,
>   (short)0b1000011001100001,
>   (short)0b1000000000000001,
>   (short)0b1000000000000001,
>   (short)0b1001000000001001,
>   (short)0b1000100000010001,
>   (short)0b0100011111100010,
>   (short)0b0010000000000100,
>   (short)0b0001000000001000,
>   (short)0b0000100000010000,
>   (short)0b0000011111100000
> };
>
> // Binary literals can make relationships
> // among data more apparent than they would
> // be in hex or octal.
> //
> // For instance, what does the following
> // array contain? In hexadecimal, it's hard to tell:
> public static final int[] PHASES = {
>   0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
> };
>
> // In binary, it's obvious that a number is being
> // rotated left one bit at a time.
> public static final int[] PHASES = {
>   0b00110001,
>   0b01100010,
>   0b11000100,
>   0b10001001,
>   0b00010011,
>   0b00100110,
>   0b01001100,
>   0b10011000,
> };
>
>
> DETAILS
>
> SPECIFICATION:
>
> Section 3.10.1 ("Integer Literals") of the JLS3 should be changed to add the
> following:
>
> IntegerLiteral:
>        DecimalIntegerLiteral
>        HexIntegerLiteral
>        OctalIntegerLiteral
>        BinaryIntegerLiteral         // Added
>
> BinaryIntegerLiteral:
>        BinaryNumeral IntegerTypeSuffix_opt
>
> BinaryNumeral:
>        0 b BinaryDigits
>        0 B BinaryDigits
>
> BinaryDigits:
>        BinaryDigit
>        BinaryDigit BinaryDigits
>
> BinaryDigit: one of
>        0 1
>
> COMPILATION:
>
> Binary literals would be compiled to class files in the same fashion as
> existing decimal, hexadecimal, and octal literals are. No special support or
> changes to the class file format are needed.
>
> TESTING:
>
> The feature can be tested in the same way as existing decimal, hexadecimal,
> and octal literals are: Create a bunch of constants in source code, including
> the maximum and minimum positive and negative values for integer and long
> types, and verify at runtime that they have the correct values.
>
>
> LIBRARY SUPPORT:
>
> The methods Integer.decode(String) and Long.decode(String) should be modified
> to parse binary numbers (as specified above) in addition to their existing
> support for decimal, hexadecimal, and octal numbers.
>
>
> REFLECTIVE APIS:
>
> No updates to the reflection APIs are needed.
>
>
> OTHER CHANGES:
>
> No other changes are needed.
>
>
> MIGRATION:
>
> Individual decimal, hexadecimal, or octal constants in existing code can be
> updated to binary as a programmer desires.
>
>
> COMPATIBILITY
>
>
> BREAKING CHANGES:
>
> This feature would not break any existing programs, since the suggested
> syntax is currently considered to be a compile-time error.
>
>
> EXISTING PROGRAMS:
>
> Class file format does not change, so existing programs can use class files
> compiled with the new feature without problems.
>
>
> REFERENCES:
>
> The GCC/G++ compiler, which already supports this syntax (as of version 4.3)
> as an extension to standard C/C++.
> http://gcc.gnu.org/gcc-4.3/changes.html
>
> The Ruby language, which supports binary literals:
> http://wordaligned.org/articles/binary-literals
>
> The Python language added binary literals in version 2.6:
> http://docs.python.org/dev/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax
>
> EXISTING BUGS:
>
> "Language support for literal numbers in binary and other bases"
> http://bugs.sun.com/bugdatabase/viewbug.do?bugid=5025288
>
> URL FOR PROTOTYPE (optional):
>
> None.
>
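
To make the ALTERNATIVES discussion concrete, here is a rough sketch of the difference between a binary literal and the Integer.parseInt workaround, assuming a compiler that accepts the proposed "0b" syntax. The class, constant, and method names are hypothetical and not part of the proposal:

```java
// Sketch only: assumes the proposed "0b" literal syntax is available.
class BitmaskDemo {
    // Compile-time constants: the same value written in hex and in binary.
    // The binary form shows the contiguous run of four set bits directly.
    static final int HEX_MASK = 0x1E;
    static final int BIN_MASK = 0b00011110;

    // The runtime alternative: same value, but not a constant expression,
    // so it cannot be used as a 'case' label and a typo in the string
    // only surfaces as a NumberFormatException when the class is loaded.
    static final int PARSED_MASK = Integer.parseInt("00011110", 2);

    static String describe(int value) {
        switch (value & BIN_MASK) {
            case 0b00000000: return "no mask bits set";
            case BIN_MASK:   return "all mask bits set";
            // "case PARSED_MASK:" would be rejected by the compiler here.
            default:         return "some mask bits set";
        }
    }

    public static void main(String[] args) {
        System.out.println(HEX_MASK == BIN_MASK);    // prints "true"
        System.out.println(BIN_MASK == PARSED_MASK); // prints "true"
        System.out.println(describe(0b00011110));    // prints "all mask bits set"
    }
}
```

All three constants hold the same value; only the two literals are usable where the language requires a constant expression.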

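On the LIBRARY SUPPORT point, the following is only a sketch of what accepting the "0b"/"0B" prefix in a decode-style method might look like. The helper decodeWithBinary is hypothetical; it wraps the existing Integer.decode and Integer.parseInt rather than showing how the JDK itself would be changed:

```java
// Sketch only: a hypothetical wrapper, not a JDK API.
class BinaryDecode {
    // Accepts an optional sign followed by "0b" or "0B" and binary digits;
    // every other input is handed to the existing Integer.decode.
    static int decodeWithBinary(String text) {
        boolean negative = text.startsWith("-");
        boolean signed = negative || text.startsWith("+");
        String body = signed ? text.substring(1) : text;

        if (body.startsWith("0b") || body.startsWith("0B")) {
            String digits = body.substring(2);
            // Reject a missing digit string or a second sign after the prefix,
            // which Integer.parseInt would otherwise accept as a sign.
            if (digits.isEmpty() || digits.charAt(0) == '-' || digits.charAt(0) == '+') {
                throw new NumberFormatException("For input string: \"" + text + "\"");
            }
            // Parsing with the sign attached lets Integer.MIN_VALUE round-trip.
            return Integer.parseInt(negative ? "-" + digits : digits, 2);
        }
        return Integer.decode(text);
    }

    public static void main(String[] args) {
        System.out.println(decodeWithBinary("0b11110")); // prints "30"
        System.out.println(decodeWithBinary("0x1E"));    // prints "30"
        System.out.println(decodeWithBinary("-0B101"));  // prints "-5"
    }
}
```

Like Integer.decode with hexadecimal input, this sketch treats the digits as a magnitude, so a 32-bit pattern with the top bit set is reported as out of range rather than read as a negative number.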

