PROPOSAL: Binary Literals

Derek Foster vapor1 at teleport.com
Wed Mar 25 11:06:03 PDT 2009

Hmm. Second try at sending to the list. Let's see if this works. (In the meantime, I noticed that Bruce Chapman has mentioned something similar in another proposal of his, so I think we are in agreement on this. This proposal should not be taken as competing with his similar proposal: I'd quite like to see type suffixes for bytes, shorts, etc. added to Java, in addition to binary literals.) Anyway...

Add binary literals to Java.

AUTHOR(S): Derek Foster

OVERVIEW

In some programming domains, use of binary numbers (typically as bitmasks, bit-shifts, etc.) is very common. However, Java code, due to its C heritage, has traditionally forced programmers to represent numbers in only decimal, octal, or hexadecimal. (In practice, octal is rarely used, and is present mostly for backwards compatibility with C.)

When the data being dealt with is fundamentally bit-oriented, however, using hexadecimal to represent ranges of bits requires an extra degree of translation for the programmer, and this can often become a source of errors. For instance, if a technical specification lists specific values of interest in binary (for example, in a compression encoding algorithm, in the specifications for a network protocol, or for communicating with a bitmapped hardware device), then a programmer coding to that specification must translate each such value from its binary representation into hexadecimal. Checking whether this translation has been done correctly is accomplished by back-translating the numbers. In most cases, programmers do these translations in their heads, and HOPEFULLY get them right. However, errors can easily creep in, and re-verifying the results is not straightforward enough to be done frequently.

Furthermore, in many cases, the binary representation of a number makes the intent much clearer than the hexadecimal one does. For instance, this:

private static final int BITMASK = 0x1E;

does not immediately make it clear that the bitmask being declared comprises a single contiguous range of four bits.

In many cases, it would be more natural for the programmer to be able to write the numbers in binary in the source code, eliminating the need for manual translation to hexadecimal entirely.

FEATURE SUMMARY:

In addition to the existing "1" (decimal), "01" (octal) and "0x1" (hexadecimal) forms of specifying numeric literals, a new form "0b1" (binary) would be added.

Note that this is the same syntax as has been used as an extension by the GCC C/C++ compilers for many years, and also is used in the Ruby language, as well as in the Python language.

MAJOR ADVANTAGE:

It is no longer necessary for programmers to translate binary numbers to and from hexadecimal in order to use them in Java programs.

MAJOR BENEFIT:

Code using bitwise operations is more readable and easier to verify against technical specifications that use binary numbers to specify constants.

Routines that are bit-oriented are easier to understand when an artificial translation to hexadecimal is not required in order to fulfill the constraints of the language.

MAJOR DISADVANTAGE:

Someone might incorrectly think that "0b1" represented the same value as hexadecimal number "0xB1". However, note that this problem has existed for octal/decimal for many years (confusion between "050" and "50") and does not seem to be a major issue.
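
To make the two confusions concrete, here is a small sketch that puts the look-alike literals side by side. The class and constant names are made up for illustration, and it assumes a compiler that accepts the proposed "0b" form.

```java
class LiteralConfusion {
    // Proposed binary form: a single 1 bit.
    static final int BINARY_ONE = 0b1;
    // Hexadecimal B1: 11 * 16 + 1.
    static final int HEX_B1 = 0xB1;
    // The leading zero makes this octal: 5 * 8 + 0.
    static final int OCTAL_050 = 050;
    // Plain decimal fifty.
    static final int DECIMAL_50 = 50;

    public static void main(String[] args) {
        System.out.println("0b1  = " + BINARY_ONE);
        System.out.println("0xB1 = " + HEX_B1);
        System.out.println("050  = " + OCTAL_050);
        System.out.println("50   = " + DECIMAL_50);
    }
}
```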

ALTERNATIVES:

Users could continue to write the numbers as decimal, octal, or hexadecimal, and would continue to have the problems observed in this document.

Another alternative would be for code to translate at runtime from binary strings, such as:

int BITMASK = Integer.parseInt("00001110", 2);

Besides the obvious extra verbosity, there are several problems with this:

Calling a method such as Integer.parseInt at runtime will typically make it impossible for the compiler to inline the value of this constant, since its value has been taken from a runtime method call. Inlining is important, because code that does bitwise parsing is often very low-level code in tight loops that must execute quickly. (This is particularly the case for mobile applications and other applications that run on severely resource-constrained environments, which is one of the cases where binary numbers would be most valuable, since talking to low-level hardware is one of the primary use cases for this feature.)
Constants such as the above cannot be used as selectors in 'switch' statements.
Any errors in the string to be parsed (for instance, an extra space) will result in runtime exceptions, rather than compile-time errors as would have occurred in normal parsing. If such a value is declared 'static', this will result in some very ugly exceptions at runtime.
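
The second and third problems can be seen in a short sketch. The class, field, and method names below are hypothetical, and the binary literal assumes a compiler that implements this proposal.

```java
class ParseIntDrawbacks {
    // Not a constant expression: the value comes from a method call,
    // so the compiler will not accept it as a 'case' label.
    static final int PARSED_MASK = Integer.parseInt("00001110", 2);

    // A constant expression: usable as a 'case' label.
    static final int LITERAL_MASK = 0b00001110;

    static String classify(int value) {
        switch (value) {
            case LITERAL_MASK:
                return "mask";
            // "case PARSED_MASK:" here would be rejected at compile time,
            // because a constant expression is required.
            default:
                return "other";
        }
    }

    // Reports whether parsing the text as a base-2 number fails at run time.
    static boolean failsAtRuntime(String text) {
        try {
            Integer.parseInt(text, 2);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(PARSED_MASK == LITERAL_MASK);
        System.out.println(classify(LITERAL_MASK));
        // A stray space is only discovered when the code runs.
        System.out.println(failsAtRuntime("0000111 0"));
    }
}
```

With a literal, the same stray space would be reported by the compiler instead of surfacing as an exception in a static initializer.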

EXAMPLES:

// An 8-bit 'byte' literal.
byte aByte = (byte)0b00100001;

// A 16-bit 'short' literal.
short aShort = (short)0b1010000101000101;

// Some 32-bit 'int' literals.
int anInt1 = 0b10100001010001011010000101000101;
int anInt2 = 0b101;
int anInt3 = 0B101; // The B can be upper or lower case as per the x in "0x45".

// A 64-bit 'long' literal. Note the "L" suffix, as would also be used
// for a long in decimal, hexadecimal, or octal.
long aLong = 0b01010000101000101101000010100010110100001010001011010000101000101L;

SIMPLE EXAMPLE:

class Foo {
    public static void main(String[] args) {
        System.out.println("The value 10100001 in decimal is " + 0b10100001);
    }
}

ADVANCED EXAMPLE:

// Binary constants could be used in code that needs to be
// easily checkable against a specifications document, such
// as this simulator for a hypothetical 8-bit microprocessor:

public State decodeInstruction(int instruction, State state) {
    if ((instruction & 0b11100000) == 0b00000000) {
        final int register = instruction & 0b00001111;
        switch (instruction & 0b11110000) {
            case 0b00000000: return state.nop();
            case 0b00010000: return state.copyAccumTo(register);
            case 0b00100000: return state.addToAccum(register);
            case 0b00110000: return state.subFromAccum(register);
            case 0b01000000: return state.multiplyAccumBy(register);
            case 0b01010000: return state.divideAccumBy(register);
            case 0b01100000: return state.setAccumFrom(register);
            case 0b01110000: return state.returnFromCall();
            default: throw new IllegalArgumentException();
        }
    } else {
        final int address = instruction & 0b00011111;
        switch (instruction & 0b11100000) {
            case 0b00100000: return state.jumpTo(address);
            case 0b01000000: return state.jumpIfAccumZeroTo(address);
            case 0b10000000: return state.jumpIfAccumNonzeroTo(address);
            case 0b01100000: return state.setAccumFromMemory(address);
            case 0b10100000: return state.writeAccumToMemory(address);
            case 0b11000000: return state.callTo(address);
            default: throw new IllegalArgumentException();
        }
    }
}

// Binary literals can be used to make a bitmap more readable:

public static final short[] HAPPY_FACE = {
    (short)0b0000011111100000,
    (short)0b0000100000010000,
    (short)0b0001000000001000,
    (short)0b0010000000000100,
    (short)0b0100000000000010,
    (short)0b1000011001100001,
    (short)0b1000011001100001,
    (short)0b1000000000000001,
    (short)0b1000000000000001,
    (short)0b1001000000001001,
    (short)0b1000100000010001,
    (short)0b0100011111100010,
    (short)0b0010000000000100,
    (short)0b0001000000001000,
    (short)0b0000100000010000,
    (short)0b0000011111100000
};

// Binary literals can make relationships
// among data more apparent than they would
// be in hex or octal.
//
// For instance, what does the following
// array contain? In hexadecimal, it's hard to tell:
public static final int[] PHASES = {
    0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
};

// In binary, it's obvious that a number is being
// rotated left one bit at a time.
public static final int[] PHASES = {
    0b00110001,
    0b01100010,
    0b11000100,
    0b10001001,
    0b00010011,
    0b00100110,
    0b01001100,
    0b10011000,
};

DETAILS

SPECIFICATION:

Section 3.10.1 ("Integer Literals") of the JLS3 should be changed to add the following:

IntegerLiteral:
    DecimalIntegerLiteral
    HexIntegerLiteral
    OctalIntegerLiteral
    BinaryIntegerLiteral    // Added

BinaryIntegerLiteral:
    BinaryNumeral IntegerTypeSuffix_opt

BinaryNumeral:
    0 b BinaryDigits
    0 B BinaryDigits

BinaryDigits:
    BinaryDigit
    BinaryDigit BinaryDigits

BinaryDigit: one of
    0 1
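
As a rough illustration of which tokens the grammar above admits, the productions can be collapsed into a single regular expression. This is only a sketch: the class and method names are invented, the suffix is assumed to be the existing "l" or "L" IntegerTypeSuffix from JLS 3.10.1, and the range checks a compiler would also apply (32 bits for int, 64 for long) are not modeled.

```java
import java.util.regex.Pattern;

class BinaryLiteralGrammar {
    // BinaryNumeral:  "0b" or "0B" followed by one or more BinaryDigit.
    // IntegerTypeSuffix_opt:  an optional trailing 'l' or 'L'.
    private static final Pattern BINARY_INTEGER_LITERAL =
            Pattern.compile("0[bB][01]+[lL]?");

    // Returns true if the whole token matches BinaryIntegerLiteral.
    static boolean isBinaryIntegerLiteral(String token) {
        return BINARY_INTEGER_LITERAL.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = { "0b1", "0B101L", "0b", "0b102", "0xB1" };
        for (String token : tokens) {
            System.out.println(token + " -> " + isBinaryIntegerLiteral(token));
        }
    }
}
```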

COMPILATION:

Binary literals would be compiled to class files in the same fashion as existing decimal, hexadecimal, and octal literals are. No special support or changes to the class file format are needed.

TESTING:

The feature can be tested in the same way as existing decimal, hexadecimal, and octal literals are: create a set of constants in source code, including the maximum and minimum positive and negative values for the int and long types, and verify at runtime that they have the correct values.
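
A boundary test along these lines might look like the sketch below. The class and member names are hypothetical. It assumes that a full-width binary literal with its top bit set denotes a negative value, as is already the case for hexadecimal literals. The underscores between digit groups are a separate Java 7 feature, not part of the grammar in this proposal; they are used here only so the digit counts can be checked by eye.

```java
class BinaryLiteralBoundaryTest {
    // Largest positive int: a 0 bit followed by 31 one bits.
    static final int INT_MAX = 0b01111111_11111111_11111111_11111111;
    // Most negative int: a 1 bit followed by 31 zero bits.
    static final int INT_MIN = 0b10000000_00000000_00000000_00000000;
    // Largest positive long: a 0 bit followed by 63 one bits.
    static final long LONG_MAX =
            0b01111111_11111111_11111111_11111111_11111111_11111111_11111111_11111111L;
    // Most negative long: a 1 bit followed by 63 zero bits.
    static final long LONG_MIN =
            0b10000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000L;

    // Compares each literal with the corresponding library constant.
    static boolean allBoundariesMatch() {
        return INT_MAX == Integer.MAX_VALUE
                && INT_MIN == Integer.MIN_VALUE
                && LONG_MAX == Long.MAX_VALUE
                && LONG_MIN == Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("All boundaries match: " + allBoundariesMatch());
    }
}
```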

LIBRARY SUPPORT:

The methods Integer.decode(String) and Long.decode(String) should be modified to parse binary numbers (as specified above) in addition to their existing support for decimal, hexadecimal, and octal numbers.
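
One way to picture the requested behavior is a wrapper around the existing method. The helper below is hypothetical and not part of the JDK: it handles an optional sign followed by "0b" or "0B" itself and passes every other input to Integer.decode unchanged. Like Integer.decode with hexadecimal input, it treats the digits as a magnitude, so a 32-digit string with the top bit set is out of range rather than negative; whether that is the right rule is an assumption of this sketch.

```java
class BinaryDecode {
    // Hypothetical helper: Integer.decode plus a "0b"/"0B" binary prefix.
    static int decode(String text) {
        int index = 0;
        boolean negative = false;
        if (text.startsWith("-")) {
            negative = true;
            index = 1;
        } else if (text.startsWith("+")) {
            index = 1;
        }
        if (text.startsWith("0b", index) || text.startsWith("0B", index)) {
            String digits = text.substring(index + 2);
            // A sign is only allowed before the prefix, never after it.
            if (digits.startsWith("-") || digits.startsWith("+")) {
                throw new NumberFormatException("Sign in wrong position: " + text);
            }
            // Parsing with the sign attached lets Integer.MIN_VALUE through.
            return Integer.parseInt(negative ? "-" + digits : digits, 2);
        }
        // Decimal, hexadecimal, and octal keep their existing behavior.
        return Integer.decode(text);
    }

    // Reports whether decode rejects the text with a NumberFormatException.
    static boolean rejects(String text) {
        try {
            decode(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("0b101"));
        System.out.println(decode("-0B11"));
        System.out.println(decode("0x1F"));
        System.out.println(rejects("0b102"));
    }
}
```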

REFLECTIVE APIS:

No updates to the reflection APIs are needed.

OTHER CHANGES:

No other changes are needed.

MIGRATION:

Individual decimal, hexadecimal, or octal constants in existing code can be updated to binary as a programmer desires.

COMPATIBILITY

BREAKING CHANGES:

This feature would not break any existing programs, since the suggested syntax is currently considered to be a compile-time error.

EXISTING PROGRAMS:

Class file format does not change, so existing programs can use class files compiled with the new feature without problems.

REFERENCES:

The GCC/G++ compiler, which already supports this syntax (as of version 4.3) as an extension to standard C/C++. http://gcc.gnu.org/gcc-4.3/changes.html

The Ruby language, which supports binary literals: http://wordaligned.org/articles/binary-literals

The Python language added binary literals in version 2.6: http://docs.python.org/dev/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax

EXISTING BUGS:

"Language support for literal numbers in binary and other bases" http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025288

URL FOR PROTOTYPE (optional):

None.

More information about the coin-dev mailing list