PROPOSAL: Unsigned Integer Widening Operator
Bruce Chapman brucechapman at paradise.net.nz
Wed Mar 25 02:02:06 PDT 2009
- Previous message: PROPOSAL: Byte and Short Integer Literal Suffixes
- Next message: PROPOSAL: Unsigned Integer Widening Operator
Title Unsigned Integer Widening Operator
latest html version at http://docs.google.com/Doc?id=dcvp3mkv_2k39wt5gf&hl
AUTHOR(S): Bruce Chapman
OVERVIEW
FEATURE SUMMARY: Add an unsigned widening operator to convert bytes (in particular), shorts, and chars (for completeness) to int while avoiding sign extension.
MAJOR ADVANTAGE:
Byte manipulation code becomes littered with (b & 0xFF) expressions that reverse the sign extension which occurs whenever a byte field, variable, or array access appears on either side of an operator and is therefore subject to widening conversion with its implicit sign extension. This masking with 0xFF can detract from the clarity of the code by obscuring the actual algorithm. It is the Java language's rules, not the algorithm itself, that demand the masking operation, which can look redundant to the uninitiated.
It is highly intentional that the new operator (+) can be read as a cast to a positive.
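To make the problem concrete, here is a minimal sketch in current Java of how sign extension changes the outcome of an ordinary comparison. The class and method names are hypothetical and exist only for this illustration.

```java
// Illustration only: shows why (b & 0xFF) is needed today when comparing a byte.
final class SignExtensionDemo {
    /** Compares after implicit widening; the byte is sign-extended to int first. */
    static boolean equalsNaive(byte b, int expected) {
        return b == expected;
    }

    /** Compares after masking away the sign-extended high bits. */
    static boolean equalsMasked(byte b, int expected) {
        return (b & 0xFF) == expected;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;                           // stored as -16
        System.out.println(equalsNaive(b, 0xF0));       // false: -16 != 240
        System.out.println(equalsMasked(b, 0xF0));      // true:  240 == 240
    }
}
```

Under the proposal, the masked comparison would presumably be written as (+)b == expected.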
MAJOR DISADVANTAGE: A new operator.
ALTERNATIVES:
Explicit masking. If java.nio.ByteBuffer were extensible (it isn't), unsigned get and set methods could be added to hide the masking in an API. Extension methods could be employed to that end if they were implemented.
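Without subclassing or extension methods, the closest approximation to this alternative is a set of static helpers. The following is only a sketch; the class UnsignedBuffers and its method names are assumptions made for illustration and are not part of java.nio.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class: hides the masking behind static methods.
final class UnsignedBuffers {
    private UnsignedBuffers() {}

    /** Reads the byte at index as a value in 0..255. */
    static int getUnsigned(ByteBuffer buf, int index) {
        return buf.get(index) & 0xFF;
    }

    /** Reads the short at index (in the buffer's byte order) as a value in 0..65535. */
    static int getUnsignedShort(ByteBuffer buf, int index) {
        return buf.getShort(index) & 0xFFFF;
    }

    /** Stores the low 8 bits of value at index. */
    static void putUnsigned(ByteBuffer buf, int index, int value) {
        buf.put(index, (byte) value);
    }
}
```

Call sites become UnsignedBuffers.getUnsigned(buf, i) rather than buf.getUnsigned(i), and the helpers do nothing for plain byte[] accesses or byte fields.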
EXAMPLES
SIMPLE EXAMPLE:
byte[] buffer = ...; int idx=...; int length=...;
int value = 0;
for (int i = idx; i < idx + length; i++) {
    value = (value << 8) | (buffer[i] & 0xff);
}
can be recoded as
for (int i = idx; i < idx + length; i++) {
    value = (value << 8) | (+)buffer[i];
}
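For reference, the masked form of the loop can be run in current Java as below. This is a sketch with made-up buffer contents; the class name and the readUnmasked variant (which deliberately omits the mask to show the failure mode) are illustrative assumptions.

```java
// Illustration only: the simple example in today's Java, with and without the mask.
final class BigEndianReader {
    /** Reads length bytes (at most 4) starting at idx as one big-endian value. */
    static int read(byte[] buffer, int idx, int length) {
        int value = 0;
        for (int i = idx; i < idx + length; i++) {
            value = (value << 8) | (buffer[i] & 0xff);
        }
        return value;
    }

    /** Same loop without the mask: sign extension corrupts the high bits. */
    static int readUnmasked(byte[] buffer, int idx, int length) {
        int value = 0;
        for (int i = idx; i < idx + length; i++) {
            value = (value << 8) | buffer[i];
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] buffer = {0x00, (byte) 0x80, (byte) 0xFF};
        System.out.println(read(buffer, 1, 2));         // 33023 (0x80FF)
        System.out.println(readUnmasked(buffer, 1, 2)); // -1
    }
}
```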
ADVANCED EXAMPLE:
private int getBerValueLength(byte[] contents, int idx) {
    if ((contents[idx] & 0x80) == 0) return contents[idx]; // short form: the byte is the length
    int lenlen = (+)contents[idx] ^ 0x80; // clear the high bit, which is known to be 1 here
    int result = 0;
    for (int i = idx + 1; i < idx + 1 + lenlen; i++) {
        result = (result << 8) | (+)contents[i];
    }
    return result;
}
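Since the (+) operator does not exist in current Java, the same method can be tried out today by substituting the masking expressions it is meant to replace. The sketch below does that; the class name and sample encodings are made up for illustration.

```java
// Illustration only: the advanced example rewritten with explicit masks.
final class BerLength {
    /** Decodes a BER length field starting at idx (short form or long form). */
    static int getBerValueLength(byte[] contents, int idx) {
        if ((contents[idx] & 0x80) == 0) return contents[idx]; // short form
        int lenlen = (contents[idx] & 0xFF) ^ 0x80;            // number of length bytes
        int result = 0;
        for (int i = idx + 1; i < idx + 1 + lenlen; i++) {
            result = (result << 8) | (contents[i] & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getBerValueLength(new byte[] {0x05}, 0));                            // 5
        System.out.println(getBerValueLength(new byte[] {(byte) 0x81, (byte) 0xC8}, 0));        // 200
        System.out.println(getBerValueLength(new byte[] {(byte) 0x82, 0x01, (byte) 0xF4}, 0));  // 500
    }
}
```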
DETAILS
SPECIFICATION:
Amend 15.15:
The unary operators include +, -, ++, --, ~, !, unsigned integer widening operator and cast operators.
Add the following to the grammars in 15.15:
The following productions from §new section are repeated here for convenience:
UnsignedWideningExpression:
    UnsignedIntegerWideningOperator UnaryExpression
UnsignedIntegerWideningOperator:
    ( + )
Add a new section to 15 - between "15.15 Unary Expressions" and "15.16 Cast Expressions" would seem ideal in terms of context and precedence level.
The unsigned integer widening operator is a unary operator which may be applied to expressions of type byte, short and char. It is a compile-time error to apply this operator to other types.
UnsignedWideningExpression:
    UnsignedIntegerWideningOperator UnaryExpression
UnsignedIntegerWideningOperator:
    ( + )
The unsigned integer widening operator converts its operand to type int. Unary numeric promotion (§) is NOT performed on the operand. For a byte operand, the low-order 8 bits of the result have the same values as in the operand. For short and char operands, the low-order 16 bits of the result have the same values as in the operand. The remaining high-order bits are set to zero. This is effectively a zero-extending widening conversion, and is equivalent to the following expression for a byte operand x,
x & 0xFF
and equivalent to the following for a short or char operand y
y & 0xFFFF
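The equivalences above can be emulated in current Java with overloaded methods, one per permitted operand type. This is only a sketch of the specified semantics; the class and the method name zeroExtend are hypothetical stand-ins for the proposed operator.

```java
// Illustration only: stand-ins for (+) on each permitted operand type.
final class ZeroExtend {
    /** byte operand: low 8 bits preserved, high 24 bits zero. */
    static int zeroExtend(byte x) {
        return x & 0xFF;
    }

    /** short operand: low 16 bits preserved, high 16 bits zero. */
    static int zeroExtend(short y) {
        return y & 0xFFFF;
    }

    /** char operand: already unsigned, so the mask changes nothing. */
    static int zeroExtend(char y) {
        return y & 0xFFFF;
    }

    public static void main(String[] args) {
        System.out.println(zeroExtend((byte) -1));       // 255
        System.out.println(zeroExtend((short) -1));      // 65535
        System.out.println(zeroExtend((char) 0xFFFF));   // 65535
    }
}
```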
Other sections have lists of operators for which various things apply. Add to these as appropriate - yet to be determined.
Note that the specification above could also be amended to allow the operator to zero extend an int to a long; however, the utility of this is uncertain.
COMPILATION:
Compilation may be equivalent to the masking operation above. HotSpot could detect the redundant sign extension followed by a mask that discards the sign-extended bits, and remove both. If that were the case, the operator could be applied to every access of a byte field, variable or array to indicate treatment as an unsigned byte, at no cost.
For a char, the operator is equivalent to a widening conversion to int. The new operator is permitted on a char expression because there is no reason to disallow it. However it would be equally effective if it did not apply to char.
TESTING:
There are no gnarly use cases, so testing is straightforward. It could be as simple as compiling and executing a main class with a handful of asserts.
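A test along those lines might look like the sketch below. Because no compiler implements the operator, the hypothetical widen methods stand in for (+) using the masking equivalence from the specification; a real test would apply the operator directly. The check is exhaustive over byte and short values rather than a handful of samples, which is cheap at this size.

```java
// Illustration only: exhaustive check of the zero-extension semantics.
final class WideningCheck {
    static int widen(byte b)  { return b & 0xFF; }    // stand-in for (+)b
    static int widen(short s) { return s & 0xFFFF; }  // stand-in for (+)s

    /** Verifies every byte and short value; returns the number of cases checked. */
    static int verifyAll() {
        int checked = 0;
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            int w = widen(b);
            // Result must be in 0..255 and must narrow back to the original byte.
            if (w < 0 || w > 0xFF || (byte) w != b) throw new AssertionError("byte " + v);
            checked++;
        }
        for (int v = Short.MIN_VALUE; v <= Short.MAX_VALUE; v++) {
            short s = (short) v;
            int w = widen(s);
            // Result must be in 0..65535 and must narrow back to the original short.
            if (w < 0 || w > 0xFFFF || (short) w != s) throw new AssertionError("short " + v);
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println(verifyAll() + " cases passed");
    }
}
```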
LIBRARY SUPPORT:
No library support required.
REFLECTIVE APIS:
None
OTHER CHANGES:
none foreseen
MIGRATION:
COMPATIBILITY
BREAKING CHANGES:
None
EXISTING PROGRAMS:
Tools could detect the specific masking operations used to zero extend a previously sign extended byte or short, and replace that with the new operator.
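As a rough idea of what such a tool would do, the sketch below rewrites the simplest textual form of the byte mask. It is an assumption-laden illustration: the class name and regular expression are made up, it only recognises a parenthesised identifier or single array access masked with 0xFF, and it knows nothing about types. A real tool would work on the compiler's syntax tree and confirm that the operand is a byte before rewriting.

```java
import java.util.regex.Pattern;

// Illustration only: naive, purely textual rewrite of "(operand & 0xFF)" to "(+)operand".
final class MaskRewriter {
    // Matches "(name & 0xFF)" or "(name[index] & 0xFF)", case-insensitive on the hex digits.
    private static final Pattern BYTE_MASK = Pattern.compile(
            "\\(\\s*([A-Za-z_]\\w*(?:\\[[^\\]\\[]*\\])?)\\s*&\\s*0[xX][fF]{2}\\s*\\)");

    /** Replaces each recognised byte mask in one line of source text. */
    static String rewrite(String sourceLine) {
        return BYTE_MASK.matcher(sourceLine).replaceAll("(+)$1");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("value = (value << 8) | (buffer[i] & 0xff);"));
        // value = (value << 8) | (+)buffer[i];
    }
}
```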
REFERENCES
EXISTING BUGS:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4186775
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4879804
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4504839
URL FOR PROTOTYPE (optional): None