Review/comment needed for the new public java.util.Base64 class (original) (raw)

Xueming Shen xueming.shen at oracle.com
Fri Nov 30 22:13:40 UTC 2012


Ulf,

The base64 implementation is in TL right now. It does address some of the issue you suggested in this email. And I remember I did try use "byte[]" alphabets, which I don't recall bring us any benefit, but I'm not sure in which setup. The latest is at

http://cr.openjdk.java.net/~sherman/8004212/webrev/

in which I'm trying to fix a corner case of incorrect return value from decode(buf, buf).

The Base64 now is in TL does not necessary mean "the issue is closed". You can continue suggest/comment on the latest version for any possible performance improvement, bug fix and even API change, if necessary.

-Sherman

On 11/30/2012 02:01 PM, Ulf Zibis wrote:

Hi Sherman,

is this ssue still open ? -Ulf

Am 03.11.2012 21:33, schrieb Ulf Zibis: Am 30.10.2012 23:40, schrieb Xueming Shen:

I'm "confused" about the order of xxcode() and Xxcoder. In other places e.g. charsets, we have de... before en..., which is also true alphabetical

should not be an issue. javadoc output should be in alphabetic order. If it is really concerns you, I can do a copy/paste:-) Yes, it doesn't matter much, but would be a nice conform style, so for me "personally" I would like the copy/paste:-) - What (should) happen(s), if lineSeparator.length == 0 ? do nothing. expected? I would say, empty line separator should not be allowed, so you might check: Objects.requireNonEmpty(lineSeparator); 603 static { 604 Arrays.fill(fromBase64, -1); 605 for (int i = 0; i < Encoder.toBase64.length; i++)_ _606 fromBase64[Encoder.toBase64[i]] = i;_ _607 fromBase64['='] = -2;_ _608 }_ _This causes the inner Encoder class to be loaded, even if not needed. So maybe move the toBase64 table to the outer class._ _Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)?_ _understood._ _It seems you have mixed 2 tweaks to one. ;-) See additional paragraph at the end ..._ _but it appears the hotspot likes it the constant/fixed length lookup_ _table is inside encoder._ _Yes, but see my suggestion 12 lines below._ _Same as you suggestion in your previous email to use_ _String in source and expand it during runtime. It saves about 500 bytes but slows_ _the server vm._ _Are you really sure? As it only runs once at class init, JIT should not compile this code._ _Executing the bytecode to init the final int[], value for value, by interpreter should not be faster as expanding a String source in a loop._ _Those repeating lines of "isURL? ...." is also supposed to be a_ _performance tweak to eliminate the array boundary check in loop._ _Yep, so I was thinking about:_ _653 private final int[] base64;_ _654 private final boolean isMIME;_ _655_ _656 private Decoder(boolean isURL, boolean isMIME) {_ _657 base64 = isURL ? fromBase64URL : fromBase64;_ _658 this.isMIME = isMIME;_ _659 }_ _....._ _911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst) {_ _912 int[] base64 = this.base64; // local copy for performance reasons (but maybe superfluous if HotSpot is intelligent enough)_ _or:_ _911 private int decodeBuffer(ByteBuffer src, ByteBuffer dst, int[] base64) {_ _....._ _Why you mix the algorithms to check the padding? :_ _824 if (b == -2) { // padding byte_ _890 if (b == '=') {_ _It is supposed reduce one "if" check for normal base64 character inside the_ _loop. I'm not that sure it really helps though._ _925 if (b == '=') {_ _--> causes one additional "if" in the main path of the loop, so should be slower for regular input 859 if (b == -2) { // padding byte --> causes one additional "if" in the side path of the loop, so should only affect irregular input Maybe the following code is little faster as no loading of the constant '-2' is required: 858 if ((b = base64[b]) < 0) { 859 if (++b < 0) { // padding byte '=' Once again (the following was meant for the decode algorithm, not initialisation): Have you compared performance with fromBase64 as byte[] (including local 'b' in decode() as byte)? Retrieving the bytes by b = base64[x] then could be done without address-shift and smaller/faster LoadB operations could be used by JIT. In an int[] table, the index offset must be shifted by 2 before. Maybe this doesn't directly affect the performance on x86/IA64 CPU, but wastes space in CPU cache for other tasks as a side effect. On ARM architectures I imagine, directly addressing a byte-stepped table could be faster than addressing a 4-byte-stepped table. At least the footprint of the table would be smaller. Little nit: You could delete line 641 for conformity with 629. -Ulf



More information about the core-libs-dev mailing list