[OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

Jim Graham james.graham at oracle.com
Fri May 10 23:13:11 UTC 2013


Hi Laurent,

On 5/9/13 11:50 PM, Laurent Bourgès wrote:

Jim,

I think that the ScanLineIterator class is no longer useful and could be merged into Renderer directly. I am trying to optimize these 2 code paths (crossing / crossing -> alpha), but it seems quite difficult, as I must understand HotSpot optimizations (assembler code)...

I was originally skeptical about breaking that part of the process into an iterator class as well.

For now I want to keep Pisces in Java code, as HotSpot is efficient enough and the algorithm can probably be reworked a bit. A few questions:

- should edges be sorted by Ymax ONCE to avoid a complete edge traversal to count crossings for each Y value:

    if ((bucketcount & 0x1) != 0) {
        int newCount = 0;
        for (int i = 0, ecur; i < count; i++) {
            ecur = ptrs[i];
            if (edgesInt[ecur + YMAX] > cury) {
                ptrs[newCount++] = ecur;
            }
        }
        count = newCount;
    }

This does not traverse all edges, just the edges currently "in play" and it only does it for scanlines that had a recorded ymax on them (count is multiplied by 2 and then optionally the last bit is set if some edge ends on that scanline so we know whether or not to do the "remove expired edges" processing).
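
The bucket-count encoding described above can be sketched as follows. This is only an illustration under assumptions: the class name, the helper names (addEdgeStart, markEdgeEnd, pruneExpired) and the one-int-per-edge record layout are hypothetical and are not taken from the Pisces sources; only the loop body mirrors the quoted snippet.

```java
class BucketCountSketch {
    // Hypothetical layout: each edge record is a single int holding its ymax.
    static final int YMAX = 0;

    // One more edge starts on this scanline: the count lives in the upper bits,
    // so adding an edge adds 2.
    static int addEdgeStart(int bucketcount) {
        return bucketcount + 2;
    }

    // Some edge ends on this scanline: set the low bit as a flag.
    static int markEdgeEnd(int bucketcount) {
        return bucketcount | 1;
    }

    // Compact ptrs[0..count) in place, keeping only edges still active below cury.
    // Only the edges currently "in play" are visited, not the whole edge list.
    static int pruneExpired(int[] ptrs, int count, int[] edgesInt, int cury) {
        int newCount = 0;
        for (int i = 0; i < count; i++) {
            int ecur = ptrs[i];
            if (edgesInt[ecur + YMAX] > cury) {
                ptrs[newCount++] = ecur;
            }
        }
        return newCount;
    }

    public static void main(String[] args) {
        int bucketcount = 0;
        bucketcount = addEdgeStart(bucketcount);
        bucketcount = addEdgeStart(bucketcount);
        bucketcount = markEdgeEnd(bucketcount);
        System.out.println("edges starting here: " + (bucketcount >> 1));
        System.out.println("needs pruning: " + ((bucketcount & 0x1) != 0));

        int[] edgesInt = {10, 3, 7};   // ymax of three hypothetical edges
        int[] ptrs = {0, 1, 2};        // all three are currently active
        int count = 3;
        if ((bucketcount & 0x1) != 0) {
            count = pruneExpired(ptrs, count, edgesInt, 5);
        }
        System.out.println("active edges after pruning: " + count);
    }
}
```

The point of the flag is that scanlines on which no edge ends skip the compaction loop entirely.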

- why multiply the crossings by 2 and then divide them by 2 (+ rounding issues)?

    for (int i = 0, ecur, j; i < count; i++) {
        ecur = ptrs[i];
        curx = edges[ecur /* + CURX */];
        edges[ecur /* + CURX */] = curx + edges[ecur + SLOPE];

        cross = ((int) curx) << 1;
        if (edgesInt[ecur + OR] != 0 /* > 0 */) {
            cross |= 1;

The line above sets the bottom bit if the crossing is one orientation vs. the other so we know whether to add one or subtract one to the winding count. The crossings can then be sorted and the orientation flag is carried along with the values as they are sorted. The cost of this trick is having to shift the actual crossing coordinates by 1 to vacate the LSBit.
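
As a rough illustration of the trick described above, the sketch below packs an orientation flag into the low bit of each crossing, sorts the packed values, and then recovers both the coordinate and the winding step. The class and method names are hypothetical, and which orientation maps to +1 versus -1 is an assumption made for the example. Because the shift is applied to an already-integer coordinate, shifting left by 1 and later right by 1 returns the same integer for any value that fits in 31 bits.

```java
import java.util.Arrays;

class CrossingPackSketch {
    // Shift the coordinate left by 1 to vacate the least significant bit,
    // then store the orientation flag there.
    static int pack(int x, boolean orientationBit) {
        return (x << 1) | (orientationBit ? 1 : 0);
    }

    // Recover the crossing coordinate (arithmetic shift keeps the sign).
    static int x(int cross) {
        return cross >> 1;
    }

    // Map the flag to a winding step: +1 if the bit is set, -1 otherwise
    // (the assignment of sign to orientation is an assumption here).
    static int windingDelta(int cross) {
        return ((cross & 1) << 1) - 1;
    }

    public static void main(String[] args) {
        int[] crossings = {
            pack(12, false), pack(4, true), pack(9, true), pack(7, false)
        };
        // Sorting the packed ints orders them by x; the flag rides along.
        Arrays.sort(crossings);

        int winding = 0;
        for (int c : crossings) {
            winding += windingDelta(c);
            System.out.println("x=" + x(c) + " winding=" + winding);
        }
    }
}
```

A single int array can therefore be sorted with no parallel array of orientations to keep in step.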

- last x pixel processing: could you explain it to me?

    int pixxmax = x1 >> SUBPIXEL_LG_POSITIONS_X;
    int tmp = (x0 & SUBPIXEL_MASK_X);
    alpha[pixx] += SUBPIXEL_POSITIONS_X - tmp;
    alpha[pixx + 1] += tmp;
    tmp = (x1 & SUBPIXEL_MASK_X);
    alpha[pixxmax] -= SUBPIXEL_POSITIONS_X - tmp;
    alpha[pixxmax + 1] -= tmp;

Are you referring to the 2 += and the 2 -= for each end of the span? If an edge crosses in a given pixel at 5 subpixel positions after the start of that pixel, then it contributes a coverage of "SUBPIXEL_POSITIONS_X minus 5" in that pixel. But, starting with the following pixel, the coverage it adds for each pixel until it reaches the right edge of the span is "SUBPIXEL_POSITIONS_X" (S_P_X for short). However, we are recording deltas and the previous pixel only bumped our total coverage by "S_P_X - 5". So, we now need to bump the accumulated coverage by 5 in the following pixel so that the total added coverage is "S_P_X".

Basically the pair of += lines adds a total of S_P_X to the coverage, but it splits that sum over two pixels - the one where the left edge first appeared and its following pixel. Similarly, the two -= statements subtract a total of S_P_X from the coverage total, and do so spread across 2 pixels. If the crossing happened right at the left edge of the pixel then tmp would be 0 and the second += or -= would be wasted, but that only happens 1 out of S_P_X times and the cost of testing is probably greater than the cost of just adding tmp to the second pixel even if it is 0.

Also note that we need to have an array entry for alpha[max_x + 1] so that the second += and -= don't store off the end of the array. We don't need to use that value since we will stop our alpha accumulations at the entry for max_x, but testing to see if the "second pixel delta value" is needed is more expensive than just accumulating it into an unused array entry.
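
A small worked example of the delta scheme described above, written as a sketch. The four += / -= lines follow the quoted snippet; everything else is an assumption made for illustration: the class name, the addSpan and accumulate helpers, the choice of 8 subpixel positions per pixel, and the requirement that x0 <= x1 with x1 inside the row.

```java
import java.util.Arrays;

class CoverageDeltaSketch {
    static final int SUBPIXEL_LG_POSITIONS_X = 3;   // assumed: 8 subpixels per pixel
    static final int SUBPIXEL_POSITIONS_X = 1 << SUBPIXEL_LG_POSITIONS_X;
    static final int SUBPIXEL_MASK_X = SUBPIXEL_POSITIONS_X - 1;

    // Record the span [x0, x1) (subpixel units) as coverage deltas.
    // alpha needs one entry past the last pixel so the "+ 1" stores stay in bounds.
    static void addSpan(int[] alpha, int x0, int x1) {
        int pixx = x0 >> SUBPIXEL_LG_POSITIONS_X;
        int pixxmax = x1 >> SUBPIXEL_LG_POSITIONS_X;
        int tmp = (x0 & SUBPIXEL_MASK_X);
        alpha[pixx] += SUBPIXEL_POSITIONS_X - tmp;     // partial coverage in the first pixel
        alpha[pixx + 1] += tmp;                        // tops the running total up to S_P_X
        tmp = (x1 & SUBPIXEL_MASK_X);
        alpha[pixxmax] -= SUBPIXEL_POSITIONS_X - tmp;  // leaves tmp coverage in the last pixel
        alpha[pixxmax + 1] -= tmp;                     // returns the running total to 0
    }

    // Turn deltas into per-pixel coverage with a running sum over the real pixels only.
    static int[] accumulate(int[] alpha, int width) {
        int[] coverage = new int[width];
        int sum = 0;
        for (int i = 0; i < width; i++) {
            sum += alpha[i];
            coverage[i] = sum;
        }
        return coverage;
    }

    public static void main(String[] args) {
        int width = 4;
        int[] alpha = new int[width + 1];  // extra slot for the unused trailing delta
        addSpan(alpha, 5, 21);             // starts 5 subpixels into pixel 0, ends 5 into pixel 2
        System.out.println(Arrays.toString(alpha));
        System.out.println(Arrays.toString(accumulate(alpha, width)));
    }
}
```

With these inputs the deltas are 3, 5, -3, -5 and the running sums are 3, 8, 5, 0: three subpixels covered in pixel 0, all eight in pixel 1, five in pixel 2, and none afterwards.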

Finally, it seems that HotSpot settings (-XX:CompileThreshold=1000 and -XX:+AggressiveOpts) are able to compile these hot spots better ...

What happens if we use the default settings, as most non-server apps would?

Thanks; the edgeBucket / edgeBucketCount arrays could probably be merged into a single one to improve cache locality.

Interesting.
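
One way the merged layout suggested above might look is sketched below. It is purely hypothetical: the class, its methods, and the choice to interleave the two values per scanline are assumptions, and whether it actually helps cache behaviour would have to be measured.

```java
class MergedBucketsSketch {
    // Interleaved layout (an assumption for this sketch):
    //   buckets[2 * y]     = pointer to the most recently added edge starting on scanline y
    //   buckets[2 * y + 1] = encoded count (edge count << 1, low bit = "an edge ends here")
    private final int[] buckets;

    MergedBucketsSketch(int scanlines) {
        buckets = new int[2 * scanlines];
    }

    void addEdge(int y, int edgePtr) {
        buckets[2 * y] = edgePtr;
        buckets[2 * y + 1] += 2;
    }

    void markEdgeEnd(int y) {
        buckets[2 * y + 1] |= 1;
    }

    int head(int y) {
        return buckets[2 * y];
    }

    int encodedCount(int y) {
        return buckets[2 * y + 1];
    }

    public static void main(String[] args) {
        MergedBucketsSketch b = new MergedBucketsSketch(4);
        b.addEdge(1, 40);
        b.addEdge(1, 48);
        b.markEdgeEnd(1);
        System.out.println("head=" + b.head(1)
                + " count=" + (b.encodedCount(1) >> 1)
                + " endsHere=" + ((b.encodedCount(1) & 1) != 0));
    }
}
```

Both values for a scanline then sit in adjacent array slots, so reading one is likely to bring the other into cache with it.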

FYI, I can write C/C++ code but I have never written JNI code. Could somebody help us port only these 2 hotspot methods?

Port 2 Hotspot methods? I'm not sure what you are referring to here?

PS: I am attending a conference next week (Germany), so I will be less available to work on code, but I will read my emails.

Thanks for the heads up - as long as you don't time out and switch off of this project permanently - that would be a shame... :(

        ...jim

