Hi,
I have done more experiments to see the impact of CodeCacheMinBlockLength and CodeCacheSegmentSize. Both factors have an impact on the length of the freelist as well as on the memory that is possibly wasted.

The tables below contain detailed results. Here is a description of the numbers and how they are calculated:
* freelist length: number of HeapBlocks that are in the freelist when the program finishes.
* freelist[kB]: total memory [kB] that is in the freelist when the program finishes.
* unused bytes in cb: unused bytes in all CodeBlobs that are in the code cache when the program finishes. This number is calculated by subtracting the nmethod size from the size of the HeapBlock in which the nmethod is stored. Note that the HeapBlock size is a multiple of CodeCacheSegmentSize and at least CodeCacheMinBlockLength * CodeCacheSegmentSize.
* segmap[kB]: size of the segment map that is used to map addresses to HeapBlocks (i.e., find the beginning of an nmethod). Increasing CodeCacheSegmentSize decreases the segmap size. For example, a CodeCacheSegmentSize of 32 bytes requires 32kB of segmap memory per allocated MB in the code cache; a CodeCacheSegmentSize of 64 bytes requires 16kB of segmap memory per allocated MB in the code cache.
* max_used: maximum allocated memory in the code cache.
* wasted: freelist[kB] + unused bytes in cb + segmap[kB]
* memory overhead: wasted / max_used
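For illustration, here is how these numbers combine for the first row of the failing-test-case table below (the one-byte-per-segment segmap size is inferred from the figures above):

  segmap per allocated MB (64-byte segments): 1 MB / 64 bytes = 16384 segments = 16 kB
  wasted          = 2299 kB + 902 kB + 274 kB = 3475 kB
  memory overhead = 3475 kB / 16436 kB = 21.14%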
The executive summary of the results is that increasing CodeCacheSegmentSize has no negative impact on the memory overhead (and also no positive one). Increasing CodeCacheSegmentSize reduces the freelist length, which makes searching the freelist faster.

Note that the results were obtained with a modified freelist search algorithm. In the changed version, the compiler chooses the first block from the freelist that is large enough (first-fit). In the old version, the compiler looked for the smallest block in the freelist into which the code fits (best-fit). My experiments indicate that best-fit does not provide better results (less memory overhead) than first-fit.
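For reference, here is a minimal sketch of the two strategies over a singly-linked list of free blocks; the type and function names (FreeBlock, first_fit, best_fit) are illustrative and this is not the actual CodeHeap::search_freelist() code:

  #include <cstddef>

  struct FreeBlock {
    size_t     length;  // block length in segments
    FreeBlock* next;
  };

  // First-fit (new behaviour): take the first block that is large enough
  // and stop searching immediately.
  static FreeBlock* first_fit(FreeBlock* head, size_t segments_needed) {
    for (FreeBlock* b = head; b != NULL; b = b->next) {
      if (b->length >= segments_needed) {
        return b;
      }
    }
    return NULL;  // nothing fits; allocate from the unused part of the heap
  }

  // Best-fit (old behaviour): traverse the whole list and take the smallest
  // block into which the code fits.
  static FreeBlock* best_fit(FreeBlock* head, size_t segments_needed) {
    FreeBlock* best = NULL;
    for (FreeBlock* b = head; b != NULL; b = b->next) {
      if (b->length >= segments_needed &&
          (best == NULL || b->length < best->length)) {
        best = b;
      }
    }
    return best;
  }

Both variants walk the list linearly, which is why a shorter freelist directly speeds up allocation.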
To summarize, switching to a larger CodeCacheSegmentSize seems reasonable.
Here are the detailed results:
failing test case

4 Blocks, 64 bytes (CodeCacheMinBlockLength = 4, CodeCacheSegmentSize = 64 bytes)
freelist length | freelist[kB] | unused bytes in cb | segmap[kB] | max_used | wasted | memory overhead
           3085 |         2299 |                902 |        274 |    16436 |   3475 |          21.14%
           3993 |         3366 |                887 |        283 |    16959 |   4536 |          26.75%
           3843 |         2204 |                900 |        273 |    16377 |   3377 |          20.62%
           3859 |         2260 |                898 |        273 |    16382 |   3431 |          20.94%
           3860 |         2250 |                897 |        273 |    16385 |   3420 |          20.87%
Average memory overhead: 22.07%
4 Blocks, 128 bytes
freelist length | freelist[kB] | unused bytes in cb | segmap[kB] | max_used | wasted | memory overhead
            474 |         1020 |               2073 |        137 |    17451 |   3230 |          18.51%
            504 |         1192 |               2064 |        136 |    17413 |   3392 |          19.48%
            484 |         1188 |               2064 |        126 |    17414 |   3378 |          19.40%
            438 |         1029 |               2061 |        136 |    17399 |   3226 |          18.54%
Average memory overhead: 18.98%
Nashorn

4 Blocks, 64 bytes
freelist length | freelist[kB] | unused bytes in cb | segmap[kB] | max_used | wasted | memory overhead
            709 |         1190 |                662 |       1198 |    76118 |   3050 |           4.01%
            688 |         4200 |                635 |       1234 |    78448 |   6069 |           7.74%
            707 |         2617 |                648 |       1178 |    74343 |   4443 |           5.98%
            685 |         1703 |                660 |       1205 |    76903 |   3568 |           4.64%
            760 |         1638 |                675 |       1174 |    74563 |   3487 |           4.68%
Average memory overhead: 5.41%
4 Blocks, 128 bytes
freelist length | freelist[kB] | unused bytes in cb | segmap[kB] | max_used | wasted | memory overhead
            206 |          824 |               1253 |        607 |    77469 |   2684 |           3.46%
            247 |         2019 |               1265 |        583 |    74017 |   3867 |           5.22%
            239 |          958 |               1230 |        641 |    81588 |   2829 |           3.47%
            226 |         1477 |               1246 |        595 |    76119 |   3318 |           4.36%
            225 |         2390 |               1239 |        596 |    76051 |   4225 |           5.56%
Average memory overhead: 4.41%
compiler.compiler

4 Blocks, 64 bytes
freelist length | freelist[kB] | unused bytes in cb | segmap[kB] | max_used | wasted | memory overhead
            440 |          943 |                263 |        298 |    18133 |   1504 |           8.29%
            458 |          480 |                272 |        295 |    18443 |   1047 |           5.68%
            536 |         1278 |                260 |        306 |    18776 |   1844 |           9.82%
            426 |          684 |                268 |        304 |    18789 |   1256 |           6.68%
            503 |         1430 |                258 |        310 |    18872 |   1998 |          10.59%
Average memory overhead: 8.21%
4 Blocks, 128 bytes
freelist length | freelist[kB] | unused bytes in cb | segmap[kB] | max_used | wasted | memory overhead
            163 |          984 |                510 |        157 |    19233 |   1651 |           8.58%
            132 |          729 |                492 |        151 |    18614 |   1372 |           7.37%
            187 |         1212 |                498 |        152 |    18630 |   1862 |           9.99%
            198 |         1268 |                496 |        155 |    18974 |   1919 |          10.11%
            225 |         1268 |                496 |        152 |    18679 |   1916 |          10.26%
Average memory overhead: 9.26%
On 02/05/2014 07:57 PM, Vladimir Kozlov wrote:
On 2/5/14 8:28 AM, Albert wrote:
Hi Vladimir,
thanks for looking at this. I've done the proposed measurements. The code which I used to get the data is included in the following webrev:
http://cr.openjdk.java.net/~anoll/8029799/webrev.01/
Good.
I think some people might be interested in getting that data, so we might want to keep that additional output. The exact output format can be changed later (JDK-8005885).
I agree that it is useful information.
Here are the results:

- failing test case:
  - original: allocated in freelist: 2168kB, unused bytes in CodeBlob: 818kB, max_used: 21983kB
  - patch:    allocated in freelist: 1123kB, unused bytes in CodeBlob: 2188kB, max_used: 17572kB
- nashorn:
  - original: allocated in freelist: 2426kB, unused bytes in CodeBlob: 1769kB, max_used: 201886kB
  - patch:    allocated in freelist: 1150kB, unused bytes in CodeBlob: 3458kB, max_used: 202394kB
- SPECJVM2008: compiler.compiler:
  - original: allocated in freelist: 168kB, unused bytes in CodeBlob: 342kB, max_used: 19837kB
  - patch:    allocated in freelist: 873kB, unused bytes in CodeBlob: 671kB, max_used: 21184kB
The minimum size that can be allocated from the code cache is platform-dependent. I.e., the minimum size depends on CodeCacheSegmentSize and CodeCacheMinBlockLength. On x86, for example, the minimum allocatable size from the code cache is 64*4 = 256 bytes.
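A small sketch of the rounding this implies (illustrative only, not the actual CodeHeap code; the function name is made up):

  #include <cstddef>

  // Round a request of 'size' bytes up to whole segments and enforce the
  // minimum block length, as described above.
  static size_t min_block_segments(size_t size, size_t segment_size,
                                   size_t min_block_length) {
    size_t segments = (size + segment_size - 1) / segment_size;  // round up
    return (segments < min_block_length) ? min_block_length : segments;
  }

  // With CodeCacheSegmentSize = 64 and CodeCacheMinBlockLength = 4, a
  // 100-byte request still occupies 4 * 64 = 256 bytes; with a 128-byte
  // segment size it occupies 4 * 128 = 512 bytes.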
There is this comment in CodeHeap::search_freelist():

// Don't leave anything on the freelist smaller than CodeCacheMinBlockLength.
What happens if we scale down CodeCacheMinBlockLength when we increase CodeCacheSegmentSize, so that the byte size of the minimum block stays the same?

+ FLAG_SET_DEFAULT(CodeCacheSegmentSize, CodeCacheSegmentSize * 2);
+ FLAG_SET_DEFAULT(CodeCacheMinBlockLength, CodeCacheMinBlockLength / 2);

Based on your table below, those small nmethods will use only 256-byte blocks instead of 512 (128*4).
Note that for C1 in the Client VM CodeCacheMinBlockLength is 1. I don't know why it is 4 for C2. Could you also try CodeCacheMinBlockLength = 1?
All above is with CodeCacheSegmentSize 128 bytes.
The size of adapters ranges from 400b to 600b.
Here is the beginning of the nmethod size distribution of the failing test case:
Is it possible that it is in number of segments and not in bytes? If it really is bytes, what do such (32-48 byte) nmethods look like?
Thanks,
Vladimir
nmethod size distribution (non-zombie java)
-------------------------------------------------
    0-16  bytes     0
   16-32  bytes     0
   32-48  bytes    45
   48-64  bytes     0
   64-80  bytes    41
   80-96  bytes     0
  96-112  bytes  6247
 112-128  bytes     0
 128-144  bytes   249
 144-160  bytes     0
 160-176  bytes   139
 176-192  bytes     0
 192-208  bytes   177
 208-224  bytes     0
 224-240  bytes   180
 240-256  bytes     0
 ...
I do not see a problem with increasing CodeCacheSegmentSize if tiered compilation is enabled.
What do you think?
Best,
Albert
On 02/04/2014 05:52 PM, Vladimir Kozlov wrote:
I think the suggestion is reasonable since we increase the CodeCache 5x for Tiered.

Albert, is it possible to collect data on how much space is wasted in % before and after this change: free space in which we can't allocate + unused bytes at the end of nmethods/adapters? Can we squeeze an adapter into 64 bytes?
Thanks,
Vladimir
On 2/4/14 7:41 AM, Albert wrote:
Hi,
could I get reviews for this patch (nightly failure)?
webrev: http://cr.openjdk.java.net/~anoll/8029799/webrev.00/
bug: https://bugs.openjdk.java.net/browse/JDK-8029799
problem: The freelist of the code cache exceeds 10'000 items, which results in a VM warning. The problem behind the warning is that the freelist is populated by a large number of small free blocks. For example, in the failing test case (see header), the freelist grows to more than 3500 items, where the largest item on the list is 9 segments (one segment is 64 bytes). That experiment was done on my laptop. Such a large freelist can indeed be a performance problem, since we use a linear search to traverse the freelist.
solution: One way to solve the problem is to increase the minimal allocation size in the code cache. This can be done by two means: we can increase CodeCacheMinBlockLength and/or CodeCacheSegmentSize. This patch follows the latter approach, since increasing CodeCacheSegmentSize decreases the size that is required by the segment map. More concretely, the patch doubles CodeCacheSegmentSize from 64 bytes to 128 bytes if tiered compilation is enabled. The patch also contains an optimization in the freelist search (stop searching once a block of the appropriate size is found) and some code cleanups.
testing: With the proposed change, the size of the freelist is reduced to 200 items. There is only a slight increase in the memory required by the code cache, by at most 3% (all data measured for the failing test case on a Linux 64-bit system, 4 cores).

To summarize, increasing the minimum allocation size in the code cache results in potentially more unused memory in the code cache due to unused bytes at the end of an nmethod. The advantage is that we potentially have less fragmentation.
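As a worked example (the 300-byte size is purely illustrative): a 300-byte nmethod occupies ceil(300/64) = 5 segments = 320 bytes with 64-byte segments (20 bytes unused), but 4 * 128 = 512 bytes with 128-byte segments and CodeCacheMinBlockLength = 4 (212 bytes unused). In exchange, free blocks are larger, so the freelist stays shorter and fragmentation is reduced.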
proposal: I think we could remove CodeCacheMinBlockLength without loss of generality or usability and instead adapt the parameter CodeCacheSegmentSize at VM startup. Any opinions?
Many thanks in advance,
Albert