">

(original) (raw)

My previous mail contains an error. The size of a HeapBlock must be a multiple of CodeCacheSegmentSize and at least CodeCacheSegmentSize * CodeCacheMinBlockLength.

Albert

Sent from my iPhone

On 06.02.2014, at 17:32, Albert <albert.noll@oracle.com> wrote:

Hi,



I have done more experiments to see the impact of CodeCacheMinBlockLength and CodeCacheSegmentSize. Both factors have an impact on the length of the freelist as well as on the memory that is possibly wasted.

The table below contains detailed results. Here is a description of the numbers and how they are calculated:
* freelist length: number of HeapBlocks that are in the freelist when the program finishes.

* freelist[kB]: total memory [kB] that is in the freelist when the program finishes.

* unused bytes in cb: unused bytes in all CodeBlobs that are in the code cache when the program finishes. This number is calculated by subtracting the nmethod size from the size of the HeapBlock in which the nmethod is stored. Note that the HeapBlock size is a multiple of CodeCacheMinBlockLength * CodeCacheSegmentSize.

* segmap[kB]: size of the segment map that is used to map addresses to HeapBlocks (i.e., to find the beginning of an nmethod). Increasing CodeCacheSegmentSize decreases the segmap size. For example, a CodeCacheSegmentSize of 32 bytes requires 32kB of segmap memory per allocated MB in the code cache; a CodeCacheSegmentSize of 64 bytes requires 16kB of segmap memory per allocated MB in the code cache.

* max_used: maximum allocated memory in the code cache.

* wasted_memory = freelist + unused bytes in cb + segmap

* memory overhead = wasted_memory / max_used



The executive summary of the results is that increasing CodeCacheSegmentSize has no negative impact on the memory overhead (and no positive impact either). Increasing CodeCacheSegmentSize reduces the freelist length, which makes searching the freelist faster.

Note that the results were obtained with a modified freelist search algorithm. In the changed version, the compiler chooses the first block from the freelist that is large enough (first-fit). In the old version, the compiler looked for the smallest block in the freelist into which the code fits (best-fit). My experiments indicate that best-fit does not provide better results (less memory overhead) than first-fit.



To summarize, switching to a larger CodeCacheSegmentSize seems reasonable.





Here are the detailed results:
failing test case

4 Blocks, 64 bytes

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
3085              2299           902                  274          16436      3475     21.14%
3993              3366           887                  283          16959      4536     26.75%
3843              2204           900                  273          16377      3377     20.62%
3859              2260           898                  273          16382      3431     20.94%
3860              2250           897                  273          16385      3420     20.87%

average overhead: 22.07%
4 Blocks, 128 bytes

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
474               1020           2073                 137          17451      3230     18.51%
504               1192           2064                 136          17413      3392     19.48%
484               1188           2064                 126          17414      3378     19.40%
438               1029           2061                 136          17399      3226     18.54%

average overhead: 18.98%
Nashorn

4 Blocks, 64 bytes

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
709               1190           662                  1198         76118      3050     4.01%
688               4200           635                  1234         78448      6069     7.74%
707               2617           648                  1178         74343      4443     5.98%
685               1703           660                  1205         76903      3568     4.64%
760               1638           675                  1174         74563      3487     4.68%

average overhead: 5.41%
4 Blocks, 128 bytes

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
206               824            1253                 607          77469      2684     3.46%
247               2019           1265                 583          74017      3867     5.22%
239               958            1230                 641          81588      2829     3.47%
226               1477           1246                 595          76119      3318     4.36%
225               2390           1239                 596          76051      4225     5.56%

average overhead: 4.41%
compiler.compiler

4 Blocks, 64 bytes

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
440               943            263                  298          18133      1504     8.29%
458               480            272                  295          18443      1047     5.68%
536               1278           260                  306          18776      1844     9.82%
426               684            268                  304          18789      1256     6.68%
503               1430           258                  310          18872      1998     10.59%

average overhead: 8.21%
4 Blocks, 128 bytes

freelist length   freelist[kB]   unused bytes in cb   segmap[kB]   max_used   wasted   memory overhead
163               984            510                  157          19233      1651     8.58%
132               729            492                  151          18614      1372     7.37%
187               1212           498                  152          18630      1862     9.99%
198               1268           496                  155          18974      1919     10.11%
225               1268           496                  152          18679      1916     10.26%

average overhead: 9.26%
On 02/05/2014 07:57 PM, Vladimir Kozlov wrote:

On 2/5/14 8:28 AM, Albert wrote:
Hi Vladimir,

thanks for looking at this. I've done the proposed measurements. The code which I used to get the data is included in the following webrev:

http://cr.openjdk.java.net/~anoll/8029799/webrev.01/

Good.
I think some people might be interested in getting that data, so we might want to keep that additional output. The exact output format can be changed later (JDK-8005885).

I agree that it is useful information.
Here are the results:

- failing test case:
    - original: allocated in freelist: 2168kB, unused bytes in CodeBlob: 818kB, max_used: 21983kB
    - patch: allocated in freelist: 1123kB, unused bytes in CodeBlob: 2188kB, max_used: 17572kB
- nashorn:
    - original: allocated in freelist: 2426kB, unused bytes in CodeBlob: 1769kB, max_used: 201886kB
    - patch: allocated in freelist: 1150kB, unused bytes in CodeBlob: 3458kB, max_used: 202394kB
- SPECJVM2008: compiler.compiler:
    - original: allocated in freelist: 168kB, unused bytes in CodeBlob: 342kB, max_used: 19837kB
    - patch: allocated in freelist: 873kB, unused bytes in CodeBlob: 671kB, max_used: 21184kB
The minimum size that can be allocated from the code cache is platform-dependent, i.e., it depends on CodeCacheSegmentSize and CodeCacheMinBlockLength. On x86, for example, the minimum allocatable size from the code cache is 64*4 = 256 bytes.





There is this comment in CodeHeap::search_freelist():

  // Don't leave anything on the freelist smaller than CodeCacheMinBlockLength.

What happens if we scale down CodeCacheMinBlockLength when we increase CodeCacheSegmentSize, to keep the same byte size of the minimum block?

+     FLAG_SET_DEFAULT(CodeCacheSegmentSize, CodeCacheSegmentSize * 2);
+     FLAG_SET_DEFAULT(CodeCacheMinBlockLength, CodeCacheMinBlockLength/2);

Based on your table below, those small nmethods will use only 256-byte blocks instead of 512 (128*4).
Note that for C1 in the Client VM, CodeCacheMinBlockLength is 1. I don't know why it is 4 for C2. Could you also try CodeCacheMinBlockLength = 1?
All above is with CodeCacheSegmentSize 128 bytes.




The size of adapters ranges from 400 to 600 bytes. Here is the beginning of the nmethod size distribution of the failing test case:
Is it possible that it is in number of segments and not in bytes? If it really is bytes, what do such (32-48 byte) nmethods look like?
Thanks,


Vladimir







nmethod size distribution (non-zombie java)
-------------------------------------------------
0-16 bytes        0
16-32 bytes       0
32-48 bytes       45
48-64 bytes       0
64-80 bytes       41
80-96 bytes       0
96-112 bytes      6247
112-128 bytes     0
128-144 bytes     249
144-160 bytes     0
160-176 bytes     139
176-192 bytes     0
192-208 bytes     177
208-224 bytes     0
224-240 bytes     180
240-256 bytes     0
...
I do not see a problem with increasing the CodeCacheSegmentSize if tiered compilation is enabled.
What do you think?






Best,


Albert






On 02/04/2014 05:52 PM, Vladimir Kozlov wrote:

I think the suggestion is reasonable since we increase the CodeCache *5 for Tiered.

Albert, is it possible to collect data on how much space is wasted in % before and after this change: free space in which we can't allocate + unused bytes at the end of nmethods/adapters? Can we squeeze an adapter into 64 bytes?

Thanks,
Vladimir
On 2/4/14 7:41 AM, Albert wrote:


Hi,




could I get reviews for this patch (nightly failure)?




webrev: http://cr.openjdk.java.net/~anoll/8029799/webrev.00/


bug: https://bugs.openjdk.java.net/browse/JDK-8029799




problem: The freelist of the code cache exceeds 10'000 items, which results in a VM warning. The problem behind the warning is that the freelist is populated by a large number of small free blocks. For example, in the failing test case (see header), the freelist grows to more than 3500 items, where the largest item on the list is 9 segments (one segment is 64 bytes). That experiment was done on my laptop. Such a large freelist can indeed be a performance problem, since we use a linear search to traverse the freelist.
solution: One way to solve the problem is to increase the minimum allocation size in the code cache. This can be done by two means: we can increase CodeCacheMinBlockLength and/or CodeCacheSegmentSize. This patch follows the latter approach, since increasing CodeCacheSegmentSize decreases the size that is required by the segment map. More concretely, the patch doubles the CodeCacheSegmentSize from 64 bytes to 128 bytes if tiered compilation is enabled. The patch also contains an optimization in the freelist search (stop searching once an appropriately sized block is found) and some code cleanups.
testing: With the proposed change, the size of the freelist is reduced to 200 items. There is only a slight increase in the memory required by the code cache, by at most 3% (all data measured for the failing test case on a Linux 64-bit system, 4 cores). To summarize, increasing the minimum allocation size in the code cache results in potentially more unused memory in the code cache due to unused bytes at the end of an nmethod. The advantage is that we potentially have less fragmentation.
proposal: I think we could remove CodeCacheMinBlockLength without loss of generality or usability, and instead adapt the parameter CodeCacheSegmentSize at VM startup. Any opinions?
Many thanks in advance,


Albert