Why don't we use jemalloc?

Note: while switching to jemalloc might be suboptimal, it could still be useful to gather information about the positive and negative implications of using jemalloc. Let's do that in this issue. I am not advocating switching to jemalloc (yet), but imo it is worth investigating.

In some situations that I observed, jemalloc consumes slightly more memory (~5%), but in others it reduces memory usage by orders of magnitude (basically, in the subset of cases where glibc malloc behaves significantly suboptimally).

In some situations, however, jemalloc consumes significantly more memory, but appears to be faster.

Testcase 1 (based on #21967):

'use strict';

const bs = 4 * 1024 * 1024; // 4 MiB
const retained = [];
let i = 0, flag = false;

function tick() {
  i++;
  if (i % 1000 === 0) {
    console.log(`RSS [${i}]: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
  }
  retained.push(Buffer.allocUnsafe(bs));
  if (i === 5000) {
    console.log('Clearing retained and enabling alloc');
    retained.length = 0;
    flag = true;
  }
  if (flag) Buffer.alloc(bs); // Buffer.alloc(bs - 10) seems to be fine here
  if (i < 10000) setImmediate(tick);
}

tick();

Atm (with Node.js v10.7.0) it produces the following results:

RSS [1000]: 35.140625 MiB
RSS [2000]: 40.24609375 MiB
RSS [3000]: 45.25390625 MiB
RSS [4000]: 49.27734375 MiB
RSS [5000]: 53.29296875 MiB
Clearing retained and enabling alloc
RSS [6000]: 993.45703125 MiB
RSS [7000]: 2233.32421875 MiB
RSS [8000]: 3499.56640625 MiB
RSS [9000]: 4792.9140625 MiB
RSS [10000]: 5997.30859375 MiB

I traced that down to C++ malloc() behavior (testcase in #21967 (comment)).

This is what happens just with LD_PRELOAD=/usr/lib/libjemalloc.so:

RSS [1000]: 36.640625 MiB
RSS [2000]: 42.62109375 MiB
RSS [3000]: 48.21875 MiB
RSS [4000]: 52.38671875 MiB
RSS [5000]: 56.7890625 MiB
Clearing retained and enabling alloc
RSS [6000]: 58.8828125 MiB
RSS [7000]: 62.77734375 MiB
RSS [8000]: 66.9140625 MiB
RSS [9000]: 71.2421875 MiB
RSS [10000]: 75.453125 MiB

Testcase 2:

const arr = [];
for (let i = 0; i < 1e4; i++) arr.push(Buffer.alloc(1e5));
console.log(`RSS: ${process.memoryUsage().rss / 1024 / 1024} MiB`);

Normal: 697 MiB. jemalloc: 37 MiB.

Testcase 3 (jemalloc consumes more memory):

// Run with `node --expose-gc` so that the global gc() is available.
function foo() {
  let a;
  for (let i = 0; i < 5; i++) a = Buffer.alloc(1e8, 1);
  console.log(`RSS: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
}

const start = process.hrtime();
foo();
foo();
gc();
gc();
console.log(`RSS: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
foo();
const time = process.hrtime(start);
console.log('Time:', time[0] + time[1] * 1e-9);

Normal:

RSS: 506.8125 MiB
RSS: 507.98046875 MiB
RSS: 31.2421875 MiB
RSS: 507.8828125 MiB
Time: 0.774285919

jemalloc:

RSS: 507.87890625 MiB
RSS: 604.40625 MiB
RSS: 604.625 MiB
RSS: 604.625 MiB
Time: 0.349531212

Testcase 4:

'use strict';

let i = 0;

function tick() {
  const a = Buffer.alloc(1e7);
  if (i++ >= 1e4) return;
  setImmediate(tick);
}

tick();

Measured with /usr/bin/time -f '%M KiB, %e seconds' node testcase-4.js. Labels are (iterations * buffer size in bytes).
Normal (1e4 * 1e7): 129 576 KiB, 11.41 seconds.
jemalloc (1e4 * 1e7): 34 928 KiB, 3.33 seconds.
Normal (1e5 * 1e6): 109 196 KiB, 11.51 seconds.
jemalloc (1e5 * 1e6): 35 636 KiB, 4.13 seconds.

Testcase 5 (like 4, but now we fill the buffer with 1s):

'use strict';

let i = 0;

function tick() {
  const a = Buffer.alloc(1e7, 1);
  if (i++ >= 1e4) return;
  setImmediate(tick);
}

tick();

Normal (1e4 * 1e7): 139 060 KiB, 14.38 seconds.
jemalloc (1e4 * 1e7): 170 308 KiB, 10.65 seconds.
Normal (1e5 * 1e6): 105 120 KiB, 12.25 seconds.
jemalloc (1e5 * 1e6): 112 548 KiB, 10.99 seconds.

Testcase 6 (from #8871, where @bnoordhuis mentioned jemalloc):

const zlib = require('zlib');
const payload = Buffer.from(JSON.stringify({ some: "data" }));
for (let i = 0; i < 30000; ++i) zlib.deflate(payload, () => {});

No improvement in this case: jemalloc uses about 5% more memory, and the run time is essentially the same.
Normal: 3 022 496 KiB, 5.88 seconds.
jemalloc: 3 179 716 KiB, 5.91 seconds.
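
To make the comparison concrete, here is a small sketch that parses result lines in the format used above and computes the relative change. The function names `parseMeasurement` and `percentChange` are hypothetical, and the spaces inside the KiB figures are assumed to be thousands separators.

```javascript
'use strict';

// Hypothetical helper: extract KiB and seconds from a line such as
// "Normal: 3 022 496 KiB, 5.88 seconds."
function parseMeasurement(line) {
  const match = /([\d ]+) KiB, ([\d.]+) seconds/.exec(line);
  if (!match) throw new Error(`Unrecognized measurement line: ${line}`);
  return {
    kib: Number(match[1].replace(/\s/g, '')), // drop thousands separators
    seconds: Number(match[2]),
  };
}

// Relative change of `value` against `base`, in percent (positive means more).
function percentChange(base, value) {
  return ((value - base) / base) * 100;
}

const normal = parseMeasurement('Normal: 3 022 496 KiB, 5.88 seconds.');
const jemalloc = parseMeasurement('jemalloc: 3 179 716 KiB, 5.91 seconds.');

console.log(`Memory: ${percentChange(normal.kib, jemalloc.kib).toFixed(1)}%`);
console.log(`Time: ${percentChange(normal.seconds, jemalloc.seconds).toFixed(1)}%`);
```

For the Testcase 6 numbers this works out to roughly 5.2% more memory and about 0.5% more time with jemalloc. The same helpers can be pointed at the Testcase 4 and 5 lines.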

/cc @addaleax @bnoordhuis @mscdex