msg261781
Author: Damian Myerscough (DamianMyerscough)
Date: 2016-03-14 21:54
I have been running a simple script inside a Docker container to cause the container to trigger oom-killer.

>>> mem = {}
>>> for i in range(65535):
...     mem[i] = "A" * 65535

When oom-killer is triggered I see a large number of involuntary context switches and major page faults, which causes a spike in CPU and disk.

# /usr/bin/time --verbose python
Python 2.7.9 (default, Mar 1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> mem = {}
>>> for i in range(65535):
...     mem[i] = "A" * 65535
...
Command terminated by signal 9
    Command being timed: "python"
    User time (seconds): 0.13
    System time (seconds): 3.83
    Percent of CPU this job got: 17%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:22.94
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2096516
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 33536
    Minor (reclaiming a frame) page faults: 524545
    Voluntary context switches: 30706
    Involuntary context switches: 137852
    Swaps: 0
    File system inputs: 8499072
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

I tried the same test using NodeJS/C++ and I could see far fewer involuntary context switches and major page faults, which suggests this could be a Python issue.

# /usr/bin/time --verbose ./alloc_forever 1024 5 524288000
initial_alloc_amount: 1024, sleep_duration: 5, alloc_amount: 524288000
memory: 501.07MB
memory: 1001.29MB
memory: 1501.19MB
memory: 2001.09MB
Command terminated by signal 9
    Command being timed: "./alloc_forever 1024 5 524288000"
    User time (seconds): 0.63
    System time (seconds): 0.97
    Percent of CPU this job got: 7%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.61
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 2096536
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 11
    Minor (reclaiming a frame) page faults: 524178
    Voluntary context switches: 17
    Involuntary context switches: 284
    Swaps: 0
    File system inputs: 336
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
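In case it helps anyone reproduce this, here is the same allocation pattern as a standalone script that can be handed straight to /usr/bin/time (the file name is just an example, not something referenced elsewhere):

# oom_repro.py -- run as: /usr/bin/time --verbose python oom_repro.py
mem = {}
for i in range(65535):
    mem[i] = "A" * 65535   # each value is a separate ~64 KiB string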
|
|
msg261782
Author: STINNER Victor (vstinner)
Date: 2016-03-14 22:00
I don't understand this issue... I don't see how Python can avoid context switches when writing data to memory... |
|
|
msg261783
Author: Robert Collins (rbcollins)
Date: 2016-03-14 22:21
So this test script is making a 65K-entry dict, and each item is a new, separate 65K string. The strings are allocated in single chunks, so we should expect a couple hundred reference count writes in total. As I understand it, involuntary context switches occur when there is CPU contention. What else do you have running? Notably, Python is taking 3.83 seconds of system time, much more than your other tests. You may get some insight from strace -c.
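You can also read the same counters from inside the process with the stdlib resource module; a rough sketch (the sampling interval is arbitrary, only the getrusage fields are the real API):

import resource

def snapshot(label):
    # ru_majflt/ru_minflt are page faults, ru_nvcsw/ru_nivcsw are
    # voluntary/involuntary context switches for this process
    ru = resource.getrusage(resource.RUSAGE_SELF)
    print("%s: majflt=%d minflt=%d nvcsw=%d nivcsw=%d" %
          (label, ru.ru_majflt, ru.ru_minflt, ru.ru_nvcsw, ru.ru_nivcsw))

snapshot("before")
mem = {}
for i in range(65535):
    mem[i] = "A" * 65535
    if i % 8192 == 0:
        snapshot("after %d entries" % i)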
|
|
msg261795
Author: Damian Myerscough (DamianMyerscough)
Date: 2016-03-15 02:32
@rbcollins there is nothing else running in the Docker container other than the python interpreter. I did an `strace` of the process:

mmap(NULL, 3149824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7effc2d0c000
munmap(0x7effc300d000, 790528) = 0

@haypo, this only happens with Python when it is killed by oom-killer. NodeJS/C++ don't show nearly as many involuntary context switches or page faults. I logged the bug to see if anyone else has encountered this.
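For scale, a back-of-the-envelope on how much memory the test actually asks for (a sketch; the exact per-string overhead depends on the build, and I'm assuming the container limit is around the ~2 GB max RSS reported above):

import sys

n = 65535
per_string = sys.getsizeof("A" * 65535)      # 64 KiB payload + object overhead
total = n * per_string
print("per-string size: %d bytes" % per_string)
print("total for %d strings: %.1f GiB" % (n, total / (1024.0 ** 3)))
# On a 64-bit build this comes out to roughly 4 GiB, well above the ~2 GiB
# the process reached before being killed. Each string is also far above
# pymalloc's small-object threshold, so every allocation goes straight to
# the C allocator rather than to pymalloc's pools.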
|
msg261806
Author: Antoine Pitrou (pitrou)
Date: 2016-03-15 09:55
Why would it be a Python issue? By construction I doubt the OOM killer lets the process run anything before force-killing it. |
|
|
msg261809
Author: STINNER Victor (vstinner)
Date: 2016-03-15 11:31
> When oom-killer is triggered

IMHO this is your root problem.

> When oom-killer is triggered I see a large number of involuntary context switches ...

The principle of *involuntary* context switches is that the application is not responsible for them.

> I tried the same test using NodeJS/C++ and I could see a lot less involuntary context switches and major page faults which indicates this could be a Python issue.

That rationale doesn't hold; it is not a Python bug. The OOM Killer is not a bug but a feature to limit abuse of resources on Linux. But yeah, in my experience, Linux becomes crazy when it reaches the OOM Killer. You can try to tune your kernel, but it's much better to always avoid using all memory ;-)
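For example, you can cap the process yourself so the loop dies with an ordinary MemoryError instead of ever waking the OOM killer. A minimal sketch using resource.setrlimit (the 1 GiB limit is an arbitrary value, pick whatever fits your container):

import resource

limit = 1 * 1024 ** 3   # 1 GiB address-space cap, arbitrary example value
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

mem = {}
try:
    for i in range(65535):
        mem[i] = "A" * 65535
except MemoryError:
    count = len(mem)
    mem.clear()          # free memory so the print below cannot fail too
    print("hit the self-imposed limit after %d entries" % count)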
|
|
msg261836
Author: Damian Myerscough (DamianMyerscough)
Date: 2016-03-15 23:53
Thanks for the feedback; I will continue to dig into this. I know processes sometimes go crazy when the OOM killer kicks in. I just wanted to rule out a Python bug and check whether anyone in the community has seen this before. Thanks
|
|