Issue 10156: Initialization of globals in unicodeobject.c (original) (raw)

process

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, Gregory.Andersen, amaury.forgeotdarc, franck, georg.brandl, lemburg, ncoghlan, pitrou, python-dev, serhiy.storchaka, skrah, stutzbach, vstinner
Priority: critical Keywords: patch

Created on 2010-10-20 17:27 by skrah, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
unicode_init_globals.patch skrah,2010-10-22 14:39 review
unicode_init_globals2.patch skrah,2010-10-24 10:21 review
unicode-leak.patch skrah,2011-04-11 07:31 patch by Daniel Stutzbach
unicode_globals-2.7.patch serhiy.storchaka,2013-01-07 11:28 review
unicode_globals-3.2.patch serhiy.storchaka,2013-01-07 11:28 review
unicode_globals-3.3.patch serhiy.storchaka,2013-01-07 11:28 review
unicode_globals-3.4.patch serhiy.storchaka,2013-01-07 11:28 review
unicode_globals-2.7_2.patch serhiy.storchaka,2013-01-24 20:21 review
unicode_globals-3.2_2.patch serhiy.storchaka,2013-01-24 20:21 review
unicode_globals-3.3_2.patch serhiy.storchaka,2013-01-24 20:21
unicode_globals-3.4_2.patch serhiy.storchaka,2013-01-24 20:21
unicode_globals-2.7_3.patch serhiy.storchaka,2013-01-25 19:51 review
unicode_globals-3.2_3.patch serhiy.storchaka,2013-01-25 19:51 review
unicode_globals-3.3_3.patch serhiy.storchaka,2013-01-25 19:51 review
unicode_globals-3.4_3.patch serhiy.storchaka,2013-01-25 19:51 review
Messages (29)
msg119226 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-20 17:27
This is one of two remaining "definitely lost" leaks in py3k. It started to appear in r70459. How to reproduce: make distclean && ./configure OPT="-O0 -g" --without-pymalloc && make valgrind --leak-check=full --suppressions=Misc/valgrind-python.supp ./python > VGOUT 2>&1 Then search for 'definitely'. This leak is not present in release-2.7. ==2058== 56 bytes in 1 blocks are definitely lost in loss record 918 of 2,136 ==2058== at 0x4C2412C: malloc (vg_replace_malloc.c:195) ==2058== by 0x4167DE: _PyObject_New (object.c:243) ==2058== by 0x42C278: _PyUnicode_New (unicodeobject.c:341) ==2058== by 0x4306BD: PyUnicodeUCS2_DecodeUTF8Stateful (unicodeobject.c:2100) ==2058== by 0x430671: PyUnicodeUCS2_DecodeUTF8 (unicodeobject.c:2065) ==2058== by 0x42C8F7: PyUnicodeUCS2_FromStringAndSize (unicodeobject.c:541) ==2058== by 0x42C973: PyUnicodeUCS2_FromString (unicodeobject.c:559) ==2058== by 0x50B432: PyDict_SetItemString (dictobject.c:2088) ==2058== by 0x4258DF: PyType_Ready (typeobject.c:3844) ==2058== by 0x517B64: PyStructSequence_InitType (structseq.c:522) ==2058== by 0x4F3B4F: _PyFloat_Init (floatobject.c:1905) ==2058== by 0x4813CE: Py_InitializeEx (pythonrun.c:217)
msg119237 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-10-20 22:10
The stack corresponds to the allocation of type(sys.float_info).__doc__. Why would only this object appear as a memory leak? It is certainly not deallocated, but all other types are in the same situation. For example, sys.int_info is very similar, and happens to be defined in r70459. Why doesn't it appear in the report?
msg119238 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-20 22:24
To add to the mystery, the leak disappears if the key value is not interned in PyDict_SetItemString: Index: Objects/dictobject.c =================================================================== --- Objects/dictobject.c (revision 70459) +++ Objects/dictobject.c (working copy) @@ -2088,7 +2088,6 @@ kv = PyUnicode_FromString(key); if (kv == NULL) return -1; - PyUnicode_InternInPlace(&kv); /* XXX Should we really? */ err = PyDict_SetItem(v, kv, item); Py_DECREF(kv); return err;
msg119239 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-20 22:29
Stefan Krah wrote: > > Stefan Krah <stefan-usenet@bytereef.org> added the comment: > > To add to the mystery, the leak disappears if the key value is not > interned in PyDict_SetItemString: I'm not sure how you determine what is a leak and what not. Interned Unicode objects stay alive until the interpreter is finalized. Are you suggesting that the finalization does not free the interned Unicode strings or not all of them ? > Index: Objects/dictobject.c > =================================================================== > --- Objects/dictobject.c (revision 70459) > +++ Objects/dictobject.c (working copy) > @@ -2088,7 +2088,6 @@ > kv = PyUnicode_FromString(key); > if (kv == NULL) > return -1; > - PyUnicode_InternInPlace(&kv); /* XXX Should we really? */ > err = PyDict_SetItem(v, kv, item); > Py_DECREF(kv); > return err;
msg119241 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-20 22:47
Marc-Andre Lemburg <report@bugs.python.org> wrote: > I'm not sure how you determine what is a leak and what not. > Interned Unicode objects stay alive until the interpreter > is finalized. > > Are you suggesting that the finalization does not free the > interned Unicode strings or not all of them ? No, Valgrind's "definitely lost" category means that all pointers to an allocated region have been lost, so it would not be possible to free the area. [1] There are hundreds of "possibly lost" warnings as well, but I did not report those. My experience is that Valgrind is usually correct with "definitely lost", see e.g. #10153. That said, of course it _could_ be a false alarm. [1] Last category from: http://mail.python.org/pipermail/python-dev/2002-October/029758.html
msg119356 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-22 00:03
Re disabling interning in PyDict_SetItemString: A comment in unicodeobject.c says that globals should not be used before calling _PyUnicode_Init. But in Py_InitializeEx (pythonrun.c) _PyUnicode_Init is called after _Py_ReadyTypes, _PyFrame_Init, _PyLong_Init, PyByteArray_Init and _PyFloat_Init. In fact, when I move _PyUnicode_Init up, the error concerning _PyFloat_Init disappears. Problem is, PyType_Ready also uses PyDict_SetItemString, but I presume that _Py_ReadyTypes has to be called before anything else. In that case it would be unavoidable that PyDict_SetItemString is used before _PyUnicode_Init, and it might be a good idea to disable interning.
msg119387 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-22 14:39
I've verified the leak manually. The cause is that global variables in unicodeobject.c, e.g. free_list, are used before _PyUnicode_Init() is called. Later on _PyUnicode_Init() sets these variables to NULL, losing the allocated memory. Here is an example of the earliest use of free_list during _Py_ReadyTypes (), well before _PyUnicode_Init(): Breakpoint 1, unicode_dealloc (unicode=0x1b044c0) at Objects/unicodeobject.c:392 392 switch (PyUnicode_CHECK_INTERNED(unicode)) { (gdb) bt #0 unicode_dealloc (unicode=0x1b044c0) at Objects/unicodeobject.c:392 #1 0x000000000044fc69 in PyUnicode_InternInPlace (p=0x7fff303852b8) at Objects/unicodeobject.c:9991 #2 0x000000000044fed3 in PyUnicode_InternFromString (cp=0x568861 "__len__") at Objects/unicodeobject.c:10025 #3 0x00000000004344d0 in init_slotdefs () at Objects/typeobject.c:5751 #4 0x0000000000434840 in add_operators (type=0x7be260) at Objects/typeobject.c:5905 #5 0x000000000042eec8 in PyType_Ready (type=0x7be260) at Objects/typeobject.c:3810 #6 0x000000000042edfc in PyType_Ready (type=0x7bde60) at Objects/typeobject.c:3774 #7 0x000000000041aa5f in _Py_ReadyTypes () at Objects/object.c:1514 #8 0x00000000004992ff in Py_InitializeEx (install_sigs=1) at Python/pythonrun.c:232 #9 0x000000000049957f in Py_Initialize () at Python/pythonrun.c:321 #10 0x00000000004b289f in Py_Main (argc=1, argv=0x1afa010) at Modules/main.c:590 #11 0x0000000000417dcc in main (argc=1, argv=0x7fff30385758) at ./Modules/python.c:59 (gdb) n 411 if (PyUnicode_CheckExact(unicode) && (gdb) 414 if (unicode->length >= KEEPALIVE_SIZE_LIMIT) { (gdb) 419 if (unicode->defenc) { (gdb) 423 *(PyUnicodeObject **)unicode = free_list; (gdb) n 424 free_list = unicode; (gdb) n 425 numfree++; (gdb) n 411 if (PyUnicode_CheckExact(unicode) && A possible fix could be to initialize the globals right at the start in main.c. Note that there are still several Unicode API functions in main.c before PyType_Ready has been called on the Unicode type. With the patch, Valgrind does not show the leak any longer.
msg119396 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-10-22 18:04
About the patch: why should _PyUnicode_Init() try to call _PyUnicode_InitGlobals() again?
msg119503 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-24 10:21
> why should _PyUnicode_Init() try to call _PyUnicode_InitGlobals() again? For the embedding scenario (when only Py_Initialize() is called) I wanted to preserve the old behavior of _PyUnicode_Init(). But this is not really enough. I wrote a new patch that also calls _PyUnicode_InitGlobals() at the beginning of Py_Initialize(). I don't like the fact that even more clutter is added to Py_Main(). Perhaps Py_Initialize() could be moved up or the Unicode functions could be moved down.
msg133502 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-04-11 07:31
[Merging with issue 11402] Daniel's patch is much simpler, but I think that unicode_empty and unicode_latin1 would need to be protected before _PyUnicode_Init is called. Is the module initialization procedure documented somewhere? I get the impression that unicodeobject.c depends on dict.c and dict.c depends on unicodeobject.c.
msg133504 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-04-11 07:36
Stefan Krah <report@bugs.python.org> wrote: > Is the module initialization procedure documented somewhere? I get > the impression that unicodeobject.c depends on dict.c and dict.c > depends on unicodeobject.c. s/dict.c/dictobject.c/
msg144749 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-10-01 20:59
The PEP-393 changes apparently fix this leak; at least I can't reproduce it in default any longer (but still in 3.2).
msg172103 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-10-05 17:24
See also #16143.
msg179221 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-06 20:33
Daniel's patch looks good for me. 2.7 looks affected too.
msg179223 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-06 20:43
unicode-leak.patch doesn't fix #16143 though. unicode_empty and unicode_latin1 need to be initialized, too. Actually we could close this in favor of #16143.
msg179234 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-06 22:29
> unicode-leak.patch doesn't fix #16143 though. unicode_empty and > unicode_latin1 need to be initialized, too. Indeed. I'll upload patches tomorrow.
msg179256 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-07 11:28
Here are patches for all four Python versions. They fixes possible usage of the followed non-initialized global variables: free_list, numfree, interned, unicode_empty, static_strings, unicode_latin1, bloom_linebreak, unicode_default_encoding.
msg179494 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-09 22:54
Nick, I'm adding you to the nosy list since this issue seems related to PEP 432. Quick summary: Globals are used in unicodeobject.c before they are initialized. Also, Unicode objects are created before PyType_Ready(&PyUnicode_Type) has been called. This happens during startup: _Py_InitializeEx_Private(): _Py_ReadyTypes(): PyType_Ready(&PyType_Type); [...] Many Unicode objects like "" or "__add__" are created. Uninitialized globals have led to a crash (#16143). This is fixed by Serhiy's patch, which always dynamically checks all globals for NULL before using them. However, Unicode objects are still created at this point. [...] PyType_Ready(&PyUnicode_Type); /* Called for the first time */ [...] _PyUnicode_Init: for (i = 0; i < 256; i++) /* Could leak if latin1 strings unicode_latin1[i] = NULL; have already been created. */ PyType_Ready(&PyUnicode_Type); /* Called a second time! */ So, considering PEP 432: Are these "pre-type-ready" Unicode objects safe to use, or should something be done about it?
msg179498 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-01-09 23:10
There should still be a check in tp_new (IIRC) that calls PyType_Ready on unready types. While doing something systematic about this kind of problem is part of the rationale of PEP 432, that won't help earlier versions.
msg179504 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-10 00:21
Nick Coghlan <report@bugs.python.org> wrote: > There should still be a check in tp_new (IIRC) that calls PyType_Ready on > unready types. Indeed there is one in type_new(), but that isn't used here AFAICS. If you apply this patch and start up python, there are many "str: not ready" instances: diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c --- a/Objects/unicodeobject.c +++ b/Objects/unicodeobject.c @@ -14282,6 +14282,10 @@ PyUnicode_InternFromString(const char *cp) { PyObject *s = PyUnicode_FromString(cp); + + fprintf(stderr, "%s: %s\n", PyUnicode_Type.tp_name, + (PyUnicode_Type.tp_flags & Py_TPFLAGS_READY) ? "ready" : "not ready"); + if (s == NULL) return NULL; PyUnicode_InternInPlace(&s);
msg180546 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-24 20:21
There is a set of updated patches.
msg180547 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-01-24 20:42
Serhiy's general approach here looks good to me (although there seem to be some unrelated changes to the re module in the current 3.2 patch). For PEP 432, I want to try to rearrange things so that _PyUnicode_Init is one of the *first* calls made in Py_BeginInitialization (even before the general call to Py_ReadyTypes), but that still won't invalidate the work done here.
msg180573 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 12:29
Since Rietveld didn't mail me this time: I left some comments on the 2.7 patch.
msg180579 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 13:40
The 2.7 comments also apply to the 3.2 patch. Otherwise the 3.2 patch (without the _sre changes :) looks good to me.
msg180617 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 19:51
> The 2.7 comments also apply to the 3.2 patch. Otherwise the 3.2 patch > (without the _sre changes :) looks good to me. Patches updated addressing new Stefan's comments. Unicode globals no longer reinitialized in _PyUnicode_Init(). Note that I have added a consistency check into the macro in 3.3+. I hope Rietveld will accept this set.
msg180623 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 21:12
Nice. I think the latest patches are commit-ready.
msg180652 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-26 10:21
New changeset 7c8ad0d02664 by Serhiy Storchaka in branch '2.7': Issue #10156: In the interpreter's initialization phase, unicode globals http://hg.python.org/cpython/rev/7c8ad0d02664 New changeset f7eda8165e6f by Serhiy Storchaka in branch '3.2': Issue #10156: In the interpreter's initialization phase, unicode globals http://hg.python.org/cpython/rev/f7eda8165e6f New changeset 01d4dd412581 by Serhiy Storchaka in branch '3.3': Issue #10156: In the interpreter's initialization phase, unicode globals http://hg.python.org/cpython/rev/01d4dd412581 New changeset cb12d642eed2 by Serhiy Storchaka in branch 'default': Issue #10156: In the interpreter's initialization phase, unicode globals http://hg.python.org/cpython/rev/cb12d642eed2
msg180654 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-26 10:33
Committed. Thank you for review, Stefan. Close this issue if the work is finished.
msg180721 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-26 23:21
Buildbots etc. look all good. Thanks for fixing this.
History
Date User Action Args
2022-04-11 14:57:07 admin set github: 54365
2013-01-26 23:21:43 skrah set resolution: fixedstage: patch review -> resolved
2013-01-26 23:21:26 skrah set status: open -> closed
2013-01-26 23:21:19 skrah set messages: +
2013-01-26 10:33:56 serhiy.storchaka set messages: +
2013-01-26 10:21:50 python-dev set nosy: + python-devmessages: +
2013-01-25 21:12:47 skrah set messages: +
2013-01-25 19:51:33 serhiy.storchaka set files: + unicode_globals-2.7_3.patch, unicode_globals-3.2_3.patch, unicode_globals-3.3_3.patch, unicode_globals-3.4_3.patchmessages: +
2013-01-25 13:40:39 skrah set messages: +
2013-01-25 12:29:15 skrah set messages: + versions: + Python 3.3, Python 3.4
2013-01-24 20:42:08 ncoghlan set messages: +
2013-01-24 20:21:09 serhiy.storchaka set files: + unicode_globals-2.7_2.patch, unicode_globals-3.2_2.patch, unicode_globals-3.3_2.patch, unicode_globals-3.4_2.patchmessages: +
2013-01-10 00:21:25 skrah set messages: +
2013-01-09 23:10:52 ncoghlan set messages: +
2013-01-09 22:54:13 skrah set nosy: + ncoghlanmessages: +
2013-01-07 23:30:53 skrah set priority: high -> critical
2013-01-07 23:30:38 skrah set nosy: + georg.brandl, pitrou, franck, Gregory.Andersen
2013-01-07 23:29:13 skrah link issue16143 superseder
2013-01-07 11:34:18 serhiy.storchaka set stage: commit review -> patch review
2013-01-07 11:28:43 serhiy.storchaka set files: + unicode_globals-2.7.patch, unicode_globals-3.2.patch, unicode_globals-3.3.patch, unicode_globals-3.4.patchmessages: +
2013-01-06 22:29:46 serhiy.storchaka set messages: +
2013-01-06 20:43:48 skrah set messages: +
2013-01-06 20:33:36 serhiy.storchaka set versions: + Python 2.7, - Python 3.3, Python 3.4nosy: + serhiy.storchakamessages: + stage: patch review -> commit review
2013-01-04 23:41:18 Arfrever set nosy: + Arfrever
2012-10-05 17:24:52 skrah set messages: + versions: + Python 3.3, Python 3.4
2012-04-20 17:58:50 mark.dickinson set nosy: - mark.dickinson
2011-10-01 20:59:19 skrah set messages: +
2011-04-11 07:36:29 skrah set messages: +
2011-04-11 07:32:27 skrah link issue11402 superseder
2011-04-11 07:31:07 skrah set files: + unicode-leak.patchnosy: + stutzbachmessages: +
2010-10-24 10:21:44 skrah set files: + unicode_init_globals2.patchmessages: +
2010-10-22 18:04:51 amaury.forgeotdarc set messages: +
2010-10-22 14:39:03 skrah set files: + unicode_init_globals.patchpriority: normal -> hightitle: Memory leak (r70459) -> Initialization of globals in unicodeobject.cmessages: + keywords: + patchstage: patch review
2010-10-22 00:03:30 skrah set messages: +
2010-10-20 22:47:11 skrah set messages: +
2010-10-20 22:29:25 lemburg set nosy: + lemburgmessages: +
2010-10-20 22:24:43 skrah set messages: +
2010-10-20 22:10:02 amaury.forgeotdarc set nosy: + amaury.forgeotdarcmessages: +
2010-10-20 17:58:55 belopolsky set nosy: + vstinner
2010-10-20 17:27:05 skrah create