msg276924 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-18 22:56 |
Memory leak spotted by the issue #28195: path_converter() calls PyUnicode_AsWideCharString() which allocates a new buffer at each call, but this buffer is never released. On Python 3.5, PyUnicode_AsWideCharString() was used which handles internally the memory buffer and so release the memory later. Attached patch fixes the regression introduced in Python 3.6 beta 1 by the change e20c7d8a8187 ("Issue #27781: Change file system encoding on Windows to UTF-8 (PEP 529)"). |
|
|
msg276926 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-18 22:58 |
I didn't push the fix myself, because Steve Dower maybe made the change for a specific reason? |
|
|
msg276935 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2016-09-19 04:01 |
It's not clear to me that Py_UNICODE is guaranteed to be wchar_t for all time, that's all. If it is, go ahead. Otherwise the path_t object has the ability to clean up after itself, so perhaps it should be used here? |
|
|
msg276939 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-09-19 04:46 |
Deprecated functions and types of the C API ------------------------------------------- The :c:type:`Py_UNICODE` has been deprecated by :pep:`393` and will be removed in Python 4. All functions using this type are deprecated: Unicode functions and methods using :c:type:`Py_UNICODE` and :c:type:`Py_UNICODE*` types: * :c:macro:`PyUnicode_AS_UNICODE`, :c:func:`PyUnicode_AsUnicode`, :c:func:`PyUnicode_AsUnicodeAndSize`: use :c:func:`PyUnicode_AsWideCharString` (From Doc/whatsnew/3.3.rst) |
|
|
msg276957 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-19 08:44 |
Steve Dower added the comment: > It's not clear to me that Py_UNICODE is guaranteed to be wchar_t for all time, that's all. If it is, go ahead. Right now, it's the case and path_converter() leaks memory, so I proposed a simple and obvious fix :-) On Windows, it makes sense to continue to use the UTF-16 encoded string cached in Unicode objects, because this conversion is common, and it's convenient to not have to release the memory explicitly. The side effect is that we may waste memory in some cases. > Otherwise the path_t object has the ability to clean up after itself, so perhaps it should be used here? Maybe, but I'm not interested to write a more complex patch for Windows :-) Try to just call PyMem_Free(path->wide) (do nothing it's NULL). The advantage of the old code (and my patch) is to only convert a filename once to UTF-16 and then use the cached result. |
|
|
msg276959 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-19 08:54 |
Serhiy Storchaka added the comment: > The :c:type:`Py_UNICODE` has been deprecated by :pep:`393` and will be > removed in Python 4. All functions using this type are deprecated: Right... I tried to deprecate and remove all functions using Py_UNICODE but it's hard to change all this code. I gave up :-) > :c:func:`PyUnicode_AsUnicodeAndSize`: use :c:func:`PyUnicode_AsWideCharString` Sadly, it's not exactly the same: PyUnicode_AsWideCharString returns a new fresh buffer at each call. I'm not sure that it caches the result neither. Victor |
|
|
msg276962 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-09-19 09:13 |
path_converter.patch LGTM for 3.6 (and 3.7, 3.8), but we should find better solution for future versions. Could you please add a comment that PyUnicode_AsUnicodeAndSize is deprecated, but used because it is the simplest and the most efficient way for now? |
|
|
msg276965 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-19 09:54 |
Serhiy Storchaka added the comment: > Could you please add a comment that PyUnicode_AsUnicodeAndSize is deprecated, but used because it is the simplest and the most efficient way for now? See old issues #22271 and #22324. |
|
|
msg276966 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-09-19 09:56 |
New changeset 6232e610e310 by Victor Stinner in branch '3.6': Fix memory leak in path_converter() https://hg.python.org/cpython/rev/6232e610e310 |
|
|
msg276967 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-19 09:59 |
> path_converter.patch LGTM for 3.6 (and 3.7, 3.8), Ok, I pushed my simple fix. Feel free to modify the code if you see a better way to encode paths on Windows ;-) But it's a much larger project, and I'm not really interested to jump again in this silly deprecated APIs :-) It seems like the status quo is that Py_UNICODE is going to stay longer than expected and since it "just works", nobody really cares :-) I close this issue since the initial bug (memory leak) is now fixed. |
|
|