msg74502 - (view) |
Author: Jason Day (JDay) |
Date: 2008-10-08 00:27 |
On my system (Windows Server 2008 SP1 - 64-bit, Python 2.5.2 - 32-bit), simple actions like: >>> help(help) # Or any function or >>> import tempfile >>> f = tempfile.mktemp() result in this (rather confusing) error: TypeError: _getfullpathname() argument 1 must be (buffer overflow), not str Apparently, _getfullpathname() chokes on certain paths if they are not supplied as unicode. Locally, I was able to work around the issue by changing the call to _getfullpathname in ntpath.abspath to: path = str(_getfullpathname(unicode(path))) |
|
|
msg74504 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-10-08 00:42 |
The (buffer overflow) message indicates that the argument is too long to be converted. In your case, it seems that the path is longer than MAX_PATH=255 characters. What is the value of the argument of _getfullpathname()? |
|
|
msg74508 - (view) |
Author: Jason Day (JDay) |
Date: 2008-10-08 01:21 |
Running help() or mktemp() causes _getfullpathname to be called with the whole system path (791 characters). If you pass that to _getfullpathname as str it throws the aforementioned TypeError. If it's passed as unicode, it returns an empty string. The offending _getfullpathname call occurs on the first call to one of these methods. Future calls to either do not call it (unless, of course, the first failed). |
|
|
msg74510 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2008-10-08 02:23 |
According to http://msdn.microsoft.com/en-us/library/aa364963.aspx, current implementation have several problems. >In the ANSI version of this function, the name is limited to MAX_PATH >characters. To extend this limit to 32,767 wide characters, call the >Unicode version of the function and prepend "\\?\" to the path. :-( >If the lpBuffer buffer is too small to contain the path, the return >value is the size, in TCHARs, of the buffer that is required to hold >the path and the terminating null character. We should allocate new buffer with this size and retry like already doing for GetCurrentDirectory. And one decision problem... What should we do when too long str is passed to ntpath._getfullpathname? Report overflow error? Or convert to unicode and retry with GetFullPathNameW? Hmm.... |
|
|
msg74519 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-10-08 08:03 |
> Running help() or mktemp() causes _getfullpathname to be called > with the whole system path (791 characters) I am not sure to understand. Do you mean the whole PATH environment variable? I doubt that it is passed to _getfullpathname. Or do you have very long paths for one directory? the TEMP environment variable, for example? I'd be curious to see its value. |
|
|
msg74522 - (view) |
Author: Jason Day (JDay) |
Date: 2008-10-08 09:07 |
> I am not sure to understand. Do you mean the whole PATH environment > variable? I doubt that it is passed to _getfullpathname. > Or do you have very long paths for one directory? the TEMP environment > variable, for example? I'd be curious to see its value. I don't have it offhand, but it was the whole PATH environment variable, complete with semicolons. That's probably the *real* bug. Whatever was passing that into abspath didn't seem to mind getting back an empty string (although that may have been further processed in the function, I didn't follow past the call to _getfullpathname). > And one decision problem... What should we do when too long str is > passed to ntpath._getfullpathname? Report overflow error? Or convert to > unicode and retry with GetFullPathNameW? Hmm.... abspath should be able to be called with str or unicode of arbitrary lengths. Consumers of it shouldn't have to be concerned with the platform implementation when it can be smoothed over by the module. Whether this is done in abspath or _getfullpathname probably isn't too important, since end-users generally shouldn't be calling _getfullpathname, directly. |
|
|
msg74530 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-10-08 14:49 |
> I don't have it offhand, but it was the whole PATH environment > variable, complete with semicolons. That's probably the *real* bug. Indeed. Do you happen to have the complete traceback of the failing tempfile.mktemp() call? I don't see where it can use the PATH environment variable. |
|
|
msg74549 - (view) |
Author: Jason Day (JDay) |
Date: 2008-10-08 22:00 |
> Indeed. Do you happen to have the complete traceback of the failing > tempfile.mktemp() call? I don't see where it can use the PATH > environment variable. The problem was that somehow, on our systems, the TEMP environmental variable had been copied over with PATH. Most likely some batch file tried to store a copy of PATH, without realizing the significance of TEMP. [groan] Anyway, I still think that it's a bug that abspath() can't be called with a perfectly good str path, because of limitations with the windows api. I edited the bug title to reflect the actual bug. The str path length could be checked and upgraded to the Unicode version, if necessary (or try again with the unicode version, in the case of an exception). I think it's important to ensure that when abspath() is called with str, it returns str, even if it was upgraded to the unicode call. |
|
|
msg74593 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2008-10-09 18:37 |
I think attached patch "fix_getfullpathname.patch" will fix unicode issue at least. For ansi issue, I followed the manner of win32_chdir for now. After some investigation, GetFullPathNameA fails if output size is more than MAX_PATH even if input size is less than MAX_PATH. I feel it's difficult check this before invoking GetFullPathNameA. This is test for unicode issue. ///////////////////////////////////////////////////////// import unittest import ntpath import os class TestCase(unittest.TestCase): def test_getfullpathname(self): for count in xrange(1, 1000): name = u"x" * count path = ntpath._getfullpathname(name) self.assertEqual(os.path.basename(path), name) if __name__ == '__main__': unittest.main() |
|
|
msg74594 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2008-10-09 18:39 |
>I feel it's difficult to check this before invoking GetFullPathNameA. And error number via GetLastError() is vogus, sometimes 0, sometimes others. |
|
|
msg74607 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2008-10-09 21:33 |
Or, if PyArg_ParseTuple overflowed or GetFullPathNameA failed, (not check GetLastError() because it's vogus) try GetFullPathNameW like attached file "quick_hack_for_getfullpathname.patch". This inverses flow if (unicode_file_names()) { /* unicode */ } /* ascii */ # Maybe it would be nice if convert_to_unicode() functionality is built in PyArg_ParseTuple. (inverse of "et") Be care, this is quick hack, so maybe buggy. I confirmed test_os and test_ntpath passed though. ///////////////////////////////////////////////////// import unittest import ntpath import os class TestCase(unittest.TestCase): def test_getfullpathname(self): for c in ('x', u'x'): for count in xrange(1, 1000): name = c * count path = ntpath._getfullpathname(name) self.assertEqual(os.path.basename(path), name) if __name__ == '__main__': unittest.main() |
|
|
msg75628 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2008-11-08 03:49 |
Fixed unicode issue in r67154(trunk). I'm not sure how to handle long str, so I didn't touch it for now. |
|
|
msg77500 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-12-10 08:38 |
ocean-city, can you please backport this to the 2.5 branch? |
|
|
msg77501 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-12-10 08:39 |
As a follow-up: please don't forget to add Misc/NEWS entries (both for the already-committed patch, as well as for the backport). |
|
|
msg77523 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2008-12-10 09:28 |
>can you please backport this to the 2.5 branch? Sorry for clarification. Which should I backport r67154 or quick_hack_for_getfullpathname_v2.patch? |
|
|
msg77737 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-12-13 14:53 |
As it is apparently not clear what change exactly needs to be applied to 2.5, I'm declaring that this fix will not be backported. I would appreciate if somebody could summarize what still needs to be done, or (if nothing needs to be done), close this issue. |
|
|
msg77799 - (view) |
Author: Hirokazu Yamamoto (ocean-city) *  |
Date: 2008-12-14 14:59 |
GetFullPathNameW may return the required buffer size (non-zero value) when buffer is too short. Before r67154, this case was treated as success, so there was possibility of access to uninitialized buffer woutbuf. Fortunately, GetFullPathNameW sets '\0' to woutbuf (this is undocumented behavior), so abspath() returned empty string instead of segmentation fault. But this is still potentially dangerous, so I fixed this by allocating required size buffer and calling GetFullPathNameW again. abspath() returns correct result for any long unicode path now. But original poster hopes abspath() should return correct result for any long both *str* and unicode path. For str, this issue is not solved yet. -- I'm skeptical abspath() should support longer path than MAX_PATH if ANSI version of Windows API cannot handle such path. Anyway, I won't use such long path. ;-) |
|
|
msg175113 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-11-07 17:50 |
I doubt this issue exists on Python >= 3.2. See also . |
|
|
msg220007 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-06-08 00:46 |
I don't see that any further work can be done here owing to limitations of the Windows str API. Note that the same argument can be applied to . |
|
|
msg236972 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2015-03-01 18:29 |
Can we close this as I'm not aware of any possible way to fix this? See also . |
|
|
msg237007 - (view) |
Author: Eryk Sun (eryksun) *  |
Date: 2015-03-02 02:47 |
> Can we close this as I'm not aware of any possible way to fix this? Windows system and C runtime calls that take paths could be restricted to wide-character APIs, such as calling GetFullPathnameW in this case, or _wexecve instead of execve (issue 23462). Then for bytes paths an extension can call PyUnicode_FSDecoder (PyUnicode_DecodeMBCS). In posxmodule.c this can be handled in the path_converter function. path_converter https://hg.python.org/cpython/file/5d4b6a57d5fd/Modules/posixmodule.c#l698 path_converter could be moved to Python/fileutils.c to make it available for use by other modules such as io. |
|
|
msg338059 - (view) |
Author: Eryk Sun (eryksun) *  |
Date: 2019-03-16 07:32 |
I'm closing this as a resolved issue. Python 2 is approaching end of life, and I don't see a pressing need for my suggestion in . We should be using unicode for paths in Windows anyway since its file systems are natively Unicode. |
|
|