Issue 4071: ntpath.abspath fails for long str paths (original) (raw)

Created on 2008-10-08 00:27 by JDay, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
fix_getfullpathname.patch ocean-city,2008-10-09 18:37 review
quick_hack_for_getfullpathname_v2.patch ocean-city,2008-10-09 22:41 1st PyArg_ParseTuple: "et#" => "S" review
Messages (22)
msg74502 - (view) Author: Jason Day (JDay) Date: 2008-10-08 00:27
On my system (Windows Server 2008 SP1 - 64-bit, Python 2.5.2 - 32-bit), simple actions like: >>> help(help) # Or any function or >>> import tempfile >>> f = tempfile.mktemp() result in this (rather confusing) error: TypeError: _getfullpathname() argument 1 must be (buffer overflow), not str Apparently, _getfullpathname() chokes on certain paths if they are not supplied as unicode. Locally, I was able to work around the issue by changing the call to _getfullpathname in ntpath.abspath to: path = str(_getfullpathname(unicode(path)))
msg74504 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-08 00:42
The (buffer overflow) message indicates that the argument is too long to be converted. In your case, it seems that the path is longer than MAX_PATH=255 characters. What is the value of the argument of _getfullpathname()?
msg74508 - (view) Author: Jason Day (JDay) Date: 2008-10-08 01:21
Running help() or mktemp() causes _getfullpathname to be called with the whole system path (791 characters). If you pass that to _getfullpathname as str it throws the aforementioned TypeError. If it's passed as unicode, it returns an empty string. The offending _getfullpathname call occurs on the first call to one of these methods. Future calls to either do not call it (unless, of course, the first failed).
msg74510 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-10-08 02:23
According to http://msdn.microsoft.com/en-us/library/aa364963.aspx, current implementation have several problems. >In the ANSI version of this function, the name is limited to MAX_PATH >characters. To extend this limit to 32,767 wide characters, call the >Unicode version of the function and prepend "\\?\" to the path. :-( >If the lpBuffer buffer is too small to contain the path, the return >value is the size, in TCHARs, of the buffer that is required to hold >the path and the terminating null character. We should allocate new buffer with this size and retry like already doing for GetCurrentDirectory. And one decision problem... What should we do when too long str is passed to ntpath._getfullpathname? Report overflow error? Or convert to unicode and retry with GetFullPathNameW? Hmm....
msg74519 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-08 08:03
> Running help() or mktemp() causes _getfullpathname to be called > with the whole system path (791 characters) I am not sure to understand. Do you mean the whole PATH environment variable? I doubt that it is passed to _getfullpathname. Or do you have very long paths for one directory? the TEMP environment variable, for example? I'd be curious to see its value.
msg74522 - (view) Author: Jason Day (JDay) Date: 2008-10-08 09:07
> I am not sure to understand. Do you mean the whole PATH environment > variable? I doubt that it is passed to _getfullpathname. > Or do you have very long paths for one directory? the TEMP environment > variable, for example? I'd be curious to see its value. I don't have it offhand, but it was the whole PATH environment variable, complete with semicolons. That's probably the *real* bug. Whatever was passing that into abspath didn't seem to mind getting back an empty string (although that may have been further processed in the function, I didn't follow past the call to _getfullpathname). > And one decision problem... What should we do when too long str is > passed to ntpath._getfullpathname? Report overflow error? Or convert to > unicode and retry with GetFullPathNameW? Hmm.... abspath should be able to be called with str or unicode of arbitrary lengths. Consumers of it shouldn't have to be concerned with the platform implementation when it can be smoothed over by the module. Whether this is done in abspath or _getfullpathname probably isn't too important, since end-users generally shouldn't be calling _getfullpathname, directly.
msg74530 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-08 14:49
> I don't have it offhand, but it was the whole PATH environment > variable, complete with semicolons. That's probably the *real* bug. Indeed. Do you happen to have the complete traceback of the failing tempfile.mktemp() call? I don't see where it can use the PATH environment variable.
msg74549 - (view) Author: Jason Day (JDay) Date: 2008-10-08 22:00
> Indeed. Do you happen to have the complete traceback of the failing > tempfile.mktemp() call? I don't see where it can use the PATH > environment variable. The problem was that somehow, on our systems, the TEMP environmental variable had been copied over with PATH. Most likely some batch file tried to store a copy of PATH, without realizing the significance of TEMP. [groan] Anyway, I still think that it's a bug that abspath() can't be called with a perfectly good str path, because of limitations with the windows api. I edited the bug title to reflect the actual bug. The str path length could be checked and upgraded to the Unicode version, if necessary (or try again with the unicode version, in the case of an exception). I think it's important to ensure that when abspath() is called with str, it returns str, even if it was upgraded to the unicode call.
msg74593 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-10-09 18:37
I think attached patch "fix_getfullpathname.patch" will fix unicode issue at least. For ansi issue, I followed the manner of win32_chdir for now. After some investigation, GetFullPathNameA fails if output size is more than MAX_PATH even if input size is less than MAX_PATH. I feel it's difficult check this before invoking GetFullPathNameA. This is test for unicode issue. ///////////////////////////////////////////////////////// import unittest import ntpath import os class TestCase(unittest.TestCase): def test_getfullpathname(self): for count in xrange(1, 1000): name = u"x" * count path = ntpath._getfullpathname(name) self.assertEqual(os.path.basename(path), name) if __name__ == '__main__': unittest.main()
msg74594 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-10-09 18:39
>I feel it's difficult to check this before invoking GetFullPathNameA. And error number via GetLastError() is vogus, sometimes 0, sometimes others.
msg74607 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-10-09 21:33
Or, if PyArg_ParseTuple overflowed or GetFullPathNameA failed, (not check GetLastError() because it's vogus) try GetFullPathNameW like attached file "quick_hack_for_getfullpathname.patch". This inverses flow if (unicode_file_names()) { /* unicode */ } /* ascii */ # Maybe it would be nice if convert_to_unicode() functionality is built in PyArg_ParseTuple. (inverse of "et") Be care, this is quick hack, so maybe buggy. I confirmed test_os and test_ntpath passed though. ///////////////////////////////////////////////////// import unittest import ntpath import os class TestCase(unittest.TestCase): def test_getfullpathname(self): for c in ('x', u'x'): for count in xrange(1, 1000): name = c * count path = ntpath._getfullpathname(name) self.assertEqual(os.path.basename(path), name) if __name__ == '__main__': unittest.main()
msg75628 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-11-08 03:49
Fixed unicode issue in r67154(trunk). I'm not sure how to handle long str, so I didn't touch it for now.
msg77500 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-10 08:38
ocean-city, can you please backport this to the 2.5 branch?
msg77501 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-10 08:39
As a follow-up: please don't forget to add Misc/NEWS entries (both for the already-committed patch, as well as for the backport).
msg77523 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-12-10 09:28
>can you please backport this to the 2.5 branch? Sorry for clarification. Which should I backport r67154 or quick_hack_for_getfullpathname_v2.patch?
msg77737 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-13 14:53
As it is apparently not clear what change exactly needs to be applied to 2.5, I'm declaring that this fix will not be backported. I would appreciate if somebody could summarize what still needs to be done, or (if nothing needs to be done), close this issue.
msg77799 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-12-14 14:59
GetFullPathNameW may return the required buffer size (non-zero value) when buffer is too short. Before r67154, this case was treated as success, so there was possibility of access to uninitialized buffer woutbuf. Fortunately, GetFullPathNameW sets '\0' to woutbuf (this is undocumented behavior), so abspath() returned empty string instead of segmentation fault. But this is still potentially dangerous, so I fixed this by allocating required size buffer and calling GetFullPathNameW again. abspath() returns correct result for any long unicode path now. But original poster hopes abspath() should return correct result for any long both *str* and unicode path. For str, this issue is not solved yet. -- I'm skeptical abspath() should support longer path than MAX_PATH if ANSI version of Windows API cannot handle such path. Anyway, I won't use such long path. ;-)
msg175113 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-07 17:50
I doubt this issue exists on Python >= 3.2. See also .
msg220007 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-08 00:46
I don't see that any further work can be done here owing to limitations of the Windows str API. Note that the same argument can be applied to .
msg236972 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-03-01 18:29
Can we close this as I'm not aware of any possible way to fix this? See also .
msg237007 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-03-02 02:47
> Can we close this as I'm not aware of any possible way to fix this? Windows system and C runtime calls that take paths could be restricted to wide-character APIs, such as calling GetFullPathnameW in this case, or _wexecve instead of execve (issue 23462). Then for bytes paths an extension can call PyUnicode_FSDecoder (PyUnicode_DecodeMBCS). In posxmodule.c this can be handled in the path_converter function. path_converter https://hg.python.org/cpython/file/5d4b6a57d5fd/Modules/posixmodule.c#l698 path_converter could be moved to Python/fileutils.c to make it available for use by other modules such as io.
msg338059 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-03-16 07:32
I'm closing this as a resolved issue. Python 2 is approaching end of life, and I don't see a pressing need for my suggestion in . We should be using unicode for paths in Windows anyway since its file systems are natively Unicode.
History
Date User Action Args
2022-04-11 14:56:40 admin set github: 48321
2019-03-16 07:32:12 eryksun set status: open -> closedmessages: + stage: resolved
2019-03-15 22:34:26 BreamoreBoy set nosy: - BreamoreBoy
2015-03-02 02:47:22 eryksun set nosy: + eryksunmessages: +
2015-03-01 18:29:10 BreamoreBoy set nosy: + tim.golden, zach.ware, steve.dowermessages: +
2014-06-08 00:46:57 BreamoreBoy set nosy: + BreamoreBoymessages: +
2012-12-20 20:06:01 serhiy.storchaka set type: crash -> behavior
2012-11-07 17:50:36 serhiy.storchaka set nosy: + serhiy.storchakamessages: + versions: - Python 3.2, Python 3.3, Python 3.4
2012-10-02 05:54:06 ezio.melotti set versions: + Python 3.2, Python 3.3, Python 3.4, - Python 2.6, Python 3.0
2008-12-14 14:59:25 ocean-city set messages: +
2008-12-13 14:53:57 loewis set priority: release blocker -> normalmessages: + versions: - Python 2.5, Python 2.5.3
2008-12-10 09:28:38 ocean-city set messages: +
2008-12-10 08:39:28 loewis set messages: +
2008-12-10 08:38:38 loewis set priority: release blockerassignee: ocean-citymessages: + resolution: acceptednosy: + loewis
2008-11-08 03:49:44 ocean-city set messages: + versions: + Python 2.7, Python 2.5.3
2008-10-09 22:41:33 ocean-city set files: - quick_hack_for_getfullpathname.patch
2008-10-09 22:41:22 ocean-city set files: + quick_hack_for_getfullpathname_v2.patch
2008-10-09 21:33:24 ocean-city set files: + quick_hack_for_getfullpathname.patchmessages: +
2008-10-09 18:39:53 ocean-city set messages: +
2008-10-09 18:37:38 ocean-city set files: + fix_getfullpathname.patchkeywords: + patchmessages: + versions: + Python 2.6, Python 3.0
2008-10-09 07:02:03 JDay set title: ntpath.abspath can fail on Win Server 2008 (64-bit) -> ntpath.abspath fails for long str paths
2008-10-08 22:00:12 JDay set messages: + title: ntpath.abspath fails for long str paths -> ntpath.abspath can fail on Win Server 2008 (64-bit)
2008-10-08 21:57:09 JDay set title: ntpath.abspath can fail on Win Server 2008 (64-bit) -> ntpath.abspath fails for long str paths
2008-10-08 14:49:48 amaury.forgeotdarc set messages: +
2008-10-08 09:07:06 JDay set messages: +
2008-10-08 08:03:41 amaury.forgeotdarc set messages: +
2008-10-08 02:23:07 ocean-city set nosy: + ocean-citymessages: +
2008-10-08 01:21:30 JDay set messages: +
2008-10-08 00:42:55 amaury.forgeotdarc set nosy: + amaury.forgeotdarcmessages: +
2008-10-08 00:27:14 JDay create