Issue 12801: C realpath not used by os.path.realpath (original) (raw)

Issue12801

Created on 2011-08-20 23:47 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
i12801.patch rosslagerwall,2011-08-21 05:04
realpath.path vstinner,2012-02-21 23:51 review
Messages (20)
msg142583 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-20 23:47
We have our own quirky implementation of C realpath() in posixpath.py. It would probably be simpler, safer and more efficient to simply call the POSIX function instead (but it most be exposed somewhere, by posixmodule.c I suppose).
msg142587 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-20 23:57
The Python implementation (os.path.realpath) was introduced 10 years ago by the issue #461781. The Python implemtation has known bugs: - #9949: os.path.realpath on Windows does not follow symbolic links - #11397: os.path.realpath() may produce incorrect results See also the issue #1298813: _Py_wrealpath() must use canonicalize_file_name() if available.
msg142591 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-08-21 05:04
An initial patch. The first problem is that before, os.path.realpath('') == os.path.realpath('.') but this is not true for realpath(3). Thus, the test suite does not run because it relies on this behaviour.
msg142619 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-08-21 13:53
+ Return a string representing the canonicalized pathname of *path*. This sounds a bit strange to me; “Return a string representing the canonical version of *path*”. Would it be a good idea to use the full docstring of posixpath.realpath to os.realpath and then just do “realpath = os.realpath” in posixpath?
msg142909 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-08-24 20:11
> It would probably be simpler, safer and more efficient to simply call > the POSIX function instead (but it most be exposed somewhere, by > posixmodule.c I suppose). Indeed. > Would it be a good idea to use the full docstring of posixpath.realpath > to os.realpath and then just do “realpath = os.realpath” in posixpath? Yes. That would also be a way to add the '.' default argument. > An initial patch. I'm actually concerned because realpath doesn't seem as portable as I first thought. Here's what my man page says: """ OSIX.1-2001 says that the behavior if resolved_path is NULL is implementation-defined. POSIX.1-2008 specifies the behavior described in this page. """ So I'm not sure that all implementations support passing a NULL buffer as argument. Also: """ Solaris may return a relative pathname when the path argument is relative. """ Well, I guess we can also commit this patch and see if a buildbot breaks...
msg142913 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-24 20:18
i12801.patch is not correct: on Windows, you should never encode a filename to bytes. Use PyUnicode_Decode() + _Py_wrealpath(), or a #ifdef (as many other posixmodule functions). I prefer to reuse _Py_wrealpath because we will have to add special cases: realpath(NULL) and also maybe canonicalize_file_name().
msg142925 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-24 21:22
Patch to get #ifdef REALPATH_SUPPORT_NULL: diff --git a/configure.in b/configure.in --- a/configure.in +++ b/configure.in @@ -1539,6 +1539,17 @@ if test "$have_pthread_t" = yes ; then #endif ]) fi + +AC_MSG_CHECKING(for realpath with NULL) +realpath_support_null=no +AC_COMPILE_IFELSE([ + AC_LANG_PROGRAM([[#include <stdlib.h>]], [[char *p = realpath(__FILE__, NULL);]]) +],[realpath_support_null=yes],[]) +AC_MSG_RESULT($realpath_support_null) +if test "$have_pthread_t" = yes ; then + AC_DEFINE(REALPATH_SUPPORT_NULL, 1, + [Define if the realpath() function supports NULL as the second argument]) +fi CC="$ac_save_cc" AC_SUBST(OTHER_LIBTOOL_OPT)
msg142929 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-24 21:28
> I prefer to reuse _Py_wrealpath because we will have > to add special cases: realpath(NULL) and also maybe > canonicalize_file_name(). I read that canonicalize_file_name(name) just calls realpath(name, NULL), so we can maybe avoid this GNU exception. I realized that _Py_wrealpath() returns wchar_t*, so it's maybe better to take the first approach (#ifdef MS_WINDOWS).
msg142932 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-08-24 21:44
> Patch to get #ifdef REALPATH_SUPPORT_NULL: I'm not really familiar with autotools, but I have the impression that this will only check that the given code snippet compiles (and it will), and we want to check that a NULL buffer is supported at runtime. Am I missing something?
msg142943 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-24 22:59
Le mercredi 24 août 2011 23:45:00, vous avez écrit : > Charles-François Natali <neologix@free.fr> added the comment: > > Patch to get #ifdef REALPATH_SUPPORT_NULL: > I'm not really familiar with autotools, but I have the impression that > this will only check that the given code snippet compiles (and it > will), and we want to check that a NULL buffer is supported at > runtime. > Am I missing something? Oh, you are right. I don't know autotools, I'm trying to learn.
msg143152 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-29 16:07
Well, if we use two different paths based on the libc version, it might not be a good idea, since behaviour can be different in some cases. It would be nice to know if some modern platforms have a non-compliant realpath().
msg143163 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-08-29 17:22
> Well, if we use two different paths based on the libc version, it might not be a good idea, since behaviour can be different in some cases. Indeed. > It would be nice to know if some modern platforms have a non-compliant realpath(). Alas, it doesn't seem to hold for OpenBSD: http://old.nabble.com/Make-realpath(3)-conform-to-SUSv4-td32031895.html A patch supporting NULL was committed two months ago, which means we probably can't push this forward. I've been quite disappointed by POSIX lately...
msg143164 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-29 17:24
> Alas, it doesn't seem to hold for OpenBSD: > http://old.nabble.com/Make-realpath(3)-conform-to-SUSv4-td32031895.html > > A patch supporting NULL was committed two months ago, which means we > probably can't push this forward. > > I've been quite disappointed by POSIX lately... :-)
msg143165 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-08-29 17:25
> I've been quite disappointed by POSIX lately... POSIX the standard, or the implementers??
msg143167 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-08-29 17:55
> POSIX the standard, or the implementers?? > Both :-) For those wondering why we can't use PATH_MAX (ignoring the buffer overallocation), here's why: https://www.securecoding.cert.org/confluence/display/cplusplus/FIO02-CPP.+Canonicalize+path+names+originating+from+untrusted+sources """ Avoid using this function. It is broken by design since (unless using the non-standard resolved_path == NULL feature) it is impossible to determine a suitable size for the output buffer, resolved_path. According to POSIX a buffer of size PATH_MAX suffices, but PATH_MAX need not be a defined constant, and may have to be obtained using pathconf(3). And asking pathconf(3) does not really help, since on the one hand POSIX warns that the result of pathconf(3) may be huge and unsuitable for mallocing memory. And on the other hand pathconf(3) may return -1 to signify that PATH_MAX is not bounded. The libc4 and libc5 implementation contains a buffer overflow (fixed in libc-5.4.13). As a result, set-user-ID programs like mount(8) need a private version."""
msg146640 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-30 14:57
Another problem with the POSIX realpath(3): it can fail with ENOENT if the target path doesn't exist, whereas the current Python implementation "succeeds" even if a component in the path doesn't exist. Note the quotes around "succeeds", because I'm not sure whether this behavior is a good thing: the result is just plain wrong. I wonder whether we should break backwards compatibility on this particular point...
msg153916 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-02-21 23:51
realpath.patch: - Add os.realpath() - posixpath.realpath() uses os.realpath(), or os.getcwd() if the path is an empty string - os.realpath() uses canonicalize_file_name() if available, or use realpath() with a buffer of MAXPATHLEN bytes Don't use realpath(path, NULL) because it is not possible to test if realpath() supports NULL: OpenBSD did segfault if the buffer was NULL. Testing the support in configure is not a good idea because it depends on the version of the C library. You may use a more recent C library to compile Python for example. genericpath, macpath and os2emxpath are not patched to use os.realpath(), even if it is available. I don't know if these modules should be patched. ntpath uses the native Windows function (if available): GetFullPathNameW().
msg154247 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-02-25 13:08
> - os.realpath() uses canonicalize_file_name() if available, or use realpath() with a buffer of MAXPATHLEN bytes MAXPATHLEN is not necessarily defined (e.g. on the Hurd): if it's not defined, it is set either to MAX_PATH (if it's defined an greater than 1024), or arbitrarily to 1024. Thus, we can't pass it to realpath(3), since we don't know for sure that realpath(3) won't return a string greater than MAXPATHLEN, which could result in a buffer overflow. Also, there's the problem that realpath(3) returns ENOENT, whereas os.path.realpath() doesn't: without patch: $ python -c "import os; os.path.realpath('/nosuchfile')" with patch: $ ./python -c "import os; os.path.realpath('/nosuchfile')" Traceback (most recent call last): File "", line 1, in File "/home/cf/python/cpython/Lib/posixpath.py", line 385, in realpath return os.realpath(filename) FileNotFoundError: [Errno 2] No such file or directory: '/nosuchfile'
msg223878 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-24 19:42
Shall we close this, since realpath(3) is fundamentally broken, and pathlib now does The Right Thing?
msg223883 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-24 19:53
I agree, Python should not use the C function realpath() for all the reasons give in the issue.
History
Date User Action Args
2022-04-11 14:57:20 admin set github: 57010
2014-07-24 19:53:55 vstinner set status: open -> closedresolution: not a bugmessages: +
2014-07-24 19:42:57 neologix set messages: +
2012-02-26 08:07:21 Arfrever set nosy: + Arfrever
2012-02-25 13:08:13 neologix set messages: +
2012-02-21 23:51:28 vstinner set files: + realpath.pathmessages: +
2011-10-30 14:57:10 neologix set messages: +
2011-08-29 17:55:44 neologix set messages: +
2011-08-29 17:25:21 rosslagerwall set messages: +
2011-08-29 17:24:01 pitrou set messages: +
2011-08-29 17:22:18 neologix set messages: +
2011-08-29 16:07:12 pitrou set messages: +
2011-08-29 15:41:40 matejcik set nosy: + matejcik
2011-08-24 22:59:21 vstinner set messages: +
2011-08-24 21:45:00 neologix set messages: +
2011-08-24 21:31:04 santoso.wijaya set nosy: + santoso.wijaya
2011-08-24 21:28:33 vstinner set messages: +
2011-08-24 21:22:15 vstinner set messages: +
2011-08-24 20🔞44 vstinner set messages: +
2011-08-24 20:11:58 neologix set messages: +
2011-08-21 13:53:39 eric.araujo set nosy: + eric.araujomessages: +
2011-08-21 05:04:04 rosslagerwall set files: + i12801.patchkeywords: + patchmessages: +
2011-08-20 23:57:43 vstinner set messages: +
2011-08-20 23:47:30 pitrou create