Issue 570300: inspect.getmodule symlink-related failure

Created on 2002-06-18 01:24 by amitar, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name                    Uploaded                  Description
test_res                     amitar, 2004-07-12 22:08  still exists in 2.4a1
test_pydoc_func.bash         amitar, 2004-08-29 13:02  script for demonstrating the problem
fileid_patch_040829.cvsdiff  amitar, 2004-08-29 13:05  patch that should fix the problem
fileid_patch_040906.cvsdiff  amitar, 2004-09-06 21:14
Messages (12)
msg11232 - Author: Amit Aronovitch (amitar) Date: 2002-06-18 01:24
news:ae3e29$pib$1@news.netvision.net.il

Description:
--------------
On a Unix Python 2.2.1 installation I noticed that the documentation generated for modules by pydoc (in any mode - even the help command) did NOT contain any docs for functions. After some digging, I found the reason for that, and I now believe it indicates a deeper problem with the "inspect" module concerning file identification in the presence of symbolic or hard links. I explain the problem below and also suggest solutions.

Analysis:
-----------
The functions were dropped from the doc because pydoc tries to remove functions that were imported from other modules. This is done by something like "inspect.getmodule(my_func) is my_module". I found out that inspect.getmodule() returned "None" for these functions!

Now, inspect.getmodule works by getting the function's filename and then looking it up in a dictionary containing the filenames of all the modules that were loaded ("modulesbyfile"). Unfortunately, the filename that getabsfile() returns for the function is not the same STRING as the one it returns for the module, but rather an equivalent Unix path pointing to the same FILE. The reason is that the filename for a function is extracted from the code object, which holds the path the module was referred to by at the time it was COMPILED to .pyc, whereas the one for the module is taken from its __file__, which holds the path it was referred to by when it was IMPORTED. These two might differ even if it's the same file. So the function's file is not found in the dictionary, and getmodule() returns None...

Discussion:
--------------
We see that the root cause of the problem is that "inspect" uses the "absolute path" (os.path.abspath()) to represent the file's identity. On Unix systems this might cause a problem, since this string is NOT unique (each string names exactly one file, but different paths may refer to the same file).

If we only considered symbolic links, this could be resolved by scanning the path elements and "unfolding" any symlinks, but we must recall that Unix also has "hard links", which are equivalent references to the same inode and can't be told apart. So, if we want to resolve the problem in a portable way, we need an immutable (platform-dependent) object that is unique to a FILE. This object could then be used for comparing files and as a key for dictionaries. A reasonable way to get it would be by means of a new function in the os module, e.g.:

    id = os.get_fileid(filename)

    def samefile(f1, f2):
        return os.get_fileid(f1) == os.get_fileid(f2)

This function could be implemented by the inode number (os.stat(f).st_ino) on Unix systems, and by the absolute path (os.path.abspath) on systems which do not support links (such as Windows), or by anything else, as long as it is immutable and unique for each file.

Please let me know your opinion about this suggestion,
Amit Aronovitch
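The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: `get_fileid` is the hypothetical name from the message (it is not a real `os` function), the POSIX branch pairs `st_ino` with `st_dev` because inode numbers are only unique per device, and the fallback branch simply normalizes the absolute path as the message proposes for systems without links.

```python
import os
import tempfile


def get_fileid(path):
    """Return a hashable value identifying the file behind `path`.

    Hypothetical helper, named after the proposal; not part of the os module.
    """
    if os.name == "posix":
        st = os.stat(path)  # os.stat follows symbolic links
        return (st.st_dev, st.st_ino)
    # Assumed fallback for platforms treated as having no links.
    return os.path.normcase(os.path.abspath(path))


def samefile(f1, f2):
    """Compare two paths by file identity rather than by path string."""
    return get_fileid(f1) == get_fileid(f2)


def demo():
    """Reach one file through a real directory and through a symlink to it."""
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = os.path.join(tmp, "test")
        link_dir = os.path.join(tmp, "test2")
        os.mkdir(real_dir)
        os.symlink(real_dir, link_dir)
        via_real = os.path.join(real_dir, "test_mod.py")
        via_link = os.path.join(link_dir, "test_mod.py")
        with open(via_real, "w") as fh:
            fh.write("def blah(): pass\n")
        # A dictionary keyed by file identity instead of by path string.
        modules_by_fileid = {get_fileid(via_link): "test_mod"}
        return (
            os.path.abspath(via_real) == os.path.abspath(via_link),
            samefile(via_real, via_link),
            modules_by_fileid.get(get_fileid(via_real)),
        )
```

On a POSIX system, `demo()` compares the two absolute path strings, compares the two file identities, and then looks the module up by identity using the path that was not used as the key.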
msg11233 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2003-05-17 02:34
Just tested under 2.2.2 and 2.3b1 using a module containing just::

    def blah():
        """Hello"""
        pass

Ran ``help(test_mod)`` and had it spit out a FUNCTIONS section with the name of the function and its docstring. Am I missing something here?
msg11234 - Author: Amit Aronovitch (amitar) Date: 2003-05-18 20:10
Sorry - seems like I forgot the most basic step in problem reporting - the "howtorepeat" :-) - so here it comes:

How to repeat:
----------------------
(as I said - you need Unix & symlinks to see this happening):

    ~> mkdir test
    ~> setenv PYTHONPATH ~/test
    ~> cat >test/test_mod.py
    "module doc"
    def blah():
        "hello"
        pass
    ^D
    ~> python
    >>> import test_mod
    >>> help(test_mod)
    >>> ^D

[ Prints help - so far so good - no problem - but see now ]

    ~> ln -s test test2
    ~> setenv PYTHONPATH test2
    ~> python
    >>> import test_mod
    >>> help(test_mod)

[ Now the help shows up without the help of the blah function ]

Relating the example to my explanations above:
------------------------------------------------------------------------
The help of the blah() function is filtered out because "inspect" takes "~/test/test_mod.pyc" as its filename and "~/test2/test_mod.pyc" as the module's filename. It can't tell that these are the same file (see details in my "Analysis" section above).

True, this messing about with symlinks and PYTHONPATH is a bit ugly, but it is just to demonstrate the problem. The system where I noticed it is quite complex, with disks shared (automounted) across several platforms, and it needs a few symlinks to make things easier to maintain. As I explained, I think that a few small changes in modules such as "inspect" and "os" can make them identify files better in the presence of links.
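For anyone who wants to rerun these steps without typing them by hand, here is a rough Python sketch of the same sequence. It is an assumption-laden illustration, not the attached script: the helper names `run_probe` and `reproduce` are made up, the directories live in a temporary folder, and the probe prints the module's `__file__`, the function's `co_filename`, and whether `inspect.getmodule` maps the function back to its module. What it prints depends on the Python version in use, so no particular output is claimed here.

```python
import os
import subprocess
import sys
import tempfile

# Code run in a fresh interpreter for each PYTHONPATH setting.
PROBE = (
    "import inspect, test_mod\n"
    "print(test_mod.__file__)\n"
    "print(test_mod.blah.__code__.co_filename)\n"
    "print(inspect.getmodule(test_mod.blah) is test_mod)\n"
)


def run_probe(pythonpath, cwd):
    """Run PROBE in a child interpreter with the given PYTHONPATH."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    proc = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env, cwd=cwd, capture_output=True, text=True, check=True,
    )
    return proc.stdout.splitlines()


def reproduce():
    """Import via the real directory first, then via a symlink to it."""
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = os.path.join(tmp, "test")
        link_dir = os.path.join(tmp, "test2")
        os.mkdir(real_dir)
        with open(os.path.join(real_dir, "test_mod.py"), "w") as fh:
            fh.write('"module doc"\ndef blah():\n    "hello"\n    pass\n')
        first = run_probe(real_dir, tmp)   # compiles and caches bytecode
        os.symlink(real_dir, link_dir)
        second = run_probe(link_dir, tmp)  # same file, different path
        return first, second
```

Calling `reproduce()` returns the three printed lines for each run, which makes it easy to compare the recorded paths side by side.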
msg11235 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2004-07-10 18:23
Well, looks like this problem has gone away, at least in 2.4. Closing as outdated.
msg11236 - Author: Amit Aronovitch (amitar) Date: 2004-07-12 22:08
In my experience, problems don't just "go away" by themselves. Someone needs to actually fix them. So, I tested on 2.4a - and results are EXACTLY THE SAME (attached printout). It seems that no-one got to actually READ this lengthy description, so I'll have to send patches. Sorry I did not do that already, and sorry again but it seems I'm not going to get to that soon enough. I'll try to get it done by the end of July.
msg11237 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2004-07-13 00:26
Well, when a bug gets old and you don't have a test to make sure it has been fixed, yes, things just do "go away" in the stdlib. The amount of code change in the stdlib can easily lead to some other bug being fixed. And I did read it. But when the problem stopped presenting itself to me (and I don't know why; I spent a good amount of time on this on the July 10 Bug Day) I figured it was gone. If I can't reproduce it I can't try to fix it. But if you can come up with a patch to fix this feel free to assign it to me.
msg11238 - Author: Johannes Gijsbers (jlgijsbers) (Python triager) Date: 2004-08-13 17:08
I can reproduce the problem using the steps outlined below. Replacing the line (it's not even worth creating a new patch item):

    modulesbyfile[getabsfile(module)] = module.__name__

with

    modulesbyfile[os.path.realpath(getabsfile(module))] = module.__name__

fixes the problem.
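To see why this one-line change helps in the symlink case, here is a small standalone sketch. It does not use inspect itself: `build_index` and `lookup` are hypothetical stand-ins for the `modulesbyfile` dictionary, and, unlike the quoted change, the sketch normalizes both the stored key and the looked-up path, which is a slight generalization made here for robustness.

```python
import os
import tempfile


def build_index(module_files, normalize):
    """Map normalized file path -> module name, like modulesbyfile."""
    return {normalize(path): name for name, path in module_files.items()}


def lookup(index, path, normalize):
    """Return the module name recorded for `path`, or None."""
    return index.get(normalize(path))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = os.path.join(tmp, "test")
        link_dir = os.path.join(tmp, "test2")
        os.mkdir(real_dir)
        os.symlink(real_dir, link_dir)
        # Path recorded in the code object when the module was compiled.
        compiled_as = os.path.join(real_dir, "test_mod.py")
        with open(compiled_as, "w") as fh:
            fh.write("def blah(): pass\n")
        # Path recorded in __file__ when the module was imported.
        imported_as = os.path.join(link_dir, "test_mod.py")
        modules = {"test_mod": imported_as}

        by_abspath = lookup(
            build_index(modules, os.path.abspath), compiled_as, os.path.abspath)
        by_realpath = lookup(
            build_index(modules, os.path.realpath), compiled_as, os.path.realpath)
        return by_abspath, by_realpath
```

With `os.path.abspath` as the normalizer the lookup misses because the two strings differ, while `os.path.realpath` resolves the symlinked directory so both paths collapse to the same key.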
msg11239 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2004-08-13 18:47
Checked in as rev. 1.52 for 'inspect'. Not going to backport since it is a semantic change. Thanks for the patch, Johannes.
msg11240 - Author: Amit Aronovitch (amitar) Date: 2004-08-29 13:02
Pls see attached files.

Note that jlgijsbers' patch does not resolve the full scope of the problem as described in my original post (see the "discussion" part) - namely: it only works for symbolic links.

bcannon (re 12/7 msg): Sorry. I wrote the long explanations in the hope that they would save you time, but it seems they were not clear enough. To avoid trouble repeating the problem, this time I'll provide a shell script for testing it. Also provided is a proposed patch (against the cvs snapshot from 29 Aug 2004).

About the patch
-----------------
I added a "fileid" function to os.path (as suggested in my original post). This means macpath, os2emxpath and ntpath had to be touched as well as posixpath. (libposixpath.tex would also need an update if you decide to adopt this patch.)

Question about inspect.getabsfile
-----------------------------------
I'm not sure if this function is meant to be an "internal use" or an "interface" function. It does not appear in the module's documentation (libinspect.tex), but the pydoc module still uses it (and as far as I could see, it's the only module that uses it). After my patch, getabsfile is not used internally by inspect anymore, so it should be deleted if it is "internal use". The use of this function in pydoc is for human-readable output, so I don't think it's really necessary there (I think there's no need to do "normcase" there).

tks for yr attention
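The point that a realpath-based key only covers symbolic links can be checked with a short sketch like the one below. It uses only existing standard-library calls (`os.symlink`, `os.link`, `os.path.realpath`, `os.path.samefile`); the function name `compare_links` and the file names are made up for the illustration, and it assumes a filesystem that supports both kinds of link.

```python
import os
import tempfile


def compare_links():
    """Compare path-based and identity-based checks for both link types."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "test_mod.py")
        with open(original, "w") as fh:
            fh.write('"module doc"\n')
        soft = os.path.join(tmp, "soft_mod.py")
        hard = os.path.join(tmp, "hard_mod.py")
        os.symlink(original, soft)  # symbolic link: a path pointing at a path
        os.link(original, hard)     # hard link: a second name for the inode

        results = {}
        for label, path in (("symlink", soft), ("hardlink", hard)):
            results[label] = {
                # True only if resolving symlinks yields the same string.
                "realpath_match":
                    os.path.realpath(path) == os.path.realpath(original),
                # Compares the device and inode numbers reported by os.stat.
                "samefile": os.path.samefile(path, original),
            }
        return results
```

For the symbolic link both checks agree, but for the hard link there is nothing for `realpath` to resolve, so only the identity-based `samefile` check recognizes the two names as one file.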
msg11241 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2004-09-05 19:53
Reassigning to Johannes since he has checkin rights now. =) But honestly I think this bug is just not worth the hassle. This is only an issue if you are futzing with your file system in an uncommon way. 'help' is just for quick checks and thus if it doesn't work for *every* situation it isn't going to be the end of the world. Plus I don't like how the patch touches so many files with the same chunk of code.
msg11242 - Author: Amit Aronovitch (amitar) Date: 2004-09-06 21:14
You're the maintainers. I'll have to live with whatever you decide, but I still feel I must bring my knowledge/arguments to your attention.

Basically, I believe that providing an os.path.fileid function is more "correct" conceptually than 'realpath' (have a look at the implementation of the 'samefile' function in the pre-my-patch version of posixpath.py - shouldn't this be generalizable?), so it would probably be more resistant to future bugs. (The specific problem that made me notice this bug would probably be solved by Johannes' patch, but problems could appear elsewhere for other people.)

re: "... filesystem in an uncommon way" -
a) You may still want to use Python on systems where you don't have much control over the way your sysadmin organises the filesystem.
b) Some systems have complex multi-platform network filesystems. Different platforms share the cross-platform files of the installation. It becomes necessary to use links to keep it manageable (though I believe symlinks should almost always be enough).

re: "'help' is just..." - Well, as I said before, the problem is really with *inspect*. This is a rather general-purpose module, and bad things might happen if you get this unexpected 'None' output from inspect.getmodule.

re: "... the patch touches so many files" - This is the nature of the current os.path implementation. The common interface is reimplemented for each platform. For example, the "realpath" function has a default no-op implementation, which is repeated in many of the *path.py modules (p.s., I believe the 'realpath' function was added because people actually needed something like my fileid function, but missed the generalization - a unique ID does not always have to be a PATH). I could have provided a default in the common os.py code, but decided this kind of implementation would be inconsistent with the current module's style. If you think otherwise, I can provide such an alternative.

p.s. - I would appreciate a reply about my 'inspect.getabsfile' question - I believe functions should either be documented or used only internally (otherwise you can't change the internal implementation without breaking external code).

p.s. 2 - I'm uploading an up-to-date diff.

p.s. 3 - If you think this kind of conversation is inappropriate for the bts - pls let me know - you can use personal email.

tks
msg11243 - Author: Johannes Gijsbers (jlgijsbers) (Python triager) Date: 2004-09-11 16:07
Gee Brett, thanks! I'm just going to agree with you though, the patch is too much trouble. I've made one small change (use __module__ for functions/methods too) which ensures there's only a problem when you're both using multiple hard links to one file (rather unlikely) *and* you're calling inspect.getmodule on a traceback, frame or code object (not all that likely either).
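The idea behind that last change can be illustrated with a minimal sketch. `module_of` is a hypothetical helper written for this note, not the code that was checked in: it resolves an object's module through its `__module__` attribute and `sys.modules`, which sidesteps file paths entirely, and returns None for objects that carry no `__module__` (such as code objects), where a real implementation would have to fall back to a file-based lookup.

```python
import os.path
import sys


def module_of(obj):
    """Return the module named by obj.__module__, or None if unavailable.

    Hypothetical helper for illustration; no file paths are consulted.
    """
    name = getattr(obj, "__module__", None)
    if name is not None:
        return sys.modules.get(name)
    # Code objects, frames and tracebacks have no __module__ attribute,
    # so a path-based lookup would still be needed for them.
    return None


def demo():
    func = os.path.join
    return module_of(func) is os.path, module_of(func.__code__)
```

Here `demo()` checks a standard-library function against the module it was defined in, then shows that the function's code object cannot be resolved this way.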
History
Date                 User    Action  Args
2022-04-10 16:05:25  admin   set     github: 36762
2002-06-18 01:24:15  amitar  create