[Python-Dev] os.path.walk() lacks 'depth first' option (original) (raw)

Noah Spurrier noah@noah.org
Tue, 22 Apr 2003 01:32:21 -0700


Tim>The callback can rename b and e, and change the contents of the fnames list Tim>to ["bx", "ex"] so that walk will find the renamed directories. Etc.

Ha! This is sweet, but I would call this solution "nonobvious". But perhaps it is a good argument for not modifying os.path.walk(), yet should a walktree generator be included in Python's future I hope that it will have the explicit option for postorder depth first.

Tim> Sorry, I'm unmovable on this point. My typical uses for this function do Tim> have to separates dirs from non-dirs, walk() has to make the distinction Tim> anyway (for its internal use), and it's expensive for the client to do the Tim> join() and isdir() bits all over again (isdir() is a filesystem op, and at Tim> least on my box repeated isdir() is overwhelmingly more costly than Tim> partitioning or joining a Python list).

I'm probably less adamant on this point than you :-) And you are right, it's cheaper for me to simply run through both lists than it would be to loop over a conditional based on isdir().

Tim> What about that worries you? I don't like it because I have some Tim> directories with many thousands of files, and stuffing a long redundant path Tim> at the start of each is wasteful in the abstract. I'm not sure it really Tim> matters, though -- e.g., 10K files in a directory * 200 redundant chars each Tim> = a measly 2 megabytes wasted .

That was also what bothered me ;-) I guess it's more of a habit than necessity.

Tim> Not all Python platforms have symlinks, of course. The traditional answer

True, but checking a file with os.path.islink() should be safe even on platforms without links -- if the docs are to be believed. The docs says that platforms that don't support links will always return False for islink(). The Python docs are a little inconsistent on links.

  1. os.path.islink(path) claims to be only check links on UNIX and always false if symbolic links are not supported.
  2. os.readlink(path) is only available on UNIX and is not defined on Windows.
  3. os.path.realpath(path) claims to be only available on UNIX, but it is actually defined and returns the given path if you call it on Windows.

Tim> I'm finding you too hard to follow here, because your use of "depthfirst" Tim> and "breadthfirst" doesn't follow normal usage of the terms. Here's normal

You are right. I will stop calling it Breadth First now. Feel free to dope slap me.

This confusion on my part was due to the apparent order when one prints the elements of the names list when the visit function is called. It would print B, C, D, E, F, G, H, I, J, K, but that's the parent printing the children, not the children printing themselves as they are visited. Oh... (a small, dim light clicks on.)

Still, walktree should have the option to hit the bottom of a branch and then process on it's way back up (post-order).

OK, how is this following version?

Yours, Noah

from future import generators # needed for Python 2.2 import os

def walktree (basepath=".", postorder=True, ignorelinks=True): """This walks a directory tree, starting from the basepath directory. This is somewhat like os.path.walk, but using generators instead of a visit function. One important difference is that walktree() defaults to postorder with optional preorder, whereas the os.path.walk function allows only preorder. Postorder was made the default because it is safer if you are going to be modifying the directory names you visit. This avoids the problem of renaming a directory before visiting the children of that directory.

 The ignorelinks option determines whether to follow symbolic links.
 Some symbolic links can lead to recursive traversal cycles.
 A better way would be to detect and prune cycles.
 """
 children = os.listdir(basepath)

 dirs, nondirs = [], []

 for name in children:
     fullpath = os.path.join (basepath, name)
     if os.path.isdir (fullpath) and not (ignorelinks and os.path.islink(fullpath)):
         dirs.append(name)
     else:
         nondirs.append(name)

 if not postorder:
     yield basepath, dirs, nondirs

 for name in dirs:
     for next_branch in walktree (os.path.join(basepath, name), postorder, ignorelinks):
         yield next_branch

 if postorder:
     yield basepath, dirs, nondirs

def test(): for basepath, dirs, nondirs in walktree(): for name in dirs: print os.path.join(basepath, name) for name in nondirs: print os.path.join(basepath, name)

if name == 'main': test()