[Python-3000] Removal of os.path.walk

Guido van Rossum guido at python.org
Thu May 1 01:02:31 CEST 2008


There is one use case I can see for an iterator version of os.listdir() (to be named os.opendir()): when globbing a huge directory looking for a certain pattern. Using os.listdir() you end up needing enough memory to hold all of the names at once. Using os.opendir() you would need only enough memory to hold all of the names THAT MATCH.
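
A minimal sketch of that use case follows. The os.opendir() proposed above is hypothetical and does not exist under that name; the sketch assumes os.scandir() (added to the standard library later, in Python 3.5, with context-manager support from 3.6) as a stand-in iterator over directory entries. The helper name iter_matching_names is made up for illustration.

```python
import fnmatch
import os


def iter_matching_names(directory, pattern):
    """Yield the names in `directory` that match a shell-style `pattern`.

    Hypothetical helper: os.scandir() stands in for the proposed
    os.opendir(), which was never added under that name.
    """
    # os.scandir() produces entries lazily, so the full listing is never
    # built as a Python list; only matching names reach the caller.
    with os.scandir(directory) as entries:
        for entry in entries:
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name
```

A caller that wants the matches as a list would write list(iter_matching_names(".", "*.log")); memory use then scales with the number of matches rather than with the size of the directory.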

On Wed, Apr 30, 2008 at 3:50 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> There's a big difference between "not enough memory" and "directory
> consumes lots of memory". My company has some directories with several
> hundred thousand entries, so using an iterator would be appreciated
> (although by the time we upgrade to Python 3.x, we probably will have
> fixed that architecture).
>
> But even then, we're talking tens of megabytes at worst, so it's not a
> killer -- just painful.

But what kind of operation do you want to perform on that directory? I would expect that usually, you either

a) refer to a single file, which you are either going to create, or want to process. In that case, you know the name in advance, so you open/stat/mkdir/unlink/rmdir the file, without caring how many files exist in the directory, or

b) need to process all files, to count/sum/backup/remove them; in this case, you will need the entire list in the process, and reading them one-by-one is likely going to slow down the entire operation, instead of speeding it up.

So in no case, you actually need to read the entries incrementally.

That the C APIs provide chunk-wise processing is just because dynamic memory management is so painful to write in C that the caller is just asked to pass a limited-size output buffer, which then gets refilled in subsequent read calls. Originally, the APIs would return a single entry at a time from the file system, which was super-slow. Today, SysV all-singing all-dancing getdents provides multiple entries at a time, for performance reasons.

Regards,
Martin



--
--Guido van Rossum (home page: http://www.python.org/~guido/)


