[Python-Dev] A smarter shutil.copytree ?

Tarek Ziadé ziade.tarek at gmail.com
Mon Apr 21 09:48:47 CEST 2008


The pattern matching uses src_dir to call glob.glob(), which returns the list of files to be excluded. That's why I added it within the copytree() function.

To make an excluding_patterns function work, it could be coded like this::

import glob
import os

def excluding_patterns(*patterns):
    def _excluding_patterns(filepath):
        exclude_files = []
        # patterns are resolved relative to the directory holding filepath
        dir_ = os.path.dirname(filepath)
        for pattern in patterns:
            pattern = os.path.join(dir_, pattern)
            exclude_files.extend(glob.glob(pattern))
        return filepath in exclude_files
    return _excluding_patterns
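
A minimal, self-contained sketch of this per-file approach, assuming a hypothetical, simplified copytree_with_predicate() (no symlink or error handling) and a hypothetical demo() that builds a small tree in a temporary directory. The predicate is asked about every entry, and each time it re-runs glob.glob() for every pattern:

```python
import glob
import os
import shutil
import tempfile


def excluding_patterns(*patterns):
    """Return a predicate telling whether a path matches any glob pattern."""
    def _excluding_patterns(filepath):
        dir_ = os.path.dirname(filepath)
        exclude_files = []
        for pattern in patterns:
            # glob.glob() goes to the filesystem on every call.
            exclude_files.extend(glob.glob(os.path.join(dir_, pattern)))
        return filepath in exclude_files
    return _excluding_patterns


def copytree_with_predicate(src, dst, exclude=None):
    """Hypothetical simplified copytree: no symlink or error handling."""
    os.makedirs(dst)
    for name in os.listdir(src):
        srcname = os.path.join(src, name)
        if exclude is not None and exclude(srcname):
            continue  # the predicate is asked once per entry
        dstname = os.path.join(dst, name)
        if os.path.isdir(srcname):
            copytree_with_predicate(srcname, dstname, exclude)
        else:
            shutil.copy2(srcname, dstname)


def demo():
    """Build a small tree in a temp dir, copy it, return the copied names."""
    root = tempfile.mkdtemp()
    src = os.path.join(root, "src")
    os.makedirs(os.path.join(src, "test_dir2"))
    for name in ("keep.py", "junk.tmp", os.path.join("test_dir2", "a.py")):
        with open(os.path.join(src, name), "w") as f:
            f.write("x")
    dst = os.path.join(root, "dst")
    copytree_with_predicate(src, dst, excluding_patterns("*.tmp", "test_dir2"))
    return sorted(os.listdir(dst))


if __name__ == "__main__":
    print(demo())  # ['keep.py']
```

Both the *.tmp file and the test_dir2 directory are skipped, because glob.glob() returns paths built the same way as srcname.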

But I can see a performance issue: the glob function would be called inside the loop, once for each file or folder being tested::

def copytree(src, dst, exclude):
    ...
    for name in names:
        srcname = os.path.join(src, name)
        if exclude(srcname):
            continue
        ...
    ...
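
To make the cost concrete, a small sketch that counts glob calls. It assumes a hypothetical counting_glob() wrapper around glob.glob() and a hypothetical count_glob_calls() driver that mimics the per-entry loop above; with P patterns and N entries in a directory, the predicate triggers N * P glob calls, each of which touches the filesystem:

```python
import glob
import os
import tempfile

glob_calls = 0  # instrumentation for this sketch only


def counting_glob(pattern):
    """Wrap glob.glob() so we can see how often it runs."""
    global glob_calls
    glob_calls += 1
    return glob.glob(pattern)


def excluding_patterns(*patterns):
    def _excluding_patterns(filepath):
        dir_ = os.path.dirname(filepath)
        exclude_files = []
        for pattern in patterns:
            exclude_files.extend(counting_glob(os.path.join(dir_, pattern)))
        return filepath in exclude_files
    return _excluding_patterns


def count_glob_calls(n_files, patterns):
    """Run the per-entry loop from copytree() over n_files entries."""
    global glob_calls
    glob_calls = 0
    src = tempfile.mkdtemp()
    for i in range(n_files):
        open(os.path.join(src, "f%d.txt" % i), "w").close()
    exclude = excluding_patterns(*patterns)
    for name in os.listdir(src):
        exclude(os.path.join(src, name))
    return glob_calls


if __name__ == "__main__":
    print(count_glob_calls(10, ["*.tmp", "*.bak"]))  # 20
```

With 10 files and 2 patterns the loop performs 20 glob calls, where globbing once per directory would need only 2.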

Calling it once at the beginning of the copytree function would be better for performance, but it means that the callable has to return a list of matching files instead of the match result itself::

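import glob
import os

def excluding_patterns(*patterns):
    def _excluding_patterns(path):
        # path is the directory being copied; glob each pattern once in it
        exclude_files = []
        for pattern in patterns:
            pattern = os.path.join(path, pattern)
            exclude_files.extend(glob.glob(pattern))
        return exclude_files
    return _excluding_patterns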

Then in copytree::

def copytree(src, dst, exclude):
    ...
    excluded = exclude(src)
    ...
    for name in names:
        srcname = os.path.join(src, name)
        if srcname in excluded:
            continue
        ...
    ...
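
A runnable sketch of this list-returning variant, assuming a hypothetical, simplified copytree_with_list() and demo(). The callable is invoked once per directory (including each recursive call), and its result is turned into a set so each membership test is cheap:

```python
import glob
import os
import shutil
import tempfile


def excluding_patterns(*patterns):
    """Return a callable mapping a directory to the paths it should skip."""
    def _excluding_patterns(path):
        exclude_files = []
        for pattern in patterns:
            exclude_files.extend(glob.glob(os.path.join(path, pattern)))
        return exclude_files
    return _excluding_patterns


def copytree_with_list(src, dst, exclude=None):
    """Hypothetical simplified copytree: no symlink or error handling."""
    # One call per directory; a set makes each membership test O(1).
    excluded = set(exclude(src)) if exclude is not None else set()
    os.makedirs(dst)
    for name in os.listdir(src):
        srcname = os.path.join(src, name)
        if srcname in excluded:
            continue
        dstname = os.path.join(dst, name)
        if os.path.isdir(srcname):
            copytree_with_list(srcname, dstname, exclude)
        else:
            shutil.copy2(srcname, dstname)


def demo():
    """Copy a small tree, skipping *.tmp, and list the copied files."""
    root = tempfile.mkdtemp()
    src = os.path.join(root, "src")
    os.makedirs(os.path.join(src, "sub"))
    for name in ("a.py", "b.tmp",
                 os.path.join("sub", "c.tmp"), os.path.join("sub", "d.py")):
        open(os.path.join(src, name), "w").close()
    dst = os.path.join(root, "dst")
    copytree_with_list(src, dst, excluding_patterns("*.tmp"))
    return sorted(
        os.path.relpath(os.path.join(d, f), dst)
        for d, _, files in os.walk(dst)
        for f in files
    )
```

Because exclude(src) is called again for every subdirectory, '*.tmp' is resolved relative to each directory, so sub/c.tmp is skipped as well as b.tmp.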

But this means that people who want to implement their own callable will have to provide a function that returns a list of excluded files, so they won't be free to implement whatever they want.

We could have two parameters, one for the glob-style sequence and one for the callable, so that each can be used at the appropriate place in the function, but I think this would make the function signature rather heavy::

def copytree(src, dst, exclude_patterns=None, exclude_function=None):
    ...
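
For comparison, a sketch of how the two parameters could each be used at their own place, assuming the hypothetical signature above and a simplified body (no symlink or error handling): the patterns are globbed once per directory, while the function is asked about each entry. The demo() and its size-based filter are illustrative assumptions:

```python
import glob
import os
import shutil
import tempfile


def copytree(src, dst, exclude_patterns=None, exclude_function=None):
    """Hypothetical two-parameter copytree (simplified)."""
    # Glob-style patterns: resolved once per directory.
    excluded = set()
    for pattern in exclude_patterns or ():
        excluded.update(glob.glob(os.path.join(src, pattern)))
    os.makedirs(dst)
    for name in os.listdir(src):
        srcname = os.path.join(src, name)
        if srcname in excluded:
            continue
        # Free-form callable: asked once per entry, returns True to skip.
        if exclude_function is not None and exclude_function(srcname):
            continue
        dstname = os.path.join(dst, name)
        if os.path.isdir(srcname):
            copytree(srcname, dstname, exclude_patterns, exclude_function)
        else:
            shutil.copy2(srcname, dstname)


def demo():
    """Skip *.tmp by pattern and files over 10 bytes by function."""
    root = tempfile.mkdtemp()
    src = os.path.join(root, "src")
    os.makedirs(src)
    for name, text in (("a.py", "x"), ("b.tmp", "x"), ("big.dat", "x" * 100)):
        with open(os.path.join(src, name), "w") as f:
            f.write(text)
    dst = os.path.join(root, "dst")
    copytree(src, dst, exclude_patterns=["*.tmp"],
             exclude_function=lambda p: os.path.isfile(p)
             and os.path.getsize(p) > 10)
    return sorted(os.listdir(dst))
```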

That's why I would be in favor of a sequence-or-callable argument, even if I admit that it is not the prettiest way to present an argument.
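
A sketch of how a sequence-or-callable argument could be normalized once at the top of copytree. The helpers as_exclude_callable(), copytree_sketch() and demo() are hypothetical, and caching glob results per directory is an assumption of this sketch, used to keep a single predicate interface without calling glob.glob() once per entry:

```python
import glob
import os
import shutil
import tempfile


def as_exclude_callable(exclude):
    """Normalize a sequence-or-callable `exclude` argument to a predicate.

    Hypothetical helper: a callable is used as-is; a string or a sequence
    of strings is treated as glob-style patterns.
    """
    if exclude is None:
        return lambda path: False
    if callable(exclude):
        return exclude
    patterns = [exclude] if isinstance(exclude, str) else list(exclude)
    cache = {}  # directory -> set of matching paths, globbed once per dir

    def _matches(path):
        dir_ = os.path.dirname(path)
        if dir_ not in cache:
            found = set()
            for pattern in patterns:
                found.update(glob.glob(os.path.join(dir_, pattern)))
            cache[dir_] = found
        return path in cache[dir_]
    return _matches


def copytree_sketch(src, dst, exclude=None):
    """Hypothetical simplified copytree (no symlink or error handling)."""
    is_excluded = as_exclude_callable(exclude)  # normalize once, at the top

    def _copy(src, dst):
        os.makedirs(dst)
        for name in os.listdir(src):
            srcname = os.path.join(src, name)
            if is_excluded(srcname):
                continue
            dstname = os.path.join(dst, name)
            if os.path.isdir(srcname):
                _copy(srcname, dstname)
            else:
                shutil.copy2(srcname, dstname)

    _copy(src, dst)


def demo(exclude):
    """Copy a two-file tree with the given exclude argument."""
    root = tempfile.mkdtemp()
    src = os.path.join(root, "src")
    os.makedirs(src)
    for name in ("a.py", "b.tmp"):
        open(os.path.join(src, name), "w").close()
    dst = os.path.join(root, "dst")
    copytree_sketch(src, dst, exclude)
    return sorted(os.listdir(dst))
```

A bare string is treated as a single pattern, since iterating over a str would otherwise yield single characters.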

Regards

Tarek

On Mon, Apr 21, 2008 at 2:38 AM, Isaac Morland <ijmorlan at cs.uwaterloo.ca> wrote:

On Sun, 20 Apr 2008, Steven Bethard wrote:

> On Sun, Apr 20, 2008 at 4:15 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> > I have submitted a patch for review here: http://bugs.python.org/issue2663
> >
> > glob-style patterns or a callable (for complex cases) can be provided
> > to filter out files or directories.
>
> I'm not a big fan of the sequence-or-callable argument. Why not just
> make it a callable argument, and supply a utility function so that you
> can write something like::
>
>     exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
>     shutil.copytree(src_dir, dst_dir, exclude=exclude_func)

Even if a glob pattern filter is considered useful enough to be worth special-casing, the glob capability should also be exposed via something like your excluding_patterns constructor and additionally as a function that can be called by another function intended for use as a callable argument.

If it is not, then doing something like "files matching these glob patterns except for those matching this non-glob-expressible condition and also those files matching this second non-glob-expressible condition" becomes painful because the glob part essentially needs to be re-implemented.

Isaac Morland                 CSCF Web Guru
DC 2554C, x36650              WWW Software Specialist
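
Isaac's composability point can be sketched as a reusable matching helper plus a constructor built on it. The names matches_patterns(), exclude_tmp_unless_empty() and demo() are hypothetical. As a substitution for glob.glob(), the helper uses fnmatch (the shell-style matching module glob itself relies on) on the entry's name, so no filesystem call is needed per match; the "non-empty file" condition stands in for an arbitrary non-glob-expressible rule:

```python
import fnmatch
import os
import tempfile


def matches_patterns(path, patterns):
    """Hypothetical helper: True if the basename of path matches any pattern."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def excluding_patterns(*patterns):
    """Constructor for a predicate usable directly as copytree's exclude."""
    def _excluding_patterns(path):
        return matches_patterns(path, patterns)
    return _excluding_patterns


def exclude_tmp_unless_empty(path):
    """Custom predicate reusing the glob-style helper.

    Excludes *.tmp files, except empty ones (a condition glob cannot express).
    """
    return matches_patterns(path, ["*.tmp"]) and os.path.getsize(path) > 0


def demo():
    """Create three files in a temp dir and report which would be excluded."""
    d = tempfile.mkdtemp()
    contents = {"full.tmp": "data", "empty.tmp": "", "code.py": "print(1)"}
    for name, text in contents.items():
        with open(os.path.join(d, name), "w") as f:
            f.write(text)
    return sorted(n for n in contents
                  if exclude_tmp_unless_empty(os.path.join(d, n)))


if __name__ == "__main__":
    print(demo())  # ['full.tmp']
```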

--
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/


