[Python-Dev] A smarter shutil.copytree ?

Tarek Ziadé ziade.tarek at gmail.com
Mon Apr 21 09:48:47 CEST 2008


The pattern matching uses src_dir to call glob.glob(), which returns
the list of files to be excluded. That's why I added it within the
copytree() function.

To make excluding_patterns work, it could be coded like this::

    import glob
    import os

    def excluding_patterns(*patterns):
        def _excluding_patterns(filepath):
            exclude_files = []
            dir_ = os.path.dirname(filepath)
            for pattern in patterns:
                # expand each pattern relative to the file's directory
                pattern = os.path.join(dir_, pattern)
                exclude_files.extend(glob.glob(pattern))
            return filepath in exclude_files
        return _excluding_patterns

But I can see some performance issues, as the glob function will
be called within the loop to test each file or folder::

    def copytree(src, dst, exclude):
        ...
        for name in names:
            srcname = os.path.join(src, name)
            if exclude(srcname):
                continue
            ...
        ...
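To make the cost concrete, here is a minimal sketch (not part of the patch) that instruments the per-entry predicate. The `glob_calls` counter and the `demo` function are hypothetical names added only for illustration, and the temporary directory stands in for a real source tree. It also assumes the source path contains no glob metacharacters.

```python
import glob
import os
import tempfile


def excluding_patterns(*patterns):
    """Return a per-entry predicate: True when filepath matches a pattern."""
    def _excluding_patterns(filepath):
        exclude_files = []
        dir_ = os.path.dirname(filepath)
        for pattern in patterns:
            # Instrumentation for illustration only: count filesystem scans.
            _excluding_patterns.glob_calls += 1
            exclude_files.extend(glob.glob(os.path.join(dir_, pattern)))
        return filepath in exclude_files
    _excluding_patterns.glob_calls = 0
    return _excluding_patterns


def demo():
    """Filter a small directory and report how often glob.glob() ran."""
    with tempfile.TemporaryDirectory() as src:
        for name in ("a.py", "b.tmp", "c.tmp", "d.txt"):
            open(os.path.join(src, name), "w").close()
        exclude = excluding_patterns("*.tmp", "*.bak")
        kept = [name for name in sorted(os.listdir(src))
                if not exclude(os.path.join(src, name))]
        return kept, exclude.glob_calls
```

With four entries and two patterns, glob.glob() runs eight times here: once per pattern for every entry tested, so the number of directory scans grows with the size of the directory.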

Calling it once at the beginning of the `copytree` function would then
be better for performance, but it means that the callable has to return
a list of matching files instead of the match result itself::

    def excluding_patterns(*patterns):
        def _excluding_patterns(path):
            # path is the directory being copied; glob each pattern inside it
            exclude_files = []
            for pattern in patterns:
                pattern = os.path.join(path, pattern)
                exclude_files.extend(glob.glob(pattern))
            return exclude_files
        return _excluding_patterns

Then in copytree::

    def copytree(src, dst, exclude):
        ...
        excluded = exclude(src)
        ...
        for name in names:
            srcname = os.path.join(src, name)
            if srcname in excluded:
                continue
            ...
        ...
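Put together, the list-returning variant could look like the following runnable sketch. `copytree_sketch` and `demo` are hypothetical names; the copy routine is a simplified stand-in for illustration, not shutil's implementation (no symlink handling, no error collection), and it returns a set rather than a list so the membership test stays cheap.

```python
import glob
import os
import shutil
import tempfile


def excluding_patterns(*patterns):
    """Return a callable mapping a directory to the set of excluded paths."""
    def _excluding_patterns(path):
        excluded = set()
        for pattern in patterns:
            excluded.update(glob.glob(os.path.join(path, pattern)))
        return excluded
    return _excluding_patterns


def copytree_sketch(src, dst, exclude):
    """Simplified recursive copy that asks exclude() once per directory."""
    os.makedirs(dst)
    excluded = exclude(src)  # one batch of glob calls for this directory
    for name in os.listdir(src):
        srcname = os.path.join(src, name)
        if srcname in excluded:
            continue
        dstname = os.path.join(dst, name)
        if os.path.isdir(srcname):
            copytree_sketch(srcname, dstname, exclude)
        else:
            shutil.copy2(srcname, dstname)


def demo():
    """Copy a tiny tree while excluding *.tmp; return the copied files."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        os.makedirs(os.path.join(src, "pkg"))
        for rel in ("keep.py", "junk.tmp",
                    os.path.join("pkg", "mod.py"),
                    os.path.join("pkg", "old.tmp")):
            open(os.path.join(src, rel), "w").close()
        dst = os.path.join(root, "dst")
        copytree_sketch(src, dst, excluding_patterns("*.tmp"))
        copied = []
        for dirpath, _dirnames, filenames in os.walk(dst):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                copied.append(os.path.relpath(full, dst))
        return sorted(copied)
```

Here the patterns are expanded once per directory visited rather than once per entry, which is the performance gain discussed above.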

But this means that people who want to implement their own
callable will have to provide a function that returns a list
of excluded files, so they won't be free to implement
whatever they want.

We could have two parameters, one for the glob-style sequence
and one for the callable, to be able to use them at the
appropriate places in the function, but I think this would
make the function signature rather heavy::

    def copytree(src, dst, exclude_patterns=None, exclude_function=None):
        ...


That's why I would be in favor of a sequence-or-callable argument,
even if I admit that it is not the prettiest way to present
an argument.
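
As a rough sketch of how a single sequence-or-callable argument could be handled internally, the value can be normalized into a callable up front. `as_exclude_callable` and `demo` are hypothetical names, not part of the patch, and the callable is assumed to take a directory and return the set of excluded paths.

```python
import glob
import os
import tempfile


def excluding_patterns(*patterns):
    """Return a callable mapping a directory to the set of excluded paths."""
    def _excluding_patterns(path):
        excluded = set()
        for pattern in patterns:
            excluded.update(glob.glob(os.path.join(path, pattern)))
        return excluded
    return _excluding_patterns


def as_exclude_callable(exclude):
    """Hypothetical helper: accept None, a callable, or glob patterns."""
    if exclude is None:
        return lambda path: set()
    if callable(exclude):
        return exclude
    if isinstance(exclude, str):
        exclude = [exclude]  # a lone string is treated as a single pattern
    return excluding_patterns(*exclude)


def demo():
    """Check that the pattern form and the callable form agree."""
    with tempfile.TemporaryDirectory() as src:
        for name in ("a.py", "b.tmp"):
            open(os.path.join(src, name), "w").close()
        from_patterns = as_exclude_callable(["*.tmp"])(src)
        from_callable = as_exclude_callable(
            lambda path: {os.path.join(path, "b.tmp")})(src)
        nothing = as_exclude_callable(None)(src)
        return from_patterns == from_callable, len(nothing)
```

The rest of the function then only ever deals with a callable, so the dual-typed argument costs one normalization step at the top.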

Regards

Tarek

On Mon, Apr 21, 2008 at 2:38 AM, Isaac Morland <ijmorlan at cs.uwaterloo.ca> wrote:
> On Sun, 20 Apr 2008, Steven Bethard wrote:
>
>
> > On Sun, Apr 20, 2008 at 4:15 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> >
> > > I have submitted a patch for review here: http://bugs.python.org/issue2663
> > >
> > >  glob-style patterns or a callable (for complex cases) can be provided
> > >  to filter out files or directories.
> > >
> >
> > I'm not a big fan of the sequence-or-callable argument. Why not just
> > make it a callable argument, and supply a utility function so that you
> > can write something like::
> >
> >   exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
> >   shutil.copytree(src_dir, dst_dir, exclude=exclude_func)
> >
>
>  Even if a glob pattern filter is considered useful enough to be worth
> special-casing, the glob capability should also be exposed via something
> like your excluding_patterns constructor and additionally as a function that
> can be called by another function intended for use as a callable argument.
>
>  If it is not, then doing something like "files matching these glob patterns
> except for those matching this non-glob-expressible condition and also those
> files matching this second non-glob-expressible condition" becomes painful
> because the glob part essentially needs to be re-implemented.
>
>  Isaac Morland                   CSCF Web Guru
>  DC 2554C, x36650                WWW Software Specialist
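
To illustrate the composition Isaac describes in the quoted message, here is a minimal sketch in which the glob part is exposed as a plain function and reused by a custom predicate. All names are hypothetical, and it swaps in fnmatch on the base name instead of glob.glob(), so it does not touch the filesystem.

```python
import fnmatch
import os


def matches_patterns(filepath, patterns):
    """Glob-style test on the base name only; no filesystem access."""
    name = os.path.basename(filepath)
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def excluding(patterns, *exceptions):
    """Exclude files matching patterns unless any exception accepts them."""
    def _exclude(filepath):
        if not matches_patterns(filepath, patterns):
            return False
        return not any(keep(filepath) for keep in exceptions)
    return _exclude


def has_numeric_stem(filepath):
    """First condition a glob cannot express: the stem is all digits."""
    stem = os.path.splitext(os.path.basename(filepath))[0]
    return stem.isdigit()


def is_under_keep_dir(filepath):
    """Second condition: some parent directory is named 'keep'."""
    return "keep" in filepath.split(os.sep)[:-1]


exclude = excluding(("*.tmp",), has_numeric_stem, is_under_keep_dir)
```

Because `matches_patterns` is an ordinary function, the custom conditions reuse the glob matching instead of re-implementing it.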



-- 
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/

