Comparing filenames in a list

Joshua Landau joshua.landau.ws at gmail.com
Sun Sep 30 21:48:38 EDT 2012


On 30 September 2012 23:08, Arnaud Delobelle <arnodel at gmail.com> wrote:

> On 30 September 2012 02:27, Kevin Anthony <kevin.s.anthony at gmail.com>
> wrote:
> > I have a list of filenames, and I need to find files with the same name but
> > different extensions, and split them into tuples.  Does anyone have any
> > suggestions on an easy way to do this that isn't O(n^2)?
>
> >>> import os, itertools
> >>> filenames = ["foo.png", "bar.csv", "foo.html", "bar.py"]
> >>> dict((key, tuple(val)) for key, val in
> ...      itertools.groupby(sorted(filenames), lambda f: os.path.splitext(f)[0]))
> {'foo': ('foo.html', 'foo.png'), 'bar': ('bar.csv', 'bar.py')}
>

That seems wasteful. The sort alone is O(n log n).

I've seen this pattern a lot. Surely there should be an object for this...

import os
from collections import defaultdict

filenames = ["foo.png", "bar.csv", "foo.html", "bar.py"]

# Map each base name to the list of extensions seen for it.
grouped = defaultdict(list)

for file in filenames:
    splitname = os.path.splitext(file)
    grouped[splitname[0]].append(splitname[1])

>>> grouped
defaultdict(<class 'list'>, {'foo': ['.png', '.html'], 'bar': ['.csv', '.py']})


This should be near-enough O(n) time, since each dict lookup and list append
is O(1) on average. Pah, it's not like you need to optimize this anyway!
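
For what it's worth, the original question asked for tuples of whole
filenames rather than lists of extensions. Here is a minimal sketch of that
variant, using the same single-pass defaultdict idea. The name group_by_stem
is made up for this example, and the output shown assumes Python 3.7+, where
dicts keep insertion order:

```python
import os
from collections import defaultdict


def group_by_stem(filenames):
    """Group filenames by their name without the extension, in one pass.

    Hypothetical helper for illustration; not from the original thread.
    """
    grouped = defaultdict(list)
    for filename in filenames:
        # splitext splits off only the last extension: "a.tar.gz" -> "a.tar", ".gz"
        stem, _ext = os.path.splitext(filename)
        grouped[stem].append(filename)
    # Freeze each group into a tuple, as the original question asked.
    return {stem: tuple(names) for stem, names in grouped.items()}


filenames = ["foo.png", "bar.csv", "foo.html", "bar.py"]
print(group_by_stem(filenames))
# {'foo': ('foo.png', 'foo.html'), 'bar': ('bar.csv', 'bar.py')}
```

Names without an extension, such as "README", end up in a group under their
own name, because os.path.splitext returns an empty extension for them.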