Finding duplicate file names and modifying them based on elements of the path

Wed Jul 18 18:49:14 EDT 2012

"Larry.Martell at gmail.com" <larry.martell at gmail.com> writes:
> I have an interesting problem I'm trying to solve. I have a solution
> almost working, but it's super ugly, and I know there has to be a
> better, cleaner way to do it. ...
>
> My solution involves multiple maps and multiple iterations through the
> data. How would you folks do this?

You could post your code and ask for suggestions on how to improve it.
There are a lot of not-so-natural constraints in that problem, so it
stands to reason that the code will be a bit messy.  The whole
specification seems like an antipattern though.  You should just give a
sensible encoding for the filename regardless of whether other fields
are duplicated or not.  You also don't seem to address the case where
basename, dir4, and dir5 are all duplicated.

The approach I'd take for the spec as you wrote it is:

1. Sort the list on the (basename, dir4, dir5) triple, saving the
   original location (numeric index) of each item.
2. Use itertools.groupby to group together duplicate basenames.
3. Within the groups, use groupby again to gather duplicate dir4's.
4. Within -those- groups, group by dir5 and assign sequence numbers in
   groups where there's more than one file.
5. Unsort to get the rewritten items back into the original order.

Actual code is left as an exercise.
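
That said, a minimal sketch of the five steps might look like the
following. It is only an illustration under assumptions of mine: paths
are taken to be absolute and "/"-separated, with dir4 and dir5 at split
indices 5 and 6, and the names split_key and number_duplicates are made
up for this example. It stops at assigning sequence numbers, since how
those should be worked into the new file name depends on the original
spec.

```python
from itertools import groupby


def split_key(path):
    # Assumption: paths look like /dir0/dir1/dir2/dir3/dir4/dir5/.../file,
    # so after splitting on "/" dir4 is at index 5 and dir5 at index 6.
    parts = path.split("/")
    return parts[-1], parts[5], parts[6]


def number_duplicates(paths):
    # Step 1: pair each (basename, dir4, dir5) triple with the item's
    # original index, then sort on the triple (ties fall back to the index).
    decorated = sorted((split_key(p), i) for i, p in enumerate(paths))
    seq = [None] * len(paths)
    # Step 2: group duplicate basenames.
    for _, same_base in groupby(decorated, key=lambda d: d[0][0]):
        # Step 3: within a basename group, group duplicate dir4's.
        for _, same_dir4 in groupby(same_base, key=lambda d: d[0][1]):
            # Step 4: within those, group by dir5 and number the members
            # of any group that holds more than one file.
            for _, same_dir5 in groupby(same_dir4, key=lambda d: d[0][2]):
                members = list(same_dir5)
                if len(members) > 1:
                    for n, (_, i) in enumerate(members, 1):
                        seq[i] = n
    # Step 5: seq is indexed by original position, so the results are
    # already back in the input order.
    return seq


paths = [
    "/a/b/c/d/x/y/z/f.txt",
    "/a/b/c/d/x/y/w/f.txt",
    "/a/b/c/d/x/q/z/f.txt",
    "/a/b/c/d/x/y/z/g.txt",
]
print(number_duplicates(paths))  # [1, 2, None, None]
```

Grouping directly on the whole triple would give the same numbers; the
nesting is kept so that a different rule can be applied at each level.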