python tool: finding duplicate code

Tim Peters tim.one at comcast.net
Wed May 29 22:17:51 EDT 2002


[Michal Wallace]
> In "Refactoring: Improving the Design of Existing Code",
> Martin Fowler and Kent Beck list duplicate code as their
> number one "Code Smell"
> ...
> It's just string-matching, so it won't find duplicate logic
> with different variable names or layout, but it *can* find
> cut and paste issues.
>
> (hmm... Come to think of it, someone could probably find
> *some* duplicate logic by running source files through the
> tokenizer first. I wonder if that would work...)
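
A minimal sketch of that tokenizer idea, assuming Python source and the standard tokenize module. The names normalized_lines and find_duplicates, and the window parameter, are made up for illustration and come from no existing tool. Every identifier is replaced by the placeholder ID and every number or string literal by LIT, so renamed copies and reformatted lines compare equal:

```python
import io
import keyword
import tokenize
from collections import defaultdict


def normalized_lines(source):
    """Return [(line number, normalized text)] for each line that has tokens."""
    # Token types that carry layout only, not logic.
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
    rows = defaultdict(list)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            text = "ID"        # hypothetical placeholder for any identifier
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            text = "LIT"       # hypothetical placeholder for any literal
        else:
            text = tok.string  # keywords and operators are kept verbatim
        rows[tok.start[0]].append(text)
    return [(lineno, " ".join(parts)) for lineno, parts in sorted(rows.items())]


def find_duplicates(source, window=3):
    """Group start lines of `window`-line runs that are equal once normalized."""
    lines = normalized_lines(source)
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        key = tuple(text for _, text in lines[i:i + window])
        seen[key].append(lines[i][0])
    return sorted(starts for starts in seen.values() if len(starts) > 1)
```

Because all identifiers collapse to the same placeholder, this over-matches: "a = b + c" and "x = x + x" look identical to it. Treat the result as a list of candidates to inspect, not as proof of duplication.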

Brenda Baker has done some interesting work on this problem (not with Python
in mind, but million-line C systems):

    http://cm.bell-labs.com/who/bsb/

Her "On Finding Duplication and Near-Duplication in Large Software Systems"
is a good entry into the literature.

I have a self-serving reason for mentioning this:  if somebody whips up a
fast suffix tree for Python, I could put it to good use in ameliorating
difflib.py's worst-case time sinks <wink>.
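
To make concrete the kind of question a suffix structure answers, here is a rough sketch. It is not a suffix tree and it is not fast: it builds a naive suffix array by sorting slices, which costs far more than a real suffix tree would. The function name longest_repeated_run is invented for this example. It finds the longest run of items (lines, tokens, characters) that occurs at two different positions in one sequence:

```python
def longest_repeated_run(items):
    """Return (length, i, j): the longest run found at positions i < j.

    Naive suffix array: sort all suffix start positions by the suffix
    itself, then compare each pair of neighbours in that order.  The two
    occurrences are allowed to overlap.
    """
    n = len(items)
    order = sorted(range(n), key=lambda i: items[i:])
    best = (0, 0, 0)
    for a, b in zip(order, order[1:]):
        # Length of the common prefix of the two neighbouring suffixes.
        k = 0
        while a + k < n and b + k < n and items[a + k] == items[b + k]:
            k += 1
        if k > best[0]:
            best = (k, min(a, b), max(a, b))
    return best
```

Called on a list of source lines, it reports the longest block that appears twice. A real suffix tree answers the same question in time roughly linear in the input, which is why one would be welcome.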





