[code-quality] copy/paste detection tool

Sylvain Thénault sylvain.thenault at logilab.fr
Fri Jul 5 10:55:08 CEST 2013


Hello Lionel,

On 05 July 07:27, Lionel Barret wrote:
> My name is Lionel Barret, I attended Florent Xicluna's Europython talk
> Tuesday and it reminded me of a clone detection tool I used in the past (on
> a 100k sloc codebase)
> 
> I talked about it with a few people (Florent Xicluna, Joe Gordon) and they
> were interested. Florent told me this was the list for this kind of
> discussion.
> 
> This tool, named clonedigger (http://clonedigger.sourceforge.net/), detects
> copy/pasted code or independent writing of the same classes/functions
> across a big codebase. In my last job, I used to get a daily html report, a
> big overview of the things that had been copy/pasted/rewritten. It was
> really useful.
> 
> Sadly, it is unmaintained, the last upload dates from 2011. Besides, it's
> using old packages (like the compiler package) and likely incompatible with
> python3 (either for running or for analyzing).
> 
> I really think this kind of tool should be part of any code-quality
> toolbox, like pyflakes, pep8, etc.
> 
> (The tool itself is GPL, so no blocking there.)
> 
> I just wanted to see if anybody would be interested in an updated version
> of the tool and who could help. Off the top of my head, the next steps
> would be contacting the original author, evaluating the work to do (obsolete
> modules used and python3 incompatibilities) and eventually refactoring the
> code.

How does it compare to Pylint's similarity checker? Basically, it reports
copy/pasted or rewritten code spanning more than a configurable number of
lines, after some normalisation.
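
As a rough illustration of that idea (a minimal sketch, not Pylint's actual
implementation), the Python below normalises each file by stripping whitespace
and dropping blank and comment-only lines, then reports every run of
`min_lines` normalised lines that occurs in more than one place. The names
`normalise`, `find_duplicates` and `min_lines`, and the dict-of-sources input
format, are assumptions made for this example only.

```python
from collections import defaultdict


def normalise(source):
    """Return (original_line_number, stripped_text) pairs.

    Blank lines and comment-only lines are dropped, which is the
    (deliberately simple) normalisation assumed in this sketch.
    """
    result = []
    for number, line in enumerate(source.splitlines(), start=1):
        text = line.strip()
        if text and not text.startswith("#"):
            result.append((number, text))
    return result


def find_duplicates(sources, min_lines=4):
    """Map each repeated window of min_lines normalised lines to its locations.

    sources is a dict of name -> source text (hypothetical input format).
    Each location is a (name, first_original_line_number) pair.
    """
    seen = defaultdict(list)
    for name, source in sources.items():
        lines = normalise(source)
        # Slide a fixed-size window over the normalised lines.
        for start in range(len(lines) - min_lines + 1):
            window = tuple(text for _, text in lines[start:start + min_lines])
            seen[window].append((name, lines[start][0]))
    # Keep only the windows that were seen more than once.
    return {window: places for window, places in seen.items() if len(places) > 1}


if __name__ == "__main__":
    sources = {
        "a.py": "x = 1\ny = 2\n# comment\nz = x + y\nprint(z)\n",
        "b.py": "def f():\n    x = 1\n    y = 2\n\n    z = x + y\n    print(z)\n",
    }
    for window, places in find_duplicates(sources, min_lines=4).items():
        print(places)
        for text in window:
            print("   ", text)
```

Because indentation and comments are removed before comparison, the four lines
in a.py match the body of f() in b.py even though the raw text differs. A real
checker would also need to merge overlapping windows into a single report.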

-- 
Sylvain Thénault, LOGILAB, Paris (01.45.32.03.12) - Toulouse (05.62.17.16.42)
Python, Debian, Agile methods training:  http://www.logilab.fr/formations
Custom software development:             http://www.logilab.fr/services
CubicWeb, the semantic web framework:    http://www.cubicweb.org

