String/source code analysis tools
François Pinard
pinard at iro.umontreal.ca
Tue May 11 19:43:41 EDT 2004
[Ira Baxter]
> "Moosebumps" <moosebumps at moosebumps.com> wrote in message
> news:j0Khc.25133$Q%5.6444 at newssvr27.news.prodigy.com...
> > I have a whole bunch of script files in a custom scripting
> > "language" that were basically copied and pasted all over the place
> > -- a huge mess, basically. I want to clean this up using Python --
> > and I'm wondering if there is any sort of algorithm for detecting
> > copied and pasted code with slight modifications.
> Not in Python, but could be used to do this. We offer a clone
> detection tool that works on very large source code bases, and detects
> cloned code with "slight modifications". You'd have to provide a
> grammar for your 'scripting language'. See
> http://www.semanticdesigns.com/Products/Clone/index.html.
Thanks for the reference, I'm saving it for later perusal or study.
Many years ago, because I had a cleaning problem which I presume was similar
to yours, I wrote and then used a tool for this, but all in C. I called
it `mdiff' (for "multi-diff"), and it can likely be found within some old
pretest of `Free wdiff' -- I have not really touched `wdiff' in years, even
if I am pondering republishing it this summer, provided I find some free time.
`mdiff' looks for identical sequences of lines within one or more files
(I used it on many dozens of files at once). One difficulty was
designing a way to display the output usably, and this was an
interesting problem at least. `mdiff' did the job for me, but I do not
really remember the state of this project nor how `mdiff' would behave
if recompiled today. But, as usual with me, if you feel like toying,
just ask for the sources, or look for them from my home web page! :-)
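
Since the original question was about doing this in Python, here is a
minimal sketch of the general idea of looking for identical sequences
of lines across several files. It is only an illustration and not how
`mdiff' itself works: the function name `find_repeated_blocks', the
`min_lines' parameter and the sample file names are all made up for
the example, and stripping whitespace before comparing is an
assumption of mine.

```python
from collections import defaultdict


def find_repeated_blocks(files, min_lines=3):
    """Return blocks of `min_lines` consecutive lines that occur more than once.

    `files` is a dict {name: text}. The result maps each repeated block
    (a tuple of lines) to a list of (name, first_line_number) pairs,
    with 1-based line numbers. Hypothetical helper, not part of mdiff.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Assumption: strip surrounding whitespace so re-indented copies match.
        lines = [line.strip() for line in text.splitlines()]
        for start in range(len(lines) - min_lines + 1):
            block = tuple(lines[start:start + min_lines])
            if not any(block):
                continue  # skip windows made only of blank lines
            seen[block].append((name, start + 1))
    return {block: places for block, places in seen.items() if len(places) > 1}


# Made-up sample data standing in for real script files.
scripts = {
    "a.scr": "set x 1\nmove x\nprint x\nquit\n",
    "b.scr": "init\nset x 1\nmove x\nprint x\n",
}
for block, places in find_repeated_blocks(scripts).items():
    print(places)  # prints [('a.scr', 1), ('b.scr', 2)]
```

This only finds exact repeats of a fixed window size, and longer copies
show up as several overlapping windows. Merging those into maximal runs,
and presenting them readably, is where the real work would begin.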
--
François Pinard http://www.iro.umontreal.ca/~pinard