word stemming / conflation

Terry Hancock hancock at anansispaceworks.com
Sun Apr 21 09:44:27 EDT 2002


I was working on an algorithm to condense strings of
text into mnemonic labels (hopefully this will produce
more meaningful labels than simply assigning a number!).
I got the idea from the way later versions of
latex2html generate filenames.
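A rough sketch of the kind of label condensation I mean. It is only loosely inspired by latex2html and makes no claim to match how latex2html actually builds its filenames. The helper name `mnemonic_label`, the stop-word list, and the `max_words` / `max_len` parameters are all hypothetical choices: drop common words, truncate each remaining word, and join the results with underscores.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on",
              "with", "is", "are"}


def mnemonic_label(text, max_words=4, max_len=6):
    """Condense a string into a short mnemonic label (hypothetical helper).

    Lowercases the text, splits it into alphanumeric words, drops stop
    words, truncates each remaining word to `max_len` characters, and
    joins the first `max_words` of them with underscores.  Falls back
    to "label" if nothing is left.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    keep = [w[:max_len] for w in words if w not in STOP_WORDS]
    return "_".join(keep[:max_words]) or "label"


if __name__ == "__main__":
    print(mnemonic_label("An Introduction to the Theory of Word Stemming"))
    # prints introd_theory_word_stemmi
```

Plain truncation is where it falls down: "stemming" becomes "stemmi", and "stems" or "stemmed" would produce different fragments, which is why some form of word stemming looked attractive.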

Anyway, after tinkering with this for a while, I've
discovered that it can be a bottomless pit! A little
web research reveals various MS / PhD papers for doing
this task (which apparently is called "conflation" as
well as "word stemming") in English and lots of other
languages. Looks like it's a lot harder than you
might at first think. I didn't find much in Python,
though.

So, I'm wondering if there are some Python resources
(or, better yet, already-written modules) for doing
this sort of thing.  I don't really need a completely
robust system -- the occasional error will be quite
tolerable. So I'm more interested in quick and simple
solutions that work most of the time, rather than
the ultra-robust library-science type.
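For reference, here is the sort of quick-and-simple approach I have in mind, as a minimal sketch only. It is a naive suffix stripper, much cruder than published algorithms such as Porter's stemmer. The rule table, the `MIN_STEM` threshold, and the helper names `crude_stem` and `conflate` are my own hypothetical choices. The first matching suffix wins, stems shorter than three letters are left alone, and a doubled final consonant is undone after removing "-ing" or "-ed".

```python
# Hypothetical suffix rules (suffix, replacement); the first match wins,
# so longer or more specific suffixes must come before shorter ones.
RULES = (
    ("ational", "ate"),  # relational -> relate
    ("ization", "ize"),  # organization -> organize
    ("sses", "ss"),      # classes -> class
    ("ies", "y"),        # libraries -> library
    ("ied", "y"),        # copied -> copy
    ("ing", ""),         # stemming -> stemm (undoubled below)
    ("ss", "ss"),        # class -> class (stops the "s" rule firing)
    ("ed", ""),          # labeled -> label
    ("ly", ""),          # quickly -> quick
    ("s", ""),           # labels -> label
)

MIN_STEM = 3  # hypothetical: never strip down to fewer than 3 letters


def crude_stem(word):
    """Return a rough stem for `word` using the first matching rule."""
    w = word.lower()
    for suffix, repl in RULES:
        if w.endswith(suffix):
            stem = w[: -len(suffix)]
            if len(stem) < MIN_STEM:
                return w  # too short to strip safely; leave unchanged
            # Undo consonant doubling: "stemm" -> "stem", "runn" -> "run".
            if (suffix in ("ing", "ed") and stem[-1] == stem[-2]
                    and stem[-1] not in "aeioulsz"):
                stem = stem[:-1]
            return stem + repl
    return w


def conflate(words):
    """Group words that share the same crude stem."""
    groups = {}
    for word in words:
        groups.setdefault(crude_stem(word), []).append(word)
    return groups
```

With these rules "stem", "stems", "stemming" and "stemmed" all reduce to "stem", but "generate" and "generated" still come out as "generate" and "generat". That is exactly the kind of occasional error I said would be tolerable.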

Thanks for any information or leads you might be
able to suggest,

Terry

-- 
------------------------------------------------------
Terry Hancock
hancock at anansispaceworks.com       
------------------------------------------------------