"Is-it-a-word-module"

thomas at cintra.no
Mon May 15 03:27:17 EDT 2000


Hi,

Just wondered if somebody has a tip on how to go about the following
problem:

In an indexing module I currently index a lot of "words" that are
actually just meaningless streams of characters: "translated" URLs and
other stuff generated by Internet robots of some kind. At first I
thought I could just look up each "word" in a dictionary, but then I
realized that a lot of things, such as names like my own, are not in
the dictionary but should still be indexed. I therefore want some way
of guessing whether a bunch of characters could actually be a
meaningful word.

I've found that variation of characters can be a good clue.
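
To make the idea concrete, here is a minimal sketch of what such a check
might look like. Everything in it is an assumption for illustration: the
function names (looks_like_word, longest_consonant_run), the vowel set,
and all thresholds are made up and would need tuning against real data.
Besides character variation it also tries two other guesses, the share
of vowels and the longest run of consonants. It is not a tested module.

```python
# Hypothetical heuristic, not an existing module; all thresholds are guesses.
VOWELS = "aeiouy"  # assumption: English-style vowels; extend for other languages


def longest_consonant_run(word):
    """Return the length of the longest run of non-vowel characters."""
    longest = run = 0
    for ch in word:
        run = 0 if ch in VOWELS else run + 1
        longest = max(longest, run)
    return longest


def looks_like_word(token, max_len=25, min_vowel_ratio=0.2,
                    max_consonant_run=5, min_variation=0.3):
    """Guess whether token could be a meaningful word (heuristic only)."""
    word = token.lower()
    # Reject empty or overlong tokens and anything with digits or symbols,
    # which catches most escaped-URL fragments.
    if not word or len(word) > max_len or not word.isalpha():
        return False
    # Very few vowels is a common sign of machine-generated strings.
    vowel_count = sum(1 for ch in word if ch in VOWELS)
    if vowel_count / len(word) < min_vowel_ratio:
        return False
    # So is a long unbroken run of consonants.
    if longest_consonant_run(word) > max_consonant_run:
        return False
    # Variation of characters: a longer token built from very few
    # distinct characters is unlikely to be a word.
    if len(word) >= 6 and len(set(word)) / len(word) < min_variation:
        return False
    return True
```

With these guessed thresholds a name like "Thomas" passes, while tokens
such as "a%20b" or "xkqzzvtrpl" are rejected. Names containing letters
outside the assumed vowel set (for example Norwegian ones) may be
rejected wrongly unless VOWELS is extended.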

If somebody has any other pointers, I'd appreciate it. The module will
be announced here if anybody shows any interest.

Thomas


