Categorising strings into random versus non-random
Paul Rubin
no.email at nospam.invalid
Mon Dec 21 12:20:01 EST 2015
Steven D'Aprano <steve at pearwood.info> writes:
> Does anyone have any suggestions for how to do this? Preferably something
> already existing. I have some thoughts and/or questions:
I think I'd just look at the set of digraphs or trigraphs in each name
and see if there are a lot that aren't found in English.
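A rough sketch of that idea, using digraphs only. The sample text, the function names and the 0.5 threshold are all made up for illustration; a real version would build the known set from a proper English word list and tune the threshold on actual names.

```python
# Sketch: score a string by the fraction of its letter pairs (digraphs)
# that never occur in a reference sample of English text.

def digraphs(text):
    """Return the set of adjacent letter pairs in each whitespace-separated
    word of text, lowercased, with non-letter characters dropped."""
    pairs = set()
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        pairs.update(a + b for a, b in zip(letters, letters[1:]))
    return pairs

# Assumption: a tiny stand-in corpus. It is far too small for real use and
# only keeps the example self-contained.
ENGLISH_SAMPLE = (
    "the quick brown fox jumps over the lazy dog while reading "
    "a holiday report about information and statistics"
)
KNOWN = digraphs(ENGLISH_SAMPLE)

def unfamiliar_fraction(name):
    """Fraction of the distinct digraphs in name that are not in KNOWN."""
    found = digraphs(name)
    if not found:
        return 0.0  # nothing to judge, e.g. an empty or one-letter name
    return len(found - KNOWN) / len(found)

def looks_random(name, threshold=0.5):
    """Hypothetical cut-off: flag names where most digraphs are unfamiliar."""
    return unfamiliar_fraction(name) > threshold
```

With a large enough reference set, something like looks_random("xqzvkj") should come out true and ordinary words false. Trigraphs would work the same way with a third letter in the zip.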
> - I think nltk has a "language detection" function, would that be suitable?
> - If not nltk, are there any suitable language detection libraries?
I suspect these need longer strings to work.
> - Is this the sort of problem that neural networks are good at solving?
> Anyone know a really good tutorial for neural networks in Python?
> - How about Bayesian filters, e.g. SpamBayes?
You want large training sets for these approaches.