Problems With Accented Characters

Fuzzyman michael at foord.net
Sun Feb 22 18:47:52 EST 2004


I've written an anagram finder that produces anagrams from a
dictionary of words. The user can load their own dictionary.

( http://www.voidspace.org.uk/atlantibots/nanagram.html )

In order to ensure it finds anagrams properly, I wanted to strip
characters such as punctuation from words in the dictionary and from
words the user entered. I test(ed) against the 26 English letters
(string.ascii_lowercase).
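A minimal sketch of that filtering step in modern Python 3 (the post predates it). It assumes words are compared by their sorted letters. The names `ALLOWED`, `clean_word` and `anagram_key` are hypothetical and are not taken from the nanagram source. The last line shows why accented words cause trouble: any letter outside a-z is silently dropped.

```python
import string

# Hypothetical helper names; only string.ascii_lowercase comes from the post.
ALLOWED = set(string.ascii_lowercase)


def clean_word(word):
    """Lower-case the word and keep only the 26 letters a-z."""
    return "".join(ch for ch in word.lower() if ch in ALLOWED)


def anagram_key(word):
    """Words are anagrams of each other when their sorted letters match."""
    return "".join(sorted(clean_word(word)))


if __name__ == "__main__":
    print(clean_word("Don't!"))                            # dont
    print(anagram_key("Listen") == anagram_key("Silent"))  # True
    print(clean_word("café"))                              # caf (the é is lost)
```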

I now have someone who wants to use a French dictionary, with words
containing accented characters! I have two choices: either map the
accented characters to their unaccented equivalents (slightly
inaccurate) or treat the accented characters as separate letters (very
few anagrams). However, at the moment I can't experiment with either,
because my default codec is 7-bit ASCII and the program crashes
(sometimes!) when it meets accented characters.
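Both choices can be sketched in modern Python 3 with the standard `unicodedata` module. The function names `strip_accents` and `keep_accents` are hypothetical. The first uses NFKD decomposition, which is one common way to drop accents but not the only one. `strip_accents` does not remove punctuation, so it is meant to run before an a-z filter. `keep_accents` needs the allowed-letter test widened to accept accented letters.

```python
import unicodedata


def strip_accents(word):
    """Choice 1: map accented letters to their unaccented base letters.

    NFKD splits a character such as 'é' into 'e' plus a combining accent,
    and the combining marks are then dropped. Letters with no decomposition
    (for example the ligature 'œ') come through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def keep_accents(word):
    """Choice 2: treat accented letters as separate letters.

    Normalising to NFC first turns 'e' + combining accent into a single 'é',
    which str.isalpha() accepts; punctuation and digits are removed.
    """
    composed = unicodedata.normalize("NFC", word.lower())
    return "".join(ch for ch in composed if ch.isalpha())


if __name__ == "__main__":
    print(strip_accents("Élève"))   # eleve
    print(keep_accents("Élève!"))   # élève
```

The first choice keeps the 26-letter alphabet (and so more anagrams) at the cost of treating "é" and "e" as the same letter. The second is exact but splits the French vocabulary across more distinct letters.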

Does anyone have any advice, or can anyone point me to resources, for
handling these characters effectively? I guess it's a Latin-1 encoding
I want to use... I can't even work out how to change the default
codec.
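The usual alternative to changing the default codec is to decode the dictionary's bytes explicitly when reading them, so words are handled as Unicode text internally. This is a minimal Python 3 sketch. `words_from_bytes`, `load_dictionary` and the `latin-1` default are assumptions, because the post only guesses at Latin-1. Pass `"utf-8"` if the file was saved that way.

```python
def words_from_bytes(data, encoding="latin-1"):
    """Decode raw dictionary bytes with an explicit codec, one word per line."""
    text = data.decode(encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_dictionary(path, encoding="latin-1"):
    """Read a dictionary file as bytes and decode it with the given codec."""
    with open(path, "rb") as handle:
        return words_from_bytes(handle.read(), encoding)


if __name__ == "__main__":
    raw = "été\ncafé\n".encode("latin-1")
    print(words_from_bytes(raw))        # ['été', 'café']
    try:
        raw.decode("ascii")             # 7-bit ASCII cannot decode byte 0xE9
    except UnicodeDecodeError as exc:
        print("ascii failed:", exc.reason)
```

Latin-1 maps every byte to some character, so decoding with it never raises, even if the file is really UTF-8. In that case the accented words simply come out garbled, so it is worth checking a few entries after loading.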

Thanks,

Fuzzy

http://www.voidspace.org.uk/atlantibots/pythonutils.html


