Changing the default text codec

Fuzzyman michael at foord.net
Mon Feb 23 03:50:27 EST 2004


Sorry if my terminology is wrong, but I'm having intermittent
problems dealing with accented characters in Python (only those from
the 8-bit latin-1 character set, I think).

I've written an anagram finder that produces anagrams from a
dictionary of words. The user can load their own dictionary.
 
( http://www.voidspace.org.uk/atlantibots/nanagram.html )

It's particularly difficult for me to understand what is happening,
because Python's behaviour *seems* to vary from run to run.

For example, if I run my program from IDLE and give it the word
'degré' (containing e-acute) then I get this error:

Exception in Tkinter callback
Traceback (most recent call last):
[snip..]
  File "D:\Python Projects\Nanagram1.3\Nanagram-GUI.pyw", line 123, in prepare
    if letter in self.valid_letters:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 26: ordinal not in range(128)
Traceback (most recent call last):

It is testing each character of the user's input to remove invalid
characters (like "-" and "'"). It crashes when it comes to the
e-acute.
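
To illustrate the check described above, here is a minimal Python 3
sketch that filters input against a set of valid letters after decoding
it to text once, at the boundary. The names `prepare` and
`valid_letters` come from the traceback; the function body, the
`encoding` parameter and the sample letter set are assumptions, not the
program's actual code. In Python 2, an `in` test between a unicode
string and a byte string holding non-ASCII bytes triggers an implicit
ASCII decode, which is one plausible source of the error shown.

```python
# Hypothetical letter set: ASCII lowercase plus e-acute, held as text.
VALID_LETTERS = set("abcdefghijklmnopqrstuvwxyz\u00e9")


def prepare(raw, valid_letters=VALID_LETTERS, encoding="latin-1"):
    """Return only the characters of raw that appear in valid_letters."""
    # Decode bytes once, with an explicit codec, so that no implicit
    # ASCII conversion can happen during the membership test.
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return "".join(ch for ch in text.lower() if ch in valid_letters)
```

With this letter set, the latin-1 bytes for 'degré hello-ma' come back
as 'degréhelloma': the space and the hyphen are dropped, and the
e-acute survives because it is listed as valid.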


*However*, if I run it by double-clicking on the file then it appears
to work fine. For example, if I ask it to find anagrams of
'degré hello ma' then it strips out the e-acute (thinking it's an
invalid character) and finds anagrams of the rest:

gleam	holder
hallo	merged

What I'd like to do is switch by default to an 8-bit codec (latin-1, I
think?) and then offer the user the choice of either mapping the
accented characters to their nearest equivalent (e-acute to e, for
example) *or* treating them as separate characters.
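
The two options described above can be sketched as follows. This is a
hedged Python 3 illustration, not the program's actual code: it uses
the standard `unicodedata` module to decompose accented letters and
drop the combining marks, and the names `fold_accents`,
`anagram_key` and the `fold` flag are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Map accented letters to their base letter (e-acute becomes e)."""
    # NFD splits e-acute into 'e' plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def anagram_key(word, fold=True):
    """Return the sorted letters of word, used to group anagrams.

    fold=True treats e-acute as a plain e; fold=False keeps it as a
    separate letter.
    """
    word = word.lower()
    if fold:
        word = fold_accents(word)
    return "".join(sorted(ch for ch in word if ch.isalpha()))
```

Under these assumptions, 'degré' and 'degre' share a key when `fold`
is true and get different keys when it is false. Letters that have no
decomposition (for example the o-slash) are left unchanged by this
approach and would need an explicit mapping table.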


I can't work out how to change the default codec, regardless of the
locale.
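
As a hedged illustration of the locale question, the Python 3 sketch
below inspects the codecs the interpreter reports and then decodes
bytes with an explicitly named codec, so the result does not depend on
the environment. The helper `to_text` and the sample byte strings are
assumptions made for this example; the calls themselves
(`sys.getdefaultencoding`, `locale.getpreferredencoding`,
`bytes.decode`) are standard library.

```python
import locale
import sys

# What the interpreter and the environment report. These values may
# differ between machines, so the code below does not rely on them.
default_codec = sys.getdefaultencoding()
preferred_codec = locale.getpreferredencoding(False)


def to_text(raw, encoding="latin-1"):
    """Decode raw bytes with an explicitly named codec."""
    return raw.decode(encoding)


latin1_word = to_text(b"degr\xe9")           # 0xE9 is e-acute in latin-1
cp437_word = to_text(b"degr\x82", "cp437")   # 0x82 is e-acute in cp437
mismatch = to_text(b"degr\xe9", "cp437")     # latin-1 bytes, wrong codec
```

Here `latin1_word` and `cp437_word` are the same text, while
`mismatch` is not: the same byte means different characters under
different 8-bit codecs, which is why naming the codec at the point
where bytes enter the program is more predictable than relying on a
default.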

Is anyone able to help, or point me to a useful resource? (I've tried
Google, before you suggest it.)



Fuzzy


