Changing the default text codec

Peter Otten __peter__ at web.de
Mon Feb 23 04:42:56 EST 2004


Fuzzyman wrote:

> Sorry if my terminology is wrong..... but I'm having intermittent
> problems dealing with accented characters in python. (Only from the 8
> bit latin-1 character set I think..)
> 
> I've written an anagram finder that produces anagrams from a
> dictionary of words. The user can load their own dictionary.
>  
> ( http://www.voidspace.org.uk/atlantibots/nanagram.html )
> 
> It's particularly difficult for me to understand what is happening -
> because python's behaviour *seems* intermittent.
> 
> For example - if I run my program from IDLE and give it the word
> 'degré' (containing e-acute) then I get the error :
> 
> Exception in Tkinter callback
> Traceback (most recent call last):
> [snip..]
>   File "D:\Python Projects\Nanagram1.3\Nanagram-GUI.pyw", line 123, in
> prepare
>     if letter in self.valid_letters:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position
> 26: ordinal not in range(128)
> Traceback (most recent call last):
> 
> It is testing each character of the user's input to remove invalid
> characters (like "-" and "'")... It crashes when it comes to the
> e-acute.
> 
> 
> *However* - if I run it by double-clicking on the file then it appears
> to work fine (e.g. if I ask it to find anagrams of 'degré hello ma' then
> it strips out the e-acute, thinking it's an invalid character, and
> finds anagrams of the rest):
> 
> gleam holder
> hallo merged
> 
> What I'd like to do is switch by default to an 8 bit codec (latin-1, I
> think) and then offer the user the choice of either mapping the
> accented characters to their nearest equivalent (e-acute to e for
> example) *or* treating them as separate characters.
> 
> 
> I can't work out how to change the default codec (no matter what the
> locale) ?
> 
> Anyone able to help - or point me to a useful resource ?? (I've tried
> google - b4 u suggest it )

You can either explicitly convert your unicode strings:

unicodeword.encode("latin-1")

or try to modify your site.py from the default

encoding = "ascii"

to 

encoding = "latin-1"
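
The explicit-conversion route can be sketched as below. This is only an
illustration under stated assumptions, not the Nanagram code: it is written
in current Python syntax (bytes/str rather than str/unicode), and the names
VALID_LETTERS, to_text, strip_accents and prepare are hypothetical. The idea
is to decode input once at the boundary, so that the membership test never
mixes byte strings with text, and then optionally fold accented letters to
their base letter.

```python
import unicodedata

# Hypothetical set of accepted letters; the real program's set is not shown
# in the original post.
VALID_LETTERS = set("abcdefghijklmnopqrstuvwxyz")


def to_text(raw, encoding="latin-1"):
    """Decode bytes explicitly; pass text through unchanged."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def strip_accents(text):
    """Map accented letters to their base letter (e-acute becomes e)."""
    # NFD splits an accented letter into base letter + combining mark(s);
    # dropping the combining marks leaves the base letter.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prepare(raw, fold_accents=True, valid_letters=VALID_LETTERS):
    """Return only the valid letters of the input, as text."""
    text = to_text(raw).lower()
    if fold_accents:
        text = strip_accents(text)
    return "".join(ch for ch in text if ch in valid_letters)


if __name__ == "__main__":
    word = b"degr\xe9 hello ma"          # latin-1 bytes, 0xe9 is e-acute
    print(prepare(word))                  # accents folded to plain letters
    print(prepare(word, fold_accents=False,
                  valid_letters=VALID_LETTERS | {"\xe9"}))  # e-acute kept
```

To treat accented characters as separate letters instead, pass
fold_accents=False and add those characters to valid_letters, as in the
last call.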

Peter




