Changing the default text codec
Peter Otten
__peter__ at web.de
Mon Feb 23 04:42:56 EST 2004
Fuzzyman wrote:
> Sorry if my terminology is wrong..... but I'm having intermittent
> problems dealing with accented characters in python. (Only from the 8
> bit latin-1 character set I think..)
>
> I've written an anagram finder that produces anagrams from a
> dictionary of words. The user can load their own dictionary.
>
> ( http://www.voidspace.org.uk/atlantibots/nanagram.html )
>
> It's particularly difficult for me to understand what is happening -
> because python's behaviour *seems* intermittent.
>
> For example - if I run my program from IDLE and give it the word
> 'degré' (containing e-acute) then I get the error :
>
> Exception in Tkinter callback
> Traceback (most recent call last):
> [snip..]
> File "D:\Python Projects\Nanagram1.3\Nanagram-GUI.pyw", line 123, in
> prepare
> if letter in self.valid_letters:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position
> 26: ordinal not in range(128)
>
> It is testing each character of the user's input to remove invalid
> characters (like "-" and "'")... It crashes when it comes to the
> e-acute.
>
>
> *However* - If I run it by double clicking on the file then it appears
> to work fine (e.g. if I ask it to find anagrams of 'degré hello ma' then
> it strips out the e-acute (thinking it's an invalid character) and
> finds anagrams of the rest) :
>
> gleam holder
> hallo merged
>
> What I'd like to do is switch by default to an 8 bit codec (latin-1 I
> think?) and then offer the user the choice of either mapping the
> accented characters to their nearest equivalent (e-acute to e for
> example) *or* treating them as separate characters.
>
>
> I can't work out how to change the default codec (no matter what the
> locale) ?
>
> Anyone able to help - or point me to a useful resource ?? (I've tried
> google - b4 u suggest it )
You can either explicitly convert your unicode strings:

    unicodeword.encode("latin-1")

or try to modify your site.py from the default

    encoding = "ascii"

to

    encoding = "latin-1"
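
For the explicit route, a minimal sketch of what converting at the boundaries might look like. It is written for current Python 3, where text is `str` and raw data is `bytes` (in the Python 2 of this thread the corresponding types are `unicode` and `str`). The helper names `to_text` and `strip_accents` are made up for illustration and are not part of Nanagram; latin-1 as the input encoding is an assumption taken from the question.

```python
import unicodedata


def to_text(raw, encoding="latin-1"):
    """Decode bytes into text once, at the point where input arrives.

    Hypothetical helper: text that is already decoded is passed through.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def strip_accents(text):
    """Map accented letters to their base letter (e-acute becomes e).

    Hypothetical helper: decomposes each character (NFD) and drops the
    combining marks that the decomposition produces.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# b"degr\xe9" is 'degré' encoded as latin-1 (assumed input encoding).
word = to_text(b"degr\xe9")

# Explicit conversion back to 8-bit data, as in unicodeword.encode("latin-1").
encoded = word.encode("latin-1")

# Optional mapping of accented characters to their nearest equivalent.
plain = strip_accents(word)
```

Decoding once on input and encoding once on output keeps every comparison (such as `letter in self.valid_letters`) between values of the same type, so no implicit ascii conversion is needed. Editing site.py, by contrast, changes the default for every program run by that Python installation.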
Peter