Changing the default text codec

Paul Prescod paul at prescod.net
Mon Feb 23 04:48:43 EST 2004


Fuzzyman wrote:
> Sorry if my terminology is wrong..... but I'm having intermittent
> problems dealing with accented characters in python. (Only from the 8
> bit latin-1 character set I think..)

I would say that if you get a 100% failure rate in IDLE and a 100%
success rate from a console program, then your problem is not
intermittent but environment-specific.

> For example - if I run my program from IDLE and give it the word
> 'degré' (containing e-acute) then I get the error :

What do you mean by "give it the word"? Through raw_input()? Through a file?

However you are getting this information, it seems to me that in IDLE 
you are getting a Unicode object rather than an 8-bit string object. 
Convert it to an 8-bit string:

mydata = mydata.encode("latin-1")

 >  if letter in self.valid_letters:
 > UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position
 > 26: ordinal not in range(128)

Something looks suspicious here. I wouldn't expect self.valid_letters to
have a 0x83 character in it, because I would expect it to be hard-coded
as an ASCII string in your program, like:

valid_letters = "abcdefghijklmnopqrstuvwxyzABCDEF..."

On the other hand, I wouldn't expect "letter" to have more than one
character, so how could there be a problem at position 26?
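One reading consistent with the message, sketched in Python 3 terms: the position in a UnicodeDecodeError is an offset into the byte string being decoded, not into the character being looked up. In Python 2, `letter in self.valid_letters` with a Unicode `letter` and an 8-bit `valid_letters` would implicitly decode `valid_letters` as ASCII, so "position 26" would point into that string. The byte string below is hypothetical, chosen only to reproduce the reported message; `ascii_error_position` is an illustrative helper, not part of any library.

```python
def ascii_error_position(data):
    """Return (offset, message) for the first non-ASCII byte, or None if pure ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the offending byte within `data`.
        return exc.start, str(exc)
    return None

# Hypothetical 8-bit string: 26 lowercase letters followed by the byte 0x83.
valid_letters = b"abcdefghijklmnopqrstuvwxyz\x83"
print(ascii_error_position(valid_letters))
```

If the printed offset matches the one in your traceback, the non-ASCII byte is most likely in `valid_letters` itself.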

> What I'd like to do is switch by default to an 8 bit codec (latin-1 I
> think ?????) and then offer the user the choice of either mapping the
> accented characters to their nearest equivalent (e-acute to e for
> example) *or* treating them as separate characters.............
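For the first option (mapping accented characters to their nearest unaccented equivalent), one common approach, not something proposed in this thread, is Unicode decomposition: normalize to NFD, which splits a letter like e-acute into a base letter plus a combining accent, then drop the combining marks. A minimal Python 3 sketch; the function name `strip_accents` is hypothetical:

```python
import unicodedata

def strip_accents(text):
    """Map accented letters to their base letters, e.g. 'é' -> 'e'."""
    # NFD splits 'é' (U+00E9) into 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a nonzero class for combining marks; keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("degr\u00e9"))  # degre
```

Letters with no decomposition (for example 'ø' or 'ß') pass through unchanged, so this is only an approximation of "nearest equivalent". For the second option, the text can simply stay as Unicode, with the accented letters included in the set of valid letters.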

Why change the default codec rather than explicitly using the codec you
care about? If you want to work in the 8-bit world rather than the
Unicode world, just use the "encode" method on the Unicode object. If
you want to work in the Unicode world, convert your 8-bit strings to
Unicode objects with their "decode" method instead.
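Both directions, sketched in Python 3 terms (where `str` plays the role of the Unicode object and `bytes` the 8-bit string), with Latin-1 named explicitly rather than taken from a default. The variable names are illustrative:

```python
text = "degr\u00e9"            # Unicode text containing e-acute

# 8-bit world: encode to Latin-1 bytes explicitly.
data = text.encode("latin-1")
print(data)                    # b'degr\xe9'

# Unicode world: decode 8-bit data back to text, again naming the codec.
round_trip = data.decode("latin-1")
print(round_trip == text)      # True
```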

> I can't work out how to change the default codec (no matter what the
> locale) ?

I'd advise against fixing the problem in that way. Instead, convert data
with an explicit codec when you bring it from the outside world into the
Python program, and ignore the default codec.
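A minimal Python 3 sketch of decoding at the boundary, assuming the input file is Latin-1 encoded (the right codec depends on where your data actually comes from). The helper `read_text` and the temporary demo file are hypothetical:

```python
import os
import tempfile

def read_text(path, encoding="latin-1"):
    """Read a file as Unicode text, naming the codec explicitly (assumed Latin-1)."""
    with open(path, encoding=encoding) as f:
        return f.read()

# Demo: create a file containing Latin-1 bytes, then read it back as text.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("degr\u00e9\n".encode("latin-1"))
    path = tmp.name

try:
    result = read_text(path)
finally:
    os.remove(path)

print(result == "degr\u00e9\n")  # True
```

Once the data is decoded on the way in, the rest of the program only handles text and never depends on whatever the default codec happens to be.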

  Paul Prescod




