inserting Unicode character in dictionary - Python

Fri Oct 17 15:25:39 EDT 2008

On Fri, 17 Oct 2008 11:32:36 -0600, Joe Strout wrote:

> On Oct 17, 2008, at 11:24 AM, Marc 'BlackJack' Rintsch wrote:
> 
>>> kw = 'генских'
>>>
>> What do you mean by "does not work"?  And you are aware that the above
>> snippet doesn't involve any unicode characters!?  You have a byte
>> string there -- type `str` not `unicode`.
> 
> Just checking my understanding here -- are the following all true:
> 
> 1. If you had prefixed that literal with a "u", then you'd have Unicode.

Yes.

> 2. Exactly what Unicode you get would be dependent on Python properly
> interpreting the bytes in the source file -- which you can make it do by
> adding something like "-*- coding: utf-8 -*-" in a comment at the top of
> the file.

Yes, assuming the encoding on that comment matches the actual encoding of 
the file.

> 3. Without the "u" prefix, you'll have some 8-bit string, whose
> interpretation is... er... here's where I get a bit fuzzy.

No interpretation at all, just the bunch of bytes that happen to be in 
the source file.
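
To illustrate the "bunch of bytes" point, here is a minimal sketch. It is
written for Python 3, where `bytes` plays the role of the Python 2 `str`
discussed above; the names `text`, `raw` and `as_latin1` are made up for
the example, and it assumes the source file was saved as UTF-8.

```python
# Python 3 sketch: `raw` stands in for the Python 2 byte string.
text = 'генских'            # 7 characters of text
raw = text.encode('utf-8')  # the bytes a UTF-8 source file would contain

print(len(text))  # 7 code points
print(len(raw))   # 14 bytes: each Cyrillic letter takes two bytes in UTF-8

# The bytes carry no encoding of their own. Read as Latin-1, the very same
# bytes turn into 14 unrelated characters instead of 7 Cyrillic ones.
as_latin1 = raw.decode('latin-1')
print(len(as_latin1))     # 14
print(as_latin1 == text)  # False
```

The bytes only become text once somebody picks an encoding for them.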

> What if your source file is set to utf-8?  Do you then have a proper
> UTF-8 string, but the problem is that none of the standard Python
> library methods know how to properly interpret UTF-8?

Well, the decode method knows how to decode those bytes into a `unicode` 
object if you call it with 'utf-8' as argument.
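
A short sketch of that decode step, again in Python 3 spelling (there the
result type is `str`, in Python 2 it would be `unicode`). The byte values
are my own UTF-8 encoding of the word from the original post, and the
dictionary `counts` is only a hypothetical stand-in for whatever the
original poster was building.

```python
# The UTF-8 bytes of 'генских', written out explicitly.
raw = b'\xd0\xb3\xd0\xb5\xd0\xbd\xd1\x81\xd0\xba\xd0\xb8\xd1\x85'

# decode() needs to be told which encoding the bytes are in.
kw = raw.decode('utf-8')

# The decoded text can be used as a dictionary key like any other string.
counts = {}
counts[kw] = 1

print(kw == 'генских')      # True
print(counts['генских'])    # 1
```

Note that the raw bytes and the decoded text are different keys, so mixing
the two in one dictionary leads to lookups that silently miss.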

> 4. In Python 3.0, this silliness goes away, because all strings are
> Unicode by default.

Yes and no.  The problem just shifts: at some point you run into 
similar trouble, only in the other direction.  Data enters the program 
as bytes and must leave it as bytes again, so you have to deal with 
encodings at those points.
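
A minimal sketch of that pattern in Python 3: decode on the way in, work
with text inside, encode on the way out. The function name `handle` and
the choice of UTF-8 as the default are assumptions made for the example,
not anything prescribed by the language.

```python
def handle(incoming, encoding='utf-8'):
    """Decode at the input boundary, encode again at the output boundary."""
    text = incoming.decode(encoding)   # bytes -> text as the data enters
    result = text.upper()              # all processing happens on text
    return result.encode(encoding)     # text -> bytes as the data leaves


incoming = 'генских'.encode('utf-8')   # pretend this came from a file or socket
outgoing = handle(incoming)

print(type(outgoing).__name__)         # bytes
print(outgoing.decode('utf-8'))        # ГЕНСКИХ
```

If the encoding passed in does not match the real encoding of the data,
you get either a UnicodeDecodeError or silently wrong characters, which is
the "other direction" of the same trouble.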

Ciao,
	Marc 'BlackJack' Rintsch