unicode experiments + questions

Wed Mar 27 17:52:20 EST 2002

Irmen de Jong wrote:
> So I'm using the unicode escape char syntax, but that is cumbersome
> (where do I look up all my special characters?) and hard on the eyes.

Python 2.2 does not attempt to interpret bytes in string literals. So
you can embed non-ASCII characters in them if you want. You can look up
Unicode ordinals at http://www.unicode.org/charts/. Or you can just use
this code:

>>> ord('<your character here>'.decode('<your encoding here>'))
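
A slightly fuller sketch of the same lookup, written for a current
Python 3 interpreter (an assumption; this thread is about Python 2.2,
where the 8-bit string type plays the role of bytes). The byte value
0x80 and the cp1252 encoding are only illustrative choices:

```python
import unicodedata

# Assumption: the character arrives as a byte in a known encoding.
raw = b"\x80"                  # the Euro sign as stored in cp1252
ch = raw.decode("cp1252")      # bytes -> one Unicode character
ordinal = ord(ch)              # the Unicode ordinal (code point)

print(hex(ordinal))            # 0x20ac
print(unicodedata.name(ch))    # EURO SIGN
```

The hex value is what goes after \u in a unicode escape.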

> I also have the following question:
> what exactly happens when I type  "print u" in Python, where u
> is a unicode string? for example;
> 
> >>> e=u'\u20ac'
> >>> e
> u'\u20ac'
> >>> print e
> €    (<--- this is an Euro symbol on my screen)

Is this what actually happens? Did you change your default encoding to
be something other than ASCII, possibly by modifying site.py?

> What charset does the print convert to?
> I'm on Win2000, so when I type
> >>> print e.encode('cp1252')
> I get the Euro symbol. Does print automatically convert to the windows
> charset cp1252?

Given an 8-bit string, as in your example, print does not do any
conversion; it just sends the bytes to the output stream.

> How does Python know this charset, because in my site.py
> encoding="iso-8859-15".

Ahah! You did some customization. I think that it is better style to
leave the default encoding as ASCII and call encode/unicode explicitly.
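
As a rough illustration of the explicit style, here is a minimal sketch
for a current Python 3 interpreter (an assumption; in Python 2.2 the
equivalent calls are unicode() and .encode()). The choice of cp1252 as
the output encoding is hypothetical:

```python
e = "\u20ac"  # the Euro sign as a Unicode string

# ASCII cannot represent the Euro sign, so an ASCII conversion fails
# loudly instead of silently producing the wrong bytes.
try:
    e.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Convert explicitly at the boundary, naming the encoding each time.
data = e.encode("cp1252")          # Unicode -> bytes for output
restored = data.decode("cp1252")   # bytes -> Unicode on input

print(ascii_ok, data, restored == e)
```

With the encoding named at every call, the code does not depend on how
a particular machine's site.py was edited.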

> When I type
> >>> print e.encode('iso-8859-15')
> I don't see the Euro symbol, but some other weird symbol.

As you should, if your system does not use ISO-8859-15.
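
To see why, compare the bytes each encoding produces. This is a sketch
for a current Python 3 interpreter, and it assumes the display
interprets bytes as cp1252, which is what the Win2000 description
above suggests but is not confirmed:

```python
import unicodedata

e = "\u20ac"

as_cp1252 = e.encode("cp1252")       # one byte: 0x80
as_latin9 = e.encode("iso-8859-15")  # one byte: 0xA4

# Assumption: the display decodes whatever bytes it receives as cp1252.
# The ISO-8859-15 byte then shows up as a different character.
misread = as_latin9.decode("cp1252")

print(as_cp1252, as_latin9)
print(unicodedata.name(misread))     # CURRENCY SIGN
```

The same character maps to a different byte in each encoding, so the
bytes only look right when they match what the display expects.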

>
> For your interest, below is the test program I'm using to generate
> different encoded documents.
> Interestingly enough, UTF-7 is not understood by Opera 6. IE 5 and
> Mozilla get it right.

I've never seen an HTML document encoded in UTF-7, so this doesn't
surprise me.

Cheers,
Brian