Encoding Questions

Tue Apr 19 14:38:07 EDT 2005

<jalil at feghhi.com> schrieb im Newsbeitrag 
news:1113934291.419258.314390 at o13g2000cwo.googlegroups.com...
| 1. I download a page in python using urllib and now want to convert and
| keep it as utf-8? I already know the original encoding of the page.
| What calls should I make to convert the encoding of the page to utf8?
| For example, let's say the page is encoded in gb2312 (simple chinese)
| and I want to keep it in utf-8?

Something like:

utf8_s = s.decode('gb2312').encode('utf-8')

- with s being the simplified chinese string - should work.

|
| 2. Is this a good approach? Can I keep any pages in any languages in
| this way and return them when requested using utf-8 encoding?
|
| 3. Does python 2.4 support all encodings?

See http://docs.python.org/lib/standard-encodings.html for an overview.

|
| By the way, I have set my default encoding in Python to utf8.
|

Why would you want to do that?

--

Vincent Wehren

|
| I appreciate any help.
|
| -JF
|