Totally confused by Python's string thing.

Doru-Catalin Togea doru-cat at ifi.uio.no
Mon Dec 16 11:34:30 EST 2002


Hi!

I am doing basic string manipulation with ActivePython 2.2 on Win2000 Pro.

getdefaultlocale() returns: "('no_NO', 'cp1252')"
getlocale() returns: "['Norwegian_Norway', '1252']"

When trying
	locale.setlocale(locale.LC_ALL, 'latin-1')
I get
	locale.Error: locale setting not supported

I am totally confused, because calling doEncode, which is defined as
follows,

def doEncode(str):
	strCopy = str.encode('latin-1')

	for tag in myTags:
		strCopy = string.replace(strCopy, tag[0], tag[1])

	return strCopy

crashes when 'str' contains Norwegian letters (åøæÅØÆ), with the following
error message:
...
File ... , line 53, in doEncode
   strCopy = str.encode('latin-1')
UnicodeError: ASCII decoding error: ordinal not in range(128)
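
A minimal sketch of what appears to go wrong, assuming 'str' arrives as a
plain byte string rather than a Unicode object: calling encode on a byte
string seems to decode it with the default ASCII codec first, and that
hidden step is what rejects bytes above 127. The helper name
encode_bytes_like_py2 and its parameters are hypothetical, used only to
spell that hidden step out.

```python
def encode_bytes_like_py2(raw, target='latin-1', source='ascii'):
    # Spell out the two steps explicitly: decode the raw bytes with
    # the source codec, then encode the result with the target codec.
    return raw.decode(source).encode(target)

# Raw Latin-1 bytes for a-ring, o-slash and ae (0xE5, 0xF8, 0xE6).
raw = u'\xe5\xf8\xe6'.encode('latin-1')

try:
    encode_bytes_like_py2(raw)  # source defaults to ASCII
    failed = False
except UnicodeError:
    # Bytes above 127 are not valid ASCII, so the decode step fails.
    failed = True

# Naming the real source encoding makes the same call succeed.
fixed = encode_bytes_like_py2(raw, source='cp1252')
```

If this reading is right, the fix is to decode with the encoding the data
is really in before encoding, or to work with Unicode objects from the
start.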

Can you help me understand how python deals with strings?

1) According to
http://www.cl.cam.ac.uk/~mgk25/ucs/CP1252.html, code page 1252
extends ISO 8859-1. ISO 8859-1 already contains the Norwegian
characters, at least according to
http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html
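
As a quick check of that claim, the sketch below compares the bytes the
two codecs produce for the six Norwegian letters, and uses the euro sign
as one example of a character that only code page 1252 can encode. The
variable names are illustrative only.

```python
# a-ring, o-slash, ae, then their upper-case forms, as Unicode escapes.
letters = u'\xe5\xf8\xe6\xc5\xd8\xc6'

# Both codecs should map these six letters to the same six bytes.
same_bytes = letters.encode('cp1252') == letters.encode('latin-1')

# The euro sign sits at byte 0x80 in cp1252 and has no place in Latin-1.
euro = u'\u20ac'
euro_in_cp1252 = euro.encode('cp1252')
try:
    euro.encode('latin-1')
    euro_in_latin1 = True
except UnicodeError:
    euro_in_latin1 = False
```

So for these letters the choice between 'cp1252' and 'latin-1' should not
matter; the two only differ in the byte range 0x80 to 0x9F.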

So what is my problem, actually?

2) How do I set up my system to deal correctly and robustly with the ISO
8859-1 character set? How about the ISO 8859-2 character set?

3) Is there any INTRODUCTORY documentation about how Python handles
strings internally?

One last curiosity and its mandatory question:

4) What kind of string objects does PyXML employ, since I can parse XML
with Norwegian content and call doEncode on strings returned from my XML
file, without any Unicode crash?

Thank you, if you can help.
Catalin



	<<<< ================================== >>>>
	<<     We are what we repeatedly do.      >>
	<<  Excellence, therefore, is not an act  >>
	<<             but a habit.               >>
	<<<< ================================== >>>>




