this string, what encoding?
Anonymous User
nospam at home.com
Fri Aug 10 14:39:12 EDT 2001
Hi, is there a way to find out what encoding a string is in? (This
question--heck, this entire post--may not make any sense, in which case it
merely reveals my ignorance.)
Here's why I'm driven to ask this question:
import xml.dom.minidom
foo = '<foo/>'
fooDoc = xml.dom.minidom.parseString(foo)
fooXml = fooDoc.toxml()
try:
    fooDoc2 = xml.dom.minidom.parseString(fooXml)
except TypeError:
    print 'Round-tripping failed.'
The reason this fails is that cStringIO, which is used by
xml.dom.pulldom.parseString, rejects unicode strings, as noted here:
http://sourceforge.net/tracker/index.php?func=detail&aid=216388&group_id=5470&atid=105470
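cStringIO and the separate `unicode` type belong to Python 2. As a rough analogue, assuming Python 3 (where text is `str` and raw data is `bytes`), `io.BytesIO` behaves the same way: it accepts a byte string as its initial value and raises TypeError for text. `accepts` is a hypothetical helper used only for illustration.

```python
import io


def accepts(data):
    """Return True if io.BytesIO accepts `data` as its initial value."""
    try:
        io.BytesIO(data)
    except TypeError:
        # Raised when `data` is text (str) rather than bytes.
        return False
    return True


print(accepts(b'<foo/>'))  # byte string: accepted
print(accepts(u'<foo/>'))  # text string: rejected
```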
So I figured I'd work around this by adding the following modification to
pulldom.py:
def parseString(string, parser=None):
    try:
        from cStringIO import StringIO
        # <fixme type="workaround">
        # this is a temporary workaround since cStringIO doesn't accept
        # unicode input as noted here:
        # http://sourceforge.net/tracker/index.php?func=detail&aid=216388&group_id=5470&atid=105470
        string = string.encode('utf-8')
        # </fixme>
    except ImportError:
        from StringIO import StringIO
Rather than just always encoding the string, I'd like to be able to do
something like this (pseudocode):
# if the string is not UTF-8, make it so
if string.encoding != 'utf-8':
    string = string.encode('utf-8')
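Neither Python 2 byte strings nor unicode objects have an `encoding` attribute, so the `string.encoding` in the pseudocode has nothing to read. What can be tested is the object's type: a unicode (text) object must be encoded before cStringIO will take it, while a byte string is already bytes in some encoding that the object itself does not record. A minimal sketch, assuming Python 2.6+ or Python 3, where `bytes` names the byte-string type (an alias of `str` on 2.6/2.7); `to_utf8_bytes` is a hypothetical name.

```python
def to_utf8_bytes(s):
    """Return `s` as a UTF-8 byte string, encoding only if it is text.

    Hypothetical helper. Assumes Python 2.6+ or 3, where `bytes` is the
    byte-string type (an alias of `str` on Python 2.6/2.7).
    """
    if isinstance(s, bytes):
        # Already bytes: pass through unchanged. The object does not
        # record which encoding produced it, so nothing is re-encoded.
        return s
    # Text (unicode on Python 2, str on Python 3): encode to UTF-8.
    return s.encode('utf-8')


print(repr(to_utf8_bytes(u'<foo/>')))
print(repr(to_utf8_bytes(b'<foo/>')))
```

Passing byte strings through also sidesteps a Python 2 pitfall in the unconditional `string.encode('utf-8')` above: calling `.encode()` on a byte string first decodes it implicitly as ASCII, which raises an error when the bytes are not ASCII.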
Thanks,
// mark
More information about the Python-list
mailing list