encoding ascii data for xml

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Sat Oct 4 02:59:20 EDT 2008


On Fri, 03 Oct 2008 14:41:13 -0700, harrelson wrote:

> import xml.dom.minidom
> print chr(3).encode('utf-8')
> dom = xml.dom.minidom.parseString( "<test>%s</test>" %
> chr(3).encode('utf-8') )
> 
> chr(3) is the ascii character for "end of line".  I would think that
> trying to encode this to utf-8 would fail but it doesn't-- I don't get a
> failure till we get into xml land and the parser complains.  My question
> is why doesn't encode() blow up?  It seems to me that encode() shouldn't
> output anything that parseString() can't handle.

It's not a problem with encode() IMHO but with XML, because XML 1.0 can't 
represent every ASCII character.  Of the codes below 32 only tab (9), line 
feed (10) and carriage return (13) are allowed; parsers reject all the 
others.  BTW `chr(3)` isn't "end of line" but "end of text" (ETX).
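
A rough sketch of that check, written for Python 3 (the quoted snippet is 
Python 2) and assuming the character ranges from the XML 1.0 spec.  The 
names `is_xml_safe` and `_ILLEGAL_XML_CHARS` are made up for illustration 
and are not part of any library:

```python
import re

# Anything outside the XML 1.0 "Char" ranges: tab, LF, CR,
# U+0020..U+D7FF, U+E000..U+FFFD and U+10000..U+10FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_safe(text):
    """Return True if every character in `text` may appear in XML 1.0."""
    return _ILLEGAL_XML_CHARS.search(text) is None


# Encoding succeeds because ETX is a perfectly valid code point ...
encoded = chr(3).encode("utf-8")

# ... it just isn't a character XML 1.0 allows.
etx_ok = is_xml_safe(chr(3))
plain_ok = is_xml_safe("plain text\n")
```

Here `etx_ok` should come out false and `plain_ok` true, which is the gap 
between what encode() accepts and what the parser accepts.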

If you want to be sure that an arbitrary string can be embedded into XML 
you'll have to encode it as base64 or something similar.

Ciao,
	Marc 'BlackJack' Rintsch


