[XML-SIG] Re: tabs inside attribute values removed

Luke Kenneth Casson Leighton lkcl@samba-tng.org
Tue, 13 Mar 2001 00:15:46 +1100


On Fri, 9 Mar 2001, Jeremy Kloth wrote:

> 
> 
> > i am having to pre-process all text, substituting
> > &#x09; for "\t" as a work-around for this problem.
> > 
> > if this is not performed, then all tabs inside
> > attribute values, e.g.
> > <node attr="value\tsep\tby\ttabs"/>, are turned into
> > spaces.
> 
> Using PyXML 0.6.4, I didn't see this behavior.
> 
> from xml.dom.ext.reader import Sax2
> doc = Sax2.FromXml('<element attr="a&#x09;tab"/>')
> attr = doc.documentElement.attributes.item(0)
> print repr(attr.value)
> 'a\011tab'

it's the other way round: the problem is a literal tab in the source,
not the &#x09; character reference [and this was with 0.6.2]

doc = Sax2.FromXml('<element attr="a\011tab"/>')
attr = doc.documentElement.attributes['','attr'].value

and should i be using doc.documentElement.attributes['ns','name'].value?
is that okay?

[just checked this]

it still doesn't work: the tab still comes back as a space, with 0.6.4
as well.
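
The difference between the two cases can be reproduced with a short script. This is only a sketch: it assumes the standard library's xml.dom.minidom stands in for PyXML's Sax2 reader, and the helper name attr_value is made up for illustration. As far as the XML 1.0 rules on attribute-value normalization go, a parser is expected to turn a literal tab inside an attribute value into a space, while a tab written as a character reference survives.

```python
# Sketch using the standard library's xml.dom.minidom, not PyXML's Sax2 reader.
from xml.dom import minidom


def attr_value(xml_text):
    """Parse xml_text and return the 'attr' value of the root element."""
    doc = minidom.parseString(xml_text)
    return doc.documentElement.getAttribute("attr")


# A literal tab character inside the attribute value (what "\011" produces
# in a Python string literal).
literal = attr_value('<element attr="a\011tab"/>')

# The same tab written as an XML character reference.
referenced = attr_value('<element attr="a&#x09;tab"/>')

print(repr(literal))
print(repr(referenced))
```

If the parser normalizes as described, the first value contains a space and the second a real tab, which would match both Jeremy's result and the behaviour reported here.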

so, yes: i have to pre-process all text, replacing \t with &#x09;, which
is _not_ something i want to have to leave in the code long-term, as you
might imagine!
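
One way the substitution might be confined to attribute values is to escape each value at the point where the XML is written, rather than running a blind replace over the whole document (which would also hit tabs used as whitespace between attributes, where a character reference is not allowed). The sketch below assumes the XML is generated by code under your control; escape_attr and ATTR_ESCAPES are hypothetical names, and the round trip is checked with xml.dom.minidom rather than PyXML.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# Hypothetical table: characters that attribute-value normalization would
# otherwise turn into spaces, plus the double quote used as the delimiter.
ATTR_ESCAPES = {'"': "&quot;", "\t": "&#x09;", "\n": "&#x0A;", "\r": "&#x0D;"}


def escape_attr(value):
    """Escape one attribute value so its whitespace survives parsing."""
    # escape() handles &, < and > first, then applies the extra table.
    return escape(value, ATTR_ESCAPES)


value = "value\tsep\tby\ttabs"
xml_text = '<node attr="%s"/>' % escape_attr(value)

# Round trip: the parsed attribute should equal the original string.
parsed = minidom.parseString(xml_text).documentElement.getAttribute("attr")
print(parsed == value)
```

This keeps the work-around in one small function on the writing side, so it is at least easy to find and remove later.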

some of the documents i am parsing are over 2.5MB in size, and other
people may well have larger ones (see http://sourceforge.net/projects/pyxsmqll).

yes, i know: i need to move to a SAX model, not a DOM one.  first
implementation, and all that :)
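
For what a SAX-style version might look like, here is a minimal sketch using the standard library's xml.sax instead of PyXML. AttrCollector is a hypothetical handler name and the sample document is invented. The handler sees each element as it streams past, so the whole tree never has to be held in memory. Note that moving to SAX is about memory use only: the same attribute-value normalization applies, so literal tabs would presumably still arrive as spaces.

```python
import xml.sax


class AttrCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records (element name, attributes) as they stream past."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict so they can be kept
        # after this callback returns.
        self.seen.append((name, dict(attrs.items())))


handler = AttrCollector()
xml.sax.parseString(
    b'<root><node attr="a&#x09;tab"/><node attr="x"/></root>', handler
)
print(handler.seen)
```

A real handler would process each element inside startElement and discard it, rather than accumulating a list as this demonstration does.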

all best,

luke

 ----- Luke Kenneth Casson Leighton <lkcl@samba-tng.org> -----

"i want a world of dreams, run by near-sighted visionaries"
"good.  that's them sorted out.  now, on _this_ world..."