[XML-SIG] (Py)DOM: Character References

Carsten Oberscheid co@daisybytes.su.uunet.de
Thu, 18 Mar 1999 17:31:45 +0100


>
> Carsten Oberscheid writes:
>  > Can anybody tell why character references are not modeled explicitly in
>  > the
>  > DOM? In XML they have their own identity, explicitly distinct from entity
>  >
>
> Carsten,
>   Good question.  I don't know why character references need explicit
> nodes in the DOM; I'm not terribly interested in knowing that
> something was encoded as "&#43;" or "&#x2B;".

OK, since charrefs only encode characters from the document's base character
set (Unicode for XML, ASCII for SGML -- is that right?), creating a distinct
DOM node for each charref would be unnecessary overhead. Forget that, I
should have thought before I wrote...
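
A minimal sketch of this point, assuming the standard-library xml.dom.minidom
of a current Python rather than the xml.dom.utils reader used in this thread
(the helper name text_of is made up for illustration): decimal and hexadecimal
character references and the literal character all end up as the same text in
the DOM, so nothing is lost by not giving them their own node type.

```python
from xml.dom import minidom


def text_of(xml_source):
    """Parse xml_source and return the text content of the root element.

    Hypothetical helper for this sketch; not part of any DOM package.
    """
    doc = minidom.parseString(xml_source)
    return "".join(
        node.data
        for node in doc.documentElement.childNodes
        if node.nodeType == node.TEXT_NODE
    )


# The same character written three ways in the source document.
decimal_ref = text_of("<thing>&#43;</thing>")
hex_ref = text_of("<thing>&#x2B;</thing>")
literal = text_of("<thing>+</thing>")

# After parsing, the three spellings are indistinguishable.
print(decimal_ref == hex_ref == literal)
```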

>                                          I would like to be able to
> have this:
>
> <!DOCTYPE thing>
> <thing>&foo;</thing>
>
> provide a reference to &foo; as a child of the <thing> node.  Here's
> what I get now:
>
> >>> buffer = '<!DOCTYPE thing>\n<thing>&foo;</thing>'
> >>> import xml.dom.utils
> >>> reader = xml.dom.utils.FileReader()
> >>> import cStringIO
> >>> sio = cStringIO.StringIO(buffer)
> >>> dom = reader.readStream(sio)
> >>> dom.documentElement
> <Element 'thing'>
> >>> len(dom.documentElement.childNodes)
> 0

That's OK, unless you have a DTD for doctype "thing" which declares "&foo;".
Without such a declaration, only the predefined entities (&amp;, &lt;, &gt;,
&apos;, &quot;) are allowed in well-formed XML. Replace &foo; by &amp; and it
works.
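
The three cases can be sketched as follows, again assuming the
standard-library xml.dom.minidom instead of the reader shown above (the
helper child_text and the replacement text "bar" are invented for the
example). Note one difference in behaviour: minidom rejects the undeclared
&foo; with a parse error instead of returning an element without children.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def child_text(xml_source):
    """Return the root element's text, or None if the document is rejected.

    Hypothetical helper for this sketch.
    """
    try:
        doc = minidom.parseString(xml_source)
    except ExpatError:
        # Raised here for a reference to an entity that was never declared.
        return None
    return "".join(
        node.data
        for node in doc.documentElement.childNodes
        if node.nodeType == node.TEXT_NODE
    )


# &foo; is not declared anywhere: not well-formed.
undeclared = child_text('<!DOCTYPE thing>\n<thing>&foo;</thing>')

# &amp; is predefined, so no declaration is needed.
predefined = child_text('<!DOCTYPE thing>\n<thing>&amp;</thing>')

# &foo; declared in the internal DTD subset: expanded to its replacement text.
declared = child_text(
    '<!DOCTYPE thing [<!ENTITY foo "bar">]>\n<thing>&foo;</thing>'
)

print(undeclared, predefined, declared)
```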

>
> And here's a bug ;-) :
>
> >>> dom.documentElement.childNodes
> <NodeList]>

I'm not sure, but this could be caused by the last line of 
xml.dom.core.SingleParentNodeList.__repr__(). I guess "-2" should be "-1"...

>
>   -Fred
>

.co.

+------------------------------------------------------- daisy bytes! --------+
 Carsten Oberscheid
 co@daisybytes.su.uunet.de                        digital document processing
 http://www.pweb.de/daisybytes.su                     electronic publishing