[XML-SIG] DOM seems incomplete

Markus Jostock markus.jostock at softwareag.com
Tue Aug 10 14:17:55 CEST 2004


Hi

Thanks for the hint, but stripping whitespace does not seem to help:

Accessing a child node still raises an exception because the child 
does not exist (i.e. firstChild is None).

print doc.firstChild.firstChild.nodeName
causes an exception:
Traceback (most recent call last):
  File "TUsecaseCreateEmptyDoc.py", line 54, in test01
    print structure.firstChild.firstChild.nodeName
AttributeError: 'NoneType' object has no attribute 'nodeName'
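
A minimal sketch for narrowing this down, assuming the standard library's
xml.dom.minidom in place of the PyXML Sax2 reader used in the quoted
message (a substitution, not the code from the test case). It parses the
same string, lists every node with its type, and guards the firstChild
access so a missing child yields None instead of an exception. The helper
name dump is hypothetical.

```python
from xml.dom import minidom, Node

# Same document as in the quoted message, joined into one literal.
XML = ('<MYXML><DOCUMENT><DOCAT INFO="" STATUS="PRV">'
       '<DOCAT.HEAD.LK><LINK DOC="!NEW!" /></DOCAT.HEAD.LK>'
       '<RESAT.LK><LINK DOC="!NEW!" /></RESAT.LK>'
       '</DOCAT></DOCUMENT></MYXML>')


def dump(node, depth=0):
    """Return one line per node: indentation, node kind, node name."""
    kind = 'element' if node.nodeType == Node.ELEMENT_NODE else 'other'
    lines = ['%s%s %s' % ('  ' * depth, kind, node.nodeName)]
    for child in node.childNodes:
        lines.extend(dump(child, depth + 1))
    return lines


doc = minidom.parseString(XML)
root = doc.documentElement
tree = dump(root)
print('\n'.join(tree))

# Guarded access: firstChild is None when a node has no children.
first = root.firstChild
first_name = first.nodeName if first is not None else None
print(first_name)
```

If the same dump on the Sax2-built document shows no children under
MYXML, the difference lies in how that document was built rather than in
whitespace handling.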

Kind regards

    Markus

Alexandre CONRAD wrote:

> Markus Jostock wrote:
>
>> Hi
>>
>> I am parsing a string into a DOM. That works without problems. But 
>> when I want to access children of the first element, there seem to be 
>> none. But pretty printing shows them.
>>
>> Maybe you have an idea what might be going wrong?
>>
>> Thanks in advance for some clues.
>>
>> Kind regards
>>    Markus
>>
>>
>> The string I parse:
>> string = '<MYXML><DOCUMENT><DOCAT INFO="" 
>> STATUS="PRV"><DOCAT.HEAD.LK><LINK DOC="!NEW!" 
>> /></DOCAT.HEAD.LK><RESAT.LK><LINK DOC="!NEW!" 
>> /></RESAT.LK></DOCAT></DOCUMENT></MYXML>'
>>
>> Parsing works without errors:
>>    from xml.dom.ext.reader import Sax2
>>    reader = Sax2.Reader()
>>    doc = reader.fromString(string)
>>
>> When I pretty print it, it looks ok:
>>    from xml.dom.ext import PrettyPrint
>>    PrettyPrint(doc)
>> prints:
>> <?xml version='1.0' encoding='UTF-8'?>
>> <MYXML>
>>    <DOCUMENT>
>>        <DOCAT INFO='' STATUS='PRV'>
>>            <DOCAT.HEAD.LK>
>>                <LINK DOC='!NEW!'/>
>>            </DOCAT.HEAD.LK>
>>            <RESAT.LK>
>>                <LINK DOC='!NEW!'/>
>>            </RESAT.LK>
>>        </DOCAT>
>>    </DOCUMENT>
>> </MYXML>
>>
>> Accessing doc.firstChild is ok:
>> print doc.firstChild.nodeName  prints MYXML
>>
>> But if I want to access further children of <MYXML>, there are none:
>> print doc.firstChild.nodeList prints <NodeList at c43968: []> or
>> print doc.firstChild.firstChild prints None
>>
>> Where are my children gone?
>
>
> Because the document is pretty-printed, the parser reads the newlines and
> whitespace (indentation) as text nodes. Try
>
> 'print doc.firstChild.firstChild.firstChild'. You should find your node
> (I think; maybe you'll have to add one more firstChild).
>
> In my case, I want to keep the XML file pretty-printed, so I parse the
> pretty-printed file and strip out the newlines and whitespace before I
> do anything with it:
>
> import xml.dom.ext
> from xml.dom.ext.reader import Sax2
>
> def openDoc(self, xml_file):
>     # Create Reader object
>     reader = Sax2.Reader()
>     # Parse the document
>     doc = reader.fromStream(xml_file)
>     # Strip out white spaces from doc
>     xml.dom.ext.StripXml(doc)
>     return doc
>
> Now, I can play around with my 'doc' without worrying about whitespaces.
> When I write it back on disk, I pretty print it again :
>
> def write_xml(self, doc, xml_file):
>     # Open XML file in write mode
>     f = open(xml_file, "w")
>     # PrettyPrint writes to the stream it is given and returns nothing
>     xml.dom.ext.PrettyPrint(doc, f)
>     # Close file
>     f.close()
>
> Regards,
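
The strip-then-navigate approach described above can be sketched with the
standard library as well. This assumes xml.dom.minidom rather than PyXML,
and strip_whitespace is a hypothetical helper standing in for
xml.dom.ext.StripXml; it is not that function's implementation.

```python
from xml.dom import minidom, Node


def strip_whitespace(node):
    """Remove whitespace-only text nodes below node, in place."""
    # Iterate over a copy because children are removed during the loop.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace(child)
    return node


pretty = """<MYXML>
    <DOCUMENT>
        <DOCAT INFO="" STATUS="PRV"/>
    </DOCUMENT>
</MYXML>"""

doc = minidom.parseString(pretty)
# Before stripping, the first child of MYXML is the indentation text node.
before = doc.documentElement.firstChild.nodeType
strip_whitespace(doc)
# After stripping, the first child is the DOCUMENT element.
after = doc.documentElement.firstChild.nodeName
print(before == Node.TEXT_NODE, after)
```

Text nodes that contain anything other than whitespace are left alone, so
real character data survives the pass.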



