xml.dom.minidom.parse() splitting text nodes?

Gillou nospam at bigfoot.com
Fri Jan 17 11:55:31 EST 2003


Look at the methods of the DOM Node in the Python documentation (copied below):
  normalize()
  Join adjacent text nodes so that all stretches of text are stored as single Text instances. This simplifies processing text from a DOM tree for many applications. New in version 2.1.
A DOM Document inherits from Node, so you can call normalize() on the document returned by parse() to merge the split text nodes.
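
A minimal sketch of what normalize() does, under some assumptions: the split is simulated by appending a second text node by hand (a given parser version may or may not split this particular input), and the tag is a plain Footer without the C: namespace prefix so the snippet stays self-contained.

```python
from xml.dom import minidom

# Parse a small document; the element starts with a single text child.
doc = minidom.parseString("<Footer>This space provided </Footer>")
footer = doc.documentElement

# Simulate the split: append a second, adjacent text node by hand.
footer.appendChild(doc.createTextNode("for legal clarification"))
before = len(footer.childNodes)   # two adjacent Text nodes

# normalize() is inherited from Node, so it can be called on the Document.
doc.normalize()
after = len(footer.childNodes)    # merged into a single Text node
text = footer.firstChild.data

print(before, after, repr(text))
```

If you would rather not modify the tree, joining the data of every child whose nodeType is TEXT_NODE gives the same string.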

HTH

--Gilles

<hawkeye.parker at autodesk.com> wrote in message news: mailman.1042760131.22057.python-list at python.org...
  i'm running into an odd issue parsing large xml files.  it appears that minidom is arbitrarily splitting some TEXT_NODEs into pieces.  for example, the file in question contains a number of these tags:

  <C:Footer>This space provided for legal clarification of contract issues as defined by the project participants prior to project initiation. The content herein is determined withing the General Tab of the Log Properties dialogue box</C:Footer>

  the parser correctly parses the C:Footer tag into a dom element, but for some reason *periodically* splits the child node into *two* text nodes.  i can find no rhyme or reason to the splitting, though it is consistent for a given file; i.e., it always splits the same nodes in the same place.

  has anyone else run across this issue?  can you explain it?