xml input sanitizing method in standard lib?

Terry Reedy tjreedy at udel.edu
Mon Mar 9 16:52:36 EDT 2009


Petr Muller wrote:
> Hi,

> Thanks for response and sorry for I wasn't clear first time. I have a
> heap of data (logs), from which I build a XML document using
> xml.dom.minidom. In this data, some xml invalid characters may occur -
> form feed (\x0c) character is one example. 

Is this a hypothetical problem or do you actually have such chars?
If so, are they random hiccups from sick loggers, storage or 
transmission errors, or informative markers intentionally inserted?
When you find them, do you want to silently ignore, ignore but raise a 
flag (metalog the log error), or act on them as part of the 
parsing/structuring process?

Silently deleting chars is easy with str.translate.

> I don't know what else is illegal in xml, so I've searched if there's
> some method how to prepare strings for insertion to a xml doc before I
> start research on a xml spec and write such function on my own.




More information about the Python-list mailing list