SOT : & in XML-documents
Thomas Weholt
2002 at weholt.org
Tue Oct 8 15:49:13 EDT 2002
Damn!! The original data is full of it. My solution so far has been to
keep a list of characters to replace (so far only '&', which is
replaced with 'and').
Are there any other characters I must avoid/replace?
Thanks for your help.
Best regards,
Thomas
"Henrik Motakef" <henrik.motakef at web.de> wrote in message
news:87bs64903u.fsf at pokey.henrik-motakef.de...
> "Thomas Weholt" <2002 at weholt.org> writes:
>
> > I'm trying to parse an old fileformat into xml. The problem is that the
> > character & appears from time to time in the original file.
> [...]
> > Anybody got any clues on how to avoid problems with characters like this?
>
> Don't use them ;-) Or, better, properly escape them as &amp;. This is
> not an issue of the charset, so no XML declaration will save you.
>
> If you are dealing with HTML, you could use tidy (google will find it
> for you) to create well-formed XML. IIRC there is also a shareware
> program that tries to clean up broken XML regardless of its document
> type, probably called "XML tidy" or some such.
>
> Good luck
> Henrik
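
For reference, a minimal sketch of the escaping suggested above, using
xml.sax.saxutils from the Python standard library. In element text only
'&' and '<' strictly have to be escaped; escape() also handles '>', and
quoteattr() takes care of quotes in attribute values. The helper
to_element() and the sample data are made up for illustration and are
not from the original posts.

```python
from xml.sax.saxutils import escape, quoteattr


def to_element(tag, text):
    """Hypothetical helper: wrap raw text in an element, escaping markup."""
    # escape() replaces &, < and > with &amp;, &lt; and &gt;
    return "<%s>%s</%s>" % (tag, escape(text), tag)


# Illustrative data standing in for a line from the old file format.
raw = "Tom & Jerry <1940>"
print(to_element("title", raw))

# quoteattr() escapes the value and adds the surrounding quotes itself.
print("<film name=%s/>" % quoteattr("Tom & Jerry"))
```

This keeps the '&' in the data rather than rewriting it as 'and', so
the original text comes back unchanged when the XML is parsed again.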