problem parsing utf-8 encoded xml - minidom
"Martin v. Löwis"
martin at v.loewis.de
Fri Jul 4 02:36:56 EDT 2008
> The parser is failing on this line:
>
> <mrcb245-c>Heinrich Kèufner, Norbert Nedopil, Heinz Schèoch (Hrsg.).</
> mrcb245-c>
If it is literally this line, it's no surprise: there must not be a line
break between the slash and the closing element name.
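
To illustrate the point about the closing tag, here is a minimal sketch in Python 3 (the original report predates it, so treat this as an assumption about your setup). The wrapper element `r` and the helper `is_well_formed` are made up for the example; only the `mrcb245-c` element comes from your data.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Closing tag written on one line: well-formed.
good = b"<r><mrcb245-c>text</mrcb245-c></r>"
# Line break between "</" and the element name: not well-formed.
bad = b"<r><mrcb245-c>text</\nmrcb245-c></r>"


def is_well_formed(data):
    """Hypothetical helper: True if minidom can parse the bytes."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


good_ok = is_well_formed(good)
bad_ok = is_well_formed(bad)
```

Whitespace is allowed after the element name and before the closing ">", just not between the slash and the name.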
However, since you are getting the error in a different column, it's
indeed more likely that there is a problem with the encoding.
Since the Python UTF-8 codec rejects the data, it is most likely
*not* encoded in UTF-8 (perhaps it is Latin-1 instead). If so, you
need to prefix the XML document with an XML declaration that names
the actual encoding, such as
<?xml version="1.0" encoding="iso-8859-1"?>
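
As a rough sketch of the effect (Python 3, with made-up sample data rather than your actual file): the same Latin-1 bytes are rejected without a declaration, because the parser then assumes UTF-8, and accepted once the declaration names the encoding. The `name` element and its content are assumptions for the example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Sample document body encoded in Latin-1; "\u00f6" is o-umlaut,
# which becomes the single byte 0xF6 (not valid UTF-8 on its own).
raw = "<name>L\u00f6wis</name>".encode("iso-8859-1")

# Without a declaration the parser assumes UTF-8 and rejects the data.
try:
    minidom.parseString(raw)
    failed_without_declaration = False
except ExpatError:
    failed_without_declaration = True

# With the declaration prefixed, the same bytes parse correctly.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>\n' + raw
doc = minidom.parseString(declared)
text = doc.documentElement.firstChild.data
```

This only helps if the data really is Latin-1; declaring the wrong encoding will parse without error but may give you the wrong characters.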
Alternatively, make sure that the file is really encoded in UTF-8.
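
One way to check this, sketched in Python 3: try to decode the bytes as UTF-8, and transcode only if that fails. The function name `to_utf8` and the Latin-1 fallback are assumptions; Latin-1 accepts every byte sequence, so a wrong guess yields wrong characters rather than an error.

```python
def to_utf8(data, fallback="iso-8859-1"):
    """Return data as UTF-8 bytes.

    Bytes that already decode as UTF-8 are returned unchanged; anything
    else is assumed (hypothetically) to be in the fallback encoding.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode(fallback).encode("utf-8")


latin1_bytes = "L\u00f6wis".encode("iso-8859-1")
utf8_bytes = "L\u00f6wis".encode("utf-8")

converted = to_utf8(latin1_bytes)
unchanged = to_utf8(utf8_bytes)
```

If the file carries an XML declaration that names a different encoding, that declaration has to be updated as well after transcoding.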
Regards,
Martin