SAX-Parser entity
Bernhard Fisseni
bfisseni at gmx.de
Fri Mar 1 15:04:46 EST 2002
Hi, Fabian,
>
> Ahh, utf-16 sounds good.
> Thanks, I have at least one solution:
> Reading the xml-file into a buffer and convert it to utf-16.
> minidom is then able to parse the whole thing and saves the strings in
> unicode, which is fine again.
>
> I do not understand the part with "If your parser supports...". As it seems
> to me, the minidom default parser does not support ISO 8859/1 and even
> unicode makes problems only.
Had you declared <?xml version="1.0" encoding="UTF-8"?> in your file?
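To show why the declaration matters, here is a minimal sketch. It assumes a current Python 3 and its bundled expat parser rather than the Python 2.0 / _xmlplus setup from your traceback, and the sample document and variable names are made up for illustration:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: 0xFC and 0xDF are "ü" and "ß" in ISO 8859-1 (Latin-1).
body = b"<p>Gr\xfc\xdfe</p>"

# Without a declaration the parser assumes UTF-8, where 0xFC is not valid.
try:
    minidom.parseString(body)
    parsed_without_declaration = True
except ExpatError:
    parsed_without_declaration = False

# With a matching declaration the same bytes parse and come back as Unicode.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
text = minidom.parseString(declared).documentElement.firstChild.data
```

If the declaration says UTF-8 but the bytes are really Latin-1, the parser should fail in the same way as with no declaration at all.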
> I didn't know, you can reprogram the parser so easily.
When I wrote a programme using SAX, I ended up using
xml.sax.saxutils.escape():

    def characters(self, content):
        self._out.write(saxutils.escape(content, self.transhash))

where content is the character data and self.transhash looked like:

    transhash = { u'\u00c4' : '&Auml;',
                  u'\u00c6' : '&AElig;',
                }

I suppose this is not the most elegant way to do it, but it works.
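For completeness, a self-contained sketch of the same idea, written for a current Python 3 (an assumption; my original ran on an older Python). The class name EntityWriter, the sample input and the extra entry for ü are only illustrative:

```python
import io
import xml.sax
from xml.sax import saxutils

# Illustrative table: character -> HTML entity reference.
TRANSHASH = {
    "\u00c4": "&Auml;",
    "\u00c6": "&AElig;",
    "\u00fc": "&uuml;",  # hypothetical extra entry for this example
}

class EntityWriter(xml.sax.ContentHandler):
    """Writes character data to `out`, replacing characters via TRANSHASH."""

    def __init__(self, out):
        super().__init__()
        self._out = out

    def characters(self, content):
        # escape() first handles &, < and >, then applies the extra table.
        self._out.write(saxutils.escape(content, TRANSHASH))

out = io.StringIO()
sample = "<p>\u00c4rger &amp; M\u00fche</p>".encode("utf-8")
xml.sax.parseString(sample, EntityWriter(out))
result = out.getvalue()
```

Because escape() replaces &, < and > before it applies the table, the ampersands inside the entity references themselves are not escaped a second time.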
Regards,
Bernhard
> Harvey Thomas <hst at empolis.co.uk> wrote:
>> I would guess that your document is in ISO 8859/1 (otherwise known as
>> latin-1). XML parsers must be able to parse utf-8 and utf-16 and may
>> support other encodings. If your parser supports latin-1 then modify the
>> XML declaration. Otherwise use the codecs module.
>
>>> Problem:
>>> I try to use the minidom XML-Parser to parse my little file
>>> in order to generate HTML Code.
>>> Being German, I really like to use umlauts, but minidom does not.
>>> ...
>>> Traceback (most recent call last):
>>> "/usr/lib/python2.0/site-packages/_xmlplus/sax/handler.py",
>>> line 38, in fatalError
>>> raise exception
>>> xml.sax._exceptions.SAXParseException: <unknown>:29:19: not well-formed
>>>
>>> where character 19 in row 29 is the occurrence of an ü.
--
Bernhard Fisseni
Studi: Steinweg 32 -- 53121 Bonn-Endenich, D -- +49-228-6203949
zu hause: Ubierstrasse 8 -- 53498 Bad Breisig, D -- +49-2633-96333