SAX-Parser entity

Bernhard Fisseni bfisseni at gmx.de
Fri Mar 1 15:04:46 EST 2002


Hi, Fabian,
> 
> Ahh, utf-16 sounds good.
> Thanks, I have at least one solution:
> Read the XML file into a buffer and convert it to UTF-16.
> minidom is then able to parse the whole thing and saves the strings in
> unicode, which is fine again.
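
For the record, the buffer approach could look roughly like this in a current Python 3 (the thread itself is about Python 2.0). This is only a sketch: the sample bytes and the function name are made up, and it assumes the file is Latin-1 and carries no XML declaration naming a different encoding.

```python
from xml.dom import minidom

# Hypothetical sample: Latin-1 encoded bytes without an XML declaration.
latin1_bytes = "<doc>M\u00fcller</doc>".encode("iso-8859-1")

def parse_latin1_via_utf16(raw):
    """Decode Latin-1 bytes, re-encode as UTF-16 (with BOM), and parse."""
    text = raw.decode("iso-8859-1")
    # The BOM written by the "utf-16" codec lets the parser detect the encoding.
    return minidom.parseString(text.encode("utf-16"))

dom = parse_latin1_via_utf16(latin1_bytes)
name = dom.documentElement.firstChild.data  # text content as a str
```
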
> 
> I do not understand the part with "If your parser supports...". As it seems
> to me, the minidom default parser does not support ISO 8859/1, and even
> Unicode only causes problems.
Had you declared <?xml version="1.0" encoding="UTF-8"?>?
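
Whatever the declaration says has to match the bytes that are actually in the file. A small sketch of the difference, assuming a current Python 3 whose expat-based minidom accepts ISO-8859-1 (the sample document and the helper name parses are invented for illustration):

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical Latin-1 document body containing an umlaut.
body = "<doc>M\u00fcller</doc>".encode("iso-8859-1")

# The same body, preceded by a declaration that names the real encoding.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body

def parses(data):
    """Return True if minidom accepts the bytes, False on a parse error."""
    try:
        minidom.parseString(data).unlink()
        return True
    except ExpatError:
        return False

with_declaration = parses(declared)   # encoding is declared correctly
without_declaration = parses(body)    # parser falls back to UTF-8
```

Without a declaration the parser assumes UTF-8, and a lone Latin-1 ü byte is not valid UTF-8, which would explain a "not well-formed" error like the one in the traceback.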

> I didn't know you could reprogram the parser so easily.
When I wrote a program using SAX, I ended up using
xml.sax.saxutils.escape():

def characters(self, content):
    self._out.write(saxutils.escape(content, self.transhash))

where content is the character data and self.transhash looked like this:

transhash = { u'\u00c4' : '&Auml;',
              u'\u00c6' : '&AElig;',
            }

I suppose this is not the most elegant way to do it, but it works.
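
Put together, a handler along these lines might look as follows in a current Python 3. Treat it as a sketch rather than my original program: the class name EntityWriter, the convert helper and the sample input are made up, and the mapping assumes the dictionary values are HTML entity references.

```python
import io
import xml.sax
from xml.sax import saxutils

# Assumed mapping from characters to HTML entity references.
TRANSHASH = {
    "\u00c4": "&Auml;",
    "\u00c6": "&AElig;",
}

class EntityWriter(xml.sax.ContentHandler):
    """Writes character data to a stream, replacing mapped characters."""

    def __init__(self, out):
        super().__init__()
        self._out = out
        self.transhash = TRANSHASH

    def characters(self, content):
        # escape() first handles &, < and >, then applies the extra mapping.
        self._out.write(saxutils.escape(content, self.transhash))

def convert(xml_bytes):
    """Parse the document and return its escaped character data."""
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, EntityWriter(out))
    return out.getvalue()

sample = "<doc>\u00c4rger &amp; \u00c6ther</doc>".encode("utf-8")
result = convert(sample)
print(result)  # &Auml;rger &amp; &AElig;ther
```

The parser may deliver the text in several characters() calls, but since each chunk is escaped and written in order, the output is the same.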

Regards,
Bernhard

> Harvey Thomas <hst at empolis.co.uk> wrote:
>> I would guess that your document is in ISO 8859/1 (otherwise known as
>> latin-1). XML parsers must be able to parse utf-8 and utf-16 and may
>> support other encodings. If your parser supports latin-1 then modify the
>> XML declaration. Otherwise use the codecs module.
> 
>>> Problem:
>>> I try to use the minidom XML-Parser to parse my little file 
>>> in order to generate HTML Code.
>>> Being German, I really like to use umlauts, but minidom does not.
>>> ...
>>> Traceback (most recent call last):
>>> "/usr/lib/python2.0/site-packages/_xmlplus/sax/handler.py", 
>>> line 38, in fatalError
>>>     raise exception
>>> xml.sax._exceptions.SAXParseException: <unknown>:29:19: not  well-formed
>>> 
>>> where character 19 in row 29 is the occurrence of an ü.


-- 
Bernhard Fisseni
Studi:    Steinweg 32  --  53121 Bonn-Endenich, D  --  +49-228-6203949
zu hause: Ubierstrasse 8  --  53498 Bad Breisig, D  --  +49-2633-96333


