[XML-SIG] Re: Re: SAX encoding and special characters

Fredrik Lundh fredrik at pythonware.com
Sun Apr 18 04:14:05 EDT 2004


"Thomas" wrote:

> FL> </F>
> Yes, this was the 1st thing I tried out. Unfortunately I got:
> Traceback (most recent call last):
>   File "./xmlparser_new.py", line 210, in ?
>     saxparser.parseString(document)
> AttributeError: ExpatParser instance has no attribute 'parseString'
>
> Do you have an idea how to fix it? (yes, I understand that it's not
> supported by expat - unfortunately I don't have experience with it).

iirc, parse takes either a file name or a file object, so the following
might work:

    import StringIO
    ...
    saxparser.parse(StringIO.StringIO(document))
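Note that `StringIO` is a Python 2 module; in Python 3 it lives in `io`. Here is a minimal sketch assuming Python 3. The handler class `TextCollector` and the function `parse_bytes` are hypothetical stand-ins for whatever the original script uses. It wraps the raw bytes in `io.BytesIO` rather than a text stream, so expat still reads and honours the encoding declaration itself:

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: accumulates all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # content is always a str (Unicode), whatever the input encoding was
        self.chunks.append(content)


def parse_bytes(document):
    """Parse an in-memory XML document (bytes) through a file-like object."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # parse() accepts a file name/URL or a file-like object
    parser.parse(io.BytesIO(document))
    return "".join(handler.chunks)


doc = "<?xml version='1.0' encoding='iso-8859-1'?><p>caf\xe9</p>".encode("iso-8859-1")
print(parse_bytes(doc))
```

Passing bytes matters: a text stream would already be decoded, and could disagree with the declaration inside it.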

Python's SAX implementation also supports incremental parsing; I think
you should be able to simply do:

    saxparser.feed(document)
    saxparser.close()
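A sketch of the incremental route, again assuming Python 3. `CountingHandler` and `parse_incrementally` are hypothetical names. `xml.sax.make_parser()` returns an expat-based reader that implements `feed()` and `close()`, so the document can arrive in arbitrary pieces:

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: counts start tags."""

    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


def parse_incrementally(chunks):
    """Feed an XML document to the SAX parser piece by piece."""
    handler = CountingHandler()
    parser = xml.sax.make_parser()  # expat-based, supports feed()/close()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # events fire as soon as enough data has arrived
    parser.close()  # marks end of input; raises SAXParseException if incomplete
    return handler.elements


data = b"<?xml version='1.0' encoding='utf-8'?><root><item/><item/></root>"
# split at arbitrary points; the parser copes with tags cut in half
pieces = [data[i:i + 7] for i in range(0, len(data), 7)]
print(parse_incrementally(pieces))  # 3
```

Calling `close()` is what tells expat the input is complete; without it, a document that simply stops (for example, a missing end tag) goes unreported.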

:::

and yes, since you have to read the entire document into a string, you can
extract the encoding from that string.  here's a fairly robust RE that
should do the trick:

    import re

    # look for an encoding declaration in the XML prolog
    m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", document)
    if m:
        encoding = m.group(1)
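The same regular expression wrapped in a small helper, as a sketch assuming Python 3 and a document read in binary mode (hence the bytes pattern). The name `sniff_encoding` and the UTF-8 fallback are illustrative additions; the fallback follows the XML 1.0 rule that a document with neither a byte order mark nor an encoding declaration is UTF-8:

```python
import re

# The pattern above, compiled as a bytes pattern so it runs on raw input.
XML_DECL_ENCODING = re.compile(rb"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]")


def sniff_encoding(data, default="utf-8"):
    """Hypothetical helper: return the declared encoding, or a default.

    Assumes an ASCII-compatible encoding. re.match only looks at the very
    start of the data, so a leading byte order mark defeats it.
    """
    m = XML_DECL_ENCODING.match(data)
    if m:
        # [-\w] in a bytes pattern only matches ASCII, so this is safe
        return m.group(1).decode("ascii")
    return default


print(sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(sniff_encoding(b"<a/>"))
```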

(a much better approach is to stick to a standard encoding in the output
files, no matter what encoding the XML files use.  XML is Unicode: the
parser hands you Unicode strings whatever the input encoding was, so the
XML file's encoding shouldn't matter).
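One way to do that with only the standard library, sketched under the assumption of Python 3: route the SAX events into `xml.sax.saxutils.XMLGenerator`, which re-serializes them and writes its own XML declaration in whatever output encoding you pick. The `reencode` function is hypothetical:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def reencode(document, out_encoding="utf-8"):
    """Re-serialize an XML document (bytes) in a fixed output encoding.

    SAX delivers str (Unicode) events whatever the input declares;
    XMLGenerator writes them, plus a matching declaration, in out_encoding.
    """
    out = io.BytesIO()
    parser = xml.sax.make_parser()
    parser.setContentHandler(XMLGenerator(out, encoding=out_encoding))
    parser.parse(io.BytesIO(document))
    return out.getvalue()


latin1_doc = "<?xml version='1.0' encoding='iso-8859-1'?><p>caf\xe9</p>".encode("iso-8859-1")
print(reencode(latin1_doc).decode("utf-8"))
```

Here an ISO-8859-1 input comes out as UTF-8 with an `encoding="utf-8"` declaration, so downstream code only ever has to deal with one encoding.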

</F>