[XML-SIG] Changes in pyexpat.c

Wed, 27 Sep 2000 23:16:46 +0200

> Probably because the checked-in 4DOM is out of date.  We've
> hesitated checking in the 4Suite 0.9.x version because of all the
> flux and not wanting to contribute to the confusion (and not being
> sure whether we had much bandwidth to help sort out any resulting
> confusion).
>
> However, it's time to do the right thing, so...

Yes, I was going to ask whether PyXML could get a new copy of 4DOM...

> Do we check in the latest 4DOM and back-port the output encoding stuff
> to PyXML (it's all in ext/Printer.py)?

Sounds like a good plan to me.

> I haven't had a chance to play with Python 2.0, so I'm not sure how
> hard the port would be.  Here is the representative snippet from
> ext/Printer.py

It should not be too difficult to have this working on all Python
versions.

> from xml.unicode.iso8859 import wstring
> wstring.install_alias('ISO-8859-1', 'ISO_8859-1:1987')

try:
  import codecs  # not available in Python 1.5; raises ImportError there
  def utf8_to_code(string, encoding):
    # codecs.lookup() returns (encode, decode, reader, writer)
    encoder = codecs.lookup(encoding)[0]
    # the encoder returns (result, size); keep only the result
    return encoder(unicode(string, "utf-8"))[0]
except ImportError:
  def utf8_to_code(string, encoding):
    # Open questions for 1.5: raise an exception? support some
    # trivial cases, e.g. latin1? try wstrop?
    return string  # for now, silently return the UTF-8 input unchanged

>         #Note: Pass through to wstrop.  This means we don't play nice and
>         #Escape characters that are not in the target encoding.
>         ws = wstring.from_utf8(new_string)
>         new_string = ws.encode(encoding)
>         #This version would skip all untranslatable chars: see wstrop.c
>         #new_string = ws.encode(encoding, 1)

          new_string = utf8_to_code(new_string,encoding)
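
On the question of characters that are not in the target encoding: the
codecs encoders take an optional error-handling argument, so the choice
between failing and escaping could be made by the caller. A rough sketch
of that idea follows, written against a current Python 3 interpreter
rather than 2.0. The `errors` parameter, the "xmlcharrefreplace" handler
and the sample string are my assumptions for illustration; none of them
come from ext/Printer.py.

```python
import codecs

def utf8_to_code(data, encoding, errors="strict"):
    """Transcode UTF-8 encoded bytes into the given target encoding.

    `errors` is an assumed extra parameter: "strict" raises on characters
    the target encoding cannot represent, "xmlcharrefreplace" writes them
    as numeric character references instead.
    """
    # codecs.lookup() returns a CodecInfo object; .encode is the
    # stateless encoder, which returns (encoded_bytes, chars_consumed)
    encoder = codecs.lookup(encoding).encode
    text = data.decode("utf-8")
    encoded, _consumed = encoder(text, errors)
    return encoded

# The euro sign has no ISO-8859-1 code point, so it gets escaped.
sample = "caf\u00e9 \u20ac".encode("utf-8")
escaped = utf8_to_code(sample, "ISO-8859-1", "xmlcharrefreplace")
```

With "strict" the same call raises UnicodeEncodeError, which may be the
better default if silently altered output would be a surprise.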

Regards,
Martin