[XML-SIG] outputting non-ascii strings

Martin v. Loewis martin@v.loewis.de
23 May 2002 08:44:50 +0200


Matt Patterson <matt@reprocessed.org> writes:

> The other thing that's cropped up now - using Juergen's suggestion to
> use stream.write(ustring.encode('utf-8')), which works a treat, decodes
> all the entities in the text, so I now have free-floating ampersands and
> angle brackets, where before I had entities. I do have typographer's
> quotes still :-)
> 
> Is there an easy way around this problem? I've looked through my Python
> books (Learning Python, Programming Python, Python and XML) and can't
> find a comprehensive treatment of this issue - if there is one I'd like
> to know, please! Is there a good place to go and look for such
> documentation?

I'm not sure what issue you are referring to here.

Are you saying that,

- when using XML library functions to generate XML, it will produce
  ampersands and angle brackets which are not markup? That would be a
  bug; please report details.

- when using your own custom XML generating functions, you see such
  things? You may use xml.sax.saxutils.escape to replace them with
  the built-in entity references.

- with your custom writeback routines, you see more "literal"
  characters in the output text than you originally had in the input
  document, and you want them all back, exactly where they used to be?
  This is not possible. You have the option of searching the output
  strings yourself, and either writing character references or entity
  references where appropriate.
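
For the second and third cases, a rough sketch of what I mean. It
assumes a current Python 3, where str.encode accepts the
"xmlcharrefreplace" error handler; to_xml_text and the sample string
are made-up names for illustration, not part of any library.

```python
from xml.sax.saxutils import escape


def to_xml_text(text, encoding="utf-8"):
    """Return `text` as bytes that are safe to write as XML character data.

    Hypothetical helper, for illustration only.
    """
    # Replace &, < and > with the built-in entity references.
    escaped = escape(text)
    # Characters the target encoding cannot represent are written as
    # numeric character references (&#NNNN;) instead of raising an error.
    return escaped.encode(encoding, "xmlcharrefreplace")


sample = "Fish & chips <\u201cto go\u201d>"

# UTF-8 can represent the typographer's quotes, so they stay literal.
utf8_bytes = to_xml_text(sample, "utf-8")

# ASCII cannot, so the quotes become character references.
ascii_bytes = to_xml_text(sample, "ascii")
```

Either result can be passed to stream.write() on a stream opened in
binary mode. Note that this escapes every ampersand and angle bracket
in the text; it cannot know which ones were written as references in
the original document.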

Regards,
Martin