[XML-SIG] PyExpat encoding (was: XML support in Python 1.6)

Lars Marius Garshol larsga@garshol.priv.no
02 Jun 2000 12:22:39 +0200


Here is my take on this:

 - the entire XML data model is based on Unicode and we should just
   accept that rather than try to work against it

 - since Python 1.6 supports Unicode directly we should exploit that;
   especially since mixing ordinary and Unicode strings seems to be
   painless (in other words: the fact that you get Unicode strings
   should be more or less invisible to you unless you actively care)

 - I can't imagine why anyone would want ordinary strings with UTF-8
   encoded text in them, but if someone can come up with a convincing
   use case we should support that as well

Conclusion:

 - if the Python version is lower than 1.6, we should just do what we
   do today: return UTF-8 encoded normal strings

 - if not, return Unicode objects

 - I have no problem with adding a run-time configuration option to
   expat that allows users to say 'parser.set_return_unicode(0)' to
   get UTF-8 encoded normal strings back

 - there should probably also be a 'parser.get_return_unicode()' so
   that applications can check what is going on
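
To make the proposal concrete, here is a rough sketch of how the
switch could behave. Nothing in it is existing pyexpat code: the class
ReturnModeParser and its deliver() method are made up for illustration,
and only the names set_return_unicode/get_return_unicode come from the
list above. It is written for a current Python, where str is the
Unicode type and a bytes object plays the role of the "normal" UTF-8
encoded string, so the version check mostly documents the intent.

```python
import sys


class ReturnModeParser:
    """Hypothetical stand-in for a pyexpat parser object (not the real API)."""

    def __init__(self):
        # Default as proposed: Unicode objects on Python 1.6 or later,
        # UTF-8 encoded strings on anything older.
        self._return_unicode = 1 if sys.version_info >= (1, 6) else 0

    def set_return_unicode(self, flag):
        # Any true value selects Unicode; 0 selects UTF-8 encoded strings.
        self._return_unicode = 1 if flag else 0

    def get_return_unicode(self):
        # Lets applications check which kind of string they will receive.
        return self._return_unicode

    def deliver(self, text):
        # Hypothetical helper: turn decoded character data into the form
        # that would be handed to the application's handlers.
        if self._return_unicode:
            return text
        return text.encode("utf-8")


parser = ReturnModeParser()
unicode_result = parser.deliver(u"bl\u00e5b\u00e6r")  # Unicode object

parser.set_return_unicode(0)
utf8_result = parser.deliver(u"bl\u00e5b\u00e6r")     # UTF-8 encoded bytes
```

The point of keeping the choice in one place is that handlers never
need to know which mode is active unless they actively care.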

The real question is of course who will do the actual work of adding
this... :-)

--Lars M.