[XML-SIG] Strings or Unicode ?

M.-A. Lemburg mal@lemburg.com
Thu, 08 Nov 2001 15:36:59 +0100


I'm working on a fast XML scanner and was wondering what the
preferred method is for dealing with encodings and Unicode.
There are basically two options:

1. find out the XML encoding by looking at the header,
   decode the data into Unicode,
   run the parser over the Unicode string and let it
   generate Unicode tag names, attributes, etc.

2. run the parser over the raw string data and let
   it generate raw string tag names, attributes, etc.,
   convert the tag names, attributes, data, etc. to Unicode
   on a case-by-case basis and only if needed
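
To make the two options concrete, here is a rough sketch in
present-day Python, where raw strings are bytes objects. It is a toy
that only collects start-tag names with a regular expression, not the
scanner itself. The function names (detect_encoding, scan_unicode,
scan_raw) are made up for illustration, and it assumes an
ASCII-compatible encoding such as UTF-8 or ISO-8859-1, so that "<" and
">" are single bytes; option 2 as written would not work for UTF-16
input.

```python
import re

# Hypothetical helpers for illustration only; this is not a real XML parser.
# Encoding name taken from the XML declaration, e.g. encoding="iso-8859-1".
_DECL_RE = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')
# Start-tag names: "<" followed by anything except "/", "!" or "?".
_TAG_STR = re.compile(r'<([^\s<>/!?][^\s<>/]*)')
_TAG_BYTES = re.compile(rb'<([^\s<>/!?][^\s<>/]*)')


def detect_encoding(data):
    """Return the declared encoding, or 'utf-8' if none is declared."""
    m = _DECL_RE.match(data)
    return m.group(1).decode('ascii') if m else 'utf-8'


def scan_unicode(data):
    """Option 1: decode the whole document, then scan the decoded text."""
    text = data.decode(detect_encoding(data))
    return _TAG_STR.findall(text)          # list of str


def scan_raw(data):
    """Option 2: scan the raw bytes and leave decoding to the caller."""
    return detect_encoding(data), _TAG_BYTES.findall(data)  # list of bytes


doc = ('<?xml version="1.0" encoding="iso-8859-1"?>'
       '<r\u00f6mer id="1"><a/></r\u00f6mer>').encode('iso-8859-1')

names_unicode = scan_unicode(doc)
encoding, names_raw = scan_raw(doc)
# Under option 2 a name is decoded only when the caller actually needs it.
first_name = names_raw[0].decode(encoding)
```

With option 1 the cost of decoding is paid once for the whole document;
with option 2 it is paid per item, and only for the items the
application looks at.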

Both could be implemented from a single source file,
but I wonder whether it's worth providing option 2.

Thanks,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/