[XML-SIG] Open issues: Namespaces and Unicode

A.M. Kuchling akuchlin@cnri.reston.va.us
Wed, 16 Dec 1998 20:48:09 -0500


There are two major issues still unresolved at this point, from the
list assembled during the Developer's Day session at IPC7.  Other
items, such as WDDX, are minor and not showstoppers.

     1) Unicode support.  

The wstring type was added in version 0.5 of the package, but it was
only added to the installation; it has not been integrated with the
XML parsers.  sgmlop and pyexpat are probably the only parsers that
stand a chance of handling 16-bit Unicode.  xmlproc relies on the re
module, and making re handle Unicode would be a big job, so xmlproc
users would have to encode their data as UTF-8 first.
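
To make the "UTF-8 encode first" step concrete, here is a minimal
sketch.  It assumes a Python with built-in Unicode strings and codecs
rather than wstring, and to_utf8 is a hypothetical helper, not
something in the package.  It also rewrites the encoding declaration,
since a parser would otherwise be told the bytes are still UTF-16.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration.
_DECL = re.compile(r'(<\?xml[^>]*?encoding=)(["\'])[A-Za-z0-9._-]+\2')


def to_utf8(data, source_encoding):
    """Transcode raw document bytes to UTF-8 (hypothetical helper).

    The declared encoding, if any, is rewritten to UTF-8 so that it
    matches the bytes that are returned.
    """
    text = data.decode(source_encoding)
    text = _DECL.sub(r'\1\2UTF-8\2', text, count=1)
    return text.encode("utf-8")
```

The result could then be handed to a byte-oriented parser such as
xmlproc without that parser knowing anything about wide strings.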

      From poking around inside Expat, it looks like it can handle
UTF-16, and a simple test with xmlwf agrees.  To try it, run this test
program to generate a file named t.xml, then run that file through xmlwf:

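from xml.unicode import wstring
# Wide string holding a small document that declares itself UTF-16.
s = wstring.L("""<?xml version="1.0" encoding="UTF-16"?>
<thing>text</thing>""")
# Binary mode, so the UTF-16 bytes are written without newline translation.
f = open('t.xml', 'wb')
f.write(s.utf16())
f.close()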

Amazingly, if the resulting file is then parsed by Python code using
pyexpat, the resulting UTF-8 output is correct, even though the code
doesn't do anything special for Unicode at all.  I suspect that this
is only a coincidence, and won't work on a machine of different
endianness.
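
One way to check the endianness worry without a second machine is to
feed Expat the same document in both byte orders.  This is only a
sketch: it assumes a Python with built-in Unicode strings and an
xml.parsers.expat module that accepts raw bytes, rather than wstring
and the current pyexpat, and parse_events is a hypothetical helper.

```python
import codecs
from xml.parsers import expat


def parse_events(data):
    """Feed raw bytes to Expat and collect (event, value) pairs."""
    events = []
    parser = expat.ParserCreate()
    # Deliver each run of character data in one piece.
    parser.buffer_text = True
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("text", text))
    parser.Parse(data, True)
    return events


DOC = '<?xml version="1.0" encoding="UTF-16"?>\n<thing>text</thing>'

# The same document in each byte order, each with its byte-order mark.
little = codecs.BOM_UTF16_LE + DOC.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + DOC.encode("utf-16-be")
```

If parse_events(little) and parse_events(big) report the same events,
the byte-order mark is being honoured and the result does not depend
on the byte order of the machine that wrote the file.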
	     
     Anyway, we should probably modify at least one of the parsers to
handle a wide string.  pyexpat is probably the best candidate, since
the Unicode support is already there in Expat itself.  Does this seem
to be a reasonable course of action?  Any volunteers?

     2) Namespace support.  

We also wanted to arrive at some form of namespace support for the SAX
and DOM interfaces.  Unfortunately, no one responsible seems to be
defining what namespace support should look like in SAX and DOM.  The
plan for SAX might be to use a parser filter that implements the
additional namespace processing; in a Nov. 13 xml-dev post, David
Megginson supported this idea, and said he'd like to formalise the
idea of a SAX filter in SAX 1.0.1.  I'm not aware of any public info
about the changes, but have written to Megginson asking about it.
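
To give the filter idea some shape, here is a rough sketch.  It is
not Megginson's design, which hasn't been published; the class names,
the use of a plain dictionary for attributes, and the (URI, local
name) tuples passed downstream are all assumptions made for
illustration.  The filter exposes the startElement/endElement methods
a SAX 1.0 document handler would, and forwards resolved names to
another handler.

```python
from xml.parsers import expat


class NamespaceFilter:
    """Hypothetical filter: resolves prefixed names, then forwards them."""

    def __init__(self, downstream):
        self.downstream = downstream
        # Stack of prefix -> URI mappings, one entry per open element.
        self.scopes = [{"xml": "http://www.w3.org/XML/1998/namespace"}]

    def _resolve(self, qname, scope, is_attribute=False):
        prefix, sep, local = qname.partition(":")
        if not sep:
            # Unprefixed attributes do not pick up the default namespace.
            uri = None if is_attribute else scope.get("")
            return (uri, qname)
        return (scope[prefix], local)  # KeyError for an undeclared prefix

    def startElement(self, name, attrs):
        scope = dict(self.scopes[-1])
        plain = {}
        for key, value in attrs.items():
            if key == "xmlns":
                scope[""] = value or None
            elif key.startswith("xmlns:"):
                scope[key[len("xmlns:"):]] = value
            else:
                plain[key] = value
        self.scopes.append(scope)
        resolved = {self._resolve(k, scope, True): v for k, v in plain.items()}
        self.downstream.startElement(self._resolve(name, scope), resolved)

    def endElement(self, name):
        self.downstream.endElement(self._resolve(name, self.scopes[-1]))
        self.scopes.pop()


class Recorder:
    """Stand-in for an application's handler; it just logs what it gets."""

    def __init__(self):
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, attrs))

    def endElement(self, name):
        self.events.append(("end", name))


def resolve_names(data):
    """Parse data with Expat, routing element events through the filter."""
    recorder = Recorder()
    flt = NamespaceFilter(recorder)
    parser = expat.ParserCreate()
    parser.StartElementHandler = flt.startElement
    parser.EndElementHandler = flt.endElement
    parser.Parse(data, True)
    return recorder.events
```

The point of the design is that neither the parser nor the downstream
handler needs to change; the namespace bookkeeping lives entirely in
the object sitting between them.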

      There also seems to be no sign of namespace support for the DOM,
though I've posted to the www-dom mailing list asking about it.  This
presents us with two options: ignore DOM namespaces completely for 1.0
and wait for some guidance from the working group; or add some utility
function or module to do it, knowing that it will probably be made
obsolete in the future.  (For example, there might be a
do_namespaces() function in xml.dom.utils that walked over a DOM tree
looking for xmlns:* attributes and decorated all the nodes with an
attribute containing the namespace URI, or a Node method that scanned
its ancestors looking for namespace declarations.)
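
As a rough illustration of both variants, here is a sketch.  It
assumes a DOM implementation with the standard parentNode, childNodes,
hasAttribute and getAttribute members (the usage below assumes
xml.dom.minidom).  namespace_for is a hypothetical name, and this
do_namespaces yields (element, URI) pairs instead of decorating the
nodes, purely to keep the example free of side effects.

```python
from xml.dom import Node, minidom


def namespace_for(node, prefix):
    """Scan node and its ancestors for a declaration of prefix.

    An empty prefix looks up the default namespace.  Returns the URI,
    or None if no declaration is in scope.
    """
    attr = "xmlns:" + prefix if prefix else "xmlns"
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        if node.hasAttribute(attr):
            return node.getAttribute(attr) or None
        node = node.parentNode
    return None


def do_namespaces(element):
    """Walk the tree under element, yielding (element, URI) pairs."""
    prefix, sep, _ = element.tagName.partition(":")
    yield element, namespace_for(element, prefix if sep else "")
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield from do_namespaces(child)
```

For example, calling do_namespaces(doc.documentElement) on a document
returned by minidom.parseString() pairs every element with the URI in
scope for its prefix.  Rescanning the ancestors for each element is
wasteful on deep trees, but it keeps the utility small, which matters
if it is going to be made obsolete anyway.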

    What do you think?

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
It is in this matter that I fall foul of so many American writers on writing;
they seem to think that writing is a confidence game by means of which the
author cajoles a restless, dull-witted, shallow audience into hearing his
point of view. Such an attitude is base, and can only beget base prose.
    -- Robertson Davies, "Elements of Style"