[XML-SIG] SAX characters() output on multiple lines for non-ascii

Fred Drake fdrake at acm.org
Sun Feb 3 04:03:20 CET 2008


On Feb 2, 2008, at 6:04 PM, woodcock wrote:
> I am starting with SAX and am trying to parse a file that contains
> non-ascii characters.  The xml file uses 'ISO-8859-1'.  When it parses
> text containing non-ascii characters the output is across multiple lines.

This is a fundamental issue with the SAX interface: the specification  
doesn't require a parser to split contiguous text across several  
characters() calls, but it explicitly allows it, so a handler has to be  
prepared to receive text in pieces.  If you want something that buffers  
the text and provides it in larger chunks, that could be written as a  
proxy content handler.
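A minimal sketch of such a proxy for Python 3's xml.sax follows. The  
names CoalescingHandler and TextCollector are hypothetical and not part  
of the library.  The proxy buffers every characters() call and passes  
the joined text to the wrapped handler as soon as any other event  
arrives, so the wrapped handler sees one call per run of text.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class CoalescingHandler(ContentHandler):
    """Hypothetical proxy handler: merges consecutive characters() calls
    and hands the joined text to a wrapped target handler."""

    def __init__(self, target):
        super().__init__()
        self._target = target  # the application's real ContentHandler
        self._buffer = []      # text fragments since the last non-text event

    def _flush(self):
        # Deliver any buffered text to the target as one characters() call.
        if self._buffer:
            self._target.characters("".join(self._buffer))
            self._buffer = []

    def characters(self, content):
        self._buffer.append(content)


def _make_forwarder(name):
    # Any non-text event ends the current run of text: flush, then forward.
    def forward(self, *args):
        self._flush()
        getattr(self._target, name)(*args)
    forward.__name__ = name
    return forward


for _event in ("setDocumentLocator", "startDocument", "endDocument",
               "startPrefixMapping", "endPrefixMapping",
               "startElement", "endElement", "startElementNS", "endElementNS",
               "ignorableWhitespace", "processingInstruction", "skippedEntity"):
    setattr(CoalescingHandler, _event, _make_forwarder(_event))


class TextCollector(ContentHandler):
    """Hypothetical example target that records each characters() call."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
       '<name>Fran\xe7ois &amp; co</name>').encode("iso-8859-1")

collector = TextCollector()
xml.sax.parseString(doc, CoalescingHandler(collector))
print(collector.chunks)  # → ['François & co']
```

Because the proxy holds all text between two non-text events, its  
memory use grows with the longest run of text in the document.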

It might be nice if one were provided out of the box, since this is a  
common request, but the basic issue is that seriously huge amounts of  
text may appear between non-text events, and one of the advantages of  
SAX is that it doesn't require loading large portions of the document  
into memory if the application doesn't need it.
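The trade-off can be seen in Python's lower-level pyexpat module, which  
offers bounded buffering through its buffer_text and buffer_size  
attributes (this is the expat API, not the SAX interface itself).  
Adjacent character data is merged, but only up to buffer_size, so a long  
run of text still arrives in several pieces while memory stays capped.  
The tiny buffer size and the sample document are arbitrary choices for  
this sketch.

```python
import xml.parsers.expat

chunks = []

parser = xml.parsers.expat.ParserCreate()
parser.buffer_size = 8     # deliberately tiny cap on buffered text (default 8192)
parser.buffer_text = True  # merge adjacent character data, up to buffer_size
parser.CharacterDataHandler = chunks.append

# Expat reports each newline as a separate piece of character data,
# so without buffering this text would arrive in 20 pieces.
text = "ab\n" * 10
parser.Parse(("<a>" + text + "</a>").encode("ascii"), True)

print(chunks)
```

Raising buffer_size yields fewer, larger pieces at the cost of memory; a  
single piece that expat reports larger than the buffer is passed through  
unsplit.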


   -Fred

-- 
Fred Drake   <fdrake at acm.org>
