[Expat-discuss] Re: Summary of Pull API thoughts

Tue Mar 25 11:54:53 EST 2003

Chris Cross writes:
 > However, we have a sticky requirement for our language support to be able
 > to switch dynamically between 1-byte ASCII and 2-byte Unicode. By doing
 > this, our European languages require half the space of the Asian languages,
 > which is very important for our embedded customers, who squeal at every
 > kilobyte we consume.
 > 
 > How hard would it be to change the character size from a build-time to a
 > run-time decision in expat?

Karl Waclawek writes:
 > It looks hard, since even the API itself is statically tied to the
 > definition of XML_Char.
 > 
 > However, you should be able to compile two libraries (XML_Char defined
 > as char or wchar_t), and dynamically load whichever you need at runtime,
 > and even switch between them. Why would that not work for you?

That sounds fairly tedious to me.

Recall that Expat tends to report data in fairly small chunks for
typical applications.  Even in plain text (PCDATA), Expat breaks data
at line boundaries.  If you want further control over the amount of
data reported in the character data callback, limit the amount of data
passed into Expat for any XML_Parse() or XML_ParseBuffer() call.
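
A minimal sketch of that idea, using Python's standard xml.parsers.expat
binding (whose Parse(data, isfinal) method wraps XML_Parse()) rather than
the C API.  The function name and the chunk size are made up for
illustration; the exact sizes of the reported pieces depend on the Expat
version and on where tokens happen to fall, so treat the input size as a
rough limit rather than a guarantee.

```python
import xml.parsers.expat


def collect_character_data(xml_bytes, chunk_size):
    """Feed xml_bytes to Expat chunk_size bytes at a time.

    Returns the strings handed to the character data callback, in order.
    (Hypothetical helper, not part of Expat or its Python binding.)
    """
    pieces = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = False  # the default: do not coalesce callbacks
    parser.CharacterDataHandler = pieces.append

    # Each Parse() call sees only a small slice of the document, so each
    # character data callback has only a small amount of data to report.
    for start in range(0, len(xml_bytes), chunk_size):
        parser.Parse(xml_bytes[start:start + chunk_size], False)
    parser.Parse(b"", True)  # signal the end of the document
    return pieces
```

Joining the returned pieces gives back the original character data
regardless of the chunk size; only the granularity of the callbacks
changes.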

This can be used to an application's advantage, especially if there's
concern for the amount of memory being consumed.  Compile Expat with
the appropriate output encoding for your primary audience, and then
re-encode each reported chunk in the application logic when a different
encoding is needed.  Since the chunks are small, the conversion buffer
can be small as well.  This should be easy to implement and allows
support for output encodings other than UTF-8 or UTF-16.
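
Roughly, the application-side step could look like the sketch below.  It
is written against Python's xml.parsers.expat binding, which always
delivers str objects, so it only illustrates the re-encoding done in the
callback; a C application would convert the XML_Char buffer it receives
instead.  The helper names and the choice of Latin-1 with a UTF-16
fallback are assumptions for illustration, not anything Expat provides.

```python
import xml.parsers.expat


def compact_encode(text):
    """Return (encoding_name, encoded_bytes) for one chunk of text.

    Hypothetical policy: use one byte per character when the chunk fits
    in Latin-1, and fall back to two-byte UTF-16 code units otherwise.
    """
    try:
        return "latin-1", text.encode("latin-1")
    except UnicodeEncodeError:
        return "utf-16-le", text.encode("utf-16-le")


def parse_and_reencode(xml_bytes):
    """Parse a document and store each character data chunk re-encoded."""
    stored = []
    parser = xml.parsers.expat.ParserCreate()
    # Re-encode inside the callback, one small chunk at a time.
    parser.CharacterDataHandler = lambda data: stored.append(compact_encode(data))
    parser.Parse(xml_bytes, True)
    return stored
```

Each stored entry records which encoding was used, so the application
can decode it again later.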

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation