[Expat-discuss] Split UTF-8 sequence possible?
Jeff Garbers
jgarbers at xltsoftware.com
Mon Nov 10 11:13:37 EST 2003
Having just overcome the newbie problem of not realizing that expat
feeds UTF-8 sequences to my handlers, I'm now wondering if
expat ever splits a multi-byte UTF-8 sequence across two calls to my
character handler callback.
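Whatever the answer, Expat's documentation notes that one contiguous block of text may still arrive as a sequence of calls to the character data handler. A common defensive pattern is therefore to append every chunk to a growing buffer and interpret the text only once the run is complete (for example, in the end-element handler). Concatenating raw bytes this way also repairs a multi-byte sequence that happens to be split between two calls. This is a minimal sketch, assuming a UTF-8 build where XML_Char is plain char. `TextBuffer` and `append_chars` are hypothetical names. The function has the same shape as an Expat character data handler, but it omits expat.h and XMLCALL so that it compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-parse state, passed to the handler as userData. */
typedef struct {
    char  *data;  /* accumulated UTF-8 bytes, NUL-terminated */
    size_t len;   /* bytes stored, not counting the NUL */
} TextBuffer;

/* Same parameters as an Expat character data handler in a UTF-8 build.
   Appends the raw bytes, so a sequence split across calls is rejoined. */
static void append_chars(void *userData, const char *s, int len)
{
    TextBuffer *tb = (TextBuffer *)userData;
    char *grown = realloc(tb->data, tb->len + (size_t)len + 1);
    if (grown == NULL)
        return; /* out of memory: a real handler should record an error */
    memcpy(grown + tb->len, s, (size_t)len);
    tb->len += (size_t)len;
    grown[tb->len] = '\0';
    tb->data = grown;
}
```

With real Expat, the wrapper would be registered through `XML_SetCharacterDataHandler` and the buffer passed via `XML_SetUserData`. Feeding it `"caf\xC3"` and then `"\xA9"` leaves the complete UTF-8 string "café" in the buffer.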
For example, say there's a non-ASCII accented character in the input
character data (however it may have been encoded), so expat will want
to send me a two-byte UTF-8 sequence. If there's only one byte left in
the output buffer, will it (1) call my character data callback with the
buffer one byte short of capacity and save the two-byte sequence for the
next callback, or (2) put the first of the two UTF-8 bytes in the
buffer, call my callback, and then put the second byte at the start of
the buffer for the NEXT callback?
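A handler can check on its own which of the two behaviors it is seeing. This sketch reports how many leading bytes of a chunk consist of complete UTF-8 characters. Under behavior (1) the result always equals `len`. Under behavior (2) the result is smaller, and the leftover `len - result` bytes would have to be carried over and prepended to the next chunk. `utf8_complete_prefix` is a hypothetical helper, not part of Expat's API. It assumes the input is otherwise well-formed UTF-8 and leaves malformed endings untouched.

```c
/* Hypothetical helper: number of leading bytes of s[0..len) that form
   complete UTF-8 characters (only a truncated final character is cut). */
static int utf8_complete_prefix(const char *s, int len)
{
    int lead = len - 1;
    int need;
    unsigned char c;

    /* Step back over at most three trailing continuation bytes (10xxxxxx). */
    while (lead >= 0 && len - lead <= 3 &&
           ((unsigned char)s[lead] & 0xC0) == 0x80)
        lead--;
    if (lead < 0)
        return len; /* empty input, or no lead byte found */

    /* Work out how long the final character should be from its lead byte. */
    c = (unsigned char)s[lead];
    if (c < 0x80)
        need = 1;
    else if ((c & 0xE0) == 0xC0)
        need = 2;
    else if ((c & 0xF0) == 0xE0)
        need = 3;
    else if ((c & 0xF8) == 0xF0)
        need = 4;
    else
        return len; /* not a valid lead byte: leave it to the caller */

    /* Cut the final character only if some of its bytes are missing. */
    return (len - lead < need) ? lead : len;
}
```

For the two-byte "é" (0xC3 0xA9), a chunk `"caf\xC3"` yields 3, which means one byte must be carried over. The complete `"caf\xC3\xA9"` yields 5.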
I'm really hoping #1. Can anybody confirm this?
Thanks -- Jeff Garbers