[Expat-discuss] Expat, unicode, encodings and bad output

Vidar T. Fauske xjill.iv at gmail.com
Tue Apr 12 22:57:54 CEST 2005


Hi all! New to the list (though I have looked in the archives) =)

I've been using expat for a short while, taking input from data read
by WinHTTP (<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winhttp/http/winhttp_start_page.asp>).
I'm now trying to get it to work with Unicode, but I'm having a real
hard time with it.

My current test document is encoded in ISO-8859-1. WinHTTP reads it
into a buffer obtained from XML_GetBuffer(), and then XML_ParseBuffer()
is called with the number of bytes downloaded, as reported by the
WinHTTP call. (The buffer is readable and correct when I inspect it in
the VS.NET debugger.)

Now, however, comes the problem: when one of the handlers (start/end
tag and character data) is called, the XML_Char strings (element,
data) just look like garbage, both in VS.NET and when output (to a
wstring and then TextOut, or directly to a console that takes wide
chars). Sometimes I see weird characters here (Asian-looking), and
sometimes just ??? or blocks.

I have UNICODE and XML_UNICODE defined, and I've tried adding
XML_UNICODE_WCHAR_T to see if that makes any difference (it changes
the strings somehow, but they still look like garbage).

Does anyone see what could be wrong? I can paste the code if nothing
obvious stands out.


  - Vidar

