[Expat-discuss] Example of using Expat with Unicode?

warren henning w_k_henning at hotmail.com
Fri Jun 20 08:24:09 EDT 2003


Hi,

This email is pretty long, so if you don't feel like reading it, here's what I'm 
really asking: I am humbly requesting a way to read in XML with 
Expat such that I can handle Unicode text, using XML_Char's throughout the 
program rather than char's.

I downloaded Expat from Jclark.com and have been trying to figure out how to 
use it. I looked at the sample program, "elements.c", and basically figured 
out what to do based on that. However, that program prints out the character 
data for some XML files incorrectly (the one that made me notice this is 
located at http://www.aaronsw.com/weblog/index.xml - what appears as "10% 
think it’s advantageous to be a woman in American society" in Internet 
Explorer's XML viewer appears as "10% think it’s advantageous to be a woman 
in American society".) The file was being read from disk using standard C 
file I/O functions - all I did was open a FILE* with a call to fopen() 
and replace "fread(buf, 1, sizeof(buf), stdin)" with "fread(buf, 1, 
sizeof(buf), my_file_structure)". This worked fine except for the Unicode 
characters, which my normal functions seemed to read in incorrectly.

(I am using Windows and Visual C++ 6.0, by the way.)

Now I figured I should modify my file I/O and string functions to use the 
Unicode equivalents of the ones I was using before -- e.g., _wfopen instead 
of fopen. My element handlers would take XML_Char's as parameters instead of 
char's. I had quite a bit of other stuff in the code that I thought could go 
wrong, so I figured I'd make a small, separate test program to make sure I 
understood how to do Unicode file I/O. This was the test program:

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>


int main(int argc, char** argv)
{
	wchar_t test[5];
	size_t len;
	FILE* fp;

	if((fp = _wfopen(L"xml_test.xml",L"r")) == NULL)
	{
		wprintf(L"_wfopen failed.\n");
		return(0);
	}
	len = fread(test, sizeof(wchar_t), 5, fp);
	wprintf(L"%d\n",len);
	/*if(fgetws(test, BUFSIZ, fp) == NULL)
	{
		printf("fgetws\n");
		fclose(fp);
		return 0;
	}*/
	wprintf(L"%s\n",test);
	fclose(fp);
	return 0;
}

xml_test.xml is the file containing just the part that IE showed as "it's". 
That file can be viewed at http://wkh.daysbackwhen.com/xml_test.xml

You see that I have an fread() similar to what I thought I would be using in 
my modified, Unicode-using program -- I thought I would be reading in blocks 
of wchar_t's instead of char's.
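
For comparison, a byte-oriented version of the same read can be sketched as
follows. This is only a sketch under assumptions: it assumes the file should
be read as raw bytes (binary mode, char buffer) rather than as wchar_t's, and
the helper name read_bytes is hypothetical, not part of Expat or the C
library. It keeps the count that fread returns and terminates the buffer
explicitly before anything treats it as a string.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: read up to cap-1 raw bytes from path into buf
   and null-terminate the result. Returns the number of bytes read,
   or (size_t)-1 if the file could not be opened. */
size_t read_bytes(const char *path, char *buf, size_t cap)
{
    FILE *fp;
    size_t len;

    assert(buf != NULL && cap > 0);

    /* "rb": binary mode, so the C runtime does no newline or
       character translation on the bytes. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return (size_t)-1;

    len = fread(buf, 1, cap - 1, fp);
    buf[len] = '\0';   /* fread does not terminate the buffer */
    fclose(fp);
    return len;
}
```

With a buffer such as char buf[BUFSIZ], a call like
read_bytes("xml_test.xml", buf, sizeof(buf)) returns the byte count. The
buffer and that count form the same kind of (pointer, length) pair that the
fread() call in elements.c produces for the parser.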

Well, that produced no output when run except the value of the len 
variable, which printed as expected. So I tried using fgetws as an 
alternative method (after commenting out the fread and print-len lines); 
that part is currently commented out in the listing above. Doing so crashed 
after it printed out:

2
itΓÇÖs

The 2 part was expected. Crashing after apparently trying to print Unicode 
as ASCII was not.
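
One possible reading of the output above, offered as a sketch and not a
diagnosis: if xml_test.xml is UTF-8 encoded (an assumption; the encoding is
not stated in this message), the apostrophe in "it’s" is the single
character U+2019 stored as the three bytes E2 80 99, and a console using a
single-byte code page would show each byte as a separate character (code
page 437, for example, maps those bytes to Γ, Ç and Ö). The hypothetical
helper utf8_decode below decodes one UTF-8 sequence so that this
byte-to-character mapping can be checked by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s
   (at most n bytes available). Stores the code point in *cp and
   returns the number of bytes consumed, or 0 on malformed input.
   Sketch only: it does not reject overlong forms or surrogates. */
size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t need, i;
    unsigned long c;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    if (s[0] < 0x80)                { c = s[0];        need = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; need = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; need = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; need = 4; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (need > n)
        return 0;                   /* sequence is truncated */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return need;
}
```

Given the bytes E2 80 99, utf8_decode consumes 3 bytes and stores 0x2019 in
*cp, which is the right single quotation mark that IE displays.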

So I'm basically out of ideas. I tried downloading the latest version for 
Windows, 1.95.6, thinking maybe I should forget about Unicode file I/O and 
just try modifying the elements.c program to see what would happen, but I 
couldn't get the samples for the new version to build in Visual C++. I get 
the following error:

LINK : fatal error LNK1181: cannot open input file "libexpatMT.lib"

So what I am humbly requesting is a way to read in XML with Expat such that 
I can handle Unicode text like the aaronsw.com feed: basically just a 
modified version of the sample program. If possible, I would also like to 
see how this would be done for parsing an XML file on disk.

I have done my best to figure this out. Any help is much appreciated!

Sincerely,
Warren
