[Expat-discuss] Example of using Expat with Unicode?
warren henning
w_k_henning at hotmail.com
Fri Jun 20 08:24:09 EDT 2003
Hi,
This email is pretty long, so if you don't feel like reading it, here's what
I'm really asking: I am humbly requesting a way to read in XML with Expat
such that I can handle Unicode text, using XML_Char throughout the program
rather than char.
I downloaded Expat from Jclark.com and have been trying to figure out how to
use it. I looked at the sample program, "elements.c", and basically figured
out what to do based on that. However, that program prints out the character
data for some XML files incorrectly. The one that made me notice this is
located at http://www.aaronsw.com/weblog/index.xml: what appears as "10%
think it's advantageous to be a woman in American society" in Internet
Explorer's XML viewer appears as "10% think itâs advantageous to be a woman
in American society" in my output. The file was being read from disk using
standard C file I/O functions: all I did was open a FILE* with a call to
fopen() and replace "fread(buf, 1, sizeof(buf), stdin)" with "fread(buf, 1,
sizeof(buf), my_file_structure)". This worked fine except for the Unicode
characters, which my normal functions seemed to read in incorrectly.
(I am using Windows and Visual C++ 6.0, by the way.)
Now I figured I should modify my file I/O and string functions to use the
Unicode equivalents of the ones I was using before -- e.g., _wfopen instead
of fopen. My element handlers would take XML_Char's as parameters instead of
char's. I had quite a bit of other stuff in the code that I thought could go
wrong, so I figured I'd make a little separate test program to make sure I
understood how to do Unicode file I/O. This was the test program:
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    wchar_t test[5];
    size_t len;
    FILE* fp;

    if((fp = _wfopen(L"xml_test.xml",L"r")) == NULL)
    {
        wprintf(L"_wfopen failed.\n");
        return(0);
    }

    len = fread(test, sizeof(wchar_t), 5, fp);
    wprintf(L"%d\n",len);

    /*if(fgetws(test, BUFSIZ, fp) == NULL)
    {
        printf("fgetws\n");
        fclose(fp);
        return 0;
    }*/

    wprintf(L"%s\n",test);
    fclose(fp);
    return 0;
}
xml_test.xml is the file containing just the part that IE showed as "it's".
That file can be viewed at http://wkh.daysbackwhen.com/xml_test.xml
You see that I have an fread() similar to what I thought I would be using in
my modified, Unicode-using program -- I thought I would be reading in blocks
of wchar_t's instead of char's.
Well that produced no output when run except the printing out of the len
variable which worked as expected -- so I tried using fgetws as an
alternative method (after commenting out the fread and print len lines) --
that part is currently commented out in the lines above. Doing so crashed
after it printed out:
2
itΓÇÖs
The 2 part was expected. Crashing after apparently trying to print Unicode
as ASCII was not.
So I'm basically out of ideas. I tried downloading the latest version for
Windows -- 1.95.6 -- thinking maybe I should just forget about Unicode file
I/O, modify the elements.c program, and see what would happen. But I couldn't
get the samples for the new version to build in Visual C++. I get the
following error:
LINK : fatal error LNK1181: cannot open input file "libexpatMT.lib"
So what I am humbly requesting is a way to read in XML in Expat such that I
can handle Unicode stuff like the aaronsw.com feed. Just basically a
modified version of the sample program. If possible I would like to also see
how this would be done for parsing an XML file on disk.
I have done my best to figure this out. Any help is much appreciated!
Sincerely,
Warren