From karl at waclawek.net Fri Sep 1 15:29:12 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 01 Sep 2006 09:29:12 -0400
Subject: [Expat-discuss] Linking
In-Reply-To:
References:
Message-ID: <44F835A8.1090803@waclawek.net>

Tom Younger wrote:
> Hi there.
>
> I have a question about linking.
>
> I would like to link the Expat library statically. When I link to the
> Linux libraries, this works properly; however, when I link my project
> under Windows, I can only get the program to work if I include the path to
> libexpat.dll in my system path.
>
> Even though I include StaticLibs\libexpatMT.lib in my link command, without
> Libs\libexpat.lib it can't resolve some symbols. It appears as though
> libexpat.lib loads the DLL at run time.

For static linking you need to define a specific preprocessor symbol when
compiling your code. If I remember correctly, it is XML_STATIC.

Karl

From mkanaga at gmail.com Thu Sep 7 23:05:20 2006
From: mkanaga at gmail.com (m k)
Date: Thu, 7 Sep 2006 14:05:20 -0700
Subject: [Expat-discuss] Expat, XML-Parser make test fails on AIX 5.3
Message-ID: <55b890c10609071405k1afc99b3k5e58ef7f42d33801@mail.gmail.com>

Greetings!

IBM provides a pre-built Perl on AIX 5.3. They also provide a script to
toggle Perl between 32-bit and 64-bit mode.

1. In 32-bit Perl mode: I am able to compile Expat 1.95.7, then XML-Parser
   2.34 (a Perl module), and run make test for XML-Parser without any
   problem. In this mode, we didn't see any problems. However,

2. In 64-bit Perl mode: I installed the same versions of Expat and
   XML-Parser and was able to build everything. But when I run make test
   for XML-Parser, I get the following errors with Expat.so:

3. I got the same errors with Perl 5.8.8, downloaded and compiled on this
   server using IBM VisualAge C/C++, with Expat 2.0.0 and XML-Parser 2.34.

Any pointers or help would be very much appreciated.

Regards,
-Murali

root at sfobench01(./XML-Parser-2.34)make test
make[1]: Entering directory `/chroot/tmp/perl/XML-Parser-2.34/Expat'
make[1]: Leaving directory `/chroot/tmp/perl/XML-Parser-2.34/Expat'
PERL_DL_NONLAZY=1 /bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/astress.........Can't load '/chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so' for module XML::Parser::Expat:
rtld: 0712-001 Symbol XML_Parse was referenced from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol XML_SetNamespaceDeclHandler was referenced from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol XML_SetElementHandler was referenced from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol XML_SetUnknownEncodingHandler was referenced from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition of the symbol was not found.
rtld: 0712-001 Symbol XML_SetEndCdataSectionHandler was referenced from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition of the symbol was not found.

root at sfobench01(./XML-Parser-2.34)perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=aix, osvers=5.3.0.4, archname=aix-64all
    uname='aix sfobench01 3 5 000aba68d600 '
    config_args='-Duse64bitall'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=-1 -qnoansialias -DUSE_NATIVE_DLOPEN -I/usr/local/include -q64 -DUSE_64_BIT_ALL -q64',
    optimize='-O',
    cppflags='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=-1 -qnoansialias -DUSE_NATIVE_DLOPEN -I/usr/local/include'
    ccversion='7.0.0.0', gccversion='', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=87654321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ld', ldflags ='-brtl -bdynamic -bmaxdata:0x80000000 -L/usr/local/lib -q64 -b64'
    libpth=/usr/local/lib /lib /usr/lib /usr/ccs/lib
    libs=-lbind -lnsl -ldbm -ldl -lld -lm -lcrypt -lc -lbsd
    perllibs=-lbind -lnsl -ldl -lld -lm -lcrypt -lc -lbsd
    libc=, so=a, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_aix.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -bE:/chroot/perl-5.8.8/lib/5.8.8/aix-64all/CORE/perl.exp'
    cccdlflags=' ', lddlflags='-b64 -bhalt:4 -bexpall -G -bnoentry -lc -L/usr/local/lib'

Characteristics of this binary (from libperl):
  Compile-time options: PERL_MALLOC_WRAP USE_64_BIT_ALL USE_64_BIT_INT
                        USE_LARGE_FILES USE_PERLIO
  Built under aix
  Compiled at Sep 6 2006 15:26:55
  @INC:
    /chroot/perl-5.8.8/lib/5.8.8/aix-64all
    /chroot/perl-5.8.8/lib/5.8.8
    /chroot/perl-5.8.8/lib/site_perl/5.8.8/aix-64all
    /chroot/perl-5.8.8/lib/site_perl/5.8.8
    /chroot/perl-5.8.8/lib/site_perl
    .
root at sfobench01(./XML-Parser-2.34)

From chenming442 at gmail.com Fri Sep 8 13:42:44 2006
From: chenming442 at gmail.com (Chen Ming)
Date: Fri, 8 Sep 2006 19:42:44 +0800
Subject: [Expat-discuss] how XML_Parse work with processing instruction?
Message-ID:

Hi everyone,

I tried the following code in both VC7 and Dev-C++, but the XML_Parse
function does not seem to work properly with processing instructions like

In the following code, the first time XML_Parse is called for string "

#include
#include
#include
#include
using namespace std;

int Count;
const int BUFFERSIZE = 256;

void XMLCALL start(void *data, const char *el, const char **attr)
{
    const char* tag = "tag1";
    if (strcmp(el, tag) == 0)
        Count++;
} /* End of start handler */

void XMLCALL end(void *data, const char *el)
{
    const char* end = "tag";
} /* End of end handler */

int main(int argc, char *argv[])
{
    getchar();
    XML_Parser p = XML_ParserCreate(NULL);
    if (p == NULL) {
        cout << "Parser create failed!" <> buffer) {
        int length = strlen(buffer);
        done = file.eof();
        XML_Status status = XML_Parse(p, buffer, length, done);
        if (status == XML_STATUS_ERROR) {
            string error = XML_ErrorString(XML_GetErrorCode(p));
            cout << "parse error: " << error << endl;
            system("PAUSE");
            return EXIT_SUCCESS;
        }
        if (done)
            break;
    }
    cout << "There are " << Count << " ." << endl;
    system("PAUSE");
    return EXIT_SUCCESS;
}

From kumar_qnx at yahoo.com Fri Sep 8 23:09:54 2006
From: kumar_qnx at yahoo.com (kumar qnx)
Date: Fri, 8 Sep 2006 14:09:54 -0700 (PDT)
Subject: [Expat-discuss] Help using Libexpat
In-Reply-To: <44F731E9.20809@waclawek.net>
Message-ID: <20060908210954.43854.qmail@web55003.mail.re4.yahoo.com>

Hi,

I would like to know if there is an easy way to keep the correspondence
between an element name and the data contained within that element. For
example, I would like to know that the data came from the element name.

Any help is appreciated.

regards,
Pavan.

--- Karl Waclawek wrote:

> Régis St-Gelais (Laubrass) wrote:
> > Expat is an XML parser.
> > It only reads XML files.
> >
> > I simply create my XML files using the good old
> > fprintf function.
> >
> There is also genx, a C library written by Tim Bray:
>
> http://www.tbray.org/ongoing/When/200x/2004/02/20/GenxStatus
>
> Karl

From tebaugh at gmail.com Tue Sep 19 00:45:18 2006
From: tebaugh at gmail.com (Terry Ebaugh)
Date: Mon, 18 Sep 2006 18:45:18 -0400
Subject: [Expat-discuss] XML_ERROR_JUNK_AFTER_DOC_ELEMENT - How to resolve
Message-ID:

Hi,

I've just started working with Expat. I have XML files that are gzipped; I
gzcat them and pipe them to my parser. I am getting the
XML_ERROR_JUNK_AFTER_DOC_ELEMENT error and I'm unsure how to resolve it. I
was under the impression that it is caused by extra characters after the
document's root close tag. I tried stripping the characters after that
close tag, but that doesn't seem to work. Is this caused by a new document
starting immediately after the first one has finished?

Does anyone have any suggestions?

Here is the error message and what was in the buffer:

Parse error:file:1:row:4:column:0:reason:junk after document element
BUFFER = nter>
References:
Message-ID:

Terry:

The rules for XML are very clear... only one document per file. Expat is
very strict about enforcing the correctness of XML files. You are trying to
process multiple documents in one file (stdin). You'd need to write some
sort of filter to create a new parser at the start of each document.

If you could guarantee that every XML file would start with the option tag,
then you would have a basis for your filter. Otherwise, it might be much
easier to extract your zip file into multiple files in a temporary
directory and clean up afterward, although that would have issues with
keeping the files in the same order as in the zip file unless you name them
in the order they would be processed from the file system.

Of course, if you're clever, you might be able to look for the error, know
that it's not a "real" error, ignore it, and start a new parser at the
correct place inside your buffer.

Good luck on your project...

Nick

On 9/18/06, Terry Ebaugh wrote:
> Here is the error message and what was in the buffer:
>
> Parse error:file:1:row:4:column:0:reason:junk after document element
> BUFFER = nter>
>
> My main loop where I read stdin and call the parser is below:
>
> /***********************************************************************/
> /* Read stdin                                                          */
> /***********************************************************************/
> for (;;) {
>     len = (int)fread(buff, 1, BUFFSIZE-1, stdin);
>     if (ferror(stdin)) {
>         fprintf(stderr,"Error reading stdin\n");
>         exit(-2);
>     }
>     done = feof(stdin);
>     //if nothing read then exit so AI doesn't blow up
>     if ((len == 0) && (done) && (cur_file_num==0))
>         break;
>
>     /* fread() does not NUL-terminate buff, so use len, not strlen(buff) */
>     if(XML_Parse(p, buff, len, done) == XML_STATUS_ERROR){
>         fprintf(stderr, "\nParse error at host:%s:file:%d:row:%d:column:%d:reason:%s\n",
>                 host, cur_file_num, XML_GetCurrentLineNumber(p),
>                 XML_GetCurrentColumnNumber(p),
>                 XML_ErrorString(XML_GetErrorCode(p)));
>         fprintf(stderr,"BUFFER = %.*s\n", len, buff);
>         exit(-3);
>     }
>     if(done)
>         break;
> }
>
> /* Free memory used by the parser */
> if(p) {
>     XML_ParserFree(p);
> }
> return 0;
> }

From tebaugh at gmail.com Tue Sep 19 07:24:32 2006
From: tebaugh at gmail.com (Terry Ebaugh)
Date: Tue, 19 Sep 2006 01:24:32 -0400
Subject: [Expat-discuss] XML_ERROR_JUNK_AFTER_DOC_ELEMENT - How to resolve
In-Reply-To: <7C83A8A6B56D3A478333B1DF47E185863A274C@MPBABGEX01.corp.mphasis.com>
Message-ID: <001201c6dbab$e4680270$6a01a8c0@terry>

Nick pinpointed my problem. I didn't have any trouble parsing the files
individually; it only happened when I tried to pipe multiple documents to
the parser via stdin. So now I'll either set up a filter or ignore the
error message and be clever. Thanks for the help!

Terry

-------------------------------------------------
Date: Monday, Sep 18th,2006
C:\pet
C:\pet\cat
C:\pet\cat\ignore\human

_____

From: Mukesh S [mailto:Mukesh.S at mphasis.com]
Sent: Tuesday, September 19, 2006 1:08 AM
To: Terry Ebaugh; expat-discuss at libexpat.org
Subject: RE: [Expat-discuss] XML_ERROR_JUNK_AFTER_DOC_ELEMENT - How to resolve

Hi,

Have you tried with individual files? Instead of reading all the files at
once, I suggest you take baby steps, like:

Step 1) Read the first XML file and close it, then immediately, within your
function, open the next file; that would make sense.

Step 2) The next step is yours: check for the proper tag. If you are sure
that all your XML files have the same format except that the data is
different, then it will work.

My Small web-page: http://www.geocities.com/muki_champs

Regards,
Mukesh Srivastav,
Sr. Software Engineer.
India, Bangalore.
+91-9980142921 (M)

Message-ID: <000901c6df49$5e168a60$6401a8c0@crankshaft>

Hi,

I'm looking at integrating Expat into my TCP server, and I'd like to avoid
the extra copy by using XML_GetBuffer and XML_ParseBuffer. I'll use the
buffer returned from XML_GetBuffer as my TCP/IP receive buffer, passing it
to XML_ParseBuffer whenever some data is received. As the XML stream is
received and processed, I'll need to know how much of the XML buffer is
unused and where the unused part begins, so that I can receive data into
the buffer without overwriting XML bytes left over from the prior TCP read.
So... are there any functions in the Expat API intended to support this
kind of design?

Thanks,
---James

From jameswhetstone at comcast.net Sat Sep 23 22:04:17 2006
From: jameswhetstone at comcast.net (James Whetstone)
Date: Sat, 23 Sep 2006 13:04:17 -0700
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
References: <000901c6df49$5e168a60$6401a8c0@crankshaft>
Message-ID: <001001c6df4b$73adf780$6401a8c0@crankshaft>

Another question along the same lines is whether or not I even need to
worry about overwriting leftover data in the XML buffer. I assumed there
would sometimes be some leftover data from an XML fragment in the buffer,
but maybe that isn't the case.

JW

From karl at waclawek.net Sat Sep 23 22:29:35 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Sat, 23 Sep 2006 16:29:35 -0400
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
In-Reply-To: <001001c6df4b$73adf780$6401a8c0@crankshaft>
References: <000901c6df49$5e168a60$6401a8c0@crankshaft> <001001c6df4b$73adf780$6401a8c0@crankshaft>
Message-ID: <4515992F.3060609@waclawek.net>

James Whetstone wrote:
> Another question along the same lines is whether or not I even need to
> worry about overwriting leftover data in the XML buffer. I assumed there
> would sometimes be some leftover data from an XML fragment in the buffer,
> but maybe that isn't the case.
>
If I remember correctly, Expat buffers any unused fragments, so you should
not have to worry.

Karl

From jameswhetstone at comcast.net Sat Sep 23 23:58:23 2006
From: jameswhetstone at comcast.net (James Whetstone)
Date: Sat, 23 Sep 2006 14:58:23 -0700
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
References: <000901c6df49$5e168a60$6401a8c0@crankshaft> <001001c6df4b$73adf780$6401a8c0@crankshaft> <4515992F.3060609@waclawek.net>
Message-ID: <001901c6df5b$641edcc0$6401a8c0@crankshaft>

So I stepped through the code to see what happens to unused fragments, and
it leaves the fragments in the buffer. From what I can tell, instead of
moving the offset of the input buffer, XML_GetBuffer is intended to be
called each time new input is to be accepted.
It then allocates a new (larger) buffer, memcpys the fragment from the old
buffer to the new buffer, and then frees the old buffer. I'd like to avoid
this by simply moving the input buffer's offset to the end of the fragment
and NOT calling XML_GetBuffer, to avoid the extra memory allocation. Any
suggestions?

---James

From jameswhetstone at comcast.net Sun Sep 24 00:38:00 2006
From: jameswhetstone at comcast.net (James Whetstone)
Date: Sat, 23 Sep 2006 15:38:00 -0700
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
References: <000901c6df49$5e168a60$6401a8c0@crankshaft><001001c6df4b$73adf780$6401a8c0@crankshaft><4515992F.3060609@waclawek.net> <001901c6df5b$641edcc0$6401a8c0@crankshaft>
Message-ID: <002601c6df60$ed0dd900$6401a8c0@crankshaft>

So I found that the easiest, and maybe the best, way to prevent additional
memory allocations is to initially create a buffer that is double the size
of the TCP input buffer. For example, I create a buffer using
XML_GetBuffer(parser, 8192) and then code the rest of the program as if the
buffer is 4096 bytes, so subsequent calls to XML_GetBuffer request a buffer
size of 4096.

---James

From karl at waclawek.net Sun Sep 24 06:38:07 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Sun, 24 Sep 2006 00:38:07 -0400
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
In-Reply-To: <001901c6df5b$641edcc0$6401a8c0@crankshaft>
References: <000901c6df49$5e168a60$6401a8c0@crankshaft> <001001c6df4b$73adf780$6401a8c0@crankshaft> <4515992F.3060609@waclawek.net> <001901c6df5b$641edcc0$6401a8c0@crankshaft>
Message-ID: <45160BAF.6070800@waclawek.net>

James Whetstone wrote:
> So I stepped through the code to see what happens to unused fragments,
> and it leaves the fragments in the buffer. From what I can tell,
> instead of moving the offset of the input buffer, XML_GetBuffer is
> intended to be called each time new input is to be accepted.

Yes.

> It then allocates a new (larger) buffer, memcpys the fragment from the
> old buffer to the new buffer and then frees the old buffer.

No, only if the requested length plus the unprocessed fragment exceeds the
size of the current buffer; otherwise, the unused fragment is simply moved
to the beginning of the buffer.

> I'd like to avoid this by simply moving the input buffer's offset to
> the end of the fragment and NOT calling XML_GetBuffer to avoid the
> extra memory allocation. Any suggestions?
>
Your suggestion in your other message, requesting a larger buffer on the
first call to XML_GetBuffer, is good; it should reduce or eliminate new
memory allocations.

Karl

From franky.braem at gmail.com Wed Sep 27 23:02:13 2006
From: franky.braem at gmail.com (Franky Braem)
Date: Wed, 27 Sep 2006 23:02:13 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
Message-ID: <451AE6D5.1020909@gmail.com>

I've compiled Expat with XML_UNICODE to get UTF-16 encoding.
But it seems that the character data handler always gets its data in
UTF-8. The XML file is stored in UTF-16 format.

This is what I do in the handler:

void ModulesXMLParser::CharacterDataHandler(void *userData,
                                            const XML_Char *s,
                                            int len)
{
    ModulesXMLParser *modxml = (ModulesXMLParser *) userData;
    for(int i = 0; i < len; i++)
    {
        const unsigned t = s[i];
        modxml->m_chars.AppendByte(t);
    }
    //modxml->m_chars.AppendData((void *) s, len);
}

And this is how I convert the information stored in m_chars:

wxMBConvUTF16 conv;
modxml->m_chars.AppendByte('\0');
modxml->m_chars.AppendByte('\0');
wxString dllName = wxString((const char *) modxml->m_chars.GetData(), conv);

The above doesn't work. The following works:

wxString dllName = wxString((const char *) modxml->m_chars.GetData(), wxConvUTF8);

Any ideas on how to get UTF-16 output?

Franky.

From marco.forberg at gmx.net Wed Sep 27 23:09:58 2006
From: marco.forberg at gmx.net (Marco Forberg)
Date: Wed, 27 Sep 2006 23:09:58 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451AE6D5.1020909@gmail.com>
References: <451AE6D5.1020909@gmail.com>
Message-ID:

Did you try setting the encoding when creating the parser?

XML_ParserCreate("UTF-16")

From karl at waclawek.net Thu Sep 28 15:19:59 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Thu, 28 Sep 2006 09:19:59 -0400
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451AE6D5.1020909@gmail.com>
References: <451AE6D5.1020909@gmail.com>
Message-ID: <451BCBFF.5070100@waclawek.net>

Franky Braem wrote:
> I've compiled Expat with XML_UNICODE to get UTF-16 encoding. But it
> seems that the character data handler always gets its data in UTF-8.
> The XML file is stored in UTF-16 format.
>
Are you linking to the correct library? For UTF-16 it is called "libexpatw".

Karl

From franky.braem at gmail.com Fri Sep 29 21:17:31 2006
From: franky.braem at gmail.com (Franky Braem)
Date: Fri, 29 Sep 2006 21:17:31 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451BE77E.8050609@waclawek.net>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net> <451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net>
Message-ID: <451D714B.8040105@gmail.com>

Karl Waclawek wrote:
> Franky Braem wrote:
>>>>
>>> Are you linking to the correct library? For UTF-16 it is called
>>> "libexpatw".
>>>
>> I'm linking with libexpatwMT.lib
>>
>> Franky.
>
> I looked at the "expatw_static" project (I assume you are using Visual
> Studio), and it seems the defines are correct.
> Did you also use the XML_STATIC define?
>
> Karl

XML_STATIC is defined. And yes, I'm using Visual Studio.

Franky.

From karl at waclawek.net Fri Sep 29 21:28:51 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 29 Sep 2006 15:28:51 -0400
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451D714B.8040105@gmail.com>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net> <451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net> <451D714B.8040105@gmail.com>
Message-ID: <451D73F3.70606@waclawek.net>

Franky Braem wrote:
>> I looked at the "expatw_static" project (I assume you are using
>> Visual Studio), and it seems the defines are correct.
>> Did you also use the XML_STATIC define?
>>
>> Karl
> XML_STATIC is defined. And yes I'm using Visual Studio.
>
Maybe you can post a small self-contained example program that shows the
problem.

Karl

From franky.braem at gmail.com Fri Sep 29 22:24:20 2006
From: franky.braem at gmail.com (Franky Braem)
Date: Fri, 29 Sep 2006 22:24:20 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451D73F3.70606@waclawek.net>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net> <451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net> <451D714B.8040105@gmail.com> <451D73F3.70606@waclawek.net>
Message-ID: <451D80F4.4090004@gmail.com>

Karl Waclawek wrote:
> Maybe you can post a small self-contained example program that shows
> the problem.
>
> Karl
>
The following is a small example:

#include <stdio.h>
#include <stdlib.h>
#include <tchar.h>
#include "expat.h"

XML_Char buffer[1000];
int length = 0;

void EndElementHandler(void *userData, const XML_Char *name);
void CharacterDataHandler(void *userData, const XML_Char *s, int len);

int _tmain(int argc, _TCHAR* argv[])
{
    XML_Parser parser = XML_ParserCreate(NULL);
    XML_SetUserData(parser, NULL);
    XML_SetElementHandler(parser, NULL, EndElementHandler);
    XML_SetCharacterDataHandler(parser, CharacterDataHandler);

    // Open in binary mode: text mode on Windows can alter UTF-16 bytes.
    FILE *f = fopen("c:\\temp\\modules.xml", "rb");
    if ( f )
    {
        // obtain file size.
        fseek (f , 0 , SEEK_END);
        long lSize = ftell (f);
        rewind (f);

        // allocate memory to contain the whole file.
        char *readbuffer = (char*) malloc (lSize);
        if (readbuffer == NULL) exit (2);

        // copy the file into the buffer and parse what was actually read.
        size_t got = fread (readbuffer,1,lSize,f);
        XML_Parse(parser, readbuffer, (int) got, 1);
        free(readbuffer);
        fclose(f);
    }
    XML_ParserFree(parser);
    return 0;
}

void EndElementHandler(void *userData, const XML_Char *name)
{
    length = 0;
}

void CharacterDataHandler(void *userData, const XML_Char *s, int len)
{
    // Stop at the end of the fixed-size buffer.
    for(int i = 0; i < len && length < 1000; i++, length++)
    {
        buffer[length] = s[i];
    }
}

The following is defined:

WIN32;_DEBUG;_CONSOLE;XML_UNICODE;XML_STATIC

And I link with libexpatwMT.lib.

When I debug the above, the names of the tags are always readable, while I
expect some UTF-16 characters.

From karl at waclawek.net Sat Sep 30 05:23:44 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 29 Sep 2006 23:23:44 -0400
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451D80F4.4090004@gmail.com>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net> <451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net> <451D714B.8040105@gmail.com> <451D73F3.70606@waclawek.net> <451D80F4.4090004@gmail.com>
Message-ID: <451DE340.3020104@waclawek.net>

Franky Braem wrote:
> Karl Waclawek wrote:
>> Maybe you can post a small self-contained example program that shows
>> the problem.
>>
>> Karl
>>
> The following is a small example:

It seems this is just how the debugger processes and displays the data,
trying to be smart. You can assign a value > 255 to a buffer element (an
array of XML_Char), which means XML_Char is more than one byte wide.

By the way, defining XML_UNICODE but not XML_UNICODE_WCHAR_T will typedef
XML_Char as unsigned short, not as wchar_t.

Karl

From franky.braem at gmail.com Sat Sep 30 17:51:46 2006
From: franky.braem at gmail.com (Franky Braem)
Date: Sat, 30 Sep 2006 17:51:46 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451DE340.3020104@waclawek.net>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net> <451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net> <451D714B.8040105@gmail.com> <451D73F3.70606@waclawek.net> <451D80F4.4090004@gmail.com> <451DE340.3020104@waclawek.net>
Message-ID: <451E9292.9040007@gmail.com>

When I do the following in the character handler:

ModulesXMLParser *modxml = (ModulesXMLParser *) userData;
modxml->m_chars.AppendData((void *) s, len * 2);

it works. Note the len * 2. Is this mentioned in the docs somewhere? If
not, please add it.

Franky.

From karl at waclawek.net Sat Sep 30 18:31:10 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Sat, 30 Sep 2006 12:31:10 -0400
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451E9292.9040007@gmail.com>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net> <451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net> <451D714B.8040105@gmail.com> <451D73F3.70606@waclawek.net> <451D80F4.4090004@gmail.com> <451DE340.3020104@waclawek.net> <451E9292.9040007@gmail.com>
Message-ID: <451E9BCE.60701@waclawek.net>

Franky Braem wrote:
> When I do the following in the character handler:
>
> ModulesXMLParser *modxml = (ModulesXMLParser *) userData;
> modxml->m_chars.AppendData((void *) s, len * 2);
>
> it works. Note the len * 2. Is this mentioned in the docs somewhere?
> If not, please add it.
>
The len parameter is the number of XML_Chars, not the number of bytes.
With XML_UNICODE, the size of XML_Char is 2, hence the math above. I would
use sizeof(XML_Char) instead of 2.

Karl