From fdrake@acm.org Thu Sep 5 19:04:19 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 5 Sep 2002 14:04:19 -0400
Subject: [Expat-discuss] Roadmap for the development of Expat
Message-ID: <15735.40099.481729.43202@grendel.zope.com>

The Expat team has published a proposed roadmap that describes our
intended directions for future development of the parser. The roadmap
is available on the Expat website at:

    http://www.libexpat.org/dev/roadmap.html

We welcome comments on the proposal; please send feedback on the
roadmap to the expat-discuss mailing list:

    http://mail.libexpat.org/mailman-21/listinfo/

Please do not "Reply to All" to this message, to avoid further
cross-posting.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From veillard@redhat.com Thu Sep 5 20:27:27 2002
From: veillard@redhat.com (Daniel Veillard)
Date: Thu, 5 Sep 2002 15:27:27 -0400
Subject: [Expat-discuss] Re: [XML-SIG] Roadmap for the development of Expat
In-Reply-To: <15735.40099.481729.43202@grendel.zope.com>; from fdrake@acm.org on Thu, Sep 05, 2002 at 02:04:19PM -0400
References: <15735.40099.481729.43202@grendel.zope.com>
Message-ID: <20020905152727.R25901@redhat.com>

On Thu, Sep 05, 2002 at 02:04:19PM -0400, Fred L. Drake, Jr. wrote:
>
> The Expat team has published a proposed roadmap that describes our
> intended directions for future development of the parser. The roadmap
> is available on the Expat website at:
>
> http://www.libexpat.org/dev/roadmap.html

Seems to miss Namespace in XML-1.1

Daniel

--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

From fdrake@acm.org Thu Sep 5 21:20:25 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 5 Sep 2002 16:20:25 -0400
Subject: [Expat-discuss] Re: [XML-SIG] Roadmap for the development of Expat
In-Reply-To: <20020905152727.R25901@redhat.com>
References: <15735.40099.481729.43202@grendel.zope.com> <20020905152727.R25901@redhat.com>
Message-ID: <15735.48265.473231.777235@grendel.zope.com>

Daniel Veillard writes:
> Seems to miss Namespace in XML-1.1

This was an editorial oversight, not a failing of intention. I've
clarified the document to indicate that Namespaces in XML 1.1 would be
part of the XML 1.1 support. I agree that it's important to mention
the additional spec explicitly, since it is a separate spec.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From c1pp1n0@yahoo.com Fri Sep 6 22:06:39 2002
From: c1pp1n0@yahoo.com (Massimiliano Ciaramita)
Date: Fri, 6 Sep 2002 14:06:39 -0700 (PDT)
Subject: [Expat-discuss] Iso encoded data
Message-ID: <20020906210639.80964.qmail@web20502.mail.yahoo.com>

Hello,

I am trying to get at the data between tags that is encoded as
iso-8859-1. I am using XML_SetCharacterDataHandler, but that data is
not in the array of characters retrieved. For example, from the input

    We keep wondering &mdash; what Mr. Gates wanted to say

I get the cdata:

    We keep wondering what Mr. Gates wanted to say

with no dash anywhere.
I am initializing the parser with the encoding info:

    XML_Parser parser = XML_ParserCreate("iso-8859-1");

How do you get the encoded data?
Many thanks!

From kwaclaw@thestar.ca Fri Sep 6 22:20:58 2002
From: kwaclaw@thestar.ca (Karl Waclawek)
Date: Fri, 6 Sep 2002 17:20:58 -0400
Subject: [Expat-discuss] Iso encoded data
References: <20020906210639.80964.qmail@web20502.mail.yahoo.com>
Message-ID: <044b01c255eb$4b1b96d0$9e539696@citkwaclaww2k>

> Hello,
>
> I am trying to get at the data between tags that is
> encoded as iso-8859-1.
> I am using the
> XML_SetCharacterDataHandler but that data is not in
> the array of characters retrieved. For example, from
> the input
>
> We keep wondering &mdash; what Mr. Gates wanted
> to say
>
> I get the cdata:
>
> We keep wondering what Mr. Gates wanted to say
>
> with no dash anywhere.
> I am initializing the parser with the encoding info:
>
> XML_Parser parser = XML_ParserCreate("iso-8859-1");
>
> How do you get the encoded data?
> Many thanks!

Did you define the entity mdash somewhere in your DTD or internal
subset? It is not a predefined entity!
If it was not defined, then depending on the details of your XML
document, the missing entity might not be a well-formedness violation,
so Expat would not return an error.

Karl

From joegd@mutantsoft.com Sat Sep 7 00:30:50 2002
From: joegd@mutantsoft.com (Joe Collins)
Date: Fri, 06 Sep 2002 16:30:50 -0700
Subject: [Expat-discuss] Beginner problem
Message-ID: <3D793AAA.7050201@mutantsoft.com>

Hi all,

I am having a weird problem I can't figure out. Much help would be
appreciated.

I have some code - I always get a "not well formed (invalid token)"
error at the end of the file. I can't figure it out. I KNOW the
document is well formed. I was previously using the xerces parser and
it worked fine. In fact if I run the outline.exe < my.xml it works
fine... So I am totally stumped.
I thought maybe there was an eof problem, but it doesn't look like it
in the debugger...

Anyway here's my code...
Please help :(

thanks,
-Joe

    XNode *node=NULL;   //This is my own dom-node object

    //Set up the parser
    XML_Parser parser = XML_ParserCreate(NULL);
    if (! parser)
    {
        printf("Couldn't allocate memory for parser\n");
        exit(-1);
    }
    XML_SetElementHandler(parser, XMLParser::start, XMLParser::end);
        //These handlers are static members
        //that do nothing right now
    XML_SetUserData(parser, node);

    //Read in the document
    char *buff;
    int sz = sizeOfFile(doc);   //fstat to get the size of the file
    if (sz <= 0)
        return NULL;
    buff = (char *)XML_GetBuffer(parser, sz);   //Get the buffer
    ZeroMemory(buff,sz);                        //Zero out the buffer
    if (readTextFile(buff, doc, sz) <= 0)       //Read the xml file
        return NULL;
    if (! XML_ParseBuffer(parser, sz, 0))       //Parse the doc
    {
        int x = XML_GetCurrentLineNumber(parser);
        const char *XML_LChar = XML_ErrorString(XML_GetErrorCode(parser));
        printf("Parse error at line %d:\n%s\n",
               XML_GetCurrentLineNumber(parser),
               XML_ErrorString(XML_GetErrorCode(parser)));
        return NULL;
    }
    XML_ParserFree(parser);

From manojkithany108@hotmail.com Sat Sep 7 00:36:43 2002
From: manojkithany108@hotmail.com (Manoj Kithany)
Date: Fri, 06 Sep 2002 23:36:43 +0000
Subject: [Expat-discuss] libexpat.a Error
Message-ID:

Hi Experts!

I tried to install Apache 2.0.40 on an IBM AIX 5.1 system, but I got
the following error:

---------------------------------
error: failed dependencies:
libexpat.a(libexpat.so.0) is needed by apache-1.3.22-1ssl
--------------------------------

I then installed libexpat.a using "./configure", "make" and
"make install", but it still gives the same error message - wonder why?

Do any of you Experts know how to eliminate this error? How do I tell
the installation that I have installed libexpat.a?

Any related information on this is appreciated.

THANKS!

Manoj G.
Kithany

From karl@waclawek.net Sat Sep 7 01:26:20 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Fri, 6 Sep 2002 20:26:20 -0400
Subject: [Expat-discuss] Beginner problem
References: <3D793AAA.7050201@mutantsoft.com>
Message-ID: <001f01c25605$312552b0$0207a8c0@karl>

> Hi all,
>
> I am having a weird problem I can't figure out. Much help would be
> appreciated.
>
> I have some code - I always get a "not well formed (invalid token)"
> error at the
> end of the file. I can't figure it out. I KNOW the document is well

> if (readTextFile(buff, doc, sz) <= 0) //Read the xml file
> return NULL;
> if (! XML_ParseBuffer(parser, sz, 0)) //Parse the doc

If you only use one buffer, you need to set the isFinal argument to a
non-zero value (<> 0); otherwise the parser expects more data.

Karl

From fdrake@acm.org Sat Sep 7 01:04:24 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 6 Sep 2002 20:04:24 -0400
Subject: [Expat-discuss] Expat 1.95.5 released
Message-ID: <15737.17032.677262.166667@grendel.zope.com>

Expat 1.95.5 has been released. The following changes have been made
since the previous release:

- Added XML_UseForeignDTD() for improved SAX2 support.
- Added XML_GetFeatureList().
- Defined XML_Bool type and the values XML_TRUE and XML_FALSE.
- Use an incomplete struct instead of a void* for the parser.
- Fixed UTF-8 decoding bug that caused legal UTF-8 to be rejected.
- Finally fixed bug where default handler would report DTD events
  that were already handled by another handler. Initial patch
  contributed by Darryl Miller.
- Removed unnecessary DllMain() function that caused static linking
  into a DLL to be difficult.
- Added VC++ projects for building static libraries.
- Reduced line-length for all source code and headers to be no longer
  than 80 characters, to help with AS/400 support.
- Reduced memory copying during parsing (SF patch #600964).
- Fixed a variety of bugs: see SF issues 580793, 434664, 483514,
  580503, 581069, 584041, 584183, 584832, 585537, 596555, 596678,
  598352, 598944, 599715, 600479, 600971.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From joegd@mutantsoft.com Sat Sep 7 01:26:28 2002
From: joegd@mutantsoft.com (Joe Collins)
Date: Fri, 06 Sep 2002 17:26:28 -0700
Subject: [Expat-discuss] Beginner problem
References: <3D793AAA.7050201@mutantsoft.com> <001f01c25605$312552b0$0207a8c0@karl>
Message-ID: <3D7947B4.5010504@mutantsoft.com>

ok. I get the error regardless of isFinal being 0 or 1.

-Joe

Karl Waclawek wrote:

>>Hi all,
>>
>>I am having a weird problem I can't figure out. Much help would be
>>appreciated.
>>
>>I have some code - I always get a "not well formed (invalid token)"
>>error at the
>>end of the file. I can't figure it out. I KNOW the document is well
>>
>
>> if (readTextFile(buff, doc, sz) <= 0) //Read the xml file
>> return NULL;
>> if (! XML_ParseBuffer(parser, sz, 0)) //Parse the doc
>>
>
>If you only use one buffer, you need to set the isFinal argument to <> 0,
>otherwise the parser expects more data.
>
>Karl
>

From jgarbers@mindspring.com Tue Sep 10 22:06:52 2002
From: jgarbers@mindspring.com (Jeff Garbers)
Date: Tue, 10 Sep 2002 17:06:52 -0400
Subject: [Expat-discuss] Trouble building with XML_UNICODE_WCHAR_T
Message-ID: <000001c2590d$ffad2760$6601a8c0@XLTVAIO>

Sorry for what may be a newbie problem, but I seem to be having
trouble getting expat to build libraries in its wchar_t mode (that is,
with XML_UNICODE_WCHAR_T defined). I'm using the 1.95.5 code, building
with GCC 2.96 on Mandrake 8.2.

I've modified a copy of the sample 'elements.c' to use XML_Char
instead of char in its callbacks and to define XML_UNICODE_WCHAR_T in
the source, then compiled it and linked it statically to libexpatw.a
(which I built following the instructions in the README).

The strange thing is that libexpatw.a and libexpat.a are nearly
identical -- they're the same size, with a different number in what
appears to be a header in the .a file. When I look at what expat is
returning on the callbacks, the buffers appear to contain regular
8-bit chars and not wchar_t (which are 32 bits on this platform).

Again, sorry if this is a stupid mistake, but I've been beating my
head against it all day and would appreciate any experts' help. Thanks
again.

Jeff Garbers
XLT Software
www.xltsoftware.com

From manojkithany108@hotmail.com Wed Sep 11 16:31:32 2002
From: manojkithany108@hotmail.com (Manoj Kithany)
Date: Wed, 11 Sep 2002 15:31:32 +0000
Subject: [Expat-discuss] aclocal Missing!
Message-ID:

Hi Experts,

When I run "./buildconf" to install Apache 2.0.40 on IBM AIX, I get
the following errors. It seems that "aclocal" is missing - where do I
get and install aclocal from? I checked the GNU site but couldn't find
it.
I have already installed the following tools:

libtool
automake
autoconf

Transcript of the errors:
---------------------------------------------------
./buildconf.sh[12]: /usr/bin/../share/aclocal: not found.
Incorporating /Downloads/httpd-2.0.40/srclib/apr-util/xml/expat/libtool.m4 into aclocal.m4 ...
cat: 0652-050 Cannot open /Downloads/httpd-2.0.40/srclib/apr-util/xml/expat/libtool.m4.
Copying libtool helper files ...
Creating config.h.in ...
autoheader: config.h.in is unchanged
Creating configure ...
configure.in:56: error: possibly undefined macro: AC_LIBTOOL_WIN32_DLL
configure.in:57: error: possibly undefined macro: AC_PROG_LIBTOOL
---------------------------------------------------------

From jgarbers@xltsoftware.com Wed Sep 11 18:50:35 2002
From: jgarbers@xltsoftware.com (Jeff Garbers)
Date: Wed, 11 Sep 2002 13:50:35 -0400
Subject: [Expat-discuss] Using expat in "wide mode" on systems with 32-bit wchar_t's
Message-ID: <002401c259bb$bf01d540$6601a8c0@XLTVAIO>

Thanks to Karl for his kind help with my previous posting. I'm now
able to build expat successfully with XML_UNICODE_WCHAR_T. My test
code shows that libexpatw.a is indeed returning 16-bit characters.

Now, however, another problem arises on systems that define wchar_t to
be 32 bits wide -- like GCC on the iMac and Linux boxes I'm using. I
can build my own code in 16-bit wchar mode by using the -fshort-wchar
option (at least on Linux; that doesn't appear to work on Mac OS X).
That lets my code talk to expat successfully... but now none of the
wide string functions from the standard library will work, since
they're all compiled for 32-bit wchar_t's.

So my question is for folks who've used expat in its "wide mode" on
systems with a 32-bit wchar_t: how do you deal with it? Do you just
expand everything coming out of expat into a wider buffer before
operating on it?
Do you avoid using the built-in wide string functions? Or is there a
trick I haven't discovered yet?

Thanks again to all for your help.

Jeff Garbers
jgarbers@xltsoftware.com

From fdrake@acm.org Wed Sep 11 18:56:36 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 11 Sep 2002 13:56:36 -0400
Subject: [Expat-discuss] Using expat in "wide mode" on systems with 32-bit wchar_t's
In-Reply-To: <002401c259bb$bf01d540$6601a8c0@XLTVAIO>
References: <002401c259bb$bf01d540$6601a8c0@XLTVAIO>
Message-ID: <15743.33748.18671.736178@grendel.zope.com>

Jeff Garbers writes:
> Thanks to Karl for his kind help with my previous posting. I'm now
> able to build expat successfully with XML_UNICODE_WCHAR_T. My test
> code shows that libexpatw.a is indeed returning 16-bit characters.

Thanks, Karl! Next time, please keep it on the list so I can use it
to improve the documentation. ;-)

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From Josh.Martin@abq.sc.philips.com Wed Sep 11 23:34:11 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Wed, 11 Sep 2002 16:34:11 -0600 (MDT)
Subject: [Expat-discuss] aclocal Missing!
Message-ID: <200209112234.g8BMYCB21395@atoae450.abq.sc.philips.com>

> Hi Experts,
>
> When I run "./buildconf" to install Apache 2.0.40 on IBM AIX,
> I get following errors: Seems that "aclocal" is missing - from where do I
> get and install aclocal...? I checked GNU site but coul'nt find .
> I have already installed following tools:
>
> libtool
> automake
> autoconf
> Transcrips of Errors are:
> ---------------------------------------------------
> ./buildconf.sh[12]: /usr/bin/../share/aclocal: not found.
...

Hello,

The 'aclocal' program is part of the 'automake' package, and is
automatically installed. It actually looks to me like the
'buildconf.sh' script is looking for it in the wrong place. It is
looking in /usr/share, while it looks like it is actually trying to
look in /usr/share/bin. I am not familiar with Apache 2 (only Apache
1) nor IBM AIX.
It seems to me that while you may have run or configured the buildconf
script incorrectly, it looks more like it was not written to correctly
look for aclocal in (what I assume is) a non-standard directory.

I'll let you proceed from here, but let me know if you figure it out,
or would like more help.

  - Josh Martin

From Josh.Martin@abq.sc.philips.com Wed Sep 11 23:56:13 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Wed, 11 Sep 2002 16:56:13 -0600 (MDT)
Subject: [Expat-discuss] Beginner problem
Message-ID: <200209112256.g8BMuDB28101@atoae450.abq.sc.philips.com>

Hello,

If you are now setting isFinal to 0 on all but the last buffer, and
non-zero on the last buffer, then I would say to make sure that the
length argument that you are sending to XML_ParseBuffer() does not
include the terminating NUL character of the string, as this would not
be a valid XML character. In other words you want to use strlen() for
the length of the string, not sizeof() for the number of bytes, which
is what fstat() is giving you.

What bothers me is that I would think you would get this problem on
every buffer, not just the last, unless you're reading the entire
document into a single buffer.

  - Josh Martin

> ok. I get the error regardless of isFinal being 0 or 1.
>
> -Joe
>
> Karl Waclawek wrote:
>
> >>Hi all,
> >>
> >>I am having a weird problem I can't figure out. Much help would be
> >>appreciated.
> >>
> >>I have some code - I always get a "not well formed (invalid token)"
> >>error at the
> >>end of the file. I can't figure it out. I KNOW the document is well
> >>
> >
> >> if (readTextFile(buff, doc, sz) <= 0) //Read the xml file
> >> return NULL;
> >> if (! XML_ParseBuffer(parser, sz, 0)) //Parse the doc
> >>
> >
> >If you only use one buffer, you need to set the isFinal argument to <> 0,
> >otherwise the parser expects more data.
> >
> >Karl
> >

From Umashanker.Challa@gems9.gov.bc.ca Thu Sep 12 01:06:26 2002
From: Umashanker.Challa@gems9.gov.bc.ca (Challa, Umashanker C HLTH:EX)
Date: Wed, 11 Sep 2002 17:06:26 -0700
Subject: [Expat-discuss] Expat on AIX ?
Message-ID: <7C7118AA087A43459FECD0DB06BFD15D03614EAA@shield.gov.bc.ca>

Does anyone know if Expat can work on AIX? If so, are there any known
issues?

Thanks in advance
UC

From karl@waclawek.net Thu Sep 12 03:25:58 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Wed, 11 Sep 2002 22:25:58 -0400
Subject: [Expat-discuss] Trouble building with XML_UNICODE_WCHAR_T
References: <001c01c2591e$73705db0$6601a8c0@GIGALON>
Message-ID: <001601c25a03$bb26f750$0207a8c0@karl>

> Ah, there's the problem. The Linux box defines a 32-bit wchar_t... which
> seems excessive to me, but then again I don't speak any Asian languages!
> I'd prefer to stay 16-bit for internal purposes, but then none of my
> wide string handling functions would be available.

Have you looked into the compiler option -fshort-wchar?
Should that not change your wide string handling functions to 16bit?

> I'll probably stick to Windows-based development for a while until I
> have more time to figure this out.

Expat differentiates between two types of strings:
1) XML_LChar: application strings like error messages, version string,
   feature descriptions
2) XML_Char: XML output

There are two UTF-16 compile options in Expat, which affect these
strings differently:
- XML_UNICODE: defines XML_LChar as char and XML_Char as unsigned
  short, which means that even if the application itself works with
  8bit strings, it can still generate UTF-16 encoded XML output
- XML_UNICODE_WCHAR_T: defines XML_LChar and XML_Char as wchar_t

The second one requires wchar_t to be 16bit wide - check the
-fshort-wchar option.

> Is 32-bit wchar_t compatibility on the roadmap for expat? Is it anything
> I'd be able to help with?

I am not sure if compiling without the -fshort-wchar option would work.
If not, we would certainly appreciate it if you had a closer look.
Expat relies on volunteer contributions!!!

However, the main problem is that even if it works, output would still
be encoded as UTF-16, for which 32bit characters are not appropriate.
So, to really get 32bit wchar_t going, one would need to add UTF-32 as
another output encoding. You are, of course, most welcome to make such
a contribution. :-)

Karl

From Umashanker.Challa@gems9.gov.bc.ca Thu Sep 12 16:57:35 2002
From: Umashanker.Challa@gems9.gov.bc.ca (Challa, Umashanker C HLTH:EX)
Date: Thu, 12 Sep 2002 08:57:35 -0700
Subject: [Expat-discuss] Support for AIX
Message-ID: <7C7118AA087A43459FECD0DB06BFD15D03614EB1@shield.gov.bc.ca>

Does anyone know of any known issues with running Expat on an AIX box?

Thanks in advance
Umashanker Challa

From deschan3@attbi.com Thu Sep 12 22:27:56 2002
From: deschan3@attbi.com (Desmond Chan)
Date: Thu, 12 Sep 2002 14:27:56 -0700
Subject: [Expat-discuss] binary in XML?
Message-ID: <01C25A68.970549C0.deschan3@attbi.com>

Hi all,

I am aware that it is not advisable to include binary data in an XML
document. Yet I would like to know if and how I can use expat to parse
an XML document with binary data in it.
For example,

...

It is a file element with data of 100 bytes. "..." is 100 bytes of
binary data. Is it possible to ask expat to simply copy 100 bytes of
data to a buffer after it sees the start tag, then skip those 100
bytes and start parsing the doc as text from the end tag again?

Thanks,

Desmond

From hari1008@hotmail.com Fri Sep 13 18:08:00 2002
From: hari1008@hotmail.com (hari hari)
Date: Fri, 13 Sep 2002 17:08:00 +0000
Subject: [Expat-discuss] failed dependencies: libexpat.a
Message-ID:

Hi my friends,

I am installing Apache 1.3.22 on an IBM AIX 5.1 system. I got it from
the CD that came with the server. When installing Apache from the CD,
I get the error below.
The following errors during installation occurred:
error: failed dependencies: libexpat.a(libexpat.so.0) is needed by
apache-1.3.22-1ssl

Please help me get rid of the above error.

--hari

From fdrake@acm.org Fri Sep 13 20:23:20 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 13 Sep 2002 15:23:20 -0400
Subject: [Expat-discuss] failed dependencies: libexpat.a
In-Reply-To:
References:
Message-ID: <15746.15144.432307.986535@grendel.zope.com>

hari hari writes:
> The following errors during installation occurred:
> error: failed dependencies: libexpat.a(libexpat.so.0) is needed by
> apache-1.3.22-1ssl

This has to do with the way Apache embedded Expat in some 1.3.x
versions. I know this was resolved in Apache 1.3, but I don't
remember the exact patchlevel you need to get the fix. The most
recent release of Apache 1.3 is 1.3.26; you can find that at:

    http://httpd.apache.org/

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From MGodoy@adexus.cl Mon Sep 16 22:03:58 2002
From: MGodoy@adexus.cl (Marcela Godoy)
Date: Mon, 16 Sep 2002 17:03:58 -0400
Subject: [Expat-discuss] How Can I get the value of an element using Expat apis?
Message-ID:

Hi, I need to pass the value of an element to a variable in a C
program. For example, given the XML text: hola1, how do I get the
values of the a and b elements?

Marcela Godoy Pinilla.
Ingeniero de Software
mgodoy@adexus.cl
56-2-6861142
ADEXUS.

From mauro@altersoft.com.ar Tue Sep 17 18:48:08 2002
From: mauro@altersoft.com.ar (Mauro Daniel Ardolino)
Date: Tue, 17 Sep 2002 13:48:08 -0400 (ART)
Subject: [Expat-discuss] Pre-begginer question
Message-ID:

Hello!

I'm just starting to use eXpat.
Does eXpat support XSchema? I've only seen references to DTD parsing.

Thanks!
Mauro

--
Ing.Mauro Daniel Ardolino
Departamento de Desarrollo y Servicios
Altersoft
Billinghurst 1599 - Piso 9
C1425DTE - Capital Federal
Tel/Fax: 4821-3376 / 4822-8759
mailto: mauro@altersoft.com.ar
website: http://www.altersoft.com.ar

From fdrake@acm.org Tue Sep 17 17:51:55 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 17 Sep 2002 12:51:55 -0400
Subject: [Expat-discuss] Pre-begginer question
In-Reply-To:
References:
Message-ID: <15751.23979.884667.164605@grendel.zope.com>

Mauro Daniel Ardolino writes:
> Does eXpat support XSchema? I've only seen references to DTD parsing.

There is no XSchema support in Expat.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From Josh.Martin@abq.sc.philips.com Wed Sep 18 04:51:13 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Tue, 17 Sep 2002 21:51:13 -0600 (MDT)
Subject: [Expat-discuss] RE: Beginner Problem
Message-ID: <200209180351.g8I3pDB13063@atoae450.abq.sc.philips.com>

Hello,

I was wondering, were you ever able to solve the problem of the
"invalid token" error at the end of your buffers?

  - Josh Martin

From fdrake@acm.org Wed Sep 18 04:55:52 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 17 Sep 2002 23:55:52 -0400
Subject: [Expat-discuss] RE: Beginner Problem
In-Reply-To: <200209180351.g8I3pDB13063@atoae450.abq.sc.philips.com>
References: <200209180351.g8I3pDB13063@atoae450.abq.sc.philips.com>
Message-ID: <15751.63816.841429.261310@grendel.zope.com>

Josh Martin writes:
> I was wondering, were you ever able to solve the problem of the
> "invalid token" error at the end of your buffers?

Josh,

Can you be more specific? I think there have been a couple of bugs
that showed themselves by setting this error code. Is there a
specific bug ID from the SourceForge tracker that you're interested
in? We have fixed some UTF-8 decoding issues in the last couple of
releases.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From Josh.Martin@abq.sc.philips.com Wed Sep 18 05:14:50 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Tue, 17 Sep 2002 22:14:50 -0600 (MDT)
Subject: [Expat-discuss] binary in XML?
Message-ID: <200209180414.g8I4EoB20184@atoae450.abq.sc.philips.com>

> Hi all,
> I am aware that it is not advisable to include binary in a XML document.
> Yet I would like to know if and how I can use expat to parse a XML
> document with binary in it.
> For example,
>
> ...
>
> It is a file element with data of 100 bytes. "..." is 100 bytes of binary
> data. Is it possible to ask expat to simply copy 100 bytes of data to a
> buffer after it sees the start tag, then skip those 100 bytes and start
> parsing the doc as text from the end tag again?
> Thanks,
>
> Desmond

While expat itself does not have any functionality to do what you are
talking about, it would be easy to have your application do this. In
your StartElementHandler() function you can have your application read
the 100 bytes of data into a buffer when you handle the tag and the
"length" attribute, and then process the information as desired. Your
application can then resume sending the document to expat, and because
you start reading after the binary information, expat will never know
the difference.

However, there are some things to watch out for when using this
method. In order to make sure that the binary data is not included in
a buffer passed to expat, you will either have to send the document to
expat one byte at a time, or you will have to make sure the tags and
the binary information are on separate lines (as follows) and read the
document in one line at a time.

...

These caveats are why it's generally advisable not to use inline
binary information in an XML document, but they do not preclude the
possibility.

Good luck.
  - Josh Martin

From Josh.Martin@abq.sc.philips.com Wed Sep 18 05:17:41 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Tue, 17 Sep 2002 22:17:41 -0600 (MDT)
Subject: [Expat-discuss] RE: Beginner Problem
Message-ID: <200209180417.g8I4HgB21064@atoae450.abq.sc.philips.com>

> Josh Martin writes:
> > I was wondering, were you ever able to solve the problem of the
> > "invalid token" error at the end of your buffers?
>
> Josh,
>
> Can you be more specific? I think there have been a couple of bugs
> that showed themselves by setting this error code. Is there a
> specific bug ID from the SourceForge tracker that you're interested
> in? We have fixed some UTF-8 decoding issues in the last couple of
> releases.
>
> -Fred

Fred,

Please note that the message was To: joe, and only Cc: to the expat
discussion list. Sorry, I should have made it clear that the message
was addressed to Joe, and not the list.

  - Josh Martin

From Josh.Martin@abq.sc.philips.com Wed Sep 18 05:31:06 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Tue, 17 Sep 2002 22:31:06 -0600 (MDT)
Subject: [Expat-discuss] How Can I get the value of an element using Expat apis?
Message-ID: <200209180431.g8I4V6B24979@atoae450.abq.sc.philips.com>

> Hi, I need pass the value of an element to a variable of a C program, By
> examples, the xml text: hola1, How get the value of a and b
> elements for?.
>
> Marcela Godoy Pinilla.
> Ingeniero de Software
> mgodoy@adexus.cl
> 56-2-6861142
> ADEXUS.

Hello,

First, let me clear up a misconception you seem to have. Elements,
such as the ones denoted by the <a> and <b> tags, do not have values,
although they may contain attributes which have values (such as ).
The value of any attribute can be obtained in the handler that you
specify with XML_SetStartElementHandler() (see the documentation).
The text contained between a start and end tag such as the "hola" in hola is called character data, and does not technically have any relation to the tag, except as far as validation using the DTD is concerned. This character data can be obtained using the handler that you specify with XML_SetCharacterDataHandler() (again, see the documentation). The only way to tell which tags the character data occurred inside of is to keep track of the currently open tags in your StartElementHandler() function and your EndElementHandler() function. - Josh Martin From Josh.Martin@abq.sc.philips.com Wed Sep 18 06:02:23 2002 From: Josh.Martin@abq.sc.philips.com (Josh Martin) Date: Tue, 17 Sep 2002 23:02:23 -0600 (MDT) Subject: [Expat-discuss] Re: Beginner Problem Message-ID: <200209180502.g8I52NB04476@atoae450.abq.sc.philips.com> ------------- Begin Forwarded Message ------------- Date: Tue, 17 Sep 2002 22:03:07 -0700 From: Joe Collins To: Josh Martin Subject: Re: Beginner Problem Yup. for XML_ParseBuffer(parser, size, 0) I was using size from fstat instead of getting size from fread -Joe Josh Martin wrote: >Hello, > >I was wondering, were you ever able to solve the problem of the "invalid token" >error at the end of your buffers? > > - Josh Martin > > > ------------- End Forwarded Message ------------- From karl@waclawek.net Wed Sep 18 14:06:21 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed, 18 Sep 2002 09:06:21 -0400 Subject: [Expat-discuss] Re: Beginner Problem References: <200209180502.g8I52NB04476@atoae450.abq.sc.philips.com> Message-ID: <001901c25f14$2f0497d0$9e539696@citkwaclaww2k> > ------------- Begin Forwarded Message ------------- > > Date: Tue, 17 Sep 2002 22:03:07 -0700 > From: Joe Collins > To: Josh Martin > Subject: Re: Beginner Problem > > Yup.
> for XML_ParseBuffer(parser, size, 0) > I was using size from fstat instead of getting size from fread > > -Joe I am not a Unix guy, but I have seen this a few times in Expat code I have encountered, causing a problem almost every time.. Why is fread actually returning a different value than fstat? Karl From Mario.Ruggier@softplumbers.com Wed Sep 18 15:36:36 2002 From: Mario.Ruggier@softplumbers.com (Ruggier, Mario) Date: Wed, 18 Sep 2002 16:36:36 +0200 Subject: [Expat-discuss] expat from javascript? Message-ID: <6C927E5DB9915040A6EF446A431CA1A6393150@spdc01.softplumbers.com> Hi all, does anyone know if expat can be used from javascript (or jscript, for ASPages)? Appreciate any info... Cheers, mario From fdrake@acm.org Wed Sep 18 16:00:20 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 18 Sep 2002 11:00:20 -0400 Subject: [Expat-discuss] Re: Beginner Problem In-Reply-To: <001901c25f14$2f0497d0$9e539696@citkwaclaww2k> References: <200209180502.g8I52NB04476@atoae450.abq.sc.philips.com> <001901c25f14$2f0497d0$9e539696@citkwaclaww2k> Message-ID: <15752.38148.46516.123254@grendel.zope.com> Karl Waclawek writes: > I am not a Unix guy, but I have seen this a few times > in Expat code I have encountered, causing a problem > almost every time.. > Why is fread actually returning a different value than fstat? I don't have any sample code from this thread handy (don't recall seeing any, but I've been swamped), so it's hard to say. If the input was being read in chunks, using the value from fstat() would certainly be the wrong thing, but I'm guessing that wasn't the case here, or it wouldn't have been a surprise or a mystery. On Windows, there could be a problem if the file were opened in text mode. Though XML is generally considered text, it is usually a mistake to open the file in text mode, simply because it makes the size reported by fstat() differ from the number of bytes provided by the fread() calls.
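The point about sizes can be sketched as follows. This is only an illustration, again using Python's standard-library Expat binding (xml.parsers.expat) instead of the C API: the stream is read in binary mode, and each call to the parser is given exactly the bytes that the read returned, never a size obtained some other way. The function name parse_stream and the chunk size are assumptions made for the example.

```python
import io
import xml.parsers.expat


def parse_stream(stream, chunk_size=4096):
    """Feed a binary stream to Expat in chunks; return the element names seen."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    while True:
        chunk = stream.read(chunk_size)  # may return fewer bytes than requested
        if not chunk:
            break
        parser.Parse(chunk, False)       # pass only what was actually read
    parser.Parse(b"", True)              # empty final call signals end of input
    return names


# io.BytesIO stands in for a file opened with mode "rb" (binary, not text).
print(parse_stream(io.BytesIO(b"<root><a/><b/></root>"), chunk_size=5))
```

In C the same idea would mean using the return value of fread() as the length argument to XML_Parse() or XML_ParseBuffer().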
If the input were a socket, short reads would certainly be possible (and maybe common), but there wouldn't be any useful fstat() data. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From karl@waclawek.net Wed Sep 18 16:59:15 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed, 18 Sep 2002 11:59:15 -0400 Subject: [Expat-discuss] expat from javascript? References: <6C927E5DB9915040A6EF446A431CA1A6393150@spdc01.softplumbers.com> Message-ID: <00c101c25f2c$56db0830$9e539696@citkwaclaww2k> > Hi all, > does anyone know if expat can be used from javascript (or jscript, for ASPages)? > Appreciate any info... You would likely have to write an ActiveX wrapper for Expat, since that is how you can get external APIs into JavaScript, AFAIK. Since you require Windows anyway, I would suggest the easiest course of action would be to use MSXML4. Karl From Mario.Ruggier@softplumbers.com Wed Sep 18 17:08:56 2002 From: Mario.Ruggier@softplumbers.com (Ruggier, Mario) Date: Wed, 18 Sep 2002 18:08:56 +0200 Subject: [Expat-discuss] expat from javascript? Message-ID: <6C927E5DB9915040A6EF446A431CA1A6393152@spdc01.softplumbers.com> > > Hi all, > > > does anyone know if expat can be used from javascript (or > jscript, for ASPages)? > > Appreciate any info... > > You would likely have to write an ActiveX wrapper for Expat, > since that is how you can get external APIs into JavaScript, AFAIK. > > Since you require Windows anyway, I would suggest the easiest course > of action would be to use MSXML4. Thanks. I wanted to compare performance between the two in fact (for expatsax vs. msxmlsax, and expatsax vs. msxmldom) for use in an existing application done in javascript.
But I do not know if the sax event handling in javascript would anyway kill any of the possible speed gain from expat ;-\ Does anyone know? Cheers, mario > Karl > > From karl@waclawek.net Wed Sep 18 18:12:59 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed, 18 Sep 2002 13:12:59 -0400 Subject: [Expat-discuss] expat from javascript? References: <6C927E5DB9915040A6EF446A431CA1A6393152@spdc01.softplumbers.com> Message-ID: <00f901c25f36$a3e43570$9e539696@citkwaclaww2k> > > Since you require Windows anyway, I would suggest the easiest course > > of action would be to use MSXML4. > Thanks. I wanted to compare performance between the two infact (for expatsax vs. msxmlsax, and expatsax vs. > msxmldom) for use in an existent application done in javascript. But i do not know if the sax event handling > in javascript would anyway kill any of the possible speed gain from expat ;-\ Anyone knows? I would say that once you set handlers and do some actual processing, the CPU cycles to do that will outweigh the CPU cycles spent on parsing by a significant amount. In addition, you have I/O overhead which also is a major part of the equation. Realistically speaking, you will very likely not see any difference at all. In my own tests (using a compiled Delphi program, which added an extra SAX2 layer to Expat), Expat was noticeably faster than MSXML3. I also compared with MSXML4, using Expat through the same program and using MSXML4 through the C style (pointer based) interfaces, not the slower VB style ones that you would use in JavaScript. In that comparison MSXML4 was a tad faster (just a few percentage points), unless a complex DTD was used, in which case Expat was faster again. This was not a professional benchmarking attempt, but it should give you an idea. Curious: What expatsax implementation are you referring to?
Karl From chr020@email.mot.com Wed Sep 18 21:59:22 2002 From: chr020@email.mot.com (RoseRan) Date: Wed, 18 Sep 2002 15:59:22 -0500 Subject: [Expat-discuss] expat starter question Message-ID: <3D88E92A.DCBF1320@email.mot.com> Hello, I downloaded the expat package from ftp://ftp.jclark.com/pub/xml/expat.zip. I built it and it seems successful, now I got the executable. However, I have a problem to run it with my own dtd file and xml file. Could somebody tell me what's the command line arguments to run with own dtd file, and how to do it? I run it in Unix machine, and built it with -DXML_DTD. Really appreciate your help, thanks! Rose From rolf@pointsman.de Wed Sep 18 22:13:06 2002 From: rolf@pointsman.de (rolf@pointsman.de) Date: Wed, 18 Sep 2002 23:13:06 +0200 (MEST) Subject: [Expat-discuss] expat starter question In-Reply-To: <3D88E92A.DCBF1320@email.mot.com> Message-ID: <200209182113.XAA13693@pointsman.pointsman.de> On 18 Sep, RoseRan wrote: > Hello, > > I downloaded the expat package from > ftp://ftp.jclark.com/pub/xml/expat.zip. > I built it and it seems successful, now I got the executable. However, I > have > a problem to run it with my own dtd file and xml file. Could somebody > tell me what's > the command line arguments to run with own dtd file, and how to do it? I > run it in Unix machine, and built it > with -DXML_DTD. With executable you mean the included xmlwf tool? If yes, use it with the option -p. Out of the xmlwf manpage: -p Tells xmlwf to process external DTDs and parameter entities. Your xml document has to point to your dtd in the DOCTYPE declaration. xmlwf resolves only file urls. Please notice, that expat doesn't validate the document, even if it reads the DTD and any other external general or parameter entities. 
It does use the information out of the DTD (and the internal subset, if present) to add attributes with fixed values, if they are omitted by there elements, and does stuff like the additional attribute value normalization, required by XML rec 3.3.3 for attribute value types other than CDATA and such things. rolf From karl@waclawek.net Wed Sep 18 22:18:24 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed, 18 Sep 2002 17:18:24 -0400 Subject: [Expat-discuss] expat starter question References: <3D88E92A.DCBF1320@email.mot.com> Message-ID: <01b001c25f58$ec40ddb0$9e539696@citkwaclaww2k> > > I downloaded the expat package from > ftp://ftp.jclark.com/pub/xml/expat.zip. This version of Expat is two years old. Any reason why you didn't pick one of the newer versions from http://www.libexpat.org? Karl From jc.gervais@videotron.ca Thu Sep 19 16:35:18 2002 From: jc.gervais@videotron.ca (Jean-Claude Gervais) Date: Thu, 19 Sep 2002 11:35:18 -0400 Subject: [Expat-discuss] Using Expat to rewrite an XML file, can it be done? Message-ID: Hi, I'd like to write a front-end that would use Expat to read an XML file and present certain strings contained in the XML to the user for editing. The problem I'm having is that I must then rewrite the XML back out to a file. In order to do that, what would be a good usage of Expat? I think I might need to "see" all the XML being parsed, so that whatever came before the string a user edited could be rewritten to the output. That would mean "seeing" every tag, symbol and value in the XML file, wouldn't it? Can Expat be used to do this, or would I need to develop my own parser? Thanks in advance. From rolf@pointsman.de Thu Sep 19 17:18:43 2002 From: rolf@pointsman.de (rolf@pointsman.de) Date: Thu, 19 Sep 2002 18:18:43 +0200 (MEST) Subject: [Expat-discuss] Using Expat to rewrite an XML file, can it be done? 
In-Reply-To: Message-ID: <200209191618.SAA28807@pointsman.pointsman.de> On 19 Sep, Jean-Claude Gervais wrote: > Hi, > > I'd like to write a front-end that would use Expat to read an XML file and > present certain strings contained in the XML to the user for editing. > > The problem I'm having is that I must then rewrite the XML back out to a > file. > > In order to do that, what would be a good usage of Expat? As you describe your task, this is a thing out of scope for expat alone. expat is 'only' a streaming XML parser. It reads the XML data and provides the data through handlers to the application. It neither stores the data for you, nor does it have the capability to write XML data. Of course you could use expat as the parser for your application. But you need additional components on top of expat. You could write them from scratch, or - if your XML data is not that huge - just use a DOM implementation on top of expat. There are several DOM implementations on top of expat available, with a C API, or for usage from tcl/tk, python, or perl (and probably a lot more scripting languages). > Can Expat be used to do this, or would I need to develop my own parser? You should definitely not try to implement your own XML parser. This would be in almost all cases just a waste of effort. Pick one of the available parsers that fits your needs, and just build the application layer on top of that parser. rolf From Josh.Martin@abq.sc.philips.com Thu Sep 19 19:12:58 2002 From: Josh.Martin@abq.sc.philips.com (Josh Martin) Date: Thu, 19 Sep 2002 12:12:58 -0600 (MDT) Subject: [Expat-discuss] Re: Beginner Problem Message-ID: <200209191812.g8JICwB16489@atoae450.abq.sc.philips.com> > From: "Karl Waclawek" > I am not a Unix guy, but I have seen this a few times > in Expat code I have encountered, causing a problem > almost every time.. > Why is fread actually returning a different value than fstat?
> > Karl In my experience this problem (invalid token error at end) is caused by sending the size of the buffer in bytes to XML_Parse() instead of the length of the string. This causes the terminating NULL in the string/buffer to be parsed by Expat, which always seems to confuse it. Now, would it be possible/desirable to hack the expat release so that it will ignore any single terminating NULL character that it does not understand? Or, if not that, then maybe put an explicit warning in the documentation (near XML_Parse/XML_ParseBuffer) about this behavior? I don't see any problems with the hack approach, as long as the character is only ignored when it causes a problem, so that any intentional nulls aren't stripped/ignored. - Josh Martin From mauro@altersoft.com.ar Fri Sep 20 15:27:09 2002 From: mauro@altersoft.com.ar (Mauro Daniel Ardolino) Date: Fri, 20 Sep 2002 10:27:09 -0400 (ART) Subject: [Expat-discuss] Using Expat to rewrite an XML file, can it be done? Message-ID: About this answer I have a question: I'm starting to write a program in C++ and I want to parse a XML document to get some parameters. Is it recommended to use something else than expat? I mean, reading this answer I think I have to find something to mount on expat, other high-level layer. Is it that what you wanted to say? Any suggestions? Thanks! Mauro ---------- Forwarded message ---------- Date: Thu, 19 Sep 2002 18:18:43 +0200 (MEST) From: rolf@pointsman.de To: expat-discuss@libexpat.org Subject: Re: [Expat-discuss] Using Expat to rewrite an XML file, can it be done? On 19 Sep, Jean-Claude Gervais wrote: > Hi, > > I'd like to write a front-end that would use Expat to read an XML file and > present certain strings contained in the XML to the user for editing. > > The problem I'm having is that I must then rewrite the XML back out to a > file. > > In order to do that, what would be a good usage of Expat? As you describe your task, this is a thing out of scope for expat alone. 
expat is 'only' a streaming XML parser. It reads the XML data and provides the data throu handlers to the application. It neither stores the data for you, nor has it capabilities, to write XML data. Of course you could use expat as parser for your application. But you need additional components on top of expat. You could write them from the scratch, or - if your XML data is not that huge - just use a DOM Implementation on top of expat. There are several DOM implementations on top of expat avaliable, with C API, or for usage from tcl/tk, python, or perl (and probably a lot of scripting languages more). > Can Expat be used to do this, or would I need to develop my own parser? You should definitely not try to implement your own XML parser. This would be in almost all cases just a waste of effort. Pick one of the avaliable parser, that fits your needs, and just build the application layer on top of that parser. rolf _______________________________________________ Expat-discuss mailing list Expat-discuss@libexpat.org http://mail.libexpat.org/mailman-21/listinfo/expat-discuss From karl@waclawek.net Fri Sep 20 14:50:51 2002 From: karl@waclawek.net (Karl Waclawek) Date: Fri, 20 Sep 2002 09:50:51 -0400 Subject: [Expat-discuss] Using Expat to rewrite an XML file, can it be done? References: Message-ID: <007f01c260ac$bb5645f0$9e539696@citkwaclaww2k> > About this answer I have a question: > > I'm starting to write a program in C++ and I want to parse a XML > document to get some parameters. Is it recommended to use something > else than expat? I mean, reading this answer I think I have to > find something to mount on expat, other high-level layer. > Is it that what you wanted to say? Any suggestions? Well, if you want to edit an XML file then a DOM parser looks more like what you need. There are several available, like Xerces and libXML. If you just want to read data from the XML document and use them some other way, than Expat should do just fine for the reading part. 
If you want to adhere to a standardized API, like SAX, then you need a wrapper that exposes Expat functionality (which is already SAX like) in a more strictly SAX way. There is one link on the Expat home page for a SAX wrapper written in C++. I don't know how widely used this specific version of the SAX API is, in the C++ world, but here it is: http://www.jezuk.co.uk/cgi-bin/view/arabica There is another C++ wrapper too, but the link seems dead at the moment: http://www.codeproject.com/soap/expatimpl.asp Hope that helps, Karl From mauro@altersoft.com.ar Fri Sep 20 16:34:43 2002 From: mauro@altersoft.com.ar (Mauro Daniel Ardolino) Date: Fri, 20 Sep 2002 11:34:43 -0400 (ART) Subject: [Expat-discuss] Using Expat to rewrite an XML file, can it be done? In-Reply-To: <007f01c260ac$bb5645f0$9e539696@citkwaclaww2k> Message-ID: Thank you very much! my confussion started when I thought that expat was a SAX parser...as you say is "SAX like". Now I red "what expat not is". I'll try arabica. Thanks again! Mauro On Fri, 20 Sep 2002, Karl Waclawek wrote: > > > About this answer I have a question: > > > > I'm starting to write a program in C++ and I want to parse a XML > > document to get some parameters. Is it recommended to use something > > else than expat? I mean, reading this answer I think I have to > > find something to mount on expat, other high-level layer. > > Is it that what you wanted to say? Any suggestions? > > Well, if you want to edit an XML file then a DOM parser looks > more like what you need. There are several available, like > Xerces and libXML. > > If you just want to read data from the XML document and use them some > other way, than Expat should do just fine for the reading part. > > If you want to adhere to a standardized API, like SAX, then > you need a wrapper that exposes Expat functionality > (which is already SAX like) in a more strictly SAX way. > > There is one link on the Expat home page for a SAX wrapper > written in C++. 
I don't know how widely used this specific > version of the SAX API is, in the C++ world, but here it is: > > http://www.jezuk.co.uk/cgi-bin/view/arabica > > There is another C++ wrapper too, but the link seems dead at the moment: > > http://www.codeproject.com/soap/expatimpl.asp > > > Hope that helps, > > Karl > -- Ing.Mauro Daniel Ardolino Departamento de Desarrollo y Servicios Altersoft Billinghurst 1599 - Piso 9 C1425DTE - Capital Federal Tel/Fax: 4821-3376 / 4822-8759 mailto: mauro@altersoft.com.ar website: http://www.altersoft.com.ar From mauro@altersoft.com.ar Fri Sep 20 21:57:01 2002 From: mauro@altersoft.com.ar (Mauro Daniel Ardolino) Date: Fri, 20 Sep 2002 16:57:01 -0400 (ART) Subject: [Expat-discuss] tag-path in expat Message-ID: Hi all! I have a question. I want to know the tag-path of an element while it is parsed. e.g.: hi hi again I want to know if the parser arrived to a/b/c or to a/r/c.... How can I do that? Thanks! Mauro -- Ing.Mauro Daniel Ardolino Departamento de Desarrollo y Servicios Altersoft Billinghurst 1599 - Piso 9 C1425DTE - Capital Federal Tel/Fax: 4821-3376 / 4822-8759 mailto: mauro@altersoft.com.ar website: http://www.altersoft.com.ar From fdrake@acm.org Fri Sep 20 21:00:48 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 20 Sep 2002 16:00:48 -0400 Subject: [Expat-discuss] tag-path in expat In-Reply-To: References: Message-ID: <15755.32368.195245.68682@grendel.zope.com> Mauro Daniel Ardolino writes: > Hi all! > I have a question. I want to know the tag-path of an element while it is > parsed. ... > I want to know if the parser arrived to a/b/c or to a/r/c.... > How can I do that? Keep a stack of the open elements in the start-element handler, and pop the stack in the end-element handler. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Zope Corporation From mauro@altersoft.com.ar Fri Sep 20 22:04:35 2002 From: mauro@altersoft.com.ar (Mauro Daniel Ardolino) Date: Fri, 20 Sep 2002 17:04:35 -0400 (ART) Subject: [Expat-discuss] tag-path in expat In-Reply-To: <15755.32368.195245.68682@grendel.zope.com> Message-ID: Oh! you are right! I'm sleepy I think... Sorry Fred for posting so ridiculous question. Thanks! Mauro On Fri, 20 Sep 2002, Fred L. Drake, Jr. wrote: > > Mauro Daniel Ardolino writes: > > Hi all! > > I have a question. I want to know the tag-path of an element while it is > > parsed. > ... > > I want to know if the parser arrived to a/b/c or to a/r/c.... > > How can I do that? > > Keep a stack of the open elements in the start-element handler, and > pop the stack in the end-element handler. > > > -Fred > > -- Ing.Mauro Daniel Ardolino Departamento de Desarrollo y Servicios Altersoft Billinghurst 1599 - Piso 9 C1425DTE - Capital Federal Tel/Fax: 4821-3376 / 4822-8759 mailto: mauro@altersoft.com.ar website: http://www.altersoft.com.ar From deschan3@attbi.com Fri Sep 20 23:49:17 2002 From: deschan3@attbi.com (Desmond Chan) Date: Fri, 20 Sep 2002 15:49:17 -0700 Subject: [Expat-discuss] reset start/end element handler Message-ID: <01C260BD.47D322E0.deschan3@attbi.com> Hi all, I am wondering if there is any rule preventing me from resetting start/end element handlers after the expat parser has started parsing. Basically I need to let user of my API's to set different callback functions according to different tags. For example I need to let users plug in different handlers once I detect there is a open tag. Likewise for . Is this okay to do? Thanks, Desmond From fdrake@acm.org Fri Sep 20 23:55:58 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Fri, 20 Sep 2002 18:55:58 -0400 Subject: [Expat-discuss] reset start/end element handler In-Reply-To: <01C260BD.47D322E0.deschan3@attbi.com> References: <01C260BD.47D322E0.deschan3@attbi.com> Message-ID: <15755.42878.3283.417932@grendel.zope.com> Desmond Chan writes: > I need to let users plug in different handlers once I detect there is a > open tag. Likewise for . Is this okay to do? Yes, it certainly is. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From mauro@altersoft.com.ar Mon Sep 23 14:45:09 2002 From: mauro@altersoft.com.ar (Mauro Daniel Ardolino) Date: Mon, 23 Sep 2002 09:45:09 -0400 (ART) Subject: [Expat-discuss] CharacterDataHandler question. Message-ID: Hello! I read the documentation about the CharacterDataHandler and I think this event is called when there's data inside tags. I'm using the C++ Wrapper written by Tim Smith. The fact is that I'm testing it and I discovered that this handler is called even if there's no data inside tags. e.g.: It is called twice and the data is first a CR and then a white space. On the other hand I know that this handler can be called more than once for the same tag, bringing pieces of information (why is that?). What I cannot solve is distinguishing between a valid white space and this strange spurious white space. e.g. THIS HAS THREE SPACES My first idea: On start element: init a variable. On character: append the data to the variable. On end element: take off the CR and the last white space. Do I have to do all that work? If so I think I can write a subclass from Tim's one to do it only once. Am I wrong? Thanks a lot.
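The accumulate-then-use idea above could look roughly like the sketch below. It is written against Python's standard-library Expat binding (xml.parsers.expat), not the C++ wrapper mentioned in the message, and it keeps one buffer per open element instead of a single variable so that nested elements do not overwrite each other. The element names in the sample document are assumptions made for the example.

```python
import xml.parsers.expat


def element_texts(document):
    """Return (name, text) per element, with text pieces joined at the end tag."""
    results = []
    buffers = []  # one list of text pieces per currently open element

    def start_element(name, attrs):
        buffers.append([])

    def char_data(data):
        # One run of text may arrive in several calls, and the line breaks
        # and indentation between tags are reported here as well.
        if buffers:
            buffers[-1].append(data)

    def end_element(name):
        results.append((name, "".join(buffers.pop())))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    parser.Parse(document, True)
    return results


# Hypothetical document: the spaces inside <b> are kept as they are, while
# <a> only collects the line breaks and indentation around <b>.
doc = "<a>\n <b>THIS  HAS  SPACES</b>\n</a>"
for name, text in element_texts(doc):
    print(name, repr(text))
```

Whether whitespace-only text should then be discarded is an application decision; the parser cannot know which whitespace is meaningful.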
-- Mauro -- Ing.Mauro Daniel Ardolino Departamento de Desarrollo y Servicios Altersoft Billinghurst 1599 - Piso 9 C1425DTE - Capital Federal Tel/Fax: 4821-3376 / 4822-8759 mailto: mauro@altersoft.com.ar website: http://www.altersoft.com.ar From karl@waclawek.net Mon Sep 23 14:25:26 2002 From: karl@waclawek.net (Karl Waclawek) Date: Mon, 23 Sep 2002 09:25:26 -0400 Subject: [Expat-discuss] CharacterDataHandler question.
References: Message-ID: <000e01c26304$ae2b5690$9e539696@citkwaclaww2k> > I red the documentation about the CharacterDataHandler and I think > this event is called when there's data inside tags. I'm using the C++ > Wrapper written by Tim Smith. > > The fact is that I'm testing it and I discovered that this handler is > called even if there's no data inside tags. > e.g.: > If is on a line by itself, then Expat will report the line breaks before and after! > It is called twice and the data is first a CR and then a white space. > On the other hand I know that this handler can be called more than once > for the same tag bringing pieces of information (why is that?). What if the character data is "interrupted" with child elements? > What I cannot solve is differencing between a valid white space and this > strange spureous white space. > > e.g. > THIS HAS THREE SPACES > > My first idea: > On start element: init a variable. > On character: save adding on the variable. > On end element: take off the CR and the last white space. > > Do I have to do all that work? > If so I think I can write a subclass from Tim's one to do it only once. If you are sure that the extra whitespace is reported *between* the start and end tags, then I would suggest you try out Expat directly. If the behaviour is still there, then please file a bug report and attach a small example to reproduce the behaviour. You can also try the other C++ wrapper, which is SAX2 compliant: http://www.jezuk.co.uk/cgi-bin/view/arabica Karl From mauro@altersoft.com.ar Mon Sep 23 15:32:25 2002 From: mauro@altersoft.com.ar (Mauro Daniel Ardolino) Date: Mon, 23 Sep 2002 10:32:25 -0400 (ART) Subject: [Expat-discuss] CharacterDataHandler question. In-Reply-To: <000e01c26304$ae2b5690$9e539696@citkwaclaww2k> Message-ID: You are right. I thought they were ignored (the things outside tags). So I have to test start-element and end-element to see if the data is inside tags or not. Thanks a lot!!
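One way to tell where a piece of character data sits is to keep a stack of open element names, pushed in the start-element handler and popped in the end-element handler. The sketch below shows that with Python's standard-library Expat binding (xml.parsers.expat); the sample document is an assumption reconstructed from the a/b/c and a/r/c paths mentioned earlier on the list, and dropping whitespace-only data is a choice made for the example, not something Expat does.

```python
import xml.parsers.expat


def text_by_path(document):
    """Return (path, text) pairs for character data that is not just whitespace."""
    stack = []  # names of the currently open elements, outermost first
    found = []

    def start_element(name, attrs):
        stack.append(name)

    def end_element(name):
        stack.pop()

    def char_data(data):
        # Line breaks and indentation between tags also arrive here;
        # this example simply ignores whitespace-only pieces.
        if data.strip():
            found.append(("/".join(stack), data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(document, True)
    return found


# Assumed document shape, for illustration only.
doc = "<a>\n <b><c>hi</c></b>\n <r><c>hi again</c></r>\n</a>"
for path, text in text_by_path(doc):
    print(path, repr(text))
```

Since one run of text can be delivered in several calls, a real application would normally join the pieces collected for the same element before using them.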
About this: > > It is called twice and the data is first a CR and then a white space. > > On the other hand I know that this handler can be called more than once > > for the same tag bringing pieces of information (why is that?). > > What if the character data is "interrupted" with child elements? Do you mean something like this? (Is this well-formed?) this is datamore data Thanks! -- Mauro On Mon, 23 Sep 2002, Karl Waclawek wrote: > > I red the documentation about the CharacterDataHandler and I think > > this event is called when there's data inside tags. I'm using the C++ > > Wrapper written by Tim Smith. > > > > The fact is that I'm testing it and I discovered that this handler is > > called even if there's no data inside tags. > > e.g.: > > > > If is on a line of itself, then Expat will report > the line breaks before and after! > > > It is called twice and the data is first a CR and then a white space. > > On the other hand I know that this handler can be called more than once > > for the same tag bringing pieces of information (why is that?). > > What if the character data is "interrupted" with child elements? > > > What I cannot solve is differencing between a valid white space and this > > strange spureous white space. > > > > e.g. > > THIS HAS THREE SPACES > > > > My first idea: > > On start element: init a variable. > > On character: save adding on the variable. > > On end element: take off the CR and the last white space. > > > > Do I have to do all that work? > > If so I think I can write a subclass from Tim's one to do it only once. > > If you are sure that the extra whitespace is reported *between* > the start and end tags, then I would suggest you try out Expat directly. > If the behavious is still there, then please file a bug report and > attach a small example to reproduce the behaviour. 
> > You can also try the other C++ wrapper, which is SAX2 compliant: > http://www.jezuk.co.uk/cgi-bin/view/arabica > > Karl > > -- Ing.Mauro Daniel Ardolino Departamento de Desarrollo y Servicios Altersoft Billinghurst 1599 - Piso 9 C1425DTE - Capital Federal Tel/Fax: 4821-3376 / 4822-8759 mailto: mauro@altersoft.com.ar website: http://www.altersoft.com.ar From karl@waclawek.net Mon Sep 23 15:06:27 2002 From: karl@waclawek.net (Karl Waclawek) Date: Mon, 23 Sep 2002 10:06:27 -0400 Subject: [Expat-discuss] CharacterDataHandler question. References: Message-ID: <005901c2630a$689bf7f0$9e539696@citkwaclaww2k> > About this: > > > It is called twice and the data is first a CR and then a white space. > > > On the other hand I know that this handler can be called more than once > > > for the same tag bringing pieces of information (why is that?). > > > > What if the character data is "interrupted" with child elements? > Do you mean something like this? (Is this well-formed?) > > this is datamore data Yes, this is well-formed. Karl From mauro@altersoft.com.ar Tue Sep 24 14:46:55 2002 From: mauro@altersoft.com.ar (Mauro Daniel Ardolino) Date: Tue, 24 Sep 2002 09:46:55 -0400 (ART) Subject: [Expat-discuss] question Message-ID: Hi all! I have a question: Can expat be used to check validity against a DTD? I can't understand the funtionality of XML_UseForeignDTD. Thanks. 
Mauro -- Ing.Mauro Daniel Ardolino Departamento de Desarrollo y Servicios Altersoft Billinghurst 1599 - Piso 9 C1425DTE - Capital Federal Tel/Fax: 4821-3376 / 4822-8759 mailto: mauro@altersoft.com.ar website: http://www.altersoft.com.ar From awakankar@hss.hns.com Tue Sep 24 14:01:12 2002 From: awakankar@hss.hns.com (awakankar@hss.hns.com) Date: Tue, 24 Sep 2002 18:31:12 +0530 Subject: [Expat-discuss] Refering to XML Schema and Expat Message-ID: <65256C3E.00473EBE.00@sandesh.hss.hns.com> Have some doubts, could anyone please clarify them 1) I have an XML based on XML schema (and not DTD), does this in any way affect the working of expat? I guess expat should be able to successfully parse my XML. 2) What does the compile time macro XML_DTD mean? How does an inclusion of external DTD help? And why is it required? 3) Being a beginner, I know that parsers are based either on SAX or DOM. From the material I have read about Expat, I understand it works in the same way SAX would work. But this seems to conflict with the NON-GOALS mentioned here http://expat.sourceforge.net/dev/roadmap.html Thanks in advance for your help Regards
From karl@waclawek.net Tue Sep 24 14:15:57 2002 From: karl@waclawek.net (Karl Waclawek) Date: Tue, 24 Sep 2002 09:15:57 -0400 Subject: [Expat-discuss] question References: Message-ID: <005101c263cc$86331c90$9e539696@citkwaclaww2k> > Hi all! > I have a question: Can expat be used to check validity against a DTD? > I can't understand the funtionality of XML_UseForeignDTD. It allows you to supply a DTD if the document does not contain a reference to an external subset (DTD). This is equivalent to the functionality in the SAX interface specification of EntityResolver2.getExternalSubset(). Check out: http://saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html Karl From karl@waclawek.net Tue Sep 24 14:22:25 2002 From: karl@waclawek.net (Karl Waclawek) Date: Tue, 24 Sep 2002 09:22:25 -0400 Subject: [Expat-discuss] Refering to XML Schema and Expat References: <65256C3E.00473EBE.00@sandesh.hss.hns.com> Message-ID: <005f01c263cd$6c251410$9e539696@citkwaclaww2k> > Have some doubts, could anyone please clarify them > 1) I have an XML based on XML schema (and not DTD), does this in any way affect > the working of expat? I guess expat should be able to succesfully parse my XML. Yes, it should not have any influence. > 2) What does the compile time macro XML_DTD mean? It just means that you get full DTD reporting capability. If you don't need that you can compile without it and get a smaller (and maybe a little faster) library. > How does an inclusion of > external DTD help? And why is it required? I don't understand this question. Where did you read about a required inclusion of an external DTD? > 3) Being a beginner, I know that parsers are based either on SAX or DOM. What I > can gather from the material I read from Expat, I understand it works in the > same way SAX would work. Bu this seems conflicting with the NON-GOALS mentioned > here http://expat.sourceforge.net/dev/roadmap.html SAX is a detailed API specification. 
It is not Expat's goal to conform to this specification exactly,
but conceptually Expat works just like a SAX parser, and it is possible
to write wrappers (C++, Delphi, etc.) that expose Expat's functionality
through a SAX API.

Karl

From awakankar@hss.hns.com Tue Sep 24 15:08:18 2002
From: awakankar@hss.hns.com (awakankar@hss.hns.com)
Date: Tue, 24 Sep 2002 19:38:18 +0530
Subject: [Expat-discuss] Refering to XML Schema and Expat
Message-ID: <65256C3E.004D63B8.00@sandesh.hss.hns.com>

> Have some doubts, could anyone please clarify them
> 1) I have an XML based on XML schema (and not DTD), does this in any way affect
> the working of expat? I guess expat should be able to succesfully parse my XML.

Yes, it should not have any influence.

---
In one of the threads above (with subject pre-beginer question), it was
mentioned that Expat does not support XML schema. What does this mean? It
seems misleading.
---

> How does an inclusion of
> external DTD help? And why is it required?

I don't understand this question.
Where did you read about a required inclusion of an external DTD?

---
Sorry about this, I was confused. What you explained earlier clarifies my
doubt.
---

Just one more general question.
--One reason Expat is so popular is that it is light-weight and very
efficient. But it does not validate, and some people would like to validate
an XML document first and then parse it. Are there any efficient validators by
the same author? If not, are there any other efficient validators that can be
used in conjunction with expat?
--Are there SAX or DOM wrappers already available for Expat? Could you give me
some pointers to them?
Many thanks From karl@waclawek.net Tue Sep 24 15:23:51 2002 From: karl@waclawek.net (Karl Waclawek) Date: Tue, 24 Sep 2002 10:23:51 -0400 Subject: [Expat-discuss] Refering to XML Schema and Expat References: <65256C3E.004D63B8.00@sandesh.hss.hns.com> Message-ID: <007501c263d6$012183c0$9e539696@citkwaclaww2k> > In one of the threads above (with subject pre-beginer question), it was > mentioned that Expat does not support XML schema. What does this mean? Its > seems misleading. Well, an XML Schema is written in XML, so it can be parsed by any XML parser, but Expat does not use the information in the schema to validate the document. > Just one more general question. > --One reason of expat being so popular is that it s light-weight and very > efficient. But since it does not validate, for peope who would like to validate > an XML document first and then parse it. Are there some efficient validators by > the same author? If not, are there any other efficient validators to be used in > conjunction with expat? Good question. I know of at least one validation layer on top of Expat, written for a TCl wrapper. There are probably others but nobody has ever contributed one back to the project. Btw, such a layer would validate as you parse, not as an extra step. Could anyone on this list jump in here? We would really like to know what is out there. According to SourceForge statistics we have had 150,000 downloads in the last 5 months, so there is a chance that somebody did work in that area. > --Are there already available SAX or DOM wrappers on Expat? Could you give me > some pointers to them. Just browse through our web site: http://www.libexpat.org. Right on the first page ... Karl From jc.gervais@videotron.ca Tue Sep 24 17:49:18 2002 From: jc.gervais@videotron.ca (Jean-Claude Gervais) Date: Tue, 24 Sep 2002 12:49:18 -0400 Subject: [Expat-discuss] Bug in my code or bug in Expat? 
Message-ID:

Hi,

I'm trying to write code to read an XML file, but my application keeps
crashing and I can't figure out what the problem is.

During the StartElementHandler function, I am simply trying to trace what
the names and values of the elements are and the program crashes.

Could you PLEASE look at the OnStartElementHandler function, and tell me if
I'm doing something wrong?

PS - The code DOES work, most of the time and prints out a series of
informations, it chokes if the file is large, but I can't see why it would
make any difference.

------------------------------------
#include

void OnStartElementHandler( void * userData, const XML_Char * name, const XML_Char ** atts )
{
    for ( unsigned uDepth = 0;; uDepth++ )
    {
        if ( NULL == *atts )
        {
            break;
        }

        const XML_Char * pszAttribute = *atts;
        atts++; /* Increment to next item, the VALUE. */
        const XML_Char * pszValue = *atts;

        TCHAR szLogMessage[1024];
        sprintf( szLogMessage, "%s=\"%s\"", pszAttribute, pszValue );
        wxLogDebug( szLogMessage );
    }
}

void OnCharacterDataHandler( void * userData, const XML_Char * s, int len )
{
    LPTSTR lpszLogMessage;
    lpszLogMessage = (LPTSTR)calloc( sizeof( TCHAR ), len + 1 );
    if ( NULL == lpszLogMessage )
    {
        assert( FALSE );
        return;
    }

    strncpy( lpszLogMessage, s, len );
    wxLogDebug( lpszLogMessage );
    free( lpszLogMessage );
}

void OnEndElementHandler( void * userData, const XML_Char * name )
{
    wxLogDebug( name );
}

class XML_PARSER_INFO
{
public:
    XML_Parser m_parser;
};
typedef XML_PARSER_INFO * PXML_PARSER_INFO;
typedef XML_PARSER_INFO FAR * LPXML_PARSER_INFO;

#define READ_XML_BUFFER_SIZE ( 1024 * 4 ) /* 4K of parsing fury! */

int test( LPCTSTR lpszXMLFile )
{
    XML_PARSER_INFO Info;

    Info.m_parser = XML_ParserCreate( "ISO-8859-1" );
    XML_SetUserData( Info.m_parser, &Info );
    XML_SetElementHandler( Info.m_parser, OnStartElementHandler, OnEndElementHandler );
    XML_SetCharacterDataHandler( Info.m_parser, OnCharacterDataHandler );

    FILE * pFile = fopen( lpszXMLFile, "rb" );
    if ( NULL == pFile )
    {
        return -1;
    }

    int iResult = 0;
    int iTotalCount = 0;
    size_t uRead = 0;
    TCHAR cBuffer[READ_XML_BUFFER_SIZE];

    while( ! feof( pFile ) )
    {
        uRead = fread( cBuffer, 1, READ_XML_BUFFER_SIZE, pFile );
        if ( 0 >= uRead )
        {
            assert( FALSE );
            break;
        }

        iTotalCount += uRead;

        TCHAR szMessage[128];
        sprintf( szMessage, "At offset %d", iTotalCount );
        wxLogDebug( szMessage );

        iResult = XML_Parse( Info.m_parser, cBuffer, uRead, 0 != feof( pFile ) );
        // assert( iResult == 0 );
    }

    fclose( pFile );
    XML_ParserFree( Info.m_parser );

    return iResult;
}

From rolf@pointsman.de Tue Sep 24 17:58:43 2002
From: rolf@pointsman.de (rolf@pointsman.de)
Date: Tue, 24 Sep 2002 18:58:43 +0200 (MEST)
Subject: [Expat-discuss] Refering to XML Schema and Expat
In-Reply-To: <007501c263d6$012183c0$9e539696@citkwaclaww2k>
Message-ID: <200209241658.SAA28358@pointsman.pointsman.de>

On 24 Sep, Karl Waclawek wrote:
>> Just one more general question.
>> --One reason of expat being so popular is that it s light-weight and very
>> efficient. But since it does not validate, for peope who would like to validate
>> an XML document first and then parse it. Are there some efficient validators by
>> the same author? If not, are there any other efficient validators to be used in
>> conjunction with expat?
>
> Good question.
> I know of at least one validation layer on top of Expat, written
> for a TCl wrapper. There are probably others but nobody has ever
> contributed one back to the project.
> Btw, such a layer would validate as you parse, not as an extra step.
>
> Could anyone on this list jump in here?
> We would really like to know what is out there.

I'm the one who wrote the validation layer on top of expat, for an expat
Tcl wrapper, that Karl mentioned above. If someone is interested, the best
approach is to grab the sources from this project's CVS. To get them, use

cvs -d:pserver:anonymous@www.archiware.com:/usr/local/pubcvs co tdom

The validation code is in the subdirectory extensions/tnc. But don't
rush. This code needs the framework of the whole project to work. On
the other hand, it should be possible to break the validation code
out of the framework. The core of that code is simply a few expat
handler functions. It's not perfect, though; the code has a few
limitations.

But besides the deficiencies of my code, I should mention that I
think it's not possible to write a 100 percent compliant DTD
validator on top of the current expat code. There are two major
problems that I see. First, it is not possible at the moment to check
for the validity constraint Proper Declaration/PE Nesting. Second, it
is not possible to validate XML documents that have standalone="yes"
and an external subset (the problem is that there is no reliable way
to know whether expat has done additional normalization on attributes
of types other than CDATA; see XML rec 3.3.3). If somebody is
interested, I can elaborate on these notes a bit more.

> According to SourceForge statistics we have had 150,000 downloads
> in the last 5 months, so there is a chance that somebody did work
> in that area.

Yes! Please, raise your head.

rolf

From fdrake@acm.org Tue Sep 24 17:59:26 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 24 Sep 2002 12:59:26 -0400
Subject: [Expat-discuss] Bug in my code or bug in Expat?
In-Reply-To:
References:
Message-ID: <15760.39406.451302.21815@grendel.zope.com>

Jean-Claude Gervais writes:
> Could you PLEASE look at the OnStartElementHandler function,
> and tell me if I'm doing something wrong?

I've only got time for a very brief look; there may be more to find.

    void OnStartElementHandler(void *userData, const XML_Char *name, const XML_Char ** atts )
    {
        for ( unsigned uDepth = 0;; uDepth++ ) {
            if (NULL == *atts) {
                break;
            }
            const XML_Char * pszAttribute = *atts;
            atts++; /* Increment to next item, the VALUE. */
            const XML_Char * pszValue = *atts;
            ...

You're pulling two pointers from the array, but only incrementing atts
once. That means you're getting a "false" attribute value where the
(perceived) name is the value of the previous attribute, and whose
value is the name of the next attribute. At the end, you're using the
value of the last attribute and NULL as the value, which is never a
legal use of the values.

Solution: After setting pszValue, increment atts again.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From karl@waclawek.net Tue Sep 24 18:03:34 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Tue, 24 Sep 2002 13:03:34 -0400
Subject: [Expat-discuss] Refering to XML Schema and Expat
References: <200209241658.SAA28358@pointsman.pointsman.de>
Message-ID: <003101c263ec$517e0440$9e539696@citkwaclaww2k>

> But beside the deficiencies of my code, I should mention, that I
> think, it's not possible, to write a 100 percent compliant DTD
> validator on top of the current expat code. There are two major
> problems, I see: It is not possible at the moment, to check for the
> Validity constraint: Proper Declaration/PE Nesting, and second it is
> not possible, to validate XML documents, that have standalone="yes"
> and an external subset (the problem is, that there is no reliable way
> to know, if expat has done additional normalization on attribute
> types other than CDATA (see XML rec 3.3.3). If somebody is interested,
> I may elaborate this notes a bit more.

Please do - on both points!
Karl From rolf@pointsman.de Wed Sep 25 02:29:42 2002 From: rolf@pointsman.de (rolf@pointsman.de) Date: Wed, 25 Sep 2002 03:29:42 +0200 (MEST) Subject: [Expat-discuss] Refering to XML Schema and Expat In-Reply-To: <003101c263ec$517e0440$9e539696@citkwaclaww2k> Message-ID: <200209250129.DAA30000@pointsman.pointsman.de> On 24 Sep, Karl Waclawek wrote: > >> But beside the deficiencies of my code, I should mention, that I >> think, it's not possible, to write a 100 percent compliant DTD >> validator on top of the current expat code. There are two major >> problems, I see: It is not possible at the moment, to check for the >> Validity constraint: Proper Declaration/PE Nesting, and second it is >> not possible, to validate XML documents, that have standalone="yes" >> and an external subset (the problem is, that there is no reliable way >> to know, if expat has done additional normalization on attribute >> types other than CDATA (see XML rec 3.3.3). If somebody is interested, >> I may elaborate this notes a bit more. > > Please do - on both points! I should have known it.. ;-) Ok, since it's you, that asks. First the validity constraints Proper Declaration/PE Nesting and Proper Group/PE Nesting. I've missed the second in my note, but both are similar problems. Take a look at this example document with this external subset e1.dtd "> with e2.dtd Expat accepts this, which is of course OK. But this is not valid. (The OASIS xml test suite includes a few tests for both constraints.) It does not help much, to analyse the replacement text of a parameter entity inside the XML_EntityDeclHandler, to find such 'ill-formed' parameter entities (beside, that this would be far from plain simple to do). This is, because this parameter entities are not really 'ill-formed' - they follow the production rules for a parameter entity and are therefor 'legal' inside the DTD. It is only a validation error, if they are used, as in the examples above. 
For example, if e1.dtd in the first example is "> then the document is valid. So, even if I would analyze the replacement text in some clever way in my validation layer, this would not help, because there is currently no way, to get noticed about when a parameter entity is used (and in which markup context). Second the problem with standalone documents. If a document has a Standalone Document Declaration, this does not necessarily mean, it doesn't have any external entities. It means in the words of the recommendation: "In a standalone document declaration, the value "yes" indicates that there are no external markup declarations which affect the information passed from the XML processor to the application." External markup declarations could affect the information passed from the XML processor to the applications throu, for example, attribute defaults (if the defaulted attribute is omitted in the document) or entity declarations. Most of this can be handled in some way. But I think its not possible, to detect a special problem with attribute value normalization in a reliable way. Please take a look at this example (it follows the example in the XML recommendation, section 3.3.3) ]> with e3.dtd Since a validating parser must always read all external entities, expat knows, that the attribute a is of type NMTOKENS. If expat knows the type of an attribute, it does the additional attribute value normalization, described in 3.3.3. Therefor, the element start handler will see "A B" as Value of the attribute a. The problem is, that the information about the attribute type in the external entity has affected the information. If a would have the type CDATA, the attribute value would have been " A B ". Therefor, the standalone="yes" claim of this document is false, the document is not valid. If the documents reads like this: ]> (with the same e3.dtd) the document would be valid. 
And there is not way to know, that expat has done additional normalization according to the attribute type. Therefor, the both cases are indistinguishable from expat handler level. Well, has somebody really followed this explanations? (Well, my english is to clumsy, sorry.) On the other hand, this both problems are the only 'unsolvable' ones (without changing the expat sources, of course), that I'm aware of, which prevents one from writing full DTD validation on top of expat. rolf From karl@waclawek.net Wed Sep 25 04:12:03 2002 From: karl@waclawek.net (Karl Waclawek) Date: Tue, 24 Sep 2002 23:12:03 -0400 Subject: [Expat-discuss] Refering to XML Schema and Expat References: <200209250129.DAA30000@pointsman.pointsman.de> Message-ID: <003301c26441$52e81af0$0207a8c0@karl> > On 24 Sep, Karl Waclawek wrote: > > > >> But beside the deficiencies of my code, I should mention, that I > >> think, it's not possible, to write a 100 percent compliant DTD > >> validator on top of the current expat code. There are two major > >> problems, I see: It is not possible at the moment, to check for the > >> Validity constraint: Proper Declaration/PE Nesting, and second it is > >> not possible, to validate XML documents, that have standalone="yes" > >> and an external subset (the problem is, that there is no reliable way > >> to know, if expat has done additional normalization on attribute > >> types other than CDATA (see XML rec 3.3.3). If somebody is interested, > >> I may elaborate this notes a bit more. > > > > Please do - on both points! > > I should have known it.. ;-) Ok, since it's you, that asks. Before commenting below: Excellent read, thanks! > First the validity constraints Proper Declaration/PE Nesting and > Proper Group/PE Nesting. I've missed the second in my note, but both > are similar problems. > > Take a look at this example document > > > > > with this external subset e1.dtd > > "> > > Expat is perfectly happy with this. 
That's OK, since expat is a > well-formedness parser and don't have to care about validity > constraints. But this example doesn't fullfill the validity constraint > Proper Declaration/PE Nesting. If Expat reported entity boundaries for internal entity references, would you then be able to detect this error? > The same is true for this example of not fullfilled validity > constraint Proper Group/PE Nesting: > > > > > with e2.dtd > > > > > > > > Expat accepts this, which is of course OK. But this is not valid. (The > OASIS xml test suite includes a few tests for both constraints.) > > It does not help much, to analyse the replacement text of a parameter > entity inside the XML_EntityDeclHandler, to find such 'ill-formed' > parameter entities (beside, that this would be far from plain simple > to do). This is, because this parameter entities are not really > 'ill-formed' - they follow the production rules for a parameter entity > and are therefor 'legal' inside the DTD. It is only a validation > error, if they are used, as in the examples above. For example, if > e1.dtd in the first example is > > "> > > > then the document is valid. So, even if I would analyze the > replacement text in some clever way in my validation layer, this would > not help, because there is currently no way, to get noticed about when > a parameter entity is used (and in which markup context). OK, so there is where the InternalEntityRefHandler comes in, as a solution for both cases. As already discussed, this is on our roadmap, but only once the new API is in place which allows reporting of internal entity boundaries (PE or GE). Do you agree that we have a solution in our sights? At least, mid-term this might be implemented - all depending on time available. > Second the problem with standalone documents. If a document has a > Standalone Document Declaration, this does not necessarily mean, it > doesn't have any external entities. 
It means in the words of the > recommendation: > > "In a standalone document declaration, the value "yes" indicates > that there are no external markup declarations which affect the > information passed from the XML processor to the application." > > External markup declarations could affect the information passed from > the XML processor to the applications throu, for example, attribute > defaults (if the defaulted attribute is omitted in the document) or > entity declarations. Most of this can be handled in some way. But I > think its not possible, to detect a special problem with attribute > value normalization in a reliable way. > > Please take a look at this example (it follows the example in the XML > recommendation, section 3.3.3) > > > > > > ]> > > > with e3.dtd > > > a NMTOKENS #IMPLIED> > > Since a validating parser must always read all external entities, > expat knows, that the attribute a is of type NMTOKENS. If expat knows > the type of an attribute, it does the additional attribute value > normalization, described in 3.3.3. Therefor, the element start handler > will see "A B" as Value of the attribute a. The problem is, that the > information about the attribute type in the external entity has > affected the information. If a would have the type CDATA, the > attribute value would have been " A B ". Therefor, the > standalone="yes" claim of this document is false, the document is not > valid. > > If the documents reads like this: > > > > > > ]> > > > (with the same e3.dtd) the document would be valid. > > And there is not way to know, that expat has done additional > normalization according to the attribute type. Therefor, the both > cases are indistinguishable from expat handler level. Looking at the spec: The standalone document declaration must have the value "no" if any external markup declarations contain declarations of: a.. 
attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or b.. entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or c.. attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or d.. element types with element content, if white space occurs directly within any instance of those types. It seems none of these applies here. c) looks the closest, but there is no attribute value declared in the external subset. Is there another constraint that applies? > Well, has somebody really followed this explanations? (Well, my > english is to clumsy, sorry.) On the other hand, this both problems > are the only 'unsolvable' ones (without changing the expat sources, of > course), that I'm aware of, which prevents one from writing full DTD > validation on top of expat. If we could eliminate the second problem - depending on your reply - than we might have at least a solution already in the planning stages. Karl From karl@waclawek.net Wed Sep 25 04:44:44 2002 From: karl@waclawek.net (Karl Waclawek) Date: Tue, 24 Sep 2002 23:44:44 -0400 Subject: [Expat-discuss] Refering to XML Schema and Expat Message-ID: <004301c26445$e37ac4b0$0207a8c0@karl> > Looking at the spec: > > > The standalone document declaration must have the value "no" if any external > markup declarations contain declarations of: > a.. attributes with default values, if elements to which these attributes > apply appear in the document without specifications of values for these > attributes, or > b.. entities (other than amp, lt, gt, apos, quot), if references to those > entities appear in the document, or > c.. attributes with values subject to normalization, where the attribute > appears in the document with a value which will change as a result of > normalization, or > d.. 
element types with element content, if white space occurs directly > within any instance of those types. > > > It seems none of these applies here. c) looks the closest, but there > is no attribute value declared in the external subset. > Is there another constraint that applies? I checked the errata for this VC, and it contains a new version of point c) which reads: c.. attributes with tokenized types, where the attribute appears in the document with a value such that normalization will produce a different value from that which would be produced in the absence of the declaration, Now, that changes things. I don't see a way to solve this without having Expat do some validation specific work - maybe a flag indicating if any normalization beyond CDATA requirements was performed. Karl From rolf@pointsman.de Wed Sep 25 22:52:24 2002 From: rolf@pointsman.de (rolf@pointsman.de) Date: Wed, 25 Sep 2002 23:52:24 +0200 (MEST) Subject: [Expat-discuss] Refering to XML Schema and Expat In-Reply-To: <003301c26441$52e81af0$0207a8c0@karl> Message-ID: <200209252152.XAA31986@pointsman.pointsman.de> On 24 Sep, Karl Waclawek wrote: >> First the validity constraints Proper Declaration/PE Nesting and >> Proper Group/PE Nesting. >> [...] >> So, even if I would analyze the >> replacement text in some clever way in my validation layer, this would >> not help, because there is currently no way, to get noticed about when >> a parameter entity is used (and in which markup context). > > OK, so there is where the InternalEntityRefHandler comes in, > as a solution for both cases. > As already discussed, this is on our roadmap, but only once > the new API is in place which allows reporting of internal > entity boundaries (PE or GE). Do you agree that we have > a solution in our sights? Yes, as far as I see, this would help. >> Second the problem with standalone documents. >> [...] >> And there is not way to know, that expat has done additional >> normalization according to the attribute type. 
Therefor, the both >> cases are indistinguishable from expat handler level. Out of your other reply to my mail: > I don't see a way to solve this without > having Expat do some validation specific work - maybe a flag indicating > if any normalization beyond CDATA requirements was performed. Again, yes. Well, I wouldn't call this 'some validation specific work'. The attribute value normalization according to the attribute type is already done by expat (that's fine). It's just an additional little piece of information (more), provided by the parser. Please notice, that it's not enough to know, that expat had done additional normalization on one or some of the attribute values of an element, it's necessary to know on with attribute values this was done. rolf From karl@waclawek.net Thu Sep 26 01:05:45 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed, 25 Sep 2002 20:05:45 -0400 Subject: [Expat-discuss] Refering to XML Schema and Expat References: <200209252152.XAA31986@pointsman.pointsman.de> Message-ID: <000701c264f0$76b35cb0$0207a8c0@karl> > >> Second the problem with standalone documents. > >> [...] > >> And there is not way to know, that expat has done additional > >> normalization according to the attribute type. Therefor, the both > >> cases are indistinguishable from expat handler level. > > Out of your other reply to my mail: > > > I don't see a way to solve this without > > having Expat do some validation specific work - maybe a flag indicating > > if any normalization beyond CDATA requirements was performed. > > Again, yes. Well, I wouldn't call this 'some validation specific > work'. The attribute value normalization according to the attribute > type is already done by expat (that's fine). It's just an additional > little piece of information (more), provided by the parser. 
Please > notice, that it's not enough to know, that expat had done additional > normalization on one or some of the attribute values of an element, > it's necessary to know on with attribute values this was done. I just realized that with the new API Expat might not be able to perform full normalization anyway, since that requires two passes which would get in the way of reporting the data as a stream. So, the processing application will have to take the CDATA normalized attribute value, check if it is a type other than CDATA and pass it to some normalizeAttributeValue function which then performs the second level normalization returning a flag to indicate if the value was actually affected. Karl From jeremy.kloth@fourthought.com Fri Sep 27 17:54:43 2002 From: jeremy.kloth@fourthought.com (Jeremy Kloth) Date: Fri, 27 Sep 2002 10:54:43 -0600 Subject: [Expat-discuss] core dump with namespaces and external entities Message-ID: <200209271054.43270.jeremy.kloth@fourthought.com> ---------------------- multipart/mixed attachment Attached is a little python script which causes a segfault on my machine. I have narrowed the problem down to the xmlns attribute defined in the DTD. It only dumps core if, in the DTD, the name is 'xmlns'. Any other name works fine. I've tested this with expat 1.95.2 and 1.95.4 using pyexpat from python 2.2.1 and with 1.95.5 using PyXML from CVS. 

This is the back trace:

#0 lookup (table=0x815be2c, name=0x0, createSize=0) at extensions/expat/lib/xmlparse.c:5175
5175 while (*s)
(gdb) bt
#0 lookup (table=0x815be2c, name=0x0, createSize=0) at extensions/expat/lib/xmlparse.c:5175
#1 0x4018f294 in dtdCopy (newDtd=0x815bdf0, oldDtd=0x8153ef0, parser=0x815bc90) at extensions/expat/lib/xmlparse.c:5011
#2 0x401896d4 in XML_ExternalEntityParserCreate (oldParser=0x8153d90, context=0x815d6e4 "xml=http://www.w3.org/XML/1998/namespace\fen", encodingName=0x0) at extensions/expat/lib/xmlparse.c:960
#3 0x40187547 in xmlparse_ExternalEntityParserCreate (self=0x8110d1c, args=0x812aff4) at extensions/pyexpat.c:1054
#4 0x080d4507 in PyCFunction_Call (func=0x8128cd8, arg=0x812aff4, kw=0x0) at Objects/methodobject.c:90
#5 0x08076e01 in eval_frame (f=0x8138b24) at Python/ceval.c:2004
#6 0x08077806 in PyEval_EvalCodeEx (co=0x815e168, globals=0x40, locals=0x0, args=0x815d1d2, argcount=4, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#7 0x080c79a6 in function_call (func=0x811d694, arg=0x81137dc, kw=0x0) at Objects/funcobject.c:374
#8 0x080b5a3e in PyObject_Call (func=0x0, arg=0x81137dc, kw=0x0) at Objects/abstract.c:1684
#9 0x080784ea in PyEval_CallObjectWithKeywords (func=0x811d694, arg=0x81137dc, kw=0x0) at Python/ceval.c:3049
#10 0x401886d0 in call_with_frame (c=0x0, func=0x811d694, args=0x81137dc) at extensions/pyexpat.c:335
#11 0x401868ef in my_ExternalEntityRefHandler (parser=0x2, context=0x815d7c8 "xml=http://www.w3.org/XML/1998/namespace\fen", base=0x0, systemId=0x811b960 "en.xml", publicId=0x0) at extensions/pyexpat.c:791
#12 0x4018a90a in doContent (parser=0x8153d90, startTagLevel=0, enc=0x401a9e40, s=0x8159c78 "&en;\n\n\t", end=0x8159c85 "\t", nextPtr=0x0) at extensions/expat/lib/xmlparse.c:1948
#13 0x4018be82 in doProlog (parser=0x8153d90, enc=0x401a9e40, s=0x8159c3b "\n&en;\n\n\t", end=0x8159c85 "\t", tok=1075383520, next=0x8159c3b "\n&en;\n\n\t", nextPtr=0x0) at extensions/expat/lib/xmlparse.c:1691
#14 0x40190cba in prologInitProcessor (parser=0x8153d90, s=0x8159be0 "\n\n]>\n\n&en;\n\n\t", end=0x8159c85 "\t", nextPtr=0x0) at extensions/expat/lib/xmlparse.c:3095
#15 0x40190848 in XML_Parse (parser=0x8153d90, s=0x815cf74 "\n\n]>\n\n&en;\n\n", len=0, isFinal=1) at extensions/expat/lib/xmlparse.c:1394
#16 0x40186f16 in xmlparse_Parse (self=0x8110d1c, args=0x815ddfc) at extensions/pyexpat.c:809
#17 0x080d4507 in PyCFunction_Call (func=0x8112b20, arg=0x815ddfc, kw=0x0) at Objects/methodobject.c:90
#18 0x08076e01 in eval_frame (f=0x8121594) at Python/ceval.c:2004
#19 0x08077806 in PyEval_EvalCodeEx (co=0x815e5e0, globals=0x2, locals=0x811a634, args=0x815bc7c, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#20 0x08079b65 in PyEval_EvalCode (co=0x815e5e0, globals=0x811a634, locals=0x811a634) at Python/ceval.c:483
#21 0x0809e759 in run_node (n=0x812b690, filename=0xbffff87c "parse.py", globals=0x811a634, locals=0x811a634, flags=0xbffff688) at Python/pythonrun.c:1079
#22 0x0809e6ce in PyRun_FileExFlags (fp=0x810adc8, filename=0xbffff87c "parse.py", start=257, globals=0x811a634, locals=0x811a634, closeit=1, flags=0xbffff688) at Python/pythonrun.c:1057
#23 0x0809dfe3 in PyRun_SimpleFileExFlags (fp=0xbffff688, filename=0xbffff87c "parse.py", closeit=-1073743748, flags=0xbffff688) at Python/pythonrun.c:685
#24 0x080531ac in Py_Main (argc=-1073743748, argv=0xbffff714) at Modules/main.c:364
#25 0x08052d17 in main (argc=2, argv=0xbffff714) at Modules/python.c:10
#26 0x40078082 in __libc_start_main () from /lib/i686/libc.so.6

Jeremy Kloth

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: parse.py
Type: text/x-python
Size: 810 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020927/cd598881/parse.py
---------------------- multipart/mixed attachment--

From karl@waclawek.net Fri Sep 27 18:29:07 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Fri, 27 Sep 2002 13:29:07 -0400
Subject: [Expat-discuss] core dump with namespaces and external entities
References: <200209271054.43270.jeremy.kloth@fourthought.com>
Message-ID: <012701c2664b$6245ff10$9e539696@citkwaclaww2k>

> Attached is a little python script which causes a segfault on my machine. I
> have narrowed the problem down to the xmlns attribute defined in the DTD. It
> only dumps core if, in the DTD, the name is 'xmlns'. Any other name works
> fine.
>
> I've tested this with expat 1.95.2 and 1.95.4 using pyexpat from python 2.2.1
> and with 1.95.5 using PyXML from CVS.

I extracted the files from your script and tested with a completely
different application and got the same result, a NULL pointer exception.

Please file a bug report. I will see if I can dig into it this weekend.

Karl

From hao.mi@sun.com Sat Sep 28 07:26:11 2002
From: hao.mi@sun.com (Michael Mi)
Date: Sat, 28 Sep 2002 14:26:11 +0800
Subject: [Expat-discuss] Does Expat 1.95.5 support DTD validating?
Message-ID: <024e01c266b7$f0a74ea0$84d99e81@BATTER>

---------------------- multipart/alternative attachment
As described in the documentation for 1.95.5, a new function,
XML_UseForeignDTD, has been added. I am not sure exactly what this
function is used for. My question is: is the new version of Expat a
validating parser?

Thanks
Michael
---------------------- multipart/alternative attachment--

From karl@waclawek.net Mon Sep 30 16:34:58 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Mon, 30 Sep 2002 11:34:58 -0400
Subject: [Expat-discuss] core dump with namespaces and external entities
References: <200209271054.43270.jeremy.kloth@fourthought.com> <012701c2664b$6245ff10$9e539696@citkwaclaww2k>
Message-ID: <000c01c26896$ef7cecc0$0207a8c0@karl>

---------------------- multipart/mixed attachment
> > Attached is a little python script which causes a segfault on my machine. I
> > have narrowed the problem down to the xmlns attribute defined in the DTD. It
> > only dumps core if, in the DTD, the name is 'xmlns'. Any other name works
> > fine.
> >
> > I've tested this with expat 1.95.2 and 1.95.4 using pyexpat from python 2.2.1
> > and with 1.95.5 using PyXML from CVS.
>
> I extracted the files from your script and tested with a completely
> different application and got the same result, a NULL pointer exception.
>
> Please file a bug report. I will see if I can dig into it this weekend.

I had some time and came up with a patch.
This patch is fairly big, because I had to change parser->m_dtd
to a separately allocated structure. That in itself can - of course -
introduce new bugs. Let's hope not.

Anyway, please test the attached patch. You need to apply the diff
against xmlparse.c rev. 1.89 from CVS.

Karl

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: Patch.zip
Type: application/x-zip-compressed
Size: 14983 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020930/8c68be2a/Patch.bin
---------------------- multipart/mixed attachment--
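
[Archive note] The parse.py attachment in this thread was scrubbed, so the exact reproducer is not available. The sketch below only illustrates the combination Jeremy describes: namespace processing enabled, an attribute named 'xmlns' declared in the DTD, and an external entity whose handler calls ExternalEntityParserCreate (frames #2 and #3 of the back trace). It is not the original script. The entity name "en" and the system id "en.xml" come from the back trace; the element names, the namespace URI, and the entity content are made up for illustration, and the code assumes the Python 3 xml.parsers.expat module rather than the pyexpat of Python 2.2.1.

```python
import xml.parsers.expat as expat

# Hypothetical document: the DTD declares an attribute literally named
# "xmlns" and an external entity "en" (system id "en.xml", as in the trace).
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
<!ATTLIST doc xmlns CDATA #FIXED "http://example.org/ns">
<!ENTITY en SYSTEM "en.xml">
]>
<doc>&en;</doc>
"""

# Hypothetical replacement text for en.xml, kept in memory so the
# sketch does not need any files on disk.
ENTITY = b"<greeting>hello</greeting>"


def parse():
    """Parse DOC with namespace processing on; return element names seen."""
    seen = []
    parser = expat.ParserCreate(namespace_separator=" ")

    def start(name, attrs):
        seen.append(name)

    def ext_ref(context, base, system_id, public_id):
        # Creating the child parser is the call that appears in frames
        # #2/#3 of the back trace (dtdCopy is reached from here).
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(ENTITY, True)
        return 1  # non-zero tells Expat the entity was handled

    parser.StartElementHandler = start
    parser.ExternalEntityRefHandler = ext_ref
    parser.Parse(DOC, True)
    return seen


if __name__ == "__main__":
    print(parse())
```

With an Expat build that contains a fix for this problem, the script should simply print the element names it encountered. On the versions reported above (1.95.2, 1.95.4, 1.95.5), a script of this shape is the kind of input that was reported to crash inside ExternalEntityParserCreate.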