From JWieman@daktronics.com Mon Apr 8 09:35:05 2002
From: JWieman@daktronics.com (Joe Wiemann)
Date: Mon Apr 8 08:35:05 2002
Subject: [Expat-discuss] Extended ascii
Message-ID:

Can someone provide me with example code to process extended ascii along
with example xml files and code....

From ntang99@hotmail.com Mon Apr 8 11:40:31 2002
From: ntang99@hotmail.com (Michael Tang)
Date: Mon Apr 8 10:40:31 2002
Subject: [Expat-discuss] how to stop the parsing after error happens?
Message-ID:

---------------------- multipart/alternative attachment
Which API should be called to stop the parsing process when an application
level error is detected in the handler?

thanks for any help,
-Michael

Get more from the Web. FREE MSN Explorer download : http://explorer.msn.com
---------------------- multipart/alternative attachment
An HTML attachment was scrubbed...
URL: http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020408/146023fb/attachment.html
---------------------- multipart/alternative attachment--

From carlos@pehoe.civil.ist.utl.pt Mon Apr 8 12:18:35 2002
From: carlos@pehoe.civil.ist.utl.pt (Carlos Pereira)
Date: Mon Apr 8 11:18:35 2002
Subject: [Expat-discuss] Re: how to stop the parsing after error happens?
Message-ID: <200204081810.TAA00684@pehoe.civil.ist.utl.pt>

>Which API should be called to stop the parsing process when an application
>level error is detected in the handler?
>thanks for any help,

I think you are looking for something like this:

if (my_error_flag == TRUE)
  XML_SetElementHandler (parser, NULL, NULL);

This disables both the start and end callbacks.

To start:
XML_SetStartElementHandler (parser, my_start);
XML_SetEndElementHandler (parser, my_end);
or:
XML_SetElementHandler (parser, my_start, my_end);

To end:
XML_SetStartElementHandler (parser, NULL);
XML_SetEndElementHandler (parser, NULL);
or:
XML_SetElementHandler (parser, NULL, NULL);

I am speaking from memory, check the manual for more details,
Carlos

From Axel.Kittenberger@maxxio.com Tue Apr 9 01:25:04 2002
From: Axel.Kittenberger@maxxio.com (Axel Kittenberger)
Date: Tue Apr 9 00:25:04 2002
Subject: [Expat-discuss] Converting UT8 to Latin-1
Message-ID: <200204090724.JAA13263@merlin.gams.co.at>

Hi!

Although this is in high danger of being a FAQ, I didn't find it:

Does libexpat provide a convenient function to convert the UTF8 strings
passed to the callbacks to Latin-1?

Thanks
- Axel

From Axel.Kittenberger@maxxio.com Tue Apr 9 05:44:04 2002
From: Axel.Kittenberger@maxxio.com (Axel Kittenberger)
Date: Tue Apr 9 04:44:04 2002
Subject: [Expat-discuss] PATCH: just a bit doc...
Message-ID: <200204091143.NAA25402@merlin.gams.co.at>

Hi!

This patch just adds two function headers (for a beginning :o)
I consider function headers to be very, very important when
analysing/debugging/patching foreign source code, as they really
significantly speed up understanding the source.

-----------------------------------------------------------
--- xmltok/xmltok.c Tue Apr 9 13:26:39 2002
+++ xmltok/xmltok.new Tue Apr 9 12:17:57 2002
@@ -1115,6 +1115,15 @@
   return result;
 }

+
+/*****
+ * Encodes the character 'c' to the buffer pointed by 'buf'.
+ *
+ * c ... in: character to encode.
+ * buf ... in: buffer to write to.
+ *
+ * Returns the number of bytes written to 'buf'
+ */
 int XmlUtf8Encode(int c, char *buf)
 {
   enum {
@@ -1151,6 +1160,14 @@
   return 0;
 }

+/*****
+ * Encodes the character 'charNum' to the buffer pointed by 'buf'.
+ *
+ * charNum ... in: character to encode.
+ * buf ... in: buffer to write to.
+ *
+ * Returns the number of utf16 chars written to 'buf'
+ */
 int XmlUtf16Encode(int charNum, unsigned short *buf)
 {
   if (charNum < 0)
-----------------------------------------------------------

Second question: in the CVS there are MS Developer Studio files in every
directory. Is this really necessary? I have nothing personal against
windows, but do we need ms studio files in every source directory? Can't
they stay at home somewhere in /win32 where I can safely ignore them :o)

Maybe we can use cygwin or mingw to build ports for windows? I personally
would highly encourage mingw or cygwin as compilers for ms over Ms-Studio.

http://www.mingw.org
http://www.cygwin.com/

Thanks
- Axel

From fdrake@acm.org Tue Apr 9 07:36:09 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Apr 9 06:36:09 2002
Subject: [Expat-discuss] Converting UT8 to Latin-1
In-Reply-To: <200204090724.JAA13263@merlin.gams.co.at>
References: <200204090724.JAA13263@merlin.gams.co.at>
Message-ID: <15538.60957.143769.771133@grendel.zope.com>

Axel Kittenberger writes:
 > Although this is in high danger of being a FAQ, I didn't find it:
 >
 > Does libexpat provide a convenient function to convert the UTF8 strings
 > passed to the callbacks to Latin-1?

No, it doesn't, and it probably shouldn't. There's nothing about the
conversion that's specific to Expat.

There are a number of libraries that include utilities like this. I'd
take a look at recode (there's a library version of this these days)
and ICU ("IBM Components for Unicode"). There are probably a number
of others as well, but you'll need to look around with your technical
and licensing requirements in mind.
If anyone would like to contribute a list of such libraries, I'll be
glad to add pointers to the website at expat.sourceforge.net, but I
don't have time to undertake that myself.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From Axel.Kittenberger@maxxio.com Tue Apr 9 07:48:12 2002
From: Axel.Kittenberger@maxxio.com (Axel Kittenberger)
Date: Tue Apr 9 06:48:12 2002
Subject: [Expat-discuss] Converting UT8 to Latin-1
In-Reply-To: <15538.60957.143769.771133@grendel.zope.com>
References: <200204090724.JAA13263@merlin.gams.co.at> <15538.60957.143769.771133@grendel.zope.com>
Message-ID: <200204091347.PAA31764@merlin.gams.co.at>

> No, it doesn't, and it probably shouldn't. There's nothing about the
> conversion that's specific to Expat.

Oh, I just switched from libxml to expat, and adapted our software to it,
because of the far smaller memory footprint. (BTW: _please_ keep a small
memory footprint an important consideration for expat; a 250K difference is
a lot in an embedded world.) Libxml2 could convert UTF8 to Latin, so I just
thought expat would do so as well, but it doesn't matter, and it's even good
that way, as leaving it out lets expat keep its smaller footprint :o)

I found that the iconv() function of glibc fits my needs well, and as we
have glibc either way it doesn't add much extra size, just 30k extra for
/usr/lib/gconv/ISO8859-1.so, the appropriate glibc gconv module.

From Josh.Martin@abq.sc.philips.com Tue Apr 9 17:42:08 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Tue Apr 9 16:42:08 2002
Subject: [Expat-discuss] how to stop the parsing after error happens?
Message-ID: <200204092340.RAA05903@abqn42.abq.sc.philips.com>

> Which API should be called to stop the parsing process when an application
> level error is detected in the handler?
> thanks for any help,
> -Michael
> Get more from the Web.
> FREE MSN Explorer download : http://explorer.msn.com

I might not fully understand the question, but if you want to stop processing
the document wouldn't you just terminate the loop that calls XML_Parse (or
XML_ParseBuffer) and jump to your cleanup code (such as calling XML_ParserFree)
that you call before termination?

- Josh Martin

From Josh.Martin@abq.sc.philips.com Tue Apr 9 17:54:03 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Tue Apr 9 16:54:03 2002
Subject: [Expat-discuss] Extended ascii
Message-ID: <200204092353.RAA05911@abqn42.abq.sc.philips.com>

By extended ascii do you mean the normal ascii 256 character set, or are you
talking about Unicode, or what?

- Josh Martin

> Can someone provide me with example code to process extended ascii along with
> example xml files and code....
>
>
>
>
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/expat-discuss

From g_expat@zewt.org Wed Apr 10 18:35:30 2002
From: g_expat@zewt.org (Glenn Maynard)
Date: Wed Apr 10 17:35:30 2002
Subject: [Expat-discuss] Extended ascii
In-Reply-To: <200204092353.RAA05911@abqn42.abq.sc.philips.com>
References: <200204092353.RAA05911@abqn42.abq.sc.philips.com>
Message-ID: <20020411003130.GA15850@zewt.org>

*reordering poorly-ordered quoting ...*

On Tue, Apr 09, 2002 at 05:53:03PM -0600, Josh Martin wrote:
> > Can someone provide me with example code to process extended ascii along with
> > example xml files and code....
>
> By extended ascii do you mean the normal ascii 256 character set, or are you
> talking about Unicode, or what?

ASCII is a 7-bit character set, not an 8-bit set. Depending on your world,
"extended ascii" might be referring to Windows codepage 437 (old DOS
linedrawing characters) or ISO-8859-1 (Latin1), but it's incorrect to
call either of those "ASCII" or "extended ASCII". (I doubt he's looking
to use line-drawing characters in XML.)
Of course, all he needs to do is change the encoding in his XML files.

--
Glenn Maynard

From marcjero@yahoo.com Mon Apr 15 15:21:53 2002
From: marcjero@yahoo.com (Jerome Marc)
Date: Mon Apr 15 14:21:53 2002
Subject: [Expat-discuss] expatj
Message-ID: <007e01c1e4b9$a2700ea0$0201a8c0@jma1>

This is a multi-part message in MIME format.

---------------------- multipart/alternative attachment
Hello,

I would like to share some interesting work I did last week with Expat.
I was looking for a very fast Java parser and I thought to write a Java
sax2 driver for Expat.
So Expat is always a C compiled library and the Java sax classes work
with the library through JNI.
High level api can easily be done with Java and we got a very fast, tiny
parser in native code.

The question is : Is it much faster than a 100% pure Java parser ? The
answer seems to be YES. I know that JNI calls are much slower than
standard Java method invocation but my first tests show that this
expatj is several times faster than Xerces-J sax2 !

I have to complete handlers management code and then I will try serious
benchmarks. Did Anyone experience some sax2 stuff on top of Expat ? I
suppose some experiences about that should help me a lot...
For example, SkippedEntity handler is a real nightmare :-)

Thanks a lot for reactions and ideas.

Jerome
---------------------- multipart/alternative attachment
An HTML attachment was scrubbed...
URL: http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020415/26adc28a/attachment.html
---------------------- multipart/alternative attachment--

From karl@waclawek.net Mon Apr 15 16:23:14 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Mon Apr 15 15:23:14 2002
Subject: [Expat-discuss] expatj
References: <007e01c1e4b9$a2700ea0$0201a8c0@jma1>
Message-ID: <001201c1e4ce$51372950$0207a8c0@karl>

> I have to complete handlers management code and then I will try serious benchmarks.
> Did Anyone experience some sax2 stuff on top of Expat ?
> I suppose some experiences
> about that should help me a lot...
> For example, SkippedEntity handler is a real nightmare :-)

Yes, I did a pretty complete SAX2 wrapper for Expat in Delphi.
Even including my wrapper overhead, this is only marginally
slower than MSXML4 called through pointer based COM interfaces.

You are right, SkippedEntity is a dog, and there are a few
other difficulties. It turned out to be more work than I thought.
Also, it seems that Expat is more or less stagnant.

In any case, you can have my stuff, if you understand Delphi.

Karl

From gstein@lyra.org Mon Apr 15 22:37:04 2002
From: gstein@lyra.org (Greg Stein)
Date: Mon Apr 15 21:37:04 2002
Subject: [Expat-discuss] expatj
In-Reply-To: <001201c1e4ce$51372950$0207a8c0@karl>; from karl@waclawek.net on Mon, Apr 15, 2002 at 06:38:51PM -0400
References: <007e01c1e4b9$a2700ea0$0201a8c0@jma1> <001201c1e4ce$51372950$0207a8c0@karl>
Message-ID: <20020415213640.B22575@lyra.org>

On Mon, Apr 15, 2002 at 06:38:51PM -0400, Karl Waclawek wrote:
>...
> Also, it seems that Expat is more or less stagnant.

If it works, then why change it? :-)

People always ask me whether I've dropped my "edna" project because I
haven't released a version in over a year. The simple answer is that I feel
it is "done". Expat solves its problem space, and it does it very well.
What else is there to do?

[ actually, we need a new release to get some bug fixes and build fixes out
  there in usage, but still... the parser seems 'done' ]

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From djm@maccormack.net Tue Apr 16 06:05:02 2002
From: djm@maccormack.net (David MacCormack)
Date: Tue Apr 16 05:05:02 2002
Subject: [Expat-discuss] expatj
In-Reply-To: <20020415213640.B22575@lyra.org>
Message-ID:

I'd still like to see the following:

http://sourceforge.net/tracker/index.php?func=detail&aid=429501&group_id=10127&atid=310127

Dave

On Mon, 15 Apr 2002, Greg Stein wrote:

> On Mon, Apr 15, 2002 at 06:38:51PM -0400, Karl Waclawek wrote:
> >...
> > Also, it seems that Expat is more or less stagnant.
>
> If it works, then why change it? :-)
>
> People always ask me whether I've dropped my "edna" project because I
> haven't released a version in over a year. The simple answer is that I feel
> it is "done". Expat solves its problem space, and it does it very well.
> What else is there to do?
>
> [ actually, we need a new release to get some bug fixes and build fixes out
>   there in usage, but still... the parser seems 'done' ]
>
> Cheers,
> -g
>
>

--
David MacCormack
djm@maccormack.net
:wq
damn!

From karl@waclawek.net Tue Apr 16 07:25:02 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Apr 16 06:25:02 2002
Subject: [Expat-discuss] expatj
References: <007e01c1e4b9$a2700ea0$0201a8c0@jma1> <001201c1e4ce$51372950$0207a8c0@karl> <20020415213640.B22575@lyra.org>
Message-ID: <005f01c1e549$fc84cee0$9e539696@citkwaclaww2k>

> On Mon, Apr 15, 2002 at 06:38:51PM -0400, Karl Waclawek wrote:
> >...
> > Also, it seems that Expat is more or less stagnant.
>
> If it works, then why change it? :-)

There are a few bugs.
I don't even think all of them have been formally reported.
For instance, the default handler reports a lot of DTD data
(don't remember if all or just most), even if you are handling them.
It is not supposed to do this. I had hoped I could use it to
report top level whitespace.

There is also one with regard to James Clark's test cases,
files .../valid/not-sa/003.xml and .../valid/not-sa/004.xml,
which Expat has problems with. I reported this to the list
last November. Expat does not respect this xml rule:

  Well-formedness constraint: PEs in Internal Subset
  In the internal DTD subset, parameter-entity references can occur
  only where markup declarations can occur, not within markup declarations.
  (This does not apply to references that occur in external parameter
  entities or to the external subset.)

That is, Expat reports an error for external parameter entities too.
IMO, this is because a child parser in Expat does not know
that it is a child parser - and therefore is processing an external entity.

> People always ask me whether I've dropped my "edna" project because I
> haven't released a version in over a year. The simple answer is that I feel
> it is "done". Expat solves its problem space, and it does it very well.
> What else is there to do?

For instance, one could add some form of stop/start functions, so that
it would be easier to implement the new pull APIs on top of Expat.
Those things have been suggested on the list.

> [ actually, we need a new release to get some bug fixes and build fixes out
>   there in usage, but still... the parser seems 'done' ]

Yep.

Karl

From fdrake@acm.org Tue Apr 16 08:03:13 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Apr 16 07:03:13 2002
Subject: [Expat-discuss] expatj
In-Reply-To: <005f01c1e549$fc84cee0$9e539696@citkwaclaww2k>
References: <007e01c1e4b9$a2700ea0$0201a8c0@jma1> <001201c1e4ce$51372950$0207a8c0@karl> <20020415213640.B22575@lyra.org> <005f01c1e549$fc84cee0$9e539696@citkwaclaww2k>
Message-ID: <15548.12043.370131.792518@grendel.zope.com>

Karl Waclawek writes:
 > There are a few bugs.

Definitely! I'd really like to get them fixed, but haven't had time
to review most of the patches that have been submitted. I tried to
clean out some dead wood from the trackers last night, but didn't get
very far.

 > I don't even think all of them have been formally reported.
 > For instance, the default handler reports a lot of DTD data
 > (don't remember if all or just most), even if you are handling them.
 > It is not supposed to do this. I had hoped I could use it to
 > report top level whitespace.

If there isn't an open bug in the tracker, assume the email got lost.
I probably saw it, but if I didn't have time to look at it, the email
is buried somewhere deep in the bowels of my mailer. The tracker is
much better for bug tracking than the list.
 > There is also one with regard to James Clark's test cases,
 > files .../valid/not-sa/003.xml and .../valid/not-sa/004.xml,
 > which Expat has problems with. I reported this to the list
 > last November. Expat does not respect this xml rule:

Again, this needs to be in the tracker.

 > For instance, one could add some form of stop/start functions, so that
 > it would be easier to implement the new pull APIs on top of Expat.
 > Those things have been suggested on the list.

They need to be in the tracker. Feel free to submit feature requests
as "bugs" and set the "Group" to "Feature Request". They'll get a
lower priority than actual bugs, but they'll get lost otherwise.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Tue Apr 16 08:18:53 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Apr 16 07:18:53 2002
Subject: [Expat-discuss] expatj
In-Reply-To: <20020415213640.B22575@lyra.org>
References: <007e01c1e4b9$a2700ea0$0201a8c0@jma1> <001201c1e4ce$51372950$0207a8c0@karl> <20020415213640.B22575@lyra.org>
Message-ID: <15548.12889.477027.194882@grendel.zope.com>

Greg Stein writes:
 > [ actually, we need a new release to get some bug fixes and build fixes out
 >   there in usage, but still... the parser seems 'done' ]

So does this mean you're looking at the build issues assigned to you?
;-)

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From lshen@cisco.com Tue Apr 16 13:01:11 2002
From: lshen@cisco.com (Lin Shen)
Date: Tue Apr 16 12:01:11 2002
Subject: [Expat-discuss] Encoding
Message-ID: <005301c1e578$f1bd8450$738b6b80@lshen>

This is a multi-part message in MIME format.

---------------------- multipart/alternative attachment
Hi,

I'm using Expat to parse my VoiceXML document. Just wonder which API
should I use to get the encoding specified in the document by the XML
declaration in my application? I'm using Expat 1.1.

thanks

lin
---------------------- multipart/alternative attachment
An HTML attachment was scrubbed...
URL: http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020416/9a376de2/attachment.html
---------------------- multipart/alternative attachment--

From fdrake@acm.org Tue Apr 16 13:20:49 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Apr 16 12:20:49 2002
Subject: [Expat-discuss] Encoding
In-Reply-To: <005301c1e578$f1bd8450$738b6b80@lshen>
References: <005301c1e578$f1bd8450$738b6b80@lshen>
Message-ID: <15548.30588.378042.616388@grendel.zope.com>

Lin Shen writes:
 > I'm using Expat to parse my VoiceXML document. Just wonder which
 > API should I use to get the encoding specified in the document by
 > the XML declaration in my application? I'm using
 > Expat 1.1.

If the actual encoding and the encoding pseudo-attribute of the XML
declaration match, then you don't need to do anything; just pass NULL as
the first parameter to XML_ParserCreate() or XML_ParserCreateNS(). If you
know the actual encoding, pass it as a string to one of those
functions.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From lshen@cisco.com Tue Apr 16 13:36:16 2002
From: lshen@cisco.com (Lin Shen)
Date: Tue Apr 16 12:36:16 2002
Subject: [Expat-discuss] Encoding
References: <005301c1e578$f1bd8450$738b6b80@lshen> <15548.30588.378042.616388@grendel.zope.com>
Message-ID: <006501c1e57b$eef61f90$738b6b80@lshen>

Fred,

Maybe I didn't state it clearly in my last mail. What I'm looking for is a way
to get the encoding pseudo-attribute of the XML declaration. I'm not sure
which callback handler suits this case. I'm using the element handler in my
application right now, but it skips the XML declaration. I also tried the
default handler; it passes everything (including comments) back to the
application, which makes me wonder if there is a better way.

thanks

lin

----- Original Message -----
From: "Fred L. Drake, Jr."
To: "Lin Shen"
Cc:
Sent: Tuesday, April 16, 2002 12:11 PM
Subject: Re: [Expat-discuss] Encoding

>
> Lin Shen writes:
>  > I'm using Expat to parse my VoiceXML document.
>  > Just wonder which
>  > API should I use to get the encoding specified in the document by
>  > the XML declaration in my application? I'm using
>  > Expat 1.1.
>
> If the actual encoding and the encoding pseudo-attribute of the XML
> declaration match, then you don't need to do anything; just pass NULL as
> the first parameter to XML_ParserCreate() or XML_ParserCreateNS(). If you
> know the actual encoding, pass it as a string to one of those
> functions.
>
>
>   -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Zope Corporation

From gstein@lyra.org Tue Apr 16 14:47:12 2002
From: gstein@lyra.org (Greg Stein)
Date: Tue Apr 16 13:47:12 2002
Subject: [Expat-discuss] expatj
In-Reply-To: <15548.12889.477027.194882@grendel.zope.com>; from fdrake@acm.org on Tue, Apr 16, 2002 at 10:16:57AM -0400
References: <007e01c1e4b9$a2700ea0$0201a8c0@jma1> <001201c1e4ce$51372950$0207a8c0@karl> <20020415213640.B22575@lyra.org> <15548.12889.477027.194882@grendel.zope.com>
Message-ID: <20020416134516.D23582@lyra.org>

On Tue, Apr 16, 2002 at 10:16:57AM -0400, Fred L. Drake, Jr. wrote:
>
> Greg Stein writes:
>  > [ actually, we need a new release to get some bug fixes and build fixes out
>  >   there in usage, but still... the parser seems 'done' ]
>
> So does this mean you're looking at the build issues assigned to you?
> ;-)

I certainly can. I recall doing the same back in November or something.
Looks like it won't be until next week, though.

I'm definitely up for making a release RSN.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From fdrake@acm.org Tue Apr 16 14:56:18 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Apr 16 13:56:18 2002
Subject: [Expat-discuss] expatj
In-Reply-To: <20020416134516.D23582@lyra.org>
References: <007e01c1e4b9$a2700ea0$0201a8c0@jma1> <001201c1e4ce$51372950$0207a8c0@karl> <20020415213640.B22575@lyra.org> <15548.12889.477027.194882@grendel.zope.com> <20020416134516.D23582@lyra.org>
Message-ID: <15548.36817.874526.573678@grendel.zope.com>

Greg Stein writes:
 > I certainly can.
 > I recall doing the same back in November or something.
 > Looks like it won't be until next week, though.

You did. I'll try to push on the actual bugfixes.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Tue Apr 16 15:36:01 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Apr 16 14:36:01 2002
Subject: [Expat-discuss] Encoding
In-Reply-To: <006501c1e57b$eef61f90$738b6b80@lshen>
References: <005301c1e578$f1bd8450$738b6b80@lshen> <15548.30588.378042.616388@grendel.zope.com> <006501c1e57b$eef61f90$738b6b80@lshen>
Message-ID: <15548.31751.920250.24698@grendel.zope.com>

Lin Shen writes:
 > Maybe I didn't state it clearly in my last mail. What I'm looking for is a way
 > to get the encoding pseudo-attribute of the XML declaration. I'm not sure
 > which callback handler suits this case. I'm using the element handler in my

You're right; I did misunderstand.

You should use the XML declaration handler, set using
XML_SetXmlDeclHandler(). The version, encoding, and standalone values
are passed to the handler as separate parameters. You need to be
using a 1.95.x version of Expat, though.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From lshen@cisco.com Tue Apr 16 15:45:05 2002
From: lshen@cisco.com (Lin Shen)
Date: Tue Apr 16 14:45:05 2002
Subject: [Expat-discuss] Encoding
References: <005301c1e578$f1bd8450$738b6b80@lshen><15548.30588.378042.616388@grendel.zope.com><006501c1e57b$eef61f90$738b6b80@lshen> <15548.31751.920250.24698@grendel.zope.com>
Message-ID: <00c401c1e58f$acb0b370$738b6b80@lshen>

That's why I couldn't find the right API. I'm using Expat 1.1 still.

lin

----- Original Message -----
From: "Fred L. Drake, Jr."
To: "Lin Shen"
Cc:
Sent: Tuesday, April 16, 2002 12:31 PM
Subject: Re: [Expat-discuss] Encoding

>
> Lin Shen writes:
>  > Maybe I didn't state it clearly in my last mail. What I'm looking for is a way
>  > to get the encoding pseudo-attribute of the XML declaration. I'm not sure
>  > which callback handler suits this case.
>  > I'm using element handler in my
>
> You're right; I did misunderstand.
>
> You should use the XML declaration handler, set using
> XML_SetXmlDeclHandler(). The version, encoding, and standalone values
> are passed to the handler as separate parameters. You need to be
> using a 1.95.x version of Expat, though.
>
>
>   -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Zope Corporation

From JWieman@daktronics.com Wed Apr 17 12:17:09 2002
From: JWieman@daktronics.com (Joe Wiemann)
Date: Wed Apr 17 11:17:09 2002
Subject: [Expat-discuss] Extended ascii
Message-ID:

This is a MIME message. If you are reading this text, you may want to
consider changing to a mail reader or gateway that understands how to
properly handle MIME multipart messages.

---------------------- multipart/mixed attachment
why won't this file parse with expat

>>> "Joe Wiemann" 04/08/02 10:29AM >>>
Can someone provide me with example code to process extended ascii along
with example xml files and code....

_______________________________________________
Expat-discuss mailing list
Expat-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/expat-discuss

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: utfx.xml
Type: text/xml
Size: 65 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020417/c0fd3d67/utfx.xml
---------------------- multipart/mixed attachment--

From JWieman@daktronics.com Wed Apr 17 12:26:05 2002
From: JWieman@daktronics.com (Joe Wiemann)
Date: Wed Apr 17 11:26:05 2002
Subject: [Expat-discuss] Extended ascii
Message-ID:

This is a MIME message. If you are reading this text, you may want to
consider changing to a mail reader or gateway that understands how to
properly handle MIME multipart messages.
---------------------- multipart/mixed attachment
this one won't either

>>> "Joe Wiemann" 04/17/02 01:15PM >>>
why won't this file parse with expat

>>> "Joe Wiemann" 04/08/02 10:29AM >>>
Can someone provide me with example code to process extended ascii along
with example xml files and code....

_______________________________________________
Expat-discuss mailing list
Expat-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/expat-discuss

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: utfx.xml
Type: text/xml
Size: 94 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020417/88f4c3f2/utfx.xml
---------------------- multipart/mixed attachment--

From karl@waclawek.net Wed Apr 17 12:44:03 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Wed Apr 17 11:44:03 2002
Subject: [Expat-discuss] Extended ascii
References:
Message-ID: <009f01c1e63f$b6e1bd20$9e539696@citkwaclaww2k>

> why won't this file parse with expat

You haven't specified an encoding.
If no encoding is specified, the parser assumes UTF-8 or UTF-16.
I got it to parse by specifying "ISO-8859-1", but I am not sure
it is the right encoding.

Karl

From Philippe.Casgrain@corel.com Wed Apr 17 12:47:25 2002
From: Philippe.Casgrain@corel.com (Philippe Casgrain)
Date: Wed Apr 17 11:47:25 2002
Subject: [Expat-discuss] Extended ascii
Message-ID: <2563EFDB68120F48B9F511B7957504365BA58F@OTT-VSVR2.corelcorp.corel.ics>

This is a multi-part message in MIME format.

---------------------- multipart/mixed attachment
The second file (with joe between jack tags) parses fine with my expat
1.95.2, as one would expect.

The first one doesn't parse because it contains a high-ascii character,
which should be written as a character reference such as &#246; (or
something like it) instead of the raw byte 0xF6 ("ö").
If I'm wrong, please correct me :-) I'm actually going through this
process today, as I incorporated expat in my app and found a few files
which had high-ascii in them.

Philippe Casgrain

-------- Original message --------
From: Joe Wiemann
Date: Wed. 2002-04-17 14:24
To: expat-discuss@lists.sourceforge.net
Cc:
Subject: Re: [Expat-discuss] Extended ascii

this one won't either

>>> "Joe Wiemann" 04/17/02 01:15PM >>>
why won't this file parse with expat

>>> "Joe Wiemann" 04/08/02 10:29AM >>>
Can someone provide me with example code to process extended ascii along
with example xml files and code....

_______________________________________________
Expat-discuss mailing list
Expat-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/expat-discuss

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: not available
Type: application/ms-tnef
Size: 3903 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020417/cbb3bc4d/attachment.bin
---------------------- multipart/mixed attachment--

From JWieman@daktronics.com Wed Apr 17 13:35:00 2002
From: JWieman@daktronics.com (Joe Wiemann)
Date: Wed Apr 17 12:35:00 2002
Subject: [Expat-discuss] Extended ascii
Message-ID:

appears it was a problem with 1.95.1

ok now next question.

anyone have an UTF8Decoder to convert back to a standard 256 char ascii
string??

>>> "Joe Wiemann" 04/17/02 01:15PM >>>
why won't this file parse with expat

>>> "Joe Wiemann" 04/08/02 10:29AM >>>
Can someone provide me with example code to process extended ascii along
with example xml files and code....
_______________________________________________
Expat-discuss mailing list
Expat-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/expat-discuss

From carlos@pehoe.civil.ist.utl.pt Wed Apr 17 13:58:21 2002
From: carlos@pehoe.civil.ist.utl.pt (Carlos Pereira)
Date: Wed Apr 17 12:58:21 2002
Subject: [Expat-discuss] discarding http header
Message-ID: <200204171926.UAA13444@pehoe.civil.ist.utl.pt>

Hi there,

When my HTTP client code asks for a remote file, I receive
the (HTTP server produced) header and the body (the file itself),
which I then intend to send to Expat, exactly as I currently
do with local files.

Is there a simple way to discard the HTTP header, keeping
only the real thing, i.e. the XML file?

Although this question might not be strictly about Expat,
people on this list might have experience with this.

Using something like libcurl looks like overkill to me
(http://curl.haxx.se/libcurl/); actually my 50-line socket code
is working fine, I just need to know what is the most reliable,
simplest way of sending this header to /dev/null :-)

Thanks a lot!
Carlos

From dcrowley@scitegic.com Wed Apr 17 14:14:47 2002
From: dcrowley@scitegic.com (David Crowley)
Date: Wed Apr 17 13:14:47 2002
Subject: [Expat-discuss] Extended ascii
In-Reply-To:
Message-ID: <5.1.0.14.0.20020417130314.0299fd68@pop.business.earthlink.net>

At 12:16 PM 4/17/2002, you wrote:
> appears it was a problem with 1.95.1
>
>ok now next question.
>
>anyone have an UTF8Decoder to convert back to a standard 256 char ascii
>string??

here's a c++ method.
/* Decodes one UTF-8 sequence at 'utf8' into the code point 'c' and
   advances 'utf8' past it. Continuation bytes are not validated. */
inline bool
sp_UTF8_UCS(const char *& utf8, int& c)
{
    int b = (unsigned char)*utf8++;
    if (b <= 0x7F) {
        c = b;
    }
    else if ((b & 0xE0) == 0xC0) {
        /* 110xxxxx 10xxxxxx */
        c = (b & 0x1F) << 6;
        b = *utf8++; c |= b & 0x3F;
    }
    else if ((b & 0xF0) == 0xE0) {
        /* 1110xxxx + 2 */
        c = (b & 0x0F) << 12;
        b = *utf8++; c |= (b & 0x3F) << 6;
        b = *utf8++; c |= b & 0x3F;
    }
    else if ((b & 0xF8) == 0xF0) {
        /* 11110xxx + 3 */
        c = (b & 0x07) << 18;
        b = *utf8++; c |= (b & 0x3F) << 12;
        b = *utf8++; c |= (b & 0x3F) << 6;
        b = *utf8++; c |= b & 0x3F;
    }
    else if ((b & 0xFC) == 0xF8) {
        /* 111110xx + 4 */
        c = (b & 0x03) << 24;
        b = *utf8++; c |= (b & 0x3F) << 18;
        b = *utf8++; c |= (b & 0x3F) << 12;
        b = *utf8++; c |= (b & 0x3F) << 6;
        b = *utf8++; c |= b & 0x3F;
    }
    else if ((b & 0xFE) == 0xFC) {
        /* 1111110x + 5 */
        c = (b & 0x01) << 30;
        b = *utf8++; c |= (b & 0x3F) << 24;
        b = *utf8++; c |= (b & 0x3F) << 18;
        b = *utf8++; c |= (b & 0x3F) << 12;
        b = *utf8++; c |= (b & 0x3F) << 6;
        b = *utf8++; c |= b & 0x3F;
    }
    else {
        /* Error: stray continuation byte or invalid lead byte */
        return false;
    }
    return true;
}

From Josh.Martin@abq.sc.philips.com Wed Apr 17 19:48:03 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Wed Apr 17 18:48:03 2002
Subject: [Expat-discuss] Extended ascii
Message-ID: <200204180146.TAA09600@abqn42.abq.sc.philips.com>

Heh, and all these years I thought the line drawing characters were part of the
ASCII character set. And yet I still knew that terminal programs could transmit
ASCII characters in 7-bit chunks. Well now I'm just a little confused.

- Josh Martin

> *reordering poorly-ordered quoting ...*
>
> On Tue, Apr 09, 2002 at 05:53:03PM -0600, Josh Martin wrote:
> > > Can someone provide me with example code to process extended ascii along with
> > > example xml files and code....
> >
> > By extended ascii do you mean the normal ascii 256 character set, or are you
> > talking about Unicode, or what?
>
> ASCII is a 7-bit character set, not an 8-bit set.
Depending on your world, > "extended ascii" might be referring to Windows codepage 437 (old DOS > linedrawing characters) or ISO-8859-1 (Latin1), but it's incorrect to > call either of those "ASCII" or "extended ASCII". (I doubt he's looking > to use line-drawing characters in XML.) > > Of course, all he needs to do is change the encoding in his XML files. > > > -- > Glenn Maynard > > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/expat-discuss From Axel.Kittenberger@maxxio.com Thu Apr 18 02:05:02 2002 From: Axel.Kittenberger@maxxio.com (Axel Kittenberger) Date: Thu Apr 18 01:05:02 2002 Subject: [Expat-discuss] Extended ascii In-Reply-To: References: Message-ID: <200204180804.KAA23021@merlin.gams.co.at> On Wednesday 17 April 2002 21:16, Joe Wiemann wrote: > appears it was a problem with 1.95.1 > > ok now next question. > > anyone have an UTF8Decoder to convert back to a standard 256 char ascii > string?? What do you mean by "standard"? Latin-1 (ISO-8859-1)? If you have a system with glibc you can use iconv(). I believe iconv() is even in some POSIX standard, so you can use it on all mature systems (that excludes Windows :-P, as far as I know....). From Axel.Kittenberger@maxxio.com Thu Apr 18 02:07:03 2002 From: Axel.Kittenberger@maxxio.com (Axel Kittenberger) Date: Thu Apr 18 01:07:03 2002 Subject: [Expat-discuss] Extended ascii In-Reply-To: <5.1.0.14.0.20020417130314.0299fd68@pop.business.earthlink.net> References: <5.1.0.14.0.20020417130314.0299fd68@pop.business.earthlink.net> Message-ID: <200204180806.KAA23106@merlin.gams.co.at> > >anyone have an UTF8Decoder to convert back to a standard 256 char ascii > >string?? > > here's a c++ method. > > inline bool > sp_UTF8_UCS(const char *& utf8, int& c) I guess by "standard 256 char ascii" he meant Latin-1. UCS isn't Latin-1, right?
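On the UCS versus Latin-1 question: decoding UTF-8 yields Unicode (UCS) code points, and Latin-1 (ISO-8859-1) is simply the code points U+0000 through U+00FF stored one per byte. So a Latin-1 conversion can be sketched as a decode step plus a range check. The following is only a minimal sketch under stated assumptions: utf8_to_latin1 and its fallback parameter are hypothetical names, not part of Expat, and replacing unrepresentable or malformed input with '?' is just one possible policy. It handles sequences of up to four bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an Expat function): converts a UTF-8 string to
// ISO-8859-1. Code points above U+00FF, overlong forms and malformed
// sequences are replaced by `fallback`.
std::string utf8_to_latin1(const std::string& in, char fallback = '?')
{
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        if (b < 0x80) {                       // 0xxxxxxx: plain ASCII
            out.push_back(static_cast<char>(b));
            ++i;
            continue;
        }
        // Work out how many continuation bytes the lead byte announces.
        std::size_t extra = 0;
        unsigned long cp = 0;
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else {                                // stray continuation or invalid lead byte
            out.push_back(fallback);
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;                   // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        // Only U+0080..U+00FF map directly onto single Latin-1 bytes.
        if (ok && cp >= 0x80 && cp <= 0xFF)
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        else
            out.push_back(fallback);
    }
    return out;
}
```

Where iconv() is available, as suggested earlier in the thread, it covers far more encodings; the sketch is only meant for the narrow Latin-1 case, for example on the character data passed to an Expat handler.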
From JWieman@daktronics.com Thu Apr 18 09:42:59 2002 From: JWieman@daktronics.com (Joe Wiemann) Date: Thu Apr 18 08:42:59 2002 Subject: [Expat-discuss] XML_ParserReset Message-ID: How is the reset functionality coming along - has anyone had time to work on it yet.... I think this would be a great utility especially for those of us using this on an embedded platform where memory fragmentation is a big concern. From gstein@lyra.org Thu Apr 18 11:18:02 2002 From: gstein@lyra.org (Greg Stein) Date: Thu Apr 18 10:18:02 2002 Subject: [Expat-discuss] XML_ParserReset In-Reply-To: ; from JWieman@daktronics.com on Thu, Apr 18, 2002 at 10:39:22AM -0500 References: Message-ID: <20020418101756.G26875@lyra.org> On Thu, Apr 18, 2002 at 10:39:22AM -0500, Joe Wiemann wrote: > > > How is the reset functionality coming along - has anyone had time to work on it yet.... > > I think this would be a great utility especially for those of us using this on an embedded platform where memory fragmentation is a big concern. Expat is a volunteer project. The best way to see functionality added to it is if you supply a patch :-) At a minimum, please ensure that you've added your feature request(s) to the tracker at http://sf.net/projects/expat/ Cheers, -g -- Greg Stein, http://www.lyra.org/ From JWieman@daktronics.com Thu Apr 18 11:45:12 2002 From: JWieman@daktronics.com (Joe Wiemann) Date: Thu Apr 18 10:45:12 2002 Subject: [Expat-discuss] XML_ParserReset Message-ID: I know this -- a patch has already been submitted for this >>> Greg Stein 04/18/02 12:17PM >>> On Thu, Apr 18, 2002 at 10:39:22AM -0500, Joe Wiemann wrote: > > > How is the reset functionality coming along - has anyone had time to work on it yet.... > > I think this would be a great utility especially for those of us using this on an embedded platform where memory fragmentation is a big concern. Expat is a volunteer project.
The best way to see functionality added to it is if you supply a patch :-) At a minimum, please ensure that you've added your feature request(s) to the tracker at http://sf.net/projects/expat/ Cheers, -g -- Greg Stein, http://www.lyra.org/ From dcrowley@scitegic.com Thu Apr 18 12:03:03 2002 From: dcrowley@scitegic.com (David Crowley) Date: Thu Apr 18 11:03:03 2002 Subject: [Expat-discuss] XML_ParserReset In-Reply-To: <20020418101756.G26875@lyra.org> References: Message-ID: <5.1.0.14.0.20020418110100.027d7148@pop.business.earthlink.net> I supplied the patch like 6 months ago.... At 10:17 AM 4/18/2002, Greg Stein wrote: >On Thu, Apr 18, 2002 at 10:39:22AM -0500, Joe Wiemann wrote: > > > > > > How is the reset functionality coming along - has anyone had time to > work on it yet.... > > > > I think this would be a great utility especially for those of us using > this on an embedded platform where memory fragmentation is a big concern. > >Expat is a volunteer project. The best way to see functionality added to it >is if you supply a patch :-) > >At a minimum, please ensure that you've added your feature request(s) to the >tracker at http://sf.net/projects/expat/ > >Cheers, >-g > >-- >Greg Stein, http://www.lyra.org/ > >_______________________________________________ >Expat-discuss mailing list >Expat-discuss@lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/expat-discuss Got SOAP? http://easysoap.sourceforge.net From dcrowley@scitegic.com Thu Apr 18 12:16:14 2002 From: dcrowley@scitegic.com (David Crowley) Date: Thu Apr 18 11:16:14 2002 Subject: [Expat-discuss] XML_ParserReset In-Reply-To: <5.1.0.14.0.20020418110100.027d7148@pop.business.earthlink.net> References: <20020418101756.G26875@lyra.org> Message-ID: <5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net> At 11:01 AM 4/18/2002, David Crowley wrote: >I supplied the patch like 6 months ago.... Sorry.
It was 8 months ago: http://sourceforge.net/tracker/index.php?func=detail&aid=450608&group_id=10127&atid=310127 From karl@waclawek.net Thu Apr 18 12:43:06 2002 From: karl@waclawek.net (Karl Waclawek) Date: Thu Apr 18 11:43:06 2002 Subject: [Expat-discuss] XML_ParserReset References: <20020418101756.G26875@lyra.org> <5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net> Message-ID: <002b01c1e708$cb4e3760$9e539696@citkwaclaww2k> > At 11:01 AM 4/18/2002, David Crowley wrote: > > >I supplied the patch like 6 months ago.... > > Sorry. It was 8 months ago: > > http://sourceforge.net/tracker/index.php?func=detail&aid=450608&group_id=10127&atid=310127 > Do you think the principle behind your patch could be used not to reset, but to remember the state of the parser? I am asking because I recently made a feature request regarding new functions suspend(), resume(), abort(), so that the new pull APIs can be supported. Karl From gstein@lyra.org Thu Apr 18 15:53:09 2002 From: gstein@lyra.org (Greg Stein) Date: Thu Apr 18 14:53:09 2002 Subject: [Expat-discuss] XML_ParserReset In-Reply-To: <5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net>; from dcrowley@scitegic.com on Thu, Apr 18, 2002 at 11:15:23AM -0700 References: <20020418101756.G26875@lyra.org> <5.1.0.14.0.20020418110100.027d7148@pop.business.earthlink. net> <5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net> Message-ID: <20020418144847.A28124@lyra.org> On Thu, Apr 18, 2002 at 11:15:23AM -0700, David Crowley wrote: > At 11:01 AM 4/18/2002, David Crowley wrote: > > >I supplied the patch like 6 months ago.... > > Sorry. It was 8 months ago: > > http://sourceforge.net/tracker/index.php?func=detail&aid=450608&group_id=10127&atid=310127 Cool. As long as its in the tracker, then we aren't going to forget about it. On the other hand, I think Fred is the one to bug about this one :-) Personally, I think something like ParserReset is a fine addition. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From dcrowley@scitegic.com Thu Apr 18 17:12:03 2002 From: dcrowley@scitegic.com (David Crowley) Date: Thu Apr 18 16:12:03 2002 Subject: [Expat-discuss] XML_ParserReset In-Reply-To: <002b01c1e708$cb4e3760$9e539696@citkwaclaww2k> References: <20020418101756.G26875@lyra.org> <5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net> Message-ID: <5.1.0.14.0.20020418160749.03859870@pop.business.earthlink.net> At 11:42 AM 4/18/2002, Karl Waclawek wrote: >Do you think the principle behind your patch could be used >not to reset, but to remember the state of the parser? > >I am asking because I recently made a feature request regarding >new functions suspend(), resume(), abort(), so that the new >pull APIs can be supported. > >Karl No, I don't think so. Those functions would require additional changes in very different places. My patch was more about being able to parse new documents without creating a new XMLParser object, and preserving the memory allocated across the parsing calls. It speeds up parsing multiple documents noticeably. From fdrake@acm.org Thu Apr 18 20:19:03 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu Apr 18 19:19:03 2002 Subject: [Expat-discuss] XML_ParserReset In-Reply-To: <20020418144847.A28124@lyra.org> References: <20020418101756.G26875@lyra.org> <5.1.0.14.0.20020418110100.027d7148@pop.business.earthlink.net> <5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net> <20020418144847.A28124@lyra.org> Message-ID: <15551.32349.304396.826008@grendel.zope.com> Greg Stein writes: > Cool. As long as its in the tracker, then we aren't going to forget about > it. On the other hand, I think Fred is the one to bug about this one :-) > > Personally, I think something like ParserReset is a fine addition. I definitely agree. I really want to get this and a fix for the XML_SetReturnNSTriplet() bug in the next release.
The availability of time to really review and write test cases has been the bottleneck. (The XML_SetReturnNSTriplet() fix is very valuable for Level 2 DOM builders, which is my primary interest in that issue.) -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From JWieman@daktronics.com Fri Apr 19 16:28:02 2002 From: JWieman@daktronics.com (Joe Wiemann) Date: Fri Apr 19 15:28:02 2002 Subject: [Expat-discuss] ISO-14962-1997 Message-ID: Has anyone been successful in parsing this encoding with Expat? From karl@waclawek.net Fri Apr 19 18:10:13 2002 From: karl@waclawek.net (Karl Waclawek) Date: Fri Apr 19 17:10:13 2002 Subject: [Expat-discuss] XML_ParserReset References: <20020418101756.G26875@lyra.org><5.1.0.14.0.20020418110100.027d7148@pop.business.earthlink.net><5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net><20020418144847.A28124@lyra.org> <15551.32349.304396.826008@grendel.zope.com> Message-ID: <000901c1e801$f0934840$0207a8c0@karl> > Greg Stein writes: > > Cool. As long as its in the tracker, then we aren't going to forget about > > it. On the other hand, I think Fred is the one to bug about this one :-) > > > > Personally, I think something like ParserReset is a fine addition. > > I definitely agree. I really want to get this and a fix for the > XML_SetReturnNSTriplet() bug in the next release. The availability of > time to really review and write test cases has been the bottleneck. > > (The XML_SetReturnNSTriplet() fix is very valuable for Level 2 DOM > builders, which is my primary interest in that issue.) What about the XML_UNICODE patch then? Is really nobody interested in UTF-16 output? I supplied a patch a while ago, and it seems to work for me, but I have never really subjected it to any targeted Unicode testing, so that is what it would need. Karl From fdrake@acm.org Fri Apr 19 20:27:03 2002 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri Apr 19 19:27:03 2002 Subject: [Expat-discuss] XML_ParserReset In-Reply-To: <000901c1e801$f0934840$0207a8c0@karl> References: <20020418101756.G26875@lyra.org> <5.1.0.14.0.20020418110100.027d7148@pop.business.earthlink. net> <5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net> <20020418144847.A28124@lyra.org> <15551.32349.304396.826008@grendel.zope.com> <000901c1e801$f0934840$0207a8c0@karl> Message-ID: <15552.53717.461898.43017@grendel.zope.com> Karl Waclawek writes: > What about the XML_UNICODE patch then? > Is really nobody interested in UTF-16 output? I am, but that seems a slightly lower priority. I'll try to look at it as well if I can manage enough time. > I supplied a patch a while ago, and it seems to work for me, > but I have never really subjected it to any targetted Unicode > testing, so that is what it would need. The patch is currently in two files and a chunk of preprocessor magic in the comments. Would it be possible to create a single patch against the current CVS? That would make it a lot easier for me to review. If you can summarize the specific tests you think are needed, that would really help as well; I'd like to add tests for everything that gets changed in the library if I can, to ensure we're getting the results we think we are, and to avoid regressions as maintenance continues. Any help in writing the tests would be appreciated as well. (One advantage of getting this one fixed is that the Python bindings will be able to avoid the current UTF-16 -> UTF-8 -> UTF-16 dance that happens now when the user wants Python Unicode strings instead of UTF-8; that's a lot of useless transformation that could be saved!) -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From fdrake@acm.org Fri Apr 19 20:43:02 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Fri Apr 19 19:43:02 2002 Subject: [Expat-discuss] preparing for 1.95.3 Message-ID: <15552.54686.163739.527179@grendel.zope.com> I'd like to get an Expat release out next week, and am actually making some progress on the bugs & patches. If you've submitted either a bug or patch and are comfortable working with CVS, I'd appreciate your testing what's currently in the CVS repository to see if your reports need to be revised. If you reported based on the 1.95.2 release or earlier, be aware that the build process has been revised a good bit; if your report involves the build process, it is especially important for us to know whether your reports are still valid against the current sources. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From karl@waclawek.net Fri Apr 19 21:20:02 2002 From: karl@waclawek.net (Karl Waclawek) Date: Fri Apr 19 20:20:02 2002 Subject: [Expat-discuss] XML_ParserReset References: <20020418101756.G26875@lyra.org><5.1.0.14.0.20020418110100.027d7148@pop.business.earthlink. net><5.1.0.14.0.20020418111432.02f3c5e0@pop.business.earthlink.net><20020418144847.A28124@lyra.org><15551.32349.304396.826008@grendel.zope.com><000901c1e801$f0934840$0207a8c0@karl> <15552.53717.461898.43017@grendel.zope.com> Message-ID: <003e01c1e81c$808b5db0$0207a8c0@karl> > Karl Waclawek writes: > > What about the XML_UNICODE patch then? > > Is really nobody interested in UTF-16 output? > > I am, but that seems a slightly lower priority. I'll try to look at > it as well if I can manage enough time. > > > I supplied a patch a while ago, and it seems to work for me, > > but I have never really subjected it to any targetted Unicode > > testing, so that is what it would need. > > The patch is currently in two files and a chunk of preprocessor magic > in the comments. Would it be possible to create a single patch > against the current CVS? I am not sure I understand. Do you mean the two patches I submitted? 
If yes, I can give you a diff of the combined patch (NSTriplet & XML_UNICODE) against 1.95.2. But against the current CVS - does this mean I have to somehow merge my patch in? Maybe the best approach is to look at my diff, manually re-apply to the current CVS, and send you the completed file(s)? (I believe only xmlparse.c and expat.h are affected). > That would make it a lot easier for me to > review. If you can summarize the specific tests you think are needed, > that would really help as well; I guess the best would be to run a few xml files through both compiled versions of Expat (XML_UNICODE on and off), and then use a Unicode converter on the output and check if the cross-converted files match with the ones produced by Expat? The problem is likely getting XML files with lots of characters beyond the typical western set. > I'd like to add tests for everything > that gets changed in the library if I can, to ensure we're getting the > results we think we are, and to avoid regressions as maintenance > continues. Any help in writing the tests would be appreciated as > well. Currently I am working on UTF-8 <--> UTF-16 converters for an XML writer. Should be done sometime next week. I can supply those, as a first step. But they are not written in C, I am afraid. > (One advantage of getting this one fixed is that the Python bindings > will be able to avoid the current UTF-16 -> UTF-8 -> UTF-16 dance that > happens now when the user wants Python Unicode strings instead of > UTF-8; that's a lot of useless transformation that could be saved!) Also, I think I remember someone writing about a Java wrapper using JNI. Java is natively UTF-16 too, I believe. 
Karl From karl@waclawek.net Sun Apr 21 11:09:02 2002 From: karl@waclawek.net (Karl Waclawek) Date: Sun Apr 21 10:09:02 2002 Subject: [Expat-discuss] XML_UNICODE patch Message-ID: <001d01c1e959$5fb71640$0207a8c0@karl> I have just added another XML_UNICODE fix (#546795), that should enable UTF-16 output on systems where wchar_t is a 32 bit type. That is, it should now work when XML_UNICODE is defined, but not XML_UNICODE_WCHAR_T. It also includes the recent "fix to the fix" for NSTriplets. If anybody is interested, please test. I only tested on Windows, VC++ 6.0. Karl From spainj@countryday.net Sun Apr 21 18:16:11 2002 From: spainj@countryday.net (Spain, Jeffry A.) Date: Sun Apr 21 17:16:11 2002 Subject: [Expat-discuss] Rebuilding Expat for ActivePerl Message-ID: <20E95D00890E0C4A90DD33E656361640787810@hopple.countryday.net> I'd like to rebuild expat.dll from source code with some modifications using Visual Studio for use with ActivePerl. I've tried the expat.dsw project file in C:\Perl\site\lib\XML\Parser\Expat, but it doesn't appear to build the required expat.dll that goes in C:\Perl\site\lib\auto\XML\Parser\Expat. I've also tried downloading the latest Expat Win32 version 1.95.2 from http://sourceforge.net/projects/expat/. While it does build expat.dll, the Perl module Expat.pm fails with an error: Can't find 'boot_XML__Parser__Expat' symbol in C:/Perl/site/lib/auto/XML/Parser/Expat/Expat.dll. Thank you for any suggestions as to how to proceed. From T.A.Meyer@massey.ac.nz Tue Apr 23 23:10:01 2002 From: T.A.Meyer@massey.ac.nz (Tony Meyer) Date: Tue Apr 23 22:10:01 2002 Subject: [Expat-discuss] Parsing xml files with multiple external entities Message-ID: Hi all, [Forgive me if this is a stupid question, but I've only recently started using expat and I can't find a solution in the archives]. I have an xml file that references three dtds, like this: %worldupdtd; %visiondtd; ]> When I parse this my program only parses one dtd (worldup.dtd in this case). If I move the worldup.dtd entity to within intervalscript.dtd the engine parses both, but not a third (it seems happy to process only one entity per file). Can anyone give me advice about where I'm going wrong? Cheers, Tony Meyer Massey University, Auckland, NZ

Relevant code (stripped of error processing etc):

void ProcessXMLIntervalScript(ifstream *in)
{
    XML_Parser parser = XML_ParserCreate(NULL);
    XML_SetParamEntityParsing(parser, XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE);
    XML_SetExternalEntityRefHandler(parser, extern_ent);
    XML_SetElementHandler(parser, startElement, endElement);
    XML_UseParserAsHandlerArg(parser);
    bool done = false;
    do {
        // read in chunk of xml data
        char buf[chunk_size];
        // [...]
        if (!XML_Parse(parser, buf, i, done))
            ;
    } while (!done);
    XML_ParserFree(parser);
}

int extern_ent(XML_Parser p, const XML_Char *context, const XML_Char *base,
               const XML_Char *systemId, const XML_Char *publicId)
{
    // this function based on code from:
    // http://sourceforge.net/mailarchive/forum.php?thread_id=329890&forum_id=6385
    char fname[MAX_PATH];
    if (base != NULL)
        strcpy(fname, base);
    else
        strcpy(fname, "");
    strcat(fname, systemId);
    FILE *ext = fopen(fname, "r");
    XML_Parser e;
    if ((e = XML_ExternalEntityParserCreate(p, context, NULL)) == 0)
        ;
    if (strrchr(fname, '/') != NULL) {
        char *temp = strdup(fname);
        *(strrchr(temp, '/')+1) = '\0';
        if (XML_SetBase(e, temp) == 0)
            ;
        free((char *)temp);
    }
    char buff[4096];
    int length;
    long b_index;
    while (!feof(ext)) {
        if (((length = fread(buff, sizeof(char), sizeof(buff), ext)) == 0) && (!feof(ext)))
            ;
        if (XML_Parse(e, buff, length, length == 0) == 0)
            ;
    }
    // cleanup
    XML_ParserFree(e);
    fclose(ext);
    return 42; // Successful parse
}

From karl@waclawek.net Wed Apr 24 08:23:12 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed Apr 24 07:23:12 2002 Subject: [Expat-discuss] Parsing xml files with multiple external entities References: Message-ID: <001101c1eb9b$54d52e80$9e539696@citkwaclaww2k> > Hi all, > > [Forgive me if this is a stupid question, but I've only recently started > using expat and I can't find a solution in the archives]. > > I have an xml file that references three dtds, like this: > > > %worldupdtd; > > %visiondtd; > ]> > > When I parse this my program only parses one dtd (worldup.dtd in this case). > If I move the worldup.dtd entity to within intervalscript.dtd the engine > parses both, but not a third (it seems happy to process only one entity per > file). > > Can anyone give me advice about where I'm going wrong? Which version of Expat did you use?
I had a different experience, even worse than yours: In my tests it seemed that Expat ignored External Parameter Entity declarations in the DTD (internal or external subset). Meaning, the entity declaration handler is never called. However, Expat will report the PE reference. Now, the XML spec says that a PE reference need only be included if the XML parser is validating, so, strictly speaking, I could simply ignore PE references for which no declaration was reported. Still, I find it strange that Expat does not report entity declarations when it reports the associated references. Is this a bug? Karl From karl@waclawek.net Wed Apr 24 08:47:13 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed Apr 24 07:47:13 2002 Subject: [Expat-discuss] Parsing xml files with multiple external entities References: <001101c1eb9b$54d52e80$9e539696@citkwaclaww2k> Message-ID: <001e01c1eb9e$bfb8bd40$9e539696@citkwaclaww2k> > Now, the XML spec says that a PE reference need only be included > if the XML parser is validating, so, strictly speaking, I could > simply ignore PE references for which no declaration was reported. > > Still, I find it strange, that Expat does not report entity > declarations when it reports the associated references. > > Is this a bug? I forgot to add: Expat reports these PE references in the *external* entity reference handler. Without having reported the declarations, this just doesn't make sense, does it?
Karl From karl@waclawek.net Wed Apr 24 10:55:10 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed Apr 24 09:55:10 2002 Subject: [Expat-discuss] Parsing xml files with multiple external entities References: <001101c1eb9b$54d52e80$9e539696@citkwaclaww2k> <001e01c1eb9e$bfb8bd40$9e539696@citkwaclaww2k> Message-ID: <002f01c1ebb0$ad58bd50$9e539696@citkwaclaww2k> > > Now, the XML spec says that a PE reference need only be included > > if the XML parser is validating, so, strictly speaking, I could > > simply ignore PE references for which no declaration was reported. > > > > Still, I find it strange, that Expat does not report entity > > declarations when it reports the associated references. > > > > Is this a bug? > > I forgot to add: Expat reports these PE references in the > *external* entity reference handler. > Without having reported the declarations, this just doesn't make sense, does it? Tony, Even if it is not a real bug (not sure), would you please add it as a bug with Group = Feature Request? I whipped up a quick fix, that has worked for me so far. Basically, what I did is I followed the chain of state handlers for general entity declarations and parameter entity declarations, and where the state handler for parameter entity declarations took a short cut, I added similar code as for the general entity declarations. I will upload the patch for whoever wants to check it out. (it is based on the latest revisions of xmlparse.c and xmlrole.c) Karl From Josh.Martin@abq.sc.philips.com Wed Apr 24 12:04:23 2002 From: Josh.Martin@abq.sc.philips.com (Josh Martin) Date: Wed Apr 24 11:04:23 2002 Subject: [Expat-discuss] Parsing xml files with multiple external entities Message-ID: <200204241803.MAA07566@abqn42.abq.sc.philips.com> > Hi all, > > [Forgive me if this is a stupid question, but I've only recently started > using expat and I can't find a solution in the archives]. 
> > I have an xml file that references three dtds, like this: > > > %worldupdtd; > > %visiondtd; > ]> > > When I parse this my program only parses one dtd (worldup.dtd in this case). > If I move the worldup.dtd entity to within intervalscript.dtd the engine > parses both, but not a third (it seems happy to process only one entity per > file). > > Can anyone give me advice about where I'm going wrong? > > Cheers, > Tony Meyer > Massey University, > Auckland, NZ I posted this same type of problem almost exactly a year ago, where I could not get external entity declarations within multiple/nested DTD's to parse correctly with 1.95.2. If a resolution was made about this problem, I do not remember what it was. If you report this as a bug then maybe something can be done this time around. - Josh Martin From karl@waclawek.net Wed Apr 24 12:21:17 2002 From: karl@waclawek.net (Karl Waclawek) Date: Wed Apr 24 11:21:17 2002 Subject: [Expat-discuss] Parsing xml files with multiple external entities References: <200204241803.MAA07566@abqn42.abq.sc.philips.com> Message-ID: <000501c1ebbc$b5692370$9e539696@citkwaclaww2k> > > Hi all, > > > > [Forgive me if this is a stupid question, but I've only recently started > > using expat and I can't find a solution in the archives]. > > > > I have an xml file that references three dtds, like this: > > > > > > > %worldupdtd; > > > > %visiondtd; > > ]> > I posted this same type of problem almost exactly a year ago, where I could not > get external entity declarations within multiple/nested DTD's to parse correctly > with 1.95.2. If a resolution was made about this problem, I do not remember what > it was. If you report this as a bug then maybe something can be done this time > around. I posted this (or a related?) bug under the bug # 544679. Could you please review - and add, if something is missing? Karl From fdrake@acm.org Wed Apr 24 19:34:04 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Wed Apr 24 18:34:04 2002 Subject: [Expat-discuss] Parsing xml files with multiple external entities In-Reply-To: <002f01c1ebb0$ad58bd50$9e539696@citkwaclaww2k> References: <001101c1eb9b$54d52e80$9e539696@citkwaclaww2k> <001e01c1eb9e$bfb8bd40$9e539696@citkwaclaww2k> <002f01c1ebb0$ad58bd50$9e539696@citkwaclaww2k> Message-ID: <15559.23799.529581.542516@grendel.zope.com> Karl Waclawek writes: > Even if it is not a real bug (not sure), would you please > add it as a bug with Group = Feature Request? I think this is a bug. Expat generally tries to report useful information, and reports quite a bit more than is required of a non-validating parser. > I whipped up a quick fix, that has worked for me so far. > Basically, what I did is I followed the chain of state handlers > for general entity declarations and parameter entity declarations, > and where the state handler for parameter entity declarations > took a short cut, I added similar code as for the general > entity declarations. Sounds good to me. I'll work up a regression test for this based on Tony's email. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From karl@waclawek.net Thu Apr 25 08:34:09 2002 From: karl@waclawek.net (Karl Waclawek) Date: Thu Apr 25 07:34:09 2002 Subject: [Expat-discuss] Undefined entity error Message-ID: <001a01c1ec66$0c93d9c0$9e539696@citkwaclaww2k> I came across the following behaviour, given this document: %worldupdtd; %visiondtd; ]> some text Expat will report an "undefined entity" fatal error for the reference %visiondtd;. 
However, the XML spec says this (look at the tag: Well-formedness constraint: Entity Declared In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", for an entity reference that does not occur within the external subset or a parameter entity, the Name given in the entity reference must match that in an entity declaration that does not occur within the external subset or a parameter entity, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a general entity must precede any reference to it which appears in a default value in an attribute-list declaration. Note that if entities are declared in the external subset or in external parameter entities, a non-validating processor is not obligated to read and process their declarations; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'. Validity constraint: Entity Declared In a document with an external subset or external parameter entities with "standalone='no'", the Name given in the entity reference must match that in an entity declaration. For interoperability, valid documents should declare the entities amp, lt, gt, apos, quot, in the form specified in 4.6 Predefined Entities. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any attribute-list declaration containing a default value with a direct or indirect reference to that general entity. Since Expat is not validating, the emphasized section should apply. So, does anybody agree that this is a bug and Expat should not report an error? If yes, how should Expat report the entity then? In the default handler? Karl From fdrake@acm.org Thu Apr 25 09:11:28 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu Apr 25 08:11:28 2002 Subject: [Expat-discuss] Undefined entity error In-Reply-To: <001a01c1ec66$0c93d9c0$9e539696@citkwaclaww2k> References: <001a01c1ec66$0c93d9c0$9e539696@citkwaclaww2k> Message-ID: <15560.7279.749741.899530@grendel.zope.com> Karl Waclawek writes: > Since Expat is not validating, the emphasized section should apply. > So, does anybody agree that this is a bug and Expat should not report an error? Agreed. > If yes, how should Expat report the entity then? In the default handler? That seems reasonable, given that none of the existing callbacks make sense for it. I'd like to have a way to separate this event from the default handler by setting a new callback, but that can be a 1.95.4 feature. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From karl@waclawek.net Thu Apr 25 10:33:38 2002 From: karl@waclawek.net (Karl Waclawek) Date: Thu Apr 25 09:33:38 2002 Subject: [Expat-discuss] Undefined entity error References: <001a01c1ec66$0c93d9c0$9e539696@citkwaclaww2k> <15560.7279.749741.899530@grendel.zope.com> Message-ID: <002e01c1ec76$bfa0b140$9e539696@citkwaclaww2k> > Karl Waclawek writes: > > Since Expat is not validating, the emphasized section should apply. > > So, does anybody agree that this is a bug and Expat should not report an error? > > Agreed. > > > If yes, how should Expat report the entity then? In the default handler? > > That seems reasonable, given that none of the existing callbacks make > sense for it. I'd like to have a way to separate this event from the > default handler by setting a new callback, but that can be a 1.95.4 > feature. Probably a callback similar to the SAX "SkippedEntity" callback. That would actually encompass more than this specific situation. In my SAX adapter I am generating these events from either the default handler or the external entity ref handler (depending on the value of certain SAX features). I'll log a bug report then. 
Karl From fdrake@acm.org Thu Apr 25 10:38:05 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu Apr 25 09:38:05 2002 Subject: [Expat-discuss] Undefined entity error In-Reply-To: <002e01c1ec76$bfa0b140$9e539696@citkwaclaww2k> References: <001a01c1ec66$0c93d9c0$9e539696@citkwaclaww2k> <15560.7279.749741.899530@grendel.zope.com> <002e01c1ec76$bfa0b140$9e539696@citkwaclaww2k> Message-ID: <15560.12471.184005.334531@grendel.zope.com> Karl Waclawek writes: > Probably a callback similar to the SAX "SkippedEntity" callback. > That would actually encompass more than this specific situation. I like that idea. > In my SAX adapter I am generating these events from either > the default handler or the external entity ref handler > (depending on the value of certain SAX features). > > I'll log a bug report then. Good. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From karl@waclawek.net Thu Apr 25 14:36:56 2002 From: karl@waclawek.net (Karl Waclawek) Date: Thu Apr 25 13:36:56 2002 Subject: [Expat-discuss] Undefined entity error References: <001a01c1ec66$0c93d9c0$9e539696@citkwaclaww2k> <15560.7279.749741.899530@grendel.zope.com> Message-ID: <005201c1ec98$a3c1dd60$9e539696@citkwaclaww2k> > Karl Waclawek writes: > > Since Expat is not validating, the emphasized section should apply. > > So, does anybody agree that this is a bug and Expat should not report an error? > > Agreed. > > > If yes, how should Expat report the entity then? In the default handler? > > That seems reasonable, given that none of the existing callbacks make > sense for it. I'd like to have a way to separate this event from the > default handler by setting a new callback, but that can be a 1.95.4 > feature. I have entered a patch (#548786) for this problem. 
Karl From cpercival@interaxis.co.uk Fri Apr 26 08:54:42 2002 From: cpercival@interaxis.co.uk (Chris Percival) Date: Fri Apr 26 07:54:42 2002 Subject: [Expat-discuss] Data misalignment on PocketPC platform Message-ID: I have compiled Expat to run on a PocketPC device with an ARM processor (with the XML_UNICODE #def). However certain bits of code can produce a Datatype Misalignment exception, for example line 317 of xmltok.c (*to++ = *from++;), as I guess the processor can't do that operation if the address is not word aligned? So my options are: 1. To find a version that has this fixed (or fix it myself using the __unaligned macro maybe?). or 2. To not use the XML_UNICODE #def and hope the problem goes away (and do the unicode to ascii conversion myself). Any comments/suggestions welcome. Thanks for your time. Chris From karl@waclawek.net Fri Apr 26 10:29:11 2002 From: karl@waclawek.net (Karl Waclawek) Date: Fri Apr 26 09:29:11 2002 Subject: [Expat-discuss] Data misalignment on PocketPC platform References: Message-ID: <001501c1ed3f$545ae490$9e539696@citkwaclaww2k> > I have compiled Expat to run on a PocketPC device with an ARM processor > (with the XML_UNICODE #def). However certain bits of code can produce a > Datatype Misalignment exception, for example line 317 of xmltok.c (*to++ = > *from++;), as I guess the processor can't do that operation if the address > is not word aligned? > > So my options are: > > 1. To find a version that has this fixed (or fix it myself using the > __unaligned macro maybe?). > or > 2. To not use the XML_UNICODE #def and hope the problem goes away (and do > the unicode to ascii conversion myself). > Which version are you using? The XML_UNICODE def does not work in the current release 1.95.2. A fix has been checked in - you need expat.h rev. >= 1.16, and xmlparse.c rev. >= 1.29. 
Karl From tim.crook@adobe.com Fri Apr 26 15:15:02 2002 From: tim.crook@adobe.com (Tim Crook) Date: Fri Apr 26 14:15:02 2002 Subject: [Expat-discuss] Question about unknown encoding callback... Message-ID: <311000B0752ED211B61700805F0D6B0902FDBFF8@ottmail3.jetform.com> When the convert function is called, is the buffer that is passed only guaranteed to have one character in it? The buffer is not NUL terminated, right? _________________________________________ Tim Crook Software Developer Adobe Corporation > 560 Rochester Street > Ottawa, Ontario > Canada K1S 5K2 > Phone: +1 613.751.4800 Ext 5734 Fax: +1 613.594.8886 E-mail: tim.crook@adobe.com From cpercival@interaxis.co.uk Mon Apr 29 03:55:03 2002 From: cpercival@interaxis.co.uk (Chris Percival) Date: Mon Apr 29 02:55:03 2002 Subject: [Expat-discuss] Re: Data misalignment on PocketPC platform Message-ID: >> I have compiled Expat to run on a PocketPC device with an ARM processor >> (with the XML_UNICODE #def). However certain bits of code can produce a >> Datatype Misalignment exception, for example line 317 of xmltok.c (*to++ = >> *from++;), as I guess the processor can't do that operation if the address >> is not word aligned? >> >> So my options are: >> >> 1. To find a version that has this fixed (or fix it myself using the >> __unaligned macro maybe?). >> or >> 2. To not use the XML_UNICODE #def and hope the problem goes away (and do >> the unicode to ascii conversion myself). >> > > Which version are you using? > The XML_UNICODE def does not work in the current release 1.95.2. > A fix has been checked in - you need expat.h rev. >= 1.16, and > xmlparse.c rev. >= 1.29. > > Karl I am using 1.95.2. I have found the files you suggest in patch #546795 and applied them, which seems to clear up the problem. Thank you for your help.
Chris From cpercival@interaxis.co.uk Mon Apr 29 10:15:03 2002 From: cpercival@interaxis.co.uk (Chris Percival) Date: Mon Apr 29 09:15:03 2002 Subject: [Expat-discuss] Re: Data misalignment on PocketPC platform In-Reply-To: Message-ID: >> I have compiled Expat to run on a PocketPC device with an ARM processor >> (with the XML_UNICODE #def). However certain bits of code can produce a >> Datatype Misalignment exception, for example line 317 of xmltok.c (*to++ = >> *from++;), as I guess the processor can't do that operation if the address >> is not word aligned? >> >> So my options are: >> >> 1. To find a version that has this fixed (or fix it myself using the >> __unaligned macro maybe?). >> or >> 2. To not use the XML_UNICODE #def and hope the problem goes away (and do >> the unicode to ascii conversion myself). >> > > Which version are you using? > The XML_UNICODE def does not work in the current release 1.95.2. > A fix has been checked in - you need expat.h rev. >= 1.16, and > xmlparse.c rev. >= 1.29. > > Karl Unfortunately, although using these files fixed the problem in a debug build, the problem still exists in a release build. So, back to my original request. Anybody got any more thoughts? Anybody else built for an ARM processor? Anybody else get these problems? Chris
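For readers who hit the same Datatype Misalignment exception, here is a minimal sketch of the general technique for moving 16-bit units through a pointer that may sit at an odd address. It assumes the fault comes from a 16-bit load or store on an address that is not 2-byte aligned, which the thread does not confirm. The helper name copy_u16_unaligned is hypothetical and is not part of Expat; only memcpy and the standard fixed-width types are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT part of Expat: copy `count` 16-bit units from a
   byte buffer that may start at an odd address into an aligned array.
   memcpy moves the bytes without assuming any alignment of `from`, so it
   avoids the direct 16-bit load that strict-alignment CPUs reject. */
static void copy_u16_unaligned(uint16_t *to, const char *from, size_t count)
{
    size_t i;

    assert(to != NULL && from != NULL);

    for (i = 0; i < count; i++) {
        uint16_t unit;
        /* Byte-wise load of one unit from a possibly misaligned address. */
        memcpy(&unit, from + 2 * i, sizeof unit);
        /* `to` is a properly aligned uint16_t array, so this store is safe. */
        to[i] = unit;
    }
}
```

The design choice here is to keep the destination aligned and treat only the source as raw bytes. Compilers usually turn the small memcpy into whatever load sequence the target allows, so the cost is low. The __unaligned qualifier mentioned in the thread is a compiler-specific alternative; the memcpy form is portable C.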