From karl@waclawek.net Thu Nov 1 06:21:10 2001
From: karl@waclawek.net (Karl Waclawek)
Date: Thu Nov 1 06:21:10 2001
Subject: [Expat-discuss] SAX2 Lexicalhandler and Expat
Message-ID: <002301c162e0$5666ab50$9e539696@citkwaclaww2k>

I am writing a SAX2 wrapper for Expat (in Delphi), but I have trouble
figuring out how to derive the StartEntity and EndEntity events in the
LexicalHandler interface from any of the Expat callbacks (I am using
Expat 1.95.2 on Win32).

I would be grateful for any advice,

Karl

From anishsaik@yahoo.com Fri Nov 2 07:13:16 2001
From: anishsaik@yahoo.com (Anish Sai)
Date: Fri Nov 2 07:13:16 2001
Subject: [Expat-discuss] (no subject)
Message-ID: <20011102151229.94169.qmail@web21108.mail.yahoo.com>

Hi
I am trying to parse XML documents in Japanese that
have encoding "Shift-JIS" using Expat. I know that
expat has no built in support for this encoding.
1. Are there plans to enhance Expat to support more
encodings?
2. If I have to write my own handler are there any
code examples that can guide me. Since expat passes
strings to the handler in UTF-8 format, I guess I must
write routines that map the strings from Shift-JIS to
UTF8.
Any body having any insights of handling foreign XML
files, please respond.
With Regards
Anish.

__________________________________________________
Do You Yahoo!?
Find a job, post your resume.
http://careers.yahoo.com

From anishsaik@yahoo.com Fri Nov 2 07:17:02 2001
From: anishsaik@yahoo.com (Anish Sai)
Date: Fri Nov 2 07:17:02 2001
Subject: [Expat-discuss] Handling Japanese XML files using Expat
Message-ID: <20011102151608.57260.qmail@web21103.mail.yahoo.com>

Hi
I am trying to parse XML documents in Japanese that
have encoding "Shift-JIS" using Expat. I know that
expat has no built in support for this encoding.
1. Are there plans to enhance Expat to support more
encodings?
2. If I have to write my own handler are there any
code examples that can guide me.
Since expat passes strings to the handler in UTF-8 format, I guess I must
write routines that map the strings from Shift-JIS to UTF8.
Any body having any insights of handling foreign XML
files, please respond.
With Regards
Anish.

__________________________________________________
Do You Yahoo!?
Find a job, post your resume.
http://careers.yahoo.com

From patrick@meer.net Fri Nov 2 09:32:03 2001
From: patrick@meer.net (Patrick McCormick)
Date: Fri Nov 2 09:32:03 2001
Subject: [Expat-discuss] Handling Japanese XML files using Expat
References: <20011102151608.57260.qmail@web21103.mail.yahoo.com>
Message-ID: <004f01c163c3$fe10c910$a39d9dd1@CG479672a>

look at the XML::Parser package that comes with perl; they include several
Shift-JIS encoding files. you can probably adapt their approach to work
with an expat-only application. (coopercc, one of the current expat
maintainers, is also the XML::Parser maintainer)

http://wwwx.netheaven.com/~coopercc/xmlparser/intro.html

----- Original Message -----
From: "Anish Sai"
To:
Sent: Friday, November 02, 2001 7:16 AM
Subject: [Expat-discuss] Handling Japanese XML files using Expat

> Hi
> I am trying to parse XML documents in Japanese that
> have encoding "Shift-JIS" using Expat. I know that
> expat has no built in support for this encoding.
> 1. Are there plans to enhance Expat to support more
> encodings?
> 2. If I have to write my own handler are there any
> code examples that can guide me. Since expat passes
> strings to the handler in UTF-8 format, I guess I must
> write routines that map the strings from Shift-JIS to
> UTF8.
> Any body having any insights of handling foreign XML
> files, please respond.
> With Regards
> Anish.
>
>
> __________________________________________________
> Do You Yahoo!?
> Find a job, post your resume.
> http://careers.yahoo.com
>
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/expat-discuss
>

From fdrake@acm.org Fri Nov 2 14:27:02 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri Nov 2 14:27:02 2001
Subject: [Expat-discuss] does expat detect illegal utf-8 sequences?
In-Reply-To: <04d801c161a5$0937e240$a39d9dd1@CG479672a>
References: <04d801c161a5$0937e240$a39d9dd1@CG479672a>
Message-ID: <15331.7038.133880.780363@grendel.zope.com>

Patrick McCormick writes:
> I have a problem where users like to use iso-8859-1 without declaring it in
> the prolog, like this:
...
> It's entirely possible that I am not understanding utf-8 properly - can
> someone explain what supposed to happen with the document above?

It's also entirely possible this is a bug in Expat. ;-( Could you
file a formal bug report on SourceForge? I don't want to lose track
of this.

http://sourceforge.net/projects/expat/

Thanks for reporting this!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From patrick@meer.net Fri Nov 2 15:02:06 2001
From: patrick@meer.net (Patrick McCormick)
Date: Fri Nov 2 15:02:06 2001
Subject: [Expat-discuss] does expat detect illegal utf-8 sequences?
References: <04d801c161a5$0937e240$a39d9dd1@CG479672a> <15331.7038.133880.780363@grendel.zope.com>
Message-ID: <01d501c163f2$1fcd7ac0$c49d9dd1@CG479672a>

I did some research into what UTF-8 sequences are illegal and not and
wrote a patch. details are in bug #477667:

http://sourceforge.net/tracker/index.php?func=detail&aid=477667&group_id=10127&atid=110127

--Patrick

----- Original Message -----
From: "Fred L. Drake, Jr."
To: "Patrick McCormick"
Cc:
Sent: Friday, November 02, 2001 2:17 PM
Subject: Re: [Expat-discuss] does expat detect illegal utf-8 sequences?

>
> Patrick McCormick writes:
> > I have a problem where users like to use iso-8859-1 without declaring it in
> > the prolog, like this:
> ...
> > It's entirely possible that I am not understanding utf-8 properly - can
> > someone explain what supposed to happen with the document above?
>
> It's also entirely possible this is a bug in Expat. ;-( Could you
> file a formal bug report on SourceForge? I don't want to lose track
> of this.
>
> http://sourceforge.net/projects/expat/
>
> Thanks for reporting this!
>
>
> -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Zope Corporation
>

From g_expat@zewt.org Fri Nov 2 22:54:02 2001
From: g_expat@zewt.org (Glenn Maynard)
Date: Fri Nov 2 22:54:02 2001
Subject: [Expat-discuss] entity expansion
Message-ID: <20011103015343.B15875@zewt.org>

Well, this seems like an extremely simple thing, so I feel like I'm
missing something obvious. But I can't find it (through far too much
code tracing), so here we go: how to disable expansion of entities?

There's a call to disable parameter entities, and stuff for external
entities; the XML document I'm parsing has an internal DTD, however, and
I don't want entities expanded.

It seems like something every XML library would have, so I feel like I'm
just missing an obvious API call; I hate posting seemingly obvious
questions to lists, but I've spent far too much time on this. :)

(Not as much time as I spent trying to get libxml to handle entities
reasonably with its SAX interface; their response was along the lines of
"entities with the SAX interface are hard, either use DOM or don't use
entities". No, I don't think I'm going to load the 20 meg JMdict with
DOM ...)

--
Glenn Maynard

From karl@waclawek.net Mon Nov 5 06:49:10 2001
From: karl@waclawek.net (Karl Waclawek)
Date: Mon Nov 5 06:49:10 2001
Subject: [Expat-discuss] Bug? Illegal parameter reference error for valid document
Message-ID: <002001c16608$ee7178f0$9e539696@citkwaclaww2k>

I ran Expat 1.95.2 against James Clark's test cases version 1998-11-18.
There are two files in the directory \valid\not-sa, which are supposed
to be valid, but Expat returns an "illegal parameter entity reference"
error. Is this a bug, or are the test cases outdated, or am I doing
something wrong?

Here is how it looks for the first file:

File 004.xml:

File 004-1.ent:
--> Expat does not seem to like "%e1;"
%e1;

File 0004-2.ent:

And the second file:

File 003.xml:

File 003-1.ent:
--> Expat does not seem to like %e

File 003-2.ent: empty file

Is it possible that I am doing something wrong, maybe
in calling the external entity reference handler?

Karl

From Josh.Martin@abq.sc.philips.com Tue Nov 6 11:39:14 2001
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Tue Nov 6 11:39:14 2001
Subject: [Expat-discuss] Bug? Illegal parameter reference error for valid document
Message-ID: <200111061938.MAA06584@abqn42.abq.sc.philips.com>

Hi,

I was having a lot of problems with nested external entities. I would assume
that your problem is the same as the problem that I was having. When you parse
your XML document and external entities you need to either read the file in one
byte at a time, or you need to read in one line at a time (stopping at the
newline character). The reason for this is that if you read past the newline
you might pick up more than one entity declaration in your buffer. After you
come back from your external entity handler you will start reading after the end
of the buffer and lose the extra information. This will most likely cause you
to miss some of the declarations. For example take the entity file "004-1.ent":

1:
2:
3: --> Expat does not seem to like "%e1;"
4: %e1;

If you use a large buffer and just read in bytes until you fill the buffer and
then parse it, you will probably load the whole file into the buffer. When you
parse the buffer the external entity handler will be called on the second line
(which contains the 'e1' entity declaration).
When you return from the external entity parser and begin parsing again the
parser will start at the end of the file, skipping lines 3 and 4. If you parse
the file either one byte at a time, or one line at a time then all of the
entity declarations will be parsed correctly.

Also, make sure that you have something like the following in your main parser
loop, otherwise you might never parse all the entities that you need to parse.

    XML_SetParamEntityParsing(p, XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE);

Personally I think this behavior of expat is flawed, since other handlers seem
to be called multiple times just fine if there are multiple instances in the
buffer. However, the fact that the external entity handler spawns off an
entirely new parser probably throws a monkey wrench into the whole deal.
Meanwhile you can just use the work-around that I mentioned and it should all
work just fine. Let me know if this fixes your problem, or if you find another
solution.

- Josh Martin

> From: "Karl Waclawek"
> To:
> Cc:
> X-Original-Date: Mon, 5 Nov 2001 09:48:28 -0500
> Date: Mon, 5 Nov 2001 09:48:28 -0500
>
>
> I ran Expat 1.95.2 against James Clark's test cases version 1998-11-18.
> There are two files in the directory \valid\not-sa, which are supposed
> to be valid, but Expat returns an "illegal parameter entity reference"
> error. Is this a bug, or are the test cases outdated, or am I doing something wrong?
>
> Here is how it looks for the first file:
>
> File 004.xml:
>
>
>
> File 004-1.ent:
>
>
> --> Expat does not seem to like "%e1;"
> %e1;
>
> File 0004-2.ent:
>
>
> And the second file:
>
> File 003.xml:
>
>
>
> File 003-1.ent:
>
>
> --> Expat does not seem to like %e
>
> File 003-2.ent: empty file
>
> Is it possible that I am doing something wrong, maybe
> in calling the external entity reference handler?
>
> Karl
>
>
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/expat-discuss

From karl@waclawek.net Tue Nov 6 12:25:19 2001
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Nov 6 12:25:19 2001
Subject: [Expat-discuss] Bug? Illegal parameter reference error for valid document
References: <200111061938.MAA06584@abqn42.abq.sc.philips.com>
Message-ID: <004b01c16701$11c3d700$9e539696@citkwaclaww2k>

> Hi,
>
> I was having a lot of problems with nested external entities. I would assume
> that your problem is the same as the problem that I was having. When you parse
> your XML document and external entities you need to either read the file in one
> byte at a time, or you need to read in one line at a time (stopping at the
> newline character). The reason for this is that if you read past the newline
> you might pick up more than one entity declaration in your buffer. After you
> come back from your external entity handler you will start reading after the end
> of the buffer and lose the extra information. This will most likely cause you
> to miss some of the declarations. For example take the entity file "004-1.ent":

This does not sound like proper behaviour. Is this a documented bug?

> 1:
> 2:
> 3: --> Expat does not seem to like "%e1;"
> 4: %e1;
>
> If you use a large buffer and just read in bytes until you fill the buffer and
> then parse it, you will probably load the whole file into the buffer. When you
> parse the buffer the external entity handler will be called on the second line
> (which contains the 'e1' entity declaration). When you return from the external
> entity parser and begin parsing again the parser will start at the end of the
> file, skipping lines 3 and 4. If you parse the file either one byte at a time,
> or one line at a time then all of the entity declarations will be parsed
> correctly.

I just tried it with a one-byte buffer, but got the same problem (file 004.xml).

> Also, make sure that you have something like the following in your main parser
> loop, otherwise you might never parse all the entities that you need to parse.
>
> XML_SetParamEntityParsing(p, XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE);

I use that one: XML_PARAM_ENTITY_PARSING_ALWAYS.

> Personally I think this behavior of expat is flawed, since other handlers seem
> to be called multiple times just fine if there are multiple instances in the
> buffer. However, the fact that the external entity handler spawns off an
> entirely new parser probably throws a monkey wrench into the whole deal.
> Meanwhile you can just use the work-around that I mentioned and it should all
> work just fine. Let me know if this fixes your problem, or if you find another
> solution.

To be honest, so far Expat's behaviour seems consistent and independent of
buffer size. Can you give an example that behaves differently with different
buffer sizes?

Karl

From Josh.Martin@abq.sc.philips.com Tue Nov 6 16:35:02 2001
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Tue Nov 6 16:35:02 2001
Subject: [Expat-discuss] Bug? Illegal parameter reference error for valid document
Message-ID: <200111070032.RAA06709@abqn42.abq.sc.philips.com>

Karl,

I'm sorry if I was vague, but you misunderstood me. It's not the size of the
buffer per se that matters, it's how much of the file you read in at a time and
send to the parser. Let me see if I can make this clearer.

Let's say we're parsing "test.xml":

My name is &bob.

, "test.dtd":

%TESTent;

and "test.ent":

and we have an external entity parser that starts out like this:

    int extern_ent(XML_Parser p, const XML_Char *context, const XML_Char *base,
                   const XML_Char *systemId, const XML_Char *publicId)
    {
        XML_Parser e;
        unsigned char buff[4096];
        FILE *ext;
        int length;
        char fname[1024];

        /* Open the file FNAME and assign it to the file stream EXT. */
        . . .
        /* Create a new external entity parser and assign it to E. */
        . . .

Now we might be tempted to read in the file one buffer-sized chunk at a time,
in which case the external entity parsing loop would look like this (ignoring
errors for clarity):

        // Parse until the file is done
        while (!feof(ext))
        {
            // Read one buff-sized chunk of the file into BUFF and store the number
            // of characters successfully read into LENGTH.
            length = fread(buff, sizeof(buff), 1, ext);
            // Parse the contents of BUFF.
            XML_Parse(e, buff, length, length == 0);
        }

You would think this should work. When the external parser is told to parse the
file "test.dtd" it reads the entire file into the buffer and parses it. It then
sees an external reference to the file "test.ent" and spawns off another
external parser to parse it. However, when that parser returns and parsing of
"test.dtd" continues it discards the rest of the XML stored in the buffer and
reads in a new buffer (which is empty because the end of the file has been
reached). This causes the entity declaration for 'bob' to go unparsed, and when
the main parser encounters the XML containing '&bob' it will generate an error.

To get around this problem instead of reading the file in one buffer-sized chunk
at a time, read it in one line at a time, that way when the external parser for
"test.ent" is done the next buffer that will be parsed by the external parser
for "test.dtd" will be the line containing the entity declaration for 'bob'.

If we do this the external parsing loop will look something like this (again,
ignoring errors for clarity):

        // Parse until the file is done
        while (!feof(ext))
        {
            // Read sizeof(buff) number of characters from the file into BUFF
            // until a new-line character is read or EOF is encountered
            // and store the number of characters successfully read into LENGTH.
            length = strlen(fgets(buff, sizeof(buff), ext));
            // Parse the contents of BUFF.
            XML_Parse(e, buff, length, length == 0);
        }

One could also parse one character of the file at a time (possibly using
fgetc(3S)) and be able to catch all of the external references, but this would
probably be less efficient.

I hope this is a little clearer this time.

- Josh Martin

> > Hi,
> >
> > I was having a lot of problems with nested external entities. I would assume
> > that your problem is the same as the problem that I was having. When you parse
> > your XML document and external entities you need to either read the file in one
> > byte at a time, or you need to read in one line at a time (stopping at the
> > newline character). The reason for this is that if you read past the newline
> > you might pick up more than one entity declaration in your buffer. After you
> > come back from your external entity handler you will start reading after the end
> > of the buffer and lose the extra information. This will most likely cause you
> > to miss some of the declarations. For example take the entity file "004-1.ent":
>
> This does not sound like proper behaviour. Is this a documented bug?
>
> > 1:
> > 2:
> > 3: --> Expat does not seem to like "%e1;"
> > 4: %e1;
> >
> > If you use a large buffer and just read in bytes until you fill the buffer and
> > then parse it, you will probably load the whole file into the buffer. When you
> > parse the buffer the external entity handler will be called on the second line
> > (which contains the 'e1' entity declaration). When you return from the external
> > entity parser and begin parsing again the parser will start at the end of the
> > file, skipping lines 3 and 4. If you parse the file either one byte at a time,
> > or one line at a time then all of the entity declarations will be parsed
> > correctly.
>
> I just tried it with a one-byte buffer, but got the same problem (file 004.xml).
>
> > Also, make sure that you have something like the following in your main parser
> > loop, otherwise you might never parse all the entities that you need to parse.
> >
> > XML_SetParamEntityParsing(p, XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE);
>
> I use that one: XML_PARAM_ENTITY_PARSING_ALWAYS.
>
> > Personally I think this behavior of expat is flawed, since other handlers seem
> > to be called multiple times just fine if there are multiple instances in the
> > buffer. However, the fact that the external entity handler spawns off an
> > entirely new parser probably throws a monkey wrench into the whole deal.
> > Meanwhile you can just use the work-around that I mentioned and it should all
> > work just fine. Let me know if this fixes your problem, or if you find another
> > solution.
>
> To be honest, so far Expat's behaviour seems consistent and independent of buffer size.
> Can you give an example that behaves differently with different buffer sizes?
>
> Karl
>

From karl@waclawek.net Tue Nov 6 17:50:03 2001
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Nov 6 17:50:03 2001
Subject: [Expat-discuss] Bug? Illegal parameter reference error for valid document
References: <200111070032.RAA06709@abqn42.abq.sc.philips.com>
Message-ID: <003a01c1672f$9c2c5740$0207a8c0@karl>

> Karl,
>
> I'm sorry if I was vague, but you misunderstood me. It's not the size of the
> buffer per se that matters, it's how much of the file you read in at a time and
> send to the parser. Let me see if I can make this clearer.

I think I understand you, but when I set the buffer size to 1,
I automatically read one byte at a time and send one byte at a time
to the parser.

>
> Let's say we're parsing "test.xml":
>
>
>
> My name is &bob.
>
> , "test.dtd":
>
> %TESTent;
>
>
>
> and "test.ent":
>
>

some code

> You would think this should work. When the external parser is told to parse the
> file "test.dtd" it reads the entire file into the buffer and parses it.
> It then sees an external reference to the file "test.ent" and spawns off another
> external parser to parse it. However, when that parser returns and parsing of
> "test.dtd" continues it discards the rest of the XML stored in the buffer and
> reads in a new buffer (which is empty because the end of the file has been
> reached). This causes the entity declaration for 'bob' to go unparsed, and when
> the main parser encounters the XML containing '&bob' it will generate an error.

Actually, your example above works for me. I replaced the dot after &bob.
with a semi-colon, renamed the DOCTYPE name to "thing", and got an
error free parse with a buffer size of 16KByte.

>
> To get around this problem instead of reading the file in one buffer-sized chunk
> at a time, read it in one line at a time, that way when the external parser for
> "test.ent" is done the next buffer that will be parsed by the external parser
> for "test.dtd" will be the line containing the entity declaration for 'bob'.
>
> If we do this the external parsing loop will look something like this (again,
> ignoring errors for clarity):
>
> // Parse until the file is done
> while (!feof(ext))
> {
> // Read sizeof(buff) number of characters from the file into BUFF
> // until a new-line character is read or EOF is encountered
> // and store the number of characters successfully read into LENGTH.
> length = strlen(fgets(buff, sizeof(buff), ext));
> // Parse the contents of BUFF.
> XML_Parse(e, buff, length, length == 0);
> }
>
> One could also parse one character of the file at a time (possibly using
> fgetc(3S)) and be able to catch all of the external references, but this would
> probably be less efficient.

It was clearer, but I think I don't have the same problem as you.
Actually, I think your problem may not exist in my version of Expat (1.95.2).

Regards,

Karl

From rolf@pointsman.de Tue Nov 6 18:20:01 2001
From: rolf@pointsman.de (rolf@pointsman.de)
Date: Tue Nov 6 18:20:01 2001
Subject: [Expat-discuss] Bug?
 Illegal parameter reference error for valid document
In-Reply-To: <200111070032.RAA06709@abqn42.abq.sc.philips.com>
Message-ID: <200111070216.DAA02685@www.pointsman.de>

Josh,

I'm sorry, but I can't confirm the problem you're reporting with the
example data you gave.

On 6 Nov, Josh Martin wrote:
> I'm sorry if I was vague, but you misunderstood me. It's not the
> size of the buffer per se that matters, it's how much of the file
> you read in at a time and send to the parser. Let me see if I can
> make this clearer.

> Let's say we're parsing "test.xml":
>
>
>
> My name is &bob.

I used instead:

My name is &bob;

i.e. fixed the entity syntax and made the document valid according to
the referenced DTD. The two other files are untouched by me.

> , "test.dtd":
>
> %TESTent;
>
>
>
> and "test.ent":
>

I've successfully used an expat-based application to read the XML
document (modified as described) and, as a side effect of the parsing,
the external entities without a problem, although this application
neither reads the XML data line by line nor char by char, but in buffer
chunks of 1k.

I would love it if you could elaborate a bit more on the problem you
have described. I personally can't confirm your observation about
"having alot of problems with nested external entities". I use
expat-based applications on a regular basis, and they work even with
nested external entities nearly without problems.

On the other hand, I can confirm the problem that Karl has reported, at
least for expat 1.95.1 and James Clark's expat 1.2. I reported this
myself, briefly and among a lot of other things, in a mail to this
Expat-discuss mailing list on May 14 2001. If I recall my (not mailed)
analysis of this problem correctly, it has to do with the
well-formedness constraint "PEs in Internal Subset" (see XML
recommendation 2.8) being wrongly "active" for a *second* (never, as far
as I can tell, for the first) instance of an external entity parser.
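
A note on the read loops discussed in this thread. The chunked loop
posted earlier calls fread(buff, sizeof(buff), 1, ext). In standard C
that call returns the number of complete items read (here 0 or 1), not a
byte count, so for a file shorter than the buffer it returns 0 even
though data was placed in the buffer. Whether that accounts for the
differing reports above is not confirmed anywhere in the thread. The
sketch below shows one way to read in fixed-size chunks while passing on
a real byte count. It is not Expat code: feed_stream, chunk_fn, tally
and tally_chunk are hypothetical names made up for this illustration,
and in a real program the callback would presumably forward each chunk
to XML_Parse(parser, data, (int)len, is_final).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callback type: gets one chunk plus a flag that is
   non-zero on the last call. Returns non-zero on success. */
typedef int (*chunk_fn)(void *user, const char *data, size_t len, int is_final);

/* Feed a stream to `cb` in chunks of at most `chunk` bytes.
   Returns 0 on success, -1 on a read error or if the callback fails. */
int feed_stream(FILE *fp, size_t chunk, chunk_fn cb, void *user)
{
    char buf[4096];
    assert(chunk > 0 && chunk <= sizeof buf);

    for (;;) {
        /* Element size 1, so the return value is a byte count.
           fread(buf, sizeof buf, 1, fp) would return 0 or 1 instead. */
        size_t len = fread(buf, 1, chunk, fp);
        if (ferror(fp))
            return -1;
        int is_final = (len < chunk);   /* short read: end of input */
        if (!cb(user, buf, len, is_final))
            return -1;
        if (is_final)
            return 0;
    }
}

/* Illustration-only callback that just counts what it is given. */
struct tally { size_t bytes; int calls; int finals; };

int tally_chunk(void *user, const char *data, size_t len, int is_final)
{
    struct tally *t = user;
    (void)data;                 /* a real callback would parse this */
    t->bytes += len;
    t->calls += 1;
    t->finals += is_final ? 1 : 0;
    return 1;
}
```

Because the final flag is derived from a short read, a file whose size
is an exact multiple of the chunk size ends with one empty chunk flagged
as final, which is also a common way of telling XML_Parse that the input
is complete.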

This all said, I have to confess that I believe there *may* be a
(another) undetected bug in expat's handling of external entities. In
some _very rare_ cases, I got a seg fault (!) from deep inside of expat
(as far as I debugged), while expat was parsing an external entity. I
spent a lot of effort trying to catch the problem, but was too dumb to
get it, and then, as it sometimes goes, lost sight of the problem.

rolf

From mpi@renzel.net Thu Nov 8 06:22:02 2001
From: mpi@renzel.net (Marcel Pommer)
Date: Thu Nov 8 06:22:02 2001
Subject: [Expat-discuss] a parser that loops ?
Message-ID:

has anyone ever tried to implement a looping feature into the parser which
could handle documents that look like this:
issues n stuff

first i thought of this:

- modifying the endElementHandler so that it reports whenever it finds a
  closing element:
- modifying the startElementHandler so that it stores the positions of all
  start elements on a stack
- making the function that reads the input file and feeds XML_Parse() with
  data fseek() to the last loop block whenever the endElementHandler reports
  a closing

unfortunately it screws up expat's management when it gets fed with the same
portion of a file twice. can this be avoided in any way?

regards,

--
Marcel Pommer
techn. Leiter renzel.net

VKF Renzel GmbH - renzel.net
Im Geer 15
D-46419 Isselburg
Tel: +49 2874 910-240
Fax: +49 2874 910-109
mpi@renzel.net
http://renzel.net

From rsalz@zolera.com Thu Nov 8 06:35:05 2001
From: rsalz@zolera.com (Rich Salz)
Date: Thu Nov 8 06:35:05 2001
Subject: [Expat-discuss] a parser that loops ?
References:
Message-ID: <3BEA981B.DE45F0DA@zolera.com>

> unfortunately it screws up expat's management when it gets fed with the same
> portion of a file twice. can this be avoided in any way?

How about creating a new sub-parser each time through the loop?

--
Zolera Systems, Securing web services (XML, SOAP, Signatures,
Encryption)
http://www.zolera.com

From F.J.Franklin@sheffield.ac.uk Thu Nov 8 06:41:10 2001
From: F.J.Franklin@sheffield.ac.uk (F J Franklin)
Date: Thu Nov 8 06:41:10 2001
Subject: [Expat-discuss] a parser that loops ?
In-Reply-To:
Message-ID:

On Thu, 8 Nov 2001, Marcel Pommer wrote:

>
> has anyone ever tried to implement a looping feature into the parser which
> could handle documents that look like this:
>
>
>
>
> issues n stuff >
>
>

And would it be possible to extend this to full EHTML support?

http://www.segfault.org/story.phtml?mode=2&id=3a784aad-05b22280

Apologetically,
Frank

From tcrook@accelio.com Thu Nov 8 07:22:02 2001
From: tcrook@accelio.com (Tim Crook)
Date: Thu Nov 8 07:22:02 2001
Subject: [Expat-discuss] When are expat patches going to be rolled into a new release?
Message-ID: <311000B0752ED211B61700805F0D6B0901A00725@ottmail3.jetform.com>

Having everything that has been put into CVS recently as a new release would
be a good idea.

_________________________________________
Tim Crook
Software Developer
Accelio Corporation
> 560 Rochester Street
> Ottawa, Ontario
> Canada K1S 5K2
> Phone: +1 613.751.4800 Ext 5734
Fax: +1 613.594.8886
E-mail: tcrook@accelio.com

From fdrake@acm.org Thu Nov 8 08:48:02 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu Nov 8 08:48:02 2001
Subject: [Expat-discuss] When are expat patches going to be rolled into a new release?
In-Reply-To: <311000B0752ED211B61700805F0D6B0901A00725@ottmail3.jetform.com>
References: <311000B0752ED211B61700805F0D6B0901A00725@ottmail3.jetform.com>
Message-ID: <15338.50054.639420.918426@grendel.zope.com>

Tim Crook writes:
> Having everything that has been put into CVS recently as a new release would
> be a good idea.

I agree. There are a couple of other bugs that I'd like to get
wrapped up, including getting your recent patch added in. I have a
backlog of mail about Expat to dig through as well. (In case anyone
hadn't guessed!)

I do appreciate everyone's interest and help. Let's try for a new
release in time for the seasonal holidays.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Thu Nov 8 12:27:05 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu Nov 8 12:27:05 2001
Subject: [Expat-discuss] possible build issue fix
Message-ID: <15338.63161.286655.421333@grendel.zope.com>

One of the build problems that keeps coming up is that some versions
of GCC don't support the -fexceptions option. If you are using GCC on
a platform for which this is the case, could you please try building
Expat using the configure script at this URL and let me know how it
works for you? I think I've added the right thing to get this
auto-detected, but I don't have a GCC without -fexceptions support.

http://starship.python.net/crew/fdrake/patches/configure

Thanks!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From carlos@pehoe.civil.ist.utl.pt Thu Nov 8 12:53:03 2001
From: carlos@pehoe.civil.ist.utl.pt (Carlos Pereira)
Date: Thu Nov 8 12:53:03 2001
Subject: [Expat-discuss] Re: possible build issue fix
Message-ID: <200111082159.VAA06134@pehoe.civil.ist.utl.pt>

>One of the build problems that keeps coming up is that some versions
>of GCC don't support the -fexceptions option. If you are using GCC on
>a platform for which this is the case, could you please try building
>Expat using the configure script at this URL and let me know how it
>works for you? I think I've added the right thing to get this
>auto-detected, but I don't have a GCC without -fexceptions support.

>http://starship.python.net/crew/fdrake/patches/configure

This is Red Hat Linux 5.2 on Intel hardware,
I tried to compile expat-1.95.2 with your script

1) running the new config is fine,
but make fails with this message:

cd lib && make
make[1]: Entering directory `/tmp/expat-1.95.2/lib'
make[1]: *** No rule to make target `expat.h', needed by `xmlparse.lo'. Stop.
make[1]: Leaving directory `/tmp/expat-1.95.2/lib'
make: *** [lib] Error 2

2) running then the old config, it runs well,
but make fails with this message:

cd lib && make
make[1]: Entering directory `/tmp/expat-1.95.2/lib'
/bin/sh ../libtool --mode=compile gcc -DHAVE_CONFIG_H -DPACKAGE='"expat"' -DVERSION='"expat_1.95.2"' -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -fexceptions -c xmlparse.c
mkdir .libs
gcc -DHAVE_CONFIG_H -DPACKAGE=\"expat\" -DVERSION=\"expat_1.95.2\" -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -fexceptions -c xmlparse.c -fPIC -DPIC -o .libs/xmlparse.lo
cc1: Invalid option `-fexceptions'
make[1]: *** [xmlparse.lo] Error 1
make[1]: Leaving directory `/tmp/expat-1.95.2/lib'
make: *** [lib] Error 2

3) running again the new config, it runs well,
and make runs well too, no problems at all!

So it seems that new_config works but
something is missing, which is handled
only by the old config...

Hope this helps!
Carlos Pereira

From fdrake@acm.org Thu Nov 8 13:11:02 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu Nov 8 13:11:02 2001
Subject: [Expat-discuss] Re: possible build issue fix
In-Reply-To: <200111082159.VAA06134@pehoe.civil.ist.utl.pt>
References: <200111082159.VAA06134@pehoe.civil.ist.utl.pt>
Message-ID: <15339.280.322465.588422@grendel.zope.com>

Carlos Pereira writes:
> This is Red Hat Linux 5.2 on Intel hardware,
> I tried to compile expat-1.95.2 with your script

Thanks for testing this!

> So it seems that new_config works but
> something is missing, which is handled
> only by the old config...

Ah; since 1.95.2, the expat.h is simply included with the source
instead of being generated by the configure script; the new configure
script will not generate it since it expects the file to already
exist.
Did the final (successful) make include the line: ---------------------------------------------------------------------- cc1: Invalid option `-fexceptions' ---------------------------------------------------------------------- in the output? -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From carlos@pehoe.civil.ist.utl.pt Thu Nov 8 13:40:02 2001 From: carlos@pehoe.civil.ist.utl.pt (Carlos Pereira) Date: Thu Nov 8 13:40:02 2001 Subject: [Expat-discuss] Re: possible build issue fix Message-ID: <200111082246.WAA06747@pehoe.civil.ist.utl.pt> >Did the final (successful) make include in the output? >cc1: Invalid option `-fexceptions' Definitely not, see below the whole output, in out (make 2> out) I got a single line with this warning: (as I said, this is a rather old Linux distribution ;-) ...) xmlrole.c:7: warning: `RCSId' defined but not used Carlos ------------------------------------------------------- make 2> out cd lib && make make[1]: Entering directory `/tmp/expat-1.95.2/lib' /bin/sh ../libtool --mode=compile gcc -DHAVE_CONFIG_H -DPACKAGE='"@PACKAGE@"' -DVERSION='"@PACKAGE@_@VERSION@"' -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmlparse.c mkdir .libs gcc -DHAVE_CONFIG_H -DPACKAGE=\"@PACKAGE@\" -DVERSION=\"@PACKAGE@_@VERSION@\" -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmlparse.c -fPIC -DPIC -o .libs/xmlparse.lo gcc -DHAVE_CONFIG_H -DPACKAGE=\"@PACKAGE@\" -DVERSION=\"@PACKAGE@_@VERSION@\" -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmlparse.c -o xmlparse.o >/dev/null 2>&1 mv -f .libs/xmlparse.lo xmlparse.lo /bin/sh ../libtool --mode=compile gcc -DHAVE_CONFIG_H -DPACKAGE='"@PACKAGE@"' -DVERSION='"@PACKAGE@_@VERSION@"' -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmltok.c rm -f .libs/xmltok.lo gcc -DHAVE_CONFIG_H -DPACKAGE=\"@PACKAGE@\" -DVERSION=\"@PACKAGE@_@VERSION@\" -I. -I. -I.. 
-g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmltok.c -fPIC -DPIC -o .libs/xmltok.lo gcc -DHAVE_CONFIG_H -DPACKAGE=\"@PACKAGE@\" -DVERSION=\"@PACKAGE@_@VERSION@\" -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmltok.c -o xmltok.o >/dev/null 2>&1 mv -f .libs/xmltok.lo xmltok.lo /bin/sh ../libtool --mode=compile gcc -DHAVE_CONFIG_H -DPACKAGE='"@PACKAGE@"' -DVERSION='"@PACKAGE@_@VERSION@"' -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmlrole.c rm -f .libs/xmlrole.lo gcc -DHAVE_CONFIG_H -DPACKAGE=\"@PACKAGE@\" -DVERSION=\"@PACKAGE@_@VERSION@\" -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmlrole.c -fPIC -DPIC -o .libs/xmlrole.lo gcc -DHAVE_CONFIG_H -DPACKAGE=\"@PACKAGE@\" -DVERSION=\"@PACKAGE@_@VERSION@\" -I. -I. -I.. -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -c xmlrole.c -o xmlrole.o >/dev/null 2>&1 mv -f .libs/xmlrole.lo xmlrole.lo /bin/sh ../libtool --mode=link gcc -version-info 1:0:1 -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -o libexpat.la -rpath /usr/local/lib xmlparse.lo xmltok.lo xmlrole.lo rm -fr .libs/libexpat.la .libs/libexpat.* .libs/libexpat.* gcc -shared xmlparse.lo xmltok.lo xmlrole.lo -lc -Wl,-soname -Wl,libexpat.so.0 -o .libs/libexpat.so.0.1.0 (cd .libs && rm -f libexpat.so.0 && ln -s libexpat.so.0.1.0 libexpat.so.0) (cd .libs && rm -f libexpat.so && ln -s libexpat.so.0.1.0 libexpat.so) ar cru .libs/libexpat.a xmlparse.o xmltok.o xmlrole.o ranlib .libs/libexpat.a creating libexpat.la (cd .libs && rm -f libexpat.la && ln -s ../libexpat.la libexpat.la) make[1]: Leaving directory `/tmp/expat-1.95.2/lib' cd xmlwf && make make[1]: Entering directory `/tmp/expat-1.95.2/xmlwf' gcc -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -I../lib -c xmlwf.c -o xmlwf.o gcc -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -I../lib -c xmlfile.c -o xmlfile.o gcc -g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -I../lib -c codepage.c -o codepage.o gcc 
-g -O2 -Wall -Wmissing-prototypes -Wstrict-prototypes -I../lib -c unixfilemap.c -o unixfilemap.o gcc -o xmlwf -static xmlwf.o xmlfile.o codepage.o unixfilemap.o -L../lib/.libs -lexpat make[1]: Leaving directory `/tmp/expat-1.95.2/xmlwf' From fdrake@acm.org Thu Nov 8 13:51:02 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu Nov 8 13:51:02 2001 Subject: [Expat-discuss] Re: possible build issue fix In-Reply-To: <200111082246.WAA06747@pehoe.civil.ist.utl.pt> References: <200111082246.WAA06747@pehoe.civil.ist.utl.pt> Message-ID: <15339.2742.149355.267484@grendel.zope.com> Carlos Pereira writes: > Definitely not, see below the whole output, > in out (make 2> out) I got a single line with this warning: > (as I said, this is a rather old Linux distribution ;-) ...) > > xmlrole.c:7: warning: `RCSId' defined but not used Excellent! Thanks for getting back to me so quickly. I'll commit this change so it'll be in the next version of Expat. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From james.perrin@ntlworld.com Thu Nov 8 15:13:01 2001 From: james.perrin@ntlworld.com (James S. Perrin) Date: Thu Nov 8 15:13:01 2001 Subject: [Expat-discuss] Example code to test new port Message-ID: Hi, I've recenetly ported expat to AmigaOS as a shared library. This is quite involved as Amiga shared libraries differ greatly from UN*X so I would like some further example code to test the functionality of the library. Anything that is ANSI C and can be compiled with gcc will be fine as long as it doesn't require external libraries (so no Gnome apps). Also test data would be greatly appreciated. Regards James -- James S. Perrin Cruising at a speed of 50Mips (A1200T 060/66 Voodoo3) Why go at faster, you just miss the scenery? Nah! Screw the scenery I wanna go faster! From fdrake@acm.org Thu Nov 8 15:45:02 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu Nov 8 15:45:02 2001 Subject: [Expat-discuss] Example code to test new port In-Reply-To: References: Message-ID: <15339.9470.542107.301307@grendel.zope.com> James S. Perrin writes: > I've recenetly ported expat to AmigaOS as a shared library. This is > quite involved as Amiga shared libraries differ greatly from UN*X so I > would like some further example code to test the functionality of the > library. Anything that is ANSI C and can be compiled with gcc will be fine > as long as it doesn't require external libraries (so no Gnome apps). Also > test data would be greatly appreciated. I think Python is available for the Amiga; you could try to build that with the pyexpat module, then run the regression test for that. I don't know anything about the Amiga port of Python, though. I am very aware of the poor state of the (missing) test suite, I just don't have any time available to rectify the situation just yet. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From gstein@lyra.org Fri Nov 9 08:32:04 2001 From: gstein@lyra.org (Greg Stein) Date: Fri Nov 9 08:32:04 2001 Subject: [Expat-discuss] When are expat patches going to be rolled into a new release? In-Reply-To: <15338.50054.639420.918426@grendel.zope.com>; from fdrake@acm.org on Thu, Nov 08, 2001 at 12:40:22PM -0500 References: <311000B0752ED211B61700805F0D6B0901A00725@ottmail3.jetform.com> <15338.50054.639420.918426@grendel.zope.com> Message-ID: <20011109083820.A21792@lyra.org> On Thu, Nov 08, 2001 at 12:40:22PM -0500, Fred L. Drake, Jr. wrote: > > Tim Crook writes: > > Having everything that has been put into CVS recently as a new release would > > be a good idea. > > I agree. There are a couple of other bugs that I'd like to get > wrapped up, including getting your recent patch added it. I have a > backlog of mail about Expat to dig through as well. (In case anyone > hadn't guessed!) > I do appreciate everyone's interest and help. 
Let's try for a new > release in time for the seasonal holidays. Can do! After this release, I've also been thinking about an expat-embed-1.95.x release. Sort of a stripped down tarball that people can embed/reship in their projects. I've noticed a lot of people bundling expat into their code, and the -embed release could be a big win for those people. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake@acm.org Fri Nov 9 08:37:07 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri Nov 9 08:37:07 2001 Subject: [Expat-discuss] Embedded Expat package In-Reply-To: <20011109083820.A21792@lyra.org> References: <311000B0752ED211B61700805F0D6B0901A00725@ottmail3.jetform.com> <15338.50054.639420.918426@grendel.zope.com> <20011109083820.A21792@lyra.org> Message-ID: <15340.4717.571159.607309@grendel.zope.com> I wrote: > I do appreciate everyone's interest and help. Let's try for a new > release in time for the seasonal holidays. Greg Stein writes: > Can do! Excellent! > After this release, I've also been thinking about an expat-embed-1.95.x > release. Sort of a stripped down tarball that people can embed/reship in > their projects. I've noticed a lot of people bundling expat into their code, > and the -embed release could be a big win for those people. Perhaps you can describe what you think should be included and what should be omitted? Do you think this is just a matter of creating a different tgz/zip, or will other things need to change? -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From rsalz@zolera.com Fri Nov 9 08:57:05 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri Nov 9 08:57:05 2001 Subject: [Expat-discuss] When are expat patches going to be rolled into a new release? References: <311000B0752ED211B61700805F0D6B0901A00725@ottmail3.jetform.com> <15338.50054.639420.918426@grendel.zope.com> <20011109083820.A21792@lyra.org> Message-ID: <3BEC0B00.CC2F90A8@zolera.com> > I've also been thinking about an expat-embed-1.95.x > release. 
Sort of a stripped down tarball that people can embed/reship in > their projects. That's an awesome idea! -- Zolera Systems, Your Key to Online Integrity Securing Web services: XML, SOAP, Dig-sig, Encryption http://www.zolera.com From gstein@lyra.org Fri Nov 9 09:19:02 2001 From: gstein@lyra.org (Greg Stein) Date: Fri Nov 9 09:19:02 2001 Subject: [Expat-discuss] Embedded Expat package In-Reply-To: <15340.4717.571159.607309@grendel.zope.com>; from fdrake@acm.org on Fri, Nov 09, 2001 at 12:29:17PM -0500 References: <311000B0752ED211B61700805F0D6B0901A00725@ottmail3.jetform.com> <15338.50054.639420.918426@grendel.zope.com> <20011109083820.A21792@lyra.org> <15340.4717.571159.607309@grendel.zope.com> Message-ID: <20011109092423.C21792@lyra.org> On Fri, Nov 09, 2001 at 12:29:17PM -0500, Fred L. Drake, Jr. wrote: >... > Greg Stein writes: >... > > After this release, I've also been thinking about an expat-embed-1.95.x > > release. Sort of a stripped down tarball that people can embed/reship in > > their projects. I've noticed a lot of people bundling expat into their code, > > and the -embed release could be a big win for those people. > > Perhaps you can describe what you think should be included and what > should be omitted? Do you think this is just a matter of creating a > different tgz/zip, or will other things need to change? I think just a different tarball. We shouldn't need to change anything; our configure scripts and stuff could easily support both *if* there is some kind of difference. For example, the embedded tarball would remove the docs and samples. We would probably flatten the directory structure, too. Stuff like that. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tcrook@accelio.com Fri Nov 9 09:50:01 2001 From: tcrook@accelio.com (Tim Crook) Date: Fri Nov 9 09:50:01 2001 Subject: [Expat-discuss] When are expat patches going to be rolled int o a new release? 
Message-ID: <311000B0752ED211B61700805F0D6B0901A00734@ottmail3.jetform.com> Sounds like a good idea Greg. What needs to be done for completion of the release? Can I be of service? I have access to a couple of Unix boxes, including Solaris, HP-UX, Tru64 and of course Linux. -----Original Message----- From: Greg Stein To: Fred L. Drake, Jr. Cc: Tim Crook; 'expat-discuss@lists.sourceforge.net' Sent: 11/9/01 11:38 AM Subject: Re: [Expat-discuss] When are expat patches going to be rolled into a new release? On Thu, Nov 08, 2001 at 12:40:22PM -0500, Fred L. Drake, Jr. wrote: > > Tim Crook writes: > > Having everything that has been put into CVS recently as a new release would > > be a good idea. > > I agree. There are a couple of other bugs that I'd like to get > wrapped up, including getting your recent patch added it. I have a > backlog of mail about Expat to dig through as well. (In case anyone > hadn't guessed!) > I do appreciate everyone's interest and help. Let's try for a new > release in time for the seasonal holidays. Can do! After this release, I've also been thinking about an expat-embed-1.95.x release. Sort of a stripped down tarball that people can embed/reship in their projects. I've noticed a lot of people bundling expat into their code, and the -embed release could be a big win for those people. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake@acm.org Fri Nov 9 09:59:04 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri Nov 9 09:59:04 2001 Subject: [Expat-discuss] When are expat patches going to be rolled int o a new release? In-Reply-To: <311000B0752ED211B61700805F0D6B0901A00734@ottmail3.jetform.com> References: <311000B0752ED211B61700805F0D6B0901A00734@ottmail3.jetform.com> Message-ID: <15340.9693.180366.371836@grendel.zope.com> Tim Crook writes: > What needs to be done for completion of the release? Can I be of service? I > have access to a couple of Unix boxes, including Solaris, HP-UX, Tru64 and > of course Linux. 
Once I get a writing commitment out of the way, I should be able to spend some evening/weekend time on getting patches from SourceForge added to the sources. Unfortunately, I only have Linux and Windows readily available, so portability tests would be really helpful. There are a lot of outstanding bugs for HP-UX, and I'm not sure how many of them are still problems with the current source in CVS; any help you can offer in resolving those would be of *tremendous* help. I need to write up something about a possible testing strategy as well; what we have now is definitely insufficient! -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From sunshine@public.kherson.ua Fri Nov 9 13:25:03 2001 From: sunshine@public.kherson.ua (Ruslan Zasukhin) Date: Fri Nov 9 13:25:03 2001 Subject: [Expat-discuss] Non ASCII chars in tag name ??? Message-ID: Hi All, I have a problem with the high part of ASCII chars (umlauts, Russian, German, ...). For example, I have an XML document (a dump from a Valentina database) that includes one field with the name "Città". I do of course a UTF-8 conversion and get in the XML document a line as value now if I try to parse this XML document then value ^^^ here EXPAT in the normal_scanLt() function gets into case BT_NONASCII: and returns XML_TOK_INVALID. The encoding of the document is UTF-8. Internet Explorer reports an error at the same place.... :-( At the same time, if I put that string between tags it works fine... Citt¬à But I also need to have tags in UTF-8 format. Where is my mistake? -- Best regards, Ruslan Zasukhin ------------------------- Paradigma. e-mail: ruslan@paradigmasoft.com web : http://www.paradigmasoft.com To subscribe to the Valentina mail list send a letter to valentina-on@lists.macserve.net From fdrake@acm.org Fri Nov 9 13:41:01 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri Nov 9 13:41:01 2001 Subject: [Expat-discuss] Non ASCII chars in tag name ???
In-Reply-To: References: Message-ID: <15340.22904.266548.490537@grendel.zope.com> Ruslan Zasukhin writes: > I have a problem with the high part of ASCII chars (umlauts, Russian, German, ...). ASCII characters have only 7 bits; I presume you're using ISO 8859 character sets. > For example, I have an XML document (a dump from a Valentina database) that includes > one field with the name "Città". > > I do of course a UTF-8 conversion and get in the XML document a line as > > value If I cut-n-paste from my terminal to convert this from UTF-8 to UTF-16, I get a decoding error (using Python): ------------------------------------------------------------------------ >>> unicode('Citt¬à', 'utf-8') Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: UTF-8 decoding error: unexpected code byte ------------------------------------------------------------------------ Perhaps something got lost by a mail transport agent along the way. > now if I try to parse this XML document then > > value > ^^^ > here EXPAT in the normal_scanLt() function gets into > case BT_NONASCII: > > and returns XML_TOK_INVALID. > > The encoding of the document is UTF-8. > Internet Explorer reports an error at the same place.... :-( If I have the right bytes for the string, that's what I'd expect. Could you send the numeric values of the bytes in "Citt¬à" so that I may check that I have the right thing? Thanks! > At the same time, if I put that string between tags it works fine... > > Citt¬à The fact that it's accepted here is an Expat bug; it has already been filed in the issue tracker on SourceForge: http://sourceforge.net/tracker/?func=detail&aid=477667&group_id=10127&atid=110127 I hope to have this fixed for the next release. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From karl@waclawek.net Fri Nov 9 13:56:01 2001 From: karl@waclawek.net (Karl Waclawek) Date: Fri Nov 9 13:56:01 2001 Subject: [Expat-discuss] How reliable is list archive?
Message-ID: <004001c16968$42b4e070$9e539696@citkwaclaww2k> Up until today I used the mailing list through Geocrawler online. However, I realized that I might not see all messages, since I was missing at least one of my own posts. Now I am subscribed, and it seems that I am getting more mails than one can see online. How reliable is the online archive? Karl From fdrake@acm.org Fri Nov 9 14:04:02 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri Nov 9 14:04:02 2001 Subject: [Expat-discuss] How reliable is list archive? In-Reply-To: <004001c16968$42b4e070$9e539696@citkwaclaww2k> References: <004001c16968$42b4e070$9e539696@citkwaclaww2k> Message-ID: <15340.24281.406755.590897@grendel.zope.com> Karl Waclawek writes: > Up until today I used the mailing list through Geocrawler online. > However, I realized that I might not see all messages, since > I was missing at least one of my own posts. I think the Geocrawler archives are a mess, even if we ignore the usability nightmare they represent. The SourceForge crew is working on a replacement system, but I don't know whether the underlying archives are intact. I think SF is planning to replace Geocrawler with their new system in the next few months, but I really don't know any more about it than that. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From karl@waclawek.net Sat Nov 10 10:57:01 2001 From: karl@waclawek.net (Karl Waclawek) Date: Sat Nov 10 10:57:01 2001 Subject: [Expat-discuss] DTD declarations and default handler Message-ID: <000d01c16a1a$8503d780$0207a8c0@karl> It seems that the default handler always reports DTD declarations, even if they are handled, i.e. even if the following handlers are set: - ElementDeclHandler - AttListDeclHandler - EntityDeclHandler - DocTypeDeclHandler - NotationDeclHandler - EntityDeclHandler That seems wrong, or is this how it should be?
Karl From Josh.Martin@abq.sc.philips.com Mon Nov 12 18:05:02 2001 From: Josh.Martin@abq.sc.philips.com (Josh Martin) Date: Mon Nov 12 18:05:02 2001 Subject: [Expat-discuss] External entity parsing bug (with example) Message-ID: <200111130204.TAA13935@abqn42.abq.sc.philips.com> ---------------------- multipart/mixed attachment Hi, In an effort to avoid any confusion, misunderstandings, or typos I have put together some example documents and demonstration programs to show the 'bug' I am having processing external entities, and the method for the fix. Here is the problem: If I have nested external entities and I read in buffer sized chunks at a time with the external entity parser, some of the declarations are lost. If I instead read in the external documents one line at a time, all goes well. The two programs I have included (bug.c and bug2.c) demonstrate these two approaches. bug.c (which does not work) reads in and parses an entire buffer sized chunk of the external document at a time. bug2.c (which does work) reads in the external document one like at a time and parses that. The only difference between these two files is on line 181 inside the parsing loop for the external entity handler. These lines are as follows: bug.c [181]: if (((length = fread(buff, sizeof(char), sizeof(buff), ext)) == 0) && (!feof(ext))) // Error handling bug2.c [181]: length = strlen(fgets(buff, sizeof(buff), ext)); The buffer is parsed with this line: if (XML_Parse(e, buff, length, length == 0) == 0) // Error handling Here is a listing of the XML documents, followed by the output from each program. "bug.xml": Your name is &bob;. 
"bug.dtd": %BUGent; "bug.ent": ******************************************** Output from "./bug -v bug.xml": Parsing xml document 'bug.xml' Parsing external reference 'bug.dtd' Parsing external reference 'bug.ent' Parameter Entity Declaration: myname = '"Bob"' Finished parsing external reference 'bug.ent' ./bug: Parse error at line 10: undefined entity ^ ./bug: Parse error at line 3: error in processing external entity reference ^ Output from "./bug2 -v bug.xml": Parsing xml document 'bug.xml' Parsing external reference 'bug.dtd' Parsing external reference 'bug.ent' Parameter Entity Declaration: myname = '"Bob"' Finished parsing external reference 'bug.ent' Entity Declaration: bob = 'Bob' Parameter Entity Declaration: pcdata = '(#PCDATA)*' Finished parsing external reference 'bug.dtd' The document 'bug.xml' is well-formed The programs were compiled on an HPUX 11.00 using GCC 3.0.2 and expat 1.95.2. I hope this clears this issue up. Let me know if anyone else has these same problems with these programs. - Josh Martin ---------------------- multipart/mixed attachment A non-text attachment was scrubbed... Name: not available Type: text/x-sun-c-file Size: 6666 bytes Desc: bug.c Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20011112/38ca13bb/attachment.bin ---------------------- multipart/mixed attachment A non-text attachment was scrubbed... 
Name: not available Type: text/x-sun-c-file Size: 6642 bytes Desc: bug2.c Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20011112/38ca13bb/attachment-0001.bin ---------------------- multipart/mixed attachment-- From g_expat@zewt.org Tue Nov 13 17:21:11 2001 From: g_expat@zewt.org (Glenn Maynard) Date: Tue Nov 13 17:21:11 2001 Subject: [Expat-discuss] entity expansion In-Reply-To: <20011103015343.B15875@zewt.org> References: <20011103015343.B15875@zewt.org> Message-ID: <20011113202028.A4510@zewt.org> On Sat, Nov 03, 2001 at 01:53:43AM -0500, Glenn Maynard wrote: > Well, this seems like an extremely simple thing, so I feel like I'm > missing something obvious. But I can't find it (through far too much > code tracing), so here we go: how to disable expansion of entities? > There's a call to disable parameter entities, and stuff for external > entities; the XML document I'm parsing has an internal DTD, however, and > I don't want entities expanded. > > It seems like something every XML library would have, so I feel like I'm > just missing an obvious API call; I hate posting seemingly obvious > questions to lists, but I've spent far too much time on this. :) (Not > as much time as I spent trying to get libxml to handle entites > reasonably with its SAX interface; their response was along the lines of > "entities with the SAX interface are hard, either use DOM or don't use > entities". No, I don't think I'm doing to load the 20 meg JMdict with > DOM ...) I can only assume by the lack of response that Expat simply can't do this. Extremely strange ... 
-- Glenn Maynard From djm@maccormack.net Wed Nov 14 02:13:02 2001 From: djm@maccormack.net (David MacCormack) Date: Wed Nov 14 02:13:02 2001 Subject: [Expat-discuss] entity expansion In-Reply-To: <20011113202028.A4510@zewt.org> Message-ID: On Tue, 13 Nov 2001, Glenn Maynard wrote: > On Sat, Nov 03, 2001 at 01:53:43AM -0500, Glenn Maynard wrote: > > Well, this seems like an extremely simple thing, so I feel like I'm > > missing something obvious. But I can't find it (through far too much > > code tracing), so here we go: how to disable expansion of entities? > > There's a call to disable parameter entities, and stuff for external > > entities; the XML document I'm parsing has an internal DTD, however, and > > I don't want entities expanded. > > > > It seems like something every XML library would have, so I feel like I'm > > just missing an obvious API call; I hate posting seemingly obvious > > questions to lists, but I've spent far too much time on this. :) (Not > > as much time as I spent trying to get libxml to handle entites > > reasonably with its SAX interface; their response was along the lines of > > "entities with the SAX interface are hard, either use DOM or don't use > > entities". No, I don't think I'm doing to load the 20 meg JMdict with > > DOM ...) > > I can only assume by the lack of response that Expat simply can't do > this. Extremely strange ... > > I needed the same thing and made a patch that does it (a while back). Goto http://sourceforge.net/projects/expat/, click on "Patches", and then click on the 429501 patch. Fred has other plans, so this didn't make it into the standard distro. However, it's available if you want to apply it yourself. Good luck. Dave -- ---------------- David MacCormack djm@maccormack.net In the land of Redmond, where the Windows lie. One OS to rule them all, One OS to find them, One OS to bring them all and in the darkness bind them. In the land of Redmond, where the Windows lie. 
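On the entity-expansion question above, here is a minimal sketch of one way to see unexpanded internal entity references. It assumes a current Expat as exposed through Python's standard xml.parsers.expat module; the thread is about the C API in 1.95.2, and whether that release behaves identically is not verified here. In current Expat, installing a default handler with XML_SetDefaultHandler (DefaultHandler in Python), as opposed to XML_SetDefaultHandlerExpand, inhibits expansion of internal entities in content and hands the raw reference to the default handler. The document and the entity name bob are made up for illustration.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD subset declaring one entity.
DOC = b'<!DOCTYPE d [<!ENTITY bob "Bob">]><d>Hi &bob;!</d>'


def parse_without_expansion(data):
    """Return (text, unhandled); unhandled holds raw default-handler chunks."""
    text, unhandled = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append
    # DefaultHandler maps to XML_SetDefaultHandler, which inhibits expansion
    # of internal entities; DefaultHandlerExpand would expand them instead.
    parser.DefaultHandler = unhandled.append
    parser.Parse(data, True)
    return "".join(text), unhandled


text, unhandled = parse_without_expansion(DOC)
print(repr(text))            # character data, without the replacement text
print("&bob;" in unhandled)  # the reference arrives as its own chunk
```

Because no start/end element handlers are set in this sketch, the default handler also receives the markup itself, so a real application would filter the chunks it cares about.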
From echan@macromedia.com Wed Nov 14 19:01:02 2001 From: echan@macromedia.com (Edward Chan) Date: Wed Nov 14 19:01:02 2001 Subject: [Expat-discuss] Problems building expat xml parser (Solaris 8, Sun Forte C++ 6 Up date 2) Message-ID: <5DB489EF44C5444A9974E3E934CD834C622823@ex-600town-03.macromedia.com> Hi there, I'm new to the list, so hopefully I'm sending this to the right list. Anyway, I'm trying to build the expat xml parser on Solaris 5.8, using Sun's Forte C++ 6 Update 2 compiler, but having some difficulty. However, I was able to build it on a different solaris box, with a different compiler. I think that box was Solaris 5.5.1 and the compiler version was version 4.2. I get some link errors when trying to link with libexpat.a when building xmlwf. And though I don't need the xmlwf program (I don't even know what it is), I do need to link my program with libexpat.a (and I get the same link error as when building xmlwf). The errors are: ------------------------------------------------------------------------------------------------------------------------------------------------------ CC -O2 -Ixmltok -Ixmlparse -DXML_NS -DXML_DTD -DXML_BYTE_ORDER=21 -o xmlwf/xmlwf xmlwf/xmlwf.o xmlwf/xmlfile.o xmlwf/codepage.o xmlwf/unixfilemap.o xmlparse/libexpat.a Undefined first referenced symbol in file XmlInitUnknownEncodingNS xmlparse/libexpat.a(xmlparse.o) XmlInitUnknownEncoding xmlparse/libexpat.a(xmlparse.o) ld: fatal: Symbol referencing errors. No output written to xmlwf/xmlwf *** Error code 1 make: Fatal error: Command failed for target `xmlwf/xmlwf' ------------------------------------------------------------------------------------------------------------------------------------------------------ If I do not define XML_NS, then I get rid of the first undefined symbol (XmlInitUnknownEncodingNS), but the second one (XmlInitUnknownEncoding) remains. 
But even though I was able to get rid of the first undefined symbol, I question why I was able to build it on the other machine, but not on this machine (following the exact same steps). Does anybody have any ideas? Thanks. Ed echan@macromedia.com 415-832-7485 From sunshine@public.kherson.ua Thu Nov 15 00:52:02 2001 From: sunshine@public.kherson.ua (Ruslan Zasukhin) Date: Thu Nov 15 00:52:02 2001 Subject: [Expat-discuss] [BUG] Non ASCII chars in tag name? In-Reply-To: <15340.22904.266548.490537@grendel.zope.com> Message-ID: on 11/10/01 0:32, Fred L. Drake, Jr. at fdrake@acm.org wrote: Hi Guys, it's me again. I still have no answer on how to work in XML with chars > 127. My task is as follows: - I have a database that can have tables, fields and field values. - tables, fields and values can be in any language: English, Italian, German, Russian. Example: field "Città" (43 69 74 74 88) hex value "Città" (43 69 74 74 88) hex How must this look in correct XML that EXPAT can parse ??? I have tried ... encoding = "UTF-8" ... Citt¬à where Citt¬à => 43 69 74 74 C2 88 is a string that I get from the Latin1_UTF8() conversion function. But the above XML fails on the 0xC2 character, because "it is NON ASCII" What is going on ???????????? The document is of type UTF-8. I put a UTF-8 string into the tag name, it must work but it does not work. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Is this a bug in EXPAT or is this my mistake (in the conversion) ? Does anybody know another XML parser that can work with non-ASCII characters in tags? BTW, we have checked Windows IE 6.0. It works fine with Russian text in both tags and content. Although it seems they are based on 32-bit Unicode. Hmm, but UTF-8 is also Unicode, isn't it ? -- Best regards, Ruslan Zasukhin ------------------------- Paradigma.
e-mail: ruslan@paradigmasoft.com web : http://www.paradigmasoft.com To subscribe to the Valentina mail list send a letter to valentina-on@lists.macserve.net From sunshine@public.kherson.ua Thu Nov 15 08:45:04 2001 From: sunshine@public.kherson.ua (Ruslan Zasukhin) Date: Thu Nov 15 08:45:04 2001 Subject: [Expat-discuss] Re: [BUG] Non ASCII chars in tag name? In-Reply-To: <501487271724958301993@lists.macserve.net> Message-ID: on 11/15/01 15:11, stephan huber at ratzfatz@digitalmind.de wrote: > This fails for UTF-8 code inside the tag? I think only the text between > tags and attribute values should be converted to UTF-8; the tags should > remain intact. But I know that some parsers don't like high-ASCII chars inside > tags... I do not see ANY REASON why XML can not support high-ASCII chars inside tags... What's more, we have tested IE 6.0 on Windows, and they support Russian text everywhere. But again, they use UTF-16 I think. > Perhaps you can switch to another XML structure to prevent high-ASCII chars > inside tags... > > something like > > blabla > > [UTF-converted value] Yes, I have also thought about this. But I do not like this, because it produces 2 more words for EACH !!! field. Or I can spend several hours to make EXPAT work as I want... Or find a new XML parser... Or in the end get help from the author of EXPAT... -- Best regards, Ruslan Zasukhin ------------------------- Paradigma. e-mail: ruslan@paradigmasoft.com web : http://www.paradigmasoft.com To subscribe to the Valentina mail list send a letter to valentina-on@lists.macserve.net From iordy@iordy.com Tue Nov 20 19:34:02 2001 From: iordy@iordy.com (Shane Hanna -> Iordy.com) Date: Tue Nov 20 19:34:02 2001 Subject: [Expat-discuss] install expat on indigoperl. Message-ID: <00d601c1723d$385a2d50$fef4fea9@roger3> This is a multi-part message in MIME format. ---------------------- multipart/alternative attachment Has anyone tried to get expat going with indigoperl?
It's a long story, but I need to get it working under win32 using indigoperl. I would love to know if a) anyone has got it working and b) how they got it working. Cheers, Shane Hanna www.iordy.com A One Man Design Army ---------------------- multipart/alternative attachment An HTML attachment was scrubbed... URL: http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20011120/ba709a2b/attachment.html ---------------------- multipart/alternative attachment-- From karl@waclawek.net Wed Nov 28 07:27:03 2001 From: karl@waclawek.net (Karl Waclawek) Date: Wed Nov 28 07:27:03 2001 Subject: [Expat-discuss] Build 1.95.2 on RedHat 7.2 with XML_UNICODE Message-ID: <005001c17821$0716c070$9e539696@citkwaclaww2k> I am trying to build Expat 1.95.2 on Linux (gcc 2.96) with XML_UNICODE (and XML_UNICODE_WCHAR_T) defined. This did not work on VC++ originally, so I made a patch (patch # 476931) and got it to work nicely on Windows. Now I am trying to do the same on Linux. However, I always get UTF-8 output. Here is how I built it: 1) ran ./configure 2) edited MakeFile in lib directory: added -DXML_UNICODE_WCHAR_T (which also turns on XML_UNICODE) to DEFS added -fshort-wchar to compiler options (CFLAGS) 3) ran make I researched the wchar_t issue on the web, and it seems that the -fshort-wchar option should do the trick. I also built it with just XML_UNICODE defined, which defines XML_Char as unsigned short, with the same result. I must be overlooking something obvious, not being a C++ expert, and being new to Unix/Linux. Would be grateful for any pointers. Regards, Karl From xavier.boussin@acterna.com Thu Nov 29 08:22:11 2001 From: xavier.boussin@acterna.com (xavier.boussin@acterna.com) Date: Thu Nov 29 08:22:11 2001 Subject: [Expat-discuss] Predefined entities Message-ID: Hello I'm using Expat 1.95.2.
There are predefined entities used by XML parsers (&lt; &gt; &amp; &apos; &quot;). When using these 5 entities in my XML documents, only &lt; and &amp; are not working but the 3 others are working: Expat generates a parsing error. What is the cause of this? How can I resolve this problem? Regards Xavier From rolf@pointsman.de Thu Nov 29 08:42:02 2001 From: rolf@pointsman.de (rolf@pointsman.de) Date: Thu Nov 29 08:42:02 2001 Subject: [Expat-discuss] Predefined entities In-Reply-To: Message-ID: <200111291638.RAA16716@www.pointsman.de> On 29 Nov, xavier.boussin@acterna.com wrote: > I'm using Expat 1.95.2. There are predefined entities used by XML parsers (&lt; &gt; > &amp; &apos; &quot;). > > When using these 5 entities in my XML documents, only &lt; and &amp; are not > working but the 3 others are working: Expat generates a parsing error. Uhm? OK, I'm still on 1.95.1, but this sounds very strange. Please mail a short example of a well-formed XML document that expat doesn't parse because of the problem you've described. rolf
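A short test case of the kind requested above could look like the following sketch. It goes through Python's standard xml.parsers.expat binding rather than the C API, and it runs against whatever Expat version that module is built with, so it says nothing definite about 1.95.1 or 1.95.2. The one-element document and the helper name collect_text are made up for illustration. The second call shows the kind of input that does produce a parse error: a bare ampersand that is not part of an entity reference.

```python
import xml.parsers.expat

# Hypothetical one-element document using all five predefined entities.
DOC = b"<a>&lt; &gt; &amp; &apos; &quot;</a>"


def collect_text(data):
    """Parse data and return the concatenated character data."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # The handler may be called several times for one run of text.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


result = collect_text(DOC)
print(result)

# A raw '&' in content is not well-formed and is rejected by the parser.
try:
    collect_text(b"<a>fish & chips</a>")
    raw_ampersand_rejected = False
except xml.parsers.expat.ExpatError as exc:
    raw_ampersand_rejected = True
    print("parse error:", exc)
```

If a document fails on the entity forms themselves, comparing its bytes against a case like this one is a quick way to check whether the references were altered before they reached the parser.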