From lshen at cisco.com Thu Sep 4 16:26:29 2003
From: lshen at cisco.com (Lin Shen)
Date: Thu Sep 4 18:27:46 2003
Subject: [Expat-discuss] UTF-16 encode
Message-ID: <6677B3346233B94EBB11C060935101202BCDAC@vtg-um-e2k1.sj21ad.cisco.com>

Hi,

I have an XML document with encoding declaration "encoding="UTF-16"". I get
the error "encoding specified in XML declaration is incorrect" when feeding
the document to the parser. I thought UTF-16 is one of the builtin encodings.

thanks
lin

From steven_nikkel at ertyu.org Fri Sep 19 12:21:46 2003
From: steven_nikkel at ertyu.org (Steven Nikkel)
Date: Fri Sep 19 12:21:52 2003
Subject: [Expat-discuss] Invalid Token: Carriage return?
Message-ID:

I'm using expat to parse an xml file and am getting the following error at
the first occurrence of a carriage return in the file:

"not well-formed (invalid token)"

Am I doing something wrong?

From karl at waclawek.net Tue Sep 23 10:47:19 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Sep 23 11:04:03 2003
Subject: [Expat-discuss] About Expat 1.95.7
Message-ID: <004001c381e1$97022230$9e539696@citkwaclaww2k>

CVS is pretty much at the level of a new 1.95.7 release. It would be
nice if some of you could check out the current HEAD and start using
it. As far as I can tell it should be very stable, but just to make
sure...

Karl

From steven_nikkel at ertyu.org Thu Sep 25 11:02:09 2003
From: steven_nikkel at ertyu.org (Steven Nikkel)
Date: Thu Sep 25 11:02:24 2003
Subject: [Expat-discuss] re: Invalid Token: Carriage return?
Message-ID:

I singled out the problem. It seems to be happy parsing mac formatted
files, that is CR line terminated. But unhappy parsing pc or unix format,
CRLF or none terminated respectively.

Seems like a bug.

From karl at waclawek.net Thu Sep 25 11:09:03 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Thu Sep 25 11:10:50 2003
Subject: [Expat-discuss] re: Invalid Token: Carriage return?
References:
Message-ID: <00a201c38376$f56da8d0$9e539696@citkwaclaww2k>

> I singled out the problem. It seems to be happy parsing mac formatted
> files, that is CR line terminated. But unhappy parsing pc or unix format,
> CRLF or none terminated respectively.

I am skeptical, as this would make Expat basically unusable,
however it is used extensively in many places and this behaviour
has not been reported before.

Please post a sample file that triggers this behaviour.
Also, *how* are you using Expat?

Karl

From steven_nikkel at ertyu.org Thu Sep 25 11:45:48 2003
From: steven_nikkel at ertyu.org (Steven Nikkel)
Date: Thu Sep 25 11:45:52 2003
Subject: [Expat-discuss] re: Invalid Token: Carriage return?
In-Reply-To: <00a201c38376$f56da8d0$9e539696@citkwaclaww2k>
References: <00a201c38376$f56da8d0$9e539696@citkwaclaww2k>
Message-ID:

Well, the file isn't all that important, something simple like this does it
depending on the file format:

Parse error at line 2, character 0:
not well-formed (invalid token)

I've compiled the expat library into my program. It is used to parse XML
files used for configuration.

> I am skeptical, as this would make Expat basically unusable,
> however it is used extensively in many places and this behaviour
> has not been reported before.
>
> Please post a sample file that triggers this behaviour.
> Also, *how* are you using Expat?
>
> Karl

From fdrake at acm.org Thu Sep 25 11:55:54 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu Sep 25 11:56:14 2003
Subject: [Expat-discuss] re: Invalid Token: Carriage return?
In-Reply-To:
References: <00a201c38376$f56da8d0$9e539696@citkwaclaww2k>
Message-ID: <16243.4106.135947.431469@grendel.zope.com>

Steven Nikkel writes:
> Well, the file isn't all that important,
> something simple like this does it depending on the file format:
>
>
>
>
>

It can be a simple example like this, but what's important is that we
see the exact bytes you have on disk for a file which breaks.

Also, does the xmlwf application also report the error for the file?

> Parse error at line 2, character 0:
> not well-formed (invalid token)
>
> I've compiled the expat library into my program. It is used to parse XML
> files used for configuration.

I think what Karl wants to know (and what I think would be more
helpful), is:

- how do you load data from the file?

- how do you initialize the parser instance?

- how do you pass the data into the parser?

This is where a code snippet that shows your use of Expat, and that
duplicates the behavior you're seeing in your application, would be
*really* helpful to help us figure out what's happening, so we can
help you.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From karl at waclawek.net Thu Sep 25 12:04:24 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Thu Sep 25 12:04:44 2003
Subject: [Expat-discuss] re: Invalid Token: Carriage return?
References: <00a201c38376$f56da8d0$9e539696@citkwaclaww2k>
Message-ID: <00ca01c3837e$b0bf2260$9e539696@citkwaclaww2k>

> Well, the file isn't all that important,
> something simple like this does it depending on the file format:
>
>
>
>
>

Works fine for me, regardless of which type of line-break I am using.

> Parse error at line 2, character 0:
> not well-formed (invalid token)
>
> I've compiled the expat library into my program. It is used to parse XML
> files used for configuration.

I mean: Post the piece of code that calls the parser.
How are you feeding the input to the parser?

Karl

From steven_nikkel at ertyu.org Thu Sep 25 12:10:03 2003
From: steven_nikkel at ertyu.org (Steven Nikkel)
Date: Thu Sep 25 12:10:08 2003
Subject: [Expat-discuss] re: Invalid Token: Carriage return?
In-Reply-To: <16243.4106.135947.431469@grendel.zope.com>
References: <00a201c38376$f56da8d0$9e539696@citkwaclaww2k> <16243.4106.135947.431469@grendel.zope.com>
Message-ID:

Thanks for the clarification.

> It can be a simple example like this, but what's important is that we
> see the exact bytes you have on disk for a file which breaks.
>
> Also, does the xmlwf application also report the error for the file?

xmlwf does not report any errors. I can attach the file if necessary.

> - how do you load data from the file?
>
> - how do you initialize the parser instance?
>
> - how do you pass the data into the parser?
>
> This is where a code snippet that shows your use of Expat, and that
> duplicates the behavior you're seeing in your application, would be
> *really* helpful to help us figure out what's happening, so we can
> help you.

Here's the parsing code segment. I don't know if you want to see the
handlers as well; I'll refrain for now as they are lengthy:

#include "expat/expat.h"

#define FILE_READ_BUFFSIZE 8192

int parse_file (const char *filename) {
  /* Variables */
  FILE *fp;
  struct flock lock;
  char buff[FILE_READ_BUFFSIZE];
  int retcode = 0;

  XML_Parser parser;

  parser = XML_ParserCreate (NULL);

  if (!parser) {
    dsyslog (LOG_ERR, "Couldn't allocate memory for parser\n");
    retcode = -1;
    goto PARSER_FREE;
  } /* if */

  XML_SetDefaultHandler (parser, xml_default_handler);
  XML_SetElementHandler (parser, xml_start_element_handler,
                         xml_end_element_handler);
  XML_SetCharacterDataHandler (parser, xml_character_handler);

  /* open filename in read only mode, file must exist */
  if ((fp = fopen (filename, "r")) == NULL) {
    dsyslog (LOG_ERR, "Unable to open %s: %m\n", filename);
    retcode = -1;
    goto PARSER_FREE;
  } /* if */

  /* lock the file so no one can write to it while we read it */
  lock.l_type = F_RDLCK;
  lock.l_start = 0;
  lock.l_whence = SEEK_SET;
  lock.l_len = 0;

  if (fcntl (fileno (fp), F_SETLKW, &lock) < 0) {
    dsyslog (LOG_ERR, "Unable to lock %s: %m\n", filename);
    retcode = -1;
    goto CLOSE_FILE;
  } /* if */

  fgets (buff, sizeof (buff), fp);

  while (!feof (fp)) {
    if (XML_Parse (parser, buff, sizeof (buff), feof (fp)) == 0) {
      dsyslog (LOG_ERR, "Parse error at line %d, character %d:\n%s\n",
               XML_GetCurrentLineNumber (parser),
               XML_GetCurrentColumnNumber (parser),
               XML_ErrorString (XML_GetErrorCode (parser)));
      retcode = -1;
      goto CLOSE_FILE;
    } /* if */

    fgets (buff, sizeof (buff), fp);

  } /* while */

CLOSE_FILE:
  fclose (fp);

PARSER_FREE:
  XML_ParserFree (parser);

  return retcode;
} /* function parse_file */

From karl at waclawek.net Thu Sep 25 12:49:20 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Thu Sep 25 12:49:30 2003
Subject: [Expat-discuss] re: Invalid Token: Carriage return?
References: <00a201c38376$f56da8d0$9e539696@citkwaclaww2k> <16243.4106.135947.431469@grendel.zope.com>
Message-ID: <00f001c38384$f73fef70$9e539696@citkwaclaww2k>

> fgets (buff, sizeof (buff), fp);
>
> while (!feof (fp)) {
>   if (XML_Parse (parser, buff, sizeof (buff), feof (fp)) == 0) {
>     dsyslog (LOG_ERR, "Parse error at line %d, character %d:\n%s\n",
>              XML_GetCurrentLineNumber (parser),
>              XML_GetCurrentColumnNumber (parser),
>              XML_ErrorString (XML_GetErrorCode (parser)));
>     retcode = -1;
>     goto CLOSE_FILE;
>   } /* if */
>
>   fgets (buff, sizeof (buff), fp);
>
> } /* while */

Use fread, not fgets. We are not passing ASCII characters to Expat,
but bytes that could be encoded in various ways. You have to look
at the file as binary, not text.

It also seems that your "while (!feof (fp)) {...}" loop prevents parsing
if the first buffer read already encounters the end of the file.

Then, pass the return value from fread instead of sizeof(buffer)
to XML_Parse; do not assume the buffer is full every time.

That's just from a visual inspection. Fred is more of a C programmer
than I am, so he may have better advice.

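The three points above (read with fread, pass the number of bytes actually
read, and still make a final call when the first read already hits end of
file) can be sketched as a small C loop. This is only a sketch and not code
from the thread: feed_stream, feed_fn, struct tally and tally_feed are
hypothetical names, and the callback stands in for the call to XML_Parse so
that the block builds without Expat installed.

```c
#include <stdio.h>

#define READ_BUFFSIZE 8192

/* Hypothetical callback type: consumes len bytes from buf; is_final is
 * nonzero on the last call. Returns 0 on failure, nonzero on success. */
typedef int (*feed_fn)(void *ctx, const char *buf, int len, int is_final);

/* Reads fp in raw chunks and hands each chunk to feed.
 * Returns 0 on success, -1 on a read error or when feed reports failure. */
int feed_stream(FILE *fp, feed_fn feed, void *ctx)
{
    char buff[READ_BUFFSIZE];
    int done;

    do {
        /* fread returns the byte count actually read, which may be short */
        size_t len = fread(buff, 1, sizeof buff, fp);
        if (ferror(fp))
            return -1;
        done = feof(fp);
        /* do/while: the consumer is called even if the first read hits EOF */
        if (!feed(ctx, buff, (int)len, done))
            return -1;
    } while (!done);

    return 0;
}

/* Stand-in consumer (an assumption for this sketch): it only tallies
 * what it receives instead of parsing it. */
struct tally { long bytes; int calls; int final_calls; };

int tally_feed(void *ctx, const char *buf, int len, int is_final)
{
    struct tally *t = ctx;
    (void)buf;
    t->bytes += len;
    t->calls++;
    if (is_final)
        t->final_calls++;
    return 1;
}
```

With Expat, the callback body would presumably reduce to one call such as
XML_Parse((XML_Parser)ctx, buf, len, is_final), and the file would be opened
with "rb" rather than "r" so that no line-ending translation takes place.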
Karl

From graham-expat at simulcra.org Fri Sep 26 07:19:08 2003
From: graham-expat at simulcra.org (Graham Bennett)
Date: Fri Sep 26 07:19:14 2003
Subject: [Expat-discuss] About Expat 1.95.7
In-Reply-To: <004001c381e1$97022230$9e539696@citkwaclaww2k>
References: <004001c381e1$97022230$9e539696@citkwaclaww2k>
Message-ID: <20030926111908.GA29321@lamity.org>

On Tue, Sep 23, 2003 at 10:47:19AM -0400, Karl Waclawek wrote:
> CVS is pretty much at the level of a new 1.95.7 release. It would be
> nice if some of you could check out the current HEAD and start using
> it. As far as I can tell it should be very stable, but just to make
> sure...

Is there a list of features/fixes for the upcoming release somewhere?

cheers,

Graham.

--
Graham Bennett

From olafk2003 at web.de Fri Sep 26 04:16:24 2003
From: olafk2003 at web.de (Paul Aner)
Date: Fri Sep 26 07:48:27 2003
Subject: [Expat-discuss] Parsing a huge XML-Dokument
Message-ID: <200309260816.h8Q8GOQ17601@mailgate5.cinetic.de>

Hi there!

I am new to this group, so don't be too angry if I did not follow any
conventions of this group.

I am working on the following problem: I need to parse a huge XML document
and am bound to the Expat parser. I am using PHP for doing this and am not
able to install any other parser.

The problem is that within the XML document there are many nodes and
subnodes. The XML document will represent at least some tables of a
content management database. The content of the XML document now has to be
mapped onto an (object-oriented) class structure within my PHP script.

I am looking for some tutorials or other kinds of literature for doing so.
Google and other web resources always show me how to transform XML to HTML
with the Expat parser. This seems to be a lot easier than mapping XML code
to classes.

I am very much looking forward to your help!

Thanks in advance,

Paul

From karl at waclawek.net Fri Sep 26 09:07:47 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Fri Sep 26 09:10:12 2003
Subject: [Expat-discuss] About Expat 1.95.7
References: <004001c381e1$97022230$9e539696@citkwaclaww2k> <20030926111908.GA29321@lamity.org>
Message-ID: <001101c3842f$2eaced00$9e539696@citkwaclaww2k>

> On Tue, Sep 23, 2003 at 10:47:19AM -0400, Karl Waclawek wrote:
> > CVS is pretty much at the level of a new 1.95.7 release. It would be
> > nice if some of you could check out the current HEAD and start using
> > it. As far as I can tell it should be very stable, but just to make
> > sure...
>
> Is there a list of features/fixes for the upcoming release somewhere?

Not yet, but this is mostly a bug fix release, no new features.
We really want to get a very stable Expat 2.0 out.

A quick look at the bug tracker gives this list of "real" bugs fixed:
- 676844
- 679754
- 692878
- 692964
- 695401
- 699323

Karl