From robert.hancock1 at virgin.net Sun Apr 1 23:38:06 2007
From: robert.hancock1 at virgin.net (Robert Hancock)
Date: Sun, 01 Apr 2007 22:38:06 +0100
Subject: [Expat-discuss] large XML files in python 2.5 (expat 2.0.0, XML_LARGE_SIZE)
Message-ID: <4610263E.4080601@virgin.net>

Hi,

I'm processing some very large (>1TB) XML files with python and expat
(python 2.5 to get expat 2.0.0). The parser.CurrentByteIndex attribute
is useful to me for some statistical and debugging purposes, but in the
default Python build it seems to be limited to 2**31 bytes (signed 32 bit
int?), which I see as the index wrapping as I work through the file.

I have tried rebuilding python2.5 from source (platform: Linux x86) but
can't seem to get the XML_LARGE_SIZE option to have any effect. I've run

    ./configure CFLAGS=-DXML_LARGE_SIZE CPPFLAGS=-DXML_LARGE_SIZE

and rebuilt, but I still see the same wrapping behaviour.

Are there notes anywhere on how to enable this? Any way to test it
within Python?

Thanks in advance for any help,

Robert Hancock

From eric.slosser at v-fx.com Thu Apr 5 16:58:49 2007
From: eric.slosser at v-fx.com (Eric Slosser)
Date: Thu, 5 Apr 2007 10:58:49 -0400
Subject: [Expat-discuss] digitally signing a build of expat
Message-ID:

We ship a build of the expat library and are considering digitally
signing our build so as to avoid a Windows-Vista warning at install
time.

Our build guy is wondering if anyone in the expat world cares about
this? I don't imagine you do, but it can't hurt to ask (I hope).

From karl at waclawek.net Fri Apr 6 22:42:50 2007
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 06 Apr 2007 16:42:50 -0400
Subject: [Expat-discuss] digitally signing a build of expat
In-Reply-To:
References:
Message-ID: <4616B0CA.9010709@waclawek.net>

Eric Slosser wrote:
> We ship a build of the expat library and are considering digitally
> signing our build so as to avoid a Windows-Vista warning at install
> time.
>
> Our build guy is wondering if anyone in the expat world cares about
> this?
>

I don't know anything about library signing on Vista, but I could
imagine that this requires more than just a signature. I would assume
that the signing identity must somehow be registered or recognized by
someone (Microsoft?) as trustworthy.

Could you enlighten us as to the details and requirements?

Karl

From jeffreyholle at bellsouth.net Mon Apr 9 04:34:28 2007
From: jeffreyholle at bellsouth.net (Jeffrey Holle)
Date: Sun, 08 Apr 2007 22:34:28 -0400
Subject: [Expat-discuss] [expat] newbie question
Message-ID:

I'm using a self-built expat 2.0.0 on Ubuntu Linux 5.10.

The program that I'm writing needs to process N XML files.
Note: not nested, but in series.

My first attempt initializes the expat parser and feeds the data into it
like this (pseudo code):

    while(!files.eof())
    {
        getline(files.filename);
        ifstream file(filename);
        do {
            char buffer[256];
            file.read(buffer,256);
            XML_Parse(parser,buffer,file.gcount(),0);
        } while (!file.eof());
        XML_Parse(parser,NULL,0,1);
    }

I have not shown it, but I do have an element handler enabled and it
works as expected for the first file, but not subsequent ones.

What should I be doing to end one XML file parsing action and start another?

From nickmacd at gmail.com Mon Apr 9 05:18:21 2007
From: nickmacd at gmail.com (Nick MacDonald)
Date: Sun, 8 Apr 2007 23:18:21 -0400
Subject: [Expat-discuss] [expat] newbie question
In-Reply-To:
References:
Message-ID:

Jeffrey:

I think you need to make a new instance of the parser for each file.
What you are doing has the overall effect of concatenating the files
together one after the other, which would not be legal because there
can only be one main XML element (my terminology may be off here) per
file. Unfortunately I am not near a machine with my own code and not
on a machine that has (or can have) eXpat installed.
But in pseudo-code, do this:

    while (more files)
    {
        parser=createParser()
        open(file)
        while(!eof)
        {
            read databuff
            xmlParse(databuff)
        }
        close(file)
        finalizeParser()
    }

Good luck,
Nick

On 4/8/07, Jeffrey Holle wrote:
> I'm using expat 2.0.0 self build on Ubuntu Linux 5.10.
>
> The program that I'm writing needs to process N XML files.
> Note not nested, but in series.
>
> My first attempt initializes the expat parser and feeds the data into it
> like this (pseudo code):
>
>     while(!files.eof())
>     {
>         getline(files.filename);
>         ifstream file(filename);
>         do {
>             char buffer[256];
>             file.read(buffer,256);
>             XML_Parse(parser,buffer,file.gcount(),0);
>         } while (!file.eof());
>         XML_Parse(parser,NULL,0,1);
>     }
>
> I have not shown it, but I do have an element handler enabled and it
> works as expected for the first file, but not subsequent ones.
>
> What should I be doing to end one XML file parsing action and start another?
>
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss at libexpat.org
> http://mail.libexpat.org/mailman/listinfo/expat-discuss
>

--
Nick MacDonald
NickMacD at gmail.com

From andrelsm at iname.com Mon Apr 9 15:37:26 2007
From: andrelsm at iname.com (Andre Luis Monteiro)
Date: Mon, 09 Apr 2007 08:37:26 -0500
Subject: [Expat-discuss] [expat] newbie question
Message-ID: <20070409133726.EE9CA1F5462@ws1-2.us4.outblaze.com>

Jeffrey,

you should use

    XML_Bool XMLCALL
    XML_ParserReset(XML_Parser p, const XML_Char *encoding);

Clean up the memory structures maintained by the parser so that it may
be used again. After this has been called, parser is ready to start
parsing a new document. All handlers are cleared from the parser, except
for the unknownEncodingHandler. The parser's external state is
re-initialized except for the values of ns and ns_triplets. This
function may not be used on a parser created using
XML_ExternalEntityParserCreate; it will return XML_FALSE in that case.
Returns XML_TRUE on success.
Your application is responsible for dealing with any memory associated
with user data.

regards
André Luís

PS: take a glance at your expat-2.0.0/doc/reference.html

> ----- Original Message -----
> From: "Nick MacDonald"
> To: jeffreyholle at bellsouth.net
> Subject: Re: [Expat-discuss] [expat] newbie question
> Date: Sun, 8 Apr 2007 23:18:21 -0400
>
>
> Jeffrey:
>
> I think you need to make a new instance of the parser for each file.
> What you are doing has the overall effect of concatenating the files
> together one after the other, which would not be legal because there
> can only be one main XML element (my terminology may be off here) per
> file. Unfortunately I am not near a machine with my own code and not
> on a machine that has (or can have) eXpat installed. But in
> pseudo-code, do this:
>
>     while (more files)
>     {
>         parser=createParser()
>         open(file)
>         while(!eof)
>         {
>             read databuff
>             xmlParse(databuff)
>         }
>         close(file)
>         finalizeParser()
>     }
>
> Good luck,
> Nick
>
> On 4/8/07, Jeffrey Holle wrote:
> > I'm using expat 2.0.0 self build on Ubuntu Linux 5.10.
> >
> > The program that I'm writing needs to process N XML files.
> > Note not nested, but in series.
> >
> > My first attempt initializes the expat parser and feeds the data into it
> > like this (pseudo code):
> >
> >     while(!files.eof())
> >     {
> >         getline(files.filename);
> >         ifstream file(filename);
> >         do {
> >             char buffer[256];
> >             file.read(buffer,256);
> >             XML_Parse(parser,buffer,file.gcount(),0);
> >         } while (!file.eof());
> >         XML_Parse(parser,NULL,0,1);
> >     }
> >
> > I have not shown it, but I do have an element handler enabled and it
> > works as expected for the first file, but not subsequent ones.
> >
> > What should I be doing to end one XML file parsing action and start another?
>
> --
> Nick MacDonald
> NickMacD at gmail.com

Cheers,
André Luís

From suresh.kumar.j at gmail.com Tue Apr 10 07:27:15 2007
From: suresh.kumar.j at gmail.com (Suresh Kumar J)
Date: Tue, 10 Apr 2007 10:57:15 +0530
Subject: [Expat-discuss] Clarification on the behavior of the text handler
In-Reply-To: <88b2f6dd0704092223k1d749609l4bf9c76dfb71ce33@mail.gmail.com>
References: <88b2f6dd0704092223k1d749609l4bf9c76dfb71ce33@mail.gmail.com>
Message-ID: <88b2f6dd0704092227s2e3e8127t9d6b46f004246996@mail.gmail.com>

Hi there!

I wanted to clarify the behavior of the text handler.

Below is the description for the XML_SetCharacterDataHandler API:
------------------------------------------------------------------------
The string your handler receives is NOT zero terminated. You have to
use the length argument to deal with the end of the string. A single
block of contiguous text free of markup may still result in a sequence
of calls to this handler. In other words, if you're searching for a
pattern in the text, it may be split across calls to this handler.
------------------------------------------------------------------------

Let's say that I am passing the complete XML document to the XML_Parse()
API in a single shot. If I register a character data handler for
handling the element data, would I get the complete element text data
in a single call to my registered text handler? In other words, can I
safely assume that the first call to the text handler routine would
contain the complete text data?
Even when I pass the complete XML document to XML_Parse() in a single
shot, can the text data be split across the calls to the data handler?

Any inputs in this regard would be helpful.

--
Thanks and Regards,
Suresh Kumar J

From suresh.kumar.j at gmail.com Tue Apr 10 07:23:08 2007
From: suresh.kumar.j at gmail.com (Suresh Kumar J)
Date: Tue, 10 Apr 2007 10:53:08 +0530
Subject: [Expat-discuss] Clarification on the behavior of the text handler
Message-ID: <88b2f6dd0704092223k1d749609l4bf9c76dfb71ce33@mail.gmail.com>

Hi there!

I wanted to clarify the behavior of the text handler.

Below is the description for the XML_SetCharacterDataHandler API:
------------------------------------------------------------------------
The string your handler receives is NOT zero terminated. You have to
use the length argument to deal with the end of the string. A single
block of contiguous text free of markup may still result in a sequence
of calls to this handler. In other words, if you're searching for a
pattern in the text, it may be split across calls to this handler.
------------------------------------------------------------------------

Let's say that I am passing the complete XML document to the XML_Parse()
API in a single shot. If I register a character data handler for
handling the element data, would I get the complete element text data
in a single call to my registered text handler? In other words, can I
safely assume that the first call to the text handler routine would
contain the complete text data? Even when I pass the complete XML
document to XML_Parse() in a single shot, can the text data be split
across the calls to the data handler?

Any inputs in this regard would be helpful.
--
Thanks and Regards,
Suresh Kumar J

From nickmacd at gmail.com Tue Apr 10 17:45:19 2007
From: nickmacd at gmail.com (Nick MacDonald)
Date: Tue, 10 Apr 2007 11:45:19 -0400
Subject: [Expat-discuss] Clarification on the behavior of the text handler
In-Reply-To: <88b2f6dd0704092223k1d749609l4bf9c76dfb71ce33@mail.gmail.com>
References: <88b2f6dd0704092223k1d749609l4bf9c76dfb71ce33@mail.gmail.com>
Message-ID:

If you want robust XML processing then you absolutely *SHOULD NOT*
make any assumptions about what you will receive... you need to
concatenate all the pieces together. The most likely reason why you
would get multiple calls is for escaped text, such as &lt; and &amp;.

Try this kind of document if you want to see what I mean:

    This is my sample text with escapes
    &amp; in the middle of it &lt; which will likely
    cause multiple calls to &gt; the handler

On 4/10/07, Suresh Kumar J wrote:
> I wanted to clarify on the behavior of the text handler.
>
> Below is the description for the XML_SetCharacterDataHandler API:
> ------------------------------------------------------------------------
> The string your handler receives is NOT zero terminated. You have to
> use the length argument to deal with the end of the string. A single
> block of contiguous text free of markup may still result in a sequence
> of calls to this handler. In other words, if you're searching for a
> pattern in the text, it may be split across calls to this handler.
> ------------------------------------------------------------------------
>
> Lets say that I am passing the complete XML document to the XMLParse()
> API in a single shot. So If I register a character data handler for
> handling the element data then would I be getting the complete element
> text data in a single call to my registered text handler. In other
> words, can I safely assume that the first call to the text handler
> routine would contain the complete text data?.
> Even when I pass the complete XML document to the XMLParse() in a
> single shot, can the text data be split across the calls to the data
> handler?.

From karl at waclawek.net Tue Apr 10 18:48:48 2007
From: karl at waclawek.net (Karl Waclawek)
Date: Tue, 10 Apr 2007 12:48:48 -0400
Subject: [Expat-discuss] Clarification on the behavior of the text handler
In-Reply-To: <88b2f6dd0704092223k1d749609l4bf9c76dfb71ce33@mail.gmail.com>
References: <88b2f6dd0704092223k1d749609l4bf9c76dfb71ce33@mail.gmail.com>
Message-ID: <461BBFF0.4070004@waclawek.net>

Suresh Kumar J wrote:
> Hi there!
>
> I wanted to clarify on the behavior of the text handler.
>
> Below is the description for the XML_SetCharacterDataHandler API:
> ------------------------------------------------------------------------
> The string your handler receives is NOT zero terminated. You have to
> use the length argument to deal with the end of the string. A single
> block of contiguous text free of markup may still result in a sequence
> of calls to this handler. In other words, if you're searching for a
> pattern in the text, it may be split across calls to this handler.
> ------------------------------------------------------------------------
>
> Lets say that I am passing the complete XML document to the XMLParse()
> API in a single shot. So If I register a character data handler for
> handling the element data then would I be getting the complete element
> text data in a single call to my registered text handler. In other
> words, can I safely assume that the first call to the text handler
> routine would contain the complete text data?. Even when I pass the
> complete XML document to the XMLParse() in a single shot, can the text
> data be split across the calls to the data handler?.
>
> Any inputs in this regard would be helpful.
>

Nick is correct - you cannot assume single call-backs.
For instance, any line break in element content will cause multiple
call-backs, IIRC.
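
The splitting is easy to observe from Python's pyexpat binding (xml.parsers.expat), which wraps Expat. The following is only a sketch: the function name text_chunks and the sample document are made up for illustration, and the exact chunk boundaries are an implementation detail that should not be relied on.

```python
from xml.parsers import expat


def text_chunks(xml_bytes):
    """Return the raw pieces passed to the character data handler."""
    chunks = []
    parser = expat.ParserCreate()
    parser.buffer_text = False  # the default; set here only for emphasis
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)  # whole document in a single call
    return chunks


# One text node containing a line break and an escaped ampersand.
chunks = text_chunks(b"<a>line one\nline two &amp; more</a>")
print(len(chunks), repr("".join(chunks)))
```

Even though the whole document is handed over in one call, the handler fires more than once; only the joined pieces give the full text.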
Karl

From andrelsm at iname.com Tue Apr 10 19:00:05 2007
From: andrelsm at iname.com (Andre Luis Monteiro)
Date: Tue, 10 Apr 2007 12:00:05 -0500
Subject: [Expat-discuss] Clarification on the behavior of the text handler
Message-ID: <20070410170010.02FF81CE67F@ws1-6.us4.outblaze.com>

Nick, Kumar,

beyond that, there are some whitespace-handling intricacies, as
described in:
http://msdn2.microsoft.com/en-us/library/ms256097.aspx

eXpat splits text content (even in CDATA sections) at '\n's. Right?

Question: how can I add support for "xml:space" in my app?

Cheers,
andrelsm

> ----- Original Message -----
> From: "Nick MacDonald"
> To: "Suresh Kumar J"
> Subject: Re: [Expat-discuss] Clarification on the behavior of the text handler
> Date: Tue, 10 Apr 2007 11:45:19 -0400
>
>
> If you want robust XML processing then you absolutely *SHOULD NOT*
> make any assumptions about what you will receive... you need to
> concatenate them all together. The most likely reason why you would
> get multiple calls is for escaped text, such as &lt; and &amp;.
>
> Try this kind of document if you want to see what I mean:
>
>     This is my sample text with escapes
>     &amp; in the middle of it &lt; which will likely
>     cause multiple calls to &gt; the handler
>
>
> On 4/10/07, Suresh Kumar J wrote:
> > I wanted to clarify on the behavior of the text handler.
> >
> > Below is the description for the XML_SetCharacterDataHandler API:
> > ------------------------------------------------------------------------
> > The string your handler receives is NOT zero terminated. You have to
> > use the length argument to deal with the end of the string. A single
> > block of contiguous text free of markup may still result in a sequence
> > of calls to this handler. In other words, if you're searching for a
> > pattern in the text, it may be split across calls to this handler.
> > ------------------------------------------------------------------------
> >
> > Lets say that I am passing the complete XML document to the XMLParse()
> > API in a single shot. So If I register a character data handler for
> > handling the element data then would I be getting the complete element
> > text data in a single call to my registered text handler. In other
> > words, can I safely assume that the first call to the text handler
> > routine would contain the complete text data?. Even when I pass the
> > complete XML document to the XMLParse() in a single shot, can the text
> > data be split across the calls to the data handler?.

Cheers,
André Luís

From boris at codesynthesis.com Tue Apr 10 21:07:31 2007
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Tue, 10 Apr 2007 19:07:31 +0000 (UTC)
Subject: [Expat-discuss] Clarification on the behavior of the text handler
References: <88b2f6dd0704092223k1d749609l4bf9c76dfb71ce33@mail.gmail.com>
Message-ID:

Hi Suresh,

"Suresh Kumar J" writes:

> Lets say that I am passing the complete XML document to the XMLParse()
> API in a single shot. So If I register a character data handler for
> handling the element data then would I be getting the complete element
> text data in a single call to my registered text handler.

No, it still can be split across several calls. What you may want to do
to emulate the desired behavior is to accumulate the data in a string
buffer and then process it when you know that all the data has been
delivered, e.g., in the "end element" handler.
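
The accumulate-then-process approach described above can be sketched with Python's pyexpat binding (xml.parsers.expat). This is a minimal illustration, not a reference implementation: the function name collect_element_text and the sample document are assumptions made for the example, and text is attributed only to the innermost open element.

```python
from xml.parsers import expat


def collect_element_text(xml_bytes):
    """Return (element name, full text) pairs, joined at each end tag."""
    results = []
    stack = []  # one list of text pieces per currently open element

    def start_element(name, attrs):
        stack.append([])

    def char_data(data):
        # May be called several times for one run of text.
        if stack:
            stack[-1].append(data)

    def end_element(name):
        # All pieces for this element have been delivered by now.
        results.append((name, "".join(stack.pop())))

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    parser.Parse(xml_bytes, True)
    return results


print(collect_element_text(b"<doc><a>one &amp; two\nthree</a><b>x</b></doc>"))
```

pyexpat also offers a buffer_text attribute that merges adjacent pieces before calling the handler, but joining in the end-element handler works the same way in the C API, where no such switch is assumed here.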
hth,
-boris

--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding

From Saumya.Agarwal at netapp.com Tue Apr 17 17:13:52 2007
From: Saumya.Agarwal at netapp.com (Agarwal, Saumya)
Date: Tue, 17 Apr 2007 20:43:52 +0530
Subject: [Expat-discuss] How is SJIS encoding handled in expat?
Message-ID: <7026BCCA258BA2438F885772CA0B431307AA3A73@exbtc01.hq.netapp.com>

Hi,

I have a scenario in which the data on the server is encoded in SJIS.
The client requests this data from the server through an API, and the
server sends the output as XML parsed by the expat parser.

Here is the input and output -

1193746vol0

OUTPUT:

vol0199699985042a93940-4ed9-11db-ba89-00a09803281611937461/vol/vol0/home/???????? ??????.doc

As seen above, the client declares the document encoding to be
SHIFT-JIS. The server returns the proper data (it seems to be SJIS, as
Japanese characters are represented correctly in the output) but the
encoding declared in the output document is UTF-8.
Now, the strange part is that even if the client declares the document
encoding to be UTF-8 in the input, the server behavior is just the same!

Here are my questions -
1. Does expat support SJIS encoding?
2. If yes, then how does it know the data is SJIS encoded and when does
   it call the appropriate handler?
3. Is the output returned by expat the SJIS encoded data, or does it
   convert the data to UTF-8 and return it?
4. Is there a way through which expat can declare to the client that the
   data is actually SJIS and not UTF-8? We have another parser on the
   client side (libxml2) which fails with a parsing error when the XML
   output from expat is given to it, as the data is Japanese while the
   encoding declaration is UTF-8.

Thanks,
Saumya

From karl at waclawek.net Tue Apr 17 18:14:30 2007
From: karl at waclawek.net (Karl Waclawek)
Date: Tue, 17 Apr 2007 12:14:30 -0400
Subject: [Expat-discuss] How is SJIS encoding handled in expat?
In-Reply-To: <7026BCCA258BA2438F885772CA0B431307AA3A73@exbtc01.hq.netapp.com>
References: <7026BCCA258BA2438F885772CA0B431307AA3A73@exbtc01.hq.netapp.com>
Message-ID: <4624F266.1040304@waclawek.net>

Agarwal, Saumya wrote:
> Hi,
>
> I have a scenario in which the encoding of the data on the server is in
> SJIS format. The client requests this data from the server through an
> API, the server sends the output in XML parsed by the expat parser.
>
> Here is the input and output -
>
> 1193746vol0
>
> OUTPUT:
>
> vol0199699985042a93940-4ed9-11db-ba89-00a09803281611937461/vol/vol0/home/???????? ??????.doc
>
> As seen above, the client declares the document encoding to be
> SHIFT-JIS. The server returns the proper data (seems like SJIS, as
> japanese characters are represented correctly in the output ) but the
> encoding declared in the output document is UTF-8.
> Now, the strange part is that even if the client declares the document
> encoding to be UTF-8 in the input, the server behavior is just the same!
>
> Here are my questions -
> 1. Does expat support SJIS encoding?
>

Not by default. You must register an "unknownEncodingHandler" that can
handle SHIFT-JIS. Out of the box, Expat only supports ASCII, ISO8859-1,
UTF-8 and UTF-16 for input.
For an example, look at patch #888879 on the Expat web site.

> 2. If yes, then how does it know the data is SJIS encoded and when does
> it call the appropriate handler?
>

Normally, Expat would reject the input document. Do you know if there is
an "unknownEncodingHandler"? Or more likely, the

    XML_ParserCreate(const XML_Char *encoding);

function is called by passing a recognized encoding (instead of null).
This would override the encoding declaration and make Expat treat the
document as if it were thus encoded.

> 3. Is the output returned by expat, the SJIS encoded data, or does it
> convert the data to UTF-8 and return it?
>

Expat always returns either UTF-8 or UTF-16, depending on how it was built.
My guess is, the server forces one of the built-in encodings when
calling XML_ParserCreate(const XML_Char *encoding). This can work as
long as there is no sequence of bytes that represents an invalid code
point in that encoding.

> 4. Is there a way through which expat can declare to the client that the
> data is actually SJIS and not UTF-8? We have another parser on the
> client side (libxml2) which fails with a parsing error when the XML
> output from expat is given to it, as the data is japanese while the
> encoding declaration is UTF-8.
>

No, Expat always returns UTF-8 or UTF-16. I think there is an error on
the server side. Since you say the characters returned by Expat are
actually SJIS, I assume that the server forces Expat to treat it as one
of the built-in encodings (most likely UTF-8).

Karl

From omer.anjum at tut.fi Thu Apr 19 12:46:02 2007
From: omer.anjum at tut.fi (Omer Anjum)
Date: Thu, 19 Apr 2007 13:46:02 +0300
Subject: [Expat-discuss] Expat For Embedded System
Message-ID: <20070419134602.p4t905z54o4s0wsk@webmail.tut.fi>

Dear Expat Forum members,

I am working on an embedded system and need a C or C++ based XML
parser with a size of less than 100Kb. Can you tell me whether Expat
can help me solve this problem?

Regards
Omer

From jeffreyholle at bellsouth.net Thu Apr 19 18:21:27 2007
From: jeffreyholle at bellsouth.net (Jeffrey Holle)
Date: Thu, 19 Apr 2007 12:21:27 -0400
Subject: [Expat-discuss] Handling include statements in XML files
Message-ID:

The XML files which I am attempting to parse with expat 2.0.0 have
include statements which I presently can handle.

I've attempted to create a new parser via the
XML_ExternalEntityParserCreate function, but I am not sure what the
"context" parameter needs to be. What should it be?

The included XML file is of the same type as the original, so I want all
existing handlers to work with the included file.
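
On the "context" question: as far as the Expat reference describes it, the context is the string Expat itself passes to the external entity reference handler, handed on unchanged, and the child parser inherits the parent's handlers. Below is a sketch using Python's pyexpat binding, assuming the includes are declared as external general entities (which may not match the files in question); the documents, the INCLUDED mapping, and the function name element_names are made up for illustration.

```python
from xml.parsers import expat

# Hypothetical documents, invented for this sketch.
MAIN = (b'<!DOCTYPE doc [<!ENTITY inc SYSTEM "inc.xml">]>'
        b'<doc><item>main</item>&inc;</doc>')
INCLUDED = {"inc.xml": b"<item>included</item>"}


def element_names(main, files):
    """Parse main, following external entities, and list start tags."""
    names = []
    parser = expat.ParserCreate()

    def start_element(name, attrs):
        names.append(name)

    def external_entity_ref(context, base, system_id, public_id):
        # Pass the handler's own context argument straight through.
        child = parser.ExternalEntityParserCreate(context)
        # The child reuses the parent's handlers, so start_element
        # also sees the elements of the included entity.
        child.Parse(files[system_id], True)
        return 1  # non-zero tells Expat the entity was handled

    parser.StartElementHandler = start_element
    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(main, True)
    return names


print(element_names(MAIN, INCLUDED))
```

In the C API the corresponding calls would be XML_SetExternalEntityRefHandler and XML_ExternalEntityParserCreate, with the handler's context argument passed along in the same way.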
From karl at waclawek.net Thu Apr 19 20:03:38 2007
From: karl at waclawek.net (Karl Waclawek)
Date: Thu, 19 Apr 2007 14:03:38 -0400
Subject: [Expat-discuss] Expat For Embedded System
In-Reply-To: <20070419134602.p4t905z54o4s0wsk@webmail.tut.fi>
References: <20070419134602.p4t905z54o4s0wsk@webmail.tut.fi>
Message-ID: <4627AEFA.1010101@waclawek.net>

Omer Anjum wrote:
> Dear Epat Forum members
>
> I am working on an embedded system and needs a C or C++ based XML
> Pareser with size less then 100Kb. Can you tell me that is Expat able
> to help me in solving my problem.
> Regards
>

I don't think Expat's size is < 100KB on any system, but there are
compile-time options to minimize the size of Expat (disabling certain
features, etc. - read the documentation) and there are also compiler
options to keep the size smaller (for a little less performance).
Maybe you can trim it down to about 100KB that way.

Karl

From nickmacd at gmail.com Thu Apr 19 21:12:25 2007
From: nickmacd at gmail.com (Nick MacDonald)
Date: Thu, 19 Apr 2007 15:12:25 -0400
Subject: [Expat-discuss] Handling include statements in XML files
In-Reply-To:
References:
Message-ID:

Jeffrey:

I think you need to supply some sample XML here to make it clear what
you're trying to do.

If I understand you properly, you have some sort of embedded XML tag
that you want to process to get another XML file that should be
considered part of the original file? Like so?

If this is the case, then you really want to just create a new parser
to start parsing the included file right at the point that you get the
callback for the end tag. You would then read that file for the
duration of the file, and then when it concludes successfully, you
would return to parsing the original file. The only downside of this
approach is that the included XML file would need to be valid; it
couldn't be a partial file.
If you needed to handle partial files, you'd probably need to generate an
interim file which would include the content from all the files
included, and then you would process that interim file and then
presumably delete it as part of the clean up. Thus, in that case, you
would have two steps:

1. copy the input file to a new file
   while include statements found in new file
       parse the new file for includes
       when include statement is found merge the two files into one new file
   end
2. parse the final new file

Good luck with your project...
Nick

On 4/19/07, Jeffrey Holle wrote:
> The XML files which I am attempting to parse with expat 2.0.0 have
> include statements which I presently can handle.
>
> I've attempted to create a new parser via the
> XML_ExternalEntityParserCreate function, but I am not sure what the
> "context" parameter needs to be. What should it be?
>
> The include XML file is of the same type as the original, so I want all
> exiting handlers to work with the included file.
>

--
Nick MacDonald
NickMacD at gmail.com

From benski at nullsoft.com Thu Apr 19 21:12:12 2007
From: benski at nullsoft.com (Ben Allison)
Date: Thu, 19 Apr 2007 15:12:12 -0400 (EDT)
Subject: [Expat-discuss] Expat For Embedded System
In-Reply-To: <4627AEFA.1010101@waclawek.net>
References: <20070419134602.p4t905z54o4s0wsk@webmail.tut.fi> <4627AEFA.1010101@waclawek.net>
Message-ID: <1145.75.75.86.10.1177009932.squirrel@mail.winamp.com>

> Omer Anjum wrote:
>> Dear Epat Forum members
>>
>> I am working on an embedded system and needs a C or C++ based XML
>> Pareser with size less then 100Kb. Can you tell me that is Expat able
>> to help me in solving my problem.
>> Regards
>>
>
> I don't think Expat's size is < 100KB on any system, but there are
> compile time options
> to minimize the size of Expat (disabling certain features, etc. - read
> the documentation)
> and there are also compiler options to keep the size smaller ( for a
> little less performance).
> Maybe you can trim it down to about 100KB that way.

If you statically link expat, and have a reasonably smart compiler, it
should come in at less than 100kb (we get it down to about 75kb even
with UTF-16 and namespace support enabled). A dynamic library (e.g. DLL
or so) tends to be larger because the compiler can't be sure which code
will and won't be called. If you need a dynamic library (maybe because
multiple programs use the library) then you should be able to trim some
code from unused areas by tweaking your compiler and linker settings.

From santhoshpremkumar at gmail.com Fri Apr 20 07:29:46 2007
From: santhoshpremkumar at gmail.com (Santhosh Premkumar)
Date: Fri, 20 Apr 2007 10:59:46 +0530
Subject: [Expat-discuss] Build on x86_64 AMD cross compilation
Message-ID: <54ac2f0b0704192229g4790355fnd5c9ddfe6ee36047@mail.gmail.com>

Hi

I have a problem building the Expat library with the VC++ 2005 compiler
for AMD x86_64.

I need to build the library using this compiler. I tried to configure
using automake and run make, but it reports that the built library is
invalid.

Has anyone built this with the VC++ compiler using a makefile (I am
running in a shell prompt)? Please guide me.
Thanks
Santhosh

From santhoshpremkumar at gmail.com Fri Apr 20 07:46:03 2007
From: santhoshpremkumar at gmail.com (Santhosh Premkumar)
Date: Fri, 20 Apr 2007 11:16:03 +0530
Subject: [Expat-discuss] Build on x86_64 AMD cross compilation
In-Reply-To: <7C83A8A6B56D3A478333B1DF47E185868E524B@MPBABGEX01.corp.mphasis.com>
References: <54ac2f0b0704192229g4790355fnd5c9ddfe6ee36047@mail.gmail.com> <7C83A8A6B56D3A478333B1DF47E185868E524B@MPBABGEX01.corp.mphasis.com>
Message-ID: <54ac2f0b0704192246k38c5997cw64876a11c450b9d4@mail.gmail.com>

Hi

I couldn't find the documents. Could you please provide me the link?

I will elaborate on my requirements:

1. I have downloaded the Expat 2 library files.
2. I tried to build with the Microsoft Visual Studio compiler
   (C:\Program Files\Microsoft Visual Studio 8\VC\bin\x86_amd64).

This is being done in order to port the Expat library to 64-bit.

I did these steps:

1. ./configure --host=x86 --target=x86_64 CC=CL
2. make buildlib
3. make install

While installing, the xmlwf present in the expat folder (the one with
the test files) shows the warning "Invalid library format: Ignored".
The library built with the x86_64 compiler is found to be wrong.

I do not know where I went wrong (in the config or in the linker).

If you have any ideas, please share them.

Thanks
Santhosh
Chennai
Driver Testing and Development
Chennai
9884937329

On 4/20/07, Mukesh Kumar wrote:
>
> To run the Expat library, you should also install ActivePerl.
> Please go through the document provided by Expat INC. and if you don't
> understand any step, let us know...
>
> Regards,
> Mukesh Kumar,
> Sr.Software Engineer,
> Bangalore,
> India.
> 9342906419 (M).
>
> -----Original Message-----
> From: expat-discuss-bounces+mukesh.s=mphasis.com at libexpat.org
> [mailto:expat-discuss-bounces+mukesh.s=mphasis.com at libexpat.org] On
> Behalf Of Santhosh Premkumar
> Sent: Friday, April 20, 2007 11:00 AM
> To: expat-discuss at libexpat.org
> Subject: [Expat-discuss] Build on x86_64 AMD cross compilation
>
> Hi
>
> I have a problem in building Expat Library in VC++ 2005 for AMD x86_64
> bit compiler.
>
> I need to build the library using this compiler. I tried to configure
> using automake and run make, but it issues build library is invalid.
>
> Have any one run this on VC++ compiler using make file ( I am running in
> a shell prompt). Please guide me to run.
>
> Thanks
> Santhosh

From Saumya.Agarwal at netapp.com Fri Apr 20 10:36:33 2007
From: Saumya.Agarwal at netapp.com (Agarwal, Saumya)
Date: Fri, 20 Apr 2007 14:06:33 +0530
Subject: [Expat-discuss] How is SJIS encoding handled in expat?
Message-ID: <7026BCCA258BA2438F885772CA0B431307C4E5CC@exbtc01.hq.netapp.com>

Thanks Karl.

The problem was that the XML_ParserCreate(const XML_Char *encoding)
function was being called with UTF-8, which was overriding the encoding
declaration, as you suspected.

> Not by default. You must register an "unknownEncodingHandler" that can
> handle SHIFT-JIS.
> Out of the box, Expat only supports ASCII, ISO8859-1, UTF-8 and UTF-16
> for input.
> For an example, look at patch #888879 on the Expat web site.

Where can I find an encoding handler which can handle SHIFT-JIS? Will
expat be able to support both UTF-8 and SHIFT-JIS encoding at the same
time if I register such a handler?
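
The override described at the top of this message can be reproduced with Python's pyexpat binding, where ParserCreate(encoding) plays the role of XML_ParserCreate(encoding). This is a sketch only: the sample document and the function name element_text are made up, and ISO-8859-1 stands in for SHIFT-JIS because it is one of Expat's built-in encodings.

```python
from xml.parsers import expat

# Declared as ISO-8859-1, but the payload bytes are UTF-8 for U+00E9.
DOC = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xc3\xa9</a>'


def element_text(xml_bytes, forced_encoding=None):
    """Return all character data, optionally forcing the input encoding."""
    pieces = []
    # A non-None encoding overrides the document's own declaration.
    parser = expat.ParserCreate(forced_encoding)
    parser.CharacterDataHandler = pieces.append
    parser.Parse(xml_bytes, True)
    return "".join(pieces)


print(ascii(element_text(DOC)))           # declaration is honoured
print(ascii(element_text(DOC, "UTF-8")))  # declaration is overridden
```

With no encoding forced, the two payload bytes come back as two Latin-1 characters; with UTF-8 forced, they decode to a single character, regardless of what the XML declaration says.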
Thanks, Saumya -----Original Message----- From: Karl Waclawek [mailto:karl at waclawek.net] Sent: Tuesday, April 17, 2007 9:45 PM To: expat-discuss at libexpat.org Subject: Re: [Expat-discuss] How is SJIS encoding handled in expat? Agarwal, Saumya wrote: > Hi, > > I have a scenario in which the data on the server is encoded in SJIS. The client requests this data from the server through an API, and the server sends the output in XML parsed by the expat parser. > > Here is the input and output - > > 'file:/etc/netapp_filer.dtd'> xmlns="http://www.netapp.com/filer/admin > " > version="1.0">1193746 lume-name>vol0 > > OUTPUT: > '/na_admin/netapp_filer.dtd'> xmlns='http://www.netapp.com/filer/admin'> > status="passed">vol01996999850 > 42a93940-4ed9-11db-ba89-00a098032816 e-uuid>11937461 r-of-parents>/vol/vol0/hom > e/???????? > ??????.doc etapp> > > > As seen above, the client declares the document encoding to be SHIFT-JIS. The server returns the proper data (seems like SJIS, as Japanese characters are represented correctly in the output) but the encoding declared in the output document is UTF-8. > Now, the strange part is that even if the client declares the document encoding to be UTF-8 in the input, the server behavior is just the same! > > Here are my questions - > 1. Does expat support SJIS encoding? > Not by default. You must register an "unknownEncodingHandler" that can handle SHIFT-JIS. Out of the box, Expat only supports ASCII, ISO8859-1, UTF-8 and UTF-16 for input. For an example, look at patch #888879 on the Expat web site. > 2. If yes, then how does it know the data is SJIS encoded and when does it call the appropriate handler? > Normally, Expat would reject the input document. Do you know if there is an "unknownEncodingHandler"? Or more likely, the XML_ParserCreate(const XML_Char *encoding); function is called by passing a recognized encoding (instead of NULL).
This would override the encoding declaration and make Expat treat the document as if it were encoded that way. > 3. Is the output returned by expat, the SJIS encoded data, or does it convert the data to UTF-8 and return it? > Expat always returns either UTF-8 or UTF-16, depending on how it was built. My guess is, the server forces one of the built-in encodings when calling XML_ParserCreate(const XML_Char *encoding). This can work as long as there is no sequence of bytes that represents an invalid code point in that encoding. > 4. Is there a way through which expat can declare to the client that the data is actually SJIS and not UTF-8? We have another parser on the client side (libxml2) which fails with a parsing error when the XML output from expat is given to it, as the data is Japanese while the encoding declaration is UTF-8. > No, Expat always returns UTF-8 or UTF-16. I think there is an error on the server side. Since you say the characters returned by Expat are actually SJIS, I assume that the server forces Expat to treat it as one of the built-in encodings (most likely UTF-8). > Karl > > From karl at waclawek.net Fri Apr 20 14:57:39 2007 From: karl at waclawek.net (Karl Waclawek) Date: Fri, 20 Apr 2007 08:57:39 -0400 Subject: [Expat-discuss] Build on x86_64 AMD cross compilation In-Reply-To: <54ac2f0b0704192229g4790355fnd5c9ddfe6ee36047@mail.gmail.com> References: <54ac2f0b0704192229g4790355fnd5c9ddfe6ee36047@mail.gmail.com> Message-ID: <4628B8C3.3030902@waclawek.net> Santhosh Premkumar wrote: > Hi > > I have a problem in building Expat Library in VC++ 2005 for AMD x86_64 bit > compiler. > > Why don't you use the VC++ 6.0 project files (.dsw, .dsp)? They can be opened and upgraded in VC++ 2005.
Karl From karl at waclawek.net Fri Apr 20 15:10:25 2007 From: karl at waclawek.net (Karl Waclawek) Date: Fri, 20 Apr 2007 09:10:25 -0400 Subject: [Expat-discuss] How is SJIS encoding handled in expat? In-Reply-To: <7026BCCA258BA2438F885772CA0B431307C4E5CC@exbtc01.hq.netapp.com> References: <7026BCCA258BA2438F885772CA0B431307C4E5CC@exbtc01.hq.netapp.com> Message-ID: <4628BBC1.7050408@waclawek.net> Agarwal, Saumya wrote: > Thanks Karl. The problem was that XML_ParserCreate(const XML_Char *encoding); function was being called by passing UTF-8 which was overriding the encoding declaration, as you suspected. > > >> Not by default. You must register an "unknownEncodingHandler" that can handle SHIFT-JIS. >> Out of the box, Expat only supports ASCII, ISO8859-1 , UTF-8 and UTF-16 for input. >> For an example, look at patch #888879 on the Expat web site. >> > > Where can I find an encoding handler which can handle SHIFT-JIS? Will expat be able to support both UTF-8 and SHIFT-JIS encoding at the same time if I register such an handler? > > I don't know of a publicly available one. You could roll your own, using the docs and the example I mentioned above (for GB2312), or you could simply convert the SHIFT-JIS input to UTF-8 before parsing. Just Google for it; there may be some open-source code available. Karl From rcruz at cpsinet.com Mon Apr 23 15:12:34 2007 From: rcruz at cpsinet.com (Robert Cruz) Date: Mon, 23 Apr 2007 08:12:34 -0500 Subject: [Expat-discuss] UnixWare port not part of the standard distribution? Message-ID: <002c01c785a9$11d4bb70$d105000a@cpsinet.com> I'm trying to work with an external vendor whose platform we depend on in many ways. We were looking into utilizing their XML capabilities, which use the expat library; however, we kept receiving an error message telling us that we did not have expat installed. We run SCO UnixWare 7.14, and I have installed a package for expat 1.95.1 that I got from SCO's skunkware website.
I installed that port, but I still received the same error message from the third party platform. When I contacted them about the error, they stated that the UnixWare port must not be a part of the standard distribution of the library that they are using. I've looked through some of the files in expat 2, and have seen config code for UnixWare, so I'm not 100% convinced that what they are saying is accurate. However, I'm obviously not a developer on the expat project, and therefore my opinion doesn't really count for much. I am curious to see if their position has any merit. If it does, what needs to be done to fold any UnixWare specific development into the standard distribution of the expat library? Thanks, Robert Cruz Senior Programmer CPSI 6600 Wall Street Mobile, Alabama 36695 Tel: 251.639.8100 Fax: 251.639.8214 http://www.cpsinet.com From stevencvernon at comcast.net Sun Apr 29 07:09:48 2007 From: stevencvernon at comcast.net (Steve Vernon) Date: Sat, 28 Apr 2007 22:09:48 -0700 Subject: [Expat-discuss] expat Memory Footprint? Message-ID: <005d01c78a1c$9c4603a0$6402a8c0@dell4700> What is the memory footprint for expat? How does it grow with the input document? I would like to know what is the starting size, then what "variables" it depends upon and what is the factor of expansion for each such variable. (Of course, I am talking about non-static data size - compiled code size is not likely to be an issue.) For example, I assume that for parsing that we need to at minimum keep track of all the open start tags (ones that so far have seen no corresponding end tags) and that these are stored in XML_Chars. But perhaps there is other overhead per open start tag. And, of course, there could be information stored for DTD items. The best estimate of the size would be appreciated. At most a factor of 2 over the reality. It seems that XML_MIN_SIZE affects some size, but am not certain that it affects data size - again, we won't likely have issues with code size. 
We will likely want to use most features of expat, so we don't want to turn off features. We have a constrained memory environment and will need to do many parallel parses. With enough parallelism the total memory footprint could get prohibitive if expat is not frugal. Essentially every byte may count. Thanks in advance.
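A rough way to approach the question above is to measure rather than estimate. The sketch below is in Python, not C: it uses the standard-library xml.parsers.expat binding together with tracemalloc to report the peak traced allocation while parsing synthetic documents of increasing nesting depth. It rests on an assumption that should be checked for your build: that CPython's pyexpat hands Expat an allocator that tracemalloc can observe. If it does not, the figures will undercount. The helper names nested_document and peak_parse_bytes are made up for illustration, and the numbers include overhead from the binding itself, so treat them as indicative only. In C, the corresponding approach would be to pass counting allocation functions to XML_ParserCreate_MM.

```python
import tracemalloc
import xml.parsers.expat


def nested_document(depth: int) -> bytes:
    """Build a synthetic document with `depth` nested <e> elements.

    The same element name is reused at every level so that nesting depth
    is the only variable that changes between documents.
    """
    return b"<root>" + b"<e>" * depth + b"</e>" * depth + b"</root>"


def peak_parse_bytes(document: bytes, chunk_size: int = 256) -> int:
    """Return the peak traced allocation, in bytes, while parsing `document`.

    Assumption: the interpreter routes Expat's allocations through an
    allocator that tracemalloc hooks. The result also includes the parser
    object and the small chunk buffers created by this function.
    """
    tracemalloc.start()
    try:
        # No handlers are registered, so no Python callbacks add to the total.
        parser = xml.parsers.expat.ParserCreate()
        # Feed the input in small pieces, as a streaming reader would.
        for start in range(0, len(document), chunk_size):
            parser.Parse(document[start:start + chunk_size], False)
        parser.Parse(b"", True)  # signal end of input
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    for depth in (1, 100, 10_000):
        print(depth, peak_parse_bytes(nested_document(depth)))
```

Comparing the reported peaks for a few depths gives a per-open-tag estimate. Repeating the experiment with longer element names, attributes, or a DTD would show how each of those variables contributes.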