From egil at kvaleberg.com Mon Jul 5 14:32:58 2010
From: egil at kvaleberg.com (Egil Kvaleberg)
Date: Mon, 05 Jul 2010 14:32:58 +0200
Subject: [Expat-discuss] Lack of symmetry: How to create XML
Message-ID: <4C31D0FA.1020600@kvaleberg.com>

It has always occurred to me that while eXpat is a very elegant and simple solution to the task of parsing XML, AFAIK no parallel and obvious way of generating XML exists.

When the question is raised, I often see it suggested that one employs a "DIY" approach of generating XML through fprintf() or similar. While the task of generating XML is much more trivial than parsing, use of fprintf() is at best highly inelegant and asymmetric. Even worse, it exposes the user to the problems of character escaping, character encoding and syntactic correctness that XML was intended to isolate you from in the first place.

So, without further ado, let me hereby suggest that eXpat is extended by a very lightweight set of functions to also generate XML. The additional overhead to the existing library is in fact entirely minimal. If there is interest, I would be more than happy to submit some code.

The suggestion below I believe follows the ideas of the simplicity of the parsing side. There is one callback handler that is invoked at any time the buffer should be flushed, and the generator functions are entirely straightforward.

This suggestion covers only the most basic functions; support for more sophisticated functions can readily be added later.

Sincerely,
Egil

----------------------------------CUT HERE-----------------------------------
struct XML_GenStruct;
typedef struct XML_GenStruct *XML_Generator;

/* XML generator buffer handler.
   Will be called whenever a buffer is ready.
   userData is as defined by XML_GenUserData(), or NULL by default.
   isFinal is set for the final buffer, no further calls will be done.
   len may in some cases be zero.
*/
typedef void (*XML_GenBufferHandler)(XML_Generator g, void *userData,
                                     const char *s, int len, int isFinal);

/* Create an XML generator, specifying a buffer handler.
   If buf is NULL, a buffer of size buflen will be allocated and managed for you.
   Otherwise, the buffer you provide can only be released after XML_GenFree() has returned. */
XML_Generator XML_GenCreate(const XML_Char *encoding, XML_GenBufferHandler genbuf,
                            char *buf, int buflen);

/* XML generator done. After this, XML_Generator g must not be used */
void XML_GenFree(XML_Generator g);

/* Generate a starting element.
   Must always be paired with a subsequent XML_GenEndElement() */
void XML_GenStartElement(XML_Generator g, const XML_Char *name);

/* Generate an attribute pair for immediately preceding XML_GenStartElement().
   Call once for every attribute. */
void XML_GenElementAttribute(XML_Generator g, const XML_Char *attr_name,
                             const XML_Char *attr_value);

/* Generate character data for XML_GenStartElement().
   May be called more than once. */
void XML_GenCharacterData(XML_Generator g, const XML_Char *s);

/* End element generation.
   Name must be the same as for the XML_GenStartElement() call */
void XML_GenEndElement(XML_Generator g, const XML_Char *name);

/* Insert a comment into the XML. */
void XML_GenComment(XML_Generator g, const XML_Char *cmnt);

/* Specify the userData for XML generator.
*/
void XML_GenUserData(XML_Generator g, void *userData);
----------------------------------CUT HERE-----------------------------------

--
Company: Kvaleberg AS
Office: +47 22 44 31 75
Mobile: +47 920 22 780
Fax: +47 22 44 46 77
Web: http://www.kvaleberg.com/

From fdrake at acm.org Tue Jul 6 18:00:01 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 6 Jul 2010 12:00:01 -0400
Subject: [Expat-discuss] Lack of symmetry: How to create XML
In-Reply-To: <4C31D0FA.1020600@kvaleberg.com>
References: <4C31D0FA.1020600@kvaleberg.com>
Message-ID:

On Mon, Jul 5, 2010 at 8:32 AM, Egil Kvaleberg wrote:
> So, without further ado, let me hereby suggest that eXpat is extended by
> a very lightweight set of functions to also generate XML. The additional
> overhead to the existing library is in fact entirely minimal.

I concur that something better than fprintf would be a win for dealing with generation. I'm also happy to defer to others for what a good C API for that would look like, since I don't use eXpat from C for applications.

I will note two things, which you've probably already thought about:

- many applications consume XML without ever generating it, and

- one of eXpat's goals is to be gentle on the memory footprint.

Given this, I'd rather see someone come up with a *separate* library for XML generation; there are also applications out there that want to generate XML without consuming it, and they could similarly benefit from a decent library for the purpose.

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."
--Albert Einstein

From robert.bielik at xponaut.se Tue Jul 6 18:19:46 2010
From: robert.bielik at xponaut.se (Robert Bielik)
Date: Tue, 06 Jul 2010 18:19:46 +0200
Subject: [Expat-discuss] Lack of symmetry: How to create XML
In-Reply-To:
References: <4C31D0FA.1020600@kvaleberg.com>
Message-ID: <4C3357A2.2080603@xponaut.se>

Fred Drake wrote:
> Given this, I'd rather see someone come up with a *separate* library for
> XML generation; there are also applications out there that want to
> generate XML without consuming it, and they could similarly benefit
> from a decent library for the purpose.

+1

/Rob

From fdrake at acm.org Tue Jul 6 18:59:09 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 6 Jul 2010 12:59:09 -0400
Subject: [Expat-discuss] Emails from mpcustomer.com
Message-ID:

Someone on this list (or their employer) has some sort of auto-responder set up that sends an email to anyone who sends mail to the list. The emails have a reply-to: support at mpcustomer.com, but are sent with a from: header that implicates (incorrectly) whoever started the thread.

If you think this might be you, please, make it stop. I don't see anyone subscribed to the list with an address in the mpcustomer.com domain.

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From karl at waclawek.net Tue Jul 6 20:25:55 2010
From: karl at waclawek.net (Karl Waclawek)
Date: Tue, 06 Jul 2010 14:25:55 -0400
Subject: [Expat-discuss] Lack of symmetry: How to create XML
In-Reply-To:
References: <4C31D0FA.1020600@kvaleberg.com>
Message-ID: <4C337533.9010302@waclawek.net>

On 06/07/2010 12:00 PM, Fred Drake wrote:
> I concur that something better than fprintf would be a win for dealing
> with generation. I'm also happy to defer to others for what a good C
> API for that would look like, since I don't use eXpat from C for
> applications.
>
> I will note two things, which you've probably already thought about:
>
> - many applications consume XML without ever generating it, and
>
> - one of eXpat's goals is to be gentle on the memory footprint.
>
> Given this, I'd rather see someone come up with a *separate* library for
> XML generation; there are also applications out there that want to
> generate XML without consuming it, and they could similarly benefit
> from a decent library for the purpose.
>
>

I think Tim Bray wrote such an app: http://www.tbray.org/ongoing/genx/docs/Guide.html

From the requirements you will see that it is not such a simple undertaking. (I once wrote one myself in Delphi).

Karl

From marco.maggi-ipsu at poste.it Wed Jul 7 10:58:03 2010
From: marco.maggi-ipsu at poste.it (Marco Maggi)
Date: Wed, 07 Jul 2010 10:58:03 +0200
Subject: [Expat-discuss] Emails from mpcustomer.com
In-Reply-To: marco@localhost (Fred Drake's message of "Tue, 6 Jul 2010 12:59:09 -0400")
References:
Message-ID: <87tyobis84.fsf@rapitore.luna>

"Fred Drake" wrote:
> Someone on this list (or their employer) has some sort of
> auto-responder set up that sends an email to anyone who
> sends mail to the list. The emails have a reply-to:
> support at mpcustomer.com, but are sent with a from: header
> that implicates (incorrectly) whoever started the thread.

The same happened on the gnupg-users list, try a search for "gnupg-users mpcustomer" and browse the results for indications about how it was solved.

HTH
--
Marco Maggi

From janezz55 at gmail.com Sun Jul 11 05:21:01 2010
From: janezz55 at gmail.com (Janez Zemva)
Date: Sun, 11 Jul 2010 05:21:01 +0200
Subject: [Expat-discuss] binary xml
Message-ID:

One of the methods to encode binary data into an XML document is the CDATA method, explained on this page:

http://articles.techrepublic.com.com/5100-10878_11-1050529.html

However, I notice that expat returns only the CDATA characters up to, but not including, the first control character it encounters.
This may be fine according to the XML 1.0 standard, but is there a way around this, even though we don't follow the standard anymore? I'd like to get everything between the two CDATA tags. Please, don't suggest the base64 encoding or some other encoding, it is not possible for me to use those in my project.

From fdrake at acm.org Sun Jul 11 20:20:42 2010
From: fdrake at acm.org (Fred Drake)
Date: Sun, 11 Jul 2010 14:20:42 -0400
Subject: [Expat-discuss] binary xml
In-Reply-To:
References:
Message-ID:

On Sat, Jul 10, 2010 at 11:21 PM, Janez Zemva wrote:
> However, I notice, that expat returns only the CDATA characters up to,
> but not including, the first control character it encounters.

Expat supports XML 1.0. If your payload doesn't conform, there's no reason to expect Expat, or any other XML parser, to accept it.

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind." --Albert Einstein

From nickmacd at gmail.com Sun Jul 11 20:53:01 2010
From: nickmacd at gmail.com (Nick MacDonald)
Date: Sun, 11 Jul 2010 15:53:01 -0300
Subject: [Expat-discuss] binary xml
In-Reply-To:
References:
Message-ID:

What Fred says is precisely correct... however I have an idea for a possible workaround:

Since *YOU* control the supply of data to eXpat... is there any way you can recognize your scenario and have the binary data shunted to a different buffer instead of supplying it to eXpat?

Nick

On Sun, Jul 11, 2010 at 3:20 PM, Fred Drake wrote:
> On Sat, Jul 10, 2010 at 11:21 PM, Janez Zemva wrote:
>> However, I notice, that expat returns only the CDATA characters up to,
>> but not including, the first control character it encounters.
>
> Expat supports XML 1.0. If your payload doesn't conform, there's no
> reason to expect Expat, or any other XML parser, to accept it.

From janezz55 at gmail.com Mon Jul 12 07:51:59 2010
From: janezz55 at gmail.com (Janez Zemva)
Date: Mon, 12 Jul 2010 07:51:59 +0200
Subject: [Expat-discuss] binary xml
In-Reply-To:
References:
Message-ID:

Good idea...
Perhaps a better one would be to simply override the length parameter expat delivers to the character handler. Now the big question is... Does expat load all the data between CDATA tags and _then_ call the character data handler, even if length is less than what it loads, i.e.:

This way expat parsing would not matter, as length would already be provided. The "]]>" strings in the binary data are "fixed" into something else before saving them, the offsets of the fixes are then stored into the fixes field.

I suppose, if the parse buffer were long enough to accommodate the entire file, the approach would work. But what if the parse buffer is not the same size as the file? Say there is a NUL character as the first character of CDATA. expat will then report a length of 0, but will it always load the entire CDATA contents before calling the character data handler?

Did any of you play with this? I'll have to check myself sooner or later.

2010/7/12 Nick MacDonald :
> Basically I'm just suggesting some semi-intelligent front end filter:
>
> I'm assuming you might have something like this as input:
>
>
>
> ..random binary data...
> ..random binary data...
> ]]>
>
>
>
>
> I'm thinking more like this pseudo-code:
>
> while (more data to process)
> ..read data file into buffer
> ..if (looks like binary data or expecting binary data)
> ....do something useful with binary data that doesn't involve XML parsing
> ..if (looks like XML or expecting it to be XML)
> ....pass data to eXpat for processing
>
> In theory, in the above scenario, your buffer could be as few as *one*
> character... although honestly I have never tested eXpat out in such
> a scenario, I suspect it's highly likely it would work...
>
> By passing known good XML into eXpat and watching its callbacks you
> detect when in your input the data is now binary, and then switch
> accordingly. The only trick is to know when to switch back to normal
> XML, but I think that's doable...
>
> Nick
>
>
> On Sun, Jul 11, 2010 at 11:07 PM, Janez Zemva wrote:
>>> Since *YOU* control the supply of data to eXpat... is there any way
>>> you can recognize your scenario and have the binary data shunted to a
>>> different buffer instead of supplying it to eXpat?
>>
>> Yes, I can provide a default handler maybe, for an unrecognized tag?
>> Is that what you had in mind? I was thinking more along the line of a
>> "binary" character encoding, not utf-8 or ucs-16, or anything else.
>>
>
>
>
> --
> Nick MacDonald
> NickMacD at gmail.com
>

From strizhov at cs.colostate.edu Sun Jul 18 18:14:44 2010
From: strizhov at cs.colostate.edu (Mikhail Strizhov)
Date: Sun, 18 Jul 2010 10:14:44 -0600
Subject: [Expat-discuss] TCP live stream buffer and expat xml parsing
Message-ID: <4C432874.1040101@cs.colostate.edu>

Hi all,

I have a live TCP XML stream and each XML message has the same format:

...other_xml_items_here...

...other_xml_items_here...

...other_xml_items_here...

When I call the TCP recv function to get data from the socket I need to specify the size of the buffer, let's say 4096 bytes. Usually one .. message is around 2500-3000 bytes. In this case I get the 1st full message and half of the next. Afterwards I forward this buffer to the XML_Parse function: the 1st message is parsed successfully, but the 2nd is half parsed and then I get error messages.

Does anybody know how to handle a live TCP stream with libexpat?

My code is too large to attach; it's available here: http://www.netsec.colostate.edu/~strizhov/bgpmon/bgpmonclient.c

Thank you!
--
*Sincerely,*
*Mikhail Strizhov*
*Email: strizhov at cs.colostate.edu *

From nickmacd at gmail.com Mon Jul 19 02:38:43 2010
From: nickmacd at gmail.com (Nick MacDonald)
Date: Sun, 18 Jul 2010 20:38:43 -0400
Subject: [Expat-discuss] TCP live stream buffer and expat xml parsing
In-Reply-To: <4C432874.1040101@cs.colostate.edu>
References: <4C432874.1040101@cs.colostate.edu>
Message-ID:

Mikhail:

eXpat can handle the supplied data in chunks smaller than the whole file/message, so I assume you're running into the following problem:

According to the XML spec, a properly formed XML document can have only ONE root element.... it appears you are attempting to pass more than one to eXpat... You would need to detect the end of one document, and reset the parsing for the next... or you could probably use a bit of a hack... Just pass in your own buffer at the beginning... with your own root tag... and you won't need to supply the ending root tag until such time as you want to shut down parsing with eXpat...

Right now, your root tags look like ... so instead of this

which looks like two root tags in a row... feed in something else like

where I "magically" prefixed it with a <BGPMessageParser> tag of my own invention...

As far as I know, that should work for you... You'd still need to reset everything on any errors in the supplied data... but you should have been already thinking about that problem before as nothing changes in error handling in this new approach...

Hope that helped... Good luck...

Nick

On Sun, Jul 18, 2010 at 12:14 PM, Mikhail Strizhov wrote:
> I have live tcp xml stream and each xml message has same format:
>
> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
> ...other_xml_items_here...
>
>
> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
> ...other_xml_items_here...
>
>
> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
> ...other_xml_items_here...
>
>
> When I'm calling TCP recv function to get data from socket I need to specify
> size of buffer, lets say 4096 bytes.
> Usually one .. message is around 2500-3000 bytes.
> In this case I'm getting 1st full message and half of next.
> Afterwards I'm forwarding this buffer to XML_Parse function - 1st message
> parsed successfully, but 2nd is half parsed and then error messages.
>
> Is anybody know how to handle live tcp stream with libexpat?
>
> My code is large to attach, its available here -
> http://www.netsec.colostate.edu/~strizhov/bgpmon/bgpmonclient.c

From strizhov at cs.colostate.edu Mon Jul 19 06:33:52 2010
From: strizhov at cs.colostate.edu (Mikhail Strizhov)
Date: Sun, 18 Jul 2010 22:33:52 -0600
Subject: [Expat-discuss] TCP live stream buffer and expat xml parsing
In-Reply-To:
References: <4C432874.1040101@cs.colostate.edu>
Message-ID: <4C43D5B0.4000902@cs.colostate.edu>

Nick,

Sorry, my fault, I didn't mention that when I connect to the live TCP stream, I get this XML structure:

...
...
...

and so on.

Anyway, thanks for the help! I found the error in my code: each time I got data from the socket, I was calling XML_Parser parser = XML_ParserCreate(NULL); and so creating a new parser for every buffer. That's wrong. The parser must be created once and fed every buffer in turn. Simple code should be:

    char xml[BUF_SIZE];
    memset(xml, '\0', sizeof(xml));
    int done = 0;

    /* Create the parser once, before the read loop. */
    XML_Parser parser = XML_ParserCreate(NULL);
    XML_SetElementHandler(parser, start_element, end_element);
    XML_SetCharacterDataHandler(parser, char_data);

    do {
        memset(xml, '\0', sizeof(xml));
        int len = readn(sock, xml, BUF_SIZE);
        if (len <= 0)
            break;
        /* readn() returns fewer than BUF_SIZE bytes only at end of stream. */
        done = len < BUF_SIZE ? 1 : 0;
        if (XML_Parse(parser, xml, len, done) == XML_STATUS_ERROR) {
            printf("Error: %s\n", XML_ErrorString(XML_GetErrorCode(parser)));
            break;  /* the parser cannot continue after an error */
        }
    } while (!done);

    XML_ParserFree(parser);

And it works fine.
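
A minimal sketch of the loop shape described above, under stated assumptions: it is not Expat code, and `chunk_reader`, `chunk_feeder`, `pump_stream`, `mem_source` and `record` are hypothetical names invented for illustration. The idea is that one long-lived consumer receives every chunk, a short read is never taken to mean end of stream, and the final flag is raised exactly once, at EOF. In a real client the feeder would presumably wrap `XML_Parse(parser, buf, len, is_final)` and return nonzero on `XML_STATUS_ERROR`, and the reader would wrap `recv()`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback types (not part of Expat).
   A reader returns >0 bytes read, 0 at end of stream, <0 on error.
   A feeder returns 0 on success, nonzero on a parse error. */
typedef int (*chunk_reader)(void *ctx, char *buf, int cap);
typedef int (*chunk_feeder)(void *ctx, const char *buf, int len, int is_final);

/* Pump chunks from reader to feeder. Short reads are passed on as ordinary
   non-final chunks; only a 0-byte read produces the single final call. */
static int pump_stream(chunk_reader rd, void *rd_ctx,
                       chunk_feeder feed, void *feed_ctx,
                       char *buf, int cap)
{
    for (;;) {
        int len = rd(rd_ctx, buf, cap);
        if (len < 0)
            return -1;                                  /* read error */
        if (len == 0)                                   /* EOF: empty final chunk */
            return feed(feed_ctx, buf, 0, 1) == 0 ? 0 : -2;
        if (feed(feed_ctx, buf, len, 0) != 0)
            return -2;                                  /* consumer reported an error */
    }
}

/* In-memory stand-in for a socket that delivers at most max_chunk bytes
   per read, the way recv() may return less than was asked for. */
struct mem_source { const char *data; size_t left; int max_chunk; };

static int mem_read(void *ctx, char *buf, int cap)
{
    struct mem_source *s = ctx;
    size_t n = s->left;
    if (n > (size_t)cap) n = (size_t)cap;
    if (n > (size_t)s->max_chunk) n = (size_t)s->max_chunk;
    memcpy(buf, s->data, n);
    s->data += n;
    s->left -= n;
    return (int)n;
}

/* Stand-in for the parser: records what it was fed and how often. */
struct record { char text[256]; size_t used; int chunks; int finals; };

static int record_feed(void *ctx, const char *buf, int len, int is_final)
{
    struct record *r = ctx;
    if (r->used + (size_t)len >= sizeof r->text)
        return -1;                                      /* would overflow */
    memcpy(r->text + r->used, buf, (size_t)len);
    r->used += (size_t)len;
    r->text[r->used] = '\0';
    r->chunks++;
    r->finals += is_final ? 1 : 0;
    return 0;
}
```

Keeping the reader and feeder behind callbacks is only a convenience so the loop can be exercised without a socket or a parser; the relevant design choice is that "fewer bytes than the buffer size" and "end of stream" are kept as separate conditions.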
--
*Sincerely,*
*Mikhail Strizhov*
*Email: strizhov at cs.colostate.edu *

On 07/18/2010 06:38 PM, Nick MacDonald wrote:
> Mikhail:
>
> eXpat can handle the supplied data in chunks smaller than the whole
> file/message, so I assume you're running into the following problem:
>
> According to the XML spec, a properly formed XML document can have
> only ONE root element.... it appears you are attempting to pass more
> than one to eXpat... You would need to detect the end of one
> document, and reset the parsing for the next... or you could probably
> use a bit of a hack... Just pass in your own buffer at the
> beginning... with your own root tag...
> and you won't need to supply the ending root tag until such time as
> you want to shut down parsing with eXpat...
>
> Right now, your root tags look like ... so instead of this
>
>
>
>
>
> which looks like two root tags in a row... feed in
> something else like
>
>
>
>
>
>
>
>
>
> where I "magically" prefixed it with a <BGPMessageParser> tag of my
> own invention...
>
> As far as I know, that should work for you... You'd still need to
> reset everything on any errors in the supplied data... but you should
> have been already thinking about that problem before as nothing
> changes in error handling in this new approach...
>
> Hope that helped... Good luck...
>
> Nick
>
>
>
> On Sun, Jul 18, 2010 at 12:14 PM, Mikhail Strizhov
> wrote:
>
>> I have live tcp xml stream and each xml message has same format:
>>
>> > xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>>
>>
>> > xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>>
>>
>> > xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>>
>>
>> When I'm calling TCP recv function to get data from socket I need to specify
>> size of buffer, lets say 4096 bytes.
>> Usually one.. message is around 2500-3000 bytes.
>> In this case I'm getting 1st full message and half of next.
>> Afterwards I'm forwarding this buffer to XML_Parse function - 1st message
>> parsed successfully, but 2nd is half parsed and then error messages.
>>
>> Does anybody know how to handle live tcp stream with libexpat?
>>
>> My code is large to attach, its available here -
>> http://www.netsec.colostate.edu/~strizhov/bgpmon/bgpmonclient.c
>>