From David.VanGeest at bepco.com Tue Sep 9 23:29:42 2008
From: David.VanGeest at bepco.com (David van Geest)
Date: Tue, 9 Sep 2008 17:29:42 -0400
Subject: [Expat-discuss] CDATA parsing
Message-ID:

Hi all,

We're using an older version of the Expat XML parser. I wish I could tell you what version, but it's nowhere to be found. The copyright in the header reads:

"Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd"

I'm wondering how CDATA sections are handled by the parser. Our wrapper classes return the first 21 characters of the CDATA section as the value of the node containing the section, and I can't figure out what is happening to the rest. I've been looking through xmlparse.c, but it's not the most human-readable code ever....

Can anyone shed some light on this?

Thanks!

----
David van Geest
Software Engineer
Burke E. Porter Machinery
616.234.1214

From webmaster at hartwork.org Wed Sep 10 00:06:49 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Wed, 10 Sep 2008 00:06:49 +0200
Subject: [Expat-discuss] CDATA parsing
In-Reply-To:
References:
Message-ID: <48C6F379.4090209@hartwork.org>

David van Geest wrote:
> We're using an older version of the Expat XML parser. I wish I could
> tell you what version, but it's nowhere to be found. The copyright in
> the header reads:
>
> "Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd"

Is your Expat code available somewhere for download or could you make it so? You don't have to find out alone then.

PS: Upgrading is no option for sure?

> I'm wondering how CDATA sections are handled by the parser. Our
> wrapper classes return the first 21 characters of the CDATA section as
> the value of the node containing the section, and I can't figure out
> what is happening to the rest.

Is there just one call of your handler function?
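The question about the number of handler calls matters because Expat is free to deliver one run of character data, including the content of a CDATA section, in several pieces. The following is a minimal sketch using Python's xml.parsers.expat binding rather than the C API under discussion; the function name, the sample document, and the 8-byte chunk size are illustrative assumptions, not taken from David's code.

```python
import xml.parsers.expat


def collect_text(xml_bytes, chunk_size=8):
    """Parse xml_bytes in small slices; return (handler_calls, joined_text)."""
    pieces = []

    def on_character_data(data):
        # Expat may invoke this handler more than once for a single run of
        # text, so each call only appends; no call is treated as complete.
        pieces.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = False  # pyexpat default: pass chunks through as reported
    parser.CharacterDataHandler = on_character_data

    # Feeding the document in small slices makes the splitting easy to observe.
    for start in range(0, len(xml_bytes), chunk_size):
        parser.Parse(xml_bytes[start:start + chunk_size], False)
    parser.Parse(b"", True)  # signal end of input

    return len(pieces), "".join(pieces)


# Hypothetical document with a CDATA section longer than 21 characters.
DOC = (b"<node><![CDATA[This CDATA section is longer than "
       b"twenty-one characters.]]></node>")

calls, text = collect_text(DOC)
print(calls, repr(text))
```

A wrapper that keeps only the data from the first call would see a truncated value, which is one possible explanation for the symptom described. In pyexpat, setting buffer_text to a true value asks the binding to do this kind of accumulation itself where possible.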

Sebastian

From David.VanGeest at bepco.com Wed Sep 10 15:24:19 2008
From: David.VanGeest at bepco.com (David van Geest)
Date: Wed, 10 Sep 2008 09:24:19 -0400
Subject: [Expat-discuss] CDATA parsing
In-Reply-To: <48C6F379.4090209@hartwork.org>
References: <48C6F379.4090209@hartwork.org>
Message-ID:

> David van Geest wrote:
> > We're using an older version of the Expat XML parser. I wish I could
> > tell you what version, but it's nowhere to be found. The copyright in
> > the header reads:
> >
> > "Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd"
>
> Is your Expat code available somewhere for download or could you make it
> so?
> You don't have to find out alone then.
>
> PS: Upgrading is no option for sure?

A zip of our code is available at dvg.homelinux.net/expat.zip. Upgrading is not an option at this point, it's too high-risk.

> > I'm wondering how CDATA sections are handled by the parser. Our
> > wrapper classes return the first 21 characters of the CDATA section as
> > the value of the node containing the section, and I can't figure out
> > what is happening to the rest.
>
> Is there just one call of your handler function?

There is only one case where we are parsing a file with a CDATA section. I've tried all sorts of things to get at the data, but all of them return the same 21 characters.

Thanks for your help!

----
David van Geest
Software Engineer
Burke E. Porter Machinery
616.234.1214

From webmaster at hartwork.org Wed Sep 10 20:22:37 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Wed, 10 Sep 2008 20:22:37 +0200
Subject: [Expat-discuss] CDATA parsing
In-Reply-To:
References: <48C6F379.4090209@hartwork.org>
Message-ID: <48C8106D.9080009@hartwork.org>

David van Geest wrote:
>>> I'm wondering how CDATA sections are handled by the parser. Our
>>> wrapper classes return the first 21 characters of the CDATA section as
>>> the value of the node containing the section, and I can't figure out
>>> what is happening to the rest.
>> Is there just one call of your handler function?
>
> There is only one case where we are parsing a file with a CDATA section.
> I've tried all sorts of things to get at the data, but all of them
> return the same 21 characters.

I think you might have misunderstood my question. Is your callback handler only called one time? I have not worked with the CDATA handler before, but it reminds me of the #1 confusion coming up here: that character data can be served split over several handler calls. Could that be the case for you?

Sebastian

From David.VanGeest at bepco.com Wed Sep 10 21:39:06 2008
From: David.VanGeest at bepco.com (David van Geest)
Date: Wed, 10 Sep 2008 15:39:06 -0400
Subject: [Expat-discuss] CDATA parsing
In-Reply-To: <48C8106D.9080009@hartwork.org>
References: <48C6F379.4090209@hartwork.org> <48C8106D.9080009@hartwork.org>
Message-ID:

> I think you might have misunderstood my question.
> Is your callback handler only called one time?
> I have not worked with the CDATA handler before,
> but it reminds me of the #1 confusion coming up here:
> that character data can be served split over several
> handler calls. Could that be the case for you?

Thanks for your reply, Sebastian. You're right, I misunderstood your question. I will look into whether the callback handler is only called once. Right now this is low priority, so I might not get to look at it for a while, but I appreciate your help.

----
David van Geest
Software Engineer
Burke E. Porter Machinery
616.234.1214

From webmaster at hartwork.org Wed Sep 10 22:17:22 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Wed, 10 Sep 2008 22:17:22 +0200
Subject: [Expat-discuss] Identifying the version of your copy of Expat / was Re: CDATA parsing
In-Reply-To:
References: <48C6F379.4090209@hartwork.org>
Message-ID: <48C82B52.9060403@hartwork.org>

David van Geest wrote:
> A zip of our code is available at dvg.homelinux.net/expat.zip.
> Upgrading is not an option at this point, it's too high-risk.

I've been working on a script comparing your copy of Expat to the available CVS tags.

All of these tags are equally likely with 14 of 18 tested files matching:
- V20000512
- sourceforge_init
- libexpat-alpha-1
- jclark-orig

My bash script and the result details are attached.

Sebastian

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: guess_version.sh.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: result.txt
URL:

From David.VanGeest at bepco.com Wed Sep 10 22:52:09 2008
From: David.VanGeest at bepco.com (David van Geest)
Date: Wed, 10 Sep 2008 16:52:09 -0400
Subject: [Expat-discuss] Identifying the version of your copy of Expat / was Re: CDATA parsing
In-Reply-To: <48C82B52.9060403@hartwork.org>
References: <48C6F379.4090209@hartwork.org> <48C82B52.9060403@hartwork.org>
Message-ID:

> David van Geest wrote:
> > A zip of our code is available at dvg.homelinux.net/expat.zip.
> > Upgrading is not an option at this point, it's too high-risk.
>
> I've been working on a script comparing your copy of Expat
> to the available CVS tags.
>
> All of these tags are equally likely with 14 of 18 tested
> files matching:
> - V20000512
> - sourceforge_init
> - libexpat-alpha-1
> - jclark-orig
>
> My bash script and the result details are attached.

Thanks! I appreciate it! I imagine we modified a few files at some point (long before my time...), which is why we don't have any exact matches.

----
David van Geest
Software Engineer
Burke E. Porter Machinery
616.234.1214

From mikeh at paxit.com Thu Sep 11 20:23:52 2008
From: mikeh at paxit.com (Mike Hansen)
Date: Thu, 11 Sep 2008 13:23:52 -0500
Subject: [Expat-discuss] Windows/NTFS/Change of Permissions
Message-ID: <81D939E842F94ABFAC6DCA31B81FF4AF@MikehPC>

I have an odd one and I am not sure this is related to expat but I wanted to see if anyone else has run into this. We're running on a Windows platform with NTFS.
We're opening files via expat as xml. After saving an xml file and then reopening the file (it is at the point of reopening) we see a change of user permissions on the file. The file is now locked exclusively for that user on the Windows platform and cannot be accessed by any other users on the PC. If we save in an alternative file format (not xml and not using expat to open the file) we do not see a change in user permissions.

Anyone seen this one before? At this point I cannot point to the problem being expat but it is suspicious that it doesn't happen when the same data is stored using another file format. Also note that only one of our users is seeing this behavior; otherwise it is fine for all users.

From webmaster at hartwork.org Thu Sep 11 21:25:25 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Thu, 11 Sep 2008 21:25:25 +0200
Subject: [Expat-discuss] Windows/NTFS/Change of Permissions
In-Reply-To: <81D939E842F94ABFAC6DCA31B81FF4AF@MikehPC>
References: <81D939E842F94ABFAC6DCA31B81FF4AF@MikehPC>
Message-ID: <48C970A5.7040908@hartwork.org>

As far as I can see Expat does not work with files, only data in memory. I don't see how Expat could be part of the problem then.

Sebastian

From webmaster at hartwork.org Thu Sep 11 21:33:47 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Thu, 11 Sep 2008 21:33:47 +0200
Subject: [Expat-discuss] Handling malicious XML with Expat - what options do I have?
Message-ID: <48C9729B.10806@hartwork.org>

What can I do to make an application using Expat resilient to malicious XML? Explosion of neither time nor space is acceptable in my case.

Has anyone built a working solution before? I'd be happy to hear about your experience.

Thanks in advance,

Sebastian

From nickmacd at gmail.com Fri Sep 12 13:48:04 2008
From: nickmacd at gmail.com (Nick MacDonald)
Date: Fri, 12 Sep 2008 07:48:04 -0400
Subject: [Expat-discuss] Handling malicious XML with Expat - what options do I have?
In-Reply-To: <48C9729B.10806@hartwork.org>
References: <48C9729B.10806@hartwork.org>
Message-ID:

Sebastian:

DoS prevention is virtually impossible to do perfectly... you'll end up spending all your time on the effort and never get meaningful work done... so you have to find the reasonable trade-offs that make things fairly secure but still leave your system usable. In this vein, sanitizing your input is probably the best bang for your buck... backed up by some "over limit" detection in your code.

I haven't spent any time on this particular topic, and although I have heard of the "million laughs" attack, I am not well versed in XML attacks... but the obvious answer to me is to employ some sort of XML sanity checking logic as a pre-parsing step.

You will never be invulnerable to all attacks, from the simple fact that a lot of attacks are difficult to think of in advance, but if your problem domain is simple enough (or can be made to be simple enough by applying some limiting assumptions) then you should be able to build a pre-parser that will fit the bill. If, for example, you were worried about a file taking too much memory, or too much time to process, just build some simple logic into the parser that kicks out an error if too much memory gets used, or too much time elapses. (You'd want to build in a manual override in case of exceptional cases though.) The biggest problem is probably some of the features that can be legally used in an XML file leading to unintended problems, but I think you can turn off the utilization of those kinds of features in Expat and then you'd hopefully be able to detect them with your own code, and then flag the suspect input.

The real question you have to consider is where the threat is going to be coming from: an insider (inside job), the outside world (the Internet?) or bad output from another program/step (that might be more vulnerable than your part.)
The amount of effort going into countermeasures, and the amount of human involvement when something goes wrong, is dictated by who's causing the problem.

Nick

On Thu, Sep 11, 2008 at 3:33 PM, Sebastian Pipping wrote:
> What can I do to make an application using Expat
> resilient to malicious XML? Explosion of neither
> time nor space is acceptable in my case.
>
> Has anyone built a working solution before?
> I'd be happy to hear about your experience.

From karl at waclawek.net Fri Sep 12 15:09:54 2008
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 12 Sep 2008 09:09:54 -0400
Subject: [Expat-discuss] Handling malicious XML with Expat - what options do I have?
In-Reply-To:
References: <48C9729B.10806@hartwork.org>
Message-ID: <48CA6A22.7030706@waclawek.net>

Nick MacDonald wrote:
> Sebastian:
>
> DoS prevention is virtually impossible to do perfectly... you'll end
> up spending all your time on the effort and never get meaningful work
> done... so you have to find the reasonable trade-offs that make
> things fairly secure but still leave your system usable. In this
> vein, sanitizing your input is probably the best bang for your buck...
> backed up by some "over limit" detection in your code.
>
> I haven't spent any time on this particular topic, and although I have
> heard of the "million laughs" attack, I am not well versed in XML
> attacks... but the obvious answer to me is to employ some sort of XML
> sanity checking logic as a pre-parsing step.

As far as I know the standard approach to this attack is to turn off DTD processing. I don't remember anymore if this can be done properly with Expat, but I would look at these APIs:
- XML_SetParamEntityParsing
- XML_SetDefaultHandler
- XML_SetExternalEntityRefHandler

The proper approach would be to extend Expat's memory allocation functions to accept an extra callback parameter (application supplied), so that memory usage can be tracked.
We did talk about that, but never got around to it, also because it would break Expat's API.

Karl

From webmaster at hartwork.org Fri Sep 12 22:29:33 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Fri, 12 Sep 2008 22:29:33 +0200
Subject: [Expat-discuss] A way to handle malicious XML with Expat / was Re: Handling malicious XML with Expat - what options do I have?
In-Reply-To: <48CA6A22.7030706@waclawek.net>
References: <48C9729B.10806@hartwork.org> <48CA6A22.7030706@waclawek.net>
Message-ID: <48CAD12D.3020804@hartwork.org>

I've been playing around with the Expat API and feeding a parser instance with "a billion laughs" [1]. The approach I am taking is counting entity value length manually inside of a custom XML_EntityDeclHandler. Demo code is attached, here is an excerpt of its output:

  BEGIN handleEntityDeclaration
    laugh0 := "ha"
    Length is 2
  END
  BEGIN handleEntityDeclaration
    laugh1 := "&laugh0;&laugh0;"
    Length is 4
  END
  ..
  BEGIN handleEntityDeclaration
    laugh16 := "&laugh15;&laugh15;"
    Length is 131072
  END
  Content consided malicious XML, aborting

As Python also exposes Expat's XML_EntityDeclHandler function I expect this approach to work for Python as well.

Comments welcome.

Sebastian

[1] http://www.cogsci.ed.ac.uk/~richard/billion-laughs.xml

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: demo_1_0.cpp
URL:

From webmaster at hartwork.org Sat Sep 13 05:08:23 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Sat, 13 Sep 2008 05:08:23 +0200
Subject: [Expat-discuss] A way to handle malicious XML with Expat / was Re: Handling malicious XML with Expat - what options do I have?
In-Reply-To: <48CAD12D.3020804@hartwork.org>
References: <48C9729B.10806@hartwork.org> <48CA6A22.7030706@waclawek.net> <48CAD12D.3020804@hartwork.org>
Message-ID: <48CB2EA7.1090104@hartwork.org>

Talking to a friend of mine gave some new ideas.
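
As an aside on the entity-length check from the previous message, here is a rough Python sketch of the same idea using pyexpat's EntityDeclHandler. It is not the attached demo code: the helper names, the regular expression, the exception class, and the 100000 limit are assumptions made for illustration, and only internal general entities are handled.

```python
import re
import xml.parsers.expat

# Matches general entity references such as "&laugh0;" in a declared value.
ENTITY_REF = re.compile(r"&([^;&#\s]+);")

# The five predefined XML entities each expand to a single character.
PREDEFINED = {"lt": 1, "gt": 1, "amp": 1, "apos": 1, "quot": 1}


class EntityTooLarge(Exception):
    """Raised when a declared entity would expand beyond the limit."""


def check_entities(xml_bytes, max_length=100000):
    """Return {entity name: expanded length}; raise if one exceeds max_length."""
    lengths = dict(PREDEFINED)

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        if value is None or is_parameter_entity:
            return  # external, unparsed and parameter entities: not covered here
        # Literal characters, plus the known length of each referenced entity.
        total = len(ENTITY_REF.sub("", value))
        for ref in ENTITY_REF.findall(value):
            total += lengths.get(ref, 0)
        lengths[name] = total
        if total > max_length:
            raise EntityTooLarge("%s would expand to %d characters" % (name, total))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)  # an exception raised in the handler propagates
    return lengths


def billion_laughs_dtd(levels):
    """Build a document that declares laugh0..laugh<levels> but never uses them."""
    decls = ['<!ENTITY laugh0 "ha">']
    for i in range(1, levels + 1):
        decls.append('<!ENTITY laugh%d "&laugh%d;&laugh%d;">' % (i, i - 1, i - 1))
    doc = '<?xml version="1.0"?>\n<!DOCTYPE root [\n%s\n]>\n<root/>' % "\n".join(decls)
    return doc.encode("ascii")


lengths = check_entities(billion_laughs_dtd(3))
print(lengths["laugh3"])  # laugh3 = 2 * 2**3 = 16 characters
```

Because the lengths are computed from the declarations alone, nothing is ever expanded. With the doubling pattern above and a limit of 100000, the check trips at laugh16 (131072 characters).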
Monitoring the final size of an entity alone is not enough:
- The entity could evaluate to zero length and still take "forever" to compute (== billion laughs with "" instead of "ha")
- Part of the content multiplication could be moved to the document's body and therefore be hidden from monitoring

So I have added two more checks:
- A limit on the total lookups to form the value of an entity
- A limit on the ratio between input and output

The three constants in the code to play with are

  MAX_BYTES_PER_ENTITY_VALUE = 100000
  MAX_LOOKUPS_PER_ENTITY_VALUE = 30
  MAX_INPUT_FACTOR = 20

I'd be interested to know if these values still work for people working with very large documents.

Sebastian

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: demo_2_0.cpp
URL:

From webmaster at hartwork.org Tue Sep 16 04:45:25 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Tue, 16 Sep 2008 04:45:25 +0200
Subject: [Expat-discuss] A way to handle malicious XML with Expat / was Re: Handling malicious XML with Expat - what options do I have?
In-Reply-To: <48CB2EA7.1090104@hartwork.org>
References: <48C9729B.10806@hartwork.org> <48CA6A22.7030706@waclawek.net> <48CAD12D.3020804@hartwork.org> <48CB2EA7.1090104@hartwork.org>
Message-ID: <48CF1DC5.1000600@hartwork.org>

Input/output ratio limits were a bad idea: to apply them properly one would have to process the whole file first...

So here is v3:
- input/output ratio limit removed
- entity lookup depth limit added
- mem leaks fixed

I now understand why finding limit defaults is so hard, if possible at all.

Sebastian

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: demo_3_0.cpp
URL:

From webmaster at hartwork.org Thu Sep 18 04:10:31 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Thu, 18 Sep 2008 04:10:31 +0200
Subject: [Expat-discuss] C++ wrapper for expat (expatmm)
In-Reply-To: <480DFA5C.3090800@intellitree.com>
References: <480DFA5C.3090800@intellitree.com>
Message-ID: <48D1B897.1070300@hartwork.org>

Coleman Kane wrote:
> We recently published a C++ wrapper for Expat, which we've been using in
> some of our systems and wanted to provide for the world.
>
> I tried getting in touch with the expatpp maintainer
> (http://sourceforge.net/projects/expatpp), and got a response but he
> hasn't moved anymore since then. Anyhow, his project is linked on your
> mainpage, and considering the "vaporware" status of it, I want to
> provide an alternative in our implementation.
>
> The distribution link is: http://devzone.intellitree.com/projects/expatmm
>
> It's an autotooled project, implements pkg-config, etc....

I just stumbled upon your mail again. Your implementation seems of more use to me than expatpp, which I cannot even find source code for... So I feel like replacing the link on the Expat website would be a good solution.

I have committed the changes already (revision 1.63) but they are not online yet as the sf.net shell server doesn't respond at the moment. I hope to finish this tomorrow.

Sebastian

From webmaster at hartwork.org Thu Sep 18 18:20:25 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Thu, 18 Sep 2008 18:20:25 +0200
Subject: [Expat-discuss] C++ wrapper for expat (expatmm)
In-Reply-To: <48D1B897.1070300@hartwork.org>
References: <480DFA5C.3090800@intellitree.com> <48D1B897.1070300@hartwork.org>
Message-ID: <48D27FC9.8020008@hartwork.org>

Sebastian Pipping wrote:
> I have committed the changes already (revision 1.63) but they are
> not online yet as the sf.net shell server doesn't respond at the moment.
> I hope to finish this tomorrow.

Seems like I don't have permissions either.
Karl, could you sync on libexpat.org for me?

Sebastian

From ckane at intellitree.com Thu Sep 18 14:42:16 2008
From: ckane at intellitree.com (Coleman Kane)
Date: Thu, 18 Sep 2008 08:42:16 -0400
Subject: [Expat-discuss] C++ wrapper for expat (expatmm)
In-Reply-To: <48D1B897.1070300@hartwork.org>
References: <480DFA5C.3090800@intellitree.com> <48D1B897.1070300@hartwork.org>
Message-ID: <48D24CA8.8020708@intellitree.com>

Sebastian Pipping wrote:
> Coleman Kane wrote:
>
>> We recently published a C++ wrapper for Expat, which we've been using in
>> some of our systems and wanted to provide for the world.
>>
>> I tried getting in touch with the expatpp maintainer
>> (http://sourceforge.net/projects/expatpp), and got a response but he
>> hasn't moved anymore since then. Anyhow, his project is linked on your
>> mainpage, and considering the "vaporware" status of it, I want to
>> provide an alternative in our implementation.
>>
>> The distribution link is: http://devzone.intellitree.com/projects/expatmm
>>
>> It's an autotooled project, implements pkg-config, etc....
>>
>
> I just stumbled upon your mail again. Your implementation seems
> of more use to me than expatpp, which I cannot even find source
> code for... So I feel like replacing the link on the Expat website
> would be a good solution.
>
> I have committed the changes already (revision 1.63) but they are
> not online yet as the sf.net shell server doesn't respond at the moment.
> I hope to finish this tomorrow.
>
>
>
> Sebastian

Sounds great!

--
Coleman Kane
IntelliTree Solutions llc

From fdrake at acm.org Fri Sep 19 22:20:58 2008
From: fdrake at acm.org (Fred Drake)
Date: Fri, 19 Sep 2008 16:20:58 -0400
Subject: [Expat-discuss] Performance tuning of character data callbacks....
In-Reply-To: <30c6373b0808030928p5a60678aj20640f48be21955f@mail.gmail.com>
References: <30c6373b0807311852u5e07af49ga1f931d3c194225b@mail.gmail.com> <4895C5F7.70001@hartwork.org> <30c6373b0808030928p5a60678aj20640f48be21955f@mail.gmail.com>
Message-ID:

On Aug 3, 2008, at 12:28 PM, Kevin Burton wrote:
> No.... I'm saying I'll just recompile on my machine if this yields a
> performance boost.
> Also, couldn't it just be a tunable config param?

It's fairly straightforward to add this on top of Expat; that's what's done in Python's pyexpat module (xml.parsers.expat in the Python standard library).

Perhaps a little tedious in C, but then, it's C. :-)

-Fred

--
Fred Drake

From mpeters at mapsoft.com Sun Sep 21 13:52:57 2008
From: mpeters at mapsoft.com (Michael Peters)
Date: Sun, 21 Sep 2008 12:52:57 +0100
Subject: [Expat-discuss] Looking for mac dylib
Message-ID: <4D523113-FFF6-49B9-94D5-E219BFC1FC89@mapsoft.com>

I am looking for the libexpat-1.5.0.dylib for the PowerMac. Does anyone have a copy of this library or the source code to build this? I have tried 1.5.2 which I think is the current release and we can't get our application to run.

Regards

Michael Peters
Mapsoft

From webmaster at hartwork.org Sun Sep 21 18:27:44 2008
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Sun, 21 Sep 2008 18:27:44 +0200
Subject: [Expat-discuss] Looking for mac dylib
In-Reply-To: <4D523113-FFF6-49B9-94D5-E219BFC1FC89@mapsoft.com>
References: <4D523113-FFF6-49B9-94D5-E219BFC1FC89@mapsoft.com>
Message-ID: <48D67600.4090509@hartwork.org>

Michael Peters wrote:
> I am looking for the libexpat-1.5.0.dylib for the PowerMac. Does anyone
> have a copy of this library or the source code to build this? I have
> tried 1.5.2 which I think is the current release and we can't get our
> application to run.

In the current this line

  VSNFLAG = -version-info @LIBCURRENT@:@LIBREVISION@:@LIBAGE@

using variables from

  LIBCURRENT=6
  LIBREVISION=2
  LIBAGE=5

makes the final soname (1.5.2 in your case).
So if this is important to you, digging in these two files until you find a match should give the answer to your questions.

Sebastian

From fdrake at acm.org Sun Sep 21 19:50:21 2008
From: fdrake at acm.org (Fred Drake)
Date: Sun, 21 Sep 2008 13:50:21 -0400
Subject: [Expat-discuss] Looking for mac dylib
In-Reply-To: <48D67600.4090509@hartwork.org>
References: <4D523113-FFF6-49B9-94D5-E219BFC1FC89@mapsoft.com> <48D67600.4090509@hartwork.org>
Message-ID: <679C3ABA-AC81-442B-B578-19F654AD137D@acm.org>

On Sep 21, 2008, at 12:27 PM, Sebastian Pipping wrote:
> using variables from
>
> LIBCURRENT=6
> LIBREVISION=2
> LIBAGE=5

This wasn't always constructed this way, but the CVS repository should be able to tell when we moved to this construction.

-Fred

--
Fred Drake
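
The mapping from the -version-info triple to the 1.5.2 suffix mentioned in this thread can be worked through as follows. This is a small sketch assuming libtool's usual scheme, in which CURRENT:REVISION:AGE becomes (CURRENT - AGE).AGE.REVISION; the function name is made up for illustration.

```python
def libtool_version_suffix(current, revision, age):
    """Map libtool's -version-info CURRENT:REVISION:AGE to MAJOR.AGE.REVISION."""
    if age > current:
        raise ValueError("AGE must not be greater than CURRENT")
    major = current - age  # oldest interface number still supported
    return "%d.%d.%d" % (major, age, revision)


# Values quoted in the thread: LIBCURRENT=6, LIBREVISION=2, LIBAGE=5
print(libtool_version_suffix(current=6, revision=2, age=5))  # 1.5.2
```

Under the same assumption, a 1.5.0 suffix would correspond to a triple of 6:0:5, which may help when searching the repository history for a matching revision.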