From tarung at adobe.com Mon Aug 9 10:33:33 2010
From: tarung at adobe.com (Tarun Garg)
Date: Mon, 9 Aug 2010 14:03:33 +0530
Subject: [Expat-discuss] Ignoring whitespaces while parsing XML with expat
Message-ID:

Is there a way to ignore unneeded whitespace (like that introduced by pretty-printing XML) while parsing the XML with the expat parser?

Getting that whitespace as-is while parsing/opening does not look good when the document is opened. I want to get rid of it at open time itself.

Regards,
Tarun Garg

From nickmacd at gmail.com Wed Aug 11 04:51:46 2010
From: nickmacd at gmail.com (Nick MacDonald)
Date: Tue, 10 Aug 2010 22:51:46 -0400
Subject: [Expat-discuss] Ignoring whitespaces while parsing XML with expat
In-Reply-To:
References:
Message-ID:

Tarun:

> Is there a way to ignore unneeded whitespace (like that introduced by pretty-printing XML) while parsing the XML with the expat parser?

http://www.w3.org/TR/REC-xml/#sec-white-space

[quote]
In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

An XML processor MUST always pass all characters in a document that are not markup through to the application. A validating XML processor MUST also inform the application which of these characters constitute white space appearing in element content.
[end quote]

While it might be handy for eXpat to have a flag/mode that removes much of the white space that looks optional to you, that would run counter to the spec (as quoted above). So you, as the author of the "application" the spec mentions, must deal with the white space on your own. This shouldn't actually be too hard...
but there is probably a good set of test cases you'd need to run to make sure the results you get are what you really want.

Good luck,
Nick

From genkuro at gmail.com Wed Aug 18 20:23:31 2010
From: genkuro at gmail.com (Brian)
Date: Wed, 18 Aug 2010 18:23:31 +0000 (UTC)
Subject: [Expat-discuss] not well-formed (invalid token)
Message-ID:

Hey there -

I'm using expat with Python 2.6. It's all layered with xmlrpc. The actual XML doc is short-lived and hidden from me. But I can catch "not well-formed (invalid token)" errors, the line number, and the offset. Unfortunately, the latter two are not terribly useful.

Is there a way to get the actual offending token?

Thanks,
Brian

From nickmacd at gmail.com Wed Aug 18 23:56:18 2010
From: nickmacd at gmail.com (Nick MacDonald)
Date: Wed, 18 Aug 2010 17:56:18 -0400
Subject: [Expat-discuss] not well-formed (invalid token)
In-Reply-To:
References:
Message-ID:

Brian:

Well, I'll bite... what is the point of using eXpat to parse the document (when the whole point of eXpat is to expose the document to an application) if the document is not exposed to your application? I suspect you're dealing with some sort of middleman here... otherwise you should be able to see the document yourself. In any case, the question becomes: who is "reading" your document and supplying it to eXpat? That entity is really the only one that can make sense of the line number and offset information...

This is the absolute extent of my knowledge and ability to make an "intelligent guess"... This mailing list is generally for support of the C eXpat codebase, and I am not convinced you'll find many people on this list who know the ins and outs of the Python wrapper/bindings for eXpat.

Nick

On Wed, Aug 18, 2010 at 2:23 PM, Brian wrote:
> I'm using expat with python 2.6.  It's all layered with xmlrpc.  The actual xml
> doc is short lived and hidden to me.  But I can catch "not well-formed (invalid
> token)" errors, the line number, and the offset.
> Unfortunately, the latter two
> are not terribly useful.
>
> Is there a way to get the actual offending token?

--
Nick MacDonald
NickMacD at gmail.com

From jzhang at ximpleware.com Fri Aug 20 01:15:08 2010
From: jzhang at ximpleware.com (jimmy Zhang)
Date: Thu, 19 Aug 2010 16:15:08 -0700
Subject: [Expat-discuss] [ANN]VTD-XML 2.9
Message-ID: <6708B59264BE47C89E739B7AD9EFE336@JimmyZhangPC>

VTD-XML 2.9, the next-generation XML processing API for SOA and cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version.

a. Strict conformance
   - VTD-XML now fully conforms to the XML Namespaces 1.0 spec
b. Performance improvement
   - Significantly improved parsing performance for small XML files
c. Expanded core VTD-XML API
   - Adds getPrefixString() and toNormalizedString2()
d. Cutting/splitting
   - Adds getSiblingElementFragment()
e. A number of bug fixes and code enhancements, including:
   - Fixes a bug in reading very large XML documents on some platforms
   - Fixes a bug in parsing processing instructions
   - Fixes a bug in outputAndReparse()

From vertleyb at gmail.com Thu Aug 26 16:22:06 2010
From: vertleyb at gmail.com (Arkadiy Vertleyb)
Date: Thu, 26 Aug 2010 10:22:06 -0400
Subject: [Expat-discuss] expat/unicode question
Message-ID:

Hi all,

I am confused about how Unicode and regular XML documents should be used with expat:

- Is it possible to process wide-char (Unicode) docs when expat is compiled in single-byte char mode (typedef char XML_Char)?
- Is it possible to process regular docs when expat is compiled in wide-char mode (typedef wchar_t XML_Char)?
- Why does the XML_Parse() function accept the buffer as const char* rather than const XML_Char*? Does this mean the answer to the first two questions is yes?

Thanks in advance for any help.
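This is a sketch of the usual answer to Arkadiy's questions, not an authoritative one. As I understand Expat's design, XML_Parse() takes raw bytes because the input encoding is independent of XML_Char. Expat decodes the buffer according to the document itself: its byte-order mark, its XML declaration, or the encoding passed to XML_ParserCreate(). It has built-in support for UTF-8, UTF-16, ISO-8859-1 and US-ASCII. XML_Char only selects how text is delivered to your callbacks: UTF-8 with char, UTF-16 when built with XML_UNICODE.

Calling the C API needs the libexpat headers. This self-contained illustration therefore uses Python's standard xml.parsers.expat module, which wraps the same C library. It feeds one document to the parser twice, once as UTF-16 bytes and once as UTF-8 bytes. The handler receives the same text either way. The helper name collect_text is hypothetical.

```python
import xml.parsers.expat


def collect_text(data: bytes) -> str:
    """Parse raw document bytes with expat and return all character data."""
    parser = xml.parsers.expat.ParserCreate()  # no encoding given: expat detects it
    chunks = []
    # expat may deliver one text run in several pieces, so collect them all
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this buffer as the final one
    return "".join(chunks)


doc = '<?xml version="1.0" encoding="UTF-16"?><greeting>h\u00e9llo w\u00f6rld</greeting>'
utf16_bytes = doc.encode("utf-16")  # Python's utf-16 codec prepends a BOM
utf8_bytes = doc.replace("UTF-16", "UTF-8").encode("utf-8")

print(collect_text(utf16_bytes))
print(collect_text(utf8_bytes))
```

In C terms, as I understand it, the same document bytes would be passed unchanged to XML_Parse() in either build. Only the type of the strings your handlers receive would differ.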
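This returns to Tarun's whitespace question from earlier in the thread, following Nick's point that the application must handle white space itself. Here is one minimal sketch. It assumes that any text run consisting only of white space between two tags is indentation and can be dropped. Character data is buffered, because expat may split a run across several callbacks, and the decision is made only when the next start or end tag arrives.

The helper name parse_without_indentation is hypothetical. The example uses Python's standard xml.parsers.expat binding, but the same buffering approach should carry over to C callbacks registered with XML_SetCharacterDataHandler() and XML_SetElementHandler().

```python
import xml.parsers.expat


def parse_without_indentation(data: bytes):
    """Return (event, value) tuples, dropping whitespace-only text runs."""
    events = []
    pending = []  # character data seen since the last start/end tag

    def flush():
        text = "".join(pending)
        pending.clear()
        if text.strip():  # keep the run only if it has non-whitespace content
            events.append(("text", text))

    def start(name, attrs):
        flush()
        events.append(("start", name))

    def end(name):
        flush()
        events.append(("end", name))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = pending.append
    parser.Parse(data, True)
    return events


pretty = b"""<config>
  <name>demo</name>
  <size>  42 </size>
</config>
"""

for event in parse_without_indentation(pretty):
    print(event)
```

As Nick warns, test this against your real documents. The heuristic also removes whitespace-only runs inside mixed content, such as the space in `<b>a</b> <i>b</i>`. It leaves leading and trailing spaces inside non-empty text (the `"  42 "` above) untouched.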
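This sketch addresses Brian's question about the offending token. It assumes the layer feeding expat can give you the raw bytes; how to capture them from an XML-RPC stack is not shown here. Given those bytes, Python's ExpatError already carries enough to point at the problem:

- `lineno` is 1-based.
- `offset` is the 0-based column where expat detected the error.
- `code` can be turned into a message with `xml.parsers.expat.ErrorString()`.

The helper describe_parse_error is hypothetical. The reported column is where expat noticed the problem, which may be slightly after the start of the offending token.

```python
import xml.parsers.expat
from typing import Optional


def describe_parse_error(data: bytes) -> Optional[str]:
    """Parse data; on failure, return the offending line with a caret under the error column."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # err.lineno is 1-based; err.offset is the 0-based column on that line
        lines = data.decode("utf-8", errors="replace").splitlines()
        line = lines[err.lineno - 1] if 0 < err.lineno <= len(lines) else ""
        caret = " " * err.offset + "^"
        message = xml.parsers.expat.ErrorString(err.code)
        return f"line {err.lineno}, column {err.offset}: {message}\n{line}\n{caret}"
    return None  # the document parsed cleanly


# A bare "&" followed by a space is not well-formed XML
bad = b"<methodResponse>\n  <value>a & b</value>\n</methodResponse>"
print(describe_parse_error(bad))
```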