From tarung at adobe.com  Mon Aug  9 10:33:33 2010
From: tarung at adobe.com (Tarun Garg)
Date: Mon, 9 Aug 2010 14:03:33 +0530
Subject: [Expat-discuss] Ignoring whitespaces while parsing XML with expat
Message-ID: <AAC785F15EE02F4894D0CF11020F019BC617268472@INDIAMBX01.corp.adobe.com>

Is there a way to ignore unneeded whitespace (such as the whitespace introduced by pretty-printing XML) while parsing XML with the expat parser?

Getting that whitespace back as-is while parsing makes the document look bad when it is opened. I want to get rid of it at open time itself.

Regards,
Tarun Garg


From nickmacd at gmail.com  Wed Aug 11 04:51:46 2010
From: nickmacd at gmail.com (Nick MacDonald)
Date: Tue, 10 Aug 2010 22:51:46 -0400
Subject: [Expat-discuss] Ignoring whitespaces while parsing XML with
	expat
In-Reply-To: <AAC785F15EE02F4894D0CF11020F019BC617268472@INDIAMBX01.corp.adobe.com>
References: <Acs3nY2BM+phqlqDQiKCm5P6Gpa1eg==>
	<AAC785F15EE02F4894D0CF11020F019BC617268472@INDIAMBX01.corp.adobe.com>
Message-ID: <AANLkTi=Ja0Zoi8ODc-91FhK95h4DWDA4h9n=yp7qMEqS@mail.gmail.com>

Tarun:
> Is there a way to ignore unneeded whitespace (such as the whitespace introduced by pretty-printing XML) while parsing XML with the expat parser?

http://www.w3.org/TR/REC-xml/#sec-white-space

[quote]
In editing XML documents, it is often convenient to use "white space"
(spaces, tabs, and blank lines) to set apart the markup for greater
readability. Such white space is typically not intended for inclusion
in the delivered version of the document. On the other hand,
"significant" white space that should be preserved in the delivered
version is common, for example in poetry and source code.

An XML processor MUST always pass all characters in a document that
are not markup through to the application. A validating XML processor
MUST also inform the application which of these characters constitute
white space appearing in element content.
[end quote]

While it might be handy for eXpat to have a flag or mode that removes
white space that looks optional to you, that would run counter to the
spec (as quoted above).  So you, as the author of the "application"
the spec mentions, must deal with the white space on your own.  This
shouldn't actually be too hard...  but you would probably need a good
set of test cases to make sure the results you get are what you
really want.
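One way to do that, sketched in C under stated assumptions: expat may split a single run of text across several character-data callbacks (for example at line breaks or buffer boundaries), so testing each chunk for white space on its own can throw away significant newlines inside real text. This sketch buffers the text instead and decides only when the next start or end tag arrives. The names TextCollector, collect_chars() and take_text() are hypothetical; collect_chars() has the shape of expat's XML_CharacterDataHandler assuming the default build, where XML_Char is char.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: accumulates character data that expat may deliver
   in several pieces, so the white-space test sees the whole run. */
typedef struct {
    char  *buf;  /* pending character data (not NUL-terminated) */
    size_t len;  /* bytes currently pending */
    size_t cap;  /* allocated size of buf */
} TextCollector;

/* Same shape as expat's character data handler when XML_Char is char.
   Register it with XML_SetCharacterDataHandler() and pass the collector
   via XML_SetUserData(). */
void collect_chars(void *userData, const char *s, int len)
{
    TextCollector *tc = userData;
    if (len <= 0)
        return;
    if (tc->len + (size_t)len > tc->cap) {
        size_t ncap = tc->cap ? tc->cap * 2 : 64;
        while (ncap < tc->len + (size_t)len)
            ncap *= 2;
        char *nbuf = realloc(tc->buf, ncap);
        if (nbuf == NULL)
            return; /* sketch only: the chunk is dropped on allocation failure */
        tc->buf = nbuf;
        tc->cap = ncap;
    }
    memcpy(tc->buf + tc->len, s, (size_t)len);
    tc->len += (size_t)len;
}

/* XML white space (production S in the spec): space, tab, CR, LF. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Call at the top of the start- and end-element handlers. Returns the
   length of the pending text and points *text at it if the text contains
   any non-white-space character; returns 0 if it is white space only.
   The pending text is cleared either way; *text stays valid until the
   next collect_chars() call. */
size_t take_text(TextCollector *tc, const char **text)
{
    size_t n = tc->len;
    tc->len = 0; /* reset for the next run; the allocation is reused */
    for (size_t i = 0; i < n; i++) {
        if (!is_xml_space(tc->buf[i])) {
            *text = tc->buf;
            return n;
        }
    }
    return 0;
}
```

To wire it up, one would pass a zero-initialized TextCollector to XML_SetUserData(), register collect_chars() with XML_SetCharacterDataHandler(), and call take_text() in both handlers given to XML_SetElementHandler() (adding XMLCALL to the handler declarations if your platform needs it), then free(tc.buf) when done. Note the trade-off: white space that is the only text between two tags is always dropped, including in mixed content such as `<b>a</b> <i>b</i>`, which is exactly the kind of case worth covering in those test cases.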

Good luck,
  Nick

From genkuro at gmail.com  Wed Aug 18 20:23:31 2010
From: genkuro at gmail.com (Brian)
Date: Wed, 18 Aug 2010 18:23:31 +0000 (UTC)
Subject: [Expat-discuss] not well-formed (invalid token)
Message-ID: <loom.20100818T201517-498@post.gmane.org>

Hey there -

I'm using expat with Python 2.6, underneath xmlrpc.  The actual XML
document is short-lived and hidden from me.  I can catch "not well-formed
(invalid token)" errors along with the line number and the offset, but
unfortunately the latter two are not terribly useful on their own.

Is there a way to get the actual offending token?

Thanks,
Brian 


From nickmacd at gmail.com  Wed Aug 18 23:56:18 2010
From: nickmacd at gmail.com (Nick MacDonald)
Date: Wed, 18 Aug 2010 17:56:18 -0400
Subject: [Expat-discuss] not well-formed (invalid token)
In-Reply-To: <loom.20100818T201517-498@post.gmane.org>
References: <loom.20100818T201517-498@post.gmane.org>
Message-ID: <AANLkTikyvT3bqTdSLWnu+QPtH1xvXi=TJcbEip4hyi-e@mail.gmail.com>

Brian:

Well, I'll bite... what is the point of using eXpat to parse the
document (where the whole point of eXpat is to expose the document to
an application) if the document is not exposed to your application??

I suspect you're dealing with some sort of a middle man here... or
else you should be able to see the document yourself.

In any case, the question becomes:  who is "reading" your document
and supplying it to eXpat?  That entity is really the only one that
can make sense of the line number and offset information...
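For whoever does hold the bytes, turning expat's error position into the offending text is mechanical. A minimal C sketch, assuming the buffer is exactly the input expat saw, the line number is 1-based and the column is a 0-based count into that line (as reported by expat's XML_GetErrorLineNumber() and XML_GetErrorColumnNumber(), and surfaced in Python as ExpatError.lineno and ExpatError.offset). It further assumes the text before the error is ASCII, so character and byte offsets coincide, and it treats only '\n' as a line break. locate_position() is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the byte at (line, column)
   in doc, or NULL if that position lies outside the buffer or past the
   end of the requested line. line is 1-based, column is 0-based. */
const char *locate_position(const char *doc, size_t doc_len,
                            unsigned long line, unsigned long column)
{
    size_t i = 0;
    if (line == 0)
        return NULL;

    /* Skip whole lines until we reach the requested one. */
    for (unsigned long cur = 1; cur < line; cur++) {
        while (i < doc_len && doc[i] != '\n')
            i++;
        if (i == doc_len)
            return NULL; /* document has fewer lines than requested */
        i++;             /* step past the '\n' */
    }

    /* Walk along the line without crossing its end. */
    for (unsigned long c = 0; c < column; c++) {
        if (i >= doc_len || doc[i] == '\n')
            return NULL;
        i++;
    }
    return i < doc_len ? doc + i : NULL;
}
```

Printing a few dozen bytes starting slightly before the returned pointer usually makes the offending token obvious.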

This is the absolute extent of my knowledge and ability to make an
"intelligent guess"...  This mailing list is generally for support of
the C eXpat codebase... and I am not convinced you'll find many people
on this list who know the ins and outs of the Python wrapper/bindings
for eXpat.

Nick


On Wed, Aug 18, 2010 at 2:23 PM, Brian <genkuro at gmail.com> wrote:
> I'm using expat with python 2.6.  It's all layered with xmlrpc.  The actual xml
> doc is short lived and hidden to me.  But I can catch "not well-formed (invalid
> token)" errors, the line number, and the offset.  Unfortunately, the latter two
> are not terribly useful.
>
> Is there a way to get the actual offending token?

-- 
Nick MacDonald
NickMacD at gmail.com

From jzhang at ximpleware.com  Fri Aug 20 01:15:08 2010
From: jzhang at ximpleware.com (jimmy Zhang)
Date: Thu, 19 Aug 2010 16:15:08 -0700
Subject: [Expat-discuss] [ANN]VTD-XML 2.9
Message-ID: <6708B59264BE47C89E739B7AD9EFE336@JimmyZhangPC>

VTD-XML 2.9, the next-generation XML processing API for SOA and cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version.
  - Strict conformance
    - VTD-XML now fully conforms to the Namespaces in XML 1.0 spec
  - Performance improvement
    - Significantly improved parsing performance for small XML files
  - Expanded core VTD-XML API
    - Adds getPrefixString() and toNormalizedString2()
  - Cutting/splitting
    - Adds getSiblingElementFragment()
  - A number of bug fixes and code enhancements, including:
    - Fixes a bug in reading very large XML documents on some platforms
    - Fixes a bug in parsing processing instructions
    - Fixes a bug in outputAndReparse()

From vertleyb at gmail.com  Thu Aug 26 16:22:06 2010
From: vertleyb at gmail.com (Arkadiy Vertleyb)
Date: Thu, 26 Aug 2010 10:22:06 -0400
Subject: [Expat-discuss] expat/unicode question
Message-ID: <AANLkTi=tpQ989G24yAma4j7G4_fJi2qRRoiu6BFbzKks@mail.gmail.com>

Hi all,

I am confused about how Unicode and regular XML documents should be
used with expat:

- Is it possible to process wide-char (Unicode) documents when expat
is compiled in single-byte char mode (typedef char XML_Char)?

- Is it possible to process regular documents when expat is compiled
in wide-char mode (typedef wchar_t XML_Char)?

- Why does the XML_Parse() function accept the buffer as const char*
rather than const XML_Char*?  Does this mean yes to the first two
questions?

Thanks in advance for any help.
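A minimal sketch of the distinction behind the third question, assuming a default build of expat: XML_Parse(parser, s, len, isFinal) takes the raw bytes of the document in whatever encoding the document itself uses, with len counted in bytes, while XML_Char only determines the type of the strings expat hands to your callbacks. The helper ascii_to_utf16le() below is hypothetical; it just produces a UTF-16LE byte buffer (with a byte order mark) of the kind one could feed to XML_Parse() from a char-based build.

```c
#include <stddef.h>

/* Hypothetical helper: encodes an ASCII string as UTF-16LE preceded by a
   byte order mark, writing raw bytes into out. Returns the number of
   BYTES written (the value XML_Parse() expects as len), or 0 if out is
   too small. Non-ASCII input is out of scope for this sketch. */
size_t ascii_to_utf16le(const char *src, unsigned char *out, size_t out_cap)
{
    size_t n = 0;
    if (out_cap < 2)
        return 0;
    out[n++] = 0xFF; /* byte order mark, little-endian */
    out[n++] = 0xFE;
    for (; *src != '\0'; src++) {
        if (n + 2 > out_cap)
            return 0;
        out[n++] = (unsigned char)*src; /* low byte: the ASCII code */
        out[n++] = 0x00;                /* high byte: zero for ASCII */
    }
    return n;
}
```

The bytes would then go to a parser created with XML_ParserCreate(NULL) via XML_Parse(parser, (const char *)buf, (int)n, 1); with no encoding given, expat detects UTF-16 from the byte order mark and still passes callbacks XML_Char strings in the encoding its own build uses (UTF-8 for char).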