From jeremy.kloth at gmail.com  Tue Jun  1 20:41:31 2010
From: jeremy.kloth at gmail.com (Jeremy Kloth)
Date: Tue, 1 Jun 2010 12:41:31 -0600
Subject: [Expat-discuss] Expat on 64 bit Linux
Message-ID: <201006011241.32047.jeremy.kloth@gmail.com>

On Wednesday, March 24, 2010 10:09:29 am you wrote:
> On Fri, Feb 5, 2010 at 10:28 AM, Jeremy Kloth <jeremy.kloth at gmail.com> wrote:
> > It was done to allow Expat output to be mapped directly to Python's
> > unicode objects (which can be either UCS-2 or UCS-4).
> > 
> > If desired, I can produce the patches required to add that support to the
> > Expat mainline.
> 
> Hey Jeremy!
> 
> Would the output type be controlled at compile time or at run time?
> 
> This definitely is interesting to me.  Do you also have a patched
> pyexpat that consumes the new output, or are you using a new Python
> extension to use this?

Sorry for the delay; it's been "in the queue".

The output type is matched at compile time to Python's Py_UNICODE type. The 
extension itself is 4Suite's cDomlette extension, which layers an Infoset 
(DOM-like) on top of the raw Expat callbacks. It has been in production for a 
very long time without issue, so I would call it quite stable.

-- 
Jeremy Kloth

From vijay.t.nikam at gmail.com  Thu Jun  3 14:51:34 2010
From: vijay.t.nikam at gmail.com (Vijay Nikam)
Date: Thu, 3 Jun 2010 18:21:34 +0530
Subject: [Expat-discuss] XML Parse into Structure
Message-ID: <AANLkTinAEKrxQZVW3zYwGbUT3iKQTj4xflQd4XSXmk7T@mail.gmail.com>

Dear All,

I am new to XML parsing and to this list.
A couple of days ago I started working with the Expat parser.
I am parsing XML files in C (libexpat.so on Linux), and my first attempt
at parsing an XML file with Expat
was successful, so it was a great start. Thanks to the information
provided at the following link:
    http://www.xml.com/pub/a/1999/09/expat/index.html#useparser

The output of the parsed XML file is dumped to the console. Based on
this I have two questions:
1. Is it possible to dump the parsed output into a text file? If yes,
please let me know (any pointers/ideas), thanks.
2. Is it possible to build a structure from an XML file with the Expat parser?
    - Is there any function available to do this in C, like the
one available for the PHP XML Expat parser
(xml_parse_into_struct)?

Please let me know any pointers regarding the two questions above.
Any information provided will be important and helpful.
Kindly acknowledge, thank you.

Kind Regards,
Vijay Nikam

From marco.maggi-ipsu at poste.it  Fri Jun  4 16:35:18 2010
From: marco.maggi-ipsu at poste.it (Marco Maggi)
Date: Fri, 04 Jun 2010 16:35:18 +0200
Subject: [Expat-discuss] XML Parse into Structure
In-Reply-To: marco@localhost
	(Vijay Nikam's message of "Thu, 3 Jun 2010 18:21:34 +0530")
References: <AANLkTinAEKrxQZVW3zYwGbUT3iKQTj4xflQd4XSXmk7T@mail.gmail.com>
Message-ID: <87ocfqzx09.fsf@rapitore.luna>

"Vijay Nikam" wrote:
> Please  let  me  know  any pointers  regarding  above  two
> mentioned queries?

You  say  you  are  on  Linux,  so  you  can  download  this
documentation file:

<http://github.com/marcomaggi/nimby-doc/blob/master/src/expat.texi>

in Texinfo  format and compile it  to HTML or  Info with the
"makeinfo" program; it should have instructions on how to do
what you want.

HTH
-- 
Marco Maggi

From nickmacd at gmail.com  Fri Jun  4 20:13:32 2010
From: nickmacd at gmail.com (Nick MacDonald)
Date: Fri, 4 Jun 2010 14:13:32 -0400
Subject: [Expat-discuss] XML Parse into Structure
In-Reply-To: <AANLkTinAEKrxQZVW3zYwGbUT3iKQTj4xflQd4XSXmk7T@mail.gmail.com>
References: <AANLkTinAEKrxQZVW3zYwGbUT3iKQTj4xflQd4XSXmk7T@mail.gmail.com>
Message-ID: <AANLkTilIYf5xDQwM4NfDz3rIACa0ISmn7izmyBr70iGD@mail.gmail.com>

Vijay:

If you want an "in memory" representation of your XML file, you
probably don't want or need to use a SAX based parser like eXpat which
is event based.  You'd probably rather find a DOM based parser which
is expressly designed to build a Document Object Model (the DOM in the
name) in memory.  You could of course layer a DOM module on top of
eXpat, but I suspect that's a fair amount of work that has already
been done many times before; if you do some searching on SourceForge
and other open source repositories I'm sure you'll find a lot of
examples.  The whole idea to use SAX is to be able to process a
document with minimal memory overhead... and thus to be able to handle
exceptionally large documents that would use too much memory if they
were loaded into memory all at once.  (And SAX would be faster if you
were just searching quickly inside a document... no overhead loading
into memory the parts you'd never use.)
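
To give an idea of what "layering a tree on top of eXpat" involves, here
is a minimal sketch. It uses Python's xml.parsers.expat binding rather
than the C API you asked about, only because it is shorter; the handler
structure is the same in C. The names parse_into_struct and dump and the
dict layout are made up for this example, not part of Expat.

```python
import xml.parsers.expat


def parse_into_struct(data):
    """Build a nested dict tree from Expat's start/end/text events."""
    root = None
    stack = []  # elements that are currently open, innermost last

    def start(name, attrs):
        nonlocal root
        node = {"name": name, "attrs": dict(attrs), "text": "", "children": []}
        if stack:
            stack[-1]["children"].append(node)
        else:
            root = node
        stack.append(node)

    def end(name):
        stack.pop()

    def chars(text):
        # Expat may deliver one text run in several pieces, so accumulate.
        stack[-1]["text"] += text

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return root


def dump(node, out, depth=0):
    """Write one indented line per element to any file-like object."""
    line = "%s%s %r %r\n" % ("  " * depth, node["name"],
                             node["attrs"], node["text"].strip())
    out.write(line)
    for child in node["children"]:
        dump(child, out, depth + 1)
```

Dumping to a text file instead of the console is then just a matter of
passing an open file, e.g. "with open('parsed.txt', 'w') as f: dump(tree, f)".
Note that this keeps the whole document in memory, which is exactly the
trade-off described above.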

Good luck,
  Nick

On Thu, Jun 3, 2010 at 8:51 AM, Vijay Nikam <vijay.t.nikam at gmail.com> wrote:
> I am new to XML parsing and to this list.
> A couple of days ago I started working with the Expat parser.
> I am parsing XML files in C (libexpat.so on Linux), and my first attempt
> at parsing an XML file with Expat
> was successful, so it was a great start. Thanks to the information
> provided at the following link:
>    http://www.xml.com/pub/a/1999/09/expat/index.html#useparser
>
> The output of the parsed XML file is dumped to the console. Based on
> this I have two questions:
> 1. Is it possible to dump the parsed output into a text file? If yes,
> please let me know (any pointers/ideas), thanks.
> 2. Is it possible to build a structure from an XML file with the Expat parser?
>    - Is there any function available to do this in C, like the
> one available for the PHP XML Expat parser
> (xml_parse_into_struct)?
>
> Please let me know any pointers regarding the two questions above.
> Any information provided will be important and helpful.
> Kindly acknowledge, thank you.

-- 
Nick MacDonald
NickMacD at gmail.com

From aleix at member.fsf.org  Fri Jun  4 22:19:31 2010
From: aleix at member.fsf.org (Aleix Conchillo Flaqué)
Date: Fri, 4 Jun 2010 22:19:31 +0200
Subject: [Expat-discuss] XML Parse into Structure
In-Reply-To: <AANLkTilIYf5xDQwM4NfDz3rIACa0ISmn7izmyBr70iGD@mail.gmail.com>
References: <AANLkTinAEKrxQZVW3zYwGbUT3iKQTj4xflQd4XSXmk7T@mail.gmail.com>
	<AANLkTilIYf5xDQwM4NfDz3rIACa0ISmn7izmyBr70iGD@mail.gmail.com>
Message-ID: <AANLkTilQC4hN_BFhv58c0if8Y4Tn2DYG0jH-eYMrzdqB@mail.gmail.com>

You can use SCEW (Simple C Expat Wrapper). I think it does what you need,
and it also allows you to create in-memory XML trees and dump them to
files/memory/...

http://www.nongnu.org/scew/

On Fri, Jun 4, 2010 at 20:13, Nick MacDonald <nickmacd at gmail.com> wrote:

> Vijay:
>
> If you want an "in memory" representation of your XML file, you
> probably don't want or need to use a SAX based parser like eXpat which
> is event based.  You'd probably rather find a DOM based parser which
> is expressly designed to build a Document Object Model (the DOM in the
> name) in memory.  You could of course layer a DOM module on top of
> eXpat, but I suspect that's a fair amount of work that has already
> been done many times before; if you do some searching on SourceForge
> and other open source repositories I'm sure you'll find a lot of
> examples.  The whole idea to use SAX is to be able to process a
> document with minimal memory overhead... and thus to be able to handle
> exceptionally large documents that would use too much memory if they
> were loaded into memory all at once.  (And SAX would be faster if you
> were just searching quickly inside a document... no overhead loading
> into memory the parts you'd never use.)
>
> Good luck,
>  Nick
>
>

From erg at research.att.com  Wed Jun  9 21:14:31 2010
From: erg at research.att.com (Emden R. Gansner)
Date: Wed, 09 Jun 2010 15:14:31 -0400
Subject: [Expat-discuss] libexpat & URIs
Message-ID: <4C0FE817.9020403@research.att.com>

Is there a setting or some technique to get libexpat to accept the 
ampersand character within an attribute value? For example, I would like it
to parse

   <A HREF="http://abc.com/wiki/index.php?title=BrokenLink&action=edit">

Thanks.

    Emden


From karl at waclawek.net  Wed Jun  9 22:19:50 2010
From: karl at waclawek.net (Karl Waclawek)
Date: Wed, 09 Jun 2010 16:19:50 -0400
Subject: [Expat-discuss] libexpat & URIs
In-Reply-To: <4C0FE817.9020403@research.att.com>
References: <4C0FE817.9020403@research.att.com>
Message-ID: <4C0FF766.2070601@waclawek.net>

On 09/06/2010 3:14 PM, Emden R. Gansner wrote:
> Is there a setting or some technique to get libexpat to accept the
> ampersand character within an attribute value? For example, I would like it
> to parse
> 
>   <A HREF="http://abc.com/wiki/index.php?title=BrokenLink&action=edit">
> 
> Thanks.

That is not well-formed XML, which means it is not XML: a literal
ampersand in an attribute value has to be written as &amp;.
XHTML does not allow this either, as it is XML.
See http://www.w3.org/TR/xhtml1/#C_12
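
To illustrate, here is a small sketch using Python's xml.parsers.expat
binding (the parse_href helper is made up for this example). With the
ampersand escaped, Expat accepts the document and hands the handler the
plain URL; with the raw ampersand it raises a well-formedness error.

```python
import xml.parsers.expat
from xml.sax.saxutils import quoteattr


def parse_href(document):
    """Return the HREF attribute of the first <A> element Expat reports."""
    found = []

    def start(name, attrs):
        if name == "A" and "HREF" in attrs:
            found.append(attrs["HREF"])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(document, True)
    return found[0] if found else None


url = "http://abc.com/wiki/index.php?title=BrokenLink&action=edit"

# quoteattr() escapes & as &amp; and adds the surrounding quotes.
good = "<A HREF=%s/>" % quoteattr(url)

# The raw ampersand makes this document not well-formed.
bad = '<A HREF="%s"/>' % url

print(parse_href(good))  # the handler sees the unescaped URL again

try:
    parse_href(bad)
except xml.parsers.expat.ExpatError as err:
    print("rejected:", err)
```

So the fix belongs in whatever generates the document, not in the parser
settings.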

Karl

From jseyster at cs.stonybrook.edu  Thu Jun 17 23:12:28 2010
From: jseyster at cs.stonybrook.edu (Justin Seyster)
Date: Thu, 17 Jun 2010 17:12:28 -0400
Subject: [Expat-discuss] File not found error in the parser
Message-ID: <1276809148.32549.19.camel@crossroads>

I'm running into a really weird issue parsing a document with Expat: I
keep getting file not found errors on files that exist.

The weirdest part is that if I parse files from one particular
directory, it works without a problem.  Any other directory, however,
and I get an IOError exception.  (All the directories and files I've
tried have standard Unix permissions, and I am their owner.)  My code
looks like this:

    # I verified that this does indeed return an Expat parser.
    parser = make_parser()
    parser.setFeature(feature_namespaces, 0)

    dh = BlankHandler()
    parser.setContentHandler(dh)

    try:
        xmlhandle = open(filename, 'r')
        # Attempting a read here succeeds
        parser.parse(xmlhandle) # This line throws the IOError
        xmlhandle.close()
    except IOError as e:
        print e.strerror
        sys.exit(1)

Running this on files in most directories gives me the error:
"No such file or directory"

I know the file exists, however, because attempts to read it after
opening it but before parsing it succeed.  I also know that my code is
at least semi-valid because it correctly parses files placed in one
particular directory.

Has anybody heard of this kind of problem before?  Thanks.
        --Justin


From fdrake at acm.org  Tue Jun 22 14:50:46 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 22 Jun 2010 08:50:46 -0400
Subject: [Expat-discuss] File not found error in the parser
In-Reply-To: <1276809148.32549.19.camel@crossroads>
References: <1276809148.32549.19.camel@crossroads>
Message-ID: <AANLkTikSANQ0gWMRUVjxU091oj_0UOnLuj4h-JMBw73S@mail.gmail.com>

On Thu, Jun 17, 2010 at 5:12 PM, Justin Seyster
<jseyster at cs.stonybrook.edu> wrote:
> The weirdest part is that if I parse files from one particular
> directory, it works without a problem.

This is strange.  Does the one directory that works for you happen to be
the current directory?

What version of Python are you using, and on what platform?


  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at gmail.com>
"A storm broke loose in my mind."  --Albert Einstein

From fdrake at acm.org  Wed Jun 30 00:43:03 2010
From: fdrake at acm.org (Fred Drake)
Date: Tue, 29 Jun 2010 18:43:03 -0400
Subject: [Expat-discuss] File not found error in the parser
In-Reply-To: <1277850473.5738.9.camel@crossroads>
References: <1276809148.32549.19.camel@crossroads>
	<AANLkTikSANQ0gWMRUVjxU091oj_0UOnLuj4h-JMBw73S@mail.gmail.com> 
	<1277850473.5738.9.camel@crossroads>
Message-ID: <AANLkTimhIei5BONtWDeeIN4LU8bj4Jltvccy_lK5JeoQ@mail.gmail.com>

On Tue, Jun 29, 2010 at 6:27 PM, Justin Seyster
<jseyster at cs.stonybrook.edu> wrote:
> The one directory that works is not the current directory.  In fact, it
> seems that the magic directory stays the same regardless of what the
> current directory is (and whether I use an absolute or relative path).

Can you reproduce this with a short script that just does the XML parsing?

If so, please post that.
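
It would also help to see which path the error refers to, since strerror
alone does not say. Something along these lines would do; it is only a
sketch, written for Python 3 rather than your 2.6, and parse_file is a
made-up name. An IOError raised by open() carries errno and filename
attributes; errors raised from elsewhere may leave filename as None.

```python
import xml.sax
from xml.sax import handler


class BlankHandler(handler.ContentHandler):
    def startElement(self, name, attrs):
        print("Start:", name)

    def endElement(self, name):
        print("End:", name)


def parse_file(path):
    """Parse path; on an I/O error return (errno, strerror, filename)."""
    parser = xml.sax.make_parser()
    parser.setFeature(handler.feature_namespaces, 0)
    parser.setContentHandler(BlankHandler())
    try:
        with open(path, "rb") as xmlhandle:
            parser.parse(xmlhandle)
    except IOError as e:
        # filename tells us which path could not be opened, if known.
        return (e.errno, e.strerror, e.filename)
    return None
```

If the reported filename is not the XML file you passed in, that narrows
things down considerably.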

> I'm using the current Python from Ubuntu Karmic, which is 2.6.4.
>
> (Somebody kindly let me know that I sent this problem to the wrong list.
> Sorry about that, and let me know if I should take this discussion off
> the list.)

I'm not too worried about that; I'm unlikely to see this elsewhere.


  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at gmail.com>
"A storm broke loose in my mind."  --Albert Einstein

From jseyster at cs.stonybrook.edu  Wed Jun 30 00:27:53 2010
From: jseyster at cs.stonybrook.edu (Justin Seyster)
Date: Tue, 29 Jun 2010 18:27:53 -0400
Subject: [Expat-discuss] File not found error in the parser
In-Reply-To: <AANLkTikSANQ0gWMRUVjxU091oj_0UOnLuj4h-JMBw73S@mail.gmail.com>
References: <1276809148.32549.19.camel@crossroads>
	<AANLkTikSANQ0gWMRUVjxU091oj_0UOnLuj4h-JMBw73S@mail.gmail.com>
Message-ID: <1277850473.5738.9.camel@crossroads>

The one directory that works is not the current directory.  In fact, it
seems that the magic directory stays the same regardless of what the
current directory is (and whether I use an absolute or relative path).

I'm using the current Python from Ubuntu Karmic, which is 2.6.4.

(Somebody kindly let me know that I sent this problem to the wrong list.
Sorry about that, and let me know if I should take this discussion off
the list.)
        --Justin

On Tue, 2010-06-22 at 08:50 -0400, Fred Drake wrote:
> On Thu, Jun 17, 2010 at 5:12 PM, Justin Seyster
> <jseyster at cs.stonybrook.edu> wrote:
> > The weirdest part is that if I parse files from one particular
> > directory, it works without a problem.
> 
> This is strange.  Does the one directory that works for you happen to be
> the current directory?
> 
> What version of Python are you using, and on what platform?
> 
> 
>   -Fred
> 


From jseyster at cs.stonybrook.edu  Wed Jun 30 01:16:14 2010
From: jseyster at cs.stonybrook.edu (Justin Seyster)
Date: Tue, 29 Jun 2010 19:16:14 -0400
Subject: [Expat-discuss] File not found error in the parser
In-Reply-To: <AANLkTimhIei5BONtWDeeIN4LU8bj4Jltvccy_lK5JeoQ@mail.gmail.com>
References: <1276809148.32549.19.camel@crossroads>
	<AANLkTikSANQ0gWMRUVjxU091oj_0UOnLuj4h-JMBw73S@mail.gmail.com>
	<1277850473.5738.9.camel@crossroads>
	<AANLkTimhIei5BONtWDeeIN4LU8bj4Jltvccy_lK5JeoQ@mail.gmail.com>
Message-ID: <1277853374.5738.19.camel@crossroads>

Hmm, I wrote a script that just opens and parses an XML file, and it
gives me the same problem.  It looks like it's definitely something
wrong either with my machine's configuration or the particular version
of Python (and its Expat wrappers) that I'm using.

I put the script below.
        --Justin

#!/usr/bin/env python

import sys

from xml.sax import handler
from xml.sax import make_parser
from xml.sax.handler import feature_namespaces

class BlankHandler(handler.ContentHandler):
    def __init__(self):
        pass

    def startElement(self, name, attrs):
        print "Start: ", name

    def endElement(self, name):
        print "End: ", name

if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(1)
    xmlfile = sys.argv[1]

    parser = make_parser()
    parser.setFeature(feature_namespaces, 0)

    dh = BlankHandler()
    parser.setContentHandler(dh)

    try:
        xmlhandle = open(xmlfile, 'r')
        # Uncommenting the line below shows that xmlhandle can be read
        # successfully.
        #print xmlhandle.readline()
        parser.parse(xmlhandle)
        xmlhandle.close()
    except IOError as e:
        print "IOError: ",
        print e.strerror
        sys.exit(1)


On Tue, 2010-06-29 at 18:43 -0400, Fred Drake wrote:
> On Tue, Jun 29, 2010 at 6:27 PM, Justin Seyster
> <jseyster at cs.stonybrook.edu> wrote:
> > The one directory that works is not the current directory.  In fact, it
> > seems that the magic directory stays the same regardless of what the
> > current directory is (and whether I use an absolute or relative path).
> 
> Can you reproduce this with a short script that just does the XML parsing?
> 
> If so, please post that.
> 
> > I'm using the current Python from Ubuntu Karmic, which is 2.6.4.
> >
> > (Somebody kindly let me know that I sent this problem to the wrong list.
> > Sorry about that, and let me know if I should take this discussion off
> > the list.)
> 
> I'm not too worried about that; I'm unlikely to see this elsewhere.
> 
> 
>   -Fred
>