From jeremyhu at uclink4.berkeley.edu  Sat Mar  1 02:57:49 2003
From: jeremyhu at uclink4.berkeley.edu (Jeremy Huddleston)
Date: Mon Mar  3 01:34:28 2003
Subject: [Expat-discuss] expat, XFree86, and soname
Message-ID: <3E60922D.8070707@uclink4.berkeley.edu>

I just installed XFree86 4.3, and to my surprise it includes expat
1.95.2.  I was going to delete it so that I could use the 1.95.6 that I
have on my system instead, but there is a small problem.  XFree86 calls
the library libexpat.so.1 while the soname used in the expat tarballs is
libexpat.so.0.  Clearly this is not a great situation.  I can remove (and
have removed) the libexpat* files from /usr/X11R6/lib and run the
following in /usr/local/lib:

ld -shared -soname=libexpat.so.1 libexpat.so.0 -o libexpat.so.1

This has the effect of allowing files linked against libexpat.so.1 to 
gain access to the libexpat.so.0 "through" it.

Alternatively, you can create a symlink:

ln -s libexpat.so.0 libexpat.so.1

While these workarounds work, they do not solve the problem with the
soname.  Some of my programs will be looking for libexpat.so.0 and some
will be looking for libexpat.so.1.

Because of this, will all future libexpat.so.1 and libexpat.so.0
libraries be binary compatible with each other, and will future versions
start using the libexpat.so.1 soname?

Thanks,
Jeremy


From v.cimino at resi.it  Mon Mar  3 09:12:48 2003
From: v.cimino at resi.it (Vincenzo Cimino)
Date: Mon Mar  3 03:16:37 2003
Subject: [Expat-discuss] I'm confused about CDATA section
Message-ID: <006f01c2e15c$b0965c00$10cc09c0@ciminov>


I read in this site

http://www.w3schools.com/xml/xml_cdata.asp

"Only text inside a CDATA section is ignored by the parser."

but also in another points.

Is the behaviour of the Expat library therefore different?


Vincenzo Cimino
Project Manager
Divisione Network IP
RESI Informatica S.r.l.
Tel + 39.6.92710/406
Tel + 39.6.92710/222
Fax + 39.6.92710/218
mailto:v.cimino@resi.it
From karl at waclawek.net  Mon Mar  3 09:18:59 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Mon Mar  3 09:19:45 2003
Subject: [Expat-discuss] Re: I'm confused about CDATA section
References: <006f01c2e15c$b0965c00$10cc09c0@ciminov>
Message-ID: <001801c2e18f$d5732020$9e539696@citkwaclaww2k>


> I read in this site
> 
> http://www.w3schools.com/xml/xml_cdata.asp
>
> "Only text inside a CDATA section is ignored by the parser."
>
> but also in another points.
> 
> The behaviour of Expat Library is therefore different?

Not really.
The above is misleading. The text in a CDATA section is not ignored,
but it is not treated as XML markup. Check this section of
the specs: http://www.w3.org/TR/REC-xml.html#sec-cdata-sect
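
To illustrate what "not treated as markup" means, here is a minimal
sketch in C. It is not how Expat is implemented; the helper cdata_text
is a hypothetical name used only for this example. It copies the
characters between the CDATA delimiters unchanged, so "<" and "&"
arrive as ordinary character data instead of being dropped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not part of Expat).
   Copies the text of the first CDATA section in `xml` into `out`
   unchanged and returns its length, or -1 if no complete section is
   found or `out` is too small. */
int cdata_text(const char *xml, char *out, size_t out_size)
{
    static const char open_tag[] = "<![CDATA[";
    static const char close_tag[] = "]]>";
    const char *start = strstr(xml, open_tag);
    const char *end;
    size_t len;

    if (start == NULL)
        return -1;
    start += sizeof open_tag - 1;      /* skip the opening delimiter */
    end = strstr(start, close_tag);
    if (end == NULL)
        return -1;
    len = (size_t)(end - start);
    if (len + 1 > out_size)
        return -1;
    memcpy(out, start, len);           /* "<" and "&" stay plain text */
    out[len] = '\0';
    return (int)len;
}
```

A real parser such as Expat reports this same text to the application
through its character data handler.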

Karl

From AFish at GoldenGate.com  Tue Mar  4 11:23:34 2003
From: AFish at GoldenGate.com (AFish@GoldenGate.com)
Date: Wed Mar  5 11:44:06 2003
Subject: [Expat-discuss] DOM parser, expat vs. lbxml?
Message-ID: <8E7C7D71F727654A8C99F1606231B8A4058EB2@exchange.earth.ggsoftware.com>

We need a DOM parser in C that will compile on any platform. So far, the
only C XML parsers I have seen are expat and libxml. The only DOM parser
built on top of expat I have seen is 'SCEW', the simple C expat wrapper
(http://www.nongnu.org/scew/).

Questions:
1. Are there other expat wrappers or examples which provide DOM-like xml
tree traversal?
2. Has anyone done a side-by-side libxml vs. expat comparison? Is there any
reason we should roll our own DOM parser on top of expat instead of using
libxml?

From karl at waclawek.net  Wed Mar  5 11:54:08 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Wed Mar  5 11:54:13 2003
Subject: [Expat-discuss] DOM parser, expat vs. lbxml?
References: <8E7C7D71F727654A8C99F1606231B8A4058EB2@exchange.earth.ggsoftware.com>
Message-ID: <008301c2e337$d6f8ef20$9e539696@citkwaclaww2k>


> We need a DOM parser in C that will compile on any platform. So far, the
> only C xml parsers I have seen are expat and libxml. The only DOM parser
> build on top of expat I have seen is 'SCEW' the simple C expat wrapper
> (http://www.nongnu.org/scew/).

> Questions:
> 1. Are there other expat wrappers or examples which provide DOM-like xml
> tree traversal?

Not that I know of.

> 2. Has anyone done a side-by-side libxml vs. expat comparison? Is there any
> reason we should roll our own DOM parser on top of expat instead of using
> libxml?

Why would you not want to use SCEW instead of rolling your own?

About libxml vs. Expat: I have never compared them, but it seems
that Expat is pretty good in the areas of speed and memory use,
as well as being quite compliant. However, Expat does not validate.

Karl


From AFish at GoldenGate.com  Wed Mar  5 10:40:26 2003
From: AFish at GoldenGate.com (AFish@GoldenGate.com)
Date: Wed Mar  5 13:41:00 2003
Subject: [Expat-discuss] DOM parser, expat vs. lbxml?
Message-ID: <8E7C7D71F727654A8C99F1606231B8A4058EB7@exchange.earth.ggsoftware.com>


>> Why would you not want to use SCEW instead of rolling your own?


This is a good point, I tried SCEW yesterday, and it was pretty good. I had
a little trouble compiling on Win32 but the issues were minimal.
From rolf at pointsman.de  Wed Mar  5 20:26:04 2003
From: rolf at pointsman.de (rolf@pointsman.de)
Date: Wed Mar  5 14:29:31 2003
Subject: [Expat-discuss] DOM parser, expat vs. lbxml?
In-Reply-To: <8E7C7D71F727654A8C99F1606231B8A4058EB2@exchange.earth.ggsoftware.com>
Message-ID: <200303051926.UAA18340@pointsman.pointsman.de>

On  4 Mar, AFish@GoldenGate.com wrote:
> We need a DOM parser in C that will compile on any platform. So far, the
> only C xml parsers I have seen are expat and libxml. The only DOM parser

Don't forget rxp, a good, fast, compliant and optionally validating
parser http://www.cogsci.ed.ac.uk/~richard/rxp.html (don't be put off
by the artless home page, it's a good product). If C++ is also OK
for you, there's of course also xerces-c++ (http://xml.apache.org).

Well, and for completeness' sake, don't forget msxml (the XML parser
out of the evil empire). I'm not a fan of MS for various reasons, but
their XML parser (and their XSLT engine) isn't bad.

> build on top of expat I have seen is 'SCEW' the simple C expat wrapper
> (http://www.nongnu.org/scew/).
> 
> Questions:
> 1. Are there other expat wrappers or examples which provide DOM-like xml
> tree traversal?

Sablotron (http://www.gingerall.com/charlie/ga/xml/p_sab.xml). From
the home page:

" Sablotron is a fast, compact and portable XML toolkit implementing
XSLT 1.0, DOM Level2 and XPath 1.0. [...] Sablotron uses James Clark's
expat XML parser."

You'd better not buy into their claim that they have a "fast" XSLT
processor (this claim is somewhat ridiculous). Note, though, that
Sablotron itself is written in C++.

There are certainly more DOM implementations based on expat around
than just one. For example, there's a Tcl extension (I'm one of the
maintainers) which implements DOM on top of expat (and also XPath and
XSLT) (http://www.tdom.org). The DOM building parts are completely in
C, so it may be worth a look.

> 2. Has anyone done a side-by-side libxml vs. expat comparison? Is there any
> reason we should roll our own DOM parser on top of expat instead of using
> libxml?

A lot could be said here - your question is a bit vague about your
needs.

Expat does not validate (although it does read, on demand, external
entities). If a well-formedness parser is OK for you, expat is
definitely somewhat faster - but since both parsers are really fast,
this may only be of interest if you aim for maximum
speed. Additionally, the time needed to build a DOM-like structure in
memory (which typically needs a lot of mallocs for the node
structures) isn't negligible, so the overall speed depends not only on
the raw parser speed, but also on the quality of the DOM building
code.
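
To make the allocation cost concrete, here is a minimal sketch of a
DOM-like tree in C. All names (dom_node, node_new, node_append,
node_count, node_free) are hypothetical and belong to no library
mentioned in this thread. Each node costs two mallocs, one for the
struct and one for the name, which is the kind of overhead meant above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical DOM-like node, for illustration only. */
typedef struct dom_node {
    char *name;
    struct dom_node *first_child;
    struct dom_node *last_child;
    struct dom_node *next_sibling;
} dom_node;

/* Allocates a node and a private copy of its name (two mallocs). */
dom_node *node_new(const char *name)
{
    dom_node *n = malloc(sizeof *n);
    size_t len = strlen(name) + 1;

    if (n == NULL)
        return NULL;
    n->name = malloc(len);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->name, name, len);
    n->first_child = n->last_child = n->next_sibling = NULL;
    return n;
}

/* Appends child as the last child of parent. */
void node_append(dom_node *parent, dom_node *child)
{
    if (parent->last_child != NULL)
        parent->last_child->next_sibling = child;
    else
        parent->first_child = child;
    parent->last_child = child;
}

/* Counts the node itself plus all of its descendants. */
size_t node_count(const dom_node *n)
{
    size_t total = 1;
    const dom_node *c;

    for (c = n->first_child; c != NULL; c = c->next_sibling)
        total += node_count(c);
    return total;
}

/* Frees a node and its whole subtree. */
void node_free(dom_node *n)
{
    dom_node *c = n->first_child;

    while (c != NULL) {
        dom_node *next = c->next_sibling;
        node_free(c);
        c = next;
    }
    free(n->name);
    free(n);
}
```

A start element handler would call node_new and node_append, and an end
element handler would step back to the parent. A faster implementation
would likely allocate nodes from larger blocks instead of one at a time.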

Another factor which may be important (depending on the size of your
XML data) is that DOM trees typically need _a lot_ of memory. This
depends of course on how much markup you have in your document (and
how much 'indentation' fluff you have in your document), but it's
normal to need 3 to 5 times the file size in memory for the
DOM tree. Although the libxml DOM trees need notably less memory
than every Java DOM implementation I know, it isn't the slimmest
implementation available. For example, the above-mentioned tDOM
implementation has notably less overhead (which is important for me,
because I have to handle really large product data lists in XML).

DOM and DOM are not the same. Do you mean DOM 1, 2 or 3? What about
entities? Must you preserve parsed entities? DOM alone will probably
make you somewhat unhappy before long. Navigation within the tree
can get tedious if you don't have support for at least XPath (libxml
provides this). But I'd better stop now.

rolf



From miallen at eskimo.com  Wed Mar  5 14:57:12 2003
From: miallen at eskimo.com (Michael B. Allen)
Date: Wed Mar  5 14:56:18 2003
Subject: [Expat-discuss] DOM parser, expat vs. lbxml?
In-Reply-To: <8E7C7D71F727654A8C99F1606231B8A4058EB2@exchange.earth.ggsoftware.com>
References: <8E7C7D71F727654A8C99F1606231B8A4058EB2@exchange.earth.ggsoftware.com>
Message-ID: <20030305145712.5c460818.miallen@eskimo.com>

On Tue, 4 Mar 2003 11:23:34 -0800 
AFish@GoldenGate.com wrote:

> We need a DOM parser in C

DOM is not a parser. Expat is a parser. The DOM is a tree of nodes in
memory. DOM implementations frequently do come with some kind of module
to load from and store to XML, however, in which case that module would
*use* an XML parser.

> that will compile on any platform. So far, the

"any platform"? You really should be a little more specific. You might
say POSIX or ANSI platform but "any" just isn't possible.

> only C xml parsers I have seen are expat and libxml. The only DOM parser

There are several XML parsers in C and particularly in C++. Use 'xml
parser c' on google. Xerces C++, IBM's xml4c, and Oracle XML for C are
three that I can think of.

> build on top of expat I have seen is 'SCEW' the simple C expat wrapper
> (http://www.nongnu.org/scew/).

This isn't a real DOM though.

> Questions:
> 1. Are there other expat wrappers or examples which provide DOM-like xml
> tree traversal?

There are lots of these, I suspect. And you could write your own in 20
minutes. Here's one:

  http://www.eskimo.com/~miallen/libmba/dl/docs/ref/domnode.html

> 2. Has anyone done a side-by-side libxml vs. expat comparison? Is there any

I have never really used libxml. I believe it requires glib. I would
be willing to bet expat would be quite a bit faster and much much more
efficient though.

> reason we should roll our own DOM parser on top of expat instead of using
> libxml?

There are a couple of other DOM implementations. If you need a real
DOM rather than a simple DOM-like interface like those mentioned above,
there is DOMC (by yours truly):

  http://www.eskimo.com/~miallen/domc/

Incidentally I just finished testing 0.7, but I have to run through the
portability tests and create the various packages, so it will take me
another week or so. I have already compiled it on Windows NT and have
a working Win32 Makefile for MSVC (I'm not a Linux zealot!).

Mike

-- 
A  program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes  the  potential  for it to be applied to tasks that are
conceptually  similar and, more important, to tasks that have not
yet been conceived. 

From xcross at us.ibm.com  Wed Mar  5 15:08:37 2003
From: xcross at us.ibm.com (Chris Cross)
Date: Wed Mar  5 15:09:03 2003
Subject: [Expat-discuss] Can expat interrupt processing?
Message-ID: <OF6339C9BC.B9CAE48A-ON85256CE0.006DA564-85256CE0.006EA709@us.ibm.com>


Can expat interrupt processing? Say you encounter an error in the start
element handler and want to bail?

thanks,
chris


Chris Cross
IBM Boca Raton
xcross@us.ibm.com
voice 561.862.2102  t/l 975.2102
fax 561.862.3922


From karl at waclawek.net  Wed Mar  5 15:19:27 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Wed Mar  5 15:19:33 2003
Subject: [Expat-discuss] Can expat interrupt processing?
References: <OF6339C9BC.B9CAE48A-ON85256CE0.006DA564-85256CE0.006EA709@us.ibm.com>
Message-ID: <010a01c2e354$85aed220$9e539696@citkwaclaww2k>

> Can expat interrupt processing? Say you encounter an error in the start
> element handler and want to bail?

This question comes up from time to time.
If you are using it from C++, throw an exception.
In C, use setjmp/longjmp if your compiler supports it.
From other languages it will depend.
With Delphi it works raising an exception, just like with C++.

Karl

From xcross at us.ibm.com  Wed Mar  5 17:52:01 2003
From: xcross at us.ibm.com (Chris Cross)
Date: Wed Mar  5 17:52:28 2003
Subject: [Expat-discuss] Can expat interrupt processing?
Message-ID: <OF4E015FDD.770E13A5-ON85256CE0.00779874-85256CE0.007D9CCE@us.ibm.com>


Hi Karl,
I've not used setjmp/longjmp before so thanks for your patience. If I do a
longjmp out of the element handler what are the memory implications in
expat when it hasn't finished the parse?  Would the code look something
like this:

jmp_buf mark;

main()
{
   ...
   jmpret = setjmp( mark );
   if (jmpret == 0)
   {
      if (!XML_Parse(...))
      {
         // process errors
      }
   }
   else
   {
      // What is the state of the parser here?
      // Would XML_GetCurrentLineNumber work?
      // Should I call XML_ParserFree to clean up?
      // process errors
   }

   ...
}

startElement(...)
{
   // process the element
   ...
   if (some error condition)
      longjmp(mark, -1);
}

Thanks for your help,
chris


Chris Cross
IBM Boca Raton
xcross@us.ibm.com
voice 561.862.2102  t/l 975.2102
fax 561.862.3922



From karl at waclawek.net  Wed Mar  5 20:08:46 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Wed Mar  5 20:06:15 2003
Subject: [Expat-discuss] Can expat interrupt processing?
References: <OF4E015FDD.770E13A5-ON85256CE0.00779874-85256CE0.007D9CCE@us.ibm.com>
Message-ID: <000b01c2e37c$f051ae90$0207a8c0@karl>

> 
> Hi Karl,
> I've not used setjmp/longjmp before so thanks for your patience. 

Neither have I. Most of the time I am not programming in C.

> If I do a
> longjmp out of the element handler what are the memory implications in
> expat when it hasn't finished the parse?  Would the code look something
> like this:
> 
> jmp_buf mark;
> 
> main()
> {
>    ...
>    jmpret = setjmp( mark );
>    if (jmpret == 0)
>    {
>       if (!XML_Parse(...))
>       {
>          // process errors
>       }
>    }
>    else
>    {
>       // What is the state of the parser here?
>       // Would XML_GetCurrentLineNumber work?
>       // Should I call XML_ParserFree to clean up?
>       // process errors
>    }
> 
>    ...
> }
> 
> startElement(...)
> {
>    // process the element
>    ...
>    if (some error condition)
>       longjmp(mark, -1)
> }

That code looks right to me.
I would not call XML_GetCurrentLineNumber on return from the error
through longjmp, I would rather store all information that is needed
later, while still in the handler, before longjmp is called.
Most such functions in Expat are only meant to be called from a handler,
even though they sometimes might still work after parsing has ended.

I suggest you try XML_GetCurrentLineNumber both ways, but
"officially" it is better to call it from the handler.

You should call XML_ParserFree or XML_ParserReset just
like you would when parsing has ended normally.

AFAIK, you should not use setjmp/longjmp when in C++.

The memory implications are such - to the best of my knowledge - that
Expat does not allocate memory in local variables (e.g. for the purpose
of a callback), so that jumping out of parsing through longjmp should not
cause a memory leak.
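
The advice above (save what you need inside the handler, then jump) can
be sketched as follows. This is a self-contained illustration and does
not call Expat: fake_parse is a hypothetical stand-in for a
callback-driven parser, and struct abort_ctx, start_element and
parse_with_bailout are names invented for the example. With real Expat
the handler would record XML_GetCurrentLineNumber before calling
longjmp, and the caller would still call XML_ParserFree afterwards.

```c
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type; NOT the Expat API. */
typedef void (*start_handler)(void *user_data, const char *name, int line);

struct abort_ctx {
    jmp_buf env;       /* jump target set up by the caller */
    int error_line;    /* saved inside the handler, before longjmp */
};

/* Stand-in for the parser: reports one "element" per array entry. */
static void fake_parse(const char *const *names, size_t count,
                       start_handler handler, void *user_data)
{
    size_t i;

    for (i = 0; i < count; i++)
        handler(user_data, names[i], (int)i + 1);
}

/* Handler that bails out on an element named "bad". */
static void start_element(void *user_data, const char *name, int line)
{
    struct abort_ctx *ctx = user_data;

    if (strcmp(name, "bad") == 0) {
        ctx->error_line = line;   /* store the details first ... */
        longjmp(ctx->env, 1);     /* ... then leave the parse */
    }
}

/* Returns 0 if every element was processed, otherwise the line number
   recorded by the handler. The context lives in the caller, so it is
   not a local variable of the function that calls setjmp. */
int parse_with_bailout(struct abort_ctx *ctx,
                       const char *const *names, size_t count)
{
    ctx->error_line = 0;
    if (setjmp(ctx->env) == 0) {
        fake_parse(names, count, start_element, ctx);
        return 0;
    }
    /* Reached through longjmp: only use what the handler saved. */
    return ctx->error_line;
}
```

Keeping the context outside the function that calls setjmp avoids the C
rule that non-volatile locals modified after setjmp have indeterminate
values once longjmp returns there.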

Karl

From brox at corena.no  Thu Mar  6 08:30:23 2003
From: brox at corena.no (Bjorn Brox)
Date: Thu Mar  6 02:30:38 2003
Subject: [Expat-discuss] Manage unknown entityes?
Message-ID: <3E66F90F.9000308@corena.no>

How can I manage unknown entities?

When parsing the following xml file I get the error: "undefined entity
at line 7" and the parser stops.

--------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
   <!ELEMENT doc ANY>
   <!ELEMENT para (#PCDATA)>
]>
<doc>
    <para>Hello World&iquest;</para>
</doc>
--------------------

The &iquest; entity is one of the standard SGML character entities
defined in ISOPub, but unknown by the expat parser.

I know that I could solve this by adding an entity declaration, but very
often users assume that a parser has knowledge of all the ISO
defined entities.

Is it possible to set up a callback to handle unknown entities,
where I can decide myself whether I want the parser to terminate or simply
return the correct UTF-8 code if my callback knows the entity?

-- 
Bjorn Brox, CORENA Norge AS, http://www.corena.no/, ICQ 17872043
Industritunet, Dyrmyrgt. 35, N-3611 Kongsberg, NORWAY
Phone: +47 32717210, Fax: +47 32717201, Mobile: +47 92638590


From karl at waclawek.net  Thu Mar  6 09:14:02 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Thu Mar  6 09:14:12 2003
Subject: [Expat-discuss] Manage unknown entityes?
References: <3E66F90F.9000308@corena.no>
Message-ID: <000b01c2e3ea$a39ea210$9e539696@citkwaclaww2k>


> How can I manage unknown entities?
> 
> When parsing the following xml file I get the error: "undefined entity
> at line 7" and the parser stops.
> 
> --------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE doc [
>    <!ELEMENT doc ANY>
>    <!ELEMENT para (#PCDATA)>
> ]>
> <doc>
>     <para>Hello World&iquest;</para>
> </doc>
> --------------------
> 
> The &iquest; entity is one of the standard SGML character entities
> defined in ISOPub, but unknown by the expat parser.

This is not a pre-defined entity in XML and requires a declaration.
Your document is simply not well-formed and will be rejected
by any compliant XML parser.

> I know that I could solve this by adding an entity declaration, but very
> often users assume that a parser has knowledge of all the ISO
> defined entities.

In XML that is an incorrect assumption.
 
> Is it possible to set up a callback to handle unknown entities,
> where I can decide myself whether I want the parser to terminate or simply
> return the correct UTF-8 code if my callback knows the entity?

We don't have such a call-back, and it really is not a good idea
since it encourages the creation of non-wellformed (malformed?) documents.
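
Since the well-formed fix is to declare the entity, an application that
generates documents could emit the declarations itself. The sketch
below is one possible approach and not an Expat feature; the table and
the function entity_declaration are hypothetical and cover only three
names. It produces a line such as <!ENTITY iquest "&#191;"> for the
internal DTD subset.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup table, for illustration only: a few ISO entity
   names with their Unicode code points. */
struct entity {
    const char *name;
    unsigned code_point;
};

static const struct entity known_entities[] = {
    { "iquest", 0x00BF },   /* inverted question mark */
    { "iexcl",  0x00A1 },   /* inverted exclamation mark */
    { "copy",   0x00A9 }    /* copyright sign */
};

/* Writes an entity declaration for `name` into `out`. Returns the
   length written, or -1 if the name is unknown or `out` is too small. */
int entity_declaration(const char *name, char *out, size_t out_size)
{
    size_t i;

    for (i = 0; i < sizeof known_entities / sizeof known_entities[0]; i++) {
        if (strcmp(known_entities[i].name, name) == 0) {
            int n = snprintf(out, out_size, "<!ENTITY %s \"&#%u;\">",
                             name, known_entities[i].code_point);
            return (n < 0 || (size_t)n >= out_size) ? -1 : n;
        }
    }
    return -1;
}
```

With that declaration placed inside the DOCTYPE, the reference
&iquest; is defined and the document is well-formed.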

Karl

From dmoore at viefinancial.com  Tue Mar 11 07:45:22 2003
From: dmoore at viefinancial.com (Moore, Dave)
Date: Tue Mar 11 07:45:30 2003
Subject: [Expat-discuss] Pull API Status?
Message-ID: <2FE8C75C7A06D4118BB50008C7F7E831DE3BF1@EXCHSRV>

I was browsing the archives and came across a proposal to implement a pull
based API on top of expat's XML_Parse+callbacks API.  This is something I
could use for a current project at work, and I'd be willing to tackle it if
no one is actively working on this.  

API Option 1 in this message
http://mail.libexpat.org/pipermail/expat-discuss/2002-August/000602.html
meets my needs, but I wanted to make sure this was still considered to be a
viable option.

While I agree with a later message in the thread that a Pull API built
directly on top of the xmltok tokenizer would be cleaner and more efficient,
it's not something I personally feel up to tackling at this point due to
time constraints on my work project.

After I hear back on the status, I have a couple of specific questions for
how people would like to handle character text nodes in a pull based API.

Thanks,
Dave


-----------------------
David Moore
dmoore@viefinancial.com

From karl at waclawek.net  Tue Mar 11 09:40:25 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 11 09:40:31 2003
Subject: [Expat-discuss] Pull API Status?
References: <2FE8C75C7A06D4118BB50008C7F7E831DE3BF1@EXCHSRV>
Message-ID: <007801c2e7dc$2754b780$9e539696@citkwaclaww2k>

> I was browsing the archives and came across a proposal to implement a pull
> based API on top of expat's XML_Parse+callbacks API.  This is something I
> could use for a current project at work, and I'd be willing to tackle it if
> no one is actively working on this.  

No one is currently working on it, as we plan to release Expat 2.0 first.
 
> API Option 1 in this message
> http://mail.libexpat.org/pipermail/expat-discuss/2002-August/000602.html
> meets my needs, but I wanted to make sure this was still considered to be a
> viable option.

It is not what I have in mind, but it could serve as the Pull solution
until we really go for a proper implementation. If we can hide implementation
details and make the API itself robust then a change in implementation
might not even break existing code - in theory, at least ;-).

> While I agree with a later message in the thread that a Pull API built
> directly on top of the xmltok tokenizer would be cleaner and more efficient,
> it's not something I personally feel up to tackling at this point due to
> time constraints on my work project.

Yes, that is always our problem!!!

Btw, here is how I see the "proper" Pull implementation:
(assuming an API has been established, details may change):

- Expat is already Pull based internally. So a lot of code can be re-used.
  One does not need to completely re-implement the layer on top of xmltok.
- The main things to change are:
  - instead of "pushing" buffers (with XML_ParseBuffer), have the main
    parsing loop pull buffers with an XML_GetNextBuffer callback.
  - add return codes to all the callbacks (like XML_SKIP, XML_USE, XML_ERROR, ...)
  - supply internal callbacks which perform the PULL API specific
    data preparation and also do any required filtering
- The Next() function would simply call the main parsing loop which returns
  when an (internal) callback returns XML_USE. The data to be reported would
  be stored in some fields in the Parser structure.
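
The Next() idea in the list above can be sketched as follows. This is
only an illustration of the control flow under stated assumptions:
every name here (pull_parser, pull_next, CB_USE, CB_SKIP and the
event enum) is hypothetical, and an array of events stands in for the
tokenizer. None of it exists in Expat.

```c
#include <stddef.h>

/* Hypothetical event and return-code types, for illustration only. */
typedef enum {
    EV_START_TAG, EV_END_TAG, EV_TEXT, EV_COMMENT, EV_END_DOCUMENT
} event_type;

typedef enum { CB_USE, CB_SKIP } cb_result;

typedef struct {
    const event_type *events;   /* stand-in for tokenizer output */
    size_t count;
    size_t pos;
    event_type current;         /* filled in when a callback says CB_USE */
} pull_parser;

/* Internal callback: filters events and prepares the data to report. */
static cb_result internal_callback(pull_parser *p, event_type ev)
{
    if (ev == EV_COMMENT)
        return CB_SKIP;         /* the pull user never sees comments */
    p->current = ev;
    return CB_USE;
}

/* Runs the main loop until an internal callback returns CB_USE. */
event_type pull_next(pull_parser *p)
{
    while (p->pos < p->count) {
        event_type ev = p->events[p->pos++];

        if (internal_callback(p, ev) == CB_USE)
            return p->current;
    }
    p->current = EV_END_DOCUMENT;
    return p->current;
}
```

In a real implementation the loop would also have to pull more input
when the current buffer runs out.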

In addition we would also want to improve the API with regards to
complete entity reporting (currently the same restrictions as SAX2)
and namespace reporting (it seems better to return names as separate
localName, prefix and uri parameters).

And, of course, it should still be possible to use Expat in Push mode.
  
  
> After I hear back on the status, I have a couple of specific questions for
> how people would like to handle character text nodes in a pull based API.

I think if we want your API re-usable we should put a lot of thought into it.

Karl

From rolf at pointsman.de  Tue Mar 11 15:58:04 2003
From: rolf at pointsman.de (rolf@pointsman.de)
Date: Tue Mar 11 10:01:28 2003
Subject: [Expat-discuss] Pull API Status?
In-Reply-To: <007801c2e7dc$2754b780$9e539696@citkwaclaww2k>
Message-ID: <200303111458.PAA14381@pointsman.pointsman.de>


(... sorry for putting this a bit out of the context)

On 11 Mar, Karl Waclawek wrote:
> [...]
> In addition we would also want to improve the API with regards to
> complete entity reporting (currently the same restrictions as SAX2)
> and namespace reporting (it seems better to return names as separate
> localName, prefix and uri parameters).

I would love to see this.

(Especially the second thing (namespace reporting), which should be
much easier, as far as I can see, than the first thing, which needs
a lot more new API stuff.)

rolf


From dmoore at viefinancial.com  Tue Mar 11 11:09:49 2003
From: dmoore at viefinancial.com (Moore, Dave)
Date: Tue Mar 11 11:10:43 2003
Subject: [Expat-discuss] Pull API Status?
Message-ID: <2FE8C75C7A06D4118BB50008C7F7E831DE3BF7@EXCHSRV>

Karl,

Thanks for the reply.  Let me make some comments and questions inline to
make sure I understand you.

> > I was browsing the archives and came across a proposal to 
> implement a pull
> > based API on top of expat's XML_Parse+callbacks API.  This 
> is something I
> > could use for a current project at work, and I'd be willing 
> to tackle it if
> > no one is actively working on this.  
> 
> No one is currently working on it, as we plan to release 
> Expat 2.0 first.

This is understandable.  I didn't read the roadmap closely enough, and was
going on the August posts referring to suspension perhaps being part of
1.96.xxx

> - Expat is already Pull based internally. So a lot of code 
> can be re-used.
>   One does not need to completely re-implement the layer on 
> top of xmltok.
> - The main things to change are:
>   - instead of "pushing" buffers (with XML_ParseBuffer), have the main
>     parsing loop pull buffers with an XML_GetNextBuffer callback.

Just to get my terminology straight... By "main parsing loop" you mean any
of the prologProcessor, contentProcessor, externalEntityProcessor family of
functions, yes?

So, contentProcessor (or doContent) would change to call the
XML_GetNextBuffer callback whenever they needed more bytes.

Pull mode would work by calling XML_NextNode() (or whatever we name it) and
would -never- call XML_Parse or XML_ParseBuffer.

Push mode would continue to work in that XML_Parse and XML_ParseBuffer would
provide pointers and lengths to the main parser structure just like they do
today, and the absence of an XML_GetNextBuffer callback would ensure that
control returns to XML_Parse callers just like it does now.

>   - add return codes to all the callbacks (like XML_SKIP, 
> XML_USE, XML_ERROR, ...)
>   - supply internal callbacks which perform the PULL API specific
>     data preparation and also do any required filtering

Return codes seem to be the cleanest way to do this, especially for an Expat
2.1 or 3.0 release.
However, should we preserve compatibility with "push-era" callbacks better
by providing a XML_SetNodeHandling() function to be used in a callback where
the user would send the XML_SKIP, XML_USE flags appropriately?

The default could always be "XML_SKIP".  Callbacks can continue to have void
return type and for push mode users, the main loop would continue until the
buffer is exhausted, just like today.
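
A minimal sketch of this alternative, assuming hypothetical names
throughout (sketch_parser, sketch_set_node_handling and
sketch_dispatch_start are not Expat functions): the callback keeps its
void return type and signals its decision through a setter, and the
dispatcher resets the decision to "skip" before every call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types, for illustration only. */
typedef enum { NODE_SKIP, NODE_USE } node_handling;

typedef struct sketch_parser sketch_parser;
typedef void (*start_cb)(sketch_parser *p, const char *name); /* void */

struct sketch_parser {
    start_cb on_start;
    node_handling handling;
};

/* Called from inside a callback to mark the current node. */
void sketch_set_node_handling(sketch_parser *p, node_handling h)
{
    p->handling = h;
}

/* Dispatches one start tag. Returns 1 if the pull loop should stop and
   report this node, 0 if parsing should simply continue. */
int sketch_dispatch_start(sketch_parser *p, const char *name)
{
    p->handling = NODE_SKIP;    /* the default before every callback */
    if (p->on_start != NULL)
        p->on_start(p, name);
    return p->handling == NODE_USE;
}

/* Example callback: only "para" elements are of interest. */
void only_para(sketch_parser *p, const char *name)
{
    if (strcmp(name, "para") == 0)
        sketch_set_node_handling(p, NODE_USE);
}
```

Existing push-style callbacks that never call the setter keep working,
because the default leaves every node skipped.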

> - The Next() function would simply call the main parsing loop 
> which returns
>   when an (internal) callback returns XML_USE. The data to be 
> reported would
>   be stored in some fields in the Parser structure.

Agreed.

> In addition we would also want to improve the API with regards to
> complete entity reporting (currently the same restrictions as SAX2)
> and namespace reporting (it seems better to return names as separate
> localName, prefix and uri parameters).

My project does not have complex namespace reporting needs, but I would
certainly want any new pull api functions to provide as much namespace
support as the rest of Expat.

> And, of course, it should still be possible to use Expat in Push mode.

Absolutely.

> > After I hear back on the status, I have a couple of 
> specific questions for
> > how people would like to handle character text nodes in a 
> pull based API.
> 
> I think if we want your API re-usable we should put a lot of 
> thought into it.

I agree, especially with regards to namespace handling.  However, I would
like to scope out a level of effort based on the following first phase
requirements:

1.  Provide pull parsing that can handle element, attribute, and text nodes
for well formed XML documents.  Use a minimal subset of the Java API at
www.xmlpull.org as a guideline Sample API.  Subset of this API is included
at the bottom of the email.

2.  The real assessment is this - how extensive are the changes to the main
parsing loop(s)?  They have to:
	A.  Handle pulling buffers from users when needed.  If this logic
	    can stay "outside" in XML_Next(), it will be helpful, I would imagine.
	B.  Handle "XML_USE/XML_SKIP/XML_ERROR" values set during callbacks.
	C.  If XML_USE is returned, we have to store information about the
	    current node for retrieval and return.

3.  Default callbacks that match the capabilities of the Pull API should be
provided.  That is, "XML_USE" should be the default for StartElement,
EndElement, and Character callbacks.


Seeing the roadmap, I can see why the main expat distribution is holding off
on these things...  My goal would be to provide a proof of concept
implementation that minimizes impact to the main parsing loops so that any
changes to the main parsing loop that occur during the 1.95 -> 2.0
transition are easy to merge with changes necessary for pull parsing.

Any feedback would be appreciated!

Thanks,
Dave


Minimal API for proof of concept:

typedef enum 
{
	START_DOCUMENT,
	END_DOCUMENT,
	START_TAG,
	END_TAG,
	TEXT,
} NodeType;

typedef enum
{
	XML_USE,
	XML_SKIP,
	XML_ERROR,
} NodeHandling;

/* Main entry point for pull parsing.  Returns the NodeType of the most
   recently parsed node where XML_USE was set in a callback */
NodeType XML_Next(XML_Parser p);

/* Function used in callbacks to set pull parse handling of the node */
void XML_SetNodeHandling(XML_Parser p, NodeHandling handling);

/* Functions to retrieve information about the current node. */
char *XML_GetName();		/* Returns the name of the next node */
char *XML_GetText();		/* Returns the text content of the current node */

int XML_GetAttributeCount();
char *XML_GetAttributeName(int index);
char *XML_GetAttributeValue(int index);

/* Clearly the above family of functions would grow/change to handle
   namespaces, CDATA, comments, etc. */

/* Pull buffer management */
/* User provided function will provide a pointer to additional buffer space
   and the length of that buffer.  If the user sets *nextBufferPtr to NULL,
   this signals the end of input. */

typedef void (*XML_GetNextBufferHandler)(void *userData,
                                         char **nextBufferPtr,
                                         int *nextBufferLen);
void XML_SetGetNextBufferHandler(XML_Parser p, XML_GetNextBufferHandler handler);
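
As an illustration of the buffer handler proposed above, here is a
sketch of a user-supplied handler that hands out an in-memory document
in fixed-size chunks. The handler type is the one from this proposal
and is not part of any released Expat; struct chunk_source, next_chunk
and drain are hypothetical, and drain only stands in for the parsing
loop.

```c
#include <stddef.h>

/* Proposed type from this message; not part of any released Expat. */
typedef void (*XML_GetNextBufferHandler)(void *userData,
                                         char **nextBufferPtr,
                                         int *nextBufferLen);

/* Hypothetical user data: a document handed out chunk by chunk. */
struct chunk_source {
    char *data;        /* next unread byte */
    int remaining;     /* bytes left */
    int chunk_size;    /* maximum bytes per call */
};

/* Handler: supplies the next chunk, or NULL at end of input. */
void next_chunk(void *userData, char **nextBufferPtr, int *nextBufferLen)
{
    struct chunk_source *src = userData;
    int n = src->remaining < src->chunk_size ? src->remaining
                                             : src->chunk_size;

    if (n <= 0) {
        *nextBufferPtr = NULL;   /* signals the end of input */
        *nextBufferLen = 0;
        return;
    }
    *nextBufferPtr = src->data;
    *nextBufferLen = n;
    src->data += n;
    src->remaining -= n;
}

/* Stand-in for the parsing loop: pulls buffers until the handler
   reports end of input and returns the total number of bytes seen. */
int drain(XML_GetNextBufferHandler handler, void *userData)
{
    char *buf;
    int len;
    int total = 0;

    for (;;) {
        handler(userData, &buf, &len);
        if (buf == NULL)
            break;
        total += len;
    }
    return total;
}
```

A file-based handler would follow the same pattern, reading into a
buffer it owns and returning NULL once the read reports end of file.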


From karl at waclawek.net  Tue Mar 11 12:14:06 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 11 12:14:14 2003
Subject: [Expat-discuss] Pull API Status?
References: <2FE8C75C7A06D4118BB50008C7F7E831DE3BF7@EXCHSRV>
Message-ID: <00c201c2e7f1$9f504e10$9e539696@citkwaclaww2k>


> > No one is currently working on it, as we plan to release 
> > Expat 2.0 first.
> 
> This is understandable.  I didn't read the roadmap closely enough, and was
> going on the August posts referring to suspension perhaps being part of
> 1.96.xxx

Yes, our first look at this was triggered by the fact that Mozilla uses
an older version of Expat with such modifications. But thinking some more
about it we didn't think (or at least I didn't) that this is the way to go.

Also, if you look at the code check-ins in the last year or so, you will
find that we haven't had a lot of manpower available.
It's mostly just Fred and me.

> > - Expat is already Pull based internally. So a lot of code 
> > can be re-used.
> >   One does not need to completely re-implement the layer on 
> > top of xmltok.
> > - The main things to change are:
> >   - instead of "pushing" buffers (with XML_ParseBuffer), have the main
> >     parsing loop pull buffers with an XML_GetNextBuffer callback.
> 
> Just to get my terminology straight... By "main parsing loop" you mean any
> of the prologProcessor, contentProcessor, externalEntityProcessor family of
> functions, yes?

Yes, basically doProlog and doContent. Those "processors" above are
basically used to call the parsing loop based on the parser's state.
The input can come in chunks of any size, so part of the state
is stored in a "processor" function pointer, to avoid too many
conditional checks. That is a heavily used approach in Expat.
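
[A toy illustration of the "processor" function pointer idea described above.
This is not Expat's code: Toy, headerProcessor, bodyProcessor, toy_init and
toy_feed are made-up names, and the "grammar" is just a header line followed
by a body. The point is only that the current state lives in a function
pointer, so the feed function needs no state checks and chunks can be split
anywhere.]

```c
#include <assert.h>
#include <stddef.h>

typedef struct Toy Toy;

/* A "processor" handles one chunk of input for the parser's current state. */
typedef int (*Processor)(Toy *t, const char *start, const char *end);

struct Toy {
    Processor processor;  /* current state, stored as a function pointer */
    int headerBytes;      /* bytes up to and including the first '\n' */
    int bodyBytes;        /* everything after that */
};

/* State 2: all remaining input is body. */
static int bodyProcessor(Toy *t, const char *start, const char *end)
{
    t->bodyBytes += (int)(end - start);
    return 0;
}

/* State 1: consume the header line; switch state once '\n' is seen. */
static int headerProcessor(Toy *t, const char *start, const char *end)
{
    const char *p;

    for (p = start; p < end; ++p) {
        if (*p == '\n') {
            t->headerBytes += (int)(p + 1 - start);
            t->processor = bodyProcessor;       /* state change */
            return bodyProcessor(t, p + 1, end);
        }
    }
    t->headerBytes += (int)(end - start);       /* chunk ended mid-header */
    return 0;
}

static void toy_init(Toy *t)
{
    t->processor = headerProcessor;
    t->headerBytes = 0;
    t->bodyBytes = 0;
}

/* Entry point: no switch on state, just call whatever processor is current. */
static int toy_feed(Toy *t, const char *chunk, int len)
{
    assert(t != NULL && chunk != NULL && len >= 0);
    return t->processor(t, chunk, chunk + len);
}
```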
 
> So, contentProcessor (or doContent) would change to call the
> XML_GetNextBuffer callback whenever they needed more bytes.

Yes.
 
> Pull mode would work by calling XML_NextNode() (or whatever we name it) and
> would -never- call XML_Parse or XML_ParseBuffer.

Correct.
 
> Push mode would continue to work in that XML_Parse and XML_ParseBuffer would
> provide pointers and lengths to the main parser structure just like they do
> today, and the absence of an XML_GetNextBuffer callback would ensure that
> control returns to XML_Parse callers just like it does now. 

How we do this in detail is not settled yet.
One could simply have different "processors" (without the XML_GetNextBuffer)
called from XML_Parse(Buffer). It might even be possible to mix
push and pull operation - but I'm not sure, I haven't really thought it through.

Or, one could, for instance, imagine that we leave the XML_GetBuffer callback,
and simply call the XML_NextNode function, with defaults set to skip all nodes,
and instead of the internal callbacks, the programmer supplies his/her
own (just like today), with one change: there will still be return codes,
and if the programmer wants to terminate parsing prematurely he/she just
returns XML_STOP or XML_ERROR, or some other code.
This would give us the additional benefit of enabling C programmers
to terminate parsing without having to resort to setjmp/longjmp.

In simpler words, push mode could be achieved by simply having the
programmer supply his/her own callbacks instead of the internal ones,
and by defaulting to XML_SKIP.

One thing: this push API would not be compatible with the current one.
Therefore this should be part of a branch that builds up to Expat 3.0.

> >   - add return codes to all the callbacks (like XML_SKIP, 
> > XML_USE, XML_ERROR, ...)
> >   - supply internal callbacks which perform the PULL API specific
> >     data preparation and also do any required filtering
> 
> Return codes seem to be the cleanest way to do this, especially for an ExPat
> 2.1 or 3.0 release.  
> However, should we preserve compatibility with "push-era" callbacks better
> by providing a XML_SetNodeHandling() function to be used in a callback where
> the user would send the XML_SKIP, XML_USE flags appropriately?
> 
> The default could always be "XML_SKIP".  Callbacks can continue to have void
> return type and for push mode users, the main loop would continue until the
> buffer is exhausted, just like today.

Well, IMO, if we break from the old API anyway, we should do it properly.
However, if we can keep the old API completely the same, then it would make
sense to have something like XML_SetNodeHandling().

But since I have already had some feedback on the namespace processing
API changes, it does not look like the old API will survive completely
unmodified, and so I would say we should go with return codes.
Depends on what kind of feedback we will have, of course.
 
> > In addition we would also want to improve the API with regards to
> > complete entity reporting (currently the same restrictions as SAX2)
> > and namespace reporting (it seems better to return names as separate
> > localName, prefix and uri parameters).
> 
> My project does not have complex namespace reporting needs, but I would
> certainly want any new pull api functions to provide as much namespace
> support as the rest of Expat.

What I was aiming at was that the way qualified names are reported
is cumbersome to parse and extra work to do within Expat.
Also, the SAX2 API has a separation into localName, QName and uri,
which is confusing. E.g., what is the value of localName if ns-processing
is turned on, but the name has no prefix? etc. We would want to
avoid that.
 
> > I think if we want your API re-usable we should put a lot of 
> > thought into it.
> 
> I agree, especially with regards to namespace handling.  However, I would
> like to scope out a level of effort based on the following first phase
> requirements:
> 
> 1.  Provide pull parsing that can handle element, attribute, and text nodes
> for well formed XML documents.  Use a minimal subset of the Java API at
> www.xmlpull.org as a guideline Sample API.  Subset of this API is included
> at the bottom of the email.

Yes, existing efforts can serve as a starting point, why re-invent the wheel?

> 2.  The real assessment is this - how extensive are the changes to the main
> parsing loop(s)?  They have to :
> A.  Handle pulling buffers from users when needed.  If this logic
> can stay "outside" in XML_Next(), it will be helpful, I would imagine.

This might be something to do just before doContent is called,
in the various "processors", like for instance:

static enum XML_Error PTRCALL
contentProcessor(XML_Parser parser,
                 const char *start,
                 const char *end,
                 const char **endPtr)
{
  enum XML_Error result =
    doContent(parser, 0, encoding, start, end, endPtr);
  if (result != XML_ERROR_NONE)
    return result;
  if (!storeRawNames(parser))
    return XML_ERROR_NO_MEMORY;
  return result;
}

replacing the doContent call with something like

while (XML_GetNextBuffer(start, end, endPtr)) {
  enum XML_Error result =
    doContent(parser, 0, encoding, start, end, endPtr);
  if (result != XML_ERROR_NONE)
    return result;
}

Haven't looked at this closely, so this may not be a good approach,
but you get the idea.

> B.  Handle "XML_USE/XML_SKIP/XML_ERROR" values set during callbacks.
> C.  If XML_USE is returned, we have to store information about the
> current node for retrieval and return.

> 3.  Default callbacks that match the capabilities of the Pull API should be
> provided.  That is, "XML_USE" should be the default for StartElement,
> EndElement, and Character callbacks.

Btw, if we use return codes, then one way to default them would be
by passing them by reference (instead of a "real" return value),
and have that value defaulted accordingly. That way the programmer
only needs to set it in the "unusual" case.
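
[One way the "pass the code by reference, with a default" idea could look.
The names Handling, StartHandler, dispatch_start and skip_comments_only are
invented for this sketch and are not part of any Expat API; the default of
"use" follows the proposal quoted above.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { NODE_USE, NODE_SKIP, NODE_ERROR } Handling;

/* The callback keeps a void return type; the code travels by reference. */
typedef void (*StartHandler)(void *userData, const char *name,
                             Handling *handling);

/* A user callback that only speaks up in the "unusual" case. */
static void skip_comments_only(void *userData, const char *name,
                               Handling *handling)
{
    (void)userData;
    if (strcmp(name, "comment") == 0)
        *handling = NODE_SKIP;
    /* otherwise leave *handling at whatever default the caller chose */
}

/* Parser side: preset the default, then let the callback override it. */
static Handling dispatch_start(StartHandler handler, void *userData,
                               const char *name)
{
    Handling handling = NODE_USE;   /* default for this kind of node */

    assert(name != NULL);
    if (handler != NULL)
        handler(userData, name, &handling);
    return handling;
}
```

[A callback that never touches *handling behaves exactly like having no
opinion, which is what makes the default useful.]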

> Seeing the roadmap, I can see why the main expat distribution is holding off
> on these things...  My goal would be to provide a proof of concept
> implementation that minimizes impact to the main parsing loops so that any
> changes to the main parsing loop that occur during the 1.95 -> 2.0
> transition are easy to merge with changes necessary for pull parsing.

I think if we can get away with something as simple as the contentProcessor
approach from above, coupled with custom callbacks, it might not be that bad.
But the devil is in the detail ...

The API you posted looks good for a starting point. Signature details
can potentially change. But let's take it one step at a time.
I am interested in this myself (a lot), but lack of time is my enemy.


Karl


From karl at waclawek.net  Tue Mar 11 13:13:13 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 11 13:13:22 2003
Subject: [Expat-discuss] Pull API Status?
References: <2FE8C75C7A06D4118BB50008C7F7E831DE3BF7@EXCHSRV>
Message-ID: <00ca01c2e7f9$e1961ea0$9e539696@citkwaclaww2k>

> int XML_GetAttributeCount();  
> char *XML_GetAttributeName(int index);
> char *XML_GetAttributeValue(int index);

One comment about attributes - this is a shortcoming of SAX2 as well:
Due to the nature of reporting attribute values as one chunk
of data, entities within the value are silently expanded.
The same applies to parameter entities in entity values.

This is one of the deficiencies we wanted to fix, so that
Rolf and others can build a complete DOM on top of Expat.

Rolf, please fill in if I am missing anything here.

Karl


From dmoore at viefinancial.com  Tue Mar 11 13:19:36 2003
From: dmoore at viefinancial.com (Moore, Dave)
Date: Tue Mar 11 13:19:45 2003
Subject: [Expat-discuss] Pull API Status?
Message-ID: <2FE8C75C7A06D4118BB50008C7F7E831DE3BFC@EXCHSRV>

I'm certainly very flexible on this point.  The Pull API can certainly
produce a single attribute during each call to Next() - it just means we add
a few new states - START_TAG_OPEN, ATTRIBUTE, START_TAG_CLOSE, something
like that.

Or were you saying that the tokenizer in the "far future" expat will no
longer process a start tag with attributes as one token?

Dave


> -----Original Message-----
> From: Karl Waclawek [mailto:karl@waclawek.net]
> Sent: Tuesday, March 11, 2003 1:13 PM
> To: Moore, Dave; expat-discuss@libexpat.org
> Subject: Re: [Expat-discuss] Pull API Status?
> 
> 
> > int XML_GetAttributeCount();  
> > char *XML_GetAttributeName(int index);
> > char *XML_GetAttributeValue(int index);
> 
> One comment about attributes - this is a shortcoming of SAX2 as well:
> Due to the nature of reporting attribute values as one chunk
> of data, entities within the value are silently expanded.
> The same applies to parameter entities in entity values.
> 
> This is one of the deficiencies we wanted to fix, so that
> Rolf and others can build a complete DOM on top of Expat.
> 
> Rolf, please fill in if I am missing anything here.
> 
> Karl
> 

From karl at waclawek.net  Tue Mar 11 13:39:33 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 11 13:39:42 2003
Subject: [Expat-discuss] Pull API Status?
References: <2FE8C75C7A06D4118BB50008C7F7E831DE3BFC@EXCHSRV>
Message-ID: <00da01c2e7fd$8f543d80$9e539696@citkwaclaww2k>


> I'm certainly very flexible on this point.  The Pull API can certainly
> produce a single attribute during each call to Next() - it just means we add
> a few new states - START_TAG_OPEN, ATTRIBUTE, START_TAG_CLOSE, something
> like that.

That would not really help.
The simplest would be to have a startAttribute callback, followed
by characters, start/endEntity callbacks and terminated by an endAttribute
callback, just like it is done for elements.
However, that many callbacks can make Expat inefficient.
 
> Or were you saying that the tokenizer in the "far future" expat will no
> longer process a start tag with attributes as one token?

If I remember correctly - it was a while ago - some of the changes to support
complete entity reporting would have to be made in the tokenizer.

There are various other ways to do this through an API (not just like above).
I imagine we could have *two* XML_GetAttributeValue functions
(or supply an extra argument), where we would only support
one of them initially (the one that always expands entities).

For the second one, there are again options:
A low level, efficient, but maybe not so nice, option would
be to insert pointers to entity values in the character string,
prefixed by an invalid XML character (like U+FFFF) as a marker.
This can work recursively.
Another option could be to return the attribute as a node with children.
I prefer the first option since Expat is most often used
through wrappers anyway, and if you want the speed, then you
can still get it by going down to the Expat API level directly.

Karl

From dmoore at viefinancial.com  Tue Mar 11 13:42:04 2003
From: dmoore at viefinancial.com (Moore, Dave)
Date: Tue Mar 11 13:42:14 2003
Subject: [Expat-discuss] Pull API Status?
Message-ID: <2FE8C75C7A06D4118BB50008C7F7E831DE3BFE@EXCHSRV>

After looking further into the way that XML_GetNextBuffer() would work,
there is a "wrinkle" involved when we have a partial token left over from
the previous buffer.  Since we have to arrange for the leftover bytes to be
just in front of any new bytes before continuing with our parsing, we have
some choices:

1.  Copy leftover bytes out and into a dynamic buffer, ask the user for
their new buffer, and then copy both blocks into a contiguous dynamically
allocated buffer.  Simple for the user, slower for us.

2.  Ask the user for a buffer of a certain minimum size, and instruct the
user to copy their new data into this buffer starting at a certain offset
into that buffer, leaving us room to shift the old bytes to the beginning of
the buffer, and allowing for "huge" nodes that span several buffers.  New
signature:  
XML_GetNextBuffer(int min_buffer_len,int offset, char **p_new_buffer,int
*p_new_len);
  
3.  Ask the user to prepend our leftover data for us, effectively shifting the
work that XML_GetBuffer does onto the user.  This will minimize unnecessary
allocations and copies, but doesn't feel clean to me.  New signature:

XML_GetNextBuffer(const char *p_old_data,int len_old_data, char
**p_new_buffer,int *p_new_len);

4.  Some combination of the above - the main thing is to allow for and
protect ourselves against the user re-using the same buffer across calls.
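
[A minimal sketch of choice 1 above, the "simple for the user, slower for us"
variant: the library copies the leftover bytes and the user's new bytes into
one freshly allocated contiguous buffer. join_buffers is a hypothetical
helper, not an Expat function.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed buffer holding leftover followed by fresh, and stores
   its length in *joinedLen.  Returns NULL if memory cannot be obtained.
   The caller owns the result and must free() it. */
static char *join_buffers(const char *leftover, size_t leftoverLen,
                          const char *fresh, size_t freshLen,
                          size_t *joinedLen)
{
    char *joined;
    size_t total;

    assert(joinedLen != NULL);
    if (leftoverLen > (size_t)-1 - freshLen)
        return NULL;                      /* combined length would overflow */
    total = leftoverLen + freshLen;
    joined = (char *)malloc(total ? total : 1);
    if (joined == NULL)
        return NULL;
    if (leftoverLen)
        memcpy(joined, leftover, leftoverLen);
    if (freshLen)
        memcpy(joined + leftoverLen, fresh, freshLen);
    *joinedLen = total;
    return joined;
}
```

[Because the result is a private copy, the user is free to re-use their own
buffer on the next call, at the price of one allocation and two copies per
refill.]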

These kinds of issues seem to be philosophical - who pays in complexity for
improved performance - the library or the user?  Since I've been here all of
5 hours, I'll defer to others on this list for the choice that's consistent
with expat philosophy.

Thanks,
Dave


From karl at waclawek.net  Tue Mar 11 13:45:53 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 11 13:46:00 2003
Subject: [Expat-discuss] Pull API Status?
References: <2FE8C75C7A06D4118BB50008C7F7E831DE3BFE@EXCHSRV>
Message-ID: <00e201c2e7fe$7195fe40$9e539696@citkwaclaww2k>


> After looking further into the way that XML_GetNextBuffer() would work,
> there is a "wrinkle" involved when we have a partial token left over from
> the previous buffer.  Since we have to arrange for the leftover bytes to be
> just in front of any new bytes before continuing with our parsing, we have
> some choices:
> 
> 1.  Copy leftover bytes out and into a dynamic buffer, ask the user for
> their new buffer, and then copy both blocks into a contiguous dynamically
> allocated buffer.  Simple for the user, slower for us.
> 
> 2.  Ask the user for a buffer of a certain minimum size, and instruct the
> user to copy their new data into this buffer starting at a certain offset
> into that buffer, leaving us room to shift the old bytes to the beginning of
> the buffer, and allowing for "huge" nodes that span several buffers.  New
> signature:  
> XML_GetNextBuffer(int min_buffer_len,int offset, char **p_new_buffer,int
> *p_new_len);
>   
> 3.  Ask the user to prepend our leftover data for us, effectively shifting the
> work that XML_GetBuffer does onto the user.  This will minimize unnecessary
> allocations and copies, but doesn't feel clean to me.  New signature:
> 
> XML_GetNextBuffer(const char *p_old_data,int len_old_data, char
> **p_new_buffer,int *p_new_len);
> 
> 4.  Some combination of the above - the main thing is to allow for and
> protect ourselves against the user re-using the same buffer across calls.

XML_Parse and XML_ParseBuffer already do that. We just need to
convert them from being supplied a buffer to getting one.
Of course, the devilish details may make it not quite that simple.

Karl

From karl at waclawek.net  Tue Mar 11 14:54:19 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 11 14:54:24 2003
Subject: [Expat-discuss] Pull API Status?
References: <2FE8C75C7A06D4118BB50008C7F7E831DE3BFE@EXCHSRV>
	<00e201c2e7fe$7195fe40$9e539696@citkwaclaww2k>
Message-ID: <010e01c2e808$014b9c30$9e539696@citkwaclaww2k>

> > 4.  Some combination of the above - that main thing is to allow for and
> > protect ourselves against the user re-using the same buffer across calls.  
> 
> XML_Parse and XML_ParseBuffer already do that. We just need to
> convert them from being supplied a buffer to getting one.
> Of course, the devilish details may make it not quite that simple.

What I mean is: 

We have a function NextBuffer() - based on XML_Parse(Buffer) - which
is the one used in the parsing loop. This function in turn then
gets more input by calling back through XML_GetNextBuffer().

Karl

From fdrake at acm.org  Wed Mar 12 20:09:35 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed Mar 12 20:10:12 2003
Subject: [Expat-discuss] Performance magic in internal.h
Message-ID: <15983.55887.53503.279030@grendel.zope.com>


Expat currently contains some fairly magical compiler-specific
performance hackery in the form of the FASTCALL, PTRCALL, and
PTRFASTCALL macros defined in the internal.h header.

When these were added, I measured a real but very small speed
improvement using these.  On many platforms and compilers, these are
essentially disabled, and for others, they should be (for example, Mac
OS X using GCC, the standard C compiler for that platform).

Given the high maintenance cost of these macros (they seem to be wrong
or useless most of the time), I'm inclined to remove them completely.
Is there any reason not to?  They don't affect the public API in any
way, only the performance (very slightly), and the maintainability.

I'd *really* like to remove these for Expat 1.95.7.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From karl at waclawek.net  Wed Mar 12 21:16:06 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Wed Mar 12 21:13:14 2003
Subject: [Expat-discuss] Performance magic in internal.h
References: <15983.55887.53503.279030@grendel.zope.com>
Message-ID: <00d701c2e906$817095b0$0207a8c0@karl>

> Expat currently contains some fairly magical compiler-specific
> performance hackery in the form of the FASTCALL, PTRCALL, and
> PTRFASTCALL macros defined in the internal.h header.
> 
> When these were added, I measured a real but very small speed
> improvement using these.  On many platforms and compilers, these are
> essentially disabled, and for others, they should be (for example, Mac
> OS X using GCC, the standard C compiler for that platform).
> 
> Given the high maintenance cost of these macros (they seem to be wrong
> or useless most of the time), I'm inclined to remove them completely.
> Is there any reason not to?  They don't affect the public API in any
> way, only the performance (very slightly), and the maintainability.
> 
> I'd *really* like to remove these for Expat 1.95.7.

What about making all of these macros noop, but leaving 
them there in case anyone wants to tune Expat for their
own purposes? It was some effort to put them there in the
first place.

Karl


From carlos at pehoe.civil.ist.utl.pt  Wed Mar 12 20:35:58 2003
From: carlos at pehoe.civil.ist.utl.pt (Carlos Pereira)
Date: Thu Mar 13 08:39:27 2003
Subject: [Expat-discuss] isFinal parameter
Message-ID: <200303122035.h2CKZwp02328@pehoe.civil.ist.utl.pt>

Both XML_Parse and XML_ParseBuffer have an isFinal parameter
that tells Expat whether this is the last block or not.

Usually I would use something like this to know
whether isFinal is true or false:

size = read (socket_fd, buffer, 1024);
isFinal = size < 1024;

Unfortunately this does not work for HTTP streams
read from a socket, where the size of the block returned
by read varies with the connection, so receiving a block
of 512 bytes does not really mean that this is the last block.

To get around this, I am considering:
1) using two buffers, reading buffer1, then buffer2,
and then parsing buffer1 (if the returned size
for buffer2 is 0 then buffer1 is the last block),
and then swapping buffers, until the end.

2) use one buffer, and then read (socket_fd, char, 1);
to read just a byte ahead, to see if read returns 0 or not.
This tells me what should be the isFinal parameter.

3) run XML_Parse always with isFinal parameter set to FALSE.
This is easier but it really looks completely wrong.

Is there a better solution? Am I forgetting something obvious?
Is there some magic solution for this problem?

Thanks a lot for enlightening my cluelessness!

Carlos

From fdrake at acm.org  Thu Mar 13 15:22:16 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu Mar 13 15:22:53 2003
Subject: [Expat-discuss] isFinal parameter
In-Reply-To: <200303122035.h2CKZwp02328@pehoe.civil.ist.utl.pt>
References: <200303122035.h2CKZwp02328@pehoe.civil.ist.utl.pt>
Message-ID: <15984.59512.447219.317945@grendel.zope.com>


Carlos Pereira writes:
 > Both XML_Parse and XML_ParseBuffer have a isFinal parameter
 > that tells Expat whether this is the last block or not.
...
 > 3) run XML_Parse always with isFinal parameter set to FALSE.
 > This is easier but it really looks completely wrong.

This is entirely reasonable, and quite common.  Since this is easier
to handle consistently and keeps the code calling into Expat simpler,
that's a good way to do it.  I certainly use this in the Python
bindings.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From carlos at pehoe.civil.ist.utl.pt  Thu Mar 13 21:12:01 2003
From: carlos at pehoe.civil.ist.utl.pt (Carlos Pereira)
Date: Thu Mar 13 16:11:32 2003
Subject: [Expat-discuss] Re: isFinal parameter
Message-ID: <200303132112.h2DLC1w07989@pehoe.civil.ist.utl.pt>

> Carlos Pereira writes:
>  > Both XML_Parse and XML_ParseBuffer have a isFinal parameter
>  > that tells Expat whether this is the last block or not.
> ...
>  > 3) run XML_Parse always with isFinal parameter set to FALSE.
>  > This is easier but it really looks completely wrong.

> This is entirely reasonable, and quite common.  Since this is easier
> to handle consistently and keeps the code calling into Expat simpler,
> that's a good way to do it.  I certainly use this in the Python
> bindings.

Thanks a lot, this helps me a lot, I was under the impression
that Expat would take special precautions for the last block,
but I can see now that it shouldn't really matter.

Carlos

From rolf at pointsman.de  Thu Mar 13 22:34:47 2003
From: rolf at pointsman.de (rolf@pointsman.de)
Date: Thu Mar 13 16:38:38 2003
Subject: [Expat-discuss] Re: isFinal parameter
In-Reply-To: <200303132112.h2DLC1w07989@pehoe.civil.ist.utl.pt>
Message-ID: <200303132134.WAA20958@pointsman.pointsman.de>

On 13 Mar, Carlos Pereira wrote:
>> Carlos Pereira writes:
>>  > Both XML_Parse and XML_ParseBuffer have a isFinal parameter
>>  > that tells Expat whether this is the last block or not.
>> ...
>>  > 3) run XML_Parse always with isFinal parameter set to FALSE.
>>  > This is easier but it really looks completely wrong.
> 
>> This is entirely reasonable, and quite common.  Since this is easier
>> to handle consistently and keeps the code calling into Expat simpler,
>> that's a good way to do it.  I certainly use this in the Python
>> bindings.
> 
> Thanks a lot, this helps me a lot, I was under the impression
> that Expat would take special precautions for the last block,
> but I can see now that it shouldn't really matter.

Oh, sure there is a difference between isFinal == 0 and isFinal == 1.
With isFinal == 0, the parser doesn't complain if the block to parse
ends in the middle of markup, or without all levels closed, etc. It
does if isFinal == 1.

Therefore, it may be wise to call XML_Parse() with a 0-byte-long
block (this is legal) and isFinal == 1 after you have noticed the end
of the input in the wrapper code around it. Otherwise, you may not
detect that the input ended prematurely.
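
[A sketch of the wrapper loop this describes. To keep it self-contained it
does not link against Expat: ReadFn and ParseFn are hypothetical callback
types, and with Expat the ParseFn would be a thin wrapper around
XML_Parse(parser, buf, len, isFinal). ShortReader and Recorder exist only to
exercise the loop with short reads, as a socket would produce.]

```c
#include <assert.h>
#include <string.h>

/* Returns bytes read, 0 at end of input, negative on error (like read(2)). */
typedef int (*ReadFn)(void *src, char *buf, int cap);
/* Returns nonzero on success, zero on a parse error. */
typedef int (*ParseFn)(void *parser, const char *buf, int len, int isFinal);

/* Every data block goes in with isFinal == 0; once the reader reports end
   of input, one empty block with isFinal == 1 closes the document. */
static int feed_all(ReadFn readFn, void *src, ParseFn parseFn, void *parser)
{
    char buf[1024];

    assert(readFn != NULL && parseFn != NULL);
    for (;;) {
        int n = readFn(src, buf, (int)sizeof buf);

        if (n < 0)
            return 0;                            /* I/O error */
        if (n == 0)
            return parseFn(parser, buf, 0, 1);   /* empty final block */
        if (!parseFn(parser, buf, n, 0))
            return 0;                            /* parse error mid-stream */
    }
}

/* Test double: hands out a string in short reads of at most maxRead bytes. */
typedef struct { const char *data; int length; int offset; int maxRead; } ShortReader;

static int short_read(void *src, char *buf, int cap)
{
    ShortReader *r = (ShortReader *)src;
    int n = r->length - r->offset;

    if (n > r->maxRead) n = r->maxRead;
    if (n > cap) n = cap;
    if (n > 0) {
        memcpy(buf, r->data + r->offset, (size_t)n);
        r->offset += n;
    }
    return n;
}

/* Test double: records how the "parser" was called. */
typedef struct { int calls; int bytes; int finalCalls; int lastLen; } Recorder;

static int record_parse(void *parser, const char *buf, int len, int isFinal)
{
    Recorder *rec = (Recorder *)parser;

    (void)buf;
    rec->calls++;
    rec->bytes += len;
    rec->finalCalls += isFinal ? 1 : 0;
    rec->lastLen = len;
    return 1;
}
```

[The size of an individual read never decides isFinal here; only the
reader's end-of-input result does.]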

rolf


From carlos at pehoe.civil.ist.utl.pt  Thu Mar 13 22:07:36 2003
From: carlos at pehoe.civil.ist.utl.pt (Carlos Pereira)
Date: Thu Mar 13 17:07:06 2003
Subject: [Expat-discuss] Re: isFinal parameter
Message-ID: <200303132207.h2DM7ah08057@pehoe.civil.ist.utl.pt>

>>> Carlos Pereira writes:
>>>  > Both XML_Parse and XML_ParseBuffer have a isFinal parameter
>>>  > that tells Expat whether this is the last block or not.
>>> ...
>>>  > 3) run XML_Parse always with isFinal parameter set to FALSE.
>>>  > This is easier but it really looks completely wrong.
>> 
>>> This is entirely reasonable, and quite common.  Since this is easier
>>> to handle consistently and keeps the code calling into Expat simpler,
>>> that's a good way to do it.  I certainly use this in the Python
>>> bindings.
>> 
>> Thanks a lot, this helps me a lot, I was under the impression
>> that Expat would take special precautions for the last block,
>> but I can see now that it shouldn't really matter.

>Oh, sure there is a difference between isFinal == 0 and isFinal == 1.
>With isFinal == 0, the parser doesn't complain if the block to parse
>ends in the middle of markup, or without all levels closed, etc. It
>does if isFinal == 1.

>Therefore, it may be wise to call XML_Parse() with a 0-byte-long
>block (this is legal) and isFinal == 1 after you have noticed the end
>of the input in the wrapper code around it. Otherwise, you may not
>detect that the input ended prematurely.

That is very clever, many, many thanks for clarifying this point.
I just implemented your expert advice. The picture is now clear.

Thank you both, Fred and Rolf, for helping me.

Carlos

From laurent.carcone at w3.org  Fri Mar 14 14:21:10 2003
From: laurent.carcone at w3.org (Laurent Carcone)
Date: Fri Mar 14 08:21:14 2003
Subject: [Expat-discuss] 
Message-ID: <20030314132110.293CD170F5@tux.inrialpes.fr>

Hello,

Would it be possible to include a simple patch in Expat to report an 
unresolved external general entity in an attribute value at its original 
position inside the attribute value rather than through the default handler?

In Function 'appendAttributeValue'
 in case XML_TOK_ENTITY_REF:

I replace
          if ((pool == &tempPool) && defaultHandler)
	     reportDefault(parser, enc, ptr, next);
by
          if ((pool == &tempPool) && defaultHandler)
	    {
	      const char *ent;
	      for (ent = ptr; ent < next; ent++) {
		if (!poolAppendChar(pool, ent[0]))
		  return XML_ERROR_NO_MEMORY; 
	      }
	    }

We use Expat for our open-source browser/editor Amaya and in this case, we 
need to receive the whole entity value in the elementStartHandler and let the 
application decide how to manage external entities.


Thanks,


Laurent Carcone
---------------
W3C - ERCIM
INRIA Rhône-Alpes
655 avenue de l'Europe
ZIRST Montbonnot
38334 Saint Ismier Cedex

email: laurent.carcone@w3.org / Laurent.Carcone@inrialpes.fr
Phone: +33 4 76 61 52 67  Fax: +33 4 76 61 52 07


From karl at waclawek.net  Fri Mar 14 09:39:20 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Fri Mar 14 09:39:29 2003
Subject: [Expat-discuss] 
References: <20030314132110.293CD170F5@tux.inrialpes.fr>
Message-ID: <002101c2ea37$7fcc9510$9e539696@citkwaclaww2k>


> Hello,
> 
> Would it be possible to include a simple patch in Expat to report an 
> unresolved external general entity in attribute value at it's original 
> position inside the attribute value rather than the default handler.

AFAIK, *external* general entity references are not allowed in attributes.
(section 4.4.4 in the XML 1.0 specs http://www.w3.org/TR/REC-xml.html#forbidden)

But assuming you mean internal general entities, then it seems you are
running into a limitation that exists for all SAX-like parsers: they do not
report entity boundaries for general entities in attribute values and
parameter entities in declarations.

> In Function 'appendAttributeValue'
>  in case XML_TOK_ENTITY_REF:
> 
> I replace
>           if ((pool == &tempPool) && defaultHandler)
>      reportDefault(parser, enc, ptr, next);
> by
>           if ((pool == &tempPool) && defaultHandler)
>     {
>       const char *ent;
>       for (ent = ptr; ent < next; ent++) {
> if (!poolAppendChar(pool, ent[0]))
>   return XML_ERROR_NO_MEMORY; 
> 
>       }
>     }

If I understand correctly, you want "&myEntity;" to show up verbatim
in the reported attribute value?

That would be a problem, because the string "&myEntity;" could be
generated without an actual entity reference, like here:
<elm att="ABC&amp;myEntity;DEF"> where the attribute value would
be "ABC&myEntity;DEF".

> We use Expat for our open-source browser/editor Amaya and in this case,
> we need to receive the whole entity value in the elementStartHandler
> and let the application decide how to manage external entities.

I hate to say that, but as it currently stands, a fully DOM compliant
parser would be better for you.

However, we have been discussing how to improve Expat in this regard,
but it would be implemented as part of a new Expat API, after
version 2.0 has been released.

There are several options, the most efficient being the insertion
of an illegal XML character (like Unicode 0xFFFF) followed by
the entity name or even a pointer. An extra API call could be used
to pass that name or pointer and retrieve the entity value.

Another one would be to turn attribute reporting into a streaming style,
with a startAttribute, endAttribute callback, and between them we would
have  character and start/endEntity callbacks. However that looks like
a lot of calls just for reporting an attribute.

We haven't discussed this thoroughly though, and anybody who comes
up with a better idea is welcome to share it with us.

Karl


From Laurent.Carcone at inrialpes.fr  Fri Mar 14 16:33:07 2003
From: Laurent.Carcone at inrialpes.fr (Laurent Carcone)
Date: Fri Mar 14 10:33:12 2003
Subject: [Expat-discuss] 
In-Reply-To: Your message of Fri, 14 Mar 2003 09:39:20 -0500."
             <002101c2ea37$7fcc9510$9e539696@citkwaclaww2k> 
Message-ID: <20030314153307.7DFD4170F5@tux.inrialpes.fr>

> 
> > Hello,
> > 
> > Would it be possible to include a simple patch in Expat to report an 
> > unresolved external general entity in attribute value at it's original 
> > position inside the attribute value rather than the default handler.
> 
> AFAIK, *external* general entity references are not allowed in attributes.
> (section 4.4.4 in the XML 1.0 specs http://www.w3.org/TR/REC-xml.html#forbidden)

You made the right assumption :-)

> But assuming you mean internal general entities, then it seems you are
> running into a limitation that exists for all SAX-like parsers: they do not
> report entity boundaries for general entities in attribute values and
> parameter entities in declarations.

I read some mails in the archives and it is what I was afraid of.

> 
> > In Function 'appendAttributeValue'
> >  in case XML_TOK_ENTITY_REF:
> > 
> > I replace
> >           if ((pool == &tempPool) && defaultHandler)
> >      reportDefault(parser, enc, ptr, next);
> > by
> >           if ((pool == &tempPool) && defaultHandler)
> >     {
> >       const char *ent;
> >       for (ent = ptr; ent < next; ent++) {
> > if (!poolAppendChar(pool, ent[0]))
> >   return XML_ERROR_NO_MEMORY; 
> > 
> >       }
> >     }
> 
> If I understand correctly, you want "&myEntity;" to show up verbatim
> in the reported attribute value?
> 
> That would be a problem, because the string "&myEntity;" could be
> generated without an actual entity reference, like here:
> <elm att="ABC&amp;myEntity;DEF"> where the attribute value would
> be "ABC&myEntity;DEF".

It's precisely the problem we encountered. To fix it, we use a special 
character followed by the entity name (like in your first option).
In fact, my real patch is:
          if ((pool == &tempPool) && defaultHandler)
            {
              const char *ent;
              if (!poolAppendChar(pool, START_ENTITY))
                return XML_ERROR_NO_MEMORY;
              for (ent = ptr+1; ent < next; ent++) {
                if (!poolAppendChar(pool, ent[0]))
                  return XML_ERROR_NO_MEMORY;
              }
            }
where START_ENTITY is a special character shared by Expat and the application.
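
On the application side, a value produced this way could be scanned roughly
as follows. This is only a sketch: the marker value '\x01' and the helper
name find_entity_marker are assumptions made for illustration, not part of
Expat or Amaya. It relies on the patch copying everything after the '&',
including the closing ';'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed marker value; the real one is whatever Expat and the
   application agree on. */
#define START_ENTITY '\x01'

/* Finds the first marked entity reference in an attribute value.
   On success, stores a pointer to the entity name and its length
   (without the trailing ';') and returns a pointer to the text that
   follows the reference. Returns NULL if no complete reference exists. */
static const char *
find_entity_marker(const char *value, const char **name, size_t *name_len)
{
  const char *start;
  const char *end;

  assert(value != NULL && name != NULL && name_len != NULL);
  start = strchr(value, START_ENTITY);
  if (start == NULL)
    return NULL;
  end = strchr(start + 1, ';');
  if (end == NULL)
    return NULL;
  *name = start + 1;
  *name_len = (size_t)(end - (start + 1));
  return end + 1;
}
```

Because the marker cannot appear in well-formed character data, a literal
"&myEntity;" that came from "&amp;myEntity;" is not mistaken for a reference.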

> 
> > We use Expat for our open-source browser/editor Amaya and in this case,
> > we need to receive the whole entity value in the elementStartHandler
> > and let the application decide how to manage external entities.
> 
> I hate to say that, but as it currently stands, a fully DOM compliant
> parser would be better for you.
> 
> However, we have been discussing to improve Expat in this regard,
> but it would be implemented as part of a new Expat API, after
> version 2.0 has been released.
> 
> There are several options, the most efficient being the insertion
> of an illegal XML character (like Unicode 0xFFFF) followed by
> the entity name or even a pointer. An extra API call could be used
> to pass that name or pointer and retrieve the entity value.
> 
> Another one would be to turn attribute reporting into a streaming style,
> with a startAttribute, endAttribute callback, and between them we would
> have  character and start/endEntity callbacks. However that looks like
> a lot of calls just for reporting an attribute.
> 
> We haven't discussed this thoroughly though, and anybody who comes
> up with a better idea is welcome to share it with us.
> 
> Karl
> 

I'm looking forward to this discussion.

Thank you for your response.

Laurent



From fdrake at acm.org  Fri Mar 14 12:02:14 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Mar 14 12:03:04 2003
Subject: [Expat-discuss] Performance magic in internal.h
In-Reply-To: <00d701c2e906$817095b0$0207a8c0@karl>
References: <15983.55887.53503.279030@grendel.zope.com>
	<00d701c2e906$817095b0$0207a8c0@karl>
Message-ID: <15986.2838.608875.575402@grendel.zope.com>


Karl Waclawek writes:
 > What about making all of these macros noop, but leaving 
 > them there in case anyone wants to tune Expat for their
 > own purposes? It was some effort to put them there in the
 > first place.

Given the incredible response to this discussion, I'm going to assume
that no one is using Expat any more and re-write the whole thing to be
a graphical email client that only supports obscure protocols.

I'll check in the new code tonight.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From fdrake at acm.org  Fri Mar 14 12:06:05 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Mar 14 12:06:40 2003
Subject: [Expat-discuss] Performance magic in internal.h
In-Reply-To: <00d701c2e906$817095b0$0207a8c0@karl>
References: <15983.55887.53503.279030@grendel.zope.com>
	<00d701c2e906$817095b0$0207a8c0@karl>
Message-ID: <15986.3069.126323.544840@grendel.zope.com>


Karl Waclawek writes:
 > What about making all of these macros noop, but leaving 
 > them there in case anyone wants to tune Expat for their
 > own purposes? It was some effort to put them there in the
 > first place.

Given the incredible response to this suggestion, I'm going to keep
internal.h, but disable the three *CALL macros for everything except
GCC on Linux, since the current definitions there are known to
work and actually help.  Anyone else who wants to set them should
edit internal.h themselves.

I'll commit the changes shortly, then close SF bug #692878.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From karl at waclawek.net  Fri Mar 14 12:48:42 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Fri Mar 14 12:48:48 2003
Subject: [Expat-discuss] Performance magic in internal.h
References: 
	<15983.55887.53503.279030@grendel.zope.com><00d701c2e906$817095b0$0207a8c0@karl>
	<15986.2838.608875.575402@grendel.zope.com>
Message-ID: <008d01c2ea51$f3be49e0$9e539696@citkwaclaww2k>

> Karl Waclawek writes:
>  > What about making all of these macros noop, but leaving 
>  > them there in case anyone wants to tune Expat for their
>  > own purposes? It was some effort to put them there in the
>  > first place.
> 
> Given the incredible response to this discussion, I'm going to assume
> that no one is using Expat any more and re-write the whole thing to be
> a graphical email client that only supports obscure protocols.
> 
> I'll check in the new code tonight.

Good stuff, Fred.
I also got a request to add support for IIOP.
Should we delay this until after release 2.0?

Karl

From fdrake at acm.org  Fri Mar 14 13:04:10 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Mar 14 13:04:48 2003
Subject: [Expat-discuss] Performance magic in internal.h
In-Reply-To: <008d01c2ea51$f3be49e0$9e539696@citkwaclaww2k>
References: <15983.55887.53503.279030@grendel.zope.com>
	<00d701c2e906$817095b0$0207a8c0@karl>
	<15986.2838.608875.575402@grendel.zope.com>
	<008d01c2ea51$f3be49e0$9e539696@citkwaclaww2k>
Message-ID: <15986.6554.512051.137536@grendel.zope.com>


Karl Waclawek writes:
 > Good stuff, Fred.
 > I also got a request to add support for IIOP.
 > Should we delay this until after release 2.0?

Only the IIOP support; that'll require we write at least one unit
test.  The fancy GUI and mail protocol support defy testing, so it can
go in now.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From karl at waclawek.net  Fri Mar 14 13:17:12 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Fri Mar 14 13:18:34 2003
Subject: [Expat-discuss] Performance magic in internal.h
References: 
	<15983.55887.53503.279030@grendel.zope.com><00d701c2e906$817095b0$0207a8c0@karl><15986.2838.608875.575402@grendel.zope.com><008d01c2ea51$f3be49e0$9e539696@citkwaclaww2k>
	<15986.6554.512051.137536@grendel.zope.com>
Message-ID: <000b01c2ea55$ef572260$9e539696@citkwaclaww2k>

> Karl Waclawek writes:
>  > Good stuff, Fred.
>  > I also got a request to add support for IIOP.
>  > Should we delay this until after release 2.0?
> 
> Only the IIOP support; that'll require we write at least one unit
> test.  The fancy GUI and mail protocol support defy testing, so it can
> go in now.

Fine with me. Btw, we might be able to tunnel IIOP through SMTP.

Karl

From rolf at pointsman.de  Fri Mar 14 19:19:28 2003
From: rolf at pointsman.de (rolf@pointsman.de)
Date: Fri Mar 14 13:22:56 2003
Subject: [Expat-discuss] Performance magic in internal.h
In-Reply-To: <15986.6554.512051.137536@grendel.zope.com>
Message-ID: <200303141819.TAA06918@pointsman.pointsman.de>

On 14 Mar, Fred L. Drake, Jr. wrote:
> 
> Karl Waclawek writes:
>  > Good stuff, Fred.
>  > I also got a request to add support for IIOP.
>  > Should we delay this until after release 2.0?
> 
> Only the IIOP support; that'll require we write at least one unit
> test.  The fancy GUI and mail protocol support defy testing, so it can
> go in now.

I'm sorry, but what the heck does IIOP mean in this context (surely
not CORBA's Internet Inter-ORB Protocol)? And what fancy GUI and
mail protocol support are you guys talking about here?

rolf


From karl at waclawek.net  Fri Mar 14 13:30:42 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Fri Mar 14 13:30:54 2003
Subject: [Expat-discuss] Performance magic in internal.h
References: <200303141819.TAA06918@pointsman.pointsman.de>
Message-ID: <001101c2ea57$d1f93620$9e539696@citkwaclaww2k>


> On 14 Mar, Fred L. Drake, Jr. wrote:
> > 
> > Karl Waclawek writes:
> >  > Good stuff, Fred.
> >  > I also got a request to add support for IIOP.
> >  > Should we delay this until after release 2.0?
> > 
> > Only the IIOP support; that'll require we write at least one unit
> > test.  The fancy GUI and mail protocol support defy testing, so it can
> > go in now.
> 
> I'm sorry, but what the heck does IIOP mean in this context (surely
> not CORBA's Internet Inter-ORB Protocol)? And what fancy GUI and
> mail protocol support are you guys talking about here?

Well, since Expat is so fast it could be used for SOAP over IIOP
(yes, the CORBA protocol). The e-mail support is for asynchronous
messaging, especially when the parties are disconnected.

Karl

From fdrake at acm.org  Fri Mar 14 13:54:34 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri Mar 14 13:55:07 2003
Subject: [Expat-discuss] Performance magic in internal.h
In-Reply-To: <000b01c2ea55$ef572260$9e539696@citkwaclaww2k>
References: <200303141819.TAA06918@pointsman.pointsman.de>
	<001101c2ea57$d1f93620$9e539696@citkwaclaww2k>
	<15986.6554.512051.137536@grendel.zope.com>
	<15983.55887.53503.279030@grendel.zope.com>
	<00d701c2e906$817095b0$0207a8c0@karl>
	<15986.2838.608875.575402@grendel.zope.com>
	<008d01c2ea51$f3be49e0$9e539696@citkwaclaww2k>
	<000b01c2ea55$ef572260$9e539696@citkwaclaww2k>
Message-ID: <15986.9578.860736.412547@grendel.zope.com>


Karl Waclawek writes:
 > Fine with me. Btw, we might be able to tunnel IIOP through SMTP.

Most definitely.

rolf@pointsman.de writes:
 > I'm sorry, but what the heck does IIOP mean in this context (surely
 > not CORBA's Internet Inter-ORB Protocol)? And what fancy GUI and
 > mail protocol support are you guys talking about here?

I think you missed an announcement:

http://mail.libexpat.org/pipermail/expat-discuss/2003-March/000945.html

Karl Waclawek writes:
 > Well, since Expat is so fast it could be used for SOAP over IIOP
 > (yes, the CORBA protocol). The e-mail support is for asynchronous
 > messaging, especially when the parties are disconnected.

Oh, you wanted to keep the XML parser?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From karl at waclawek.net  Fri Mar 14 13:59:12 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Fri Mar 14 13:59:18 2003
Subject: [Expat-discuss] Performance magic in internal.h
References: 
	<200303141819.TAA06918@pointsman.pointsman.de><001101c2ea57$d1f93620$9e539696@citkwaclaww2k><15986.6554.512051.137536@grendel.zope.com><15983.55887.53503.279030@grendel.zope.com><00d701c2e906$817095b0$0207a8c0@karl><15986.2838.608875.575402@grendel.zope.com><008d01c2ea51$f3be49e0$9e539696@citkwaclaww2k><000b01c2ea55$ef572260$9e539696@citkwaclaww2k>
	<15986.9578.860736.412547@grendel.zope.com>
Message-ID: <002b01c2ea5b$ccf61cc0$9e539696@citkwaclaww2k>

> Karl Waclawek writes:
>  > Well, since Expat is so fast it could be used for SOAP over IIOP
>  > (yes, the CORBA protocol). The e-mail support is for asynchronous
>  > messaging, especially when the parties are disconnected.
> 
> Oh, you wanted to keep the XML parser?

Yes, I have a hard time letting go...

Karl

From tim.fornoville at web.de  Mon Mar 17 17:13:00 2003
From: tim.fornoville at web.de (tim fornoville)
Date: Mon Mar 17 11:13:34 2003
Subject: [Expat-discuss] & § ü ö ä characters how to handle
Message-ID: <200303171613.h2HGD0316753@mailgate5.cinetic.de>

Hi,

I've got a little problem here: when I parse a file with " & § ü ö ä " the parser shuts down.
Is there a workaround?


greetings tim 


From karl at waclawek.net  Mon Mar 17 11:18:51 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Mon Mar 17 11:20:05 2003
Subject: Re: [Expat-discuss] & § ü ö ä characters how to handle
References: <200303171613.h2HGD0316753@mailgate5.cinetic.de>
Message-ID: <004801c2eca0$e61b5a70$9e539696@citkwaclaww2k>


> I've got a little problem here, when I parse a file with " & § ü ö ä "
> the parser shuts down.. ,
> is there a work-around...

Do you mean it crashes? Or do you get an error returned?
If the latter, maybe the XML file has the wrong encoding specified
in the XML declaration? ISO-8859-1 might work.

Karl


From brox at corena.no  Mon Mar 17 17:26:34 2003
From: brox at corena.no (Bjorn Brox)
Date: Mon Mar 17 11:26:54 2003
Subject: [Expat-discuss] & § ü ö ä characters how to handle
References: <200303171613.h2HGD0316753@mailgate5.cinetic.de>
Message-ID: <3E75F73A.1070301@corena.no>

tim fornoville wrote:
 > Hi,
 >
 > I've got a little problem here, when I parse a file with " & § ü ö ä
 > "  the parser shuts down.. , is there a work-around...
 >
The default encoding is UTF-8, and these characters are not valid UTF-8
as written (in ISO-8859-1 each is a single byte that UTF-8 treats as part
of a multi-byte sequence, and the neighbouring bytes probably do not
complete a valid one).

Put the correct encoding in your header if you insist on using Latin-1
(ISO-8859-1):

<?xml version="1.0" encoding="ISO-8859-1"?>
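
To see why those bytes are rejected without the declaration, a rough UTF-8
well-formedness check can be sketched as below. It is a simplified
illustration only: the function name is made up here, it checks just the
lead/continuation byte structure (not overlong forms or surrogates), and it
is not what Expat does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the buffer is structurally valid UTF-8, else 0.
   Simplified: checks lead/continuation byte structure only. */
static int
looks_like_utf8(const unsigned char *s, size_t len)
{
  size_t i = 0;

  assert(s != NULL || len == 0);
  while (i < len) {
    unsigned char c = s[i];
    size_t extra;
    size_t k;

    if (c < 0x80)
      extra = 0;                 /* ASCII */
    else if (c >= 0xC2 && c <= 0xDF)
      extra = 1;                 /* lead byte of a 2-byte sequence */
    else if (c >= 0xE0 && c <= 0xEF)
      extra = 2;                 /* lead byte of a 3-byte sequence */
    else if (c >= 0xF0 && c <= 0xF4)
      extra = 3;                 /* lead byte of a 4-byte sequence */
    else
      return 0;                  /* stray continuation or invalid lead */

    if (extra > len - i - 1)
      return 0;                  /* sequence runs past the buffer */
    for (k = 1; k <= extra; k++) {
      if ((s[i + k] & 0xC0) != 0x80)
        return 0;                /* not a continuation byte */
    }
    i += extra + 1;
  }
  return 1;
}
```

The Latin-1 bytes 0xA7, 0xFC, 0xF6 and 0xE4 separated by spaces fail this
check, while the same characters encoded as UTF-8 (0xC2 0xA7, 0xC3 0xBC,
0xC3 0xB6, 0xC3 0xA4) pass it.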


-- 
Bjorn Brox, CORENA Norge AS, http://www.corena.no/, ICQ 17872043
Industritunet, Dyrmyrgt. 35, N-3611 Kongsberg, NORWAY
Phone: +47 32717210, Fax: +47 32717201, Mobile: +47 92638590


From mba2000 at ioplex.com  Mon Mar 17 16:29:01 2003
From: mba2000 at ioplex.com (Michael B.Allen)
Date: Mon Mar 17 16:29:29 2003
Subject: [Expat-discuss] & § ü ö ä characters how to handle
In-Reply-To: <3E75F73A.1070301@corena.no>
References: <200303171613.h2HGD0316753@mailgate5.cinetic.de>
	<3E75F73A.1070301@corena.no>
Message-ID: <20030317162901.329e88fc.mba2000@ioplex.com>

On Mon, 17 Mar 2003 17:26:34 +0100
Bjorn Brox <brox@corena.no> wrote:

> tim fornoville wrote:
>  > Hi,
>  >
>  > I've got a little problem here, when I parse a file with " & § ü ö ä
>  > "  the parser shuts down.. , is there a work-around...
>  >
> Default encoding is UTF-8 and these characters are not valid UTF-8
> characters (each of them marks the start of a UTF-8 sequence, but the 
> next character is probably not valid).

Except for the ampersand character which needs to be replaced with the
builtin entity reference &amp;.
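
If the file is generated by your own code, the escaping could be done with
something like the following sketch. The function name escape_ampersands is
made up for this example, and it assumes the input is raw text that contains
no entity references yet (otherwise they would be escaped twice).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of text with every '&' replaced by
   "&amp;". The caller frees the result. Returns NULL on allocation
   failure. */
static char *
escape_ampersands(const char *text)
{
  size_t extra = 0;
  const char *p;
  char *out, *q;

  assert(text != NULL);
  for (p = text; *p != '\0'; p++)
    if (*p == '&')
      extra += 4;                /* "&amp;" is 4 bytes longer than "&" */

  out = malloc(strlen(text) + extra + 1);
  if (out == NULL)
    return NULL;

  for (p = text, q = out; *p != '\0'; p++) {
    if (*p == '&') {
      memcpy(q, "&amp;", 5);
      q += 5;
    } else {
      *q++ = *p;
    }
  }
  *q = '\0';
  return out;
}
```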

Mike

-- 
A  program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes  the  potential  for it to be applied to tasks that are
conceptually  similar and, more important, to tasks that have not
yet been conceived. 

From fabio at soi.city.ac.uk  Tue Mar 18 15:58:58 2003
From: fabio at soi.city.ac.uk (Fabio Venuti)
Date: Tue Mar 18 10:59:03 2003
Subject: [Expat-discuss] getting offset of attributes
Message-ID: <005601c2ed67$4950c640$4f5b288a@cadfael>

Hi all,
by using XML_GetCurrentByteIndex I can get the offset of the beginning of an element, but how can I get the offset of, say, an attribute?
Thanks,

Fabio
From fdrake at acm.org  Tue Mar 18 11:03:10 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue Mar 18 11:03:46 2003
Subject: [Expat-discuss] getting offset of attributes
In-Reply-To: <005601c2ed67$4950c640$4f5b288a@cadfael>
References: <005601c2ed67$4950c640$4f5b288a@cadfael>
Message-ID: <15991.17214.5247.475433@grendel.zope.com>


Fabio Venuti writes:
 > by using XML_GetCurrentByteIndex I can get the offset of the
 > beginning of an element, but how can I get the offset of, say, an
 > attribute?

That's not possible with the current API.  It will probably become
possible in Expat 3.0; feel free to review the roadmap document:

	http://www.libexpat.org/dev/roadmap.html

(I should probably remove the "PROPOSAL" from the top of that document
since the developers seem to all agree that's the way to go.)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From karl at waclawek.net  Tue Mar 18 11:29:05 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 18 11:29:17 2003
Subject: [Expat-discuss] getting offset of attributes
References: <005601c2ed67$4950c640$4f5b288a@cadfael>
	<15991.17214.5247.475433@grendel.zope.com>
Message-ID: <009c01c2ed6b$7e82a550$9e539696@citkwaclaww2k>

> Fabio Venuti writes:
>  > by using XML_GetCurrentByteIndex I can get the offset of the
>  > beginning of an element, but how can I get the offset of, say, an
>  > attribute?
> 
> That's not possible with the current API.  It will probably become
> possible in Expat 3.0; 

Only if we switch to reporting of attributes in a streaming fashion.
This would require changes to the tokenizer, since it does not
report attributes as separate events.

We have also looked at solving the issues of reporting entity boundaries
in attribute values and declarations by inserting marker characters
followed by the entity name or a pointer to an entity structure.

Reporting the attributes all in one event together with the element start
tag seems almost necessary for some situations, such as namespace prefixes
that are used *before* the attribute declaring them is encountered. 
With a streaming model the problem is that the element's qualified name
cannot be associated with a URI until the element's attribute that
declares the prefix has been read.

Karl

From Michal.Roskanuk at merlin.cz  Thu Mar 20 21:43:14 2003
From: Michal.Roskanuk at merlin.cz (Roskanuk Michal)
Date: Thu Mar 20 15:43:30 2003
Subject: [Expat-discuss] Terminating expat - idea
Message-ID: <E50028F23FC6D3118B7D00609761083F023D8460@merkur.merlin.cz>

Hi,
I'd just like to share an idea for how to terminate Expat processing
for any reason. This question comes up periodically here (most
asked...); the logical answer is 'use exceptions in C++/Delphi
or setjmp/longjmp in C', which is, of course, OK. However,
I guess that for those who are looking for a clear way that is
independent of platform/compiler/etc. (or who are afraid of those
mysterious jmps ;-) the following approach could be acceptable
(in the scope of the C language):

Create a struct, for example:

struct _XData
{
  int  err;                  // my own error number
  int  errln;                // on which line
  char errtxt[MAX_ERR_SIZE]; // optional additional info
...
}; // there is some missing, i know ;-)
typedef struct _XData XData;

and use it as userData. Main parsing loop:

...
XData x_data;
x_data.err = 0;
...
XML_SetUserData(parser, &x_data);
...
do
{
  // get data
  if(XML_Parse(parser, Buff, len, done) == XML_STATUS_ERROR)
  {
    // handle parser error & bye
    break;
  }
  else if(x_data.err > 0)
  {
    // handle user error & bye or whatever
    break;
  }
} while(are_there_some_data);

Then check x_data.err at the beginning of every handler,
for example:

static void start(void *data, const char *el, const char **attr)
{
  XData *px_data = (XData *) data;
  
  if(px_data->err > 0)
    return; // error occurred somewhere, get out

and, of course, fill in the error info whenever you
need to, followed by 'return' or 'goto CleanExit' or so.
This approach makes the code longer and a little bit slower,
but you can be sure that everybody understands it and
it is totally portable (to every mind ;-).
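
Put together, the pattern looks roughly like the sketch below. To keep it
free of dependencies, a plain loop (feed_events, made up for this example)
stands in for XML_Parse calling the start handler; the value of
MAX_ERR_SIZE, the extra 'handled' counter and the rule that a "forbidden"
element is an error are likewise assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERR_SIZE 64          /* assumed size */

typedef struct {
  int  err;                      /* my own error number */
  int  errln;                    /* on which line */
  char errtxt[MAX_ERR_SIZE];     /* optional additional info */
  int  handled;                  /* elements actually processed */
} XData;

/* Same shape as an Expat start-element handler. */
static void
start(void *data, const char *el, const char **attr)
{
  XData *px_data = (XData *) data;

  (void) attr;
  if (px_data->err > 0)
    return;                      /* error occurred earlier, get out */

  if (strcmp(el, "forbidden") == 0) {
    px_data->err = 1;
    snprintf(px_data->errtxt, sizeof px_data->errtxt,
             "unexpected element <%s>", el);
    return;
  }
  px_data->handled++;
}

/* Stand-in for the parser: calls the handler once per element name. */
static void
feed_events(XData *x_data, const char **names, int count)
{
  int i;

  assert(x_data != NULL && names != NULL);
  for (i = 0; i < count; i++) {
    if (x_data->err == 0)
      x_data->errln = i + 1;     /* pretend each element is on its own line */
    start(x_data, names[i], NULL);
  }
}
```

Note that the handlers are still called for the rest of the buffer after
the error; they simply return at once.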

A workaround like this has been posted already, but somebody
probably needs it roasted and served with soup, so I hope this helps.

Mike

- Get Started With Your Compaq, page 44:
-  If your computer still doesn't work, ensure that it is
-  plugged into an electrical outlet and try to turn it on.

From karl at waclawek.net  Thu Mar 20 15:58:08 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Thu Mar 20 15:58:16 2003
Subject: [Expat-discuss] Terminating expat - idea
References: <E50028F23FC6D3118B7D00609761083F023D8460@merkur.merlin.cz>
Message-ID: <016601c2ef23$694c5820$9e539696@citkwaclaww2k>


> Hi,
> I'd just like to share an idea for how to terminate Expat processing
> for any reason. This question appears periodically here (most
> asked...), the logical answer is 'use exception in C++/Delphi
> or setjmp/longjmp in C', which is - of course - ok. However
> i guess for those who're looking for platform/compiler/etc
> independent and a clear way (or are afraid of those mysterious
> jmps ;-) the following way could be acceptable
> (in scope of C language):
> 
> Create a struct, for example:
> 
> struct _XData
> {
>   int  err;                  // my own error number
>   int  errln;                // on which line
>   char errtxt[MAX_ERR_SIZE]; // optional additional info
> ...
> }; // there is some missing, i know ;-)
> typedef struct _XData XData;
> 
> and use it as userData. Main parsing loop:
> 
> ...
> XData x_data;
> x_data.err = 0;
> ...
> XML_SetUserData(parser, &x_data);
> ...
> do
> {
>   // get data
>   if(XML_Parse(parser, Buff, len, done) == XML_STATUS_ERROR)
>     'handle parser error & bye
>   else
>     if(x_data.err > 0)
>       'handle user error & bye or whatever
> } while(are_there_some_data);

If I understand this correctly you can only stop parsing on return from
XML_Parse (before the next buffer is processed). That means, 
for large buffers, and especially for documents smaller than the buffer size,
this won't really be satisfying.

We are actually working on a combined Push/Pull API for Expat versions > 2.0,
which will have return codes for the handlers, as it stands right now. Certain
return codes will cause Expat to stop parsing. 

Karl


From Stuart.Fulcher at abbeylife.co.uk  Fri Mar 21 11:57:41 2003
From: Stuart.Fulcher at abbeylife.co.uk (Fulcher, Stuart)
Date: Fri Mar 21 08:15:20 2003
Subject: [Expat-discuss] Newbie Building Expat on Win32
Message-ID: <5436C68F121FD94596EA505E3638570B07870B48@al1rs_s1.abbeylife.co.uk>

Hi,

I have successfully used expat on a unix development, and now am trying to
port my code to a Windows platform. However, I'm having major problems
getting expat to build. On Unix I just included the source files with my own
to build a static library. Attempting to do the same thing on Windows
(having downloaded the win32 expat 1.95.6 source) doesn't work.

The actual Visual C++ expat project/workspace all builds fine, but when I
pick up the source and add it to my own project, I'm getting problems where
the code attempts to reference 'expat_config.h', which I can't find amongst
the downloaded files. I can't find any reference to it in the documentation
supplied either.

What am I doing wrong?! Please help!

Stuart

From karl at waclawek.net  Fri Mar 21 09:24:54 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Fri Mar 21 09:25:04 2003
Subject: [Expat-discuss] Newbie Building Expat on Win32
References: <5436C68F121FD94596EA505E3638570B07870B48@al1rs_s1.abbeylife.co.uk>
Message-ID: <001a01c2efb5$a4b04650$9e539696@citkwaclaww2k>

> I have successfully used expat on a unix development, and now am trying to
> port my code to a Windows platform. However, I'm having major problems
> getting expat to build. On Unix I just included the source files with my own
> to build a static library. Attempting to do the same thing on Windows
> (having downloaded the win32 expat 1.95.6 source) doesn't work.
> 
> The actual Visual C++ expat project/workspace all builds fine, but when I
> pick up the source and add it to my own project, I'm getting problems where
> the code attempts to reference 'expat_config.h', which I can't find amongst
> the downloaded files. I can't find any reference to it in the documentation
> supplied either.
> 
> What am I doing wrong?! Please help!

Try to define COMPILED_FROM_DSP. If it still doesn't work, check
the sample projects (DSP files for MS VC++) for their settings.
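
For background: the Expat sources pick their configuration header with a
preprocessor test on COMPILED_FROM_DSP, which is why the missing
expat_config.h is requested when the macro is not defined. The sketch below
only mimics that switch; the function config_header is invented for
illustration, and the exact header names should be checked against your copy
of the sources.

```c
#include <assert.h>
#include <string.h>

#define COMPILED_FROM_DSP 1      /* normally set in the project settings */

/* Reports which configuration header the build would pull in.
   This only illustrates the preprocessor switch; it includes nothing. */
static const char *
config_header(void)
{
#ifdef COMPILED_FROM_DSP
  return "winconfig.h";
#else
  return "expat_config.h";
#endif
}
```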

Karl

From vapor at arizonagt.org  Sat Mar 22 18:37:31 2003
From: vapor at arizonagt.org (Vapor)
Date: Sat Mar 22 19:37:40 2003
Subject: [Expat-discuss] using expat in lcc-win32
References: <mailman.0.1048376067.3442.expat-discuss@libexpat.org>
Message-ID: <02d601c2f0d4$6479c4b0$6501a8c0@tnnashjcluff>

I have been trying to get lcc-win32 and expat to play nice together, but I
fear my inexperience is keeping that from happening.

I have copied the lib directory into my project's directory (testxml).  I have
added #include "lib/expat.h".  I have also copied libexpat.lib to both the
lib directory and to the root project directory.

I haven't attempted to do anything more than this and just compile the
program, which previously compiled.  I received the error d:\.....\testxml.c:
d:\........\lib\expat.h: 657 unknown enumeration 'XML_Status'

Any insight would be nice as I am quite new to C and Lcc-win32 and am
getting very frustrated with it.


From karl at waclawek.net  Sat Mar 22 22:46:12 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Sat Mar 22 22:42:45 2003
Subject: [Expat-discuss] using expat in lcc-win32
References: <mailman.0.1048376067.3442.expat-discuss@libexpat.org>
	<02d601c2f0d4$6479c4b0$6501a8c0@tnnashjcluff>
Message-ID: <000901c2f0ee$bfd168d0$0207a8c0@karl>


> I have been trying to get lcc-win32 and expat to play nice together, but I
> fear my inexperience is keeping that from happening.
> 
> I have copied the lib directory into my project's directory (testxml).  I have
> added #include "lib/expat.h".  I have also copied libexpat.lib to both the
> lib directory and to the root project directory.
> 
> I haven't attempted to do anything more than this and just compile the
> program, which previously compiled.  I received the error d:\.....\testxml.c:
> d:\........\lib\expat.h: 657 unknown enumeration 'XML_Status'
> 
> Any insight would be nice as I am quite new to C and Lcc-win32 and am
> getting very frustrated with it.

Check out expat.h from CVS. Some compilers have problems with the
version used in Expat 1.95.6.

Karl

From klersun at yahoo.com  Mon Mar 24 05:17:16 2003
From: klersun at yahoo.com (Levent Ersun)
Date: Mon Mar 24 08:17:18 2003
Subject: [Expat-discuss] example please...
Message-ID: <20030324131716.60192.qmail@web11806.mail.yahoo.com>

Can anyone send me an example of parsing an XML file in C? I want to parse the file and store its content in arrays. Please help.


From dmoore at viefinancial.com  Mon Mar 24 09:16:31 2003
From: dmoore at viefinancial.com (Moore, Dave)
Date: Mon Mar 24 09:16:36 2003
Subject: [Expat-discuss] Summary of Pull API thoughts
Message-ID: <2FE8C75C7A06D4118BB50008C7F7E831DE3D05@EXCHSRV>

I've tried to collect the many good ideas provided by Karl Waclawek for the
design of a Pull API on top of Expat.  

Any feedback (especially on omissions) is appreciated!  

Thanks,
Dave


-------------- next part --------------
0.0 This document attempts to capture design decisions and issues with
designing a Pull API for the Expat parser. 

1.0 Pull API Overview 

The Expat Pull API defines a new set of functions which allow a client
application to process nodes of an XML document one at a time by
"pulling" them in order via calls to a "Next()" function. The pull API
will be built "on top" of the existing API in the sense that its
implementation will include callbacks for startElementHandler,
endElementHandler, etc. These built in handlers will manage the state
necessary to return nodes to the caller of Next(). Next() will also ask
the client application for more raw data as necessary via a 
"GetNextBuffer" callback. 

A side effect of the pull API implementation is that it provides a
graceful means of interrupting XML parsing from inside a callback, even
when operating in "push" mode.

2.0 Pull API Specifics 

/* 
 * XML_GetBufferHandler 
 * 
 * Callback prototype used by PULL api to ask the user for more data. 
 * 
 * XML_SetGetBufferHandler 
 * 
 */ 

typedef void (*XML_GetBufferHandler)(XML_Parser parser, char **bufferPtr,
                                     int *bufferLen, int *isFinal);

XMLPARSEAPI(void) XML_SetGetBufferHandler(XML_Parser p,XML_GetBufferHandler handler); 
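
A handler matching this prototype might look like the sketch below, which
serves a document from memory in small chunks. The typedefs are repeated so
the fragment stands alone; memory_get_buffer, doc, doc_pos and CHUNK are
invented for this example, and a real handler would more likely read from a
file or socket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct XML_ParserStruct *XML_Parser;   /* opaque handle, as in expat.h */

typedef void (*XML_GetBufferHandler)(XML_Parser parser, char **bufferPtr,
                                     int *bufferLen, int *isFinal);

#define CHUNK 8                  /* deliberately small for the demo */

static char doc[] = "<a><b>text</b></a>";
static size_t doc_pos = 0;       /* how much has been handed out so far */

/* Hands out the next CHUNK bytes of doc and flags the last chunk. */
static void
memory_get_buffer(XML_Parser parser, char **bufferPtr,
                  int *bufferLen, int *isFinal)
{
  size_t left = strlen(doc) - doc_pos;
  size_t n = left < CHUNK ? left : CHUNK;

  (void) parser;
  assert(bufferPtr != NULL && bufferLen != NULL && isFinal != NULL);
  *bufferPtr = doc + doc_pos;
  *bufferLen = (int) n;
  doc_pos += n;
  *isFinal = (doc_pos == strlen(doc));
}
```

Since the buffer belongs to the application here, it has to stay valid until
the next request, as discussed under buffer management.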

New functions: 
/* XML_PullParserCreate 
 * 
 * Creates a parser based on encoding, installs internal callbacks for 
 * implementing Pull based API
 */ 
XML_Parser XML_PullParserCreate(const XML_Char *encoding); 

/* XML_Next 
 * 
 * Does enough parsing to process the next node of the document. 
 * Pulls in new data if current buffer is exhausted. 
 * 
 */ 
enum XML_Status XML_Next(XML_Parser p); 


Higher level XML_Next() functions are possible which allow the parser to
move ahead to specific nodes in the document.  For example, move to the
next element, or move to the next element named "Foo".

enum XML_Status XML_Next_______(XML_Parser p,...);


/* XML_GetNodeInfo 
 * 
 * Returns a pointer to a XML_NodeInfo structure containing name, character data, 
 * etc. information about the current node 
 * 
 */ 

XML_NodeInfo *XML_GetNodeInfo(XML_Parser p); 

struct XML_NodeInfo 
{ 
  XML_NodeType type; 
  char *name; 
  char *data; 
  int data_len; 
  /* More Here... */ 
};


3.0 Built-in callbacks and internal callbacks 

Callbacks will return "handling instructions" to the main processing
loop. The handling instructions are one of: 

XML_SKIP - Do not store node information in XML_NodeInfo, but continue
parsing. This is the default, and is compatible with existing parsing
code. 

XML_USE - Store node information in XML_NodeInfo, and return to the
caller of the parsing function. 

XML_STOP - Do not store node information in XML_NodeInfo, and return to
the caller of the parsing function.  Parsing is resumable. 

XML_ERR - Do not store node information in XML_NodeInfo, and return a
status of "XML_STATUS_ERROR" to the caller.  XML_GetErrorCode will
return "XML_USER_ERROR".  Parsing is resumable.
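
How a processing loop might react to these four codes can be sketched as
follows. Nothing here is existing Expat code: the enum values, the
Loop_Result type and the list of node names standing in for parsed input are
all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum XML_Handling { XML_SKIP, XML_USE, XML_STOP, XML_ERR };   /* proposed codes */
enum Loop_Result { LOOP_NODE, LOOP_STOPPED, LOOP_ERROR, LOOP_END };

typedef enum XML_Handling (*Node_Callback)(const char *node);

/* Walks nodes[*pos..count) and reacts to each callback result.
   *pos is left after the last node consumed, so a later call resumes. */
static enum Loop_Result
process_nodes(const char **nodes, size_t count, size_t *pos,
              Node_Callback cb, const char **current)
{
  assert(nodes != NULL && pos != NULL && cb != NULL && current != NULL);
  while (*pos < count) {
    const char *node = nodes[(*pos)++];

    switch (cb(node)) {
    case XML_SKIP:
      break;                     /* keep parsing, report nothing */
    case XML_USE:
      *current = node;           /* "store node information" */
      return LOOP_NODE;
    case XML_STOP:
      return LOOP_STOPPED;       /* resumable, nothing stored */
    case XML_ERR:
      return LOOP_ERROR;         /* caller would see XML_STATUS_ERROR */
    }
  }
  return LOOP_END;
}

/* Example callback: report element "b", stop at "halt", skip the rest. */
static enum XML_Handling
want_b(const char *node)
{
  if (node[0] == 'b' && node[1] == '\0')
    return XML_USE;
  if (node[0] == 'h')
    return XML_STOP;
  return XML_SKIP;
}
```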


Internal callbacks will be automatically installed for some (most? all?)
of expat's callbacks. StartElementHandler and EndElementHandler will
certainly be among the set of internal callbacks. These callbacks will
capture node information and will return XML_SKIP or XML_USE depending
on the current call to the XML_Next() family of functions.  For simple
pull parsing, the user will not have to provide any callbacks except the
'GetNextBuffer' callback.

User callbacks can still be installed in conjunction with the internal
callbacks above for advanced processing.  The internal callbacks will
perform their own processing as appropriate, and then forward the call
to the user callback.  The user callback could then apply additional
processing.  For instance, the user callback could convert a "XML_USE"
state to "XML_SKIP" for a node not of interest to an application.
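
The forwarding described above could be sketched like this. The way the
user callback learns about the internal decision (an extra 'proposed'
parameter) is purely an assumption, as are the Pull_State structure and all
function names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum XML_Handling { XML_SKIP, XML_USE, XML_STOP, XML_ERR };

/* Hypothetical user callback type: receives the internal decision. */
typedef enum XML_Handling (*User_Start_Handler)(const char *name,
                                                enum XML_Handling proposed);

struct Pull_State {
  const char *last_name;         /* node info captured internally */
  User_Start_Handler user_start; /* optional client callback */
};

/* Internal start-element callback: capture first, then forward. */
static enum XML_Handling
internal_start(struct Pull_State *st, const char *name)
{
  enum XML_Handling result = XML_USE;

  assert(st != NULL && name != NULL);
  st->last_name = name;
  if (st->user_start != NULL)
    result = st->user_start(name, result);
  return result;
}

/* Client callback that is not interested in <meta> elements. */
static enum XML_Handling
skip_meta(const char *name, enum XML_Handling proposed)
{
  return strcmp(name, "meta") == 0 ? XML_SKIP : proposed;
}
```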

4.0 Existing API Issues/Changes 

Some other future changes for expat may impact the design &
implementation of pull parsing.

4.1 Namespace reporting 

Expat may return names as separate localName, prefix and uri parameters.
If callback signatures are changing already to pass namespace
information, we can have more options in extending callbacks to return
our node handling codes. 

4.2 Entity expansion

Due to the nature of reporting attribute values as one chunk
of data, entities within the value are silently expanded.
The same applies to parameter entities in entity values.  If entities
are expanded differently, we may have to add new node types to allow a
pull based parser to move through attribute values.

4.3 Additional internal changes

An XML_NodeInfo structure will probably be added as an internal data
structure that is filled as each node is reached.

Additional callback function pointers are needed for forwarding from
internal pull callbacks to client callbacks.


5.0 Buffer Management 

When the parser calls the user's XML_GetBufferHandler, it expects the
user to return a pointer to buffer space containing new data.  

The user may accomplish this by allocating a new buffer via
XML_GetBuffer and then copying/reading data into that buffer, or by
referring to one of its own buffers.  If it refers to its own buffer, a
client application must ensure that this buffer remains valid until the
next call to XML_GetBufferHandler or until parsing is complete.

The parser must be able to determine when the user provided buffer was
created via XML_GetBuffer.  If the buffer was -not- created by
XML_GetBuffer, then the parser will be responsible for preserving any
partial tokens from the last buffer and prepending them to a user buffer
when parsing continues.
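
The "preserve and prepend" step could look roughly like the C sketch
below.  It is only an illustration of the idea, not parser code: the
Carry struct and the carry_save and carry_prepend functions are
hypothetical names, and a real implementation would likely avoid the
extra copy where it can.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holding area for bytes the tokenizer could not consume
   because a token was cut off at the end of the previous buffer. */
typedef struct {
    char  *data;
    size_t len;
} Carry;

/* Save the unconsumed tail [tail, tail + n) of the previous buffer.
   Returns 0 on success, -1 on allocation failure. */
static int carry_save(Carry *c, const char *tail, size_t n)
{
    char *p = NULL;
    if (n > 0) {
        p = malloc(n);
        if (p == NULL)
            return -1;
        memcpy(p, tail, n);
    }
    free(c->data);
    c->data = p;
    c->len = n;
    return 0;
}

/* Join the saved tail and the next user buffer into one freshly
   allocated block and clear the carry.  The caller frees the result.
   Returns NULL on allocation failure. */
static char *carry_prepend(Carry *c, const char *buf, size_t n,
                           size_t *outLen)
{
    size_t total = c->len + n;
    char *joined = malloc(total > 0 ? total : 1);
    if (joined == NULL)
        return NULL;
    if (c->len > 0)
        memcpy(joined, c->data, c->len);
    if (n > 0)
        memcpy(joined + c->len, buf, n);
    free(c->data);
    c->data = NULL;
    c->len = 0;
    *outLen = total;
    return joined;
}
```

For example, if the previous buffer ended in "<ite" and the next user
buffer starts with "m/>", the joined block holds the complete "<item/>"
token.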

The main processing loops request more data by returning a new error
code, XML_ERROR_BUFFER to their caller (i.e. XML_Next()).

"Push mode" buffer management could be changed to work more like the pull
model (using the XML_GetBufferHandler() callback instead of the 


6.0 Internal Complications 

When XML_USE is returned and parsing is interrupted, some cleanup may be
deferred until the next call to XML_Next().

 It looks as if in most cases the cleanup after a callback is quite simple.
  E.g. for startElementHandler (non-empty element) it is
    poolClear(&tempPool);
  for the endElementHandler it is a while loop dealing with namespace bindings.
  All we need to do is to have a switch statement at the entry of XML_Next()
  that performs the cleanup depending on how we returned on the last call
  (assuming the cleanup was skipped when XML_USE was returned).
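
  A minimal C sketch of that switch, under heavy assumptions: LastReturn,
  PullState and deferred_cleanup are invented names, and the two integer
  fields merely stand in for tempPool and the namespace binding list,
  whose real handling lives inside the parser.

```c
/* Hypothetical record of how the previous XML_Next() call returned. */
typedef enum {
    RET_NONE,           /* nothing pending */
    RET_START_ELEMENT,  /* returned with XML_USE from a start tag */
    RET_END_ELEMENT     /* returned with XML_USE from an end tag */
} LastReturn;

typedef struct {
    LastReturn last;
    int tempPoolInUse;    /* stand-in for tempPool still holding data */
    int pendingBindings;  /* stand-in for namespace bindings to release */
} PullState;

/* Would be called at the entry of XML_Next(): finish whatever cleanup
   was skipped when the previous call returned early. */
static void deferred_cleanup(PullState *s)
{
    switch (s->last) {
    case RET_START_ELEMENT:
        s->tempPoolInUse = 0;           /* stands in for poolClear(&tempPool) */
        break;
    case RET_END_ELEMENT:
        while (s->pendingBindings > 0)  /* stands in for the binding loop */
            s->pendingBindings--;
        break;
    case RET_NONE:
        break;
    }
    s->last = RET_NONE;
}
```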

For an empty element, XML_Next is tricky, since Expat assumes that both
callbacks (start and end) happen without interruption by an end of buffer.

Options:

- We add an EMPTY_TAG node type.  Internal callbacks for
StartElementHandler and EndElementHandler would somehow detect the empty
tag and set up the XML_NodeInfo appropriately.

- We set up an emptyTagProcessor, so that the next call to
  XML_Next() will call this as the processor, which will
  return with the empty element's tag name. That's a common way
  to handle parser state in Expat.


From karl at waclawek.net  Mon Mar 24 09:56:09 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Mon Mar 24 09:57:39 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
References: <2FE8C75C7A06D4118BB50008C7F7E831DE3D05@EXCHSRV>
Message-ID: <001501c2f215$8166ace0$9e539696@citkwaclaww2k>


> Any feedback (especially on omissions) is appreciated!  

Thanks, Dave for an excellent summary of our discussions!

I'd like to comment some more on a few of the topics mentioned:

> User callbacks can still be installed in conjunction with the internal
> callbacks above for advanced processing.  The internal callbacks will
> perform their own processing as appropriate, and then forward the call
> to the user callback.  The user callback could then apply additional
> processing.  For instance, the user callback could convert a "XML_USE"
> state to "XML_SKIP" for a node not of interest to an application.

I'd say the user callback could be called first, before the internal
callback performs its own work, since it is more efficient to know
if a node is skipped *before* the internal call-back does any work on it.
Also, the internal call-back does not generate any new info that the
user call-back requires.

> For an empty element, XML_Next is tricky, since Expat assumes
> both callbacks (start and end) to happen without interruption by an end of buffer.
> 
> Options:
> 
> - We add an EMPTY_TAG node type.  Internal callbacks for
> StartElementHandler and EndElementHandler would somehow detect the empty
> tag and set up the XML_NodeInfo appropriately.

Actually, the Expat tokenizer already provides that information, so we just
need to store it somewhere for the call-backs to access.
There is one more advantage to an EMPTY_TAG node type: for an application
that would like to know this information, it is quite an effort to
re-construct it (set a flag in startElementHandler, and install every
possible handler that might get called, so that it can reset that flag
before the endElementHandler is executed). Providing this info from
Expat comes essentially for free, since the tokenizer provides it anyway.

- One more internal complication: So far we have detected one internal
data structure that is stack based, the linked list openInternalEntities.
Since it will never exist across calls to XML_Parse(Buffer) this is
currently not a problem. With a Pull API however, the parser may stop,
that is, return from XML_Next(), in the middle of an internal entity,
thus unwinding the stack and destroying openInternalEntities prematurely.
Therefore we need to switch this linked list to being heap allocated.
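
To illustrate what "heap allocated" means here, a small C sketch follows.
It is not taken from xmlparse.c: the OpenEntity struct and the
open_entity_* functions are hypothetical and do not mirror the fields of
the real list.  The point is only that nodes obtained with malloc()
survive a return from XML_Next(), while nodes living in a stack frame
would not.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-allocated list node for an open internal entity. */
typedef struct OpenEntity {
    const char *name;
    struct OpenEntity *next;
} OpenEntity;

/* Push a newly opened entity; returns 0 on success, -1 if out of memory. */
static int open_entity_push(OpenEntity **head, const char *name)
{
    OpenEntity *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->name = name;
    e->next = *head;
    *head = e;
    return 0;
}

/* Pop the innermost open entity once its replacement text is consumed. */
static void open_entity_pop(OpenEntity **head)
{
    OpenEntity *e = *head;
    if (e != NULL) {
        *head = e->next;
        free(e);
    }
}

/* Check whether an entity is already open, e.g. to detect recursion. */
static int open_entity_is_open(const OpenEntity *head, const char *name)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->name, name) == 0)
            return 1;
    return 0;
}
```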

Karl


From karl at waclawek.net  Mon Mar 24 10:05:25 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Mon Mar 24 10:07:00 2003
Subject: [Expat-discuss] example please...
References: <20030324131716.60192.qmail@web11806.mail.yahoo.com>
Message-ID: <004701c2f216$cc6b94c0$9e539696@citkwaclaww2k>


> can anyone send me an example of parsing an XML file in C...I want to parse the file and store its
> content in the arrays...please help

The Expat package contains the elements, outline and xmlwf sample applications.

Karl


From vapor at arizonagt.org  Mon Mar 24 16:29:52 2003
From: vapor at arizonagt.org (Vapor)
Date: Mon Mar 24 17:30:01 2003
Subject: [Expat-discuss] using expat in lcc-win32
References: <mailman.0.1048376067.3442.expat-discuss@libexpat.org>
	<02d601c2f0d4$6479c4b0$6501a8c0@tnnashjcluff>
	<000901c2f0ee$bfd168d0$0207a8c0@karl>
Message-ID: <003601c2f254$e46d2cd0$a50c2e0a@tnnashjcluff>

Thx for the reply.  I have updated it, but I think my problem is related to
my inexperience with lcc-win32.  Does anybody have a demo project for use
with lcc-win32?  I can't seem to get it to link with the static libs at all.
Again, I am fairly inexperienced with C and many of the compilers available
out there.  I have tried the comp.compilers.lcc newsgroup for help with
this and can't seem to get anywhere with that either.  Basically, what I am
hoping for is a simple, easy-to-understand example with lcc-win32 where I can
see all of the options in the project and see a program compile successfully
using the expat libs.

Vapor
----- Original Message -----
From: "Karl Waclawek" <karl@waclawek.net>
To: "Vapor" <vapor@arizonagt.org>; <expat-discuss@libexpat.org>
Sent: Saturday, March 22, 2003 9:46 PM
Subject: Re: [Expat-discuss] using expat in lcc-win32


>
>
> > I have been trying to get lcc-win32 and expat to play nice together, but I
> > fear my inexperience is keeping that from happening.
> >
> > I have copied the lib directory into my project's directory (testxml).  I
> > have added #include "lib/expat.h".  I have also copied libexpat.lib to both
> > the lib directory and to the root project directory.
> >
> > I haven't attempted to do anything more than this and just compile the
> > program which previously compiled.  I received the error d:\.....\testxml.c:
> > d:\........\lib\expat.h: 657 unknown enumeration 'XML_Status'
> >
> > Any insight would be nice as I am quite new to C and Lcc-win32 and am
> > getting very frustrated with it.
>
> Check out expat.h from CVS. Some compilers have problems with the
> version used in Expat 1.95.6.
>
> Karl


From karl at waclawek.net  Mon Mar 24 20:51:25 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Mon Mar 24 20:47:55 2003
Subject: [Expat-discuss] using expat in lcc-win32
References: <mailman.0.1048376067.3442.expat-discuss@libexpat.org>
	<02d601c2f0d4$6479c4b0$6501a8c0@tnnashjcluff>
	<000901c2f0ee$bfd168d0$0207a8c0@karl>
	<003601c2f254$e46d2cd0$a50c2e0a@tnnashjcluff>
Message-ID: <000a01c2f271$0d728370$0207a8c0@karl>


> Thx for the reply.  I have updated it, but I think my problem is related to
> my inexperience with lcc-win32.  Does anybody have a demo project for use
> with lcc-win32?  I can't seem to get it to link with the static libs at all.

For MS VC++ you define XML_STATIC. Is lcc different?
Check the top of expat.h.

Karl

From xcross at us.ibm.com  Tue Mar 25 09:55:17 2003
From: xcross at us.ibm.com (Chris Cross)
Date: Tue Mar 25 09:56:52 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
Message-ID: <OF26998ECF.5719B3DD-ON85256CF4.00506F8F-85256CF4.0051F753@us.ibm.com>


I know this is off-topic, but while you have your design juices flowing I
thought I'd ask a question that we're struggling with in my group.

My group is using expat to implement a VoiceXML processor and a multimodal
browser using the XHTML + Voice proposal in the W3C
(http://www.w3.org/TR/xhtml+voice/.) We're also considering its use by the
grammar compiler in our speech engines in order to support the Speech
Recognition Grammar Specification (http://www.w3.org/TR/speech-grammar/)

However, we have a sticky requirement for our language support to be able
to switch dynamically between 1 byte ascii and 2 byte Unicode. By doing
this, our European languages require half the space of the Asian languages,
which is very important for our embedded customers who squeal at every kilobyte
we consume.

How hard would it be to change the character size from a build-time to a
run-time decision in expat?

thanks,
chris


Chris Cross
IBM Boca Raton
xcross@us.ibm.com
voice 561.862.2102  t/l 975.2102
fax 561.862.3922


From karl at waclawek.net  Tue Mar 25 11:35:34 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 25 11:35:43 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
References: <OF26998ECF.5719B3DD-ON85256CF4.00506F8F-85256CF4.0051F753@us.ibm.com>
Message-ID: <002701c2f2ec$8f3df080$9e539696@citkwaclaww2k>

> I know this is off-topic, but while you have your design juices flowing I
> thought I'd ask a question that we're struggling with in my group.
> 
> My group is using expat to implement a VoiceXML processor and a multimodal
> browser using the XHTML + Voice proposal in the W3C
> (http://www.w3.org/TR/xhtml+voice/.) We're also considering its use by the
> grammar compiler in our speech engines in order to support the Speech
> Recognition Grammar Specification (http://www.w3.org/TR/speech-grammar/)

Cool. Which version of Expat?
 
> However, we have a sticky requirement for our language support to be able
> to switch dynamically between 1 byte ascii and 2 byte Unicode. By doing
> this, our European languages require half the space of the Asian languages,
> which is very important for our embedded customers who squeal at every kilobyte
> we consume.
> 
> How hard would it be to change the character size from a build-time to a
> run-time decision in expat?

It looks hard, since even the API itself is statically tied to the
definition of XML_Char.

However, you should be able to compile two libraries (XML_Char defined
as char or wchar_t), and dynamically load whichever you need at runtime,
and even switch between them. Why would that not work for you?

Karl


From fdrake at acm.org  Tue Mar 25 11:54:53 2003
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue Mar 25 11:55:38 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
In-Reply-To: <OF26998ECF.5719B3DD-ON85256CF4.00506F8F-85256CF4.0051F753@us.ibm.com>
References: <OF26998ECF.5719B3DD-ON85256CF4.00506F8F-85256CF4.0051F753@us.ibm.com>
	<002701c2f2ec$8f3df080$9e539696@citkwaclaww2k>
Message-ID: <16000.35293.37806.326224@grendel.zope.com>


Chris Cross writes:
 > However, we have a sticky requirement for our language support to be able
 > to switch dynamically between 1 byte ascii and 2 byte Unicode. By doing
 > this, our European languages require half the space of the Asian languages,
 > which is very important for our embedded customers who squeal at every kilobyte
 > we consume.
 > 
 > How hard would it be to change the character size from a build-time to a
 > run-time decision in expat?

Karl Waclawek writes:
 > It looks hard, since even the API itself is statically tied to the
 > definition of XML_Char.
 > 
 > However, you should be able to compile two libraries (XML_Char defined
 > as char or wchar_t), and dynamically load whichever you need at runtime,
 > and even switch between them. Why would that not work for you?

That sounds fairly tedious to me.

Recall that Expat tends to report data in fairly small chunks for
typical applications.  Even in plain text (PCDATA), Expat breaks data
at line boundaries.  If you want further control over the amount of
data reported in the character data callback, limit the amount of data
passed into Expat for any XML_Parse() or XML_ParseBuffer() call.

This can be used to an application's advantage, especially if there's
concern for the amount of memory being consumed.  Compile Expat with
the appropriate output encoding for your primary audience, and then
re-encode if necessary in the application logic.  This should be easy
to implement and allows support for output encodings other than UTF-8
or UTF-16.
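
As a rough sketch of the "re-encode in the application" step, here is a
small C routine that converts one chunk of UTF-8 to UTF-16 code units.
It assumes Expat was built for UTF-8 output and that each chunk holds
only complete UTF-8 sequences.  The name utf8_to_utf16 is made up, and
the routine does not reject overlong forms or encoded surrogates, so
treat it as a starting point rather than a validated converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n bytes of UTF-8 into UTF-16 code units.
   Returns the number of units written, or (size_t)-1 if the input is
   malformed or truncated, or if 'out' (capacity 'cap' units) is too small. */
static size_t utf8_to_utf16(const char *s, size_t n,
                            uint16_t *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char b = (unsigned char)s[i];
        uint32_t cp;
        size_t extra;  /* number of continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;      /* stray continuation or invalid lead */

        if (extra > n - i - 1)
            return (size_t)-1;       /* sequence cut off at end of chunk */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = (unsigned char)s[i + k];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp < 0x10000) {
            if (o + 1 > cap)
                return (size_t)-1;
            out[o++] = (uint16_t)cp;
        } else {
            if (cp > 0x10FFFF || o + 2 > cap)
                return (size_t)-1;
            cp -= 0x10000;           /* split into a surrogate pair */
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return o;
}
```

A character data callback could call this on each chunk it receives and
hand the result to whichever part of the application wants 2-byte text.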


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From karl at waclawek.net  Tue Mar 25 12:08:47 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 25 12:08:56 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
References: 
	<OF26998ECF.5719B3DD-ON85256CF4.00506F8F-85256CF4.0051F753@us.ibm.com><002701c2f2ec$8f3df080$9e539696@citkwaclaww2k>
	<16000.35293.37806.326224@grendel.zope.com>
Message-ID: <003801c2f2f1$32c5a910$9e539696@citkwaclaww2k>

> Karl Waclawek writes:
>  > It looks hard, since even the API itself is statically tied to the
>  > definition of XML_Char.
>  > 
>  > However, you should be able to compile two libraries (XML_Char defined
>  > as char or wchar_t), and dynamically load whichever you need at runtime,
>  > and even switch between them. Why would that not work for you?
> 
> That sounds fairly tedious to me.

But the effort to load a library and set up the pointers to the exported
functions (that's at least how it works in Windows) is a one-time effort
that can be hidden in a re-usable function or class.
 
> Recall that Expat tends to report data in fairly small chunks for
> typical applications.  Even in plain text (PCDATA), Expat breaks data
> at line boundaries.  If you want further control over the amount of
> data reported in the character data callback, limit the amount of data
> passed into Expat for any XML_Parse() or XML_ParseBuffer() call.
> 
> This can be used to an application's advantage, especially if there's
> concern for the amount of memory being consumed.  Compile Expat with
> the appropriate output encoding for your primary audience, and then
> re-encode if necessary in the application logic.  This should be easy
> to implement and allows support for output encodings other than UTF-8
> or UTF-16.

This would work, of course. But maybe on embedded devices CPU cycles
are at a premium. Depends on Chris' circumstances.

Karl

From xcross at us.ibm.com  Tue Mar 25 12:58:44 2003
From: xcross at us.ibm.com (Chris Cross)
Date: Tue Mar 25 12:58:59 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
Message-ID: <OFEDBAB5A5.A05159A0-ON85256CF4.0062B226-85256CF4.0062C2E7@us.ibm.com>


Well the answer is maybe not so cool. We're using 1.2 with Opera, but
planning on jumping to your new version if/when we start using it elsewhere
in the product.

I'd thought of the brute force approach of two DLLs but that looks bad in
an embedded stack. On the other hand, if you're switching between Japanese
and English at runtime, 88KB for the XML parser is the least of your
worries ;-)


thanks,
chris


Chris Cross
IBM Boca Raton
xcross@us.ibm.com
voice 561.862.2102  t/l 975.2102
fax 561.862.3922




From karl at waclawek.net  Tue Mar 25 13:28:11 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Tue Mar 25 13:28:20 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
References: <OFEDBAB5A5.A05159A0-ON85256CF4.0062B226-85256CF4.0062C2E7@us.ibm.com>
Message-ID: <005001c2f2fc$4a4288f0$9e539696@citkwaclaww2k>


> Well the answer is maybe not so cool. We're using 1.2 with Opera, but
> planning on jumping to your new version if/when we start using it elsewhere
> in the product.

Are you saying that UTF-16 output worked with version 1.2?
 
> I'd thought of the brute force approach of two DLLs but that looks bad in
> an embedded stack.

I have to admit that I am not experienced with the constraints
of "embedded" programming. Could you enlighten me why this would be bad?

You wouldn't need to keep them both loaded at the same time, unless
you need to use both capabilities simultaneously, in which case
Fred's approach would make a lot more sense.

> On the other hand, if you're switching between Japanese
> and English at runtime, 88KB for the XML parser is the least of your
> worries ;-)

Well, Expat 1.95.6 is bigger than 1.2, hopefully not too big.

Karl


From vapor at arizonagt.org  Tue Mar 25 16:33:03 2003
From: vapor at arizonagt.org (Vapor)
Date: Tue Mar 25 17:33:13 2003
Subject: [Expat-discuss] using expat in lcc-win32
References: <mailman.0.1048376067.3442.expat-discuss@libexpat.org>
	<02d601c2f0d4$6479c4b0$6501a8c0@tnnashjcluff>
	<000901c2f0ee$bfd168d0$0207a8c0@karl>
	<003601c2f254$e46d2cd0$a50c2e0a@tnnashjcluff>
	<000a01c2f271$0d728370$0207a8c0@karl>
Message-ID: <006501c2f31e$80ab17a0$a50c2e0a@tnnashjcluff>

Thanks a lot Karl, the define helped.  I found the spot in lcc that allows
for defines in the project and added XML_STATIC.  I was able to once again
compile my program.  The problem I get now is that it seems unable to find a
reference to XML_ParserCreate.

Next I decided to add a few simple lines of code and I broke it again.  Is
there a reason why just adding XML_Parser parser = XML_ParserCreate(NULL);
to my code would cause this?  I have a suspicion that libexpat.lib isn't
properly added to the project, or maybe I need some other files from expat
officially added to the project, but I don't really know.  I am not familiar
at all with working outside of a single source file.

Vapor


From vapor at arizonagt.org  Tue Mar 25 18:52:15 2003
From: vapor at arizonagt.org (Vapor)
Date: Wed Mar 26 08:59:42 2003
Subject: [Expat-discuss] using expat in lcc-win32
References: 
	<mailman.0.1048376067.3442.expat-discuss@libexpat.org><02d601c2f0d4$6479c4b0$6501a8c0@tnnashjcluff><000901c2f0ee$bfd168d0$0207a8c0@karl><003601c2f254$e46d2cd0$a50c2e0a@tnnashjcluff><000a01c2f271$0d728370$0207a8c0@karl>
	<006501c2f31e$80ab17a0$a50c2e0a@tnnashjcluff>
Message-ID: <004201c2f34e$db05bc20$6501a8c0@tnnashjcluff>

Um, never mind, I got it figured out.  Again, thanks for the help with
XML_STATIC.  I am not all that familiar with preprocessor commands, and that
helped me out a lot.

Vapor


From xcross at us.ibm.com  Wed Mar 26 11:11:41 2003
From: xcross at us.ibm.com (Chris Cross)
Date: Wed Mar 26 11:12:33 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
Message-ID: <OF2A456255.C15A219D-ON85256CF5.005898B2-85256CF5.0058F5F3@us.ibm.com>


Actually, we're having to do a flavor of this anyway on Linux, where the
wchar.h Unicode interface is 4 bytes while Opera is still 2. Oh by the way,
so is almost everyone else in the world but we wrote our VoiceXML processor
to wchar.h. We'll probably be moving to an internal Unicode implementation
that will free us from wchar definition...

thanks,
chris


Chris Cross
IBM Boca Raton
xcross@us.ibm.com
voice 561.862.2102  t/l 975.2102
fax 561.862.3922




From karl at waclawek.net  Wed Mar 26 11:18:59 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Wed Mar 26 11:19:39 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
References: <OF2A456255.C15A219D-ON85256CF5.005898B2-85256CF5.0058F5F3@us.ibm.com>
Message-ID: <004d01c2f3b3$687c5730$9e539696@citkwaclaww2k>

> Actually, we're having to do a flavor of this anyway on Linux, where the
> wchar.h Unicode interface is 4 bytes while Opera is still 2. 

What compiler are you using?
gcc on Linux allows a switch to make wchar_t 2 bytes instead of 4.
That is how Expat is compiled for UTF-16 output.

Karl


From xcross at us.ibm.com  Wed Mar 26 12:22:28 2003
From: xcross at us.ibm.com (Chris Cross)
Date: Wed Mar 26 12:48:59 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
Message-ID: <OFC06AA868.9FDCB7FB-ON85256CF5.005F08F4-85256CF5.005F70DA@us.ibm.com>


We're using gcc 2.95.3 (a windows cross-compiler build) for our Linux
targets. I'm a little shaky on this subject but I think that the switch
you're referring to is supported in gcc 3.x, right?

chris


Chris Cross
IBM Boca Raton
xcross@us.ibm.com
voice 561.862.2102  t/l 975.2102
fax 561.862.3922




From karl at waclawek.net  Wed Mar 26 12:56:46 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Wed Mar 26 12:56:58 2003
Subject: [Expat-discuss] Re: Summary of Pull API thoughts
References: <OFC06AA868.9FDCB7FB-ON85256CF5.005F08F4-85256CF5.005F70DA@us.ibm.com>
Message-ID: <005f01c2f3c1$118c4bc0$9e539696@citkwaclaww2k>


> We're using gcc 2.95.3 (a windows cross-compiler build) for our Linux
> targets. I'm a little shaky on this subject but I think that the switch
> you're referring to is supported in gcc 3.x, right?

No, it should work on earlier versions too. We have been using it for a while.
The switch is -fshort-wchar . Check the README in the current distribution.

Karl

From dr at netscape.com  Wed Mar 26 15:34:13 2003
From: dr at netscape.com (Dan Rosen)
Date: Wed Mar 26 18:34:05 2003
Subject: [Expat-discuss] XML_Parse const char *s param?
Message-ID: <3E8238F5.8070609@netscape.com>

I have a question about the XML_Parse function, which I unfortunately 
couldn't find an answer to in the documentation:

XML_Parse() takes a |const char *s| parameter, which is a buffer 
containing some or all of a document. This is notably different from the 
callback functions which take |XML_Char*|. That sort of indicates to me 
that you can only pass in a buffer of single-byte characters (whether 
those are UTF-8, ISO-8859-1 or whatever, they're all |char*|).

So my question is, what if I have my data in UCS-2? Do I have to convert 
it all down to UTF-8 before passing it into XML_Parse? I tried simply 
casting the wchar_t* buffer I have down to a char* buffer, but that 
obviously didn't work...

Can somebody please explain to me exactly what XML_Parse() expects of a 
buffer that's passed to it?

Thanks,
Dan


From karl at waclawek.net  Wed Mar 26 19:13:22 2003
From: karl at waclawek.net (Karl Waclawek)
Date: Wed Mar 26 19:09:42 2003
Subject: [Expat-discuss] XML_Parse const char *s param?
References: <3E8238F5.8070609@netscape.com>
Message-ID: <000701c2f3f5$addf60f0$0207a8c0@karl>


> I have a question about the XML_Parse function, which I unfortunately 
> couldn't find an answer to in the documentation:
> 
> XML_Parse() takes a |const char *s| parameter, which is a buffer 
> containing some or all of a document. This is notably different from the 
> callback functions which take |XML_Char*|. That sort of indicates to me 
> that you can only pass in a buffer of single-byte characters (whether 
> those are UTF-8, ISO-8859-1 or whatever, they're all |char*|).
> 
> So my question is, what if I have my data in UCS-2? Do I have to convert 
> it all down to UTF-8 before passing it into XML_Parse? I tried simply 
> casting the wchar_t* buffer I have down to a char* buffer, but that 
> obviously didn't work...

Expat expects a buffer of bytes, that is all.
Is there a better way to declare such a buffer?
 
> Can somebody please explain to me exactly what XML_Parse() expects of a 
> buffer that's passed to it?

Just a buffer of bytes, regardless of encoding.
Expat handles buffer boundaries properly, so even
if a buffer ends in the middle of a multi-byte character
it is not a problem.
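
To show what "a buffer of bytes" means for 2-byte data, here is a small
C sketch.  I can't tell from your description why the cast failed, so
this only illustrates one thing to check: XML_Parse() takes its length
in bytes, not in characters.  ByteSink, feed_utf16 and count_bytes are
made-up names, and ByteSink merely imitates the (bytes, len, isFinal)
shape of the parse call so that the example does not need the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sink with the shape of a byte-oriented parse call:
   raw bytes, a length in bytes, and a final-chunk flag. */
typedef int (*ByteSink)(void *ctx, const char *bytes, int len, int isFinal);

/* Feed UTF-16 code units as raw bytes.  The length handed to the sink
   is the number of bytes, i.e. code units times the unit size. */
static int feed_utf16(ByteSink sink, void *ctx,
                      const uint16_t *units, size_t nunits, int isFinal)
{
    return sink(ctx, (const char *)units,
                (int)(nunits * sizeof units[0]), isFinal);
}

/* Example sink: it only totals the bytes it is given. */
static int count_bytes(void *ctx, const char *bytes, int len, int isFinal)
{
    (void)bytes;
    (void)isFinal;
    *(size_t *)ctx += (size_t)len;
    return 1;
}
```

Feeding five code units (a byte order mark followed by "<a/>") through
feed_utf16() hands the sink ten bytes.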

Karl