From karl at waclawek.net  Fri Sep  1 15:29:12 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 01 Sep 2006 09:29:12 -0400
Subject: [Expat-discuss] Linking
In-Reply-To: <Pine.LNX.4.10.10608292158270.966-100000@abc123.dowco.com>
References: <Pine.LNX.4.10.10608292158270.966-100000@abc123.dowco.com>
Message-ID: <44F835A8.1090803@waclawek.net>

Tom Younger wrote:
> Hi there.
>
> I have a question about linking.
>
> I would like to link the Expat library statically.  When I link to the
> Linux libraries, this happens properly, however when I link my project
> under Windows, I can only get the program to work if I include the path to
> the libexpat.dll in my system path.
>
> Even though I include StaticLibs\libexpatMT.lib in my link command, without
> Libs\libexpat.lib, it can't resolve some symbols.  It appears as though
> libexpat.lib loads the DLL at run-time.
For static linking you need to define a specific symbol.
If I remember correctly it is XML_STATIC.

Karl

From mkanaga at gmail.com  Thu Sep  7 23:05:20 2006
From: mkanaga at gmail.com (m k)
Date: Thu, 7 Sep 2006 14:05:20 -0700
Subject: [Expat-discuss] Expat, XML-Parser make test fails on AIX 5.3
Message-ID: <55b890c10609071405k1afc99b3k5e58ef7f42d33801@mail.gmail.com>

Greetings!

IBM provides a pre-built Perl on AIX 5.3, along with a script that
toggles it between 32-bit and 64-bit mode.

1. In 32-bit mode:

I can compile Expat-1.95.7 and then XML-Parser 2.34 (the Perl module), and
"make test" for XML-Parser runs without any problems.

2. In 64-bit mode:

I installed the same versions of Expat and XML-Parser, and everything builds.
But when I run "make test" for XML-Parser, I get the errors shown below
from Expat.so.

3. I get the same errors with a downloaded Perl 5.8.8, compiled on this
server using IBM Visual Age C/C++, with Expat 2.0.0 and XML-Parser 2.34.

Any pointers or help would be very much appreciated.

Regards,
-Murali


root@sfobench01(./XML-Parser-2.34)make test
make[1]: Entering directory `/chroot/tmp/perl/XML-Parser-2.34/Expat'
make[1]: Leaving directory `/chroot/tmp/perl/XML-Parser-2.34/Expat'
PERL_DL_NONLAZY=1 /bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/astress.........Can't load '/chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so' for module XML::Parser::Expat: rtld: 0712-001 Symbol XML_Parse was referenced
      from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition
      of the symbol was not found.
rtld: 0712-001 Symbol XML_SetNamespaceDeclHandler was referenced
      from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition
      of the symbol was not found.
rtld: 0712-001 Symbol XML_SetElementHandler was referenced
      from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition
      of the symbol was not found.
rtld: 0712-001 Symbol XML_SetUnknownEncodingHandler was referenced
      from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition
      of the symbol was not found.
rtld: 0712-001 Symbol XML_SetEndCdataSectionHandler was referenced
      from module /chroot/tmp/perl/XML-Parser-2.34/blib/arch/auto/XML/Parser/Expat/Expat.so(), but a runtime definition
      of the symbol was not found.


root@sfobench01(./XML-Parser-2.34)perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=aix, osvers=5.3.0.4, archname=aix-64all
    uname='aix sfobench01 3 5 000aba68d600 '
    config_args='-Duse64bitall'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=-1 -qnoansialias -DUSE_NATIVE_DLOPEN -I/usr/local/include -q64 -DUSE_64_BIT_ALL -q64',
    optimize='-O',
    cppflags='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=-1 -qnoansialias -DUSE_NATIVE_DLOPEN -I/usr/local/include'
    ccversion='7.0.0.0', gccversion='', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=87654321
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='ld', ldflags ='-brtl -bdynamic -bmaxdata:0x80000000 -L/usr/local/lib -q64 -b64'
    libpth=/usr/local/lib /lib /usr/lib /usr/ccs/lib
    libs=-lbind -lnsl -ldbm -ldl -lld -lm -lcrypt -lc -lbsd
    perllibs=-lbind -lnsl -ldl -lld -lm -lcrypt -lc -lbsd
    libc=, so=a, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_aix.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -bE:/chroot/perl-5.8.8/lib/5.8.8/aix-64all/CORE/perl.exp'
    cccdlflags=' ', lddlflags='-b64  -bhalt:4 -bexpall -G -bnoentry -lc -L/usr/local/lib'


Characteristics of this binary (from libperl):
  Compile-time options: PERL_MALLOC_WRAP USE_64_BIT_ALL USE_64_BIT_INT
                        USE_LARGE_FILES USE_PERLIO
  Built under aix
  Compiled at Sep  6 2006 15:26:55
  @INC:
    /chroot/perl-5.8.8/lib/5.8.8/aix-64all
    /chroot/perl-5.8.8/lib/5.8.8
    /chroot/perl-5.8.8/lib/site_perl/5.8.8/aix-64all
    /chroot/perl-5.8.8/lib/site_perl/5.8.8
    /chroot/perl-5.8.8/lib/site_perl
    .
root@sfobench01(./XML-Parser-2.34)

From chenming442 at gmail.com  Fri Sep  8 13:42:44 2006
From: chenming442 at gmail.com (Chen Ming)
Date: Fri, 8 Sep 2006 19:42:44 +0800
Subject: [Expat-discuss] how XML_Parse work with processing instruction?
Message-ID: <be0d90e00609080442l3fb89671s38d0234383823ef6@mail.gmail.com>

Hi everyone, I tried the following code in both VC7 and Dev-CPP, but the
XML_Parse function does not seem to work properly with a processing
instruction like <?xml version="1.0" ?>

In the following code, the first time XML_Parse is called, for the string
"<?xml", it returns XML_STATUS_OK; but the second time, for the string
"version="1.0"", it returns XML_STATUS_ERROR, and the error string from
XML_ErrorString is "not well-formed (invalid token)". If I remove the
processing instruction, everything is OK.

I guess the way I use the file stream may be the problem, but I can't
work out why. Can somebody explain this error?
=============================================================
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <fstream>
#include <string>
#include <expat.h>

using namespace std;

int Count;
const int BUFFERSIZE = 256;

/* Start handler: counts every <tag1> element. */
void XMLCALL start(void *data, const char *el, const char **attr)
{
  const char* tag = "tag1";
  if (strcmp(el, tag) == 0)
    Count++;
}  /* End of start handler */

/* End handler: nothing to do. */
void XMLCALL end(void *data, const char *el)
{
}  /* End of end handler */


int main(int argc, char *argv[])
{
    getchar();
    XML_Parser p = XML_ParserCreate(NULL);
    if (p == NULL)
    {
      cout << "Parser create failed!" << endl;
      system("PAUSE");
      return EXIT_FAILURE;
    }

    XML_SetElementHandler(p, start, end);

    Count = 0;
    char buffer[BUFFERSIZE];
    int done = 0;

    ifstream file("test.xml", ios::in);
    if (!file)
    {
      cout << "File could not be opened!" << endl;
      system("PAUSE");
      return EXIT_FAILURE;
    }

    // operator>> extracts one whitespace-delimited token per pass
    while (file >> buffer)
    {
      int length = strlen(buffer);
      done = file.eof();
      XML_Status status = XML_Parse(p, buffer, length, done);
      if (status == XML_STATUS_ERROR)
      {
        string error = XML_ErrorString(XML_GetErrorCode(p));
        cout << "parse error: " << error << endl;
        system("PAUSE");
        return EXIT_FAILURE;
      }

      if (done)
        break;
    }
    cout << "There are " << Count << " <tag1>." << endl;
    XML_ParserFree(p);
    system("PAUSE");
    return EXIT_SUCCESS;
}
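
A likely cause, offered as a sketch rather than a confirmed diagnosis: the
loop "while (file >> buffer)" extracts one whitespace-delimited token per
pass and throws the whitespace away. The parser is therefore fed "<?xml"
immediately followed by "version="1.0"", which together read as
<?xmlversion="1.0" and are not well-formed. Reading raw blocks with
istream::read and passing gcount() bytes keeps the input intact. The helper
names below are hypothetical, and the Feed callback only stands in for a
call such as XML_Parse(p, data, len, done), so the block compiles without
Expat.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Stand-in for XML_Parse(parser, data, len, isFinal).
using Feed = std::function<void(const char *data, int len, bool isFinal)>;

// What the posted loop effectively does: operator>> skips whitespace,
// so the bytes handed to the parser no longer match the file.
void feed_by_token(std::istream &in, const Feed &feed) {
    std::string token;
    while (in >> token) {
        feed(token.data(), static_cast<int>(token.size()), in.eof());
    }
}

// Raw block reads keep every byte, including spaces and newlines.
void feed_by_block(std::istream &in, const Feed &feed, std::size_t blockSize = 256) {
    assert(blockSize > 0);
    std::string block(blockSize, '\0');
    for (;;) {
        in.read(&block[0], static_cast<std::streamsize>(block.size()));
        const std::streamsize got = in.gcount();  // bytes actually read
        const bool isFinal = in.eof();
        feed(block.data(), static_cast<int>(got), isFinal);
        if (isFinal || !in) break;
    }
}
```

With Expat, the callback body would be the XML_Parse call itself, using the
byte count from gcount() rather than strlen().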

From kumar_qnx at yahoo.com  Fri Sep  8 23:09:54 2006
From: kumar_qnx at yahoo.com (kumar qnx)
Date: Fri, 8 Sep 2006 14:09:54 -0700 (PDT)
Subject: [Expat-discuss] Help using Libexpat
In-Reply-To: <44F731E9.20809@waclawek.net>
Message-ID: <20060908210954.43854.qmail@web55003.mail.re4.yahoo.com>


Hi,

I would like to know if there is an easy way to keep the
correspondence between an element name and the data
contained within that element. For example, given
<name>data</name>, I would like to know that the data
belongs to the element "name".

Any help is appreciated.

regards,
Pavan.
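
One common approach, sketched here under stated assumptions rather than as
the only way: keep a stack of open element names in the user data. The
start handler pushes the name, the end handler pops it, and the character
data handler credits the text to whatever name is on top. The three
functions below have the same parameter lists as Expat's element and
character data handlers, assuming XML_Char is plain char (a build without
XML_UNICODE); XMLCALL and expat.h are left out so the block compiles on its
own. ParseContext and the on_* names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ParseContext {
    std::vector<std::string> open;              // names of currently open elements
    std::map<std::string, std::string> textOf;  // element name -> accumulated text
};

// Start handler: remember which element we are now inside.
void on_start(void *userData, const char *name, const char ** /*attrs*/) {
    static_cast<ParseContext *>(userData)->open.push_back(name);
}

// End handler: leave the innermost element.
void on_end(void *userData, const char * /*name*/) {
    ParseContext *ctx = static_cast<ParseContext *>(userData);
    assert(!ctx->open.empty());
    ctx->open.pop_back();
}

// Character data handler: s is NOT NUL-terminated and one run of text may
// arrive in several calls, so append len bytes each time.
void on_text(void *userData, const char *s, int len) {
    ParseContext *ctx = static_cast<ParseContext *>(userData);
    if (ctx->open.empty()) return;
    ctx->textOf[ctx->open.back()].append(s, static_cast<std::size_t>(len));
}
```

In a real program these would be registered with XML_SetUserData,
XML_SetElementHandler and XML_SetCharacterDataHandler. Note that this
sketch merges the text of all elements sharing a name; storing one record
per element in on_end would keep them apart.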

--- Karl Waclawek <karl at waclawek.net> wrote:

> Régis St-Gelais (Laubrass) wrote:
> > Expat is an XML parser.
> > It only reads XML files.
> >
> > I simply create my XML files using the good old fprintf function.
> >
> There is also genx, a C-library written by Tim Bray:
>
> http://www.tbray.org/ongoing/When/200x/2004/02/20/GenxStatus
>
> Karl
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss at libexpat.org
> http://mail.libexpat.org/mailman/listinfo/expat-discuss



From tebaugh at gmail.com  Tue Sep 19 00:45:18 2006
From: tebaugh at gmail.com (Terry Ebaugh)
Date: Mon, 18 Sep 2006 18:45:18 -0400
Subject: [Expat-discuss] XML_ERROR_JUNK_AFTER_DOC_ELEMENT - How to resolve
Message-ID: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com>

Hi,

I've just started working with expat.  I have XML files that are gzipped.  I
gzcat them and pipe them to my parser.
I am getting the XML_ERROR_JUNK_AFTER_DOC_ELEMENT error and I'm unsure how
to resolve it.  I was under the impression that it is caused by extra
characters after the document's root close tag.  I tried stripping the
characters after that close tag, but that doesn't seem to work.  Is this
caused by a new document starting immediately after the first one has
finished?

Does anyone have any suggestions?

Here is the error message and what was in the buffer:

Parse error:file:1:row:4:column:0:reason:junk after document element
BUFFER = nter></usage></dataSet></metrics>

<?xml version='1.0' encoding='UTF-8'?>
<metrics version="3.0" cr


My main loop where I read stdin and call the parser is below:

  /***********************************************************************/
  /* Read stdin                                                          */
  /***********************************************************************/
  for (;;) {
    len = (int)fread(buff, 1, BUFFSIZE-1, stdin);
    if (ferror(stdin)) {
      fprintf(stderr,"Error reading stdin\n");
      exit(-2);
    }
    done = feof(stdin);
    //if nothing read then exit so AI doesnt blow up
    if ((len == 0) && (done) && (cur_file_num==0))
      break;

    /* fread() does not NUL-terminate buff, so pass the byte count it
       returned rather than strlen(buff) */
    if (XML_Parse(p, buff, len, done) == XML_STATUS_ERROR) {
      /* terminate before printing with %s; assumes buff holds BUFFSIZE bytes */
      buff[len] = '\0';
      fprintf(stderr,
              "\nParse error at host:%s:file:%d:row:%d:column:%d:reason:%s\n",
              host, cur_file_num, (int)XML_GetCurrentLineNumber(p),
              (int)XML_GetCurrentColumnNumber(p),
              XML_ErrorString(XML_GetErrorCode(p)));
      fprintf(stderr,"BUFFER = %s\n",buff);
      exit(-3);
    }
    if (done)
      break;
  }

  /* Free memory used by the parser */
  if (p) {
    XML_ParserFree(p);
  }
  return 0;
}

From nickmacd at gmail.com  Tue Sep 19 02:50:56 2006
From: nickmacd at gmail.com (Nick MacDonald)
Date: Mon, 18 Sep 2006 20:50:56 -0400
Subject: [Expat-discuss] XML_ERROR_JUNK_AFTER_DOC_ELEMENT - How to
	resolve
In-Reply-To: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com>
References: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com>
Message-ID: <bdcd32c90609181750t7a858c21xfe5f139a08e62606@mail.gmail.com>

Terry:

The rules for XML are very clear: only one document per file.  Expat
is strict about enforcing the well-formedness of XML files, and you are
trying to process multiple documents in one file (stdin).

You'd need to write some sort of filter that creates a new parser at
the start of each document.  If you could guarantee that every XML
file starts with the optional <?xml version ...?> declaration, then you
would have a basis for your filter.

Otherwise it might be much easier to extract your zip file into
multiple files in a temporary directory and clean up afterward,
although keeping the files in the same order as in the zip file would
be an issue unless you name them in the order they should be
processed from the file system.

Of course, if you're clever, you might be able to look for the error,
recognize that it's not a "real" error, ignore it, and start a new
parser at the correct place inside your buffer.

Good luck on your project...

Nick
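
The filter idea can be sketched as follows, under two loud assumptions:
every document in the stream begins with an XML declaration, and the text
"<?xml" followed by whitespace never occurs inside a document (for example
in a comment or CDATA section). The function names are hypothetical and the
whole stream is held in memory for simplicity; a production filter would
work incrementally.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if an XML declaration ("<?xml" followed by whitespace) starts at pos.
// The whitespace check keeps <?xml-stylesheet ...?> from counting.
bool declaration_at(const std::string &s, std::size_t pos) {
    assert(pos <= s.size());
    static const std::string kDecl = "<?xml";
    if (s.compare(pos, kDecl.size(), kDecl) != 0) return false;
    const std::size_t next = pos + kDecl.size();
    return next < s.size() && std::isspace(static_cast<unsigned char>(s[next]));
}

// Appends piece unless it is empty or whitespace only.
void push_if_not_blank(std::vector<std::string> &docs, const std::string &piece) {
    if (piece.find_first_not_of(" \t\r\n") != std::string::npos) docs.push_back(piece);
}

// Splits a concatenated stream into one string per document.
std::vector<std::string> split_documents(const std::string &stream) {
    std::vector<std::string> docs;
    std::size_t start = 0;
    for (std::size_t pos = stream.find("<?xml"); pos != std::string::npos;
         pos = stream.find("<?xml", pos + 1)) {
        if (!declaration_at(stream, pos)) continue;
        if (pos > start) push_if_not_blank(docs, stream.substr(start, pos - start));
        start = pos;  // the next document begins at this declaration
    }
    push_if_not_blank(docs, stream.substr(start));
    return docs;
}
```

Each piece can then go to its own parser, or to a single parser that is
reset between documents with XML_ParserReset.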



From tebaugh at gmail.com  Tue Sep 19 07:24:32 2006
From: tebaugh at gmail.com (Terry Ebaugh)
Date: Tue, 19 Sep 2006 01:24:32 -0400
Subject: [Expat-discuss] XML_ERROR_JUNK_AFTER_DOC_ELEMENT - How to
	resolve
In-Reply-To: <7C83A8A6B56D3A478333B1DF47E185863A274C@MPBABGEX01.corp.mphasis.com>
Message-ID: <001201c6dbab$e4680270$6a01a8c0@terry>

Nick pinpointed my problem.    I didn't have any trouble parsing the files
individually it only happened when I tried to pipe multiple documents to the
parser via stdin.

 
So now I'll either set up a filter or ignore the error message and be
clever.  

 
Thanks for the help!

 
Terry

 

From: Mukesh S [mailto:Mukesh.S at mphasis.com] 
Sent: Tuesday, September 19, 2006 1:08 AM
To: Terry Ebaugh; expat-discuss at libexpat.org
Subject: RE: [Expat-discuss] XML_ERROR_JUNK_AFTER_DOC_ELEMENT - How to
resolve

 
Hi,

Have you tried it with individual files instead of reading all the files at
once? I suggest taking baby steps, like this:

Step 1) Read the first XML file and close it, then immediately open the next
file within your function.

Step 2) Then check for the proper tag. If you are sure that all your XML
files have the same format and only the data differs, then it will work.
 
Regards,

Mukesh Srivastav


From jameswhetstone at comcast.net  Sat Sep 23 21:49:22 2006
From: jameswhetstone at comcast.net (James Whetstone)
Date: Sat, 23 Sep 2006 12:49:22 -0700
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
References: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com>
	<bdcd32c90609181750t7a858c21xfe5f139a08e62606@mail.gmail.com>
Message-ID: <000901c6df49$5e168a60$6401a8c0@crankshaft>

Hi,

I'm looking at integrating expat into my TCP server, and I'd like to avoid
the extra copy by using XML_GetBuffer and XML_ParseBuffer.  So I'll use the
buffer returned from XML_GetBuffer as my TCP/IP receive buffer, passing it
to XML_ParseBuffer when some data is received.  As the XML stream is
received and processed, I'll need to know how much of the XML buffer is
unused and where the unused region begins, so that I can receive data into
the buffer without overwriting XML bytes left over from the prior TCP read.
So... are there any functions in the expat API that were intended to
support this kind of design?

Thanks,
---James


From jameswhetstone at comcast.net  Sat Sep 23 22:04:17 2006
From: jameswhetstone at comcast.net (James Whetstone)
Date: Sat, 23 Sep 2006 13:04:17 -0700
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
References: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com><bdcd32c90609181750t7a858c21xfe5f139a08e62606@mail.gmail.com>
	<000901c6df49$5e168a60$6401a8c0@crankshaft>
Message-ID: <001001c6df4b$73adf780$6401a8c0@crankshaft>

Another question along the same lines is whether or not I even need to worry
about overwriting leftover data in the XML buffer.  I assumed there would
sometimes be some leftover data of an XML fragment in the buffer, but maybe
that isn't the case.

JW



From karl at waclawek.net  Sat Sep 23 22:29:35 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Sat, 23 Sep 2006 16:29:35 -0400
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
In-Reply-To: <001001c6df4b$73adf780$6401a8c0@crankshaft>
References: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com><bdcd32c90609181750t7a858c21xfe5f139a08e62606@mail.gmail.com>	<000901c6df49$5e168a60$6401a8c0@crankshaft>
	<001001c6df4b$73adf780$6401a8c0@crankshaft>
Message-ID: <4515992F.3060609@waclawek.net>

James Whetstone wrote:
> Another question along the same lines is whether or not I even need to worry 
> about overwriting left over data in the XML Buffer.  I assumed there would 
> sometimes be some left over data of a XML fragment in the buffer, but maybe 
> that isn't the case.
>
>   
If I remember correctly, Expat buffers any unused fragments. So you 
should not have to worry.

Karl

From jameswhetstone at comcast.net  Sat Sep 23 23:58:23 2006
From: jameswhetstone at comcast.net (James Whetstone)
Date: Sat, 23 Sep 2006 14:58:23 -0700
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
References: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com><bdcd32c90609181750t7a858c21xfe5f139a08e62606@mail.gmail.com>	<000901c6df49$5e168a60$6401a8c0@crankshaft>
	<001001c6df4b$73adf780$6401a8c0@crankshaft>
	<4515992F.3060609@waclawek.net>
Message-ID: <001901c6df5b$641edcc0$6401a8c0@crankshaft>

So I stepped through the code to see what happens to unused fragments, and
it leaves the fragments in the buffer.  From what I can tell, instead of
moving the offset of the input buffer, XML_GetBuffer is intended to be
called each time new input is to be accepted.  It then allocates a new
(larger) buffer, memcpys the fragment from the old buffer to the new buffer
and then frees the old buffer.  I'd like to avoid this by simply moving the
input buffer's offset to the end of the fragment and NOT calling
XML_GetBuffer, to avoid the extra memory allocation.  Any suggestions?

---James



From jameswhetstone at comcast.net  Sun Sep 24 00:38:00 2006
From: jameswhetstone at comcast.net (James Whetstone)
Date: Sat, 23 Sep 2006 15:38:00 -0700
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
References: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com><bdcd32c90609181750t7a858c21xfe5f139a08e62606@mail.gmail.com>	<000901c6df49$5e168a60$6401a8c0@crankshaft><001001c6df4b$73adf780$6401a8c0@crankshaft><4515992F.3060609@waclawek.net>
	<001901c6df5b$641edcc0$6401a8c0@crankshaft>
Message-ID: <002601c6df60$ed0dd900$6401a8c0@crankshaft>

So I found the easiest and maybe the best way to prevent additional memory
allocations is to initially create a buffer that is double the size of the
TCP input buffer.  For example, I create a buffer using
XML_GetBuffer(parser, 8192) and then code the rest of the program as if the
buffer is 4096 bytes.  Subsequent calls to XML_GetBuffer then request a
buffer size of 4096.

---James




From karl at waclawek.net  Sun Sep 24 06:38:07 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Sun, 24 Sep 2006 00:38:07 -0400
Subject: [Expat-discuss] Using XML_GetBuffer/XML_ParseBuffer
In-Reply-To: <001901c6df5b$641edcc0$6401a8c0@crankshaft>
References: <d70802170609181545h5a5a01fdi69cb4dad10dc173b@mail.gmail.com><bdcd32c90609181750t7a858c21xfe5f139a08e62606@mail.gmail.com>	<000901c6df49$5e168a60$6401a8c0@crankshaft>
	<001001c6df4b$73adf780$6401a8c0@crankshaft>
	<4515992F.3060609@waclawek.net>
	<001901c6df5b$641edcc0$6401a8c0@crankshaft>
Message-ID: <45160BAF.6070800@waclawek.net>

James Whetstone wrote:
> So I stepped through the code to see what happens to unused fragments, 
> and it leaves the fragments in the buffer.  From what I can tell, 
> instead of moving the offset of the input buffer, XML_GetBuffer is 
> intended to be called each time new input is to be accepted.
Yes.
> It then allocates a new (larger) buffer, memcpys the fragment from the 
> old buffer to the new buffer and then frees the old buffer.
No, only if the requested length plus the unprocessed fragment exceeds
the size of the current buffer; otherwise the unused fragment is simply
moved to the beginning of the buffer.
> I'd like to avoid this by simply moving the input buffer's offset to
> the end of the fragment and NOT calling XML_GetBuffer to avoid the
> extra memory allocation.  Any suggestions?
>
Your suggestion in your other message is good: requesting a larger
buffer on the first call to XML_GetBuffer should reduce or eliminate
new memory allocations.

Karl
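
To see why the larger first request helps, here is a toy model of the
policy as described in this thread. It is not Expat's implementation: the
class and its members are hypothetical, and it grows the buffer to the
exact size needed, which is a simplification. It only counts how often a
new block would have to be allocated.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the described policy; NOT Expat's actual code.
class ToyInputBuffer {
public:
    // Models XML_GetBuffer(parser, len): make room for len new bytes
    // behind the unprocessed leftover.
    void request(std::size_t len) {
        if (leftover_ + len > capacity_) {
            capacity_ = leftover_ + len;  // new block, leftover copied across
            ++allocations_;
        }
        // otherwise the leftover is just moved to the front of the same block
    }

    // Models a parse call after which `unparsed` bytes remain, for example
    // the first half of a tag split across two TCP reads.
    void parsed(std::size_t received, std::size_t unparsed) {
        assert(unparsed <= leftover_ + received);
        leftover_ = unparsed;
    }

    std::size_t allocations() const { return allocations_; }
    std::size_t capacity() const { return capacity_; }

private:
    std::size_t capacity_ = 0;
    std::size_t leftover_ = 0;
    std::size_t allocations_ = 0;
};
```

In this model, an initial request of 8192 followed by requests of 4096
never reallocates as long as the leftover fragment stays at or below 4096
bytes, whereas starting at 4096 reallocates on the first leftover. The real
call sequence per read is still XML_GetBuffer, receive into the returned
pointer, then XML_ParseBuffer with the number of bytes received.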

From franky.braem at gmail.com  Wed Sep 27 23:02:13 2006
From: franky.braem at gmail.com (Franky Braem)
Date: Wed, 27 Sep 2006 23:02:13 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
Message-ID: <451AE6D5.1020909@gmail.com>

I've compiled expat with XML_UNICODE to get UTF-16 encoding. But it 
seems that the character data handler always gets its information in UTF-8.
The xml-file is stored in UTF-16 format.

This is what I do in the handler:

void ModulesXMLParser::CharacterDataHandler(void *userData,
                                            const XML_Char *s,
                                            int len)
{
    ModulesXMLParser *modxml = (ModulesXMLParser *) userData;
    for(int i = 0; i < len; i++)
    {
        const unsigned t = s[i];
        modxml->m_chars.AppendByte(t);
    }
    //modxml->m_chars.AppendData((void *) s, len);
}

And this is how I convert the information stored in m_chars:

    wxMBConvUTF16 conv;
    modxml->m_chars.AppendByte('\0');
    modxml->m_chars.AppendByte('\0');
    wxString dllName = wxString((const char *) modxml->m_chars.GetData(), conv);

The above doesn't work. The following works:

    wxString dllName = wxString((const char *) modxml->m_chars.GetData(), wxConvUTF8);

Any ideas on how to get UTF-16 output?

Franky.
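
One thing worth checking in the handler, independent of which library is
linked, and offered as a possible explanation rather than a confirmed one:
in an XML_UNICODE build XML_Char is a 16-bit type, and len counts XML_Char
units, not bytes. Appending each unit with AppendByte keeps only its low
byte, which for ASCII text produces bytes that happen to decode as UTF-8.
The sketch below uses std::uint16_t as a stand-in for a 16-bit XML_Char and
hypothetical helper names to contrast the two ways of copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Utf16Unit = std::uint16_t;  // stand-in for XML_Char in an XML_UNICODE build

// Mirrors the posted loop: one byte per code unit, high byte discarded.
std::vector<unsigned char> append_low_bytes(const Utf16Unit *s, int len) {
    assert(len >= 0);
    std::vector<unsigned char> out;
    for (int i = 0; i < len; ++i) out.push_back(static_cast<unsigned char>(s[i]));
    return out;
}

// Copies the code units whole: len * sizeof(unit) bytes, native byte order.
std::vector<unsigned char> append_code_units(const Utf16Unit *s, int len) {
    assert(len >= 0);
    std::vector<unsigned char> out(static_cast<std::size_t>(len) * sizeof(Utf16Unit));
    if (!out.empty()) std::memcpy(out.data(), s, out.size());
    return out;
}
```

By the same reasoning, the commented-out AppendData call would need
len * sizeof(XML_Char) as its byte count rather than len.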

From marco.forberg at gmx.net  Wed Sep 27 23:09:58 2006
From: marco.forberg at gmx.net (Marco Forberg)
Date: Wed, 27 Sep 2006 23:09:58 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451AE6D5.1020909@gmail.com>
References: <451AE6D5.1020909@gmail.com>
Message-ID: <op.tgj2ewmv3odusb@anakin>

Did you try setting the encoding when creating the parser?
XML_ParserCreate("UTF-16")




From karl at waclawek.net  Thu Sep 28 15:19:59 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Thu, 28 Sep 2006 09:19:59 -0400
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451AE6D5.1020909@gmail.com>
References: <451AE6D5.1020909@gmail.com>
Message-ID: <451BCBFF.5070100@waclawek.net>

Franky Braem wrote:
> I've compiled expat with XML_UNICODE to get UTF-16 encoding. But it 
> seems that the character data handler always gets its information in UTF-8.
> The xml-file is stored in UTF-16 format.
>
>   
Are you linking to the correct library? For UTF-16 it is called "libexpatw".

Karl

From franky.braem at gmail.com  Fri Sep 29 21:17:31 2006
From: franky.braem at gmail.com (Franky Braem)
Date: Fri, 29 Sep 2006 21:17:31 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451BE77E.8050609@waclawek.net>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net>
	<451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net>
Message-ID: <451D714B.8040105@gmail.com>

Karl Waclawek wrote:
> Franky Braem wrote:
>>>>  
>>> Are you linking to the correct library? For UTF-16 it is called 
>>> "libexpatw".
>>>
>>>
>> I'm linking with libexpatwMT.lib
>>
>> Franky.
>
> I looked at the "expatw_static" project (I assume you are using Visual 
> Studio), and it seems the defines are correct.
> Did you also use the XML_STATIC define?
>
> Karl
XML_STATIC is defined. And yes I'm using Visual Studio.

Franky.


From karl at waclawek.net  Fri Sep 29 21:28:51 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 29 Sep 2006 15:28:51 -0400
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451D714B.8040105@gmail.com>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net>
	<451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net>
	<451D714B.8040105@gmail.com>
Message-ID: <451D73F3.70606@waclawek.net>

Franky Braem wrote:
>> I looked at the "expatw_static" project (I assume you are using 
>> Visual Studio), and it seems the defines are correct.
>> Did you also use the XML_STATIC define?
>>
>> Karl
> XML_STATIC is defined. And yes I'm using Visual Studio.
>
Maybe you can post a small self-contained example program that shows the 
problem.

Karl

From franky.braem at gmail.com  Fri Sep 29 22:24:20 2006
From: franky.braem at gmail.com (Franky Braem)
Date: Fri, 29 Sep 2006 22:24:20 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451D73F3.70606@waclawek.net>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net>
	<451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net>
	<451D714B.8040105@gmail.com> <451D73F3.70606@waclawek.net>
Message-ID: <451D80F4.4090004@gmail.com>

Karl Waclawek wrote:
> Maybe you can post a small self-contained example program that shows 
> the problem.
>
> Karl
>
The following is a small example:

#include <stdio.h>
#include <stdlib.h>
#include <tchar.h>
#include "expat.h"

XML_Char buffer[1000];
int length = 0;


void EndElementHandler(void *userData,
                       const XML_Char *name);

void CharacterDataHandler(void *userData,
                          const XML_Char *s,
                          int len);

int _tmain(int argc, _TCHAR* argv[])
{
    XML_Parser parser = XML_ParserCreate(NULL);
    XML_SetUserData(parser, NULL);
    XML_SetElementHandler(parser, NULL, EndElementHandler);
    XML_SetCharacterDataHandler(parser, CharacterDataHandler);

    // binary mode, so the UTF-16 bytes reach the parser untranslated
    FILE *f = fopen("c:\\temp\\modules.xml", "rb");
    if ( f )
    {
      // obtain file size.
      fseek (f , 0 , SEEK_END);
      long lSize = ftell (f);
      rewind (f);

      // allocate memory to contain the whole file.
      char *readbuffer = (char*) malloc (lSize);
      if (readbuffer == NULL) exit (2);

      // copy the file into the buffer.
      fread (readbuffer,1,lSize,f);
      XML_Parse(parser, readbuffer, lSize, 1);
      free(readbuffer);
    }
    XML_ParserFree(parser);

    return 0;
}

void EndElementHandler(void *userData,
                       const XML_Char *name)
{
    length = 0;
}

void CharacterDataHandler(void *userData,
                          const XML_Char *s,
                          int len)
{
    for(int i = 0; i < len; i++, length++)
    {
        buffer[length] = s[i];
    }
}

The following is defined:

WIN32;_DEBUG;_CONSOLE;XML_UNICODE;XML_STATIC

And I link with libexpatwMT.lib

When I debug the above, the names of the tags are always readable, while 
I expected to see UTF-16 characters.
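
One way to check what the handler really receives, without relying on how
the debugger chooses to display the buffer, is to print the numeric value
of every code unit. The following is only a sketch: DemoXmlChar is a
stand-in for the XML_Char typedef of an XML_UNICODE build, and
DumpCodeUnits is a hypothetical helper, not part of Expat.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for XML_Char in an XML_UNICODE build (assumption; a real
// program would get the typedef from expat.h instead).
typedef unsigned short DemoXmlChar;

// Hypothetical helper: format each code unit as four hex digits,
// separated by single spaces.
std::string DumpCodeUnits(const DemoXmlChar *s, int len)
{
    assert(s != NULL || len == 0);
    std::string out;
    char hex[8];
    for (int i = 0; i < len; i++)
    {
        std::snprintf(hex, sizeof hex, "%04X", static_cast<unsigned>(s[i]));
        if (i > 0)
            out += ' ';
        out += hex;
    }
    return out;
}
```

Plain ASCII text has the same numeric values in UTF-16 as in ASCII, so
"readable" tag names do not by themselves show that the data is UTF-8.
A character outside Latin-1 (for example the euro sign, U+20AC) appears
as a single unit above 00FF only when the code units are 16 bits wide.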


From karl at waclawek.net  Sat Sep 30 05:23:44 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Fri, 29 Sep 2006 23:23:44 -0400
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451D80F4.4090004@gmail.com>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net>
	<451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net>
	<451D714B.8040105@gmail.com> <451D73F3.70606@waclawek.net>
	<451D80F4.4090004@gmail.com>
Message-ID: <451DE340.3020104@waclawek.net>

Franky Braem wrote:
> Karl Waclawek wrote:
>> Maybe you can post a small self-contained example program that shows 
>> the problem.
>>
>> Karl
>>
> The following is a small example:
<snip>

Seems this is just how the debugger processes and displays it - trying 
to be smart.
You can assign a value > 255 to a buffer element (array of XML_Char), 
which means XML_Char has more than one byte.

Btw, defining XML_UNICODE and not XML_UNICODE_WCHAR_T will typedef
XML_Char as unsigned short, not as wchar_t.
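
A minimal sketch of that selection, plus the "value > 255" check described
above. The names SketchChar, RoundTripsAbove255 and CheckWideBuild are
made up for illustration; the macro logic mirrors what is described in
this thread and should be checked against the Expat headers in use. The
sketch defines XML_UNICODE itself to imitate the build settings posted
earlier.

```cpp
#include <cassert>

// Imitate the posted build settings (normally set in the project options).
#ifndef XML_UNICODE
#define XML_UNICODE
#endif

// Hypothetical stand-in for the XML_Char selection.
#if defined(XML_UNICODE_WCHAR_T)
typedef wchar_t SketchChar;          // wide build using the platform wchar_t
#elif defined(XML_UNICODE)
typedef unsigned short SketchChar;   // wide build, 16-bit code units
#else
typedef char SketchChar;             // default build, UTF-8 bytes
#endif

// Store a value above 255 and read it back. This only round-trips when
// the element type is wider than one byte.
bool RoundTripsAbove255()
{
    SketchChar buffer[1];
    buffer[0] = static_cast<SketchChar>(0x20AC);
    return static_cast<unsigned long>(buffer[0]) == 0x20ACul;
}

// Debug-build guard against accidentally compiling for 8-bit characters.
void CheckWideBuild()
{
    assert(RoundTripsAbove255());
}
```

Calling CheckWideBuild() once at startup in a debug build would catch a
translation unit that was compiled without the wide-character define.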

Karl

From franky.braem at gmail.com  Sat Sep 30 17:51:46 2006
From: franky.braem at gmail.com (Franky Braem)
Date: Sat, 30 Sep 2006 17:51:46 +0200
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451DE340.3020104@waclawek.net>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net>
	<451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net>
	<451D714B.8040105@gmail.com> <451D73F3.70606@waclawek.net>
	<451D80F4.4090004@gmail.com> <451DE340.3020104@waclawek.net>
Message-ID: <451E9292.9040007@gmail.com>

When I do the following in the character data handler:

    ModulesXMLParser *modxml = (ModulesXMLParser *) userData;
    modxml->m_chars.AppendData((void *) s, len * 2);

it works. Note the len * 2. Is this mentioned in the docs somewhere? If 
not, please add it.

Franky.

From karl at waclawek.net  Sat Sep 30 18:31:10 2006
From: karl at waclawek.net (Karl Waclawek)
Date: Sat, 30 Sep 2006 12:31:10 -0400
Subject: [Expat-discuss] Always reports utf-8 encoding?
In-Reply-To: <451E9292.9040007@gmail.com>
References: <451AE6D5.1020909@gmail.com> <451BCBFF.5070100@waclawek.net>
	<451BE548.4020007@gmail.com> <451BE77E.8050609@waclawek.net>
	<451D714B.8040105@gmail.com> <451D73F3.70606@waclawek.net>
	<451D80F4.4090004@gmail.com> <451DE340.3020104@waclawek.net>
	<451E9292.9040007@gmail.com>
Message-ID: <451E9BCE.60701@waclawek.net>

Franky Braem wrote:
> When I do the following in the character data handler:
>
>    ModulesXMLParser *modxml = (ModulesXMLParser *) userData;
>    modxml->m_chars.AppendData((void *) s, len * 2);
>
> it works. Note the len * 2. Is this mentioned in the docs somewhere? 
> If not, please add it.
>
>
The len refers to the number of XML_Chars, not to the number of bytes.
With XML_UNICODE the size of XML_Char is 2 bytes, hence the len * 2 above.
I would use sizeof(XML_Char) instead of 2.
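
The sizeof-based version could look roughly like this. It is a sketch
under assumptions: DemoXmlChar stands in for XML_Char in an XML_UNICODE
build, and ByteBuffer is a hypothetical replacement for the m_chars
member, whose real class is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for XML_Char in an XML_UNICODE build (assumption).
typedef unsigned short DemoXmlChar;

// Hypothetical byte buffer playing the role of modxml->m_chars.
struct ByteBuffer
{
    std::vector<unsigned char> bytes;

    // Append byteCount raw bytes starting at data.
    void AppendData(const void *data, std::size_t byteCount)
    {
        const unsigned char *p = static_cast<const unsigned char *>(data);
        bytes.insert(bytes.end(), p, p + byteCount);
    }
};

// Same shape as a character data handler: len counts code units, so the
// byte count is len * sizeof(DemoXmlChar), whatever the build defines.
void CharacterDataSketch(void *userData, const DemoXmlChar *s, int len)
{
    assert(len >= 0);
    ByteBuffer *buf = static_cast<ByteBuffer *>(userData);
    buf->AppendData(s, static_cast<std::size_t>(len) * sizeof(DemoXmlChar));
}
```

Written this way the handler keeps working unchanged if the build later
switches between the 8-bit and the wide character type.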

Karl
