From nick@integrasolv.com Thu Jul 4 21:25:03 2002
From: nick@integrasolv.com (Nick Lehman)
Date: Thu Jul 4 20:25:03 2002
Subject: [Expat-discuss] Re: using expat as a static library
In-Reply-To:
Message-ID: <20020704230752.I5932-100000@overpass.exit109.com>

Hi,

I think the issue is that the preprocessor tests for defining XMLPARSEAPI
in xmlparse.[ch] always yield __declspec(dllimport) type __cdecl on Windows.
I think this produces DLL-mangled names, which is why you get __imp__ stuff
prepended to the symbol names. I hacked those definitions in my copy
of 1.95.3 so they looked like the Unix definitions, meaning just "type",
which is actually __cdecl unless you're using /Gz (stdcall) or /Gr (fastcall).

So the current Windows code path builds DLL-mangled names no matter what.
So even though you think you've got a legit static link library, you don't.
And you don't find out until you link your application. Anyhow, the end
result is that the current Windows code path can only really build DLLs,
because the symbols are named something you're not expecting. I haven't
actually thought through how to make a general fix, but all I did to get my
test to link was:

in expat.h and xmlparse I changed the lines

    #define XMLPARSEAPI(type) __declspec(dllimport) type __cdecl

to

    #define XMLPARSEAPI(type) type

Then you can remake xmlparse.obj, xmltok.obj, xmlrole.obj ... and then use
lib to make your .lib file.

The tests that decide the definition of XMLPARSEAPI should be changed for
Windows to check whether _DLL is defined. If it is, then the __declspec
syntax is used; otherwise not.

I hope that helps get you over the hump. Bear in mind I could be wrong.

Nick

Nick Lehman
IntegraSolv Corp
http://www.integrasolv.com

From gp@familiehaase.de Mon Jul 8 01:58:10 2002
From: gp@familiehaase.de (Gerrit P. Haase)
Date: Mon Jul 8 00:58:10 2002
Subject: [Expat-discuss] Cygwin includes expat-1.95.3
Message-ID: <55132107500.20020708095815@familiehaase.de>

Hello,

I have included Expat in the Cygwin netrelease.
It is available via several setup mirrors and you should always use
setup.exe to install Cygwin or parts of it: http://cygwin.com/setup.exe

Expat is in the full list or in some categories like 'interpreter' or 'text'.

Gerrit
--
"All faults& bugs are mine - Robert" from squid/acinclude.m4, Sun Apr 21 05:21:21 2002

From fdrake@acm.org Mon Jul 8 06:23:03 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon Jul 8 05:23:03 2002
Subject: [Expat-discuss] Cygwin includes expat-1.95.3
In-Reply-To: <55132107500.20020708095815@familiehaase.de>
References: <55132107500.20020708095815@familiehaase.de>
Message-ID: <15657.33794.571154.530019@grendel.zope.com>

Gerrit P. Haase writes:
> I have included Expat in the Cygwin netrelease. It is available via
> several setup mirrors and you should always use setup.exe to install
> Cygwin or parts of it: http://cygwin.com/setup.exe

I hope it's easy to update to 1.95.4 when it comes out (hopefully this
week). Some fairly serious bugs have turned up in 1.95.3, and those
are fixed in 1.95.4 (along with a handful of smaller bugs).

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Mon Jul 8 10:21:10 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon Jul 8 09:21:10 2002
Subject: [Expat-discuss] Re: using expat as a static library
In-Reply-To: <20020704230752.I5932-100000@overpass.exit109.com>
References: <20020704230752.I5932-100000@overpass.exit109.com>
Message-ID: <15657.48074.67639.359434@grendel.zope.com>

Nick Lehman writes:
> I think the issue is that the preprocessor tests for defining XMLPARSEAPI
> in xmlparse.[ch] always yield __declspec(dllimport) type __cdecl on Windows.
> I think this produces DLL-mangled names, which is why you get __imp__ stuff
> prepended to the symbol names. I hacked those definitions in my copy
> of 1.95.3 so they looked like the Unix definitions, meaning just "type",
> which is actually __cdecl unless you're using /Gz (stdcall) or /Gr (fastcall).

I think the __cdecl should be kept for Windows in either case. The
only time anyone should even consider changing that is if they are
embedding the sources in their project and can ensure that everything
is set up properly. For a generally usable library (my concern),
__cdecl seems to be required.

> So the current Windows code path builds DLL-mangled names no matter what.
> So even though you think you've got a legit static link library, you don't.
> And you don't find out until you link your application. Anyhow, the end
> result is that the current Windows code path can only really build DLLs,
> because the symbols are named something you're not expecting. I haven't
> actually thought through how to make a general fix, but all I did to get my
> test to link was:

I'm not sure what the right way to deal with the DLL name mangling on
Windows is, and I don't have my Windows box available at the moment.

For xmlparse.c, perhaps the "right" thing is to change the definition
of XMLPARSEAPI from:

    #define XMLPARSEAPI(type) __declspec(dllexport) type __cdecl

to:

    #ifndef XMLPARSEAPI
    #ifdef _DLL
    #define XMLPARSEAPI(type) __declspec(dllexport) type __cdecl
    #else
    #define XMLPARSEAPI(type) type __cdecl
    #endif
    #endif

What to do about the header I really don't know -- how would a Windows
developer normally indicate that a particular library is being used in
a static or dynamic form? Without being able to deal with that, I'm
not sure it makes sense to only change xmlparse.c.

(I'm not sure that the second definition in xmlparse.c is ever needed;
does anyone have a platform on which it is? If it's never used, I'd
like to remove it. The Microsoft docs don't seem to suggest that
__declspec can be tested using #ifdef.)

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From Josh.Martin@abq.sc.philips.com Mon Jul 8 15:43:02 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Mon Jul 8 14:43:02 2002
Subject: [Expat-discuss] Re: hi.. THx for ur help and patience.
Message-ID: <200207082142.PAA14052@abqn42.abq.sc.philips.com>

Hi Vijay,

I'm sorry you're still having problems, and for the slow reply. I think I
see your problem though. Am I crazy, or do you have a carriage return in the
middle of your tag? If you do, and it looks like you do, that is probably
why the code is issuing an error there. I'm not very familiar with the UTF
encodings, but I think that having it parse your document in UTF-8 is
causing it to misinterpret that carriage return, as you should normally be
able to use a return any place you can use other normal whitespace. Take
that out and try it again.

Since I'm pretty sure that's the culprit, but I'm not confident about my
interpretation of why the return caused an error, I'm going to post this to
the expat mailing list as well in hopes that someone more knowledgeable in
that area can help.

Also, I'm still uncomfortable with your use of the "done" variable in the
main parsing loop, as I am quite certain it will break once you start
reading in multi-line documents.

If this doesn't help, you might want to try not specifying the encoding type
in the document and in the call to XML_ParserCreate(). If you still get the
same error message after that, I don't think I can help, but send another
copy of your code anyway.

Good luck.

- Josh Martin

> Hi Josh,
>
> Thank you very much for your help and patience. I am attaching the current
> code which I have and also the error. I made the required changes you had
> sent me but I still get an error. The output along with the "^" character
> is attached. Please tell me if you can spot what is causing the error.
>
>
> /*-------------------------------code------------------------------------------*/
> #include
> #include
>
> #define BUFFSIZE 2046
>
> /* This is the structure passed between handlers */
> #define XML_STACK_SIZE 50
> #define XML_ELEMENT_SIZE 50
> typedef struct
> {
>     int Depth;
>     char stack[XML_STACK_SIZE][XML_ELEMENT_SIZE+1];
> } UserData;
>
> char totElements[20][100];
> char totAttributes[20][100];
> int count = 0;
>
> void start(void *data, const char *el, const char **attr)
> {
>     int i;
>     UserData *pUData = (UserData *)data;
>
>     printf("el = %s\n", el);
>     /* Store this in our stack */
>     if (pUData->Depth < XML_STACK_SIZE)
>     {
>         strncpy(pUData->stack[pUData->Depth], el, XML_ELEMENT_SIZE);
>     }
>     (pUData->Depth)++;
> } /* End of start handler */
>
>
> void end(void *data, const char *el)
> {
>     int i;
>     UserData *pUData = (UserData *)data;
>
>     (pUData->Depth)--;
>     if (pUData->Depth < XML_STACK_SIZE)
>     {
>         /* Clear this entry in the stack */
>         pUData->stack[pUData->Depth][0] = 0;
>     }
> } /* End of end handler */
>
>
> void charHdlr(void *data, const char *s, int len)
> {
>     UserData *pUData = (UserData *)data;
>     char tmp[80];
>     char *tmp1;
>     int i, idx, left;
>     char *pS = (char *)s;
>
>     /* The input string is not NULL terminated */
>     if (len <= 0)
>         return;
>
>     /* Character data is any data within element tags. This includes
>        newlines and whitespace, etc between nested elements too.
>     */
>     while ((*pS == ' ') || (*pS == '\n') || (*pS == '\t'))
>     {
>         pS++;
>         len--;
>
>         /* Check if we run out of characters */
>         if (len <= 0)
>             return;
>     }
>     for (i = 0; i < pUData->Depth; i++){
>         tmp1 = pUData->stack[i];
>     }
>     strcpy(totElements[count],tmp1);//,strlen(tmp1));
>     /* Now print out our element too */
>     idx = 0;
>     left = len;
>     while (left > 80)
>     {
>         memcpy (&tmp[0], &pS[idx], 79);
>         tmp[80] = 0;
>         printf ("\n%s", tmp);
>         left -= 80;
>         idx += 80;
>     }
>     memcpy (&tmp[0], &pS[idx], left);
>     tmp[left] = 0;
>     strcpy(totAttributes[count],&tmp[0]);
>     count++;
> } /* End of charHdlr handler */
>
>
> main(int argc, char **argv)
> {
>     XML_Parser p;
>     UserData u;
>     void *Buff;
>     int x = 0;
>     char dtr[] = " ?>06/17/200207:50:21"
>     ;
>     int len = strlen(dtr);
>     memset (&u, 0, sizeof(UserData));
>
>     printf("size of = %d user data = %d\n",len,sizeof(UserData));
>     p = XML_ParserCreate("utf-8");
>     if (! p)
>     {
>         fprintf(stderr, "Couldn't allocate memory for parser\n");
>         exit(-1);
>     }
>
>     XML_SetElementHandler(p, start, end);
>     XML_SetCharacterDataHandler(p, charHdlr);
>     XML_SetUserData(p, &u);
>     int done = 0;
>     printf("Document Recd = %s\n", dtr);
>     for (;;)
>     {
>         int bytes_read;
>         /* Last parameter indicates if this is the last chunk or not */
>
>         if(!XML_Parse(p,dtr,strlen(dtr),1)) //sizeof(dtr),0))
>         {
>             int b_index = 0;
>             b_index = XML_GetCurrentByteIndex(p);
>             strtok(&dtr[b_index-XML_GetCurrentColumnNumber(p)], "\n\r");
>             fprintf(stderr, "Parse error at line %1$d: %2$s\n%3$s\n%4$*5$c\n",
>                     XML_GetCurrentLineNumber(p),
>                     XML_ErrorString(XML_GetErrorCode(p)),
>                     &(dtr[b_index-XML_GetCurrentColumnNumber(p)]),
>                     '^', XML_GetCurrentColumnNumber(p)+1);
>             XML_ParserFree(p);
>             exit(-1);
>         }
>         done++;
>         if (done == len-1)
>             break;
>     }
>     /* I think this frees any buffers obtained from XML_GetBuffer too, if
>        required */
>     XML_ParserFree (p);
>
> } /* End of main */
> /*-------------------------------end of code------------------------------------------*/
>
>
> output :
>
> size of = 112 user data = 2556
> Document Recd = ?>06/17/200207:50:21
> el = msg
> el = head
> el = RoundTripInfo
> Parse error at line 1: junk after document element
> ?>06/17/200207:50:21
> ^
>
> Vijay Naidu
> System Analyst
> Office : 814-274-6526
> Cell : 814-932-5327
> Pager : 800-541-0931
> ----------------------------------------------------------------------------------------
> Smile is a curve that makes everything straight. Keep smiling
> Experience, my friends, is the hardest kind of teacher.
> It gives you a test first and the lesson afterwards
> ----------------------------------------------------------------------------------------

From Josh.Martin@abq.sc.philips.com Mon Jul 8 16:21:05 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Mon Jul 8 15:21:05 2002
Subject: [Expat-discuss] help : how to use the expat-1.95.3
Message-ID: <200207082220.QAA14067@abqn42.abq.sc.philips.com>

Hi,

Okay, as far as I can see you seem to have several problems going on here.

First, expat is NOT a Perl library/module, but a C library. To get it to
work for Perl you'd have to write a wrapper library for it, or use something
like the XML::Parser module (which I know nothing about).

Second, it looks like the copy of gcc which "configure" is trying to run
might be broken, and was probably built for a different machine. Do a
"gcc -v" and "gcc -dumpmachine" and make sure that it is built for the
machine you are using (do "uname -a").

Third, putting "/usr/ccs/bin" in front of your path is causing the make
script to run the C compiler which came with your machine, which is only
built to compile the kernel source, and is not capable of compiling most any
other code correctly (except for bootstrapping-type sources like the gcc
distribution). You have to either buy the full ANSI C language compiler
from HP (which is what that error message is referring to), or get a free
compiler like gcc, which you seem to have.
You might also need to add "CC=gcc" along with the PATH declaration when you
issue the "make" command.

Fourth, you need to run "configure" just once, not three times like you are
doing. You have to state all of the options that you want on that single
execution of "configure".

And lastly, the example piece of code you have written there is neither
correct C code, nor is it correct Perl code.

I'm sorry this probably doesn't help you do what you wanted to do, and I
hope this information is clear enough, but I know I don't speak your
language, and you still seem to be learning mine.

Good luck.

- Josh Martin

> I have installed expat-1.95.3 on my SunOS 5.8 but the
> path for expat-1.95.3/lib and expat-1.95.3/bin is not
> set. Whenever I try to run my parser Perl program, it is
> not identifying the parser methods at all.
>
> These are the steps I have followed.
> 1) downloaded expat-1.95.3 from sourceforge.net
> 2) put it into my home directory (/misc/home/john/)
> 3) gunzip and tar the .gz file (it created a file
>    expat-1.95.3)
> 4) ./configure -PREFIX=/misc/home/john/expat-1.95.3
> 5) ./configure CPPFLAGS=-DXML_UNICODE
> 6) ./configure CFLAGS="-g -O2 -fshort-wchar" PPFLAGS=-DXML_UNICODE_WCHAR_T
>    It's throwing an error (./configure
>    CFLAGS="-g -O2 -fshort-wchar" PPFLAGS=-DXML_UNICO>
>    checking build system type... sparc-sun-solaris2.8
>    checking host system type... sparc-sun-solaris2.8
>    checking for gcc... gcc
>    checking for C compiler default output... configure:
>    error: C compiler cannot create executables)
>
> 7) make buildlib
> 8) make installlib
> 9) PATH=/usr/ccs/bin:$PATH make
>    throwing error(
>    cc -g -O2 -Wall -Wmissing-prototypes
>    -Wstrict-prototypes -fexceptions -DXML_UNICODE -Ilib
>    -I.
-o xmlwf/xmlwf.o -c xmlwf/xmlwf.c
>    /usr/ucb/cc: language optional software package not
>    installed
>    *** Error code 1
>    make: Fatal error: Command failed for target
>    `xmlwf/xmlwf.o'
>    )
> 10) LD_LIBRARY_PATH = {$LD_LIBRARY_PATH
>     }:/misc/home/john/expat-1.95.3
>
> After this, if I run any Perl file having a lib function
> like
>
> XML_Parser
> XML_LChar *
> XML_ExpatVersion();
>
> it throws an error
> (
> syntax error at test.pl line 3, near "*
> XML_ExpatVersion("
> Execution of test.pl aborted due to compilation
> errors.
> )
>
> Please advise what I need to do to get this to work.
>
>
> govind

From fdrake@acm.org Wed Jul 10 09:32:01 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed Jul 10 08:32:01 2002
Subject: [Expat-discuss] Changing Windows DLL exports
Message-ID: <15660.21302.830843.574306@grendel.zope.com>

For the Windows DLLs, I'm planning to change the way we export symbols
from using __declspec(dllexport) to using a .DEF file. The purpose of
this change is to support drop-in compatibility of the DLL; see SF
feature request #579144:

http://sourceforge.net/tracker/index.php?func=detail&aid=579144&group_id=10127&atid=110127

The question I have is this: Is anyone interested in using the API
defined in xmltok.h directly? Recent DLLs have not been exporting
those symbols, but the original DLLs from James Clark apparently did
(from xmltok.dll). If these symbols are needed from the DLL, I'll
need to accommodate them in the .DEF file.
I'm planning to make this change in Expat 1.95.5.

Please comment on this in the SourceForge bug tracker if possible so
that the discussion of this issue is all together.

Thanks!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Wed Jul 10 12:29:51 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed Jul 10 11:29:51 2002
Subject: [Expat-discuss] Mac OS X expertise needed
Message-ID: <15660.31968.904008.572218@grendel.zope.com>

Does anyone reading this list develop on Mac OS X?

I just tried building & testing the CVS version of Expat on the Mac OS
X 10.1 box on the SourceForge compile farm, and ran into what appears
to be a libtool glitch (using libtool 1.4.2):

The build proceeds just fine, but attempting to run the test suite
generates the following error:

tests/runtests: error: ~/src/expat-1.95.4/tests
/home/users/f/fd/fdrake/src/expat-1.95.4/tests/.libs/runtests does not exist
This script is just a wrapper for runtests.
See the libtool documentation for more information.

Installing the Expat library and then running the test suite (using
"make check") works fine, but we should be able to run the tests
without installing Expat.

Can anyone help figure out what's going on? Thanks!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Wed Jul 10 12:38:02 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed Jul 10 11:38:02 2002
Subject: [Expat-discuss] Mac OS X expertise needed
In-Reply-To: <15660.31968.904008.572218@grendel.zope.com>
References: <15660.31968.904008.572218@grendel.zope.com>
Message-ID: <15660.32467.240929.331958@grendel.zope.com>

Fred L. Drake, Jr.
writes:
> I just tried building & testing the CVS version of Expat on the Mac OS
> X 10.1 box on the SourceForge compile farm, and ran into what appears
> to be a libtool glitch (using libtool 1.4.2):
>
> The build proceeds just fine, but attempting to run the test suite
> generates the following error:
>
> tests/runtests: error: ~/src/expat-1.95.4/tests
> /home/users/f/fd/fdrake/src/expat-1.95.4/tests/.libs/runtests does not exist
> This script is just a wrapper for runtests.
> See the libtool documentation for more information.
>
> Installing the Expat library and then running the test suite (using
> "make check") works fine, but we should be able to run the tests
> without installing Expat.

Hmm. I just ran into the same thing with Solaris 2.8. I'm starting
to really dislike libtool. ;-(

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From patrick@meer.net Wed Jul 10 12:53:03 2002
From: patrick@meer.net (Patrick McCormick)
Date: Wed Jul 10 11:53:03 2002
Subject: [Expat-discuss] Mac OS X expertise needed
References: <15660.31968.904008.572218@grendel.zope.com>
Message-ID: <01c101c22842$e597ab30$a29d9dd1@patrick01>

Some apache developers noted libtool problems this morning.

http://groups.yahoo.com/group/new-httpd/message/37841
http://groups.yahoo.com/group/new-httpd/message/37854
http://groups.yahoo.com/group/new-httpd/message/37863

I'm not sure if it's related to what you are seeing or not. The middle
message says to use the libtool at http://www.apache.org/~pier/macosx/.

patrick

----- Original Message -----
From: "Fred L. Drake, Jr."
To:
Sent: Wednesday, July 10, 2002 11:28 AM
Subject: [Expat-discuss] Mac OS X expertise needed

>
> Does anyone reading this list develop on Mac OS X?
>
> I just tried building & testing the CVS version of Expat on the Mac OS
> X 10.1 box on the SourceForge compile farm, and ran into what appears
> to be a libtool glitch (using libtool 1.4.2):
>
> The build proceeds just fine, but attempting to run the test suite
> generates the following error:
>
> tests/runtests: error: ~/src/expat-1.95.4/tests
> /home/users/f/fd/fdrake/src/expat-1.95.4/tests/.libs/runtests does not exist
> This script is just a wrapper for runtests.
> See the libtool documentation for more information.
>
> Installing the Expat library and then running the test suite (using
> "make check") works fine, but we should be able to run the tests
> without installing Expat.
>
> Can anyone help figure out what's going on? Thanks!
>
>
> -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Zope Corporation

From C-Chu@TTIMAIL.TAMU.EDU Wed Jul 10 15:26:06 2002
From: C-Chu@TTIMAIL.TAMU.EDU (Chu, Chi-Leung)
Date: Wed Jul 10 14:26:06 2002
Subject: [Expat-discuss] Expat in Borland C++ Builder
Message-ID:

I am using Borland C++ Builder 5.0 and am new to expat. I tried to include
expat.h in a toy project, but I get the following error when compiling:
"Linkage specification not allowed", which refers to

    #ifdef __cplusplus
    extern "C" {
    #endif

    #ifdef XML_UNICODE_WCHAR_T
    #define XML_UNICODE
    #endif

Does anyone know what I need to do to make it work? Or does expat.h work
under Borland C++ Builder 5.0?

Thanks a lot.

Chu
c-chu@ttimail.tamu.edu

From fdrake@acm.org Thu Jul 11 00:08:03 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed Jul 10 23:08:03 2002
Subject: [Expat-discuss] Expat in Borland C++ Builder
In-Reply-To:
References:
Message-ID: <15661.8337.692024.599649@grendel.zope.com>

Chu, Chi-Leung writes:
> Does anyone know what I need to do to make it work? Or does expat.h works
> under Borland C++ Builder 5.0?

The code itself should be portable enough, but makefiles/project files
for the Borland compilers are not included. I hope to add the
necessary files and instructions in the next release of Expat, due to
be released at the end of this week.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From F.J.Franklin@sheffield.ac.uk Thu Jul 11 02:21:02 2002
From: F.J.Franklin@sheffield.ac.uk (F J Franklin)
Date: Thu Jul 11 01:21:02 2002
Subject: [Expat-discuss] Mac OS X expertise needed
In-Reply-To: <01c101c22842$e597ab30$a29d9dd1@patrick01>
Message-ID:

On Wed, 10 Jul 2002, Patrick McCormick wrote:

> I'm not sure if it's related to what you are seeing or not. The middle
> message says to use the libtool at http://www.apache.org/~pier/macosx/.

Trying to use libtool on MacOS X is fun; I've tried to fix it a few times,
with some (though incomplete) success. fink has a custom-patched libtool
which may work for expat (I don't know) and maybe my CrazyWormhole build
(http://www.crazy-wormhole.com/) fixes it? I don't know anything about
apache's solution.

I'm planning to take another shot at libtool this weekend, so I'll try
with CVS expat while I'm at it...

Frank

Francis James Franklin
F.J.Franklin@shef.ac.uk

"No, she really likes me.
She told me I look like Britney Spears, and why would you say that to
somebody you don't like?"
--- Elle Woods

From fdrake@acm.org Fri Jul 12 13:00:20 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri Jul 12 12:00:20 2002
Subject: [Expat-discuss] Expat 1.95.4 released!
Message-ID: <15663.9963.543974.827015@grendel.zope.com>

As I just wrote in a news item for Expat:

"It's an Expat first, just five weeks in the making!"

Due to a painful bug in Expat 1.95.3, we've turned our traditional
release cycle around and provided a working Expat in just five weeks
(compare that to our previous release cycle of about 11 months).
Expat 1.95.4, now available for download, fixes a bug that rendered
the 1.95.3 release unusable for some, as well as a variety of obscure
bugs. The build process should be more stable on more platforms.
Support has been added for Borland C++ Builder 5 and the free BCC 5.5
compiler, Mac OS classic, and VMS.

See the change log in the release for more detailed information:

http://sourceforge.net/project/shownotes.php?release_id=99244

I'll build and post the Windows installer when I get back to my
Windows machine tonight.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From weida.shui@univ.ox.ac.uk Sat Jul 13 05:00:03 2002
From: weida.shui@univ.ox.ac.uk (Vidar)
Date: Sat Jul 13 04:00:03 2002
Subject: [Expat-discuss] Can expat check for well-formedness and validation for XML?
Message-ID: <20020713115554.1AFE.WEIDA.SHUI@univ.ox.ac.uk>

As title

--
Best regards,
Vidar
mailto:weida.shui@univ.ox.ac.uk

From fdrake@acm.org Sat Jul 13 06:25:01 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Sat Jul 13 05:25:01 2002
Subject: [Expat-discuss] Can expat check for well-formedness and validation for XML?
In-Reply-To: <20020713115554.1AFE.WEIDA.SHUI@univ.ox.ac.uk>
References: <20020713115554.1AFE.WEIDA.SHUI@univ.ox.ac.uk>
Message-ID: <15664.7131.246755.412120@grendel.zope.com>

Expat checks for well-formedness, but is not a validating parser.
There are no plans to make it a validating parser.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From karl@waclawek.net Sat Jul 13 06:27:02 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Sat Jul 13 05:27:02 2002
Subject: [Expat-discuss] Can expat check for well-formedness and validation for XML?
References: <20020713115554.1AFE.WEIDA.SHUI@univ.ox.ac.uk>
Message-ID: <001001c22a6b$6955e7a0$0207a8c0@karl>

> As title

Expat checks for well-formedness, but does not validate.

Karl

From martap@tango04.net Tue Jul 16 01:56:03 2002
From: martap@tango04.net (Marta Padilla)
Date: Tue Jul 16 00:56:03 2002
Subject: [Expat-discuss] Compiling expat with Borland C++ Builder
Message-ID:

Hi,
I'm new to Expat, and so far I have been working with the Microsoft C++ IDE
(so I was using the dsw provided by expat itself).
Now I am trying to compile it with Borland C++ Builder and I'm having
problems. Has anyone done this before?

Concretely:
I'm trying to compile just xmltok, xmlparse and xmlwf. I start by trying to
build the xmltok project, but even though I include the *.c files in the
directory, the project doesn't build: I don't get compilation errors, but
when linking it doesn't find some obj files, namely the ones from the files
that had just been compiled.
Do I need to include other *.c files, or specify dependencies on another
project (xmlparse for example)? If so, how can I do it from Borland C++?

Thanks in advance for any help,

Marta

From karl@waclawek.net Tue Jul 16 07:00:07 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Jul 16 06:00:07 2002
Subject: [Expat-discuss] Compiling expat with Borland C++ Builder
References:
Message-ID: <001e01c22cc7$b5e457b0$9e539696@citkwaclaww2k>

> Hi,
> I'm new with Expat and so far I was working with Microsoft C++ Ide (so I was
> using the dsw provided by expat itself).
> Now I was trying to compile it with Borland C++ Builder and I'm having
> problems. Has anyone done this before?
>
> Concretely:
> I'm trying to compile just xmltok, xmlparse and xmlwf. I start trying to
> build xmltok project, but, even I include *.c in the directory, the project
> doesn't build: I don't get compilation errors but when linking it doesn't
> find some obj: Obj from the files that had just compiled.
> Do I need to include another *.c, or specify dependencies on another project
> (xmlparse for example)? If so, how can I do it from
> Borland C++?

There is a bug in the build files. A patch has been provided.
Just go to expat.sourceforge.net and download the patch.

Karl

From martap@tango04.net Tue Jul 16 10:03:03 2002
From: martap@tango04.net (Marta Padilla)
Date: Tue Jul 16 09:03:03 2002
Subject: [Expat-discuss] Changes in XML_StartElementHandler ?
Message-ID:

I've upgraded to Expat 1.95.4, and when working with code I used with older
versions, the start element handler seems to receive the tag's name not as a
string containing the full name, but as a string containing only the first
character.
For example:

the handler doesn't receive the string "Message" as the tag's name, but only
an "M".

Concretely, the code is:

void my_start_hndl(void *data, const XML_Char *el, const XML_Char **attr)

I've been looking on the web but I haven't found anything.

Thanks in advance!

Marta

From karl@waclawek.net Tue Jul 16 10:11:04 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Jul 16 09:11:04 2002
Subject: [Expat-discuss] Changes in XML_StartElementHandler ?
References:
Message-ID: <00ba01c22ce2$581ecdc0$9e539696@citkwaclaww2k>

> I've upgraded to Expat 1.95.4, and when working with code I used with older
> versions, the start element handler seems to receive tag's name not as a
> string containing full name, but a string containing only the first
> character. For example:
>
>
>
>
> the handler doesn't receive as a tag's name the string "Message", but only a
> "M"
>
> Concretely, the code is:
>
> void my_start_hndl(void *data, const XML_Char *el, const XML_Char **attr)
>
> I've been looking in the web but I haven't found anything.

You have two options: UTF-8 and UTF-16 output.
You are obviously using Expat compiled for UTF-16 (wchar_t).

Karl

From douet@clipper.ens.fr Tue Jul 16 10:12:06 2002
From: douet@clipper.ens.fr (Florian Douetteau)
Date: Tue Jul 16 09:12:06 2002
Subject: [Expat-discuss] Changes in XML_StartElementHandler ?
In-Reply-To:
Message-ID:

On Tue, 16 Jul 2002, Marta Padilla wrote:

Maybe you compiled the Expat library in UTF-16 mode (sizeof(XML_Char) == 2),
whereas your code was compiled against an old UTF-8 mode header
(sizeof(XML_Char) == 1): check whether, in your example, el[2] == 'e'; that
would confirm this hypothesis.

-- Florian

>
> I've upgraded to Expat 1.95.4, and when working with code I used with older
> versions, the start element handler seems to receive tag's name not as a
> string containing full name, but a string containing only the first
> character. For example:
>
>
>
>
> the handler doesn't receive as a tag's name the string "Message", but only a
> "M"
>
> Concretely, the code is:
>
> void my_start_hndl(void *data, const XML_Char *el, const XML_Char **attr)
>
> I've been looking in the web but I haven't found anything.
>
> Thanks in advance!
>
> Marta
>
>
>
>

From binu.subramanian@barconet.com Wed Jul 17 00:28:04 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Tue Jul 16 23:28:04 2002
Subject: [Expat-discuss] Encoding Issues + Writing XML
Message-ID: <6973AC049FFDD41197BB0002A52916C6280732@bninchemex01.barconet.com>

My application was using the Xerces parser earlier to parse XML files.
I use XML to serialise my application data.

My XML files have the following encoding:

When I store XML files, I handle the following characters, convert them to
their numerical entities and write them to the file.

Invalid characters in XML : ' > , < & "

When I encountered some characters like the Euro sign, the trademark symbol,
etc., I found that these were not displayed correctly in IE 6.0.
So I converted them to their numerical entities and stored them in the XML
file. These characters were parsed correctly by the Xerces parser on loading
the file.

I want to switch to the expat parser and use the SAX API provided by the
SAX in Cpp wrapper written by Jez.
But on testing with it, I found that the numerical entities were converted
into the characters they represent.
For example: the Euro sign with the numerical entity €

Is this because of an encoding problem? How can I solve this?
I want my application to serialise the data in XML and I want it to be
loaded the same way on any system in the world.

Any suggestions?

Is there any way I can write XML files using expat?
regards,
Binu

From C-Chu@TTIMAIL.TAMU.EDU Wed Jul 17 09:14:02 2002
From: C-Chu@TTIMAIL.TAMU.EDU (Chu, Chi-Leung)
Date: Wed Jul 17 08:14:02 2002
Subject: [Expat-discuss] Question in Using expat in Borland Environment
Message-ID:

I am using Borland C++ Builder. I downloaded the latest version of expat,
opened the ExpatGroup.bpg project and "make" it.

Then, in my own project, I included "expat.h" and compiled my project.
There is no error message. However, when I run it, I got a linker error:
Unresolved external '_XML_ParserCreate' reference from E:\MyTest\Test.obj.
Do I need to include other files as well? Or do I miss something when
building my project? Thanks.

Chu
c-chu@ttimail.tamu.edu

From martap@tango04.net Wed Jul 17 09:18:02 2002
From: martap@tango04.net (Marta Padilla)
Date: Wed Jul 17 08:18:02 2002
Subject: [Expat-discuss] Question in Using expat in Borland Environment
In-Reply-To:
Message-ID:

There's an error when building the files in Borland C++ Builder. Go to the
web site expat.sourceforge.net and see the "patches" section; there's a
patch for that.

Marta

-----Original Message-----
From: expat-discuss-admin@lists.sourceforge.net
[mailto:expat-discuss-admin@lists.sourceforge.net] On behalf of Chu,
Chi-Leung
Sent: Wednesday, 17 July 2002 17:10
To: 'expat-discuss@lists.sourceforge.net'
Cc: Chu, Chi-Leung
Subject: [Expat-discuss] Question in Using expat in Borland Environment

I am using Borland C++ Builder.
I downloaded the latest version of expat,
opened the ExpatGroup.bpg project and "make" it.

Then, in my own project, I included "expat.h" and compiled my project.
There is no error message. However, when I run it, I got a linker error:
Unresolved external '_XML_ParserCreate' reference from E:\MyTest\Test.obj.
Do I need to include other files as well? Or do I miss something when
building my project? Thanks.

Chu
c-chu@ttimail.tamu.edu

From C-Chu@TTIMAIL.TAMU.EDU Wed Jul 17 15:40:04 2002
From: C-Chu@TTIMAIL.TAMU.EDU (Chu, Chi-Leung)
Date: Wed Jul 17 14:40:04 2002
Subject: [Expat-discuss] Question in Using expat in Borland Environment
Message-ID:

I am using Borland C++ Builder. I downloaded the latest version of expat,
as well as the patch, opened the ExpatGroup.bpg project and "make" it. And
I have no problem with this.

Then, in my own project, I included "expat.h" and compiled my project.
There is no error message. However, when I run it, I got a linker error:
Unresolved external '_XML_ParserCreate' reference from E:\MyTest\Test.obj.
Do I need to include other files as well? Or do I miss something when
building my project? Thanks.

Chu
c-chu@ttimail.tamu.edu

From binu.subramanian@barconet.com Wed Jul 17 22:13:02 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Wed Jul 17 21:13:02 2002
Subject: [Expat-discuss] Expat : namespaces, XInclude, XPath ,XBase and Schemas
Message-ID: <6973AC049FFDD41197BB0002A52916C6280735@bninchemex01.barconet.com>

Hello,

Does the expat parser support namespaces, XInclude, XPath, XBase and
Schemas?

regards,
Binu

From fdrake@acm.org Thu Jul 18 12:30:04 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu Jul 18 11:30:04 2002
Subject: [Expat-discuss] Expat : namespaces, XInclude, XPath ,XBase and Schemas
In-Reply-To: <6973AC049FFDD41197BB0002A52916C6280735@bninchemex01.barconet.com>
References: <6973AC049FFDD41197BB0002A52916C6280735@bninchemex01.barconet.com>
Message-ID: <15671.2322.654901.596218@grendel.zope.com>

Subramanian, Binu writes:
 > Does the expat parser support namespaces, XInclude, XPath, XBase and
 > Schemas?

Expat supports namespaces directly. Everything else needs to be built
on top of it.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From martap@tango04.net Fri Jul 19 02:39:05 2002
From: martap@tango04.net (Marta Padilla)
Date: Fri Jul 19 01:39:05 2002
Subject: [Expat-discuss] Expat on OS400
Message-ID:

Hi,

I'm trying to compile expat on OS/400. I noticed there's a winconfig.h as
well as a unixconfig.h, and expat tries to find an expat_config.h if you're
not on one of those platforms. Does anybody know where I can find this file
(or do I have to create it myself)?

Thanks,
Marta

From sbadenho@kingsley.co.za Fri Jul 19 21:37:02 2002
From: sbadenho@kingsley.co.za (Louw Badenhorst)
Date: Fri Jul 19 20:37:02 2002
Subject: [Expat-discuss] Character conversion from Ansi and Latin 1 to UTF-8
Message-ID: <000e01c22f9e$9064ab50$6b7219c4@solarorange>

Hi,

Could anyone please refer me to the function(s) in Expat that actually
convert from the Latin1 and Ansi character sets to UTF-8? I need to
convert documents to UTF-8 and XML and would just like to see code
examples of that kind of conversion to ensure that my own understanding
is correct. Help will be appreciated.

Louw

From karl@waclawek.net Sat Jul 20 07:17:03 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Sat Jul 20 06:17:03 2002
Subject: [Expat-discuss] Character conversion from Ansi and Latin 1 to UTF-8
References: <000e01c22f9e$9064ab50$6b7219c4@solarorange>
Message-ID: <012901c22ff2$a70be180$0207a8c0@karl>

> Could anyone please refer me to the function(s) in Expat that actually converts
> from the Latin1 and Ansi character set to UTF-8? I need to convert documents
> to UTF-8 and XML and would just like to see code examples of that kind of conversion
> to ensure that my own understanding is correct. Help will be appreciated.

There are no such conversion functions.
This simply is not Expat's job, unfortunately.
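
For anyone who wants to check their understanding of that kind of conversion, here is a minimal sketch in C. It is not Expat code: `latin1_to_utf8` is a hypothetical helper name, and it assumes the input is strictly ISO-8859-1 (Latin-1). Windows "Ansi" (code page 1252) differs from Latin-1 in the range 0x80-0x9F and would need an extra lookup table that is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Expat.
   Converts len bytes of ISO-8859-1 text to UTF-8.
   out must have room for 2 * len bytes (worst case).
   Returns the number of bytes written to out. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range: one byte, unchanged */
            out[n++] = c;
        } else {
            /* 0x80..0xFF: two bytes, 110xxxxx 10xxxxxx */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

Because every Latin-1 byte maps directly to the Unicode code point with the same value, no table is needed; a caller only has to size the output buffer at twice the input length.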
For the creation of XML, an XML editor might be appropriate,
which will implicitly generate UTF-8 or UTF-16.
For machine-generation, an XML writer library would be needed.
There are probably several of them available, which you can
likely find with a Google search.

Louw

From binu.subramanian@barconet.com Tue Jul 23 04:44:02 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Tue Jul 23 03:44:02 2002
Subject: [Expat-discuss] Expat as a static library in Win32
Message-ID: <6973AC049FFDD41197BB0002A52916C628074D@bninchemex01.barconet.com>

Hello,

I want to use expat as a static library. I tried some of the suggestions
given on the mailing list...ie
replacing the lines in xmlparse.cpp and compiling with the _LIB option
instead of the _USRDLL option.

The lib compiled successfully but when i tried to use it in an application,
i get the following errors:
warning LNK4003: invalid library format; library ignored

Has anyone successfully converted expat to a static library and used it in
an application?
Any suggestion/help will be very useful.

kr,
Binu

From fdrake@acm.org Tue Jul 23 05:49:03 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Jul 23 04:49:03 2002
Subject: [Expat-discuss] Expat as a static library in Win32
In-Reply-To: <6973AC049FFDD41197BB0002A52916C628074D@bninchemex01.barconet.com>
References: <6973AC049FFDD41197BB0002A52916C628074D@bninchemex01.barconet.com>
Message-ID: <15677.17041.241124.714829@grendel.zope.com>

Subramanian, Binu writes:
 > I want to use expat as a static library. I tried some of the suggestions
 > given on the mailing list...ie
 > replacing the lines in xmlparse.cpp and compiling with the _LIB option
 > instead of the _USRDLL option.

There is no xmlparse.cpp in the current version of Expat; it is
comprised entirely of C code, not C++.

Have you tried a recent version? Static libraries are now supported.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From karl@waclawek.net Tue Jul 23 07:06:06 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Jul 23 06:06:06 2002
Subject: [Expat-discuss] Re: [xml-dev] version numbers and infosets
References: <200207230943.KAA20828@mcpherson.cogsci.ed.ac.uk>
Message-ID: <001f01c23249$929365d0$9e539696@citkwaclaww2k>

> Yes. But I have since been persuaded that it would be a mistake to
> require 1.0 processors to reject 1.1-labelled documents that are
> otherwise well-formed XML 1.0.

Great. That allows us to keep Expat the same,
but still allow the application to raise an error,
if the context requires it.

> I think the best compromise is to say that documents labelled with a
> version other than 1.0 are not well-formed XML 1.0 documents, but that
> an XML 1.0 parser may accept them. This breaks no documents and no
> implementations, but it allows us to unambiguously state the version
> of a document. Can anyone see any problems with it?

Maybe, but then the meaning of "Well-formedness" becomes ambiguous,
if a parser is allowed to accept such violations in specific circumstances.
Just leave it as it is - a problem of the application domain.

Karl

From fdrake@acm.org Tue Jul 23 11:46:05 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Jul 23 10:46:05 2002
Subject: [Expat-discuss] Line endings and the default handlers
Message-ID: <15677.38446.79059.513060@grendel.zope.com>

I don't know whether anyone else has noticed, but the handlers set by
XML_SetDefaultHandler() and XML_SetDefaultHandlerExpand() get text for
which line endings have not been normalized according to the XML 1.0
spec.

Has this been a problem for anyone? Would it become a problem if this
were fixed?

I've filed a bug report, but am unsure as to what to do about the
situation:

http://sourceforge.net/tracker/index.php?func=detail&aid=585521&group_id=10127&atid=110127

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From karl@waclawek.net Tue Jul 23 12:50:03 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Jul 23 11:50:03 2002
Subject: [Expat-discuss] Line endings and the default handlers
References: <15677.38446.79059.513060@grendel.zope.com>
Message-ID: <002b01c23279$9d2d52a0$9e539696@citkwaclaww2k>

>
> I don't know whether anyone else has noticed, but the handlers set by
> XML_SetDefaultHandler() and XML_SetDefaultHandlerExpand() get text for
> which line endings have not been normalized according to the XML 1.0
> spec.
>
> Has this been a problem for anyone? Would it become a problem if this
> were fixed?

What should happen in the case of a CDATA section?
Are line breaks supposed to be normalized in CDATA sections?
I thought one should not touch those data?

From karl@waclawek.net Tue Jul 23 15:29:12 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Jul 23 14:29:12 2002
Subject: [Expat-discuss] Line endings and the default handlers
References: <15677.38446.79059.513060@grendel.zope.com> <002b01c23279$9d2d52a0$9e539696@citkwaclaww2k>
Message-ID: <008c01c2328f$dd48c0c0$9e539696@citkwaclaww2k>

> > I don't know whether anyone else has noticed, but the handlers set by
> > XML_SetDefaultHandler() and XML_SetDefaultHandlerExpand() get text for
> > which line endings have not been normalized according to the XML 1.0
> > spec.
> >
> > Has this been a problem for anyone? Would it become a problem if this
> > were fixed?
>
> What should happen in the case of a CDATA section?
> Are line breaks supposed to be normalized in CDATA sections?
> I thought one should not touch those data?
>
> From the implementation it looks as if the default handler is intentionally
> set up to report the raw data (except for the encoding).

I read the XML spec again, and it seems that absolutely every
line break, no matter where, has to be normalized.

   the characters passed to an application by the XML processor
   must be as if the XML processor normalized all line breaks in
   external parsed entities (including the document entity) on
   input, before parsing

Which means that Fred is right, and this is a bug.

Karl

From fdrake@acm.org Tue Jul 23 15:39:02 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Jul 23 14:39:02 2002
Subject: [Expat-discuss] Line endings and the default handlers
In-Reply-To: <008c01c2328f$dd48c0c0$9e539696@citkwaclaww2k>
References: <15677.38446.79059.513060@grendel.zope.com> <002b01c23279$9d2d52a0$9e539696@citkwaclaww2k> <008c01c2328f$dd48c0c0$9e539696@citkwaclaww2k>
Message-ID: <15677.52423.62809.58430@grendel.zope.com>

Karl Waclawek writes:
 > I read the XML spec again, and it seems that absolutely every
 > line break, no matter where, has to be normalized.

Thanks! You saved me a cut-n-paste job. ;-) Seriously, it's good to
know I'm not just misreading the spec myself.

I'm more interested in whether anyone thinks fixing this bug will
introduce problems for their applications than in determining whether it
really is a bug.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Tue Jul 23 15:48:03 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue Jul 23 14:48:03 2002
Subject: [Expat-discuss] Line endings and the default handlers
In-Reply-To: <008c01c2328f$dd48c0c0$9e539696@citkwaclaww2k>
References: <15677.38446.79059.513060@grendel.zope.com> <002b01c23279$9d2d52a0$9e539696@citkwaclaww2k> <008c01c2328f$dd48c0c0$9e539696@citkwaclaww2k>
Message-ID: <15677.52967.203107.65950@grendel.zope.com>

Karl Waclawek writes:
 > I read the XML spec again, and it seems that absolutely every
 > line break, no matter where, has to be normalized.

I've sent an email to the Python XML-SIG to see if anyone there thinks
fixing this will be problematic:

http://mail.python.org/pipermail/xml-sig/2002-July/008105.html

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From binu.subramanian@barconet.com Tue Jul 23 21:29:02 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Tue Jul 23 20:29:02 2002
Subject: [Expat-discuss] Expat as a static library in Win32
Message-ID: <6973AC049FFDD41197BB0002A52916C628074F@bninchemex01.barconet.com>

Hello Fred,

Thank you for the prompt response.
I am sorry, i meant xmlparse.c

Yes, i have downloaded Expat 1.95.4 for Windows (July 12, 2002 release).
It still compiles to give a dll. How do i convert it to a static lib?
As i mentioned, i changed the project options from _USRDLL to _LIB and
replaced the

    #define XMLPARSEAPI(type) __declspec(dllexport) type __cdecl

in xmlparse.c with

    #ifdef _USRDLL
    #define XMLPARSEAPI(type) __declspec(dllexport) type __cdecl
    #else
    #define XMLPARSEAPI(type) type __cdecl
    #endif

I still get the same error:
warning LNK4003: invalid library format; library ignored

Any ideas/suggestions will be very useful.

Binu

-----Original Message-----
From: Fred L. Drake, Jr. [mailto:fdrake@acm.org]
Sent: 23 July 2002 17:19
To: Subramanian, Binu
Cc: expat-discuss@lists.sourceforge.net
Subject: Re: [Expat-discuss] Expat as a static library in Win32

Subramanian, Binu writes:
 > I want to use expat as a static library. I tried some of the suggestions
 > given on the mailing list...ie
 > replacing the lines in xmlparse.cpp and compiling with the _LIB option
 > instead of the _USRDLL option.

There is no xmlparse.cpp in the current version of Expat; it is
comprised entirely of C code, not C++.

Have you tried a recent version? Static libraries are now supported.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From karl@waclawek.net Tue Jul 23 21:46:01 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Tue Jul 23 20:46:01 2002
Subject: [Expat-discuss] Line endings and the default handlers
References: <15677.38446.79059.513060@grendel.zope.com> <002b01c23279$9d2d52a0$9e539696@citkwaclaww2k> <008c01c2328f$dd48c0c0$9e539696@citkwaclaww2k> <15677.52967.203107.65950@grendel.zope.com>
Message-ID: <001601c232c7$7a394710$0207a8c0@karl>

> Karl Waclawek writes:
> > I read the XML spec again, and it seems that absolutely every
> > line break, no matter where, has to be normalized.
>
> I've sent an email to the Python XML-SIG to see if anyone there thinks
> fixing this will be problematic:
>
> http://mail.python.org/pipermail/xml-sig/2002-July/008105.html
>

I also looked at the code again. I am almost sure that it was intentional
that there is no normalization for the default handler.

Actually, I just dug out this comment for the default handler in expat.h,
which explains it all:

/* This is called for any characters in the XML document for which
   there is no applicable handler. This includes both characters that
   are part of markup which is of a kind that is not reported
   (comments, markup declarations), or characters that are part of a
   construct which could be reported but for which no handler has been
   supplied. The characters are passed exactly as they were in the XML
   document except that they will be encoded in UTF-8. Line
   boundaries are not normalized. Note that a byte order mark
   character is not passed to the default handler. There are no
   guarantees about how characters are divided between calls to the
   default handler: for example, a comment might be split between
   multiple calls.
*/

So, it would be nice to know what the intention was.
Maybe to enable round-tripping? First you process the
regular handler, then you call XML_DefaultCurrent from
within that handler, to get the exact data from the input
document. Just a guess.
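
To make the normalization under discussion concrete: XML 1.0 (section 2.11) asks that CR LF pairs and lone CRs each be passed on as a single LF. The following is a minimal sketch, assuming an application wants to apply that rule itself to raw text it receives from the default handler. `normalize_line_endings` is a hypothetical helper, not part of Expat's API; the carry flag is there because, as the comment in expat.h says, there are no guarantees about how characters are divided between calls, so a CR LF pair could straddle two chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Expat.
   Rewrites buf in place so that CR LF and lone CR both become LF.
   *last_was_cr carries state between successive chunks; initialise
   it to 0 before the first call. Returns the new length of buf. */
size_t normalize_line_endings(char *buf, size_t len, int *last_was_cr)
{
    size_t in, out = 0;

    assert(buf != NULL && last_was_cr != NULL);
    for (in = 0; in < len; in++) {
        char c = buf[in];
        if (c == '\r') {
            /* CR always yields one LF */
            buf[out++] = '\n';
            *last_was_cr = 1;
        } else if (c == '\n' && *last_was_cr) {
            /* LF directly after CR: the LF was already emitted */
            *last_was_cr = 0;
        } else {
            buf[out++] = c;
            *last_was_cr = 0;
        }
    }
    return out;
}
```

Doing this on the application side keeps the raw, round-trippable data available from the parser while still letting a consumer see normalized text when it wants it.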
Karl

From binu.subramanian@barconet.com Tue Jul 23 22:07:01 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Tue Jul 23 21:07:01 2002
Subject: [Expat-discuss] Encoding issues in expat
Message-ID: <6973AC049FFDD41197BB0002A52916C6280751@bninchemex01.barconet.com>

Hello,

I am writing an XML file and using entities for special characters like the
Euro and the trademark characters.
My XML file uses the UTF-8 encoding.

This XML file is viewed correctly in IE 6.0 and when parsed by libxml and
the Xerces XML parsers, the character conversion is correct.
ie The Euro sign replaced with the numerical entity &#8364; in the XML file
is correctly translated back to Euro by the libxml and the Xerces XML
parsers.

What should i do to get the expat parser to translate the numerical entity
&#8364; back to the Euro character.
Am i missing something here?
Thanks.
Binu

From binu.subramanian@barconet.com Wed Jul 24 05:46:03 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Wed Jul 24 04:46:03 2002
Subject: [Expat-discuss] Fw: Extra character inserted in CharacterData Handler?
Message-ID: <6973AC049FFDD41197BB0002A52916C6280754@bninchemex01.barconet.com>

Hello,

I am facing exactly the same problem. In my case the characters are the
Euro, trademark, etc.
When i write the xml file, i replace the Euro character with its numerical
entity &#8364;
I have specified the encoding for my XML file as UTF-8.

Now when the expat parser parses the file, it appends the Â character. so it
is Â followed by the Euro character.
What should i do to get rid of the extra character?
Am i missing something here?
Binu

>
> The "." character in your file - 0xB7 - is invalid UTF-8.
> Maybe it is valid ISO-8859-1?
> In that case you must add an XML declaration.
>
> Actually, 1.95.3 should reject it (and it does so on my system).

Rolf Ade just pointed out to me that I didn't read your code.
You passed the ISO-8859-1 encoding to the parser, so there
was no error on your side.

However, what you reported looks exactly like what a word processor
would show you when it expects ISO-8859-1, but gets UTF-8 (tested with
Wordpad).
Now, this would be a correct result, since Expat only passes UTF-8
or UTF-16 to its handlers, no matter what the input.

Karl

From karl@waclawek.net Wed Jul 24 07:12:04 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Wed Jul 24 06:12:04 2002
Subject: [Expat-discuss] Encoding issues in expat
References: <6973AC049FFDD41197BB0002A52916C6280751@bninchemex01.barconet.com>
Message-ID: <001001c23313$94bb4010$9e539696@citkwaclaww2k>

> I am writing an XML file and using entities for special characters like the
> Euro and the trademark characters.
> My XML file uses the UTF-8 encoding.
>
> This XML file is viewed correctly in IE 6.0 and when parsed by libxml and
> the Xerces XML parsers, the character conversion is correct.
> ie The Euro sign replaced with the numerical entity &#8364; in the XML file
> is correctly translated back to Euro by the libxml and the Xerces XML
> parsers.
>
> What should i do to get the expat parser to translate the numerical entity
> &#8364; back to the Euro character.

It works fine for me. What document are you using?
What version of Expat are you using?
How is it compiled, for UTF-8 or UTF-16 output?

Karl

From karl@waclawek.net Wed Jul 24 07:23:04 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Wed Jul 24 06:23:04 2002
Subject: [Expat-discuss] Line endings and the default handlers
References: <15677.38446.79059.513060@grendel.zope.com> <002b01c23279$9d2d52a0$9e539696@citkwaclaww2k> <008c01c2328f$dd48c0c0$9e539696@citkwaclaww2k>
Message-ID: <002801c23315$298a6260$9e539696@citkwaclaww2k>

> So, it would be nice to know what the intention was.
> Maybe to enable round-tripping? First you process the
> regular handler, then you call XML_DefaultCurrent from
> within that handler, to get the exact data from the input
> document. Just a guess.

Sorry, but I am changing my opinion again
(as often as my shirts, it seems).

Yes, the XML 1.0 specs state that all characters reported to
the application must have normalized linebreaks.

However, I do not take this as a rule that forbids us to have
a feature that gives access to the raw data. It seems pretty clear
that the default handler was meant for that.

It even allows double-reporting of the same data, once through
a regular handler, and then a second time using the XML_DefaultCurrent
function, which calls the default handler. This suggests that
it is not meant to be a regular handler for reporting the XML InfoSet.

And all applications that use this feature of having access to
the raw data would be broken with the proposed fix.

So, my vote goes for leaving it as it is, but fixing the
documentation in reference.html to be in sync with
the comments in expat.h.

I hope this is the last time I changed my opinion.

Karl

From martap@tango04.net Wed Jul 24 11:30:05 2002
From: martap@tango04.net (Marta Padilla)
Date: Wed Jul 24 10:30:05 2002
Subject: [Expat-discuss] Expat on AS400. Help needed !
Message-ID:

Hi,

I'm using Expat on OS400. I already have the program XMLWF (shipped with
Expat) compiled, linked and ready for running but
when I execute it I get the following error:

In file XMLPARSE, inside function poolGrow and when executing following
sentence,

tem = pool->mem->malloc_fcn(offsetof(BLOCK, s) + blockSize * sizeof(XML_Char));

the program fails, because pool has all its values set to NULL. That is, not
initialized. I don't have the
same error in Windows.

Has anyone worked with Expat on OS400 ??? Any ideas why pool doesn't
initialize ?

Thanks in advance!

Marta

From Josh.Martin@abq.sc.philips.com Wed Jul 24 15:58:06 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Wed Jul 24 14:58:06 2002
Subject: [Expat-discuss] Encoding Issues + Writing XML
Message-ID: <200207242158.PAA02354@abqn42.abq.sc.philips.com>

Please correct me if I'm wrong, but you might be able to turn that off by
passing XML_PARAM_ENTITY_PARSING_NEVER to the XML_SetParamEntityParsing()
function. But then again, it's been about a year since I've done anything
with XML, and the &#nnn type entity probably isn't considered a parameter
entity.

- Josh Martin

> My application was using the Xerces parser earlier to parse xml files. I use
> XML to serialise my application data.
> My XML files have the following encoding:
>
>
>
> When i store XML files, i handle the following characters and convert them
> to their numerical entities and write them to the file.
> Invalid characters in XML : '>,<&"
>
> When i encountered some characters like the Euro sign, the trademark symbol,
> etc...i found that these were not viewed correctly in IE 6.0
> So i converted them to their numerical entities and stored them in the XML
> file. These characters were parsed correctly by the Xerces parser on loading
> the file.
>
> I want to switch to the expat parser and use the SAX API provided by the SAX
> in Cpp wrapper written by Jez
>
> But on testing with it, i found that the numerical entities were converted
> into the characters they represented.
> For example: the Euro sign with the numerical entity &#8364;
>
> Is this because of any encoding problem. How can i solve this? I want my
> application to serialise the data in XML and i want it to be loaded the same
> way across any system in the world.
> Any suggestions?
>
> Is there any way i can write XML files using the expat?
>
> regards,
> Binu
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/expat-discuss

From Josh.Martin@abq.sc.philips.com Wed Jul 24 16:03:02 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Wed Jul 24 15:03:02 2002
Subject: [Expat-discuss] Question in Using expat in Borland Environment
Message-ID: <200207242202.QAA02363@abqn42.abq.sc.philips.com>

You have to also include the expat library or dll that you just built with
borland into your other project.

- Josh Martin

> I am using Borland C++ Builder. I downloaded the latest version of expat,
> as well as the patch, opened the ExpatGroup.bpg project and "make" it. And
> I have no problem with this.
>
> Then, in my own project, I included "expat.h" and compiled my project.
> There is no error message. However, when I run it, I got a linker error:
> Unresolved external '_XML_ParserCreate' reference from E:\MyTest\Test.obj.
> Do I need to include other files as well? Or do I miss something when
> building my project? Thanks.
>
> Chu
> c-chu@ttimail.tamu.edu
>

From Josh.Martin@abq.sc.philips.com Wed Jul 24 16:22:03 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Wed Jul 24 15:22:03 2002
Subject: [Expat-discuss] Fw: Extra character inserted in CharacterData Handler?
Message-ID: <200207242222.QAA02372@abqn42.abq.sc.philips.com>

Hi,

Call me crazy, but isn't UTF-8 an 8-bit wide character encoding format? And
if so, isn't the number 8364 a bit out of its league? If this is true then I
would think that the Â character is either some sort of multi-byte character
indicator, or is just expat fudging on the numbers it doesn't understand.
Try not specifying the encoding format for the XML document and the XML
parser... see what happens. Let us know how it goes. I think that is how I
solved this problem when I encountered it about a year ago.

BTW, I thought you were trying to keep the parser from converting character
entities to their character representation?

- Josh Martin

> Hello,
>
> I am facing exactly the same problem. In my case the characters are the
> Euro, trademark, etc.
> When i write the xml file, i replace the Euro character with its numerical
> entity &#8364;
> I have specified the encoding for my XML file as UTF-8.
>
> Now when the expat parser parses the file, it appends the Â character. so it
> is Â followed by the Euro character.
> What should i do to get rid of the extra character?
> Am i missing something here?
> Binu
>
> >
> > The "." character in your file - 0xB7 - is invalid UTF-8.
> > Maybe it is valid ISO-8859-1?
> > In that case you must add an XML declaration.
> >
> > Actually, 1.95.3 should reject it (and it does so on my system).
>
> Rolf Ade just pointed out to me that I didn't read your code.
> You passed the ISO-8859-1 encoding to the parser, so there
> was no error on your side.
>
> However, what you reported looks exactly like what a word processor
> would show you when it expects ISO-8859-1, but gets UTF-8 (tested with
> Wordpad).
> Now, this would be a correct result, since Expat only passes UTF-8
> or UTF-16 to its handlers, no matter what the input.
>
> Karl
>

From Josh.Martin@abq.sc.philips.com Wed Jul 24 16:30:05 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Wed Jul 24 15:30:05 2002
Subject: [Expat-discuss] Expat on AS400. Help needed !
Message-ID: <200207242229.QAA02376@abqn42.abq.sc.philips.com>

Could you throw me a bone and give me some background information on OS400?
Such as: What platform is it for? Is it POSIX compliant? How old is it?

My first impression is that you're building in a bogus or buggy malloc
function, or you are magically and repeatedly running out of memory when
trying to allocate/initialize pool, and Frank inexplicably forgot to check
for that.

And whatever happened to your problem with the non-existent expat_config.h
file?

- Josh Martin

> Hi,
>
> I'm using Expat on OS400. I've already have the program XMLWF (shipped with
> Expat) compiled, linked and ready for running but
> when I execute it I get the following error:
>
> In file XMLPARSE, inside function poolGrow and when executing following
> sentence,
>
> tem = pool->mem->malloc_fcn(offsetof(BLOCK, s) + blockSize * sizeof(XML_Char));
>
> the program fails, because pool has all his values set to NULL. That is, not
> initialized. I don't have the
> same error in Windows.
>
> Has anyone worked with Expat on OS400 ??? Any ideas why pool doesn't
> initialize ?
>
> Thanks in advance!
>
> Marta
>

From binu.subramanian@barconet.com Wed Jul 24 21:42:02 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Wed Jul 24 20:42:02 2002
Subject: [Expat-discuss] Fw: Extra character inserted in CharacterData Handler?
Message-ID: <6973AC049FFDD41197BB0002A52916C6280756@bninchemex01.barconet.com>

Hi,

No, UTF-8 is not restricted to 8 bits.
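
As a minimal sketch of why that is so, the following C function encodes a single Unicode code point as UTF-8, using one to four bytes depending on its value. `utf8_encode` is a hypothetical name chosen for illustration; it is not an Expat function. Under this scheme the code point 8364 (0x20AC, the Euro sign) takes three bytes, and 0xB7 takes two, the first of which is 0xC2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Expat.
   Encodes one Unicode code point as UTF-8 into out (up to 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a valid
   code point (a surrogate, or above 0x10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL) {
        return 0;  /* surrogates are not encodable */
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000UL) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A viewer that interprets such multi-byte output as a single-byte encoding will show each byte as a separate character, which is one way a stray Â (byte 0xC2) can appear in front of other text.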
Do refer to this link:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

I have some data which contains characters like Euro, trademark, etc. I want to serialise it in XML format, so when I write the XML file, I replace the characters with their numerical entities. This XML file is viewed correctly in IE 6.0. But when I parse the XML file, Expat prefixes the Â character, i.e. I get the Â character followed by the Euro character.

My XML file has the encoding specified as UTF-8.

I will try changing the encoding of the parser and check.

Binu

-----Original Message-----
From: Josh Martin [mailto:Josh.Martin@abq.sc.philips.com]
Sent: 25 July 2002 03:52
To: expat-discuss@lists.sourceforge.net; binu.subramanian@barconet.com
Subject: Re: [Expat-discuss] Fw: Extra character inserted in CharacterData Handler?

Hi,

Call me crazy, but isn't UTF-8 an 8-bit wide character encoding format? And if so, isn't the number 8364 a bit out of its league? If this is true then I would think that the Â character is either some sort of multi-byte character indicator, or is just expat fudging on the numbers it doesn't understand. Try not specifying the encoding format for the XML document and the XML parser... see what happens. Let us know how it goes. I think that is how I solved this problem when I encountered it about a year ago.

BTW, I thought you were trying to keep the parser from converting character entities to their character representation?

- Josh Martin

> Hello,
>
> I am facing exactly the same problem. In my case the characters are the
> Euro, trademark, etc.
> When i write the xml file, i replace the Euro character with its numerical
> entity &#8364;
> I have specified the encoding for my XML file as UTF-8.
>
> Now when the expat parser parses the file, it appends the Â character. so it
> is Â followed by the Euro character.
> What should i do to get rid of the extra character?
> Am i missing something here?
> Binu >=20 >=20 > >=20 > > The "." character in your file - 0xB7 - is invalid UTF-8. > > Maybe it is valid ISO-8859-1? > > In that case you must add an XML declaration. > >=20 > > Actually, 1.95.3 should reject it (and it does so on my system). > =20 > Rolf Ade just pointed out to me that I didn't read your code. > You passed the ISO-8859-1 encoding to the parser, so there > was no error on your side. > =20 > However, what you reported looks exactly like what a word processor > would show you when it expects ISO-8859-1, but gets UTF-8 (tested = with > Wordpad). > Now, this would be a correct result, since Expat only passes UTF-8 > or UTF-16 to its handlers, no matter what the input. > =20 > Karl > =20 >=20 >=20 >=20 > ------------------------------------------------------- > This sf.net email is sponsored by:ThinkGeek > Welcome to geek heaven. > http://thinkgeek.com/sf > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/expat-discuss From binu.subramanian@barconet.com Thu Jul 25 00:50:03 2002 From: binu.subramanian@barconet.com (Subramanian, Binu) Date: Wed Jul 24 23:50:03 2002 Subject: [Expat-discuss] Expat as a static library in Win32 Message-ID: <6973AC049FFDD41197BB0002A52916C6280759@bninchemex01.barconet.com> Hello Fred, Can u tell me how can i make expat a static library? - Binu -----Original Message----- From: Fred L. Drake, Jr. [mailto:fdrake@acm.org] Sent: 23 July 2002 17:19 To: Subramanian, Binu Cc: expat-discuss@lists.sourceforge.net Subject: Re: [Expat-discuss] Expat as a static library in Win32 Subramanian, Binu writes: > I want to use expat as a static library. I tried some of the suggestions > given on the mailing list...ie > replacing the lines in xmlparse.cpp and compiling with the _LIB option > instead of the _USRDLL option. There is no xmlparse.cpp in the current version of Expat; it is comprised entirely of C code, not C++. 
Have you tried a recent version? Static libraries are now supported. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From martap@tango04.net Thu Jul 25 05:32:02 2002 From: martap@tango04.net (Marta Padilla) Date: Thu Jul 25 04:32:02 2002 Subject: [Expat-discuss] Expat on AS400. Help needed ! In-Reply-To: <200207242229.QAA02376@abqn42.abq.sc.philips.com> Message-ID: I finally solved the problem, creating an .h specially for AS/400. In fact, the problem I had with the pool was related to the other one (that Josh mentioned), the expat_config.h. In case anyone needs to build expat on OS400 I can post the os400.h file I created in the web. Then only with: #ifdef __OS400__ #include "os400.h" #endif should work. Marta -----Mensaje original----- De: expat-discuss-admin@lists.sourceforge.net [mailto:expat-discuss-admin@lists.sourceforge.net]En nombre de Josh Martin Enviado el: jueves, 25 de julio de 2002 0:30 Para: expat-discuss@lists.sourceforge.net Asunto: Re: [Expat-discuss] Expat on AS400. Help needed ! Could you throw me a bone and give me some background information on OS400? Such as: What platform is it for? Is it POSIX compliant? How old is it? My first impression is that you're building in a bogus or buggy malloc function, or you are magically and repeatedly running out of memory when trying to allocate/initialize pool, and Frank inexplicably forgot to check for that. And whatever happened to your problem with the non-existant expat_config.h file? - Josh Martin > Hi, > > I'm using Expat on OS400. I've already have the program XMLWF (shipped with > Expat) compiled, linked and ready for running but > when I execute it I get the following error: > > In file XMLPARSE, inside function poolGrow and when executing following > sentence, > > tem = pool->mem->Global Function Callmalloc_fcn(Global Function > Calloffsetof(BLOCK, s) + blockSize * sizeof(XML_Char)); > > the program fails, because pool has all his values set to NULL. That is, not > initialized. 
I don't have the
> same error in Windows.
>
> Has anyone worked with Expat on OS400 ??? Any ideas why pool doesn't
> initialize ?
>
> Thanks in advance!
>
> Marta

From fdrake@acm.org Thu Jul 25 05:41:03 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu Jul 25 04:41:03 2002
Subject: [Expat-discuss] Expat on AS400. Help needed !
In-Reply-To:
References: <200207242229.QAA02376@abqn42.abq.sc.philips.com>
Message-ID: <15679.58276.425854.242020@grendel.zope.com>

Marta Padilla writes:
> I finally solved the problem, creating an .h specially for AS/400. In fact,
> the problem I had with the pool was related to the other one (that Josh
> mentioned), the expat_config.h.

Aha! Not too surprising, though I'd have no idea what changed, having never used one of those systems.

> In case anyone needs to build expat on OS400 I can post the os400.h file
> I created in the web.
> Then only with:
>
> #ifdef __OS400__
> #include "os400.h"
> #endif

Was this change only needed where expat_config.h is used in the lib directory, or were other changes needed?

If you can send the file by email, or file a feature request for OS/400 support on SourceForge with the file attached, I'll be glad to integrate the changes into the official Expat distribution so that platform is supported in the future.

Thanks!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From martap@tango04.net Thu Jul 25 05:52:05 2002
From: martap@tango04.net (Marta Padilla)
Date: Thu Jul 25 04:52:05 2002
Subject: [Expat-discuss] Expat on AS400. Help needed !
In-Reply-To: <15679.58276.425854.242020@grendel.zope.com> Message-ID: Unfortunately, there were other changes: - In OS400 file names cannot be longer than 8 characters so I had to change file names like "xmltok_impl" and the source code of both library and xmlwf (the application I was testing) with these changes. - Other "silly" issue was to change lines longer than 80 characters (another wonderful feature of OS400) in all source code. Changing that, and including the os400.h file instead of expat_config.h with the defines, should work. -----Mensaje original----- De: Fred L. Drake, Jr. [mailto:fdrake@acm.org] Enviado el: jueves, 25 de julio de 2002 13:40 Para: martap@tango04.net CC: Josh Martin; expat-discuss@lists.sourceforge.net Asunto: RE: [Expat-discuss] Expat on AS400. Help needed ! Marta Padilla writes: > I finally solved the problem, creating an .h specially for AS/400. In fact, > the problem I had with the pool was related to the other one (that Josh > mentioned), the expat_config.h. Aha! Not too surprising, though I'd have no idea what changed, having never used one of those systems. > In case anyone needs to build expat on OS400 I can post the os400.h file > I created in the web. > Then only with: > > #ifdef __OS400__ > #include "os400.h" > #endif Was this change only needed where expat_config.h is used in the lib directory, or were other changes needed? If you can send the file by email, of file a feature request for OS/400 support on SourceForge with the file attached, I'll be glad to integrate the changes into the official Expat distribution so that platform is supported in the future. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From fdrake@acm.org Thu Jul 25 09:01:02 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu Jul 25 08:01:02 2002 Subject: [Expat-discuss] Expat on AS400. Help needed ! 
In-Reply-To:
References: <15679.58276.425854.242020@grendel.zope.com>
Message-ID: <15680.4741.126908.941121@grendel.zope.com>

Marta Padilla writes:
> - In OS400 file names cannot be longer than 8 characters so I had to change
> file names like "xmltok_impl" and the source code of both library
> and xmlwf (the application I was testing) with these changes.

Is that 8 characters for the entire filename, or the first segment? Does OS/400 use the old 8.3 scheme? (Is there online documentation for OS/400 from IBM?)

> - Other "silly" issue was to change lines longer than 80 characters (another
> wonderful feature of OS400) in all source code.

That's doable, though there will be some pain in xmlparse.c. I've committed the needed changes to most of the files in CVS.

> Changing that, and including the os400.h file instead of
> expat_config.h with the defines, should work.

It would be nice to support another platform! Please send the new header so I can get that added as well.

Thanks!

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From Josh.Martin@abq.sc.philips.com Thu Jul 25 12:22:07 2002
From: Josh.Martin@abq.sc.philips.com (Josh Martin)
Date: Thu Jul 25 11:22:07 2002
Subject: [Expat-discuss] Fw: Extra character inserted in CharacterData Handler?
Message-ID: <200207251821.MAA02895@abqn42.abq.sc.philips.com>

Hi,

I finally figured it out, thanks in no small way to your informative link to the UTF-8 encoding.

The output you are receiving from expat is completely correct. Here's the deal: I was right, although I didn't know why, the Â (A WITH CIRCUMFLEX) IS a sort of multi-byte character indicator. It is the first byte of the UTF-8 encoding for the trademark symbol. The output that you are seeing from expat is the properly UTF-8 encoded characters viewed in the ISO-8859-1 encoding (or whatever ISO-8859 derivative your OS happens to be using).
It just so happens that the UTF-8 encoding for the trademark symbol also happens to contain the character representing the trademark symbol encoded in ISO-8859-1 (preceded by the Â character).

So basically what I'm saying is that if you take the output from the character data handler, save it in a text file, load it into a browser (preferably Netscape, as that will make it much easier to:) and change the encoding to UTF-8, then the output will look exactly like it is supposed to.

In the end, expat did what it was supposed to, and your OS did what it was supposed to, but we (you and I), being mere mortals, were confused because they did what we said, not what we meant.

I hope this information helps you and I know I certainly learned a lot. Happy coding.

- Josh Martin

> Hi,
>
> No UTF-8 is not restricted to 8 bits. Do refer this link:
> http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
>
> I have some data which contains some characters like Euro, trademark, etc.
> I want to serialise it in XML format. So when i write the XML file, i replace
> the characters with their numerical entities. This XML file is viewed correctly
> in IE 6.0. But when i parse the XML file, the expat prefixes the Â character. ie
> i get the Â character followed by the Euro character.
>
> My XML file has the encoding specified as UTF-8.
>
> I will try changing the encoding of the parser and check.
>
> Binu
> -----Original Message-----
> From: Josh Martin [mailto:Josh.Martin@abq.sc.philips.com]
> Sent: 25 July 2002 03:52
> To: expat-discuss@lists.sourceforge.net; binu.subramanian@barconet.com
> Subject: Re: [Expat-discuss] Fw: Extra character inserted in
> CharacterData Handler?
>
>
> Hi,
>
> Call me crazy, but isn't UTF-8 an 8-bit wide character encoding format? And if
> so, isn't the number 8364 a bit out of its league?
If this is true then = I > would=20 > think that the =C2 character is either some sort of multi-byte character= =20 > indicator, or is just expat fudging on the numbers it doesn't understand. > Try=20 > not specifying the encoding format for the XML document and the XML > parser...=20 > see what happens. Let us know how it goes. I think that is how I solved > this=20 > problem when I encountered it about a year ago. >=20 > BTW, I thought you were trying to keep the parser from converting charact= er=20 > entities to their character representation? >=20 > - Josh Martin >=20 > > Hello, > >=20 > > I am facing exactly the same problem. In my case the characters are the > > Euro, trademark, etc. > > When i write the xml file, i replace the Euro character with its numeri= cal > > entity €=20 > > I have specified the encoding for my XML file as UTF-8. > >=20 > > Now when the expat parser parses the file, it appends the =C2 character= . so > it > > is =C2 followed by the Euro character. > > What should i do to get rid of the extra character? > > Am i missing something here? > > Binu > >=20 > >=20 > > >=20 > > > The "." character in your file - 0xB7 - is invalid UTF-8. > > > Maybe it is valid ISO-8859-1? > > > In that case you must add an XML declaration. > > >=20 > > > Actually, 1.95.3 should reject it (and it does so on my system). > > =20 > > Rolf Ade just pointed out to me that I didn't read your code. > > You passed the ISO-8859-1 encoding to the parser, so there > > was no error on your side. > > =20 > > However, what you reported looks exactly like what a word processor > > would show you when it expects ISO-8859-1, but gets UTF-8 (tested with > > Wordpad). > > Now, this would be a correct result, since Expat only passes UTF-8 > > or UTF-16 to its handlers, no matter what the input. > > =20 > > Karl > > =20 > >=20 > >=20 > >=20 > > ------------------------------------------------------- > > This sf.net email is sponsored by:ThinkGeek > > Welcome to geek heaven. 
> > http://thinkgeek.com/sf > > _______________________________________________ > > Expat-discuss mailing list > > Expat-discuss@lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/expat-discuss >=20 >=20 > ------------------------------------------------------- > This sf.net email is sponsored by: Jabber - The world's fastest growing= =20 > real-time communications platform! Don't just IM. Build it in!=20 > http://www.jabber.com/osdn/xim > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/expat-discuss From Josh.Martin@abq.sc.philips.com Thu Jul 25 12:24:06 2002 From: Josh.Martin@abq.sc.philips.com (Josh Martin) Date: Thu Jul 25 11:24:06 2002 Subject: [Expat-discuss] Expat as a static library in Win32 Message-ID: <200207251824.MAA02899@abqn42.abq.sc.philips.com> Do you have expat 1.95.4? Are you using Microsoft Visual C, or Borland Builder? - Josh Martin > Hello Fred, > > Can u tell me how can i make expat a static library? > > - Binu > > -----Original Message----- > From: Fred L. Drake, Jr. [mailto:fdrake@acm.org] > Sent: 23 July 2002 17:19 > To: Subramanian, Binu > Cc: expat-discuss@lists.sourceforge.net > Subject: Re: [Expat-discuss] Expat as a static library in Win32 > > > > Subramanian, Binu writes: > > I want to use expat as a static library. I tried some of the suggestions > > given on the mailing list...ie > > replacing the lines in xmlparse.cpp and compiling with the _LIB option > > instead of the _USRDLL option. > > There is no xmlparse.cpp in the current version of Expat; it is > comprised entirely of C code, not C++. Have you tried a recent > version? Static libraries are now supported. > > > -Fred > > -- > Fred L. Drake, Jr. 
> PythonLabs at Zope Corporation > > > ------------------------------------------------------- > This sf.net email is sponsored by: Jabber - The world's fastest growing > real-time communications platform! Don't just IM. Build it in! > http://www.jabber.com/osdn/xim > _______________________________________________ > Expat-discuss mailing list > Expat-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/expat-discuss From binu.subramanian@barconet.com Thu Jul 25 23:18:02 2002 From: binu.subramanian@barconet.com (Subramanian, Binu) Date: Thu Jul 25 22:18:02 2002 Subject: [Expat-discuss] Expat as a static library in Win32 Message-ID: <6973AC049FFDD41197BB0002A52916C657F08C@bninchemex01.barconet.com> Hello, I finally got it. Yes, i am having expat 1.95.4 and am using Microsoft Visual C. Made an empty Win 32 static lib, copied all the *.c and *.h files to it and finally changed the lines in xmlparse.c and expat.h from #define XMLPARSEAPI(type) __declspec(dllimport) type __cdecl to #define XMLPARSEAPI(type) type __cdecl I was finally able to compile and link it to my project. Thank you. regards, Binu -----Original Message----- From: Josh Martin [mailto:Josh.Martin@abq.sc.philips.com] Sent: 25 July 2002 23:54 To: fdrake@acm.org; binu.subramanian@barconet.com Cc: expat-discuss@lists.sourceforge.net Subject: RE: [Expat-discuss] Expat as a static library in Win32 Do you have expat 1.95.4? Are you using Microsoft Visual C, or Borland Builder? - Josh Martin > Hello Fred, > > Can u tell me how can i make expat a static library? > > - Binu > > -----Original Message----- > From: Fred L. Drake, Jr. [mailto:fdrake@acm.org] > Sent: 23 July 2002 17:19 > To: Subramanian, Binu > Cc: expat-discuss@lists.sourceforge.net > Subject: Re: [Expat-discuss] Expat as a static library in Win32 > > > > Subramanian, Binu writes: > > I want to use expat as a static library. 
I tried some of the suggestions
> > given on the mailing list...ie
> > replacing the lines in xmlparse.cpp and compiling with the _LIB option
> > instead of the _USRDLL option.
>
> There is no xmlparse.cpp in the current version of Expat; it is
> comprised entirely of C code, not C++. Have you tried a recent
> version? Static libraries are now supported.
>
>
> -Fred
>
> --
> Fred L. Drake, Jr.
> PythonLabs at Zope Corporation
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by: Jabber - The world's fastest growing
> real-time communications platform! Don't just IM. Build it in!
> http://www.jabber.com/osdn/xim
> _______________________________________________
> Expat-discuss mailing list
> Expat-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/expat-discuss

From binu.subramanian@barconet.com Fri Jul 26 01:55:05 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Fri Jul 26 00:55:05 2002
Subject: [Expat-discuss] Fw: Extra character inserted in CharacterData Handler?
Message-ID: <6973AC049FFDD41197BB0002A52916C657F091@bninchemex01.barconet.com>

Hi,

Would you have any idea how to convert the UTF-8 characters back to ASCII (or extended ASCII) programmatically, so that I would be rid of the multi-byte character indicator?

I am working on VC 6.0 and I would like my application to run on all Windows platforms.

Any help/suggestion will be useful.

kr,
Binu

-----Original Message-----
From: Josh Martin [mailto:Josh.Martin@abq.sc.philips.com]
Sent: 25 July 2002 23:52
To: expat-discuss@lists.sourceforge.net; binu.subramanian@barconet.com
Subject: RE: [Expat-discuss] Fw: Extra character inserted in CharacterData Handler?

Hi,

I finally figured it out, thanks in no small way to your informative link to the UTF-8 encoding.

The output you are receiving from expat is completely correct.
Here's = the deal:=20 I was right, although I didn't know why, the (A WITH CIRCUMFLEX) IS a = sort of=20 multi-byte character indicator. It is the first byte of the UTF-8 = encoding for=20 the trademark symbol. The output that you are seeing from expat are = the=20 properly UTF-8 encoded characters viewed in the ISO-8859-1 encoding (or whatever=20 ISO-8859 derivative your OS happens to be using). It just so happens = that the=20 UTF-8 encoding for the trademark symbol, also happens to contain the character=20 representing the trademark symbol encoded in ISO-8859-1 (preceded by = the =C2=20 character). So basically what I'm saying is that if you take the output from the character=20 data handler, save it in a text file, load it into a browser = (preferrably=20 Netscape, as that will make it much easier to:) and change the encoding = to=20 UTF-8, then the output will look exactly like it is supposed to. In the end, expat did what it was supposed to, and your OS did what it = was=20 supposed to, but we (you and I), being mere mortals, were confused = because they=20 did what we said, not what we meant. I hope this information helps you and I know I certainly learned alot. = Happy coding. - Josh Martin > Hi, >=20 > No UTF-8 is not restricted to 8 bits. Do refer this link: > http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 >=20 > I have some data which contains some characters like Euro, trademark, = etc. > I want to serialise it in XML format. So when i write the XML file, i > replace > the characters with their numerical entities. This XML file is viewed > correctly > in IE 6.0. But when i parse the XML file, the expat prefixes the =C2 > character. ie > i get the =C2 character followed by the Euro character. >=20 > My XML file has the encoding specified as UTF-8. >=20 > I will try changing the encoding of the parser and check. 
>=20 > Binu > -----Original Message----- > From: Josh Martin [mailto:Josh.Martin@abq.sc.philips.com] > Sent: 25 July 2002 03:52 > To: expat-discuss@lists.sourceforge.net; = binu.subramanian@barconet.com > Subject: Re: [Expat-discuss] Fw: Extra character inserted in > CharacterData Handler? >=20 >=20 > Hi, >=20 > Call me crazy, but isn't UTF-8 an 8-bit wide character encoding = format? And > if=20 > so, isn't the number 8364 a bit out of its league? If this is true = then I > would=20 > think that the =C2 character is either some sort of multi-byte = character=20 > indicator, or is just expat fudging on the numbers it doesn't = understand. > Try=20 > not specifying the encoding format for the XML document and the XML > parser...=20 > see what happens. Let us know how it goes. I think that is how I = solved > this=20 > problem when I encountered it about a year ago. >=20 > BTW, I thought you were trying to keep the parser from converting character=20 > entities to their character representation? >=20 > - Josh Martin >=20 > > Hello, > >=20 > > I am facing exactly the same problem. In my case the characters are = the > > Euro, trademark, etc. > > When i write the xml file, i replace the Euro character with its numerical > > entity €=20 > > I have specified the encoding for my XML file as UTF-8. > >=20 > > Now when the expat parser parses the file, it appends the =C2 = character. so > it > > is =C2 followed by the Euro character. > > What should i do to get rid of the extra character? > > Am i missing something here? > > Binu > >=20 > >=20 > > >=20 > > > The "." character in your file - 0xB7 - is invalid UTF-8. > > > Maybe it is valid ISO-8859-1? > > > In that case you must add an XML declaration. > > >=20 > > > Actually, 1.95.3 should reject it (and it does so on my system). > > =20 > > Rolf Ade just pointed out to me that I didn't read your code. > > You passed the ISO-8859-1 encoding to the parser, so there > > was no error on your side. 
> >
> > However, what you reported looks exactly like what a word processor
> > would show you when it expects ISO-8859-1, but gets UTF-8 (tested with
> > Wordpad).
> > Now, this would be a correct result, since Expat only passes UTF-8
> > or UTF-16 to its handlers, no matter what the input.
> >
> > Karl

From binu.subramanian@barconet.com Fri Jul 26 05:06:03 2002
From: binu.subramanian@barconet.com (Subramanian, Binu)
Date: Fri Jul 26 04:06:03 2002
Subject: [Expat-discuss] Encoding issues in expat
Message-ID: <6973AC049FFDD41197BB0002A52916C657F093@bninchemex01.barconet.com>

I am working on Win 2000, VC++ 6.0.
I am using expat version 1.95.4.
It is compiled for UTF-8 output and I have specified the encoding in the XML file as UTF-8.
Still, when I load the XML file, a Â character is prefixed to the special characters (Euro, trademark, etc.).
What can I do to see the Euro character properly? I have enclosed the XML file I am using.
Any suggestion/help will be useful.
kr,
Binu

-----Original Message-----
From: Karl Waclawek [mailto:karl@waclawek.net]
Sent: 24 July 2002 18:41
To: Subramanian, Binu; expat-discuss@lists.sourceforge.net
Subject: Re: [Expat-discuss] Encoding issues in expat

> I am writing an XML file and using entities for special characters like the
> Euro and the trademark characters.
> My XML file uses the UTF-8 encoding.
>
> This XML file is viewed correctly in IE 6.0 and when parsed by libxml and
> the Xerces XML parsers, the character conversion is correct.
> ie The Euro sign replaced with the numerical entity &#8364; in the XML file
> is correctly translated back to Euro by the libxml and the Xerces XML
> parsers.
>
> What should i do to get the expat parser to translate the numerical entity
> &#8364; back to the Euro character.

It works fine for me.
What document are you using?
What version of Expat are you using?
How is it compiled, for UTF-8 or UTF-16 output?

Karl

A non-text attachment was scrubbed...
Name: expat_euro_EN.dtd
Type: application/octet-stream
Size: 23929 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020726/24faebe2/expat_euro_EN.exe

A non-text attachment was scrubbed...
Name: expat_euro.xml
Type: application/octet-stream
Size: 1215 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail-21/expat-discuss/attachments/20020726/24faebe2/expat_euro.exe

From karl@waclawek.net Fri Jul 26 07:26:02 2002
From: karl@waclawek.net (Karl Waclawek)
Date: Fri Jul 26 06:26:02 2002
Subject: [Expat-discuss] Encoding issues in expat
References: <6973AC049FFDD41197BB0002A52916C657F093@bninchemex01.barconet.com>
Message-ID: <005e01c234a7$e3619080$9e539696@citkwaclaww2k>

> I am working on Win 2000, VC++ 6.0
> I am using expat 1.95.4 version.
> It is compiled for UTF-8 output and i have specifed the encoding in the XML > file as UTF-8. > Still when i load the XML file, a character  is prefixed to the special > characters like ( Euro, trademark, etc). > What can i do to see that the Euro character properly? I have enclosed the > XML file i am using. > Any suggestion/help will be useful. To understand you correctly: You are reading an XML file using Expat, writing it out again to another file, based on the callbacks from Expat. Then you view this other file loading it into some word processor, is that right? Well, which word processor do you use? On Windows, not all editors can display UTF-8 well. And even if they can, they usually require a BOM (byte order mark) at the beginning of the file, even for UTF-16. In any case, the native Unicode version for Windows is UTF-16(LE). So, first I recommend you compile Expat for UTF-16. Then I recommend you write a BOM to the output file, details about BOMs can be found on http://www.unicode.org. Then you should be able to display it. Btw, it seems the file you attached does not contain the Euro symbol. Karl From karl@waclawek.net Mon Jul 29 22:17:02 2002 From: karl@waclawek.net (Karl Waclawek) Date: Mon Jul 29 21:17:02 2002 Subject: [Expat-discuss] Re: [xml-dev] Borland C++ and Expat - performance hit. References: <005f01c2377e$451e3610$0201a8c0@sgdev.com.au> Message-ID: <001b01c23782$dd2f5ca0$0207a8c0@karl> > Hello, > > I've been testing an application that uses EXPAT on a clean install of > Windows XP. It was working perfectly well, and EXPAT in particular was > extremely fast. After I loaded Borland C++ Builder 5 onto the system the > performance of EXPAT was greatly reduced. > > I've now un-installed C++ Builder, but the performance of EXPAT has not > returned to what it originally was. > > I think it's fairly obvious that C++ Builder has replaced a system library > module with a debug version or something like that. 
Did you build the Expat Dll with C++ Builder? The Dll itself only depends on kernel32.dll. > Before I start digging to find out which libraries are involved I though I > would ask in this list whether anybody has already figured this out. Maybe you should direct your question to one of the Expat mailing lists at http://www.libexpat.org. Build support for C++ Builder was recently added, but it seems to default to a debug build. Karl From mark@mitchenall.com Tue Jul 30 17:21:02 2002 From: mark@mitchenall.com (Mark Mitchenall) Date: Tue Jul 30 16:21:02 2002 Subject: [Expat-discuss] Predefined Entity Expanding In-Reply-To: <001b01c23782$dd2f5ca0$0207a8c0@karl> Message-ID: Is Expat supposed to expand predefined entities in the character data handler? e.g. for the following document, with just the element and character data handlers enabled to simply output the returned text and using Expat 1.95.4... Test & more text .... I get the following result .... Test more text Instead of the ... Test & more text ... which I'd hoped for. Makes no difference whether I define the entity or not. Also, other entities in the form... ... don't seem to get expanded, but all others do. Is this normal? Is there another handler I should be using for these? TIA, Mark -- Mark Mitchenall Principal Consultant mitchenall.com Email: mark@mitchenall.com Tel: +44(0)20 8452 3031 Mobile: +44(0)7850 847 543 http://www.mitchenall.com/ From karl@waclawek.net Tue Jul 30 19:22:02 2002 From: karl@waclawek.net (Karl Waclawek) Date: Tue Jul 30 18:22:02 2002 Subject: [Expat-discuss] Predefined Entity Expanding References: Message-ID: <000201c23833$aecbe200$0207a8c0@karl> > Is Expat supposed to expand predefined entities in the character data > handler? > > e.g. for the following document, with just the element and character data > handlers enabled to simply output the returned text and using Expat > 1.95.4... > > > > Test & more text > > > .... I get the following result .... 
> > > Test > more text > Are you saying that you get this erroneous behaviour only if you clear all handlers except the element and character data handlers? On my little test app (with more handlers set) it works as expected. Karl
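
A note on the exchange above. Expat does not promise to deliver the text of an element in a single call to the character data handler, and a predefined entity reference such as &amp;amp; is, as far as I know, usually reported in a call of its own, already expanded to the one character it stands for. A handler that prints every chunk on its own line can therefore make one text node look split. Whether that is what happened in Mark's case is an assumption; the thread does not confirm it. The sketch below is only an illustration in plain C and does not link against Expat: on_character_data merely mirrors the shape of Expat's XML_CharacterDataHandler in a UTF-8 build (where XML_Char is char), and TextBuffer and text_buffer_reset are hypothetical names invented here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for the text seen between a start tag and
 * its end tag. Not part of Expat. */
typedef struct {
    char  *data;  /* NUL-terminated accumulated text, or NULL if empty */
    size_t len;   /* number of bytes in data, excluding the NUL */
} TextBuffer;

/* Same parameter list as Expat's XML_CharacterDataHandler in a UTF-8
 * build: user data pointer, a chunk that is NOT NUL-terminated, and the
 * chunk length. Each call appends one chunk instead of printing it, so
 * a text node that arrives as "Test ", "&", " more text" is reassembled. */
void on_character_data(void *userData, const char *s, int len)
{
    TextBuffer *buf = (TextBuffer *)userData;
    char *grown;

    if (len <= 0)
        return;

    grown = (char *)realloc(buf->data, buf->len + (size_t)len + 1);
    if (grown == NULL)
        return;  /* out of memory: keep what has been collected so far */

    memcpy(grown + buf->len, s, (size_t)len);
    buf->len += (size_t)len;
    grown[buf->len] = '\0';
    buf->data = grown;
}

/* Release the collected text, e.g. from an end-element handler once the
 * complete string has been consumed. */
void text_buffer_reset(TextBuffer *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->len = 0;
}
```

In a real program the handler would be registered with XML_SetCharacterDataHandler and the buffer handed over with XML_SetUserData; the end-element handler would then use the complete string and reset the buffer.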