From promo@high-tech-communications.com Thu Nov 1 09:30:06 2001
From: promo@high-tech-communications.com (Will To Live)
Date: Thu, 1 Nov 2001 01:30:06 -0800
Subject: [XML-SIG] A Hope For Cancer...
Message-ID: <200111010930.fA19U6d04875@mail.high-tech-communications.com>

A   H O P E   F O R   C A N C E R . .

Can-Scan Enhanced Breast Cancer Examination Software
+
Will To Live Pre-Release Feature Film (VHS or DVD)

ONLY .99 (plus shipping & handling) List Price .99
© Ruby Pictures / netGuru, Inc.
All rights reserved
You are receiving this email because you have subscribed to some health-related web sites.
This promotional campaign is exclusively brought by Kanatsiz Communications to the recipients undisclosed to the underlying advertiser, netGuru, Inc. Kanatsiz Communications has not obtained the consent, express or implied, from netGuru, Inc. with respect to proposed recipients of the advertisements. If you do not wish to receive similar advertisements please click here
From Jan.Delgado@unamite.com Thu Nov 1 10:52:56 2001
From: Jan.Delgado@unamite.com (Jan Delgado)
Date: Thu, 1 Nov 2001 11:52:56 +0100
Subject: [XML-SIG] Relative URI
In-Reply-To:
Message-ID:

Hi all,

i solved the problem by defining the dtd-reference as something like:

that works like expected.

greetings
jan

From Juergen Hermann" Message-ID:

> You are receiving this email because you have subscribed to some
> health-related web sites.

Damn! mailman developed into an AI and now wants to pursue a medical
career. So who will process our mail in the future?

Ciao, Jürgen

From paul@boddie.net Thu Nov 1 18:07:59 2001
From: paul@boddie.net (paul@boddie.net)
Date: 1 Nov 2001 18:07:59 -0000
Subject: [XML-SIG] Relative URI
Message-ID: <20011101180759.24283.qmail@www2.nameplanet.com>

"Jan Delgado" wrote:
>
>:2:47: Cannot resolve relative URI 'order.dtd'
>when document URI unknown
>
>how can i set the document URI ?

Then:

>i solved the problem by defining the dtd-reference as something like:
>
>
>
>that works like expected.

I suppose that what you should do, based on my experience with Xerces-J
rather than PyXML, is to either make sure that the DTD resides in the same
directory as the XML file, or resides in the same "base" location as the
XML resource (if you're not using a file).

If that isn't the case, you need to overload the way the parser goes about
finding the DTD. In JAXP (Java APIs for XML Parsing) you would implement
the org.xml.sax.EntityResolver interface in a class with a special version
of the resolveEntity method. Then you would pass an instance of this class
to the parser or reader through some method like
DocumentBuilder.setEntityResolver.

This is all speculation with respect to PyXML or 4Suite, though. I just
hope one doesn't have to jump through as many hoops as you do with so many
things in the Java APIs. One would think that no-one would ever need to do
this kind of thing, given the complexity of the API...
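For comparison, Python's SAX interface offers the same hook that Paul
describes for JAXP. The sketch below is not from this thread and has not
been checked against PyXML 0.6.x or 4Suite; it assumes the xml.sax module of
a current Python 3 standard library. CatalogResolver, DTD_CATALOG and
parse_with_catalog are names invented for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical in-memory catalog: system identifier -> DTD text.
DTD_CATALOG = {"order.dtd": "<!ELEMENT order (#PCDATA)>"}

DOCUMENT = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE order SYSTEM "order.dtd">\n'
    "<order>42</order>"
)


class CatalogResolver(EntityResolver):
    """Serve known system identifiers from a dictionary."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.requested = []  # system identifiers the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        text = self.catalog.get(systemId)
        if text is None:
            # Unknown identifier: let the parser open it itself.
            return systemId
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(text.encode("utf-8")))
        return source


def parse_with_catalog(xml_text, catalog):
    """Parse xml_text and return the identifiers the resolver was asked for."""
    resolver = CatalogResolver(catalog)
    parser = xml.sax.make_parser()
    # Recent Python versions leave external entities off by default.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return resolver.requested


if __name__ == "__main__":
    print(parse_with_catalog(DOCUMENT, DTD_CATALOG))
```

Returning an InputSource with its own byte stream means the relative
identifier never has to be resolved against a document URI, which is the
situation the original error message complains about.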
Paul

--
Get your firstname@lastname email for FREE at http://Nameplanet.com/?su

From faassen@vet.uu.nl Thu Nov 1 22:22:55 2001
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Thu, 1 Nov 2001 23:22:55 +0100
Subject: [XML-SIG] donating DOM unit tests
Message-ID: <20011101232255.A5555@vet.uu.nl>

Hi there,

How would I go about donating ParsedXML's extensive DOM unit tests to
PyXML? Do you want them? If so, where do I send the sources, what's the
procedure? I'd be happy to help maintaining them.

Regards,

Martijn

From noreply@sourceforge.net Fri Nov 2 02:33:44 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Thu, 01 Nov 2001 18:33:44 -0800
Subject: [XML-SIG] [ pyxml-Bugs-477364 ] XBEL : msie_parse.py
Message-ID:

Bugs item #477364, was opened at 2001-11-01 18:33
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=477364&group_id=6473

Category: expat
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: XBEL : msie_parse.py

Initial Comment:
When I use msie_parse.py, I found two problems:
1. there should be self.bms.leave_folder() after line 34, so the
   structure is processed right
2. some url files don't have the [InternetShortcut] at the first line
   (mine is windows ME, and some of my own bookmarks start with
   [Default]...
3. And I found the xml parser can't process a document that contains
   special characters (like Trade Mark, Romans and so on), I have to
   delete those characters in my bookmark file to make it work

----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=477364&group_id=6473

From martin@v.loewis.de Fri Nov 2 07:50:12 2001
From: martin@v.loewis.de (Martin v. Loewis)
Date: Fri, 2 Nov 2001 08:50:12 +0100
Subject: [XML-SIG] donating DOM unit tests
In-Reply-To: <20011101232255.A5555@vet.uu.nl> (message from Martijn Faassen on Thu, 1 Nov 2001 23:22:55 +0100)
References: <20011101232255.A5555@vet.uu.nl>
Message-ID: <200111020750.fA27oCm01233@mira.informatik.hu-berlin.de>

> How would I go about donating ParsedXML's extensive DOM unit tests to
> PyXML? Do you want them?

We certainly do. How much is this? Can it be meaningfully distributed
together with the PyXML distribution, or would it bloat it too much?

> If so, where do I send the sources, what's the
> procedure? I'd be happy to help maintaining them.

Depending on how large they are, they should be checked either into
/xml/test or /test of the PyXML CVS. This is best achieved by you checking
them in yourself; please let me know what your SF account is so I can make
you a project member.

Thanks,
Martin

From noreply@sourceforge.net Sat Nov 3 04:40:29 2001
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 02 Nov 2001 20:40:29 -0800
Subject: [XML-SIG] [ pyxml-Bugs-477717 ] Mac OS X 10.1 compile error w/fix
Message-ID:

Bugs item #477717, was opened at 2001-11-02 20:40
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=477717&group_id=6473

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Dan Grassi (dgrassi)
Assigned to: Nobody/Anonymous (nobody)
Summary: Mac OS X 10.1 compile error w/fix

Initial Comment:
Under Mac OS X version 10.1 twolevel namespaces are used by default.
Unfortunately this causes a link error. One resolution is to modify
setup.py and pass -flat_namespace in extra_link_args. This can be
accomplished by adding a check for darwin1 and appending this to LDFLAGS
just after parsing argv[] as follows.

    if sys.platform == "darwin1": # Mac OS X
        LDFLAGS.append('-flat_namespace')

Unfortunately this is not quite enough because '.parsers.sgmlop' and
'.utils.boolean' do not pass extra_link_args. This can be resolved as
follows:

    # Build sgmlop
    ext_modules.append(
        Extension(xml('.parsers.sgmlop'),
                  sources=['extensions/sgmlop.c'],
                  extra_link_args=LDFLAGS))

    # Build boolean
    ext_modules.append(
        Extension(xml('.utils.boolean'),
                  sources=['extensions/boolean.c'],
                  extra_link_args=LDFLAGS))

I have attached a modified file for review.

----------------------------------------------------------------------
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=477717&group_id=6473

From dan@grassi.org Sat Nov 3 04:48:31 2001
From: dan@grassi.org (Dan Grassi)
Date: Fri, 2 Nov 2001 23:48:31 -0500
Subject: [XML-SIG] [ #477717 ] Mac OS X 10.1 compile error w/fix
Message-ID: <07D620DD-D016-11D5-9EEC-003065F99F04@grassi.org>

--Apple-Mail-3-685907899
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; format=flowed

Under Mac OS X version 10.1 twolevel namespaces are used by default.
Unfortunately this causes a link error. One resolution is to modify
setup.py and pass -flat_namespace in extra_link_args. This can be
accomplished by adding a check for darwin1 and appending this to LDFLAGS
just after parsing argv[] as follows.

    if sys.platform == "darwin1": # Mac OS X
        LDFLAGS.append('-flat_namespace')

Unfortunately this is not quite enough because '.parsers.sgmlop' and
'.utils.boolean' do not pass extra_link_args. This can be resolved as
follows:

    # Build sgmlop
    ext_modules.append(
        Extension(xml('.parsers.sgmlop'),
                  sources=['extensions/sgmlop.c'],
                  extra_link_args=LDFLAGS))

    # Build boolean
    ext_modules.append(
        Extension(xml('.utils.boolean'),
                  sources=['extensions/boolean.c'],
                  extra_link_args=LDFLAGS))

The above also means that:

    dgrassi% python setup.py --ldflags=-flat_namespace build

will not work!

I have attached a modified file for review.
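The check above compares against the exact string "darwin1", which is what
Dan reports for Mac OS X 10.1. A small sketch of a looser test follows; it
is an illustration only, written for a current Python 3, and the helper name
link_args_for is hypothetical. Whether -flat_namespace is wanted on other
Darwin releases is an assumption that the thread does not confirm.

```python
import sys


def link_args_for(platform, ldflags=()):
    """Return a copy of ldflags, adding -flat_namespace on Darwin."""
    flags = list(ldflags)
    # Matching the "darwin" prefix instead of the exact string "darwin1"
    # is an assumption made for this sketch, not part of the bug report.
    if platform.startswith("darwin") and "-flat_namespace" not in flags:
        flags.append("-flat_namespace")
    return flags


# One shared list that every Extension(...) could receive as extra_link_args.
LDFLAGS = link_args_for(sys.platform)
```

Computing the list once and passing it to every extension avoids the
situation described above, where some extensions received the flag and
others did not.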

--Apple-Mail-3-685907899
Content-Disposition: attachment
Content-Type: multipart/appledouble;
	boundary=Apple-Mail-4-685907900

--Apple-Mail-4-685907900
Content-Disposition: attachment;
	filename=setup.py
Content-Transfer-Encoding: 7bit
Content-Type: application/text;
	x-mac-creator=0;
	x-unix-mode=0644;
	x-mac-type=54455854;
	name="setup.py"

#! /usr/bin/env python

# Setup script for the XML tools
#
# Targets: build install help

import sys, os, string

from distutils.core import setup, Extension
from setupext import Data_Files, install_Data_Files, wininst_request_delete

# I want to override the default build directory so the extension
# modules are compiled and placed in the build/xml directory
# tree. This is a bit clumsy, but I don't see a better way to do
# this at the moment.

ext_modules = []

# Rename xml to _xmlplus for Python 2.0

if sys.hexversion < 0x2000000:
    def xml(s):
        return "xml"+s
else:
    def xml(s):
        return "_xmlplus"+s

# special command-line arguments
LIBEXPAT = None
LDFLAGS = []

args = sys.argv[:]
for arg in args:
    if string.find(arg, '--with-libexpat=') == 0:
        LIBEXPAT = string.split(arg, '=')[1]
        sys.argv.remove(arg)
    elif string.find(arg, '--ldflags=') == 0:
        LDFLAGS = string.split(string.split(arg, '=')[1])
        print "LDFLAGS", LDFLAGS
        sys.argv.remove(arg)

if sys.platform == "darwin1": # Mac OS X
    LDFLAGS.append('-flat_namespace')

def should_build_pyexpat():
    try:
        import pyexpat
        # The following features of are required by PyXML from pyexpat,
        # which are not available in older versions:
        # ExternalEntityParserCreate, available only from 2.25 on.
        # ParseFile throws exception, not available up to 2.28.
        # Memory leak fixes, merged into 2.33
        # Wrong array boundaries fixed in 2.35
        if pyexpat.__version__ <= '2.39':
            if 'pyexpat' in sys.builtin_module_names:
                print "Error: builtin expat library will conflict with ours"
                print "Re-build python without builtin expat module"
                raise SystemExit
            return 1
    except ImportError:
        return 1
    else:
        return 0

def get_expat_prefix():
    if LIBEXPAT:
        return LIBEXPAT

    for p in ("/usr", "/usr/local"):
        incs = os.path.join(p, "include")
        libs = os.path.join(p, "lib")
        if os.path.isfile(os.path.join(incs, "expat.h")) \
           and (os.path.isfile(os.path.join(libs, "libexpat.so"))
                or os.path.isfile(os.path.join(libs, "libexpat.a"))):
            return p

# Don't build pyexpat if the Python installation provides one.
# FIXME: It should be build for binary distributions even if the core has it.
build_pyexpat = should_build_pyexpat()
#if build_pyexpat:
if 1:
    expat_prefix = get_expat_prefix()

if build_pyexpat:
    sources = ['extensions/pyexpat.c']
    if expat_prefix:
        define_macros = [('HAVE_EXPAT_H', None)]
        include_dirs = [os.path.join(expat_prefix, "include")]
        libraries = ['expat']
        library_dirs = [os.path.join(expat_prefix, "lib")]
    else:
        define_macros = [('XML_NS', None),
                         ('XML_DTD', None),
                         ('EXPAT_VERSION','0x010200')]
        include_dirs = ['extensions/expat/xmltok',
                        'extensions/expat/xmlparse']
        sources.extend(['extensions/expat/xmltok/xmltok.c',
                        'extensions/expat/xmltok/xmlrole.c',
                        'extensions/expat/xmlparse/xmlparse.c'])
        libraries = []
        library_dirs = []

    ext_modules.append(
        Extension(xml('.parsers.pyexpat'),
                  define_macros=define_macros,
                  include_dirs=include_dirs,
                  library_dirs=library_dirs,
                  libraries=libraries,
                  extra_link_args=LDFLAGS,
                  sources=sources
                  ))

from pprint import pprint
print ">>>>>>>>>>>"
pprint(Extension)

# Build sgmlop
ext_modules.append(
    Extension(xml('.parsers.sgmlop'),
              sources=['extensions/sgmlop.c'],
              extra_link_args=LDFLAGS))

# Build boolean
ext_modules.append(
    Extension(xml('.utils.boolean'),
              sources=['extensions/boolean.c'],
              extra_link_args=LDFLAGS))

# On Windows, install the documentation into a directory xmldoc, along
# with xml/_xmlplus. For RPMs, docs are installed into the RPM doc
# directory via setup.cfg (usuall /usr/doc). On all other systems, the
# documentation is not installed.

doc2xmldoc = 0
if sys.platform == 'win32':
    doc2xmldoc = 1

# This is a fragment from MANIFEST.in which should contain all
# files which are considered documentation (doc, demo, test, plus some
# toplevel files)

# distutils 1.0 has a bug where
# recursive-include test/output test_*
# is translated into a pattern ^test\\output\.*test\_[^/]*$
# on windows, which results in files not being included. Work around
# this bug by using graft where possible.
docfiles="""
recursive-include doc *.html *.tex *.txt *.gif *.css *.api *.web
recursive-include demo README demo *.py demo *.xml *.dtd *.html *.htm
include demo/genxml/data.txt
include demo/dom/html2html
include demo/xbel/doc/xbel.bib
include demo/xbel/doc/xbel.tex
include demo/xmlproc/catalog.soc
recursive-include test *.py *.xml
include test/test.xml.out
graft test/output
include ANNOUNCE CREDITS LICENCE README* TODO
"""

if doc2xmldoc:
    xmldocfiles = [
        Data_Files(copy_to = 'xmldoc',
                   template = string.split(docfiles,"\n"),
                   preserve_path = 1)
        ]
else:
    xmldocfiles = []

setup (name = "PyXML",
       version = "0.6.6", # Needs to match xml/__init__.version_info
       description = "Python/XML package",
       author = "XML-SIG",
       author_email = "xml-sig@python.org",
       url = "http://www.python.org/sigs/xml-sig/",
       long_description =
"""XML Parsers and API for Python
This version of PyXML was tested with Python 2.0 and 1.5.2.
""",

       # Override certain command classes with our own ones
       cmdclass = {'install_data':install_Data_Files,
                   'bdist_wininst':wininst_request_delete
                   },

       package_dir = {xml(''):'xml'},

       data_files = [Data_Files(base_dir='install_lib',
                                copy_to=xml('/dom/de/LC_MESSAGES'),
                                files=['xml/dom/de/LC_MESSAGES/4Suite.mo']),
                     Data_Files(base_dir='install_lib',
                                copy_to=xml('/dom/en_US/LC_MESSAGES'),
                                files=['xml/dom/en_US/LC_MESSAGES/4Suite.mo']),
                     Data_Files(base_dir='install_lib',
                                copy_to=xml('/dom/fr_FR/LC_MESSAGES'),
                                files=['xml/dom/fr_FR/LC_MESSAGES/4Suite.mo']),
                     ] + xmldocfiles,

       packages = [xml(''),
                   xml('.dom'), xml('.dom.html'), xml('.dom.ext'),
                   xml('.dom.ext.reader'),
                   xml('.marshal'),
                   xml('.unicode'),
                   xml('.parsers'), xml('.parsers.xmlproc'),
                   xml('.sax'), xml('.sax.drivers'),
                   xml('.sax.drivers2'),
                   xml('.utils')
                   ],

       ext_modules = ext_modules,

       scripts = ['scripts/xmlproc_parse', 'scripts/xmlproc_val']
       )

--Apple-Mail-4-685907900--

--Apple-Mail-3-685907899
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; format=flowed

Dan

--Apple-Mail-3-685907899--

From pobrien@orbtech.com Sat Nov 3 16:15:26 2001
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Sat, 3 Nov 2001 10:15:26 -0600
Subject: [XML-SIG] HowTo Error
Message-ID:

I'm getting an error when trying to open the xml-howto.pdf file located at:

http://py-howto.sourceforge.net/pdf/xml-howto.pdf

Could those in charge look into this and perhaps build a new pdf file?
Thanks.

---
Patrick K. O'Brien
Orbtech
"I am, therefore I think."

From prema@prema.co.nz Sat Nov 3 18:32:05 2001
From: prema@prema.co.nz (Mike MacDonald)
Date: Sun, 4 Nov 2001 07:32:05 +1300 (New Zealand Daylight Time)
Subject: [XML-SIG] Installing PyXML
Message-ID: <3BE43825.000006.89295@eds017845.telecom.co.nz>

Hi People !
I'm looking forward to getting into PyXML.
I notice that on Win98 that the installer doesn't seem to get past the
second page of the wizard -- at that stage no directories are offered.

Can you offer any suggestions ?
Thanks very much for the library !

Best regards
Mike

From gerhard@bigfoot.de Sat Nov 3 19:05:30 2001
From: gerhard@bigfoot.de (Gerhard Häring)
Date: Sat, 3 Nov 2001 20:05:30 +0100
Subject: [XML-SIG] Installing PyXML
In-Reply-To: <3BE43825.000006.89295@eds017845.telecom.co.nz>; from prema@prema.co.nz on Sun, Nov 04, 2001 at 07:32:05AM +1300
References: <3BE43825.000006.89295@eds017845.telecom.co.nz>
Message-ID: <20011103200529.A9727@lilith.hqd-internal>

On Sun, Nov 04, 2001 at 07:32:05AM +1300, Mike MacDonald wrote:
> Hi People !
> I'm looking forward to getting into PyXML.
> I notice that on Win98 that the installer doesn't seem to get past the
> second page of the wizard -- at that stage no directories are offered.
>
> Can you offer any suggestions ?

Normally you should get a screen that shows "Python (2.1) c:\python2.1 -
Found in registry." or similar.

If the installer cannot find the Python directory, this can have several
reasons:

- did you download the correct version (there are downloads for Python
  1.5/2.0/2.1 - you need the right one)?
- have you *installed* either the Python from www.python.org or the
  ActiveState one?
  you need to install it in order to set the registry key the PyXML
  installer depends on -- if you simply copy the Python directory from
  somewhere else and have python21.dll in your path, Python itself will
  work, but not the installers; the same effect occurs if you happen to
  use the Python from www.pythonware.com

Gerhard
--
mail: gerhard bigfoot de registered Linux user #64239
web: http://www.cs.fhm.edu/~ifw00065/ OpenPGP public key id 86AB43C0
public key fingerprint: DEC1 1D02 5743 1159 CD20 A4B6 7B22 6575 86AB 43C0
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))

From tpassin@home.com Sun Nov 4 22:37:01 2001
From: tpassin@home.com (Thomas B. Passin)
Date: Sun, 4 Nov 2001 17:37:01 -0500
Subject: [XML-SIG] HowTo Error
References:
Message-ID: <001601c16581$388150b0$7cac1218@cj64132b>

[Patrick K. O'Brien]

> I'm getting an error when trying to open the xml-howto.pdf file located at:
>
> http://py-howto.sourceforge.net/pdf/xml-howto.pdf
>
> Could those in charge look into this and perhaps build a new pdf file?
> Thanks.
>

I also get an error on this file - Acrobat Reader v4. Windows 2000. The
error message is not useful.

Cheers,

Tom P

From kent@springfed.com Tue Nov 6 02:17:15 2001
From: kent@springfed.com (Kent Tenney)
Date: Mon, 5 Nov 2001 20:17:15 -0600
Subject: [XML-SIG] HowTo Error
References: <001601c16581$388150b0$7cac1218@cj64132b>
Message-ID: <200111050227.UAA11460@svc.millstream.net>

There's a good copy here;
http://www.auth.gr/mirrors/python/doc/howto/pdf/

It seems to be a year old, hasn't a lot changed since then?

On Sun, 4 Nov 2001 17:37:01 -0500, Thomas B. Passin wrote:
>[Patrick K. O'Brien]
>
>> I'm getting an error when trying to open the xml-howto.pdf file
>> located at:
>>
>> http://py-howto.sourceforge.net/pdf/xml-howto.pdf
>>
>> Could those in charge look into this and perhaps build a new pdf
>> file?
>> Thanks.
>>
>I also get an error on this file - Acrobat Reader v4. Windows 2000. The
>error message is not useful.
>
>Cheers,
>
>Tom P
>
>
>_______________________________________________
>XML-SIG maillist - XML-SIG@python.org
>http://mail.python.org/mailman/listinfo/xml-sig

From pobrien@orbtech.com Mon Nov 5 02:37:34 2001
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Sun, 4 Nov 2001 20:37:34 -0600
Subject: [XML-SIG] HowTo Error
In-Reply-To: <200111050227.UAA11460@svc.millstream.net>
Message-ID:

Thanks. I'm unable to print that one, however. Is there a problem with
printing these PDFs from Acrobat Reader 4.0?

---
Patrick K. O'Brien
Orbtech
"I am, therefore I think."

-----Original Message-----
From: xml-sig-admin@python.org [mailto:xml-sig-admin@python.org]On Behalf
Of Kent Tenney
Sent: Monday, November 05, 2001 8:17 PM
To: tpassin@home.com; PyXML
Subject: Re: [XML-SIG] HowTo Error

There's a good copy here;
http://www.auth.gr/mirrors/python/doc/howto/pdf/

It seems to be a year old, hasn't a lot changed since then?

From market@now.net.cn Mon Nov 5 11:38:34 2001
From: market@now.net.cn (时创网络)
Date: 5 Nov 2001 11:38:34 -0000
Subject: [XML-SIG] An opportunity not to be missed -- register .info urgently
Message-ID: <20011105113834.6754.qmail@localhost.localdomain>

[Chinese-language advertisement for .info domain registration; the body
was mis-encoded in the archive. Recoverable details: http://www.now.net.cn,
support@now.net.cn, phone 0756--2125583 2125593 2125523 2252872,
fax 0756--2229669.]

From cprevost@grouperf.com Mon Nov 5 16:26:54 2001
From: cprevost@grouperf.com (Christophe Prévost)
Date: Mon, 5 Nov 2001 16:26:54 -0000
Subject: [XML-SIG] [ Newbie ] Parsing XML with an external DTD
Message-ID:

Hi everybody !

I want to parse an xml document

Blah blah

With a dtd... How can i do that without writing the dtd link into the
document ?

Can i set the DTD to use to the validating parser ?

Thanks!

PS: using PyXML 0.6 on Python 2.1

From fdrake@acm.org Mon Nov 5 16:25:08 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 5 Nov 2001 11:25:08 -0500
Subject: [XML-SIG] HowTo Error
In-Reply-To:
References: <200111050227.UAA11460@svc.millstream.net>
Message-ID: <15334.48484.537486.959459@grendel.zope.com>

Patrick K. O'Brien writes:
> Thanks. I'm unable to print that one, however. Is there a problem with
> printing these PDFs from Acrobat Reader 4.0?

I've had reports of people having a problem printing PDF documents
generated by pdfTeX (what the Python doc tools use) with Acrobat (Reader)
5, but I don't think I've had any reported problems with 4.x. Until this.
--sigh--

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From Alexandre.Fayolle@logilab.fr Mon Nov 5 15:42:19 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Mon, 5 Nov 2001 16:42:19 +0100 (CET)
Subject: [XML-SIG] [ Newbie ] Parsing XML with an external DTD
In-Reply-To:
Message-ID:

On Mon, 5 Nov 2001, Christophe Prévost wrote:

> Hi everybody !
>
> I want to parse an xml document
>
>
> Blah blah
>
>
> With a dtd... How can i do that without writing the dtd link into the
> document ?

I guess you could wrap your document into another one using an entity
(untested) :

--------------8<--------------------------------
] >
&core;
--------------8<--------------------------------

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From pobrien@orbtech.com Mon Nov 5 16:02:35 2001
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Mon, 5 Nov 2001 10:02:35 -0600
Subject: [XML-SIG] HowTo Error
In-Reply-To: <15334.48484.537486.959459@grendel.zope.com>
Message-ID:

Oops. I lied. I'm on Acrobat 5.0. Sorry for the false alarm. (Still can't
print them, of course. )

---
Patrick K. O'Brien
Orbtech
"I am, therefore I think."

-----Original Message-----
From: Fred L. Drake, Jr. [mailto:fdrake@acm.org]
Sent: Monday, November 05, 2001 10:25 AM
To: pobrien@orbtech.com
Cc: PyXML
Subject: RE: [XML-SIG] HowTo Error

Patrick K. O'Brien writes:
> Thanks. I'm unable to print that one, however. Is there a problem with
> printing these PDFs from Acrobat Reader 4.0?

I've had reports of people having a problem printing PDF documents
generated by pdfTeX (what the Python doc tools use) with Acrobat (Reader)
5, but I don't think I've had any reported problems with 4.x. Until this.
--sigh--

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Mon Nov 5 16:59:46 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 5 Nov 2001 11:59:46 -0500
Subject: [XML-SIG] HowTo Error
In-Reply-To:
References: <15334.48484.537486.959459@grendel.zope.com>
Message-ID: <15334.50562.222479.157484@grendel.zope.com>

Patrick K. O'Brien writes:
> Oops. I lied. I'm on Acrobat 5.0. Sorry for the false alarm. (Still can't
> print them, of course. )

This is actually good news. It means you can install Acrobat Reader 4 and
print them. ;-)

There doesn't seem to be much I'll be able to do about this; Adobe has
acknowledged a bug in Acrobat Reader 5.0 that causes it to not be able to
print PDF generated by pdfTeX, but it is a bug in the reader, not the PDF.
If you have access to ghostscript or xpdf, you may be able to print with
one of those.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From fdrake@acm.org Mon Nov 5 17:28:53 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 5 Nov 2001 12:28:53 -0500
Subject: [XML-SIG] [ Newbie ] Parsing XML with an external DTD
In-Reply-To:
References:
Message-ID: <15334.52309.682715.231375@grendel.zope.com>

Alexandre Fayolle writes:
> I guess you could wrap your document into another one using an entity
> (untested) :

Sorry, not quite! Aside from the typographic error in the XML declaration
(missing "?" at end), there's the little matter that the document element
must start & end in the document entity; it can't be included through an
entity reference. I think you can do this in SGML, but not in XML.

You can't even just concatenate a new prolog with the document if the
document contains an XML declaration, because the second XML declaration
is a well-formedness error.

In practice, what you probably want to do is check for the XML
declaration; if present, provide the declaration to the parser, then the
DOCTYPE declaration, then the rest of the input document. How to do this
depends on the parser interface you're using.
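The ordering Fred describes (XML declaration first, then the DOCTYPE, then
the rest of the document) can be sketched on plain strings. This is an
illustration only, for a current Python 3; add_doctype is a hypothetical
helper, and it assumes the text is already decoded and carries no DOCTYPE
of its own.

```python
import re

# An XML declaration is only legal at the very start of the document.
_XML_DECL = re.compile(r"\A<\?xml\s[^>]*\?>")


def add_doctype(xml_text, root_name, system_id):
    """Return xml_text with a DOCTYPE placed after any XML declaration."""
    doctype = '<!DOCTYPE %s SYSTEM "%s">' % (root_name, system_id)
    match = _XML_DECL.match(xml_text)
    if match:
        # Keep the existing declaration first, then the DOCTYPE, then the rest.
        declaration = xml_text[:match.end()]
        rest = xml_text[match.end():]
        return declaration + "\n" + doctype + rest
    # No declaration present: the DOCTYPE can simply lead the document.
    return doctype + "\n" + xml_text
```

The result can then be handed to whichever parser is in use; with a
streaming interface the same three pieces could be fed in that order
instead of being joined into one string.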

> --------------8<--------------------------------
>
> []
>
>
> &core;
> --------------8<--------------------------------

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From lars@larsshack.org Mon Nov 5 17:12:27 2001
From: lars@larsshack.org (Lars Kellogg-Stedman)
Date: Mon, 5 Nov 2001 09:12:27 -0800 (PST)
Subject: [XML-SIG] Whither xml.dom.core?
Message-ID: <20011105171227.78366.qmail@web12306.mail.yahoo.com>

Howdy,

I'm just getting my toes into Python, and I've run into an odd stumbling
block. I'd like to use python to generate some xml -- I've got python
2.1.1 and PyXML 0.6.6. Unfortunately, all the tutorials I've encountered
start out with:

    from xml.dom import core

Which, on my systems, results in:

    from xml.dom import core
    ImportError: cannot import name core

Is this part of the base python distribution? Or am I missing something
critical?

Thanks,

-- Lars

=====
lars@larsshack.org --> http://www.larsshack.org/

__________________________________________________
Do You Yahoo!?
Find a job, post your resume.
http://careers.yahoo.com

From fdrake@acm.org Mon Nov 5 18:13:53 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 5 Nov 2001 13:13:53 -0500
Subject: [XML-SIG] Whither xml.dom.core?
In-Reply-To: <20011105171227.78366.qmail@web12306.mail.yahoo.com>
References: <20011105171227.78366.qmail@web12306.mail.yahoo.com>
Message-ID: <15334.55009.52738.953001@grendel.zope.com>

Lars Kellogg-Stedman writes:
> I'm just getting my toes into Python, and I've run into an odd
> stumbling block. I'd like to use python to generate some xml -- I've
> got python 2.1.1 and PyXML 0.6.6. Unfortunately, all the tutorials
> I've encountered start out with:

I'm afraid the tutorials are woefully out of date; xml.dom.core is no
longer part of PyXML. I would suggest starting with xml.dom.minidom; this
is available as part of the standard library, and there is documentation
for it.
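
As a rough illustration of generating XML with xml.dom.minidom, here is a
minimal sketch written for a current Python 3 rather than the Python 2.1.1
used in the thread. The element names and the build_order helper are made
up for the example.

```python
from xml.dom.minidom import getDOMImplementation


def build_order(order_id, items):
    """Build a small <order> document with one <item> child per entry."""
    implementation = getDOMImplementation()
    # Arguments: namespace URI, qualified name of the root element, doctype.
    document = implementation.createDocument(None, "order", None)
    root = document.documentElement
    root.setAttribute("id", str(order_id))
    for name in items:
        item = document.createElement("item")
        item.appendChild(document.createTextNode(name))
        root.appendChild(item)
    return document


if __name__ == "__main__":
    order = build_order(42, ["spam", "eggs"])
    # Serializes the root element without an XML declaration.
    print(order.documentElement.toxml())
```

Calling toxml() on the document itself, rather than on the root element,
also emits the XML declaration.
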
A good place to start, including links to other XML-related module in the
standard library, is:

http://www.python.org/doc/current/lib/markup.html

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From pobrien@orbtech.com Mon Nov 5 17:48:31 2001
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Mon, 5 Nov 2001 11:48:31 -0600
Subject: [XML-SIG] HowTo Error
In-Reply-To: <15334.50562.222479.157484@grendel.zope.com>
Message-ID:

Okay. That's good to know. Thanks.

---
Patrick K. O'Brien
Orbtech
"I am, therefore I think."

-----Original Message-----
From: Fred L. Drake, Jr. [mailto:fdrake@acm.org]
Sent: Monday, November 05, 2001 11:00 AM
To: pobrien@orbtech.com
Cc: PyXML
Subject: RE: [XML-SIG] HowTo Error

Patrick K. O'Brien writes:
> Oops. I lied. I'm on Acrobat 5.0. Sorry for the false alarm. (Still can't
> print them, of course. )

This is actually good news. It means you can install Acrobat Reader 4 and
print them. ;-)

There doesn't seem to be much I'll be able to do about this; Adobe has
acknowledged a bug in Acrobat Reader 5.0 that causes it to not be able to
print PDF generated by pdfTeX, but it is a bug in the reader, not the PDF.
If you have access to ghostscript or xpdf, you may be able to print with
one of those.

-Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From martin@v.loewis.de Mon Nov 5 23:06:44 2001
From: martin@v.loewis.de (Martin v. Loewis)
Date: Tue, 6 Nov 2001 00:06:44 +0100
Subject: [XML-SIG] [ Newbie ] Parsing XML with an external DTD
In-Reply-To: (message from Christophe Prévost on Mon, 5 Nov 2001 16:26:54 -0000)
References:
Message-ID: <200111052306.fA5N6ix01771@mira.informatik.hu-berlin.de>

>
>
> Blah blah
>
>
> With a dtd... How can i do that without writing the dtd link into the
> document ?

You don't need to put a link to the DTD into the document. It is also
possible to put the entire DTD into the document, into the "internal
subset" (i.e. the [...] block).
> Can i set the DTD to use to the validating parser ? With xml.parsers.xmlproc, I believe this is possible. You need to use xmldtd.load_dtd[_string] to create a DTD object. Then you create a xmlval.XMLValidator, passing the DTD. Unfortunately, due to a bug, this appears not to be possible - you should add support for passing a DTD that way by modifying your copy of PyXML. Finally, you can parse the document using the XMLValidator. HTH, Martin From larsga@garshol.priv.no Mon Nov 5 23:30:43 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 06 Nov 2001 00:30:43 +0100 Subject: [XML-SIG] [ Newbie ] Parsing XML with an external DTD In-Reply-To: <200111052306.fA5N6ix01771@mira.informatik.hu-berlin.de> References: <200111052306.fA5N6ix01771@mira.informatik.hu-berlin.de> Message-ID: * Christophe Prévost | | Can i set the DTD to use to the validating parser ? * Martin v. Loewis | | With xml.parsers.xmlproc, I believe this is possible. It _is_ possible, but the reference management necessary to carry it off is such that I don't expect anyone who doesn't know the code very well to be able to do it. It's been a goal for a long time to be able to offer this functionality through the API, since I think this would be a good way to fix one of the basic design flaws of XML 1.0, but so far the necessary time has failed to materialize. --Lars M. From uche.ogbuji@fourthought.com Tue Nov 6 07:16:08 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 06 Nov 2001 00:16:08 -0700 Subject: [XML-SIG] Re: [4suite] Newbie install problems References: <3BE5FB95.5F7C0ED2@cyber-dyne.com> <3BE628BC.D19438A6@fourthought.com> <3BE730FF.7C35DD6E@cyber-dyne.com> Message-ID: <3BE78E38.37C229F5@fourthought.com> "J." wrote: > > Uche Ogbuji wrote: > > > > "J." 
wrote: > > > > > > Hi, > > > > > > I just installed the foursuite rpm for Python 2.1 > > > (4Suite-0.11.1-py2.1.i686.rpm) on a Mandrake 8.0 > > > installation, and when I try to run the test suite I get a failed import > > > > > > from xml.FtCore import FtException > > > > > > Sure enough, grepping around I can't find anything named FtCore in the > > > whole /usr/lib/python2.1 > > > heirarchy. Have I hosed the installation somehow? Is this a name > > > collision with the standard xml > > > package? > > > > You need to install PyXXML 0.6.6, which is a prerequisite. > > Hmm, That's not what the FAQ says at 4suite.org . The FAQ is out of date. We're working on it. > Also, I notice that the sourceforge site for PyXML has RPMs for 1.5.2 > and 2.0 but not for 2.1. If I install the 2.0 RPM, then naively copying > 'cp -a' the _xmlplus directory from my 2.0 site-packages to the 2.1 > site-packages doesn't work either, but I probably should have known > better anyways. :) This shouldn't be so (therefore I'm cc'ing the XML-SIG). However, it's so easy to build, I think you shouldn't worry about the lack of an RPM. Just get the source package, untar it and run "python setup.py install". That's all. 4Suite is just as easy to install. > I'm also puzzled because I thought that the release > notes on python.org for the 2.1 release said that PyXML was mostly > merged into the standard xml package. Anyhow, I'm not trying to be a > pain, just trying to get this running. Any help at all would be greatly > appreciated. Certainly not a pain. The expat parser, the SAX package and minidom did get merged into Python 2.0, but there are other things 4Suite needs such as the boolean extension and 4DOM. Good luck. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji@fourthought.com Tue Nov 6 19:54:23 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 06 Nov 2001 12:54:23 -0700 Subject: [XML-SIG] [Fwd: [Fwd: [4suite] Tutorials on 4Suite/PyXML DOM and XSLT]] Message-ID: <3BE83FEF.59965393@fourthought.com> This is a multi-part message in MIME format. --------------C32C630BD6F543F3780D5C35 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Looks as if the IBM dW tutorials on 4DOM, 4XPath and 4XSLT are now all fixed. Here's the announcement I made earlier: http://lists.fourthought.com/pipermail/4suite/2001-October/002704.html -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management --------------C32C630BD6F543F3780D5C35 Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline Return-Path: Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id f9VIWZ909467 for ; Wed, 31 Oct 2001 11:32:39 -0700 Received: from northrelay02.pok.ibm.com (northrelay02.pok.ibm.com [9.117.200.22]) by e1.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id NAA414106; Wed, 31 Oct 2001 13:29:23 -0500 Received: from d27ml601.rchland.ibm.com (d27ml601.rchland.ibm.com [9.10.226.13]) by northrelay02.pok.ibm.com (8.11.1m3/NCO v5.00) with ESMTP id f9VIVs650684; Wed, 31 Oct 2001 13:31:54 -0500 Importance: Normal Subject: Re: [Fwd: [4suite] Tutorials on 4Suite/PyXML DOM and XSLT] To: Uche Ogbuji Cc: "Guy Robinson" From: "Nancy Dunn" Date: Wed, 31 Oct 2001 12:29:24 -0600 Message-ID: X-MIMETrack: Serialize by Router on d27ml601/27/M/IBM(Release 5.0.8 |June 18, 2001) at 10/31/2001 12:29:26 PM MIME-Version: 1.0 
Content-type: text/plain; charset=us-ascii We fixed the registration page problem with the new 4Suite tutorial once the production people went on duty in Raleigh this morning. I've logged on successfully this moriung, and I trust that Guy will now be able to log on to the tutorial. Thanks for letting us know about the problem. Best, Nancy Dunn editor, XML zone, IBM developerWorks IBM office in San Francisco (Pacific Time): 415 545-2139 tie line: 473-2139 mobile phone 415 613 5561 FAX: 415 545-3588 ndunn@us.ibm.com ====================================== Need it? Get it. http://www.ibm.com/developerWorks ====================================== --------------C32C630BD6F543F3780D5C35-- From uche.ogbuji@fourthought.com Tue Nov 6 23:42:56 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Tue, 06 Nov 2001 16:42:56 -0700 Subject: [XML-SIG] Tutorials on 4Suite/PyXML DOM and XSLT Message-ID: <3BE87580.8D449702@fourthought.com> This is a multi-part message in MIME format. --------------1E82772731E777169582AB86 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Just passing on what Bill found. Looks as if the links I posted are dead ends (though they shouldn't be). Bill found a working entry point. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management --------------1E82772731E777169582AB86 Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline Return-Path: Received: from majordomo.vol.cz (smtp4.vol.cz [195.250.128.43]) by yen.fourthought.com (8.11.2/8.11.2) with ESMTP id fA6MRbU22840 for ; Tue, 6 Nov 2001 15:27:37 -0700 Received: from rfa.org (dateld-131.dialup.vol.cz [212.20.106.131]) by majordomo.vol.cz (8.11.3/8.11.3) with ESMTP id fA6MRW897880 for ; Tue, 6 Nov 2001 23:27:35 +0100 (CET) (envelope-from bill@rfa.org) Sender: bill@majordomo.vol.cz Message-ID: <3BE86313.E455EE90@rfa.org> Date: Tue, 06 Nov 2001 23:24:19 +0100 From: Bill Eldridge X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.0-test5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Uche Ogbuji Subject: Re: [Fwd: [Fwd: [4suite] Tutorials on 4Suite/PyXML DOM and XSLT]] References: <3BE83FEF.59965393@fourthought.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Okay, it worked going from links on the following page: http://www-105.ibm.com/developerworks/education.nsf/dw/xml-onlinecourse-bynewest?OpenDocument&Count=500 Uche Ogbuji wrote: > > Looks as if the IBM dW tutorials on 4DOM, 4XPath and 4XSLT are now all > fixed. > > Here's the announcement I made earlier: > > http://lists.fourthought.com/pipermail/4suite/2001-October/002704.html > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. 
http://Fourthought.com > 4735 East Walnut St, Boulder, CO 80301-2537, USA > XML strategy, XML tools (http://4Suite.org), knowledge management > > ------------------------------------------------------------------------ > > Subject: Re: [Fwd: [4suite] Tutorials on 4Suite/PyXML DOM and XSLT] > Date: Wed, 31 Oct 2001 12:29:24 -0600 > From: "Nancy Dunn" > To: Uche Ogbuji > CC: "Guy Robinson" > > We fixed the registration page problem with the new 4Suite tutorial once > the production people went on duty in Raleigh this morning. I've logged on > successfully this moriung, and I trust that Guy will now be able to log on > to the tutorial. > > Thanks for letting us know about the problem. > > Best, > Nancy Dunn > editor, XML zone, IBM developerWorks > IBM office in San Francisco (Pacific Time): 415 545-2139 > tie line: 473-2139 > mobile phone 415 613 5561 > FAX: 415 545-3588 > ndunn@us.ibm.com > > ====================================== > Need it? Get it. > http://www.ibm.com/developerWorks > ====================================== -- Bill Eldridge Radio Free Asia bill@rfa.org --------------1E82772731E777169582AB86-- From rodsenra@gpr.com.br Wed Nov 7 14:12:58 2001 From: rodsenra@gpr.com.br (Rodrigo Senra) Date: Wed, 07 Nov 2001 12:12:58 -0200 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars Message-ID: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> Hi, I don't know if I stepped in a bug or it is just my newbieness ;o) Trying to parse the file: ------------------- pau.xml ------------------- This line is ok. This line has characters ISO-8859-1 with accents: Houve mudanças nos preços? Linha ok. 
------------------ end of file pau.xml -------- with the script: ------------------ file teste.py ---------------------------- from xml.dom.ext.reader import Sax2 from xml.dom.ext import PrettyPrint doc = Sax2.FromXmlStream(open('pau.xml')) PrettyPrint(doc,encoding='iso-8859-1') -------------------- end of teste.py script ------------ produces: ----------- stdout trace ------------- This line is ok. Linha ok. ----------- end of trace ------------- Am I doing something obviously wrong ? Should I try another parser ? TIA Senra Rodrigo Senra Computer Engineer (GPr Sistemas Ltda) rodsenra@gpr.com.br MSc Student (IC - UNICAMP) Rodrigo.Senra@ic.unicamp.br http://www.ic.unicamp.br/~921234 (LinUxer 217.243) (ICQ 114477550) From Alexandre.Fayolle@logilab.fr Wed Nov 7 14:34:30 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 7 Nov 2001 15:34:30 +0100 (CET) Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> Message-ID: On Wed, 7 Nov 2001, Rodrigo Senra wrote: > Hi, > > I don't know if I stepped in a bug or it is just my newbieness ;o) > Trying to parse the file: try adding <?xml version="1.0" encoding="iso-8859-1"?> as the first line of pau.xml. The default encoding for an xml document is UTF-8. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From tpassin@home.com Wed Nov 7 14:43:38 2001 From: tpassin@home.com (Thomas B. Passin) Date: Wed, 7 Nov 2001 09:43:38 -0500 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> Message-ID: <001401c1679a$969fe3d0$7cac1218@cj64132b> It seems that this xml file should caused an exception, since it is not well-formed: the actual encoding does not match the presumed encoding (namely, utf-8). The fact that the parse partially succeeded is disturbing.
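The effect of an encoding declaration can be reproduced with the standard library alone. This is only a sketch, not the poster's script: it uses xml.dom.minidom instead of the PyXML Sax2 reader, the doc element is made up, and whether the undeclared case is rejected depends on the Expat version in use (current versions are expected to reject it).

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# ISO-8859-1 bytes with no encoding declaration; "doc" is a hypothetical element.
BODY = "<doc>Houve mudanças nos preços?</doc>".encode("iso-8859-1")
DECLARATION = b'<?xml version="1.0" encoding="iso-8859-1"?>'


def try_parse(data):
    """Return the text of the root element, or a short error description."""
    try:
        return parseString(data).documentElement.firstChild.data
    except ExpatError as exc:
        return "parse error: %s" % exc


# Without a declaration the parser assumes UTF-8, and the bytes are not valid UTF-8.
undeclared = try_parse(BODY)
# With the declaration the same bytes are decoded as ISO-8859-1.
declared = try_parse(DECLARATION + BODY)

print(undeclared)
print(declared)
```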
I tried this example myself. I am running pyxml 6.6 on Windows2000. I did get an exception, but it was from the pretty-printer, not the parser. Adding an xml declaration declaring the actual iso-8859-1 encoding did in fact allow the program to complete properly, as expected. Why didn't the parser complain? Cheers, Tom P [Rodrigo Senra] > > I don't know if I stepped in a bug or it is just my newbieness ;o) > Trying to parse the file: > > ------------------- pau.xml ------------------- > > > This line is ok. > This line has characters ISO-8859-1 with accents: Houve mudanças nos > preços? > Linha ok. > > > ------------------ end of file pau.xml -------- > > with the script: > > ------------------ file teste.py ---------------------------- > from xml.dom.ext.reader import Sax2 > from xml.dom.ext import PrettyPrint > > doc = Sax2.FromXmlStream(open('pau.xml')) > PrettyPrint(doc,encoding='iso-8859-1') > -------------------- end of teste.py script ------------ > > produces: > > ----------- stdout trace ------------- > > > > > This line is ok. > > Linha ok. > > > ----------- end of trace ------------- > > Am I doing something obviously wrong ? Should I try another parser ? From dkgunter@lbl.gov Wed Nov 7 15:02:35 2001 From: dkgunter@lbl.gov (Dan Gunter) Date: Wed, 07 Nov 2001 07:02:35 -0800 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> Message-ID: <3BE94D0B.1080101@lbl.gov> The simple answer is that the XML parser is illiterate. Since there are no bit patterns that are illegal in UTF-8, I don't see how the parser could know that the chosen encoding produced, from the user's perspective, garbage. The pretty-printer, on the other hand, knows the difference between printable and non-printable characters and can thus complain. Dan Thomas B.
Passin wrote: > It seems that this xml file should caused an exception, since it is not > well-formed: the actual encoding does not match the presumed encoding > (namely, utf-8). The fact that the parse partially succeeded is disturbing. > > I tried this example myself. I am running pyxml 6.6 on Windows2000. I did > get an exception, but it was from the pretty-printer, not the parser. > Adding an xml declaration declaring the actual iso-8859-1 encoding did in > fact allow the program to complete properly, as expected. > > Why didn't the parser complain? > > Cheers, > > Tom P > > > [Rodrigo Senra] > > >> I don't know if I stepped in a bug or it is just my newbieness ;o) >> Trying to parse the file: >> >>------------------- pau.xml ------------------- >> >> >> This line is ok. >> This line has characters ISO-8859-1 with accents: Houve mudanças nos >>preços? >> Linha ok. >> >> >>------------------ end of file pau.xml -------- >> >>with the script: >> >>------------------ file teste.py ---------------------------- >>from xml.dom.ext.reader import Sax2 >>from xml.dom.ext import PrettyPrint >> >>doc = Sax2.FromXmlStream(open('pau.xml')) >>PrettyPrint(doc,encoding='iso-8859-1') >>-------------------- end of teste.py script ------------ >> >>produces: >> >>----------- stdout trace ------------- >> >> >> >> >> This line is ok. >> >> Linha ok. >> >> >>----------- end of trace ------------- >> >>Am I doing something obviously wrong ? Should I try another parser ?
>> > > > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig > From morus.walter@tanto-xipolis.de Wed Nov 7 16:22:24 2001 From: morus.walter@tanto-xipolis.de (Morus Walter) Date: Wed, 7 Nov 2001 17:22:24 +0100 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <3BE94D0B.1080101@lbl.gov> References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> Message-ID: <15337.24512.44723.684177@morus.xipolis.net> Dan Gunter writes: > The simple answer is that the XML parser is illiterate. Since there are > no bit patterns that are illegal in UTF-8, I don't see how the parser > could know that the chosen encoding produced, from the user's > perspective, garbage. The pretty-printer, on the other hand, knows the > difference between printable and non-printable characters and can thus > complain. > Sorry. This is wrong. There are a lot of byte combinations that can never occur in UTF-8. E.g. there can never be a single 8-bit character between 7-bit characters ([\x20-\x7F][\x80-\xFF][\x20-\x7F]). So the parser could check, whether the byte stream forms valid utf-8. greetings Morus -- Th. Morus Walter · Manager Content & Data Development xipolis.net GmbH & Co. KG · Schellingstrasse 35 · 80799 München From fg@nuxeo.com Wed Nov 7 17:47:29 2001 From: fg@nuxeo.com (Florent Guillaume) Date: 7 Nov 2001 17:47:29 GMT Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> Message-ID: <9sbs3h$7ng$1@dev.in.nuxeo.com> > The simple answer is that the XML parser is illiterate.
Since there are > no bit patterns that are illegal in UTF-8, I don't see how the parser > could know that the chosen encoding produced, from the user's > perspective, garbage. On the contrary, there are a lot of bit-patterns that are illegal in UTF-8, and an application that fails to identify them as such can be subject to many security holes. See a number of Microsoft IIS "unicode" holes. -- Florent -- Florent Guillaume, Nuxeo SARL (Paris, France) +33 1 40 33 79 10 http://nuxeo.com mailto:fg@nuxeo.com From DKGunter@lbl.gov Wed Nov 7 18:15:33 2001 From: DKGunter@lbl.gov (Dan Gunter) Date: Wed, 07 Nov 2001 10:15:33 -0800 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> <15337.24512.44723.684177@morus.xipolis.net> Message-ID: <3BE97A45.1B3746CC@lbl.gov> This is a multi-part message in MIME format. --------------173FBD0C4E7173794F79C54C Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by portnoy.lbl.gov id fA7IFXh19857 I stand corrected. That's what you get for skimming the first reference that pops up in Google :) Of course, checking an _arbitrary_ encoding for correctness seems like a real burden on the parser, but maybe UTF-8 is so common it should be checked. This time, I will do the wise thing and defer to the experts on this issue. Dan Morus Walter wrote: > > Dan Gunter writes: > > The simple answer is that the XML parser is illiterate. Since there are > > no bit patterns that are illegal in UTF-8, I don't see how the parser > > could know that the chosen encoding produced, from the user's > > perspective, garbage. The pretty-printer, on the other hand, knows the > > difference between printable and non-printable characters and can thus > > complain. > > > Sorry. This is wrong.
> There are a lot of byte combinations that can never occur > in UTF-8. E.g. there can never be a single 8-bit character between 7-bit > characters ([\x20-\x7F][\x80-\xFF][\x20-\x7F]). > So the parser could check, whether the byte stream forms valid utf-8. > > greetings > Morus > > -- > Th. Morus Walter · Manager Content & Data Development > xipolis.net GmbH & Co. KG · Schellingstrasse 35 · 80799 München > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- [ Dan Gunter, LBNL - http://www-didc.lbl.gov/~dang/ ] --------------173FBD0C4E7173794F79C54C Content-Type: text/x-vcard; charset=us-ascii; name="dkgunter.vcf" Content-Description: Card for Dan Gunter Content-Disposition: attachment; filename="dkgunter.vcf" Content-Transfer-Encoding: 7bit begin:vcard n:; x-mozilla-html:FALSE org:LBNL;DIDC group, DSD Division, NERSC adr;quoted-printable:;;One Cyclotron Road=0D=0AM/S 50B-2239;Berkeley;CA;94720;USA adr:;;;;;; version:2.1 x-mozilla-cpt:;-29184 end:vcard --------------173FBD0C4E7173794F79C54C-- From larsga@garshol.priv.no Wed Nov 7 19:25:06 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 07 Nov 2001 20:25:06 +0100 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <3BE97A45.1B3746CC@lbl.gov> References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> <15337.24512.44723.684177@morus.xipolis.net> <3BE97A45.1B3746CC@lbl.gov> Message-ID: * Dan Gunter | | Of course, checking an _arbitrary_ encoding for correctness seems | like a real burden on the parser, but maybe UTF-8 is so common it | should be checked. All encodings should be checked for correctness, although not all of them can be. Most single-byte encodings (like the ISO 8859-x series) have no illegal bit sequences, and so cannot be checked with anything short of full-scale AI.
Most multi-byte encodings, however, have illegal bit sequences and converters can and should check these for correctness. This is really no different from or less important than verifying syntactical correctness. What sets UTF-8 apart in this context is that it is the default encoding for XML documents, so that if you find illegal UTF-8 bit sequences you can be pretty sure that the user is not using UTF-8, but has just omitted to declare that fact. --Lars M. From martin@v.loewis.de Wed Nov 7 21:57:42 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Wed, 7 Nov 2001 22:57:42 +0100 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <001401c1679a$969fe3d0$7cac1218@cj64132b> (tpassin@home.com) References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> Message-ID: <200111072157.fA7Lvg602553@mira.informatik.hu-berlin.de> > It seems that this xml file should caused an exception, since it is > not well-formed: the actual encoding does not match the presumed > encoding (namely, utf-8). The fact that the parse partially > succeeded is disturbing. Indeed. IMO, Expat should detect the error, but it doesn't, instead it treats all contents >128 as proper UTF-8 (remember that all markup is ASCII). So Expat passes it to the application (pyexpat), which invokes the UTF-8 decoder, which fails. Due to a bug, this exception is lost, but the entire chunk of data reported by expat isn't reported to the Python application, either. This is now fixed in pyexpat.c 1.42; thanks for the report. Regards, Martin From tpassin@home.com Thu Nov 8 00:19:25 2001 From: tpassin@home.com (Thomas B. 
Passin) Date: Wed, 7 Nov 2001 19:19:25 -0500 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <200111072157.fA7Lvg602553@mira.informatik.hu-berlin.de> Message-ID: <002101c167eb$0600dd60$7cac1218@cj64132b> [Martin v. Loewis] > > It seems that this xml file should caused an exception, since it is > > not well-formed: the actual encoding does not match the presumed > > encoding (namely, utf-8). The fact that the parse partially > > succeeded is disturbing. > > Indeed. IMO, Expat should detect the error, but it doesn't, instead it > treats all contents >128 as proper UTF-8 (remember that all markup is > ASCII). So Expat passes it to the application (pyexpat), which invokes > the UTF-8 decoder, which fails. Due to a bug, this exception is lost, > but the entire chunk of data reported by expat isn't reported to the > Python application, either. > > This is now fixed in pyexpat.c 1.42; thanks for the report. Excellent. Thanks, Martin. Tom P From morus.walter@tanto-xipolis.de Thu Nov 8 08:05:24 2001 From: morus.walter@tanto-xipolis.de (Morus Walter) Date: Thu, 8 Nov 2001 09:05:24 +0100 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> <15337.24512.44723.684177@morus.xipolis.net> <3BE97A45.1B3746CC@lbl.gov> Message-ID: <15338.15556.467017.214971@morus.xipolis.net> Lars Marius Garshol writes: > > * Dan Gunter > | > | Of course, checking an _arbitrary_ encoding for correctness seems > | like a real burden on the parser, but maybe UTF-8 is so common it > | should be checked. > > All encodings should be checked for correctness, although not all of > them can be.
Most single-byte encodings (like the ISO 8859-x series) > have no illegal bit sequences, and so cannot be checked with anything > short of full-scale AI. Most multi-byte encodings, however, have > illegal bit sequences and converters can and should check these for > correctness. This is really no different from or less important than > verifying syntactical correctness. > Doesn't handling non standard (standard with respect to xml) encodings imply conversion to unicode somehow? E.g. inn XML names are further restricted to specific unicode characters... I mean even ASCII contains characters that are not allowed in XML documents (such as 0x00, 0x01...). The same aplies to ISO 8859-x (since they are ascii based). Apart from that, any byte within [\x00-\x7F\xA0-\xFF] is valid ISO 8859-x so checking is rather easy than requiring AI. (There's no requirement that the content makes sense ;-)) Of course a parser might be sloppy on some of these restrictions due to performance considerations. However it should be clear, that it fails to be a conforming parser then. greetings Morus -- Th. Morus Walter · Manager Content & Data Development xipolis.net GmbH & Co. KG · Schellingstrasse 35 · 80799 München From martin@v.loewis.de Thu Nov 8 08:28:47 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 8 Nov 2001 09:28:47 +0100 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <15338.15556.467017.214971@morus.xipolis.net> (message from Morus Walter on Thu, 8 Nov 2001 09:05:24 +0100) References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> <15337.24512.44723.684177@morus.xipolis.net> <3BE97A45.1B3746CC@lbl.gov> <15338.15556.467017.214971@morus.xipolis.net> Message-ID: <200111080828.fA88Sl301269@mira.informatik.hu-berlin.de> > Doesn't handling non standard (standard with respect to xml) encodings > imply conversion to unicode somehow? > E.g.
inn XML names are further restricted to specific unicode characters... That certainly implied additional well-formedness constraints on the encoding. However, in real life, these constraints never lead to a rejection of a document: Most users restrict themselves to ASCII in element and attribute names, and use non-ASCII characters only in character content (i.e. not in markup). Therefore, encoding problems in a single-byte encoding usually won't be detected. > I mean even ASCII contains characters that are not allowed in XML documents > (such as 0x00, 0x01...). That doesn't help much. *No* encoding allows to use those bytes in XML (except for UTF-16). So if the only error in the document is that the parser uses the wrong encoding, then this aspect won't lead to a problem detection, either. > The same aplies to ISO 8859-x (since they are ascii based). Apart > from that, any byte within [\x00-\x7F\xA0-\xFF] is valid ISO 8859-x > so checking is rather easy than requiring AI. How does that help? If the document was declared as iso-8859-1, but really is iso-8859-2, we cannot detect that fact. If the document really is KOI-8R, we cannot detect that fact. If the document really is UTF-8, we cannot detect that fact. About the only case that *can* be detected if the document is declared UTF-8 (e.g. by leaving out the xml header), and it isn't. > Of course a parser might be sloppy on some of these restrictions due to > performance considerations. However it should be clear, that it fails to > be a conforming parser then. Can you give a specific example of a document that contains an error regarding the declaration of an incorrect encoding which can and should be detected? 
Regards, Martin From morus.walter@tanto-xipolis.de Thu Nov 8 09:22:13 2001 From: morus.walter@tanto-xipolis.de (Morus Walter) Date: Thu, 8 Nov 2001 10:22:13 +0100 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <200111080828.fA88Sl301269@mira.informatik.hu-berlin.de> References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> <15337.24512.44723.684177@morus.xipolis.net> <3BE97A45.1B3746CC@lbl.gov> <15338.15556.467017.214971@morus.xipolis.net> <200111080828.fA88Sl301269@mira.informatik.hu-berlin.de> Message-ID: <15338.20165.464064.678981@morus.xipolis.net> Martin v. Loewis writes: > > I mean even ASCII contains characters that are not allowed in XML documents > > (such as 0x00, 0x01...). > > That doesn't help much. *No* encoding allows to use those bytes in XML > (except for UTF-16). So if the only error in the document is that the > parser uses the wrong encoding, then this aspect won't lead to a > problem detection, either. > Sorry I don't know about all encodings. I don't think that there is a principal problem to define encodings that use x00 for 'a'. > > The same aplies to ISO 8859-x (since they are ascii based). Apart > > from that, any byte within [\x00-\x7F\xA0-\xFF] is valid ISO 8859-x > > so checking is rather easy than requiring AI. > > How does that help? If the document was declared as iso-8859-1, but > really is iso-8859-2, we cannot detect that fact. If the document > really is KOI-8R, we cannot detect that fact. If the document really > is UTF-8, we cannot detect that fact. > > About the only case that *can* be detected if the document is declared > UTF-8 (e.g. by leaving out the xml header), and it isn't. > > > Of course a parser might be sloppy on some of these restrictions due to > > performance considerations. However it should be clear, that it fails to > > be a conforming parser then. 
> > Can you give a specific example of a document that contains an error > regarding the declaration of an incorrect encoding which can and > should be detected? > I would speak of an encoding error if the content of a xml text is erroneous with respect to the provided encoding info. So \129 (where \... stands for the byte with decimal number ...) is incorrect, since \129 is not defined in iso-8859-1. Of course you cannot tell if a text in iso-latin1 is said to be encoded in iso-latin2 since they are formally equivalent (and you will output garbage if you convert that to unicode). To me encoding checking is a formal check and if there are formally equivalent encodings there will be no difference. If Lars Marius and you are talking about deciding if the encoding of a text and the declaration of the encoding match, I agree that you need AI for that. But that does not mean that you cannot check anything. greetings Morus From martin@v.loewis.de Thu Nov 8 09:59:32 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 8 Nov 2001 10:59:32 +0100 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <15338.20165.464064.678981@morus.xipolis.net> (message from Morus Walter on Thu, 8 Nov 2001 10:22:13 +0100) References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> <15337.24512.44723.684177@morus.xipolis.net> <3BE97A45.1B3746CC@lbl.gov> <15338.15556.467017.214971@morus.xipolis.net> <200111080828.fA88Sl301269@mira.informatik.hu-berlin.de> <15338.20165.464064.678981@morus.xipolis.net> Message-ID: <200111080959.fA89xWJ01601@mira.informatik.hu-berlin.de> > Sorry I don't know about all encodings. I don't think that there is a > principal problem to define encodings that use x00 for 'a'. Well, there is: It wouldn't be an ASCII superset. Most real-life encodings are ASCII supersets, unless they are EBCDIC superset. 
Anything else would not survive long (except for special markets, such as GSM short messages). *This* specific encoding would have another problem: it wouldn't be C compatible, since the \0 byte terminates a string, independent of the encoding. > I would speak of an encoding error if the content of a xml text is > erroneous with respect to the provided encoding info. > So > > \129 > (where \... stands for the byte with decimal number ...) > is incorrect, since \129 is not defined in iso-8859-1. [Assuming you mean the character decimal 129 here; \129 is not a valid octal escape] It certainly is defined. It represents the control character HOP (high octet preset), #x0081; see http://208.56.196.240/misc/ISO-8859-1.HTML *All* bytes are valid characters in ISO-8859-1 (it is a common misconception about Latin-1 that 128-159 are not defined). Furthermore, this character (HOP) is even valid in XML character data: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] As you can see, only the low control block (C0) is partially excluded; the high control block (C1, #x0080-#x009F) is completely valid in XML character data. > Of course you cannot tell if a text in iso-latin1 is said to be > encoded in iso-latin2 since they are formally equivalent (and you will > output garbage if you convert that to unicode). I cannot understand this statement. Latin-1 and Latin-2 are *not* formally equivalent: even though they use the same bytes (namely, all of them), the bytes denote different characters. > But that does not mean that you cannot check anything. It seemed to me that you suggested that you can formally check whether an input really is Latin-1; you cannot. You cannot formally check any of the ISO-8859 encodings, unless you are presented with an EBCDIC file, in which case even markup is encoded differently. You cannot check UTF-16, either, unless you happen to run into a character that has been excluded (such as an unpaired surrogate).
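To make this concrete, here is a small sketch in present-day Python 3 (not the Python of this thread; the sample bytes and the helper name could_be_utf8 are made up for illustration). Every byte decodes as Latin-1, the same bytes decode just as happily as Latin-2, and only a strict UTF-8 decode is able to reject anything:

```python
# Illustrative sketch (Python 3); sample data and helper name are hypothetical.
all_bytes = bytes(range(256))

# Every byte value is a valid ISO-8859-1 character, including 128-159 (C1).
as_latin1 = all_bytes.decode("iso-8859-1")
assert len(as_latin1) == 256
assert as_latin1[0x81] == "\x81"            # the HOP control character

# The same two bytes are formally fine in both encodings,
# but they denote different characters.
sample = b"\xb1\xe6"
latin1_text = sample.decode("iso-8859-1")   # U+00B1, U+00E6
latin2_text = sample.decode("iso-8859-2")   # U+0105, U+0107


def could_be_utf8(data):
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Here could_be_utf8(sample) is false, because 0xB1 cannot start a UTF-8 sequence, while could_be_utf8(b"\xc3\xa9") is true even though those two bytes are also perfectly good Latin-1.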
So I come back to my original claim: the only thing you can check in practice is whether something could be UTF-8. Of course, it is even possible to come up with a Latin-1 text that decodes as UTF-8 successfully. Regards, Martin From fdrake@acm.org Thu Nov 8 12:52:54 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 8 Nov 2001 07:52:54 -0500 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <200111072157.fA7Lvg602553@mira.informatik.hu-berlin.de> References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <200111072157.fA7Lvg602553@mira.informatik.hu-berlin.de> Message-ID: <15338.32806.347736.172319@grendel.zope.com> Martin v. Loewis writes: > Indeed. IMO, Expat should detect the error, but it doesn't, instead it > treats all contents >128 as proper UTF-8 (remember that all markup is I expect that this will be fixed for Expat 1.95.3. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From mal@lemburg.com Thu Nov 8 14:36:59 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 08 Nov 2001 15:36:59 +0100 Subject: [XML-SIG] Strings or Unicode ? Message-ID: <3BEA988B.BDEFE28B@lemburg.com> I'm working on a fast XML scanner and was wondering what the prefered method is for dealing with encodings and Unicode. There are basically two options: 1. find out the XML encoding by looking at the header, decode the data into Unicode, run the parser over the Unicode string and let it generate Unicode tag names, attributes, etc. 2. run the parser over the raw string data and let it generate raw string tag names, attributes, etc., convert the tag names, attributes, data etc. to Unicode on a case-by-case basis and only if needed Both could be implemented from a single source file, but I wonder whether it's worth providing option 2. 
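For concreteness, option 1 might look roughly like this in present-day Python 3. This is only a sketch: the names sniff_encoding and decode_document are made up, it assumes an ASCII-compatible encoding, and it ignores byte order marks, UTF-16 and EBCDIC autodetection entirely.

```python
import re

# Hypothetical helper pattern: an encoding pseudo-attribute inside an
# XML declaration at the very start of the raw byte string.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data, default="utf-8"):
    """Return the declared encoding name, or the default if none is declared."""
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return default


def decode_document(data):
    """Decode the whole document so the parser only ever sees Unicode."""
    return data.decode(sniff_encoding(data))
```

After decode_document() the scanner runs over a Unicode string and every tag name, attribute and text chunk comes out as Unicode without further conversion.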
Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@v.loewis.de Thu Nov 8 16:04:59 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 8 Nov 2001 17:04:59 +0100 Subject: [XML-SIG] Strings or Unicode ? In-Reply-To: <3BEA988B.BDEFE28B@lemburg.com> (mal@lemburg.com) References: <3BEA988B.BDEFE28B@lemburg.com> Message-ID: <200111081604.fA8G4xF01337@mira.informatik.hu-berlin.de> > 1. find out the XML encoding by looking at the header, > decode the data into Unicode, > run the parser over the Unicode string and let it > generate Unicode tag names, attributes, etc. That is what xmlproc does. It supports chunked input (i.e. feeding), and converts any new chunk using the established encoding. Processing the first chunk is tricky: it first tries to do encoding autodetection. If that does not give any clue, it parses the xml declaration as a byte string until it sees the encoding declaration. It then recodes the first chunk using the established encoding, and trusts that the current position in the string is good for the Unicode string also. Since XML only supports ASCII supersets as encodings (*), I think this is a reliable assumption. Regards, Martin (*) Other encodings apparently are only supported when some higher-level protocol already reports the encoding used, or if EBCDIC is autodetected. From mal@lemburg.com Thu Nov 8 16:18:35 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 08 Nov 2001 17:18:35 +0100 Subject: [XML-SIG] Strings or Unicode ? References: <3BEA988B.BDEFE28B@lemburg.com> <200111081604.fA8G4xF01337@mira.informatik.hu-berlin.de> Message-ID: <3BEAB05B.28272CD4@lemburg.com> "Martin v. Loewis" wrote: > > > 1. 
find out the XML encoding by looking at the header, > > decode the data into Unicode, > > run the parser over the Unicode string and let it > > generate Unicode tag names, attributes, etc. > > That is what xmlproc does. It supports chunked input (i.e. feeding), > and converts any new chunk using the established encoding. > > Processing the first chunk is tricky: it first tries to do encoding > autodetection. If that does not give any clue, it parses the xml > declaration as a byte string until it sees the encoding > declaration. It then recodes the first chunk using the established > encoding, and trusts that the current position in the string is good > for the Unicode string also. Since XML only supports ASCII supersets > as encodings (*), I think this is a reliable assumption. Thanks for the insight. The question still remains, though: is this an acceptable approach in practice ? (Converting Unicode back to strings has its cost and it might be worthwhile having the 8-bit string approach available too.) > (*) Other encodings apparently are only supported when some > higher-level protocol already reports the encoding used, or if EBCDIC > is autodetected. Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mlh@idi.ntnu.no Thu Nov 8 17:30:17 2001 From: mlh@idi.ntnu.no (Magnus Lie Hetland) Date: Thu, 8 Nov 2001 18:30:17 +0100 Subject: [XML-SIG] Expat problems again... Message-ID: <06a001c1687b$08c77500$156ff181@idi.ntnu.no> I guess this is jut me being stupid, but I'm having problems with Expat again... 
I installed Python 2.2b1 a while ago, and now realised that I hadn't installed PyXML/4Suite; so I ran the "setup.py install" in both (without doing anything about Expat which I have lying about), and now get the following:

Python 2.2b1 (#4, Nov 6 2001, 12:21:03)
[GCC 3.0.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.xslt
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/idi/f/mlh/python/current/lib/python2.2/site-packages/_xmlplus/xslt/__init__.py", line 38, in ?
    from Ft.Lib import pDomlette
  File "/home/idi/f/mlh/python/current/lib/python2.2/site-packages/Ft/Lib/pDomlette.py", line 718, in ?
    from pDomletteReader import *
  File "/home/idi/f/mlh/python/current/lib/python2.2/site-packages/Ft/Lib/pDomletteReader.py", line 27, in ?
    from xml.parsers import expat
  File "/home/idi/f/mlh/python/current/lib/python2.2/site-packages/_xmlplus/parsers/expat.py", line 4, in ?
    from pyexpat import *
ImportError: ld.so.1: python: fatal: relocation error: file /home/idi/f/mlh/python/current/lib/python2.2/site-packages/_xmlplus/parsers/pyexpat.so: symbol _PyGC_Insert: referenced symbol not found
>>>

I guess I haven't linked properly with libexpat or something, but I don't find it exactly obvious how to do that and I don't remember where I found out the last time... (May it be that some residual stuff compiled against Python 2.1 is still lingering about, destroying everything?)
-- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.sf.net From rodsenra@gpr.com.br Fri Nov 9 16:57:03 2001 From: rodsenra@gpr.com.br (Rodrigo Senra) Date: Fri, 9 Nov 2001 14:57:03 -0200 Subject: [XML-SIG] Processing xml files with ISO 8859-1 chars In-Reply-To: <3BE94D0B.1080101@lbl.gov> References: <5.1.0.14.0.20011107120710.00a53d30@pop.sao.terra.com.br> <001401c1679a$969fe3d0$7cac1218@cj64132b> <3BE94D0B.1080101@lbl.gov> Message-ID: <20011109145703.13623f94.rodsenra@gpr.com.br> |Dan Gunter , |on Wed, 07 Nov 2001 07:02:35 -0800 |about Re: [XML-SIG] Processing xml files with ISO 8859-1 chars > The simple answer is that the XML parser is illiterate. Since there are > no bit patterns that are illegal in UTF-8, I don't see how the parser > could know that the chosen encoding produced, from the user's > perspective, garbage. The pretty-printer, on the other hand, knows the > difference between printable and non-printable characters and can thus > complain. Thank you all for your replies! Adding the header with encoding='iso-8859-1' was sufficient. best regards Senra ___ Rodrigo Senra Computer Engineer (GPr Sistemas Ltda) rodsenra@gpr.com.br MSc Student (IC - UNICAMP) Rodrigo.Senra@ic.unicamp.br http://www.ic.unicamp.br/~921234 (LinUxer 217.243) (ICQ 114477550) From fdrake@acm.org Thu Nov 8 18:44:09 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 8 Nov 2001 13:44:09 -0500 Subject: [XML-SIG] Expat problems again... In-Reply-To: <06a001c1687b$08c77500$156ff181@idi.ntnu.no> References: <06a001c1687b$08c77500$156ff181@idi.ntnu.no> Message-ID: <15338.53881.234732.720795@grendel.zope.com> Magnus Lie Hetland writes: > I guess I haven't linked properly with libexpat or something, but > I don't find it exactly obvious how to do that and I don't remember > where I found out the last time... Are you working with Python from CVS? I'm going to guess that pyexpat.so did not get rebuilt when the GC code changed. 
The first thing to try is to use a clean source dir and build again. Before installing, make sure you can "import pyexpat" in the newly built interpreter before installing. > (May it be that some residual stuff compiled against Python 2.1 is > still lingering about, destroying everything?) Probably not, unless you're sharing compiled binary extensions across Python versions. This would not be the case if you take the defaults from distutils. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From mlh@idi.ntnu.no Thu Nov 8 20:27:14 2001 From: mlh@idi.ntnu.no (Magnus Lie Hetland) Date: Thu, 8 Nov 2001 21:27:14 +0100 Subject: [XML-SIG] Expat problems again... References: <06a001c1687b$08c77500$156ff181@idi.ntnu.no> <15338.53881.234732.720795@grendel.zope.com> Message-ID: <074201c16893$c0af7510$156ff181@idi.ntnu.no> From: "Fred L. Drake, Jr." > Are you working with Python from CVS? No - the plain 2.2b1 tarball... > I'm going to guess that > pyexpat.so did not get rebuilt when the GC code changed. That seems to be the case, yes. I used the same PyXML I had lying around and had used with a previous Python version (some alpha, I guess). Anyway, I removed everything and installed it from scratch, and now it works perfectly. Nothing problematic about it at all :) > -Fred -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.sf.net From fdrake@acm.org Thu Nov 8 21:32:53 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 8 Nov 2001 16:32:53 -0500 Subject: [XML-SIG] Expat problems again... In-Reply-To: <074201c16893$c0af7510$156ff181@idi.ntnu.no> References: <06a001c1687b$08c77500$156ff181@idi.ntnu.no> <15338.53881.234732.720795@grendel.zope.com> <074201c16893$c0af7510$156ff181@idi.ntnu.no> Message-ID: <15338.64005.63528.608476@grendel.zope.com> Magnus Lie Hetland writes: > Anyway, I removed everything and installed it from scratch, > and now it works perfectly. 
Nothing problematic about it at > all :) --sigh-- There's gotta be a better way... -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From mlh@idi.ntnu.no Thu Nov 8 21:12:54 2001 From: mlh@idi.ntnu.no (Magnus Lie Hetland) Date: Thu, 8 Nov 2001 22:12:54 +0100 Subject: [XML-SIG] Expat problems again... References: <06a001c1687b$08c77500$156ff181@idi.ntnu.no><15338.53881.234732.720795@grendel.zope.com><074201c16893$c0af7510$156ff181@idi.ntnu.no> <15338.64005.63528.608476@grendel.zope.com> Message-ID: <079f01c1689a$21f8a480$156ff181@idi.ntnu.no> From: "Fred L. Drake, Jr." > > Magnus Lie Hetland writes: > > Anyway, I removed everything and installed it from scratch, > > and now it works perfectly. Nothing problematic about it at > > all :) > > --sigh-- > There's gotta be a better way... Before doing this I tried python setup.py clean If that had really removed the rogue files, everything would have been just fine... Or if distutils had some sort of uninstall command... Well, well. At least basic installation of PyXML and 4Suite is very easy. > -Fred -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.sf.net From martin@v.loewis.de Thu Nov 8 22:25:23 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 8 Nov 2001 23:25:23 +0100 Subject: [XML-SIG] Expat problems again... In-Reply-To: <15338.64005.63528.608476@grendel.zope.com> (fdrake@acm.org) References: <06a001c1687b$08c77500$156ff181@idi.ntnu.no> <15338.53881.234732.720795@grendel.zope.com> <074201c16893$c0af7510$156ff181@idi.ntnu.no> <15338.64005.63528.608476@grendel.zope.com> Message-ID: <200111082225.fA8MPNn01289@mira.informatik.hu-berlin.de> > Magnus Lie Hetland writes: > > Anyway, I removed everything and installed it from scratch, > > and now it works perfectly. Nothing problematic about it at > > all :) > > --sigh-- > There's gotta be a better way... 
I think the real problem is that the GC API changed between 2.1 and 2.2, and that the only "backwards compatibility" provided is a) existing modules fail to load instead of outright crashing b) recompilation of existing modules disables GC support in those modules instead of giving bad code. I guess only few modules will suffer from that, since not much extension types have been supporting GC. Unfortunately, pyexpat is one of them. Regards, Martin From fdrake@acm.org Thu Nov 8 23:38:26 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 8 Nov 2001 18:38:26 -0500 Subject: [XML-SIG] Expat problems again... In-Reply-To: <200111082225.fA8MPNn01289@mira.informatik.hu-berlin.de> References: <06a001c1687b$08c77500$156ff181@idi.ntnu.no> <15338.53881.234732.720795@grendel.zope.com> <074201c16893$c0af7510$156ff181@idi.ntnu.no> <15338.64005.63528.608476@grendel.zope.com> <200111082225.fA8MPNn01289@mira.informatik.hu-berlin.de> Message-ID: <15339.6002.994480.82332@grendel.zope.com> Martin v. Loewis writes: > I think the real problem is that the GC API changed between 2.1 and > 2.2, and that the only "backwards compatibility" provided is > a) existing modules fail to load instead of outright crashing > b) recompilation of existing modules disables GC support in > those modules instead of giving bad code. Seems like an API version change needs to be recorded as well; not sure what more b/w compatibility would cost in maintenance time. > I guess only few modules will suffer from that, since not much > extension types have been supporting GC. Unfortunately, pyexpat is one > of them. Changing the API is a sure way to make sure people don't bother. Ugh. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From martin@v.loewis.de Thu Nov 8 18:33:50 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 8 Nov 2001 19:33:50 +0100 Subject: [XML-SIG] Strings or Unicode ? 
In-Reply-To: <3BEAB05B.28272CD4@lemburg.com> (mal@lemburg.com) References: <3BEA988B.BDEFE28B@lemburg.com> <200111081604.fA8G4xF01337@mira.informatik.hu-berlin.de> <3BEAB05B.28272CD4@lemburg.com> Message-ID: <200111081833.fA8IXoQ02126@mira.informatik.hu-berlin.de> > The question still remains, though: is this an acceptable approach > in practice ? (Converting Unicode back to strings has its cost and > it might be worthwhile having the 8-bit string approach available > too.) That depends on the processing you want to do after parsing. In general, I'd argue that not having to deal with encodings, but being able to rely that everything is Unicode simplifies the application. The only known drawback is that it may be difficult to restore the original document: You may forget what the input encoding was, and in what places character references had been used. I personally don't consider this as a drawback: In most applications, it is good thing if the application "normalizes" the XML documents, since that reduces the hassles in later processing stages. In the early days after Python 2, there was a desire to make Unicode optional in PyXML, by propagating a "wants_unicode" flag throughout the processing chain. Event though this is still supported in a number of places, I doubt it is used much. Regards, Martin From mal@lemburg.com Fri Nov 9 09:12:06 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 09 Nov 2001 10:12:06 +0100 Subject: [XML-SIG] Strings or Unicode ? References: <3BEA988B.BDEFE28B@lemburg.com> <200111081604.fA8G4xF01337@mira.informatik.hu-berlin.de> <3BEAB05B.28272CD4@lemburg.com> <200111081833.fA8IXoQ02126@mira.informatik.hu-berlin.de> Message-ID: <3BEB9DE6.357533A9@lemburg.com> "Martin v. Loewis" wrote: > > > The question still remains, though: is this an acceptable approach > > in practice ? (Converting Unicode back to strings has its cost and > > it might be worthwhile having the 8-bit string approach available > > too.) 
> > That depends on the processing you want to do after parsing. In > general, I'd argue that not having to deal with encodings, but being > able to rely that everything is Unicode simplifies the application. > The only known drawback is that it may be difficult to restore the > original document: You may forget what the input encoding was, > and in what places character references had been used. True, even though I believe that you can tune the tools to "remember" this information. For most applications which work on computer generated and computer read data, this probably isn't worth it, but could be worth it for human edited XML files. After having looked into handling the various problems you run into when expanding entities, I found that the all-Unicode approach fits best: you simply don't have the problem of character entities not expanding due to constraints on the range of supported character points. That certainly makes life easier. One thing I'd be curious to know is whether it's common practice to restrict XML tag names and attribute names to Latin-1 or even ASCII... ? I've used the approach of using Latin-1 8-bit strings if possible and reverting to Unicode objects for cases where this doesn't work. I've never seen an XML file with non-ASCII tag names, so I suppose the Unicode case is artificial. FWIW, I've changed my parser to now use Unicode exclusively and to my surprise things got faster :-) > I personally don't consider this as a drawback: In most applications, > it is good thing if the application "normalizes" the XML documents, > since that reduces the hassles in later processing stages. > > In the early days after Python 2, there was a desire to make Unicode > optional in PyXML, by propagating a "wants_unicode" flag throughout > the processing chain. Event though this is still supported in a number > of places, I doubt it is used much. 
Thanks for the suggestions, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Fri Nov 9 09:21:47 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 09 Nov 2001 10:21:47 +0100 Subject: [XML-SIG] Writing SAX-drivers Message-ID: <3BEBA02B.1CBDE961@lemburg.com> I haven't been able to find any documentation on writing SAX-driver for PyXML -- is there any ? (I've looked at the code in PyXML and the PyXML docs, but the code is undocumented and the docs only mention *using* SAX interface compatible parsers.) Or does PyXML simply use a well-defined standard for drivers which I can lookup on some web-site ? Thanks for any pointers, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From Alexandre.Fayolle@logilab.fr Fri Nov 9 10:09:32 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 9 Nov 2001 11:09:32 +0100 (CET) Subject: [XML-SIG] Writing SAX-drivers In-Reply-To: <3BEBA02B.1CBDE961@lemburg.com> Message-ID: On Fri, 9 Nov 2001, M.-A. Lemburg wrote: > I haven't been able to find any documentation on writing SAX-driver > for PyXML -- is there any ? (I've looked at the code in PyXML and > the PyXML docs, but the code is undocumented and the docs only > mention *using* SAX interface compatible parsers.) > > Or does PyXML simply use a well-defined standard for drivers > which I can lookup on some web-site ? 
> > Thanks for any pointers, For very simple parsers (parsing non-XML data, but this is not the point), you may want to have a look at pypasax (a SAX parser for Python code, http://www.logilab.org/pypasax) and vcalsax (a SAX parser for VCAL files, http://www.logilab.org/vcalsax). At the very least the parser should inherit from the xml.sax.saxlib.XMLReader class (actually defined in xml.sax.xmlreader). It should provide an implementation of the parse(source) method. Other methods of note are the set/getProperty methods. The parse method should call the self._cont_handler.setDocumentLocator() method before doing anything else (or omit the call altogether if it doesn't want to or cannot provide a locator object). Then during the parse, it can call the callbacks of the various registered handlers, most notably startDocument(), endDocument(), startElementNS(), endElementNS(), characters(). For comprehensive documentation on the order of calls, you may want to have a look at the javadoc documentation of SAX2 available at http://sax.sourceforge.net/apidoc/org/xml/sax/XMLReader.html; you'll need to double-check for pythonisms in the xml.sax.saxlib module (which provides nice default implementations for a number of interfaces required by XMLReader, such as Attributes and InputSource). Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From mal@lemburg.com Fri Nov 9 10:32:19 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 09 Nov 2001 11:32:19 +0100 Subject: [XML-SIG] Writing SAX-drivers References: Message-ID: <3BEBB0B3.1971CB09@lemburg.com> Alexandre Fayolle wrote: > > On Fri, 9 Nov 2001, M.-A. Lemburg wrote: > > > I haven't been able to find any documentation on writing SAX-driver > > for PyXML -- is there any ? 
(I've looked at the code in PyXML and > > the PyXML docs, but the code is undocumented and the docs only > > mention *using* SAX interface compatible parsers.) > > > > Or does PyXML simply use a well-defined standard for drivers > > which I can lookup on some web-site ? > > > > Thanks for any pointers, > > For very simple parsers (parsing non XML data, but this is not the point, > you may want to give a look at pypasax (SAX parser for python code > http://www.logilab.org/pypasax) and vcalsax (SAX parser for VCAL files > http://www.logilab.org/vcalsax)) > > At the very least the parser should inherit from xml.sax.saxlib.XMLReader > class (actually defined in xml.sax.xmlreader). It should provide > implementation for the parse(source) method. Other methods of note are the > set/getProperty methods. > > The parse method should call the self._cont_handler.setLocator() method > before doing anything else (or omit the call altogether if it doesn't want > to or cannot provide a locator object). Then during the parse, it can call > the callbakcs of the various registered handlers, most notably > startDocument(), endDocument(), startElementNS(), endElementNS(), > characters(). > > For comprehensive documentation on the order of calls, you may want to > give a look at the javadoc documentation of SAX2 available at > http://sax.sourceforge.net/apidoc/org/xml/sax/XMLReader.html, you'll need > to doublecheck for pythonisms in the xml.sax.saxlib module (which provides > nice default implementatoin for a number of interfaces required by > XMLReader, such as Attributes and InputSource. Thank you very much for all the details ! I appreciate your help. Now I will just have to wrap my head around this ... 
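As a first stab at what such a driver could look like, here is a minimal sketch written against the xml.sax package of present-day Python 3 rather than PyXML. LineReader, EventLog and the line-oriented input format are invented for illustration, parse() accepts only a file-like object, no locator is provided, and the non-namespace startElement()/endElement() callbacks are used for brevity:

```python
import io
from xml.sax import handler, xmlreader


class LineReader(xmlreader.XMLReader):
    """Hypothetical driver: reports each non-empty input line as a <line> element."""

    def parse(self, source):
        # Simplification: source must be a file-like object yielding text lines.
        ch = self._cont_handler            # set up by XMLReader.__init__
        no_attrs = xmlreader.AttributesImpl({})
        ch.startDocument()
        ch.startElement("lines", no_attrs)
        for raw in source:
            text = raw.strip()
            if text:
                ch.startElement("line", no_attrs)
                ch.characters(text)
                ch.endElement("line")
        ch.endElement("lines")
        ch.endDocument()


class EventLog(handler.ContentHandler):
    """Records the callbacks it receives, to check the order of calls."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


reader = LineReader()
log = EventLog()
reader.setContentHandler(log)
reader.parse(io.StringIO("first\n\nsecond\n"))
```

After the parse, log.events holds the start, text and end events in document order, which is an easy way to compare a new driver against the expected order of calls.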
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From Alexandre.Fayolle@logilab.fr Fri Nov 9 10:36:36 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 9 Nov 2001 11:36:36 +0100 (CET) Subject: [XML-SIG] Writing SAX-drivers In-Reply-To: <3BEBB0B3.1971CB09@lemburg.com> Message-ID: On Fri, 9 Nov 2001, M.-A. Lemburg wrote: > Thank you very much for all the details ! I appreciate your > help. > > Now I will just have to wrap my head around this ... Feel free to ask if you encounter problems. A quick way of testing your parser is:

from xml.dom.ext.reader import Sax2

def load_to_dom(filename, ownerDoc=None):
    # MySax2Parser stands for your own SAX2 driver class
    parser = MySax2Parser()
    reader = Sax2.Reader(0, 0, None, Sax2.XmlDomGenerator, parser)
    stream = open(filename, 'r')
    d = reader.fromStream(stream, ownerDoc)
    return d

You can then PrettyPrint the result to see if it looks as expected. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Juergen Hermann" Message-ID: On Fri, 09 Nov 2001 10:12:06 +0100, M.-A. Lemburg wrote: >One thing I'd be curious to know is whether it's common practice >to restrict XML tag names and attribute names to Latin-1 or even >ASCII... ? I've used the approach of using Latin-1 8-bit strings >if possible and reverting to Unicode objects for cases where this >doesn't work. I've never seen an XML file with non-ASCII tag >names, so I suppose the Unicode case is artificial. Never seen a Chinese (or other Asian) XML document? For them, it's mother tongue, not artificial. ;)) Essentially, the only western-comprehensible remains are the xml decl and the <>="' markup chars.
Ciao, J=FCrgen -- J=FCrgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From Juergen Hermann" Message-ID: On Fri, 09 Nov 2001 10:21:47 +0100, M.-A. Lemburg wrote: >I haven't been able to find any documentation on writing SAX-driver >for PyXML -- is there any ? (I've looked at the code in PyXML and >the PyXML docs, but the code is undocumented and the docs only >mention *using* SAX interface compatible parsers.) If you plan to write an extension SAX driver (and we know you like to write extensions ;), you can look at my pirxx.sf.net project. For pure Python, as Alexandre already mentioned, there are "interface definition" classes in XMLReader.py. From faassen@vet.uu.nl Fri Nov 9 16:49:06 2001 From: faassen@vet.uu.nl (Martijn Faassen) Date: Fri, 9 Nov 2001 17:49:06 +0100 Subject: [XML-SIG] where is test_support.py? Message-ID: <20011109174906.A17107@vet.uu.nl> Hi there, I've been exploring pyxml's current testsuite in order to evaluate where best to add the domapi tests from ParsedXML. The first thing I thought I'd try is running the tests. Unfortunately many tests, including regrtest.py, want to import something called test_support, which I think is a module. However, I can't find any such module anywhere; not in xml/test and not in test either, and not even in the CVS attic. What's up? Regards, Martijn From martin@v.loewis.de Fri Nov 9 16:55:39 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Fri, 9 Nov 2001 17:55:39 +0100 Subject: [XML-SIG] Strings or Unicode ? 
In-Reply-To: <3BEB9DE6.357533A9@lemburg.com> (mal@lemburg.com) References: <3BEA988B.BDEFE28B@lemburg.com> <200111081604.fA8G4xF01337@mira.informatik.hu-berlin.de> <3BEAB05B.28272CD4@lemburg.com> <200111081833.fA8IXoQ02126@mira.informatik.hu-berlin.de> <3BEB9DE6.357533A9@lemburg.com> Message-ID: <200111091655.fA9Gtdr01293@mira.informatik.hu-berlin.de> > One thing I'd be curious to know is whether it's common practice > to restrict XML tag names and attribute names to Latin-1 or even > ASCII... ? This is indeed common. xmlproc in PyXML 0.6 restricts those names to latin-1, and I believe some of the DOM implementations may also react "funny" when confronted with non-ASCII names. OTOH, pyexpat converts everything to Unicode. > I've used the approach of using Latin-1 8-bit strings if possible > and reverting to Unicode objects for cases where this doesn't > work. I've never seen an XML file with non-ASCII tag names, so I > suppose the Unicode case is artificial. It probably is. Following the Python convention, I'd suggest to use byte strings only in the ASCII case, and convert non-ASCII Latin-1 to Unicode. It will be simpler that way *if* you have Latin-1 element names, since ASCII autoconverts, whereas full Latin-1 doesn't. Regards, Martin From martin@v.loewis.de Fri Nov 9 17:00:56 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Fri, 9 Nov 2001 18:00:56 +0100 Subject: [XML-SIG] Writing SAX-drivers In-Reply-To: <3BEBA02B.1CBDE961@lemburg.com> (mal@lemburg.com) References: <3BEBA02B.1CBDE961@lemburg.com> Message-ID: <200111091700.fA9H0up01336@mira.informatik.hu-berlin.de> > I haven't been able to find any documentation on writing SAX-driver > for PyXML -- is there any ? (I've looked at the code in PyXML and > the PyXML docs, but the code is undocumented and the docs only > mention *using* SAX interface compatible parsers.) 
Parsers in SAX are called "readers", and they implement the XMLReader interface, see http://www.python.org/doc/current/lib/module-xml.sax.xmlreader.html Optionally, they also implement the IncrementalReader interface, documented at the same location. I'm not sure what you mean when you say "docs only mention using SAX interface compatible parsers". The docs obviously cannot state how you implement your interface; they only state what the interface is that you implement. > Or does PyXML simply use a well-defined standard for drivers > which I can lookup on some web-site ? Python SAX closely follows Java SAX, although the Incremental interface is a Python invention. Regards, Martin From martin@v.loewis.de Fri Nov 9 17:20:15 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Fri, 9 Nov 2001 18:20:15 +0100 Subject: [XML-SIG] Strings or Unicode ? In-Reply-To: (jh@web.de) References: Message-ID: <200111091720.fA9HKFd01390@mira.informatik.hu-berlin.de> > Never seen a chinese (or other asian) XML document? For them, it's > mother tongue, not artificial. ;)) Right, I withdraw my earlier claim to the contrary. Please have a look at /test/oasis/japanese/weekly-utf-8.xml (*) in the PyXML CVS for an example. Regards, Martin (*) http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/pyxml/test/oasis/japanese/weekly-utf-8.xml?rev=1.1.1.1&content-type=text/xml From martin@v.loewis.de Fri Nov 9 17:36:21 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Fri, 9 Nov 2001 18:36:21 +0100 Subject: [XML-SIG] where is test_support.py? In-Reply-To: <20011109174906.A17107@vet.uu.nl> (message from Martijn Faassen on Fri, 9 Nov 2001 17:49:06 +0100) References: <20011109174906.A17107@vet.uu.nl> Message-ID: <200111091736.fA9HaLd01472@mira.informatik.hu-berlin.de> > The first thing I thought I'd try is running the tests. Unfortunately > many tests, including regrtest.py, want to import something called > test_support, which I think is a module. 
However, I can't find any > such module anywhere; not in xml/test and not in test either, and not > even in the CVS attic. What's up? If you run PyXML's test/regrtest.py, it will import test_support from test. If you have Python 1.5.1 or later, there should be a module test.test_support available. Regards, Martin From Juergen Hermann" Message-ID: On Fri, 9 Nov 2001 17:55:39 +0100, Martin v. Loewis wrote: >It probably is. Following the Python convention, I'd suggest to use >byte strings only in the ASCII case, and convert non-ASCII Latin-1 to >Unicode. It will be simpler that way *if* you have Latin-1 element >names, since ASCII autoconverts, whereas full Latin-1 doesn't. I do this: PirxxObject PirxxBuildStringOrUnicode(const XMLCh* xmlstr) { PirxxObject result = PirxxBuildUnicode(xmlstr); if (result) { PyObject* latin1 = PyUnicode_AsLatin1String(result); if (latin1) { result = latin1; Py_DECREF(latin1); } else { PyErr_Clear(); } } return result; } In a nutshell this means "return unicode unless it's convertible to latin-1, then return a latin1-encoded bytestring". I wonder if that is exactly what you said above, or only very similar. Ciao, Jürgen From martin@v.loewis.de Fri Nov 9 20:05:18 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Fri, 9 Nov 2001 21:05:18 +0100 Subject: [XML-SIG] Strings or Unicode ? In-Reply-To: (jh@web.de) References: Message-ID: <200111092005.fA9K5IA07452@mira.informatik.hu-berlin.de> > In a nutshell this means "return unicode unless it's convertible to > latin-1, then return a latin1-encoded bytestring". I wonder if that is > exactly what you said above, or only very similar. No. If it is a Latin-1 string containing accented characters, it should be converted to Unicode. Otherwise, suppose somebody does result = u"<"+elem.nodeName it will give a UnicodeError. OTOH, if you only create byte strings if the string is ASCII, combining it with either byte strings or Unicode strings will succeed.
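The rule described above (byte strings only for pure ASCII, Unicode for everything else) can be sketched roughly as follows. This is only an illustration written for present-day Python, where byte strings and text never combine implicitly, so the "ASCII autoconverts" part of the argument applies to the Python 2 interpreters discussed in this thread. The function name build_name is hypothetical and is not part of PyXML or of the C++ code quoted earlier.

```python
def build_name(text):
    """Return an ASCII byte string when possible, otherwise the text itself.

    Hypothetical helper: it mirrors the suggested policy, not any PyXML API.
    """
    try:
        # Pure-ASCII names are safe to hand out as byte strings.
        return text.encode("ascii")
    except UnicodeEncodeError:
        # Anything outside ASCII (including accented Latin-1) stays Unicode.
        return text


plain = build_name("order")        # pure ASCII name
accented = build_name("caf\u00e9")  # Latin-1 name with an accented character
```

The point of trying "ascii" rather than "latin-1" as the codec is that an accented name then never ends up as a byte string, which is the case that triggered the UnicodeError mentioned above.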
Regards, Martin From faassen@vet.uu.nl Fri Nov 9 20:09:28 2001 From: faassen@vet.uu.nl (Martijn Faassen) Date: Fri, 9 Nov 2001 21:09:28 +0100 Subject: [XML-SIG] where is test_support.py? In-Reply-To: <200111091736.fA9HaLd01472@mira.informatik.hu-berlin.de> References: <20011109174906.A17107@vet.uu.nl> <200111091736.fA9HaLd01472@mira.informatik.hu-berlin.de> Message-ID: <20011109210928.A17707@vet.uu.nl> Martin v. Loewis wrote: > > The first thing I thought I'd try is running the tests. Unfortunately > > many tests, including regrtest.py, want to import something called > > test_support, which I think is a module. However, I can't find any > > such module anywhere; not in xml/test and not in test either, and not > > even in the CVS attic. What's up? > > If you run PyXML's test/regrtest.py, it will import test_support from > test. Yes, I noticed, but the module can't be found. > If you have Python 1.5.1 or later, there should be a module > test.test_support available. Am I to understand this is part of the Python standard library (looked, but couldn't find it), or is this part of Python's testing framework? I don't think I have the Python tests installed on my debian box; I'll take a look to see if that helps. [back later] Ah, it does. :) Onwards! Regards, Martijn From fdrake@acm.org Fri Nov 9 21:24:20 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 9 Nov 2001 16:24:20 -0500 Subject: [XML-SIG] where is test_support.py? In-Reply-To: <20011109210928.A17707@vet.uu.nl> References: <20011109174906.A17107@vet.uu.nl> <200111091736.fA9HaLd01472@mira.informatik.hu-berlin.de> <20011109210928.A17707@vet.uu.nl> Message-ID: <15340.18820.168499.661016@grendel.zope.com> Martijn Faassen writes: > Am I to understand this is part of the Python standard library (looked, but > couldn't find it), or is this part of Python's testing framework? > I don't think I have the Python tests installed on my debian box; I'll > take a look to see if that helps. 
> > [back later] > > Ah, it does. :) Onwards! Martijn, Does Debian not install the tests by default? That should be filed against Debian as a bug report -- Python's regression test is part of its standard library, and should always be installed. Even the most miserly should get test/regrtest.py and test/test_support.py installed even if the regression tests themselves are dropped; those are the framework components. -Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation From faassen@vet.uu.nl Fri Nov 9 22:50:57 2001 From: faassen@vet.uu.nl (Martijn Faassen) Date: Fri, 9 Nov 2001 23:50:57 +0100 Subject: [XML-SIG] where is test_support.py? In-Reply-To: <15340.18820.168499.661016@grendel.zope.com> References: <20011109174906.A17107@vet.uu.nl> <200111091736.fA9HaLd01472@mira.informatik.hu-berlin.de> <20011109210928.A17707@vet.uu.nl> <15340.18820.168499.661016@grendel.zope.com> Message-ID: <20011109235057.A18223@vet.uu.nl> Fred L. Drake, Jr. wrote: > > Martijn Faassen writes: > > Am I to understand this is part of the Python standard library (looked, but > > couldn't find it), or is this part of Python's testing framework? > > I don't think I have the Python tests installed on my debian box; I'll > > take a look to see if that helps. > > > > [back later] > > > > Ah, it does. :) Onwards! > > Does Debian not install the tests by default? No, at least not since the recent changes in the Python packaging in unstable. It says this in the description: Regression test for the Python (v2.1) distribution. You only want to install this if you don't trust the Python (v2.1) packages. > That should be filed > against Debian as a bug report -- Python's regression test is part of > its standard library, and should always be installed. Even the most > miserly should get test/regrtest.py and test/test_support.py > installed even if the regression tests themselves are dropped; those > are the framework components. I'll pass this on to debian as coming from you.
:) Regards, Martijn From melissarobins@sleuthplanet.com Mon Nov 12 11:40:23 2001 From: melissarobins@sleuthplanet.com (Melissa at SleuthPlanet) Date: Mon, 12 Nov 2001 06:40:23 -0500 Subject: [XML-SIG] demo request Message-ID: <20011112112331.FAUC24249.tomts11-srv.bellnexxia.net@b1nasz29> Dear xml-sig, I found your email address on the Internet and decided to take a chance and contact you. My company has recently released cutting edge Contact Marketing Software with 80 000 contact records already included! Fully Exportable Companies and Contacts, Fax Numbers, Email Addresses, Phone Numbers, and 50 other categories are included in our Industry-leading Sleuth Marketing System. Reply (email or call) and request our FREE DEMO and reserve our special half price offer good for this week only. Please let me know if you would like to be removed from my address book, and we promise to discontinue any communications. Thank you for your time. Warmest Regards, Melissa Robins Sleuth Marketing Systems (Genis Software) (416) 686-1444 melissarobins@sleuthplanet.com From noreply@sourceforge.net Mon Nov 12 18:00:56 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 12 Nov 2001 10:00:56 -0800 Subject: [XML-SIG] [ pyxml-Bugs-480982 ] sgmlop HTML script bug Message-ID: Bugs item #480982, was opened at 2001-11-12 10:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=480982&group_id=6473 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: sgmlop HTML script bug Initial Comment: With the following input, sgmlop reports the '<' in v < 4 as a start tag with an empty tag name and a lot of attributes, the last attribute having the name and value " and
From mike@skew.org Tue Nov 20 18:11:54 2001 From: mike@skew.org (Mike Brown) Date: Tue, 20 Nov 2001 11:11:54 -0700 (MST) Subject: [4suite] Re: [XML-SIG] empty namespaces In-Reply-To: "from Alexandre Fayolle at Nov 20, 2001 09:25:22 am" Message-ID: <200111201811.fAKIBsN89028@skew.org> Alexandre Fayolle wrote: > > I don't think so. We're not going to change this, so I don't really > > see what we achieve by it. > > Source code readability. In the code I write, I tend to use symbolic > constants for the namespaces I use. Having EMPTY_NAMESPACE as an alias for > None makes the code more readable. A stronger argument might be that "None" is ubiquitous and generic, whereas a constant has specific semantics. I can appreciate wanting to distinguish between an accidental None and a deliberate EMPTY_NAMESPACE. - Mike ____________________________________________________________________________ mike j. brown, fourthought.com | xml/xslt: http://skew.org/xml/ denver/boulder, colorado, usa | personal: http://hyperreal.org/~mike/ From martin@v.loewis.de Tue Nov 20 18:29:09 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Tue, 20 Nov 2001 19:29:09 +0100 Subject: [XML-SIG] Preparing for PyXML 0.7 References: <200111201432.fAKEWe003818@paros.informatik.hu-berlin.de> Message-ID: <200111201829.fAKIT9l01325@mira.informatik.hu-berlin.de> As some may have noticed, I'm in the process of putting together PyXML 0.7. For the moment, I'd like contributors to review README/ANNOUNCE for things that are missing (focusing on new features; bug fixes won't be listed individually unless somebody volunteers to determine them all). Further, if you have a list of things that you want to do before 0.7, don't hesitate to share them with us.
Regards, Martin From Alexandre.Fayolle@logilab.fr Wed Nov 21 09:41:31 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 10:41:31 +0100 (CET) Subject: [XML-SIG] Preparing for PyXML 0.7 In-Reply-To: <200111201829.fAKIT9l01325@mira.informatik.hu-berlin.de> Message-ID: I'd be in favor of including xml.xslt in 0.7, maybe mentioning it is in a Beta state, in order to get more feedback. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Wed Nov 21 09:44:40 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 10:44:40 +0100 (CET) Subject: [XML-SIG] xmlproc and l18n Message-ID: I may be very wrong, but I seem to remember translating messages for xmlproc, and I cannot see them in the CVS repository. Am I misremembering, or have the related changes not been commited in PyXML? Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Wed Nov 21 09:58:52 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 10:58:52 +0100 (CET) Subject: [XML-SIG] DOM and empty namespaces Message-ID: I would like to add an assertion in the NS methods of 4DOM, to make sure that they are not called with empty strings as the namespaceURI. An alternate way would be to silently change the empty string to None, but I'm not too keen on that. Maybe a Warning could be used on python 2.1. Opinions? Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). 
From larsga@garshol.priv.no Wed Nov 21 10:27:05 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 21 Nov 2001 11:27:05 +0100 Subject: [XML-SIG] xmlproc and l18n In-Reply-To: References: Message-ID: * Alexandre Fayolle | | I may be very wrong, but I seem to remember translating messages for | xmlproc, and I cannot see them in the CVS repository. You did translate the error messages, and they are in xml/parsers/xmlproc/errors.py, the current revision of which is 1.14. Your French translation was added in revision 1.8 back in March. --Lars M. From Alexandre.Fayolle@logilab.fr Wed Nov 21 10:32:48 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 11:32:48 +0100 (CET) Subject: [XML-SIG] xmlproc and l18n In-Reply-To: Message-ID: On 21 Nov 2001, Lars Marius Garshol wrote: > xml/parsers/xmlproc/errors.py, the current revision of which is 1.14. > Your French translation was added in revision 1.8 back in March. Silly me. I was looking for pygettext .po and .mo files... Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Wed Nov 21 10:36:52 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 11:36:52 +0100 (CET) Subject: [XML-SIG] 4DOM and i18n Message-ID: In the CVS repository, the 4DOM l18n files are * de.po * en_US.po * fr_FR.po together with matching directories for mo files. I think this means that someone having his LANG environment variable set to fr_CH or en_AU will not be seeing the localized messages, and that en_US and fr_FR should be renamed to en and fr respectively. Could someone more familiar than I in pygettext confirm this and give me a green light for renaming the files and directories? Alexandre Fayolle -- LOGILAB, Paris (France). 
http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From larsga@garshol.priv.no Wed Nov 21 10:40:01 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 21 Nov 2001 11:40:01 +0100 Subject: [XML-SIG] xmlproc and l18n In-Reply-To: References: Message-ID: * Lars Marius Garshol | | xml/parsers/xmlproc/errors.py, the current revision of which is | 1.14. Your French translation was added in revision 1.8 back in | March. * Alexandre Fayolle | | Silly me. I was looking for pygettext .po and .mo files... Using pygettext would probably be a good idea, but it's not what I did. That should probably be changed some time. (Oh, BTW. This is l10n, not i18n. :-) --Lars M. From Alexandre.Fayolle@logilab.fr Wed Nov 21 10:44:37 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 11:44:37 +0100 (CET) Subject: [XML-SIG] xmlproc and l18n^H^H0n In-Reply-To: Message-ID: On 21 Nov 2001, Lars Marius Garshol wrote: > (Oh, BTW. This is l10n, not i18n. :-) Woops ;o) Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Wed Nov 21 12:49:56 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 13:49:56 +0100 (CET) Subject: [XML-SIG] sax Attributes interface Message-ID: The Attributes interface is defined in xml.sax.saxlib. The AttributesImpl implementation implicitely implements this interface, and adds a new method, getNameByQName(self, name) 1. shouldn't the implementation be made explicit, or is there a performance hit? 2. should the getNameByQName method be added to the Attributes interface, or is there a backward compatibility concern with other implementations of Attributes? Alexandre Fayolle -- LOGILAB, Paris (France). 
http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Wed Nov 21 13:40:14 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 14:40:14 +0100 (CET) Subject: [XML-SIG] 4DOM and namespaces Message-ID: While we are at it, a very common newbie question with 4DOM is "I've parsed my document using Sax2.Reader().fromString(str), and I can't get the value of the bar attribute using doc.documentElement.getAttribute('bar') , though I can see it when I PrettyPrint(doc)". The answer is "use getAttributeNS(EMPTY_NAMESPACE, 'bar')", but maybe we could enhance 4DOM to be non-NS-methods friendly. AFAIK, minidom offers such syntaxic sugar. If people here think this would be a Good Thing (or at least something nice), I'll be glad to implement it. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From uche.ogbuji@fourthought.com Wed Nov 21 14:11:03 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 21 Nov 2001 07:11:03 -0700 Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: Message from Alexandre Fayolle of "Wed, 21 Nov 2001 10:58:52 +0100." Message-ID: <200111211411.fALEB3T20960@localhost.localdomain> > I would like to add an assertion in the NS methods of 4DOM, to make sure > that they are not called with empty strings as the namespaceURI. I think it's a good idea, given all the confusion over this issue. > An alternate way would be to silently change the empty string to None, but > I'm not too keen on that. Maybe a Warning could be used on python 2.1. Better to fail noisily. I'd go even stronger than warning. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji@fourthought.com Wed Nov 21 14:25:07 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 21 Nov 2001 07:25:07 -0700 Subject: [XML-SIG] 4DOM and namespaces In-Reply-To: Message from Alexandre Fayolle of "Wed, 21 Nov 2001 14:40:14 +0100." Message-ID: <200111211425.fALEP7w21006@localhost.localdomain> > While we are at it, a very common newbie question with 4DOM is "I've > parsed my document using Sax2.Reader().fromString(str), and I can't get > the value of the bar attribute using > doc.documentElement.getAttribute('bar') , though I can see it when I > PrettyPrint(doc)". > > The answer is "use getAttributeNS(EMPTY_NAMESPACE, 'bar')", but maybe we > could enhance 4DOM to be non-NS-methods friendly. AFAIK, minidom offers > such syntaxic sugar. > > If people here think this would be a Good Thing (or at least something > nice), I'll be glad to implement it. I'd actually decided to do this a few months ago, but never had the time. I must say I'm not crazy about the idea, but I've caved in as I learn how much confusion this causes newbies. I think we really need to come up with a better alt to DOM (this is another topic that frequently comes up). Sort of a Pythonic JDOM. As Guido rightly says "DOM sucks". -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From rsalz@zolera.com Wed Nov 21 14:57:21 2001 From: rsalz@zolera.com (Rich Salz) Date: Wed, 21 Nov 2001 09:57:21 -0500 Subject: [XML-SIG] DOM and empty namespaces References: <200111211411.fALEB3T20960@localhost.localdomain> Message-ID: <3BFBC0D1.1D7719BF@zolera.com> So that means if I have code that wants to work with pyxml 0.6 and 0.7 I need to do this: try: from xml.dom import EMPTY_NAMESPACE except: EMPTY_NAMESPACE='' and make sure all my get...NS() calls use EMPTY_NAMESPACE Right? We should probably document this. /r$ -- Zolera Systems, Your Key to Online Integrity Securing Web services: XML, SOAP, Dig-sig, Encryption http://www.zolera.com From larsga@garshol.priv.no Wed Nov 21 15:13:27 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 21 Nov 2001 16:13:27 +0100 Subject: [XML-SIG] 4DOM and namespaces In-Reply-To: <200111211425.fALEP7w21006@localhost.localdomain> References: <200111211425.fALEP7w21006@localhost.localdomain> Message-ID: * Uche Ogbuji | | I think we really need to come up with a better alt to DOM (this is | another topic that frequently comes up). Sort of a Pythonic JDOM. Yes! Yes, yes, YES! And the sooner, the better. The reason I don't participate much in DOM discussions is that I think it's a waste of time and that we should replace the whole thing with something better. Pyxie is a good place to look for inspiration. --Lars M. 
From Alexandre.Fayolle@logilab.fr Wed Nov 21 16:04:52 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 17:04:52 +0100 (CET) Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: <3BFBC0D1.1D7719BF@zolera.com> Message-ID: On Wed, 21 Nov 2001, Rich Salz wrote: > So that means if I have code that wants to work with pyxml 0.6 and 0.7 I > need to do this: > try: > from xml.dom import EMPTY_NAMESPACE > except: > EMPTY_NAMESPACE='' > > and make sure all my get...NS() calls use EMPTY_NAMESPACE > > Right? Almost. If the DOM you're using was created with validation enabled on PyXML-0.6.6, empty namespace is None... > We should probably document this. Hmm. Probably ;o) Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From walter@livinglogic.de Wed Nov 21 16:13:48 2001 From: walter@livinglogic.de (Walter Dörwald) Date: Wed, 21 Nov 2001 17:13:48 +0100 Subject: [XML-SIG] 4DOM and namespaces References: <200111211425.fALEP7w21006@localhost.localdomain> Message-ID: <3BFBD2BC.30607@livinglogic.de> Lars Marius Garshol wrote: > * Uche Ogbuji > | > | I think we really need to come up with a better alt to DOM (this is > | another topic that frequently comes up). Sort of a Pythonic JDOM. > > Yes! Yes, yes, YES! And the sooner, the better. The reason I don't > participate much in DOM discussions is that I think it's a waste of > time and that we should replace the whole thing with something better. > > Pyxie is a good place to look for inspiration. Even better: Python is a good place for inspiration, i.e. document fragments should behave like Python sequences (__getitem__, __setitem__, __delitem__, append, insert, __len__) etc. and attribute maps should behave like Python mappings (__getitem__, __setitem__, get, has_key, keys, items, values, etc.). Elements should behave like both!
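To make the "sequence and mapping at once" idea concrete, here is a rough sketch of what such an element could look like. Nothing here is an existing PyXML, 4DOM or Pyxie API: the class name Element, its constructor arguments, and the choice to dispatch on the key type (string keys address attributes, integer keys address children) are all assumptions made for illustration.

```python
class Element:
    """Hypothetical node: children via the sequence protocol,
    attributes via the mapping protocol."""

    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = list(children or [])

    def __len__(self):
        # Length counts children only, as for a document fragment.
        return len(self.children)

    def __getitem__(self, key):
        # String keys look up attributes; anything else indexes children.
        if isinstance(key, str):
            return self.attrs[key]
        return self.children[key]

    def __setitem__(self, key, value):
        if isinstance(key, str):
            self.attrs[key] = value
        else:
            self.children[key] = value

    def __delitem__(self, key):
        if isinstance(key, str):
            del self.attrs[key]
        else:
            del self.children[key]

    # Sequence-style helpers for children.
    def append(self, child):
        self.children.append(child)

    def insert(self, index, child):
        self.children.insert(index, child)

    # Mapping-style helpers for attributes.
    def get(self, key, default=None):
        return self.attrs.get(key, default)

    def keys(self):
        return list(self.attrs)


order = Element("order", {"id": "42"})
order.append(Element("item", {"sku": "A1"}))
order.append("some text")
```

With this shape, order["id"] reads an attribute and order[0] reads the first child, so no getAttribute or childNodes calls are needed. Dispatching on the key type is only one possible design; keeping attributes on a separate mapping object would avoid the ambiguity at the cost of one more attribute access.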
Bye, Walter Dörwald From Alexandre.Fayolle@logilab.fr Wed Nov 21 16:08:42 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 17:08:42 +0100 (CET) Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: <200111211411.fALEB3T20960@localhost.localdomain> Message-ID: On Wed, 21 Nov 2001, Uche Ogbuji wrote: > Better to fail noisily. I'd go even stronger than warning. raise DOMException(NAMESPACE_ERROR,"Empty namespace is None, not ''") ? Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Wed Nov 21 16:21:15 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 17:21:15 +0100 (CET) Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: Message-ID: On Wed, 21 Nov 2001, Alexandre Fayolle wrote: > On Wed, 21 Nov 2001, Uche Ogbuji wrote: > > > Better to fail noisily. I'd go even stronger than warning. > > raise DOMException(NAMESPACE_ERROR,"Empty namespace is None, not ''") ? Better is to raise NamespaceErr(msg) Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From pobrien@orbtech.com Wed Nov 21 17:06:14 2001 From: pobrien@orbtech.com (Patrick K. O'Brien) Date: Wed, 21 Nov 2001 11:06:14 -0600 Subject: [XML-SIG] 4DOM and namespaces In-Reply-To: <200111211425.fALEP7w21006@localhost.localdomain> Message-ID: I think that's a great idea. There is a decent article about JDOM here (http://www.devx.com/upload/free/Features/Javapro/2001/01dec01/jt0112/jt0112-1.asp). P.S. Not that I think you would need to read it. But others, like myself, who aren't as familiar with JDOM might appreciate it. --- Patrick K. O'Brien Orbtech "I am, therefore I think."
-----Original Message----- From: xml-sig-admin@python.org [mailto:xml-sig-admin@python.org]On Behalf Of Uche Ogbuji Sent: Wednesday, November 21, 2001 8:25 AM To: Alexandre Fayolle Cc: xml-sig@python.org Subject: Re: [XML-SIG] 4DOM and namespaces I think we really need to come up with a better alt to DOM (this is another topic that frequently comes up). Sort of a Pythonic JDOM. As Guido rightly says "DOM sucks". -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management _______________________________________________ XML-SIG maillist - XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig From uye-istanbul@mmo.org.tr Wed Nov 21 17:15:46 2001 From: uye-istanbul@mmo.org.tr (uye-mmo) Date: Wed, 21 Nov 2001 19:15:46 +0200 Subject: [XML-SIG] Sanayi Kongresi'ne Davet Message-ID: TMMOB INDUSTRY CONGRESS 2001, 30 NOVEMBER - 1-2 DECEMBER, YILDIZ TECHNICAL UNIVERSITY AUDITORIUM - BEŞİKTAŞ/İSTANBUL http://www.mmo.org.tr/istanbul/semp/sanayi/ The TMMOB Industry Congress, held every two years since the 1960s, is being organized in Istanbul for the first time. The theme of the congress, which begins on 30 November, is "Globalization and Industrialization." PARTICIPATION IN THE CONGRESS IS FREE OF CHARGE. DETAILED INFORMATION ABOUT THE CONGRESS IS AVAILABLE AT THE FOLLOWING WEB ADDRESS. http://www.mmo.org.tr/istanbul/semp/sanayi/ For an independent, democratic and industrializing Turkey, we call on all segments of society to take part in the TMMOB Industry Congress 2001. From uche.ogbuji@fourthought.com Wed Nov 21 17:11:22 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 21 Nov 2001 10:11:22 -0700 Subject: [XML-SIG] DOM and empty namespaces References: Message-ID: <3BFBE03A.3F41E620@fourthought.com> Alexandre Fayolle wrote: > > On Wed, 21 Nov 2001, Uche Ogbuji wrote: > > > Better to fail noisily. I'd go even stronger than warning.
> > raise DOMException(NAMESPACE_ERROR,"Empty namespace is None, not ''") ? Fine with me. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Alexandre.Fayolle@logilab.fr Wed Nov 21 18:38:42 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 21 Nov 2001 19:38:42 +0100 (CET) Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: Message-ID: OK, I've just commited the changes to 4DOM. 1. raise NamespaceErr when '' is used as a nsURI 2. made non NS-aware methods delegate treatment to NS-aware methods when available. Do people want to see something similar to 1. in minidom? I can do that while I'm busy with this stuff, though this could have an impact on the main python distribution. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From martin@v.loewis.de Wed Nov 21 19:29:29 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Wed, 21 Nov 2001 20:29:29 +0100 Subject: [XML-SIG] Preparing for PyXML 0.7 In-Reply-To: (message from Alexandre Fayolle on Wed, 21 Nov 2001 10:41:31 +0100 (CET)) References: Message-ID: <200111211929.fALJTT701569@mira.informatik.hu-berlin.de> > I'd be in favor of including xml.xslt in 0.7, maybe mentioning it is > in a Beta state, in order to get more feedback. We can certainly do that; I assume you are talking about the current state of the CVS. I've tried to merge 4XSLT 0.11.1; I have a few changed files on my disk, but I can't get it to work - this is why I wanted to defer a release, until we can more easily incorporate the current 4Suite code. Regards, Martin From martin@v.loewis.de Wed Nov 21 22:00:34 2001 From: martin@v.loewis.de (Martin v. 
Loewis) Date: Wed, 21 Nov 2001 23:00:34 +0100 Subject: [XML-SIG] sax Attributes interface In-Reply-To: (message from Alexandre Fayolle on Wed, 21 Nov 2001 13:49:56 +0100 (CET)) References: Message-ID: <200111212200.fALM0Y902162@mira.informatik.hu-berlin.de> > The Attributes interface is defined in xml.sax.saxlib. > > The AttributesImpl implementation implicitely implements this interface, > and adds a new method, getNameByQName(self, name) > > 1. shouldn't the implementation be made explicit, or is there a > performance hit? I can't see any advantage in doing so. Also, unless I'm mistaken, Attributes is not part of Python proper, so adding this base class to PyXML would mean that the SAX implementations diverge further. Of course, that could be fixed by adding Attributes to Python proper, then adding the inheritance to both Python and PyXML. > 2. should the getNameByQName method be added to the Attributes interface, > or is there a backward compatibility concern with other implementations of > Attributes? Looking at 1.11 of of saxlib.py, I see def getNameByQName(self, name): """Returns the namespace name of the attribute with the given raw (or qualified) name.""" raise NotImplementedError("This method must be implemented!") What is it that you want to change? Regards, Martin From martin@v.loewis.de Wed Nov 21 22:08:42 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Wed, 21 Nov 2001 23:08:42 +0100 Subject: [XML-SIG] 4DOM and i18n In-Reply-To: (message from Alexandre Fayolle on Wed, 21 Nov 2001 11:36:52 +0100 (CET)) References: Message-ID: <200111212208.fALM8gO02166@mira.informatik.hu-berlin.de> > In the CVS repository, the 4DOM l18n files are > * de.po > * en_US.po > * fr_FR.po > together with matching directories for mo files. > > I think this means that someone having his LANG environment variable set > to fr_CH or en_AU will not be seeing the localized messages, and that > en_US and fr_FR should be renamed to en and fr respectively. 
> > Could someone more familiar than I in pygettext confirm this and give me a > green light for renaming the files and directories? That's correct. gettext will fall back from, say, de_DE.ISO8859-1 to de_DE, de.ISO8859-1, de, in this order. Regards, Martin From martin@v.loewis.de Wed Nov 21 22:11:00 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Wed, 21 Nov 2001 23:11:00 +0100 Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: (message from Alexandre Fayolle on Wed, 21 Nov 2001 10:58:52 +0100 (CET)) References: Message-ID: <200111212211.fALMB0S02169@mira.informatik.hu-berlin.de> > I would like to add an assertion in the NS methods of 4DOM, to make sure > that they are not called with empty strings as the namespaceURI. I think an empty string is valid as a namespaceURI. If so, it would be undesirable to reject it. Regards, Martin From martin@v.loewis.de Wed Nov 21 21:19:56 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Wed, 21 Nov 2001 22:19:56 +0100 Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: (message from Alexandre Fayolle on Wed, 21 Nov 2001 19:38:42 +0100 (CET)) References: Message-ID: <200111212119.fALLJu602004@mira.informatik.hu-berlin.de> > OK, I've just commited the changes to 4DOM. > > 1. raise NamespaceErr when '' is used as a nsURI I think that was a mistake. "" is a valid namespace, unless I'm mistaken. Eg. conforms to the namespace spec, AFAICT. > 2. made non NS-aware methods delegate treatment to NS-aware methods when > available. That's a good thing. > Do people want to see something similar to 1. in minidom? > I can do that while I'm busy with this stuff, though this could have an > impact on the main python distribution. Please don't. I think it is slightly incorrect, and I'm not sure such a change could be synchronized with Python 2.2. Regards, Martin From tpassin@home.com Thu Nov 22 00:41:03 2001 From: tpassin@home.com (Thomas B.
Passin) Date: Wed, 21 Nov 2001 19:41:03 -0500 Subject: [XML-SIG] DOM and empty namespaces References: <200111212119.fALLJu602004@mira.informatik.hu-berlin.de> Message-ID: <004a01c172ee$5dcb9310$7cac1218@cj64132b> [Martin v. Loewis] > > I think that was a mistake. "" is a valid namespace, unless I'm > mistaken. Eg. > > > > > > conforms to the namespace spec, AFAICT. > As I remember it, using xmlns:empty="" is said in the Rec to be a convention to remove a default namespace that was previously bound to some other URI. It does not mean that the resulting namespace is supposed to be an empty string. The empty string is not actually a valid namespace in itself, as best I understand it. Something can be in NO namespace, and that is achieved by the construction. I'm sure that applies to the default namespace (i.e., xmlns='') and less sure about whether it's allowed with a prefix as shown. Ah, here we are, from the namespace Rec: "[Definition:] If the attribute name matches PrefixedAttName, then the NCName gives the namespace prefix, used to associate element and attribute names with the namespace name in the attribute value in the scope of the element to which the declaration is attached. In such declarations, the namespace name may not be empty." So xmlns:empty="" is not in fact allowed. For default namespace declarations, the Rec says: "If the URI reference in a default namespace declaration is empty, then unprefixed elements in the scope of the declaration are not considered to be in any namespace. " and also, "The default namespace can be set to the empty string. This has the same effect, within the scope of the declaration, of there being no default namespace. " So once again, there is no case for an empty string as a valid namespace value. We wanted to use None to defuse this argument and not have two incompatible ways to represent a lack of namespace. Let's stick with EMPTY_NAMESPACE=None. 
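To illustrate the None convention, here is a minimal sketch using the standard-library minidom as it exists in current Python (an assumption; it is not the 4DOM code under discussion, and the URI urn:example:ns is made up). An element under xmlns="" should report no namespace rather than an empty string:

```python
import xml.dom
from xml.dom import minidom

# Outer element is in a (made-up) default namespace; the inner element
# undeclares it with xmlns="", so it is in no namespace at all.
doc = minidom.parseString('<a xmlns="urn:example:ns"><b xmlns=""/></a>')
outer = doc.documentElement
inner = outer.firstChild

outer_ns = outer.namespaceURI   # the declared namespace name
inner_ns = inner.namespaceURI   # expected to be None, not ''

print(outer_ns)
print(inner_ns is xml.dom.EMPTY_NAMESPACE)
```

Other DOM implementations may behave differently, which is the point of settling on one representation.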
Cheers, Tom P From horatio@qpsf.edu.au Thu Nov 22 00:54:30 2001 From: horatio@qpsf.edu.au (Horatio Davis) Date: Thu, 22 Nov 2001 10:54:30 +1000 (EST) Subject: [XML-SIG] 4DOM and namespaces In-Reply-To: <200111211425.fALEP7w21006@localhost.localdomain> Message-ID: On Wed, 21 Nov 2001, Uche Ogbuji wrote: > I think we really need to come up with a better alt to DOM (this is another > topic that frequently comes up). Sort of a Pythonic JDOM. Would ISO groves (or a Pythonic variant thereof) be suitable? As a bonus you could then address the general case - SGML - as opposed to merely XML. There was a pygrove project, I think. What ever happened to it? Cheers, AHD From uche.ogbuji@fourthought.com Thu Nov 22 03:32:34 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 21 Nov 2001 20:32:34 -0700 Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: Message from "Martin v. Loewis" of "Wed, 21 Nov 2001 23:11:00 +0100." <200111212211.fALMB0S02169@mira.informatik.hu-berlin.de> Message-ID: <200111220332.fAM3WYa23728@localhost.localdomain> > > I would like to add an assertion in the NS methods of 4DOM, to make sure > > that they are not called with empty strings as the namespaceURI. > > I think an empty strings is valid as a namespaceURI. If so, it would > be undesirable to reject it. I don't think so, and I think it took the www-uri mailing list 3000+ messages and a lot of hot flames to reject relative URI references. Luckily, I wasn't there. At any rate, XMLNS 1.0 itself reserves the empty string as nsref for "undefining" namespaces. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji@fourthought.com Thu Nov 22 03:37:27 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Wed, 21 Nov 2001 20:37:27 -0700 Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: Message from "Martin v. Loewis" of "Wed, 21 Nov 2001 22:19:56 +0100." <200111212119.fALLJu602004@mira.informatik.hu-berlin.de> Message-ID: <200111220337.fAM3bRQ23746@localhost.localdomain> > > OK, I've just commited the changes to 4DOM. > > > > 1. raise NamespaceErr when '' is used as a nsURI > > I think that was a mistake. "" is a valid namespace, unless I'm > mistaken. Eg. > > > > > > conforms to the namespace spec, AFAICT. I disagree. True, the spec is quite unclear, but it does explicitly state that "" is used to remove a default namespace declaration. I would say that this probably disqualifies "" from being a proper nsref. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From m_mariappanX@trillium.com Thu Nov 22 07:27:17 2001 From: m_mariappanX@trillium.com (Mariappan, MaharajanX) Date: Wed, 21 Nov 2001 23:27:17 -0800 Subject: [XML-SIG] parsing xml schema Message-ID: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> Hi Folks! Is it possible to validate and parse a xml-schema file and get structured data in python objects? If so, plz let me know sample codes. Which python lib to be used for this? I would like to write a python class which have functions to get the structured data by parsing the xml-schema file. And pass this info to wxPython objects. TIA, Maharajan From martin@v.loewis.de Thu Nov 22 08:12:05 2001 From: martin@v.loewis.de (Martin v. 
Loewis) Date: Thu, 22 Nov 2001 09:12:05 +0100 Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: <004a01c172ee$5dcb9310$7cac1218@cj64132b> (tpassin@home.com) References: <200111212119.fALLJu602004@mira.informatik.hu-berlin.de> <004a01c172ee$5dcb9310$7cac1218@cj64132b> Message-ID: <200111220812.fAM8C5P01253@mira.informatik.hu-berlin.de> > So once again, there is no case for an empty string as a valid namespace > value. Thanks a lot; that clarifies it. Regards, Martin From martin@v.loewis.de Thu Nov 22 08:16:52 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 22 Nov 2001 09:16:52 +0100 Subject: [XML-SIG] parsing xml schema In-Reply-To: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> (m_mariappanX@trillium.com) References: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> Message-ID: <200111220816.fAM8GqJ01278@mira.informatik.hu-berlin.de> > Is it possible to validate and parse a xml-schema file and get structured > data in python objects? Parsing it is no problem; any Python parser will do. Validating it is currently not supported in PyXML; you may want to try XSV - see the PyXML "other software" page for a reference. > I would like to write a python class which have functions to get the > structured data by parsing the xml-schema file. And pass this info to > wxPython objects. If you merely want to get the data, then I would suggest that validation is not needed. In this case, I'd recommend to build a DOM tree, e.g. using tree = xml.dom.minidom.parse(resource) and then either use DOM accessors or XPath to get the data. Regards, Martin From Alexandre.Fayolle@logilab.fr Thu Nov 22 08:27:06 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 22 Nov 2001 09:27:06 +0100 (CET) Subject: [XML-SIG] DOM and empty namespaces In-Reply-To: <200111220812.fAM8C5P01253@mira.informatik.hu-berlin.de> Message-ID: On Thu, 22 Nov 2001, Martin v. 
Loewis wrote: > > So once again, there is no case for an empty string as a valid namespace > > value. > > Thanks a lot; that clarifies it. So we keep the exception in 4DOM for now. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Thu Nov 22 08:40:06 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 22 Nov 2001 09:40:06 +0100 (CET) Subject: [XML-SIG] (no subject) Message-ID: Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Thu Nov 22 08:40:25 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 22 Nov 2001 09:40:25 +0100 (CET) Subject: [XML-SIG] PyXML and 4Suite possible conflict In-Reply-To: Message-ID: On Thu, 22 Nov 2001, Martin v. Löwis wrote: > + If you intend to install FourThought's 4Suite package, there may be a > + conflict between the packages xml.xpath and xml.xslt provided here, > + and those provided by 4Suite. In this case, it is advisable not to > + install XPath and XSLT support from this package. To do so, pass the > + options --without-xpath and --without-xslt to setup.py. I suggest mentioning that the conflict will only occur with 4Suite version < 0.12. Since 0.12 will have its own versions in Ft/Xml/XPath and Ft/Xml/Xslt, there should be no conflict, as long as you know what you import. 
Sample code would look like

    try:
        # Prefer 4Suite's own engines (Ft/Xml/XPath and Ft/Xml/Xslt in 0.12).
        from Ft.Xml.XPath import Evaluate
        from Ft.Xml.Xslt import Processor
    except ImportError:
        try:
            # Fall back to the engines bundled with PyXML.
            from xml.xpath import Evaluate
            from xml.xslt import Processor
        except ImportError:
            raise ImportError('You need to install PyXML to run this application')
        else:
            print("Using PyXML's XPath and Xslt engine")
    else:
        print("Using 4Suite's XPath and Xslt engine")

I have not checked in details, but I believe the main interfaces are compatible. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From m_mariappanX@trillium.com Thu Nov 22 09:10:36 2001 From: m_mariappanX@trillium.com (Mariappan, MaharajanX) Date: Thu, 22 Nov 2001 01:10:36 -0800 Subject: [XML-SIG] parsing xml schema Message-ID: <53A7943A5BD8D411B6930002A5073155013F60A2@bgsmsx90.iind.intel.com> Hi, -----Original Message----- From: Martin v. Loewis [mailto:martin@v.loewis.de] Sent: Thursday, November 22, 2001 1:47 PM To: m_mariappanX@trillium.com Cc: xml-sig@python.org Subject: Re: [XML-SIG] parsing xml schema > Is it possible to validate and parse a xml-schema file and get structured > data in python objects? Parsing it is no problem; any Python parser will do. Validating it is currently not supported in PyXML; you may want to try XSV - see the PyXML "other software" page for a reference. I'll check it out XSV > I would like to write a python class which have functions to get the > structured data by parsing the xml-schema file. And pass this info to > wxPython objects. If you merely want to get the data, then I would suggest that validation is not needed. In this case, I'd recommend to build a DOM tree, e.g. using tree = xml.dom.minidom.parse(resource) and then either use DOM accessors or XPath to get the data. Do you have sample entire code to have a look? Regards, Martin From martin@v.loewis.de Thu Nov 22 09:07:51 2001 From: martin@v.loewis.de (Martin v. 
Loewis) Date: Thu, 22 Nov 2001 10:07:51 +0100 Subject: [XML-SIG] PyXML and 4Suite possible conflict In-Reply-To: (message from Alexandre Fayolle on Thu, 22 Nov 2001 09:40:25 +0100 (CET)) References: Message-ID: <200111220907.fAM97pd01734@mira.informatik.hu-berlin.de> > I suggest mentionning that the conflict will only occure with 4Suite > version < 0.12. Since 0.12 will have it's own versions in Ft/Xml/XPath and > Ft/Xml/Xslt, ther should be no conflict, as long as you know what you > import. Well, I'd rather not speculate about the future. Once 4Suite 0.12 is released, we can say for sure. > I have not checked in details, but I believe the main interfaces are > compatible. That is certainly the intention. However, it will still take some work to get there. I guess from a 4Suite POV, PyXML will "just" be released with an old version of 4XSLT. However, it is worse: In 4Suite 0.11, the stylesheet reader is quite closely tied to pDomlette, to a degree that integrating it into PyXML amounts to rewriting it from scratch. Regards, Martin From martin@v.loewis.de Thu Nov 22 09:15:04 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 22 Nov 2001 10:15:04 +0100 Subject: [XML-SIG] parsing xml schema In-Reply-To: <53A7943A5BD8D411B6930002A5073155013F60A2@bgsmsx90.iind.intel.com> (m_mariappanX@trillium.com) References: <53A7943A5BD8D411B6930002A5073155013F60A2@bgsmsx90.iind.intel.com> Message-ID: <200111220915.fAM9F4P01798@mira.informatik.hu-berlin.de> > Do you have sample entire code to have a look? As a matter of fact, I believe the answer to that question is "no". Regards, Martin From Juergen Hermann" Message-ID: On Thu, 22 Nov 2001 09:16:52 +0100, Martin v. Loewis wrote: >Parsing it is no problem; any Python parser will do. Validating it is >currently not supported in PyXML; you may want to try XSV - see the >PyXML "other software" page for a reference. 
Another option is (but I never tested this) to use pirxx compiled for Xerces 1.5.2, which has a SAX2 feature to enable schema validation. Note that schema support is not complete in Xerces. Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From larsga@garshol.priv.no Thu Nov 22 10:50:44 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 22 Nov 2001 11:50:44 +0100 Subject: [XML-SIG] 4DOM and namespaces In-Reply-To: References: Message-ID: * Uche Ogbuji | | I think we really need to come up with a better alt to DOM (this is | another topic that frequently comes up). Sort of a Pythonic JDOM. * Horatio Davis | | Would ISO groves (or a Pythonic variant thereof) be suitable? To some extent they may be useful, but groves is not an API; it's just a data model. The property set can certainly provide some guidance on correct naming and structuring of the API (which the DOM folks should have used), but for developer convenience methods and update methods we'll have to look elsewhere. | As a bonus you could then address the general case - SGML - as | opposed to merely XML. You can do that anyway. You can either use SX to convert your SGML to XML before parsing it, or you can use pysp with its SAX driver and read SGML as if it were XML. | There was a pygrove project, I think. What ever happened to it? Geir Ove Grønmo still has his GPS tool available. GPS is actually quite powerful, and can do a lot of cool stuff. --Lars M. From rsalz@zolera.com Thu Nov 22 12:46:11 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 22 Nov 2001 07:46:11 -0500 Subject: [XML-SIG] parsing xml schema References: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> <200111220816.fAM8GqJ01278@mira.informatik.hu-berlin.de> Message-ID: <3BFCF393.DA092D27@zolera.com> > Parsing it is no problem; any Python parser will do. 
Validating it is > currently not supported in PyXML; you may want to try XSV - see the > PyXML "other software" page for a reference. Unless it's changed a lot in the past two months, I'd avoid XSV -- just building it was an adventure. :( You should also check out PyTrex, a tree-oriented validation language. /r$ -- Zolera Systems, Securing web services (XML, SOAP, Signatures, Encryption) http://www.zolera.com From noreply@sourceforge.net Thu Nov 22 13:05:30 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 22 Nov 2001 05:05:30 -0800 Subject: [XML-SIG] [ pyxml-Bugs-484549 ] ContentHandler chokes on None qname Message-ID: Bugs item #484549, was opened at 2001-11-22 05:05 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=484549&group_id=6473 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Alexandre Fayolle (afayolle) Assigned to: Nobody/Anonymous (nobody) Summary: ContentHandler chokes on None qname Initial Comment: A Sax2 parser with namespace feature enabled may not report the qname in startElementNS (and use None instead). PyExpat is one of these. This breaks xml.dom.ext.reader.Sax2.Reader which expects a qname to be reported. Other content handlers may choke on this bug. I think that the pDomlette reader is vulnerable too, but I'd have to check. I'll fix xml.dom.Sax2. Alexandre ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=484549&group_id=6473 From uche.ogbuji@fourthought.com Thu Nov 22 15:02:27 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 22 Nov 2001 08:02:27 -0700 Subject: [XML-SIG] PyXML and 4Suite possible conflict In-Reply-To: Message from Alexandre Fayolle of "Thu, 22 Nov 2001 09:40:25 +0100." Message-ID: <200111221502.fAMF2Sm24977@localhost.localdomain> > On Thu, 22 Nov 2001, Martin v. 
Löwis wrote: > > > + If you intend to install FourThought's 4Suite package, there may be a > > + conflict between the packages xml.xpath and xml.xslt provided here, > > + and those provided by 4Suite. In this case, it is advisable not to > > + install XPath and XSLT support from this package. To do so, pass the > > + options --without-xpath and --without-xslt to setup.py. > > I suggest mentioning that the conflict will only occur with 4Suite > version < 0.12. Since 0.12 will have its own versions in Ft/Xml/XPath and > Ft/Xml/Xslt, there should be no conflict, as long as you know what you > import. Sample code would look like
>
> try:
>     from Ft.Xml.XPath import Evaluate
>     from Ft.Xml.Xslt import Processor
> except ImportError:
>     try:
>         from xml.xpath import Evaluate
>         from xml.xslt import Processor
>     except ImportError:
>         raise ImportError('You need to install PyXML to run this application')
>     else:
>         print("Using PyXML's XPath and Xslt engine")
> else:
>     print("Using 4Suite's XPath and Xslt engine")
>
> I have not checked in details, but I believe the main interfaces are > compatible. Yes. They all are. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji@fourthought.com Thu Nov 22 15:06:57 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 22 Nov 2001 08:06:57 -0700 Subject: [XML-SIG] PyXML and 4Suite possible conflict In-Reply-To: Message from "Martin v. Loewis" of "Thu, 22 Nov 2001 10:07:51 +0100." <200111220907.fAM97pd01734@mira.informatik.hu-berlin.de> Message-ID: <200111221506.fAMF6vR24988@localhost.localdomain> > > I suggest mentioning that the conflict will only occur with 4Suite > > version < 0.12. Since 0.12 will have its own versions in Ft/Xml/XPath and > > Ft/Xml/Xslt, there should be no conflict, as long as you know what you > > import. 
> > Well, I'd rather not speculate about the future. Once 4Suite 0.12 is > released, we can say for sure. > > > I have not checked in details, but I believe the main interfaces are > > compatible. > > That is certainly the intention. However, it will still take some work > to get there. I guess from a 4Suite POV, PyXML will "just" be released > with an old version of 4XSLT. However, it is worse: In 4Suite 0.11, > the stylesheet reader is quite closely tied to pDomlette, to a degree > that integrating it into PyXML amounts to rewriting it from scratch. We're abstracting away from any particular DOM implementation again. Now there is an Ft.Xml.DOMLETTE value which is a module which contains all that is needed for a Domlette. I will look into your older patches in order to write a tiny minidom->Domlette interface wrapper, which should take care of the problem. Also worth noting that in 0.12.0, pDomlette will pretty much be on the skids. Mike is working on cDomlette mutation now. While he's at it, he's making sure all Domlettes have precisely the same interface. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Mike.Olson@fourthought.com Thu Nov 22 15:23:39 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Thu, 22 Nov 2001 08:23:39 -0700 Subject: [XML-SIG] PyXML and 4Suite possible conflict References: <200111221506.fAMF6vR24988@localhost.localdomain> Message-ID: <3BFD187B.6EE6A137@fourthought.com> Uche Ogbuji wrote: > > > > I suggest mentioning that the conflict will only occur with 4Suite > > > version < 0.12. Since 0.12 will have its own versions in Ft/Xml/XPath and > > > Ft/Xml/Xslt, there should be no conflict, as long as you know what you > > > import. > > > > Well, I'd rather not speculate about the future. 
Once 4Suite 0.12 is > > released, we can say for sure. > > > > > I have not checked in details, but I believe the main interfaces are > > > compatible. > > > > That is certainly the intention. However, it will still take some work > > to get there. I guess from a 4Suite POV, PyXML will "just" be released > > with an old version of 4XSLT. However, it is worse: In 4Suite 0.11, > > the stylesheet reader is quite closely tied to pDomlette, to a degree > > that integrating it into PyXML amounts to rewriting it from scratch. > > We're abstracting away from any particular DOM implementation again. Now > there is an Ft.Xml.DOMLETTE value which is a module which contains all that is > needed for a Domlette. I will look into your older patches in order to write > a tiny minidom->Domlette interface wrapper, which should take care of the > problem. > > Also worth noting that in 0.12.0, pDomlette will pretty much be on the skids. > Mike is working on cDomlette mutation now. While he's at it, he's making > sure all Domlettes have precisely the same interface. Note, this won't help Xslt much with Stylesheets as they need to inherit from a class (cDomlette won't work). 2.2 lets us get around that, but I don't think we want to enforce 2.2 yet. I have been thinking of breaking the Stylesheet -> pDomlette dependency by putting (another) little DOM in Xslt that is for stylesheets only. We only need a handful of elements on Element, and Text Nodes. Would be pretty small in size. Mike > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Boulder, CO 80301-2537, USA > XML strategy, XML tools (http://4Suite.org), knowledge management > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com +1 303 583 9900 x 102 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, http://4Suite.org Boulder, CO 80301-2537, USA XML strategy, XML tools, knowledge management From loewis@informatik.hu-berlin.de Thu Nov 22 16:08:25 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Thu, 22 Nov 2001 17:08:25 +0100 (MET) Subject: [XML-SIG] PyXML and 4Suite possible conflict In-Reply-To: <3BFD187B.6EE6A137@fourthought.com> (message from Mike Olson on Thu, 22 Nov 2001 08:23:39 -0700) References: <200111221506.fAMF6vR24988@localhost.localdomain> <3BFD187B.6EE6A137@fourthought.com> Message-ID: <200111221608.fAMG8Pd08952@paros.informatik.hu-berlin.de> > I have been thinking of breaking the Stylesheet -> pDomlette dependency > by putting (another) little DOM in Xslt that is for stylesheets only. > we only need a handfull of elements on Element, and Text Nodes. Would > be pretty small in size. That sounds good. Are you proposing that we skip any attempts of merging 0.11 into PyXML, and go right to 0.12? Regards, Martin From uche.ogbuji@fourthought.com Thu Nov 22 16:29:01 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 22 Nov 2001 09:29:01 -0700 Subject: [XML-SIG] PyXML and 4Suite possible conflict References: <200111221506.fAMF6vR24988@localhost.localdomain> <3BFD187B.6EE6A137@fourthought.com> Message-ID: <3BFD27CD.7DFBBF4C@fourthought.com> Mike Olson wrote: > I have been thinking of breaking the Stylesheet -> pDomlette dependency > by putting (another) little DOM in Xslt that is for stylesheets only. > we only need a handfull of elements on Element, and Text Nodes. Would > be pretty small in size. I don't think we should add another Domlette to the mix, regardless of how thin. Why not just have a Python class API that delegates to cDomlette (through explicit invocation)? Of course this approach would mean more complex pickling code. Another option is to simply use minidom as a base for Stylesheets, extended with whatever it's missing. This would be best for the PyXML folks. 
-- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From uche.ogbuji@fourthought.com Thu Nov 22 16:30:17 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 22 Nov 2001 09:30:17 -0700 Subject: [XML-SIG] PyXML and 4Suite possible conflict References: <200111221506.fAMF6vR24988@localhost.localdomain> <3BFD187B.6EE6A137@fourthought.com> <200111221608.fAMG8Pd08952@paros.informatik.hu-berlin.de> Message-ID: <3BFD2819.498D6074@fourthought.com> Martin von Loewis wrote: > > > I have been thinking of breaking the Stylesheet -> pDomlette dependency > > by putting (another) little DOM in Xslt that is for stylesheets only. > > we only need a handfull of elements on Element, and Text Nodes. Would > > be pretty small in size. > > That sounds good. Are you proposing that we skip any attempts of > merging 0.11 into PyXML, and go right to 0.12? This might be best, though I'm not sure how we manage the scheduling. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From Alexandre.Fayolle@logilab.fr Thu Nov 22 16:56:17 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 22 Nov 2001 17:56:17 +0100 (CET) Subject: [XML-SIG] SAX documentation issue Message-ID: The Sax documentation in the Python Library Reference mentions the Attributes and the AttributesNS class, but these two classes are not documented anywhere (or if they are, they are not easy to find). I've submitted a bug (#484603) in the Python sf tracker. Alexandre Fayolle -- LOGILAB, Paris (France). 
http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Thu Nov 22 17:02:48 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 22 Nov 2001 18:02:48 +0100 (CET) Subject: [XML-SIG] Pyexpat and namespaces Message-ID: When using pyexpat with feature_namespaces enabled, startElementNS is called with qname set to None. This is documented as a possible behaviour in the ContentHandler documentation. However, the instance of AttributesNSImpl passed as attrs parameter is created with the qname argument set to an empty dictionary. As a consequence, calling getValueByQName, getNameByQName, getQNameByName, getQNames will lead to an exception or an unexpected result. I think this should be noted somewhere in the standard library documentation too. Opinions? Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Thu Nov 22 17:32:35 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 22 Nov 2001 18:32:35 +0100 (CET) Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: Message-ID: One more issue in the long lived tradition of '' vs. None: How should the lack of prefix be represented? 4DOM uses '' for the prefix of non-prefixed elements, whereas both PyExpat and xmlproc use None. I've got nothing against having a different convention in DOM and SAX, but I would like to be certain that all SAX parsers will be using None for no prefix. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From martin@v.loewis.de Thu Nov 22 18:26:11 2001 From: martin@v.loewis.de (Martin v. 
Loewis) Date: Thu, 22 Nov 2001 19:26:11 +0100 Subject: [XML-SIG] parsing xml schema In-Reply-To: <3BFCF393.DA092D27@zolera.com> (message from Rich Salz on Thu, 22 Nov 2001 07:46:11 -0500) References: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> <200111220816.fAM8GqJ01278@mira.informatik.hu-berlin.de> <3BFCF393.DA092D27@zolera.com> Message-ID: <200111221826.fAMIQBG01375@mira.informatik.hu-berlin.de> > Unless it's changed a lot in the past two months, I'd avoid XSV -- just > building it was an adventure. :( Of course, I'd personally take a different route: avoid XML Schema in the first place :-) Regards, Martin From martin@v.loewis.de Thu Nov 22 18:32:18 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 22 Nov 2001 19:32:18 +0100 Subject: [XML-SIG] PyXML and 4Suite possible conflict In-Reply-To: <3BFD27CD.7DFBBF4C@fourthought.com> (message from Uche Ogbuji on Thu, 22 Nov 2001 09:29:01 -0700) References: <200111221506.fAMF6vR24988@localhost.localdomain> <3BFD187B.6EE6A137@fourthought.com> <3BFD27CD.7DFBBF4C@fourthought.com> Message-ID: <200111221832.fAMIWI501398@mira.informatik.hu-berlin.de> > Another option is to simply use minidom as a base for Stylesheets, > extended with whatever it's missing. This would be best for the PyXML > folks. That would be the best thing, indeed. Regards, Martin From Alexandre.Fayolle@logilab.fr Thu Nov 22 18:53:25 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Thu, 22 Nov 2001 19:53:25 +0100 (CET) Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes Message-ID: I've just commited fixes to xml.dom.ext.reader.Sax2 so that it will work with parser with feature_namespaces enabled, and a couple of other fixes. Note to Stephane Bidoule : this should improve your situation. 
I'm able to correctly build DOM trees for the following documents, with both pyexpat and xmlproc, with the namespaces feature enabled and disabled (except that xmlproc crashes on s3 with namespaces on, see bug #469460) s1 = "data" s2 = "foo" s3 = "data" s4 = "

" s5 = "" I encourage you to give it a try and report back to the list if you find remaining bugs. Cheers, Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From pyxml@xhaus.com Thu Nov 22 19:03:18 2001 From: pyxml@xhaus.com (Alan Kennedy) Date: Thu, 22 Nov 2001 19:03:18 +0000 Subject: [XML-SIG] parsing xml schema References: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> <200111220816.fAM8GqJ01278@mira.informatik.hu-berlin.de> <3BFCF393.DA092D27@zolera.com> <200111221826.fAMIQBG01375@mira.informatik.hu-berlin.de> Message-ID: <3BFD4BF6.B156352A@xhaus.com> "Martin v. Loewis" wrote: > > Unless it's changed a lot in the past two months, I'd avoid XSV -- just > > building it was an adventure. :( > > Of course, I'd personally take a different route: avoid XML Schema in > the first place :-) I've been looking at pyTrex for the last couple of days, and it's pretty good. While the coverage of the "spec" is not 100%, it covers all the important areas, and most importantly can be extended with your own datatypes. All you need to do to add your own datatypes is implement a function which takes a string and returns true or false depending on whether the string is acceptable as an instance of that datatype or not. Now all I need to do is figure out how to make it work with 4Suite doms, and I'll be using it in my current project. In the long run, TREX will be going away, to be superceded by RELAX-NG, which is a combination of the best of RELAX, TREX and SCHEMATRON. Details on RELAX-NG from http://www.oasis-open.org/committees/relax-ng In summary, pyTrex works, its fast, and its here right now. And it's straightforward to learn. I've been teaching a couple of non-programmers Python and XML for the last two months, and they picked up Trex in a matter of hours. Happy Thanksgiving to all the US folks out there. Alan. 
From martin@v.loewis.de Thu Nov 22 19:39:58 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Thu, 22 Nov 2001 20:39:58 +0100 Subject: [XML-SIG] parsing xml schema In-Reply-To: <3BFD4BF6.B156352A@xhaus.com> (message from Alan Kennedy on Thu, 22 Nov 2001 19:03:18 +0000) References: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> <200111220816.fAM8GqJ01278@mira.informatik.hu-berlin.de> <3BFCF393.DA092D27@zolera.com> <200111221826.fAMIQBG01375@mira.informatik.hu-berlin.de> <3BFD4BF6.B156352A@xhaus.com> Message-ID: <200111221939.fAMJdwg01712@mira.informatik.hu-berlin.de> > In summary, pyTrex works, its fast, and its here right now. And it's > straightforward to learn. I think James Tauber will be pleased to hear that. I'm sure he's aware of the RELAX development, since those things are closely related. Regards, Martin From horatio@qpsf.edu.au Thu Nov 22 23:07:19 2001 From: horatio@qpsf.edu.au (Horatio Davis) Date: Fri, 23 Nov 2001 09:07:19 +1000 (EST) Subject: [XML-SIG] 4DOM and namespaces In-Reply-To: Message-ID: On 22 Nov 2001, Lars Marius Garshol wrote: > You can either use SX to convert your SGML to XML before parsing it, > or you can use pysp with its SAX driver and read SGML as if it were > XML. [snip] > Geir Ove Gr=F8nmo still has his GPS tool available. GPS is actually > quite powerful, and can do a lot of cool stuff. These look like what I am after. Thanks for the pointers, Horatio From uche.ogbuji@fourthought.com Thu Nov 22 23:41:52 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Thu, 22 Nov 2001 16:41:52 -0700 Subject: [XML-SIG] parsing xml schema In-Reply-To: Message from Alan Kennedy of "Thu, 22 Nov 2001 19:03:18 GMT." <3BFD4BF6.B156352A@xhaus.com> Message-ID: <200111222341.fAMNfqb26329@localhost.localdomain> > "Martin v. Loewis" wrote: > > > > Unless it's changed a lot in the past two months, I'd avoid XSV -- just > > > building it was an adventure. 
:( > > > > Of course, I'd personally take a different route: avoid XML Schema in > > the first place :-) > > I've been looking at pyTrex for the last couple of days, and it's pretty > good. > > While the coverage of the "spec" is not 100%, it covers all the important > areas, and most importantly can be extended with your own datatypes. All you > need to do to add your own datatypes is implement a function which takes a > string and returns true or false depending on whether the string is > acceptable as an instance of that datatype or not. > > Now all I need to do is figure out how to make it work with 4Suite doms, and > I'll be using it in my current project. Don't hesitate to ask if you need help with this task. In fact, if you were able to write up what you did to use PyTREX as a validator I would love to make this available to others. I too am no friend of XSD. We tend to use Schematron or Examplotron when we use schemas at all, but I like RELAX-NG/TREX. -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Boulder, CO 80301-2537, USA XML strategy, XML tools (http://4Suite.org), knowledge management From rsalz@zolera.com Fri Nov 23 04:01:50 2001 From: rsalz@zolera.com (Rich Salz) Date: Thu, 22 Nov 2001 23:01:50 -0500 Subject: [XML-SIG] parsing xml schema Message-ID: <200111230401.fAN41oX05076@zolera.com> >I too am no friend of XSD. It is, unfortunately, what the world is gonna use for web services. sigh. 
:( From Karl Eichwalder Fri Nov 23 05:39:36 2001 From: Karl Eichwalder (Karl Eichwalder) Date: Fri, 23 Nov 2001 06:39:36 +0100 Subject: [XML-SIG] Re: 4DOM and i18n In-Reply-To: (Alexandre Fayolle's message of "Wed, 21 Nov 2001 11:36:52 +0100 (CET)") References: Message-ID: Alexandre Fayolle writes: > I think this means that someone having his LANG environment variable set > to fr_CH or en_AU will not be seeing the localized messages, and that > en_US and fr_FR should be renamed to en and fr respectively. Arguable whether it's okay to rename message files. It should be up to the translator to decide. Instead of renaming I recommend to create links (or copies) at installation time for fallback purposes as explained by Martin. cd .../fr/LC_MESSAGES ln -sf ../../fr_FR/LC_MESSAGES/$DOMAIN.mo . -- ke@suse.de (work) / keichwa@gmx.net (home): | http://www.suse.de/~ke/ | ,__o Free Translation Project: | _-\_<, http://www.iro.umontreal.ca/contrib/po/HTML/ | (*)/'(*) From stephane.bidoul@softwareag.com Fri Nov 23 09:30:16 2001 From: stephane.bidoul@softwareag.com (Stephane Bidoul) Date: Fri, 23 Nov 2001 10:30:16 +0100 Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes In-Reply-To: Message-ID: <000201c17401$77c5de10$69a679c1@softwareag.com> No remaining bugs except xmlproc with xmlns='', as you mention. And also the related XMLGenerator problem, but it's really an easy one (see bug #469463) - just waiting for someone to commit the patch (to PyXML and the main Python tree, I guess). The updated test suite is on its way to you, Alexandre. Thanks for the hard work. -Stephane > -----Original Message----- > From: Alexandre Fayolle [mailto:Alexandre.Fayolle@logilab.fr] > Sent: 22 November, 2001 19:53 > To: xml-sig@python.org > Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes > > > I've just commited fixes to xml.dom.ext.reader.Sax2 so that > it will work > with parser with feature_namespaces enabled, and a couple of > other fixes. 
> > Note to Stephane Bidoule : this should improve your > situation. I'm able > to correctly build DOM trees for the following documents, with both > pyexpat and xmlproc, with the namespaces feature enabled and disabled > (except that xmlproc crashes on s3 with namespaces on, see > bug #469460) > > s1 = "data" > s2 = "foo" > s3 = "data" > s4 = "

" > s5 =3D " xmlns:p=3D'gaaaaaaa'/>" >=20 > I encourage you to give it a try and report back to the list=20 > if you find > remaining bugs. >=20 > Cheers, >=20 > Alexandre Fayolle > --=20 > LOGILAB, Paris (France). > http://www.logilab.com http://www.logilab.fr http://www.logilab.org > Narval, the first software agent available as free software (GPL). >=20 >=20 > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig From Alexandre.Fayolle@logilab.fr Fri Nov 23 09:34:06 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 23 Nov 2001 10:34:06 +0100 (CET) Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes In-Reply-To: <000201c17401$77c5de10$69a679c1@softwareag.com> Message-ID: On Fri, 23 Nov 2001, Stephane Bidoul wrote: > No remaining bugs except xmlproc with xmlns='', as you mention. I have posted a patch for this one on SF. I'd like to get Lars's approval before committing it. > And also the related XMLGenerator problem, but it's really an easy > one (see bug #469463) - just waiting for someone to commit the patch > (to PyXML and the main Python tree, I guess). I've just commited the fix and closed the bug. I don't know how this will be reported on the main python tree, though. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Fri Nov 23 09:49:44 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 23 Nov 2001 10:49:44 +0100 (CET) Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes In-Reply-To: <000201c17401$77c5de10$69a679c1@softwareag.com> Message-ID: On Fri, 23 Nov 2001, Stephane Bidoul wrote: > The updated test suite is on its way to you, Alexandre. 
With the patch to bug #469460 applied, all tests in the new test suite pass but one, test_gen_xmlproc_nson__mixedns, which fails because of bug #482525 (xmlproc doesn't call startPrefixNS), I think. I think I'll wait until this one is fixed by Lars Marius Garshol, who should be much more at ease with xmlproc than I, and come back to it. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From stephane.bidoul@softwareag.com Fri Nov 23 09:55:39 2001 From: stephane.bidoul@softwareag.com (Stephane Bidoul) Date: Fri, 23 Nov 2001 10:55:39 +0100 Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes In-Reply-To: Message-ID: <000601c17405$02b8d290$69a679c1@softwareag.com> I reached the same conclusion. Everything is fine now AFAIC. Thanks. -sbi > -----Original Message----- > From: Alexandre Fayolle [mailto:Alexandre.Fayolle@logilab.fr] > Sent: 23 November, 2001 10:50 > To: Stephane Bidoul > Cc: xml-sig@python.org > Subject: RE: [XML-SIG] xml.dom.ext.reader.Sax2 fixes > > > On Fri, 23 Nov 2001, Stephane Bidoul wrote: > > > The updated test suite is on its way to you, Alexandre. > > With the patch to bug #469460 applied, all tests in the new test suite > pass but one, test_gen_xmlproc_nson__mixedns, which fails > because of bug > #482525 (xmlproc doesn't call startPrefixNS), I think. I > think I'll wait > until this one is fixed by Lars Marius Garshol, who should be > much more at > ease with xmlproc than I, and come back to it. > > Alexandre Fayolle > -- > LOGILAB, Paris (France). > http://www.logilab.com http://www.logilab.fr http://www.logilab.org > Narval, the first software agent available as free software (GPL). 
> From larsga@garshol.priv.no Fri Nov 23 10:10:01 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 23 Nov 2001 11:10:01 +0100 Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes In-Reply-To: References: Message-ID: * Stephane Bidoul | | No remaining bugs except xmlproc with xmlns='', as you mention. * Alexandre Fayolle | | I have posted a patch for this one on SF. I'd like to get Lars's | approval before committing it. I'll look at this (as well as all the other mail on this list) tomorrow. Too busy before then. --Lars M. From Alexandre.Fayolle@logilab.fr Fri Nov 23 10:20:42 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 23 Nov 2001 11:20:42 +0100 (CET) Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes In-Reply-To: Message-ID: On 23 Nov 2001, Lars Marius Garshol wrote: > * Alexandre Fayolle > | > | I have posted a patch for this one on SF. I'd like to get Lars's > | approval before committing it. > > I'll look at this (as well as all the other mail on this list) > tomorrow. Too busy before then. No problem with that. ;o) Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From noreply@sourceforge.net Fri Nov 23 11:12:48 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 23 Nov 2001 03:12:48 -0800 Subject: [XML-SIG] [ pyxml-Patches-484826 ] pDomletteReader with namespace parser Message-ID: Patches item #484826, was opened at 2001-11-23 03:12 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=484826&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Alexandre Fayolle (afayolle) Assigned to: Uche Ogbuji (uche) Summary: pDomletteReader with namespace parser Initial Comment: Here's a conversion of the modifications I made to xml.dom.ext.reader.Sax2.XmlDomGenerator to Ft.Xml.pDomletteReader.SaxReader. 
It enables to build a pDomlette tree from a SAX parser with feature_namespaces enabled. Tested with pyexpat and xmlproc. Cheers Alexandre ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=484826&group_id=6473 From Alexandre.Fayolle@logilab.fr Fri Nov 23 16:18:55 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 23 Nov 2001 17:18:55 +0100 (CET) Subject: [XML-SIG] 4Xslt in PyXML 0.7 Message-ID: Hello, I've noticed that xml.xslt.DomWriter has disapeared during the 4S -> pyxml migration. Is this a deliberate move, or an omission or just the fact that DomWriter has not yet been ported to PyXML? Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Fri Nov 23 16:24:29 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 23 Nov 2001 17:24:29 +0100 (CET) Subject: [XML-SIG] xml.xpath.SyntaxException Message-ID: When importing xml.xslt, the following exception is raised: AttributeError: 'xml.xpath' module has no attribute 'SyntaxException' I've fixed this by adding the following line to xml/xpath/__init__.py: from XPathParserBase import SyntaxException Another solution could be importing SyntaxException from somewhere else in xslt. However, since xpath.SyntaxException was widely used, I thought the first way was easier. I can commit the patch if you think it's correct. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). 
From pyxml@xhaus.com Fri Nov 23 17:00:31 2001 From: pyxml@xhaus.com (Alan Kennedy) Date: Fri, 23 Nov 2001 17:00:31 +0000 Subject: [XML-SIG] parsing xml schema References: <200111222341.fAMNfqb26329@localhost.localdomain> Message-ID: <3BFE80AF.9F8205A4@xhaus.com> Uche Ogbuji wrote: > Don't hesitate to ask if yo need help with this task. In fact, if you were > able to write up what you did to use PyTREX as a validator I would love to > make this available to others. OK, here is how I see it. Basically, I need to do validation of XML data files. These may either be from textual xml data that is submitted to the application, or on DOM structures that have been retrieved from some storage(content repos, pickled DOM in RDBMS?). The DOM structures are very likely to be either pDomlette, or 4Suite 0.12 R/W cDomlettes (he said hopefully ;-) Another, perhaps more esoteric, case is where the TREX pattern is stored in a DOM, having perhaps been generated from an XSLT transform, although off-hand I can't picture any use cases for such a scenario? It is very likely to be the case that I will need a persistable "compiled" version of the trex pattern, since I will have a set of 10 to 100 handwritten trex patterns that will be used continually, and I don't want to parse them each time. It is quite likely I could just pickle the pattern after parsing, but that remains to be verified. Validating textual XML is simple. Just create pyTrex instances from the textual XML, using the "parse_Instance" function, create a trex instance from the textual trex file, using the "parse_Trex" function. And then use the "validate" function to match the former against the latter. However, it is more complex when it comes to DOMs, mainly because pyTrex uses non-SAX/DOM interfaces in order to speed things up as much as possible. Efficiently integrating with [cp]Domlette is non-trivial, for the following reasons. 1. 
pyTrex uses the pyExpat (non-SAX) callback interfaces directly, presumably to increase speed. 2. pyTrex uses its own internal non-DOM object model to store the document and schema representations, again presumably for speed. This is a good design choice, since pyTrex does not need the wealth of DOM pointers (sibling, parent, etc) to do its job: it just needs one-way, down-pointing parent to child relationships. The way I see it, there are four possible approaches I can take to validate a DOM structure. 1. Serialise the DOM to a string, and let pyTrex re-parse the string to build up its own data structures. Advantages: minimal extra coding. Disadvantages: 1. Speed inefficiency, since the XML is parsed twice. 2. Memory inefficiency, since both the DOM structure and the pyTrex object model will be present in memory at the same time 3. Can only use pyExpat as a parser. 2. Rewrite the pyTrex HandlerBase class to take its event calls from a SAX(2) stream. Advantages: 1. Can use any SAX compliant parser and 2. Eliminates the double parsing problem, since can generate a SAX stream by tree-walking an existing DOM. Disadvantages: Memory inefficiency, since both the DOM structure and the pyTrex object model will be present in memory at the same time. 3. Rewrite the pyTrex "parse_Instance" function and dependent classes so that it augments an existing DOM structure with whatever attributes it needs. Advantages: Memory effiency, since both parallel object models are stored in the same structure. Disadvantages: Fair amount of code rewriting, and thus debugging (which I don't fancy much) . 4. Rewrite the pyTrex pattern matching routines so that they operate off a DOM structure instead of the proprietary pyTrex structure. Advantages: best possible memory efficiency. Disadvantages: A *lot* of code rewriting, and consequent debugging. I don't think I fancy opening that little can of worms. 
As things stand for me now, I think I am very likely to opt for option 1, since it involves the minimum amount of coding. However, I may yet go for option 2, if the overhead of parsing my data files (which will alll be < 100K text) turns out to be large. Or it may turn out to be the case (not sure yet, still spec'ing requirements) that I won't bother with the second (DOM) parse if the first (validation) parse fails, since a failed validation means I'm not interested in the file anyway. Thinking about it some more, this is very likely to be the case, since I really only need the TREX validation to act as a gate-keeper against bad data files coming into my system. Once it's in the system, I shouldn't need to validate it again. I don't think the memory inefficiency inherent in options 1 and 2 is large, since the pyTrex data structures are so light. I can't see myself going for options 3 or 4, since that would involve rewriting of pretty complex code, a place where I can't afford to go right now. One last requirement that I don't have now, but could foresee myself having in the future: Validating the output of a (XSLT) transformation, to be absolutely certain that the transform is generating valid output. This is really a system testing requirement, used only in system development and QA phases, unlikely to be required at run time. The reason why I mention this is that our last contract was a QA contract where we were testing a system written in ASP. A lot of crud came out of that system, and it handled multi-browser considerations partulcularly badly. Some form of automated validation of the ouput HTML/etc might have prevented a lot of wild-goose chasing. Regards, Alan. From rsalz@zolera.com Fri Nov 23 17:20:17 2001 From: rsalz@zolera.com (Rich Salz) Date: Fri, 23 Nov 2001 12:20:17 -0500 Subject: [XML-SIG] xml.xpath.SyntaxException Message-ID: <200111231720.fANHKHE06676@zolera.com> xslt depends on xpath, not vice-versa, so your fix seems right to me. 
From martin@v.loewis.de Fri Nov 23 17:55:07 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Fri, 23 Nov 2001 18:55:07 +0100 Subject: [XML-SIG] xml.dom.ext.reader.Sax2 fixes In-Reply-To: (message from Alexandre Fayolle on Fri, 23 Nov 2001 10:34:06 +0100 (CET)) References: Message-ID: <200111231755.fANHt7201320@mira.informatik.hu-berlin.de> > I've just commited the fix and closed the bug. I don't know how this will > be reported on the main python tree, though. I'll try to synchronize trees from time to time. I'm not sure whether another synchronization will happen before 2.2, though; people should submit specific patches to the Python project on SF if they think changes are absolutely needed for 2.2. Regards, Martin From martin@v.loewis.de Fri Nov 23 18:06:57 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Fri, 23 Nov 2001 19:06:57 +0100 Subject: [XML-SIG] 4Xslt in PyXML 0.7 In-Reply-To: (message from Alexandre Fayolle on Fri, 23 Nov 2001 17:18:55 +0100 (CET)) References: Message-ID: <200111231806.fANI6vM01387@mira.informatik.hu-berlin.de> > I've noticed that xml.xslt.DomWriter has disapeared during the 4S -> pyxml > migration. Is this a deliberate move, or an omission or just the fact that > DomWriter has not yet been ported to PyXML? Neither, nor. I've included everything that was needed by the test suite. "Nobody" uses DomWriter, so I've omitted it. If you think it should be included, feel free to add it. Make sure you indicate what version this came from. Regards, Martin From martin@v.loewis.de Fri Nov 23 18:12:28 2001 From: martin@v.loewis.de (Martin v. 
Loewis) Date: Fri, 23 Nov 2001 19:12:28 +0100 Subject: [XML-SIG] parsing xml schema In-Reply-To: <3BFE80AF.9F8205A4@xhaus.com> (message from Alan Kennedy on Fri, 23 Nov 2001 17:00:31 +0000) References: <200111222341.fAMNfqb26329@localhost.localdomain> <3BFE80AF.9F8205A4@xhaus.com> Message-ID: <200111231812.fANICS501392@mira.informatik.hu-berlin.de> > However, I may yet go for option 2, if the overhead of parsing my > data files (which will alll be < 100K text) turns out to be large. There is another option: parse the document only once using expat. More precisely, register a set of handlers with expat that feeds both the trex parsing, and a DOM builder (i.e. xml.dom.ext.readers.PyExpat); alternatively, feed both pytrex and expatreader, and use the resulting SAX events to build a DOM tree. HTH, Martin From pyxml@xhaus.com Fri Nov 23 18:30:19 2001 From: pyxml@xhaus.com (Alan Kennedy) Date: Fri, 23 Nov 2001 18:30:19 +0000 Subject: [XML-SIG] parsing xml schema References: <200111222341.fAMNfqb26329@localhost.localdomain> <3BFE80AF.9F8205A4@xhaus.com> <200111231812.fANICS501392@mira.informatik.hu-berlin.de> Message-ID: <3BFE95BB.A1AA8F79@xhaus.com> "Martin v. Loewis" wrote: > There is another option: parse the document only once using > expat. More precisely, register a set of handlers with expat that > feeds both the trex parsing, and a DOM builder > (i.e. xml.dom.ext.readers.PyExpat); alternatively, feed both pytrex > and expatreader, and use the resulting SAX events to build a DOM tree. Martin, Of course! Multiple handlers for expat. I might also look into buffering the second SAX/event stream, so that DOM construction can be deferred until the input is confirmed valid. The overhead of constructing the buffer should probably be less than constructing a DOM which might then be discarded. I suppose that really depends on how frequently I expect to receive invalid documents. 
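The single-parse idea discussed above (one expat pass feeding both a validator and a buffer, with DOM construction deferred until the input is known to be valid) can be sketched roughly as follows. This is only an illustration using the standard library's xml.sax and xml.dom.minidom, not pyTrex or 4Suite code: Tee, EventBuffer, DomBuilder, RootNameValidator and parse_if_valid are all hypothetical names, and the "validator" merely checks the root element name as a stand-in for a real TREX match.

```python
import xml.sax
from xml.dom import minidom
from xml.sax.handler import ContentHandler


class Tee(ContentHandler):
    """Forward the basic SAX events to every handler in turn."""

    def __init__(self, *handlers):
        super().__init__()
        self.handlers = handlers

    def _send(self, method, *args):
        for handler in self.handlers:
            getattr(handler, method)(*args)

    def startDocument(self):
        self._send("startDocument")

    def endDocument(self):
        self._send("endDocument")

    def startElement(self, name, attrs):
        self._send("startElement", name, attrs)

    def endElement(self, name):
        self._send("endElement", name)

    def characters(self, content):
        self._send("characters", content)


class EventBuffer(Tee):
    """Record events as (method, args) pairs so they can be replayed later."""

    def __init__(self):
        super().__init__()
        self.events = []

    def _send(self, method, *args):
        self.events.append((method, args))

    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict so the buffer owns its data.
        self._send("startElement", name, dict(attrs.items()))

    def replay(self, handler):
        for method, args in self.events:
            getattr(handler, method)(*args)


class RootNameValidator(ContentHandler):
    """Stand-in validator (assumption): accept only a given root element."""

    def __init__(self, expected):
        super().__init__()
        self.expected = expected
        self.valid = None  # unknown until the root element is seen

    def startElement(self, name, attrs):
        if self.valid is None:
            self.valid = (name == self.expected)


class DomBuilder(ContentHandler):
    """Build a minidom tree from non-namespace SAX events."""

    def startDocument(self):
        self.document = minidom.Document()
        self._stack = [self.document]

    def startElement(self, name, attrs):
        node = self.document.createElement(name)
        for key, value in attrs.items():
            node.setAttribute(key, value)
        self._stack[-1].appendChild(node)
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        self._stack[-1].appendChild(self.document.createTextNode(content))


def parse_if_valid(text, expected_root):
    """Parse once; build the DOM only if the validator accepted the input."""
    validator = RootNameValidator(expected_root)
    buffer = EventBuffer()
    xml.sax.parseString(text.encode("utf-8"), Tee(validator, buffer))
    if not validator.valid:
        return None
    builder = DomBuilder()
    buffer.replay(builder)
    return builder.document
```

Whether the buffer is cheaper than building and discarding a DOM depends, as noted, on how often invalid documents arrive; timing both would be needed to tell.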
For my current requirement, where the content of the submitted XML documents will be written by people, either hand written or with an XML editor, receiving and checking submissions of XML files will be a (relatively) infrequent occurence. But I can imagine that in a web services SOAP/WSDL situation, where the XML documents might be coming thick and fast, such "deferred DOM construction" might result in a considerable speed up. I must do some timings. Excellent solution. Thanks, Alan. From Mike.Olson@fourthought.com Sat Nov 24 00:33:57 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Fri, 23 Nov 2001 17:33:57 -0700 Subject: [XML-SIG] PyXML and 4Suite possible conflict References: <200111221506.fAMF6vR24988@localhost.localdomain> <3BFD187B.6EE6A137@fourthought.com> <200111221608.fAMG8Pd08952@paros.informatik.hu-berlin.de> Message-ID: <3BFEEAF5.CB7F6699@fourthought.com> Martin von Loewis wrote: > > > I have been thinking of breaking the Stylesheet -> pDomlette dependency > > by putting (another) little DOM in Xslt that is for stylesheets only. > > we only need a handfull of elements on Element, and Text Nodes. Would > > be pretty small in size. > > That sounds good. Are you proposing that we skip any attempts of > merging 0.11 into PyXML, and go right to 0.12? My concern is that 4Suite 12.0 is Python 2.x only and I don;t think you are ready to remove 1.5.2 support from pyXML. Mike > > Regards, > Martin > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com +1 303 583 9900 x 102 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, http://4Suite.org Boulder, CO 80301-2537, USA XML strategy, XML tools, knowledge management From Mike.Olson@fourthought.com Sat Nov 24 00:35:17 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Fri, 23 Nov 2001 17:35:17 -0700 Subject: [XML-SIG] PyXML and 4Suite possible conflict References: <200111221506.fAMF6vR24988@localhost.localdomain> <3BFD187B.6EE6A137@fourthought.com> <3BFD27CD.7DFBBF4C@fourthought.com> Message-ID: <3BFEEB45.6CD3800@fourthought.com> Uche Ogbuji wrote: > > Mike Olson wrote: > > > I have been thinking of breaking the Stylesheet -> pDomlette dependency > > by putting (another) little DOM in Xslt that is for stylesheets only. > > we only need a handfull of elements on Element, and Text Nodes. Would > > be pretty small in size. > > I don't think we should add another Domlette to the mix, regardless of > how thin. Why not just have a Python class API that delegates to > cDomlette (through explicit invocation)? Of course this approach would > mean more complex pickling code. I'm not sure if it would be complex. If the domlette supports getstate and setstate would should be able to pickle them. Also, if parsing the Stylesheets was quicker we might not need the pickling... My main focus on the new DOM would be size. Mike > > Another option is to simply use minidom as a base for Stylesheets, > extended with whatever it's missing. This would be best for the PyXML > folks. > > -- > Uche Ogbuji Principal Consultant > uche.ogbuji@fourthought.com +1 303 583 9900 x 101 > Fourthought, Inc. http://Fourthought.com > 4735 East Walnut St, Boulder, CO 80301-2537, USA > XML strategy, XML tools (http://4Suite.org), knowledge management > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com +1 303 583 9900 x 102 Fourthought, Inc. 
http://Fourthought.com 4735 East Walnut St, http://4Suite.org Boulder, CO 80301-2537, USA XML strategy, XML tools, knowledge management From martin@v.loewis.de Sat Nov 24 07:50:59 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Sat, 24 Nov 2001 08:50:59 +0100 Subject: [XML-SIG] PyXML and 4Suite possible conflict In-Reply-To: <3BFEEAF5.CB7F6699@fourthought.com> (message from Mike Olson on Fri, 23 Nov 2001 17:33:57 -0700) References: <200111221506.fAMF6vR24988@localhost.localdomain> <3BFD187B.6EE6A137@fourthought.com> <200111221608.fAMG8Pd08952@paros.informatik.hu-berlin.de> <3BFEEAF5.CB7F6699@fourthought.com> Message-ID: <200111240750.fAO7oxW01640@mira.informatik.hu-berlin.de> > My concern is that 4Suite 12.0 is Python 2.x only and I don;t think you > are ready to remove 1.5.2 support from pyXML. We will see. Backporting xslt to 1.5.2 shouldn't be that difficult. Regards, Martin From ht@cogsci.ed.ac.uk Sat Nov 24 10:19:11 2001 From: ht@cogsci.ed.ac.uk (Henry S. Thompson) Date: 24 Nov 2001 10:19:11 +0000 Subject: [XML-SIG] parsing xml schema In-Reply-To: <3BFCF393.DA092D27@zolera.com> References: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> <200111220816.fAM8GqJ01278@mira.informatik.hu-berlin.de> <3BFCF393.DA092D27@zolera.com> Message-ID: Rich Salz writes: > > Parsing it is no problem; any Python parser will do. Validating it is > > currently not supported in PyXML; you may want to try XSV - see the > > PyXML "other software" page for a reference. > > Unless it's changed a lot in the past two months, I'd avoid XSV -- just > building it was an adventure. :( You should also check out PyTrex, a > tree-oriented validation language. Sorry you had difficulties -- I'd be happy to receive feedback on how things didn't go well for you. Without such feedback, I'm handicapped in trying to improve things. 
It's worth noting that XSV exposes both source and schema in an Infoset-compliant form, and has by far the most flexible support for different approaches to determining which schema documents to use for validation. ht -- Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh W3C Fellow 1999--2001, part-time member of W3C Team 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk URL: http://www.ltg.ed.ac.uk/~ht/ From noreply@sourceforge.net Mon Nov 26 11:45:36 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 26 Nov 2001 03:45:36 -0800 Subject: [XML-SIG] [ pyxml-Bugs-485569 ] xmlproc and property_lexical_handler Message-ID: Bugs item #485569, was opened at 2001-11-26 03:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=485569&group_id=6473 Category: xmlproc Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Lars Marius Garshol (larsga) Summary: xmlproc and property_lexical_handler Initial Comment: when property_lexical_handler is on, a sax should call 'startCDATA' and 'endCDATA' on the property handler before each CDATA xmlproc doesn't call these callbacks. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=485569&group_id=6473 From noreply@sourceforge.net Mon Nov 26 13:01:07 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 26 Nov 2001 05:01:07 -0800 Subject: [XML-SIG] [ pyxml-Bugs-485584 ] expat and property_lexical_handler Message-ID: Bugs item #485584, was opened at 2001-11-26 05:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=485584&group_id=6473 Category: expat Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: expat and property_lexical_handler Initial Comment: when property_lexical_handler is on, a sax parser should call startDTD/endDTD before and after the dtd declaration () pyexpat doesn't ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=106473&aid=485584&group_id=6473 From Sylvain.Thenault@logilab.fr Mon Nov 26 13:09:26 2001 From: Sylvain.Thenault@logilab.fr (Sylvain Thenault) Date: Mon, 26 Nov 2001 14:09:26 +0100 (CET) Subject: [XML-SIG] dom 2 sax events Message-ID: hello, I have written a parser which generates SAX events from a DOM tree. This may seem strange, but it may be usefull sometimes: we are using it in Narval to construct objects from a dom tree, I guess it should also be useful to change a dom tree type (for instance, get minidom from 4dom) or to write tests... Does this interest somebody ? If people are interested, may be should we include it in pyxml ? 
regards -- Sylvain Thenault LOGILAB http://www.logilab.org From Alexandre.Fayolle@logilab.fr Mon Nov 26 13:18:26 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Mon, 26 Nov 2001 14:18:26 +0100 (CET) Subject: [XML-SIG] dom 2 sax events In-Reply-To: Message-ID: On Mon, 26 Nov 2001, Sylvain Thenault wrote: > Does this interest somebody ? > If people are interested, may be should we include it in pyxml ? I can do that. I think that putting it in xml.dom.ext would be a good place. Opinions? Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Juergen Hermann" Message-ID: On Mon, 26 Nov 2001 14:18:26 +0100 (CET), Alexandre Fayolle wrote: >On Mon, 26 Nov 2001, Sylvain Thenault wrote: > >> Does this interest somebody ? >> If people are interested, may be should we include it in pyxml ? > >I can do that. I think that putting it in xml.dom.ext would be a good >place. > >Opinions? Sounds useful. What we should check & decide then is whether we replace any serializing code that is DOM-specific by SAX serializing (which has advantages regarding consistency of output, and eliminates duplicate code). The overhead of the additional function calls is negligable when compared to generating & writing the bytes, I think. Note that the LexicalXMLGenerator I committed yesterday allows to serialize CDATA and comment nodes, and those event calls should be added to Sylvain's code if he doesn't have them. 
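For readers who want to see the shape of the DOM-to-SAX idea in this thread, here is a minimal sketch. It is not Sylvain's code, which is not shown on the list; it assumes a minidom tree, ignores namespaces, and skips comments and CDATA markers because those need a lexical handler. The function name dom_to_sax is hypothetical. It serializes through the SAX-based XMLGenerator, which is the kind of reuse suggested above.

```python
import io
from xml.dom import Node, minidom
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def dom_to_sax(node, handler):
    """Walk a DOM subtree and fire the matching SAX ContentHandler events."""
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        # Attribute names are passed through unchanged (no namespace handling).
        attrs = AttributesImpl(dict(node.attributes.items()))
        handler.startElement(node.tagName, attrs)
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        handler.characters(node.data)
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        handler.processingInstruction(node.target, node.data)
    # Comments and other node types are silently skipped in this sketch.


def serialize(document):
    """Serialize a DOM document by replaying it as SAX events."""
    out = io.StringIO()
    dom_to_sax(document, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()
```

Any ContentHandler can take the place of XMLGenerator, for instance a builder for a different DOM implementation, which is the "change a dom tree type" use mentioned above.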
From Sylvain.Thenault@logilab.fr Mon Nov 26 15:42:44 2001 From: Sylvain.Thenault@logilab.fr (Sylvain Thenault) Date: Mon, 26 Nov 2001 16:42:44 +0100 (CET) Subject: [XML-SIG] dom 2 sax events In-Reply-To: Message-ID: On Mon, 26 Nov 2001, Juergen Hermann wrote: > Note that the LexicalXMLGenerator I committed yesterday allows to > serialize CDATA and comment nodes, and those event calls should be > added to Sylvain's code if he doesn't have them. I have tried to handle feature_namespaces, feature_namespace_prefixes, property_lexical_handler and property_decl_handler, so it call CDATA and comment callbacks. Note that as property_decl_handler isn't yet fully supported (pyexpat doesn't support it, so python dom implementations doesn't have correct DTD information and entity declaration or I missed something) I haven't been able to test it. regards -- Sylvain Thenault LOGILAB http://www.logilab.org From rsalz@zolera.com Mon Nov 26 17:42:40 2001 From: rsalz@zolera.com (Rich Salz) Date: Mon, 26 Nov 2001 12:42:40 -0500 Subject: [XML-SIG] parsing xml schema References: <53A7943A5BD8D411B6930002A5073155013F609F@bgsmsx90.iind.intel.com> <200111220816.fAM8GqJ01278@mira.informatik.hu-berlin.de> <3BFCF393.DA092D27@zolera.com> Message-ID: <3C027F10.462FC294@zolera.com> > > Unless it's changed a lot in the past two months, I'd avoid XSV -- just > > building it was an adventure. :( You should also check out PyTrex, a > > tree-oriented validation language. > > Sorry you had difficulties -- I'd be happy to receive feedback on how > things didn't go well for you. Without such feedback, I'm handicapped > in trying to improve things. I sent detailed notes to Henry, in case anyone cares. 
/r$ -- Zolera Systems, Your Key to Online Integrity Securing Web services: XML, SOAP, Dig-sig, Encryption http://www.zolera.com From Sylvain.Thenault@logilab.fr Mon Nov 26 17:41:02 2001 From: Sylvain.Thenault@logilab.fr (Sylvain Thenault) Date: Mon, 26 Nov 2001 18:41:02 +0100 (CET) Subject: [XML-SIG] (no subject) Message-ID: hello, I wonder when I should call startEntity and endEntity of the lexical handler. Before/after an entity to be resolved? Its declaration? I have been looking on the Megginson web site, but didn't find anything that helps me understand this :( Does anybody have an idea? -- Sylvain Thenault LOGILAB http://www.logilab.org From larsga@garshol.priv.no Mon Nov 26 18:11:37 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 26 Nov 2001 19:11:37 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: References: Message-ID: * Alexandre Fayolle | | How should the lack of prefix be represented? 4DOM uses '' for the prefix | of non-prefixed elements, whereas both PyExpat and xmlproc use None. Good catch! I think the only thing that makes sense for SAX is to be internally consistent (with no-URI namespace names) and use None. Let's make sure we're fully aligned here, however. You are thinking of the following cases, right? 1) Default namespace declaration reported through *PrefixMapping. 2) The values returned by the AttributesNS objects for prefix-less names. Is this correct? I've written a patch to the library documentation and sent it to Fred as I *still* can't access SourceForge properly from home. This patch also fully documents the Attributes and AttributesNS interfaces. Is anyone else still seeing the SourceForge problem, BTW? This is beginning to get difficult for me, so I'll have to do something about it. --Lars M.
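The two cases listed above can be observed with a small recording handler. This is only a sketch against the standard-library expat driver as shipped with current Python 3, not the 2001 PyXML code under discussion; NamespaceRecorder and record_namespaces are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceRecorder(ContentHandler):
    """Records prefix mappings and namespace-qualified names as reported."""

    def __init__(self):
        super().__init__()
        self.mappings = []
        self.elements = []
        self.attributes = []

    def startPrefixMapping(self, prefix, uri):
        # Case 1: a default namespace declaration arrives with some "no prefix" value.
        self.mappings.append((prefix, uri))

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair.
        self.elements.append(name)
        # Case 2: attribute names are (namespace URI, local name) pairs too.
        self.attributes.extend(attrs.getNames())


def record_namespaces(xml_bytes):
    """Parse xml_bytes in namespace mode and return the filled-in recorder."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    recorder = NamespaceRecorder()
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(xml_bytes))
    return recorder
```

Running record_namespaces(b'<doc xmlns="urn:example" id="1"/>') and printing the three lists shows directly which value a given parser driver uses for "no prefix" and "no namespace".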
From Nicolas.Chauvat@logilab.fr Mon Nov 26 18:22:36 2001 From: Nicolas.Chauvat@logilab.fr (Nicolas Chauvat) Date: Mon, 26 Nov 2001 19:22:36 +0100 (CET) Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: Message-ID: > Is anyone else still seeing the SourceForge problem, BTW? This is > beginning to get difficult for me, so I'll have to do something about > it. Do you mean like moving to savannah or asking some kind soul to set up a similar server for python apps ? I did the second a year ago on comp.lang.python and got shot down into burning flames... :-( -- Nicolas Chauvat http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France) From larsga@garshol.priv.no Mon Nov 26 18:28:53 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 26 Nov 2001 19:28:53 +0100 Subject: [XML-SIG] Pyexpat and namespaces In-Reply-To: References: Message-ID: * Alexandre Fayolle | | However, the instance of AttributesNSImpl passed as attrs parameter is | created with the qname argument set to an empty dictionary. As a | consequence, calling getValueByQName, getNameByQName, getQNameByName, | getQNames will lead to an exception or an unexpected result. | | I think this should be noted somewhere in the standard library | documentation too. Agreed. I've updated the patch I sent to Fred. --Lars M. From larsga@garshol.priv.no Mon Nov 26 18:36:26 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 26 Nov 2001 19:36:26 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: References: Message-ID: * Lars Marius Garshol | | Is anyone else still seeing the SourceForge problem, BTW? This is | beginning to get difficult for me, so I'll have to do something | about it. * Nicolas Chauvat | | Do you mean like moving to savannah or asking some kind soul to set | up a similar server for python apps ? More like finding out what's causing the problem for me and trying to fix that, either locally or by contacting SF.
It seems that not everyone has this problem, so presumably something can be done about it. Savannah is in any case GPL-only, which rules out Python. Running a separate server system is something the Python community has done for several years, so if they're not keen on that they probably have good reason not to be. --Lars M. From larsga@garshol.priv.no Mon Nov 26 18:37:52 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 26 Nov 2001 19:37:52 +0100 Subject: [XML-SIG] (no subject) In-Reply-To: References: Message-ID: Hi Sylvain, * Sylvain Thenault | | I wonder when should I call startEntity and endEntity of the lexical | handler. Before/after an entity to be resolved ? declaration ? These two events are supposed to tell you where events begin and end, in case you actually need the information about the entity structure of the XML document. You should call startEntity before you start reporting content from the entity, and endEntity just after the last content event for the entity. --Lars M. From larsga@garshol.priv.no Mon Nov 26 18:40:31 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 26 Nov 2001 19:40:31 +0100 Subject: [XML-SIG] Re: SAX documentation issue Message-ID: * Alexandre Fayolle | | The Sax documentation in the Python Library Reference mentions the | Attributes and the AttributesNS class, but these two classes are not | documented anywhere (or if they are, they are not easy to find). This is kind of messy. They aren't in the standard library, and therefore they are not documented either. The patch I sent to Fred contains added text in the right places that explains this and points to the right places. | I've submitted a bug (#484603) in the Python sf tracker. I've asked Fred to close it. (Would have done it myself, if only...) --Lars M. From martin@v.loewis.de Mon Nov 26 21:42:14 2001 From: martin@v.loewis.de (Martin v. 
Loewis) Date: Mon, 26 Nov 2001 22:42:14 +0100 Subject: [XML-SIG] dom 2 sax events In-Reply-To: (message from Sylvain Thenault on Mon, 26 Nov 2001 14:09:26 +0100 (CET)) References: Message-ID: <200111262142.fAQLgE701468@mira.informatik.hu-berlin.de> > Does this interest somebody ? > If people are interested, may be should we include it in pyxml ? If you are interested in distributing it through PyXML, and willing to maintain it (as in: respond to user questions, fix bugs, perhaps write documentation) - including it in PyXML is fine with me. Regards, Martin From martin@v.loewis.de Mon Nov 26 22:04:45 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Mon, 26 Nov 2001 23:04:45 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: (message from Lars Marius Garshol on 26 Nov 2001 19:36:26 +0100) References: Message-ID: <200111262204.fAQM4jo01543@mira.informatik.hu-berlin.de> > More like finding out what's causing the problem for me and trying to > fix that, either locally or by contacting SF. It seems that not > everyone has this problem, so presumably something can be done about > it. Could you please elaborate the problems again? Maybe somebody can help: What browser, what URL, what phenomenon? > Savannah is in any case GPL-only, which rules out Python. Running a > separate server system is something the Python community has done for > several years, so if they're not keen on that they probably have good > reason not to be. Also, Savannah is SF 2.0, so any problem that you have with the current SF likely also happen with Savannah. 
Regards, Martin From larsga@garshol.priv.no Mon Nov 26 22:26:02 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 26 Nov 2001 23:26:02 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: <200111262204.fAQM4jo01543@mira.informatik.hu-berlin.de> References: <200111262204.fAQM4jo01543@mira.informatik.hu-berlin.de> Message-ID: * Lars Marius Garshol | | More like finding out what's causing the problem for me and trying | to fix that, either locally or by contacting SF. It seems that not | everyone has this problem, so presumably something can be done about | it. * Martin v. Loewis | | Could you please elaborate the problems again? Maybe somebody can help: | What browser, what URL, what phenomenon? The same thing happens in Opera, Mozilla, Netscape, lynx, w3m, wget, etc, even when I telnet from the command-line: [larsga@pc36 python]$ telnet www.sourceforge.net 80 Trying 216.136.171.196... Connected to www.sourceforge.net. Escape character is '^]'. GET / HTTP/1.0 Connection closed by foreign host. SourceForge simply refuses to talk to me; it closes the connection without sending anything at all back. This happens on absolutely *all* pages hosted on SourceForge, including www.saxproject.org. It works occasionally, then reverts to this behaviour. I just successfully downloaded a module from a different machine, using Lynx over SSH, so it has got to be a local problem of some kind. It might be my local provider, but if so this is the first problem of this kind I've had in about 6 months with the same provider. I'll try again tomorrow from work, and see if it's related to the machine, or to my local network at home. | Also, Savannah is SF 2.0, so any problem that you have with the | current SF likely also happen with Savannah. In any case, since only I have the problem, it's unreasonable to ask that we should switch servers. --Lars M. From martin@v.loewis.de Mon Nov 26 22:26:46 2001 From: martin@v.loewis.de (Martin v. 
Loewis) Date: Mon, 26 Nov 2001 23:26:46 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: (message from Alexandre Fayolle on Thu, 22 Nov 2001 18:32:35 +0100 (CET)) References: Message-ID: <200111262226.fAQMQkf01606@mira.informatik.hu-berlin.de> > How should the lack of prefix be represented? 4DOM uses '' for the prefix > of non-prefixed elements, whereas both PyExpat and xmlproc use None. The definition of prefix, on p.40 of http://www.w3.org/TR/2000/PR-DOM-Level-2-Core-20000927 is prefix of type DOMString [p.17] , introduced in DOM Level 2 The namespace prefix [p.103] of this node, or null if it is unspecified. So it should be null, which, in Python, means it should be None. That says nothing about SAX, of course. Regards, Martin From martin@v.loewis.de Mon Nov 26 22:28:01 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Mon, 26 Nov 2001 23:28:01 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: (message from Lars Marius Garshol on 26 Nov 2001 19:11:37 +0100) References: Message-ID: <200111262228.fAQMS1C01609@mira.informatik.hu-berlin.de> > Let's make sure we're fully aligned here, however. You are thinking of > the following cases, right? > > 1) Default namespace declaration reported through *PrefixMapping. > > 2) The values returned by the AttributesNS objects for prefix-less > names. > > Is this correct? There is also 3) value of prefix for elements having the default namespace, both in startElement, and in the Element node. Regards, Martin From martin@v.loewis.de Mon Nov 26 22:50:48 2001 From: martin@v.loewis.de (Martin v. 
Loewis) Date: Mon, 26 Nov 2001 23:50:48 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: (message from Lars Marius Garshol on 26 Nov 2001 23:26:02 +0100) References: <200111262204.fAQM4jo01543@mira.informatik.hu-berlin.de> Message-ID: <200111262250.fAQMomk01774@mira.informatik.hu-berlin.de> > SourceForge simply refuses to talk to me; it closes the connection > without sending anything at all back. This happens on absolutely *all* > pages hosted on SourceForge, including www.saxproject.org. It works > occasionally, then reverts to this behaviour. I see. Did you try using https? I've tried to look at recent bug reports or support requests, and could not find anything similar. If you want to, I can file one; please send the details in this case. I guess originating IP, time of access, exact request would help them to see whether it's on their side - not that they would likely act quickly on it. Regards, Martin From noreply@sourceforge.net Tue Nov 27 04:14:31 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 26 Nov 2001 20:14:31 -0800 Subject: [XML-SIG] [ pyxml-Patches-485882 ] Crash on valid XML Message-ID: Patches item #485882, was opened at 2001-11-26 20:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=485882&group_id=6473 Category: DOM Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Crash on valid XML Initial Comment: There is a bug in xmlproc.py:start_tag. In particular,
    self.pos=self.pos+1 # Skips the '<'
    name=self._get_name()
    self.skip_ws()
    ....
    if self.data[self.pos]!=">" and self.data[self.pos]!="/":
This last line may get an IndexError since skip_ws may have advanced the cursor past the end of the available data. The attached patch fixes the problem.
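The attached patch itself is not part of this archive. As an illustration of the failure mode only, here is a minimal stand-alone sketch of a cursor-based scanner that checks for the end of the buffer after skipping whitespace instead of indexing blindly. TagScanner and OutOfData are hypothetical names and are not xmlproc's actual classes.

```python
class OutOfData(Exception):
    """Hypothetical signal that the buffer ended before the construct did."""


class TagScanner:
    """Tiny cursor over a string buffer (illustration, not xmlproc code)."""

    WHITESPACE = " \t\r\n"

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def skip_ws(self):
        # May legitimately leave self.pos == len(self.data).
        while self.pos < len(self.data) and self.data[self.pos] in self.WHITESPACE:
            self.pos += 1

    def at_tag_end(self):
        """Return True if the next non-space character is '>' or '/'."""
        self.skip_ws()
        if self.pos >= len(self.data):
            # Without this check, self.data[self.pos] would raise IndexError.
            raise OutOfData("buffer ended after whitespace; more data needed")
        return self.data[self.pos] in ">/"
```

An incremental parser can catch such a signal, wait for the next chunk of input, and retry, which is why a dedicated exception is more useful here than a bare IndexError.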
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=485882&group_id=6473 From noreply@sourceforge.net Tue Nov 27 04:17:22 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 26 Nov 2001 20:17:22 -0800 Subject: [XML-SIG] [ pyxml-Patches-485883 ] Crash on valid XML Message-ID: Patches item #485883, was opened at 2001-11-26 20:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=485883&group_id=6473 Category: DOM Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Mitchell (markmitchell) Assigned to: Nobody/Anonymous (nobody) Summary: Crash on valid XML Initial Comment: The start_tag method in xmlproc occasionally crashes. Note that this method calls skip_ws and then accesses self.data[self.pos] without the safety of a try block. An IndexError can occur, which should result in an OutOfDataException, but does not. A patch is attached. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=485883&group_id=6473 From Alexandre.Fayolle@logilab.fr Tue Nov 27 08:46:15 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 27 Nov 2001 09:46:15 +0100 (CET) Subject: [XML-SIG] Re: [XML-checkins]CVS: xml/xml/xpath __init__.py,1.4,1.5 In-Reply-To: <200111262136.fAQLa4901461@mira.informatik.hu-berlin.de> Message-ID: On Mon, 26 Nov 2001, Martin v. Loewis wrote: > > added missing exception > > Can you please elaborate why this is necessary? SyntaxException cannot > occur in PyXML, since XPathParser isn't used. In a number of files in xml.xslt, xpath.SyntaxException is used. Importing these files without the patch will cause an ImportError or a NameError. Maybe the patch is not the most appropriate fix for the problem, in which case I'll be glad to revert it and fix the bug differently.
See also http://mail.python.org/pipermail/xml-sig/2001-November/006643.html for my original post about the bug. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Tue Nov 27 08:54:13 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 27 Nov 2001 09:54:13 +0100 (CET) Subject: [XML-SIG] dom 2 sax events In-Reply-To: <200111262142.fAQLgE701468@mira.informatik.hu-berlin.de> Message-ID: On Mon, 26 Nov 2001, Martin v. Loewis wrote: > > Does this interest somebody ? > > If people are interested, may be should we include it in pyxml ? > > If you are interested in distributing it through PyXML, and willing to > maintain it (as in: respond to user questions, fix bugs, perhaps write > documentation) - including it in PyXML is fine with me. Logilab will maintain the module. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Tue Nov 27 08:59:36 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 27 Nov 2001 09:59:36 +0100 (CET) Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: <200111262228.fAQMS1C01609@mira.informatik.hu-berlin.de> Message-ID: On Mon, 26 Nov 2001, Martin v. Loewis wrote: > > Let's make sure we're fully aligned here, however. You are thinking of > > the following cases, right? > > > > 1) Default namespace declaration reported through *PrefixMapping. > > > > 2) The values returned by the AttributesNS objects for prefix-less > > names. > > > > Is this correct? > > There is also > > 3) value of prefix for elements having the default namespace, > both in startElement, and in the Element node. Agreed. Alexandre Fayolle -- LOGILAB, Paris (France). 
http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From Alexandre.Fayolle@logilab.fr Tue Nov 27 09:02:24 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Tue, 27 Nov 2001 10:02:24 +0100 (CET) Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: <200111262226.fAQMQkf01606@mira.informatik.hu-berlin.de> Message-ID: On Mon, 26 Nov 2001, Martin v. Loewis wrote: > > How should the lack of prefix be represented? 4DOM uses '' for the prefix > > of non-prefixed elements, whereas both PyExpat and xmlproc use None. > > The definition of prefix, on p.40 of > http://www.w3.org/TR/2000/PR-DOM-Level-2-Core-20000927 is > > prefix of type DOMString [p.17] , introduced in DOM Level 2 > The namespace prefix [p.103] of this node, or null if it is > unspecified. > > So it should be null, which, in Python, means it should be None. > > That says nothing about SAX, of course. OK. If we agree on this mapping, I think I can fix minidom and 4DOM by the end of the week. The impact on XPath and Xslt should be small. There are already some tests in the Printers to see if a ':' should be generated, so there again, there should be little impact. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From noreply@sourceforge.net Tue Nov 27 12:47:12 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 27 Nov 2001 04:47:12 -0800 Subject: [XML-SIG] [ pyxml-Patches-485982 ] setup.py catches wrong exception Message-ID: Patches item #485982, was opened at 2001-11-27 04:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=485982&group_id=6473 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Lars Marius Garshol (larsga) Assigned to: Martin v. 
Löwis (loewis) Summary: setup.py catches wrong exception Initial Comment: When checking for sys.byteorder, setup.py catches the wrong exception. Python 1.5.2 throws AttributeError, not NameError, and therefore the installer fails on Python 1.5.2. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=485982&group_id=6473 From larsga@garshol.priv.no Tue Nov 27 19:47:00 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 27 Nov 2001 20:47:00 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: <200111262250.fAQMomk01774@mira.informatik.hu-berlin.de> References: <200111262204.fAQM4jo01543@mira.informatik.hu-berlin.de> <200111262250.fAQMomk01774@mira.informatik.hu-berlin.de> Message-ID: * Lars Marius Garshol | | SourceForge simply refuses to talk to me; it closes the connection | without sending anything at all back. This happens on absolutely | *all* pages hosted on SourceForge, including www.saxproject.org. It | works occasionally, then reverts to this behaviour. * Martin v. Loewis | | I see. Did you try using https? Ah, good thinking. I did try that last night (after seeing your email), and that did work, even though everything else failed. Today, however, it works again. As if there had never been a problem in the first place. I have no idea why, but now I at least have a solution if the problem should come back. | I've tried to look at recent bug reports or support requests, and | could not find anything similar. If you want to, I can file one; | [...] Thanks for offering, but it doesn't seem to be necessary any more. --Lars M. 
(now happy again) From larsga@garshol.priv.no Tue Nov 27 19:49:14 2001 From: larsga@garshol.priv.no (Lars Marius Garshol) Date: 27 Nov 2001 20:49:14 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: <200111262228.fAQMS1C01609@mira.informatik.hu-berlin.de> References: <200111262228.fAQMS1C01609@mira.informatik.hu-berlin.de> Message-ID: * Martin v. Loewis | | There is also | | 3) value of prefix for elements having the default namespace, | both in startElement, and in the Element node. The prefix has no independent parameter, so I don't see how this can apply to startElementNS. If there is no prefix the qname parameter just won't contain one. It will not be None, however, since the element type name will be there. Or am I as messed up as my HTTP connections? --Lars M. From martin@v.loewis.de Tue Nov 27 19:58:59 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Tue, 27 Nov 2001 20:58:59 +0100 Subject: [XML-SIG] Re: [XML-checkins]CVS: xml/xml/xpath __init__.py,1.4,1.5 In-Reply-To: (message from Alexandre Fayolle on Tue, 27 Nov 2001 09:46:15 +0100 (CET)) References: Message-ID: <200111271958.fARJwxY01408@mira.informatik.hu-berlin.de> > Maybe the patch is not the most adapted fix to the problem, in which case > I'll be glad to revert it and fix the bug differently. See also > http://mail.python.org/pipermail/xml-sig/2001-November/006643.html for my > original post about the bug. Associating SyntaxException with yappsrt.SyntaxError may be closer to the truth. Unfortunately, these two exceptions carry different parameters, so I don't really know what the best solution is. Regards, Martin From martin@v.loewis.de Tue Nov 27 20:01:52 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Tue, 27 Nov 2001 21:01:52 +0100 Subject: [XML-SIG] dom 2 sax events In-Reply-To: (message from Alexandre Fayolle on Tue, 27 Nov 2001 09:54:13 +0100 (CET)) References: Message-ID: <200111272001.fARK1qY01435@mira.informatik.hu-berlin.de> > Logilab will maintain the module. 
Excellent! Would you like to commit it, or, Sylvain, would you like to get PyXML write access (in which case I'd need your SF account name)? Regards, Martin From martin@v.loewis.de Tue Nov 27 20:52:55 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Tue, 27 Nov 2001 21:52:55 +0100 Subject: [XML-SIG] gray areas in Python SAX API In-Reply-To: (message from Lars Marius Garshol on 27 Nov 2001 20:49:14 +0100) References: <200111262228.fAQMS1C01609@mira.informatik.hu-berlin.de> Message-ID: <200111272052.fARKqtm01597@mira.informatik.hu-berlin.de> > | 3) value of prefix for elements having the default namespace, > | both in startElement, and in the Element node. > > The prefix has no independent parameter, so I don't see how this can > apply to startElementNS. If there is no prefix the qname parameter > just won't contain one. It will not be None, however, since the > element type name will be there. > > Or am I as messed up as my HTTP connections? You are right, of course. prefix is relevant only on DOM Element, not in SAX handlers. Regards, Martin From m_mariappanX@trillium.com Wed Nov 28 07:25:50 2001 From: m_mariappanX@trillium.com (Mariappan, MaharajanX) Date: Tue, 27 Nov 2001 23:25:50 -0800 Subject: [XML-SIG] parsing xml schema Message-ID: <53A7943A5BD8D411B6930002A5073155013F60C2@bgsmsx90.iind.intel.com> Hi All, I still can't locate a suitable parser library for parsing XML Schema files, even after going through the docs and the mailing list archives. Maharajan -----Original Message----- From: Rich Salz [mailto:rsalz@zolera.com] Sent: Monday, November 26, 2001 11:13 PM Cc: Martin v. Loewis; m_mariappanX@trillium.com; xml-sig@python.org Subject: Re: [XML-SIG] parsing xml schema > > Unless it's changed a lot in the past two months, I'd avoid XSV -- just > > building it was an adventure. :( You should also check out PyTrex, a > > tree-oriented validation language. > > Sorry you had difficulties -- I'd be happy to receive feedback on how > things didn't go well for you.
Without such feedback, I'm handicapped > in trying to improve things. I sent detailed notes to Henry, in case anyone cares. /r$ -- Zolera Systems, Your Key to Online Integrity Securing Web services: XML, SOAP, Dig-sig, Encryption http://www.zolera.com From Sylvain.Thenault@logilab.fr Wed Nov 28 08:35:15 2001 From: Sylvain.Thenault@logilab.fr (Sylvain Thenault) Date: Wed, 28 Nov 2001 09:35:15 +0100 (CET) Subject: [XML-SIG] dom 2 sax events In-Reply-To: <200111272001.fARK1qY01435@mira.informatik.hu-berlin.de> Message-ID: On Tue, 27 Nov 2001, Martin v. Loewis wrote: > > Logilab will maintain the module. > > Excellent! Would you like to commit it, or, Sylvain, would you like to get > PyXML write access (in which case I'd need your SF account name)? That's fine for me, my new SF user-id is 387525 (syt) -- Sylvain Thenault LOGILAB http://www.logilab.org From phillipg@prism.co.za Wed Nov 28 12:23:05 2001 From: phillipg@prism.co.za (Phillip Gibb) Date: Wed, 28 Nov 2001 14:23:05 +0200 Subject: [XML-SIG] problem installing PyXML-0.6.6 Message-ID: windows 2000 python2.2 problem when running python setup.py or python setup.py build or python setup.py install I get : python.exe - Entry Point Not Found The procedure entry point _PyUnicode_IsWhitespace could not be located in the dynamic link library python22.dll tks Phill ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This footnote also confirms that this email message has been swept by MIMEsweeper for the presence of computer viruses.
www.mimesweeper.com ********************************************************************** From phillipg@prism.co.za Wed Nov 28 12:55:03 2001 From: phillipg@prism.co.za (Phillip Gibb) Date: Wed, 28 Nov 2001 14:55:03 +0200 Subject: [XML-SIG] PyXML-0.6.6.win32-py2.1.exe install problem Message-ID: I have installed python2.2b2 and b1 then when I try to install PyXML-0.6.6.win32-py2.1.exe and it asks me which python installation to use: it offers no choice and no option to enter a path to python. tks Phill From Alexandre.Fayolle@logilab.fr Wed Nov 28 13:01:38 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 28 Nov 2001 14:01:38 +0100 (CET) Subject: [XML-SIG] PyXML-0.6.6.win32-py2.1.exe install problem In-Reply-To: Message-ID: On Wed, 28 Nov 2001, Phillip Gibb wrote: > I have installed python2.2b2 and b1 > > then when I try to install PyXML-0.6.6.win32-py2.1.exe and it asks me which > python installation to use: it offers no choice and no option to enter a > path to python. The py2.1 in the filename tells you that this installer is meant for python 2.1, and not 2.2. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL).
From Alexandre.Fayolle@logilab.fr Wed Nov 28 13:02:17 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Wed, 28 Nov 2001 14:02:17 +0100 (CET) Subject: [XML-SIG] problem installing PyXML-0.6.6 In-Reply-To: Message-ID: On Wed, 28 Nov 2001, Phillip Gibb wrote: > windows 2000 > python2.2 I'm not sure that python 2.2 is supported by PyXML. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From dkgunter@lbl.gov Wed Nov 28 15:04:20 2001 From: dkgunter@lbl.gov (Dan Gunter) Date: Wed, 28 Nov 2001 07:04:20 -0800 Subject: [XML-SIG] problem installing PyXML-0.6.6 References: Message-ID: <3C04FCF4.3050801@lbl.gov> For what it's worth, I installed PyXML-0.6.6 with Python 2.2b2 on Linux with no problems (well, from the tarball with no problems -- the rpm whined). Alexandre Fayolle wrote: > On Wed, 28 Nov 2001, Phillip Gibb wrote: > > >>windows 2000 >>python2.2 >> > > I'm not sure that python 2.2 is supported by PyXML. > > Alexandre Fayolle > From martin@v.loewis.de Wed Nov 28 16:44:29 2001 From: martin@v.loewis.de (Martin v. Loewis) Date: Wed, 28 Nov 2001 17:44:29 +0100 Subject: [XML-SIG] problem installing PyXML-0.6.6 In-Reply-To: (message from Alexandre Fayolle on Wed, 28 Nov 2001 14:02:17 +0100 (CET)) References: Message-ID: <200111281644.fASGiTp01316@mira.informatik.hu-berlin.de> > On Wed, 28 Nov 2001, Phillip Gibb wrote: > > > windows 2000 > > python2.2 > > I'm not sure that python 2.2 is supported by PyXML. It is, but you have to compile from source. Regards, Martin From m_mariappanX@trillium.com Fri Nov 30 10:28:07 2001 From: m_mariappanX@trillium.com (Mariappan, MaharajanX) Date: Fri, 30 Nov 2001 02:28:07 -0800 Subject: [XML-SIG] documentation on DOM objects Message-ID: <53A7943A5BD8D411B6930002A5073155013F60CE@bgsmsx90.iind.intel.com> Hi Folks, 1) Can we use DOM object to parse the xml file[written based on xml schema]? 
2) Where can I get more info on manipulating the DOM tree? Examples would be the best way to understand. TIA, Maharajan From noreply@sourceforge.net Fri Nov 30 16:59:43 2001 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 30 Nov 2001 08:59:43 -0800 Subject: [XML-SIG] [ pyxml-Patches-487590 ] pDomlette removeAttributeNS typo Message-ID: Patches item #487590, was opened at 2001-11-30 08:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=487590&group_id=6473 Category: 4Suite Group: None Status: Open Resolution: None Priority: 5 Submitted By: Alexandre Fayolle (afayolle) Assigned to: Mike Olson (mikeolson) Summary: pDomlette removeAttributeNS typo Initial Comment:
     def removeAttributeNS(self,namespaceURI,localname):
    -    node = self.getAttributeNodeNS(namespacesURI,localname)
    +    node = self.getAttributeNodeNS(namespaceURI,localname)
         if node:
             self.removeAttributeNode(node)
         return
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=306473&aid=487590&group_id=6473 From Alexandre.Fayolle@logilab.fr Fri Nov 30 17:03:38 2001 From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle) Date: Fri, 30 Nov 2001 18:03:38 +0100 (CET) Subject: [XML-SIG] Node.prefix Message-ID: I've committed changes to 4DOM and minidom that should change the representation of "no prefix" from '' to None. I have not had time to check whether this has an impact on xml.xpath/xml.xslt. Cheers, Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL).
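As a quick way to see the convention described above, the following sketch lists element prefixes using the minidom that ships with current Python 3 (an assumption; the 2001 commit itself is not reproduced here). element_prefixes and qualified_name are hypothetical helper names.

```python
from xml.dom import minidom


def element_prefixes(xml_text):
    """Return (tagName, prefix) for every element, in document order."""
    result = []

    def visit(node):
        if node.nodeType == node.ELEMENT_NODE:
            result.append((node.tagName, node.prefix))
            for child in node.childNodes:
                visit(child)

    visit(minidom.parseString(xml_text).documentElement)
    return result


def qualified_name(prefix, local_name):
    """Build a qualified name, emitting ':' only when a prefix is present.

    Testing truthiness treats both None and '' as "no prefix", so code
    written this way is unaffected by which of the two a DOM uses.
    """
    return prefix + ":" + local_name if prefix else local_name
```

For a document such as '<doc xmlns="urn:a" xmlns:b="urn:b"><b:item/></doc>', the unprefixed doc element is the interesting case: its prefix attribute shows which "no prefix" value the DOM implementation uses.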
From Mike.Olson@fourthought.com Fri Nov 30 17:25:50 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Fri, 30 Nov 2001 10:25:50 -0700 Subject: [XML-SIG] Node.prefix References: Message-ID: <3C07C11E.37EFDA7E@fourthought.com> Alexandre Fayolle wrote: > > I've committed changes to 4DOM and minidom that should change the > representation of "no prefix" from '' to None. I have not had time to > check whether this has an impact on xml.xpath/xml.xslt. I've been working on this change on my branch. When I merge early next week xslt and xpath should work... Mike > > Cheers, > > Alexandre Fayolle > -- > LOGILAB, Paris (France). > http://www.logilab.com http://www.logilab.fr http://www.logilab.org > Narval, the first software agent available as free software (GPL). > > _______________________________________________ > XML-SIG maillist - XML-SIG@python.org > http://mail.python.org/mailman/listinfo/xml-sig -- Mike Olson Principal Consultant mike.olson@fourthought.com +1 303 583 9900 x 102 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, http://4Suite.org Boulder, CO 80301-2537, USA XML strategy, XML tools, knowledge management