From boris at codesynthesis.com Wed Jun 11 16:47:11 2008
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Wed, 11 Jun 2008 14:47:11 +0000 (UTC)
Subject: [Expat-discuss] Namespace attributes w/ namespace processing enabled
References:
Message-ID:

Hi Bob,

"Bob Lee" writes:

> Is there any way to tell Expat to still include namespace attributes when
> namespace processing is enabled?

I don't think there is an option to do this though you probably
can fake it by providing the namespace declaration handlers (see
XML_SetNamespaceDeclHandler) and reconstructing the namespace
declaration attributes from there.

Boris

--
Boris Kolpackov, Code Synthesis Tools
http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:
http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing:
http://codesynthesis.com/products/xsde

From crazybob at crazybob.org Wed Jun 11 17:36:24 2008
From: crazybob at crazybob.org (Bob Lee)
Date: Wed, 11 Jun 2008 08:36:24 -0700
Subject: [Expat-discuss] Namespace attributes w/ namespace processing enabled
In-Reply-To:
References:
Message-ID:

That's what I figured. Thanks, Boris!

Bob

On Wed, Jun 11, 2008 at 7:47 AM, Boris Kolpackov wrote:
> Hi Bob,
>
> "Bob Lee" writes:
>
> > Is there any way to tell Expat to still include namespace attributes when
> > namespace processing is enabled?
>
> I don't think there is an option to do this though you probably
> can fake it by providing the namespace declaration handlers (see
> XML_SetNamespaceDeclHandler) and reconstructing the namespace
> declaration attributes from there.
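
The workaround Boris describes can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard-library binding to Expat (xml.parsers.expat) because it is easy to run, whereas the question concerns the C API. In that binding, StartNamespaceDeclHandler corresponds to the start handler that XML_SetNamespaceDeclHandler registers in C. The helper name parse_with_ns_attrs and the decision to re-emit declarations under "xmlns" / "xmlns:prefix" keys are hypothetical choices, not something Expat provides. The sketch relies on Expat calling the namespace declaration handler before the start-element handler of the element that carries the declaration.

```python
import xml.parsers.expat


def parse_with_ns_attrs(data):
    """Parse with namespace processing on and rebuild the xmlns attributes.

    Returns a list of (element_name, attributes) tuples in document order.
    """
    # A non-None separator (second argument) enables namespace processing;
    # element names are then reported as "<uri><separator><local-name>".
    parser = xml.parsers.expat.ParserCreate(None, " ")
    pending = {}   # declarations seen since the last start tag
    elements = []

    def start_ns(prefix, uri):
        # prefix is None for a default namespace declaration (xmlns="...").
        name = "xmlns" if prefix is None else "xmlns:" + prefix
        pending[name] = uri

    def start_element(name, attrs):
        # Merge the buffered declarations back into the attribute dict.
        merged = dict(attrs)
        merged.update(pending)
        pending.clear()
        elements.append((name, merged))

    parser.StartNamespaceDeclHandler = start_ns
    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    return elements


if __name__ == "__main__":
    doc = b'<a:root xmlns:a="urn:x" xmlns="urn:d" id="1"><child/></a:root>'
    for name, attrs in parse_with_ns_attrs(doc):
        print(name, attrs)
```

The same buffering idea should carry over to C: collect the prefix/URI pairs in the namespace declaration start handler and attach them to the next start-element callback.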

From josel2820 at gmail.com Thu Jun 12 06:06:28 2008
From: josel2820 at gmail.com (Jose Luis _)
Date: Wed, 11 Jun 2008 23:06:28 -0500
Subject: [Expat-discuss] Problem with (ubuntu-eclipse cdt) and Expat
Message-ID:

Hello,

I am trying to use Expat from Eclipse CDT on Ubuntu, but it is not
recognized: the build says it cannot find the include file "xmlparse.h".
I need to load an XML file from a class in C++. If someone has examples,
please send them to josel2820 at gmail.com.

Thanks in advance.

From altaf.navalur at gmail.com Tue Jun 17 08:16:29 2008
From: altaf.navalur at gmail.com (Altaf Navalur)
Date: Tue, 17 Jun 2008 11:46:29 +0530
Subject: [Expat-discuss] Need help compiling version 1.95.4 on VS 2005
Message-ID:

I am trying to compile Expat v1.95.4 on a PC. I am getting a lot of
errors when I compile the code. I have to use this version because I am
porting a game and the game uses the old version. If I replace the old
version with a new one, the game crashes.

Could anyone please help with this?

Thanks in advance,
Altaf

--
An eye for an eye only ends up making the whole world blind.

From sfogoros at hsc.unt.edu Fri Jun 20 19:02:04 2008
From: sfogoros at hsc.unt.edu (Steve Fogoros)
Date: Fri, 20 Jun 2008 12:02:04 -0500
Subject: [Expat-discuss] I need help with error message 'xml declaration not at start of external entity'
References: <485B8F0D.C2A1.0037.0@hsc.unt.edu>
Message-ID: <485B9C3B.C2A1.0037.0@hsc.unt.edu>

Hello,

I'm using the PHP module XML Parser under PHP version 4.4.0.

I get the referenced error due to new lines before the preamble.

I've searched the error message and reviewed the W3C spec for
information on XML parsing. I haven't found anything that explicitly
states what a parser should do about leading white space outside of the
XML document. I've also noted that there are many failures of this type
reported on WordPress and RSS feed forums. In all cases, the correction
seems to be altering the provider application to submit the XML document
without any leading characters before the preamble.

My question is: does the XML spec explicitly specify that there be
nothing other than the preamble at the beginning of a well-formed XML
doc? Is this something that should/could be addressed in the parser (it
sure would eliminate a lot of failed implementations)?

Thanks,
Steve

** Confidentiality Notice: This e-mail and any files transmitted with it
are confidential to the extent permitted by law and intended solely for
the use of the individual or entity to whom they are addressed. If you
have received this e-mail in error please notify the originator of the
message and destroy all copies. **

From nickmacd at gmail.com Mon Jun 23 14:52:12 2008
From: nickmacd at gmail.com (Nick MacDonald)
Date: Mon, 23 Jun 2008 08:52:12 -0400
Subject: [Expat-discuss] I need help with error message 'xml declaration not at start of external entity'
In-Reply-To: <485B9C3B.C2A1.0037.0@hsc.unt.edu>
References: <485B8F0D.C2A1.0037.0@hsc.unt.edu> <485B9C3B.C2A1.0037.0@hsc.unt.edu>
Message-ID:

Perhaps you're not reading the same XML spec I am, because to me it is
ABSOLUTELY clear that whitespace is not allowed to come before the XML
declaration:

The primary rule states this:

    document ::= prolog element Misc*

Note that a prolog is defined as so:

    prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

Which says the XMLDecl is optional, but if present, it would be defined
as so:

    XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

and since whitespace (the term 'S' as used in this ruleset) does not
appear to be mentioned until the end of the XMLDecl, it makes it
pretty clear it's not allowed at the beginning.

If you don't like this behaviour, you'd be better off lobbying the W3C
to change the spec, but as it stands, eXpat is quite clearly enforcing
the well-formedness rules for an XML document.

Note that the rules also make it quite clear that you don't HAVE to
have an XMLDecl, and thus without it your document can have as much
initial whitespace as makes you happy... (I have tested this with
eXpat and it works fully as expected.)

Nick

On Fri, Jun 20, 2008 at 1:02 PM, Steve Fogoros wrote:
> I'm using the PHP module XML Parser under PHP version 4.4.0.
>
> I get the referenced error due to new lines before the preamble.
>
> I've searched the error message and reviewed the w3c spec for
> information on xml parsing. I haven't found anything that explicitly
> states what a parser should do about leading white space outside of the
> xml document. I've also noted that there are many failures of this type
> reported on WordPress and RSS feed forums. In all cases, the correction
> seems to be altering the provider application to submit the xml document
> without any leading characters before the preamble.
>
> My question is: does the xml spec explicitly specify that there be
> nothing other than the preamble at the beginning of a well formed xml
> doc? Is this something that should/could be addressed in the parser (it
> sure would eliminate a lot of failed implementations)?
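
The behaviour discussed above can be checked with a small sketch. It assumes Python's standard-library xml.parsers.expat binding rather than the PHP module from the original question, so treat it as an illustration of Expat's behaviour and not of PHP's wrapper. The helper name expat_error_code is hypothetical; it returns None when a document parses and Expat's numeric error code otherwise.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def expat_error_code(data):
    """Parse data with Expat; return None on success, else the error code."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        return exc.code
    return None


# Numeric code Expat reports when an XML declaration is not at the start.
MISPLACED_XML_DECL = errors.codes[errors.XML_ERROR_MISPLACED_XML_PI]

# Declaration is the very first thing in the document.
with_decl = b'<?xml version="1.0"?>\n<doc/>'
# Same document with blank lines in front of the declaration.
padded_with_decl = b"\n\n" + with_decl
# Leading whitespace, but no declaration at all.
padded_without_decl = b"\n\n  <doc/>"

if __name__ == "__main__":
    for label, data in [
        ("declaration first", with_decl),
        ("whitespace, then declaration", padded_with_decl),
        ("whitespace, no declaration", padded_without_decl),
        ("leading whitespace stripped", padded_with_decl.lstrip()),
    ]:
        print(label, "->", expat_error_code(data))
```

Only the second case is rejected. Stripping the leading whitespace before handing the data to the parser, as in the last case, is the fix on the producing or consuming application's side that the thread mentions.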

--
Nick MacDonald
NickMacD at gmail.com

From sfogoros at hsc.unt.edu Mon Jun 23 17:30:29 2008
From: sfogoros at hsc.unt.edu (Steve Fogoros)
Date: Mon, 23 Jun 2008 10:30:29 -0500
Subject: [Expat-discuss] I need help with error message 'xml declaration not at start of external entity'
In-Reply-To:
References: <485B8F0D.C2A1.0037.0@hsc.unt.edu> <485B9C3B.C2A1.0037.0@hsc.unt.edu>
Message-ID: <485F7B45.C2A1.0037.0@hsc.unt.edu>

Thanks Nick,

I thought that too based on sections 2.1 and 2.8 of Extensible Markup
Language (XML) 1.0 (Second Edition) at
http://www.w3.org/TR/2000/REC-xml-20001006#NT-document . I was having
trouble with the single quotes around the XMLDecl declaration. I've
never seen that in a formal grammar and didn't want to assume it meant
that nothing comes before the prolog, if it exists.

I looked more closely and found section 2.4 of the specification does
address my question and I believe states that whitespace is allowed
before the XML declaration:

Quoted from URL referenced above:

Text consists of intermingled character data and markup. [Definition:
Markup takes the form of start-tags, end-tags, empty-element tags,
entity references, character references, comments, CDATA section
delimiters, document type declarations, processing instructions, XML
declarations, text declarations, and any white space that is at the top
level of the document entity (that is, outside the document element and
not inside any other markup).]

Nick, do you read this the same way I do? And, in case I haven't
researched completely, has it been superseded in version 1.1?

Thanks again for validating my assumptions.
I think I will pass this on to the maintainers of expat.

Steve Fogoros

>>> "Nick MacDonald" 6/23/2008 7:52 AM >>>
Perhaps you're not reading the same XML spec I am, because to me it is
ABSOLUTELY clear that whitespace is not allowed to come before the XML
declaration:

The primary rule states this:

    document ::= prolog element Misc*

Note that a prolog is defined as so:

    prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

Which says the XMLDecl is optional, but if present, it would be defined
as so:

    XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

and since whitespace (the term 'S' as used in this ruleset) does not
appear to be mentioned until the end of the XMLDecl, it makes it
pretty clear it's not allowed at the beginning.

If you don't like this behaviour, you'd be better off lobbying the W3C
to change the spec, but as it stands, eXpat is quite clearly enforcing
the well-formedness rules for an XML document.

Note that the rules also make it quite clear that you don't HAVE to
have an XMLDecl, and thus without it your document can have as much
initial whitespace as makes you happy... (I have tested this with
eXpat and it works fully as expected.)

Nick

From nickmacd at gmail.com Tue Jun 24 16:46:58 2008
From: nickmacd at gmail.com (Nick MacDonald)
Date: Tue, 24 Jun 2008 10:46:58 -0400
Subject: [Expat-discuss] Fwd: I need help with error message 'xml declaration not at start of external entity'
In-Reply-To:
References: <485B8F0D.C2A1.0037.0@hsc.unt.edu> <485B9C3B.C2A1.0037.0@hsc.unt.edu> <485F7B45.C2A1.0037.0@hsc.unt.edu>
Message-ID:

[Sorry, forgot to copy the rest of the list]

---------- Forwarded message ----------
From: Nick MacDonald
Date: Tue, Jun 24, 2008 at 10:46 AM
Subject: Re: [Expat-discuss] I need help with error message 'xml declaration not at start of external entity'
To: Steve Fogoros

Steve:

Careful to be sure you're reading what I wrote, and not what you want
to see... the Backus-Naur Form rules in the current XML spec are very
clear that whitespace is NOT ALLOWED where you wish it were.

http://www.w3.org/TR/2006/REC-xml-20060816/
http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form

The only rules you need to read are right here:

2.1 Well-Formed XML Documents
[Definition: A textual object is a well-formed XML document if:]
1. Taken as a whole, it matches the production labeled document.
2. It meets all the well-formedness constraints given in this specification.
3. Each of the parsed entities which is referenced directly or
   indirectly within the document is well-formed.

Document
[1]  document    ::= prolog element Misc*

[3]  S           ::= (#x20 | #x9 | #xD | #xA)+

[22] prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25] Eq          ::= S? '=' S?
[26] VersionNum  ::= '1.0'
[27] Misc        ::= Comment | PI | S

What you'd be looking for is a way to get some S (3) into the front of
a valid expansion of document (1) that begins with an XMLDecl. I can't
see any way where this can happen. The text elsewhere in the document
is not really specifically applicable to your case... there is certainly
nothing more normative than the explicit BNF grammar provided, so you
have to follow its rules. You're confused by single quotes in the BNF?
Just assume they're double quotes if that's less confusing... the
expectation is that they are literal strings showing the exact
characters you would need to supply at that point in the grammar.

Nothing is wrong with eXpat... it is acting exactly as the spec dictates.

Nick

On Mon, Jun 23, 2008 at 11:30 AM, Steve Fogoros wrote:
> I thought that too based on sections 2.1 and 2.8 of Extensible Markup
> Language (XML) 1.0 (Second Edition) at
> http://www.w3.org/TR/2000/REC-xml-20001006#NT-document . I was having
> trouble with the single quotes around the XMLDecl declaration. I've
> never seen that in a formal grammar and didn't want to assume it meant
> that nothing comes before the prolog, if it exists.
>
> I looked more closely and found section 2.4 of the specification does
> address my question and I believe states that whitespace is allowed
> before the XML specification:
>
> Quoted from URL referenced above:
>
> Text consists of intermingled character data and markup. [Definition:
> Markup takes the form of start-tags, end-tags, empty-element tags,
> entity references, character references, comments, CDATA section
> delimiters, document type declarations, processing instructions, XML
> declarations, text declarations, and any white space that is at the top
> level of the document entity (that is, outside the document element and
> not inside any other markup).]
>
> Nick, do you read this the same way I do? And, in case I haven't
> researched completely, has it been superseded in version 1.1?
>
> Thanks again for validating my assumptions. I think I will pass this on
> to the maintainers of expat
> Steve Fogoros

--
Nick MacDonald
NickMacD at gmail.com

From sfogoros at hsc.unt.edu Tue Jun 24 20:12:13 2008
From: sfogoros at hsc.unt.edu (Steve Fogoros)
Date: Tue, 24 Jun 2008 13:12:13 -0500
Subject: [Expat-discuss] Fwd: I need help with error message 'xml declaration not at start of external en
In-Reply-To:
References: <485B8F0D.C2A1.0037.0@hsc.unt.edu> <485B9C3B.C2A1.0037.0@hsc.unt.edu> <485F7B45.C2A1.0037.0@hsc.unt.edu>
Message-ID: <4860F2AD.C2A1.0037.0@hsc.unt.edu>

Nick,

Thank you for taking the time to help me work this out. I think in doing
this kind of work, it's the nature of the beast to 'read what I want to
see ...', and I am applying due diligence rather than have you do my
work for me.

I misinterpreted the quotes on production [23] as a formal declaration
that between the ' and the < at the beginning of '<?xml' there can be no
characters. While this is in fact true, it doesn't mean what I thought
it meant in terms of no white space before XMLDecl. It is simply the
literal string at the start of the XMLDecl declaration (as you said).

I agree the BNF rules should be all that is needed, but I find it is
inconsistent with the Recommendation's descriptions regarding
whitespace. For example, production [39] element ::= EmptyElemTag |
STag content ETag, and production [40] STag ::= '<' Name (S Attribute)*
S? '>', formally do not allow white space to exist as markup, yet the
Recommendation's descriptions of white space as markup within the
document clearly allow and seem to encourage it (Section 2.10). Would it
be true to say that current parsers that allow white space as markup
within the document are not compliant with the recommendation since the
BNF rule prohibits it?

I've also found where the recommendation clearly states that "Each XML
document has one entity called the document entity, which serves as the
starting point for the XML processor and may contain the whole
document." (Section 4). Section 2.4 describes white space as allowed at
the top level of the document entity and specifically outside the
document element (production [1]).

I haven't found any description that specifically prohibits white space
prior to XMLDecl.

I trust I'm reading this correctly?

And, so I am clear about why I am looking this closely at the spec, we
(UNT Health Science Center, Academic Information Services) are
implementing application handling of XML data and ran into the issue of
white space prior to XMLDecl failing the application. I can fix the
application or the XML Parser, but which one I fix depends on which one
is broken. So before I start changing the application, I want to make
sure it isn't the XML Parser. But from what I'm finding in the
recommendation, I think the parser needs to be fixed, or the
recommendation should be more concise. I can go either way.

Steve Fogoros

>>> "Nick MacDonald" 6/24/2008 9:46 AM >>>
Steve:

Careful to be sure you're reading what I wrote, and not what you want
to see... the Backus-Naur Form rules in the current XML spec are very
clear that whitespace is NOT ALLOWED where you wish it were.

http://www.w3.org/TR/2006/REC-xml-20060816/
http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form

The only rules you need to read are right here:

2.1 Well-Formed XML Documents
[Definition: A textual object is a well-formed XML document if:]
1. Taken as a whole, it matches the production labeled document.
2. It meets all the well-formedness constraints given in this specification.
3. Each of the parsed entities which is referenced directly or
   indirectly within the document is well-formed.

Document
[1]  document    ::= prolog element Misc*

[3]  S           ::= (#x20 | #x9 | #xD | #xA)+

[22] prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25] Eq          ::= S? '=' S?
[26] VersionNum  ::= '1.0'
[27] Misc        ::= Comment | PI | S

What you'd be looking for is a way to get some S (3) into the front of
a valid expansion of document (1). I can't see any way where this can
happen. The text elsewhere in the document is not really specifically
applicable to your case... there is certainly nothing more normative
than the explicit BNF grammar provided, so you have to follow its
rules. You're confused by single quotes in the BNF? Just assume
they're double quotes if that's less confusing... the expectation is
that they are literal strings showing the exact characters you would
need to supply at that point in the grammar.

Nothing is wrong with eXpat... it is acting exactly as the spec dictates.

Nick

--
Nick MacDonald
NickMacD at gmail.com
_______________________________________________
Expat-discuss mailing list
Expat-discuss at libexpat.org
http://mail.libexpat.org/mailman/listinfo/expat-discuss

From nickmacd at gmail.com Wed Jun 25 15:15:12 2008
From: nickmacd at gmail.com (Nick MacDonald)
Date: Wed, 25 Jun 2008 09:15:12 -0400
Subject: [Expat-discuss] Fwd: I need help with error message 'xml declaration not at start of external en
In-Reply-To: <4860F2AD.C2A1.0037.0@hsc.unt.edu>
References: <485B8F0D.C2A1.0037.0@hsc.unt.edu> <485B9C3B.C2A1.0037.0@hsc.unt.edu> <485F7B45.C2A1.0037.0@hsc.unt.edu> <4860F2AD.C2A1.0037.0@hsc.unt.edu>
Message-ID:

Steve:

I don't want to sound pedantic, but when it comes to specs and most
especially BNF grammars, I think you have to be. So with that mindset,
I think you need to think "what would happen if I lifted the BNF
grammar out of the XML spec, put it into a code generation tool and made
my own XML parser from that generated code?" By my read, if you did
that, you would get the behaviour that eXpat exhibits. (See [1] below.)

(All that follows is simply my opinion, as I interpret the XML spec...
I am not a member of any XML spec writing organization.)

I want to be clear that the declaration "tag" is not just another XML
tag like any other in your document, it is very special...
it's more of a file signature than anything else, thus the reason why it
needs to appear very specifically at the beginning of the file. (See
[2] below.) Whitespace and XML comments are allowed at any point in
the body of the document, but the body of the document does not start
until AFTER the XML declaration... It's important to note that you do
not need to include the XML declaration in the file, but if you choose
to, it needs to be the very first characters in the file, no whitespace
or comments before it. If it is omitted, then any whitespace or
comments are allowed in the rest of the XML document, and so could
appear at the top of a file without the XML declaration.

Perhaps your applications that produce white space before the XML
declaration could simply omit the declaration? (Or you might have a
special program that pre-formats the files, and fixes them up to make
them compliant?)

I hope this makes it clear... your applications are what appear to be
broken... not the XML spec, and not eXpat.

Nick

[1] I have no information on what tool, if any, was used to actually
produce eXpat... in all honesty I have barely ever looked at the
actual eXpat source... I am a true user in this case... I use it
because it solves my problems... and it's never done anything to me to
make me think it was broken, so I never looked into its internals. In
fact, I have never run into your specific case before, because I am
the type of XML user who uses XML as human generated input (usually
config or command files) and I always copy from a template file that
has always just worked. That file has no space at the beginning...
but when I added it to try a test case based on your question, I saw
the same behaviour you're questioning.

[2] On Unix, there is the "file" command, and it looks for file
signatures. It is configured by being told what and where to look for
things, and if it finds them, that is how it "types" a file.
If the file could have a random amount of whitespace at the beginning, this would make it harder to identify an XML file using this tool. On Tue, Jun 24, 2008 at 2:12 PM, Steve Fogoros wrote: > Nick, > > Thank you for taking the time to help me work this out. I think in doing > this kind of work, it's the nature of the beast to 'read what I want to see > ...', and I am applying due diligence rather than have you do my work for > me. > > I misinterpreted the quotes on production [23] as a formal declaration that > between the ' and the < at the beginning of ' characters. While this is in fact true, it doesn't mean what I thought it > meant in terms of no white space before XMLDoc. It is simply the system > literal of the start of the XMLDoc declaration (as you said). > > I agree the BNF rules should be all that is needed, but I find it is > inconsistent with the Recommendation's descriptions regarding whitespace. In > example, production [39] element ::= EmptyElemTag | STag content ETag, and > production [40] STag ::= '<' Name (S Attribute)* S? '>', formally do not > allow white space to exist as markup, yet, the Recommendation's descriptions > on white space as markup within the document clearly allow and seem to > encourage it (Section 2.10). Would it be true to say that current parsers > that allow white space as markup within the document are not compliant with > the recommendation since the BNF rule prohibits it? > > I've also found where the recommendation clearly states that "Each XML > document has one entity called the document entity, which serves as the > starting point for the XML processor and may contain the whole document." > (Section 4), Section 2.4 describes white space as allowed at the top level > of the document entity and specifically outside the document element > (production [1]). > > I haven't found any description that specifically prohibits white space > prior to XMLDoc. > > I trust I'm reading this correctly? 
> > And, so I am clear about why I am looking this close at the spec, we (UNT > Health Science Center, Academic Information Services) are implementing > application handling of XML data and ran into the issue of white space prior > to XMLDoc failing the application. I can fix the application or the XML > Parser, but which one I fix depends on which one is broken. So before I > start changing the application, I want to make sure it isn't the XML Parser. > But from what I'm finding in the recommendation, I think the parser needs to > be fixed, or the recommendation should be more concise. I can go either way. > > Steve Fogoros > >>>> "Nick MacDonald" 6/24/2008 9:46 AM >>> > Steve: > > Careful to be sure you're reading what I wrote, and not what you want > to see... the Backus-Naur Form rules in the current XML spec are very > clear that whitespace is NOT ALLOWED where you wish they were. > > http://www.w3.org/TR/2006/REC-xml-20060816/ > http://en.wikipedia.org/wiki/Backus? Naur_Form > > The only rules you need to read are right here: > > 2.1 Well-Formed XML Documents > [Definition: A textual object is a well-formed XML document if:] > 1.Taken as a whole, it matches the production labeled document. > 2. It meets all the well-formedness constraints given in this specification. > 3. Each of the parsed entities which is referenced directly or > indirectly within the document is well-formed. > > Document > [1] document ::= prolog element Misc* > > [3] S ::= (#x20 | #x9 | #xD | #xA)+ > > [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)? > [23] XMLDecl ::= ' SDDecl? S? '?>' > [24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | > '"' > VersionNum '"') > [25] Eq ::= S? '=' S? > [26] VersionNum ::= '1.0' > [27] Misc ::= Comment | PI | S > > What you'd be looking for is a way to get some S (3) into the front of > a valid expansion of document (1). I can't see any way where this can > happen. 
> The text elsewhere in the document is not really specifically applicable to your case... there is certainly nothing more normative than the explicit BNF grammar provided, so you have to follow its rules. You're confused by single quotes in the BNF? Just assume they're double quotes if that's less confusing... the expectation is that they are literal strings showing the exact characters you would need to supply at that point in the grammar.
>
> Nothing is wrong with eXpat... it is acting exactly as the spec dictates.
>
> Nick
>
> On Mon, Jun 23, 2008 at 11:30 AM, Steve Fogoros wrote:
>> I thought that too based on sections 2.1 and 2.8 of Extensible Markup Language (XML) 1.0 (Second Edition) at http://www.w3.org/TR/2000/REC-xml-20001006#NT-document . I was having trouble with the single quotes around the XMLDecl declaration. I've never seen that in a formal grammar and didn't want to assume it meant that nothing comes before the prolog, if it exists.
>>
>> I looked more closely and found section 2.4 of the specification does address my question and I believe states that whitespace is allowed before the XML declaration:
>>
>> Quoted from URL referenced above:
>>
>> Text consists of intermingled character data and markup. [Definition: Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space that is at the top level of the document entity (that is, outside the document element and not inside any other markup).]
>>
>> Nick, do you read this the same way I do? And, in case I haven't researched completely, has it been superseded in version 1.1?
>>
>> Thanks again for validating my assumptions.
>> I think I will pass this on to the maintainers of expat
>> Steve Fogoros
>>
>> >>> "Nick MacDonald" 6/23/2008 7:52 AM >>>
>> Perhaps you're not reading the same XML spec I am, because to me it is ABSOLUTELY clear that whitespace is not allowed to come before the XML declaration:
>>
>> The primary rule states this:
>>
>> document ::= prolog element Misc*
>>
>> Note that a prolog is defined as so:
>>
>> prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
>>
>> Which says the XMLDecl is optional, but if present, it would be defined as so:
>>
>> XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
>>
>> and since whitespace (the term 'S' as used in this ruleset) does not appear to be mentioned until the end of the XMLDecl, it makes it pretty clear it's not allowed at the beginning.
>>
>> If you don't like this behaviour, you'd be better off lobbying the W3C to change the spec, but as it stands, eXpat is quite clearly enforcing the well-formedness rules for an XML document.
>>
>> Note that the rules also make it quite clear that you don't HAVE to have an XMLDecl, and thus without it your document can have as much initial whitespace as makes you happy... (I have tested this with eXpat and it works fully as expected.)
>>
>> Nick
>>
>> On Fri, Jun 20, 2008 at 1:02 PM, Steve Fogoros wrote:
>>> I'm using the PHP module XML Parser under PHP version 4.4.0.
>>>
>>> I get the referenced error due to new lines before the preamble.
>>>
>>> I've searched the error message and reviewed the w3c spec for information on xml parsing. I haven't found anything that explicitly states what a parser should do about leading white space outside of the xml document. I've also noted that there are many failures of this type reported on WordPress and RSS feed forums. In all cases, the correction seems to be altering the provider application to submit the xml document without any leading characters before the preamble.
>>>
>>> My question is: does the xml spec explicitly specify that there be nothing other than the preamble at the beginning of a well formed xml doc? Is this something that should/could be addressed in the parser (it sure would eliminate a lot of failed implementations)?
>>
>> --
>> Nick MacDonald
>> NickMacD at gmail.com
>>
>> ** Confidentiality Notice: This e-mail and any files transmitted with it are confidential to the extent permitted by law and intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error please notify the originator of the message and destroy all copies. **

--
Nick MacDonald
NickMacD at gmail.com

From maxim2000 at tut.by Mon Jun 16 19:04:57 2008
From: maxim2000 at tut.by (М.В.)
Date: Mon, 16 Jun 2008 17:04:57 -0000
Subject: [Expat-discuss] Parsing copyright symbol
Message-ID:

Hello!

I have a problem with eXpat 2.0.1. I'm sure my question is stupid, but I need the answer :) So, I ask you to help. The problem is reading/writing a copyright symbol (code 0xae) from/to an xml file. I tried to use a CDATA section, but it didn't help.
XmlContentTok() returns XML_TOK_INVALID. Should I use UTF-16 instead of UTF-8? But UTF-16 is not desirable for me... Maybe some extra defines or handlers?

Thanks in advance,
Maxim

P.S. Please answer to maxim2000 at tut.by

From sfogoros at hsc.unt.edu Fri Jun 20 18:05:49 2008
From: sfogoros at hsc.unt.edu (Steve Fogoros)
Date: Fri, 20 Jun 2008 11:05:49 -0500
Subject: [Expat-discuss] I need help with error message 'xml declaration not at start of external entity'
Message-ID: <485B8F0D.C2A1.0037.0@hsc.unt.edu>

Hello,

I'm using the PHP module XML Parser under PHP version 4.4.0.

I get the referenced error due to new lines before the preamble.

I've searched the error message and reviewed the w3c spec for information on xml parsing. I haven't found anything that explicitly states what a parser should do about leading white space outside of the xml document. I've also noted that there are many failures of this type reported on WordPress and RSS feed forums. In all cases, the correction seems to be altering the provider application to submit the xml document without any leading characters before the preamble.

My question is: does the xml spec explicitly specify that there be nothing other than the preamble at the beginning of a well formed xml doc? Is this something that should/could be addressed in the parser (it sure would eliminate a lot of failed implementations)?

Thanks,
Steve
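
For anyone who wants to reproduce the behaviour discussed in this thread, here is a minimal sketch. It assumes Python's standard xml.parsers.expat binding (which wraps Expat) rather than the PHP XML Parser module from the original report; the helper name try_parse and the sample documents are made up for illustration only.

```python
from xml.parsers import expat
from xml.parsers.expat import errors


def try_parse(data: bytes) -> str:
    """Parse data with a fresh Expat parser (hypothetical helper).

    Returns "ok" on success, otherwise Expat's message for the error code.
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError as exc:
        return errors.messages[exc.code]
    return "ok"


# Sample XML declaration, chosen only for this illustration.
DECL = b'<?xml version="1.0"?>'

# Declaration at the very start of the document.
print(try_parse(DECL + b"<doc/>"))

# Newlines before the declaration, as in the original report.
print(try_parse(b"\n\n" + DECL + b"<doc/>"))

# No declaration at all: leading whitespace matches Misc* in the prolog.
print(try_parse(b"\n\n<doc/>"))
```

With the Expat builds I am aware of, the first and third calls should succeed and the second should report the misplaced-declaration error, which matches Nick's reading of productions [1], [22] and [23]. A provider application that cannot avoid emitting leading newlines could, as a workaround on its own side, strip them before handing the data to the parser.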