[Expat-discuss] I need help with error message 'xml declaration not at start of external entity'

Steve Fogoros sfogoros at hsc.unt.edu
Mon Jun 23 17:30:29 CEST 2008


Thanks Nick,
 
I thought that too based on sections 2.1 and 2.8 of Extensible Markup
Language (XML) 1.0 (Second Edition) at
http://www.w3.org/TR/2000/REC-xml-20001006#NT-document . I was having
trouble with the single quotes around the XMLDecl declaration. I've
never seen that in a formal grammar and didn't want to assume it meant
that nothing comes before the prolog, if it exists.
 
I looked more closely and found that section 2.4 of the specification
does address my question, and I believe it states that whitespace is
allowed before the XML declaration:
 
Quoted from URL referenced above:
 
Text consists of intermingled character data and markup. [Definition:
Markup takes the form of start-tags, end-tags, empty-element tags,
entity references, character references, comments, CDATA section
delimiters, document type declarations, processing instructions, XML
declarations, text declarations, and any white space that is at the top
level of the document entity (that is, outside the document element and
not inside any other markup).]

Nick, do you read this the same way I do? And, in case I haven't
researched completely, has it been superseded in version 1.1?
 
Thanks again for validating my assumptions. I think I will pass this on
to the maintainers of expat.

Steve Fogoros

>>> "Nick MacDonald" <nickmacd at gmail.com> 6/23/2008 7:52 AM >>>
Perhaps you're not reading the same XML spec I am, because to me it is
ABSOLUTELY clear that whitespace is not allowed to come before the XML
declaration:

The primary rule states this:

document   ::=   prolog  element  Misc*

Note that a prolog is defined like so:

prolog   ::=   XMLDecl? Misc* (doctypedecl  Misc*)?

Which says the XMLDecl is optional, but if present, it is defined
like so:

XMLDecl   ::=   '<?xml' VersionInfo  EncodingDecl? SDDecl? S? '?>'

and since whitespace (the term 'S' as used in this ruleset) is not
mentioned until the end of the XMLDecl, and nothing precedes XMLDecl
in the prolog rule, it makes it pretty clear it's not allowed at the
beginning.

If you don't like this behaviour, you'd be better off lobbying the W3C
to change the spec, but as it stands, eXpat is quite clearly enforcing
the well-formedness rules for an XML document.

Note that the rules also make it quite clear that you don't HAVE to
have an XMLDecl, and thus without it your document can have as much
initial whitespace as makes you happy...  (I have tested this with
eXpat and it works fully as expected.)
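
The behaviour described above can be reproduced with a short script. This
is only a sketch, assuming Python's xml.parsers.expat module (a binding to
Expat) rather than the PHP XML Parser module from the original question.
The helper name try_parse and the sample documents are made up for
illustration.

```python
from xml.parsers import expat
from xml.parsers.expat import errors


def try_parse(text):
    """Parse text with a fresh Expat parser.

    Returns None if the document parses, otherwise the Expat error code.
    (try_parse is a hypothetical helper, not part of Expat.)
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except expat.ExpatError as exc:
        return exc.code
    return None


# Hypothetical sample documents.
with_decl = "<?xml version='1.0'?><doc/>"
newline_then_decl = "\n" + with_decl   # whitespace before the XMLDecl
newline_no_decl = "\n\n  <doc/>"       # whitespace, but no XMLDecl at all

for label, text in [
    ("declaration first", with_decl),
    ("newline before declaration", newline_then_decl),
    ("leading whitespace, no declaration", newline_no_decl),
]:
    code = try_parse(text)
    status = "ok" if code is None else errors.messages[code]
    print(label, "->", status)
```

Only the second document should be rejected, with Expat's "declaration
not at start" error; the exact wording of the message depends on the
Expat version in use.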

Nick


On Fri, Jun 20, 2008 at 1:02 PM, Steve Fogoros <sfogoros at hsc.unt.edu> wrote:
> I'm using the PHP module XML Parser under PHP version 4.4.0.
>
> I get the referenced error due to new lines before the preamble.
>
> I've searched the error message and reviewed the W3C spec for
> information on xml parsing. I haven't found anything that explicitly
> states what a parser should do about leading white space outside of
> the xml document. I've also noted that there are many failures of this
> type reported on WordPress and RSS feed forums. In all cases, the
> correction seems to be altering the provider application to submit the
> xml document without any leading characters before the preamble.
>
> My question is: does the xml spec explicitly specify that there be
> nothing other than the preamble at the beginning of a well formed xml
> doc? Is this something that should/could be addressed in the parser (it
> sure would eliminate a lot of failed implementations)?
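
Where the provider application cannot be changed, a consumer could strip
the leading whitespace itself before handing the text to the parser. The
following is a minimal sketch of that idea, assuming Python's
xml.parsers.expat binding; the names strip_leading_whitespace and
parse_leniently are hypothetical, and this pre-processing is a deliberate
leniency on the consumer's side, not something Expat or the XML spec
provides.

```python
from xml.parsers import expat

# The four characters matched by the XML 1.0 'S' (white space) production.
XML_WHITESPACE = " \t\r\n"


def strip_leading_whitespace(text):
    """Drop whitespace that precedes the first markup in the document."""
    return text.lstrip(XML_WHITESPACE)


def parse_leniently(text):
    """Parse text after stripping leading whitespace.

    Returns the element names in document order. Raises expat.ExpatError
    if the cleaned document is still not well formed.
    """
    names = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(strip_leading_whitespace(text), True)
    return names


# Hypothetical feed with blank lines before the XML declaration.
feed = "\n\n<?xml version='1.0'?><feed><item/></feed>"
print(parse_leniently(feed))
```

The stripping happens outside the parser, so Expat itself still sees a
document that begins with the XML declaration.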

-- 
Nick MacDonald
NickMacD at gmail.com 





