sgmllib and entityref handling in python2.0

Richard Brodie R.Brodie at rl.ac.uk
Tue May 22 05:39:58 EDT 2001


"Petar Karafezov" <petar at metamarkets.com> wrote in message
news:mailman.990470849.31270.python-list at python.org...

> This would match anything that looks like entityref and that ends on a
> non-alphabetic or numeric character. The W3C docs described that entityrefs
> end on ';' and so does the python docs when talking how SGMLParser works.

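Python's behaviour is correct: in SGML the semicolon is optional, except where
it is needed to resolve an ambiguity (for example, when the next character
could otherwise be read as part of the entity name). Browser parsing is rather
flaky, so the W3C recommend always including the semicolon, and made it
mandatory in XML.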

You see a lot of fragments like the one you quoted, but they are incorrect,
and the W3C validator will choke on them. See also:

http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2
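To make the point concrete, here is a minimal sketch of a matcher that treats
the semicolon as optional, in the spirit of the behaviour described above. The
pattern and the helper name find_entityrefs are my own illustration, not the
actual sgmllib source, and the URL in the last example is made up. It uses only
the standard re module.

```python
import re

# Hypothetical pattern: '&', then a name, then an optional ';'.
# This is an illustration only, not the regex sgmllib itself uses.
ENTITYREF = re.compile(r'&([a-zA-Z][-.a-zA-Z0-9]*)(;?)')


def find_entityrefs(text):
    """Return (name, has_semicolon) for everything that looks like an
    entity reference in text."""
    return [(m.group(1), m.group(2) == ';')
            for m in ENTITYREF.finditer(text)]


# A properly terminated reference.
print(find_entityrefs('AT&amp;T'))

# No semicolon, but the following space ends the name unambiguously.
print(find_entityrefs('a &lt b'))

# An unescaped '&' in a (made-up) URL: '&copy' looks like an entity
# reference, which is why such fragments should be written with '&amp;'.
print(find_entityrefs('<a href="x.cgi?a=1&copy=2">'))
```

The last case is the kind of fragment that trips parsers up: the ampersand was
meant as a query-string separator, but a parser that accepts references
without the semicolon has every right to read it as the start of one.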





