Pulling out <TITLE></TITLE>

Bengt Richter bokr at accessone.com
Thu Nov 22 05:18:21 EST 2001


On Wed, 21 Nov 2001 23:42:51 -0800, Brett Cannon <bac at OCF.Berkeley.EDU> wrote:

>Could use negative lookahead and lookbehinds.  Another solution is to just
>strip out all comments from the HTML.  Probably wouldn't hurt, anyway,
>since it will probably increase performance slightly by cutting down on
>the number of tags to deal with.
It's probably useful to note that there must be exactly one legal <TITLE></TITLE>
in an HTML doc, and it must be in the <HEAD></HEAD> section. Once you've
found it, you're done.
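
A minimal sketch of the "stop once you've found it" idea, assuming a modern
Python with the standard html.parser module. The names TitleFinder, _Done and
extract_title are made up for illustration, and for simplicity the sketch does
not check that the title actually sits inside <HEAD>. Because the parser
treats comments separately, a <TITLE> inside a comment is ignored.

```python
from html.parser import HTMLParser


class _Done(Exception):
    """Raised to abandon parsing as soon as the title is complete."""


class TitleFinder(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> arrives as "title".
        if tag == "title":
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            raise _Done  # nothing after the title matters


def extract_title(html):
    """Return the first title's text, or None if there is no title."""
    finder = TitleFinder()
    try:
        finder.feed(html)
        finder.close()
    except _Done:
        return "".join(finder.parts).strip()
    return None
```

Calling extract_title() on a page whose comment contains a fake title should
return only the real one, and a page with no title gives None.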

But the real question is what the operational requirement is. Seems like
he is making a command line tool to generate an index (to put in DB?
to put into a monolithic HTML page indexing and linking to all the pages?
to put in a hierarchical frame-wrapped version of that? etc?) of
HTML pages (all in a single directory? specified by a list of glob expressions?).

Or he might want a cron job and keep track of file mod dates to avoid reprocessing?
(Hm, wonder about using make cleverly)...

You never know what people are up to ;-)

>
>But it is also illegal syntax, I believe, to embed tags within a comment.
>
Nope. It's ok. I think the standard will move to the XML definition of
a comment, even though you probably will want to handle the illegal but
widely accepted error of including more than one '-' in a row between
the '<!--' and the '-->' (the '--' is illegal in XML for compatibility
with SGML).

From the XML spec:
--
2.5 Comments

Comments may appear anywhere in a document outside other markup; in addition,
they may appear within the document type declaration at places allowed by the
grammar. They are not part of the document's character data; an XML processor
may, but need not, make it possible for an application to retrieve the text
of comments. For compatibility, the string "--" (double-hyphen) must not occur
within comments. 

Comments
[15] 
Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'


An example of a comment: 

  <!-- declarations for <head> & <body> -->
--

The example shows it's ok to embed tags in comments.
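
For what it's worth, production [15] translates almost directly into a Python
regular expression. This is only a sketch: [^-] stands in for (Char - '-') and
is slightly broader than the XML Char class, and the names STRICT_COMMENT,
LENIENT_COMMENT and strip_comments are invented here. The lenient pattern is
one possible way to tolerate the common '--' inside a comment by taking
everything up to the first '-->'.

```python
import re

# '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# Either a non-hyphen, or a single hyphen followed by a non-hyphen.
STRICT_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")

# Tolerant variant: anything (including newlines) up to the first '-->'.
LENIENT_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(text, pattern=LENIENT_COMMENT):
    """Remove every comment matched by the given pattern."""
    return pattern.sub("", text)


spec_example = "<!-- declarations for <head> & <body> -->"
sloppy = "<!-- a -- b -->"

print(bool(STRICT_COMMENT.fullmatch(spec_example)))   # tags inside are fine
print(bool(STRICT_COMMENT.fullmatch(sloppy)))         # '--' inside is rejected
print(bool(LENIENT_COMMENT.fullmatch(sloppy)))        # tolerant pattern accepts it
```

Stripping comments first with strip_comments() means a commented-out
<TITLE> can't be mistaken for the real one by a later plain regex search.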


