Pulling out <TITLE></TITLE>

Brett Cannon bac at OCF.Berkeley.EDU
Sun Nov 18 23:45:44 EST 2001


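You could just read each page into a string and use a regex to fetch the
title (re.search returns None if a page has no title, so check for that
before calling group):

import re
# page is assumed to hold the HTML text of one file; re.S lets the title span lines
title_value = re.search(r'<title>(?P<title>.*?)</title>', page, re.I | re.S)
title_value.group('title')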
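
For a batch of pages, the same regex idea could be wrapped in a small helper.
This is only a sketch: the names TITLE_RE and extract_title are made up for
illustration, and it assumes each page has already been read into a string.

```python
import re

# Compiled once and reused for every page. re.I matches <TITLE> as well as
# <title>, re.S lets the title text span several lines, and [^>]* tolerates
# attributes on the opening tag.
TITLE_RE = re.compile(r'<title[^>]*>(?P<title>.*?)</title>', re.I | re.S)

def extract_title(html):
    """Return the title text with whitespace collapsed, or None if absent.

    Hypothetical helper, not part of any library.
    """
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse newlines and runs of spaces inside the title.
    return ' '.join(match.group('title').split())
```

Calling extract_title on the contents of each file in turn and skipping the
None results would give the catalog. A regex like this will not cope with
every malformed page, so spot-checking the output is worthwhile.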

-Brett C.



On Sun, 18 Nov 2001, David A McInnis wrote:

> I am writing a script to catalog about 30,000 html pages on my site and need
> to pull out the value of <TITLE></TITLE>.
>
> I guess this is possible with htmllib, but I cannot figure it out.
>
> Thanks,
> David
>
>
>



