Web-crawling

John J. Lee jjl at pobox.com
Sat Oct 4 12:26:31 EDT 2003


"John Bradbury" <john_bradbury at ___cableinet.co.uk> writes:

> "Rene Pijlman" <reply.in.the.newsgroup at my.address.is.invalid> wrote in
> message news:bretnvcng69nqpoeug71jon4obs0moe63f at 4ax.com...
> > John Bradbury:
> > >I am trying to develop a special purpose crawler using htmllib & urllib.
> > >How do you tell the server application that you are a modern browser
> > >and can handle frames?
[...]
> > server would care, but you could mimic the User-agent header sent by a
[...]
> I don't know what is causing the problem, but the site I am accessing is
> sending out forms for a browser that has a low resolution and does not
> support frames.  Excuse my ignorance, but where do you set up the
> User-agent header you suggested?

For urllib2 (well, almost):

http://wwwsearch.sourceforge.net/ClientCookie/doc.html#headers
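
As a rough illustration of the idea, here is a minimal sketch using urllib.request, the Python 3 successor to urllib2, rather than ClientCookie itself. The URL, the User-Agent string, and the helper names build_request and build_opener are placeholders chosen for this example and are not from the thread; whether a given server changes its output based on this header is an assumption you would have to test.

```python
import urllib.request

# Hypothetical values for illustration; neither appears in the original thread.
URL = "http://www.example.com/"
USER_AGENT = "Mozilla/5.0 (compatible; ExampleCrawler/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def build_opener(user_agent=USER_AGENT):
    """Return an opener that sends the same User-Agent on every request."""
    opener = urllib.request.build_opener()
    # Overwrite the default header list, which identifies the client
    # as "Python-urllib/x.y".
    opener.addheaders = [("User-agent", user_agent)]
    return opener


request = build_request(URL)
# Request normalises header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))

# To actually fetch the page (needs network access):
# with urllib.request.urlopen(request) as response:
#     html = response.read()
```

The per-request form suits a one-off fetch; the opener form is probably more convenient for a crawler, since every page it opens then carries the same header.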


John



