[Tutor] can I walk or glob a website?
Albert-Jan Roskam
fomcl at yahoo.com
Wed May 18 19:39:51 CEST 2011
________________________________
From: Alan Gauld <alan.gauld at btinternet.com>
To: tutor at python.org
Sent: Wed, May 18, 2011 4:40:19 PM
Subject: Re: [Tutor] can I walk or glob a website?
"Dave Angel" <davea at ieee.org> wrote
>> "Albert-Jan Roskam" <fomcl at yahoo.com> wrote
>>> How can I walk (as in os.walk) or glob a website?
>>
>> I don't think there is a way to do that via the web.
> It has to be (more or less) possible. That's what Google does for their search
> engine.
Google trawls the site following links. If that's all he wants then it's fairly
easy.
I took it he wanted to actually trawl the server getting *all* the pdf files not
just the published pdfs...
Depends what the real requirement is.
===> No, I meant only the published ones. I would consider it somewhat
dodgy/unethical/whatever-you-wanna-call-it to download unpublished stuff. Indeed
I only need published data.
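
For the published-links case, the "follow the links" approach can be sketched roughly as below. This is only a minimal illustration, not a tested crawler: the names LinkParser, fetch_html and find_pdfs are made up for this example, it assumes the PDFs are reachable through plain <a href="..."> links on the same host, and it stops after max_pages pages so it cannot run away on a large site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Return the page text, or "" if it cannot be read or is not HTML."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return ""
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return ""


def find_pdfs(start_url, fetch=fetch_html, max_pages=100):
    """Breadth-first walk of same-host links; return the PDF URLs found."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pdfs = set()
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            # Resolve relative links and drop any "#fragment" part.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc != host:
                continue  # stay on the starting host
            if urlparse(link).path.lower().endswith(".pdf"):
                pdfs.add(link)
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(pdfs)
```

Calling find_pdfs("http://www.example.com/") would return a sorted list of PDF URLs that are linked from pages on that host. Only linked (published) files can be found this way, since the walk never sees anything that no page points to. The fetch argument is there so the walk can be tried against canned pages without touching the network.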
Alan G.