simple download manager

Kiuhnm gandalf23 at mail.com
Tue Nov 4 19:38:45 EST 2014


On Tuesday, November 4, 2014 4:10:59 PM UTC+1, Kiuhnm wrote:
> On Tuesday, November 4, 2014 4:00:51 PM UTC+1, Chris Angelico wrote:
> > On Wed, Nov 5, 2014 at 1:53 AM, Kiuhnm <gandalf23 at mail.com> wrote:
> > > I wish to automate downloading from a particular site which has some ads and which requires clicking a lot of buttons before the download starts.
> > >
> > > What library should I use to handle HTTP?
> > > Also, I need to support big files (> 1 GB) so the library should hand the data to me chunk by chunk.
> > 
> > You may be violating the site's terms of service, so be aware of what
> > you're doing.
> > 
> > This could be a really simple job (just figure out what the last HTTP
> > query is, and replicate that), or it could be insanely complicated
> > (crypto, JavaScript, and/or timestamped URLs could easily be
> > involved). To start off, I would recommend not writing a single line
> > of Python code, but just pulling up Mozilla Firefox with Firebug, or
> > Google Chrome with in-built inspection tools, or some equivalent, and
> > watching the exact queries that go through. Once you figure out what
> > queries are happening, you can figure out how to do them in Python.
> > 
> > ChrisA
> 
> It'll be tricky. I'm sure of that, but if the browser can do it, so can I :)
> Fortunately, there are no captchas.

There are no captchas, but the site is behind Cloudflare (DDoS protection).
Anyway, I now know what to do. To deal with Cloudflare's JavaScript challenge I'm going to use jsdb, a neat little JavaScript interpreter.
By the way, I'm using requests instead of urllib, but I still need to figure out how to download big files and write them to disk.
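
For the big-file part, requests can stream the response body instead of loading it all into memory. Below is a minimal sketch, assuming a reasonably recent version of requests. The function name download_to_file, the 1 MiB chunk size, and the 30-second timeout are illustrative choices, not anything taken from the site or from this thread.

```python
import requests


def download_to_file(url, dest_path, chunk_size=1024 * 1024, session=None):
    """Stream `url` to `dest_path` chunk by chunk and return the bytes written.

    `session` is optional; anything with a requests-style get() works.
    """
    http = session or requests
    bytes_written = 0
    # stream=True defers fetching the body until we iterate over it,
    # so a file larger than 1 GB never has to fit in memory.
    with http.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # fail early on 4xx/5xx responses
        with open(dest_path, "wb") as out:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out.write(chunk)
                bytes_written += len(chunk)
    return bytes_written
```

Called with only a URL and a destination path, this goes through the module-level requests.get. Passing a requests.Session as session should keep any cookies collected by earlier requests, which is probably what a multi-step download flow needs.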


