Unpythonic Python

David Abrahams dave at boost-consulting.com
Wed Aug 25 15:24:35 EDT 2004


Rob Williscroft <rtw at freenet.co.uk> writes:

> David Abrahams wrote in news:uy8k31as1.fsf at boost-consulting.com in
> comp.lang.python: 
>
>> Rob Williscroft <rtw at freenet.co.uk> writes:
>> 
>>> David Abrahams wrote in news:uzn4j2s38.fsf at boost-consulting.com in
>>> comp.lang.python: 
>>>
>>>>> That's not the problem.  I can download the file reliably from
>>>>> other machines. 
>>>
>>> At the same time, using http ?
>> 
>> I can download the file reliably using IE from my WinXP box.
>> 
>> I can download the file reliably using urllib from Cygwin Python 2.3.2
>> 
>> The 2nd element returned by urlretrieve is 
>
> Which version, the one that works or the one that doesn't ?
>
>> 
>>   'Date: Wed, 25 Aug 2004 14:50:17 GMT\r\nServer: Apache/2.0.40 (Red
>>   Hat Linux)\r\nLast-Modified: Wed, 25 Aug 20 2 GMT\r\nETag:

The one that works.

> Something is missing here:
>
>   Last-Modified: Wed, 25 Aug 20 2 GMT
>
> Contrast:
>
>   Wed, 25 Aug 2004 14:50:17 GMT

Where did that come from, what do you think is missing, and why?

>>   "b63d5b-20ec84b-18057e80"\r\nAccept-Ranges: bytes\r\nContent-Length:
>>   34523211\r\nContent-Type: n/x-bzip2\r\nConnection: close\r\n'
>
> 34 MB's ( I got 6 MB's )

It's 34MB.

>>>> Trying again with Python 2.3 on Cygwin.
>> 
>> As you can see from the above, it works.  Is there a known urllib bug
>> in earlier Pythons?
>
> Sorry, I don't know, but I've seen the same truncation without Python
> and without Unix.

Argh.

>>> Is it possible the file is being (re)uploaded (via CVS) during your
>>> cron job's download, thus truncating it?
>> 
>> I don't think so.
>
> Can you test whether or not this is happening? I.e., if you don't
> get the full 34523211 bytes, re-download and compare the above
> Content-Length, ETag and Last-Modified.
>

I did some tests, but didn't come up with anything conclusive.  I set
my cron job to start 3 hours later.  We'll see.
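
For reference, the check suggested above could be scripted roughly as follows. This is only a sketch in present-day Python 3 (where urlretrieve lives in urllib.request), not what the cron job actually runs; check_download, same_version and fetch_and_check are hypothetical helper names, and the headers object is assumed to be whatever urlretrieve returns as its second element (anything with a .get() method works).

```python
import os
import urllib.request


def check_download(path, headers):
    """Compare the size on disk with the Content-Length the server sent.

    Returns (complete, expected, actual). `complete` is None when the
    server did not send a Content-Length header.
    """
    actual = os.path.getsize(path)
    expected = headers.get("Content-Length")
    if expected is None:
        return None, None, actual
    expected = int(expected)
    return actual == expected, expected, actual


def same_version(headers_a, headers_b):
    """True if two responses report the same ETag and Last-Modified."""
    keys = ("ETag", "Last-Modified")
    return all(headers_a.get(k) == headers_b.get(k) for k in keys)


def fetch_and_check(url, dest):
    """Hypothetical wrapper: download `url` to `dest`, then size-check it."""
    filename, headers = urllib.request.urlretrieve(url, dest)
    return check_download(filename, headers), headers
```

If a download comes up short, fetching again and passing both sets of headers to same_version would show whether the tarball was replaced on the server between the two attempts.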

>>> Perhaps you should change to cvs:
>>>
>>>   os.system( 'cvs ... ' )
>> 
>> The problem with that is that I want to capture the whole CVS
>> history, not just today's state.
>
> I was suggesting you get the tarball via cvs, though presumably
> sourceforge don't give you the option. 

No they don't.

> http has the problem that
> the server will just truncate the download if the source file
> gets replaced.
>
>> 
>>> FWIW, I tried downloading with IE using the link above and got a
>>> truncated 6 and bit MB's (16:15 BST (UTC +0100)).
>>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> 
>> Sorry, what does that mean?  Did it show that message in a dialog,
>> or...?
>> 
>
> No, I got a "download complete", but the file was only 6 MB, and bzip2 -t
> told me the file was truncated. The (16:15 ...) is the time I tried
> downloading; BST = British Summer Time, though you wouldn't know it
> from the weather :).
>
> Further I just ran:
>
> import urllib
>
> filename, headers = \
>     urllib.urlretrieve(
>         'http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2', 
>         'boost-cvsroot.tar.bz2')
>
> print filename
>
> print headers
>
> boost-cvsroot.tar.bz2
> Date: Wed, 25 Aug 2004 16:53:20 GMT
> Server: Apache/2.0.40 (Red Hat Linux)
> Last-Modified: Wed, 25 Aug 2004 14:14:02 GMT
> ETag: "b63d5b-20ec84b-18057e80"
> Accept-Ranges: bytes
> Content-Length: 34523211
> Content-Type: application/x-bzip2
> Connection: close
>
> The script ended at 17:59 BST. Note the difference between the two
> times in the headers, suggesting the file was modified 1:45 ago,
> around the same time my attempted download with IE failed.

That's odd!  Your (failed) download modified the file being
downloaded??
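
The gap between the two header timestamps can be computed rather than eyeballed. A sketch in present-day Python 3, using the Date and Last-Modified values from the headers quoted above; header_age is a hypothetical helper name. Both timestamps are GMT, so the BST times mentioned in the thread need an hour subtracted before being compared against them.

```python
from email.utils import parsedate_to_datetime


def header_age(headers):
    """Time between Last-Modified and Date, as reported by the server."""
    served = parsedate_to_datetime(headers["Date"])
    modified = parsedate_to_datetime(headers["Last-Modified"])
    return served - modified


# Values copied from the headers printed above.
headers = {
    "Date": "Wed, 25 Aug 2004 16:53:20 GMT",
    "Last-Modified": "Wed, 25 Aug 2004 14:14:02 GMT",
}
print(header_age(headers))  # prints 2:39:18
```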

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com


