read() returns data of different sizes

jimgardener jimgardener at gmail.com
Sat Oct 2 07:58:33 EDT 2010


Hi,
While trying out urllib.urlopen, I wrote this code to read a URL and
return the length of the data:

import datetime,time,urllib

def get_page_size(pageurlstr):
    h=urllib.urlopen(pageurlstr)
    data=h.read()
    return len(data)

while True:
    print 'reading url www.google.com at',datetime.datetime.now().isoformat(' ')
    print 'size=%d'%get_page_size('http://www.google.com')
    time.sleep(5)


I got this output

reading url www.google.com at 2010-10-02 17:22:24.691654
size=9512
reading url www.google.com at 2010-10-02 17:22:30.681236
size=9530
reading url www.google.com at 2010-10-02 17:22:36.886369
size=9530
reading url www.google.com at 2010-10-02 17:22:42.315392
size=9512
reading url www.google.com at 2010-10-02 17:22:48.763693
size=9512
reading url www.google.com at 2010-10-02 17:22:54.711666
size=9548
reading url www.google.com at 2010-10-02 17:23:00.151843
size=9530
reading url www.google.com at 2010-10-02 17:23:05.844620
size=9548


Why are the sizes different? What must I do to ensure that
the whole page is read?
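
To make the question concrete, here is a rough sketch of a check I could
add, not a definitive fix. It reads a file-like response in chunks until
read() returns nothing, then compares the body length with a declared
Content-Length. The names read_all and is_complete are my own, and the
BytesIO object only stands in for a real response so the sketch runs
without a network. With a real handle in Python 2, I assume the header
would come from something like h.info().getheader('Content-Length'),
and the server may not send it at all.

```python
import io

def read_all(response, chunk_size=4096):
    """Read a file-like response until read() returns an empty result."""
    chunks = []
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty result means end of stream
            break
        chunks.append(chunk)
    return b''.join(chunks)

def is_complete(data, content_length):
    """Compare the body length with a declared Content-Length, if any."""
    if content_length is None:
        return None  # no header was sent, so there is nothing to compare
    return len(data) == int(content_length)

# Stand-in for a real response (an assumption, for illustration only).
fake = io.BytesIO(b'x' * 9512)
body = read_all(fake)
print('size=%d complete=%s' % (len(body), is_complete(body, '9512')))
```

If is_complete came back True on every request while the sizes still
differed, that would suggest the pages themselves differ between
requests rather than the reads being cut short.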
thanks
jim


