Urllib2: Only a partial page retrieved

hpsMouse hpsmouse at gmail.com
Sun May 23 05:19:40 EDT 2010


On May 22, 5:43 PM, Dragon Lord <dragonlord... at gmail.com> wrote:
> The cutoff is always at the same location: just after the label
> "Meeting date" and before the date itself. Could it be that something
> is interpreted as an EOF command or something like that?
>
> example of the cutoff point with a bad page:
> <br/><b>Meeting Date: </b>
>
> example of the cutoff point with a good page:
> <br/><b>Meeting Date: </b>

I checked the TCP packets and found that the remote HTTP server sends a
data packet with the "PUSH" (PSH) flag set, causing the client to close
the connection. That is exactly where "Meeting Date: </b>" appears.
This does not seem to be a bug in Python, because Qt and telnet both
failed in my test, as did wget.
Most browsers use HTTP keep-alive, so the connection is not closed.
I think that is why a browser shows the page correctly.
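One way to confirm that a fetch was cut short like this, rather than the page really ending there, is to read the body in chunks and compare the number of bytes received with the Content-Length header the server announced. Below is a minimal sketch, assuming Python 3's `http.client` (called `httplib` in Python 2, the module `urllib2` is built on) and assuming the server sends a Content-Length header. `fetch_and_check` is a hypothetical helper name, not part of any library. It also sends `Connection: keep-alive` to mimic a browser, which may or may not change what the server does.

```python
import http.client


def fetch_and_check(host, path="/", port=80, timeout=10.0):
    """Hypothetical helper: fetch `path` and report whether it looks truncated.

    Returns (body, declared, truncated): the bytes received, the
    Content-Length the server announced (or None), and True when fewer
    bytes arrived than were announced. Assumes the server sends
    Content-Length; without it the check cannot detect truncation.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Ask for a persistent connection the way a browser would
        # (HTTP/1.1 connections are persistent by default anyway).
        conn.request("GET", path, headers={"Connection": "keep-alive"})
        resp = conn.getresponse()
        declared = resp.getheader("Content-Length")
        declared = int(declared) if declared is not None else None

        chunks = []
        while True:
            try:
                chunk = resp.read(8192)
            except http.client.IncompleteRead as exc:
                # Keep whatever arrived before the connection dropped.
                chunks.append(exc.partial)
                break
            if not chunk:  # b"" means the body ended or the server closed
                break
            chunks.append(chunk)
        body = b"".join(chunks)
    finally:
        conn.close()

    truncated = declared is not None and len(body) < declared
    return body, declared, truncated
```

If `truncated` comes back True, the connection closed before the announced body arrived. If the server sends no Content-Length (for example with chunked transfer encoding), `declared` is None and the sketch cannot tell either way.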


