Intermittent bug with asyncio and MS Edge

Barry Scott barry at barrys-emacs.org
Wed Mar 25 10:31:13 EDT 2020



> On 25 Mar 2020, at 06:12, Frank Millman <frank at chagford.com> wrote:
> 
> On 2020-03-24 8:39 PM, Barry Scott wrote:
>>> On 24 Mar 2020, at 11:54, Frank Millman <frank at chagford.com> wrote:
>>> 
>>> 
>>> I decided to concentrate on using Wireshark to detect the difference between a Python3.7 session and a Python3.8 session. Already I can see some differences.
>>> 
>>> There is only one version of my program. I am simply running it with either 'py -3.7' or 'py -3.8'. And I am ignoring Chrome at this stage, as it is only Edge that shows the problem.
>>> 
>>> First point - Python3.7 also shows a lot of [RST, ACK] lines. My guess is that this is caused by my 'protocol violation' of sending a 'Keep-Alive' header and then closing the connection. Python3.7 does not suffer from dropping files, so I now think this is a sidetrack. I will fix my program when this is all over, but for now I don't want to touch it.
>> Yes your protocol violation is why you see [RST, ACK].
>> I'm confused: you know that the code has a critical bug in it and you have not fixed it?
>> Just send "Connection: close" and I'd assume all will work.
> 
> Well, the reason is simply that I wanted to understand why my code that worked all the way from 3.4 through 3.7 stopped working in 3.8. I realise that my code is faulty, but I still wanted to know what the trigger was that caused the bug to appear.

Got it, I had not picked up on your wish to find out the details of why.

> 
> From my testing with Wireshark, I can see that both Edge and Chrome create 20 connections to GET 20 files. The difference seems to be that Chrome does not attempt to re-use a connection, even though both client and server have sent Keep-Alive headers. Edge does attempt to re-use the connection.

Chrome will reuse the connections if you load enough files from the same server.
All browsers do this to get a page displayed as fast as possible.
Because HTTP is half-duplex, it's the round trip time rather than bandwidth that controls
the time taken to get all the assets needed to render a page most of the time.

> 
> The difference between 3.7 and 3.8 is that 3.7 sends the data in separate packets for the status, each header, and then each chunk, whereas 3.8 sends the whole lot in a single packet.

That will change the timing enough to expose the problem.

I often find with network code bugs that it's changes that make the
timing different that expose bugs in supposedly tested and working code.

In this case you could track down what was the cause, but it's often the case that it's not practical to do that.
You may not know what changed between when the code worked and when it broke in many cases.

> My guess is that 3.7 is slower to send the files, so Edge starts up all 20 connections before it has finished receiving the first one, whereas with 3.8, by the time it has opened a few connections the first file has been received, so it tries to re-use the same connection to receive the next one. By then I have closed the connection. If I am right, it is surprising that my program worked *some* of the time.

I have lost count of the number of times we have found a bug in some code and said "how did it ever work?".
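
For what it is worth, the failure you describe can be reproduced without a browser. The following is only a minimal sketch, not your program: faulty_handler, reuse_connection and demo are names I made up, and the client is a stand-in for what Edge appears to do. The server advertises keep-alive and then closes anyway; the client reads the first response and immediately tries the same connection again.

```python
import asyncio

REQUEST = b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\n\r\n"
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 2\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
    b"ok"
)


async def faulty_handler(reader, writer):
    # Hypothetical server that reproduces the protocol violation:
    # it promises keep-alive in the header and then closes anyway.
    await reader.readuntil(b"\r\n\r\n")
    writer.write(RESPONSE)
    await writer.drain()
    writer.close()


async def reuse_connection(port):
    # Stand-in for a browser that trusts the keep-alive header.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(REQUEST)
    await writer.drain()
    first = await reader.readexactly(len(RESPONSE))
    try:
        # Second request on the same connection, which the server has closed.
        writer.write(REQUEST)
        await writer.drain()
        second = await reader.read(len(RESPONSE))
    except OSError:
        # Depending on timing the client sees a reset instead of a clean EOF.
        second = b""
    writer.close()
    return first, second


async def demo():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(faulty_handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await reuse_connection(port)


if __name__ == "__main__":
    first, second = asyncio.run(demo())
    print("first response bytes:", len(first))
    print("second response bytes:", len(second))
```

The first request succeeds and the second gets nothing back, either as an end-of-file or as a connection reset. A client that opens all of its connections before any response completes never reaches the second request, which would be consistent with what you saw on 3.7.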

> 
> The same reasoning would explain why it worked when connecting from a remote host. There would be enough delay to force it into the same behaviour as 3.7.

Yep, and if there are any middle boxes, routers, man-in-the-middle proxies, etc., the timing/packets get pushed around.

> 
> It has been an interesting ride, and I have learned a lot. I will now look into fixing my program. The easy fix is to just send 'Connection: Close', but I will do it properly and implement 'Keep-Alive'.
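
As a starting point for the keep-alive version, here is a rough sketch of the shape such a handler could take with asyncio streams. It rests on assumptions: it only handles GET requests without a body, it always returns the same fixed body, and the names build_response, handler, fetch_twice and demo are mine. A real server would also want an idle timeout on the read, which I have left out.

```python
import asyncio

BODY = b"ok"


def build_response(keep_alive):
    # The Connection header must match what the server actually does next.
    connection = b"keep-alive" if keep_alive else b"close"
    return b"".join([
        b"HTTP/1.1 200 OK\r\n",
        b"Content-Length: " + str(len(BODY)).encode() + b"\r\n",
        b"Connection: " + connection + b"\r\n",
        b"\r\n",
        BODY,
    ])


async def handler(reader, writer):
    try:
        while True:
            try:
                head = await reader.readuntil(b"\r\n\r\n")
            except (asyncio.IncompleteReadError, ConnectionError):
                break  # the client closed the connection or went away
            # Simplification: a crude header check, and no request body.
            keep_alive = b"connection: close" not in head.lower()
            writer.write(build_response(keep_alive))
            await writer.drain()
            if not keep_alive:
                break
    finally:
        writer.close()


async def fetch_twice(port):
    # Two requests on one connection; the second asks the server to close.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    responses = []
    for connection in (b"keep-alive", b"close"):
        writer.write(
            b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: "
            + connection + b"\r\n\r\n"
        )
        await writer.drain()
        head = await reader.readuntil(b"\r\n\r\n")
        body = await reader.readexactly(len(BODY))
        responses.append(head + body)
    writer.close()
    await writer.wait_closed()
    return responses


async def demo():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await fetch_twice(port)


if __name__ == "__main__":
    for response in asyncio.run(demo()):
        print(response.decode())
```

The point of the loop is that the connection is only closed when the response said it would be, or when the client has gone away. Sending a Content-Length is what lets the client know where one response ends and the next begins.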

Barry

> 
> Thanks all
> 
> Frank
> -- 
> https://mail.python.org/mailman/listinfo/python-list
> 


