[Python-Dev] O(1) deletes from the front of bytearray (was: Re: Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)

INADA Naoki songofacandy at gmail.com
Wed Oct 12 06:28:34 EDT 2016


>>
>> [1] My use case is parsing HTTP out of a receive buffer. If deleting
>> the first k bytes of an N byte buffer is O(N), then not only does
>> parsing becomes O(N^2) in the worst case, but it's the sort of O(N^2)
>> that random untrusted network clients can trigger at will to DoS your
>> server.
>
>
> Deleting from buffer can be avoided if pass the starting index together with
> the buffer. For example:
>
>     def read_line(buf: bytes, start: int) -> (bytes, int):
>         try:
>             end = buf.index(b'\r\n', start)
>         except ValueError:
>             return b'', start
>
>         return buf[start:end], end+2
>

In case of asyncio, we can't assume the order of append and consume.

For example, stream processing HTTP chunked response.
Append to receive buffer and consume a chunk in buffer can happen in
arbitrary order.
That's why bytes is not good for receive buffer. Efficient append is
"must have".

For example, Torando implements receive buffer by deque of bytes. See this code.
https://github.com/tornadoweb/tornado/blob/master/tornado/iostream.py#L784-L817

When Tornado drop Python 2.7 support, they can use bytearray, and
iostream can be more simple and fast.

So I hope "amortized O(1) deletion from the front" is language spec,
at least for Python 3.5+

-- 
INADA Naoki  <songofacandy at gmail.com>


More information about the Python-Dev mailing list