[Python-Dev] io.BytesIO slower than monkey-patching io.RawIOBase

Eli Bendersky eliben at gmail.com
Tue Jul 17 05:34:14 CEST 2012


While working on #1767933, Serhiy made an observation: "monkey-patching"
one of the io base classes is faster than using BytesIO when all you need
is a file-like object to write into.

I've distilled it into this standalone test:

import io

data = [b'a'*10, b'bb'*5, b'ccc'*5] * 10000

def withbytesio():
    bio = io.BytesIO()
    for i in data:
        bio.write(i)
    return bio.getvalue()

def monkeypatching():
    mydata = []
    file = io.RawIOBase()
    file.writable = lambda: True
    file.write = mydata.append

    for i in data:
        file.write(i)
    return b''.join(mydata)

The second approach is consistently 10-20% faster than the first
(depending on the input) on trunk Python 3.3.
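
A minimal harness along these lines should reproduce the comparison
(just a sketch; the number/repeat counts are arbitrary):

import timeit

for fn in (withbytesio, monkeypatching):
    # Best of 5 repeats of 10 calls each.
    t = min(timeit.repeat(fn, number=10, repeat=5))
    print('{0}: {1:.3f}s'.format(fn.__name__, t))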

Is there any reason for this to be so? What does BytesIO give us that the
second approach does not? (I tried adding more methods to the patched
RawIOBase to make it more functional, such as seekable() and tell(), and
it doesn't affect performance.)
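
One concrete difference, just as an illustration: BytesIO is a real
seekable buffer that supports overwriting in place, which the append-only
fake can't emulate:

import io

bio = io.BytesIO()
bio.write(b'hello')
bio.seek(0)            # rewind; the patched RawIOBase has no real seek
bio.write(b'J')        # overwrite the first byte in place
print(bio.getvalue())  # b'Jello'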

This also raises a "moral" question: should I be using the second approach
deep inside the stdlib (ET.tostring) just because it's faster?
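
If we did go that way, a small dedicated subclass might be more palatable
than patching an instance. A hypothetical sketch (_ChunkWriter is a name
made up here, not existing stdlib code):

import io

class _ChunkWriter(io.RawIOBase):
    # Same trick as above, expressed as a subclass instead of
    # instance patching; write collects chunks in a list.
    def __init__(self):
        self._chunks = []
    def writable(self):
        return True
    def write(self, b):
        self._chunks.append(bytes(b))  # copy, in case b is a mutable buffer
        return len(b)
    def getvalue(self):
        return b''.join(self._chunks)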

Eli