avro slow?

Dan Stromberg drsalists at gmail.com
Thu May 5 18:36:52 EDT 2011


On Thu, May 5, 2011 at 2:12 PM, Miki Tebeka <miki.tebeka at gmail.com> wrote:

> Greetings,
>
> I'm reading some data from an Avro file using the avro library. It takes
> about a minute to load 33K objects from the file. This seems very slow to
> me, especially with the Java version reading the same file in about 1 sec.
>

You might want to try an Apache mailing list, such as one of those listed at
http://avro.apache.org/mailing_lists.html , as I suspect most Python people
use Python's native pickle support instead.

It looks like the Python version of Avro is doing single-byte-at-a-time I/O
for some types, which is almost guaranteed to perform poorly.  If you're
decoding an 8 byte integer, it's much faster to at least read 8 bytes and
then chop that up, and better still is to read a buffer at a time and chop
that up too.
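
As a rough sketch of the difference (this is not taken from the Avro source,
and it assumes fixed-width little-endian 8-byte unsigned integers rather than
whatever encoding Avro actually uses; the function names are made up for
illustration), the three approaches might look like this in Python 3:

```python
import io
import struct


def read_u64_bytewise(stream):
    """Build one 8-byte little-endian integer from eight 1-byte reads."""
    value = 0
    for shift in range(0, 64, 8):
        byte = stream.read(1)
        if len(byte) != 1:
            raise EOFError("stream ended in the middle of an integer")
        value |= byte[0] << shift
    return value


def read_u64_once(stream):
    """Read all 8 bytes with a single call, then chop them up."""
    data = stream.read(8)
    if len(data) != 8:
        raise EOFError("stream ended in the middle of an integer")
    return struct.unpack("<Q", data)[0]


def read_all_u64(buffer):
    """Decode a whole buffer of back-to-back 8-byte integers in one call."""
    count = len(buffer) // 8
    return list(struct.unpack("<%dQ" % count, buffer[:count * 8]))
```

All three return the same values; the difference is only in how many read
calls are made per integer (eight, one, or a fraction of one).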

Even in C, the performance of byte-at-a-time I/O is not going to be stellar,
especially if you use read() rather than fread().
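
A loose Python analogue of read() versus fread(), as a sketch only: opening a
binary file with buffering=0 gives a raw, unbuffered file object, while the
default gives a buffered reader.  The helper name below is hypothetical, and
no timings are claimed here; you would need to measure on your own data.

```python
def count_bytes_one_at_a_time(path, buffering=-1):
    """Read a file with read(1) calls and return how many bytes were seen.

    buffering=0  -> raw, unbuffered file object (closer to read())
    buffering=-1 -> default buffered reader (closer to fread())
    """
    total = 0
    with open(path, "rb", buffering=buffering) as f:
        while f.read(1):
            total += 1
    return total
```

Both settings return the same count; wrapping each call with time.time() on
a large file is one way to see how much the buffering matters.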

A related note: Python is often more about programmer efficiency than
machine efficiency.  With the cost per MIPS going down and the price of
programmer time going up, that seems like a good trade-off.