[pypy-dev] Unicode encode/decode speed (cont)

Mon Feb 18 19:59:10 CET 2013

So, iter(file).next() is slow?
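
One way to check that in isolation might be a small script along the lines of the sketch below. It is not taken from madIS or from the test.py mentioned further down; the function names and the generated sample file are made up for illustration. It only compares the two strategies discussed in this thread: iterating over the file object line by line, and reading everything at once followed by splitlines().

```python
import os
import tempfile
import time


def count_by_iteration(path):
    """Count lines by iterating over the file object, one line at a time."""
    count = 0
    with open(path, "rb") as f:
        for _ in f:
            count += 1
    return count


def count_by_splitlines(path):
    """Count lines by reading the whole file, then splitting it in memory."""
    with open(path, "rb") as f:
        return len(f.read().splitlines())


def timed(func, path):
    """Return (result, elapsed seconds) for a single call of func(path)."""
    start = time.time()
    result = func(path)
    return result, time.time() - start


def make_sample_file(num_lines=100000):
    """Write a hypothetical sample file of XML-like lines; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_lines):
            f.write(("<row id='%d'>some text</row>\n" % i).encode("ascii"))
    return path


if __name__ == "__main__":
    path = make_sample_file()
    try:
        for func in (count_by_iteration, count_by_splitlines):
            lines, seconds = timed(func, path)
            print("%s: %d lines in %.3f s" % (func.__name__, lines, seconds))
    finally:
        os.remove(path)
```

Running the same script under both interpreters should show whether plain file iteration alone accounts for the gap. Note that splitlines() also splits on a bare "\r", so the two counts can differ on files that contain one, and that read() holds the whole file in memory.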

Alex

On Mon, Feb 18, 2013 at 10:51 AM, Amaury Forgeot d'Arc
<amauryfa at gmail.com> wrote:

> 2013/2/18 Eleytherios Stamatogiannakis <estama at gmail.com>
>
>> On 18/02/13 18:44, Maciej Fijalkowski wrote:
>>
>>> On Mon, Feb 18, 2013 at 6:20 PM, Eleytherios Stamatogiannakis
>>> <estama at gmail.com> wrote:
>>>
>>>> We have found another (very simple) madIS query where PyPy is around
>>>> 250x slower than CPython:
>>>>
>>>> CPython: 314msec
>>>> PyPy: 1min 16sec
>>>>
>>>> The query, if you would like to test it yourself, is the following:
>>>>
>>>> select  count(*)  from   (file  'some_big_text_file.txt' limit 100000);
>>>>
>>>> To run it you'll need a big text file containing at least 100000 lines
>>>> (we ran the above query with a very big XML file). You can also run the
>>>> query with a lower limit (the behaviour will be the same), like this:
>>>>
>>>> select  count(*)  from   (file  'some_big_text_file.txt' limit 10000);
>>>>
>>>> Make sure the file name does not end in csv, tsv, json, db or gz,
>>>> because those extensions take a different code path inside the "file"
>>>> operator than the one used for plain text files.
>>>>
>>>> l.
>>>>
>>>>
>>>> _______________________________________________
>>>> pypy-dev mailing list
>>>> pypy-dev at python.org
>>>> http://mail.python.org/mailman/listinfo/pypy-dev
>>>>
>>>
>>> Hey
>>>
>>> It would be incredibly convenient if you could turn it into a
>>> standalone benchmark (say, reading a large string from a file and
>>> decoding it as a whole or in pieces).
>>>
>>>
>> As it involves SQLite, CFFI and Python, it is very hard to extract the
>> full execution path that madIS goes through even in a simple query like
>> this.
>>
>> Nevertheless we extracted a part of the pure Python execution path, and
>> PyPy is around 50% slower than CPython:
>>
>> CPython: 21 sec
>> PyPy: 33 sec
>>
>> The full madIS execution path involves additional CFFI calls and
>> callbacks (from SQLite) to pass the data to SQLite.
>>
>> To run test.py:
>>
>> test.py big_text_file
>>
>
> Most of the time is spent in file iteration.
> I added
>     f = f.read().splitlines()
> and the query is almost instant.
>
>
> --
> Amaury Forgeot d'Arc
>

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero