Python and the need for speed

Brecht Machiels brecht__gmane at mos6581.org
Thu Apr 13 05:52:04 EDT 2017


Bah. My newsreader lost my reply when the WiFi connection dropped 
out... attempt #2.

On 2017-04-12 18:45:16 +0000, bart4858 at gmail.com said:

> On Wednesday, 12 April 2017 16:04:53 UTC+1, Brecht Machiels  wrote:
>> On 2017-04-12 14:46:45 +0000, Michael Torrie said:
> 
>> It would be great if you could run the benchmark I mention in my first
>> link and share the results. Highly appreciated!
> 
> Were you ever able to isolate what it was that's taking up most of the 
> time? Either in general or in the bit that pypy has trouble with. Or is 
> execution time spread too widely?

It's been a while since I last focused on performance, but the profile 
is still pretty flat. It's easy enough to verify (see also the URL 
referenced below):

    python -m cProfile -o demo.prof `which rinoh` -f restructuredtext demo.rst
    python -m pstats demo.prof

    demo.prof% strip
    demo.prof% sort tottime
    demo.prof% stats 15

Thu Apr 13 10:59:19 2017    demo.prof

         35193174 function calls (27868271 primitive calls) in 22.461 seconds

   Ordered by: internal time
   List reduced from 5499 to 15 due to restriction <15>

   ncalls       tottime  percall  cumtime  percall filename:lineno(function)
6020041/321084   2.812    0.000    2.884    0.000 layout.py:152(document_part)
   287201        1.211    0.000    6.156    0.000 style.py:645(match)
    98788        0.901    0.000    1.965    0.000 version.py:198(__init__)
419928/232734    0.751    0.000   17.332    0.000 util.py:109(function_wrapper)
   344783        0.588    0.000    1.198    0.000 style.py:319(match)
  1302467        0.534    0.000    0.840    0.000 style.py:438(__hash__)
128992/83504     0.459    0.000   15.477    0.000 style.py:556(get_style_recursive)
1472251/1472250  0.399    0.000    0.469    0.000 {built-in method builtins.isinstance}
   701320        0.395    0.000    0.679    0.000 parse.py:18(reader)
306381/10913     0.389    0.000    6.546    0.000 style.py:757(find_matches)
89622/86768      0.368    0.000    2.126    0.000 style.py:369(match)
      176        0.311    0.002    0.840    0.005 parse.py:157(check_sum)
339968/10360     0.308    0.000    0.417    0.000 dimension.py:239(__float__)
    95312        0.301    0.000    0.347    0.000 version.py:343(_cmpkey)
     2642        0.288    0.000    3.380    0.001 __init__.py:792(resolve)
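
For anyone who would rather script this than type into the interactive pstats browser, the same three steps (strip, sort tottime, stats 15) can be done through the pstats API. This is only a minimal sketch: the helper name top_by_tottime is made up for illustration, and it assumes a profile file such as demo.prof has already been written by the cProfile command above.

```python
import io
import pstats


def top_by_tottime(profile_path, limit=15):
    """Return the pstats report for the `limit` entries with the highest tottime.

    `profile_path` is a file written by `python -m cProfile -o ...`.
    The helper name and signature are illustrative, not part of rinohtype.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(profile_path, stream=buffer)
    # strip_dirs()  ~ "strip"         (drop leading directory names)
    # sort_stats()  ~ "sort tottime"  (order by internal time)
    # print_stats() ~ "stats 15"      (limit the listing)
    stats.strip_dirs().sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()
```

Calling print(top_by_tottime("demo.prof")) should give a listing in the same format as the one above. In that listing the most expensive entry takes 2.812 of the 22.461 seconds, roughly 12.5%, which is what I mean by a flat profile.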

> (I looked at your project but it's too large, and didn't get much 
> further with the github benchmark, which requires me to subscribe, but 
> the .sh file extensions don't seem too promising to someone on Windows.)

GitHub benchmark? .sh file extensions?

You can easily run some benchmarks following the instructions here (pip 
install): 
https://bitbucket.org/pypy/pypy/issues/2365/rinohtype-much-slower-on-pypy3 


As I commented on that issue, I have been able to run the benchmarks 
using PyPy3 5.7.1 beta, which is now significantly faster than CPython. 
That's very promising!

> Your program seems to be to do with typesetting. Is it possible to at 
> least quantify the work that is being done in terms of total 
> bytes (and total files) of input, and bytes of output? That might 
> enable comparisons with other systems executing similar tasks, to see 
> if the Python version is taking unreasonably long.

The Sphinx benchmark's source reStructuredText files add up to 584 KB. 
The output PDF file is almost 3 MB (includes fonts and images). Note 
that the input document is parsed into a document tree where each 
paragraph is represented by an object of the Paragraph class, 
containing StyledText objects and so on. The total memory used is about 
1 GB!
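
If you want to measure the input size for another document in the same way, something like the following should do. It is a rough sketch: total_source_size is a hypothetical helper, and it assumes all of the sources sit under one directory and use the .rst extension.

```python
from pathlib import Path


def total_source_size(root, pattern="*.rst"):
    """Return (file_count, total_bytes) for files under `root` matching `pattern`.

    Hypothetical helper for illustration; assumes sources end in .rst.
    """
    # rglob() walks the directory tree recursively.
    files = [p for p in Path(root).rglob(pattern) if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)
```

Dividing the byte count by 1024 gives a figure comparable to the 584 KB quoted above.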

LaTeX is orders of magnitude faster, but requires multiple passes. Its 
memory usage is probably much lower, since it processes its input as a stream.

Best regards,
Brecht



