Why is regex so slow?

Terry Reedy tjreedy at udel.edu
Tue Jun 18 17:29:23 EDT 2013


On 6/18/2013 4:30 PM, Grant Edwards wrote:
> On 2013-06-18, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Roy Smith <roy <at> panix.com> writes:
>>
>> You should read again on the O(...) notation. It's an asymptotic complexity,
>> it tells you nothing about the exact function values at different data points.
>> So you can have two O(n) routines, one of which is always twice as fast as
>> the other.

Or one that is a million times as fast.

> And you can have two O(n) routines, one of which is twice as fast for
> one value of n and the other is twice as fast for a different value of
> n (and that's true for any value of 'twice': 2X, 10X, 100X).
>
> All the O() tells you is the general shape of the line.  It doesn't
> tell you where the line is or how steep the slope is (except in the
> case of O(1), where you do know the slope is 0).  It's perfectly
> feasible that for the range of values of n that you care about in a
> particular application, there's an O(n^2) algorithm that's way faster
> than another O(log(n)) algorithm.
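
To make the crossover point concrete, here is a rough sketch with two
made-up cost models. The constants (1 and 500) and all function names are
hypothetical, chosen only to show that a large constant factor can make an
O(log(n)) routine lose to an O(n^2) routine for small n.

```python
import math


def cost_quadratic(n, c=1.0):
    """Hypothetical cost of an O(n^2) routine with a small constant."""
    return c * n * n


def cost_logarithmic(n, c=500.0):
    """Hypothetical cost of an O(log n) routine with a large constant."""
    return c * math.log2(n)


def first_n_where_log_wins(limit=10_000):
    """Return the smallest n >= 2 at which the O(log n) model is cheaper."""
    for n in range(2, limit):
        if cost_logarithmic(n) < cost_quadratic(n):
            return n
    return None  # no crossover below limit
```

Below the returned n, the "worse" O(n^2) model is the cheaper one; above
it, the asymptotics take over. Real routines would have to be timed, not
modelled, to find where that point actually lies.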

In fact, Tim Peters put together two facts to create the current list.sort.
1. O(n*n) binary insert sort is faster than O(n*logn) merge sort, with 
both competently coded in C, for n up to about 64. Part of the reason is 
that binary insert sort is actually O(n*logn) (n binary searches) + 
O(n*n) (n insertions with a shift averaging n/2 items). The multiplier 
for the O(n*n) part is much smaller because on modern CPUs, the shift 
needed for the insertion is a single machine instruction.
2. O(n*logn) sorts have a lower asymptotic complexity because they 
divide the sequence roughly in half about logn times. In other words, 
they are 'fast' because they split a list into lots of little pieces. So 
Tim's aha moment was to think "Let's stop splitting when pieces are less 
than or equal to 64, rather than splitting all the way down to 1 or 2".
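
The idea can be sketched in Python as a plain top-down merge sort that
hands any piece of 64 items or fewer to a binary insertion sort. This is
only an illustration of the two facts above, not the actual list.sort
code, which is written in C and does considerably more. The names
hybrid_sort, binary_insertion_sort, merge and CUTOFF are mine, and the
cutoff is simply the "about 64" figure quoted above.

```python
import bisect

CUTOFF = 64  # assumed threshold, taken from the "about 64" figure above


def binary_insertion_sort(items, lo, hi):
    """Sort items[lo:hi] in place, finding each slot by binary search."""
    for i in range(lo + 1, hi):
        value = items[i]
        # bisect_right places equal items after existing ones (stable).
        pos = bisect.bisect_right(items, value, lo, i)
        # Shift items[pos:i] one slot to the right, then insert the value.
        items[pos + 1:i + 1] = items[pos:i]
        items[pos] = value


def merge(left, right):
    """Merge two sorted lists into a new sorted list (stable)."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(items, cutoff=CUTOFF):
    """Return a new sorted list; stop splitting at pieces <= cutoff."""
    if len(items) <= max(cutoff, 1):
        piece = list(items)
        binary_insertion_sort(piece, 0, len(piece))
        return piece
    mid = len(items) // 2
    return merge(hybrid_sort(items[:mid], cutoff),
                 hybrid_sort(items[mid:], cutoff))
```

For example, hybrid_sort(data) should give the same result as
sorted(data) for any list of mutually comparable items. In pure Python the
timings will not match the C numbers above, so the best cutoff would have
to be measured rather than assumed.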

-- 
Terry Jan Reedy



