prefix search on a large file

Thu Oct 12 13:23:34 EDT 2006

js  wrote:

> By eliminating the list cloning, my function got much faster than before.
> I really appreciate your help, John.
> 
> def prefixdel_recursively2(alist):
>     if len(alist) < 2:
>         return alist
> 
>     first = alist.pop(0)
>     unneeded = [no for no, line in enumerate(alist)
>                 if line.startswith(first)]
>     adjust = 0
>     for i in unneeded:
>         del alist[i+adjust]
>         adjust -= 1
> 
>     return [first] + prefixdel_recursively(alist)
> 
> 
> process stime
> prefixdel_stupidly     : 11.9247150421
> prefixdel_recursively  : 14.6975700855
> prefixdel_recursively2 : 0.408113956451
> prefixdel_by_john      : 7.60227012634

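Those are suspicious results. prefixdel_recursively2 modifies the list it is
given, so every timing iteration after the first may run on an
already-shortened list. Time it again with number=1, or with a fresh copy of
the data for every iteration.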
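
A minimal sketch of both ways of timing, assuming Python 3's timeit module.
The function drop_prefixed_inplace and the DATA list are hypothetical
stand-ins, not the code from this thread; the only property they share with
the quoted function is that the argument is modified in place.

```python
import timeit


def drop_prefixed_inplace(lines):
    """Hypothetical stand-in: keep each line, delete later lines it prefixes.

    Like the quoted function, it modifies the list it is given.
    """
    i = 0
    while i < len(lines):
        first = lines[i]
        # Filter everything after position i against the current line.
        lines[i + 1:] = [s for s in lines[i + 1:] if not s.startswith(first)]
        i += 1
    return lines


DATA = ["a", "ab", "abc", "b", "bc", "c"] * 200  # made-up sample data

# Misleading: every iteration after the first sees the already-pruned list.
shared = list(DATA)
t_shared = timeit.timeit(lambda: drop_prefixed_inplace(shared), number=100)

# Fairer: number=1, and the setup statement hands each run a fresh copy.
runs = timeit.repeat(
    "drop_prefixed_inplace(data)",
    setup="data = list(DATA)",
    repeat=5,
    number=1,
    globals={"drop_prefixed_inplace": drop_prefixed_inplace, "DATA": DATA},
)

print("shared list, 100 iterations:", t_shared)
print("fresh copy, best of 5 single runs:", min(runs))
```

timeit executes the setup statement once per timing run, not once per
iteration, which is why number=1 combined with repeat gives every
measurement its own copy.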

I also have my doubts whether sorting by length is a good idea. To take it
to the extreme: what if your data file contains an empty line?
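
To illustrate the empty-line case: the empty string is a prefix of every
string, so once it sorts to the front it knocks out everything else. The
function below is a hypothetical sketch of a shortest-first approach, not the
code under discussion, and it assumes the lines have already been stripped of
their trailing newline.

```python
def drop_prefixed_shortest_first(lines):
    """Hypothetical sketch: visit lines shortest first and keep a line
    only if no line kept so far is a prefix of it."""
    kept = []
    for line in sorted(lines, key=len):
        if not any(line.startswith(k) for k in kept):
            kept.append(line)
    return kept


print(drop_prefixed_shortest_first(["abc", "ab", "xyz"]))      # ['ab', 'xyz']
print(drop_prefixed_shortest_first(["abc", "ab", "xyz", ""]))  # ['']
```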

Peter