optimization pointers?
Mel Wilson
mwilson at the-wire.com
Thu Dec 11 17:17:15 EST 2003
In article <Bz5Cb.7881$_r6.5116 at newsread1.news.pas.earthlink.net>,
"r.e.s." <r.s at XXmindspring.com> wrote:
>Would anyone care to offer pointers on how the following code
>might be optimized? Or alternatives? ...
>
>---
>def lz(string):
>    """ Return the Lempel-Ziv complexity (LZ78) of string. """
>
>    vocabulary = []
>    word = ''
>    for char in string:
>        word += char
>        if word not in vocabulary:
>            vocabulary.append(word)
>            word = ''
>    return len(vocabulary)
>---
>
>On my 2 GHz system running PythonWin, the above code averages
>about 1.3 hrs to process a 1 MB string. I'd hoped to compute
>LZ78 for multi-GB files, but it seems infeasible. (?)
One simple improvement is to get rid of

    if an_item not in a_list

which does a linear search over the list.  A dictionary is faster.
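To see that difference in isolation, a quick check (my own sketch, not from the post; absolute numbers will vary by machine):

```python
import timeit

# Membership test in a 10000-element list (linear scan)
# vs. the same keys in a dict (hash lookup).
setup = "items = list(range(10000)); d = dict.fromkeys(items)"
list_time = timeit.timeit("9999 in items", setup=setup, number=1000)
dict_time = timeit.timeit("9999 in d", setup=setup, number=1000)
print(list_time > dict_time)   # the linear scan loses badly
```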
Without a full benchmark, a quick timing script like
def lz_complexity (s):
    voc = {}
    word = ''
    for c in s:
        word += c
        if not voc.get (word, False):
            voc[word] = True
            word = ''
    return len (voc.keys())

if __name__ == '__main__':
    import time
    text = file ('lzc.py', 'rt').read()
    t0 = time.clock()
    t0 = time.clock() - t0          # overhead of the clock call itself
    t1 = time.clock()
    c = lz (text)
    t1 = time.clock() - t1
    t2 = time.clock()
    c = lz_complexity (text)
    t2 = time.clock() - t2
    print 'T1:', t1 - t0
    print 'T2:', t2 - t0
will show about a x6 speedup.
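(In present-day Python a set says the same thing even more directly -- a sketch along the same lines, not something that was available when this was posted:)

```python
def lz_set(s):
    """LZ78 complexity using a set for O(1) membership tests."""
    vocabulary = set()
    word = ''
    for char in s:
        word += char
        if word not in vocabulary:
            vocabulary.add(word)
            word = ''
    return len(vocabulary)

print(lz_set('aaabbbab'))   # phrases a, aa, b, bb, ab -> 5
```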
Avoiding

    a_string += another_string

is a traditional speedup, but it seems you'll be needing the
string form for every character processed, so it may not buy
much in this case.
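If the += ever did become the bottleneck, one way around it (my own sketch, not from the post) is to store the vocabulary as a trie of nested dicts and walk a node pointer, so no phrase string is ever built:

```python
def lz_trie(s):
    """LZ78 complexity without string concatenation: the vocabulary
    is a trie of nested dicts; 'node' tracks the current phrase."""
    root = {}
    node = root
    count = 0
    for char in s:
        if char in node:
            node = node[char]     # current phrase already known; extend it
        else:
            node[char] = {}       # new phrase: record it and start over
            count += 1
            node = root
    return count

print(lz_trie('aaabbbab'))   # same five phrases -> 5
```

This works because every proper prefix of a recorded phrase is itself a recorded phrase, so the trie walk mirrors the growing `word` exactly.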
Regards. Mel.