optimization pointers?

Anton Vredegoor anton at vredegoor.doge.nl
Thu Dec 11 17:51:40 EST 2003


"r.e.s." <r.s at XXmindspring.com> wrote:

>Would anyone care to offer pointers on how the following code
>might be optimized?  Or alternatives?  ...
>
>---
>def lz(string):
>    """ Return the Lempel-Ziv complexity (LZ78) of string. """
>
>    vocabulary = []
>    word = ''
>    for char in string:
>        word += char
>        if word not in vocabulary:
>            vocabulary.append(word)
>            word = ''
>    return len(vocabulary)
>---
>
>On my 2 GHz system running PythonWin, the above code averages 
>about 1.3 hrs to process a 1 MB string.  I'd hoped to compute 
>LZ78 for multi-GB files, but it seems infeasible. (?)

On a 300 mhz celeron the script below takes about ten or twenty
seconds, the original lz is commented out as it never seems to finish
for longer strings.

Anton

p.s. the file I used is here:
http://sailor.gutenberg.org/etext97/1donq10.zip

import sets

def lz(string):
    """ Return the Lempel-Ziv complexity (LZ78) of string. """

    vocabulary = []
    word = ''
    for char in string:
        word += char
        if word not in vocabulary:
            vocabulary.append(word)
            word = ''
    return len(vocabulary)

def lzx(string):
    """ Return the Lempel-Ziv complexity (LZ78) of string. """

    vocabulary = sets.Set()
    word = ''
    for char in string:
        word += char
        if word not in vocabulary:
            vocabulary.add(word)
            word = ''
    return len(vocabulary)

def test():
    fn = '1donq10.txt'
    s = file(fn,'rb').read(1000000)
    print lzx(s)
    #print lz(s)


if __name__=='__main__':
    test()









More information about the Python-list mailing list