Efficiency of UserString

Oliver Hofmann a2619725 at uni-koeln.de
Sun Mar 18 12:17:04 EST 2001


'lo everyone!


Got a question regarding UserString's efficiency. I am parsing text
and would like to store each word's position (sentence, absolute position)
somewhere; this would allow turning the document into a list of
words, then removing stopwords or stemming the words, while still having
the information about each word's original position in the document.
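
To make the idea concrete, here is a minimal sketch of a word that carries its own position. It is only an illustration under stated assumptions: the names PositionedWord, tokenize and STOPWORDS are hypothetical, sentences are split naively on '.', and the import uses the Python 3 location of UserString (collections.UserString) rather than the standalone UserString module.

```python
from collections import UserString  # Python 3 location; older Pythons used `import UserString`


class PositionedWord(UserString):
    """Hypothetical word wrapper that remembers where the word came from."""

    def __init__(self, word, sentence=None, position=None):
        # Defaults let the class also be built from the text alone.
        super().__init__(word)
        self.sentence = sentence   # index of the sentence containing the word
        self.position = position   # absolute word index in the document


def tokenize(document):
    """Split a document into PositionedWord objects (naive '.' sentence split)."""
    words = []
    position = 0
    for sentence_no, sentence in enumerate(document.split('.')):
        for word in sentence.split():
            words.append(PositionedWord(word, sentence_no, position))
            position += 1
    return words


STOPWORDS = {'the', 'a', 'is'}  # hypothetical stopword list

words = tokenize("The probe is radioactive. It binds a ligase.")
# Compare on the underlying plain string (.data); the survivors keep their positions.
kept = [w for w in words if w.data.lower() not in STOPWORDS]

for w in kept:
    print(w.data, w.sentence, w.position)
```

After filtering, each remaining word still reports the sentence and absolute position it had in the original text, without any parallel lists to keep in sync.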

Figured instead of creating several lists to store and update that info
I could put it into the string itself by using UserString; however
the Python library reference states:

"It should be noted that these classes are highly inefficient compared 
to real string or Unicode objects; this is especially the case for
MutableString."

I've tried two basic operations below but couldn't spot much of a 
difference. Could someone please point me at an error I've made or
tell me which operations are slower with UserString?

----

import UserString
import profile, pstats

text = """ The compound ppp(A2p)3A3[32P]pCp is a commercially available
radioactive analogue of the 2,5 oligoadenylate series ppp(A2p)nA, n
greater than or equal to 2, commonly referred to as 2-5A. It is used
as a probe for measuring concentrations in competition radiobinding and
radioimmune assays. We have found that incubation of the probe with
extracts from HeLa, CV1, or neuroblastoma cells results in its covalent
attachment to two size classes of RNA: the first includes a major species
with a molecular weight of approximately 350,000, the second is much
smaller (40 +/- 5 nucleotides in length) and could represent tRNA
half-molecules. Ligation is to the 3 end of the probe molecule with
formation of a 3,5-phosphodiester bond. Thus, probe ligation provides a
sensitive and convenient assay for the detection not only of RNA
ligase(s) but also of ligatable RNAs (such as the putative tRNA
half-molecules) in mammalian cell extracts. """

text = ' '.join(text.split())
text2 = UserString.UserString(text)


def fun1():
    global text
    
    for a in range(0, 1000):
        text = ' '.join(text.split())
        
def fun2():
    global text
    
    for a in range(0, 1000):
        text.find('neuro')
        
def fun3():
    global text2
    
    for a in range(0, 1000):
        text2 = ' '.join(text2.split())
        
def fun4():
    global text2
    
    for a in range(0, 1000):
        text2.find('neuro')

def main():
    for a in range(0, 100):
        fun1()
        fun2()
        fun3()
        fun4()
    
    
profile.run('main()', 'profile.tmp')
p = pstats.Stats('profile.tmp')
p.sort_stats('cumulative').print_stats(10)

----

Here are the stats:


         404 function calls in 80.010 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   80.010   80.010 <string>:1(?)
        1    0.000    0.000   80.010   80.010 profile:0(main())
        1    0.000    0.000   80.010   80.010 test3.py:48(main)
      100   39.050    0.390   39.050    0.390 test3.py:24(fun1)
      100   39.040    0.390   39.040    0.390 test3.py:36(fun3)
      100    0.960    0.010    0.960    0.010 test3.py:42(fun4)
      100    0.960    0.010    0.960    0.010 test3.py:30(fun2)
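
One possible explanation for the matching timings, offered as a sketch rather than a definitive diagnosis: UserString.split() appears to return ordinary strings, so the line text2 = ' '.join(text2.split()) in fun3 would rebind text2 to a plain string on its first pass, and fun3 and fun4 would then be timing the built-in type. The check below assumes the Python 3 import path (collections.UserString) and uses timeit instead of profile; the helper names find_plain, find_wrapped and normalize_wrapped are made up for this illustration, and the sample text is a short excerpt of the abstract.

```python
import timeit
from collections import UserString  # Python 3 location; the script above uses `import UserString`

text = "extracts from HeLa,   CV1, or neuroblastoma cells"
text2 = UserString(text)

# split() on the wrapper hands back plain strings, so the joined result is a plain str.
rejoined = ' '.join(text2.split())
print(type(text2).__name__, '->', type(rejoined).__name__)


def find_plain():
    return text.find('neuro')


def find_wrapped():
    # text2 is never rebound here, so this call really goes through the wrapper.
    return text2.find('neuro')


def normalize_wrapped():
    # Re-wrap the result so that a following iteration would still see a UserString.
    return UserString(' '.join(text2.split()))


for fn in (find_plain, find_wrapped, normalize_wrapped):
    seconds = timeit.timeit(fn, number=10000)
    print('%s: %.4f s per 10000 calls' % (fn.__name__, seconds))
```

If the first line printed shows the type changing to str, the profile above compared plain strings with plain strings; the three timings then give a comparison in which the wrapper is kept throughout.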


Many thanks once again!


    Oliver

--
Oliver Hofmann  -  University of Cologne  -  Department of Biochemistry
   o.hofmann at smail.uni-koeln.de - setar at gmx.de - connla at thewell.com

   If you care, you just get disappointed all the time. If you don't care
nothing matters so you are never upset.   -- Calvin