create lowercase strings in lists - was: (No subject)

Fri Dec 17 03:09:47 EST 2004

Bengt Richter wrote:
> On Fri, 17 Dec 2004 02:06:01 GMT, Steven Bethard <steven.bethard at gmail.com> wrote:
> 
> 
>>Michael Spencer wrote:
>>
>>> ...     conv = "".join(char.lower() for char in text if char not in unwanted)
>>
>>Probably a good place to use str.replace, e.g.
>>
>>conv = text.lower()
>>for char in unwanted:
>>    conv = conv.replace(char, '')
>>
>>Some timings to support my assertion: =)
>>
>>C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "s = ''.join(c for c in s if c not in '01')"
>>10000 loops, best of 3: 74.6 usec per loop
>>
>>C:\Documents and Settings\Steve>python -m timeit -s "s = ''.join(map(str, range(100)))" "for c in '01': s = s.replace(c, '')"
>>100000 loops, best of 3: 2.82 usec per loop
>>
Well, sure, if it's just speed, conciseness and backwards-compatibility that you 
want ;-)

> 
> If unwanted has more than one character in it, I would expect unwanted as
> deletechars in
> 
>  >>> help(str.translate)
>  Help on method_descriptor:
> 
>  translate(...)
>      S.translate(table [,deletechars]) -> string
> 
>      Return a copy of the string S, where all characters occurring
>      in the optional argument deletechars are removed, and the
>      remaining characters have been mapped through the given
>      translation table, which must be a string of length 256.
> 
> to compete well, if table setup were for free
> (otherwise, UIAM, table should be ''.join([chr(i) for i in xrange(256)])
> for identity translation, and that might pay for a couple of .replace loops,
> depending).
> 
> Regards,
> Bengt Richter
Good point - and there is string.maketrans to set up the table too.  So 
normalize can be rewritten as:

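from string import maketrans

def normalize1(text, unwanted = "()", table = maketrans("","")):
    text = text.lower()
    # str.translate returns a new string, so the result must be rebound
    text = text.translate(table, unwanted)
    return set(text.split())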

which gives:
 >>> t = timeit.Timer("normalize1('(UPPER CASE) lower case')", "from listmembers import normalize1")
 >>> t.repeat(3,10000)
[0.29812783468287307, 0.29807782832722296, 0.3021370034462052]
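
As a side check on the table argument, here is a minimal sketch, assuming a current Python 3 interpreter rather than the Python 2 used above. In Python 3 the two-argument translate(table, delete) form survives on bytes, and bytes.maketrans(b"", b"") plays the role of maketrans("",""). The name strip_unwanted is hypothetical.

```python
# Assumption: Python 3, where bytes keeps the translate(table, delete) form.
identity = bytes.maketrans(b"", b"")  # 256-byte table mapping each byte to itself

def strip_unwanted(data, unwanted=b"()"):
    """Hypothetical helper: lowercase, then delete every byte in `unwanted`."""
    # translate returns a new object, so its result has to be used or rebound
    return data.lower().translate(identity, unwanted)

print(len(identity))
print(strip_unwanted(b"(UPPER CASE) lower case"))
```

The identity table is what Bengt's ''.join([chr(i) for i in xrange(256)]) builds by hand; maketrans with two empty arguments gives the same thing.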

But while we're at it, we can use str.translate to do the case conversion too:

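from string import maketrans, ascii_uppercase, ascii_lowercase

def normalize2(text, unwanted = "()",
               table = maketrans(ascii_uppercase, ascii_lowercase)):
    # one pass lowercases and deletes; keep the string that translate returns
    text = text.translate(table, unwanted)
    return set(text.split())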

 >>> t = timeit.Timer("normalize2('(UPPER CASE) lower case')", "from listmembers import normalize2")
 >>> t.repeat(3,10000)
[0.24295154831133914, 0.24174497038029585, 0.25234855267899547]

...which is a little faster still
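
For readers on Python 3, where str.translate takes a single mapping and string.maketrans no longer exists, the same one-pass idea might be sketched roughly as follows. This is an assumed modern equivalent, not the code that was timed above, and normalize3 is a hypothetical name.

```python
from string import ascii_lowercase, ascii_uppercase

# Assumption: Python 3. The third argument to str.maketrans lists the
# characters to delete, taking over from the old deletechars parameter.
TABLE = str.maketrans(ascii_uppercase, ascii_lowercase, "()")

def normalize3(text, table=TABLE):
    """Hypothetical Python 3 counterpart of normalize2."""
    # One pass: map A-Z to a-z and drop the unwanted characters.
    return set(text.translate(table).split())

print(sorted(normalize3("(UPPER CASE) lower case")))
```

Like the ascii_uppercase table above, this only folds ASCII letters; text with other cased characters would still need str.lower().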

Thanks for the comments: I found them interesting, and I hope some of this is
useful to the OP.

Regards

Michael