A technique from a chatbot

Mark Bourne nntp.mbourne at spamgourmet.com
Fri Apr 5 15:59:54 EDT 2024


Stefan Ram wrote:
> Mark Bourne <nntp.mbourne at spamgourmet.com> wrote or quoted:
>> I don't think there's a tuple being created.  If you mean:
>>      ( word for word in list_ if word[ 0 ]== 'e' )
>> ...that's not creating a tuple.  It's a generator expression, which
>> generates the next value each time it's called for.  If you only ever
>> ask for the first item, it only generates that one.
> 
>    Yes, that's also how I understand it!
> 
>    In the meantime, I wrote code for a microbenchmark, shown below.
> 
>    This code, when executed on my computer, shows that the
>    next+generator approach is a bit faster when compared with
>    the procedural break approach. But when the order of the two
>    approaches is being swapped in the loop, then it is shown to
>    be a bit slower. So let's say, it takes about the same time.

There could be some caching going on, meaning whichever is done second 
comes out a bit faster.
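
One way to check for an order effect is to alternate which approach is 
timed first and keep a separate total for each. The following is only a 
rough sketch, not a rigorous benchmark: the names `using_break`, 
`using_next` and `time_both_orders` and the fixed word list are made up 
for illustration. Wrapping both approaches in functions adds the same call 
overhead to each, and the numbers will vary with the machine and the Python 
version.

```python
import timeit

def using_break(words):
    # Procedural loop with break/else, as in the quoted benchmark.
    for word in words:
        if word.startswith('e'):
            found = word
            break
    else:
        found = ''
    return found

def using_next(words):
    # next() on a generator expression, with '' as the default.
    return next((word for word in words if word.startswith('e')), '')

def time_both_orders(words, rounds=10, number=1000):
    # Hypothetical helper: alternate which function is timed first in
    # each round, so neither one always benefits from running second.
    totals = {'using_break': 0.0, 'using_next': 0.0}
    funcs = [using_break, using_next]
    for r in range(rounds):
        order = funcs if r % 2 == 0 else funcs[::-1]
        for func in order:
            totals[func.__name__] += timeit.timeit(
                lambda: func(words), number=number)
    return totals

words = ['apple', 'banana', 'cherry', 'date', 'elderberry', 'fig']
totals = time_both_orders(words)
for name, seconds in totals.items():
    print(f'{name}: {seconds:.4f} s')
```

If the two totals stay close whichever approach goes first, the difference 
seen in a single fixed ordering is probably just noise or warm-up.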

>    However, I also tested code with an early return (not shown below),
>    and this was shown to be faster than both code using break and
>    code using next+generator by a factor of about 1.6, even though
>    the code with return has the "function call overhead"!

To be honest, that's how I'd probably write it - not because of any 
thought that it might be faster, but just because it's clearer.  And if 
there's a `do_something_else()` that needs to be called regardless of 
whether a word was found, split it into two functions:
```
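def first_word_beginning_with(target, wordlist):
    # Return the first word starting with `target`, or '' if there is none.
    for w in wordlist:
        if w.startswith(target):
            return w
    return ''

def find_word_and_do_something_else(target, wordlist):
    result = first_word_beginning_with(target, wordlist)
    # do_something_else() is a placeholder; it runs whether or not a
    # word was found.
    do_something_else()
    return result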
```
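
For comparison, here is a small sketch showing that the early-return 
function and the next+generator form give the same result, and that both 
stop examining words at the first match. The `starts_with_e` predicate and 
the `examined` list are made up purely to record which words get looked at.

```python
examined = []

def starts_with_e(word):
    # Hypothetical predicate that records every word it is asked about.
    examined.append(word)
    return word.startswith('e')

def first_match_return(wordlist):
    # Early return: leave the function as soon as a match is found.
    for w in wordlist:
        if starts_with_e(w):
            return w
    return ''

def first_match_next(wordlist):
    # next() pulls only one value from the generator expression.
    return next((w for w in wordlist if starts_with_e(w)), '')

words = ['apple', 'egg', 'eel', 'fig']

examined.clear()
result_return = first_match_return(words)
calls_return = len(examined)

examined.clear()
result_next = first_match_next(words)
calls_next = len(examined)

print(result_return, calls_return)  # egg 2
print(result_next, calls_next)      # egg 2
```

In both cases only 'apple' and 'egg' are examined; 'eel' and 'fig' are 
never looked at.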

>    But please be aware that such results depend on the implementation
>    and version of the Python implementation being used for the benchmark
>    and also of the details of how exactly the benchmark is written.
> 
> import random
> import string
> import timeit
> 
> print( 'The following loop may need a few seconds or minutes, '
> 'so please bear with me.' )
> 
> time_using_break = 0
> time_using_next = 0
> 
> for repetition in range( 100 ):
>      for i in range( 100 ): # Yes, this nesting is redundant!
> 
>          list_ = \
>          [ ''.join \
>            ( random.choices \
>              ( string.ascii_lowercase, k=random.randint( 1, 30 )))
>            for i in range( random.randint( 0, 50 ))]
> 
>          start_time = timeit.default_timer()
>          for word in list_:
>              if word[ 0 ]== 'e':
>                  word_using_break = word
>                  break
>          else:
>              word_using_break = ''
>          time_using_break += timeit.default_timer() - start_time
> 
>          start_time = timeit.default_timer()
>          word_using_next = \
>          next( ( word for word in list_ if word[ 0 ]== 'e' ), '' )
>          time_using_next += timeit.default_timer() - start_time
> 
>          if word_using_next != word_using_break:
>              raise Exception( 'word_using_next != word_using_break' )
> 
> print( f'{time_using_break = }' )
> print( f'{time_using_next = }' )
> print( f'{time_using_next / time_using_break = }' )
> 

