[Tutor] Python strings example

dn PyTutor at DancesWithMice.info
Tue Apr 27 04:47:12 EDT 2021


On 27/04/2021 14.49, Manprit Singh wrote:
> Coming to my question again. what i want to discuss is,
> 
> If you look at Peter's solution, you can see it is using two generator
> expressions(I mean to say there is a utilization of two for loops that
> increases complexity) what practices should be adopted  in such cases ? the
> one done by Peter or the one done by me ?
> need your guidance.


Not really!
IMHO there is no one, and only one, correct answer - because you must
first define "correct". Both of the ideas presented are correct in terms
of output-produced. What else is there?

There are various ways to look at "complexity", let's take:
- is it easy to read (programmer cognition)
- execution efficiency (the time or other resources required)

(this is not a very exact example, but I hope illustrates the point)
The purpose of a generator is the "lazy" production of values. Under
Python 2, range( 5000 ) would require that five-thousand values be
stored. Under Python 3 the values are only calculated and made
available upon request; and thus have (almost) no storage requirement
(Python 2 called this "xrange()"). Which is more "complex" - the one
which requires more storage resources, or the one which seems harder
for the programmer to understand?
(YMMV!)
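A quick way to see the "lazy" difference for yourself (a small sketch; the exact byte-counts vary by platform and CPython version):

```python
import sys

# A list materialises all five-thousand values up-front...
eager = list(range(5000))
# ...whereas a range object (Python 3) computes them on demand.
lazy = range(5000)

print(sys.getsizeof(eager) > 10_000)   # the list needs real storage: True
print(sys.getsizeof(lazy) < 100)       # the range object stays tiny: True
```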


In the first case, and as said before, what is acceptable depends upon
your team. If you include Apprentices, then the level of complexity
should be lower than if you are all Masters. Perhaps then you, ie the
team conducting a Code Review, will prefer the more obvious loop to the
programmatically more-succinct twin-comprehensions?

Execution efficiency is easily measured in this case. (The
storage-required is the length of the string. So, no 'big deal'!) You
are perfectly capable of building a 'harness' to repeatedly execute a
function/body-of-code and divide the number of repetitions by the
time-required (or vice-versa).
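The standard library already provides such a harness in `timeit` - a minimal sketch (the function and sample string here are stand-ins; substitute whichever body-of-code you want to measure):

```python
import timeit

# Any callable can be timed; this simple character-scan
# stands in for the real function under test.
def scan(s="Ab2hay"):
    return any(c.isalpha() for c in s) and any(c.isdigit() for c in s)

repetitions = 100_000
seconds = timeit.timeit(scan, number=repetitions)
print(f"{seconds / repetitions:.3e} seconds per call")
```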

Again IMHO, @Peter's code is not particularly difficult to follow, so I
(personally) wouldn't red-flag it - even on-behalf of others. Instead,
if there are 'Apprentices' to be considered, I'd suggest it as a
learning-experience...


Accordingly, let's address the thought @Peter raised. Perhaps you should
consider ways to compare methods (algorithms) - preferably before any
code is written. Are you aware of "Big O Notation"? (web.ref below)

Thinking about it, the first example code runs all the way through the
string of characters. If the data is bound to fail, eg "Abhay", then we
must consider every character, regardless. In your other example,
"Abhay2", it is also necessary to complete a 100% review - but that is a
factor of the data-chosen. (let's come back to that in a moment)

Because the algorithm works logically, character-by-character through
the input-string, the time required is described as "linear".
Execution-time will be (basically) a factor of the number of characters
in the string. This concept is sometimes written "O(n)".

However, as soon as we change the data, eg to "Ab2hay", do we need to
inspect every character?

No! Because, in this case, by the third character we have satisfied both
requirements. Now, when we consider a statistical distribution of
letters and digits, we realise that it is only necessary to look at an
average of half of the number of characters in the input-string, ie O(
n/2 ).

The twin-comprehensions will only run for as long as it takes to find
the first letter, plus as long as it takes to find the first digit,
(respectively). This works-out to almost the same O( n/2 ) - as
observed, by definition some characters will be inspected more than once!
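@Peter's original code is not quoted above, so purely as a sketch of the idea under discussion, twin generator-expressions (each fed to any(), which stops at the first match) presumably look something like:

```python
def password_check(candidate):
    # Each any() stops at the first character satisfying its test,
    # but the two scans may revisit the same leading characters.
    return (any(character.isalpha() for character in candidate)
            and any(character.isdigit() for character in candidate))

print(password_check("Ab2hay"))   # True
print(password_check("Abhay"))    # False
```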

Herewith a revision of the original code to O( n/2 ):

def password_check( candidate:str )->tuple:    # returns (passed?, loops)

    alpha_found = False
    digit_found = False
    loops = 0

    for character in candidate:
        if character.isalpha():
            alpha_found = True
        if character.isdigit():
            digit_found = True
        loops += 1
        if alpha_found and digit_found:
            break

    return alpha_found and digit_found, loops


print( password_check( "Abhay2" ) )

print( password_check( "Abhay" ) )

print( password_check( "Ab2hay" ) )


Output:-

(True, 6)
(False, 5)
(True, 3)


Notes:

You will have realised that the tuple o/p is purely to illustrate the
number of loops necessary to reach approval/failure.

On the subject of cognitive-complexity; please also note this 'grumpy,
old, man' has refactored to use more easily-recognised names!
(more readable == less complex !!!)

Perhaps you could build a speed-test harness, and report
results/comparisons (using a wider range of data-samples).
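One way such a harness might look, again leaning on `timeit` (a sketch - the data-samples and repetition count are arbitrary choices, and the two functions are paraphrases of the approaches discussed above):

```python
import timeit

# A wider range of data-samples: early success, failure, late success.
SAMPLES = ["Abhay2", "Abhay", "Ab2hay", "x" * 1000 + "1", "1" + "x" * 1000]

def loop_with_break(candidate):
    # Single pass, stopping as soon as both requirements are met.
    alpha_found = digit_found = False
    for character in candidate:
        alpha_found = alpha_found or character.isalpha()
        digit_found = digit_found or character.isdigit()
        if alpha_found and digit_found:
            break
    return alpha_found and digit_found

def twin_generators(candidate):
    # Two short-circuiting scans, each stopping at its first match.
    return (any(c.isalpha() for c in candidate)
            and any(c.isdigit() for c in candidate))

for function in (loop_with_break, twin_generators):
    elapsed = timeit.timeit(
        lambda: [function(s) for s in SAMPLES], number=10_000)
    print(f"{function.__name__}: {elapsed:.3f}s")
```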

Does this implementation offer benefits under both headings of
"complexity", ie is it both easy to read and close to the
twin-comprehensions' efficiency?

In the case of short strings like these, I have to say any
efficiency-gain will be all-but negligible. However, if you were working
with larger volumes of data, such tactics (algorithm classifications)
come into their own!

My expectation ('gut feeling') is that the C-code 'behind' the
generators will execute more quickly than the code above.


Care to run a three-way comparison and report-back? Thereafter, let's
return to the question of "complexity" and hear what you think and why...



Web.Ref:
Algorithm Time Complexity and Big O Notation
By Stuart Kuredjian - March 17, 2017
https://www.linux.com/training-tutorials/algorithm-time-complexity-and-big-o-notation-0/

-- 
Regards,
=dn
