using regex for password validation

Wed Dec 23 17:41:15 EST 2020

On 24/12/2020 06:03, Sadaka Technology wrote:
> hello guys,
> 
> I have this pattern for password validation (regex):
> 
> I want these rules to be applied:
> 
> Minimum 8 characters.
> The alphabets must be between [a-z]
> At least one alphabet should be of Upper Case [A-Z]
> At least 1 number or digit between [0-9].
> At least 1 character from [ _ or @ or $ ].
> 
> and this pattern:
> 
> passwordpattern = "^(?=.[a-z])(?=.[A-Z])(?=.\d)(?=.[@$])[A-Za-z\d@$!%?&]{8,}.$"
> 
> my only issue is that I want to add the symbol () and symbol(.) in the pattern where only it accepts $ and @, I tried adding generally like [@_$] not working

A quick web.search reveals, quite evidently, loads of people attempt to 
solve this problem with ever more-powerful RegExs. (and ever more 
perplexing questions on SO, etc)

There's something seductive about RegEx-s to the average ComSc student. 
The challenge of wielding such control, so concisely. APL or Lisp 
programming anyone? I recall positively-devouring Jeff Friedl's book - 
with expectations of 'changing the world'...

[back down to earth] These days I seldom use them (NB ActiveState do?did 
a (recommended) 'cheat sheet', a copy of which resides in my desk-file 
as crib-notes)

Contrarily, a RegEx may be quite the wrong tool for the job. Partially 
because such expressions are difficult to understand - either someone 
else's code or my own from the proverbial six-months back(!); and 
partially because here we're attempting to solve multiple problems in one go.

(I'm writing this from the perspective of 'Apprentice' professionals or 
a ComSc student - with any/all due apologies and respect to the OP)

There is much virtue in saying that every Python routine should solve 
one problem (and only one!), and do that well. Similarly, the scientific 
method as applied to software development is to break each problem into 
smaller, more manageable problems (per ardua) - and thus, more 
recognisable solutions (and we're back to me banging-the-drum of 
readability).

Here's the problem-solution:

     def validate_password( attempt:str )->bool:
         ...

(Oh yeah, wow!)

Obviously(!) this (larger) routine will contain more (smaller, more 
manageable) routines. We can follow the format, exactly as outlined in 
the specification (or homework assignment, as appropriate):

 > Minimum 8 characters.

     def validate_length( rule:int, attempt:str )->bool:

 > The alphabets must be between [a-z]

     def validate_lower_case( attempt:str )->bool:
         # see note, below

 > At least one alphabet should be of Upper Case [A-Z]

     def validate_upper_case( attempt:str )->bool:
         # also, see note, below

 > At least 1 number or digit between [0-9].

     def validate_numeric( attempt:str )->bool:
         # again, see note, below

 > At least 1 character from [ _ or @ or $ ].

     def validate_specials( rule:set, attempt:str )->bool:

There were five specifications, so there are five (sub) routines, called 
in-turn by validate_password() (a "decision ladder") - with a fast-drop, 
should you wish.
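
To make the outline concrete, here is a minimal sketch of the five routines and the "decision ladder". It is one possible reading, not the only one: it assumes "the alphabets must be between [a-z]" means "at least one lower-case letter", that the square brackets in the last rule are notation rather than permitted characters, and the names MIN_LENGTH and SPECIALS are made up for illustration.

```python
import string

# Assumed rule values, taken from the OP's list (names are illustrative).
MIN_LENGTH = 8
SPECIALS = {"_", "@", "$"}


def validate_length(rule: int, attempt: str) -> bool:
    """True if the attempt has at least `rule` characters."""
    return len(attempt) >= rule


def validate_lower_case(attempt: str) -> bool:
    """True if at least one ASCII lower-case letter is present."""
    return any(ch in string.ascii_lowercase for ch in attempt)


def validate_upper_case(attempt: str) -> bool:
    """True if at least one ASCII upper-case letter is present."""
    return any(ch in string.ascii_uppercase for ch in attempt)


def validate_numeric(attempt: str) -> bool:
    """True if at least one ASCII digit is present."""
    return any(ch in string.digits for ch in attempt)


def validate_specials(rule: set, attempt: str) -> bool:
    """True if at least one character from `rule` is present."""
    return any(ch in rule for ch in attempt)


def validate_password(attempt: str) -> bool:
    """The decision ladder: 'and' short-circuits, giving the fast-drop."""
    return (
        validate_length(MIN_LENGTH, attempt)
        and validate_lower_case(attempt)
        and validate_upper_case(attempt)
        and validate_numeric(attempt)
        and validate_specials(SPECIALS, attempt)
    )
```

With these definitions validate_password("Passw0rd_") passes, whereas "password_1" is dropped at the upper-case rung.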

Hang-on though, look at how much 'work' is involved, compared with a 
single line of RegEx! Why go to such bother? There are several reasons.

Notice how the spec has become code? "Readability" is not merely the 
appearance and communication-quality of one's code, but the transfer of 
ideas across levels, or layers, of detail!

Notice that the above have a parameter "rule". Why?
(and that's not (only) the question: "why don't we encode these as 
constants within the function?")

If you've 'been around' for a while, you will have noticed that password 
rules keep changing over time, on the presumption that becoming more 
'strict' will make the system more secure.
(am not going to discuss the hope of solving (largely) social problems 
with technological solutions!)

What would be the impact of a 'make it strict-er' business-rule change 
(specification) on the one-line RegEx solution? Persisting with the 
long-way around:-

A frequent call is to increase the minimum-length of passwords. How 
could we do this? Using RegEx, adjust the counter - but which part is 
the 'counter'?

Alternatively, here, reading the code we find validate_length() (or the 
documentation where "rule" is defined/given its value) and change the 
value of the integer. QED!
(and by "QED" I mean: this is a job which could be given to the newest 
of Junior Programmers, with a high confidence of (rapid) success)

Similarly, in the above structure, validate_specials() expects to be 
given a 'rule' which is currently:

     { '[', '_', '@', '$', ']', }

How easy would it be to add another character, eg "#" or "€"; when your 
system goes international and is being used by folk with 
European-variant keyboards? Is extending the set easier (and more likely 
to retain fidelity) than fiddling with a RegEx?
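
As a rough illustration of both rule-changes (longer minimum, extra special characters), a sketch along these lines shows that only the "rule" values change while the routines stay untouched. The constant names are invented for the example, and the sketch assumes the square brackets in the spec are notation, so they are left out of the set.

```python
def validate_length(rule: int, attempt: str) -> bool:
    """True if the attempt has at least `rule` characters."""
    return len(attempt) >= rule


def validate_specials(rule: set, attempt: str) -> bool:
    """True if at least one character from `rule` is present."""
    return any(ch in rule for ch in attempt)


# Today's rules (illustrative names).
MIN_LENGTH = 8
SPECIALS = {"_", "@", "$"}

# Tomorrow's 'strict-er' rules: edit the values, not the code.
STRICTER_MIN_LENGTH = 12
EXTENDED_SPECIALS = SPECIALS | {"#", "€"}

print(validate_length(MIN_LENGTH, "Secret#1"))           # True
print(validate_length(STRICTER_MIN_LENGTH, "Secret#1"))  # False
print(validate_specials(SPECIALS, "Secret#1"))           # False
print(validate_specials(EXTENDED_SPECIALS, "Secret#1"))  # True
```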

[and here's the note]

If our ambitions include dreams of 'world domination', then we can 
extend exactly the same idea of "rule" to the other three routines! 
Whilst we 'start' with (say) the ASCII character definitions of a-z, we 
will *be able* to extend into accented characters such as "ô"  - which 
really would promote us to take a rôle on the world-stage.
(hah!)

From a 'business'/users' perspective: note that the five routines could 
easily be extended should further business-rules be demanded.

From a code-perspective: note that either "in" could be used to detect 
presence (but not so easily 'count', eg minimum of two lower-case 
letters), or we could persist with RegEx solution(s) - in which case, by 
breaking the larger problem into smaller ones, the RegEx 
complexity-level falls to that which mere-mortals may comprehend!
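
By way of example, here is how one of the small routines might look in both styles, with a hypothetical "minimum" parameter added to cover the 'count' case. Neither version is claimed to be the better one; the point is only that each stays small enough to read at a glance.

```python
import re
import string


def count_lower_case(attempt: str) -> int:
    """Count ASCII lower-case letters using plain membership tests."""
    return sum(1 for ch in attempt if ch in string.ascii_lowercase)


def validate_lower_case(attempt: str, minimum: int = 1) -> bool:
    """'in'-based version: at least `minimum` lower-case letters."""
    return count_lower_case(attempt) >= minimum


def validate_lower_case_re(attempt: str, minimum: int = 1) -> bool:
    """RegEx-based version: one tiny pattern for one tiny problem."""
    return len(re.findall(r"[a-z]", attempt)) >= minimum
```

Both validate_lower_case("ABcd", minimum=2) and validate_lower_case_re("ABcd", minimum=2) accept the attempt, and both reject "ABCd".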

Now let's stoop to allowing that although programmers are *the* most 
important people in the world, we should consider our users - just once 
in a while. Thus, let's consider UX ("User Experience"). Most of us hate 
situations when we are asked to furnish a password but 'get it wrong' - 
particularly those sites which fail to tell us 'the rules' BEFORE we 
offer our preference!

If we're going to be nice to our users, from where do we express these 
"rules"? If the rule is hard-coded, then the user-advice must also be 
hard-coded - and what do we say about having 'the same code' in multiple 
locations? (see also "DRY principle"). How could one state "the rules" 
*once*, and in such a fashion that they can be used for UX output and a 
RegEx?
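
One possible answer, offered as a sketch only: keep the rule values in a single place and derive both the user-advice and the RegEx from them. The names (MIN_LENGTH, SPECIALS, password_advice, PASSWORD_RE) are invented here, and the pattern assumes any character is permitted so long as the five requirements are met, which is looser than the character-class in the OP's pattern.

```python
import re

# The rules, stated once (illustrative names and values).
MIN_LENGTH = 8
SPECIALS = "_@$"


def password_advice() -> str:
    """Build the text shown to the user BEFORE a password is offered."""
    lines = [
        f"Minimum {MIN_LENGTH} characters.",
        "At least one lower-case letter [a-z].",
        "At least one upper-case letter [A-Z].",
        "At least one digit [0-9].",
        "At least one of these characters: " + " ".join(SPECIALS),
    ]
    return "\n".join(lines)


# Build the pattern from the same values; re.escape() protects
# characters such as "$" that are special inside a RegEx.
PASSWORD_RE = re.compile(
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])"
    + "(?=.*[" + re.escape(SPECIALS) + "])"
    + ".{" + str(MIN_LENGTH) + ",}"
)


def validate_password(attempt: str) -> bool:
    """True if the whole attempt satisfies the generated pattern."""
    return PASSWORD_RE.fullmatch(attempt) is not None
```

Changing MIN_LENGTH or SPECIALS now alters the advice and the check together, so the two cannot drift apart.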

Second UX-consideration (and it's a 'biggie'!): if a password 'fails', 
how can we take the 'result' from a large and complex RegEx, and explain 
to the user which [multiple] of the five requirements was/were not met? 
A failure in the RegEx above tells the system not to proceed, but 
doesn't tell the user whether a letter is missing, a digit, ...
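
With the checks kept separate, reporting becomes straightforward. A minimal sketch, assuming each rule is held as a (description, check) pair; the names RULES and failed_rules are hypothetical, as are the rule values:

```python
import string

# Illustrative rule values.
MIN_LENGTH = 8
SPECIALS = {"_", "@", "$"}

# Each rule pairs the text shown to the user with the test that enforces it.
RULES = [
    (f"Minimum {MIN_LENGTH} characters.",
     lambda s: len(s) >= MIN_LENGTH),
    ("At least one lower-case letter [a-z].",
     lambda s: any(ch in string.ascii_lowercase for ch in s)),
    ("At least one upper-case letter [A-Z].",
     lambda s: any(ch in string.ascii_uppercase for ch in s)),
    ("At least one digit [0-9].",
     lambda s: any(ch in string.digits for ch in s)),
    ("At least one of: " + " ".join(sorted(SPECIALS)),
     lambda s: any(ch in SPECIALS for ch in s)),
]


def failed_rules(attempt: str) -> list:
    """Return the description of every rule the attempt does not meet."""
    return [description for description, check in RULES if not check(attempt)]


def validate_password(attempt: str) -> bool:
    """Acceptable only if no rule has failed."""
    return not failed_rules(attempt)
```

For the attempt "password", failed_rules() returns the upper-case, digit and special-character descriptions, which can be shown to the user as-is.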

RegEx is extremely powerful, but 'power' is seductive - just because we 
can do something doesn't make it a good idea! The Spiderman rule applies...
-- 
Regards =dn