Template language for random string generation

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Aug 10 12:31:01 EDT 2014


Devin Jeanpierre wrote:

> On Fri, Aug 8, 2014 at 2:01 AM, Paul Wolf <paulwolf333 at gmail.com> wrote:
>> This is a proposal with a working implementation for a random string
>> generation template syntax for Python. `strgen` is a module for
>> generating random strings in Python using a regex-like template language.
>> Example:
>>
>>     >>> from strgen import StringGenerator as SG
>>     >>> SG("[\l\d]{8:15}&[\d]&[\p]").render()
>>     u'F0vghTjKalf4^mGLk'
> 
> Why aren't you using regular expressions? I am all for conciseness,
> but using an existing format is so helpful...

You've just answered your own question:

> Unfortunately, the equivalent regexp probably looks like
> r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8,15}'

Apart from being needlessly verbose, regex syntax is not appropriate because
it specifies too much, specifies too little, and specifies the wrong
things. It specifies too much: anchors like ^ and $ are meaningless in this
case. It specifies too little: there's no regex equivalent of the "shuffle
operator". And it specifies the wrong things: lookahead assertions like
(?=...), as used in your example, are for matching strings, not generating
them, and it isn't clear what "check that this pattern matches here, but
don't consume any of the string" means when generating strings.
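To illustrate (a sketch of my own, not something from the thread): the only
natural role for such a regex in a generator is as a filter, producing
candidates some other way and throwing away those that don't match. Below
the quantifier is spelled {8,15}, which is what Python's re module expects
({8:15} is not a valid quantifier and is treated as literal text). The
function name generate_and_test and the retry limit are my own choices.

```python
import random
import re
import string

# Devin's pattern, with the length quantifier spelled {8,15}.
PATTERN = re.compile(r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8,15}')
ALPHABET = string.ascii_letters + string.digits
_rng = random.SystemRandom()


def generate_and_test(max_tries=1000):
    """Draw random alphanumeric strings until one satisfies PATTERN.

    The regex only checks finished candidates; it plays no part in
    producing them.
    """
    for _ in range(max_tries):
        length = _rng.randint(8, 15)  # inclusive at both ends
        candidate = ''.join(_rng.choice(ALPHABET) for _ in range(length))
        if PATTERN.fullmatch(candidate):
            return candidate
    raise RuntimeError('no matching candidate in %d tries' % max_tries)
```

The lookaheads only do any work once a candidate already exists, which is
the point: a regex describes a set of strings to test against, not a
procedure for producing one.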

Personally, I think even the OP's specified language is too complex. For
example, it supports literal text, but given the use-case (password
generators) do we really want to support templates like "password[\d]"? I
don't think so, and if somebody did, they can trivially write "password" +
SG('[\d]').render().
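In plain Python that workaround is just concatenation. A minimal sketch,
with random.SystemRandom standing in for strgen so that it runs without the
module installed (the function name is hypothetical):

```python
import random
import string

_rng = random.SystemRandom()


def literal_plus_digit(prefix='password'):
    # The literal part is ordinary string concatenation; only the
    # trailing digit comes from the random generator.
    return prefix + _rng.choice(string.digits)
```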

Larry Wall (the creator of Perl) has stated that one of the mistakes with
Perl's regular expression mini-language is that the Huffman coding is
wrong. Common things should be short, uncommon things can afford to be
longer. Since the most common thing for password generation is to specify
character classes, they should be short, e.g. d rather than [\d] (one
character versus four).

The template given could potentially be simplified to:

"(LD){8:15}&D&P"

where the round brackets () are purely used for grouping. Character codes
are specified by a single letter. (I use uppercase to avoid the problem
that l & 1 look very similar. YMMV.) The model here is custom format codes
from spreadsheets, which should be comfortable to anyone who is familiar
with Excel or OpenOffice. If you insist on having the facility to include
literal text in your templates, might I suggest:

"'password'D"  # Literal string "password", followed by a single digit.

but personally I believe that for the use-case given, that's a mistake.
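To show that the simplified syntax is easy to implement, here is a rough
interpreter for it. This is my own sketch, not strgen's code, and it makes
several assumptions: L, D and P map to ASCII letters, digits and
string.punctuation; a bracketed group such as (LD) draws each character
from the union of its classes, like [\l\d] in the OP's syntax; {m:n} picks
a length uniformly between m and n inclusive; quoted text is copied
literally; and & binds most loosely, rendering each piece and then
shuffling all the characters together, which is how I read the OP's
shuffle operator.

```python
import random
import string

# Hypothetical single-letter codes (uppercase, as suggested above).
CLASSES = {
    'L': string.ascii_letters,
    'D': string.digits,
    'P': string.punctuation,
}

_rng = random.SystemRandom()


def _parse_count(seq, i):
    """Parse an optional {n} or {m:n} quantifier starting at index i.

    Returns ((low, high), new_index); (1, 1) if there is no quantifier.
    """
    if i >= len(seq) or seq[i] != '{':
        return (1, 1), i
    end = seq.index('}', i)
    body = seq[i + 1:end]
    if ':' in body:
        low, high = (int(part) for part in body.split(':'))
    else:
        low = high = int(body)
    return (low, high), end + 1


def _render_sequence(seq):
    """Render one &-free piece: literals, codes and groups, in order."""
    out = []
    i = 0
    while i < len(seq):
        ch = seq[i]
        if ch == "'":
            # Quoted literal text is copied through unchanged.
            end = seq.index("'", i + 1)
            out.append(seq[i + 1:end])
            i = end + 1
            continue
        if ch == '(':
            # A group pools the character sets of all its codes.
            end = seq.index(')', i)
            pool = ''.join(CLASSES[c] for c in seq[i + 1:end])
            i = end + 1
        elif ch in CLASSES:
            pool = CLASSES[ch]
            i += 1
        else:
            raise ValueError('unexpected character %r' % ch)
        (low, high), i = _parse_count(seq, i)
        count = _rng.randint(low, high)
        out.append(''.join(_rng.choice(pool) for _ in range(count)))
    return ''.join(out)


def render(template):
    """Render a template; '&' joins the pieces and shuffles the result.

    Splitting on '&' is naive: a literal containing '&' is not supported.
    """
    pieces = template.split('&')
    if len(pieces) == 1:
        return _render_sequence(template)
    chars = list(''.join(_render_sequence(p) for p in pieces))
    _rng.shuffle(chars)
    return ''.join(chars)
```

With this sketch, render("(LD){8:15}&D&P") yields 10 to 17 characters
including at least one digit and one punctuation character, and
render("'password'D") yields "password" followed by a single digit.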

Alternatively, date/time templates use two-character codes like %Y %m etc,
which is better than 



> (I've been working on this kind of thing with regexps, but it's still
> incomplete.)
> 
>> * Uses SystemRandom class (if available, or falls back to Random)
> 
> This sounds cryptographically weak. Isn't the normal thing to do to
> use a cryptographic hash function to generate a pseudorandom sequence?

I don't think that using a good, but not cryptographically-strong, random
number generator to generate passwords is a serious vulnerability. What's
your threat model? Attacks on passwords tend to fall into a few categories:

- dictionary attacks (including tables of common passwords and 
  simple transformations of words, e.g. 'pas5w0rd');

- brute force against short and weak passwords;

- attacking the hash function used to store passwords (not the password
  itself), e.g. rainbow tables;

- keyloggers or some other way of stealing the password (including
  phishing sites and the ever-popular "beat them with a lead pipe 
  until they give up the password");

- other social attacks, e.g. guessing that the person's password is their
  date of birth in reverse.

But unless the random number generator is *ridiculously* weak ("9, 9, 9, 9,
9, 9, ...") I can't see any way to realistically attack the password
generator based on the weakness of the random number generator. Perhaps I'm
missing something?
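To put rough numbers on the brute-force case: a password of k characters
drawn uniformly and independently from an alphabet of N symbols has
k * log2(N) bits of entropy. A deterministic generator seeded with s bits
can produce at most 2**s distinct outputs, so s is an upper bound on the
search space however long the password is; that is the sense in which only
a very weak generator would matter. A small sketch of both (the function
names are mine):

```python
import math


def entropy_bits(alphabet_size, length):
    """Bits of entropy for `length` symbols drawn uniformly and
    independently from `alphabet_size` symbols: length * log2(N)."""
    return length * math.log2(alphabet_size)


def effective_bits(alphabet_size, length, seed_bits):
    """Upper bound, in bits, on the search space an attacker faces if a
    deterministic generator was seeded with `seed_bits` of entropy: it
    can emit at most 2**seed_bits distinct passwords."""
    return min(entropy_bits(alphabet_size, length), seed_bits)
```

For example, entropy_bits(62, 8) is about 47.6 bits for 8 alphanumeric
characters, while effective_bits(62, 15, 32) is capped at 32 bits by a
32-bit seed.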


> Someone should write a cryptographically secure pseudorandom number
> generator library for Python. :(

Here, let me google that for you :-)

https://duckduckgo.com/html/?q=python+crypto
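For what it's worth, the "SystemRandom if available, else Random" behaviour
from the quoted feature list needs only the standard library. The sketch
below is my guess at the shape, not strgen's code; it relies on
os.urandom() raising NotImplementedError when no OS randomness source
exists, as the os module documents. (Since Python 3.6 the standard secrets
module, e.g. secrets.choice, wraps the OS generator directly for this
purpose.)

```python
import random
import string


def get_rng():
    """Prefer the OS entropy source; fall back to the Mersenne Twister."""
    rng = random.SystemRandom()
    try:
        rng.random()  # SystemRandom draws from os.urandom()
    except NotImplementedError:
        rng = random.Random()  # not suitable for secrets; last resort only
    return rng


def random_password(length=12, alphabet=string.ascii_letters + string.digits):
    """Join `length` characters chosen independently from `alphabet`."""
    rng = get_rng()
    return ''.join(rng.choice(alphabet) for _ in range(length))
```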



-- 
Steven




More information about the Python-list mailing list