Template language for random string generation

Paul Wolf paulwolf333 at gmail.com
Sun Aug 10 12:34:52 EDT 2014


On Sunday, 10 August 2014 13:43:04 UTC+1, Devin Jeanpierre  wrote:
> On Fri, Aug 8, 2014 at 2:01 AM, Paul Wolf <paulwolf333 at gmail.com> wrote:
> > This is a proposal with a working implementation for a random string generation template syntax for Python. `strgen` is a module for generating random strings in Python using a regex-like template language. Example:
> >
> >     >>> from strgen import StringGenerator as SG
> >     >>> SG("[\l\d]{8:15}&[\d]&[\p]").render()
> >     u'F0vghTjKalf4^mGLk'
>
> Why aren't you using regular expressions? I am all for conciseness,
> but using an existing format is so helpful...
>
> Unfortunately, the equivalent regexp probably looks like
> r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8:15}'
>
> (I've been working on this kind of thing with regexps, but it's still
> incomplete.)
>
> > * Uses SystemRandom class (if available, or falls back to Random)
>
> This sounds cryptographically weak. Isn't the normal thing to do to
> use a cryptographic hash function to generate a pseudorandom sequence?
>
> Someone should write a cryptographically secure pseudorandom number
> generator library for Python. :(
>
> (I think OpenSSL comes with one, but then you can't choose the seed.)
>
> -- Devin

> Why aren't you using regular expressions?

I guess you answered your own question with your example: 

* No one will want to write that expression
* The regular expression doesn't work anyway
* The purpose of regex is just too different from the purpose of strgen
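
One possible reading of the second point, sketched with Python's `re` module: the `{8:15}` in the quoted pattern is not a valid quantifier, so `re` treats it as literal text. The `intended` pattern below is my assumption about what was meant (`{8,15}`); it is not taken from the original message. Even the corrected pattern only validates a string and says nothing about punctuation.

```python
import re

# The pattern exactly as quoted above, with ':' inside the braces.
quoted = r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8:15}'
# Assumed intent: the same pattern with a valid {min,max} quantifier.
intended = r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8,15}'

candidate = "Abcdefg1"  # 8 characters with an upper, a lower and a digit

# '{8:15}' is not a quantifier, so the quoted pattern rejects the candidate.
quoted_matches = re.fullmatch(quoted, candidate) is not None
# With a real quantifier the same candidate is accepted.
intended_matches = re.fullmatch(intended, candidate) is not None
# The quoted pattern does match a string containing the literal braces.
literal_matches = re.search(quoted, "1{8:15}Aa") is not None

print(quoted_matches, intended_matches, literal_matches)
```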

The purpose of strgen is to make life easier for developers and provide benefits that get pushed downstream (to users of the software that gets produced with it). Adopting a syntax similar to regex is only necessary or useful to the extent it achieves that. 

I should also clarify that when I say the strgen template language is the converse of regular expressions, this is the case conceptually, not formally. Matching text strings is fundamentally different from producing randomized strings. For instance, a template language that validates the output would have to do frequency analysis. But that is getting too far off the purpose of strgen, although such a mechanism would certainly have its place. 

> This sounds cryptographically weak.

Whether using SystemRandom is cryptographically weak is not something I'm taking up here. Someone already suggested allowing the class to accept a different random source provider. That's an excellent idea. I wanted to make sure strgen does whatever developers would do anyway when hand-coding with the Python Standard Library, except that it is vastly more flexible, easier to edit and shorter. strgen is two things: a proposed standard way of expressing a string generation specification that relies heavily on randomness, and a wrapper around the standard library. I specifically did not want to try to write better cryptographic routines.
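
For comparison, here is a rough sketch of the hand-coded standard-library version that a template like `[\l\d]{8:15}&[\d]&[\p]` is meant to replace. It is not strgen's implementation. The function name `hand_coded_password` and the optional `rng` parameter are hypothetical, and I am assuming that `\l`, `\d` and `\p` mean letters, digits and punctuation and that `&` means "append, then shuffle".

```python
import random
import string


def hand_coded_password(rng=None):
    """Return 8 to 15 letters/digits plus one digit and one punctuation mark, shuffled."""
    if rng is None:
        try:
            rng = random.SystemRandom()
            rng.random()  # raises NotImplementedError if the OS has no random source
        except NotImplementedError:
            rng = random.Random()  # fallback: not suitable for secrets

    length = rng.randint(8, 15)  # inclusive on both ends
    alphabet = string.ascii_letters + string.digits
    chars = [rng.choice(alphabet) for _ in range(length)]
    chars.append(rng.choice(string.digits))       # guarantee one digit
    chars.append(rng.choice(string.punctuation))  # guarantee one punctuation mark
    rng.shuffle(chars)  # mix the guaranteed characters into the rest
    return "".join(chars)


print(hand_coded_password())
# A caller can pass any object with randint, choice and shuffle as the random source:
print(hand_coded_password(rng=random.Random(42)))
```

Passing the random source in as an argument is one way the "different random source provider" suggestion could look; the actual interface would be up to the library.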


