Template language for random string generation

Paul Wolf paulwolf333 at gmail.com
Fri Aug 8 05:01:47 EDT 2014


This is a proposal, with a working implementation, for a template syntax for generating random strings in Python. `strgen` is a module that generates random strings from a regex-like template language. Example:

    >>> from strgen import StringGenerator as SG
    >>> SG("[\l\d]{8:15}&[\d]&[\p]").render()
    u'F0vghTjKalf4^mGLk'

The template ([\l\d]{8:15}&[\d]&[\p]) generates 8 to 15 letters and digits, then mixes in one more digit and one punctuation character, so the result is 10 to 17 characters long (the example above is 17). It is guaranteed to have at least one digit (maybe more) and exactly one punctuation character.

If you look at various forums, like Stack Overflow, for how to generate random strings with Python, especially for passwords and other hopefully secure tokens, you will see dozens of variations of this:

   >>> import random
   >>> import string
   >>> mypassword = ''.join(random.choice(string.ascii_uppercase + string.digits) for x in range(10))

There is nothing wrong with this (it's the right answer and is very fast), but approaches like it have recurring problems:

* They often use cryptographically weak methods
* They do not guarantee a result that includes each of the different classes of characters
* They do not provide variable-length or minimum-length output
* They take a lot of typing, and the resulting code is vastly different each time, making it hard to understand what features were implemented, especially for those new to the language
* They can be extended to include whatever requirements you want, but doing so is a constant exercise in wheel reinvention that is extremely verbose, error-prone and confusing, for exactly the same purposes each time
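To make the last point concrete, here is a rough sketch of what the one-liner tends to grow into once it has to cover the same requirements as the template above (8 to 15 letters and digits, plus one digit, plus one punctuation character). The function name make_password and its parameters are made up for illustration, and I am assuming \l, \d and \p correspond to string.ascii_letters, string.digits and string.punctuation:

```python
import random
import string

# SystemRandom draws from the operating system's entropy source
# instead of the default Mersenne Twister generator.
rng = random.SystemRandom()

def make_password(min_len=8, max_len=15):
    # Hypothetical helper: a hand-written approximation of the template,
    # not strgen's implementation.
    # Variable-length body of letters and digits (randint is inclusive).
    body_len = rng.randint(min_len, max_len)
    chars = [rng.choice(string.ascii_letters + string.digits)
             for _ in range(body_len)]
    # Guarantee at least one digit and exactly one punctuation character.
    chars.append(rng.choice(string.digits))
    chars.append(rng.choice(string.punctuation))
    # Shuffle so the guaranteed characters are not always at the end.
    rng.shuffle(chars)
    return "".join(chars)

print(make_password())
```

That is a dozen lines for what the template expresses in one short string, and every new requirement means more hand-written logic.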

This application (generation of random strings for passwords, vouchers, secure ids, test data, etc.) is so general that it seems to beg for a general solution. So, why not have a standard way of expressing these requirements using a simple template language?

strgen: 

* Is far less verbose than commonly offered solutions
* Lets you incorporate additional important features (variable length, minimum length, additional character classes, etc.) with trivial edits to the pattern
* Uses a pattern language superficially similar to regular expressions, so it's easy to learn
* Uses the SystemRandom class (if available; otherwise it falls back to Random)
* Supports Python > 2.6 through 3.3
* Supports Unicode
* Uses a parse tree, so you can write complex, nested expressions for tricky data generation tasks, especially test data generation
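For anyone unfamiliar with the SystemRandom fallback mentioned in the list, the general idea looks something like the sketch below. This uses only the standard library and is an illustration of the pattern, not a copy of the code in strgen; the name rng is arbitrary:

```python
import random

try:
    rng = random.SystemRandom()
    # Constructing SystemRandom always succeeds; probe it once, because
    # the call raises NotImplementedError on platforms without an OS
    # entropy source.
    rng.random()
except NotImplementedError:
    # Fall back to the default (not cryptographically secure) generator.
    rng = random.Random()

# Either way, rng offers the same interface (choice, randint, shuffle, ...).
print(rng.choice("abcdef"))
```

Because SystemRandom is a subclass of Random, the rest of the code does not need to know which one it got.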

In my opinion, it would make using Python for this application much easier and more consistent for very common requirements. The template language could easily be a cross-language standard like regex.  

You can `pip install strgen`. 

It's on Github: https://github.com/paul-wolf/strgen



