need simple parsing ability

Christopher T King squirrel at WPI.EDU
Fri Jul 16 12:04:15 EDT 2004


On Fri, 16 Jul 2004, george young wrote:

> I need to read user input of a subset of these.  The user will type a
> set of names separated by commas (with optional white space), but there
> may also be sequences indicated by a dash between two integers, e.g.: 
> 
>    "9-11"       meaning 9,10,11
>    "foo_11-13"  meaning foo_11, foo_12, and foo_13.
>    "foo_9-11"   meaning foo_9,foo_10,foo_11, or 
>    "bar09-11"   meaning bar09,bar10,bar11
> 
> (Yes, I have to deal with integers with and without leading zeros)
> [I'll proclaim inverse sequences like "foo_11-9" invalid]
> So a sample input might be:
> 
>    9,foo7-9,2-4,xxx   meaning 9,foo7,foo8,foo9,2,3,4,xxx
> 
> The order of the resultant list of names is not important; I have
> to sort them later anyway.

The following should do the trick, using nothing more than the standard
re module:

---

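import re

def expand(pattern):
    # Look for a trailing "start-end" range of integers, e.g. "7-9" in "foo7-9".
    r = re.search(r'\d+-\d+$', pattern)
    if r is None:
        # No range: the name stands for itself.
        yield pattern
        return
    s, e = r.group().split('-')
    prefix = pattern[:r.start()]
    for n in range(int(s), int(e) + 1):
        # Pad to the width of the start number so that "bar09-11" gives
        # bar09, bar10, bar11 rather than bar9, bar10, bar11.
        yield prefix + str(n).zfill(len(s))

def expand_list(pattern_list):
    # Split on commas and flatten the expansion of every piece.
    return [w for pattern in pattern_list.split(',')
              for w in expand(pattern)]

print(expand_list('9,foo7-9,2-4,xxx'))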

---
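The question says inverse sequences such as "foo_11-9" should be invalid, but
expand() simply produces no names for them, because the range is empty. Here is
a minimal sketch of a checking variant; the names expand_checked() and
expand_list_checked() are hypothetical, and raising ValueError is just one
reasonable way to report the error. It returns plain lists rather than using a
generator, so the error surfaces as soon as a pattern is expanded, and it pads
each number to the width of the start number so "bar09-11" keeps its leading
zero.

```python
import re

# Same trailing "start-end" rule as expand().
RANGE_RE = re.compile(r'\d+-\d+$')

def expand_checked(pattern):
    # Hypothetical helper: like expand(), but rejects inverse ranges.
    m = RANGE_RE.search(pattern)
    if m is None:
        return [pattern]
    start, end = m.group().split('-')
    if int(start) > int(end):
        raise ValueError('inverse range in %r' % pattern)
    prefix = pattern[:m.start()]
    # Pad to the width of the start number to preserve leading zeros.
    return [prefix + str(n).zfill(len(start))
            for n in range(int(start), int(end) + 1)]

def expand_list_checked(pattern_list):
    # Hypothetical helper: expand every comma-separated piece into one list.
    names = []
    for pattern in pattern_list.split(','):
        names.extend(expand_checked(pattern))
    return names

print(expand_list_checked('9,foo7-9,2-4,xxx,bar09-11'))
# prints ['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', 'bar09', 'bar10', 'bar11']
```

Calling expand_checked('foo_11-9') raises ValueError instead of silently
returning nothing.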

If you want to make the syntax a little more lenient, replace
"pattern_list.split(',')" in expand_list() with
"re.split(r'\s*,\s*', pattern_list)".  This allows whitespace on either side
of each comma (though not at the very start or end of the input).
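A minimal sketch of that lenient split, wrapped in a hypothetical helper
split_names(). It additionally strips whitespace from the whole input and, as an
assumption about what the user wants, drops empty entries such as the one a
trailing comma would produce.

```python
import re

def split_names(pattern_list):
    # Hypothetical helper. strip() removes whitespace at the very start and
    # end of the input; re.split then allows whitespace around each comma.
    pieces = re.split(r'\s*,\s*', pattern_list.strip())
    # Assumption: empty entries (e.g. from "a,b,") should be ignored.
    return [p for p in pieces if p]

print(split_names('  9 , foo7-9,2-4 ,  xxx '))
# prints ['9', 'foo7-9', '2-4', 'xxx']
```

Each returned piece can then be passed to expand() exactly as before.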

Note that because this uses generators, it won't work as-is on Pythons prior
to 2.3.  On 2.2, adding "from __future__ import generators" at the top of the
module enables them.

Hope this helps!




More information about the Python-list mailing list