need simple parsing ability

Fri Jul 16 16:05:44 EDT 2004

Another fix, to handle leading zeros.

ns = '9,2-4,xxx,5, bar, foo_6-11,x07-9'

# list of plain, clean names
ns = [n.strip() for n in ns.split(',')]
# list of names and expanded names
fs = []
for n in ns:
    r = n.split('-')
    if len(r) == 2:  # expand name with range
        h = r[0].rstrip('0123456789')  # header
        r[0] = r[0][len(h):]
        if not r[0] or not r[1].isdigit():  # no digits on one side
            raise ValueError('bad range: ' + n)
        if r[0][0] != '0':
            h += '%d'
        else:  # leading zeros: keep the width of the start
            w = [len(i) for i in r]
            if w[1] > w[0]:
                raise ValueError('bad range: ' + n)
            h += '%%0%dd' % max(w)  # e.g. 'x' + '%02d'
        for i in range(int(r[0], 10), 1 + int(r[1], 10)):
            fs.append(h % i)
    else:  # simple name
        fs.append(n)
# remove duplicates
fs = list(dict.fromkeys(fs))
# sort, maybe
fs.sort()

print(fs)
# ['2', '3', '4', '5', '9', 'bar', 'foo_10', 'foo_11', 'foo_6',
#  'foo_7', 'foo_8', 'foo_9', 'x07', 'x08', 'x09', 'xxx']
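The plain sort above compares names character by character, which is why 'foo_10' lands before 'foo_6'. If numeric order is wanted, one option is a "natural" sort key. This is only a sketch: `natural_key` is a hypothetical helper, not part of the code above.

```python
import re

def natural_key(name):
    """Hypothetical sort key: split 'foo_10' into ['foo_', 10, ''] so that
    runs of digits compare as integers instead of as text."""
    # With a capturing group, re.split keeps the digit runs, and they always
    # land at odd indices, so ints are only ever compared with ints.
    parts = re.split(r'(\d+)', name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

fs = ['2', '3', '4', '5', '9', 'bar', 'foo_10', 'foo_11', 'foo_6',
      'foo_7', 'foo_8', 'foo_9', 'x07', 'x08', 'x09', 'xxx']
fs.sort(key=natural_key)
print(fs)
```

With this key, foo_6 through foo_11 come out in numeric order, while the purely alphabetic names keep their usual positions.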

There is still a question about a range specification like

  foo09-123

which the code above treats as an error, because the end of the range has more digits than the zero-padded start.
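One possible answer, offered only as a sketch: treat the width of the start as a minimum and let longer numbers grow. `expand_padded` is a hypothetical helper. In `%0*d` the field width is taken from the argument tuple, and because it is a minimum width, numbers that need more digits are printed in full rather than truncated.

```python
def expand_padded(prefix, lo, hi):
    """Hypothetical helper for one possible policy: zero-pad every number
    to at least len(lo) digits, so 'foo09-123' yields foo09 ... foo99,
    then foo100 ... foo123 (wider numbers are never truncated)."""
    start, end = int(lo), int(hi)
    if start > end:
        # Rule from the original question: inverse ranges are invalid.
        raise ValueError('inverse range: %s%s-%s' % (prefix, lo, hi))
    width = len(lo)  # minimum field width taken from the start
    return ['%s%0*d' % (prefix, width, i) for i in range(start, end + 1)]

names = expand_padded('foo', '09', '123')
print(names[0], names[90], names[91], names[-1])  # foo09 foo99 foo100 foo123
```

When the start has no leading zero, the width is simply its digit count, so the same helper also covers the plain `%d` case, e.g. `expand_padded('foo_', '9', '11')` gives foo_9, foo_10, foo_11.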

/Jean Brouwers

In article <20040716145248.1d615670.gry at ll.mit.edu>, george young
<gry at ll.mit.edu> wrote:

> On Fri, 16 Jul 2004 17:10:03 GMT
> Jean Brouwers <JBrouwersAtProphICyDotCom at no.spam.net> threw this fish to the
> penguins:
> > With two fixes, one bug and one typo:
> > 
> > ns = '9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10, foo_11'
> > 
> >  # list of plain, clean names
> > ns = [n.strip() for n in ns.split(',')]
> >  # expand names with range
> > fs = []
> > for n in ns:
> >     r = n.split('-')
> >     if len(r) != 2:  # simple name
> >         fs.append(n)
> >     else: # name with range
> >         h = r[0].rstrip('0123456789')  # header
> >         for i in range(int(r[0][len(h):]), 1 + int(r[1])):
> >             fs.append(h + str(i))
> 
> Mmm, not quite.  If ns=='foo08-11', your fs==[foo8, foo9, foo10, foo11] 
> which is wrong.  It should yield  fs==[foo08, foo09, foo10, foo11].
> I.e., it must maintain leading zeros in ranges.
> 
> (I'm contracting out construction of a special circle of hell for users
> who define [foo7, foo08, foo9, foo10] -- they won't be around to complain
> that it parses wrong ;-)
> 
> > > 
> > > In article <20040716111324.09267883.gry at ll.mit.edu>, george young
> > > <gry at ll.mit.edu> wrote:
> > > 
> > > > [python 2.3.3, x86 linux]
> > > > For each run of my app, I have a known set of (<100) wafer names.
> > > > Names are sometimes simply integers, sometimes a short string, and
> > > > sometimes a short string followed by an integer, e.g.:
> > > > 
> > > >   5, 6, 7, 8, 9, bar, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11
> > > > 
> > > > I need to read user input of a subset of these.  The user will type a
> > > > set of names separated by commas (with optional white space), but there
> > > > may also be sequences indicated by a dash between two integers, e.g.: 
> > > > 
> > > >    "9-11"       meaning 9,10,11
> > > >    "foo_11-13"  meaning foo_11, foo_12, and foo_13.
> > > >    "foo_9-11"   meaning foo_9,foo_10,foo_11, or 
> > > >    "bar09-11"   meaning bar09,bar10,bar11
>           ^^^^^^^^            ^^^^^^^^^^^^^^^^^  
> > > > (Yes, I have to deal with integers with and without leading zeros)
> > > > [I'll proclaim inverse sequences like "foo_11-9" invalid]
> > > > So a sample input might be:
> > > > 
> > > >    9,foo7-9,2-4,xxx   meaning 9,foo7,foo8,foo9,2,3,4,xxx
> > > > 
> > > > The order of the resultant list of names is not important; I have
> > > > to sort them later anyway.
> > > > 
> > > > Fancy error recovery is not needed; an invalid input string will be
> > > > peremptorily wiped from the screen with an annoyed beep.
> > > > 
> > > > Can anyone suggest a clean way of doing this?  I don't mind
> > > > installing and importing some parsing package, as long as my code
> > > > using it is clear and simple.  Performance is not an issue.
> > > > 
> > > > 
> > > > -- George Young
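The standard `re` module is enough for the request above, so no extra parsing package should be needed. Below is a sketch in current Python 3. It uses `re.fullmatch`, added in 3.4, so it will not run on the 2.3.3 mentioned above. `expand_names` and `RANGE_RE` are hypothetical names. It follows the rules stated in the thread: ranges keep the start's zero padding, a padded end wider than the start is rejected as in Jean's code, and inverse ranges, empty names and stray dashes raise ValueError so the caller can beep.

```python
import re

# Hypothetical pattern: <prefix><digits>-<digits>, e.g. "foo_6-11" or "x07-9".
# The prefix may not contain '-', and because it is lazy, the start number
# is the whole digit run just before the dash.
RANGE_RE = re.compile(r'([^-]*?)(\d+)-(\d+)')

def expand_names(text):
    """Expand a comma-separated list of names and ranges into a sorted list.

    Raises ValueError on anything the rules in the thread call invalid.
    """
    names = set()  # a set drops duplicates such as "9, 7-9"
    for token in text.split(','):
        token = token.strip()
        m = RANGE_RE.fullmatch(token)
        if m is None:
            # Plain name: must be non-empty and contain no stray dash.
            if not token or '-' in token:
                raise ValueError('bad name: %r' % token)
            names.add(token)
            continue
        prefix, lo, hi = m.groups()
        start, end = int(lo), int(hi)
        if start > end:
            raise ValueError('inverse range: %r' % token)
        if lo.startswith('0') and len(hi) > len(lo):
            # Same rule as Jean's code: "foo09-123" is rejected.
            raise ValueError('bad range: %r' % token)
        # len(lo) is a minimum width; for unpadded starts this acts like %d.
        names.update('%s%0*d' % (prefix, len(lo), i)
                     for i in range(start, end + 1))
    return sorted(names)
```

For the sample input in the question, `expand_names('9,foo7-9,2-4,xxx')` gives `['2', '3', '4', '9', 'foo7', 'foo8', 'foo9', 'xxx']`. A caller can catch ValueError and wipe the input with the annoyed beep.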