Another re question

Mon Oct 23 17:17:24 EDT 2000

I have a question about or'ed (|) re's used with re.findall().

I have a mapping file that contains two names per line.  The lines
below show the patterns the names can take:

1 1979 1X4567
00031  1 4853
1S0959 1X0608

The leading zeroes need to be discarded, but the space following
the first digit (as in '1 1979') needs to be maintained. The names
are separated by one or more tabs and/or spaces.

I have re's that match most of the different mappings I need, but
I'm having trouble finding an re to match all of the second names
in the file.

(Don't worry that the \n is leading instead of trailing in this
example.)

'\d '  is to match the '1 ' case  : \d \w*  -> 1 1979
'[0]*' is to match the '000' case : [0]*\w* -> 00031

Actually, I don't need to worry about the leading zeroes in this
case but I'm leaving it in to be consistent with my other re's.

>>> import re
>>> sid2pid = "\n1 1979 1X4567\n00031  1 4853\n1S0959 1X0608\n"
>>> findpid_pat = r'\012+\d |[0]*\w*[\t, ]+([a-zA-Z0-9_ ]+)'
>>> re.findall(findpid_pat, sid2pid)
['', '1X4567', '1 4853', '1X0608']
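
A likely reading of where the '' comes from, offered as a sketch rather
than a definitive diagnosis: '|' binds more loosely than everything else in
the pattern, so the re splits into two whole alternatives, '\012+\d ' and
'[0]*\w*[\t, ]+([a-zA-Z0-9_ ]+)'. Only the second one contains the group.
The snippet below uses re.finditer() to show what each match consumed next
to what the group captured; the name "matches" is illustrative only.

```python
import re

sid2pid = "\n1 1979 1X4567\n00031  1 4853\n1S0959 1X0608\n"
findpid_pat = r'\012+\d |[0]*\w*[\t, ]+([a-zA-Z0-9_ ]+)'

# Pair the full text of each match with the contents of group 1.
# A match made by the first alternative never enters the group, so
# group(1) is None there; re.findall() reports that case as ''.
matches = [(m.group(0), m.group(1)) for m in re.finditer(findpid_pat, sid2pid)]

for whole, captured in matches:
    print(repr(whole), '->', repr(captured))
```

The first match is just the '\n1 ' prefix with no captured text; the
remaining three matches come from the second alternative and carry the
second names.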

I can just use this re and remove all of the empty strings after
I'm done, but is there a better way that won't generate them at
all?
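
One possible way to avoid the empty strings, as a minimal sketch under a
few assumptions: the optional '1 ' prefix is wrapped in a non-capturing
group, (?:...), and made optional, so every line goes through the same
single capturing group. The sketch assumes each line is preceded by a
newline, as in the example above, and it drops the comma from the separator
class on the assumption that names are only tab- or space-separated.

```python
import re

sid2pid = "\n1 1979 1X4567\n00031  1 4853\n1S0959 1X0608\n"

# (?:\d )? optionally consumes the "digit + space" prefix of the first
# name without creating a group. The rest of the pattern is shared by
# every line, so the one capturing group takes part in every match.
findpid_pat = r'\n+(?:\d )?0*\w*[\t ]+([a-zA-Z0-9_ ]+)'

pids = re.findall(findpid_pat, sid2pid)
print(pids)   # ['1X4567', '1 4853', '1X0608']
```

Because the capturing class still allows spaces, trailing blanks at the
end of a line would be captured as part of the name and may need an
rstrip().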

Thanks