Another re question

Mon Oct 23 20:43:20 EDT 2000

Kent Polk wrote:

> I have a question regarding or'ed re's with re.findall()
>
> I have a mapping file which contains two names per line.  The names
> can contain patterns which are represented by the following:
>
> 1 1979 1X4567
> 00031  1 4853
> 1S0959 1X0608
>
> The leading zeroes need to be discarded, but the space following
> the first digit needs to be maintained. The names are either tab
> and/or space-separated (can have more than one tab and/or space).
>
> I have re's that match most of the different mappings I need, but
> I'm having trouble finding an re to match all of the second names
> in the file.
>
> (Don't worry that the \n is leading instead of trailing in this
> example.)
>
> '\d ' is to match the '1 ' case  : \d \w*  -> 1 1979
> [0]*  is to match the '000' case : [0]*\w* -> 00031
>
> Actually, I don't need to worry about the leading zeroes in this
> case but I'm leaving it in to be consistent with my other re's.
>
> >>> sid2pid="\n1 1979 1X4567\n00031  1 4853\n1S0959 1X0608\n"
> >>> findpid_pat = '\012+\d |[0]*\w*[\t, ]+([a-zA-Z0-9_ ]+)'
> >>> re.findall(findpid_pat, sid2pid)
> ['', '1X4567', '1 4853', '1X0608']
>
> I can just use this re and remove all of the empty strings after
> I'm done, but is there a better way that won't generate them at
> all?
>
> Thanks
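
The empty string appears to come from operator precedence: '|' binds
more loosely than everything else in the pattern, so it is read as
'\012+\d ' OR '[0]*\w*[\t, ]+(...)'. Only the right-hand branch
contains the group, and re.findall() reports '' for a group that took
no part in a match. A small sketch to make that visible, using
re.finditer() (my choice for illustration; the name "matches" is just
a placeholder):

```python
import re

sid2pid = "\n1 1979 1X4567\n00031  1 4853\n1S0959 1X0608\n"

# The pattern from the question, written as a raw string.
# The top-level "|" splits it into two independent alternatives.
original = r'\012+\d |[0]*\w*[\t, ]+([a-zA-Z0-9_ ]+)'

# For every match, record the full matched text and what group 1 captured.
# group(1) is None when the left-hand branch matched, because that branch
# has no capture group; findall() shows the same case as ''.
matches = [(m.group(0), m.group(1)) for m in re.finditer(original, sid2pid)]

for whole, captured in matches:
    print(repr(whole), '->', repr(captured))
```

The first match is just the leading newline plus '1 ', taken by the
left-hand branch, which is where the '' in your result comes from.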

>>> findpid_pat = r'\012+0*\w.\w*[\t, ]+([\w ]+)'
>>> re.findall(findpid_pat, sid2pid)
['1X4567', '1 4853', '1X0608']
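
Note that '\w.' accepts any character in the second position, which
is a bit more permissive than your '\d ' case. If you would rather
keep the '\d ' and '[0]*' cases spelled out, one option is to wrap
just those two in a non-capturing group, (?:...), so the '|' no longer
splits the whole pattern. This is only a sketch checked against the
sample data above; the name "grouped_pat" is mine:

```python
import re

sid2pid = "\n1 1979 1X4567\n00031  1 4853\n1S0959 1X0608\n"

# (?:\d |[0]*) limits the alternation to the start of the first name:
# either "digit + space" (the '1 1979' case) or optional leading zeroes.
# The rest of the pattern, including the capture group, applies to both.
grouped_pat = r'\012+(?:\d |[0]*)\w*[\t, ]+([a-zA-Z0-9_ ]+)'

result = re.findall(grouped_pat, sid2pid)
print(result)
```

Since every match now passes through the capture group, findall()
has no empty strings to report for this input.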

HTH

--
Stephen Kloder               |   "I say what it occurs to me to say.
stephenk at cc.gatech.edu       |      More I cannot say."
Phone 404-874-6584           |   -- The Man in the Shack
ICQ #65153895                |            be :- think.