Another re question

Kent Polk kent at tiamat.goathill.org
Tue Oct 24 14:05:37 EDT 2000


On Mon, 23 Oct 2000 20:43:20 -0400, Stephen Kloder wrote:

---------
 >>> findpid_pat = r'\012+0*\w.\w*[\t, ]+([\w_ ]+)'
 >>> re.findall(findpid_pat,sid2pid)
 ['1X4567', '1 4853', '1X0608']
---------

Thanks. It works great in the cases I provided.  Unfortunately, I
forgot about one case - where the first name can be blank (spaces).

 >>> sid2pid="\n1 1979 1X4567\n00031  1 4853\n1S0959 1X0608\n       3S4267\n"
 >>> print(sid2pid)
 
 1 1979 1X4567
 00031  1 4853
 1S0959 1X0608
        3S4267
 
 >>> findpid_pat = r'\012+0*\w.\w*[\t, ]+([\w_ ]+)'
 >>> re.findall(findpid_pat,sid2pid)
 ['1X4567', '1 4853', '1X0608']

which misses my (new) last case.

I was using the logical or because I couldn't figure out how to
specify them together.  Using your example to clean my stuff up
results in:

 >>> findpid_pat = r'\012+\d |0*\w*[\t, ]+([\w ]+)'
 >>> re.findall(findpid_pat, sid2pid)
 ['', '1X4567', '1 4853', '1X0608', '3S4267']

I don't understand how an empty string matches in this last
case. Separately they are:

 >>> findpid_pat = r'\012+\d *\w*[\t, ]+([\w ]+)'
 >>> re.findall(findpid_pat, sid2pid)
 ['1X4567', '1 4853', '1X0608']

and 

 >>> findpid_pat = r'\012+0*\w*[\t, ]+([\w ]+)'
 >>> re.findall(findpid_pat, sid2pid)
 ['1979 1X4567', '1 4853', '1X0608', '3S4267']
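
A sketch of where the empty string seems to come from: `|` binds more
loosely than anything else in the pattern, so the combined pattern reads
as "`\012+\d `" OR "`0*\w*[\t, ]+([\w ]+)`", and the capturing group
exists only in the second branch. When the first branch matches,
findall still reports group 1, which did not take part and comes back
empty. The helper name explain_matches and the grouped pattern at the
end are illustrative assumptions, not something from the earlier reply;
the grouped version assumes the intent is "one prefix or the other,
then the shared tail".

```python
import re

sid2pid = "\n1 1979 1X4567\n00031  1 4853\n1S0959 1X0608\n       3S4267\n"

# The combined pattern from above: the alternation splits the WHOLE pattern.
findpid_pat = r'\012+\d |0*\w*[\t, ]+([\w ]+)'


def explain_matches(pattern, text):
    """Hypothetical helper: pair each whole match with its group 1.

    group(1) is None when the branch that matched does not contain the
    group; re.findall reports that case as ''.
    """
    return [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]


pairs = explain_matches(findpid_pat, sid2pid)
for whole, captured in pairs:
    print(repr(whole), '->', repr(captured))

# Assumed fix: confine the alternation to the prefix with a non-capturing
# group, so both branches share the [\t, ]+([\w ]+) tail.
grouped_pat = r'\012+(?:\d |0*)\w*[\t, ]+([\w ]+)'
pids = re.findall(grouped_pat, sid2pid)
print(pids)
```

The first pair printed is the match on "\n1 " from the first branch,
with nothing captured, which is the '' in the findall result.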

Thanks Much!



