Simple Regular Expression Needed...

John Hunter jdhunter at ace.bsd.uchicago.edu
Tue Oct 8 17:51:25 EDT 2002


>>>>> "Yin" == Yin  <yin_12180 at yahoo.com> writes:

    Yin> I am parsing the following html strings: <input>YOUNG J
    Yin> STRONG 309 1356 1994 <input>FELDMAN DJ WEAK 15 297 1962

    Yin> I would like a regular expression to return the following for
    Yin> matches: ('YOUNG J STRONG 309 1356 1994', 'FELDMAN DJ WEAK 15
    Yin> 297 1962')

    Yin> CONSTRAINTS: 1. newlines can't be used in the expression.
    Yin> 2. spacing between numbers and text may not be preserved.

    Yin> The actual problem is a little more complex, but the spirit
    Yin> of it I think is preserved in this example.  Any thoughts?

import re
s = """
<input>YOUNG J      STRONG   309   1356   1994
<input>FELDMAN DJ   WEAK  15    297    1962
Some other stuff
"""
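# Raw string, so the backslash escapes reach the regex engine untouched.
# \s+ between fields tolerates any amount of spacing.
rgx = re.compile(r'<input>([A-Z]+\s+[A-Z]+\s+(?:STRONG|WEAK)\s+\d+\s+\d+\s+\d+)\s*$')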


# Test one line at a time; match() anchors at the start of each line.
for line in s.split('\n'):
    m = rgx.match(line)
    if m:
        print(m.group(1))
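
If you want the results as a tuple in one go, with each run of spaces collapsed as in the output you asked for, a findall-based variant might look like the sketch below. It assumes the whole text sits in one string. The names `text`, `pattern` and `records`, and the choice of re.MULTILINE with [ \t]+ separators, are illustrative assumptions rather than anything your actual data requires.

```python
import re

text = """
<input>YOUNG J      STRONG   309   1356   1994
<input>FELDMAN DJ   WEAK  15    297    1962
Some other stuff
"""

# [ \t]+ rather than \s+, so a field separator can never span a line break.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
pattern = re.compile(
    r'^<input>([A-Z]+[ \t]+[A-Z]+[ \t]+(?:STRONG|WEAK)(?:[ \t]+\d+){3})[ \t]*$',
    re.MULTILINE,
)

# findall() returns the single captured group for each matching line;
# split() followed by join() collapses each run of whitespace to one space.
records = tuple(' '.join(rec.split()) for rec in pattern.findall(text))
print(records)
```

The pattern contains no literal newline, and lines that do not fit the record shape (such as "Some other stuff") are simply skipped.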





