Using re to get data from text file

Jocknerd jocknerd1 at yahoo.com
Fri Sep 10 10:29:27 EDT 2004


I'm a Python newbie and I'm having trouble with Regular Expressions when
reading in a text file.  Here is a sample layout of the input file:

09/04/2004  Virginia              44   Temple               14
09/04/2004  LSU                   22   Oregon State         21
09/09/2004  Troy State            24   Missouri             14

As you can see, the text file contains a list of games.  Each game has a
date, a winning team, the winning team's score, the losing team, and the
losing team's score.  If I set up my program to import the data with a
fixed-length format, it's no problem.  But some of my text files have
different layouts.  For instance, some only have one space between a team
name and its score.

Here's how I read in the file using fixed length fields:

import sys

filename = sys.argv[1]
file = open(filename, 'r')

schedule = []     # make a list called schedule

while True:
    line = file.readline()
    if not line: break
    game = {}     # make a dictionary called game
    game['date']   = line[0:10]   # fixed length field
    game['team1']  = line[12:40].strip()   # strip the padding spaces
    game['score1'] = line[40:42]
    game['team2']  = line[44:72].strip()
    game['score2'] = line[72:74]
    schedule.append(game)

file.close()

Note:  I'm stripping whitespace from the team names because I don't want
the team name to actually be a fixed length.

How would I set this up to read in the data using Regular expressions?

I've tried this:

while True:
    line = file.readline ()
    if not line: break
    game = {}
    datePattern = re.compile(r'^(\d{2})\D+(\d{2})\D+(\d{4})')   # raw string keeps \d intact

Here's where I get stuck.  What do I do from here?  I just don't know how
to import the text and assign it to the proper fields using the re module.
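
For illustration, here is a minimal sketch of one way this might be done,
not a definitive answer.  It assumes every line holds a date, a team name,
a score, a second team name, and a second score, separated by one or more
whitespace characters, and that no team name ends in a standalone number
(such a name would make the line ambiguous).  The names GAME_PATTERN,
parse_game, and parse_schedule are hypothetical, and converting the scores
to integers is a choice made for this sketch; the fixed-length version
keeps them as strings.

```python
import re

# Hypothetical pattern: date, team, score, team, score, each separated by
# one or more whitespace characters.  The team groups are non-greedy, so
# each one stops at the first run of whitespace that is followed by a score.
GAME_PATTERN = re.compile(
    r'^(?P<date>\d{2}/\d{2}/\d{4})\s+'
    r'(?P<team1>.+?)\s+(?P<score1>\d+)\s+'
    r'(?P<team2>.+?)\s+(?P<score2>\d+)\s*$'
)

def parse_game(line):
    """Return a dictionary for one game line, or None if it does not match."""
    match = GAME_PATTERN.match(line)
    if match is None:
        return None
    game = match.groupdict()            # keys: date, team1, score1, team2, score2
    game['score1'] = int(game['score1'])
    game['score2'] = int(game['score2'])
    return game

def parse_schedule(lines):
    """Build the schedule list, skipping lines that do not look like games."""
    schedule = []
    for line in lines:
        game = parse_game(line)
        if game is not None:
            schedule.append(game)
    return schedule

sample = [
    "09/04/2004  Virginia              44   Temple               14\n",
    "09/04/2004 LSU 22 Oregon State 21\n",    # single-space layout
    "\n",                                     # blank line, skipped
]
for game in parse_schedule(sample):
    print(game)
```

Because a file object yields its lines when iterated, parse_schedule could
also be handed the open file directly in place of the sample list.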



