Using re to get data from text file

John Lenton john at grulic.org.ar
Fri Sep 10 13:47:27 EDT 2004


On Fri, Sep 10, 2004 at 10:29:27AM -0400, Jocknerd wrote:
> I'm a Python newbie and I'm having trouble with Regular Expressions when
> reading in a text file.  Here is a sample layout of the input file:
> 
> 09/04/2004  Virginia              44   Temple               14
> 09/04/2004  LSU                   22   Oregon State         21
> 09/09/2004  Troy State            24   Missouri             14
> 
> As you can see, the text file contains a list of games.  Each game has a
> date, a winning team, the winning team's score, the losing team, and the
> losing team's score.  If I set up my program to import the data with a
> fixed-length format, it's no problem.  But some of my text files have
> different layouts.  For instance, some have only one space between a team
> name and its score.
> 
> [...]
> 
> I've tried this:
> 
> while True:
>     line = file.readline ()
>     if not line: break
>     game = {}
>     datePattern = re.compile('^(\d{2})\D+(\d{2})\D+(\d{4})')
> 
> Here's where I get stuck.  What do I do from here?  I just don't know how
> to import the text and assign it to the proper fields using the re module.

how about this:

    import re, time, datetime

    class Game(object):
        def __init__(self, d, t1, t2, s1, s2):
            self.date = d
            self.team1 = t1
            self.team2 = t2
            self.score1 = s1
            self.score2 = s2

        def __str__(self):
            return 'On %s, %s beat %s %s-%s' % (self.date,
                                                self.team1,
                                                self.team2,
                                                self.score1,
                                                self.score2)

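    class Games(object):
        # A date, then two "team name, score" pairs, separated by any
        # amount of whitespace.  Team names need at least three characters
        # and may contain only word characters and spaces.
        _re = re.compile(r'([\d/]+)'
                         + r'\s+(\w[\w\s]+\w)\s+(\d+)' * 2
                         + r'\s*$')
        def __init__(self, filename):
            self.games = []
            for line in open(filename):
                match = self._re.search(line)
                if match:
                    d, t1, s1, t2, s2 = match.groups()
                    d = time.strptime(d, '%m/%d/%Y') # m/d/Y! yecch!
                    d = datetime.date(*d[:3])
                    self.games.append(Game(d, t1, t2, int(s1), int(s2)))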

    if __name__ == '__main__':
        import sys
        for i in Games(sys.argv[1]).games:
            print(i)

woops! looks like I got carried away.
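
If you only want the matching step, here is a shorter sketch that uses the same pattern without the classes. The names GAME_RE and parse_line are made up for illustration, and the third sample line is a hypothetical single-space layout of the kind you describe, not a line from your file. It assumes team names contain only letters, digits, underscores and spaces, so names with other punctuation would need a looser pattern.

```python
import re

# Same pattern as in the Games class above: a date, then two
# "team name, score" pairs separated by any amount of whitespace.
GAME_RE = re.compile(r'([\d/]+)'
                     + r'\s+(\w[\w\s]+\w)\s+(\d+)' * 2
                     + r'\s*$')

def parse_line(line):
    """Return (date, winner, winner_score, loser, loser_score), or None."""
    match = GAME_RE.search(line)
    if match is None:
        return None
    d, t1, s1, t2, s2 = match.groups()
    # Scores come back as strings, so convert them to integers.
    return d, t1, int(s1), t2, int(s2)

# Two lines from the original post, plus a made-up single-space variant.
samples = [
    '09/04/2004  Virginia              44   Temple               14',
    '09/04/2004  LSU                   22   Oregon State         21',
    '09/09/2004 Troy State 24 Missouri 14',   # hypothetical layout
]
for line in samples:
    print(parse_line(line))
```

Because the pattern separates fields with \s+ rather than fixed column positions, the wide and the single-space layouts both go through the same code. Lines that don't look like a game at all give None, so they are easy to skip.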

-- 
John Lenton (john at grulic.org.ar) -- Random fortune:
Of course you have a purpose -- to find a purpose.