Using re to get data from text file
John Lenton
john at grulic.org.ar
Fri Sep 10 13:47:27 EDT 2004
On Fri, Sep 10, 2004 at 10:29:27AM -0400, Jocknerd wrote:
> I'm a Python newbie and I'm having trouble with Regular Expressions when
> reading in a text file. Here is a sample layout of the input file:
>
> 09/04/2004 Virginia 44 Temple 14
> 09/04/2004 LSU 22 Oregon State 21
> 09/09/2004 Troy State 24 Missouri 14
>
> As you can see, the text file contains a list of games. Each game has a
> date, a winning team, the winning team's score, the losing team, and the
> losing team's score. If I set up my program to import the data with
> fixed-length formats, it's no problem. But some of my text files have
> different layouts. For instance, some only have one space between a team
> name and its score.
>
> [...]
>
> I've tried this:
>
> while True:
>     line = file.readline ()
>     if not line: break
>     game = {}
>     datePattern = re.compile('^(\d{2})\D+(\d{2})\D+(\d{4})')
>
> Here's where I get stuck. What do I do from here? I just don't know how
> to import the text and assign it to the proper fields using the re module.
How about this:
import re, time, datetime

class Game(object):
    def __init__(self, d, t1, t2, s1, s2):
        self.date = d
        self.team1 = t1
        self.team2 = t2
        self.score1 = s1
        self.score2 = s2

    def __str__(self):
        return 'On %s, %s beat %s %s-%s' % (self.date,
                                            self.team1,
                                            self.team2,
                                            self.score1,
                                            self.score2)

class Games(object):
    # a date, then (team name, score) twice; \s+ absorbs any run of
    # whitespace, so one space or many between fields both work
    _re = re.compile(r'([\d/]+)'
                     + r'\s+(\w[\w\s]+\w)\s+(\d+)' * 2
                     + r'\s*$')

    def __init__(self, filename):
        self.games = []
        for line in open(filename):
            match = self._re.search(line)
            if match:
                d, t1, s1, t2, s2 = match.groups()
                d = time.strptime(d, '%m/%d/%Y') # m/d/Y! yecch!
                d = datetime.date(*d[:3])
                self.games.append(Game(d, t1, t2, s1, s2))

if __name__ == '__main__':
    import sys
    for i in Games(sys.argv[1]).games:
        print(i)
Whoops! Looks like I got carried away.
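
If you only want to see what the pattern does with a single line, here is a smaller sketch of the same idea, not a replacement for the class. It uses a variant of the regular expression with named groups and returns a plain dict per line. The names LINE_RE, parse_line, winner, wscore, loser and lscore are made up for this example, and it assumes every team name is at least three characters long and contains only letters, digits, underscores and spaces (as the pattern above does too).

```python
import re
from datetime import datetime

# Variant of the pattern above with named groups (names are illustrative).
# \s+ between fields accepts a single space as well as padded columns.
LINE_RE = re.compile(
    r'(?P<date>\d{2}/\d{2}/\d{4})'
    r'\s+(?P<winner>\w[\w\s]+\w)\s+(?P<wscore>\d+)'
    r'\s+(?P<loser>\w[\w\s]+\w)\s+(?P<lscore>\d+)'
    r'\s*$'
)

def parse_line(line):
    """Return a dict describing one game, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    game = m.groupdict()
    # Turn the m/d/Y string into a date and the scores into integers.
    game['date'] = datetime.strptime(game['date'], '%m/%d/%Y').date()
    game['wscore'] = int(game['wscore'])
    game['lscore'] = int(game['lscore'])
    return game

if __name__ == '__main__':
    samples = [
        '09/04/2004  Virginia       44  Temple        14',
        '09/04/2004 LSU 22 Oregon State 21',
        '09/09/2004 Troy State 24 Missouri 14',
    ]
    for s in samples:
        print(parse_line(s))
```

Multi-word names such as "Oregon State" come out whole because the name group is only allowed to stop where whitespace and then a number follow, so the regex engine backtracks until the scores line up.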
--
John Lenton (john at grulic.org.ar) -- Random fortune:
Of course you have a purpose -- to find a purpose.