idiom for RE matching

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Tue Jul 24 00:01:08 EDT 2007


On Tue, 24 Jul 2007 00:23:46 -0300, Gordon Airporte <JHoover at fbi.gov>
wrote:

> mik3l3374 at gmail.com wrote:
>> if your search is not overly complicated, i think regexp is not
>> needed. if you want, you can post a sample what you want to search,
>> and some sample input.
>
> I'm afraid it's pretty complicated :-). I'm doing analysis of hand
> histories that online poker sites leave for you. Here's one hand of a
> play money ring game:
>
>
> Full Tilt Poker Game #2042984473: Table Play Chip 344 - 10/20 - Limit
> Hold'em - 18:07:20 ET - 2007/03/22
> Seat 1: grandmarambo (1,595)
> Seat 4: justnoldfoolm (2,430)
> justnoldfoolm posts the small blind of 5
> rickrn posts the big blind of 10
> The button is in seat #1
> *** HOLE CARDS ***
> Dealt to moi [Jd 2c]
> justnoldfoolm bets 10
> [more sample lines]
>
> So I'm picking out all kinds of info about my cards, my stack, my
> betting, my position, board cards, other people's cards, etc. For
> example, this pattern picks out which player bet and how much:
>
> betsRe   = re.compile('^(.*) bets ([\d,]*)')
>
> I have 13 such patterns. The files I'm analyzing are just a session's
> worth of histories like this, separated by \n\n\n. All of this
> information needs to be organized by hand or by when it happened in a
> hand, so I can't just run patterns over the whole file or I'll lose  
> context.
> (Of course, in theory I could write a single monster expression that
> would chop it all up properly and organize by context, but it would be
> next to impossible to write/debug/maintain.)

But you don't HAVE to use a regular expression. For input this simple and
predictable, str.partition or a plain 'xxx in string' test is around 4x
faster:

import re
import timeit

# raw string, so that \d reaches the regex engine untouched
betsRe = re.compile(r'^(.*) bets ([\d,]*)')

def test_partition(line):
    # partition returns (head, separator, tail); the separator comes back
    # empty when " bets " is not found in the line
    who, bets, amount = line.partition(" bets ")
    if bets:
        return who, amount

def test_re(line):
    r = betsRe.match(line)
    if r:
        return r.group(1), r.group(2)

line1 = "justnoldfoolm bets 10"
assert test_re(line1) == test_partition(line1) == ("justnoldfoolm", "10")
line2 = "Uncalled bet of 20 returned to justnoldfoolm"
assert test_re(line2) is None and test_partition(line2) is None

py> timeit.Timer("test_partition(line1)", "from __main__ import *").repeat()
<timeit-src>:2: SyntaxWarning: import * only allowed at module level
[1.1922188434563594, 1.2086988709458808, 1.1956522407177488]
py> timeit.Timer("test_re(line1)", "from __main__ import *").repeat()
<timeit-src>:2: SyntaxWarning: import * only allowed at module level
[5.2871529761464018, 5.2763971398599523, 5.2791986132315714]

As is often the case, a regular expression is NOT the right tool for this
job.
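
To keep the context you mention, the same string methods can be applied
hand by hand. A rough sketch, not a full parser: parse_hand, parse_session
and the event tuples are names I made up, only three of your line types are
covered, and it assumes the amount is the last thing on a "bets" or "blind"
line, as in your sample:

```python
def to_int(text):
    """Convert an amount such as '1,595' to an int."""
    return int(text.replace(",", ""))

def parse_hand(hand_text):
    """Return the recognized events of one hand, in the order they occurred."""
    events = []
    for line in hand_text.splitlines():
        if " bets " in line:
            # e.g. "justnoldfoolm bets 10"
            who, _, amount = line.partition(" bets ")
            events.append(("bet", who, to_int(amount)))
        elif " posts the " in line and " blind of " in line:
            # e.g. "rickrn posts the big blind of 10"
            who, _, rest = line.partition(" posts the ")
            kind, _, amount = rest.partition(" blind of ")
            events.append((kind + " blind", who, to_int(amount)))
        elif line.startswith("Dealt to "):
            # e.g. "Dealt to moi [Jd 2c]"
            who, _, cards = line[len("Dealt to "):].partition(" [")
            events.append(("dealt", who, cards.rstrip("]").split()))
    return events

def parse_session(text):
    """Split a session file into hands (separated by two blank lines)."""
    return [parse_hand(hand) for hand in text.split("\n\n\n") if hand.strip()]
```

Each hand yields its own list of events, so position within the hand is
preserved. Lines that need real pattern matching can still go through a
compiled regex inside the same loop.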

-- 
Gabriel Genellina



