Newbie: searching an English dictionary for a reg exp

Eugene importbinascii
Sun Dec 2 16:54:24 EST 2001


I see a couple of problems.  First, the regular expression you're using,
"abas?", actually means "match 'aba' followed by zero or one occurrences of
's'."  If you mean "match 'abas' followed by exactly one occurrence of any
character," the pattern you want is "abas." (with the trailing dot).
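
To see the difference concretely, here is a minimal sketch.  It is written
for current Python 3 rather than the 2.1 of this thread, and the word list
is made up for illustration:

```python
import re

# Hypothetical sample words, standing in for lines of dictionary.txt
words = ["aba", "abas", "abase", "abash", "abate"]

optional_s = re.compile(r"abas?")  # 'aba' followed by zero or one 's'
any_char = re.compile(r"abas.")    # 'abas' followed by exactly one character

# match() anchors only at the start of the string, so a word may be
# longer than the part the pattern consumes and still count as a match.
matches_optional = [w for w in words if optional_s.match(w)]
matches_any = [w for w in words if any_char.match(w)]

print(matches_optional)
print(matches_any)
```

Because match() only anchors at the start, "abas." would also accept longer
words such as 'abashed'; appending "$" to the pattern is one way to insist
on a five-letter word.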

Second, "reg.match(regpattern)" gives the same result on every pass through
the loop (here it always matches), because you're asking the compiled
regular expression to match against its own pattern string instead of a
dictionary word.  You wanted to match the pattern against each dictionary
word in the loop, so replace "regpattern" in that line with "word", like:
> # search thru dictionary - print all matches
> for word in dictionary:
>     m = reg.match(word)    #[EG] change is in this line
>  ....

Third, and this may have been intentional rather than a problem, the
pattern you've chosen is case-sensitive, because the "re" module does
case-sensitive matching unless you tell it otherwise.  To do
case-insensitive matching you would either change your pattern from "abas."
to "(?i)abas." or change the line where you compile it from "reg =
re.compile(regpattern)" to "reg = re.compile(regpattern, re.I)".
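
A short sketch of both spellings side by side, assuming the pattern "abas."
and a few made-up sample words (Python 3 syntax):

```python
import re

inline_flag = re.compile(r"(?i)abas.")     # flag written inside the pattern
compile_flag = re.compile(r"abas.", re.I)  # flag passed to re.compile
case_sensitive = re.compile(r"abas.")      # default behaviour

words = ["Abase", "ABASH", "abase"]         # hypothetical sample words

inline_hits = [w for w in words if inline_flag.match(w)]
compile_hits = [w for w in words if compile_flag.match(w)]
plain_hits = [w for w in words if case_sensitive.match(w)]

print(inline_hits)
print(compile_hits)
print(plain_hits)
```

The two case-insensitive versions should accept the same words, while the
plain pattern only accepts the all-lowercase one.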

On another note, if you're going to be searching through the entire
dictionary every time, you might do the matching as you're going through the
dictionary line-by-line and only store in your list the words that match.
(Unless you needed to have stored the other words for some other purpose, I
mean.)
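
Something along these lines, as a rough sketch in Python 3.  The function
name matching_words is my own invention, and an in-memory StringIO stands in
for the real dictionary.txt so the example runs on its own:

```python
import io
import re

def matching_words(lines, pattern):
    """Return the stripped lines whose beginning matches pattern."""
    reg = re.compile(pattern)      # compile once, reuse for every line
    found = []
    for line in lines:
        word = line.strip()        # drop the trailing newline
        if word and reg.match(word):
            found.append(word)     # only matching words are kept in memory
    return found

# Stand-in for open('dictionary.txt'); any iterable of lines works here
fake_file = io.StringIO("abandon\nabase\nabash\nabate\n")
result = matching_words(fake_file, r"abas.")
print(result)
```

With a real file you would pass the open file object in place of fake_file;
iterating over it reads one line at a time, so the whole dictionary never
has to sit in a list.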

And finally, there are a couple of different ways to use the "re" module for
most problems.  In this reply you've seen two ways to turn on
case-insensitivity (I don't know if the one using re.compile is faster than
the other), and the Python 2.1 documentation, section 4.2.3 ([re] Module
Contents), says right at the top that you can either use re.compile or pass
the pattern directly to re.match, but that compiling once and reusing the
result is more efficient when the same pattern is used many times, as it is
in your loop.  Note also the subtle difference between re.search and
re.match: re.match only matches at the beginning of the string, while
re.search looks for a match anywhere in it.  I've tried to use re.match more
than once when I really wanted re.search because I wanted to search in the
middle of the string.
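
A small illustration of the match/search difference, again as a Python 3
sketch; the word 'calabash' is just an example I picked because it contains
'abash' in the middle:

```python
import re

word = "calabash"

# re.match only tries at position 0, so this finds nothing
at_start = re.match(r"abas.", word)

# re.search scans forward and finds 'abash' inside the word
anywhere = re.search(r"abas.", word)

# The module-level functions and a precompiled pattern behave the same;
# compiling just avoids repeating the pattern at every call site.
reg = re.compile(r"abas.")
compiled_hit = reg.search(word)

print(at_start)
print(anywhere.group(), anywhere.start())
print(compiled_hit.group())
```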

    -Eugene



"^^@++" <ballsacks at xtra.co.enzed> wrote in message
news:3c09f02f.97177874 at news.akl.ihug.co.nz...
> I want to search through the entire English language for a given
> regexp.
>
> So, I'll have say a 'dictionary.txt' file which contains every English
> word.
>
> Since I don't understand how python regexps work I'm kinda stuck.
>
> My guess is I should be doing something like:
> #####################################
> import re
> import string
>
> # heres our regpattern - once I get it going I'll read
> # regpatterns from a file
> regpattern = 'abas?' # should return 'abash' and 'abase'
>
> dictionary = []
> dictfile = open('dictionary.txt', 'r')
> line = dictfile.readline()
> while line != "":
>     dictionary.append(line)
>     line = string.strip(dictfile.readline())
>
> reg = re.compile(regpattern)
>
> # search thru dictionary - print all matches
> for word in dictionary:
>     m = reg.match(regpattern)
>     if m:
>         print m.group()
>
> #####################################
> This doesn't word properly - so how do I print each word that matches?
>
> Have I gone about this in the correct way? Having the entire
> dictionary in memory could be dodgy I guess...
>
> Thanks
> -Matt




