matching exactly a 4 digit number in python

George Sakkis george.sakkis at gmail.com
Fri Nov 21 17:25:06 EST 2008


On Nov 21, 4:46 pm, harijay <hari... at gmail.com> wrote:

> Hi
> I am a few months into Python. I have used regexps before in Perl
> and Java but am a little confused by this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4-digit number anywhere inside the string.
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that contain numbers longer than 4 digits,
> for example it matches "I have 3324234 and more".
>
> I am very confused. Shouldn't \d{4} match exactly four-digit
> numbers, so that a sentence with a 5-digit number is not matched?

No, why should it? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits, so the first four digits of a longer number match just fine.
A correct expression is a bit more hairy:

import re

p = re.compile(r'''
    (?:\D|\b)    # find a non-digit or word boundary..
    (\d{4})      # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)    # .. which are followed by non-digit or word boundary
    ''', re.VERBOSE)
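
For what it's worth, here is a rough sketch of how such a pattern could be used to filter strings. It also shows a lookaround variant, (?<!\d)\d{4}(?!\d), which is a swapped-in alternative and not the expression given above. The names boundary_pat, lookaround_pat and has_four_digit_number, as well as the sample strings other than the one quoted in the question, are made up for illustration.

```python
import re

# Pattern from the post: 4 digits delimited by a non-digit or a word boundary.
boundary_pat = re.compile(r'''
    (?:\D|\b)    # a non-digit or word boundary..
    (\d{4})      # .. then exactly 4 digits, captured as group 1..
    (?:\D|\b)    # .. then a non-digit or word boundary
    ''', re.VERBOSE)

# Alternative (an assumption, not from the post): lookarounds assert that no
# digit sits on either side, without consuming the neighbouring characters.
lookaround_pat = re.compile(r'(?<!\d)\d{4}(?!\d)')


def has_four_digit_number(text):
    """Return True if text contains a run of exactly 4 digits."""
    return lookaround_pat.search(text) is not None


samples = [
    "I have 3324234 and more",   # 7 digits in a row: should be rejected
    "I have 3324 and more",      # exactly 4 digits: should be kept
    "code1234x",                 # 4 digits glued to letters: should be kept
    "12345",                     # 5 digits: should be rejected
]
matching = [s for s in samples if has_four_digit_number(s)]
```

Both patterns agree on the samples above. They differ when two 4-digit numbers are separated by a single character, as in "1234a5678": the boundary version consumes the "a" as part of the first match, so findall() returns only "1234", while the lookaround version consumes nothing outside the digits and returns both numbers.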


HTH,
George


