matching exactly a 4 digit number in python

MRAB google at mrabarnett.plus.com
Fri Nov 21 18:00:15 EST 2008


George Sakkis wrote:
> On Nov 21, 4:46 pm, harijay <hari... at gmail.com> wrote:
> 
>> Hi
>> I am a few months new into python. I have used regexps before in perl
>> and java but am a little confused with this problem.
>>
>> I want to parse a number of strings and extract only those that
>> contain a 4 digit number anywhere inside a string
>>
>> However the regexp
>> p = re.compile(r'\d{4}')
>>
>> Matches even sentences that contain numbers longer than 4 digits,
>> for example it matches "I have 3324234 and more".
>>
>> I am very confused. Shouldn't \d{4} match exactly four-digit
>> numbers, so that a sentence with a 5-digit number is not matched?
> 
> No, why should it ? What you're saying is "give me 4 consecutive
> digits", without specifying what should precede or follow these
> digits. A correct expression is a bit more hairy:
> 
> p = re.compile(r'''
> 	(?:\D|\b)    # find a non-digit or word boundary..
> 	(\d{4})       # .. followed by the 4 digits to be matched as group #1..
> 	(?:\D|\b)    # .. which are followed by non-digit or word boundary
>         ''', re.VERBOSE)
> 
You want to match a sequence of 4 digits: \d{4}
not preceded by a digit: (?<!\d)
not followed by a digit: (?!\d)

which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
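
A quick way to check this is to run the pattern over a few strings. The sketch below is only an illustration: the sample strings and the names four_digits, samples and matches are made up for the example and are not from the original question. Because the lookbehind and lookahead are zero-width, they test the neighbouring characters without consuming them.

```python
import re

# Exactly four digits, with no digit immediately before or after.
four_digits = re.compile(r'(?<!\d)\d{4}(?!\d)')

# Hypothetical test strings, chosen to cover the cases discussed above.
samples = [
    "I have 3324234 and more",   # 7-digit run: should not match
    "The year 2008 was busy",    # standalone 4-digit number
    "code abc1234xyz",           # 4 digits with letters on both sides
    "12345 and 6789",            # only the second number is 4 digits long
]

# findall() returns every non-overlapping match in each string.
matches = [four_digits.findall(s) for s in samples]

for s, m in zip(samples, matches):
    print(repr(s), '->', m)
```

To keep only the strings that contain such a number, something like [s for s in samples if four_digits.search(s)] should do.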


