a re question
Bengt Richter
bokr at oz.net
Mon Sep 9 19:26:48 EDT 2002
On Mon, 09 Sep 2002 16:10:39 -0400, Rajarshi Guha <rajarshi at presidency.com> wrote:
>Hi,
> I have a file with lines of the format:
>
>001 Abc D Efg 123456789 7 100 09/05/2002 20:23:23
>001 Xya FGh 143557789 7 100 09/05/2002 20:23:23
>
>I am trying to extract the 9 digit field and the single digit field
>immediately after that.
>
>When I use Visual Regexp to try out the regexp
>
>(\d{9,} {3,}\d)
>
>it highlights the 2 fields exactly.
>
>But when I use the following Python code I get None:
>
>>> s='001 Abc D Efg 123456789 7 100 09/05/2002 20:23:23'
>>> p = re.compile(r'(\d{9,} {3,}\d)')
>>> print p.match(s)
>>> None
>
>Could anybody point out where I'm going wrong?
>
>Thanks,
>>> import re
>>> s='001 Abc D Efg 123456789 7 100 09/05/2002 20:23:23'
>>> p = re.compile(r'(\d{9,} {3,}\d)')
>>> print p.match(s)
None
>>> print p.search(s).groups()
('123456789 7',)
But if you want them separately,
>>> p = re.compile(r'(\d{9,}) {3,}(\d)')
>>> print p.search(s).groups()
('123456789', '7')
Or as actual integers,
>>> map(int, p.search(s).groups())
[123456789, 7]
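On Python 3 the same idea needs small changes, since print is a function and map() returns a lazy iterator rather than a list. A minimal sketch, assuming the two fields are separated by three spaces (the pattern's ` {3,}` requires at least that many); the name `fields` is just for illustration:

```python
import re

# Assumed sample line: three spaces between the 9-digit field and the
# single digit, because the pattern demands at least three.
s = '001 Abc D Efg 123456789   7 100 09/05/2002 20:23:23'

p = re.compile(r'(\d{9,}) {3,}(\d)')
m = p.search(s)

# search() returns None when nothing matches, so guard before .groups().
fields = [int(g) for g in m.groups()] if m else []
print(fields)  # [123456789, 7]
```

Checking for None first avoids an AttributeError on lines that do not contain the two fields.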
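match only tries the pattern at the beginning of the string, whereas search
scans forward for the first position where the pattern matches. See
http://www.python.org/doc/current/lib/matching-searching.html
So if you want to keep using match, you might be able to prefix ".* " to your
pattern (i.e., anything ending in a space before your 9 or more digits etc.), e.g.,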
>>> p = re.compile(r'.* (\d{9,}) {3,}(\d)')
>>> print p.match(s).groups()
('123456789', '7')
where s is still
>>> s
'001 Abc D Efg 123456789 7 100 09/05/2002 20:23:23'
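To see the three behaviours side by side, here is a small sketch in Python 3 syntax. It again assumes three spaces between the two fields; the variable names (`anchored`, `scanned`, `prefixed`) are only illustrative:

```python
import re

# Assumed sample line with three spaces between the two fields.
s = '001 Abc D Efg 123456789   7 100 09/05/2002 20:23:23'
p = re.compile(r'(\d{9,}) {3,}(\d)')

# match() tries the pattern only at index 0, where '001 ' cannot
# satisfy \d{9,}, so the result is None.
anchored = p.match(s)

# search() tries successive starting positions until one matches.
scanned = p.search(s)

# Prefixing '.* ' lets match() consume the leading text itself.
prefixed = re.compile(r'.* (\d{9,}) {3,}(\d)').match(s)

print(anchored)           # None
print(scanned.groups())   # ('123456789', '7')
print(prefixed.groups())  # ('123456789', '7')
```

Note that the prefixed version's overall match starts at index 0, so if you also need the position of the fields, search() or the group offsets (e.g. `prefixed.start(1)`) are the better tools.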
BTW, you did want to get extra digits beyond 9 and *no* extra digits
in the second single digit number, right? E.g.,
>>> s='001 Abc D Efg 1234567890 70 100 09/05/2002 20:23:23'
(the trailing 0 of 1234567890 is included; the 0 of 70 is not)
>>> p = re.compile(r'(\d{9,}) {3,}(\d)')
>>> print p.search(s).groups()
('1234567890', '7')
(the second group is guaranteed to be a single digit, irrespective of what follows it)
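If what you actually want is exactly nine digits followed by exactly one digit (an assumption on my part, your post doesn't say), one possible sketch is to pin both fields with \b word boundaries. The names `loose` and `strict` are hypothetical, and the sample lines assume three spaces between the fields:

```python
import re

# Assumed sample lines, three spaces between the two fields.
s_ok = '001 Abc D Efg 123456789   7 100 09/05/2002 20:23:23'
s_long = '001 Abc D Efg 1234567890   70 100 09/05/2002 20:23:23'

loose = re.compile(r'(\d{9,}) {3,}(\d)')
# \b on both sides of each field rejects longer runs of digits.
strict = re.compile(r'\b(\d{9})\b {3,}(\d)\b')

print(loose.search(s_long).groups())  # ('1234567890', '7')
print(strict.search(s_long))          # None
print(strict.search(s_ok).groups())   # ('123456789', '7')
```

Whether the strict or the loose form is right depends on how regular the file really is, so treat this as one option rather than the answer.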
Regards,
Bengt Richter