Regular Expressions...
Ben Finney
bignose+hates-spam at benfinney.id.au
Wed Jan 7 20:41:14 EST 2009
"Ken D'Ambrosio" <ken at jots.org> writes:
> Hi, all. As a recovering Perl guy, I have to admit I don't quite "get"
> the re module. For example, I'd like to do a few things (I'm going to use
> phone numbers, 'cause that's what I'm currently dealing with):
> 12345678900 -- How would I:
> - Get just the area code?
> - Get just the seven-digit number?
>
> In Perl, I'd do something like
> m/^1(...)(.......)/;
Wouldn't that be better as:
m/^1(\d{3})(\d{7})$/;
I'll assume that more-precise expression in what follows.
> and then I'd have the numbers in $1 and $2, respectively. But the Python
> stuff simply isn't clicking for me.
In general, where a set of data is likely to be iterated, the Pythonic
way to present it is via a single iterable (instead of, in your Perl
example, separate variables).
Then, for those (generally less frequent) cases where you do want the
separate items, you can bind them in a single statement:
(foo, bar, baz) = some_sequence
or
(foo, bar, baz) = (item for item in some_sequence)
e.g.:
>>> (foo, bar, baz) = [1, 2, 3]
>>> foo
1
>>> bar
2
>>> baz
3
So the match object returned by the ‘re’ module's matching functions
gives access to the captured groups as a sequence: its groups() method
returns them as a tuple, which you can unpack as shown above.
> If anyone could supply concrete examples of how to do the problem,
> above, that would be terrific.
Assuming the following:
>>> import re
>>> phone_number_regex = r'^1(\d{3})(\d{7})$'
Trivial one-shot example:
>>> phone_number = '12345678900'
>>> (area_code, local_number) = re.match(phone_number_regex, phone_number).groups()
>>> area_code
'234'
>>> local_number
'5678900'
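One caveat with the one-shot form: re.match returns None when the string does not match, so calling .groups() on the result directly would raise AttributeError. A minimal sketch of a guard follows, assuming a hypothetical helper named split_phone_number (my own name for illustration, not part of the re module):

```python
import re

# Same pattern as above, written as a raw string so the backslashes
# reach the regex engine untouched.
PHONE_NUMBER_REGEX = r'^1(\d{3})(\d{7})$'


def split_phone_number(phone_number):
    """Return (area_code, local_number), or None if the input does not match.

    Hypothetical helper, for illustration only.
    """
    match = re.match(PHONE_NUMBER_REGEX, phone_number)
    if match is None:
        # No match: re.match gave us None, so there are no groups to return.
        return None
    return match.groups()


print(split_phone_number('12345678900'))
print(split_phone_number('not a number'))
```

Whether to return None or raise an exception on bad input is a design choice; returning None simply mirrors what re.match itself does.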
More explicit example, showing the various steps and assuming you want
to re-use the various values in multiple statements:
>>> phone_number_pattern = re.compile(phone_number_regex)
>>> phone_number_pattern
<_sre.SRE_Pattern object at 0xf7f8c598>
>>> phone_number = '12345678900'
>>> phone_number_match = phone_number_pattern.match(phone_number)
>>> phone_number_match
<_sre.SRE_Match object at 0xf7f52338>
>>> (area_code, local_number) = phone_number_match.groups()
>>> area_code
'234'
>>> local_number
'5678900'
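If you would rather fetch the groups one at a time, closer to Perl's $1 and $2, the match object's group() method takes a group number. A short sketch using the same pattern and input as above:

```python
import re

phone_number_match = re.match(r'^1(\d{3})(\d{7})$', '12345678900')

# group(0) is the entire matched text; group(1) and group(2) are the
# first and second parenthesised groups, like Perl's $1 and $2.
whole = phone_number_match.group(0)
area_code = phone_number_match.group(1)
local_number = phone_number_match.group(2)

# Passing several group numbers in one call returns a tuple.
pair = phone_number_match.group(1, 2)

print(whole, area_code, local_number, pair)
```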
Python regular expressions also allow naming each group, for later
access to the matches via a dict:
>>> phone_number_regex = r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$'
>>> phone_number_pattern = re.compile(phone_number_regex)
>>> phone_number_match = phone_number_pattern.match(phone_number)
>>> phone_number_groups = phone_number_match.groupdict()
>>> phone_number_groups['area_code']
'234'
>>> phone_number_groups['local_number']
'5678900'
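Named groups can also be read straight from the match object without building the dict first, since group() accepts a group name as well as a number. A brief sketch, assuming the same named pattern as above:

```python
import re

phone_number_pattern = re.compile(
    r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$')
phone_number_match = phone_number_pattern.match('12345678900')

# group() accepts a group name in place of a number.
area_code = phone_number_match.group('area_code')
local_number = phone_number_match.group('local_number')

# Named groups keep their positional numbers too.
also_area_code = phone_number_match.group(1)

print(area_code, local_number, also_area_code)
```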
--
\ “… one of the main causes of the fall of the Roman Empire was |
`\ that, lacking zero, they had no way to indicate successful |
_o__) termination of their C programs.” —Robert Firth |
Ben Finney