Regular Expressions...

Ben Finney bignose+hates-spam at benfinney.id.au
Wed Jan 7 20:41:14 EST 2009


"Ken D'Ambrosio" <ken at jots.org> writes:

> Hi, all.  As a recovering Perl guy, I have to admit I don't quite "get"
> the re module.  For example, I'd like to do a few things (I'm going to use
> phone numbers, 'cause that's what I'm currently dealing with):
> 12345678900 -- How would I:
> - Get just the area code?
> - Get just the seven-digit number?
> 
> In Perl, I'd do something like
> m/^1(...)(.......)/;

Wouldn't that be better as:

    m/^1(\d{3})(\d{7})$/;

I'll assume that more-precise expression in what follows.

> and then I'd have the numbers in $1 and $2, respectively.  But the Python
> stuff simply isn't clicking for me.

In general, where a set of data is likely to be iterated, the Pythonic
way to present it is via a single iterable (instead of, in your Perl
example, separate variables).

Then, for those (generally less frequent) cases where you do want the
separate items, you can bind them in a single statement:

    (foo, bar, baz) = some_sequence

or

    (foo, bar, baz) = (item for item in some_sequence)

e.g.:

    >>> (foo, bar, baz) = [1, 2, 3]
    >>> foo
    1
    >>> bar
    2
    >>> baz
    3

So, the match object returned by the ‘re’ module's matching functions
gives access to the grouped matches as a sequence: its ‘groups’ method
returns them as a tuple, in the order the groups appear in the pattern.

> If anyone could supply concrete examples of how to do the problem,
> above, that would be terrific.

Assuming the following:

    >>> import re
    >>> phone_number_regex = r'^1(\d{3})(\d{7})$'

Trivial one-shot example:

    >>> phone_number = '12345678900'
    >>> (area_code, local_number) = re.match(phone_number_regex, phone_number).groups()
    >>> area_code
    '234'
    >>> local_number
    '5678900'
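
One caveat with the one-shot form: ‘re.match’ returns None when the
input doesn't match, so chaining ‘.groups()’ directly onto it raises
AttributeError for bad input. A minimal sketch of a guarded version
follows; the helper name ‘split_phone_number’ is my own invention for
illustration, not part of the ‘re’ module.

```python
import re

PHONE_NUMBER_REGEX = r'^1(\d{3})(\d{7})$'


def split_phone_number(phone_number):
    """Return (area_code, local_number), or None if the input doesn't match.

    Hypothetical helper, for illustration only.
    """
    match = re.match(PHONE_NUMBER_REGEX, phone_number)
    if match is None:
        # No match: re.match gave us None, so there are no groups to return.
        return None
    return match.groups()


print(split_phone_number('12345678900'))
print(split_phone_number('not a number'))
```

Whether to return None, raise an exception, or skip the record is a
design choice that depends on how dirty your data is.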

More explicit example, showing the various steps and assuming you want
to re-use the various values in multiple statements:

    >>> phone_number_pattern = re.compile(phone_number_regex)
    >>> phone_number_pattern
    <_sre.SRE_Pattern object at 0xf7f8c598>

    >>> phone_number = '12345678900'
    >>> phone_number_match = phone_number_pattern.match(phone_number)
    >>> phone_number_match
    <_sre.SRE_Match object at 0xf7f52338>

    >>> (area_code, local_number) = phone_number_match.groups()
    >>> area_code
    '234'
    >>> local_number
    '5678900'
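
The compiled pattern earns its keep when the same expression is applied
to many inputs. A rough sketch, assuming a list of made-up sample
numbers (the data and the ‘parsed’ name are mine, purely for
illustration):

```python
import re

phone_number_pattern = re.compile(r'^1(\d{3})(\d{7})$')

# Made-up sample data; the second entry is deliberately malformed.
phone_numbers = ['12345678900', '555', '18005551234']

parsed = []
for phone_number in phone_numbers:
    match = phone_number_pattern.match(phone_number)
    if match is not None:
        # groups() gives a tuple: (area_code, local_number)
        parsed.append(match.groups())

for (area_code, local_number) in parsed:
    print(area_code, local_number)
```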

Python regular expressions also allow naming each group, for later
access to the matches via a dict:

    >>> phone_number_regex = r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$'
    >>> phone_number_pattern = re.compile(phone_number_regex)
    >>> phone_number_match = phone_number_pattern.match(phone_number)
    >>> phone_number_groups = phone_number_match.groupdict()
    >>> phone_number_groups['area_code']
    '234'
    >>> phone_number_groups['local_number']
    '5678900'
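
If you only want one group at a time, which is probably the closest
analogue to Perl's $1 and $2, the match object's ‘group’ method accepts
either a group number or a group name. A short sketch using the same
named-group pattern as above:

```python
import re

phone_number_pattern = re.compile(
    r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$')
match = phone_number_pattern.match('12345678900')

area_code = match.group('area_code')   # look up a group by name
local_number = match.group(2)          # or by position, counting from 1
whole_match = match.group(0)           # group 0 is the entire matched text

print(area_code, local_number, whole_match)
```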

-- 
 \       “… one of the main causes of the fall of the Roman Empire was |
  `\        that, lacking zero, they had no way to indicate successful |
_o__)                  termination of their C programs.” —Robert Firth |
Ben Finney


