[Tutor] using re to match text and extract info

Emmanuel Ruellan emmanuel.ruellan at laposte.net
Thu Dec 31 18:37:58 CET 2009


What's wrong with the phone number?

>>> phoneNumber.search(line).groups()
('03', '88', '23', '05', '66')

This looks fine to me.
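
To double-check this outside the interactive session, here is a small self-contained sketch. It uses Python 3 syntax and the same pattern as in the quoted message; the names phone_re, digits, phone and whole are mine, and the "@" in the sample line is assumed to be what the archive displays as " at ". It suggests that search() skips the postcode, because "67000" is never followed by four more space-separated digit pairs, and that the five captured pairs can be joined back into one string:

```python
import re

line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbourg@artisansdumonde.org")

# Same pattern as in the quoted message: five two-digit groups
# separated by single spaces.
phone_re = re.compile(r'(\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2})')

match = phone_re.search(line)
digits = match.groups()      # one string per captured pair
phone = " ".join(digits)     # rebuild the number as it appears in the line
whole = match.group(0)       # the full matched text, without splitting

print(digits)
print(phone)
```

If only the complete number is needed, group(0) already holds it, so the join is optional.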

Here is a regex that splits the line into several named groups. Test it with
other strings, though.

>>> line = "ALSACE 67000 Strasbourg 24 rue de la Division Leclerc 03 88 23 05 66 strasbourg@artisansdumonde.org"

>>> details_re = re.compile(r'(?P<region>^\D+)(?P<postcode>\d+)\s+(?P<town>[\D\s]+)(?P<address>.+?)(?P<phone>\d{2} \d{2} \d{2} \d{2} \d{2})\s+(?P<email>[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4})')

>>> m = details_re.search(line)

>>> print m.groups()
('ALSACE ', '67000', 'Strasbourg ', '24 rue de la Division Leclerc ', '03 88 23 05 66', 'strasbourg@artisansdumonde.org')

>>> print m.group('phone')
03 88 23 05 66

>>> print m.group('email')
strasbourg@artisansdumonde.org
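
Since the original question was how to split the line into a list, here is a minimal sketch of one way to turn the match into cleaned-up fields. It uses Python 3 syntax and lays the same regex out with re.VERBOSE for readability. The helper name parse_line and the variable as_list are hypothetical, the "@" in the sample line is assumed to be what the archive displays as " at ", and the pattern has only been checked against this one sample line:

```python
import re

# Same regex as above, laid out with re.VERBOSE. In verbose mode unescaped
# whitespace in the pattern is ignored, so literal spaces are written "[ ]".
details_re = re.compile(r'''
    (?P<region>^\D+)
    (?P<postcode>\d+)\s+
    (?P<town>[\D\s]+)
    (?P<address>.+?)
    (?P<phone>\d{2}[ ]\d{2}[ ]\d{2}[ ]\d{2}[ ]\d{2})\s+
    (?P<email>[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4})
''', re.VERBOSE)


def parse_line(line):
    """Return a dict of stripped fields, or None if the line does not match."""
    m = details_re.search(line)
    if m is None:
        return None
    # groupdict() maps each group name to its text; strip trailing spaces.
    return {name: value.strip() for name, value in m.groupdict().items()}


line = ("ALSACE 67000 Strasbourg 24 rue de la Division Leclerc "
        "03 88 23 05 66 strasbourg@artisansdumonde.org")

fields = parse_line(line)
# An ordered list of the fields, in the order they appear in the line.
as_list = [fields[k] for k in
           ("region", "postcode", "town", "address", "phone", "email")]
print(as_list)
```

One limitation worth testing for: the town group runs up to the first digit after the postcode, so an address that does not begin with a street number would end up inside the town field.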


Emmanuel


On Thu, Dec 31, 2009 at 2:49 PM, Norman Khine <norman at khine.net> wrote:

>
>
> hello,
>
> >>> import re
> >>> line = "ALSACE 67000 Strasbourg 24 rue de la Division Leclerc 03 88 23 05 66 strasbourg@artisansdumonde.org"
> >>> m = re.search('[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}', line)
> >>> emailAddress .search(r"(\d+)", line)
> >>> phoneNumber = re.compile(r'(\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2})')
> >>> phoneNumber.search(line)
>
> but this jumbles the phone number and also includes the 67000.
>
> how can i split the 'line' into a list?
>
> thanks
> norman