regex: im getting better

Tim Roberts timr at probo.com
Thu Oct 3 03:14:44 EDT 2002


":B nerdy" <thoa0025 at mail.usyd.edu.au> wrote:

>$pattern = '|<input(\s+([^=>]*)="([^"]*)")*>|ism';

Is that Perl?  It ain't Python.  The Python equivalent would be, I think:

  pattern = re.compile(r'<input(\s+([^=>]*)="([^"]*)")*>', re.I|re.S|re.M)
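A quick sketch of what that translation actually captures, run against a made-up sample tag (the sample HTML is an assumption for illustration). It shows that the repeated group matches every attribute but only reports the last one:

```python
import re

# Direct translation of the quoted Perl pattern: /i, /s, /m -> re.I, re.S, re.M.
# A raw string keeps "\s" from being treated as a string escape.
pattern = re.compile(r'<input(\s+([^=>]*)="([^"]*)")*>', re.I | re.S | re.M)

html = '<input type="text" name="q" value="spam">'  # hypothetical sample tag
m = pattern.search(html)

# The (...)* group runs once per attribute, but each capturing group
# keeps only the text from its final repetition.
print(m.groups())  # → (' value="spam"', 'value', 'spam')
```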

>I'd like to match all the input tags, but also, in a subexpression, I'd like
>to match each of the parameters in the format
>parameter_name="parameter_value"
>where parameter_name and parameter_value are strings.
>
>My pattern doesn't work; it only matches the last parameter. What's wrong with
>my pattern? And can someone show me how one would match my description
>above?

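Your pattern fails because a repeated capturing group only remembers its
last match: each pass through the (...)* group overwrites the previous one,
so you only ever see the final parameter.  You will need to do this in two
steps: one to isolate the tag, another to use findall to fetch the
parameters.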
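A minimal sketch of that two-step approach. The names TAG_RE, ATTR_RE and input_attributes are hypothetical, and the sketch assumes every value is double-quoted and that no value contains a ">" (exactly the limitations described next):

```python
import re

# Step 1: isolate each <input ...> tag (case-insensitive, "\b" avoids matching <inputx>).
TAG_RE = re.compile(r'<input\b[^>]*>', re.I)

# Step 2: pull name="value" pairs out of one tag; two groups make findall return tuples.
ATTR_RE = re.compile(r'([^\s=>]+)\s*=\s*"([^"]*)"')

def input_attributes(html):
    """Return one list of (name, value) pairs per <input> tag (double-quoted values only)."""
    return [ATTR_RE.findall(tag) for tag in TAG_RE.findall(html)]

html = '<form><input type="text" name="q"><INPUT type="submit" value="Go"></form>'
print(input_attributes(html))
# → [[('type', 'text'), ('name', 'q')], [('type', 'submit'), ('value', 'Go')]]
```

Because findall returns every non-overlapping match, each attribute gets its own tuple instead of being overwritten the way the repeated group was.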

However, this isn't very robust.  Quotes around attribute values are optional
in HTML (although XHTML requires them), values may be single-quoted, and a
value might very well contain double quotes or angle brackets.  Any of these
will screw this up.  Michal is correct; you should use one of the HTML
parsers (like sgmllib).
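A sketch of the parser approach. Note that sgmllib was removed in Python 3, so this swaps in the standard library's html.parser.HTMLParser instead; the class name InputCollector and the sample HTML are hypothetical. The parser copes with unquoted and single-quoted values and with a ">" inside a quoted value:

```python
from html.parser import HTMLParser

class InputCollector(HTMLParser):
    """Collect the attributes of every <input> tag in a document."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips the quotes
        # from values; attrs is a list of (name, value) pairs.
        if tag == "input":
            self.inputs.append(attrs)

    # An XHTML-style <input ... /> goes to handle_startendtag, whose default
    # implementation calls handle_starttag, so no extra override is needed.

html = """<form>
<input type=text name='q' value="a > b">
<INPUT TYPE="checkbox">
</form>"""

parser = InputCollector()
parser.feed(html)
parser.close()
print(parser.inputs)
```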
--
- Tim Roberts, timr at probo.com
  Providenza & Boekelheide, Inc.