newbie re question

Wed Nov 6 16:17:12 EST 2002

Gonçalo Rodrigues wrote:

> I've been trying to grok re's and settled myself a little exercise:
> concoct a re for a Python identifier.
>
> Now what I got is
>
> >>> pattern = re.compile(r'(\s|^)([\w_][\w\._]*)(\s|$)')
> >>> pattern.findall('aadf cdase b ad:aa aasa a.aa a@ aa _aa _aafr@ aa_aa aa__a?jk')
> [('', 'aadf', ' '), (' ', 'b', ' '), (' ', 'aasa', ' '), (' ', 'aa', ' '), (' ', 'aa_aa', ' ')]
>
> But as you can see from the results, not all valid identifiers get
> caught. For example, why isn't 'cdase' caught?

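findall returns non-overlapping matches.  your pattern consumes the
whitespace on both sides of the identifier, and there's only a single
space between "aadf" and "cdase".  that space was used up as the
trailing (\s|$) of the first match, so nothing is left to satisfy the
leading (\s|^) in front of "cdase".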
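
one way to see this is to print the span of each match.  this is only
a small sketch; the shortened test string and the variable names are
made up for illustration and aren't from the original post:

```python
import re

# the pattern from the question, unchanged
pattern = re.compile(r'(\s|^)([\w_][\w\._]*)(\s|$)')
text = 'aadf cdase b'

# each match covers the identifier plus the whitespace around it
spans = [(m.span(), m.group(2)) for m in pattern.finditer(text)]
print(spans)
# [((0, 5), 'aadf'), ((10, 12), 'b')]
```

the first match ends at offset 5, just after the space, so the search
resumes at the "c" of "cdase" with no whitespace (and no start of
string) in front of it.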

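here's a better pattern.  \b is a zero-width word boundary, so it
doesn't use up any characters, and [a-zA-Z_] keeps digits out of the
first position: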

    pattern = re.compile(r'\b([a-zA-Z_]\w*)\b')
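
a quick check of that pattern against the string from the question.
note that \b also fires next to punctuation, so pieces of tokens like
"ad:aa" show up as separate identifiers.  if only whitespace-delimited
tokens should count, a lookaround variant might do; that second
pattern is my own assumption, not part of the original answer, and the
variable names are made up for illustration:

```python
import re

text = 'aadf cdase b ad:aa aasa a.aa a@ aa _aa _aafr@ aa_aa aa__a?jk'

# pattern from the post: \b is zero-width, so neighbouring words both match
word_pattern = re.compile(r'\b([a-zA-Z_]\w*)\b')
words = word_pattern.findall(text)
print(words)
# ['aadf', 'cdase', 'b', 'ad', 'aa', 'aasa', 'a', 'aa', 'a', 'aa',
#  '_aa', '_aafr', 'aa_aa', 'aa__a', 'jk']

# assumed variant (not from the post): require whitespace or the string
# edge on both sides, using lookarounds that don't consume anything
token_pattern = re.compile(r'(?<!\S)([a-zA-Z_]\w*)(?!\S)')
tokens = token_pattern.findall(text)
print(tokens)
# ['aadf', 'cdase', 'b', 'aasa', 'aa', '_aa', 'aa_aa']
```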

</F>