Useful RE patterns (was: Variable Interpolation - status of PEP215)

Pekka Niiranen krissepu at vip.fi
Wed Jul 3 02:06:16 EDT 2002


Please also make an example of a recursive descent parser, for example LL(k).
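
As a rough illustration of the kind of example meant here, below is a minimal sketch of a recursive descent parser in Python. It assumes a toy arithmetic grammar and needs only one token of lookahead, so it is LL(1) rather than general LL(k). The names tokenize, Parser and evaluate are made up for this sketch; nothing here comes from SimpleParse or the (S)RE module.

```python
import re

# Assumed token pattern for this sketch: a number, or any single
# non-whitespace character (validated below).
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d*)?)|(\S))")


def tokenize(text):
    """Turn the input into (kind, value) pairs, ending with an 'end' marker."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif op in "+-*/()":
            tokens.append((op, op))
        else:
            raise SyntaxError("unexpected character %r" % op)
    tokens.append(("end", None))
    return tokens


class Parser:
    # Grammar (one method per rule):
    #   expr   := term (('+' | '-') term)*
    #   term   := factor (('*' | '/') factor)*
    #   factor := NUMBER | '-' factor | '(' expr ')'

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # The single token of lookahead that makes this LL(1).
        return self.tokens[self.pos][0]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError("expected %r, got %r" % (kind, self.peek()))
        return self.advance()

    def parse(self):
        value = self.expr()
        self.expect("end")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance()[0] == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance()[0] == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        kind = self.peek()
        if kind == "num":
            return self.advance()[1]
        if kind == "-":
            self.advance()
            return -self.factor()
        if kind == "(":
            self.advance()
            value = self.expr()
            self.expect(")")
            return value
        raise SyntaxError("unexpected token %r" % kind)


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    return Parser(text).parse()
```

Each grammar rule maps to one method, and each method decides what to do by looking at the next token only. A real LL(k) example would need a peek that can look k tokens ahead.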

-pekka-

"Mike C. Fletcher" wrote:

> Fredrik, it would be nice to see the list you collect (not just the
> selected final entries).
>
> I'm actually doing something very similar for SimpleParse (pre-built
> parsers for common constructs that can automatically be included in your
> grammars).
>
> <plug>The library feature will appear in SimpleParse 2.0 (now under
> development).  Watch for more details in the weeks ahead or join in
> the development effort...</plug>
>
> So far I have:
>         int
>         hex
>         float
>         number := hex/float/int
>         double quoted string
>         single quoted string
>         string := (dqs/sqs)
>         semi_colon_comment (ini-files and the like)
>         hash_comment (python-style)
>         slashslash_comment (C++ // comments)
>         slashbang_comment (C /* */ non-nesting comments)
>         slashbang_nest_comment (as previous, but allows nesting)
>
>         (and common character classes as well, but RE has those already)
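
For comparison, several entries in the list above can be approximated with plain regular expressions. The following is only a sketch: the pattern names mirror the list, but the patterns themselves are assumptions made for illustration and are not SimpleParse's definitions. The classify helper is hypothetical as well.

```python
import re

# Illustrative patterns only. Order matters: hex and float are tried
# before int so that "0x1F" and "3.14" are not mistaken for integers.
PATTERNS = [
    ("hex", r"[+-]?0[xX][0-9a-fA-F]+"),
    ("float", r"[+-]?(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?|[+-]?\d+[eE][+-]?\d+"),
    ("int", r"[+-]?\d+"),
    # Quoted strings with backslash escapes.
    ("dqs", r'"(?:[^"\\]|\\.)*"'),
    ("sqs", r"'(?:[^'\\]|\\.)*'"),
    # Comments that run to the end of the line.
    ("semi_colon_comment", r";[^\n]*"),
    ("hash_comment", r"#[^\n]*"),
    ("slashslash_comment", r"//[^\n]*"),
    # Non-nesting C comment: stops at the first "*/".
    ("slashbang_comment", r"/\*[\s\S]*?\*/"),
]

COMPILED = [(name, re.compile(pattern)) for name, pattern in PATTERNS]


def classify(text):
    """Return the name of the first pattern that matches all of text, or None."""
    for name, regex in COMPILED:
        if regex.fullmatch(text):
            return name
    return None
```

The nesting comment variant is left out on purpose: Python's re module has no recursion support, so counting nesting depth needs a parser (or a small hand-written loop) rather than a single pattern.
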
>
> Common ones I'm thinking of adding:
>         Identifiers (e.g. Python, XML, HTML, C, filenames, URIs)
>         Dates (with a decent selection of formats, suitable for human data entry
> processing)
>         Times (again, a number of formats)
>         Display-formatted numbers (e.g. 200,000.00 or 200 000,00 or (200,00);
> locale-specific by default, possibly offering a few common international
> formats)
>         Common units of measurement (SI units only? or maybe Imperial as well.
> Anyway, type would be something like: unit_weight or unit_distance or
> unit_energy (parsers would then define expression,unit to require a unit
> or expression,unit? to provide a default unit))
>         Imaginary numbers (under numbers, i or j forms)
>         Monetary values (locale-specific base, possibly with a "world" version to
> allow for parsing Pounds, Francs, Euros, Yen, Dollars, et cetera without
> needing to switch locales, support for surrounding brackets meaning
> negative, those kinds of things :) ).
>         IP Addresses
>         Dotted identifiers
>
> Higher-level constructs under consideration:
>         Mathematical expressions
>         Lists, tuples, dicts (not sure how to make this generic without requiring
> a specific name for key/value expressions)
>
> Possible Python-specific additions (seen in tokeniser.py for your purposes):
>         Calling/parameter-lists (definition and use)
>         Triple quoted strings (under strings)
>
> SGML/XML/HTML-specific, thinking of including them as
> simpleparse/common/sgml.py:
>         Identifier
>         Tag
>         Attribute
>         Comment
>         Entity References
>         Processing instruction
>         Various DTD elements (not sure if worth the trouble)
>
> For most of those you could probably find RE versions in various libs of
> the standard library (after all, they're common :) ).
>
> Enjoy,
> Mike
>
>
>
> Michael Hudson wrote:
>  > Norman Shelley <Norman_Shelley-RRDN60 at email.sps.mot.com> writes:
>  >
>  >
>  >>Fredrik Lundh wrote:
>  >>
>  >>
>  >>>...
>  >>>If I were to add a dozen (or so) patterns to the (S)RE module,
>  >>>what should I pick?  What patterns do you find yourself using
>  >>>over and over again?
>  >>
 >>All kinds of numerics, e.g. scientific (1e-6, 2e6, ...) and engineering
 >>(1u, 2M and/or 2MEG, ...) notation.
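
A pattern for these two notations might look like the sketch below. The suffix table is an assumption: the message only mentions 1u, 2M and 2MEG, so this sketch uses case-sensitive SI prefixes (m is milli, M is mega) and accepts MEG as an alternative spelling of mega. SPICE-style tools read M differently, so adjust to taste. The names NUMBER, SI_PREFIXES and parse_number are made up for this example.

```python
import re

# Assumed suffix table (case-sensitive SI prefixes).
SI_PREFIXES = {
    "f": 1e-15, "p": 1e-12, "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
}

NUMBER = re.compile(
    r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)"  # mantissa, optional exponent
    r"(MEG|meg|[fpnumkMGT])?"                        # optional engineering suffix
)


def parse_number(text):
    """Convert scientific or engineering notation to a float."""
    match = NUMBER.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a number: %r" % text)
    mantissa, suffix = match.groups()
    scale = 1.0
    if suffix:
        # "MEG"/"meg" is treated as mega; everything else is looked up as-is.
        scale = 1e6 if suffix.lower() == "meg" else SI_PREFIXES[suffix]
    return float(mantissa) * scale
```

Here parse_number("2MEG") and parse_number("2M") both give 2000000.0 under the assumption above, while parse_number("1u") gives 1e-06.
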
>  >>
>  >>Python identifiers as previously mentioned.
>  >
>  >
>  > Well, *they're* already in the tokenize module.
>  >
>  > Cheers,
>  > M.
>  >
>
> --
> _______________________________________
>     Mike C. Fletcher
>     http://members.rogers.com/mcfletch/



