Useful RE patterns (was: Variable Interpolation - status of PEP215)
Pekka Niiranen
krissepu at vip.fi
Wed Jul 3 02:06:16 EDT 2002
Please also include an example of a recursive descent parser, for example LL(k).
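
To make the request concrete, here is a minimal sketch of the kind of example I mean: a hand-written LL(1) recursive descent parser (one token of lookahead) that evaluates arithmetic expressions. The grammar, the `tokenize` helper and the `Parser` class are all made up for illustration; none of this comes from SimpleParse or the re module.

```python
import re

# Hypothetical grammar, LL(1) (one token of lookahead):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')' | '-' factor

# Either a number (group 1) or any other single character (group 2).
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d*)?|\.\d+)|(.))")


def tokenize(text):
    """Split text into ('NUM', float) and ('OP', char) tokens."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("NUM", float(number)))
        elif op.strip():  # skip stray whitespace
            tokens.append(("OP", op))
    tokens.append(("END", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, kind, value=None):
        tok = self.advance()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        return tok

    def parse(self):
        result = self.expr()
        self.expect("END")  # reject trailing input
        return result

    # One method per grammar rule; each decides what to do by
    # looking only at the next token.
    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, val = self.advance()
        if kind == "NUM":
            return val
        if (kind, val) == ("OP", "("):
            value = self.expr()
            self.expect("OP", ")")
            return value
        if (kind, val) == ("OP", "-"):
            return -self.factor()
        raise SyntaxError(f"unexpected token {val!r}")
```

Calling `Parser("(1 + 2) * 3").parse()` evaluates the expression directly. A real example would probably build a parse tree instead, and an LL(k) parser with k > 1 would need a `peek` that can look further ahead.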
-pekka-
"Mike C. Fletcher" wrote:
> Fredrik, it would be nice to see the list you collect (not just the
> selected final entries).
>
> I'm actually doing something very similar for SimpleParse (pre-built
> parsers for common constructs that can automatically be included in your
> grammars).
>
> <plug>The library feature will appear in SimpleParse 2.0 (now under
> development). Watch for more details in the weeks ahead or join in
> the development effort...</plug>
>
> So far I have:
> int
> hex
> float
> number := hex/float/int
> double quoted string
> single quoted string
> string := (dqs/sqs)
> semi_colon_comment (ini-files and the like)
> hash_comment (python-style)
> slashslash_comment (C++ // comments)
> slashbang_comment (C /* */ non-nesting comments)
> slashbang_nest_comment (as previous, but allows nesting)
>
> (and common character classes as well, but RE has those already)
>
> Common ones I'm thinking of adding:
> Identifiers (e.g. Python, XML, HTML, C, filenames, URIs)
> Dates (with a decent selection of formats, suitable for human data entry
> processing)
> Times (again, a number of formats)
> Display-formatted numbers (e.g. 200,000.00 or 200 000,00 or (200,00) ;
> locale specific by default, possibly offering a few common international
> formats)
> Common units of measurement (SI units only? or maybe Imperial as well.
> Anyway, type would be something like: unit_weight or unit_distance or
> unit_energy (parsers would then define expression,unit to require a unit
> or expression,unit? to provide a default unit))
> Imaginary numbers (under numbers, i or j forms)
> Monetary values (locale-specific base, possibly with a "world" version to
> allow for parsing Pounds, Francs, Euros, Yen, Dollars et cetera without
> needing to switch locales, support for surrounding brackets meaning
> negative, those kinds of things :) ).
> IP Addresses
> Dotted identifiers
>
> Higher-level constructs under consideration:
> Mathematical expressions
> Lists, tuples, dicts (not sure how to make this generic without requiring
> a specific name for key/value expressions)
>
> Possible Python-specific additions (seen in tokeniser.py for your purposes):
> Calling/parameter-lists (definition and use)
> Triple quoted strings (under strings)
>
> SGML/XML/HTML-specific, thinking of including them as
> simpleparse/common/sgml.py:
> Identifier
> Tag
> Attribute
> Comment
> Entity References
> Processing instruction
> Various DTD elements (not sure if worth the trouble)
>
> For most of those you could probably find RE versions in various libs of
> the standard library (after all, they're common :) ).
>
> Enjoy,
> Mike
>
>
>
> Michael Hudson wrote:
> > Norman Shelley <Norman_Shelley-RRDN60 at email.sps.mot.com> writes:
> >
> >
> >>Fredrik Lundh wrote:
> >>
> >>
> >>>...
> >>>If I were to add a dozen (or so) patterns to the (S)RE module,
> >>>what should I pick? What patterns do you find yourself using
> >>>over and over again?
> >>
> >>All kinds of numerics, e.g. scientific (1e-6, 2e6, ...) and engineering
> >>(1u, 2M and/or 2MEG, ...) notation.
> >>
> >>Python identifiers as previously mentioned.
> >
> >
> > Well, *they're* already in the tokenize module.
> >
> > Cheers,
> > M.
> >
>
> --
> _______________________________________
> Mike C. Fletcher
> http://members.rogers.com/mcfletch/
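
For reference, a few of the numeric patterns mentioned in the quoted thread could be sketched as plain Python regular expressions along these lines. The pattern names, the `classify` helper and the set of engineering suffixes are assumptions made for illustration; they are not the definitions used by SimpleParse, the tokenize module or the (S)RE module.

```python
import re

# Hypothetical patterns; intended for use with re.fullmatch().
INT = r"[+-]?\d+"
HEX = r"0[xX][0-9a-fA-F]+"
# A float needs a decimal point or an exponent (covers 1e-6, 2e6, 3.14, .5).
FLOAT = (
    r"[+-]?(?:\d+\.\d*(?:[eE][+-]?\d+)?"
    r"|\.\d+(?:[eE][+-]?\d+)?"
    r"|\d+[eE][+-]?\d+)"
)
# Engineering notation such as 1u, 2M, 2MEG; the suffix set is an assumption.
ENG = r"[+-]?\d+(?:\.\d+)?(?:MEG|[munpkMG])"


def classify(text):
    """Ordered choice, mirroring 'number := hex/float/int'."""
    for name, pattern in (("hex", HEX), ("float", FLOAT), ("int", INT)):
        if re.fullmatch(pattern, text):
            return name
    return None
```

The order matters: trying `hex` before `int` keeps `0x1F` from being read as the integer `0` followed by junk when the patterns are used with `match` instead of `fullmatch`.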