Regular Expression - old regex module vs. re module

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri Jun 30 13:58:12 EDT 2006


"Jim Segrave" <jes at nl.demon.net> wrote in message
news:12aaigaohtou291 at corp.supernews.com...
> In article <ePapg.6149$Bh.3500 at tornado.texas.rr.com>,
> Paul McGuire <ptmcg at austin.rr._bogus_.com> wrote:
>
> >Not an re solution, but pyparsing makes for an easy-to-follow program.
> >TransformString only needs to scan through the string once - the
> >"reals-before-ints" testing is factored into the definition of the
> >formatters variable.
> >
> >Pyparsing's project wiki is at http://pyparsing.wikispaces.com.
>
> It fails for floats specified as ###. or .###: it outputs an integer
> format and the decimal point separately. It also ignores \#, which
> should prevent the '#' from being included in a format.
>
Ah!  This may be making some sense to me now.  Here are the OP's original
re's for matching.

exponentPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\*\*\*\*\)')
floatPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\)')
integerPattern = regex.compile('\(^\|[^\\#]\)\(##+\)')
leftJustifiedStringPattern = regex.compile('\(^\|[^\\<]\)\(<<+\)')
rightJustifiedStringPattern = regex.compile('\(^\|[^\\>]\)\(>>+\)')
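
For comparison, here is a sketch of how the same five patterns might be written for the re module. It assumes the OP's patterns use the old regex module's default Emacs-style syntax, in which \( ... \) delimits a group and \| is alternation, while a bare ( is a literal character; in re those roles are reversed. Raw strings keep the backslashes readable. This is my translation, not code from the OP.

```python
import re

# Sketch: the OP's old-style `regex` patterns translated to `re` syntax.
# Group 1 is the guard: start of string, or one character that is neither a
# backslash nor the format character itself.  Group 2 is the format field.
exponentPattern = re.compile(r'(^|[^\\#])(#+\.#+\*\*\*\*)')
floatPattern = re.compile(r'(^|[^\\#])(#+\.#+)')
integerPattern = re.compile(r'(^|[^\\#])(##+)')
leftJustifiedStringPattern = re.compile(r'(^|[^\\<])(<<+)')
rightJustifiedStringPattern = re.compile(r'(^|[^\\>])(>>+)')

m = floatPattern.search("Total: ###.## units")
print(repr(m.group(1)), repr(m.group(2)))   # prints: ' ' '###.##'

# A backslash in front of the run escapes it, so no field is found.
print(integerPattern.search(r"cost \### each"))   # prints: None
```

Note that the guard consumes the character in front of the field, so any substitution has to write group 1 back out. The re module's negative lookbehind, (?<![\\#]), is an alternative that checks the preceding character without consuming it.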

Each re seems to have two parts to it.  The leading parts appear to be
guards against escaped #, <, or > characters, yes?  The second part of each
re shows the actual pattern to be matched.  If so:

It seems that we *don't* want "###." or ".###" to be recognized as floats:
floatPattern requires at least one "#" character on either side of the ".".
Also note that single #, <, and > characters don't seem to be desired;
two or more are required for matching.  Pyparsing's Word class accepts an
optional min constructor argument, so min=2 would handle this if it really
is the case.  It also seems that the pattern is supposed to be enclosed in
()'s.  This strikes me as especially odd, since one of the main points of
this funky format appears to be formatting that preserves the column
alignment of text, as if creating tabular output; enclosing ()'s just junks
this up.
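
The field rules above can be checked quickly with re, using only the field part of each pattern (the guards are left out here). In re, the counted repetition #{2,} is the rough analogue of pyparsing's Word(..., min=2). The names float_field, int_field, left_field and is_field are hypothetical, chosen just for this sketch.

```python
import re

# Field patterns only (guards omitted).  '#{2,}' means "two or more", the
# same as the OP's '##+', and is what Word(..., min=2) expresses in pyparsing.
float_field = re.compile(r'#+\.#+')   # at least one '#' on each side of '.'
int_field = re.compile(r'#{2,}')      # equivalent to '##+'
left_field = re.compile(r'<{2,}')     # equivalent to '<<+'

def is_field(pattern, text):
    """Return True if the whole of text is a single field for pattern."""
    return pattern.fullmatch(text) is not None

for sample in ["###.##", "###.", ".###"]:
    print(sample, is_field(float_field, sample))   # True, False, False

print(is_field(int_field, "#"), is_field(int_field, "##"))     # False True
print(is_field(left_field, "<"), is_field(left_field, "<<<"))  # False True
```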

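My example also omitted the exponent pattern.  This can be handled with
another expression like realFormat, but with the trailing "****" characters.
Be sure to insert this expression before realFormat in the list of
formatters, so that it is tried first; otherwise realFormat would likely
match just the leading "##.##" part and leave the "****" behind.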
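
Pyparsing's | operator builds a MatchFirst, which tries its alternatives in order, and Python's re alternation behaves the same way, so re is enough to show why the ordering matters. This sketch uses re rather than pyparsing; EXPONENT, REAL and INTEGER are hypothetical stand-ins for the formatter expressions, not the names from my earlier example.

```python
import re

# Hypothetical field patterns, combined into one alternation so the string
# is scanned once.  re tries alternatives left to right and keeps the first
# one that matches, so longer formats must come before their prefixes.
EXPONENT = r'#+\.#+\*\*\*\*'
REAL = r'#+\.#+'
INTEGER = r'##+'

good_order = re.compile('|'.join([EXPONENT, REAL, INTEGER]))
bad_order = re.compile('|'.join([REAL, EXPONENT, INTEGER]))

text = "value: ##.###****"
print(good_order.findall(text))   # ['##.###****']
print(bad_order.findall(text))    # ['##.###'], the "****" is left behind
```

The same reasoning is behind the "reals-before-ints" ordering: with INTEGER first, "##.##" would be split into an integer, a stray ".", and another integer.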

I may be completely off in my re interpretation.  Perhaps one of the re
experts here can explain better what the OP's re's are all about.  Can
anybody locate/cite the actual spec for this formatting, um, format?

-- Paul