Regular Expression - old regex module vs. re module

Jim Segrave jes at nl.demon.net
Fri Jun 30 14:59:45 EDT 2006


In article <UCdpg.7174$Uc3.5798 at tornado.texas.rr.com>,
Paul McGuire <ptmcg at austin.rr._bogus_.com> wrote:
>"Jim Segrave" <jes at nl.demon.net> wrote in message
>news:12aaigaohtou291 at corp.supernews.com...
>> In article <ePapg.6149$Bh.3500 at tornado.texas.rr.com>,
>> Paul McGuire <ptmcg at austin.rr._bogus_.com> wrote:
>>
>> >Not an re solution, but pyparsing makes for an easy-to-follow program.
>> >TransformString only needs to scan through the string once - the
>> >"reals-before-ints" testing is factored into the definition of the
>> >formatters variable.
>> >
>> >Pyparsing's project wiki is at http://pyparsing.wikispaces.com.
>>
>> It fails for floats specified as ###. or .###, it outputs an integer
>> format and the decimal point separately. It also ignores \# which
>> should prevent the '#' from being included in a format.
>>
>Ah!  This may be making some sense to me now.  Here are the OP's original
>re's for matching.
>
>exponentPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\*\*\*\*\)')
>floatPattern = regex.compile('\(^\|[^\\#]\)\(#+\.#+\)')
>integerPattern = regex.compile('\(^\|[^\\#]\)\(##+\)')
>leftJustifiedStringPattern = regex.compile('\(^\|[^\\<]\)\(<<+\)')
>rightJustifiedStringPattern = regex.compile('\(^\|[^\\>]\)\(>>+\)')
>
>Each re seems to have two parts to it.  The leading parts appear to be
>guards against escaped #, <, or > characters, yes?  The second part of each
>re shows the actual pattern to be matched.  If so:
>
>It seems that we *don't* want "###." or ".###" to be recognized as floats,
>floatPattern requires at least one "#" character on either side of the ".".
>Also note that single #, <, and > characters don't seem to be desired, but
>at least two or more are required for matching.  Pyparsing's Word class
>accepts an optional min=2 constructor argument if this really is the case.
>And it also seems that the pattern is supposed to be enclosed in ()'s.  This
>seems especially odd to me, since one of the main points of this funky
>format seems to be to set up formatting that preserves column alignment of
>text, as if creating a tabular output - enclosing ()'s just junks this up.
>
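
For comparison, here is a rough sketch of how two of the quoted patterns
might look in the re module. It assumes the old regex module was using
its Emacs-style syntax, where \( \) group and \| alternates, so in re the
backslashes come off those and a raw string is used. The names
float_pattern, integer_pattern and find_field are made up for
illustration and are not from the original poster's code.

```python
import re

# Assumed re-module equivalents of the quoted old-style patterns.
# Group 1 is the guard: start of string, or one character that is
# neither a backslash nor '#'. Group 2 is the actual field.
float_pattern = re.compile(r'(^|[^\\#])(#+\.#+)')
integer_pattern = re.compile(r'(^|[^\\#])(##+)')


def find_field(pattern, text):
    """Return the first field matched by pattern in text, or None."""
    match = pattern.search(text)
    return match.group(2) if match else None


if __name__ == '__main__':
    print(find_field(float_pattern, 'total: ##.## units'))
    print(find_field(integer_pattern, 'n = ####'))
    print(find_field(float_pattern, '###.'))
```

As written, float_pattern wants at least one '#' on each side of the '.',
and integer_pattern wants at least two '#' characters, which matches the
reading of the original patterns given in the quoted text.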

The poster was excluding fields escaped with a '\' character, but I've
just looked up the Perl format statement and in fact fields always
begin with a '@', and yes, having no digits on one side of the decimal
point is legal. Strings can be left justified ('@<<<<'), right
justified ('@>>>>') or centred ('@||||'); numerics begin with an '@',
contain '#' and may contain a decimal point. Fields beginning with '^'
instead of '@' are omitted if the format is a numeric ('#'
with/without decimal). I assumed from the poster's original patterns
that one has to worry about '@', but that's incorrect: the '@' needs
to be present for something to be a format as opposed to ordinary
text, and there appears to be no way to embed a '@' in a format. It's
worth noting that Perl does implicit float to int coercion, so it
treats @### the same for ints and floats (no decimal printed).

For the grisly details:

http://perl.com/doc/manual/html/pod/perlform.html
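
As a rough illustration of the field shapes described above, here is a
minimal sketch using the re module. It only covers the forms mentioned
in this message ('@' or '^' followed by '<', '>', '|' or '#' with an
optional decimal point), not everything perlform allows. FIELD and
classify_fields are hypothetical names chosen for the example.

```python
import re

# One capturing group per kind of field; a numeric field may have no
# '#' on one side of the decimal point ('@###.' or '@.###').
FIELD = re.compile(r'[@^](?:(<+)|(>+)|(\|+)|(#+(?:\.#*)?|\.#+))')

KINDS = ('left', 'right', 'centre', 'numeric')


def classify_fields(picture):
    """Return a list of (kind, field_text) pairs found in picture."""
    fields = []
    for match in FIELD.finditer(picture):
        # Exactly one group matches, so lastindex identifies the kind.
        kind = KINDS[match.lastindex - 1]
        fields.append((kind, match.group(0)))
    return fields


if __name__ == '__main__':
    for kind, text in classify_fields('@<<<< @>>>> @|||| @##.## @###. @.###'):
        print(kind, text)
```

Since every field has to start with '@' or '^', there is no need for the
guard against escaped characters that the original patterns carried.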

-- 
Jim Segrave           (jes at jes-2.demon.nl)



