Regular Expression - old regex module vs. re module

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri Jun 30 15:35:13 EDT 2006


"Jim Segrave" <jes at nl.demon.net> wrote in message
news:12aat51q5ubf1a3 at corp.supernews.com...
<snip>
> The poster was excluding escaped (with a '\' character, but I've just
> looked up the Perl format statement and in fact fields always begin
> with a '@', and yes having no digits on one side of the decimal point
> is legal. Strings can be left or right justified '@<<<<', '@>>>>', or
> centred '@||||', numerics begin with an @, contain '#' and may contain
> a decimal point. Fields beginning with '^' instead of '@' are omitted
> if the format is a numeric ('#' with/without decimal). I assumed from
> the poster's original patterns that one has to worry about '@', but
> that's incorrect, they need to be present to be a format as opposed to
> ordinary text and there appears to be no way to embed a '@' in a
> format. It's worth noting that PERL does implicit float to int
> coercion, so it treats @### the same for ints and floats (no decimal
> printed).
>
> For the grisly details:
>
> http://perl.com/doc/manual/html/pod/perlform.html
>
> -- 
> Jim Segrave           (jes at jes-2.demon.nl)
>

Ah, wunderbar!  Some further thoughts...

I can see that the OP omitted the concept of "@|||" centering, since the
Python string interpolation forms only support right or left justified
fields, and it seems he is trying to do some form of format->string interp
automation.  Adding centering would require not only composing a suitable
string interp format, but also some sort of pad() operation in the arg
passed to the string interp operation.  I suspect this also rules out simple
handling of the '^' operator as mentioned in the spec, and likewise for the
trailing ellipsis if a field is not long enough for the formatted value.
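
A rough sketch of that pad-then-interpolate idea. The helper name
center_field is made up for illustration (it is not from the OP's code), and
cutting over-long values to the field width is an assumption based on the
Perl docs' statement about truncation:

```python
def center_field(value, width):
    """Centre str(value) in `width` columns, cutting it if it is too long."""
    text = str(value)[:width]
    return text.center(width)

# "%s" cannot centre by itself, so the padding is applied to the argument
# before it is handed to the string interpolation.
line = "[%s]" % center_field("abc", 7)
```

The format string stays a plain "%s"; all of the centering work happens in
the argument, which is why it cannot be expressed as a format translation
alone.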

The '@' itself seems to be part of the field, so "@<<<<" would be a 5
column, left-justified string.  A bare '@' seems to be a single string
placeholder (meaningless to ask right or left justified :) ), since this is
used in the doc's hack for including a "@" in the output.  (That is, as you
said, the original spec provides no mechanism for escaping a '@' character;
it has to get hacked in as a value dropped into a single-character field.)

The Perl docs say that values that are too long for their field are
truncated.  This does not happen in Python string interps for numeric
values, but it can be done with strings (using the precision field):
>>> import string
>>> print "%-10s" % string.ascii_uppercase
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>>> print "%-10.10s" % string.ascii_uppercase
ABCDEFGHIJ
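
For comparison, a small sketch of the numeric case, plus the '*' form that
takes width and precision from the argument tuple.  The variable names are
only for illustration, and this is written to run on current Python 3 rather
than the print-statement style above:

```python
import string

# For numeric conversions the width is only a minimum; nothing is cut off.
number = "%3d" % 123456

# For strings, width and precision can also be supplied as arguments via '*',
# which is convenient when the size is computed from an "@<<<" style field.
width = 10
text = "%-*.*s" % (width, width, string.ascii_uppercase)
```

Here number keeps all six digits despite the width of 3, while text is cut
to the first ten letters.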

So if we were to focus on support for "@", "@>>>", "@<<<", "@###" and
"@###.##" (with and without leading or trailing digits about the decimal)
style format fields, this shouldn't be overly difficult, and may even meet
the OP's requirements.  (The OP seemed to also want some support for
something like "@##.###****" for scientific notation, again, not a
dealbreaker.)
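
Here is one way that reduced set of fields might be translated, as a sketch
only.  The names FIELD, field_to_interp and perl_to_interp are mine, not the
OP's.  It assumes that the '@' counts as one column of the field width, that
string values are cut to the field width, and that "@###" should print no
decimals for ints and floats alike.  Whether Perl prints the trailing point
for a field like "@##." is something I have not checked, and centering, '^'
fields and the "****" scientific form are not handled.  Written for current
Python 3:

```python
import re

# '@' alone, '@<<<', '@>>>', '@###', '@###.##', '@.##' or '@##.'
FIELD = re.compile(r"@(<+|>+|#+(?:\.#*)?|\.#+)?")

def field_to_interp(field):
    """Translate one Perl format field into a %-style conversion."""
    width = len(field)                 # assumption: the '@' is one column
    body = field[1:]
    if body.startswith(">"):
        return "%%%d.%ds" % (width, width)      # right-justified, cut to width
    if "#" in body:
        decimals = len(body.partition(".")[2])  # count of '#' after the point
        return "%%%d.%df" % (width, decimals)
    return "%%-%d.%ds" % (width, width)         # '@' or '@<<<': left-justified

def perl_to_interp(template):
    """Translate a whole picture line, escaping literal '%' characters."""
    pieces = []
    pos = 0
    for m in FIELD.finditer(template):
        pieces.append(template[pos:m.start()].replace("%", "%%"))
        pieces.append(field_to_interp(m.group(0)))
        pos = m.end()
    pieces.append(template[pos:].replace("%", "%%"))
    return "".join(pieces)

fmt = perl_to_interp("@<<<< @>>>> @###.##")     # "%-5.5s %5.5s %7.2f"
row = fmt % ("ab", "cd", 3.14159)
```

Using "%N.0f" rather than "%Nd" for "@###" is a deliberate choice so that a
float argument is rounded instead of chopped.  The bare '@' case also covers
the doc's hack for a literal "@": perl_to_interp("100% @") % "@" comes out
as "100% @".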

-- Paul




