local greediness ???

Wed Apr 19 09:47:50 EDT 2006

<tygerc at gmail.com> wrote in message
news:1145423359.206549.52510 at v46g2000cwv.googlegroups.com...
> hi, all. I need to process a file with the following format:
> $ cat sample
> [(some text)2.3(more text)4.5(more text here)]
> [(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
> [(xxx)11.0(bbb\))8.9(end here)]
> .......
>
> my goal here is for each line, extract every '(.*)' (including the round
> brackets), put them in a list, and extract every float on the same line
> and put them in a list..

Are you wedded to re's?  Here's a pyparsing approach for your perusal.  It
uses the new QuotedString class, treating your ()-enclosed elements as
custom quoted strings (including backslash escape support).
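
If you would rather stay with re, here is a rough sketch of the same idea
using only the standard library. It is not the pyparsing approach, just a
point of comparison. The names PAREN, FLOAT, TOKEN and split_line are made up
for illustration, and it assumes every float has digits on both sides of the
decimal point, as in your sample. The trick for the "greediness" problem is to
spell out what may appear between the parens (an escaped character, or
anything that is not a backslash or a closing paren) instead of using '.*'.

```python
import re

sample = r"""
[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]
"""

# "(" followed by escaped characters or anything except "\" and ")", then ")"
PAREN = r"\((?:\\.|[^\\)])*\)"
# optional sign, digits, ".", digits (assumption: no exponents, no bare "3.")
FLOAT = r"[+-]?\d+\.\d+"
# try a paren group first so digits inside parens are never taken as floats
TOKEN = re.compile(r"(?P<paren>%s)|(?P<num>%s)" % (PAREN, FLOAT))

def split_line(line):
    """Return (paren strings incl. brackets, floats) found on one line."""
    strings, floats = [], []
    for m in TOKEN.finditer(line):
        if m.lastgroup == "paren":
            strings.append(m.group("paren"))
        else:
            floats.append(float(m.group("num")))
    return strings, floats

for line in sample.splitlines():
    if line.strip():
        print(split_line(line))
```

This keeps the round brackets and the backslash, as asked in the original
post; stripping them is a separate step.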

Some other things the parser does for you during parsing:
- converts the numeric strings to floats
- processes the \) escaped paren, returning just the )
Why not? While parsing, the parser "knows" it has just parsed a floating
point number (or an escaped character), so it may as well do the conversion
right then.
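
To show what "convert while parsing" means without any library in the way,
here is a minimal sketch in plain Python. It only imitates the effect of a
parse action and is not how pyparsing is implemented; CONVERT, unescape and
parse_entry are hypothetical names. Each token type gets a converter, and the
converter runs at the moment the token is matched.

```python
import re

# group "text" captures what is inside the parens; group "num" captures a float
TOKEN = re.compile(r"\((?P<text>(?:\\.|[^\\)])*)\)|(?P<num>[+-]?\d+\.\d+)")

def unescape(text):
    # drop the backslash from each escaped character, e.g. \) becomes )
    return re.sub(r"\\(.)", r"\1", text)

# one converter per token type, applied as soon as that token is matched
CONVERT = {"text": unescape, "num": float}

def parse_entry(line):
    """Return the converted tokens of one line, in order of appearance."""
    return [CONVERT[m.lastgroup](m.group(m.lastgroup))
            for m in TOKEN.finditer(line)]

print(parse_entry(r"[(xxx)11.0(bbb\))8.9(end here)]"))
```

The caller gets back strings and floats that are ready to use, with no second
pass over the results.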

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)

--------------------
test = r"""
[(some text)2.3(more text)4.5(more text here)]
[(aa bb ccc)-1.2(kdk)12.0(xxxyyy)]
[(xxx)11.0(bbb\))8.9(end here)]
"""
from pyparsing import oneOf,Combine,Optional,Word,nums,QuotedString,Suppress

# define a floating point number
sign = oneOf("+ -")
floatNum = Combine( Optional(sign) + Word(nums) + "." + Word(nums) )

# have parser convert to actual floats while parsing
floatNum.setParseAction(lambda s,l,t: float(t[0]))

# define a "quoted string" where ()'s are the opening and closing quotes
parenString = QuotedString("(",endQuoteChar=")",escChar="\\")

# define the overall entry structure
entry = (Suppress("[") + parenString + floatNum + parenString + floatNum +
         parenString + Suppress("]"))

# scan for floats
for toks,start,end in floatNum.scanString(test):
    print toks[0]
print

# scan for paren strings
for toks,start,end in parenString.scanString(test):
    print toks[0]
print

# scan for entries
for toks,start,end in entry.scanString(test):
    print toks
print
--------------------
Gives:
2.3
4.5
-1.2
12.0
11.0
8.9

some text
more text
more text here
aa bb ccc
kdk
xxxyyy
xxx
bbb)
end here

['some text', 2.2999999999999998, 'more text', 4.5, 'more text here']
['aa bb ccc', -1.2, 'kdk', 12.0, 'xxxyyy']
['xxx', 11.0, 'bbb)', 8.9000000000000004, 'end here']