re beginner

Paul McGuire ptmcg at austin.rr._bogus_.com
Sun Jun 4 20:07:56 EDT 2006


"John Machin" <sjmachin at lexicon.net> wrote in message
news:4483665A.206 at lexicon.net...
> Fantastic -- at least for the OP's carefully copied-and-pasted input.
> Meanwhile back in the real world, there might be problems with multiple
> tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.
> In that case a loop approach that validated as it went and was able to
> report the position and contents of any invalid input might be better.
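
For reference, the loop approach described in the quote might look roughly like the sketch below. It uses only the standard library and Python 3 syntax. The function name validate_entries, the error tuple layout, and the choice to tolerate repeated tabs by dropping empty fields are all assumptions made for illustration, not anything from the original thread.

```python
import re

INT_RE = re.compile(r"[0-9]+")

def validate_entries(text):
    """Split tab-separated item/quantity pairs, validating as we go.

    Returns (entries, errors). Each error is a tuple of
    (line number, field number, offending text, reason), both numbers 1-based.
    Field numbers count non-empty fields only.
    """
    entries, errors = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Dropping empty fields tolerates runs of tabs used for alignment.
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        if len(fields) % 2:
            # An odd count means the last item has no quantity after it.
            errors.append((line_no, len(fields), fields[-1],
                           "item without a quantity"))
            fields = fields[:-1]
        for i in range(0, len(fields), 2):
            item, qty = fields[i], fields[i + 1]
            if INT_RE.fullmatch(qty):
                entries.append((item, int(qty)))
            else:
                # i is 0-based, so the quantity is field number i + 2.
                errors.append((line_no, i + 2, qty,
                               "quantity is not an integer"))
    return entries, errors
```

Valid pairs are still collected when a neighbouring field is bad, so one pass can report every problem in the input.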

Yeah, for that you'd need more like a real parser... hey, wait a minute!
What about pyparsing?!

Here's a pyparsing version.  The definition of the parsing patterns takes
little more than the re definition does - the bulk of the rest of the code
is parsing/scanning the input and reporting the results.

The pyparsing home page is at http://pyparsing.wikispaces.com.

-- Paul


stuff = ('Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\n'
         'Blue bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\n'
         'Wireless cord!\t2\tBuilding tools\t3\n'
         'One for the money\t7\tTwo for the show\t4')
print "Original input string:"
print stuff
print

from pyparsing import *

# define low-level elements for parsing
itemWord = Word(alphas, alphanums+".!?")
itemDesc = OneOrMore(itemWord)
integer = Word(nums)

# add parse action to itemDesc to merge separate words into single string
itemDesc.setParseAction( lambda s,l,t: " ".join(t) )

# define macro element for an entry
entry = itemDesc.setResultsName("item") + integer.setResultsName("qty")

# scan through input string for entries, print out their named fields
print "Results when scanning for entries:"
for t,s,e in entry.scanString(stuff):
    print t.item,t.qty
print

# parse entire string, building ParseResults with dict-like access
results = dictOf( itemDesc, integer ).parseString(stuff)
print "Results when parsing entries as a dict:"
print "Keys:", results.keys()
for item in results.items():
    print item
for k in results.keys():
    print k,"=", results[k]


prints:

Original input string:
Yellow hat 2 Blue shirt 1
White socks 4 Green pants 1
Blue bag 4 Nice perfume 3
Wrist watch 7 Mobile phone 4
Wireless cord! 2 Building tools 3
One for the money 7 Two for the show 4

Results when scanning for entries:
Yellow hat 2
Blue shirt 1
White socks 4
Green pants 1
Blue bag 4
Nice perfume 3
Wrist watch 7
Mobile phone 4
Wireless cord! 2
Building tools 3
One for the money 7
Two for the show 4

Results when parsing entries as a dict:
Keys: ['Wireless cord!', 'Green pants', 'Blue shirt', 'White socks',
'Mobile phone', 'Two for the show', 'One for the money', 'Blue bag',
'Wrist watch', 'Nice perfume', 'Yellow hat', 'Building tools']
('Wireless cord!', '2')
('Green pants', '1')
('Blue shirt', '1')
('White socks', '4')
('Mobile phone', '4')
('Two for the show', '4')
('One for the money', '7')
('Blue bag', '4')
('Wrist watch', '7')
('Nice perfume', '3')
('Yellow hat', '2')
('Building tools', '3')
Wireless cord! = 2
Green pants = 1
Blue shirt = 1
White socks = 4
Mobile phone = 4
Two for the show = 4
One for the money = 7
Blue bag = 4
Wrist watch = 7
Nice perfume = 3
Yellow hat = 2
Building tools = 3
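
On John's point about reporting where bad input sits: pyparsing raises a ParseException that carries the failing location. The sketch below is one way to surface it, reusing the same patterns as the code above. It is written in Python 3 syntax, and the names all_entries and find_first_error are made up for this example. As far as I know, parseString expands tabs to spaces before parsing by default, so treat the column as relative to the expanded text.

```python
from pyparsing import OneOrMore, ParseException, Word, alphas, alphanums, nums

# Same low-level patterns as in the post above.
item_word = Word(alphas, alphanums + ".!?")
item_desc = OneOrMore(item_word)
integer = Word(nums)
entry = item_desc + integer

# Hypothetical top-level pattern: the whole input must be entries.
all_entries = OneOrMore(entry)

def find_first_error(text):
    """Return (line number, column, line text, message) for the first
    place the input stops matching, or None if everything parses."""
    try:
        # parseAll=True makes leftover, unparsed text an error.
        all_entries.parseString(text, parseAll=True)
    except ParseException as pe:
        return pe.lineno, pe.col, pe.line, str(pe)
    return None
```

Calling find_first_error on input with a non-integer quantity such as 1.5 returns the line the stray text is on. It stops at the first failure, so reporting every bad field would still need a loop around it.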
