Recommended data structure for newbie
Paul McGuire
ptmcg at austin.rr._bogus_.com
Wed May 3 09:10:27 EDT 2006
"manstey" <manstey at csu.edu.au> wrote in message
news:1146626916.066395.206540 at y43g2000cwc.googlegroups.com...
> Hi,
>
> I have a text file with about 450,000 lines. Each line has 4-5 fields,
> separated by various delimiters (spaces, @, etc).
>
> I want to load in the text file and then run routines on it to produce
> 2-3 additional fields.
>
<snip>
Matthew -
If you find re's to be a bit cryptic, here is a pyparsing version that may
be a bit more readable, and will easily scan through your input file:
================
from pyparsing import OneOrMore, Word, alphas, oneOf, restOfLine, lineno
data = """gee fre asd[234
ger dsf asd[243
gwer af as.:^25a"""
# define format of input line, that is:
# - one or more words, composed of alphabetic characters, periods, and
#   colons
# - one of the characters '[' or '^'
# - the rest of the line
entry = OneOrMore( Word(alphas+".:") ) + oneOf("[ ^") + restOfLine
# scan for matches in input data - for each match, scanString will
# report the matching tokens, and start and end locations
for toks,start,end in entry.scanString(data):
    print(toks)
print()

# scan again, this time generating additional fields
for toks,start,end in entry.scanString(data):
    tokens = list(toks)
    # change these lines to implement your
    # desired generation code - couldn't guess
    # what you wanted from your example
    tokens.append( toks[0]+toks[1] )
    tokens.append( toks[-1] + toks[-1][-1] )
    tokens.append( str( lineno(start, data) ) )
    print(tokens)
================
prints:
['gee', 'fre', 'asd', '[', '234']
['ger', 'dsf', 'asd', '[', '243']
['gwer', 'af', 'as.:', '^', '25a']
['gee', 'fre', 'asd', '[', '234', 'geefre', '2344', '1']
['ger', 'dsf', 'asd', '[', '243', 'gerdsf', '2433', '2']
['gwer', 'af', 'as.:', '^', '25a', 'gweraf', '25aa', '3']
You asked about data structures specifically. The core collections in
Python are lists, dicts, and, more recently, sets. Pyparsing returns the
tokens from its matching process in a pyparsing-defined class called
ParseResults. Thanks to Python's "duck-typing" model, you can treat a
ParseResults object just like a list, or like a dict if you have
assigned names to the fields in the parsing expression.
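To make the list-or-dict point concrete, here is a rough sketch that reuses
the grammar from above and attaches names to two of the fields. The names
"delim" and "rest", and the layout of the record dicts, are just choices
made for this illustration, not something your data requires. Each match is
copied into a plain dict, and the dicts are collected in a list:

```python
from pyparsing import OneOrMore, Word, alphas, oneOf, restOfLine

data = """gee fre asd[234
ger dsf asd[243
gwer af as.:^25a"""

# Same grammar as before, but two fields now carry results names.
# "delim" and "rest" are names made up for this sketch.
entry = (
    OneOrMore(Word(alphas + ".:"))
    + oneOf("[ ^")("delim")
    + restOfLine("rest")
)

records = []
for toks, start, end in entry.scanString(data):
    record = {
        # list-style access: everything before the delimiter
        "words": toks.asList()[:-2],
        # dict-style access by the names assigned above
        "delim": toks["delim"],
        "rest": toks["rest"],
    }
    # an example of a generated field, built from the first two words
    record["joined"] = toks[0] + toks[1]
    records.append(record)

for record in records:
    print(record)
```

A list of dicts like this is only one option. If you later need to look
records up by one of the fields, a dict keyed on that field may suit you
better than a list.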
Download pyparsing at http://pyparsing.sourceforge.net.
-- Paul