Recommended data structure for newbie

Paul McGuire ptmcg at austin.rr._bogus_.com
Wed May 3 09:10:27 EDT 2006


"manstey" <manstey at csu.edu.au> wrote in message
news:1146626916.066395.206540 at y43g2000cwc.googlegroups.com...
> Hi,
>
> I have a text file with about 450,000 lines. Each line has 4-5 fields,
> separated by various delimiters (spaces, @, etc).
>
> I want to load in the text file and then run routines on it to produce
> 2-3 additional fields.
>

<snip>

Matthew -

If you find re's to be a bit cryptic, here is a pyparsing version that may
be a bit more readable, and will easily scan through your input file:

================
from pyparsing import OneOrMore, Word, alphas, oneOf, restOfLine, lineno

data = """gee fre asd[234
ger dsf asd[243
gwer af as.:^25a"""

# define format of input line, that is:
# - one or more words, composed of alphabetic characters, periods, and
#   colons
# - one of the characters '[' or '^'
# - the rest of the line
entry = OneOrMore( Word(alphas+".:") ) + oneOf("[ ^") + restOfLine

# scan for matches in input data - for each match, scanString will
# report the matching tokens, and start and end locations
for toks, start, end in entry.scanString(data):
    print(toks)
print()

# scan again, this time generating additional fields
for toks, start, end in entry.scanString(data):
    tokens = list(toks)
    # change these lines to implement your
    # desired generation code - couldn't guess
    # what you wanted from your example
    tokens.append( toks[0]+toks[1] )
    tokens.append( toks[-1] + toks[-1][-1] )
    tokens.append( str( lineno(start, data) ) )
    print(tokens)

================
prints:
['gee', 'fre', 'asd', '[', '234']
['ger', 'dsf', 'asd', '[', '243']
['gwer', 'af', 'as.:', '^', '25a']

['gee', 'fre', 'asd', '[', '234', 'geefre', '2344', '1']
['ger', 'dsf', 'asd', '[', '243', 'gerdsf', '2433', '2']
['gwer', 'af', 'as.:', '^', '25a', 'gweraf', '25aa', '3']
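
For a file of 450,000 lines you may prefer to go one line at a time rather
than load everything into a single string.  Here is a rough sketch of that,
not a tuned implementation.  It assumes the same entry expression as above;
the helper name parse_lines and the file name in the comment are made up for
illustration, and lines that do not fit the format are simply skipped.

```python
from pyparsing import OneOrMore, ParseException, Word, alphas, oneOf, restOfLine

# same line format as in the example above
entry = OneOrMore(Word(alphas + ".:")) + oneOf("[ ^") + restOfLine


def parse_lines(lines):
    """Yield (line_number, tokens) for every line that matches entry.

    parse_lines is a hypothetical helper, not part of pyparsing.
    """
    for number, line in enumerate(lines, start=1):
        try:
            toks = entry.parseString(line.rstrip("\n"))
        except ParseException:
            continue  # assumption: silently skip lines that do not match
        yield number, list(toks)


# Any iterable of lines works, including an open file object, e.g.
#     with open("input.txt") as f:      # hypothetical file name
#         for number, tokens in parse_lines(f):
#             ...
sample = ["gee fre asd[234\n", "not a match\n", "gwer af as.:^25a\n"]
records = list(parse_lines(sample))
for number, tokens in records:
    print(number, tokens)
```

Because parse_lines is a generator, only one line's tokens are in memory at
a time unless you collect them into a list, as the sample does.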


You asked about data structures specifically.  The core collections in
Python are lists, dicts, and more recently, sets.  Pyparsing returns tokens
from its matching process in a pyparsing-defined class called
ParseResults.  Fortunately, thanks to Python's "duck-typing" model, you can
treat a ParseResults object just like a list, or like a dict if you have
assigned names to the fields in the parsing expression.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul




