parsing tab and newline delimited text

elsa kerensaelise at hotmail.com
Tue Aug 3 22:14:30 EDT 2010


Hi,

I have a large file of text I need to parse. Individual 'entries' are
separated by newline characters, while fields within each entry are
separated by tab characters.

So, an individual entry might have this form (in printed form):

Title    date   position   data

with each field separated by tabs, and a newline at the end of data.
So, I thought I could simply open a file, read each line in turn,
and parse it...

f = open('MyFile')
line = f.readline()        # read one entry (one line of the file)
parts = line.split('\t')   # split the entry into its fields

etc...
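
For illustration, the line-by-line approach described above might be written out roughly as below. This is only a sketch: the helper name parse_entries and the sample values are made up, io.StringIO stands in for the real open('MyFile'), and Python 3 is assumed.

```python
import io

def parse_entries(lines):
    """Yield a list of tab-separated fields for each non-empty line."""
    for line in lines:
        line = line.rstrip('\n')  # drop only the trailing newline
        if not line:
            continue              # skip blank lines
        yield line.split('\t')

# Stand-in for open('MyFile'); the field values here are invented.
sample = io.StringIO(
    "Title1\t2010-08-03\t42\tabc123\n"
    "Title2\t2010-08-04\t43\txyz789\n"
)
entries = list(parse_entries(sample))
for title, date, position, data in entries:
    print(title, date, position, data)
```

Iterating over the file object reads one line at a time, so the whole file never has to be held in memory, which matters for large files.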

However, 'data' is a fairly random string of characters. Because the
files I'm processing are large, there is a good chance that every
file contains a data field that looks like this:

899998dlKKlS\lk3#kdf\nllllKK99

or like this:

LLLSDKJJJdkkf334$\ttttks)))K99

So, you see, the random strings '\n' and '\t' are stopping me from
being able to parse my file correctly. Any suggestions on how to
overcome this problem would be greatly appreciated.
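
One thing that may be worth checking first: if the file holds those sequences literally, as a backslash followed by the letter n or t, then they are two ordinary characters and not a newline or a tab, and neither line iteration nor split('\t') reacts to them. The sketch below tries this on the two example strings. It assumes four fields per entry with 'data' last; the helper name split_entry and the short placeholder field values are made up, and io.StringIO stands in for the real file.

```python
import io

# Raw strings: backslash + letter stay as two ordinary characters,
# which is what a text file contains when the sequence is literal.
data_a = r"899998dlKKlS\lk3#kdf\nllllKK99"
data_b = r"LLLSDKJJJdkkf334$\ttttks)))K99"

def split_entry(line, n_fields=4):
    """Split one entry into at most n_fields tab-separated fields."""
    # maxsplit leaves any real tab characters inside the last field
    return line.rstrip('\n').split('\t', n_fields - 1)

# Two entries, each ending in one of the troublesome data strings.
contents = ("T1\td1\tp1\t" + data_a + "\n"
            "T2\td2\tp2\t" + data_b + "\n")
rows = [split_entry(line) for line in io.StringIO(contents)]

print(len(rows))              # number of entries found
print(rows[0][3] == data_a)   # data field survived intact?
print(rows[1][3] == data_b)
```

If the data field can contain real tab characters, limiting the number of splits keeps them inside the last field. Real newline characters inside the data are a different matter: the format itself would then be ambiguous, and this sketch does not try to handle that case.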

Many thanks,

Elsa


