parsing tab and newline delimited text

MRAB python at mrabarnett.plus.com
Tue Aug 3 23:05:56 EDT 2010


elsa wrote:
> Hi,
> 
> I have a large file of text I need to parse. Individual 'entries' are
> separated by newline characters, while fields within each entry are
> separated by tab characters.
> 
> So, an individual entry might have this form (in printed form):
> 
> Title    date   position   data
> 
> with each field separated by tabs, and a newline at the end of data.
> So, I thought I could simply open a file, read each line in turn,
> and parse it....
> 
> f=open('MyFile')
> line=f.readline()
> parts=line.split('\t')
> 
> etc...
> 
> However, 'data' is a fairly random string of characters. Because the
> files I'm processing are large, there is a good chance that in every
> file, there is a data field that might look like this:
> 
> 899998dlKKlS\lk3#kdf\nllllKK99
> 
> or like this:
> 
> LLLSDKJJJdkkf334$\ttttks)))K99
> 
> so, you see the random strings '\n' and '\t' are stopping me from
> being able to parse my file correctly. Any suggestions on how to
> overcome this problem would be greatly appreciated.
> 
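When you say random strings '\n', etc., do you mean the backslash character
\ followed by the letter n? If so, then you don't have a problem: that is
just two ordinary characters, \ followed by n, and neither reading the
file line by line nor split('\t') will treat them as a newline or a tab.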
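
A minimal sketch of that first case, assuming the data field really holds a
backslash followed by a letter. The two data strings are the samples from the
question; the Title/date/position values and the helper name parse_entry are
made up for illustration.

```python
def parse_entry(line):
    """Split one tab-delimited entry into its fields (hypothetical helper)."""
    # Drop only the trailing newline, then split on real tab characters.
    return line.rstrip('\n').split('\t')

# Raw strings (r'...') keep the backslash as an ordinary character, which
# mimics a file that contains "\" followed by "n" or "t".
data_1 = r'899998dlKKlS\lk3#kdf\nllllKK99'
data_2 = r'LLLSDKJJJdkkf334$\ttttks)))K99'

line_1 = 'Title\tdate\tposition\t' + data_1 + '\n'
line_2 = 'Title\tdate\tposition\t' + data_2 + '\n'

fields_1 = parse_entry(line_1)
fields_2 = parse_entry(line_2)

print(len(fields_1), len(fields_2))  # 4 4
print(fields_1[3] == data_1)         # True
print(fields_2[3] == data_2)         # True
```

Both entries come back as four fields, with the data field intact.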

If, on the other hand, by '\n' you mean the newline character, then,
well, that's a newline character, and there's (probably) nothing you can
do about it.
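
To find out which case applies, something like the following may help. It is
only a sketch: it shows how repr() distinguishes the two cases, and it flags
lines whose field count is off. It assumes four fields per entry, as in the
question; EXPECTED_FIELDS, find_suspect_lines and the sample text are
hypothetical. It detects damaged entries but does not repair them.

```python
literal = r'kdf\nlll'   # backslash followed by n: 8 characters
actual = 'kdf\nlll'     # a real newline character: 7 characters

print(repr(literal), len(literal))  # 'kdf\\nlll' 8
print(repr(actual), len(actual))    # 'kdf\nlll' 7

EXPECTED_FIELDS = 4  # Title, date, position, data (assumed from the question)

def find_suspect_lines(text):
    """Return 1-based numbers of non-empty lines with the wrong field count."""
    suspects = []
    for number, line in enumerate(text.split('\n'), start=1):
        if line and len(line.split('\t')) != EXPECTED_FIELDS:
            suspects.append(number)
    return suspects

# The first entry's data field contains a real newline, so the entry is
# broken across lines 1 and 2; the second entry (line 3) is intact.
sample = 'T1\td1\tp1\tabc\nxyz\nT2\td2\tp2\tdata2\n'
print(find_suspect_lines(sample))  # [2]
```

Note that line 1 still looks valid here, which is why a real newline inside
the data cannot, in general, be undone after the fact.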


