[Tutor] Parsing text file with Python

Alan Gauld alan.gauld at btinternet.com
Sat Mar 24 01:48:36 CET 2007


"Jay Mutter III" <jmutter at uakron.edu> wrote
> 1.)  Are there better ways to write this?

There are always other ways; which is better depends
on your criteria for judging. Your way works.

> 2.) As it writes out the one group to the new file for companies it
> is as if it leaves blank lines behind for if I don't have the
> elif len(line) > 1 the inventor's file has blank lines in it.

I'm not sure what you mean here. Can you elaborate,
maybe with some sample data?

> 3.) I reopened the inventor's file to get a count of lines but is
> there a better way to do this?

You could track the number of items being written as you go.
The only disadvantage of your technique is the time involved
in reopening the file, rereading the data and then counting it.
On a really big file that could take a long time. But it has
the big advantage of simplicity.
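
Tracking the counts as you go might look something like the sketch
below. It assumes the rule the script appears to use (lines ending in
')' go to the companies file, other non-blank lines go to the
inventors file). The function name split_lines and the in-memory
files are invented for illustration, and the code is written for a
current Python rather than the Python 2 of the quoted script.

```python
import io

def split_lines(in_file, companies, inventors):
    """Write each line to one of two files and count what was written."""
    company_count = 0
    inventor_count = 0
    for line in in_file:
        if line.endswith(')\n') or line.endswith(') \n'):
            companies.write(line)
            company_count += 1
        elif len(line) > 1:          # skip lines holding only a newline
            inventors.write(line)
            inventor_count += 1
    return company_count, inventor_count

# In-memory stand-ins for the real files, purely for demonstration.
source = io.StringIO(u"Acme Corp (US)\n\nSmith, John\nWidgets Ltd (UK) \n")
companies = io.StringIO()
inventors = io.StringIO()
counts = split_lines(source, companies, inventors)
print(counts)
```

The counts come back from the same pass that writes the data, so the
inventors file never has to be reopened.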

A couple of points:

> in_filename = raw_input('What is the COMPLETE name of the file you would like to process?    ')
> in_file = open(in_filename, 'rU')

You might want to put your file opening code inside a try/except
in case the file isn't there or is locked.
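
A minimal sketch of that idea follows. The helper name open_input is
hypothetical, and returning None on failure is just one possible
policy; you might prefer to ask for the name again or exit.

```python
def open_input(filename):
    """Return an open file object, or None if the file cannot be opened."""
    try:
        return open(filename, 'r')
    except IOError as err:       # missing file, no permission, locked, ...
        print("Could not open %r: %s" % (filename, err))
        return None
```

The caller then checks the result before going on, for example
"in_file = open_input(in_filename)" followed by "if in_file is None:".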

> text = in_file.readlines()
> count = len(text)
> print "There are ", count, 'lines to process in this file'

Unless this is really useful info, you could simplify by
omitting the readlines and count and just iterating over
the file. If you use enumerate you even get the final
count for free at the end: count starts at zero, so the
total is count + 1 once the loop finishes.

for count, line in enumerate(in_file):
     pass   # count is the zero-based line number, line the data
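
As a self-contained illustration of getting the count from enumerate,
here is a small sketch. The function name count_lines is made up, and
it passes a start value of 1 to enumerate (available in Python 2.6
and later) so that the last value seen is already the total.

```python
import io

def count_lines(in_file):
    """Iterate over a file once and return how many lines it held."""
    total = 0                    # stays 0 if the file is empty
    for total, line in enumerate(in_file, 1):
        pass                     # real per-line processing would go here
    return total

sample = io.StringIO(u"first\nsecond\nthird\n")
line_total = count_lines(sample)
print(line_total)
```

Setting total before the loop matters: with an empty file the loop
body never runs and the name would otherwise be undefined.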

> for line in text:
>     if line.endswith(')\n'):
>         companies.write(line)
>     elif line.endswith(') \n'):
>         companies.write(line)

You could use a boolean or to combine these:

     if line.endswith(')\n') or line.endswith(') \n'):
         companies.write(line)
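
Another way to write the same test, offered only as a sketch:
endswith also accepts a tuple of suffixes (Python 2.5 and later).
The function names here are invented for the example. The second
variant strips trailing whitespace first, which is a slightly looser
test than the original because it also accepts tabs or several
trailing spaces.

```python
def is_company(line):
    """Same test as the two endswith calls joined with 'or'."""
    return line.endswith((')\n', ') \n'))

def is_company_loose(line):
    """Looser variant: ignore any trailing whitespace before the check."""
    return line.rstrip().endswith(')')
```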

> in_filename2 = raw_input('What was the name of the inventor\'s file ?    ')

Given that you opened it, surely you already know the name?
It should be stored in patentdata, so you don't need
to ask again.

Also, you could use flush() and then seek(0) and then readlines()
before closing the file to get the count, provided the file was
opened in a read/write mode such as 'w+'. But frankly that's being
picky.
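
For completeness, a rough sketch of the flush/seek approach. The
function name write_and_count and the temporary file path are
assumptions made for the demonstration; the key point is that the
output file is opened with 'w+' so it can be read back.

```python
import os
import tempfile

def write_and_count(path, lines):
    """Write lines to path, then reread the same open file to count them."""
    out = open(path, 'w+')       # 'w+' allows reading as well as writing
    try:
        for line in lines:
            out.write(line)
        out.flush()              # push buffered data out to the file
        out.seek(0)              # rewind to the start before reading
        return len(out.readlines())
    finally:
        out.close()

# Throwaway file in a temporary directory, purely for demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'inventors.txt')
demo_count = write_and_count(demo_path, ['Smith, John\n', 'Jones, Mary\n'])
print(demo_count)
```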


> in_file2 = open(in_filename2, 'rU')
> text2 = in_file2.readlines()
> count = len(text2)

Well done,


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld 



