use fileinput to read a specific line

Fredrik Lundh fredrik at pythonware.com
Tue Jan 8 05:59:49 EST 2008


jo3c wrote:

> hi everybody
> im a newbie in python
> i need to read line 4 from a header file
> using linecache will crash my computer due to memory loading, because
> i am working on 2000 files each is 8mb
> 
> fileinput don't load the file into memory first
> how do i use fileinput module to read a specific line from a file?
> 
> for line in fileinput.Fileinput('sample.txt')
> ????

I could have sworn that I posted working code (including an explanation 
why linecache wouldn't work) the last time you asked about this...  yes, 
here it is again:

 > i have a 2000 files with header and data
 > i need to get the date information from the header
 > then insert it into my database
 > i am doing it in batch so i use glob.glob('/mydata/*/*/*.txt')
 > to get the date on line 4 in the txt file i use
 > linecache.getline('/mydata/myfile.txt', 4)
 >
 > but if i use
 > linecache.getline('glob.glob('/mydata/*/*/*.txt', 4) won't work

glob.glob returns a list of filenames, so you need to call getline once 
for each file in the list.

but linecache is absolutely the wrong tool for this; it's designed 
for *repeated* access to arbitrary lines in a file, so it keeps all the
data in memory.  that is, all the lines, for all 2000 files.
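if you want to stay with linecache anyway, here's a rough sketch of the
once-per-file call.  the helper name header_lines and its parameters are
made up for illustration; linecache.clearcache() is called after each
file so the cached lines don't pile up.  each file is still read in full,
so treat this as a stopgap rather than a recommendation:

```python
import glob
import linecache

def header_lines(pattern, lineno=4):
    """Yield (filename, line) pairs, one per file matching pattern.

    header_lines is a hypothetical helper, not part of linecache.
    """
    for filename in glob.glob(pattern):
        # getline returns '' if the file has fewer than lineno lines
        line = linecache.getline(filename, lineno)
        # drop the cached copy of this file before moving on
        linecache.clearcache()
        yield filename, line
```

usage would be something like
for filename, line in header_lines('/mydata/*/*/*.txt'): ...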

if the files are small, and you want to keep the code short, it's easier 
to just grab the file's content and use indexing on the resulting list:

     import glob

     for filename in glob.glob('/mydata/*/*/*.txt'):
         with open(filename) as f:
             line = list(f)[4-1]  # line 4 lives at index 3
         # ... do something with line ...

(note that line numbers usually start with 1, but Python's list indexing 
starts at 0).

if the files might be large, use something like this instead:

     import glob

     for filename in glob.glob('/mydata/*/*/*.txt'):
         with open(filename) as f:
             # skip first three lines
             f.readline(); f.readline(); f.readline()
             # grab the line we want
             line = f.readline()
         # ... do something with line ...
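
the same idea can be written for an arbitrary line number.  a minimal
sketch, assuming two helper names (read_line and read_line_fileinput)
that I'm making up here: the first uses itertools.islice to skip ahead
without keeping the earlier lines around, the second does the same thing
with the fileinput module, since that's what was asked about.  both stop
reading as soon as they have the line, and both return an empty string if
the file is too short:

```python
import fileinput
from itertools import islice

def read_line(filename, lineno):
    """Return line number lineno (1-based), or '' if the file is shorter.

    read_line is a hypothetical helper name.
    """
    with open(filename) as f:
        # islice consumes lines one at a time; nothing is kept around
        return next(islice(f, lineno - 1, lineno), '')

def read_line_fileinput(filename, lineno):
    """Same result via fileinput (hypothetical helper name)."""
    with fileinput.FileInput(files=(filename,)) as f:
        for line in f:
            # filelineno() is the number of the line just read
            if f.filelineno() == lineno:
                return line
    return ''
```

for a single file, plain open() is simpler; fileinput mostly pays off
when you want to treat several files as one stream.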

</F>



