linecache and glob

jo3c JO3chiang at gmail.com
Mon Jan 7 22:26:47 EST 2008


On Jan 4, 5:25 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> jo3c wrote:
> > i have 2000 files with header and data
> > i need to get the date information from the header
> > then insert it into my database
> > i am doing it in batch so i use glob.glob('/mydata/*/*/*.txt')
> > to get the date on line 4 in the txt file i use
> > linecache.getline('/mydata/myfile.txt', 4)
>
> > but if i use
> > linecache.getline(glob.glob('/mydata/*/*/*.txt'), 4) it won't work
>
> glob.glob returns a list of filenames, so you need to call getline once
> for each file in the list.
>
> but linecache is absolutely the wrong tool for this; it's designed
> for *repeated* access to arbitrary lines in a file, so it keeps all the
> data in memory.  that is, all the lines, for all 2000 files.
>
> if the files are small, and you want to keep the code short, it's easier
> to just grab the file's content and use indexing on the resulting list:
>
>      import glob
>
>      for filename in glob.glob('/mydata/*/*/*.txt'):
>          with open(filename) as f:
>              line = list(f)[4-1]
>          # ... do something with line ...
>
> (note that line numbers usually start with 1, but Python's list indexing
> starts at 0).
>
> if the files might be large, use something like this instead:
>
>      for filename in glob.glob('/mydata/*/*/*.txt'):
>          with open(filename) as f:
>              # skip first three lines
>              f.readline(); f.readline(); f.readline()
>              # grab the line we want
>              line = f.readline()
>          # ... do something with line ...
>
> </F>

thank you guys, i did hit a wall using linecache, because it loads every
large file into memory.. i think this last solution works well for me
thanks
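
a minimal sketch of that last approach, assuming Python 3 and that the date
sits on line 4 of every file. read_line and header_dates are hypothetical
helper names made up for illustration; they are not from the thread or the
standard library. itertools.islice is used here in place of the repeated
readline() calls, so the line number becomes a parameter:

```python
import glob
from itertools import islice


def read_line(filename, lineno):
    """Return line `lineno` (1-based) of `filename`, or None if the file is shorter."""
    with open(filename) as f:
        # islice consumes the first lineno-1 lines lazily and yields only
        # the requested one, so the rest of the file is never read.
        return next(islice(f, lineno - 1, lineno), None)


def header_dates(pattern, lineno=4):
    """Yield (filename, line) for each file matching `pattern` that has that line."""
    for filename in sorted(glob.glob(pattern)):
        line = read_line(filename, lineno)
        if line is not None:
            yield filename, line.rstrip("\n")
```

looping over header_dates('/mydata/*/*/*.txt') then gives one (filename,
line) pair per file, which can be parsed and inserted into the database.
files with fewer than 4 lines are skipped rather than raising an error;
whether that is the right behaviour depends on the data.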


