reading specific lines of a file

John Machin sjmachin at lexicon.net
Sun Jul 16 01:39:32 EDT 2006


On 16/07/2006 2:54 PM, Nick Vatamaniuc top-posted:
> Yi,
> Use the linecache module.

Yi, *don't* use the linecache module without carefully comparing the 
documentation and the implementation with your requirements.

You will find that you have the source code on your computer -- mine 
(Windows box) is at c:\Python24\Lib\linecache.py. When you read right 
down to the end (it's not a large file, only 108 lines), you'll find this:

     try:
         fp = open(fullname, 'rU')
         lines = fp.readlines()
         fp.close()
     except IOError, msg:
##      print '*** Cannot open', fullname, ':', msg
         return []
     size, mtime = stat.st_size, stat.st_mtime
     cache[filename] = size, mtime, lines, fullname

Looks like it's caching the *whole* of *each* file. Not unreasonable 
given it appears to have been written to get source lines to include in 
tracebacks.
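
A quick way to check this on your own install, as a sketch: fetch one line and then look at what the module kept. This assumes a current Python 3, where the entry in linecache.cache is still a (size, mtime, lines, fullname) tuple. The cache dict is an implementation detail rather than a documented API, and the temporary file is only a stand-in for real data.

```python
import linecache
import os
import tempfile

# Build a small stand-in file; a genuinely huge file is cached the same way.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(1, 1001))
    path = tmp.name

wanted = linecache.getline(path, 4)   # ask for a single line (1-based)

# Implementation detail: the cache entry holds every line of the file.
size, mtime, lines, fullname = linecache.cache[path]
cached_line_count = len(lines)

print(repr(wanted), cached_line_count)

linecache.clearcache()                # drop the cached copy again
os.remove(path)
```

One line was requested, but the whole file ends up in the cache.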

It might just not be what you want if as you say you have "a huge txt 
file". How many megabytes is "huge"?
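
If it does turn out to be too big to cache, one alternative is to walk the file once and keep only the lines you asked for. A minimal sketch: read_lines is a name made up for this post, and line numbers are 1-based to match linecache.getline.

```python
def read_lines(path, wanted):
    """Return {line_number: text} for the 1-based line numbers in `wanted`."""
    remaining = set(wanted)
    found = {}
    if not remaining:
        return found
    last = max(remaining)
    with open(path) as fp:
        # Iterating over the file object reads one line at a time,
        # so memory use does not grow with the size of the file.
        for lineno, line in enumerate(fp, 1):
            if lineno in remaining:
                found[lineno] = line
            if lineno >= last:
                break                 # nothing wanted beyond this point
    return found
```

Line numbers past the end of the file are simply absent from the result.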

Cheers,
John

> The documentation states that:
> """
> The linecache module allows one to get any line from any file, while
> attempting to optimize internally, using a cache, the common case where
> many lines are read from a single file.
> >>> import linecache
> >>> linecache.getline('/etc/passwd', 4)
> 'sys:x:3:3:sys:/dev:/bin/sh\012'
> """
> 
> Please note that you cannot really skip over the lines unless each has
> a fixed known size. (and if all lines have a fixed, known size then
> they can be considered as 'records' and you can use seek() and other
> random access magic. That is why sometimes it is a lot faster to use
> fixed length rows in a database => increase the speed of search but at
> the expense of wasted space! - but this is another topic for another
> discussion...).
> 
> So the point is that you won't be able to jump to line 15000 without
> reading lines 0-14999. You can either iterate over the rows by yourself
> or simply use the 'linecache' module like shown above. If I were you I
> would use the linecache, but of course you don't mention anything about
> the context of your project so it is hard to say.
> 
> Hope this helps,
> Nick Vatamaniuc
> 
> 
> Yi Xing wrote:
>> Hi All,
>>
>> I want to read specific lines of a huge txt file (I know the line #).
>> Each line might have different sizes. Is there a convenient and fast
>> way of doing this in Python? Thanks.
>>
>> Yi Xing
> 
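
On the seek() point: variable-length lines can still be reached with seek() once their byte offsets are known, at the cost of one full pass to record them. A sketch under that assumption; build_line_index and get_line are hypothetical helpers, not anything in the standard library. It keeps one integer per line rather than the line text, and the index goes stale if the file changes.

```python
def build_line_index(path):
    """Scan the file once and return the byte offset at which each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as fp:      # binary mode so offsets are exact byte counts
        for line in fp:
            offsets.append(position)
            position += len(line)
    return offsets


def get_line(path, offsets, lineno):
    """Return line `lineno` (1-based) as bytes by seeking straight to its start."""
    with open(path, "rb") as fp:
        fp.seek(offsets[lineno - 1])
        return fp.readline()
```

With fixed-length records no index is needed, since the offset of record n is just n times the record size. That is the case Nick describes.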


