what happens when the file being read is too big for all lines to be read with "readlines()"

Xiao Jianfeng fdu.xiaojf at gmail.com
Sun Nov 20 04:47:27 EST 2005


Steve Holden wrote:

>Xiao Jianfeng wrote:
>
>>Steven D'Aprano wrote:
>>
>>>On Sun, 20 Nov 2005 11:05:53 +0800, Xiao Jianfeng wrote:
>>>
>>>>I have some other questions:
>>>>
>>>>When will "fh" be closed?
>>>>
>>>When all references to the file are no longer in scope:
>>>
>>>def handle_file(name):
>>>  fp = file(name, "r")
>>>  # reference to file now in scope
>>>  do_stuff(fp)
>>>  return fp
>>>
>>>f = handle_file("myfile.txt")
>>># reference to file is now in scope
>>>f = None
>>># reference to file is no longer in scope
>>>
>>>At this point, Python *may* close the file. CPython currently closes the
>>>file as soon as all references are out of scope. JPython does not -- it
>>>will close the file eventually, but you can't guarantee when.
>>>
>>>>And what should I do if I want to explicitly close the file
>>>>immediately after reading all the data I want?
>>>>
>>>That is the best practice.
>>>
>>>f.close()
>>>
>> Let me introduce the problem I came across last night first.
>>
>> I need to read a file (which may be small or very big) and check it
>> line by line to find a specific token; the data on the next line is
>> what I want.
>>
>> If I use readlines(), it will be a problem when the file is too big.
>>
>> If I use "for line in OPENED_FILE:" to read one line each time, how
>> can I get the next line when I find the specific token? And I think
>> reading one line each time is less efficient, am I right?
>>
>Not necessarily. Try this:
>
>     import sys
>
>     f = file("filename.txt")
>     for line in f:
>         if token in line: # or whatever you need to identify it
>             break
>     else:
>         sys.exit("File does not contain token")
>     line = f.next()
>
>Then line will be the one you want. Since this will use code written in
>C to do the processing, you will probably be pleasantly surprised by
>its speed. Only if this isn't fast enough should you consider anything
>more complicated.
>
>Premature optimization can waste huge amounts of programming time.
>Don't do it. First try measuring a solution that works!
>
  Oh yes, thanks.

>regards
>  Steve
>
  First, I must say thanks to all of you. And I'm really sorry that I
  didn't describe my problem clearly.

  There are many tokens in the file; every time I find a token, I have
  to get the data on the next line and do some operation with it. It
  would be easy to find just one token using the above method, but
  there is more than one.

  My method was:

  f_in = open('input_file', 'r')
  data_all = f_in.readlines()
  f_in.close()

  for i in range(len(data_all)):
      line = data_all[i]
      if token in line:
          # do something with data_all[i + 1]
          pass

  Since my method needs to read the whole file into memory, I think it
  may not be efficient when processing very big files.
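
  Based on Steve's suggestion, I guess something like the following
  rough sketch would handle more than one token without reading the
  whole file into memory ('input_file', the token string, and the
  process() function below are just placeholders, not code from
  anyone's actual program):

  token = 'TOKEN'                        # placeholder token string

  def process(data):
      # placeholder: do some operation with the line after a token
      print data.strip()

  f = open('input_file', 'r')
  try:
      for line in f:
          if token in line:
              try:
                  next_line = f.next()   # the line right after the token
              except StopIteration:
                  break                  # token was on the last line
              process(next_line)
  finally:
      f.close()                          # close even if process() raises

  This way only one line (plus a small read-ahead buffer) is held in
  memory at a time, no matter how big the file is.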

  I really appreciate all suggestions! Thanks again.

  Regards,

  xiaojf
