[BangPypers] How to handle files efficiently in python

Vishal vsapre80 at gmail.com
Thu Mar 24 06:33:10 CET 2011


On Thu, Mar 24, 2011 at 7:56 AM, Senthil Kumaran <orsenthil at gmail.com> wrote:

> On Thu, Mar 24, 2011 at 02:25:04AM +0530, Vishal wrote:
> > if you could read the entire file in one go... (i.e. unless your file is
> > more than 50MB)... how about the following?
>
> >>> for line in reversed((open('filename').readlines()[-1:-n:-1])):
> ...     print line
>
> Some comments:
>
> > # n is the number of lines you want to read.
> > l = open(filename).read().rsplit('\n', n+1)
>
> - readlines would be better.
>
> > # following is to keep the memory requirement low.
> > # but this is optional, if you only want to print the lines, and then
> > # end the python process.
> > l[0] = None
>
> - Could not get why you are setting the first item to None.
>
> > gc.collect()
>
> This does not free anything. Where is something un-referenced for it
> to garbage collect?
>
> --
> Senthil

Setting l[0] to None un-references the large leading chunk of string data
that rsplit leaves in that slot, and the collect() call then forces a
collection pass so the memory is reclaimed right away.
I have tried this multiple times on my Windows box and it works reliably.
In fact, I have found it to be the only way to keep memory consumption low
when I have to read data columns from files that are hundreds of megabytes
in size.
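
Here is roughly what I do, put together as a runnable sketch (the filename
and the value of n below are just placeholders for my actual files):

    import gc

    n = 10                      # number of trailing lines wanted (placeholder)
    filename = 'data.txt'       # placeholder path

    # rsplit from the right: parts[1:] holds the last n+1 pieces and
    # parts[0] holds everything before them, which is by far the largest
    # chunk of the file.
    parts = open(filename).read().rsplit('\n', n + 1)

    # Drop the reference to the big leading chunk, then force a collection
    # pass so the memory is given back straight away.
    parts[0] = None
    gc.collect()

    # The final element is an empty string when the file ends with a newline.
    for line in parts[1:]:
        print(line)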

Would love to know of other deterministic ways of freeing memory in Python.
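
One way I can think of to sidestep the problem entirely is to never hold the
whole file in memory in the first place, for example with collections.deque
(just a sketch, using the same placeholder filename and n as above):

    import collections

    n = 10                      # number of trailing lines wanted (placeholder)
    filename = 'data.txt'       # placeholder path

    # Iterating the file object yields one line at a time; the bounded deque
    # keeps only the most recent n lines, so memory stays small no matter
    # how large the file is.
    with open(filename) as f:
        last_lines = collections.deque(f, maxlen=n)

    for line in last_lines:
        print(line.rstrip('\n'))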

-- 
Thanks and best regards,
Vishal Sapre

