Python garbage collector/memory manager behaving strangely

Thomas Rachel nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915 at spamschutz.glglgl.de
Thu Nov 15 06:20:32 EST 2012


On 17.09.2012 04:28, Jadhav, Alok wrote:
> Thanks, Dave, for the clear explanation. I now understand what is going
> on. I still need some suggestions from you on this.
>
> There are 2 reasons why I was using self.rawfile.read().split('|\n')
> instead of self.rawfile.readlines()
>
> - As you have seen, the line separator is not '\n' but '|\n'.
> Sometimes the data itself has '\n' characters in the middle of the line,
> and the only way to find the true end of the line is that the previous
> character should be a bar '|'. I was not able to specify the end of line
> using the readlines() function, but I could do it using split().
> (One hack would be to call readlines() and combine lines until I find
> '|\n'. Is there a cleaner way to do this?)
> - Reading the whole file at once and processing it line by line was much
> faster. Speed is not a very important issue here, but I think the
> time it took to parse the complete file was reduced to one third of the
> original time.

With

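def itersep(f, sep='\0', buffering=1024, keepsep=True):
    # Yield sep-delimited records from f, reading `buffering` units at a time.
    if keepsep:
        keepsep = sep
    else:
        keepsep = ''
    data = f.read(buffering)
    next_line = data  # empty? -> end.
    while next_line:  # -> data is empty as well.
        lines = data.split(sep)
        # Everything except the last piece is a complete record.
        for line in lines[:-1]:
            yield line + keepsep
        next_line = f.read(buffering)
        # The last piece may be incomplete (or end inside a separator),
        # so carry it over and split it again with the next chunk.
        data = lines[-1] + next_line
    # keepsep: yield the remainder only if we have something.
    if (not keepsep) or data:
        yield data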

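you can iterate over records with any separator (here '|\n') without
reading the whole file into memory. Only one chunk plus the unfinished
tail of the previous one is held at a time. Using a larger "buffering"
might improve speed a little bit.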
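
As a rough illustration of how this might be used for the '|\n' case, here is a small self-contained sketch. The generator is repeated (with a renamed local variable) so the snippet runs on its own; the sample data and the tiny buffering=4 are made up for the demonstration, the latter chosen only to force a separator to straddle two reads.

```python
import io


def itersep(f, sep='\0', buffering=1024, keepsep=True):
    # Same logic as the generator above, repeated so this runs standalone.
    suffix = sep if keepsep else ''
    data = f.read(buffering)
    chunk = data
    while chunk:
        parts = data.split(sep)
        for part in parts[:-1]:
            yield part + suffix
        chunk = f.read(buffering)
        # Carry the unfinished tail over to the next chunk.
        data = parts[-1] + chunk
    if (not suffix) or data:
        yield data


# Hypothetical sample: three records, the second containing a bare '\n'.
sample = io.StringIO("a|b|\nc\nd|e|\nf|g|\n")

# buffering=4 splits the first '|\n' across two reads ("a|b|" + "\nc\nd").
records = list(itersep(sample, sep='|\n', buffering=4))
print(records)
# -> ['a|b|\n', 'c\nd|e|\n', 'f|g|\n']
```

Note that with keepsep=False the generator also yields a final empty string when the input ends with the separator, just as str.split() would, so callers may want to skip empty records in that mode.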


Thomas


