Fast forward-backward (write-read)

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Oct 23 18:53:42 EDT 2012


On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:

> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs at it.uu.se> wrote:
>> I am working with some rather large data files (>100GB) 
[...]
>> Finally, to my question --- What is a fast way to write these variables
>> to an external file and then read them in backwards?
> 
> Don't forget to use timeit for an average OS utilization.

Given that the data files are larger than 100 gigabytes, the time 
required to process each file is likely to be in hours, not microseconds. 
That being the case, timeit is the wrong tool for the job: it is 
optimized for timing tiny code snippets. You could use it, of course, 
but the added inconvenience doesn't gain you any added accuracy.

Here's a neat context manager that makes timing long-running code simple:


http://code.activestate.com/recipes/577896
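
If you'd rather not follow the link, the general idea can be sketched in
a few lines. This is not the code from the recipe above, just a minimal
illustration of the technique; the name Stopwatch and its attributes are
made up for this example, and it assumes Python 3.3 or later for
time.perf_counter.

```python
import time

class Stopwatch:
    """Time a block of code once and report wall-clock seconds.

    Hypothetical helper for illustration only; not the linked recipe.
    """

    def __init__(self, label="elapsed"):
        self.label = label
        self.elapsed = None  # filled in when the with-block exits

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self._start
        print("%s: %.3f seconds" % (self.label, self.elapsed))
        return False  # never swallow exceptions from the timed block


# Usage: wrap the long-running step in a with-statement.
with Stopwatch("summing") as sw:
    total = sum(range(1000000))
```

Unlike timeit, this runs the block exactly once and leaves the garbage
collector alone, which is what you want for a job measured in hours.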



> I'd suggest two list comprehensions for now, until I've reviewed it some
> more:

I would be very surprised if the poster were able to fit 100 gigabytes 
of data into even a single list comprehension, let alone two.

This is a classic example of why the old external processing algorithms 
of the 1960s and 70s will never be obsolete. No matter how much memory 
you have, there will always be times when you want to process more data 
than you can fit into memory.
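
As a rough illustration of the external approach applied to the original
question, here is a minimal sketch that writes records to a binary file
and then reads them back last-to-first, one block at a time, so that
only a single block is ever held in memory. It assumes every record has
the same fixed size (three doubles here, purely as an example); the
names RECORD, write_records and read_records_backwards are invented for
this sketch, and the poster's actual data layout is unknown.

```python
import os
import struct

# Assumed layout: each record is three little-endian doubles (24 bytes).
RECORD = struct.Struct("<3d")

def write_records(path, records):
    """Append-free write: pack each record and write it in order."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))

def read_records_backwards(path, chunk_records=4096):
    """Yield records from last to first, reading one block at a time."""
    size = RECORD.size
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        remaining = f.tell() // size  # whole records still unread
        while remaining > 0:
            n = min(chunk_records, remaining)
            remaining -= n
            f.seek(remaining * size)  # start of the block to read
            block = f.read(n * size)
            # Walk the block from its last record to its first.
            for i in range(n - 1, -1, -1):
                yield RECORD.unpack_from(block, i * size)
```

Fixed-size records are what make this cheap: the offset of any record is
just its index times the record size, so seeking backwards needs no
index file. Variable-length records would need a different scheme.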



-- 
Steven


