Fast forward-backward (write-read)
Tim Chase
python.list at tim.thechases.com
Tue Oct 23 12:53:37 EDT 2012
On 10/23/12 11:17, Paul Rubin wrote:
> Virgil Stokes <vs at it.uu.se> writes:
>> Finally, to my question --- What is a fast way to write these
>> variables to an external file and then read them in backwards?
>
> Seeking backwards in files works, but the performance hit is
> significant. There is also a performance hit to scanning pointers
> backwards in memory, due to cache misprediction. If it's something
> you're just running a few times, seeking backwards is the simplest
> approach. If you're really trying to optimize the thing, you might
> buffer up large chunks (like 1 MB) before writing. If you're writing
> once and reading multiple times, you might reverse the order of records
> within the chunks during the writing phase.
I agree with Paul here. It's been a while since I did this, and my
dataset was small enough (and only passed through once) that I just
let it run. Writing larger chunks is definitely a good way to go.
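
For what it's worth, here is a rough sketch of the chunked approach,
not something I've benchmarked. It assumes fixed-size records; the
three-double layout, the function names and the 4096-record chunk
size are all made up for illustration. The file is written forwards
in the usual way, then read back one large chunk per seek, with the
records inside each chunk walked from last to first.

```python
import os
import struct

# Hypothetical record layout: three little-endian float64 values.
RECORD = struct.Struct("<3d")


def write_records(path, records):
    """Write fixed-size binary records in forward order."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_records_reversed(path, records_per_chunk=4096):
    """Yield records last-to-first, doing one seek and read per chunk."""
    chunk_bytes = RECORD.size * records_per_chunk
    with open(path, "rb") as f:
        end = f.seek(0, os.SEEK_END)  # file size in bytes
        while end > 0:
            start = max(0, end - chunk_bytes)
            f.seek(start)
            chunk = f.read(end - start)
            # Walk this chunk's records from the last one to the first.
            for offset in range(len(chunk) - RECORD.size, -1, -RECORD.size):
                yield RECORD.unpack_from(chunk, offset)
            end = start
```

Something like "for a, b, c in read_records_reversed('state.bin'):"
then sees the most recently written record first. If the records
aren't fixed-size, you'd need an index or length markers instead.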
> You're of course taking a performance bath from writing the program in
> Python to begin with (unless using scipy/numpy or the like), enough that
> it might dominate any effects of how the files are written.
I find that the I/O almost always overwhelms the actual
processing.
> Of course (it should go without saying) you want to dump in a
> binary format rather than converting to decimal.
Again, the conversion to/from decimal hasn't been a great cost in my
experience, as it's overwhelmed by the I/O cost of shoveling the
data to/from disk.
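
If you'd rather measure than take either of our words for it, the
two approaches are easy to put side by side. This is only a sketch:
the helper names are mine, and it assumes the data is a flat list of
floats. One pair dumps and loads packed doubles, the other uses one
repr() per line of text.

```python
import struct


def dump_binary(path, values):
    """Write the floats as packed little-endian float64 values."""
    with open(path, "wb") as f:
        f.write(struct.pack("<%dd" % len(values), *values))


def load_binary(path):
    """Read back every float64 stored in the file."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack("<%dd" % (len(data) // 8), data))


def dump_text(path, values):
    """Write one decimal repr() per line."""
    with open(path, "w") as f:
        for v in values:
            f.write(repr(v) + "\n")


def load_text(path):
    """Parse one float per line."""
    with open(path) as f:
        return [float(line) for line in f]
```

Wrapping each pair in timeit on a realistically sized dataset should
show whether the conversion matters for your workload.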
-tkc