Real-world Python code 700 times slower than C
hungjunglu
hungjunglu at yahoo.com
Tue Jan 8 17:12:18 EST 2002
--- In python-list at y..., Chris Barker <chrishbarker at a...> wrote:
> I have found floats and doubles take about the same
> amount of time to compute stuff.
Similar experience here. I have not found any substantial difference
between floats and doubles in isolated tests. Floats were faster, but
only by a tiny bit.
> Anyone know why this is? I can see that twice as much memory has to be
> allocated, de-allocated, and passed around, but that wouldn't account
> for a 4X slow down. Can anyone offer an explanation?
I am not sure, but I can tell you the story on my side. During the
profiling stage of my current project, I have seen that the memory
access time of arrays is often comparable to that of floating-point
operations. That is, just accessing a subindexed variable takes about
as long as doing a multiplication. (I guess I am from a generation
where floating-point operations were slower.) By relocating the memory
storage structure, the time spent in a routine can change by a factor
of 5 or so. I am not exactly sure why, but I guess Visual C++ may
allocate the memory in some tricky way. I am still trying to figure
out the whole thing. I can think of memory page-swapping as one
possible explanation. I really wish I had more control over memory
location and access time. I am not sure whether all this has to do
with far and near memory addresses, but it certainly seems that way. I
am too disconnected from modern chipset architecture now.
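One way to see the "access pattern matters more than arithmetic" effect I describe above is to sum the same buffer twice, once sequentially and once with a large stride. This is only an illustrative sketch (the array size, the stride of 4096, and the helper names are mine, not from any real project); on most machines the strided sweep is several times slower even though the arithmetic is identical:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Sum every element of `a`, visiting it in `stride` interleaved passes.
// With stride == 1 this is a plain sequential sweep; with a large stride
// each access lands far from the previous one, so the same arithmetic
// runs slower purely because of the memory access pattern.
double sum_stride(const std::vector<double>& a, std::size_t stride) {
    double s = 0.0;
    for (std::size_t j = 0; j < stride; ++j)
        for (std::size_t i = j; i < a.size(); i += stride)
            s += a[i];
    return s;
}

// Wall-clock microseconds for one full sweep of `a` at the given stride.
long long sweep_us(const std::vector<double>& a, std::size_t stride) {
    auto t0 = std::chrono::steady_clock::now();
    volatile double s = sum_stride(a, stride);  // volatile: keep the work
    (void)s;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0)
        .count();
}
```

Comparing `sweep_us(a, 1)` against `sweep_us(a, 4096)` on a buffer of a few million doubles shows the gap; the absolute numbers vary by machine.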
> about memory usage. Now I guess I have to take speed into account as
> well.
Yeah... I wish I knew how to control memory better, because it does
seem that depending on how you store arrays, the access time can vary
greatly. I haven't been able to find any correlation with data
structures (i.e.: whether it depends on classes, static/non-static,
whether it's better to store things in arrays of arrays or simply in
matrices, whether it helps to manage my own memory store instead of
using 'new'/'delete', or whether to use local variables, etc.). I just
know that once an array is stored in the 'far' memory, it takes time
to access it... I tried to use memcpy() and Intel's BLAS library to
copy things into local variables, hoping that copying chunks of data
(at "burst rate", if this makes sense at all) would help with speed,
but it did not: once the memory is far, it's slow, and there is no
"burst rate" gain from bringing in chunks of data instead of
individual doubles. I can only say that it has been a frustrating
exercise trying to establish correlations between data structure and
access time. I hope someone else has better insight.
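For the "arrays of arrays versus simply matrices" question above, the main layout difference can be sketched as follows (class names are mine, for illustration only). A flat row-major matrix keeps all rows in one contiguous block, so a full traversal walks memory sequentially; with separately allocated rows, each row can land anywhere the allocator puts it, which is one plausible source of the "far memory" effect:

```cpp
#include <cstddef>
#include <vector>

// A "simply matrices" layout: one contiguous row-major block.
struct FlatMatrix {
    std::size_t rows, cols;
    std::vector<double> data;  // single allocation, rows adjacent
    FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// An "arrays of arrays" layout: each row is its own allocation,
// so consecutive rows need not be anywhere near each other in memory.
struct RowMatrix {
    std::vector<std::vector<double>> rows_;
    RowMatrix(std::size_t r, std::size_t c)
        : rows_(r, std::vector<double>(c)) {}
    double& at(std::size_t i, std::size_t j) { return rows_[i][j]; }
};

// Traversing the flat layout strides linearly through one block of memory.
double sum_flat(FlatMatrix& m) {
    double s = 0.0;
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j)
            s += m.at(i, j);
    return s;
}
```

Both layouts store the same values; only the placement in memory differs, which is exactly the kind of difference that shows up in access-time measurements rather than in the code itself.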
(Another fact: Intel's BLAS dot-product does perform faster than plain
C++ code. That is one reason why I often use C++ arrays instead of STL
template vectors.)
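For reference, the plain-C++ side of that comparison looks like the sketch below; a tuned BLAS routine (`cblas_ddot` in CBLAS-style interfaces) computes the same reduction with hand-optimized, unrolled code, which is why it can beat a naive loop. This is just an illustrative sketch, not the code I actually benchmarked:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Dot product over raw C++ arrays: the layout BLAS routines accept directly.
double dot_raw(const double* x, const double* y, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

// The same reduction over STL vectors via std::inner_product.
double dot_stl(const std::vector<double>& x, const std::vector<double>& y) {
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
```

Note that `std::vector` data is contiguous too, so `x.data()` can also be handed to BLAS; the raw-array habit mainly saves the conversion step.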
regards,
Hung Jung