Real-world Python code 700 times slower than C
hungjunglu
hungjunglu at yahoo.com
Tue Jan 8 17:12:18 EST 2002
--- In python-list at y..., Chris Barker <chrishbarker at a...> wrote:
> I have found floats and doubles take about the same
> amount of time to compute stuff.
Similar experience here. I have not found any substantial difference
between floats and doubles in isolated tests. Floats were faster, but
only by a tiny bit.
> Anyone know why this is? I can see that twice as much memory has to be
> allocated, de-allocated, and passed around, but that wouldn't account
> for a 4X slow down. Can anyone offer an explanation?
I am not sure, but I can tell you the story on my side. During the
profiling stage of my current project, I have seen that the memory
access time of arrays is often comparable to that of floating-point
operations. That is, just accessing a subindexed variable takes about
as long as doing a multiplication. (I guess I am from a generation
where floating-point operations were slower.) By relocating the memory
storage structure, the time spent in a routine can change by a factor
of 5 or so. I am not exactly sure why, but I guess Visual C++ may
allocate the memory in some tricky way. I am still trying to figure
out the whole thing. I can think of memory page-swapping as one
possible explanation. I really wish I had more control over memory
location and access time. I am not sure whether all this has to do
with far and near memory addresses, but it certainly seems that way. I
am too disconnected from modern chipset architecture now.
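One way to see the "access pattern matters more than arithmetic" effect I describe above is to sum the same buffer twice, once sequentially and once with a large stride. This is only an illustrative sketch (the array size, the stride of 4096, and the helper names are mine, not from any real project); on most machines the strided sweep is several times slower even though the arithmetic is identical:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Sum every element of `a`, visiting it in `stride` interleaved passes.
// With stride == 1 this is a plain sequential sweep; with a large stride
// each access lands far from the previous one, so the same arithmetic
// runs slower purely because of the memory access pattern.
double sum_stride(const std::vector<double>& a, std::size_t stride) {
    double s = 0.0;
    for (std::size_t j = 0; j < stride; ++j)
        for (std::size_t i = j; i < a.size(); i += stride)
            s += a[i];
    return s;
}

// Wall-clock microseconds for one full sweep of `a` at the given stride.
long long sweep_us(const std::vector<double>& a, std::size_t stride) {
    auto t0 = std::chrono::steady_clock::now();
    volatile double s = sum_stride(a, stride);  // volatile: keep the work
    (void)s;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0)
        .count();
}
```

Comparing `sweep_us(a, 1)` against `sweep_us(a, 4096)` on a buffer of a few million doubles shows the gap; the absolute numbers vary by machine.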
> about memory usage. Now I guess I have to take speed into account as
> well.
Yeah... I wish I knew how to control memory better, because it does
seem that depending on how you store arrays, the access time can vary
greatly. I haven't been able to find any correlation with data
structures (i.e.: whether it depends on classes, static/non-static,
whether it's better to store things in arrays of arrays or simply in
matrices, whether it helps to manage my own memory store instead of
using 'new'/'delete', or whether to use local variables, etc.). I just
know that once an array is stored in the 'far' memory, it takes time
to access it... I tried to use memcpy() and Intel's BLAS library to
copy things into local variables, hoping that copying chunks of data
(at "burst rate", if this makes sense at all) would help with speed,
but it did not: once the memory is far, it's slow, and there is no
"burst rate" gain from bringing in chunks of data instead of
individual doubles. I can only say that it has been a frustrating
exercise trying to establish correlations between data structure and
access time. I hope someone else has better insight.
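For the "arrays of arrays versus simply matrices" question above, the main layout difference can be sketched as follows (class names are mine, for illustration only). A flat row-major matrix keeps all rows in one contiguous block, so a full traversal walks memory sequentially; with separately allocated rows, each row can land anywhere the allocator puts it, which is one plausible source of the "far memory" effect:

```cpp
#include <cstddef>
#include <vector>

// A "simply matrices" layout: one contiguous row-major block.
struct FlatMatrix {
    std::size_t rows, cols;
    std::vector<double> data;  // single allocation, rows adjacent
    FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// An "arrays of arrays" layout: each row is its own allocation,
// so consecutive rows need not be anywhere near each other in memory.
struct RowMatrix {
    std::vector<std::vector<double>> rows_;
    RowMatrix(std::size_t r, std::size_t c)
        : rows_(r, std::vector<double>(c)) {}
    double& at(std::size_t i, std::size_t j) { return rows_[i][j]; }
};

// Traversing the flat layout strides linearly through one block of memory.
double sum_flat(FlatMatrix& m) {
    double s = 0.0;
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j)
            s += m.at(i, j);
    return s;
}
```

Both layouts store the same values; only the placement in memory differs, which is exactly the kind of difference that shows up in access-time measurements rather than in the code itself.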
(Another fact: Intel's BLAS dot-product does perform faster than plain
C++ code. That is one reason why I often use C++ arrays instead of STL
template vectors.)
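For reference, the plain-C++ side of that comparison looks like the sketch below; a tuned BLAS routine (`cblas_ddot` in CBLAS-style interfaces) computes the same reduction with hand-optimized, unrolled code, which is why it can beat a naive loop. This is just an illustrative sketch, not the code I actually benchmarked:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Dot product over raw C++ arrays: the layout BLAS routines accept directly.
double dot_raw(const double* x, const double* y, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

// The same reduction over STL vectors via std::inner_product.
double dot_stl(const std::vector<double>& x, const std::vector<double>& y) {
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
```

Note that `std::vector` data is contiguous too, so `x.data()` can also be handed to BLAS; the raw-array habit mainly saves the conversion step.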
regards,
Hung Jung