effects on extended modules

Pedro pedro_rodriguez at club-internet.fr
Fri Dec 28 05:19:56 EST 2001


"Curtis Jensen" <cjensen at bioeng.ucsd.edu> wrote:

> Pedro <pedro_rodriguez at club-internet.fr> wrote in message
> news:<pan.2001.12.06.12.45.41.172197.2456 at club-internet.fr>...
>> "Curtis Jensen" <cjensen at bioeng.ucsd.edu> wrote:
>> 
>> > Kragen Sitaker wrote:
>> >> 
>> >> Curtis Jensen <cjensen at bioeng.ucsd.edu> writes:
>> >> > We have created a Python interface to some core libraries of
>> >> > our own making.  We also have a C interface to these same
>> >> > libraries.  However, the Python interface seems to affect the
>> >> > speed of the extended libraries.  I.e., some library routines
>> >> > have their own benchmark code, and the time of execution from
>> >> > the start of the library routine to the end of the library
>> >> > routine (not including any Python code execution) is longer
>> >> > than in the C counterpart.
>> >> 
>> >> In the Python version, the code is in a Python extension module,
>> >> right?  A .so or .dll file?  Is it also in the C counterpart?
>> >> (If that's not it, can you provide more details on how you
>> >> compiled and linked the two?)
>> >> 
>> >> In general, referring to dynamically loaded things through symbols
>> >> --- even from within the same file --- tends to be slower than
>> >> referring to things that aren't dynamically loaded.
>> >> 
>> >> What architecture are you on?  If you're on x86, maybe Numeric
>> >> is being stupid and allocating things that aren't maximally
>> >> aligned.  But you'd probably notice a pretty drastic difference
>> >> in that case.
>> >> 
>> >> ... or maybe Numeric is being stupid and allocating things in a way
>> >> that causes cache-line contention.
>> >> 
>> >> Hope this helps.
>> > 
>> > Thanks for the response.  The C counterpart is directly linked
>> > together into one large binary (yes, the Python version is using
>> > a dynamically linked object file, a .so).  So that might be the
>> > source of the problem.  I can try to make a dynamically linked
>> > version of the C counterpart and see how that affects the speed.
>> > We are running on IRIX 6.5 machines (MIPS).
>> > Thanks.
>> > 
>> > 
>> Don't know if this helps, but I had a similar problem on Linux.
>> 
>> The context was: a Python script was calling an external program
>> and parsing its output (with popen) many times.  I decided to
>> optimize this by turning the external program into a dynamically
>> linked library with Python bindings.  I expected to save the system
>> calls needed to fork and start a new process, but it turned out
>> that this solution was slower.
>> 
>> The problem was caused by multithreading.  When using the library
>> straight from a C program, I didn't link with the multithreaded
>> libraries, so none of the C library calls were protected by locks
>> (they don't need to lock and unlock their resources).
>> 
>> Unfortunately, the library was reading files with fgetc (character
>> by character :( ).  Since the Python version I used was compiled
>> with multithreading enabled, fgetc ended up using its lock/unlock
>> machinery in this case, which caused the extra waste of time.
>> 
>> To find this, I compiled my library with profiling enabled (I think
>> I needed an extra call to activate profiling from within the
>> library, since I couldn't rebuild Python itself).
>> 
>> OT: in the end I fixed the library (fgetc replaced by fgets) and
>> gained nothing by turning the external program into a Python
>> extension.  Since the Linux disk cache seemed to be doing well, I
>> removed the Python extension, keeping a pure Python program, and
>> implemented a cache for the results of the external program.  This
>> was much simpler and more efficient in this case.
> 
> 
> Is this a problem with I/O only?  The code sections that we
> benchmarked have no I/O in them.
> 
> --
> Curtis Jensen

In my case, it was only I/O related.
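
To make that concrete, here is a minimal sketch of the two behaviours
(assuming a glibc-style stdio; flockfile and getc_unlocked are the
standard POSIX ways to take the stream lock once instead of once per
character):

    #include <stdio.h>

    /* What the library did: once the process is linked with the
       thread library, fgetc takes and releases the stream lock on
       every single character. */
    static long count_chars(FILE *fp)
    {
        long n = 0;
        while (fgetc(fp) != EOF)
            n++;
        return n;
    }

    /* Same result, but the lock is taken once for the whole loop;
       getc_unlocked skips the per-call locking. */
    static long count_chars_fast(FILE *fp)
    {
        long n = 0;
        flockfile(fp);
        while (getc_unlocked(fp) != EOF)
            n++;
        funlockfile(fp);
        return n;
    }

A plain C binary that never links the thread library gets the fast
behaviour from fgetc for free; the same code loaded into a
multithreaded Python pays the locking price on every character.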

If your problem, as I understand it, is:
+ I've got a function f() written in C,
+ f()'s execution does its own benchmark, telling how much time it
  took to complete,
+ calling f() from a C binary gives a (significantly) shorter duration
  than calling (the same) f() from a Python extension.
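
Concretely, such a self-timing f() might look like the following
sketch, where work() is only a stand-in for the real routine and
gettimeofday() provides the wall-clock interval:

    #include <sys/time.h>

    static void work(void)
    {
        /* stand-in for the real library routine being measured */
        volatile long i;
        for (i = 0; i < 10000000; i++)
            ;
    }

    /* f() times its own body, so the reported figure excludes any
       Python overhead and can be compared directly between the
       standalone C binary and the Python extension. */
    double f(void)
    {
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        work();
        gettimeofday(&t1, NULL);

        return (t1.tv_sec - t0.tv_sec)
             + (t1.tv_usec - t0.tv_usec) / 1e6;
    }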

If that is your situation, you may have to check what f() is doing,
because, as I was saying, it may be affected by the Python
environment:

- Are you making extensive calls to an external library?
  In my case, some glibc calls need to enforce reentrancy protection
  when running in a multithreaded context.  These protections wiped
  out any gain.

- If you're calling external libraries, are you linked against the
  same versions in both setups?  (Running ldd on the binaries and
  libraries may help.)

- More basically, did you compile both versions with the same options?
  Could the differences point to a possible source of your problem?
  (It may be worth checking optimization, debug, and conditional
  compilation options.)
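
For completeness, the Python side of such a comparison can stay this
thin (a sketch only, with hypothetical names, using the Python 2.x C
API), so that the interval measured inside f() is identical in both
setups:

    #include <Python.h>

    extern double f(void);  /* the self-timing routine sketched above */

    static PyObject *py_f(PyObject *self, PyObject *args)
    {
        if (!PyArg_ParseTuple(args, ""))  /* no arguments expected */
            return NULL;
        /* all timing happens inside f(); the wrapper only adds the
           argument parsing and result boxing you see here */
        return Py_BuildValue("d", f());
    }

    static PyMethodDef methods[] = {
        {"f", py_f, METH_VARARGS, "Call f() and return its own timing."},
        {NULL, NULL, 0, NULL}
    };

    void initmylib(void)  /* "mylib" is an illustrative module name */
    {
        Py_InitModule("mylib", methods);
    }

If the figure returned here is still slower than in the standalone
run, the difference must come from the environment (linking, locking,
compiler options) rather than from anything Python does per call.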

Regards,
-- 

Pedro




