[Python-Dev] Compiling Python on Linux with Intel's icc

Alex Leach albl500 at york.ac.uk
Thu Mar 1 19:39:19 CET 2012

Dear Python Devs,

I've been attempting to compile a fully functional version of Python 2.7 using 
Intel's C compiler, having built supposedly optimal versions of numpy and 
scipy, using Intel Composer XE and Intel's Math Kernel Library. I can build a 
working Python binary, but I'd really appreciate it if someone could check my 
compile options, and perhaps suggest ways I could further optimise the build.

*** COMPILE FAILURE - ffi64.c ***

I've managed to compile everything in the python distribution except for 
Modules/_ctypes/libffi/src/x86/ffi64.c. So to get the compilation to actually 
work, I've had to use the config option '--with-system-ffi'. If someone could 
suggest a patch for ffi64.c, I'd happily test it, as I've been unable to fix the 
code myself! The problem is with register_args, which uses GCC's __int128_t, 
but this doesn't exist when using icc.

The include guard to use could be:-

I've tried using this guard around the register_args struct, at the top of 
ffi64.c, and where I see register_args used, around lines 592-616, according to 
the suggestion at
http://software.intel.com/en-us/forums/showthread.php?t=56652, but have been 
unable to get a working solution... A patch would be appreciated!

*** Tests ***

After compilation, a few tests consistently fail, mainly those involving 
floating point precision: test_cmath, test_math and test_float.  
Also, I wrote a very short script to test the time of for loop execution and 
integer multiplication. This script (below) has nearly always completed faster 
using the default Ubuntu Python rather than my own build.

Obviously, I was hoping to get a faster python, but the size of the final 
binary is almost twice the size of the default Ubuntu version (5.2MB cf. 
2.7MB), which I thought might cause a startup overhead that leads to slower 
execution times when running such a basic script.

$ cat ~/bin/timetest.py

RANGE = 10000

print "running {0}^2 = {1} for loop iterations".format( RANGE,RANGE**2 )

for i in xrange(RANGE):
    for j in xrange(RANGE):
        i * j

*** TIMES ***

## ICC-compiled python ##
$ time ./python ~/bin/timetest.py
running 10000^2 = 100000000 for loop iterations

real    0m2.767s
user    0m2.720s
sys     0m0.008s

## System python ##
$ time python ~/bin/timetest.py
running 10000^2 = 100000000 for loop iterations

real    0m2.781s
user    0m2.776s
sys     0m0.000s

Oh... My python appears to run faster than gcc's now - checked this a few 
times now, mine's staying faster... :) I've compiled and re-compiled python 
dozens of times now, but it's still failing some tests...
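
The `time` figures above include interpreter start-up, so they cannot separate a start-up overhead from the cost of the loop itself. A minimal sketch of timing only the loop from inside the interpreter follows; the function name `time_nested_loop` and the smaller default range are illustrative assumptions, not part of the original script.

```python
import timeit


def time_nested_loop(n):
    """Return the seconds spent on an n*n nested loop of integer multiplications."""
    start = timeit.default_timer()
    for i in range(n):
        for j in range(n):
            i * j  # same body as timetest.py; the result is discarded
    return timeit.default_timer() - start


if __name__ == "__main__":
    n = 1000  # assumption: smaller than the 10000 used above, to keep the run short
    elapsed = time_nested_loop(n)
    print("{0}^2 = {1} iterations took {2:.3f}s".format(n, n ** 2, elapsed))
```

Running this under both binaries and comparing the printed figure with the `real` time from `time` would give a rough idea of how much of the difference is start-up rather than loop execution.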

*** Build Environment ***

Ubuntu 10.10 server kernel (`uname -r`=3.0.0-16-server) with KDE 4.7.4

$ tail ~/.bashrc

#### Custom Commands
export PATH=$PATH:/usr/local/cuda/bin:$HOME/bin
export PYTHONPATH=$HOME/bin:/usr/lib/pymodules/python2.7
export PYTHONSTARTUP=$HOME/.pystartup
# Load Intel compiler and library variables.
source /usr/intel/bin/compilervars.sh intel64
source /usr/intel/impi/4.0.3/bin/mpivars.sh intel64
source /usr/intel/tbb/bin/tbbvars.sh intel64

$ env | grep 'PATH\|FLAGS'

*** Download, configure and Build instructions ***
$ hg clone -r 2.7 http://hg.python.org/cpython
$ cd cpython
$ hg update -r 2.7

*** Generate Profile-Guided Optimisation stuff with first build ***
$ make distclean && mkdir PGO
$ CC=icc AR=xiar LD=xild CXX=icpc \
   CPPFLAGS+="-I/usr/include \
	-I/usr/include/x86_64-linux-gnu \
	-I/usr/src/linux-headers-3.0.0-16-server/include/" \
   CFLAGS+="-O3 \
	-fomit-frame-pointer \
	-shared-intel \
	-fpic \
	-prof-gen \
	-prof-dir $PWD/PGO \
	-fp-model precise \
	-fp-model source \
	-xHost" \
   ./configure --with-system-ffi --with-libc="-lirc" --with-libm="-limf" 
$ make -j9

*** Use the PGO-generated information in new build ***
$ make clean 
$ CC=icc AR=xiar LD=xild CXX=icpc \
   CPPFLAGS+="-I/usr/include \
	-I/usr/include/x86_64-linux-gnu \
	-I/usr/src/linux-headers-3.0.0-16-server/include/" \
   CFLAGS+="-O3 \
	-fomit-frame-pointer \
	-shared-intel \
	-fpic \
	-prof-use \
	-prof-dir $PWD/PGO \
	-fp-model precise \
	-fp-model source \
	-xHost \
	-ftz \
	-fomit-frame-pointer" \
   ./configure --with-system-ffi --with-libc="-lirc" --with-libm="-limf" 
$ make -j9
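
To confirm which options actually ended up in a finished build, the interpreter can report them itself. This is a minimal sketch using only the standard `platform` and `sysconfig` modules; the helper name `build_summary` and the choice of variables are assumptions for illustration, and a variable that is not recorded for a given build is simply reported as `None`.

```python
import platform
import sysconfig


def build_summary():
    """Collect the compiler string and selected build variables of this interpreter."""
    names = ("CC", "CFLAGS", "OPT", "LDFLAGS", "LIBS", "LIBM")
    summary = {"compiler": platform.python_compiler()}
    for name in names:
        # get_config_var returns None when the variable was not recorded
        summary[name] = sysconfig.get_config_var(name)
    return summary


if __name__ == "__main__":
    for key, value in sorted(build_summary().items()):
        print("{0:10s} {1}".format(key, value))
```

Running it with `./python` and with the system `python` gives two listings that can be compared side by side.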

$ make test
building dbm using gdbm

Python build finished, but the necessary bits to build these modules were not 
found:
_bsddb             bsddb185           dl              
imageop            sunaudiodev                        
To find the necessary bits, look in setup.py in detect_modules() for the 
module's name.

find ./Lib -name '*.py[co]' -print | xargs rm -f
./python -Wd -3 -E -tt  ./Lib/test/regrtest.py -l 
/usr/local/src/pysrc/cpython/Lib/unittest/util.py:2: ImportWarning: Not 
importing directory '/usr/local/src/pysrc/cpython/Lib/collections': missing 
__init__.py
  from collections import namedtuple, OrderedDict
== CPython 2.7.3rc1 (2.7:5c52e7c6d868+, Feb 29 2012, 22:10:22) [GCC Intel(R) 
C++ gcc 4.6 mode]
==   Linux-3.0.0-16-server-x86_64-with-debian-wheezy-sid little-endian
==   /usr/local/src/pysrc/cpython/build/test_python_16278
Testing with flags: sys.flags(debug=0, py3k_warning=1, division_warning=1, 
division_new=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, 
no_user_site=0, no_site=0, ignore_environment=1, tabcheck=2, verbose=0, 
unicode=0, bytes_warning=0, hash_randomization=0)


test test_cmath failed -- Traceback (most recent call last):
  File "/usr/local/src/pysrc/cpython/Lib/test/test_cmath.py", line 352, in 
  File "/usr/local/src/pysrc/cpython/Lib/test/test_cmath.py", line 94, in 
    'got {!r}'.format(a, b))
AssertionError: acos0000: acos(complex(0.0, 0.0))
Expected: complex(1.5707963267948966, -0.0)
Received: complex(1.5707963267948966, 0.0)
Received value insufficiently close to expected value.
test_curses skipped -- Use of the `curses' resource not enabled
test test_float failed -- Traceback (most recent call last):
  File "/usr/local/src/pysrc/cpython/Lib/test/test_float.py", line 1273, in 
    self.identical(fromHex('0x0.ffffffffffffd6p-1022'), MIN-3*TINY)
  File "/usr/local/src/pysrc/cpython/Lib/test/test_float.py", line 983, in 
    self.fail('%r not identical to %r' % (x, y))
AssertionError: 0.0 not identical to 2.2250738585072014e-308
test test_strtod failed -- multiple errors occurred; run in verbose mode for 
details

347 tests OK.
5 tests failed:
    test_cmath test_float test_gdb test_math test_strtod
1 test altered the execution environment:
37 tests skipped:
    test_aepack test_al test_applesingle test_bsddb test_bsddb185
    test_bsddb3 test_cd test_cl test_codecmaps_cn test_codecmaps_hk
    test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_curses
    test_dl test_gl test_imageop test_imgfile test_kqueue
    test_linuxaudiodev test_macos test_macostools test_msilib
    test_ossaudiodev test_scriptpackages test_smtpnet
    test_socketserver test_startfile test_sunaudiodev test_timeout
    test_tk test_ttk_guionly test_urllib2net test_urllibnet
    test_winreg test_winsound test_zipfile64
4 skips unexpected on linux2:
    test_bsddb test_bsddb3 test_tk test_ttk_guionly
make: *** [test] Error 1
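
The test_cmath failure above differs from the expected value only in the sign of a zero imaginary part, which `==` cannot detect because `-0.0 == 0.0`. A minimal sketch for inspecting that sign directly follows; the helper name `is_negative_zero` is hypothetical.

```python
import cmath
import math


def is_negative_zero(x):
    """Return True only for -0.0; copysign exposes the sign bit of a zero."""
    return x == 0.0 and math.copysign(1.0, x) < 0


if __name__ == "__main__":
    result = cmath.acos(complex(0.0, 0.0))
    # test_cmath (case acos0000) expects the imaginary part to be -0.0
    print("acos(0+0j) = {0!r}".format(result))
    print("imaginary part is -0.0: {0}".format(is_negative_zero(result.imag)))
```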

*** Drill down to test_strtod error ***

$ ./python
Python 2.7.3rc1 (2.7:5c52e7c6d868+, Feb 29 2012, 22:10:22) 
[GCC Intel(R) C++ gcc 4.6 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from test import test_strtod
>>> test_strtod.test_main()
test_bigcomp (test.test_strtod.StrtodTests) ... FAIL
test_boundaries (test.test_strtod.StrtodTests) ... FAIL
test_halfway_cases (test.test_strtod.StrtodTests) ... ok
test_parsing (test.test_strtod.StrtodTests) ... FAIL
test_particular (test.test_strtod.StrtodTests) ... FAIL
test_short_halfway_cases (test.test_strtod.StrtodTests) ... ok
test_underflow_boundary (test.test_strtod.StrtodTests) ... FAIL

FAIL: test_bigcomp (test.test_strtod.StrtodTests)
Traceback (most recent call last):
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 214, in 
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in 
    "expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for 81608e-328: 
expected 0x0.0000000000002p-1022, got 0x0.0p+0

FAIL: test_boundaries (test.test_strtod.StrtodTests)
Traceback (most recent call last):
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 191, in 
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in 
    "expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for 
22250738585072002149149e-330: expected 0x0.ffffffffffffep-1022, got 0x0.0p+0

FAIL: test_parsing (test.test_strtod.StrtodTests)
Traceback (most recent call last):
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 243, in 
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in 
    "expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for -6.E-310: 
expected -0x0.06e7344a56502p-1022, got -0x0.0p+0

FAIL: test_particular (test.test_strtod.StrtodTests)
Traceback (most recent call last):
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 393, in 
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in 
    "expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for 
12579816049008305546974391768996369464963024663104e-357: expected 
0x0.90bbd7412d19fp-1022, got 0x0.0p+0

FAIL: test_underflow_boundary (test.test_strtod.StrtodTests)
Traceback (most recent call last):
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 205, in 
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in 
    "expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for 
expected 0x0.0000000000001p-1022, got 0x0.0p+0

Ran 7 tests in 0.280s

FAILED (failures=5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 396, in 
  File "/usr/local/src/pysrc/cpython/Lib/test/test_support.py", line 1094, in 
  File "/usr/local/src/pysrc/cpython/Lib/test/test_support.py", line 1077, in 
    raise TestFailed(err)
test.test_support.TestFailed: multiple errors occurred
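
Every failure in this drill-down reports 0x0.0p+0 where a subnormal (denormal) value was expected, as does the test_float failure earlier. One thing that may be worth ruling out is the -ftz flag passed in the second build, which asks icc to flush denormal results to zero. A minimal sketch for checking whether a given binary keeps subnormal values follows; the function name `subnormals_preserved` is hypothetical and only the standard library is assumed.

```python
import sys


def subnormals_preserved():
    """Return True if subnormal doubles survive both parsing and arithmetic."""
    # Smallest positive subnormal double (2**-1074), parsed from hex
    tiny = float.fromhex("0x0.0000000000001p-1022")
    # Halving the smallest normal double must give a nonzero subnormal
    half_min = sys.float_info.min / 2
    return tiny != 0.0 and half_min != 0.0


if __name__ == "__main__":
    print("subnormals preserved: {0}".format(subnormals_preserved()))
```

If this reports False for a build and True for the system interpreter, the strtod and float failures are likely a floating-point mode issue rather than separate bugs.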

*** Binary size and linked libraries ***
## My Intel build ##
$ ls -lh ./python && ldd ./python
-rwxrwxr-x 1 user user 5.2M 2012-02-29 22:10 ./python
        linux-vdso.so.1 =>  (0x00007fffde1ec000)
        libirc.so => 
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe5f0ada000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 
        libimf.so => 
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe5f0287000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe5efcd1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fe5f107e000)
        libintlc.so.5 => 

## System build ##
$ ls -lhH /usr/bin/python && ldd /usr/bin/python
-rwxr-xr-x 1 root root 2.7M 2011-10-04 22:26 /usr/bin/python
        linux-vdso.so.1 =>  (0x00007fff509ff000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3e337ab000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 
        libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 
        libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f3e32d8f000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3e32b0b000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3e3276b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3e33c03000)

*** Conclusion (finally!) ***
The Intel Python build looks very promising, but I don't yet trust it to the 
extent that I'd go ahead and install it or use it in place of the system 
build. None of the errors look too alarming though, so I'm confident that I 
could actually get this to work, with the right help.

If someone could help me pass these final tests and compile the ffi64.c module, 
that'd be amazing!

I hope to hear back from you,
Kind regards,

ps. Sorry for how long this email turned out!
pps. I'd be happy to write up the fully working solution on a wiki or 
somewhere, if anyone has any suggestions where?
