[Python-Dev] Compiling Python on Linux with Intel's icc
Alex Leach
albl500 at york.ac.uk
Thu Mar 1 19:39:19 CET 2012
Dear Python Devs,
I've been attempting to compile a fully functional version of Python 2.7 using
Intel's C compiler, having built supposedly optimal versions of numpy and
scipy, using Intel Composer XE and Intel's Math Kernel Library. I can build a
working Python binary, but I'd really appreciate it if someone could check my
compile options, and perhaps suggest ways I could further optimise the build.
*** COMPILE FAILURE - ffi64.c ***
I've managed to compile everything in the python distribution except for
Modules/_ctypes/libffi/src/x86/ffi64.c. So to get the compilation to actually
work, I've had to use the config option '--with-system-ffi'. If someone could
suggest a patch for ffi64.c, I'd happily test it, as I've been unable to fix the
code myself! The problem is with register_args, which uses GCC's __int128_t,
but this doesn't exist when using icc.
The include guard to use could be:-
#ifdef __INTEL_COMPILER
...
#else
...
#endif
I've tried using this guard around the register_args struct at the top of
ffi64.c, and around the places where register_args is used (roughly lines
592-616), following the suggestion at
http://software.intel.com/en-us/forums/showthread.php?t=56652, but I have been
unable to get a working solution. A patch would be appreciated!
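Since building with --with-system-ffi is the current workaround, a quick smoke
test of ctypes can confirm that calls passing arguments in registers still work.
This is only a sketch: it assumes ctypes.pythonapi is available (as on a normal
CPython build) and calls PyComplex_FromDoubles, a C-API function taking two
doubles, so the arguments go through libffi's x86-64 argument handling. The
helper name ctypes_smoke_test is made up for illustration.

```python
import ctypes


def ctypes_smoke_test(real=1.5, imag=-2.0):
    """Call a C function taking two doubles through ctypes/libffi."""
    func = ctypes.pythonapi.PyComplex_FromDoubles
    func.argtypes = [ctypes.c_double, ctypes.c_double]
    # The C function returns a new PyObject*, which py_object converts.
    func.restype = ctypes.py_object
    return func(real, imag)


if __name__ == "__main__":
    print(ctypes_smoke_test())
```

If this raises or returns a wrong value, the problem is in the ctypes/libffi
layer rather than in the interpreter core.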
*** Tests ***
After compilation, a few tests fail consistently, mainly ones involving
floating-point precision: test_cmath, test_math and test_float.
Also, I wrote a very short script to time for-loop execution and integer
multiplication. This script (below) has nearly always completed faster under
the default Ubuntu Python than under my own build.
Obviously, I was hoping to get a faster python, but the size of the final
binary is almost twice the size of the default Ubuntu version (5.2MB cf.
2.7MB), which I thought might cause a startup overhead that leads to slower
execution times when running such a basic script.
*** TEST SCRIPT ***
$ cat ~/bin/timetest.py
RANGE = 10000
print "running {0}^2 = {1} for loop iterations".format(RANGE, RANGE**2)
for i in xrange(RANGE):
    for j in xrange(RANGE):
        i * j
*** TIMES ***
## ICC-compiled python ##
$ time ./python ~/bin/timetest.py
running 10000^2 = 100000000 for loop iterations
real 0m2.767s
user 0m2.720s
sys 0m0.008s
## System python ##
$ time python ~/bin/timetest.py
running 10000^2 = 100000000 for loop iterations
real 0m2.781s
user 0m2.776s
sys 0m0.000s
Oh... my Python now appears to run faster than the gcc-built one. I've checked
this a few times and mine stays faster... :) I've compiled and re-compiled
Python dozens of times, but it's still failing some tests...
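To separate interpreter startup from the loop itself, the same nested loop can
be timed from inside the interpreter. A minimal sketch, assuming the standard
timeit module and a much smaller, arbitrary range so that it finishes quickly;
the function names are hypothetical. Taking the best of several runs is less
sensitive to noise than a single `time` invocation.

```python
import timeit


def nested_loop(n):
    """Same kind of work as timetest.py: n*n integer multiplications."""
    for i in range(n):
        for j in range(n):
            i * j


def best_time(n=300, repeats=5):
    """Return the fastest of `repeats` runs of nested_loop(n), in seconds."""
    timer = timeit.Timer(lambda: nested_loop(n))
    return min(timer.repeat(repeat=repeats, number=1))


if __name__ == "__main__":
    print("best of 5: %.6f s" % best_time())
```

Running this under both ./python and the system python compares loop speed
without the startup cost being part of the measurement.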
*** Build Environment ***
Ubuntu 10.10 server kernel (`uname -r`=3.0.0-16-server) with KDE 4.7.4
$ tail ~/.bashrc
#### Custom Commands
export PATH=$PATH:/usr/local/cuda/bin:$HOME/bin
export PYTHONPATH=$HOME/bin:/usr/lib/pymodules/python2.7
export PYTHONSTARTUP=$HOME/.pystartup
export LD_LIBRARY_PATH=/lib64:/usr/lib64:/usr/local/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib
# Load Intel compiler and library variables.
source /usr/intel/bin/compilervars.sh intel64
source /usr/intel/impi/4.0.3/bin/mpivars.sh intel64
source /usr/intel/tbb/bin/tbbvars.sh intel64
$ env | grep 'PATH\|FLAGS'
MANPATH=/usr/intel/impi/4.0.3.008/man:/usr/intel/composer_xe_2011_sp1.9.293/man/en_US:/usr/intel/composer_xe_2011_sp1.9.293/man/en_US:/usr/intel/impi/4.0.3.008/man:/usr/intel/composer_xe_2011_sp1.9.293/man/en_US:/usr/intel/composer_xe_2011_sp1.9.293/man/en_US:/usr/intel/impi/4.0.3.008/man:/usr/intel/composer_xe_2011_sp1.9.293/man/en_US:/usr/intel/composer_xe_2011_sp1.9.293/man/en_US:/usr/local/man:/usr/local/share/man:/usr/share/man:/usr/intel/man:::
LIBRARY_PATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/../compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21
FPATH=/usr/intel/composer_xe_2011_sp1.9.293/mkl/include:/usr/intel/composer_xe_2011_sp1.9.293/mkl/include
LD_LIBRARY_PATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/impi/4.0.3.008/ia32/lib:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/../compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/biol/arb/lib:/lib64:/usr/lib64:/usr/local/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/intel/composer_xe_2011_sp1.9.293/debugger/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mpirt/lib/intel64
CPATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/include:/usr/intel/composer_xe_2011_sp1.9.293/mkl/include:/usr/intel/composer_xe_2011_sp1.9.293/tbb/include
NLSPATH=/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64/locale/%l_%t/%N:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64/locale/%l_%t/%N:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64/locale/%l_%t/%N:/usr/intel/composer_xe_2011_sp1.9.293/debugger/intel64/locale/%l_%t/%N
PATH=/usr/intel/impi/4.0.3.008/ia32/bin:/usr/intel/composer_xe_2011_sp1.9.293/bin/intel64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/intel/bin:/usr/local/cuda/bin:/usr/local/cuda/bin:/usr/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64
PYTHONPATH=/usr/lib/pymodules/python2.7/
WINDOWPATH=7
QT_PLUGIN_PATH=$HOME/.kde/lib/kde4/plugins/:/usr/lib/kde4/plugins/
*** Download, configure and Build instructions ***
$ hg clone -r 2.7 http://hg.python.org/cpython
Since...
$ hg update -r 2.7
*** Generate Profile-Guided Optimisation stuff with first build ***
$ make distclean && mkdir PGO
$ CC=icc AR=xiar LD=xild CXX=icpc \
CPPFLAGS+="-I/usr/include \
-I/usr/include/x86_64-linux-gnu \
-I/usr/src/linux-headers-3.0.0-16-server/include/" \
CFLAGS+="-O3 \
-fomit-frame-pointer \
-shared-intel \
-fpic \
-prof-gen \
-prof-dir $PWD/PGO \
-fp-model precise \
-fp-model source \
-xHost \
-ftz" \
./configure --with-system-ffi --with-libc="-lirc" --with-libm="-limf"
$ make -j9
*** Use the PGO-generated information in new build ***
$ make clean
$ CC=icc AR=xiar LD=xild CXX=icpc \
CPPFLAGS+="-I/usr/include \
-I/usr/include/x86_64-linux-gnu \
-I/usr/src/linux-headers-3.0.0-16-server/include/" \
CFLAGS+="-O3 \
-fomit-frame-pointer \
-shared-intel \
-fpic \
-prof-use \
-prof-dir $PWD/PGO \
-fp-model precise \
-fp-model source \
-xHost \
-ftz" \
./configure --with-system-ffi --with-libc="-lirc" --with-libm="-limf"
$ make -j9
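The two configure invocations differ only in -prof-gen versus -prof-use, so
generating the flag string from one place keeps the two lists from drifting
apart. A small sketch; the helper name icc_cflags is hypothetical and the flags
are simply the ones used above.

```python
COMMON_FLAGS = [
    "-O3", "-fomit-frame-pointer", "-shared-intel", "-fpic",
    "-fp-model precise", "-fp-model source", "-xHost", "-ftz",
]


def icc_cflags(phase, pgo_dir):
    """Build the CFLAGS string for the 'gen' or 'use' PGO phase."""
    if phase not in ("gen", "use"):
        raise ValueError("phase must be 'gen' or 'use'")
    pgo_flags = ["-prof-" + phase, "-prof-dir " + pgo_dir]
    return " ".join(COMMON_FLAGS + pgo_flags)


if __name__ == "__main__":
    print(icc_cflags("gen", "PGO"))
    print(icc_cflags("use", "PGO"))
```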
...
$ make test
building dbm using gdbm
Python build finished, but the necessary bits to build these modules were not
found:
_bsddb bsddb185 dl
imageop sunaudiodev
To find the necessary bits, look in setup.py in detect_modules() for the
module's name.
find ./Lib -name '*.py[co]' -print | xargs rm -f
./python -Wd -3 -E -tt ./Lib/test/regrtest.py -l
/usr/local/src/pysrc/cpython/Lib/unittest/util.py:2: ImportWarning: Not
importing directory '/usr/local/src/pysrc/cpython/Lib/collections': missing
__init__.py
from collections import namedtuple, OrderedDict
== CPython 2.7.3rc1 (2.7:5c52e7c6d868+, Feb 29 2012, 22:10:22) [GCC Intel(R)
C++ gcc 4.6 mode]
== Linux-3.0.0-16-server-x86_64-with-debian-wheezy-sid little-endian
== /usr/local/src/pysrc/cpython/build/test_python_16278
Testing with flags: sys.flags(debug=0, py3k_warning=1, division_warning=1,
division_new=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0,
no_user_site=0, no_site=0, ignore_environment=1, tabcheck=2, verbose=0,
unicode=0, bytes_warning=0, hash_randomization=0)
.........
test_cmath
test test_cmath failed -- Traceback (most recent call last):
File "/usr/local/src/pysrc/cpython/Lib/test/test_cmath.py", line 352, in
test_specific_values
msg=error_message)
File "/usr/local/src/pysrc/cpython/Lib/test/test_cmath.py", line 94, in
rAssertAlmostEqual
'got {!r}'.format(a, b))
AssertionError: acos0000: acos(complex(0.0, 0.0))
Expected: complex(1.5707963267948966, -0.0)
Received: complex(1.5707963267948966, 0.0)
Received value insufficiently close to expected value.
...
test_curses skipped -- Use of the `curses' resource not enabled
...
test_float
test test_float failed -- Traceback (most recent call last):
File "/usr/local/src/pysrc/cpython/Lib/test/test_float.py", line 1273, in
test_from_hex
self.identical(fromHex('0x0.ffffffffffffd6p-1022'), MIN-3*TINY)
File "/usr/local/src/pysrc/cpython/Lib/test/test_float.py", line 983, in
identical
self.fail('%r not identical to %r' % (x, y))
AssertionError: 0.0 not identical to 2.2250738585072014e-308
.....
test test_strtod failed -- multiple errors occurred; run in verbose mode for
details
......
347 tests OK.
5 tests failed:
test_cmath test_float test_gdb test_math test_strtod
1 test altered the execution environment:
test_distutils
37 tests skipped:
test_aepack test_al test_applesingle test_bsddb test_bsddb185
test_bsddb3 test_cd test_cl test_codecmaps_cn test_codecmaps_hk
test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_curses
test_dl test_gl test_imageop test_imgfile test_kqueue
test_linuxaudiodev test_macos test_macostools test_msilib
test_ossaudiodev test_scriptpackages test_smtpnet
test_socketserver test_startfile test_sunaudiodev test_timeout
test_tk test_ttk_guionly test_urllib2net test_urllibnet
test_winreg test_winsound test_zipfile64
4 skips unexpected on linux2:
test_bsddb test_bsddb3 test_tk test_ttk_guionly
make: *** [test] Error 1
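The test_cmath failure is about the sign of a zero, not its magnitude:
0.0 == -0.0 is true in Python, so a check has to look at the sign bit. A
minimal sketch of such a check using math.copysign (the helper name is made up
for illustration):

```python
import cmath
import math


def is_negative_zero(x):
    """True only for -0.0; a plain comparison cannot tell it from 0.0."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


if __name__ == "__main__":
    result = cmath.acos(complex(0.0, 0.0))
    # test_cmath expects the imaginary part of this result to be -0.0.
    print("%r imag is -0.0: %s" % (result, is_negative_zero(result.imag)))
```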
*** Drill down to test_strtod error ***
$ ./python
Python 2.7.3rc1 (2.7:5c52e7c6d868+, Feb 29 2012, 22:10:22)
[GCC Intel(R) C++ gcc 4.6 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from test import test_strtod
>>> test_strtod.test_main()
test_bigcomp (test.test_strtod.StrtodTests) ... FAIL
test_boundaries (test.test_strtod.StrtodTests) ... FAIL
test_halfway_cases (test.test_strtod.StrtodTests) ... ok
test_parsing (test.test_strtod.StrtodTests) ... FAIL
test_particular (test.test_strtod.StrtodTests) ... FAIL
test_short_halfway_cases (test.test_strtod.StrtodTests) ... ok
test_underflow_boundary (test.test_strtod.StrtodTests) ... FAIL
======================================================================
FAIL: test_bigcomp (test.test_strtod.StrtodTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 214, in
test_bigcomp
self.check_strtod(s)
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in
check_strtod
"expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for 81608e-328:
expected 0x0.0000000000002p-1022, got 0x0.0p+0
======================================================================
FAIL: test_boundaries (test.test_strtod.StrtodTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 191, in
test_boundaries
self.check_strtod(s)
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in
check_strtod
"expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for
22250738585072002149149e-330: expected 0x0.ffffffffffffep-1022, got 0x0.0p+0
======================================================================
FAIL: test_parsing (test.test_strtod.StrtodTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 243, in
test_parsing
self.check_strtod(s)
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in
check_strtod
"expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for -6.E-310:
expected -0x0.06e7344a56502p-1022, got -0x0.0p+0
======================================================================
FAIL: test_particular (test.test_strtod.StrtodTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 393, in
test_particular
self.check_strtod(s)
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in
check_strtod
"expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for
12579816049008305546974391768996369464963024663104e-357: expected
0x0.90bbd7412d19fp-1022, got 0x0.0p+0
======================================================================
FAIL: test_underflow_boundary (test.test_strtod.StrtodTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 205, in
test_underflow_boundary
self.check_strtod(s)
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 105, in
check_strtod
"expected {}, got {}".format(s, expected, got))
AssertionError: Incorrectly rounded str->float conversion for
24703282292062327208828439643411068618252990130716238221279284125033775363572e-400:
expected 0x0.0000000000001p-1022, got 0x0.0p+0
----------------------------------------------------------------------
Ran 7 tests in 0.280s
FAILED (failures=5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/src/pysrc/cpython/Lib/test/test_strtod.py", line 396, in
test_main
test_support.run_unittest(StrtodTests)
File "/usr/local/src/pysrc/cpython/Lib/test/test_support.py", line 1094, in
run_unittest
_run_suite(suite)
File "/usr/local/src/pysrc/cpython/Lib/test/test_support.py", line 1077, in
_run_suite
raise TestFailed(err)
test.test_support.TestFailed: multiple errors occurred
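Every failure above returns zero where a subnormal (denormal) value was
expected, which would be consistent with flush-to-zero being active in the
build (-ftz is among the CFLAGS used). That is a guess rather than something
verified here. A short sketch to check whether an interpreter keeps subnormals,
using one of the failing inputs from the log; the function name is
hypothetical:

```python
import sys


def subnormals_preserved():
    """Return True if subnormal doubles survive parsing and arithmetic."""
    smallest_normal = sys.float_info.min   # 2**-1022
    halved = smallest_normal / 2.0         # subnormal unless flushed to zero
    parsed = float("81608e-328")           # input taken from test_bigcomp
    expected = float.fromhex("0x0.0000000000002p-1022")
    return halved > 0.0 and parsed != 0.0 and parsed == expected


if __name__ == "__main__":
    print("subnormals preserved: %s" % subnormals_preserved())
```

Running it under both ./python and the system python shows whether the two
builds differ in how they treat values below sys.float_info.min.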
*** Binary size and linked libraries ***
## My Intel build ##
$ ls -lh ./python && ldd ./python
-rwxrwxr-x 1 user user 5.2M 2012-02-29 22:10 ./python
linux-vdso.so.1 => (0x00007fffde1ec000)
libirc.so =>
/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64/libirc.so
(0x00007fe5f0f30000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
(0x00007fe5f0cde000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe5f0ada000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1
(0x00007fe5f08d7000)
libimf.so =>
/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64/libimf.so
(0x00007fe5f050b000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe5f0287000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
(0x00007fe5f0071000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe5efcd1000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe5f107e000)
libintlc.so.5 =>
/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64/libintlc.so.5
(0x00007fe5efb85000)
## System build ##
$ ls -lhH /usr/bin/python && ldd /usr/bin/python
-rwxr-xr-x 1 root root 2.7M 2011-10-04 22:26 /usr/bin/python
linux-vdso.so.1 => (0x00007fff509ff000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
(0x00007f3e339b0000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3e337ab000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1
(0x00007f3e335a8000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0
(0x00007f3e33357000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
(0x00007f3e32fa7000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f3e32d8f000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3e32b0b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3e3276b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3e33c03000)
*** Conclusion (finally!) ***
The Intel Python build looks very promising, but I don't yet trust it to the
extent that I'd go ahead and install it or use it in place of the system
build. None of the errors look too alarming though, so I'm confident that I
could actually get this to work, with the right help.
If someone could help me pass these final tests and compile the ffi64.c module,
that'd be amazing!
I hope to hear back from you,
Kind regards,
Alex
ps. Sorry for how long this email turned out!
pps. I'd be happy to write up the fully working solution on a wiki or
somewhere similar, if anyone can suggest where.