From vlad-fbsd at acheronmedia.com Wed Dec 6 05:56:03 2017
From: vlad-fbsd at acheronmedia.com (Vlad K.)
Date: Wed, 06 Dec 2017 11:56:03 +0100
Subject: [Wheel-builders] manylinux compile/optimization options
Message-ID: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com>

Hello list,

I need to understand if there are any specific compilation options set for the manylinux precompiled binaries, by the packaging system or by individual python module developers.

We have an image processing application that does certain color correction, resizing and watermarking of images for a webapp. The sources are quite large (some as large as 100 megapixels) and the execution time is very noticeable, some images taking up to a minute to complete (yes we have post-processing cache, but that initial hit is crucial).

I have noticed a difference in execution time of our test and production cases between Linux (Gentoo) and FreeBSD. The same code takes 3 times longer to process on FreeBSD. Investigating further I realized on Linux that pip is pulling in already compiled .so libraries, whereas on FreeBSD it compiles locally (testing with virtualenv and installation with pip).

So when I reinstalled on Linux but ignoring the precompiled .so libs (with --no-binary :all: given to pip), the execution time shot up 3x (longer) and equals that of FreeBSD.

But, when I recompiled again, but forcing CFLAGS="-O3" (just tried as a test, and that worked right off the bat), the execution time came down to equal the one when installed from PyPI wheel cache. Same thing happened when I did so on FreeBSD.

So adding -O3 to CFLAGS when building Pillow and ignoring precompiled libs from wheel cache, improved execution time (of our use case) 3 times!
Now, since -O3 is not a wise thing to do unless you know what you're doing (and I don't, in this case), I need to find out if that's what was added to the PyPI wheel cache, or something else, and where that was defined, as I could find no relevant CFLAGS definitions in the Pillow source code (or maybe it's there but I just can't see it), or in fact any compilation flags other than -O0 for the test cases.

I also need to understand this so I can force the same compilation environment for the FreeBSD port of Python Pillow module (graphics/py-pillow).

Thanks.

--
Vlad K.

From njs at pobox.com Wed Dec 6 06:03:29 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 Dec 2017 03:03:29 -0800
Subject: [Wheel-builders] manylinux compile/optimization options
In-Reply-To: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com>
References: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com>
Message-ID:

On Wed, Dec 6, 2017 at 2:56 AM, Vlad K. wrote:
> Hello list,
>
> I need to understand if there are any specific compilation options set for
> the manylinux precompiled binaries, by the packaging system or by individual
> python module developers.

The manylinux docker image doesn't do anything special here. (In fact it ships a rather old version of gcc -- I'd expect newer versions to generate even faster code.)

> Now, since -O3 is not a wise thing to do unless you know what you're doing
> (and I don't, in this case), I need to find out if that's what was added to
> the PyPI wheel cache, or something else, and where that was defined, as I
> could find no relevant CFLAGS definitions in the Pillow source code (or
> maybe it's there but I just can't see it), or in fact any compilation flags
> other than -O0 for the test cases.
>
> I also need to understand this so I can force the same compilation
> environment for the FreeBSD port of Python Pillow module
> (graphics/py-pillow).

I think your best bet is to ask the Pillow developers, or whoever builds their wheels for distribution.
-n

--
Nathaniel J. Smith -- https://vorpus.org

From vlad-fbsd at acheronmedia.com Wed Dec 6 06:33:20 2017
From: vlad-fbsd at acheronmedia.com (Vlad K.)
Date: Wed, 06 Dec 2017 12:33:20 +0100
Subject: [Wheel-builders] manylinux compile/optimization options
In-Reply-To:
References: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com>
Message-ID:

On 2017-12-06 12:03, Nathaniel Smith wrote:
>
> I think your best bet is to ask the Pillow developers, or whoever
> builds their wheels for distribution.

Thanks, I'll do that.

--
Vlad K.

From rmcgibbo at gmail.com Wed Dec 6 10:15:52 2017
From: rmcgibbo at gmail.com (Robert T. McGibbon)
Date: Wed, 6 Dec 2017 10:15:52 -0500
Subject: [Wheel-builders] manylinux compile/optimization options
In-Reply-To:
References: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com>
Message-ID:

In a performance-critical situation like this, compiling Pillow from source specifically for your production box using an up-to-date compiler toolchain that can target your specific processor generation (-march=native or similar) is probably a win. There's a bit of tension between binary performance and compatibility. When facing this tradeoff, manylinux-1 obviously favors compatibility.

On Wed, Dec 6, 2017 at 6:33 AM, Vlad K. wrote:

> On 2017-12-06 12:03, Nathaniel Smith wrote:
>
>>
>> I think your best bet is to ask the Pillow developers, or whoever
>> builds their wheels for distribution.
>>
>
> Thanks, I'll do that.
>
>
> --
> Vlad K.
>
> _______________________________________________
> Wheel-builders mailing list
> Wheel-builders at python.org
> https://mail.python.org/mailman/listinfo/wheel-builders
>

--
-Robert

From vlad-fbsd at acheronmedia.com Wed Dec 6 11:19:11 2017
From: vlad-fbsd at acheronmedia.com (Vlad K.)
Date: Wed, 06 Dec 2017 17:19:11 +0100
Subject: [Wheel-builders] manylinux compile/optimization options
In-Reply-To:
References: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com>
Message-ID:

On 2017-12-06 16:15, Robert T. McGibbon wrote:
> There's a bit of tension
> between binary performance and compatibility. When facing this
> tradeoff, manylinux-1 obviously favors compatibility.

Agreed, but the weird thing here is that the manylinux precompiled Pillow .so libs are much faster than if Pillow was fully compiled on the machine locally, without any "optimizations" other than -march=native. Only when -O3 is added to the local compilation does the performance match that of the .so libs coming from PyPI.

That's the weird part, and I was trying to figure out what exactly is being done to the officially precompiled libs, whether -O3 or something else was used, and why that which was used is not part of the official setup.py, but that's indeed the question for Pillow devs.

I asked here on this list first, because not seeing any compiler flags in the official sources (other than -O0 for debug makefile target), I thought maybe some optimizations are done by the (PyPI) packaging.

I've opened an issue in case anyone is interested in a follow-up (or please let me know if you want me to update this thread with more info):

https://github.com/python-pillow/Pillow/issues/2879

Thanks.

--
Vlad K.

From mail at marcelm.net Wed Dec 6 13:29:05 2017
From: mail at marcelm.net (Marcel Martin)
Date: Wed, 6 Dec 2017 19:29:05 +0100
Subject: [Wheel-builders] manylinux compile/optimization options
In-Reply-To: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com>
References: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com>
Message-ID: <540c1f68-a5ca-7e4d-c476-2730146d53df@marcelm.net>

On 2017-12-06 11:56, Vlad K.
wrote:
> I need to understand if there are any specific compilation options set
> for the manylinux precompiled binaries, by the packaging system or by
> individual python module developers.
[...]
> Now, since -O3 is not a wise thing to do [...]

Why do you think so? As far as I know, using -O3 is perfectly normal and safe! When you compile Python from sources (at least on Linux), it sets -O3 by default and this is propagated into the default CFLAGS that setuptools/distutils use for compiling extensions.

I'm not quite sure how exactly setuptools/distutils derive the actual CFLAGS, but you can run 'python -m sysconfig' or 'python-config --cflags' to see lots of configuration settings that have to do with CFLAGS.

You can run this to check the manylinux1 Docker image, for example:

$ docker run quay.io/pypa/manylinux1_x86_64 /opt/python/cp36-cp36m/bin/python3.6-config --cflags

This outputs:

-I/opt/_internal/cpython-3.6.0/include/python3.6m -I/opt/_internal/cpython-3.6.0/include/python3.6m -Wno-unused-result -Wsign-compare -Wformat -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

I would suggest that you check how your FreeBSD Python was compiled. Perhaps the solution is to ensure that it is compiled with -O3. If it works as on Linux, I would assume that then all C extensions would also be compiled with -O3.

Regards,
Marcel

From vlad-fbsd at acheronmedia.com Wed Dec 6 15:22:21 2017
From: vlad-fbsd at acheronmedia.com (Vlad K.)
Date: Wed, 06 Dec 2017 21:22:21 +0100
Subject: [Wheel-builders] manylinux compile/optimization options
In-Reply-To: <540c1f68-a5ca-7e4d-c476-2730146d53df@marcelm.net>
References: <09caa51135ba65b3cb2773ba33032655@acheronmedia.com> <540c1f68-a5ca-7e4d-c476-2730146d53df@marcelm.net>
Message-ID: <8914fddf187b28f9ebfa9e749c073040@acheronmedia.hr>

On 2017-12-06 19:29, Marcel Martin wrote:
>
> Why do you think so? As far as I know, using -O3 is perfectly normal and
> safe!
Because experience has shown that blanket -O3 can lead to undefined behavior or "optimizations" that actually result in worse-performing code. I have certainly seen broken code with global -O3, and it is ill-advised on Gentoo as a global option. Instead individual packages enable -O3 if it's known that the flag will produce adequate and faster code.

Also Phoronix has some tests that show -O3 better in some cases, but worse in others, for example:

https://www.phoronix.com/scan.php?page=article&item=clear-gcc6-opts&num=2

and previously

https://www.phoronix.com/scan.php?page=article&item=gcc_47_optimizations&num=1

Both on FreeBSD and Gentoo pip will compile without any -O flags, even though Python (on both) is compiled with -O2. However on FreeBSD the compiler is clang, and talking to some developers I was informed that -O3 is not a solution, but using LLVM's LTO is. LTO is also a subject in some experimental Gentoo builds. There may be individual python ports that explicitly enable -O3, but I haven't checked for those.

With that, I wanted to understand this behavior for Pillow so I can submit a change requesting Pillow be compiled with -O3 for now, until LTO is understood better (and available, I think it requires LLVM newer than the one in FreeBSD base), because from what I've seen the difference is seriously significant, and so far I see no breakage, running Pillow compiled with -O3 in production.

--
Vlad K.
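
For anyone comparing interpreters across machines, the check Marcel suggests ('python -m sysconfig') can also be done from inside Python. The following is only a minimal sketch: the helper names build_flags and optimization_levels are made up for illustration, and which of these recorded variables setuptools/distutils actually combines into the final compiler command line is an assumption that may differ between versions and platforms. On non-POSIX builds some of the variables may simply be absent (None).

```python
import sysconfig

# Build-time variables recorded when the interpreter itself was compiled.
# Assumption: these are the ones most relevant to extension builds on POSIX.
FLAG_VARS = ("CC", "OPT", "CFLAGS", "LDSHARED")


def build_flags():
    """Return the recorded build variables (value is None if not recorded)."""
    return {name: sysconfig.get_config_var(name) for name in FLAG_VARS}


def optimization_levels(flags):
    """Extract -O options (e.g. -O2, -O3) from a flag string, in order."""
    return [tok for tok in (flags or "").split() if tok.startswith("-O")]


if __name__ == "__main__":
    for name, value in build_flags().items():
        print(f"{name} = {value}")
    cflags = sysconfig.get_config_var("CFLAGS")
    print("optimization flags in CFLAGS:", optimization_levels(cflags))
```

Running this under the Gentoo Python, the FreeBSD Python and the interpreter inside the manylinux1 image would show whether the recorded flags differ in their -O level, which is the comparison this thread is about. It does not show flags added by a package's own setup.py or by the CFLAGS environment variable at build time.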