Issue 32616: Significant performance problems with Python 2.7 built with clang 3.x or 4.x

Created on 2018-01-22 04:34 by zmwangx, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5574 merged methane,2018-02-07 02:27
Messages (21)
msg310395 - (view) Author: Zhiming Wang (zmwangx) * Date: 2018-01-22 04:34
Python 2.7 could be significantly slower (5x in some cases) when compiled with clang 3.x or 4.x, compared to clang 5.x. This is quite a problem on macOS, since the latest clang from Apple (which comes with Xcode 9.2) is based on LLVM 4.x. This issue was first noticed by Bart Skowron and reported to the Homebrew project.[1]

I ran some preliminary benchmarks (here[2] are the exact setup scripts) with just a simple loop:

    import time

    def f(n):
        while n > 0:
            n -= 1

    start = time.time()
    f(50000000)
    stop = time.time()
    print('%.6f' % (stop - start))

and here are my results:

- macOS 10.13.2 on a MacBook Pro:

    2.082144 /usr/bin/python2.7
    7.964049 /usr/local/bin/python2.7
    8.750652 dist/python27-apple-clang-900/bin/python2.7
    8.476405 dist/python27-clang-3.9/bin/python2.7
    8.625660 dist/python27-clang-4.0/bin/python2.7
    1.760096 dist/python27-clang-5.0/bin/python2.7
    3.254814 /usr/local/bin/python3.6
    2.864716 dist/python-master-apple-clang-900/bin/python3
    3.071757 dist/python-master-clang-3.9/bin/python3
    2.925192 dist/python-master-clang-4.0/bin/python3
    2.908782 dist/python-master-clang-5.0/bin/python3

- Ubuntu 17.10 in VirtualBox:

    1.475095 /usr/bin/python2.7
    8.576817 dist/python27-clang-3.9/bin/python2.7
    8.165588 dist/python27-clang-4.0/bin/python2.7
    1.779193 dist/python27-clang-5.0/bin/python2.7
    1.728321 dist/python27-gcc-5/bin/python2.7
    1.570040 dist/python27-gcc-6/bin/python2.7
    1.604617 dist/python27-gcc-7/bin/python2.7
    2.323037 /usr/bin/python3.6
    2.964338 dist/python-master-clang-3.9/bin/python3
    3.054277 dist/python-master-clang-4.0/bin/python3
    2.734908 dist/python-master-clang-5.0/bin/python3
    2.490278 dist/python-master-gcc-5/bin/python3
    2.494691 dist/python-master-gcc-6/bin/python3
    2.642277 dist/python-master-gcc-7/bin/python3

I haven't got time to run more rigorous benchmark suites (e.g., the performance[3] package). I did try the floating point benchmark from performance, and again saw a 2x difference in performance.

[1] https://github.com/Homebrew/homebrew-core/issues/22743
[2] https://gist.github.com/zmwangx/f8151ba8907ba8159a07fdd1528fc2b5
[3] https://pypi.python.org/pypi/performance
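A single wall-clock measurement of the loop can be noisy. The following is a minimal sketch, not part of the original report, that repeats the same countdown loop and keeps the fastest run. The names `countdown` and `best_of` and the default repeat count are illustrative assumptions.

```python
import time


def countdown(n):
    # Same busy loop as in the report: pure bytecode dispatch, no I/O.
    while n > 0:
        n -= 1


def best_of(func, arg, repeats=3):
    """Run func(arg) `repeats` times and return the fastest wall-clock time."""
    # time.time() is used because it exists on both Python 2.7 and 3.x.
    timings = []
    for _ in range(repeats):
        start = time.time()
        func(arg)
        timings.append(time.time() - start)
    # The minimum is the run least disturbed by other system activity.
    return min(timings)
```

Calling `best_of(countdown, 50000000)` under each interpreter build would give one number per build that is comparable to the figures in the report.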
msg310423 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-01-22 15:36
Has anyone done the same analysis with Python 3.6 or 3.7?
msg310424 - (view) Author: Zhiming Wang (zmwangx) * Date: 2018-01-22 15:37
My benchmarks above do contain py37 (master) stats.
msg311597 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-02-04 11:52
Is there anything we (the CPython developers) can do about this? If I read the issue correctly clang 5.x generates faster binaries than clang 3.x and 4.x. If that is indeed the issue there's probably not much we can do about this. BTW. I'm -1 on building the installer with anything but the compiler included in Xcode (and it would be nice to build with a recent version of Xcode to use an up-to-date compiler and SDK)
msg311651 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-02-05 08:21
It seems clang 4 fails to allocate registers efficiently. FYI, the --without-computed-gotos configure option makes the penalty smaller.

    clang 5 (without CGs): 2.653426
    clang 5 (with CGs):    1.997584
    clang 4 (without CGs): 3.330879
    clang 4 (with CGs):    8.585673
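To make the effect of computed gotos (CGs) easier to read off, here is a small sketch that divides the "with CGs" time by the "without CGs" time for each compiler. The timings are copied from the message above; the dictionary layout and the function name `computed_goto_factor` are illustrative assumptions.

```python
# Timings in seconds, copied from the message above.
# Key: (compiler, built with computed gotos?)
TIMINGS = {
    ("clang 5", False): 2.653426,
    ("clang 5", True): 1.997584,
    ("clang 4", False): 3.330879,
    ("clang 4", True): 8.585673,
}


def computed_goto_factor(compiler):
    """Return time(with CGs) / time(without CGs).

    A value above 1 means computed gotos made the benchmark slower.
    """
    return TIMINGS[(compiler, True)] / TIMINGS[(compiler, False)]


for compiler in ("clang 4", "clang 5"):
    print('%s: %.2fx' % (compiler, computed_goto_factor(compiler)))
```

With these numbers, computed gotos make the clang 4 build roughly 2.6 times slower, while they make the clang 5 build about 25% faster.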
msg311661 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-02-05 11:17
This is the assembly code for FAST_DISPATCH(): https://paste.ubuntu.com/26523948/ It seems there are many redundant spills, but I don't know how to remove them. Are there any clang experts here?
msg311723 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-02-06 11:31
Bad news: --enable-optimizations doesn't solve it. I hope the same thing doesn't happen for Python 3. Has anyone tried Xcode 9.3? What version of LLVM does Apple use? Anyway, I think we need the help of an LLVM expert.
msg311747 - (view) Author: Zhiming Wang (zmwangx) * Date: 2018-02-06 19:52
Turns out python 2.7.10 doesn't suffer from the performance issue even when compiled with stock clang 4.x, and upon further investigation, I tracked down the commit that introduced the regression:

    commit 2c992a0788536087bfd78da8f2c62b30a461d7e2
    Author: Benjamin Peterson <benjamin@python.org>
    Date:   Thu May 28 12:45:31 2015 -0500

        backport computed gotos (#4753)

So Naoki was right that computed gotos is (solely) to blame here.
msg311751 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-02-06 21:35
Quick test: there doesn't seem to be a similar regression when building 3.6 with the current clang provided by Xcode 9.2, just with 2.7. And both 2.7 and 3.6 configure HAVE_COMPUTED_GOTOS on. Benjamin? (FWIW, the 2.7.x binaries provided by the python.org installers do not suffer from this performance regression as they are not built with clang.)
msg311780 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-02-07 10:09
New changeset 2942b909d9a428e6683d90b3436cfa4a81bd5d8a by INADA Naoki in branch '2.7': bpo-32616: Disable computed gotos by default for clang < 5 (GH-5574) https://github.com/python/cpython/commit/2942b909d9a428e6683d90b3436cfa4a81bd5d8a
msg311839 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-02-08 17:32
I'm sorry, my patch doesn't work on Xcode (Apple LLVM); computed gotos are still enabled by default. Apple doesn't expose the LLVM version. It's really annoying.

    $ cat x.c
    #include <stdio.h>

    int main()
    {
        printf("__clang__ : %d\n", __clang__);
        printf("__llvm__ : %d\n", __llvm__);
        printf("__VERSION__ : %s\n", __VERSION__);
        printf("__clang_version__ : %s\n", __clang_version__);
        printf("__clang_major__ : %d\n", __clang_major__);
        printf("__clang_minor__ : %d\n", __clang_minor__);
        printf("__clang_patchlevel__ : %d\n", __clang_patchlevel__);
    }

    $ cc x.c && ./a.out
    __clang__ : 1
    __llvm__ : 1
    __VERSION__ : 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)
    __clang_version__ : 9.0.0 (clang-900.0.39.2)
    __clang_major__ : 9
    __clang_minor__ : 0
    __clang_patchlevel__ : 0
msg311842 - (view) Author: Zhiming Wang (zmwangx) * Date: 2018-02-08 18:59
Yeah, Apple LLVM versions are a major headache. I resorted to feature detection, using C++ coroutines support as the clang 5 distinguisher[1]:

    $ cat /tmp/test/stub.cc
    #include <experimental/coroutine>

    int main() {
        return 0;
    }

    $ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -v
    Apple LLVM version 9.0.0 (clang-900.0.39.2)
    Target: x86_64-apple-darwin17.4.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

    $ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -o stub stub.cc -fcoroutines-ts -stdlib=libc++
    stub.cc:1:10: fatal error: 'experimental/coroutine' file not found
    #include <experimental/coroutine>
             ^~~~~~~~~~~~~~~~~~~~~~~~
    1 error generated.

    $ /Applications/Xcode-beta.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -v
    Apple LLVM version 9.1.0 (clang-902.0.31)
    Target: x86_64-apple-darwin17.4.0
    Thread model: posix
    InstalledDir: /Applications/Xcode-beta.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

    $ /Applications/Xcode-beta.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -o stub stub.cc -fcoroutines-ts -stdlib=libc++

Here Xcode.app is Xcode 9.2 and Xcode-beta.app is Xcode 9.3 beta 2. The conclusion here seems to be that Apple LLVM 9.0.0 is based on LLVM 4, while Apple LLVM 9.1.0 is based on LLVM 5.

[1] http://releases.llvm.org/5.0.0/tools/clang/docs/ReleaseNotes.html#major-new-features
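The manual probe above could be scripted. What follows is only a sketch under stated assumptions: it writes the same stub file, runs the same compile command as in the transcript, and treats exit status 0 as "coroutines TS is available". The function name `has_coroutines_ts` and the injectable `run` parameter (there so the logic can be exercised without a compiler) are hypothetical and not taken from any existing build script.

```python
import os
import shutil
import subprocess
import tempfile

# Same stub as in the transcript above.
STUB = "#include <experimental/coroutine>\nint main() { return 0; }\n"


def has_coroutines_ts(clang, run=subprocess.call):
    """Return True if `clang` can build the coroutine stub, else False.

    `run` takes an argv list and returns an exit status; it defaults to
    subprocess.call and can be replaced for testing.
    """
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "stub.cc")
    out = os.path.join(tmpdir, "stub")
    try:
        with open(src, "w") as f:
            f.write(STUB)
        # Mirrors: clang -o stub stub.cc -fcoroutines-ts -stdlib=libc++
        cmd = [clang, "-o", out, src, "-fcoroutines-ts", "-stdlib=libc++"]
        return run(cmd) == 0
    except OSError:
        # The compiler binary could not be started at all.
        return False
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Going by the transcript, a probe like this would be expected to fail for the Xcode 9.2 compiler and succeed for the Xcode 9.3 beta 2 compiler. Whether coroutine support is a reliable proxy for the LLVM base version in other toolchains is not established in this thread.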
msg311862 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-02-09 02:45
How can we easily distinguish Apple LLVM from upstream LLVM? Or should we disable computed gotos by default on LLVM altogether? This affects only Python 2, and a 5x slowdown is too high a price for a 10% speedup.
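One possible way to tell the two apart is the banner printed by `clang -v`: Apple's compiler reports "Apple LLVM version ..." (as seen earlier in this issue), while upstream builds report "clang version ...". The sketch below parses that banner and flags the compilers that the measurements in this issue found slow. The function names are hypothetical, and the thresholds are assumptions drawn only from this thread: upstream clang below 5 (the rule used by the merged patch) and Apple LLVM 9.0.

```python
import re

# Matches e.g. "Apple LLVM version 9.0.0 (clang-900.0.39.2)"
# and "clang version 5.0.0 (...)".
_BANNER_RE = re.compile(r"(Apple LLVM|clang) version (\d+)\.(\d+)")


def parse_clang_version(banner):
    """Return (vendor, (major, minor)) or None if the banner is not clang's."""
    m = _BANNER_RE.search(banner)
    if m is None:
        return None
    vendor = "apple" if m.group(1) == "Apple LLVM" else "upstream"
    return vendor, (int(m.group(2)), int(m.group(3)))


def computed_gotos_suspect(banner):
    """Return True if the banner matches a compiler this issue found slow."""
    parsed = parse_clang_version(banner)
    if parsed is None:
        return False  # not clang, or an unrecognized banner
    vendor, version = parsed
    if vendor == "apple":
        # Assumption from this thread: only Apple LLVM 9.0 (LLVM 4 based)
        # showed the severe slowdown.
        return version == (9, 0)
    # Assumption: mirrors the "clang < 5" rule of the merged patch.
    return version < (5, 0)
```

For example, `computed_gotos_suspect("Apple LLVM version 9.0.0 (clang-900.0.39.2)")` returns `True`, while the Apple LLVM 9.1.0 banner returns `False`.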
msg311863 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-02-09 03:36
Can anyone explain what the difference is between 2.7 and 3.6, i.e. why there is the performance regression for 2.7 but not for 3.6 using the same compiler instance? It would be better to understand and solve that problem rather than trying to special case compiler versions.
msg311865 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-02-09 03:54
I don't know exactly. But as far as I can see, Python 3's eval loop has fewer function-wide local variables. For example, ROT_THREE uses only block-local variables.
https://github.com/python/cpython/blob/a48e78a0b7761dd74f1d03fc69e0f6caa6f02fe6/Python/ceval.c#L1109-L1111
On the other hand, there are more function-wide local variables in Python 2, and some of them are actually used across `case`s.
https://github.com/python/cpython/blob/672fd7d8162f76aff8423fa5c7bfd2b1e91faf57/Python/ceval.c#L802-L807
I suspect that's why LLVM 4 fails to optimize Python 2 but succeeds in optimizing Python 3.
msg311868 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-02-09 08:12
A different question w.r.t. detection of the clang/llvm version on Apple's system compiler: Is it worthwhile to do so? If the compiler included in the Xcode 9.3 beta (and hence likely the one in Xcode 9.3 final) fixes the performance issue, a very large subset of people building Python for themselves will get a fixed compiler fairly soon. It would then be enough to warn about this issue in a readme file for other users.
msg311992 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-02-11 09:55
This should remind everyone that backporting performance improvements is not a no-brainer.
msg315308 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-04-15 05:11
A followup - Ronald asked:

> w.r.t. detection of the clang/llvm version on Apple's system compiler: Is it worthwhile to do so?

Now that Xcode 9.3 (for macOS 10.13+) is officially released, I ran a quick series of tests on it and on the most recent Xcode versions for the last several macOS families: 10.12, 10.11, and 10.9 (I didn't have a 10.10 system available at the moment). Only looking at the most recent supported Xcode/compiler version for each major release is reasonable since I think most people follow Apple's strong encouragement to keep software updated and only use the most recent releases. And I think most people follow Apple's lead in using their build tools, via Xcode or the command line utilities, rather than a third-party compiler.

By that measure, it seems clear that (1) there is only one current version that exhibits the performance degradation, that is the Xcode 9.2 version labeled Apple LLVM 9.0.0 (clang-900.0.39.2) and (2) that is now only an issue for macOS 10.12 (Sierra) where Xcode 9.2 is (and will likely remain) the most recent version. For macOS 10.13 (High Sierra), the compiler in the newly released Xcode 9.3 does not exhibit the problem. And the most recent versions of Xcode for the tested earlier macOS releases do not either.

BTW, the MacPorts project maintains a handy webpage listing Xcode releases and compiler versions by macOS release: https://trac.macports.org/wiki/XcodeVersionInfo

Here are the results. The methodology was to download and build the just released Python 2.7.15rc1 from source using the default configure options, i.e. just ./configure, and then run the test program three times with it and then three times with the Apple-provided system /usr/bin/python2.7 as a baseline.

ProductName:    Mac OS X
ProductVersion: 10.13.4
BuildVersion:   17E199
2.7.15rc1 (default, Apr 15 2018, 00:22:29)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]
1.181971
1.180467
1.173380
2.7.10 (default, Oct 6 2017, 22:29:07)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)]
1.567072
1.587779
1.570056

ProductName:    Mac OS X
ProductVersion: 10.12.6
BuildVersion:   16G1314
2.7.15rc1 (default, Apr 15 2018, 00:31:39)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
6.290989
6.356802
6.295680
2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]
1.306507
1.312231
1.302826

ProductName:    Mac OS X
ProductVersion: 10.11.6
BuildVersion:   15G20015
2.7.15rc1 (default, Apr 15 2018, 00:38:12)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]
1.846332
1.855483
1.896600
2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)]
1.453702
1.426298
1.440348

ProductName:    Mac OS X
ProductVersion: 10.9.5
BuildVersion:   13F1911
2.7.15rc1 (default, Apr 15 2018, 00:42:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
1.720303
1.712045
1.710216
2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]
1.681696
1.685704
1.686414
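To compare the four systems at a glance, the sketch below averages the three runs of each interpreter and divides the freshly built 2.7.15rc1 time by the system Python time. All timings are copied from the results above; the data layout and the function names `mean` and `slowdown` are illustrative assumptions.

```python
# (macOS version, timings for 2.7.15rc1 built from source,
#  timings for the Apple-provided system Python), in seconds.
RESULTS = [
    ("10.13.4", (1.181971, 1.180467, 1.173380), (1.567072, 1.587779, 1.570056)),
    ("10.12.6", (6.290989, 6.356802, 6.295680), (1.306507, 1.312231, 1.302826)),
    ("10.11.6", (1.846332, 1.855483, 1.896600), (1.453702, 1.426298, 1.440348)),
    ("10.9.5", (1.720303, 1.712045, 1.710216), (1.681696, 1.685704, 1.686414)),
]


def mean(values):
    # float() keeps the division exact on Python 2 as well.
    return sum(values) / float(len(values))


def slowdown(built, system):
    """Return mean(built) / mean(system); above 1 means the new build is slower."""
    return mean(built) / mean(system)


for macos, built, system in RESULTS:
    print('macOS %s: %.2fx' % (macos, slowdown(built, system)))
```

Only the 10.12.6 entry stands out, at close to five times the baseline. The 10.13.4 build is faster than the system Python, and the other two are within about 30% of it.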
msg315887 - (view) Author: Michael Romero (Michael Romero) Date: 2018-04-29 10:37
So is this now considered resolved for High Sierra users via 2.7.15rc1?
msg339970 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-04-11 12:55
I'm using Mojave, and I don't see the regression.

    $ /usr/local/bin/python2 x.py  # Homebrew python@2
    1.681729
    $ /usr/bin/python x.py  # System python2
    1.891549

Could someone using High Sierra or Sierra test it? x.py is the test script in https://bugs.python.org/issue32616#msg310395
msg339971 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-04-11 12:59
https://github.com/Homebrew/homebrew-core/blob/b2aff6271caa04508fb1529fdd5edbb22f4e7f21/Formula/python%402.rb#L75-L82 Homebrew checks the compiler version too. I don't think it's worth adding more code to avoid this trouble.
History
Date User Action Args
2022-04-11 14:58:56 admin set github: 76797
2019-04-11 12:59:21 methane set status: open -> closed; resolution: wont fix; messages: +; stage: resolved
2019-04-11 12:55:09 methane set messages: +
2018-04-29 10:37:33 Michael Romero set nosy: + Michael Romero; messages: +
2018-04-15 05:11:23 ned.deily set messages: +
2018-02-11 09:55:28 pitrou set nosy: - pitrou
2018-02-11 09:55:16 pitrou set nosy: + pitrou; messages: +
2018-02-09 08:12:32 ronaldoussoren set messages: +
2018-02-09 03:54:25 methane set messages: +
2018-02-09 03:36:28 ned.deily set messages: +
2018-02-09 02:45:11 methane set messages: +
2018-02-08 18:59:15 zmwangx set status: pending -> open; messages: +
2018-02-08 17:32:33 methane set status: closed -> pending; resolution: fixed -> (no value); messages: +; stage: resolved -> (no value)
2018-02-08 07:26:40 methane set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-02-07 10:09:40 methane set messages: +
2018-02-07 02:27:37 methane set keywords: + patch; stage: patch review; pull_requests: + pull_request5392
2018-02-06 21:35:15 ned.deily set nosy: + benjamin.peterson; messages: +; components: - macOS
2018-02-06 19:52:37 zmwangx set messages: +
2018-02-06 11:31:25 methane set messages: +
2018-02-05 11:17:25 methane set messages: +
2018-02-05 08:21:30 methane set messages: +
2018-02-04 11:52:03 ronaldoussoren set messages: +
2018-01-26 23:50:58 terry.reedy set nosy: + ned.deily, ronaldoussoren; components: + macOS
2018-01-23 01:15:34 tdsmith set nosy: + tdsmith
2018-01-22 15:37:47 zmwangx set messages: +
2018-01-22 15:36:30 barry set nosy: + barry; messages: +
2018-01-22 14:19:28 pablogsal set type: performance; components: + Interpreter Core
2018-01-22 05:10:04 methane set nosy: + methane
2018-01-22 04:34:14 zmwangx create