Issue 36357: Build 32bit Python on Windows with SSE2 instruction set

On Windows, it seems that 32-bit builds (3.7.2/3.8.0a2) don't make sufficient use of SSE2.

I tested on the 3.8 branch: python38.dll uses XMM registers only 28 times. The official build is the same. After enabling this option, python38.dll uses XMM registers 11,704 times.

--- a/PCbuild/pythoncore.vcxproj
+++ b/PCbuild/pythoncore.vcxproj
@@ -88,6 +88,7 @@
 $(zlibDir);%(AdditionalIncludeDirectories)
 _USRDLL;Py_BUILD_CORE;Py_ENABLE_SHARED;MS_DLL_ID="$(SysWinVer)";%(PreprocessorDefinitions)
 _Py_HAVE_ZLIB;%(PreprocessorDefinitions)
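The post does not say how the XMM register counts were obtained. One way to get comparable numbers is to count XMM operands in a disassembly listing of the DLL, for example text produced by `dumpbin /disasm python38.dll`. The sketch below assumes that approach; the helper name `count_xmm_uses` and the sample listing are hypothetical and only illustrate the idea.

```python
import re

# Matches XMM register operands xmm0 .. xmm15.
# 32-bit code can only address xmm0 .. xmm7, but the wider pattern is harmless.
XMM_RE = re.compile(r"\bxmm(?:1[0-5]|[0-9])\b", re.IGNORECASE)


def count_xmm_uses(disassembly: str) -> int:
    """Count XMM register operands in a disassembly listing (hypothetical helper)."""
    return len(XMM_RE.findall(disassembly))


# Made-up listing in the style of a disassembler dump, for illustration only.
sample = """\
  10001000: F2 0F 10 45 08     movsd       xmm0,mmword ptr [ebp+8]
  10001005: F2 0F 59 C1        mulsd       xmm0,xmm1
  10001009: 8B 45 0C           mov         eax,dword ptr [ebp+0Ch]
"""

print(count_xmm_uses(sample))
```

This counts every operand, so an instruction that uses two XMM registers contributes two. Counts produced this way are not necessarily comparable with the figures quoted above, since the counting method used there is not stated.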

The x86 instruction set has only a small number of registers. As I understand it, using XMM registers in a 32-bit build will bring a small speedup. I'm not an expert in this area, so apologies if I'm wrong.