[Python-Dev] cpython (2.7): Fix comment blocks. Adjust blocksize to a power-of-two for better divmod
Victor Stinner victor.stinner at gmail.com
Mon Jun 24 12:50:01 CEST 2013
2013/6/24 Raymond Hettinger <raymond.hettinger at gmail.com>:
> Lastly, there was a change I just put in to Py 3.4 replacing the memcpy() with a simple loop and replacing the "deque->" references with local variables. Besides giving a small speed-up, it made the code more clear and less at the mercy of various implementations of memcpy().
>
> Ideally, I would like 2.7 and 3.3 to replace their use of memcpy() as well, but the flavor of this thread suggests that is right out.
memcpy() in particular is usually highly optimized, with assembler code written for each architecture. The GNU libc now goes further: it can choose the fastest version at runtime depending on the CPU's features (MMX, SSE, etc.). If I understood correctly, glibc contains several versions of memcpy(), and the dynamic linker (ld.so) chooses one depending on the CPU.
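To make the idea of runtime selection concrete, here is a minimal sketch of choosing a copy routine from CPU features. It is not how glibc does it internally: the names copy_bytewise, copy_fast, resolve_copy and dispatched_copy are hypothetical, the "fast" variant simply defers to memcpy(), and the feature test assumes GCC's (or a compatible compiler's) __builtin_cpu_supports() on x86.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_impl)(void *, const void *, size_t);

/* Hypothetical baseline variant: copy one byte at a time. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical "fast" variant; in this sketch it just calls memcpy(). */
static void *copy_fast(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick a variant from the CPU features visible at runtime.
 * Assumption: __builtin_cpu_supports() is available (GCC-style, x86 only);
 * on any other compiler or architecture the baseline is used. */
static copy_impl resolve_copy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse2"))
        return copy_fast;
#endif
    return copy_bytewise;
}

/* Resolve once on first use, then call through the cached pointer.
 * Not thread-safe as written; a real resolver runs before any caller. */
static void *dispatched_copy(void *dst, const void *src, size_t n)
{
    static copy_impl impl = NULL;
    if (impl == NULL)
        impl = resolve_copy();
    return impl(dst, src, n);
}
```

The point is only that the choice is made once, outside the copy itself, so every later call pays just an indirect call.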
GCC is also able to inline memcpy() when the size is known at compile time. I have also seen it generate two code paths when the size is only known at runtime: an inline version for small sizes, and a function call for larger copies. Python has a Py_MEMCPY() macro which implements exactly that, but only for Visual Studio. I suppose that Visual Studio does not implement this optimization. By the way, Py_MEMCPY() is only used in a few places.
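The two code paths described above can be sketched roughly as follows. This is an illustration of the technique, not a copy of Py_MEMCPY(): the function name copy_small_or_memcpy and the cutoff SMALL_COPY_THRESHOLD (16 bytes here) are assumptions chosen for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff below which the inline loop is used. */
#define SMALL_COPY_THRESHOLD 16

/* Copy `length` bytes: a plain loop for short copies, memcpy() otherwise. */
static void *copy_small_or_memcpy(void *target, const void *source,
                                  size_t length)
{
    if (length >= SMALL_COPY_THRESHOLD)
        return memcpy(target, source, length);

    /* Short copy: avoid the function call and copy byte by byte. */
    unsigned char *t = target;
    const unsigned char *s = source;
    for (size_t i = 0; i < length; i++)
        t[i] = s[i];
    return target;
}
```

Where the cutoff should sit depends on the compiler and the C library, which is exactly why it is hard to pick one value by reading the code alone.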
So it's surprising to read that a dummy loop is faster than memcpy()... even though I have already seen this in my own micro-benchmarks :-) Do you have an idea of how we can decide between the dummy loop and memcpy()? Using a benchmark? Or can it be decided just by reading the C code?
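For the benchmark route, a micro-benchmark could look something like the sketch below. It copies arrays of pointers, as a deque of object pointers would, and times each variant with clock(). The helper names (loop_copy, memcpy_copy, time_copy) are hypothetical, and the numbers mean little without care: an optimizing compiler may turn the loop into a memcpy() call or inline memcpy() itself, so the generated code should be checked too.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Copy n pointers with a plain loop. */
static void loop_copy(void **dst, void *const *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy n pointers with the library memcpy(). */
static void memcpy_copy(void **dst, void *const *src, size_t n)
{
    memcpy(dst, src, n * sizeof(void *));
}

typedef void (*copy_fn)(void **, void *const *, size_t);

/* Return the CPU seconds spent on `repeats` copies of n pointers.
 * Calling through a function pointer makes inlining less likely,
 * but does not guarantee the compiler leaves the loop untouched. */
static double time_copy(copy_fn fn, void **dst, void *const *src,
                        size_t n, long repeats)
{
    clock_t start = clock();
    for (long r = 0; r < repeats; r++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_copy(loop_copy, ...) with time_copy(memcpy_copy, ...) over a range of n, on each compiler and platform of interest, would show where (if anywhere) the loop wins.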
What is the policy for using Py_MEMCPY() vs memcpy()?
Victor