Issue 26058: PEP 509: Add ma_version to PyDictObject (original) (raw)

Created on 2016-01-09 09:30 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (31)

msg257809 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-09 09:30

Attached patch implements the following PEP currently discussed on python-ideas: https://faster-cpython.readthedocs.org/pep_dict_version.html#pep-dict-version

msg257810 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-09 09:34

Result of pybench on Dict microbenchmarks:

              DictCreation:    44ms    49ms  -10.5%    44ms    49ms  -10.7%
         DictWithFloatKeys:    35ms    35ms   -0.5%    35ms    35ms   -1.0%
       DictWithIntegerKeys:    28ms    28ms   -1.2%    28ms    29ms   -2.3%
        DictWithStringKeys:    26ms    27ms   -3.2%    26ms    28ms   -4.8%
    SimpleDictManipulation:    52ms    53ms   -0.7%    53ms    53ms   -0.4%

Hum, as usuall, pybench doesn't seem reliable at all: I expect worse performance with the patch since it adds "version++" in dict.setimte(). I don't really trust pybench, results seem to have a lot of noise :-/

Maybe I'm not using pybench correctly? I used:

$ make distclean; ./configure && make $ ./python Tools/pybench/pybench.py -f pybench.default $ patch -p1 < dict_version.patch $ ./python Tools/pybench/pybench.py -f pybench.dictversion $ ./python Tools/pybench/pybench.py -s pybench.dictversion -c pybench.default

Full output:


PYBENCH 2.1


Benchmark: pybench.dictversion

Rounds: 10
Warp:   10
Timer:  time.perf_counter

Machine Details:
   Platform ID:    Linux-4.2.5-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
   Processor:      x86_64

Python:
   Implementation: CPython
   Executable:     /home/haypo/prog/python/default/python
   Version:        3.6.0a0
   Compiler:       GCC 5.1.1 20150618 (Red Hat 5.1.1-4)
   Bits:           64bit
   Build:          Jan  9 2016 10:27:40 (#default:53271aa4d84c+)
   Unicode:        UCS4

Comparing with: pybench.default

Rounds: 10
Warp:   10
Timer:  time.perf_counter

Machine Details:
   Platform ID:    Linux-4.2.5-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
   Processor:      x86_64

Python:
   Implementation: CPython
   Executable:     /home/haypo/prog/python/default/python
   Version:        3.6.0a0
   Compiler:       GCC 5.1.1 20150618 (Red Hat 5.1.1-4)
   Bits:           64bit
   Build:          Jan  9 2016 10:21:57 (#default:53271aa4d84c)
   Unicode:        UCS4

Test minimum run-time average run-time this other diff this other diff

      BuiltinFunctionCalls:    42ms    42ms   +0.1%    43ms    43ms   -0.9%
       BuiltinMethodLookup:    26ms    25ms   +5.7%    26ms    25ms   +5.1%
             CompareFloats:    27ms    27ms   -0.9%    27ms    28ms   -4.2%
     CompareFloatsIntegers:    60ms    63ms   -3.3%    61ms    65ms   -6.8%
           CompareIntegers:    41ms    38ms   +7.9%    41ms    38ms   +7.2%
    CompareInternedStrings:    30ms    28ms   +5.7%    30ms    28ms   +5.0%
              CompareLongs:    24ms    22ms   +8.6%    24ms    22ms   +8.5%
            CompareStrings:    22ms    22ms   +0.3%    23ms    24ms   -6.6%
ComplexPythonFunctionCalls:    43ms    42ms   +1.2%    43ms    45ms   -5.2%
             ConcatStrings:    29ms    32ms  -11.0%    29ms    33ms  -13.6%
           CreateInstances:    45ms    45ms   +0.3%    46ms    46ms   -0.4%
        CreateNewInstances:    34ms    34ms   +0.6%    35ms    34ms   +0.7%
   CreateStringsWithConcat:    58ms    58ms   +0.1%    58ms    58ms   -0.1%
              DictCreation:    44ms    49ms  -10.5%    44ms    49ms  -10.7%
         DictWithFloatKeys:    35ms    35ms   -0.5%    35ms    35ms   -1.0%
       DictWithIntegerKeys:    28ms    28ms   -1.2%    28ms    29ms   -2.3%
        DictWithStringKeys:    26ms    27ms   -3.2%    26ms    28ms   -4.8%
                  ForLoops:    22ms    22ms   +0.4%    22ms    22ms   +0.6%
                IfThenElse:    34ms    34ms   +0.9%    34ms    34ms   +0.8%
               ListSlicing:    34ms    34ms   -0.2%    34ms    34ms   -0.1%
            NestedForLoops:    37ms    36ms   +2.1%    37ms    36ms   +2.1%
  NestedListComprehensions:    36ms    35ms   +1.4%    36ms    36ms   +1.8%
      NormalClassAttribute:    75ms    77ms   -2.5%    75ms    77ms   -2.3%
   NormalInstanceAttribute:    37ms    37ms   +2.2%    38ms    37ms   +2.5%
       PythonFunctionCalls:    37ms    36ms   +1.8%    37ms    37ms   +1.6%
         PythonMethodCalls:    50ms    47ms   +5.5%    50ms    48ms   +4.5%
                 Recursion:    61ms    61ms   -0.2%    61ms    62ms   -0.2%
              SecondImport:    35ms    36ms   -2.9%    35ms    37ms   -3.3%
       SecondPackageImport:    37ms    37ms   -0.3%    37ms    37ms   -0.4%
     SecondSubmoduleImport:    89ms    87ms   +2.1%    90ms    88ms   +1.6%
   SimpleComplexArithmetic:    23ms    23ms   +0.0%    24ms    24ms   -0.1%
    SimpleDictManipulation:    52ms    53ms   -0.7%    53ms    53ms   -0.4%
     SimpleFloatArithmetic:    25ms    25ms   -1.2%    25ms    25ms   -1.0%
  SimpleIntFloatArithmetic:    32ms    32ms   -0.3%    32ms    32ms   -1.1%
   SimpleIntegerArithmetic:    32ms    32ms   -0.6%    32ms    32ms   -0.3%
  SimpleListComprehensions:    29ms    28ms   +2.3%    30ms    29ms   +3.2%
    SimpleListManipulation:    27ms    28ms   -1.3%    28ms    28ms   -1.4%
      SimpleLongArithmetic:    22ms    22ms   +3.6%    23ms    22ms   +4.4%
                SmallLists:    38ms    38ms   -1.9%    38ms    41ms   -7.0%
               SmallTuples:    47ms    44ms   +7.8%    47ms    44ms   +7.5%
     SpecialClassAttribute:    75ms    73ms   +2.3%    75ms    73ms   +2.2%
  SpecialInstanceAttribute:    38ms    39ms   -1.8%    38ms    39ms   -2.1%
            StringMappings:    84ms    83ms   +0.8%    84ms    84ms   +0.5%
          StringPredicates:    49ms    49ms   -0.6%    49ms    49ms   -1.0%
             StringSlicing:    42ms    42ms   +0.8%    42ms    42ms   +0.7%
                 TryExcept:    25ms    25ms   -0.4%    25ms    25ms   -0.8%
                TryFinally:    31ms    31ms   +1.1%    32ms    31ms   +1.4%
            TryRaiseExcept:    12ms    12ms   -0.8%    12ms    12ms   -1.4%
              TupleSlicing:    43ms    43ms   +0.1%    43ms    43ms   -0.6%
               WithFinally:    49ms    50ms   -0.9%    49ms    50ms   -0.9%
           WithRaiseExcept:    39ms    39ms   -0.6%    39ms    40ms   -1.6%

Totals: 2011ms 2006ms +0.2% 2025ms 2035ms -0.5%

(this=pybench.dictversion, other=pybench.default)

msg257814 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-09 09:53

Using timeit to microbenchmark dict operations (constructor, setitem, delitem, clear), I get exactly the same performance or even better performance (???) with the patch.

./python -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

Original: 318 ns Patched: 318 ns

./python -m timeit 'd={i:i for i in range(216)}' 'for i in range(216): d[i]=i-1' 'for i in range(216): d[i]=i+1' 'for i in range(215): del d[i]' 'd.clear()'

Original: 19.9 ms Patched: 18.9 ms (5% faster)

msg257818 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-09 11:00

Updated patch, I forgot to include unit tests.

I added a test on integer overflow on the version. The patch adds _testcapi.dict_setversion(dict, version) and _testcapi.PY_SIZE_MAX.

msg257819 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2016-01-09 11:05

I don't understand this:

The version 0 is reserved for "missing key"

msg257820 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-09 11:09

I don't understand this:

The version 0 is reserved for "missing key"

Oh, ignore it, it's an outdated comment :-) Dictionary version is initialized to 0, it's not more "reserved".

FYI I added the comment in my first implementation which also added a version to dictionary entries.

msg257848 - (view)

Author: Brett Cannon (brett.cannon) * (Python committer)

Date: 2016-01-09 18:03

A better way to benchmark this is to run hg.python.org/benchmarks with a patched interpreter and see if that shows any performance impact.

msg257875 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-09 22:42

Brett:

A better way to benchmark this is to run hg.python.org/benchmarks with a patched interpreter and see if that shows any performance impact.

Ok. Here is the output. Can you please explain me the result? :-) Especially the "significant" value.

$ ./python.orig ../benchmarks/perf.py ./python.orig ./python.patched

INFO:root:Automatically selected timer: perf_counter INFO:root:Running ./python.patched ../benchmarks/lib3/2to3/2to3 -f all ../benchmarks/lib/2to3 INFO:root:Running ./python.patched ../benchmarks/lib3/2to3/2to3 -f all ../benchmarks/lib/2to3 1 time INFO:root:Running ./python.orig ../benchmarks/lib3/2to3/2to3 -f all ../benchmarks/lib/2to3 INFO:root:Running ./python.orig ../benchmarks/lib3/2to3/2to3 -f all ../benchmarks/lib/2to3 1 time INFO:root:Running ./python.patched ../benchmarks/performance/bm_chameleon_v2.py -n 50 --timer perf_counter INFO:root:Running ./python.orig ../benchmarks/performance/bm_chameleon_v2.py -n 50 --timer perf_counter INFO:root:Running ./python.patched ../benchmarks/performance/bm_django_v3.py -n 50 --timer perf_counter INFO:root:Running ./python.orig ../benchmarks/performance/bm_django_v3.py -n 50 --timer perf_counter INFO:root:Running ./python.patched ../benchmarks/performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle pickle INFO:root:Running ./python.orig ../benchmarks/performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle pickle INFO:root:Running ./python.patched ../benchmarks/performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle unpickle INFO:root:Running ./python.orig ../benchmarks/performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle unpickle INFO:root:Running ./python.patched ../benchmarks/performance/bm_json_v2.py -n 50 --timer perf_counter INFO:root:Running ./python.orig ../benchmarks/performance/bm_json_v2.py -n 50 --timer perf_counter INFO:root:Running ./python.patched ../benchmarks/performance/bm_json.py -n 50 --timer perf_counter json_load INFO:root:Running ./python.orig ../benchmarks/performance/bm_json.py -n 50 --timer perf_counter json_load INFO:root:Running ./python.patched ../benchmarks/performance/bm_nbody.py -n 50 --timer perf_counter INFO:root:Running ./python.orig ../benchmarks/performance/bm_nbody.py -n 50 --timer perf_counter INFO:root:Running ./python.patched ../benchmarks/performance/bm_regex_v8.py -n 50 --timer perf_counter INFO:root:Running ./python.orig ../benchmarks/performance/bm_regex_v8.py -n 50 --timer perf_counter INFO:root:Running ./python.patched ../benchmarks/performance/bm_tornado_http.py -n 100 --timer perf_counter INFO:root:Running ./python.orig ../benchmarks/performance/bm_tornado_http.py -n 100 --timer perf_counter [ 1/10] 2to3... [ 2/10] chameleon_v2... [ 3/10] django_v3... [ 4/10] fastpickle... [ 5/10] fastunpickle... [ 6/10] json_dump_v2... [ 7/10] json_load... [ 8/10] nbody... [ 9/10] regex_v8... [10/10] tornado_http...

Report on Linux smithers 4.2.8-300.fc23.x86_64 #1 SMP Tue Dec 15 16:49:06 UTC 2015 x86_64 x86_64 Total CPU cores: 8

2to3

6.825440 -> 6.996616: 1.03x slower

fastpickle

Min: 0.449710 -> 0.458054: 1.02x slower Avg: 0.451589 -> 0.583100: 1.29x slower Significant (t=-8.04) Stddev: 0.00076 -> 0.11568: 152.3886x larger

fastunpickle

Min: 0.549672 -> 0.545734: 1.01x faster Avg: 0.551284 -> 0.696920: 1.26x slower Significant (t=-6.65) Stddev: 0.00118 -> 0.15493: 130.9694x larger

json_load

Min: 0.432500 -> 0.466602: 1.08x slower Avg: 0.433771 -> 0.471205: 1.09x slower Significant (t=-10.63) Stddev: 0.00495 -> 0.02440: 4.9303x larger

nbody

Min: 0.230055 -> 0.234925: 1.02x slower Avg: 0.230985 -> 0.236441: 1.02x slower Significant (t=-7.56) Stddev: 0.00048 -> 0.00508: 10.5095x larger

The following not significant results are hidden, use -v to show them: chameleon_v2, django_v3, json_dump_v2, regex_v8, tornado_http.

msg257880 - (view)

Author: Brett Cannon (brett.cannon) * (Python committer)

Date: 2016-01-10 01:07

So min is the fastest time in a benchmark execution, average is the average across all benchmark executions, and the t-value is some statistics thing. Anything marked insignificant had such a small average difference it isn't worth reporting.

If you want to smooth out the numbers you should do a rigorous run that uses more loops per benchmark to help smooth out outliers. And if you are doing this on Linux there is a flag to measure memory usage as well.

msg257961 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-11 14:48

The PEP is now at python.org: PEP 509.

msg257992 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-11 17:37

Patch version 3 updated for the second version of the PEP 509. Main changes:

msg258138 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-13 10:38

Patch version 4: I forgot to include my patch to update test_sys. Now the full Python test suite pass.

msg258139 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-13 10:41

I'm not a big fan of pybench (it looks unstable and so not reliable), but here are results with dict_version-4.patch.

I used more loops and a lower warp factor to get more reliable tests (I hope):

./python Tools/pybench/pybench.py -f pybench.orig -w 2 -C 100 -n 25 ./python.patched Tools/pybench/pybench.py -f pybench.dictver -w 2 -C 100 -n 25 ./python Tools/pybench/pybench.py -s pybench.dictver -c pybench.orig


PYBENCH 2.1


Benchmark: pybench.dictver

Rounds: 25
Warp:   2
Timer:  time.perf_counter

Machine Details:
   Platform ID:    Linux-4.2.8-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
   Processor:      x86_64

Python:
   Implementation: CPython
   Executable:     /home/haypo/prog/python/default/python
   Version:        3.6.0a0
   Compiler:       GCC 5.3.1 20151207 (Red Hat 5.3.1-2)
   Bits:           64bit
   Build:          Jan 13 2016 11:27:53 (#default:77d24f51effc+)
   Unicode:        UCS4

Comparing with: pybench.orig

Rounds: 25
Warp:   2
Timer:  time.perf_counter

Machine Details:
   Platform ID:    Linux-4.2.8-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
   Processor:      x86_64

Python:
   Implementation: CPython
   Executable:     /home/haypo/prog/python/default/python
   Version:        3.6.0a0
   Compiler:       GCC 5.3.1 20151207 (Red Hat 5.3.1-2)
   Bits:           64bit
   Build:          Jan 13 2016 11:14:50 (#default:77d24f51effc)
   Unicode:        UCS4

Test minimum run-time average run-time this other diff this other diff

      BuiltinFunctionCalls:   230ms   229ms   +0.4%   241ms   230ms   +5.1%
       BuiltinMethodLookup:   130ms   132ms   -1.2%   135ms   134ms   +0.4%
             CompareFloats:   147ms   149ms   -1.4%   151ms   149ms   +1.2%
     CompareFloatsIntegers:   330ms   333ms   -0.8%   347ms   335ms   +3.6%
           CompareIntegers:   214ms   209ms   +2.5%   223ms   209ms   +6.5%
    CompareInternedStrings:   160ms   145ms  +10.8%   170ms   145ms  +16.9%
              CompareLongs:   121ms   120ms   +0.2%   124ms   120ms   +3.4%
            CompareStrings:   132ms   131ms   +0.7%   138ms   132ms   +4.8%
ComplexPythonFunctionCalls:   233ms   235ms   -0.7%   241ms   238ms   +1.1%
             ConcatStrings:   166ms   165ms   +0.3%   177ms   167ms   +6.0%
           CreateInstances:   240ms   247ms   -3.1%   253ms   249ms   +1.5%
        CreateNewInstances:   178ms   186ms   -4.2%   188ms   188ms   +0.1%
   CreateStringsWithConcat:   315ms   316ms   -0.5%   331ms   318ms   +4.3%
              DictCreation:   254ms   236ms   +7.8%   262ms   237ms  +10.5%
         DictWithFloatKeys:   211ms   199ms   +6.1%   219ms   201ms   +8.9%
       DictWithIntegerKeys:   171ms   163ms   +5.4%   180ms   166ms   +9.0%
        DictWithStringKeys:   163ms   142ms  +14.5%   170ms   144ms  +17.7%
                  ForLoops:   121ms   121ms   -0.3%   125ms   124ms   +0.9%
                IfThenElse:   179ms   178ms   +0.7%   185ms   178ms   +3.6%
               ListSlicing:   194ms   193ms   +0.4%   198ms   194ms   +2.2%
            NestedForLoops:   212ms   210ms   +1.2%   220ms   210ms   +4.6%
  NestedListComprehensions:   205ms   212ms   -3.3%   218ms   215ms   +1.5%
      NormalClassAttribute:   429ms   407ms   +5.5%   446ms   408ms   +9.3%
   NormalInstanceAttribute:   212ms   206ms   +2.8%   226ms   209ms   +8.0%
       PythonFunctionCalls:   208ms   210ms   -1.4%   215ms   212ms   +1.2%
         PythonMethodCalls:   275ms   253ms   +8.7%   293ms   255ms  +14.9%
                 Recursion:   333ms   328ms   +1.4%   366ms   329ms  +11.0%
              SecondImport:   190ms   188ms   +0.8%   201ms   188ms   +6.7%
       SecondPackageImport:   195ms   192ms   +1.8%   214ms   192ms  +11.7%
     SecondSubmoduleImport:   472ms   447ms   +5.7%   502ms   455ms  +10.3%
   SimpleComplexArithmetic:   141ms   138ms   +2.1%   146ms   139ms   +5.2%
    SimpleDictManipulation:   287ms   288ms   -0.3%   313ms   290ms   +8.0%
     SimpleFloatArithmetic:   138ms   137ms   +0.7%   151ms   140ms   +7.9%
  SimpleIntFloatArithmetic:   172ms   180ms   -4.3%   182ms   181ms   +0.4%
   SimpleIntegerArithmetic:   172ms   169ms   +1.9%   184ms   170ms   +8.1%
  SimpleListComprehensions:   166ms   167ms   -0.7%   179ms   173ms   +3.3%
    SimpleListManipulation:   159ms   153ms   +3.9%   166ms   155ms   +7.6%
      SimpleLongArithmetic:   118ms   117ms   +1.2%   123ms   118ms   +4.3%
                SmallLists:   210ms   204ms   +2.8%   221ms   205ms   +7.3%
               SmallTuples:   244ms   252ms   -3.1%   255ms   256ms   -0.2%
     SpecialClassAttribute:   424ms   416ms   +1.8%   442ms   419ms   +5.5%
  SpecialInstanceAttribute:   214ms   211ms   +1.3%   223ms   214ms   +4.5%
            StringMappings:   442ms   446ms   -0.8%   460ms   447ms   +2.9%
          StringPredicates:   259ms   258ms   +0.2%   281ms   260ms   +8.2%
             StringSlicing:   218ms   220ms   -1.2%   228ms   221ms   +3.3%
                 TryExcept:   130ms   135ms   -3.9%   134ms   136ms   -1.4%
                TryFinally:   173ms   171ms   +0.9%   180ms   172ms   +4.4%
            TryRaiseExcept:    62ms    63ms   -1.2%    63ms    63ms   +0.9%
              TupleSlicing:   223ms   224ms   -0.3%   251ms   232ms   +8.0%
               WithFinally:   263ms   264ms   -0.4%   280ms   265ms   +5.4%
           WithRaiseExcept:   205ms   203ms   +0.7%   215ms   204ms   +5.5%

Totals: 11040ms 10900ms +1.3% 11635ms 10991ms +5.9%

(this=pybench.dictver, other=pybench.orig)

msg258151 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-13 16:12

guard_benchmark.patch: patch adding a _testcapi.guard_benchmark(), a microbenchmark on dictionary guard. The benchmark measures the cost of checking if a dictionary key was modified.

Run the benchmark with attached guard_benchmark.py. Result on my PC:

python3 -m platform: Linux-4.2.8-300.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three

You have to modify manually _testcapi.c to choose between the 3 implementations.

guard_benchmark.patch requires the issue #26098 patch and the fat module which implements fat.GuardDict. The fat module can be found at: https://github.com/haypo/fat

msg258820 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-22 16:22

Patch version 5: a global counter is now used to set ma_version field of dictionaries. The global counter is incremented each time that a dictionary is created and each time that a dictionary is modified.

A dictionary version is now unique: two dictionaries cannot have the same version. So if a guard stores a version, the check on the version will fail if a different dictionary is used.

msg258824 - (view)

Author: Yury Selivanov (yselivanov) * (Python committer)

Date: 2016-01-22 16:40

Patch version 5: a global counter is now used to set ma_version field of dictionaries. The global counter is incremented each time that a dictionary is created and each time that a dictionary is modified.

This is great, thank you, Victor.

msg258826 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-22 16:45

This is great, thank you, Victor.

I will update the PEP 509 later for the global counter.

msg258867 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-01-23 12:50

Patch version 6: remove now unused function _testcapi.dict_set_version(). I also moved tests to test_pep509.py to make it more explicit that the implementation of the PEP 509 is not part the Python dictionary type specification, other Python implementations can choose to not implement it.

msg263414 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-04-14 15:20

The implementation is outdated: ma_version was renamed to ma_version_tag in the latest version of the PEP 509.

msg263525 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-04-15 20:49

The implementation is outdated: ma_version was renamed to ma_version_tag in the latest version of the PEP 509.

Patch version 7 is updated to the latest PEP (rebased and rename ma_version to ma_version_tag).

msg263726 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-04-19 10:24

Patch version 8:

msg263727 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-04-19 10:38

timeit microbenchmarks on dict_version-8.patch, minimum of 10 runs.

$ ./python.orig -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()' 1000000 loops, best of 3: 0.292 usec per loop $ ./python.version -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()' 1000000 loops, best of 3: 0.293 usec per loop

=> 1 nanosecond (0.3%) slower

$ ./python.orig -m timeit 'd={i:i for i in range(216)}' 'for i in range(216): d[i]=i-1' 'for i in range(216): d[i]=i+1' 'for i in range(215): del d[i]' 'd.clear()' 10 loops, best of 3: 21.2 msec per loop $ ./python.version -m timeit 'd={i:i for i in range(216)}' 'for i in range(216): d[i]=i-1' 'for i in range(216): d[i]=i+1' 'for i in range(215): del d[i]' 'd.clear()' 10 loops, best of 3: 21.3 msec per loop

=> 0.1 ms (0.5%) slower

msg263728 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-04-19 10:47

I ran again timeit microbenchmarks with CPU isolation on dict_version-8.patch, minimum of 10 runs.

-m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

-m timeit 'd={i:i for i in range(216)}' 'for i in range(216): d[i]=i-1' 'for i in range(216): d[i]=i+1' 'for i in range(215): del d[i]' 'd.clear()'

msg263729 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-04-19 10:52

pybench results on dict_version-8.patch with CPU isolation: http://haypo-notes.readthedocs.org/microbenchmark.html

As usual, I'm very skeptical on the pybench results which almost look like noise. I don't understand how my change can make any operation faster, whereas some benchmarks are faster with the patch...

Dict microbenchmarks:

       DictCreation:    38ms    36ms   +4.8%    39ms    37ms   +3.9%
  DictWithFloatKeys:    40ms    40ms   -0.8%    40ms    40ms   -0.4%
DictWithIntegerKeys:    33ms    31ms   +7.2%    33ms    31ms   +7.6%
 DictWithStringKeys:    29ms    28ms   +0.4%    29ms    29ms   +0.7%

SimpleDictManipulation: 59ms 59ms -0.4% 59ms 59ms -0.4%

Full output:

$ ./python.version Tools/pybench/pybench.py -f pybench.version $ ./python.orig Tools/pybench/pybench.py -f pybench.orig $ ./python.orig Tools/pybench/pybench.py -s pybench.version -c pybench.orig


PYBENCH 2.1


Benchmark: pybench.version

Rounds: 10
Warp:   10
Timer:  time.perf_counter

Machine Details:
   Platform ID:    Linux-4.4.4-301.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
   Processor:      x86_64

Python:
   Implementation: CPython
   Executable:     /home/haypo/prog/python/default/python.version
   Version:        3.6.0a0
   Compiler:       GCC 5.3.1 20151207 (Red Hat 5.3.1-2)
   Bits:           64bit
   Build:          Apr 19 2016 12:29:16 (#default:e281a57d5b29+)
   Unicode:        UCS4

Comparing with: pybench.orig

Rounds: 10
Warp:   10
Timer:  time.perf_counter

Machine Details:
   Platform ID:    Linux-4.4.4-301.fc23.x86_64-x86_64-with-fedora-23-Twenty_Three
   Processor:      x86_64

Python:
   Implementation: CPython
   Executable:     /home/haypo/prog/python/default/python.orig
   Version:        3.6.0a0
   Compiler:       GCC 5.3.1 20151207 (Red Hat 5.3.1-2)
   Bits:           64bit
   Build:          Apr 19 2016 12:30:36 (#default:e281a57d5b29)
   Unicode:        UCS4

Test minimum run-time average run-time this other diff this other diff

      BuiltinFunctionCalls:    49ms    50ms   -1.3%    49ms    50ms   -1.2%
       BuiltinMethodLookup:    26ms    26ms   -0.8%    26ms    27ms   -1.2%
             CompareFloats:    29ms    30ms   -2.0%    29ms    30ms   -1.9%
     CompareFloatsIntegers:    39ms    39ms   +1.4%    39ms    39ms   +1.5%
           CompareIntegers:    43ms    43ms   -1.1%    43ms    43ms   -1.1%
    CompareInternedStrings:    28ms    28ms   -0.2%    28ms    28ms   -0.3%
              CompareLongs:    25ms    25ms   +2.8%    25ms    25ms   +2.9%
            CompareStrings:    26ms    27ms   -0.8%    26ms    27ms   -1.6%
ComplexPythonFunctionCalls:    44ms    44ms   -1.6%    44ms    45ms   -1.6%
             ConcatStrings:    35ms    33ms   +6.4%    35ms    33ms   +6.1%
           CreateInstances:    49ms    48ms   +2.6%    50ms    49ms   +1.8%
        CreateNewInstances:    37ms    36ms   +2.5%    37ms    36ms   +2.2%
   CreateStringsWithConcat:    65ms    63ms   +3.3%    66ms    64ms   +2.9%
              DictCreation:    38ms    36ms   +4.8%    39ms    37ms   +3.9%
         DictWithFloatKeys:    40ms    40ms   -0.8%    40ms    40ms   -0.4%
       DictWithIntegerKeys:    33ms    31ms   +7.2%    33ms    31ms   +7.6%
        DictWithStringKeys:    29ms    28ms   +0.4%    29ms    29ms   +0.7%
                  ForLoops:    25ms    25ms   -0.4%    26ms    26ms   -0.3%
                IfThenElse:    37ms    35ms   +3.3%    37ms    36ms   +3.0%
               ListSlicing:    39ms    38ms   +0.3%    39ms    39ms   +0.0%
            NestedForLoops:    40ms    40ms   +0.1%    40ms    40ms   -0.0%
  NestedListComprehensions:    41ms    42ms   -0.2%    42ms    42ms   -0.9%
      NormalClassAttribute:    82ms    78ms   +4.0%    82ms    79ms   +3.8%
   NormalInstanceAttribute:    43ms    42ms   +1.9%    44ms    43ms   +1.8%
       PythonFunctionCalls:    40ms    42ms   -5.0%    41ms    43ms   -4.8%
         PythonMethodCalls:    51ms    52ms   -1.1%    51ms    52ms   -0.8%
                 Recursion:    69ms    72ms   -4.8%    69ms    73ms   -4.7%
              SecondImport:    37ms    38ms   -0.1%    37ms    38ms   -0.1%
       SecondPackageImport:    40ms    39ms   +1.9%    40ms    39ms   +1.6%
     SecondSubmoduleImport:    96ms    97ms   -1.0%    96ms    97ms   -1.1%
   SimpleComplexArithmetic:    28ms    27ms   +1.7%    29ms    28ms   +1.9%
    SimpleDictManipulation:    59ms    59ms   -0.4%    59ms    59ms   -0.4%
     SimpleFloatArithmetic:    27ms    27ms   +2.2%    27ms    27ms   +2.3%
  SimpleIntFloatArithmetic:    31ms    31ms   +1.6%    31ms    31ms   +1.7%
   SimpleIntegerArithmetic:    31ms    31ms   +0.8%    31ms    31ms   +0.5%
  SimpleListComprehensions:    33ms    33ms   +0.2%    33ms    33ms   +0.4%
    SimpleListManipulation:    31ms    31ms   -0.0%    32ms    32ms   -1.5%
      SimpleLongArithmetic:    21ms    22ms   -2.5%    22ms    22ms   -1.8%
                SmallLists:    43ms    44ms   -3.3%    43ms    44ms   -3.5%
               SmallTuples:    51ms    53ms   -2.9%    52ms    54ms   -3.6%
     SpecialClassAttribute:    80ms    81ms   -1.0%    80ms    81ms   -1.1%
  SpecialInstanceAttribute:    42ms    42ms   -0.5%    42ms    42ms   -0.7%
            StringMappings:    87ms    86ms   +0.8%    87ms    86ms   +0.9%
          StringPredicates:    53ms    53ms   +0.3%    53ms    53ms   +0.2%
             StringSlicing:    48ms    49ms   -1.3%    48ms    49ms   -0.8%
                 TryExcept:    25ms    25ms   +0.6%    25ms    25ms   +0.7%
                TryFinally:    35ms    35ms   +1.1%    36ms    35ms   +0.7%
            TryRaiseExcept:    13ms    13ms   -0.5%    13ms    13ms   -0.6%
              TupleSlicing:    46ms    46ms   +0.2%    48ms    50ms   -3.2%
               WithFinally:    54ms    53ms   +0.8%    54ms    54ms   +0.7%
           WithRaiseExcept:    44ms    43ms   +2.8%    44ms    43ms   +2.7%

Totals: 2156ms 2150ms +0.3% 2171ms 2169ms +0.1%

(this=pybench.version, other=pybench.orig)

msg263730 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2016-04-19 11:01

On 19.04.2016 12:52, STINNER Victor wrote:

As usual, I'm very skeptical on the pybench results which almost look like noise. I don't understand how my change can make any operation faster, whereas some benchmarks are faster with the patch...

This can easily happen as a result of different memory layout, but is very much dependent on the machine architecture, CPU, memory type, etc.

Dict microbenchmarks:

       DictCreation:    38ms    36ms   +4.8%    39ms    37ms   +3.9%
  DictWithFloatKeys:    40ms    40ms   -0.8%    40ms    40ms   -0.4%
DictWithIntegerKeys:    33ms    31ms   +7.2%    33ms    31ms   +7.6%
 DictWithStringKeys:    29ms    28ms   +0.4%    29ms    29ms   +0.7%

SimpleDictManipulation: 59ms 59ms -0.4% 59ms 59ms -0.4%

Only dict creation and the integer keys benchmark results are relevant.

Could you perhaps check what's causing these slowdowns ?

msg263731 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-04-19 11:11

Could you perhaps check what's causing these slowdowns ?

It's obvious, no? My patch causes the slowdown.

On a timeit microbenchmark, I don't see such slowdown. That's also why I suspect that pybench is unstable.

python3.6 -m timeit '{}' says 105 ns with and without the patch.

python3.6 -m timeit 'd={}; d[1]=1; d[2]=2; d[3]=3; d[4]=4; d[5]=5; d[6]=6; d[7]=7; d[8]=8; d[9]=9; d[10]=10' says 838 ns with and without the patch.

I have to "cheat": I run timeit enough times until I see the "minimum".

msg263732 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2016-04-19 11:25

On 19.04.2016 13:11, STINNER Victor wrote:

STINNER Victor added the comment:

Could you perhaps check what's causing these slowdowns ?

It's obvious, no? My patch causes the slowdown.

Well, yes, of course :-) I meant whether there's anything you can do about those slowdowns.

On a timeit microbenchmark, I don't see such slowdown. That's also why I suspect that pybench is unstable.

python3.6 -m timeit '{}' says 105 ns with and without the patch.

python3.6 -m timeit 'd={}; d[1]=1; d[2]=2; d[3]=3; d[4]=4; d[5]=5; d[6]=6; d[7]=7; d[8]=8; d[9]=9; d[10]=10' says 838 ns with and without the patch.

I have to "cheat": I run timeit enough times until I see the "minimum".

Those operations are too fast for timeit. The overhead associated with looping is much larger than the time it takes to run the operation itself. That's why in pybench I put the operations into blocks of repeated statements. The interpreter then doesn't spend time on branching when going from one statement execution to the next (inside those blocks) and you get closer to the real runtime of the operation you're trying to measure.

msg263787 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-04-19 23:21

Results of the CPython benchmark suite on dict_version-8.patch.

IMHO regex_v8 can be ignored, this benchmark is unstable (issue #26275).

Original python: ../pep509/python 3.6.0a0 (default:3a9b47b062b9, Apr 19 2016, 16:23:15) [GCC 5.3.1 20151207 (Red Hat 5.3.1-2)]

Patched python: ../pep509/python 3.6.0a0 (default:96f61aab2c6e, Apr 19 2016, 15:21:15) [GCC 5.3.1 20151207 (Red Hat 5.3.1-2)]

$ python3 -u perf.py --rigorous -b all ../default.copy/python ../pep509/python

INFO:root:Automatically selected timer: perf_counter (...) Report on Linux smithers 4.4.4-301.fc23.x86_64 #1 SMP Fri Mar 4 17:42:42 UTC 2016 x86_64 x86_64 Total CPU cores: 8

2to3

Min: 6.714927 -> 6.856992: 1.02x slower Avg: 6.726773 -> 6.984359: 1.04x slower Significant (t=-5.16) Stddev: 0.01160 -> 0.11111: 9.5816x larger

call_method

Min: 0.312567 -> 0.323260: 1.03x slower Avg: 0.313399 -> 0.323821: 1.03x slower Significant (t=-284.62) Stddev: 0.00042 -> 0.00048: 1.1430x larger

call_simple

Min: 0.242091 -> 0.228922: 1.06x faster Avg: 0.243613 -> 0.229294: 1.06x faster Significant (t=2134.23) Stddev: 0.00010 -> 0.00005: 1.9124x smaller

etree_generate

Min: 0.255097 -> 0.265558: 1.04x slower Avg: 0.256947 -> 0.267478: 1.04x slower Significant (t=-67.89) Stddev: 0.00111 -> 0.00108: 1.0250x smaller

etree_parse

Min: 0.281726 -> 0.290976: 1.03x slower Avg: 0.283626 -> 0.292642: 1.03x slower Significant (t=-62.59) Stddev: 0.00109 -> 0.00094: 1.1540x smaller

etree_process

Min: 0.217164 -> 0.229552: 1.06x slower Avg: 0.219401 -> 0.231242: 1.05x slower Significant (t=-75.05) Stddev: 0.00112 -> 0.00111: 1.0053x smaller

fannkuch

Min: 1.012875 -> 0.985196: 1.03x faster Avg: 1.014760 -> 0.992104: 1.02x faster Significant (t=46.44) Stddev: 0.00287 -> 0.00395: 1.3778x larger

float

Min: 0.259226 -> 0.251918: 1.03x faster Avg: 0.266530 -> 0.260176: 1.02x faster Significant (t=10.69) Stddev: 0.00403 -> 0.00437: 1.0836x larger

json_load

Min: 0.430602 -> 0.433593: 1.01x slower Avg: 0.431278 -> 0.486924: 1.13x slower Significant (t=-24.86) Stddev: 0.00045 -> 0.02238: 49.8974x larger

normal_startup

Min: 0.319092 -> 0.326082: 1.02x slower Avg: 0.320093 -> 0.326842: 1.02x slower Significant (t=-70.52) Stddev: 0.00067 -> 0.00069: 1.0283x larger

regex_effbot

Min: 0.048969 -> 0.048313: 1.01x faster Avg: 0.050019 -> 0.048415: 1.03x faster Significant (t=31.41) Stddev: 0.00051 -> 0.00006: 8.8152x smaller

regex_v8

Min: 0.051040 -> 0.043412: 1.18x faster Avg: 0.051325 -> 0.043577: 1.18x faster Significant (t=45.60) Stddev: 0.00120 -> 0.00120: 1.0013x larger

richards

Min: 0.157625 -> 0.161126: 1.02x slower Avg: 0.158952 -> 0.162519: 1.02x slower Significant (t=-34.92) Stddev: 0.00073 -> 0.00071: 1.0351x smaller

silent_logging

Min: 0.070547 -> 0.069902: 1.01x faster Avg: 0.072246 -> 0.069934: 1.03x faster Significant (t=65.51) Stddev: 0.00035 -> 0.00003: 11.9828x smaller

The following not significant results are hidden, use -v to show them: call_method_slots, call_method_unknown, chameleon_v2, chaos, django_v3, etree_iterparse, fastpickle, fastunpickle, formatted_logging, go, hexiom2, json_dump_v2, mako_v2, meteor_contest, nbody, nqueens, pathlib, pickle_dict, pickle_list, pidigits, raytrace, regex_compile, simple_logging, spectral_norm, startup_nosite, telco, tornado_http, unpack_sequence, unpickle_list.

msg275132 - (view)

Author: Roundup Robot (python-dev) (Python triager)

Date: 2016-09-08 20:06

New changeset d43f819caea7 by Victor Stinner in branch 'default': Add a new private version to the builtin dict type https://hg.python.org/cpython/rev/d43f819caea7

msg275133 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-09-08 20:07

The PEP 509 has been approved by Guido, I just pushed the implementation.

msg275566 - (view)

Author: Roundup Robot (python-dev) (Python triager)

Date: 2016-09-10 04:52

New changeset c3776dd858f0 by Victor Stinner in branch 'default': Try to fix sizeof unit tests on dict https://hg.python.org/cpython/rev/c3776dd858f0

History

Date

User

Action

Args

2022-04-11 14:58:26

admin

set

github: 70246

2016-09-10 04:52:35

python-dev

set

messages: +

2016-09-08 20:07:22

vstinner

set

status: open -> closed
resolution: fixed
messages: +

2016-09-08 20:06:39

python-dev

set

nosy: + python-dev
messages: +

2016-05-15 10:32:20

jstasiak

set

nosy: + jstasiak

2016-04-19 23:21:52

vstinner

set

messages: +

2016-04-19 11:25:31

lemburg

set

messages: +

2016-04-19 11:11:55

vstinner

set

messages: +

2016-04-19 11:01:23

lemburg

set

nosy: + lemburg
messages: +

2016-04-19 10:52:21

vstinner

set

messages: +

2016-04-19 10:47:41

vstinner

set

messages: +

2016-04-19 10:38:56

vstinner

set

messages: +

2016-04-19 10:24:07

vstinner

set

files: + dict_version-8.patch

messages: +

2016-04-15 20:49:10

vstinner

set

files: + dict_version-7.patch

messages: +

2016-04-14 15:20:31

vstinner

set

messages: +

2016-01-27 18:34:44

yselivanov

link

issue26219 dependencies

2016-01-23 12:50:13

vstinner

set

files: + dict_version-6.patch

messages: +

2016-01-22 16:45:21

vstinner

set

messages: +

2016-01-22 16:40:44

yselivanov

set

messages: +

2016-01-22 16:22:09

vstinner

set

files: + dict_version-5.patch

messages: +

2016-01-18 20:50:01

r.david.murray

set

nosy: + r.david.murray

2016-01-14 20:24:25

yselivanov

set

nosy: + yselivanov

2016-01-13 16:12:52

vstinner

set

files: + guard_benchmark.py

2016-01-13 16:12:34

vstinner

set

files: + guard_benchmark.patch

messages: +

2016-01-13 10:41:19

vstinner

set

messages: +

2016-01-13 10:38:30

vstinner

set

files: + dict_version-4.patch

messages: +

2016-01-11 17:37:56

vstinner

set

files: + dict_version-3.patch

messages: +
versions: + Python 3.6, - Python 2.7

2016-01-11 14:48:17

vstinner

set

messages: +

2016-01-11 14:47:57

vstinner

set

title: Add dict.__version__ read-only property -> PEP 509: Add ma_version to PyDictObject

2016-01-10 01:07:19

brett.cannon

set

messages: +

2016-01-09 22:42:34

vstinner

set

messages: +
versions: + Python 2.7, - Python 3.6

2016-01-09 18:03:02

brett.cannon

set

nosy: + brett.cannon
messages: +

2016-01-09 11:09:25

vstinner

set

messages: +

2016-01-09 11:05:46

pitrou

set

nosy: + pitrou
messages: +

2016-01-09 11:00:03

vstinner

set

files: + dict_version-2.patch

messages: +

2016-01-09 09:53:06

vstinner

set

messages: +

2016-01-09 09:34:12

vstinner

set

messages: +

2016-01-09 09:30:48

vstinner

create