[Python-Dev] PEP 414 - some numbers from the Django port

Vinay Sajip vinay_sajip at yahoo.co.uk
Sat Mar 3 03:28:55 CET 2012

Previous message: [Python-Dev] odd "tuple does not support assignment" confusion...
Next message: [Python-Dev] PEP 414 - some numbers from the Django port
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

PEP 414 mentions the use of function wrappers and talks about both their obtrusiveness and performance impact on Python code. In the Django Python 3 port, I've used unicode_literals, and hence have no u prefixes in the ported code, and use a function wrapper to adorn native strings where they are needed.
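For readers unfamiliar with the approach, a native string marker of this kind might look roughly like the sketch below. The name `native` and its `encoding` parameter are assumptions made for illustration; they are not taken from the Django port itself.

```python
from __future__ import unicode_literals

import sys


def native(s, encoding="ascii"):
    """Hypothetical native-string marker (illustrative name, not Django's).

    Returns `s` as the interpreter's native `str` type: a byte string
    on Python 2.x and a text string on Python 3.x.
    """
    if isinstance(s, str):
        # Already the native type on this interpreter.
        return s
    if sys.version_info[0] >= 3:
        # Python 3: anything else reaching here is assumed to be bytes.
        return s.decode(encoding)
    # Python 2 with unicode_literals: literals are unicode, so encode.
    return s.encode(encoding)


# Typical call site: an API that insists on the native str type.
header_name = native("Content-Type")
```

With `unicode_literals` in effect, plain literals need no prefix at all, and only the few call sites that require a native string carry the wrapper.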

Though the port is still work in progress, it passes all tests on 2.x and 3.x with the SQLite adapter, with only a small number skipped specifically during the porting exercise (generally due to representational differences).

I'd like to share some numbers from this port to see what people here think about them.

Firstly, on obtrusiveness: Out of a total of 1872 source files, the native string marker appears in only 30 files - 18 files in Django itself, and 12 files in the test suite. This is less than 2% of files (about 1.6%), so the native string markers are not especially invasive when looking at code. There are only 76 lines in the ported Django which contain native string markers.
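Counts of this kind could be gathered with a short script along the following lines. This is only a sketch: the marker name `n`, the helper names, and the regular expression are assumptions, and the pattern is a rough textual match rather than a real parse of the source.

```python
import os
import re


def marker_stats(sources, marker="n"):
    """Return (total files, files using the marker, lines using the marker).

    `sources` maps a file path to that file's text. The marker is
    matched as a call such as n(...); the name "n" is hypothetical.
    """
    pattern = re.compile(r"\b" + re.escape(marker) + r"\(")
    files_with_marker = 0
    lines_with_marker = 0
    for text in sources.values():
        hits = sum(1 for line in text.splitlines() if pattern.search(line))
        if hits:
            files_with_marker += 1
            lines_with_marker += hits
    return len(sources), files_with_marker, lines_with_marker


def read_tree(root, suffix=".py"):
    """Read every file under `root` whose name ends with `suffix`."""
    sources = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    sources[path] = fh.read()
    return sources


# Share of files affected, using the figures quoted above.
share = 100.0 * 30 / 1872
```

Calling `marker_stats(read_tree("path/to/checkout"))` would give the three counts for a source tree; the path here is a placeholder.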

Secondly, on performance. I ran the following steps 6 times:

1. Run the test suite on unported Django using Python 2.7.2 ("vanilla")
2. Run the test suite on the ported Django using Python 2.7.2 ("ported")
3. Run the test suite on the ported Django using Python 3.2.2 ("ported3")

Django skips some tests because dependencies aren't installed (e.g. PIL for Python 3.2). The raw numbers, in seconds elapsed for the test run, are given below:

vanilla (4659 tests): 468.586 486.231 467.584 464.916 480.530 475.457
ported  (4655 tests): 467.350 480.902 479.276 478.748 478.115 486.044
ported3 (4609 tests): 463.161 470.423 463.833 448.097 456.727 504.402

If we allow for the different numbers of tests run by dividing by the number of tests and multiplying by 100, we get:

vanilla-weighted: 10.057 10.436 10.036 9.979 10.314 10.205
ported-weighted:  10.040 10.331 10.296 10.285 10.271 10.441
ported3-weighted: 10.049 10.207 10.064 9.722 9.909 10.944
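The weighting step is simple arithmetic, and can be sketched as follows using the raw timings and test counts given above. The names `RAW` and `weighted` are illustrative only; the results should agree with the weighted figures above to within rounding in the last digit.

```python
# Test counts and elapsed seconds for each configuration, as reported.
RAW = {
    "vanilla": (4659, [468.586, 486.231, 467.584, 464.916, 480.530, 475.457]),
    "ported": (4655, [467.350, 480.902, 479.276, 478.748, 478.115, 486.044]),
    "ported3": (4609, [463.161, 470.423, 463.833, 448.097, 456.727, 504.402]),
}


def weighted(elapsed, tests):
    """Seconds per 100 tests for each run, rounded to three places."""
    return [round(100.0 * seconds / tests, 3) for seconds in elapsed]


WEIGHTED = {name: weighted(runs, tests) for name, (tests, runs) in RAW.items()}

for name in ("vanilla", "ported", "ported3"):
    print(name + "-weighted:", " ".join("%.3f" % v for v in WEIGHTED[name]))
```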

If I run these through ministat, it tells me there is no significant difference in these data sets, with a 95% confidence level:

$ ministat -w 74 vanilla-weighted ported-weighted ported3-weighted
x vanilla-weighted
+ ported-weighted
* ported3-weighted
[ASCII distribution plot garbled in the archived copy]
    N           Min           Max        Median           Avg        Stddev
x   6         9.979        10.436        10.205     10.171167    0.17883782
+   6         10.04        10.441        10.296     10.277333    0.13148485
No difference proven at 95.0% confidence
*   6         9.722        10.944        10.064     10.149167    0.42250274
No difference proven at 95.0% confidence
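For anyone without ministat to hand, the comparison can be approximated in Python. The sketch below assumes a two-sample Student's t-test with pooled variance, which is my understanding of what ministat does; the critical value 2.228 is the standard two-sided 95% figure for 10 degrees of freedom, and the function names are illustrative.

```python
import math
import statistics

VANILLA = [10.057, 10.436, 10.036, 9.979, 10.314, 10.205]
PORTED = [10.040, 10.331, 10.296, 10.285, 10.271, 10.441]
PORTED3 = [10.049, 10.207, 10.064, 9.722, 9.909, 10.944]

# Two-sided 95% critical value of Student's t for 6 + 6 - 2 = 10
# degrees of freedom (from standard t tables).
T_CRIT_95_DF10 = 2.228


def pooled_t(a, b):
    """Student's t statistic for two samples, using pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / std_err


def differs(a, b, t_crit=T_CRIT_95_DF10):
    """True if the sample means differ at the chosen confidence level."""
    return abs(pooled_t(a, b)) > t_crit


for label, sample in (("ported", PORTED), ("ported3", PORTED3)):
    verdict = "difference" if differs(VANILLA, sample) else "no difference proven"
    print("%s vs vanilla: t = %.3f, %s" % (label, pooled_t(VANILLA, sample), verdict))
```

Note that the hard-coded critical value only applies to two samples of six runs each; other sample sizes would need a different table entry.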

So, looking at a large project in a relevant problem domain, unicode_literals and native string markers would appear not to adversely impact readability or performance.

Your comments would be appreciated.

Regards,

Vinay Sajip
