Issue 26049: Poor performance when reading large xmlrpc data (original) (raw)

Created on 2016-01-08 13:36 by pokoli, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
python27.patch pokoli,2016-01-08 14:32 review
default.patch pokoli,2016-01-08 16:05 review
default.patch ced,2016-01-09 10:26 review
client.py ced,2016-02-28 09:45
server.py ced,2016-02-28 09:46
Messages (16)
msg257756 - (view) Author: Sergi Almacellas Abellana (pokoli) * Date: 2016-01-08 13:36
By default, the python xmlrpclib parser reads data in chunks of 1024 bytes [1], which leads to a lot of data concatenation when reading large responses, and concatenation is very slow in Python. Increasing the chunk size from 1024 bytes to a higher value improves performance. On the same machine, we tested with 20MB of data and got the following results:

Chunks of 1024: 1min 48.933491sec
Chunks of 10 * 1024 * 1024: 0.245282sec

We have chosen 10 * 1024 * 1024, as it is the same value used in We have done our tests on python2.7, but the same code exists on the default branch [2] (and on the 3.x branches too [3][4][5][6]), so I believe all versions are affected. I can work on a patch if you think this change is acceptable. IMHO it is logical to allow the developer to customize the chunk size instead of using a hard-coded one.

[1] https://hg.python.org/cpython/file/2.7/Lib/xmlrpclib.py#l1479
[2] https://hg.python.org/cpython/file/default/Lib/xmlrpc/client.py#l1310
[3] https://hg.python.org/cpython/file/3.5/Lib/xmlrpc/client.py#l1310
[4] https://hg.python.org/cpython/file/3.4/Lib/xmlrpc/client.py#l1324
[5] https://hg.python.org/cpython/file/3.3/Lib/xmlrpc/client.py#l1316
[6] https://hg.python.org/cpython/file/3.2/Lib/xmlrpc/client.py#l1301
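The read loop being discussed can be sketched as follows. This is a minimal stand-in, not the actual xmlrpclib code: `read_response` and `parser_feed` are hypothetical names, and an in-memory stream replaces the HTTP response. It shows the pattern of reading fixed-size chunks and feeding each to the parser, with the chunk size as a parameter instead of the hard-coded 1024.

```python
import io

def read_response(stream, parser_feed, chunk_size=1024):
    # Read the response in fixed-size chunks, feeding each chunk to the
    # parser. In xmlrpclib the equivalent loop hard-codes chunk_size=1024,
    # which is what this issue proposes to change.
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        parser_feed(data)

# Usage sketch: collect the chunks from a 1 MB in-memory "response".
chunks = []
read_response(io.BytesIO(b"x" * (1 << 20)), chunks.append, chunk_size=64 * 1024)
print(sum(map(len, chunks)))  # 1048576 bytes read in 16 chunks
```

With a larger `chunk_size`, far fewer feed calls (and intermediate buffers) are needed for the same payload.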
msg257757 - (view) Author: Cédric Krier (ced) * Date: 2016-01-08 13:52
I don't think it is necessary to allow customizing the chunk size. Python should just provide a good default value that works on all platforms.
msg257759 - (view) Author: Sergi Almacellas Abellana (pokoli) * Date: 2016-01-08 14:32
I attach a patch fixing this on the default branch.
msg257760 - (view) Author: Sergi Almacellas Abellana (pokoli) * Date: 2016-01-08 14:32
And also another patch for the 2.7 branch.
msg257764 - (view) Author: Sergi Almacellas Abellana (pokoli) * Date: 2016-01-08 16:05
I added a new patch which addresses the review comments.
msg257774 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-01-08 19:14
I believe performance enhancements are, by default, limited to 'default', but this one might be a good candidate for backport.
msg257811 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-09 09:38
Reading with a 10 MB limit allocates a 10 MB buffer. It may be better to start with a 1024-byte limit and double it on every iteration.
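The doubling strategy suggested here can be sketched like this (a minimal illustration with hypothetical names, not the patch itself): start small so short responses never allocate a large buffer, and grow the chunk size geometrically so large responses quickly reach an efficient size.

```python
import io

def read_doubling(stream, feed, start=1024, limit=10 * 1024 * 1024):
    # Start with a small chunk and double it on each iteration,
    # capped at `limit`, feeding each chunk to the parser.
    size = start
    while True:
        data = stream.read(size)
        if not data:
            break
        feed(data)
        size = min(size * 2, limit)

# Usage sketch: a 5000-byte response is read as 1024 + 2048 + 1928 bytes.
parts = []
read_doubling(io.BytesIO(b"y" * 5000), parts.append)
print([len(p) for p in parts])  # [1024, 2048, 1928]
```

A 20 MB response needs only about 15 reads this way, while a 100-byte response still allocates just 1 KB up front.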
msg257815 - (view) Author: Cédric Krier (ced) * Date: 2016-01-09 10:00
Wouldn't it be better to just call stream.read() without any argument? Because HTTPResponse will call _safe_read with just the length from the header.
msg257816 - (view) Author: Cédric Krier (ced) * Date: 2016-01-09 10:26
Answering myself: it is better to read by chunks so that the parser is also fed by chunks. So here is a version of the patch which doubles the chunk size on every loop iteration.
msg260444 - (view) Author: Cédric Krier (ced) * Date: 2016-02-18 09:33
ping
msg260767 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-24 08:24
Could you please provide a microbenchmark that exposes the benefit of this optimization?
msg260768 - (view) Author: Cédric Krier (ced) * Date: 2016-02-24 08:39
Is there already an infrastructure in place for such a microbenchmark?
msg260791 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-24 11:13
There is no special infrastructure. You can just provide two files: one script implementing a simple server that produces large output, and another implementing a client that makes a number of requests and measures the time. Of course, you can implement this in one file using multiprocessing if you prefer.
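A single-file benchmark of the kind described here could look like the sketch below (my own illustration, not the client.py/server.py attached later; the 1 MB payload, port choice, and request count are arbitrary). It runs an xmlrpc server on a background thread and times repeated calls that return a large payload.

```python
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# 1 MB payload returned by every call (arbitrary size for illustration).
PAYLOAD = "x" * (1024 * 1024)

# Server on an ephemeral loopback port, serving in a daemon thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda: PAYLOAD, "get_data")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client: make several requests and measure the total time.
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % port)
start = time.perf_counter()
for _ in range(5):
    data = proxy.get_data()
elapsed = time.perf_counter() - start
server.shutdown()

print("received %d bytes per call, %.3fs total" % (len(data), elapsed))
```

Running the same script against a patched and an unpatched xmlrpc.client would then expose any difference in response-parsing time.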
msg260970 - (view) Author: Cédric Krier (ced) * Date: 2016-02-28 09:46
Here are the client/server scripts. I don't measure a big performance improvement with them. I think the improvement measured in is linked to the way xmlrpclib is overridden in Tryton.
msg260976 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-02-28 13:36
Thanks Cédric. There is no improvement for the first request, but for repeated requests there is a small (about 5%) improvement. This is too small.
msg260978 - (view) Author: Cédric Krier (ced) * Date: 2016-02-28 13:52
One advantage I see is when xmlrpclib is overridden to use another marshaller which cannot be fed chunk by chunk. Reducing the number of calls to feed will then have a bigger impact. But I don't know if this is enough for Python.
History
Date User Action Args
2022-04-11 14:58:25 admin set github: 70237
2016-05-18 14:01:38 Oscar Andres Alvarez Montero set nosy: + Oscar Andres Alvarez Montero
2016-02-28 13:52:38 ced set messages: +
2016-02-28 13:36:43 serhiy.storchaka set messages: +
2016-02-28 09:46:30 ced set files: + server.py, messages: +
2016-02-28 09:45:02 ced set files: + client.py
2016-02-24 11:13:19 serhiy.storchaka set messages: +
2016-02-24 08:39:21 ced set messages: +
2016-02-24 08:24:07 serhiy.storchaka set messages: +
2016-02-18 09:33:44 ced set messages: +
2016-01-22 09:33:07 resteve set nosy: + resteve
2016-01-16 18:46:38 serhiy.storchaka set stage: patch review
2016-01-09 10:26:45 ced set files: + default.patch, messages: +
2016-01-09 10:00:35 ced set messages: +
2016-01-09 09:38:04 serhiy.storchaka set nosy: + serhiy.storchaka, messages: +
2016-01-08 19:14:55 terry.reedy set nosy: + loewis, terry.reedy, messages: +, versions: + Python 3.6
2016-01-08 16:05:49 pokoli set files: - default.patch
2016-01-08 16:05:39 pokoli set files: + default.patch, messages: +
2016-01-08 14:32:54 pokoli set files: + python27.patch, messages: +
2016-01-08 14:32:35 pokoli set files: + default.patch, keywords: + patch, messages: +
2016-01-08 13:52:28 ced set type: performance, messages: +, nosy: + ced
2016-01-08 13:36:38 pokoli create