Issue 21790: Change blocksize in http.client to the value of resource.getpagesize (original) (raw)
Created on 2014-06-17 15:40 by demian.brecht, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (8)
Author: Demian Brecht (demian.brecht) *
Date: 2014-06-17 15:40
When sending data, the blocksize is currently hardcoded to 8192. It should likely be set to the value of resource.getpagesize().
Author: PCManticore (Claudiu.Popa) *
Date: 2014-06-17 20:47
resource module is available only on Unix.
Author: Demian Brecht (demian.brecht) *
Date: 2014-06-20 14:49
Updated to mmap.PAGESIZE, which seems to be available on all systems.
Author: STINNER Victor (vstinner) *
Date: 2014-06-20 15:03
When sending data, the blocksize is currently hardcoded to 8192.
Yeah, same value for io.DEFAULT_BUFFER_SIZE.
It should likely be set to the value of resource.getpagesize().
Could you please elaborate? A page size is 4096 bytes on my Linux. So your page double the number of calls to read(), right?
shutil.copyfileobj() uses a buffer of 16 KB by default.
See also the issue #21679 which adds a private copy of the stat().st_blksize attribute in a FileIO object.
Author: Demian Brecht (demian.brecht) *
Date: 2014-06-20 19:56
I very well could be missing something here (and, admittedly, my OS knowledge is rusty at best), but for the most part, page sizes are 4096, except for SPARC, which is 8192.
So your page double the number of calls to read(), right?
No. read() is called until EOF. I'm assuming that 8192 may have been used to accommodate worst case architecture?
I'd have to dig through the C side of things (which I haven't done yet) to see what's going on down at that level, but my assumption is that it's allocating 8192 bytes for each read. Should EOF be reached with <= 1 page filled, it'd result in a wasted page.
Far from the end of the world, but just something I noticed in passing.
Author: Clay Gerrard (Clay Gerrard)
Date: 2017-05-01 18:13
I don't think the default is bad/wrong so much as it's an inflexible magic number.
I think blocksize should be parameterized as an attribute on the connection - or at least a class attribute or module level constant so we can monkey patch it without having to create a subclass and redefine the entire send method (!?).
Not having a good string to pull at here is causing a non-trivial amount of pain:
https://github.com/sigmavirus24/requests-toolbelt/pull/84/files https://github.com/sigmavirus24/requests-toolbelt/issues/75 https://webcache.googleusercontent.com/search?q=cache:WPSngjyUzg4J:https://technology.jana.com/2015/12/30/dont-use-python-httplib-or-any-library-that-uses-it-to-transfer-large-files-especially-to-china/+&cd=5&hl=en&ct=clnk&gl=us https://bugs.launchpad.net/python-swiftclient/+bug/1671621
Author: Martin Panter (martin.panter) *
Date: 2017-05-02 01:26
For plain-text (non-SSL) HTTP uploads, perhaps using “socket.sendfile” would help: Issue 13559.
Another idea would be to expose and document the low-level socket (or SSL wrapper) and let the user copy data with whatever chunk size they desire.
Author: Martin Panter (martin.panter) *
Date: 2017-11-05 23:28
Issue 31945 proposes adding a “blocksize” parameter to HTTPConnection objects, so I suggest to closing in favour of that one.
History
Date
User
Action
Args
2022-04-11 14:58:05
admin
set
github: 65989
2017-11-08 16:17:02
berker.peksag
set
status: open -> closed
stage: resolved
2017-11-05 23:28:20
martin.panter
set
superseder: Configurable blocksize in HTTP(S)Connection
resolution: rejected
messages: +
2017-05-02 01:26:19
martin.panter
set
nosy: + martin.panter
messages: +
2017-05-01 18:13:15
Clay Gerrard
set
nosy: + Clay Gerrard
messages: +
2014-06-20 19:56:49
demian.brecht
set
messages: +
2014-06-20 15:03:11
vstinner
set
nosy: + vstinner
messages: +
2014-06-20 14:49:06
demian.brecht
set
files: + issue21790.patch
messages: +
2014-06-17 20:47:43
Claudiu.Popa
set
nosy: + Claudiu.Popa
messages: +
2014-06-17 15:40:24
demian.brecht
create