Issue 18720: Switch suitable constants in the socket module to IntEnum (original) (raw)
Created on 2013-08-12 21:10 by eli.bendersky, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Messages (30)
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-12 21:10
As part of original plan and since issue #18264 has been resolved, it's time to dust off some old patches I have for the socket.* module. The socket.AF_* and socket.SOCK_* constants are good candidates for IntEnum conversion.
I'm attaching an initial patch that handles socket.AF_* (as a proof-of-concept; socket.SOCK_* should get identical treatment); it only touches Lib/socket.py and all regrtest tests pass without changes. Result:
import socket socket.AF_INET <AddressFamily.AF_INET: 2> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.family <AddressFamily.AF_INET: 2> '{}'.format(s.family) 'AddressFamily.AF_INET'
The code in the patch is pretty well commented, and I also want to point out a couple of decision points that came up:
- The underlying socketmodule.c has no idea of IntEnums, and neither IMHO it should. These constants are essentially one-way, going from Python into C. They are only exposed back through read-only accessors (e.g. s.family above), at which point the Python code can wrap them back into the enum. The alternative, making socketmodule.c use enums is probably way more complicated than really necessary.
- As a followup to (1), the constants are actually wrapped into an IntEnum at the Python level and exposed back to the user. My hacking of globals() there may be a bit rough - suggestions for a better way are welcome.
- A bunch of AF_* constants exported by Python built on my x64 Ubuntu are not documented in socket.rst; I'm still wrapping them all in enums; I basically went over all PyModule_AddIntMacro(m, AF_*) in PyInit__socket.
PTAL
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-12 21:55
Attaching updated patch answering Ethan's review. Also added a tiny sanity test that makes sure that AF_INET is indeed an enum. A more comprehensive test probably won't hurt for the final patch.
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2013-08-14 07:12
What about SOCK_STREAM, SOCK_DGRAM, and SOCK_RAW?
Author: Antoine Pitrou (pitrou) *
Date: 2013-08-14 09:54
I'm not really thrilled by the current implementation. The way it is done leads to duplication among socket constants. Also, if I create a socket from a numeric value (because a well-known socket family is known yet known by Python), then socket.family will fail.
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-14 12:45
I'm not really thrilled by the current implementation. The way it is done leads to duplication among socket constants.
Can you clarify which duplication you mean specifically? In the code review tool, Serhiy proposed that instead of listing all the constants explicitly in the Python code, we can just do:
AddressFamily = IntEnum('AddressFamily', {name: value for name, value in moduledict.items() if name.startswith('AF')})
Is this less duplication?
Also, if I create a socket from a numeric value (because a well-known socket family is known yet known by Python), then socket.family will fail.
Well, the documentation of socket.socket says it should be one of the AF_* constants. One thing we can do is check in the socket.socket constructor whether it's a known AF_* constants and fail early with a meaningful error message. Would that be better?
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-14 12:46
What about SOCK_STREAM, SOCK_DGRAM, and SOCK_RAW?
This is a proof of concept. As I mentioned in the original message starting this issue - the SOCK_* constants will get the same treatment as AF_* constants, once we decide what the right treatment is.
Author: Antoine Pitrou (pitrou) *
Date: 2013-08-14 13:11
Can you clarify which duplication you mean specifically? In the code review tool, Serhiy proposed that instead of listing all the constants explicitly in the Python code, we can just do:
AddressFamily = IntEnum('AddressFamily', {name: value for name, value in moduledict.items() if name.startswith('AF')})
Yes, that's better.
Also, if I create a socket from a numeric value (because a well-known socket family is known yet known by Python), then socket.family will fail.
Well, the documentation of socket.socket says it should be one of the AF_* constants. One thing we can do is check in the socket.socket constructor whether it's a known AF_* constants and fail early with a meaningful error message. Would that be better?
Well, no. We should still allow creating sockets with unknown numeric values. The .family attribute should then return the raw integer.
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2013-08-14 13:24
I'm surprised that test_repr is not failed. '%i' % socket.AddressFamily.AF_INET returns 'AddressFamily.AF_INET' instead of '2' as I expected. Is not this a bug in IntEnum implementation?
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-14 13:40
Attaching a new patch that uses Serhiy's suggestion to avoid the duplication, and uses Antoine's suggestion to keep socket.family working even if the family is an unknown numeric constant.
Note that the latter is challenging to test - I need families the OS knows and socket doesn't. Any suggestions are welcome. For the time being the test I cooked up is an abomination ;-)
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-14 13:43
On Wed, Aug 14, 2013 at 6:24 AM, Serhiy Storchaka <report@bugs.python.org>wrote:
Serhiy Storchaka added the comment:
I'm surprised that test_repr is not failed. '%i' % socket.AddressFamily.AF_INET returns 'AddressFamily.AF_INET' instead of '2' as I expected. Is not this a bug in IntEnum implementation?
I did notice it when playing with the implementation. Perhaps %i doesn't call enum? Need to look into it more carefully...
Author: Ethan Furman (ethan.furman) *
Date: 2013-08-14 13:49
Issue18738 created.
Author: Christian Heimes (christian.heimes) *
Date: 2013-08-14 13:56
The manual list of AF_ prefixes looks ugly. We are going to run into the same problem for every module. How about an additional constructor that takes a name, dict and string prefix?
class IntEnum(int, Enum): @classmethod def from_moduledict(cls, name, moddict, prefix): members = {name: value for name, value in moddict.items() if name.startswith(prefix)} return cls(name, members)
import socket, enum, pprint AddressFamily = enum.IntEnum.from_moduledict("AddressFamily", socket.dict, "AF_") pprint.pprint(dict(AddressFamily.members)) {'AF_APPLETALK': <AddressFamily.AF_APPLETALK: 5>, 'AF_ASH': <AddressFamily.AF_ASH: 18>, 'AF_ATMPVC': <AddressFamily.AF_ATMPVC: 8>, 'AF_ATMSVC': <AddressFamily.AF_ATMSVC: 20>, 'AF_AX25': <AddressFamily.AF_AX25: 3>, 'AF_BLUETOOTH': <AddressFamily.AF_BLUETOOTH: 31>, 'AF_BRIDGE': <AddressFamily.AF_BRIDGE: 7>, 'AF_CAN': <AddressFamily.AF_CAN: 29>, 'AF_DECnet': <AddressFamily.AF_DECnet: 12>, 'AF_ECONET': <AddressFamily.AF_ECONET: 19>, 'AF_INET': <AddressFamily.AF_INET: 2>, 'AF_INET6': <AddressFamily.AF_INET6: 10>, 'AF_IPX': <AddressFamily.AF_IPX: 4>, 'AF_IRDA': <AddressFamily.AF_IRDA: 23>, 'AF_KEY': <AddressFamily.AF_KEY: 15>, 'AF_LLC': <AddressFamily.AF_LLC: 26>, 'AF_NETBEUI': <AddressFamily.AF_NETBEUI: 13>, 'AF_NETLINK': <AddressFamily.AF_NETLINK: 16>, 'AF_NETROM': <AddressFamily.AF_NETROM: 6>, 'AF_PACKET': <AddressFamily.AF_PACKET: 17>, 'AF_PPPOX': <AddressFamily.AF_PPPOX: 24>, 'AF_RDS': <AddressFamily.AF_RDS: 21>, 'AF_ROSE': <AddressFamily.AF_ROSE: 11>, 'AF_ROUTE': <AddressFamily.AF_NETLINK: 16>, 'AF_SECURITY': <AddressFamily.AF_SECURITY: 14>, 'AF_SNA': <AddressFamily.AF_SNA: 22>, 'AF_TIPC': <AddressFamily.AF_TIPC: 30>, 'AF_UNIX': <AddressFamily.AF_UNIX: 1>, 'AF_UNSPEC': <AddressFamily.AF_UNSPEC: 0>, 'AF_WANPIPE': <AddressFamily.AF_WANPIPE: 25>, 'AF_X25': <AddressFamily.AF_X25: 9>}
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-14 14:03
Christian, this is an interesting idea - but I suggest we review it once we actually get to convert "many modules". For the time being the manual listing is gone in the recent patch, and for a number of constant families in the same module (i.e. SOCK_* that will be added here) we can combine the lookup into a single loop - so no need walking the module dict multiple times.
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2013-08-14 14:03
I thing the AddressFamily enum class and the family property should be documented.
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-14 14:07
I thing the AddressFamily enum class and the family property should be documented.
I'm leaving the documentation to the end when we decide on the implementation strategy. Once that happens I'll post a patch with .rst changes, etc.
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2013-08-14 14:12
Nitpick: If we have grouped this constants in an enum class perhaps it will be good if they will be enumerated in the order of its values. Or no?
Author: Charles-François Natali (neologix) *
Date: 2013-08-16 10:38
I've just had a quick look at the patch, but from what I can see, socket.getaddrinfo() will return a raw integer for the family, and not an AddressFamily instance. That's unfortunate.
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-16 13:28
Great catch, Charles-François, thanks. It's actually an interesting example in favor of the approach - it's very nice to see descriptive family names in the data returned by getaddrinfo, instead of obtuse numeric values.
The attached patch fixes it in a similar vein to what was done for the family accessor: getaddrinfo is overridden in Python and does the necessary conversion. User code is unaffected. It has to rebuild the list (since those are tuples) but I don't think it matters in terms of performance for this specific function. As a bonus, getaddrinfo got a much nicer docstring :)
Note again that if this is acceptable, I'm going to similarly convert SOCK_* and post a full patch (with some documentation too).
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-16 13:36
Looks nice:
for t in socket.getaddrinfo('www.google.com', 'http'): ... t ... (<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.112', 80)) (<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.112', 80)) (<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.116', 80)) (<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.116', 80)) (<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.113', 80)) (<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.113', 80)) (<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.115', 80)) (<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.115', 80)) (<AddressFamily.AF_INET: 2>, 1, 6, '', ('74.125.239.114', 80)) (<AddressFamily.AF_INET: 2>, 2, 17, '', ('74.125.239.114', 80)) (<AddressFamily.AF_INET6: 10>, 1, 6, '', ('2607:f8b0:4005:800::1011', 80, 0, 0)) (<AddressFamily.AF_INET6: 10>, 2, 17, '', ('2607:f8b0:4005:800::1011', 80, 0, 0))
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2013-08-16 19:09
Besides some nitpicks which I have left on Rietveld the patch LGTM.
I guess the _family_converter idiom will be popular enough. Perhaps we will add some helper just to the Enum class.
Author: Charles-François Natali (neologix) *
Date: 2013-08-16 21:37
The attached patch fixes it in a similar vein to what was done for the family accessor: getaddrinfo is overridden in Python and does the necessary conversion.
Am I the only one thinking that wrapping every C function with a Python counterpart to perform enum conversions is tedious, ugly and error-prone :-\
Author: Christian Heimes (christian.heimes) *
Date: 2013-08-16 21:44
I wonder how much of an performance issue these wrappers are going to be.
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-16 23:36
Charles-François: unfortunately, the alternative seems to be even more tedious and error prone. Internally, socketmodule.c really uses ints all over the place. Changing that to use IntEnum objects from Python would be very tedious and touch a lot more code.
Christian - I think that practically none. None of the touched places seems like a performance concern (even without realizing that these are sockets and network operations we're talking about here)
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-26 02:32
Serhiy, patch 5 addresses your latest review. Thanks. As for _family_converter, let's see how common this really becomes. Based on one use-case I wouldn't add it to Enum itself. But if we see it gets repeated in many places, we can.
Author: Guido van Rossum (gvanrossum) *
Date: 2013-08-26 19:50
Looks good. Go ahead and convert the rest of the socket constants. Then we can consider other modules, e.g. select and os.
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-30 13:55
[Thanks, Guido]
Attaching patch with SOCK_* constants converted as well. The module globals are currently walked twice, once to extract AF_* and once SOCK_*. This is not a real performance problem, but should we fold it into a single loop that collects two dicts? The code will be less pretty but more efficient.
Another issue is _intenum_converter. Ethan - do you think it belongs in the enum module as a helper function or something of the sort?
Author: Guido van Rossum (gvanrossum) *
Date: 2013-08-30 14:58
I prefer two prettier loops over one less pretty one in this case. No opinion about _intenum_converter yet. (IMO refactoring can always be lazy, i.e. after you have multiple copies of the same code there's time to consider whether you should unify them and what the pros and cons are.)
On Fri, Aug 30, 2013 at 6:55 AM, Eli Bendersky <report@bugs.python.org>wrote:
Eli Bendersky added the comment:
[Thanks, Guido]
Attaching patch with SOCK_* constants converted as well. The module globals are currently walked twice, once to extract AF_* and once SOCK_*. This is not a real performance problem, but should we fold it into a single loop that collects two dicts? The code will be less pretty but more efficient.
Another issue is _intenum_converter. Ethan - do you think it belongs in the enum module as a helper function or something of the sort?
Added file: http://bugs.python.org/file31521/socket-intenum-af-type.6.patch
Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18720>
Author: Roundup Robot (python-dev)
Date: 2013-08-31 22:13
New changeset 038543d34166 by Eli Bendersky in branch 'default': Switch the AF_* and SOCK_* constants in the socket module to IntEnum. http://hg.python.org/cpython/rev/038543d34166
Author: Ethan Furman (ethan.furman) *
Date: 2013-08-31 22:18
Eli Bendersky added the comment:
Another issue is _intenum_converter. Ethan - do you think it belongs in the enum module as a helper function or something of the sort?
I'm fine with it being a non-public member of enum, although if we had a place for miscellaneous tools that might be a better spot for it.
Author: Eli Bendersky (eli.bendersky) *
Date: 2013-08-31 22:21
On Sat, Aug 31, 2013 at 3:18 PM, Ethan Furman <report@bugs.python.org>wrote:
Ethan Furman added the comment:
Eli Bendersky added the comment:
Another issue is _intenum_converter. Ethan - do you think it belongs in the enum module as a helper function or something of the sort?
I'm fine with it being a non-public member of enum, although if we had a place for miscellaneous tools that might be a better spot for it.
For the time being, I'm OK with Guido's suggestion to keep it lazy. As long as it's just being used in a single module, there's not much to think about. If we end up needing it when switching other modules, we'll see what to do.
History
Date
User
Action
Args
2022-04-11 14:57:49
admin
set
github: 62920
2013-08-31 22:21:55
eli.bendersky
set
messages: +
2013-08-31 22🔞38
ethan.furman
set
messages: +
2013-08-31 22:13:55
python-dev
set
status: open -> closed
nosy: + python-dev
messages: +
resolution: fixed
stage: patch review -> resolved
2013-08-30 14:58:46
gvanrossum
set
messages: +
2013-08-30 13:55:42
eli.bendersky
set
files: + socket-intenum-af-type.6.patch
messages: +
2013-08-26 19:50:23
gvanrossum
set
messages: +
2013-08-26 02:32:37
eli.bendersky
set
files: + socket-intenum-af.5.patch
messages: +
2013-08-16 23:36:25
eli.bendersky
set
messages: +
2013-08-16 21:44:58
christian.heimes
set
messages: +
2013-08-16 21:37:29
neologix
set
messages: +
2013-08-16 19:09:01
serhiy.storchaka
set
messages: +
2013-08-16 13:36:35
eli.bendersky
set
messages: +
2013-08-16 13:28:34
eli.bendersky
set
files: + socket-intenum-af.4.patch
messages: +
2013-08-16 10:38:26
neologix
set
messages: +
2013-08-14 14:12:58
serhiy.storchaka
set
messages: +
2013-08-14 14:07:01
eli.bendersky
set
messages: +
2013-08-14 14:03:27
serhiy.storchaka
set
messages: +
2013-08-14 14:03:12
eli.bendersky
set
messages: +
2013-08-14 13:56:04
christian.heimes
set
nosy: + christian.heimes
messages: +
2013-08-14 13:49:59
ethan.furman
set
messages: +
2013-08-14 13:43:11
eli.bendersky
set
messages: +
2013-08-14 13:40:47
eli.bendersky
set
files: + socket-intenum-af.3.patch
messages: +
2013-08-14 13:24:13
serhiy.storchaka
set
messages: +
2013-08-14 13:11:51
pitrou
set
messages: +
2013-08-14 12:46:24
eli.bendersky
set
messages: +
2013-08-14 12:45:27
eli.bendersky
set
messages: +
2013-08-14 09:58:34
pitrou
set
nosy: + neologix
2013-08-14 09:54:59
pitrou
set
messages: +
2013-08-14 07:12:04
serhiy.storchaka
set
messages: +
2013-08-12 21:55:41
eli.bendersky
set
files: + socket-intenum-af.2.patch
messages: +
2013-08-12 21:38:42
serhiy.storchaka
set
nosy: + serhiy.storchaka
2013-08-12 21:10:49
eli.bendersky
create