[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)

Nick Coghlan ncoghlan at gmail.com
Fri Oct 21 02:48:53 EDT 2016


On 19 October 2016 at 01:28, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:

>> >>> def get_builtin_methods():
>> ...     return [(name, method_name) for name, obj in get_builtin_types().items() for method_name, method in vars(obj).items() if not method_name.startswith("__")]
>> ...
>> >>> len(get_builtin_methods())
>> 230
>
> So what? No one looks in all the methods of builtins at once.

Yes, Python implementation developers do, which is why it's a useful part of defining the overall "size" of Python and how that is growing over time.
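The count quoted above relies on a helper from earlier in the thread that isn't reproduced in this message. As a rough, self-contained sketch of the same measurement (the definition of "builtin type" here is an assumption: public, non-exception classes exposed by the builtins module), something like the following can be run on any implementation. The exact numbers will differ between Python versions, so none are shown.

```python
import builtins


def get_builtin_types():
    # Assumption: a "builtin type" is any public class in the builtins
    # module that is not an exception class.
    return {
        name: obj
        for name, obj in vars(builtins).items()
        if isinstance(obj, type)
        and not name.startswith("_")
        and not issubclass(obj, BaseException)
    }


def get_builtin_methods():
    # Public attributes defined directly on each type. This is mostly
    # methods, plus a few descriptors such as int.real.
    return [
        (name, method_name)
        for name, obj in get_builtin_types().items()
        for method_name in vars(obj)
        if not method_name.startswith("__")
    ]


print(len(get_builtin_types()), len(get_builtin_methods()))
```

Every entry in that list is something an alternative implementation has to provide to stay compatible.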

When we define a new standard library module (particularly pure Python ones) rather than new methods on builtin types, we create substantially less additional work for other implementations, and we make it easier for educators to decide whether or not they should be introducing their students to the new capabilities.

That latter aspect is important, as providing functionality as separate modules means we also gain an enhanced ability to explain "What is this for?", which is something we regularly struggle with when making changes to the core language to better support relatively advanced domain-specific use cases (see http://learning-python.com/books/python-changes-2014-plus.html for one generalist author's perspective on the vast gulf that can arise between "What professional programmers want" and "What's relevant to new programmers").

> If we have anything like an OO System (and Python builtins only sort of do...) then folks look for a builtin that they need, and only then look at its methods.
>
> If you need to work with bytes, you'll look at the bytes object and bytearray object. Having to go find some helper function module to know how to efficiently do something with bytes is VERY non-discoverable!

Which is more comprehensible and discoverable, dict.setdefault(), or collections.defaultdict()?

Micro-optimisations like dict.setdefault() typically don't make sense in isolation - they only make sense in the context of a particular pattern of thought. Now, one approach to such patterns is to say "We just need to do a better job of teaching people to recognise and use the pattern!". This approach tends not to work very well - you're often better off extracting the entire pattern out to a higher level construct, giving that construct a name, and teaching that, and letting people worry about how it works internally later.
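To make the comparison concrete, here is a small sketch of the same grouping task written both ways (the word list and variable names are made up for illustration):

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Method on the builtin: the reader has to recognise the
# "look up, or insert an empty list, then append" idiom.
by_letter_setdefault = {}
for word in words:
    by_letter_setdefault.setdefault(word[0], []).append(word)

# Named higher-level construct: "a dict whose missing values are lists".
by_letter_defaultdict = defaultdict(list)
for word in words:
    by_letter_defaultdict[word[0]].append(word)

print(by_letter_setdefault)
print(dict(by_letter_defaultdict))
```

Both loops build the same mapping; the second states the intent once, when the container is created, rather than at every insertion.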

(For a slightly different example, consider the rationale for adding the secrets module, even though it's mostly just a collection of relatively thin wrappers around os.urandom())
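As a brief illustration of that point (a sketch only; the 16-byte size is arbitrary), the two calls below draw from the same operating system randomness source, but only the first says what the result is for:

```python
import os
import secrets

# Purpose-specific entry points: the module name states the use case.
token = secrets.token_bytes(16)
hex_token = secrets.token_hex(16)

# The lower-level primitive that the secrets helpers build on.
raw = os.urandom(16)

print(len(token), len(hex_token), len(raw))
```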

> bytes and bytearray are already low-level objects -- adding low-level functionality to them makes perfect sense.

They're not really that low level. They're relatively low level (especially for Python), but they're still a long way away from the kind of raw control over memory layout that a language like C or Rust can give you.

> And no, this is not just for asyncio at all -- it's potentially useful for any byte manipulation.

Yes, which is why I think the end goal should be a public iobuffers module in the standard library. Doing IO buffer manipulation efficiently is a complex topic, but it's also one where there are:

> +1 on a frombuffer() method.

Still -1 in the absence of evidence that a good IO buffer abstraction for asyncio and the standard library can't be written without it (where the evidence I'll accept is "We already wrote the abstraction layer, and not having this builtin feature necessarily introduces inefficiencies or a lack of portability beyond CPython into our implementation").

>> Putting special purpose functionality behind an import gate helps to provide a more explicit context of use
>
> This is a fine argument for putting bytearray in a separate module -- but that ship has sailed. The method to construct a bytearray from a buffer belongs with the bytearray object.

The bytearray constructor already accepts arbitrary bytes-like objects. What this proposal is about is a way to more efficiently snapshot a slice of a bytearray object for use in asyncio buffer manipulation in cases where all of the following constraints apply:

For a great many use cases, we simply don't care about those constraints (especially the last one), so adding bytes.frombuffer is just confusing: we can readily predict that after adding it, a future Stack Overflow question will be "When should I use bytes.frombuffer() in Python instead of the normal bytes constructor?"
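For reference, here is a sketch of the two ways to snapshot a slice of a bytearray with the existing constructor (the buffer contents and offsets are invented for illustration; no bytes.frombuffer() is assumed to exist):

```python
buf = bytearray(b"header:payload-bytes")

# Slicing the bytearray first copies the slice into a temporary
# bytearray, and bytes() then copies that a second time.
snapshot_two_copies = bytes(buf[7:14])

# Slicing a memoryview copies nothing, so bytes() makes the only copy.
# Both views are released explicitly so the bytearray can be resized
# afterwards, even on implementations without reference counting.
with memoryview(buf) as view, view[7:14] as window:
    snapshot_one_copy = bytes(window)

# With the views released, the consumed prefix can be dropped in place.
del buf[:7]

print(snapshot_two_copies, snapshot_one_copy, bytes(buf))
```

The memoryview version avoids the extra copy, at the cost of creating and releasing the view objects. For most code the first spelling is fine, which is why the difference only matters under the narrow conditions described above.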

By contrast, if we instead say "We want Python to natively support efficient, readily discoverable IO buffer manipulation", then folks can ask "What's preventing us from providing an iobuffers module today?" and start working towards that end goal (just as the "selectors" module was added as an asyncio-independent abstraction layer over select, epoll and kqueue, but probably wouldn't have been without the asyncio use case to drive its design and implementation as a standard library module).

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


