open()able objects

Like others above, I don’t really like the idea of having these four specific dunder methods: it’s an incomplete API (for example, as stated it doesn’t provide for + at all) and it inherits the downside of the classic “bunch of letters” mode parameter without the benefit.

Using a single __open__ dunder would be better than that, but I wouldn’t support this either on the grounds that the downside of the mode parameter still makes it not worth bothering with.

What would be better still?

IMV you need the full set of functions, with some extra arguments, and they should be named more appropriately:

__open_bytes_reader__(*,
    buffering=-1) -> BytesReader
__open_bytes_writer__(*, append=False, must_create=False, truncate=None,
    buffering=-1) -> BytesWriter
__open_bytes_duplex__(*, allow_create=False, must_create=False, truncate=None,
    buffering=-1) -> BytesDuplex
__open_str_reader__(*,
    buffering=-1, encoding=None, errors=None, newline=None) -> StrReader
__open_str_writer__(*, append=False, must_create=False, truncate=None,
    buffering=-1, encoding=None, errors=None, newline=None) -> StrWriter
__open_str_duplex__(*, allow_create=False, must_create=False, truncate=None,
    buffering=-1, encoding=None, errors=None, newline=None) -> StrDuplex
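
To make the shape of this concrete, here is a minimal sketch of an object that implements just one of the six dunders. Everything apart from the dunder name and its keyword arguments is an assumption for illustration: the InMemoryText class, the SupportsOpenStrReader protocol, and the open_str_reader helper are hypothetical, and io.StringIO merely stands in for whatever StrReader would end up being.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsOpenStrReader(Protocol):
    # Hypothetical protocol mirroring the proposed __open_str_reader__ signature.
    def __open_str_reader__(self, *, buffering: int = -1, encoding=None,
                            errors=None, newline=None) -> io.TextIOBase: ...


class InMemoryText:
    """Toy 'openable' object backed by a string (illustration only)."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __open_str_reader__(self, *, buffering=-1, encoding=None,
                            errors=None, newline=None) -> io.TextIOBase:
        # buffering/encoding/errors have no effect on an in-memory string.
        return io.StringIO(self._text, newline=newline)


def open_str_reader(obj, **kwargs):
    """Hypothetical free function that dispatches to the dunder on the type."""
    opener = getattr(type(obj), "__open_str_reader__", None)
    if opener is None:
        raise TypeError(f"{type(obj).__name__} cannot be opened as a str reader")
    return opener(obj, **kwargs)
```

An object that only makes sense as a text source implements only this one method; asking it for a writer fails with TypeError in the helper rather than deep inside a mode-string parser.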

You then translate mode as follows:

r -> ..._reader()
w -> ..._writer()
x -> ..._writer(must_create=True)
a -> ..._writer(append=True)
r+ -> ..._duplex(truncate=False)
w+ -> ..._duplex(truncate=True)
x+ (PHP) -> ..._duplex(must_create=True)
c (PHP) -> ..._writer(truncate=False)
c+ (PHP) -> ..._duplex(allow_create=True)

combined with

b -> bytes_...
t -> str_...
neither b nor t -> str_...

This covers all possible modes you can pass to open, while presenting a better interface which could also be used for a future split of builtin open and pathlib.Path.open to the above six functions (without their leading/trailing underscores of course).
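
The translation can be sketched as a small lookup. This is only an illustration under stated assumptions: translate_mode is a made-up helper, it returns the name of the proposed dunder plus the keyword arguments rather than calling anything, and it rejects any mode that has no row in the list above (a+ is one such mode) instead of guessing a mapping.

```python
def translate_mode(mode: str) -> tuple[str, dict[str, bool]]:
    """Map an open()-style mode string to (proposed dunder name, kwargs)."""
    flags = set(mode)
    if len(flags) != len(mode):
        raise ValueError(f"invalid mode: {mode!r}")
    if {"b", "t"} <= flags:
        raise ValueError("can't have text and binary mode at once")

    # b selects bytes_...; t or neither selects str_...
    kind = "bytes" if "b" in flags else "str"

    # Normalise the remaining letters; "+" sorts before ASCII letters.
    base = "".join(sorted(flags - {"b", "t"}))
    table = {
        "r": ("reader", {}),
        "w": ("writer", {}),
        "x": ("writer", {"must_create": True}),
        "a": ("writer", {"append": True}),
        "+r": ("duplex", {"truncate": False}),
        "+w": ("duplex", {"truncate": True}),
        "+x": ("duplex", {"must_create": True}),    # PHP-style x+
        "c": ("writer", {"truncate": False}),       # PHP-style c
        "+c": ("duplex", {"allow_create": True}),   # PHP-style c+
    }
    if base not in table:
        raise ValueError(f"no translation listed for mode: {mode!r}")
    role, kwargs = table[base]
    return f"__open_{kind}_{role}__", dict(kwargs)
```

For example, translate_mode("r+b") yields the bytes duplex dunder with truncate=False, while translate_mode("xt") yields the str writer dunder with must_create=True.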

The return type of each function can be specified independently of the others.

What are these PHP modes?

The PHP documentation for fopen describes modes x+, c and c+ which are not mentioned in the Python documentation.

c and c+ aren’t implemented by Python, but IMO they should be - or at least c+ as this is the more useful of the two.

x+ in Python does not raise an exception if I request it (using builtin open), but I’ve not tested thoroughly whether it actually does what I expect it to do. If Python already implements this mode, it ought to be documented. Otherwise, it should (preferably, IMO) be implemented and documented; or, if it’s decided that Python should not support this mode, attempting to use it should raise an exception and, again, this should be documented.
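
A quick probe along these lines can settle what x+ does on a given interpreter. It is a sketch, not a thorough test: probe_x_plus is a made-up name, and it only checks two things, namely whether a file opened with x+ can be written and then read back through the same handle, and whether a second x+ open of the now-existing file fails.

```python
import os
import tempfile


def probe_x_plus() -> dict:
    """Report how builtin open() treats mode 'x+' on this interpreter."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.txt")

        # First open: the file does not exist yet.
        with open(path, "x+") as f:
            f.write("hello")
            f.seek(0)
            results["read_back"] = f.read()

        # Second open: the file now exists, so exclusive creation should fail.
        try:
            open(path, "x+").close()
            results["second_open"] = "succeeded"
        except FileExistsError:
            results["second_open"] = "FileExistsError"
    return results
```

If read_back comes out as the written text and second_open reports FileExistsError, then x+ behaves as exclusive-create plus read/write, which is what ..._duplex(must_create=True) is meant to express.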

If it’s decided that Python shouldn’t support c or c+, then the allow_create arg can be removed from ..._duplex, the truncate arg can be removed from ..._writer, and the default for truncate in ..._duplex could be False, rather than None.

I was in two minds about whether to include these in this post - since it may well seem like an arbitrary, even unrelated, addition to the matter at hand - but I felt ultimately that it was better to include it, since if I simply proposed this API without consideration of x+, c and c+, the small differences in how the arguments would be specified might make it more difficult to add them later in a natural way.

Why the extra arguments?

There’s a possibly subtle reason why I’ve decided to use boolean arguments for append, must_create and truncate, while using separate functions altogether to determine read/write and binary/text. The reason is: append, must_create and truncate do not affect the API of the resulting object, and only affect what happens during the open call and not what happens afterwards (not counting the caveat regarding append and seek/tell described below).

append=True should override must_create and truncate; meanwhile, the default of truncate=None should act like truncate=True for open_..._writer or truncate=False for open_..._duplex. I specify None as the default here rather than True or False because a default of True on open_..._writer would be confusing when setting append=True. (As above, if the c mode is not added, then the truncate argument would be removed from open_..._writer and this problem goes away.)
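
The precedence rules in the paragraph above could be written down roughly as follows. The helper names and the WriterOptions tuple are hypothetical, and reading "override" as "must_create and truncate are ignored when append=True" is an assumption about the intended meaning.

```python
from typing import NamedTuple, Optional


class WriterOptions(NamedTuple):
    append: bool
    must_create: bool
    truncate: bool


def resolve_writer_options(*, append: bool = False, must_create: bool = False,
                           truncate: Optional[bool] = None) -> WriterOptions:
    """Normalise the arguments of the proposed ..._writer functions."""
    if append:
        # Assumption: append wins, so the other two flags are ignored.
        return WriterOptions(append=True, must_create=False, truncate=False)
    # truncate=None acts like truncate=True for a writer.
    return WriterOptions(append=False, must_create=must_create,
                         truncate=True if truncate is None else truncate)


def resolve_duplex_truncate(truncate: Optional[bool] = None) -> bool:
    """truncate=None acts like truncate=False for the ..._duplex functions."""
    return False if truncate is None else truncate
```

With None as the sentinel, a caller who passes append=True never sees a default of truncate=True attached to the call, which is the confusion the None default is meant to avoid.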

When translating the r+ and w+ modes, the call should pass truncate explicitly in both cases, since the effective default is not necessarily obvious otherwise. (I’ve chosen truncate=False to be the default behavior for open_..._duplex since I believe r+ to be much more commonly used than w+, but I haven’t specifically researched this.)

A note on seek/tell

The above doesn’t specify whether or not seek/tell can be used on the resulting object.

Now in almost all cases, this isn’t a problem, because it is “the thing that is being opened” that will determine this (so the appropriate dunder method can be typed to return an interface that does/doesn’t have seek/tell defined based on the knowledge that files opened from a particular class will/won’t be able to use them).

The sole exception, to my knowledge, is append mode: due to OS-related reasons, seek won’t (or at least might not) work there (and I’m not sure about tell), even if it theoretically could.

I think this combination is minor enough that it suffices to place a note in the doc comment of any seek (and tell if needed) interface function to state that it won’t work if the file was opened in append mode.
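
For anyone who wants to see the append caveat in action, here is a small experiment using the existing builtin open. The function name is made up, and the behaviour shown is what I'd expect on common platforms where append mode forces every write to the end of the file; other platforms or file-like objects may differ.

```python
import os
import tempfile


def append_after_seek() -> bytes:
    """Write in append mode after seeking to 0, then return the file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.bin")
        with open(path, "wb") as f:
            f.write(b"abc")

        with open(path, "ab") as f:
            f.seek(0)        # accepted without error
            f.write(b"XYZ")  # where this lands is decided by append mode

        with open(path, "rb") as f:
            return f.read()
```

If the result still starts with the original bytes, the seek was silently ineffective for writing, which is exactly the kind of thing the doc comment on seek would need to warn about.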