On 12/04/2012 01:08 AM, Antoine Pitrou wrote:
> On Mon, 03 Dec 2012 14:29:35 -0800, Larry Hastings wrote:
>> /*[clinic]
>> dbm.open -> mapping
>> basename=dbmopen
>> const char *filename;
>>     The filename to open.
> So how does it handle the fact that filename can either be a unicode
> string or a fsencoding-encoded bytestring? And how does it do the right
> encoding/decoding dance, possibly platform-specific?
> [...]
> I see, it doesn't :-)
If you compare the Clinic-generated code to the current
implementation of dbm.open (and all the other functions I've
touched) you'll find the "format units" specified to PyArg_Parse*
are identical. Thus I assert the replacement argument parsing is no
worse (and no better) than what's currently shipping in Python.
Separately, I contributed code that handles unicode vs bytes for
filenames in a reasonably cross-platform way; see "path_converter"
in Modules/posixmodule.c. (This shipped in Python 3.3.) And
indeed, I have examples of using "path_converter" with Clinic in my
branch.
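As a rough Python-level illustration of the str-versus-bytes problem
(this is not path_converter itself, which is C code), the standard
os.fsencode and os.fsdecode functions accept either form and normalize
it with the filesystem encoding. The helper name normalize_path is
hypothetical and exists only for this sketch:

```python
import os


def normalize_path(path):
    """Return (as_bytes, as_str) for a filename given as str or bytes.

    Hypothetical helper for illustration only; it is not part of Clinic
    or of path_converter.
    """
    if not isinstance(path, (str, bytes)):
        raise TypeError(
            "path must be str or bytes, not %s" % type(path).__name__
        )
    # os.fsencode and os.fsdecode apply the filesystem encoding and its
    # error handler, and pass through input already in the target type.
    return os.fsencode(path), os.fsdecode(path)
```

A caller can then hand whichever form the underlying platform API wants
without caring which one the user supplied.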
Along these lines, I've been contemplating proposing that Clinic
specifically understand "path" arguments, distinctly from other
string arguments, as they are both common and rarely handled
correctly. My main fear is that I probably don't understand all
their complexities either ;-)
Anyway, this is certainly something we can consider *improving* for
Python 3.4. But for now I'm trying to make Clinic an
indistinguishable drop-in replacement.
> I like the idea, but it needs more polishing. I don't think the various
> "duck types" accepted by Python can be expressed fully in plain C types
> (e.g. you must distinguish between taking all kinds of numbers or only
> an __index__-providing number).
Naturally I agree Clinic needs more polishing. But the problem you
fear is already solved. Clinic allows precisely expressing any
existing PyArg_ "format unit"** through a combination of the type of
the parameter and its "flags". The flags only become necessary for
types used by multiple format units; for example, s, z, es, et, es#,
et#, y, and y# all map to char *, so it's necessary to disambiguate
by using the "flags". The specific case you cite
("__index__-providing number") is already unambiguous; that's n,
mapped to Py_ssize_t. There aren't any other format units that map
to a Py_ssize_t, so we're done.
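The disambiguation rule can be sketched in a few lines of Python. The
table holds only the format units named in this message, and the names
FORMAT_UNITS_BY_C_TYPE, needs_flags and format_unit_for are made up for
the sketch; none of this is Clinic's actual data or API:

```python
# Format units grouped by the C type they map to, as listed in the
# message above. Illustrative only; this is not Clinic's own table.
FORMAT_UNITS_BY_C_TYPE = {
    "char *": ["s", "z", "es", "et", "es#", "et#", "y", "y#"],
    "Py_ssize_t": ["n"],
}


def needs_flags(c_type):
    """Return True when the C type alone cannot select one format unit."""
    return len(FORMAT_UNITS_BY_C_TYPE[c_type]) > 1


def format_unit_for(c_type):
    """Return the unique format unit for c_type, or raise if ambiguous."""
    units = FORMAT_UNITS_BY_C_TYPE[c_type]
    if len(units) != 1:
        raise ValueError("%r is ambiguous: %s" % (c_type, ", ".join(units)))
    return units[0]
```

Here needs_flags("char *") is true while needs_flags("Py_ssize_t") is
false, which is the whole point: flags are needed only where a C type
is shared by several format units.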
** Well, any format unit except w*. I don't handle it just because
I wasn't sure how best to do so.
/arry