[Python-Dev] bpo-34595: How to format a type name?

Victor Stinner vstinner at redhat.com
Tue Sep 11 18:23:45 EDT 2018


Hi,

Last week, I opened an issue proposing to add a new %T formatter to PyUnicode_FromFormatV(), and so indirectly to PyUnicode_FromFormat() and PyErr_Format():

https://bugs.python.org/issue34595

I merged my change, but then Serhiy Storchaka asked if we can add something to get the "fully qualified name" (FQN) of a type, e.g. "datetime.timedelta" (FQN) vs "timedelta" (what I call the "short" name). I proposed a second pull request to add %t (short) in addition to %T (FQN).

But then Petr Viktorin asked me to open a thread on python-dev to get a wider discussion. So here I am.

The rationale for this change is to fix multiple issues:

* C extensions use Py_TYPE(obj)->tp_name, which returns a fully qualified name for C types, but only the name (without the module) for Python types. Python modules use type(obj).__name__, which always returns the short name.
* Currently, many C extensions truncate the type name: they use "%.80s" instead of "%s" to format a type name.
* "%s" with Py_TYPE(obj)->tp_name is used more than 200 times in the C code, and I dislike this complex pattern. IMHO "%t" with obj would be simpler to read, write and maintain.
* I want C extensions and Python modules to have the same behavior: respect PEP 399. Petr considers that error messages are not covered by PEP 399, but the issue is wider than error messages.

The main issue is that at the C level, Py_TYPE(obj)->tp_name is "usually" the fully qualified name for types defined in C, but it's only the "short" name for types defined in Python.

For example, if you have the C accelerator "_datetime", Py_TYPE(obj)->tp_name of a datetime.timedelta object gives you "datetime.timedelta", but if you don't have the accelerator, tp_name is just "timedelta".

Another example: this script displays "mytimedelta(0)" if you have the C accelerator, but "__main__.mytimedelta(0)" if you use the Python implementation:

    import sys
    # Uncomment to block the _datetime C accelerator (pure-Python implementation):
    #sys.modules['_datetime'] = None
    import datetime

    class mytimedelta(datetime.timedelta):
        pass

    print(repr(mytimedelta()))

So I would like to fix this kind of issue.

Type names are mainly used for two purposes:

* format an error message
* obj.__repr__()

It's unclear to me if we should use the "short" or the "fully qualified" name. It should maybe be decided on a case by case basis.

There is also a 3rd usage: implementing __reduce__, where backward compatibility matters.

Note: The discussion has evolved since my first implementation of %T, which just used the poorly defined Py_TYPE(obj)->tp_name.

Petr asked me why we don't expose functions to get these names instead. For example, with my second PR (not merged), there are 3 (private) functions:

    /* type.__name__ */
    const char *_PyType_Name(PyTypeObject *type);

    /* type.__qualname__ */
    PyObject *_PyType_QualName(PyTypeObject *type);

    /* type.__module__ "." type.__qualname__ (but type.__qualname__ for builtin types) */
    PyObject *_PyType_FullName(PyTypeObject *type);

My concern here is that each caller has to handle errors:

    PyErr_Format(PyExc_TypeError, "must be str, not %.100s", Py_TYPE(obj)->tp_name);

would become:

    PyObject *type_name = _PyType_FullName(Py_TYPE(obj));
    if (type_name == NULL) {
        /* do something with this error ... */
    }
    PyErr_Format(PyExc_TypeError, "must be str, not %U", type_name);
    Py_DECREF(type_name);

When I report an error, I dislike having to handle new errors... I prefer that the error handling is done inside PyErr_Format() for me, to reduce the risk of additional bugs.

Serhiy also asked if we could expose the same feature at the Python level: provide something to get the fully qualified name of a type. It's not just f"{type(obj).__module__}.{type(obj).__name__}": you also have to skip the module for builtin types like "str" (and not return "builtins.str").

Maybe we can have "name: {0:t}, FQN: {0:T}".format(type(obj)): "t" for the short name and "T" for the fully qualified name. We would only have to modify type.__format__().

I'm not sure if we need to add new formatters to str % args.

Example of Python code:

    raise TypeError("must be str, not %s" % type(fmt).__name__)

I'm not sure about Python changes. My first concern was just to avoid Py_TYPE(obj)->tp_name at the C level. But again, we should keep C and Python consistent: if the behavior of C extensions changes, Python modules should be adapted as well to get the same behavior.

Note: I reverted my change which added the %T formatter to PyUnicode_FromFormatV(), to clarify the status of this issue.

Victor
