Add dataclass_factory argument to dataclasses.make_dataclass for custom dataclass transformation support
Forwarded from GitHub issue: python/cpython#118974
Feature or enhancement
Proposal:
typing.dataclass_transform (PEP 681 – Data Class Transforms) allows users to define their own dataclass decorators that type checkers can recognize.
Here is a real-world example use case:
Also, dataclasses.asdict and dataclasses.astuple allow users to pass an extra argument (dict_factory and tuple_factory, respectively) that sets the factory for the returned instance.
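For reference, here is a minimal sketch of those two existing factory hooks. The Point class and its fields are made up for illustration.

```python
import dataclasses
from collections import OrderedDict


@dataclasses.dataclass
class Point:  # hypothetical example class
    x: int
    y: int


p = Point(1, 2)

# dict_factory is called with the (field_name, value) pairs.
as_ordered = dataclasses.asdict(p, dict_factory=OrderedDict)

# tuple_factory is called with the field values.
as_list = dataclasses.astuple(p, tuple_factory=list)
```

In both cases the caller chooses the container type while the stdlib does the traversal, which is the same division of labor the proposed argument aims for.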
However, the make_dataclass function does not support third-party dataclass factories (e.g., flax.struct.dataclass). It can only apply dataclasses.dataclass (see the return statement of make_dataclass in dataclasses.py).
This feature request discusses the possibility of adding a new dataclass_factory argument to dataclasses.make_dataclass to support third-party dataclass transformations, similar to dict_factory for dataclasses.asdict.
# dataclasses.py
def make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True,
                   repr=True, eq=True, order=False, unsafe_hash=False,
                   frozen=False, match_args=True, kw_only=False, slots=False,
                   weakref_slot=False, module=None,
                   dataclass_factory=dataclass):
    ...
    # Apply the normal decorator.
    return dataclass_factory(cls, init=init, repr=repr, eq=eq, order=order,
                             unsafe_hash=unsafe_hash, frozen=frozen,
                             match_args=match_args, kw_only=kw_only, slots=slots,
                             weakref_slot=weakref_slot)
sobolevn (Nikita Sobolev) May 13, 2024, 9:58am 2
Can you please show an example? How would you want to use this new param?
XuehaiPan (Xuehai Pan) May 13, 2024, 11:06am 3
I want to re-export the dataclasses functionality in my own package. Here is a snippet to illustrate my use case:
# mypkg/
# ├── __init__.py
# └── dataclasses.py

import dataclasses
from typing_extensions import dataclass_transform  # available in `typing` on Python 3.11+

from mypkg import xxx, yyy, zzz

__all__ = ['dataclass', 'field', 'make_dataclass']


# Defined before `dataclass` because the decorator below refers to it.
def field(**kwargs):
    zzz(kwargs)  # do something
    return dataclasses.field(**kwargs)


@dataclass_transform(field_specifiers=(field,))
def dataclass(cls=None, /, **kwargs):
    xxx(kwargs)  # do something

    if cls is not None:
        # Used as a bare decorator: @dataclass
        klass = dataclasses.dataclass(cls, **kwargs)
        yyy(klass, kwargs)  # do something else
        return klass

    def wrapper(cls):
        # Used with arguments: @dataclass(frozen=True, ...)
        klass = dataclasses.dataclass(cls, **kwargs)
        yyy(klass, kwargs)  # do something else
        return klass

    return wrapper


def make_dataclass(cls_name, fields, **kwargs):
    return dataclasses.make_dataclass(
        cls_name,
        fields,
        dataclass_factory=dataclass,  # my own dataclass() above
        **kwargs,
    )
The users can do:
import mypkg
@mypkg.dataclasses.dataclass
class Foo:
    x: int
    y: int
Bar = mypkg.dataclasses.make_dataclass('Bar', [('a', float), ('b', int)])
NeilGirdhar (Neil Girdhar) May 13, 2024, 5:00pm 4
Do you really find Bar as nice as Foo? Seems significantly worse.
Can you not implement make_dataclass in your package by creating a custom type, adding in the annotations you want, and finally applying your dataclass function?
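A rough sketch of what this suggestion could look like, assuming a hypothetical helper named make_dataclass_with and a stand-in decorator named my_dataclass. Neither is part of the stdlib or of any package mentioned in this thread. The sketch only handles (name, type) pairs, so it covers less than dataclasses.make_dataclass does.

```python
import dataclasses
import types


def my_dataclass(cls=None, /, **kwargs):
    """Stand-in for a third-party dataclass decorator (hypothetical)."""
    def wrap(c):
        c = dataclasses.dataclass(c, **kwargs)
        c.__made_by__ = 'my_dataclass'  # marker so the custom step is visible
        return c
    return wrap if cls is None else wrap(cls)


def make_dataclass_with(decorator, cls_name, fields, *, bases=(), **kwargs):
    """Build a class from (name, type) pairs, then apply `decorator` to it."""
    annotations = {name: tp for name, tp in fields}

    def exec_body(ns):
        # Populate the class namespace before the class object is created.
        ns['__annotations__'] = annotations

    cls = types.new_class(cls_name, bases, {}, exec_body)
    return decorator(cls, **kwargs)


Bar = make_dataclass_with(
    my_dataclass, 'Bar', [('a', float), ('b', int)], frozen=True
)
bar = Bar(1.0, 2)
```

This works at runtime, but it leaves out the field-name validation, default handling, namespace and module arguments that the stdlib function provides.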
XuehaiPan (Xuehai Pan) May 13, 2024, 5:30pm 5
Yes, the normal use case of the @dataclass decorator is more elegant and readable. But sometimes there are use cases for dynamic class creation, just like subclassing typing.NamedTuple vs. calling collections.namedtuple.
NUM_LAYERS = 32
MyNetwork = dataclasses.make_dataclass('MyNetwork', [(f'layer{i}', Layer) for i in range(NUM_LAYERS)])
NeilGirdhar (Neil Girdhar) May 13, 2024, 5:51pm 6
I understand, but can you not write a make_dataclass function of your own, without delegating to dataclasses.make_dataclass, using the instructions in my last comment?
Oh, I see, but you want the annotations to be right. Got it.
XuehaiPan (Xuehai Pan) May 13, 2024, 6:04pm 7
I can do this, but I don’t think ordinary users would understand it, and it is not easy to use. I want to re-export the dataclasses functionality in my package and then ship it to PyPI.
NUM_LAYERS = 32
MyNetwork1 = dataclasses.make_dataclass('MyNetwork1', [(f'layer{i}', Layer) for i in range(NUM_LAYERS)])
MyNetwork2 = type('MyNetwork2', (object,), {'__annotations__': {f'layer{i}': Layer for i in range(NUM_LAYERS)}})
MyNetwork2 = dataclasses.dataclass(MyNetwork2)
Also, I do not want to copy-paste the code of dataclasses.make_dataclass into my package. I want it to always stay in sync with the stdlib.
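To illustrate what a hand-rolled copy would have to replicate, here is a small sketch of behavior that dataclasses.make_dataclass already provides: mixed field specifications and up-front validation of field names. The Point and Bad class names are made up for illustration.

```python
import dataclasses

# A field may be a bare name, a (name, type) pair,
# or a (name, type, Field) triple.
Point = dataclasses.make_dataclass(
    'Point',
    ['tag', ('x', int), ('y', int, dataclasses.field(default=0))],
)
p = Point('origin', 1)  # y falls back to its default

# Field names that are Python keywords are rejected with a TypeError.
try:
    dataclasses.make_dataclass('Bad', [('class', int)])
except TypeError as exc:
    error = str(exc)
else:
    error = None
```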