Dataclasses and non-dataclasses inheritance
My codebase included (over many modules) code like:
from dataclasses import dataclass

class C0:
    pass

@dataclass
class DC2:
    my_field: bool = True

class C1(C0, DC2):  # original state
    pass
And lo it was good:
print(len(C1.__dataclass_fields__)) # 1, life is good
But then I tried turning the non-dataclass parent into a dataclass:
@dataclass
class DC1:
    pass

class C2(DC1, DC2):  # messes up __dataclass_fields__
    pass

print(len(C2.__dataclass_fields__))  # 0 ??
Making the child a dataclass changes the state back to the expected one:
@dataclass
class DC3(DC1, DC2):  # named DC3 here so the original DC2 is not shadowed
    pass

print(len(DC3.__dataclass_fields__))  # 1
If until now it just seemed suspicious, I’m pretty sure this next one qualifies as a bug (or at the very least a glaring footgun): changing the order of the C2 parents flips the state too!
class C3(DC2, DC1):
    pass

print(len(C3.__dataclass_fields__))  # 1 ?!?
You can play with it here: Python Playground - Online Python Programming IDE
Perhaps there’s some documented warning like ‘don’t mix dataclasses and non-dataclasses via inheritance’? A few places in the docs make me suspect otherwise, e.g.:
dataclasses.is_dataclass(obj)
Return True if its parameter is a dataclass (including subclasses of a dataclass)
Before I open a GitHub bug and perhaps suggest a PR to change this behavior, could you experts kindly lend an opinion - is any of this somehow intentional? Is there something else I’m missing?
OfekShilon (Ofek Shilon) April 19, 2025, 7:13am 2
MegaIng (Cornelius Krupp) April 19, 2025, 8:37am 3
This is not really fixable because of the way dataclasses are implemented. It’s not really possible for them to implement behavior that happens on inheritance unless you manually call the decorator.
What you are seeing is straight inheritance of these fields without recomputation. So of course changing the order matters - this is true for all class attributes, including methods.
I don’t see a reasonable fix that doesn’t break some other use cases. (The closest I can think of is dataclasses also generating an __init_subclass__ method, but respecting user-defined versions of such methods seems tricky.)
IMO linters and type checkers should warn about these edge cases, but I don’t think anything should change in Python itself.
OfekShilon (Ofek Shilon) April 19, 2025, 9:29am 4
Perhaps dataclass itself should err upon mixing dataclasses and non-dataclasses on the inheritance tree for a specific method/attribute?
JamesParrott (James Parrott) April 19, 2025, 10:14am 5
Does decorating the parent again with @dataclass fix it?
This could be a false positive, as it has zero member variables, so won’t affect the count in its parent.
OfekShilon (Ofek Shilon) April 19, 2025, 10:37am 6
Doesn’t help. The examples already include other workarounds, but maybe the root problem can be addressed - as one can imagine, in real code it wasn’t as direct.
oscarbenjamin (Oscar Benjamin) April 19, 2025, 11:12am 7
There were lengthy discussions related to this in True constructors.
DavidCEllis (David Ellis) April 19, 2025, 11:37am 8
As far as I can tell your examples show the behaviour I’d expect.
This seems to be more to do with forgetting to use the @dataclass decorator, as the dataclass attribute you’re checking isn’t defined on the child class, so it’s looked up on the parents according to the MRO.
In your examples both DC1 and DC2 have __dataclass_fields__. DC1’s is empty and DC2’s has one entry. In C2’s MRO, DC1 comes before DC2, so you get the attribute from DC1, which is empty, while in C3’s MRO, DC2 comes first, so you get its attribute with 1 entry.
When you decorate the child class with @dataclass it does its own resolution, checking for fields from parent classes, and defines its own __dataclass_fields__ so it’s not retrieved from the parents any more.
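The lookup described above can be checked directly. This is a small sketch reusing the DC1, DC2 and C2 names from the original post; the `owner` variable is only there for illustration.

```python
from dataclasses import dataclass

@dataclass
class DC1:
    pass

@dataclass
class DC2:
    my_field: bool = True

class C2(DC1, DC2):  # undecorated child, as in the original post
    pass

# C2 has no __dataclass_fields__ in its own namespace.
print("__dataclass_fields__" in vars(C2))  # False

# Attribute lookup walks the MRO and stops at the first class that defines it.
owner = next(k for k in C2.__mro__ if "__dataclass_fields__" in vars(k))
print(owner.__name__)  # DC1

# So C2 sees the very same (empty) dict object that belongs to DC1.
print(C2.__dataclass_fields__ is DC1.__dataclass_fields__)  # True
```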
You can’t really prevent people from forgetting to use the @dataclass decorator just by virtue of it being a decorator. The class is first constructed as a non-dataclass and then the @dataclass decorator converts it, so a parent class can’t check if a child is a dataclass because initially it won’t be.
If you want to make sure that everything in the inheritance tree is a dataclass your best bet is probably to wrap dataclass in an __init_subclass__ method (providing you don’t use slots=True) along with a check that every class in the MRO (except object) is a dataclass, and then use that class as your base class instead of using @dataclass directly. You should be able to use typing.dataclass_transform with this to make static tools behave.
NeilGirdhar (Neil Girdhar) April 19, 2025, 12:17pm 9
The easier way to do this is to make a Dataclass class that calls dataclass on cls in __init_subclass__, decorate that with dataclass_transform (so that type checkers understand what it does), and inherit from that instead of decorating your classes with @dataclass. This will ensure that all child classes of a dataclass are automatically dataclasses, even if you forget.
Example and complex example.
DavidCEllis (David Ellis) April 19, 2025, 12:32pm 10
Yes, that was what I meant by wrap dataclass in an __init_subclass__ method.
OfekShilon (Ofek Shilon) April 19, 2025, 7:53pm 11
@erictraut would it be feasible for a type checker to alert, say when a non-dataclass has a dataclass on its mro?
erictraut (Eric Traut) April 19, 2025, 9:26pm 12
would it be feasible for a type checker to alert, say when a non-dataclass has a dataclass on its mro?
Are you asking whether it’s possible to implement such a check statically? The answer is yes, a type checker or linter could implement this check statically. Or are you asking whether I think a type checker should implement such a check? The answer is “it depends”. This isn’t really a type checking issue, so type checker maintainers might be reluctant to add such a check. I’m also skeptical that this is a common source of bugs, so taking the time to add such a check (and translating error messages to other languages, etc.) would be difficult to justify without added evidence that this is a common problem.
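For reference, the condition in question can also be tested at runtime. The sketch below is not a type-checker rule, and the helper names has_own_fields and undecorated_children are made up for illustration. It relies on is_dataclass() returning True for subclasses of a dataclass, as the documentation quoted earlier in the thread states.

```python
from dataclasses import dataclass, is_dataclass

def has_own_fields(cls):
    """True only if @dataclass was applied to cls itself."""
    return "__dataclass_fields__" in vars(cls)

def undecorated_children(classes):
    """Classes that look like dataclasses only through inheritance."""
    return [c for c in classes if is_dataclass(c) and not has_own_fields(c)]

@dataclass
class DC1:
    pass

@dataclass
class DC2:
    my_field: bool = True

class C2(DC1, DC2):  # decorator forgotten
    pass

@dataclass
class DC3(DC1, DC2):  # decorated child
    pass

flagged = undecorated_children([DC1, DC2, C2, DC3])
print([c.__name__ for c in flagged])  # ['C2']
```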
OfekShilon (Ofek Shilon) April 20, 2025, 7:54am 13
What you describe are the inner implementation details of dataclass, and how they cause the behavior observed (and thank you for that!). However, that’s still a long way from “the behaviour one would expect” - I doubt any user would expect a change in order of dataclass parents would cause fields to be added or subtracted.
OfekShilon (Ofek Shilon) April 20, 2025, 7:56am 14
How would one go about collecting such evidence? Is there some process in place?
I can say that in our case this was the root cause of some long standing bugs.
Alternatively - would you consider accepting a PR for such an added rule?
erictraut (Eric Traut) April 20, 2025, 8:19am 15
This isn’t really the right forum to be discussing pyright feature requests. Feel free to open a discussion thread or enhancement request in the pyright issue tracker.
DavidCEllis (David Ellis) April 20, 2025, 9:24am 16
dataclasses uses a decorator, so the generation of new methods is not inherited and needs to be applied on each class you wish to have the dataclass methods. This is the only thing that is specific to dataclasses.
The observed behaviour then follows from standard inheritance as it does for hand written classes.
class Base1:
    def __init__(self, arg1="arg1"):
        self.arg1 = arg1

    def __repr__(self):
        return f"{self.__class__.__name__}(arg1={self.arg1!r})"

class Base2:
    def __init__(self):
        pass

    def __repr__(self):
        return f"{self.__class__.__name__}()"

class Child1(Base1, Base2):
    pass

class Child2(Base2, Base1):
    pass

print(Child1())  # Child1(arg1='arg1')
print(Child2())  # Child2()
This is not to say that it’s not easy to forget to place the @dataclass decorator, just that if it is missing then this behaviour is what I would expect.
If you want the dataclass features applied automatically to inheriting classes to avoid this possibility, then you need to make the base class perform the application. Note that you may want to force kw_only=True as the order of arguments will depend on the inheritance order.
from dataclasses import dataclass, field
from typing import dataclass_transform

@dataclass_transform(field_specifiers=(field,))
class DCBase:
    def __init_subclass__(cls, /, **kwargs):
        # optional: check for slots=True in kwargs and error
        dataclass(cls, **kwargs)

class DC1(DCBase):
    arg1: str = "arg1"

class DC2(DCBase):
    arg2: str = "arg2"

class IC1(DC1, DC2):
    pass

class IC2(DC2, DC1):
    pass

print(IC1())  # IC1(arg2='arg2', arg1='arg1')
print(IC2())  # IC2(arg1='arg1', arg2='arg2')
oscarbenjamin (Oscar Benjamin) April 20, 2025, 11:38am 17
If this __init_subclass__ approach is something that is commonly useful then maybe it would be better if the dataclasses module provided this functionality directly.
DavidCEllis (David Ellis) April 20, 2025, 12:42pm 18
One problem with adding an __init_subclass__ approach is that the method is called after the class has been created so it’s too late to add __slots__ by that point. The current decorator can ‘cheat’ by creating an entirely new class that looks like the old one but with slots and returning it, but this doesn’t work inside __init_subclass__ (and has some of its own issues anyway). This is fine in your own code as you can just choose to not support slots but in the stdlib people would probably expect it to work, so this would probably need to be implemented with a metaclass.
In my own dataclass-like package[1] I created both a decorator and a metaclass/base class implementation and while there are some applications where the decorator is more appropriate, most of the time I find I end up using the base class. I would be curious to know if this held true in general in the case that both tools were available from dataclasses.
[1] It’s intended to be more like a construction kit for building dataclass-like tools, but it also includes the ‘prefab’ implementation I built with those tools.