bpo-33129: Add kwarg support for dataclass' generated init method by wanderrful · Pull Request #19206 · python/cpython

Hello there! Thanks for reviewing my PR.

Context

The @dataclass feature shines in web development because we need a consistent expectation of the shape of the JSON responses returned by our many APIs, for things like DTOs (data transfer objects). In a microservice or monorepo architecture, consistent DTOs are a must.

For example, consider an API endpoint that returns a user object such as:

```python
from dataclasses import dataclass


@dataclass
class GetUserResponse:
    id: int
    first_name: str
```
This lets us interact with an API response object as though it were an instance of any other class!

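```python
import requests


def get_user(user_id: int) -> GetUserResponse:
    # Fetch the user and unpack the decoded JSON body into the dataclass.
    response_body: dict = requests.get(f'http://foo.bar/api/user/{user_id}').json()
    return GetUserResponse(**response_body)
```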
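To try the pattern without a live service, the sketch below replaces the HTTP call with a hard-coded dictionary standing in for `response_body`. The payload values are made up for illustration; the point is only that `**` unpacks the dict into keyword arguments for the generated `__init__`.

```python
from dataclasses import dataclass


@dataclass
class GetUserResponse:
    id: int
    first_name: str


# Hypothetical payload standing in for the JSON body returned by the API.
response_body = {'id': 42, 'first_name': 'Ada'}

# Each key becomes a keyword argument to the generated __init__.
user = GetUserResponse(**response_body)
print(user.first_name)  # Ada
```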

This is incredibly useful in front-end and back-end web development because we know exactly what we're working with when we interact with other services and apps, such as a microservice in your company's Kubernetes cluster or a third-party API that handles user authentication and feature permissions.

Use Case

However, this wonderful feature currently has a limitation: what if we only care about a subset of the data returned by that API? Or what if the API we are consuming suddenly adds new members to its JSON object (e.g. last_name)?

```
TypeError: __init__() got an unexpected keyword argument 'last_name'
```

If we run our hypothetical software again, we find that because the @dataclass members no longer line up exactly with what the API promised, the program crashes: integration tests fail, acceptance tests fail, the microservice cannot perform in the production environment, people get paged, et cetera.
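The failure can be reproduced offline with a minimal sketch: the dictionary below is a made-up payload that includes one key (`last_name`) the dataclass does not declare. The exact wording of the error differs between Python versions (newer releases prefix the class name), so the code captures the message rather than asserting on it verbatim.

```python
from dataclasses import dataclass


@dataclass
class GetUserResponse:
    id: int
    first_name: str


# The API has started sending a field the dataclass does not declare.
response_body = {'id': 42, 'first_name': 'Ada', 'last_name': 'Lovelace'}

try:
    GetUserResponse(**response_body)
except TypeError as exc:
    # Message mentions the unexpected keyword argument 'last_name';
    # exact wording varies by Python version.
    error_message = str(exc)
    print(error_message)
```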

At the time of writing, one workaround is to use the field(init=False) feature, like so:

```python
from dataclasses import dataclass, field


@dataclass
class GetUserResponse:
    ...
    last_name: str = field(init=False)
```

However, this does not address the pain point, because we still have to update the class preemptively every time the response object changes in how microservices or APIs interact with our software, just to prevent future runtime crashes. It isn't realistic to predict these breaking external changes, even when they come from another team within the same company.

Wouldn't it be great if a @dataclass could just work with only the information we want, without our having to declare every single member of the response body?
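For comparison, one userland workaround that exists today (not part of this PR) is to filter the payload against `dataclasses.fields()` before calling the constructor. The sketch below assumes a helper named `from_dict`, which is a hypothetical name; it drops any key the dataclass does not accept, at the cost of wrapping every construction call.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Type, TypeVar

T = TypeVar('T')


def from_dict(cls: Type[T], data: Mapping[str, Any]) -> T:
    """Build a dataclass instance, dropping keys the class does not declare."""
    # Keep only declared fields that the generated __init__ accepts
    # (InitVar pseudo-fields are not handled by this sketch).
    accepted = {f.name for f in fields(cls) if f.init}
    return cls(**{key: value for key, value in data.items() if key in accepted})


@dataclass
class GetUserResponse:
    id: int
    first_name: str


payload = {'id': 42, 'first_name': 'Ada', 'last_name': 'Lovelace'}
user = from_dict(GetUserResponse, payload)
print(user.first_name)  # Ada
```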

Implementation

This pain point can be addressed with a one-line change in dataclasses.py: adding **kwargs to the generated __init__ method so that it accepts unaccounted-for data members!

With this one-line change, we can use @dataclass much more robustly. In the example above, we can now create a GetUserResponse object that does not store the last_name member. We can still declare it in our class specification if we want it, but our program no longer crashes at runtime just because the shape of the incoming data suddenly changed!
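For illustration only, here is a hand-written class approximating what the generated `__init__` would look like with `**kwargs` appended to its signature. This is a sketch of the idea, not the actual diff to `dataclasses.py`. It relies on the fact that `@dataclass` does not overwrite an `__init__` the class defines itself, while still generating `__repr__` and `__eq__`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class GetUserResponse:
    id: int
    first_name: str

    # Hand-written stand-in for the generated __init__ with **kwargs added.
    # @dataclass keeps an __init__ the class defines itself.
    def __init__(self, id: int, first_name: str, **kwargs: Any) -> None:
        self.id = id
        self.first_name = first_name
        # Unknown keys such as last_name land in kwargs and are discarded.


user = GetUserResponse(**{'id': 42, 'first_name': 'Ada', 'last_name': 'Lovelace'})
print(user.first_name)  # Ada
```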

Alternatives

If changing the generated __init__ method unconditionally is a blocker, we could instead add a layer of indirection with a new @dataclass flag (e.g. @dataclass(strict=False)) that adds **kwargs to the generated __init__ method only when the flag is set. I'll be happy to accommodate that if you'd prefer to go in that direction instead.
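The opt-in behaviour can be approximated in userland to see how it would feel. In the sketch below, `lenient` is a hypothetical decorator name, and `strict=False` is only the flag proposed above; it does not exist in `dataclasses`. Unlike the proposed flag, this version wraps the already-generated `__init__` and filters keyword arguments against its signature instead of changing the generated code.

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Any, Type, TypeVar

T = TypeVar('T')


def lenient(cls: Type[T]) -> Type[T]:
    """Hypothetical stand-in for @dataclass(strict=False): drop unknown kwargs."""
    original_init = cls.__init__
    # Parameter names the generated __init__ accepts, minus 'self'.
    accepted = set(inspect.signature(original_init).parameters) - {'self'}

    @functools.wraps(original_init)
    def __init__(self: Any, *args: Any, **kwargs: Any) -> None:
        # Forward positional arguments unchanged; keep only known keywords.
        known = {key: value for key, value in kwargs.items() if key in accepted}
        original_init(self, *args, **known)

    cls.__init__ = __init__
    return cls


@lenient      # applied second, wraps the generated __init__
@dataclass    # applied first, generates __init__, __repr__, __eq__
class GetUserResponse:
    id: int
    first_name: str


user = GetUserResponse(**{'id': 42, 'first_name': 'Ada', 'last_name': 'Lovelace'})
print(user.first_name)  # Ada
```

Missing required fields still raise `TypeError`, so only the "unexpected keyword" case is relaxed.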

Well, that's my PR. Thanks for reading!

https://bugs.python.org/issue33129