Statically typed and named structs
Are there any libraries that can help define named structs to statically validate code working with struct? I couldn’t find anything myself, and currently unpacking usually results in tuple[Any, ...], which is not very helpful.
from struct import Struct
header = Struct(b"7s1s1s3s")
parsed = header.unpack(data)
reveal_type(parsed) # tuple[Any, ...]
The only mention of a similar idea I’ve found is in the Pyright repo (Unknown type from struct.unpack · Issue #4727 · microsoft/pyright · GitHub), but it’s out of scope for Pyright. It’s been a while, though: maybe @mikeshardmind has seen any workarounds for this?
Thinking about this, the other related issue is having not just statically defined struct types but also names for each field. So maybe both issues would be resolved by some namedstruct function that returns the provided named tuple from Struct.unpack (though admittedly struct.unpack would still require some alternative solution or type checker support). Though the field types won’t be statically tied to their struct format strings, it will still be possible for namedstruct to ensure they match at runtime.
Thoughts on this?
from struct import Struct
from typing import NamedTuple, Annotated
class NamedStructure[T](Struct): ...
class Header(NamedTuple):
    magic: Annotated[bytes, "7s"]
    pointer: Annotated[bytes, "2s"]
    endian: Annotated[bytes, "2s"]
    version: Annotated[bytes, "2s"]
header_struct: NamedStructure[Header] = namedstruct(Header)
test = header_struct.unpack(data)
reveal_type(test) # Header
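For illustration, here is a minimal runtime sketch of what such a namedstruct helper might look like. Nothing here is an existing library API: namedstruct, NamedStructure, and the byte_order parameter are hypothetical names. It wraps a Struct rather than subclassing it, assumes each field carries exactly one format code that yields one value, and only compares the outermost annotated type.

```python
from struct import Struct
from typing import Annotated, Callable, Generic, NamedTuple, TypeVar, get_type_hints

T = TypeVar("T", bound=tuple)


class NamedStructure(Generic[T]):
    """Hypothetical wrapper: a Struct that returns rows as a named tuple type."""

    def __init__(self, row_type: Callable[..., T], fmt: str) -> None:
        self._row_type = row_type
        self._struct = Struct(fmt)

    def unpack(self, data: bytes) -> T:
        return self._row_type(*self._struct.unpack(data))

    def pack(self, row: T) -> bytes:
        return self._struct.pack(*row)


def namedstruct(row_type: type[T], byte_order: str = "") -> NamedStructure[T]:
    """Hypothetical helper: build the format string from Annotated metadata."""
    hints = get_type_hints(row_type, include_extras=True)
    codes = []
    declared = []
    for name, hint in hints.items():  # NamedTuple annotations keep field order
        metadata = getattr(hint, "__metadata__", ())
        if not metadata or not isinstance(metadata[0], str):
            raise TypeError(f"field {name!r} needs a struct format code")
        codes.append(metadata[0])
        declared.append(hint.__origin__)  # bytes for Annotated[bytes, "7s"]
    fmt = byte_order + "".join(codes)

    # Runtime consistency check: unpack zero bytes, compare value types.
    probe = Struct(fmt)
    sample = probe.unpack(bytes(probe.size))
    if len(sample) != len(codes):
        raise TypeError("each field must map to exactly one unpacked value")
    for name, value, expected in zip(hints, sample, declared):
        if not isinstance(value, expected):
            raise TypeError(f"field {name!r}: format yields {type(value).__name__}")
    return NamedStructure(row_type, fmt)


class Header(NamedTuple):
    magic: Annotated[bytes, "7s"]
    pointer: Annotated[bytes, "2s"]
    endian: Annotated[bytes, "2s"]
    version: Annotated[bytes, "2s"]


header_struct = namedstruct(Header)
data = b"MAGIC!!" + b"AB" + b"CD" + b"EF"
header = header_struct.unpack(data)
```

Because unpack is annotated as returning T, a type checker should see header as Header here, while the format check itself only happens at runtime.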
With this, and other situations where something else is enforcing the type (like a database, or a parser that isn’t dynamically typed), there are two options IMO.
- Just use the gradual nature of the type system to your advantage:
import struct
# ensure these are kept in sync
type HeaderType = tuple[bytes, bytes, bytes, bytes]
HEADER_FORMAT = "!7s1s1s3s"
...
parsed: HeaderType = struct.unpack(HEADER_FORMAT, data)
- Use dataclass_transform to write a wrapper around this (I don’t have an example on hand for this, though msgspec can pack/unpack typed structs to/from msgpack, so there’s a lead on it).
It would be neat if Python type checkers understood this without such workarounds, given that these struct format strings often end up as static information, but it’s not part of the type system, so those are the current options.
NeilGirdhar (Neil Girdhar) May 3, 2025, 5:56pm 3
Is this not what protobuf does?
In my opinion? No. That’s not to say protobuf isn’t a good tool (it is), but protobuf involves codegen, usually as a build step, and has a different audience IMO.
As far as type-safety goes, it’s not any safer than the example I wrote above with annotating the right type, so long as there’s any process/CI test/attention whatsoever to that comment about keeping the type in sync[1]
- That is to say, both are type-safe so long as there is good process, and in neither case can a type checker detect when there isn’t. The process fails with protobuf if the stub is wrong (which can happen if you forgot to regenerate it, since protobuf only emits a stub when told to), and with struct if you change the struct format without changing the type.
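As one possible form of the CI test mentioned above, a small check can unpack a zero-filled buffer and compare the resulting value types against the declared tuple type. This is only a sketch: format_matches_type is a hypothetical helper, it handles fixed-length tuples of plain classes only, and it uses a plain alias so that typing.get_args works directly (with a type statement one would inspect HeaderType.__value__ instead).

```python
import struct
from typing import get_args

# ensure these are kept in sync (the test below enforces it)
HeaderType = tuple[bytes, bytes, bytes, bytes]
HEADER_FORMAT = "!7s1s1s3s"


def format_matches_type(fmt: str, tuple_type: object) -> bool:
    """Unpack zero bytes and compare each value's type to the declared one."""
    sample = struct.unpack(fmt, bytes(struct.calcsize(fmt)))
    declared = get_args(tuple_type)
    return len(sample) == len(declared) and all(
        isinstance(value, expected) for value, expected in zip(sample, declared)
    )


def test_header_type_in_sync() -> None:
    assert format_matches_type(HEADER_FORMAT, HeaderType)
```

Run under pytest or any similar runner, this fails as soon as the format string and the alias drift apart in field count or basic field type.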
antonagestam (Anton Agestam) May 3, 2025, 10:23pm 5
There was a nice talk by Stephanie Weirich that I saw a few years back, about a language extension in Haskell that added type system knowledge of regular expressions. Conceptually I think this problem is similar to that, although a lot simpler: there’s a statically known string literal whose value, when used in a certain context, affects a function’s return type.
I think it would be very interesting to see such avenues explored for Python typing.
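The closest approximation available in Python typing today is probably to enumerate the known format literals by hand with overloads. The sketch below shows that pattern; typed_unpack is a hypothetical wrapper, and unlike the Haskell extension it derives nothing from the literal, so every format has to be listed explicitly.

```python
import struct
from typing import Any, Literal, overload


@overload
def typed_unpack(
    fmt: Literal["!7s1s1s3s"], data: bytes
) -> tuple[bytes, bytes, bytes, bytes]: ...
@overload
def typed_unpack(fmt: Literal["!IH"], data: bytes) -> tuple[int, int]: ...
@overload
def typed_unpack(fmt: str, data: bytes) -> tuple[Any, ...]: ...
def typed_unpack(fmt: str, data: bytes) -> tuple[Any, ...]:
    # At runtime this is just struct.unpack; the overloads exist for checkers.
    return struct.unpack(fmt, data)


header = typed_unpack("!7s1s1s3s", b"MAGIC!!\x01\x021.0")
counters = typed_unpack("!IH", b"\x00\x00\x00\x01\x00\x02")
```

A type checker should pick the matching overload for each literal and fall back to tuple[Any, ...] for any other string.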