[RFC] Revamp interface for enum representations · Issue #3050 · rust-lang/rust-bindgen
- Feature Name: enum_style
- Start Date: 2024-12-09
- Issue: [RFC] Revamp interface for enum representations #3050
Summary
Introduce a new, extensible interface to pick multiple enum representations for the same type. In a nutshell, we would introduce a new --enum-style CLI argument with its equivalent Builder method, which would supersede the existing --constified-enum-module, --bitfield-enum, --newtype-enum, --newtype-global-enum, --rustified-enum and --rustified-non-exhaustive-enum CLI arguments and their respective Builder method counterparts.
This interface would allow users to pick multiple enum representations for a single C enum and to enable different features for each representation.
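To make the "supersede" claim concrete, here is a rough sketch of how the existing flags might map onto the EnumRepresentation type proposed in the guide-level explanation. The function name legacy_flag_to_repr is hypothetical and is not part of bindgen. The mapping assumes each old flag keeps its current meaning, and it also covers --constified-enum, which is mentioned later in this RFC.

```rust
/// Mirror of the `EnumRepresentation` enum proposed in this RFC.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnumRepresentation {
    Rust { non_exhaustive: bool },
    NewType { bitfield: bool, global: bool },
    Const { module: bool },
}

/// Hypothetical helper: translate a legacy CLI flag into the representation
/// it would correspond to under the proposed interface.
pub fn legacy_flag_to_repr(flag: &str) -> Option<EnumRepresentation> {
    Some(match flag {
        "--rustified-enum" => EnumRepresentation::Rust { non_exhaustive: false },
        "--rustified-non-exhaustive-enum" => EnumRepresentation::Rust { non_exhaustive: true },
        "--newtype-enum" => EnumRepresentation::NewType { bitfield: false, global: false },
        "--newtype-global-enum" => EnumRepresentation::NewType { bitfield: false, global: true },
        "--bitfield-enum" => EnumRepresentation::NewType { bitfield: true, global: false },
        "--constified-enum" => EnumRepresentation::Const { module: false },
        "--constified-enum-module" => EnumRepresentation::Const { module: true },
        // Anything else is not one of the legacy enum flags.
        _ => return None,
    })
}
```

Seven flags collapse into three variants with boolean features, which is the reduction in surface area the RFC is after.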
Motivation
The main motivation is the lack of a "silver bullet" representation for C enums in Rust.
Currently we have two upcoming enum representations (#2908 and #2980) that interact with the existing representations in non-trivial ways. In particular, #2908 introduces extensions to the existing --rustified-enum representation; these extensions generate safe and unsafe conversions between the C enum values and the "rustified" enum values to avoid unsoundness issues.
At the same time, the existing interface has become increasingly bloated, as each new representation requires the addition of a new CLI flag and method, even when it's essentially an old representation with just an extra feature. An example of this is --newtype-enum and --newtype-global-enum, where the only difference is the namespacing of the constants for each variant.
Guide-level explanation
Bindgen can map C/C++ enums into Rust in different ways. The way bindgen maps enums can be customized using the Builder::enum_style method, which receives a sequence of EnumRepresentations and a regex pattern:
impl Builder {
    /// Apply the provided representations to the C enums whose name matches
    /// the provided regex pattern.
    pub fn enum_style<I, P>(
        mut self,
        representations: I,
        pattern: P,
    ) -> Self
    where
        I: IntoIterator<Item = EnumRepresentation>,
        P: AsRef<str>;
}
/// This is just `EnumVariation` with a new name for clarity.
pub enum EnumRepresentation {
    /// Represent a C enum using a Rust enum.
    Rust {
        /// Indicates whether the Rust enum should be `#[non_exhaustive]`.
        non_exhaustive: bool,
    },
    /// Represent a C enum using a newtype over the enum's ctype.
    NewType {
        /// Indicates whether the newtype will have bitwise operators.
        bitfield: bool,
        /// Indicates if the variants will be represented as global
        /// constants instead of being inside an `impl` block of the newtype.
        global: bool,
    },
    /// Represent a C enum using a ctype constant for each variant.
    Const {
        /// The generated constants will be inside a module with the same name
        /// as the enum.
        module: bool,
    },
}
When this method is used, bindgen will generate every provided representation for each C enum whose name matches the provided regex pattern.
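As an illustration of how a call site might look, the sketch below uses a stand-in Builder that only records what was requested. It is not bindgen's Builder: the enum_styles field, the example function and the ".*Flags" pattern are made up for this example.

```rust
/// Mirror of the `EnumRepresentation` enum proposed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnumRepresentation {
    Rust { non_exhaustive: bool },
    NewType { bitfield: bool, global: bool },
    Const { module: bool },
}

/// Stand-in for `bindgen::Builder` (assumption): it only stores the styles.
#[derive(Debug, Default)]
pub struct Builder {
    /// One entry per `enum_style` call: (representations, regex pattern).
    pub enum_styles: Vec<(Vec<EnumRepresentation>, String)>,
}

impl Builder {
    /// Same signature as the proposed method, with a trivial body.
    pub fn enum_style<I, P>(mut self, representations: I, pattern: P) -> Self
    where
        I: IntoIterator<Item = EnumRepresentation>,
        P: AsRef<str>,
    {
        self.enum_styles.push((
            representations.into_iter().collect(),
            pattern.as_ref().to_owned(),
        ));
        self
    }
}

/// Request a non-exhaustive Rust enum plus module-scoped constants for `Foo`,
/// and a bitfield newtype for every enum whose name ends in `Flags`.
pub fn example() -> Builder {
    Builder::default()
        .enum_style(
            [
                EnumRepresentation::Rust { non_exhaustive: true },
                EnumRepresentation::Const { module: true },
            ],
            "Foo",
        )
        .enum_style(
            [EnumRepresentation::NewType { bitfield: true, global: false }],
            ".*Flags",
        )
}
```

Because the method takes any IntoIterator, a single call can carry one representation or several, which is what distinguishes it from the existing one-method-per-representation interface.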
This interface has a CLI equivalent, the --enum-style argument, which takes values of the form <REPRS>=<REGEX>, where <REGEX> is a regex pattern and <REPRS> is a comma-separated sequence of enum representations. Each enum representation consists of a name optionally followed by a parenthesized, comma-separated list of features:
- rust(non_exhaustive?): See EnumRepresentation::Rust.
- newtype(bitfield?, global?): See EnumRepresentation::NewType.
- const(module?): See EnumRepresentation::Const.
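One possible way to parse such an argument is sketched below. It assumes that features are written in parentheses after the name, that an omitted feature means false, and that the first = separates <REPRS> from <REGEX>. The function names are hypothetical and the regex is returned as a plain string without being compiled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnumRepresentation {
    Rust { non_exhaustive: bool },
    NewType { bitfield: bool, global: bool },
    Const { module: bool },
}

/// Parse one `<REPRS>=<REGEX>` value into its representations and pattern.
pub fn parse_enum_style(arg: &str) -> Result<(Vec<EnumRepresentation>, String), String> {
    // Names and features never contain `=`, so the first `=` ends `<REPRS>`;
    // any later `=` belongs to the regex.
    let (reprs, pattern) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected <REPRS>=<REGEX>, got `{}`", arg))?;
    let reprs = split_top_level(reprs)
        .into_iter()
        .map(parse_repr)
        .collect::<Result<Vec<_>, _>>()?;
    Ok((reprs, pattern.to_owned()))
}

/// Split on commas that are not inside parentheses.
fn split_top_level(s: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let (mut depth, mut start) = (0usize, 0usize);
    for (i, c) in s.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(&s[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&s[start..]);
    parts
}

/// Parse a single `name` or `name(feature, ...)` item.
fn parse_repr(item: &str) -> Result<EnumRepresentation, String> {
    let item = item.trim();
    let (name, features) = match item.split_once('(') {
        Some((name, rest)) => {
            let inner = rest
                .strip_suffix(')')
                .ok_or_else(|| format!("missing `)` in `{}`", item))?;
            (name.trim(), inner)
        }
        None => (item, ""),
    };
    let features: Vec<&str> = features
        .split(',')
        .map(str::trim)
        .filter(|f| !f.is_empty())
        .collect();
    // Features accepted by each representation name.
    let allowed: &[&str] = match name {
        "rust" => &["non_exhaustive"],
        "newtype" => &["bitfield", "global"],
        "const" => &["module"],
        _ => return Err(format!("unknown representation `{}`", name)),
    };
    if let Some(bad) = features.iter().find(|f| !allowed.contains(*f)) {
        return Err(format!("unknown feature `{}` for `{}`", bad, name));
    }
    let has = |f: &str| features.contains(&f);
    Ok(match name {
        "rust" => EnumRepresentation::Rust { non_exhaustive: has("non_exhaustive") },
        "newtype" => EnumRepresentation::NewType {
            bitfield: has("bitfield"),
            global: has("global"),
        },
        // Only "const" can reach this arm; unknown names returned above.
        _ => EnumRepresentation::Const { module: has("module") },
    })
}
```

Splitting only on top-level commas is what lets newtype(bitfield, global) stay a single item even though <REPRS> itself is comma-separated.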
Reference-level explanation
This feature would be fairly self-contained, and its only interaction would be with the existing enum representation features.
Internally, the RegexSets for each enum representation would still be stored separately. However, a declarative macro would be used to generate both EnumRepresentation and a constant slice with all the possible values that EnumRepresentation could have. With the current representations that would be:
const ALL_REPRS: &[EnumRepresentation] = &[
    Rust { non_exhaustive: false },
    Rust { non_exhaustive: true },
    NewType { bitfield: false, global: false },
    NewType { bitfield: true, global: false },
    NewType { bitfield: false, global: true },
    NewType { bitfield: true, global: true },
    Const { module: false },
    Const { module: true },
];
This constant would allow us to generate another slice of type &[(EnumRepresentation, RegexSet)], which would replace all the existing fields of BindgenOptions related to enum representation, as iterating over it would allow us to choose the right representation for each enum.
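The lookup described here could look roughly like the sketch below. To keep it dependency-free, bindgen's internal RegexSet is replaced by a hypothetical NameSet that matches exact names only. EnumStyles and its methods are likewise invented for illustration, and ALL_REPRS is written by hand rather than generated by a macro.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnumRepresentation {
    Rust { non_exhaustive: bool },
    NewType { bitfield: bool, global: bool },
    Const { module: bool },
}

/// Stand-in for `RegexSet` (assumption): exact name matching only.
#[derive(Debug, Default)]
pub struct NameSet(Vec<String>);

impl NameSet {
    pub fn insert(&mut self, name: &str) {
        self.0.push(name.to_owned());
    }
    pub fn matches(&self, name: &str) -> bool {
        self.0.iter().any(|n| n == name)
    }
}

/// Every value `EnumRepresentation` can take, as listed in the RFC.
pub const ALL_REPRS: &[EnumRepresentation] = &[
    EnumRepresentation::Rust { non_exhaustive: false },
    EnumRepresentation::Rust { non_exhaustive: true },
    EnumRepresentation::NewType { bitfield: false, global: false },
    EnumRepresentation::NewType { bitfield: true, global: false },
    EnumRepresentation::NewType { bitfield: false, global: true },
    EnumRepresentation::NewType { bitfield: true, global: true },
    EnumRepresentation::Const { module: false },
    EnumRepresentation::Const { module: true },
];

/// One pattern set per possible representation, built from `ALL_REPRS`.
pub struct EnumStyles(Vec<(EnumRepresentation, NameSet)>);

impl EnumStyles {
    pub fn new() -> Self {
        EnumStyles(ALL_REPRS.iter().map(|&r| (r, NameSet::default())).collect())
    }

    /// Register `name` under the set that belongs to `repr`.
    pub fn add(&mut self, repr: EnumRepresentation, name: &str) {
        for (r, set) in self.0.iter_mut() {
            if *r == repr {
                set.insert(name);
            }
        }
    }

    /// Collect every representation whose set matches the enum name,
    /// in the order the representations appear in `ALL_REPRS`.
    pub fn representations_for(&self, enum_name: &str) -> Vec<EnumRepresentation> {
        self.0
            .iter()
            .filter(|(_, set)| set.matches(enum_name))
            .map(|(r, _)| *r)
            .collect()
    }
}
```

Under this layout, adding a feature or a representation means extending ALL_REPRS; the lookup code itself does not need to change.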
With this approach, adding a new feature to an existing representation or adding a new representation would require fewer changes and should be easier to maintain.
Drawbacks
The main drawback is the fact that this is a breaking change, as it would deprecate the existing interface for enum representation.
Another drawback is related to allowing multiple representations for a single C enum, as the current behavior of bindgen is to choose one documented option by default. That is, if the user calls bindgen with --rustified-enum Foo and --constified-enum Foo, only the Rust representation for Foo will be chosen by bindgen. With the new interface, --enum-style rust,const=Foo would generate both representations, which is a breaking change and might cause unexpected behavior for users who rely on bindgen choosing one of the two.
Additionally, allowing multiple representations for a single C enum would make bindgen more likely to generate invalid Rust code. For example, calling bindgen with --enum-style rust,newtype=Foo would produce both a Rust enum and a newtype for Foo, which would cause a name collision.
Finally, the heavy reliance on macros makes this code less intuitive for new contributors.
Rationale and alternatives
An alternative would be simply to not implement this RFC, which would keep the enum representation interface prone to bloat and increasingly difficult to maintain.
Unresolved questions
Currently it is not clear if we should prevent the generation of invalid Rust code by adding extra checks which guarantee that incompatible representations won't be used for the same C enum.
Future possibilities
The main advantage of this design is how easily it can be extended with new representations or new features. Examples of this are #2908 and #2980, which could be integrated by adding new fields to EnumRepresentation::Rust and by adding a new variant to EnumRepresentation, respectively.
This interface would be easily representable if bindgen were to adopt a configuration format like TOML, as enum styles could be represented by arrays:
[[enum-style]]
pattern = "foo"
representations = ["rust", "const"]