How to handle static inline functions · rust-lang/rust-bindgen · Discussion #2405
Before v0.64.0 was released, the only way to handle static inline functions in bindgen was the --generate-inline-functions option, which generates Rust bindings for these functions. However, that meant that the input C library still had to expose those function symbols somehow, most likely by compiling the library with inlining disabled, which could be a serious performance issue.
With the new bindgen version there is an alternative: the --wrap-static-fns-* flags, which generate external wrapper functions for these static inline functions. The user only has to compile these generated wrappers against the header file used as input. For me, the clearest way to explain how this works is with an example.
Let's say we have the following input.h header file:
static inline int inc(int x) { return x + 1; }
static int dec(int x) { return x - 1; }
If we passed this file to bindgen without any flags we would get an empty output:
$ bindgen input.h
/* automatically generated by rust-bindgen 0.64.0 */
However, if we pass the --wrap-static-fns flag we get the following:
$ bindgen --experimental --wrap-static-fns input.h
/* automatically generated by rust-bindgen 0.64.0 */

extern "C" {
    #[link_name = "inc__extern"]
    pub fn inc(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int;
}
extern "C" {
    #[link_name = "dec__extern"]
    pub fn dec(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int;
}
We need to pass the --experimental flag because this feature is not complete and is prone to change. The good news is that we now get Rust bindings for both inc and dec. Additionally, a new C source file is created under the bindgen directory inside your temporary folder (/tmp/bindgen/ if you're on a Unix-like system):
$ cat /tmp/bindgen/extern.c
#include "input.h"

// Static Wrappers

int inc__extern(int x) { return inc(x); }
int dec__extern(int x) { return dec(x); }
These __extern functions are wrappers for the static functions we defined in our input. Now the only thing we need to do is compile this new extern.c file into a library, including input.h:
$ clang -O -c -o extern.o /tmp/bindgen/extern.c -include input.h
$ objdump -d extern.o

extern.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <inc__extern>:
   0: 8d 47 01                lea    0x1(%rdi),%eax
   3: c3                      ret
   4: 66 66 66 2e 0f 1f 84    data16 data16 cs nopw 0x0(%rax,%rax,1)
   b: 00 00 00 00 00

0000000000000010 <dec__extern>:
  10: 8d 47 ff                lea    -0x1(%rdi),%eax
  13: c3                      ret
As we can see, the extern.o object file includes two symbols: inc__extern and dec__extern. These symbols are the ones that will replace inc and dec in our Rust bindings, which is why both function declarations in the bindings have the #[link_name] attribute overriding the link name.
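To see what #[link_name] does in isolation, here is a minimal sketch that needs no C toolchain. A Rust function exported under the symbol inc__extern stands in for the compiled wrapper; that stand-in is an assumption made purely for illustration, since in the real setup the symbol comes from extern.c. The sketch uses the unsafe extern and #[unsafe(no_mangle)] spellings available since Rust 1.82; on older toolchains the plain spellings shown in the generated bindings apply.

```rust
use std::os::raw::c_int;

// Stand-in for the C wrapper from extern.c (illustrative assumption):
// exports an unmangled symbol named `inc__extern`.
#[unsafe(no_mangle)]
pub extern "C" fn inc__extern(x: c_int) -> c_int {
    x + 1
}

// Same shape as the generated bindings: the Rust-side name is `inc`,
// but the linker resolves it to the `inc__extern` symbol.
unsafe extern "C" {
    #[link_name = "inc__extern"]
    pub fn inc(x: c_int) -> c_int;
}

fn main() {
    // Calling `inc` ends up in `inc__extern` because of `#[link_name]`.
    let result = unsafe { inc(41) };
    println!("{result}");
}
```

The Rust code never mentions inc__extern at the call site; only the declaration carries the real symbol name, which is what lets bindgen keep the original C names in the bindings.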
We could take different approaches from here. One of them is turning this object file into a static library:
$ ar rcs libextern.a extern.o
or, if you're on Windows:
$ LIB extern.o /OUT:extern.lib
And now we can link our bindings against this libextern static library with Rust. The same procedure can be done in a build script:
use bindgen::{Builder, CargoCallbacks};
use std::path::PathBuf;
use std::process::Command;

fn main() {
    let input = "input.h";

    // Tell bindgen to generate wrappers for static functions
    let bindings = Builder::default()
        .header(input)
        .parse_callbacks(Box::new(CargoCallbacks))
        .wrap_static_fns(true)
        .generate()
        .unwrap();

    let output_path = PathBuf::from(std::env::var("OUT_DIR").unwrap());

    // This is the path to the object file.
    let obj_path = output_path.join("extern.o");
    // This is the path to the static library file.
    #[cfg(not(target_os = "windows"))]
    let lib_path = output_path.join("libextern.a");
    #[cfg(target_os = "windows")]
    let lib_path = output_path.join("libextern.lib");

    // Compile the generated wrappers into an object file.
    // Note that `std::env::temp_dir` returns a `PathBuf` directly.
    let clang_output = Command::new("clang")
        .arg("-O")
        .arg("-c")
        .arg("-o")
        .arg(&obj_path)
        .arg(std::env::temp_dir().join("bindgen").join("extern.c"))
        .arg("-include")
        .arg(input)
        .output()
        .unwrap();

    if !clang_output.status.success() {
        panic!(
            "Could not compile object file:\n{}",
            String::from_utf8_lossy(&clang_output.stderr)
        );
    }

    // Turn the object file into a static library
    #[cfg(not(target_os = "windows"))]
    let lib_output = Command::new("ar")
        .arg("rcs")
        .arg(&lib_path)
        .arg(&obj_path)
        .output()
        .unwrap();
    #[cfg(target_os = "windows")]
    let lib_output = Command::new("LIB")
        .arg(&obj_path)
        .arg(format!("/OUT:{}", lib_path.display()))
        .output()
        .unwrap();

    if !lib_output.status.success() {
        panic!(
            "Could not emit library file:\n{}",
            String::from_utf8_lossy(&lib_output.stderr)
        );
    }

    // Tell cargo where the library lives and to statically link against
    // the `libextern` static library.
    println!("cargo:rustc-link-search=native={}", output_path.display());
    println!("cargo:rustc-link-lib=static=extern");

    // Write the Rust bindings.
    bindings
        .write_to_file(output_path.join("bindings.rs"))
        .expect("Could not write bindings to the Rust file");
}
In either case, you should now be able to call inc and dec from Rust without issue!
Using LTO optimizations
If you made it to this point you might have noticed that using the wrappers for static functions is going to be less performant, simply because those functions are not being inlined by the Rust compiler. To illustrate this, we will edit the src/lib.rs file so it has the following contents:
mod bindings {
    #![allow(non_upper_case_globals)]
    #![allow(non_camel_case_types)]
    #![allow(non_snake_case)]

    include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
}

#[inline(never)]
#[no_mangle]
pub fn increase(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int {
    unsafe { bindings::inc(x) }
}

#[inline(never)]
#[no_mangle]
pub fn decrease(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int {
    unsafe { bindings::dec(x) }
}
and we will add a src/main.rs file with the following contents:
fn main() {
    assert_eq!(1, playground_bindgen::increase(0));
    assert_eq!(0, playground_bindgen::decrease(1));
}
where playground_bindgen is the name of our crate.
If we compile this crate using cargo build --release and then disassemble the resulting binary using objdump, we will find this:
0000000000008550 :
    8550: ff 25 ba 45 04 00       jmp    *0x445ba(%rip)   # 4cb10 <_GLOBAL_OFFSET_TABLE_+0x1d8>
    8556: 66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    855d: 00 00 00

0000000000008560 :
    8560: ff 25 8a 44 04 00       jmp    *0x4448a(%rip)   # 4c9f0 <_GLOBAL_OFFSET_TABLE_+0xb8>
    8566: 66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    856d: 00 00 00
Basically, increase and decrease are just jumping somewhere else instead of doing lea as inc__extern and dec__extern do.
In order to solve this, we can enable LTO for our crate. First we need to change the clang invocation so it uses "thin" LTO:
let clang_output = std::process::Command::new("clang")
    .arg("-flto=thin")
    .arg("-O")
    .arg("-c")
    .arg("-o")
    .arg(&obj_path)
    // `std::env::temp_dir` returns a `PathBuf` directly, so there is nothing to unwrap.
    .arg(std::env::temp_dir().join("bindgen").join("extern.c"))
    .arg("-include")
    .arg(input)
    .output()
    .unwrap();
We must also change the ar invocation (if someone knows the Windows equivalent of this, please let me know):
let lib_output = std::process::Command::new("ar")
    .arg("crus")
    .arg(&lib_path)
    .arg(&obj_path)
    .output()
    .unwrap();
Then we must change the Cargo.toml manifest to enable "thin" LTO from the Rust side by adding the following:
[profile.release]
lto = "thin"
Finally we can compile our project with the following RUSTFLAGS:
$ env RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld" cargo build --release
Now if we check the generated machine code using objdump, we will find this:
000000000004ac10 <increase>:
   4ac10: 8d 47 01    lea    0x1(%rdi),%eax
   4ac13: c3          ret
   4ac14: cc          int3
   4ac15: cc          int3
   4ac16: cc          int3
   4ac17: cc          int3
   4ac18: cc          int3
   4ac19: cc          int3
   4ac1a: cc          int3
   4ac1b: cc          int3
   4ac1c: cc          int3
   4ac1d: cc          int3
   4ac1e: cc          int3
   4ac1f: cc          int3

000000000004ac20 <decrease>:
   4ac20: 8d 47 ff    lea    -0x1(%rdi),%eax
   4ac23: c3          ret
Here increase and decrease just do lea and then return!
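The plain and LTO clang invocations differ only in the -flto=thin flag, so a build script could keep a single code path for both. The following is a rough sketch of that idea; clang_command and default_wrapper_source are hypothetical helper names, not part of bindgen, and the command is only built and inspected here, never executed.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Default location of the generated wrappers as described in this post.
/// `std::env::temp_dir` returns a `PathBuf` directly.
fn default_wrapper_source() -> PathBuf {
    std::env::temp_dir().join("bindgen").join("extern.c")
}

/// Hypothetical helper: builds (but does not run) the clang invocation
/// used to compile the wrappers, optionally with thin LTO enabled.
fn clang_command(source: &Path, header: &str, object: &Path, thin_lto: bool) -> Command {
    let mut cmd = Command::new("clang");
    if thin_lto {
        cmd.arg("-flto=thin");
    }
    cmd.arg("-O")
        .arg("-c")
        .arg("-o")
        .arg(object)
        .arg(source)
        .arg("-include")
        .arg(header);
    cmd
}

fn main() {
    let source = default_wrapper_source();
    let cmd = clang_command(&source, "input.h", Path::new("extern.o"), true);
    // Print the arguments instead of spawning clang.
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("clang {:?}", args);
}
```

In a real build script the thin_lto argument could be driven by an environment variable or a Cargo feature of your choosing, and the returned Command would be run with .output() as in the script above.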
Customizing the wrappers
There are additional flags/methods to customize the behavior of this feature:
- You can change the path of the wrapper functions file by using the --wrap-static-fns-path flag. You should not try to set the extension of this file, as bindgen will automatically infer whether this should be a C or C++ source file.
- You can change the __extern suffix used for wrapper functions by using the --wrap-static-fns-suffix flag. This is useful if for some reason there are name collisions with the default suffix.
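To make the effect of the suffix concrete, here is a small sketch that renders wrappers in the same shape as the extern.c listing earlier in this post. It is not bindgen's actual code generator: wrapper_name and c_wrapper are hypothetical helpers, and the _wrapped suffix is just an example value.

```rust
/// Hypothetical helper: the symbol name a wrapper gets for a given suffix.
fn wrapper_name(function: &str, suffix: &str) -> String {
    format!("{function}{suffix}")
}

/// Hypothetical helper: renders a C wrapper that forwards to `name`.
/// `params` is a list of (C type, parameter name) pairs.
fn c_wrapper(ret: &str, name: &str, params: &[(&str, &str)], suffix: &str) -> String {
    let decl = params
        .iter()
        .map(|(ty, param)| format!("{ty} {param}"))
        .collect::<Vec<_>>()
        .join(", ");
    let args = params
        .iter()
        .map(|(_, param)| *param)
        .collect::<Vec<_>>()
        .join(", ");
    let wrapper = wrapper_name(name, suffix);
    format!("{ret} {wrapper}({decl}) {{ return {name}({args}); }}")
}

fn main() {
    // Default suffix, matching the extern.c listing.
    println!("{}", c_wrapper("int", "inc", &[("int", "x")], "__extern"));
    // A custom suffix changes the wrapper (and link) name, not the call it forwards to.
    println!("{}", c_wrapper("int", "dec", &[("int", "x")], "_wrapped"));
}
```

Whatever suffix is chosen, the #[link_name] in the Rust bindings has to match the wrapper name, which bindgen takes care of when it generates both files.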
Where's the catch?
The weakest point of this feature is the C/C++ code generation. As of today, we can only generate a subset of C code and we know that this subset is good enough to compile some real-life libraries. However, C++ support is lacking (PRs are welcome!).
If you have any issues with this feature you can open a new issue or discussion and tag me.
Thanks to @JMS55 for the Windows instructions and to @DemiMarie for the LTO suggestion!