How to handle static inline functions · rust-lang/rust-bindgen · Discussion #2405

Before v0.64.0, the only way to handle static inline functions in bindgen was the --generate-inline-functions option, which generates Rust bindings for these functions. However, this requires the input C library to still expose those function symbols somehow, most likely by compiling the library with inlining disabled, which could be a serious performance issue.

Bindgen v0.64.0 adds an alternative: the --wrap-static-fns-* flags, which generate external wrapper functions for these static inline functions. The user only has to compile the generated wrappers against the header file used as input. For me, the clearest way to explain how this works is with an example.

Let's say we have the following input.h header file:

static inline int inc(int x) { return x + 1; }

static int dec(int x) { return x - 1; }

If we passed this file to bindgen without any flags we would get an empty output:

$ bindgen input.h
/* automatically generated by rust-bindgen 0.64.0 */

However, if we pass the --wrap-static-fns flag we get the following:

$ bindgen --experimental --wrap-static-fns input.h
/* automatically generated by rust-bindgen 0.64.0 */

extern "C" {
    #[link_name = "inc__extern"]
    pub fn inc(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int;
}
extern "C" {
    #[link_name = "dec__extern"]
    pub fn dec(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int;
}

We need to pass the --experimental flag because this feature is incomplete and subject to change. The good news is that we now get Rust bindings for both inc and dec. Additionally, a new C source file is created under the bindgen directory inside your temporary directory (/tmp/bindgen/ on Unix-like systems):

$ cat /tmp/bindgen/extern.c
#include "input.h"

// Static Wrappers

int inc__extern(int x) { return inc(x); }
int dec__extern(int x) { return dec(x); }

These __extern functions are wrappers for the static functions we defined in our input. Now the only thing we need to do is to compile this new extern.c file into a library and include input.h:

$ clang -O -c -o extern.o /tmp/bindgen/extern.c -include input.h
$ objdump -d extern.o

extern.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <inc__extern>:
   0:   8d 47 01                lea    0x1(%rdi),%eax
   3:   c3                      ret
   4:   66 66 66 2e 0f 1f 84    data16 data16 cs nopw 0x0(%rax,%rax,1)
   b:   00 00 00 00 00

0000000000000010 <dec__extern>:
  10:   8d 47 ff                lea    -0x1(%rdi),%eax
  13:   c3                      ret

As we can see, the extern.o object file includes two symbols: inc__extern and dec__extern. These symbols are the ones that will replace inc and dec in our Rust bindings, and that's why both function declarations in the bindings have the #[link_name] attribute overriding the linking name.
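To make the relationship between the original names and the wrapper symbols concrete, here is a minimal sketch that renders the same kind of wrapper text as the extern.c shown above. The helpers `wrapper_symbol` and `c_wrapper` are hypothetical names made up for this illustration; they are not part of bindgen and do not reflect how bindgen generates code internally. The sketch only covers simple functions that return a value.

```rust
/// One parameter of a C function as (type, name), e.g. ("int", "x").
type Param<'a> = (&'a str, &'a str);

/// Build the wrapper symbol name, e.g. "inc" + "__extern" = "inc__extern".
/// This is the name that `#[link_name]` points at in the Rust bindings.
fn wrapper_symbol(name: &str, suffix: &str) -> String {
    format!("{}{}", name, suffix)
}

/// Render a C wrapper that forwards its arguments to the static function
/// `name`. Illustration only: this is not bindgen's code generator.
fn c_wrapper(ret: &str, name: &str, params: &[Param], suffix: &str) -> String {
    // "int x, int y" for the wrapper's signature...
    let decl: Vec<String> = params
        .iter()
        .map(|(ty, id)| format!("{} {}", ty, id))
        .collect();
    // ...and "x, y" for the forwarded call.
    let args: Vec<&str> = params.iter().map(|(_, id)| *id).collect();
    format!(
        "{} {}({}) {{ return {}({}); }}",
        ret,
        wrapper_symbol(name, suffix),
        decl.join(", "),
        name,
        args.join(", ")
    )
}
```

Calling `c_wrapper("int", "inc", &[("int", "x")], "__extern")` yields the same line as the inc__extern wrapper in extern.c.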

We could take different approaches from here, one of them would be turning this object file into a static library:

$ ar rcs libextern.a extern.o

or, if you're on Windows:

$ LIB extern.o /OUT:extern.lib

Now we can link our bindings against this libextern static library from Rust. The same procedure can be done in a build script:

use bindgen::{Builder, CargoCallbacks};
use std::path::PathBuf;
use std::process::Command;

fn main() {
    let input = "input.h";

    // Tell bindgen to generate wrappers for static functions
    // (requires bindgen's `experimental` Cargo feature).
    let bindings = Builder::default()
        .header(input)
        .parse_callbacks(Box::new(CargoCallbacks))
        .wrap_static_fns(true)
        .generate()
        .unwrap();

    let output_path = PathBuf::from(std::env::var("OUT_DIR").unwrap());
    // This is the path to the object file.
    let obj_path = output_path.join("extern.o");
    // This is the path to the static library file.
    #[cfg(not(target_os = "windows"))]
    let lib_path = output_path.join("libextern.a");
    #[cfg(target_os = "windows")]
    let lib_path = output_path.join("libextern.lib");

    // Compile the generated wrappers into an object file.
    // `temp_dir` returns a `PathBuf` directly, so there is nothing to unwrap.
    let clang_output = Command::new("clang")
        .arg("-O")
        .arg("-c")
        .arg("-o")
        .arg(&obj_path)
        .arg(std::env::temp_dir().join("bindgen").join("extern.c"))
        .arg("-include")
        .arg(input)
        .output()
        .unwrap();

    if !clang_output.status.success() {
        panic!(
            "Could not compile object file:\n{}",
            String::from_utf8_lossy(&clang_output.stderr)
        );
    }

    // Turn the object file into a static library
    #[cfg(not(target_os = "windows"))]
    let lib_output = Command::new("ar")
        .arg("rcs")
        .arg(&lib_path)
        .arg(&obj_path)
        .output()
        .unwrap();
    #[cfg(target_os = "windows")]
    let lib_output = Command::new("LIB")
        .arg(&obj_path)
        .arg(format!("/OUT:{}", lib_path.display()))
        .output()
        .unwrap();
    if !lib_output.status.success() {
        panic!(
            "Could not emit library file:\n{}",
            String::from_utf8_lossy(&lib_output.stderr)
        );
    }

    // Tell cargo where the library lives and to link against it statically.
    println!("cargo:rustc-link-search=native={}", output_path.display());
    println!("cargo:rustc-link-lib=static=extern");

    // Write the rust bindings.
    bindings
        .write_to_file(output_path.join("bindings.rs"))
        .expect("Could not write bindings to the Rust file");
}

In either case, you should now be able to call inc and dec from Rust without issue!
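The choice between ar and LIB can also be made at run time instead of with #[cfg] attributes. The following is a minimal sketch under that assumption; `archive_command` is a hypothetical helper name, and the caller decides where the library file goes. It only builds the command, so it can be inspected before anything is executed.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) the command that archives `obj` into the static
/// library `lib`. `target_os` is a string such as "linux" or "windows".
fn archive_command(target_os: &str, obj: &Path, lib: &Path) -> Command {
    if target_os == "windows" {
        // MSVC librarian: LIB <obj> /OUT:<lib>
        let mut cmd = Command::new("LIB");
        cmd.arg(obj).arg(format!("/OUT:{}", lib.display()));
        cmd
    } else {
        // Unix-like systems: ar rcs <lib> <obj>
        let mut cmd = Command::new("ar");
        cmd.arg("rcs").arg(lib).arg(obj);
        cmd
    }
}
```

In a build script, `target_os` could be read from the CARGO_CFG_TARGET_OS environment variable, which Cargo sets to the OS of the compilation target. A #[cfg(target_os = ...)] attribute inside build.rs describes the host instead, which only matters when cross-compiling.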

Using LTO optimizations

If you made it this far, you might have noticed that calling the wrappers is going to be slower than calling the static functions directly, because the Rust compiler cannot inline them. To illustrate this, we will edit the src/lib.rs file so it has the following contents:

mod bindings {
    #![allow(non_upper_case_globals)]
    #![allow(non_camel_case_types)]
    #![allow(non_snake_case)]

    include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
}

#[inline(never)]
#[no_mangle]
pub fn increase(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int {
    unsafe { bindings::inc(x) }
}

#[inline(never)]
#[no_mangle]
pub fn decrease(x: ::std::os::raw::c_int) -> ::std::os::raw::c_int {
    unsafe { bindings::dec(x) }
}

and we will add a src/main.rs file with the following contents:

fn main() {
    assert_eq!(1, playground_bindgen::increase(0));
    assert_eq!(0, playground_bindgen::decrease(1));
}

where playground_bindgen is the name of our crate.

If we compile this crate using cargo build --release and then disassemble the resulting binary using objdump, we will find this:

0000000000008550 <increase>:
    8550:   ff 25 ba 45 04 00       jmp    *0x445ba(%rip)        # 4cb10 <_GLOBAL_OFFSET_TABLE_+0x1d8>
    8556:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    855d:   00 00 00

0000000000008560 <decrease>:
    8560:   ff 25 8a 44 04 00       jmp    *0x4448a(%rip)        # 4c9f0 <_GLOBAL_OFFSET_TABLE_+0xb8>
    8566:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    856d:   00 00 00

In other words, increase and decrease just jump through the global offset table to some other code instead of executing the lea themselves, as inc__extern and dec__extern do.

In order to solve this, we can enable LTO optimizations for our crate. First we need to change the clang invocation so it uses "thin" LTO:

let clang_output = std::process::Command::new("clang")
    .arg("-flto=thin")
    .arg("-O")
    .arg("-c")
    .arg("-o")
    .arg(&obj_path)
    .arg(std::env::temp_dir().join("bindgen").join("extern.c"))
    .arg("-include")
    .arg(input)
    .output()
    .unwrap();
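Since the LTO and non-LTO builds differ only by one flag, the clang invocation can be produced by a single helper. This is a minimal sketch, not part of bindgen: `default_wrapper_path` and `compile_command` are hypothetical names, and the path assumes the default wrapper location described earlier (the bindgen directory inside the system temporary directory).

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Assumed default location of the generated wrappers: the `bindgen`
/// directory inside the system temporary directory.
fn default_wrapper_path() -> PathBuf {
    // `temp_dir` returns a `PathBuf` directly, so there is nothing to unwrap.
    std::env::temp_dir().join("bindgen").join("extern.c")
}

/// Build (but do not run) the clang command that compiles the wrappers
/// in `wrappers` into the object file `obj`, force-including `header`.
fn compile_command(wrappers: &Path, obj: &Path, header: &str, thin_lto: bool) -> Command {
    let mut cmd = Command::new("clang");
    if thin_lto {
        // Ask clang for ThinLTO output so the wrappers can be inlined at link time.
        cmd.arg("-flto=thin");
    }
    cmd.arg("-O")
        .arg("-c")
        .arg("-o")
        .arg(obj)
        .arg(wrappers)
        .arg("-include")
        .arg(header);
    cmd
}
```

A build script could then call `compile_command(&default_wrapper_path(), &obj_path, input, true).output()` and check the exit status as before.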

We must also change the ar invocation (if someone knows the Windows equivalent of this, please let me know):

let lib_output = Command::new("ar")
    .arg("crus")
    .arg(output_path.join("libextern.a"))
    .arg(obj_path)
    .output()
    .unwrap();

Then we must change the Cargo.toml manifest to enable "thin" LTO on the Rust side by adding the following:

[profile.release]
lto = "thin"

Finally we can compile our project with the following RUSTFLAGS:

$ env RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld" cargo build --release

Now, if we check the generated machine code using objdump, we will find this:

000000000004ac10 <increase>:
   4ac10:   8d 47 01    lea    0x1(%rdi),%eax
   4ac13:   c3          ret
   4ac14:   cc          int3
   4ac15:   cc          int3
   4ac16:   cc          int3
   4ac17:   cc          int3
   4ac18:   cc          int3
   4ac19:   cc          int3
   4ac1a:   cc          int3
   4ac1b:   cc          int3
   4ac1c:   cc          int3
   4ac1d:   cc          int3
   4ac1e:   cc          int3
   4ac1f:   cc          int3

000000000004ac20 <decrease>:
   4ac20:   8d 47 ff    lea    -0x1(%rdi),%eax
   4ac23:   c3          ret

Now increase and decrease just do lea and then return!

Customizing the wrappers

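There are additional flags and Builder methods to customize the behavior of this feature: --wrap-static-fns-path (Builder::wrap_static_fns_path) sets the path of the file where the wrappers are written, and --wrap-static-fns-suffix (Builder::wrap_static_fns_suffix) sets the suffix appended to the wrapper names (__extern in the examples above).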

Where's the catch?

The weakest point of this feature is the C/C++ code generation. As of today, we can only generate a subset of C code and we know that this subset is good enough to compile some real-life libraries. However, C++ support is lacking (PRs are welcome!).

If you have any issues with this feature you can open a new issue or discussion and tag me.

Thanks to @JMS55 for the Windows instructions and to @DemiMarie for the LTO suggestion!