Set external linkage on Rust functions defined with foreign ABIs. by miselin · Pull Request #9945 · rust-lang/rust (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation42 Commits0 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
This fixes an issue where extern "C"
functions would have internal linkage, making them invisible to anything that might be linked against the resulting object. In my use case, a Rust function is being called from assembly, and because the Rust symbol did not have the right linkage, I ran into linker errors.
This has also now been applied to bare Rust functions as well as statics that are publicly visible; that is, those which have a fully public path.
This could currently be happening for one of two reasons:
- You're building a binary. All functions are flagged as internal in this case (or at least rust functions, not sure if all functions are caught).
- You're building a library and this function isn't exported in the sense that another crate can use it.
I'm hesitant to land this without any tests. Linkage is something which is becoming quite tricky for us with lots of various cases here and there. I would expose it under a symbol with a particular name and then have that in an extern
block in another test/file to make sure that it links right.
For the test, also be sure that it fails on current master and then passes afterwards. No use in committing tests that always work!
I think that this location for setting the linkage is fine for now. I think in general this needs to be cleaned up (because linkage is set in so many different places), but that's another problem for another time.
Another use case is in building a library that needs to use symbols in the main binary it is linked against. For example, a plugin opened with dlopen might need to find symbols in the main binary. (edit: I just read this again and realised it's basically a roundabout way of saying your first reason up there...)
I have a test that shows the broken behaviour on untouched master and I am just confirming now that this changeset does in fact solve the problem.
In the longer term ideally there'd be a function attribute for this purpose that can define the linkage of the function, such as #[linkage(weak)]
, #[linkage(global)]
, and so on.
I'll bump the next commit through once I'm happy with the test.
@pcmattman: Are you compiling the Rust code with --lib
? If you're linking in extra code, you need to be, even if it defines main
.
@@ -376,6 +376,9 @@ pub fn register_rust_fn_with_foreign_abi(ccx: @mut CrateContext, |
---|
node_id, |
lib::llvm::CCallConv, |
llfn_ty); |
unsafe { |
llvm::LLVMSetLinkage(llfn, lib::llvm::ExternalLinkage as u32); |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this, and let's actually move this to base::get_item_val
and you don't have to use the raw LLVM function (there's a helper lib::SetLinkage
I think)
We currently have linkage sprawling throughout, and good to try to keep it in one place!
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - much nicer in base::get_item_val
. Using the helper now as well.
PR updated and squashed into one commit as well.
eddyb mentioned this pull request
After thinking about this, I'm not convinced that this is the correct approach to take. I think that you should be able to declare internal extern "C"
functions, which this does not allow. On the other hand, I think that our visibility system is an excellent method of exporting functions from a library (or keeping them hidden).
I think that the real bug in this situation is that when you're compiling an executable all functions are flagged as internal except for main
. I think that in this case normally exported extern
functions are incorrectly being flagged as internal.
@alexcrichton: clang
has the same behaviour when compiling with -flto
, extern
is about the ABI not anything to do with visibility/linkage
Agree we should not be setting linkage based on the extern
keyword.
Sure, no worries, I'm happy to consider alternative approaches. In the meantime I can solve the problem in my own project with objcopy --globalize-symbol
, so there's no rush :-).
With respect to the visibility system, with this change applied linkage is overridden to internal (implicitly, eek!) if the function is not visible. That is, if a foreign ABI function is not visible to the "parent" of the crate root, it will be emitted with internal linkage.
An example (bin
crate):
main.rs
pub mod foreign1;
pub mod foreign2;
mod foreign3;
foreign1.rs
extern "C" foreignabi1() -> int {
5
}
foreign2.rs
pub extern "C" foreignabi2() -> int {
5
}
foreign3.rs
pub extern "C" foreignabi3() -> int {
5
}
Only foreign2::foreignabi2
will be linked with external linkage. foreign1::foreignabi1
will be linked with internal linkage as it is not pub
, even though it is extern "C"
. foreign3::foreignabi3
will be linked with internal linkage as it is not visible to main
's parent, even though the function is defined pub extern "C"
.
However, as mentioned this internal linkage override is implicit - this patch does not actually explicitly check function visibility before setting external linkage. It would probably be best to make that behaviour explicit for foreign ABI functions, if it ends up being desirable.
For now I think that our visibility system is sufficient enough to encode what should be exported and what should be kept internal. I thought that the code example you had compiles today, but if it doesn't then that's definitely a bug!
Apologies, my previous comment was not incredibly clear. To clarify what I was trying to describe slightly (also noting that some of the syntax in my previous comment is not correct - more psuedocode-y), without this pull request applied, nm
run on the output of rustc -o main main.rs
gives (on OSX):
0000000100000c00 t _foreignabi1
0000000100000c80 t _foreignabi2
0000000100000d00 t _foreignabi3
0000000100000dc0 T _main
The #[no_mangle]
attribute was applied to this example before building. Note the lowercase t
indicating private.
Now, with this PR applied, nm
outputs:
0000000100000c00 t _foreignabi1
0000000100000c80 T _foreignabi2
0000000100000d00 t _foreignabi3
0000000100000dc0 T _main
So everything still compiles with or without - no bug there :-) - just an issue with symbol linkage.
The extern
marker isn't associated with anything special in terms of linkage. If you want the functions to not be marked private, you can use rustc --lib -c
and then link together an executable with the other parts of the crate. Rust is either building a library crate (many exposed symbols) or a executable crate (just an entry point exposed) - all of the options are still available through that.
If it's private when building a library but the path is fully public, that's a bug we should fix (without making it exposed from an executable).
Good point. No, this is not with building a library, so all behaviour is as-expected and there is not a bug.
I presume foreign ABI functions in a bin crate would be expected to be used as parameters to functions in C libraries as the like, rather than called by plugins or other external entities?
@pcmattman: Yeah, it's just about the ABI. It's confusing that we use the same keyword for declaring blocks of external functions.
We desperately need to offer other forms of building crates (static linking, whole program optimization), but I think exposing a symbol from an executable is always going to be an edge case where you probably end up using external compiler/linker calls anyway to link in whatever is going to call that function.
There is a case where building a binary requires externalizing a symbol, without compiling with -c and linking the object file against some other object file importing Rust functions.
That is, if you don't link against a library providing the mem* functions (memcpy, memmove, memset, etc.), LLVM can still generate calls to those functions.
You can implement them in Rust, but if the symbol is not external, it will either get removed or it won't be recognized by LLVM as the target of those calls.
This is also the only case that can't be solved by moving everything to Rust (to remove the need for exporting Rust symbols to C or assembly).
I don't really understand why it would have to be external for that case. Why wouldn't the calls to memcpy
be able to find an internal definition of it? As a workaround, they could be defined in a crate once we have static linking support.
The current flagging of all functions as internal in a binary could be replaced by reachability-based visibility if it took into account the differences between executables and libraries. In a library, it would have to treat any calls from #[inline]
functions as exposing those functions but not in an executable.
I also don't understand - I'm a bit confused now. It shouldn't be necessary to externalise a symbol such as memcpy
that is used only within the crate as the linker should still be able to use internal symbols at that stage. The root issue that led me to create this PR was making foreign ABI functions available externally in a bin crate - symbol references inside a crate shouldn't be a problem.
Is there a code example that shows a binary failing to build because another symbol inside the crate does not have external linkage?
Also, FWIW objcopy --globalize-symbol
doesn't work to make a symbol external after compiling an object file if it's been removed as 'dead, uncalled, unreferenced' code, because the symbol doesn't exist.
- External linkage is enough to label the code as 'possibly called/referenced', but only makes sense on functions where it is the desired outcome for them to be exported and visible outside the
bin
crate. - clang/gcc offer
__attribute__((used))
for this purpose, which does not affect linkage. This emits a hidden reference (with no side-effects) to the symbol, allowing a function/variable to be maintained and emitted even if it doesn't seem to be referenced, without affecting its visibility.
Ahh, the wonderful rabbit hole of linkages, visibility, and reachability. :-)
I don't think #[used]
is the right solution here either. Fully implemented reachability-based visibility with knowledge of the difference between executables and libraries for dealing with #[inline]
seems best.
mem* wouldn't be reachable without a hack, like what clang does for __attribute__((used))
, because they would be used solely by intrinsic codegen, which comes after the function can get removed.
A similar thing happens when calling a function from inline assembly (asm!("call foobar");
on x86), which is the main use case mentioned in the GCC documentation for __attribute__((used))
.
EDIT: Also see this test from my last PR, it should also work with #[used]
instead of #[linkage(external)]
.
I'm suggesting making fully public paths in an executable external again.
I thought that this was going to make fully-public extern
items external in an executable build? You can't technically interface with a bare fn
that's not extern
because the ABI isn't necessarily stable (I think?)
It's making non-fully-private items external too right now. Until the reachability check actually works and takes into account that #[inline]
in an executable isn't exposed, it takes away the ability to keep symbols internal.
I really don't understand why visibility would be tied to something exposing a C ABI rather than a Rust ABI. All normal crates are exposing Rust ABI functions.
There's nothing stopping interfacing with a Rust ABI function, we just don't guarantee that the ABI remains the same between Rust versions. A library exposing a stable ABI has to go out of the way to make sure none of the types is the signature is ever changed, such as never adding a private field to one.
@thestinger what do you mean by non-fully-private in this context?
As in this patch results in functions with paths that are not fully public being exposed as external symbols. It removes the ability to expose only the entry point.
@thestinger is this in reference to the example in my past comment (at #9945 (comment)), or are there additional cases I haven't considered where functions that are not fully public are exposed by this patch?
mod foo { pub extern "C" fn bar() {} }
pub extern "C" fn baz() { foo::bar() }
fn main() {}
Yup, you're right - after applying #[no_mangle]
to make reading the output from nm
easier:
0000000100000cd0 T _bar
0000000100000d50 T _baz
0000000100000e10 T _main
bar
now has external linkage and it shouldn't.
An example of possible desired output would be:
0000000100000cd0 t _bar
0000000100000d50 T _baz
0000000100000e10 T _main
(as bar
is not fully public)
Okay, I broke the PR with my last push (... figuring that out now... - update: fixed!), but I now have the following output from nm
:
0000000000400840 t bar
00000000004008c0 T baz
0000000000400980 T main
(different addresses as this was done on a different host)
The latest commit adds a compile-fail
test and updates the run-pass
test to handle the addition of bare functions and statics to the set of things that get external linkage when they are publicly visible.
Fixed broken test - looks like my testing of the tests used a stale file? Full testsuite (make check
) is confirmed to pass locally now.
Tests failed as OSX doesn't successfully link libraries unless it can resolve all symbols at library creation time (or it has been passed -undefined dynamic_lookup
or -U <symbolname>
).
The point of failure is in building the auxiliary library rather than actually executing the test proper.
I have removed the compile-fail
test as it seems to only be reliable on Linux (on OSX it breaks in building the auxiliary library rather than in trying to link against it).
It could be a run-fail
test, doing a proper fail!()
in a signal handler for the SIGTRAP the dynamic linker gives? It does this on OSX, not sure what ld.so
does on Linux when this happens.
The run-pass
test remains, and will fail if linkage breaks (in the case of OSX, it won't fail if linkage breaks until #10062 is resolved).
Most recent failure due to an undefined reference to __rust_crate_map_toplevel12743
when building stdtest
on OSX.
@alexcrichton you mentioned yesterday in IRC this happens when crate_map()
is called twice - is that still correct and relevant here, or is the undefined reference an indicator of an issue with this PR applied?
Once by changes land for extracting libuv to another crate, this should be ready for another go (although it will probably need a rebase).
This is now freshly rebased to latest master.
OK after thinking about this, I don't think that this is the right way to do things, and I don't think that the right way to do things will involve this sort of management regardless. I don't think that this should be merged now because I believe that this is going in the opposite direction from where it should.
The results of privacy should not be at all used inside of codegen. The privacy of a function is a difficult idea and with this change you would be forced to decide whether privacy or reachability is your visibility constraint. I still believe that we should have a unified "set the linkage of this LLVM value" function, but that may be very far off. For now, I believe that all of these changes should be moved inside of the reachability phase of the compiler instead of the codegen phase. Additionally, I think the check to see if the compiler is building a library and if so mark everything as internal should be moved to reachability as well. The best way to begin unifying how linkage visibility is determined is to begin using one source of information for visibility, and that source should be reachability.
So to merge this, I would rather see code removed from trans and moved to reachability (with appropriate checks for building a library or not) and then trans should "do the right thing".
Additionally, I'd like to see some more comprehensive tests no matter what comes out of this. I don't think that the way you've set up the tests right now is the best way to do it. It involves weird linker hacks on OSX, won't work at all on windows, and only works by luck on linux. Instead, I would be much more comfortable using the unstable::dynamic_lib
library for determining whether a symbol was exposed or not. This should return null for all internalized symbols and non-null for any symbols which made it through as external.
Does that sound OK with you? I can open an issue for this as well if you don't have time to get around to this. I believe though that this is more future-proof along with moving the code in the compiler to where it should be anyway.
Sounds OK to me - now's the time to get it right, rather than try and fix it later.
I should be able to get around to this, assuming there's no significant urgency (~1-2 weeks worst-case, maybe?). Happy to release to somebody else though, if time is of the essence.
You can certainly claim it! We're certainly in no rush for this, and if you need any help feel free to ping me!
Hm, so github thinks that this pull request has an infinite number of commits. It sounds like this may need to get reopened. (not sure why it's closed in the first place, but something looks awry).
Oooer that does not look right. I must've accidentally pushed origin/master into the remote. I'll have a look at figuring out why this is broken when I pick it back up again.
Should never have done the work on master in the first place :-)
flip1995 pushed a commit to flip1995/rust that referenced this pull request
…ogiq
Re-enable uninlined_format_args
on multiline format!
fix rust-lang/rust-clippy#9719
There was an issue with the code suggestion which can be sometimes completely broken (fortunately when applied it's valid), so we do not show it.
changelog: [uninlined_format_args
] re-enable for multiline format expression, but do not show the literal code suggestion in those cases