GitHub - njsmith/machomachomangler: Tools for mangling Mach-O and PE binaries

Mach-O Mach-O Mangler


This is a little library for mangling Mach-O and PE files in various ways. These are the formats used for executables and shared libraries on MacOS and Windows, respectively. (If you want the equivalent for Linux, then check out patchelf.)

Mach-O features

Some rather specialized (and complex) Mach-O mangling tools designed to support the pynativelib proposal to allow native libraries to be distributed as standalone wheel files. Specifically this includes:

It turns out that this exact combination of things is the only way provided for by the MacOS linker/loader to have dylib/bundle A linked against dylib B where the relative on-disk location of A and B is not known until after the executable starts, while preserving the usual two-level namespace rules for avoiding symbol collisions. I promise it will all make sense once I have a chance to write it up properly...

Some known limitations of the Mach-O mangling code:

PE features

A tool that can read in a PE file (.exe or .dll) that is currently linked to foo.dll, and rewrite it so that it becomes linked to bar.dll instead (similar to patchelf --replace-needed on Linux, or install_name_tool -change on OS X). This is useful for avoiding naming collisions between different versions of the same library.

For example, suppose you have two Python extensions, A.dll and B.dll, that are distributed separately by different people. They both contain some Fortran code linked to libgfortran-3.dll, so both packages ship a copy of libgfortran-3.dll. Because of the way Windows DLL loading works, what will happen is that if I load A.dll first, then both A.dll and B.dll will end up using A's copy of libgfortran-3.dll, while B's copy will be ignored. (Or vice versa if I import B first.) This will happen even if I arrange things so that A's copy is not on the DLL search path at the time that B is loaded -- Windows always checks for already-loaded DLLs with a given basename before it actually checks the DLL search path (modulo some complications around SxS assemblies, but you don't really want to go there).

This is bad, because there's no guarantee that B.dll will work with A's version of libgfortran-3.dll (e.g., A's copy might be too old for B). Welcome to DLL hell!

We could avoid all this by renaming the colliding libraries to have different names, e.g. libgfortran-3-for-A.dll and libgfortran-3-for-B.dll. But if we just rename the files, then everything will break, because A.dll is looking for libgfortran-3.dll, not libgfortran-3-for-A.dll.

This is where machomachomangler comes in: it lets you patch A.dll so that it's linked to libgfortran-3-for-A.dll. And then everything works. Hooray.
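The collision and the renaming fix described above can be illustrated with a toy model. This is only a sketch of the "already-loaded basename wins" rule, not of the real Windows loader (which also deals with search paths, SxS assemblies and much more); the `ToyLoader` class, the example paths, and the case-insensitive matching are assumptions made for illustration.

```python
import ntpath  # Windows-style path handling, usable on any platform


class ToyLoader:
    """Hypothetical model: a DLL already loaded under a given basename
    is reused, no matter which directory a later request points at."""

    def __init__(self):
        self.loaded = {}  # lowercased basename -> path that was loaded first

    def load(self, path):
        key = ntpath.basename(path).lower()
        # setdefault keeps the first path registered for this basename
        return self.loaded.setdefault(key, path)


loader = ToyLoader()

# Both packages ship a library with the same basename: B gets A's copy.
a_copy = loader.load(r"C:\pkgA\libgfortran-3.dll")
b_copy = loader.load(r"C:\pkgB\libgfortran-3.dll")
collided = (a_copy == b_copy)

# After renaming (and patching the importers), the basenames differ,
# so each package gets its own copy.
a_own = loader.load(r"C:\pkgA\libgfortran-3-for-A.dll")
b_own = loader.load(r"C:\pkgB\libgfortran-3-for-B.dll")
separate = (a_own != b_own)
```

In this model `collided` and `separate` both end up true, which mirrors the reasoning above: unique basenames are what keep the two copies apart.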

This basically solves the same problem as private SxS assemblies, except better in all ways: it's simpler (no XML manifests), more flexible (no finicky requirements for the filesystem layout), and doesn't require reading the awful SxS assembly documentation.

Example usage:

$ python3 -m machomachomangler.cmd.redll A.dll A-patched.dll libgfortran-3.dll libgfortran-3-for-A.dll
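If you need to patch several binaries, the command line above is easy to generate from a script. The following is a minimal sketch: the helper names `renamed` and `redll_command` are made up for this example, and the `-for-<tag>` and `-patched` naming conventions are just the ones used in the text above, not something the tool requires. It only builds the argument list; it does not run anything.

```python
from pathlib import PurePath


def renamed(lib, tag):
    """Hypothetical helper: libgfortran-3.dll + "A" -> libgfortran-3-for-A.dll"""
    p = PurePath(lib)
    return p.stem + "-for-" + tag + p.suffix


def redll_command(binary, old_lib, tag):
    """Build the argv for the redll invocation shown above (hypothetical helper)."""
    b = PurePath(binary)
    patched = b.stem + "-patched" + b.suffix
    return [
        "python3", "-m", "machomachomangler.cmd.redll",
        binary,                 # input PE file
        patched,                # output PE file
        old_lib,                # DLL name currently imported
        renamed(old_lib, tag),  # DLL name to import instead
    ]


cmd = redll_command("A.dll", "libgfortran-3.dll", "A")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Remember that the library file itself still has to be renamed on disk to match the new name.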

There's an example in pe-example/ that you can play with. E.g. on Debian with a mingw-w64 cross-compiler and wine installed:

$ cd pe-example/

$ ./build.sh

$ wine test.exe
dll_function says: test_dll

$ mv test_dll.dll test_dll_renamed.dll

Apparently wine's way of signalling a missing DLL is to fail silently.

$ wine test.exe || echo "failed -- test_dll.dll is missing"
failed -- test_dll.dll is missing

$ PYTHONPATH=.. python3 -m machomachomangler.cmd.redll test.exe test-patched.exe test_dll.dll test_dll_renamed.dll

Now it works again:

$ wine test-patched.exe
dll_function says: test_dll

Some known limitations of the PE dll-import-switcheroo code:

General limitations

Only tested on Python 3.4 and 3.5. Probably any Python 3 will work, and Python 2 definitely won't without some fixes. (There's lots of fiddly byte-string handling.)

I'm lazy, so I just load the whole binary file into memory -- maybe several copies of it. This actually wouldn't be too hard to fix (using memory mapping etc.) but I guess it doesn't matter that much because who has multi-gigabyte Mach-O/PE images??
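For the curious, here is a rough sketch of what the memory-mapping approach could look like. It is not how machomachomangler works today; `sniff_format` is a hypothetical function that maps a file read-only and looks at its first few bytes, using the well-known magic numbers ("MZ" for PE, `0xfeedface` / `0xfeedfacf` / `0xcafebabe` for Mach-O). Note that `mmap` refuses to map an empty file, which this sketch does not handle.

```python
import mmap

# Mach-O magic numbers as they can appear on disk.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit, either byte order
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit, either byte order
    b"\xca\xfe\xba\xbe",                       # fat / universal binary
}


def sniff_format(path):
    """Hypothetical helper: guess the container format without
    reading the whole file into memory."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are only read when touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            head = view[:4]
    if head[:2] == b"MZ":
        return "PE"
    if head in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"
```

Slicing the mapped view copies only the bytes requested, so the same pattern could in principle be used to parse headers of very large images piece by piece.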

Contact

wheel-builders@python.org

License

It's Saturday afternoon, I've got the flu or something, and I'm spending my free time writing software to make some proprietary operating systems -- ones that are backed by one of the world's larger corporations -- better able to compete for developers with other, better-designed operating systems. I mean, I'm not saying that poring over the PE/COFF specification isn't fun! But it's not _that_ fun. (And honestly the Mach-O docs are absolutely terrible, to the extent they exist at all.)

To assuage my annoyance, this software is licensed under the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License or (at your option) any later version. See LICENSE.txt for details.

This shouldn't have any effect on most uses, since it only affects people who are redistributing this software or running it on behalf of other people; you can use this software to manipulate your BSD-licensed DLLs, your proprietary-licensed DLLs, or whatever you like, and that's fine. The license affects the code for machomachomangler itself; not the code you run it on.

However, if for some reason you or your company have some kind of allergy to this license, send me an email and we'll work out an appropriate tithe.

Also, to preserve our options in case I get over this fit of pique, please license all contributions under the MIT license. (I definitely will not switch to any proprietary license, but might switch to a permissive OSS license.) Thanks!

Code of conduct

Contributors are requested to follow our code of conduct in all project spaces.