bpo-29505: Add fuzz tests for float(str), int(str), unicode(str) by ssbr · Pull Request #2878 · python/cpython

> This would be literally the first C++ code shipped by CPython that ran inside the interpreter.

The only reason it even "ships" is that I want to run it in tests. Is there a way to include it in test builds but exclude it from the release, or does it not make much difference?

> I understand why you use C++ here

To be clear: I actually don't understand myself why I use C++ here. Isn't there a way to run the compiler on a C file and link it with something else that uses C++? My experience with running the C/C++ compilers is minimal, and I am pretty sure we can use C here, somehow.

I definitely don't want this PR to come across as "we must use C++". I did it in C++ to get something working that I could mail out, and I'd really appreciate learning how to do it in C instead. (Sadly, most of my time on this PR was actually spent getting it to build in oss-fuzz's environment.)

See commit cb9cdc0 for more details on what fails to build and why.

> Also, a quick perusal of the documentation for libFuzzer suggests a strong dependency on Clang. CPython supports at least two other compilers--GCC and Microsoft Visual C++--and supports at least one major platform where Clang is not a supported compiler (Windows). Can you confirm that the build process and test suite will degrade gracefully when compiling with GCC or MSVC++?

I can do one better: nothing about this actually depends on libFuzzer, so this should compile and run on every platform (that has C++ support).

Of course, adding a dependency on C++ support to pass tests is, as noted above, probably not acceptable. My hope is that we can get it working in C; otherwise, I will make the test fail gracefully in the absence of a working C++ compiler.

Some background might help: tl;dr is that to support fuzzing / libFuzzer, all you need to do is export a specific function, LLVMFuzzerTestOneInput. libFuzzer actually provides a main() which calls that function for fuzz tests. We aren't doing that here, though; we're just defining and testing LLVMFuzzerTestOneInput.

In other words, to write a fuzzer, you define LLVMFuzzerTestOneInput in foo.cpp and then use clang++ foo.cpp -lfuzzer to compile the fuzz test to an executable binary. If you want to test that LLVMFuzzerTestOneInput is remotely valid in your continuous integration, separately from running it in a fuzz test, you can call it from your unit tests.

So the only relationship this code has with libFuzzer is that it exports a function that libFuzzer expects to exist. It doesn't actually depend on libFuzzer, and isn't built against libFuzzer within Python continuous integration. It works on all platforms you'd expect it to -- it's just a C++ file that calls the regular CPython API. It is only built against libFuzzer in the oss-fuzz project, which checks out a copy of cpython at the master revision, compiles the fuzz tests against libFuzzer, and runs them.
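To make that concrete, here is a rough sketch in C of a fuzz entry point plus a smoke test that calls it directly, with no libFuzzer involved. It is not the code in this PR: `parse_number` and `run_smoke_test` are hypothetical names, and `strtod` is only a stand-in for the CPython API calls the real target makes. The one thing taken as given is the `LLVMFuzzerTestOneInput` signature that libFuzzer expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the code under test. The PR exercises
   float(str), int(str), and unicode(str) through the CPython API instead. */
static void parse_number(const char *text) {
    char *end = NULL;
    (void)strtod(text, &end);
}

/* The entry point libFuzzer looks for. This sketch returns 0 for every input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* The input is not NUL-terminated, so copy it into a terminated buffer. */
    char *text = malloc(size + 1);
    if (text == NULL) {
        return 0;
    }
    if (size > 0) {
        memcpy(text, data, size);
    }
    text[size] = '\0';
    parse_number(text);
    free(text);
    return 0;
}

/* Hypothetical smoke test: feed a few fixed inputs to the entry point, the
   way a unit test could. Returns the number of inputs that were run. */
int run_smoke_test(void) {
    static const char *const inputs[] = {"", "1.5", "-0", "1e400", "nan", "abc"};
    int ran = 0;
    for (size_t i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++) {
        const char *s = inputs[i];
        int rc = LLVMFuzzerTestOneInput((const uint8_t *)s, strlen(s));
        assert(rc == 0);
        (void)rc; /* keeps the build warning-free when NDEBUG disables assert */
        ran++;
    }
    return ran;
}
```

A test runner would call `run_smoke_test()` once; a crash or a failed assertion means the entry point is broken before any real fuzzing starts.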

And to come full circle, this is the core of why it was written in C++ instead of C -- if I compile a C file with "-lfuzzer" it completely blows up because libFuzzer still depends on libstdc++, and I don't know how to compile as C but still link with C++ stuff. (Or something like that. I'm a C++/C noob, I use blaze/bazel for everything.)
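One possible way around this, sketched under assumptions rather than tested against oss-fuzz: keep the fuzz target in a C file, compile it to an object file with the C compiler, and do only the final link with the C++ driver, which pulls in the C++ runtime that libFuzzer needs. The file name, the `count_digits` helper, and the build commands in the comment are hypothetical, and the `-lfuzzer` flag is simply the one mentioned above and may differ between toolchains.

```c
/* fuzz_target.c (hypothetical file name)
 *
 * Possible build steps, assuming a clang toolchain with libFuzzer available:
 *   clang   -c fuzz_target.c -o fuzz_target.o        compile as C
 *   clang++ fuzz_target.o -lfuzzer -o fuzz_target    link with the C++ driver
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If this file is ever compiled as C++, keep the symbol unmangled so that
   libFuzzer can still find it. When compiled as C, the guard is a no-op. */
#ifdef __cplusplus
extern "C" {
#endif

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

#ifdef __cplusplus
}
#endif

/* Hypothetical code under test: count the ASCII digits in the input. */
static size_t count_digits(const uint8_t *data, size_t size) {
    size_t digits = 0;
    for (size_t i = 0; i < size; i++) {
        if (data[i] >= '0' && data[i] <= '9') {
            digits++;
        }
    }
    return digits;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* A non-empty input must come with a valid pointer. */
    assert(data != NULL || size == 0);
    (void)count_digits(data, size);
    return 0;
}
```

Because the C compiler never mangles the function name, the `extern "C"` guard only matters if the same file is later compiled as C++; the part that addresses the libstdc++ link failure is the final link step going through the C++ driver.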

I hope that gives enough background that this makes a little more sense. And I'll try to write this up a bit in README.rst once I get back to my work computer tomorrow!