Initial Migration of Fuzz Tests & Integration Scripts From the OSS-Fuzz Project Repo by DaveLak · Pull Request #1901 · gitpython-developers/GitPython (original) (raw)
Ok, sorry for the delay. Below are my thoughts regarding fuzz_tree.py
, but also, a little extra context on the OSS-Fuzz execution environment because it may be helpful for thinking about #1901 (comment).
About the OSS-Fuzz Environment
You both touched on it, but for the sake of shared understanding, the scripts and tests look like they do in part because of the OSS-Fuzz container environment implementation details.
The key points to understand are:
- There are two stages of an OSS-Fuzz run that influence why the shell scripts and fuzz tests look like they do: a build stage and a fuzzer execution stage.
- Each stage has its own distinct container environment.
- Each container environment's filesystem is mostly read-only.
- The containers are Linux based (Ubuntu 20.04 right now I believe.)
Build Stage Execution Environment
build.sh script environment
When your build.sh script is executed, the following locations are available within the image:
Location Env Variable Description /out/ $OUT Directory to store build artifacts (fuzz targets, dictionaries, options files, seed corpus archives). /src/ $SRC Directory to checkout source files. /work/ $WORK Directory to store intermediate files.
Everything else is read-only.
Fuzzer Execution Stage Environment
File system
Everything except/tmp
is read-only, including the directory that your fuzz target executable lives in./dev
is also unavailable.
Regarding the Fuzz Tests
When I initially started on this, my primary objective was really only about fixing the build. I did try to clean them up a bit in the process and apply some best practices, but those changes were mostly incidental and I intentionally avoided any significant refactors.
After we decided to bring them here, my focus shifted to lowering the barrier to entry for contributors because of the nature of the project. In particular, I tried to distill the useful information from various place that I haven't seen documented in one place yet. One such tip is the direct execution section, but in hindsight I overlooked the environment assumptions in fuzz_tree.py
.
On fuzz_tree.py
Specifically
This test is less than ideal for several reasons.
- Writing to the
/tmp
dir
In addition to what has been mentioned, modifying global state and I/O operations (e.g., writing to the temp dir) are problematic for fuzzing in general.
Further, @EliahKagan's remark in #1901 (comment) about "implicating other things in /tmp
" made me realize this may already be causing issues inside the OSS-Fuzz execution environment. A non-obvious implementation detail specific to Python projects in OSS-Fuzz that the compile_python_fuzzer function in the build.sh script uses Pyinstaller to bundle the fuzz harness and all related dependencies into a single file archive that gets transferred to the fuzzer execution stage, where it is unpacked into a subdirectory of /tmp
for execution. Which, I believe means, the test execution is actually using the test executable as a fixture 🥴
I have yet to confirm this, but I wouldn't be surprised if that is the cause of the issue I described as a possible " circular dependency " in google/oss-fuzz#11763
- It's very slow
With an average execs per second of ~4 after the seed corpus loads on a short local run, it is far, far, below what is hoped for. In contrast, I'm seeing about 1432 exec/s on fuzz_config.py
in a similar local run. fuzz_config.py
creates a config fake in-memory but never writes it to disk.
I haven't dug in yet, but I think this is partly related to point 1 above and partly related to some of the implementation details of the legacy code in GitPython.
This is less of a concern/priority than the /tmp
dir issue though because even if it's slow, it's still at least testing the API.
Thoughts on Next Steps
I'm fully in agreement that the /tmp
issue and documentation related to it must be fixed before other updates are considered. I think it can be broken down into smaller problems in the following prioritized order:
- Address adverse potential that the local execution documentation exposes unaware contributors to.
Possible quick fix options:
- Remove that section all together for now (probably quickest and safest?)
- Add a Warning or Caution callout to that section about
fuzz_tree.py
(this feels not as great IMHO)
- Address the
/tmp
writing implementation issues infuzz_tree.py
- TemporaryDirectory as suggested by @EliahKagan looks like a good solution but I would need to test it in OSS-Fuzz (which I can certainly do regardless of the downstream PR's status.)
- I've also considered proposing we add a Dockerfile to the
fuzzing/
dir that would provide a "batteries included" way for folks to use the direct execution method quickly and easily inside a container environment prepared specifically to support it. Even before this discussion, I thought that would be the ideal way to document it because users wouldn't need to worry about any of the particulars beyond having Docker installed, which they'd already need for the alternative anyway.
I think the most robust solution would be some combination of both points above, but I don't know enough to comment on the potential issues that we may face goinf the TemporaryDirectory
route inside OSS-Fuzz. The Dockerfile solution on the other hand is something I've already been toying with locally.
I also looked at options like MemoryFS from https://github.com/PyFilesystem/pyfilesystem2 because an in-memory solution would be ideal, but I think the git
executable invocation may cause issues with that approach.