codeparrot/github-code-clean · Datasets at Hugging Face
This is a cleaner version of the Github-code dataset. The following filters are applied:
- Average line length < 100
- Alphanumeric character fraction > 0.25
- Remove auto-generated files (keyword search)
In total, 3.39M files are removed, which make up 2.94% of the dataset.
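
The filters listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the dataset: the function names, the `AUTOGEN_KEYWORDS` list, and the choice to scan only the first few lines of each file are all hypothetical, since the card does not say which keywords were searched for or where.

```python
# Hypothetical keyword list: the dataset card does not specify the actual keywords.
AUTOGEN_KEYWORDS = (
    "auto-generated",
    "autogenerated",
    "automatically generated",
    "generated by",
)


def average_line_length(content: str) -> float:
    """Mean number of characters per line (0.0 for an empty file)."""
    lines = content.splitlines()
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)


def alphanumeric_fraction(content: str) -> float:
    """Share of characters that are letters or digits (0.0 for an empty file)."""
    if not content:
        return 0.0
    return sum(ch.isalnum() for ch in content) / len(content)


def looks_autogenerated(content: str, scan_lines: int = 5) -> bool:
    """Keyword search over the first `scan_lines` lines (an assumed heuristic)."""
    head = "\n".join(content.splitlines()[:scan_lines]).lower()
    return any(keyword in head for keyword in AUTOGEN_KEYWORDS)


def keep_file(content: str) -> bool:
    """Return True if a file passes all three filters."""
    return (
        average_line_length(content) < 100
        and alphanumeric_fraction(content) > 0.25
        and not looks_autogenerated(content)
    )
```

For example, `keep_file("def add(a, b):\n    return a + b\n")` passes all three checks, while a single 500-character line or a file that starts with an "auto-generated" banner is rejected.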