Reproducible papers with artifacts
Upcoming reproducibility and optimization events organized or supported by the cTuning foundation and cKnowledge:
- We are serving on the Program Committee of ACM REP 2025: The 3rd ACM Conference on Reproducibility and Replicability.
- We are testing the new version of the Collective Knowledge Playground and the MLCommons CM workflow automation framework (CMX) to run MLPerf inference v5.0 benchmarks.
News
- We helped organize Artifact Evaluation at IEEE/ACM MICRO'24.
- We helped run the MLPerf inference benchmark at the Student Cluster Competition at SuperComputing'24 using the CM workflow automation framework and MLPerf automations, which we have donated to MLCommons to benefit everyone.
- 2024 July 24: We have published a white paper on arXiv: "Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments".
- 2024 July 21: We updated our automation, optimization and reproducibility challenges here.
- 2024 July 2: Consider joining the artifact evaluation committee at IISWC'24.
- 2024 April 2: cTuning, cKnowledge and MLCommons will help students run MLPerf at the Student Cluster Competition at SuperComputing'24 - please stay tuned for more details!
- 2024 March 28: cTuning, MLCommons and cKnowledge have released a new version of the Collective Mind automation framework with a collection of reusable and technology-agnostic recipes to make it easier to compose and benchmark AI systems across rapidly evolving commodity models, data sets, software and hardware: see how CM helped automate MLPerf inference v4.0 submissions and check the presentation at the MLPerf-Bench workshop @ HPCA'24 about a new project to compose high-performance and cost-effective AI systems.
- 2024 January 26: MLCommons and cTuning have released version 1.6.0 of the human-friendly Collective Mind interface (CM) to help everyone run, manage and reuse a growing number of MLPerf, MLOps and DevOps scripts from MLCommons projects and research papers in a unified way on any operating system with any software and hardware, either natively or inside containers (see a minimal usage sketch after this news list).
- 2023 November 1: ACM has released a related video about the MLCommons CM automation language on its YouTube channel: "Toward a common language to facilitate reproducible research and technology transfer".
- 2023 October 29: The unified CM interface to run MICRO'23 artifacts is available here.
- 2023 October 1: We organized a related tutorial at the 2023 IEEE International Symposium on Workload Characterization: "Introducing open-source MLCommons technology for collaborative, reproducible, automated and technology-agnostic benchmarking and optimization of AI/ML systems".
- 2023 September 13: HPCWire published an article about the MLPerf inference v3.1 benchmark results, including highlights of the first mass-scale community submission via cTuning, enabled by the MLCommons CM automation language!
- 2023 July 23: We opened new community challenges to reproduce and optimize state-of-the-art research projects in terms of performance, power consumption, accuracy, costs, etc.
- 2023 June 28: Grigori Fursin (cTuning founder) gave a keynote at the 1st ACM conference on reproducibility and replicability about the Collective Mind language (CM) to automate experiments and make them more deterministic and reproducible across continuously changing software, hardware, models and data: [slides].
- 2023 June 14: We are preparing Artifact Evaluation at ACM/IEEE MICRO 2023 - stay tuned for more details! Since the criteria for the ACM "Artifacts Evaluated – Reusable" badge are quite vague, we partnered with the MLCommons task force on automation and reproducibility to add their unified interface (MLCommons CM) to the submitted artifacts to make them more portable, reproducible and reusable. This interface was successfully validated at the Student Cluster Competition at SuperComputing'23, and we would like to test it as a possible criterion for obtaining the ACM "Artifacts Evaluated – Reusable" badge. Our ultimate goal is to provide a common interface to evaluate and reuse all artifacts across diverse and rapidly evolving software and hardware. We suggest that authors check this tutorial to add CM to their projects.
- 2023 May 17: The cTuning foundation joined forces with AVCC and MLCommons to help develop the industry's first Automotive Benchmark based on our automation language and reproducibility methodology.
- 2023 April: We have successfully validated this artifact evaluation methodology combined with the MLCommons CM automation language to automate ~80% of MLPerf inference v3.0 submissions (98% of all power results): LinkedIn, Forbes, ZDNet.
- 2023 April 5: The cTuning foundation joined forces with MLCommons to develop the Collective Knowledge Playground for collaborative reproducibility and optimization challenges: press release.
- 2023 Feb 16: New alpha CK2 GUI to visualize all MLPerf results is available here.
- 2023 Jan 30: New alpha CK2 GUI to run MLPerf inference is available here.
- 2022 October: We kickstarted an open MLCommons workgroup on automation and reproducibility - everyone is welcome to join here.
- 2022 September: We have helped MLCommons prepare and release CM v1.0.1 - the next generation of the MLCommons Collective Knowledge framework, developed by the public task force to support collaborative and reproducible ML & Systems research! We are very glad to see that more than 80% of all performance results and more than 95% of all power results were automated by MLCommons CK v2.6.1 in the latest MLPerf inference round, thanks to submissions from Qualcomm, Krai, Dell, HPE and Lenovo!
- 2022 April: We are developing the 2nd version of the CK framework to make it easier to transfer scientific knowledge to production systems: GitHub.
- 2022 March: We've successfully completed Artifact Evaluation at ASPLOS 2022.
- ACM TechTalk (video) about artifact evaluation (challenges and solutions).
- Artifact Evaluation at MICRO 2021.
- The report from the "Workflows Community Summit: Bringing the Scientific Workflows Community Together" is available on arXiv.
- Artifact Evaluation: reproducing papers at ASPLOS 2021 (the list of accepted artifacts).
- The paper about automating artifact evaluation has appeared in Philosophical Transactions A, the world's longest-running scientific journal, where Newton published: DOI, arXiv.
- The cTuning foundation is honored to join MLCommons as a founding member to accelerate machine learning innovation and help with best practices, reproducible benchmarking and workflow automation, along with 50+ leading companies and universities: press release.
- Reddit discussion about reproducing ML and systems papers.
- Artifact Evaluation: reproducing papers at MLSys 2020 (the list of accepted artifacts).
- Artifact Evaluation: reproducing papers at ACM ASPLOS 2020 (the list of accepted artifacts).
- Building an open repository with reproduced papers, portable workflows and reusable artifacts: cknow.io.
- Working on a common methodology to share research artifacts (code, data, models) at systems and ML conferences: ACM/cTuning.
- All reproduced papers.
- All our prior reproducibility initiatives with shared artifacts.
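For readers new to the MLCommons CM interface mentioned in several news items above, here is a minimal command-line sketch. It assumes a CM v1.x setup with the `cmind` package from PyPI and the `mlcommons@ck` repository alias that hosted the shared automation recipes at the time; please check the MLCommons CM documentation for the exact commands matching your CM version.

```bash
# Minimal sketch, assuming CM v1.x and the mlcommons@ck repository alias
# (check the MLCommons CM documentation for your version):

pip install cmind                 # install the Collective Mind (CM) framework
cm pull repo mlcommons@ck         # fetch the shared MLCommons automation recipes
cm run script --tags=detect,os    # run a portable CM script selected by its tags
```

The MLPerf automations referenced above build on the same `cm run script --tags=...` pattern, which is how a single interface can cover different models, software stacks and hardware.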
Motivation
Researchers, engineers and students struggle to reproduce experimental results and reuse research code from scientific papers due to continuously changing software and hardware, a lack of common APIs, the stochastic behavior of computer systems and the lack of a common experimental methodology. That is why we decided to set up the Artifact Evaluation process at conferences to help the community validate results from accepted papers with the help of independent evaluators, while collaborating with ACM and IEEE on a common methodology, reproducibility checklist and tools to automate this tedious process. Papers that successfully pass this evaluation process receive a set of ACM reproducibility badges printed on the papers themselves.
Please check our "submission" and "reviewing" guidelines for more details. If you have questions or suggestions, do not hesitate to participate in our public discussions using this Artifact Evaluation Google group and/or the LinkedIn group.
Related initiatives:
- NISO RP-31-2021, Reproducibility Badging and Definitions
- SIGPLAN's checklist for empirical evaluation;
- The Machine Learning Reproducibility Checklist (NeurIPS);
- The NASEM report on Reproducibility and Replicability in Science: Online PDF and a BoF at SC19 organized and chaired by Lorena A. Barba;
- HOWTO for AEC Submitters (Dan Barowy, Charlie Curtsinger, Emma Tosch, John Vilk, and Emery Berger);
- PPoPP'19 HotCRP configuration for artifact evaluation;
- Reproducible optimization tournaments on AI/ML/SW/HW co-design and on quantum computing challenges;
- Artifact Evaluation google group.