Aaron Culich - Academia.edu
Papers by Aaron Culich
The Binder project creates open source technology for sharable, reproducible, interactive data science environments. The 2.0 version of the project began with a one-year grant from the Moore Foundation. This Zenodo repository contains the original proposal to the Moore Foundation, as well as the closing narrative once the grant finished. You can learn more about the Binder project at docs.mybinder.org (user documentation), binderhub.readthedocs.io (BinderHub deployment information), and jupyterhub-team-compass.readthedocs.io (JupyterHub/Binder team information).
Publishers, funders, and scientific practice increasingly require sharing research data via online repositories. Libraries have a tradition of sharing/archiving materials and are now more involved with research data sharing and archiving. However, large data files - particularly from HPC systems - are very challenging to distribute and archive. Storage can also be a challenge for researchers working in field stations and remote sites. However, NSF-funded cyberinfrastructure (XSEDE) has traditionally emphasized computing power, with little effort to provide data storage/archiving beyond what is needed for a specific computation. Science gateways and resources targeted at smaller computing jobs (like Comet and Jetstream) have succeeded in increasing the accessibility of NSF-funded computing resources. Accessible computational resources should include provision for long-term data storage and take advantage of developments in Library data services. This workshop wi...
Proceedings of the Practice and Experience on Advanced Research Computing
Achieving broad uptake of research computing services is a tremendous challenge when funding for staff positions is constrained. Outreach and ongoing engagement with researchers are both essential and time-consuming, leading to a tension between supporting day-to-day operations and building the kinds of partnerships that ensure ongoing support for the program. Since 2015, Berkeley Research Computing (BRC) has been hiring primarily graduate students into a part-time "domain consultant" role (influenced by ACI-REF job descriptions and Campus Champion activities) that addresses the program's staffing needs in an affordable way, while providing those graduate students with the technical training and professional work experience required for professional research facilitator positions. The domain consultant program has evolved from an hourly student position into a codified set of practices informed by IT service management, addressing needs including: in-person consulting, tier 2 triage of HPC troubleshooting tickets, support for cloud computing and compute in virtualized analytics environments, and user training. In addition, consulting engagements are reviewed and discussed regularly, both in team meetings and in one-on-one meetings with the service manager, to provide opportunities for consultants to hone their skills. This paper will also highlight the value of programs such as BRC's for addressing gaps in graduate education practices that can hinder PhD recipients' success when applying for research facilitator positions. It will also illustrate the value of thinking broadly about partnerships when developing a consulting program, by describing the program's recent expansion into research data management, and at the Lawrence Berkeley Lab.
‘Black box’ models are increasingly prevalent in our world and have important societal impacts, but are often difficult to scrutinize or evaluate for bias. Binder provides anyone in the community the opportunity to examine a machine learning pipeline, promoting fairness, accountability, and transparency. Binder is used to create custom computing environments that can be shared and used by many remote users, enabling the user to build and register a Docker image from a repository and connect with JupyterHub. Users can select a specific branch name, commit, or tag to serve. Binder combines two projects: JupyterHub, which provides a scalable system for authenticating users and launching Jupyter Notebook servers, and repo2docker, which generates a Docker image from a Git repository. When connected with JupyterLab, users can navigate a repository on Binder with an IDE as if they were developing the project locally and can explore all underlying data (CSV, JSON, image, etc.). JupyterHub, r...
The Journal of Computational Science Education
Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning)
The digital research landscape has changed dramatically over the years, as campuses across the nation have gained access to local research computing resources and services. The Campus Champions program [1], founded in 2008, has also evolved with these changes to accommodate and support the diversity and growth of the research computing community that we have long supported. Our mission, to promote and facilitate the effective participation of a diverse national community of institutions in the application of advanced digital resources and services to accelerate scientific discovery and scholarly achievement, has been made possible by sustained funding from the National Science Foundation (NSF) via the eXtreme Science and Engineering Discovery Environment (XSEDE) [2] over the past ten years, and has enabled the Campus Champions program to support campus cyberinfrastructure and to foster a thriving community of practice [3] with nationwide impact. To facilitate the continued growth and sustainability of the Campus Champions program beyond XSEDE, and to provide a deeper understanding of the Champion culture and needs, the Champions Sustainability Working Group and the XSEDE Evaluation team conducted a climate study in 2018. The recommendations provided by the climate study will inform our Leadership Team and staff on how we may further our community outreach goals and plan for the future of the program. This paper will highlight and discuss the development, implementation, key findings, and recommendations from that study.
Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact
What actions can we take to foster diverse and inclusive workplaces in the broad fields around data science? This paper reports from a discussion in which researchers from many different disciplines and departments raised questions and shared their experiences with various aspects around diversity, inclusion, and equity. The issues we discuss include fostering inclusive interpersonal and small group dynamics, rules and codes of conduct, increasing diversity in less-representative groups and disciplines, organizing events for diversity and inclusion, and long-term efforts to champion change.
Proceedings of the 13th Python in Science Conference
What are the challenges and best practices for doing data-intensive research in teams, labs, and other groups? This paper reports from a discussion in which researchers from many different disciplines and departments shared their experiences on doing data science in their domains. The issues we discuss range from the technical to the social, including issues with getting on the same computational stack, workflow and pipeline management, handoffs, composing a well-balanced team, dealing with fluid membership, fostering coordination and communication, and not abandoning best practices when deadlines loom. We conclude by reflecting about the extent to which there are universal best practices for all teams, as well as how these kinds of informal discussions around the challenges of doing research can help combat impostor syndrome.
A system and method of determining who is the rights owner for video uses object recognition and can avoid the need for fingerprinting or watermarking. By examining the video for objects that are known to be in videos by a rights holder, ownership of the video can be established within certain confidence bounds. This process can be used to reestablish control of content that may have been released or recorded without authorization or was produced at cost points that precluded more invasive or production intensive ...
A system and method of determining who is the rights owner for video uses object recognition and can avoid the need for fingerprinting or watermarking. By examining the video for objects that are known to be in videos by a rights holder, ownership of the video can be established within certain confidence bounds. This process can be used to reestablish control of content that may have been released or recorded without authorization or was produced at cost points that precluded more invasive or production intensive ...
A grant submitted to the Moore Foundation for development of the Binder platform for sharable online computing environments. See the public Binder service at mybinder.org.