Olivier Sallou - Academia.edu
Papers by Olivier Sallou
Le Centre pour la Communication Scientifique Directe - HAL - Inria, Nov 14, 2016
National audience
For public discussion. Includes: service-info; filter by descriptor language; transition to OpenAPI 3.0 only (which introduces some formatting changes), as opposed to conversion from Swagger by swagger2openapi behind the scenes; transition to a new documentation system; add Galaxy.
Auto-generated OpenAPI upon commit; show checker in Tool.
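The "filter by descriptor language" item can be illustrated with a minimal client-side sketch. It assumes tool records shaped loosely like GA4GH Tool Registry Service (TRS) responses, where each tool has `versions` and each version lists its `descriptor_type` values (e.g. "CWL", "GALAXY"). These field names are assumptions to verify against the actual schema; a real server would normally apply such a filter itself through a query parameter.

```python
from typing import Dict, Iterable, List


def filter_by_descriptor(tools: Iterable[Dict], descriptor_type: str) -> List[Dict]:
    """Keep tools having at least one version that offers `descriptor_type`.

    Record layout (hypothetical): {"id": ..., "versions": [{"descriptor_type": [...]}]}
    Comparison is case-insensitive, so "galaxy" matches "GALAXY".
    """
    wanted = descriptor_type.upper()
    return [
        tool
        for tool in tools
        if any(
            wanted in (d.upper() for d in version.get("descriptor_type", []))
            for version in tool.get("versions", [])
        )
    ]


if __name__ == "__main__":
    tools = [
        {"id": "a", "versions": [{"descriptor_type": ["CWL"]}]},
        {"id": "b", "versions": [{"descriptor_type": ["GALAXY", "WDL"]}]},
        {"id": "c", "versions": []},
    ]
    print([t["id"] for t in filter_by_descriptor(tools, "galaxy")])
```

Tools without any version simply never match, which mirrors the idea that a descriptor belongs to a specific tool version rather than to the tool as a whole.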
Created in 2001, GenOuest is now well established as an ISO 9001:2008 certified core support facility. It is recognized by IBiSA and is involved in Renabi, the French network of bioinformatics platforms. GenOuest is a team of 7 members, including 3 permanent staff, serving more than 200 authenticated users who launch 120,000 jobs per month. GenOuest is an operational facility working side by side with biologists in Biogenouest. As a technological platform, GenOuest provides a comprehensive environment with a 300-core cluster, an NVIDIA Tesla S1070 (4 GPUs with 256 processors each) and 70 TB of storage. In a multidisciplinary context, GenOuest acts as a mediation structure linking partners from different fields: biology, bioinformatics research and computer science research. This model has already given rise to new tools through collaborations between diverse teams. GenOuest provides specialized biological data resources (Aphidbase, Germonline, etc.) as well as ...
The Galaxy team said during GCC2014: "Docker, docker, docker!", demonstrating their interest in this new platform for application distribution. Galaxy now supports this Linux virtualization system as a good way to avoid common dependency problems, such as version evolution and the cohabitation of several versions. Isolated containers thus make it possible to run tools installed on different systems. Here we show how combining cloud and Docker technologies is a good solution for easily deploying and developing new Galaxy tools.
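As a minimal sketch of how a Galaxy tool can point at a Docker image, the snippet below generates a tool wrapper whose `<requirements>` section declares a `<container type="docker">`. The tool id, command and the `ubuntu:22.04` image are hypothetical choices for illustration; whether Galaxy actually runs the job inside the container depends on the server's job configuration enabling Docker.

```python
import xml.etree.ElementTree as ET


def galaxy_tool_xml(tool_id: str, name: str, version: str,
                    image: str, command: str) -> str:
    """Build a minimal Galaxy tool wrapper that declares a Docker container."""
    tool = ET.Element("tool", id=tool_id, name=name, version=version)
    # The container requirement names the image the tool should run in.
    requirements = ET.SubElement(tool, "requirements")
    container = ET.SubElement(requirements, "container", type="docker")
    container.text = image
    # Cheetah-templated command line; Galaxy substitutes $input and $output.
    ET.SubElement(tool, "command").text = command
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input", type="data",
                  format="fasta", label="Input sequences")
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format="txt")
    # ElementTree escapes '>' and quotes in text nodes as needed.
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical tool: count FASTA headers inside a stock Ubuntu image.
    print(galaxy_tool_xml("count_seqs", "Count sequences", "0.1.0",
                          "ubuntu:22.04", "grep -c '>' '$input' > '$output'"))
```

Keeping the image reference in the wrapper, rather than installing dependencies on the Galaxy host, is what lets tools with conflicting dependency versions coexist on one server.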
Nucleic Acids Research, Jan 20, 2015
The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases created by our ever-growing community. It also offers better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through both graphical and web service interfaces. The BioMart community portal averages over one m...
Research processes in the life sciences are evolving at a rapid pace. This evolution, driven mainly by technological advances, offers more powerful equipment and generalizes the digital format of research data. In the context of the data deluge, we need to overcome the current "datanami" and prepare for the future. In the life sciences, we observe a sharp increase in storage and computing needs. The current model, which consists of regularly adding hardware resources to bioinformatics core facilities without global coordination, is no longer sustainable. Scientific data management and analysis must be improved to offer services and developments that match these new uses. Using Information and Communication Technology (ICT) international standards and software (the ISAtools software suite for metadata management, the Galaxy web platform for data analysis and HUBzero for scientific collaboration), we propose a life sciences Virtual Research Environment (VRE) for Western France science communities. Although deploying this kind of environment is challenging, it is an opportunity to pave the way toward better research processes through enhanced collaboration, data management, analysis practices and resource optimization.
EMBnet.journal, 2013
Motivation and Objectives: The challenge for everyone is to be aware of existing implementations of a particular desired functionality and of their compatibility with the local infrastructure. Strategically, it is beneficial to know the other contributors to an externally maintained library, and to ensure that contributions are integrated with the rest of the code in the most future-compatible way and with as little redundancy as possible. To help achieve these goals, the Bioinformatics Open Source Conference (BOSC) was established in 2000 by members of the Open Bioinformatics Foundation Bio* projects as an international venue for showcasing new projects and progress, and for developers worldwide to meet in person. To support team building and help communication, BOSC adopted Birds-of-a-Feather (BoF) sessions, i.e. group meetings of one to two hours.
Journées RESeaux - JRES 2019, Dec 3, 2019
The French Institute of Bioinformatics (IFB) offers various services for processing life science data, partly based on a federation of academic clouds. The Biosphère portal (https://biosphere.france-bioinformatique.fr) provides several interfaces that simplify use of the IFB cloud: the RAINBio catalogue of template environments (appliances), a dashboard for managing deployments, and a registry of available public data. The IFB-Biosphère federation, launched at the end of 2016, comprises 5,400 cores and 27 terabytes of memory spread across 6 OpenStack-based sites, federated with the Nuvla system. In addition to the core components, more specific ones, such as Manila for providing shared file-based volumes, are required by most bioinformatics applications. User management relies on institutional credentials from the eduGAIN identity federation, with a Keycloak proxy and OpenID Connect clients. The bioinformatics appliances offer many common tools for biological data analysis; 33 are currently published in the RAINBio catalogue. These environments provide tools such as "conda", "docker" or "ansible"; high-level scientific interfaces (RStudio or Jupyter Notebook web portals); or a remote graphical desktop. Some environments comprise several components running on as many virtual machines or containers. The base quota, which can be extended, allows VMs to be deployed with up to 128 cores and 3 TB of RAM. The IFB-Biosphère cloud is used for potentially intensive scientific analyses (4,000 cores) and by many training sessions, scientific schools, university master's programmes, workshops and hackathons.
Keywords: Life sciences, Bioinformatics, Scientific computing, Scientific data processing, Cloud computing (JRES 2019, Dijon). These preconfigured container and virtual machine images can then be shared as public resources to distribute a piece of software or a method. The infrastructure is used for intensive scientific analyses (up to 4,000 compute cores) and by many training sessions, scientific schools, university master's programmes, workshops and hackathons, some of them for several years.
Cross-species toolbox for the reproductive science community
Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can reach mutual understanding and consensus. Extensive discussion and joint investigation are required to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed by informal discussions during coffee breaks. In multi-institution collaborations, to reach agreement on implementation details or to transfer deeper insights into a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individua...
Nowadays, Docker containers are used to ease application deployment, from command-line tools to cluster management [1]. This technology has a strong impact in bioinformatics, where specialized software often requires multiple dependencies. It offers a long-term preservation solution for legacy and unmaintained tools, and it enables better process isolation in a multi-user environment. Docker is already used with Galaxy as a way to quickly integrate new tools. We have set up a functional prototype of a web registry of Docker images, BioShaDock [2], dedicated to bioinformatics tools and utilities. We created a set of tool descriptors based on Docker images available in our toolshed [3]. Even though a general-purpose registry can hold shared Docker containers, we think that a domain-centric registry, e.g. for the French life science community through a registry linked to the cloud of the French Institute of Bioinformatics (IFB) [8], would have a significant impact on bioinformatician produc...
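A domain-centric registry such as BioShaDock is addressed through the registry host prefix of an image reference. The sketch below splits a reference into registry, repository, tag and digest following a simplified version of Docker's naming convention (the first path component counts as a registry host only if it contains `.` or `:` or is `localhost`). The host `registry.example.org` and the `bioshadock/blast` repository are hypothetical examples, not BioShaDock's actual address.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


DEFAULT_REGISTRY = "docker.io"


def parse_image_ref(ref: str) -> ImageRef:
    """Split a Docker image reference (simplified rules, no validation)."""
    name, _, digest = ref.partition("@")
    registry = DEFAULT_REGISTRY
    first, sep, rest = name.partition("/")
    # Only a host-looking first component is treated as a registry.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, name = first, rest
    tag: Optional[str] = None
    # With the registry host removed, a ':' in the last path component is a tag.
    colon = name.rfind(":")
    if colon > name.rfind("/"):
        name, tag = name[:colon], name[colon + 1:]
    if tag is None and not digest:
        tag = "latest"  # Docker's implicit default tag
    if registry == DEFAULT_REGISTRY and "/" not in name:
        name = "library/" + name  # official Docker Hub images
    return ImageRef(registry, name, tag, digest or None)


if __name__ == "__main__":
    print(parse_image_ref("registry.example.org/bioshadock/blast:2.2.31"))
    print(parse_image_ref("ubuntu"))
```

References without a host fall back to Docker Hub, which is why a community registry must always be named explicitly in tool descriptors.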
Bioinformatics software development has become a cornerstone of modern biology research. Large-scale quantitative biology studies have created a demand for more complex workflows and data analysis pipelines. Challenges in reproducing bioinformatics analyses are compounded by the fact that the programs themselves are difficult to install, because they rely on software libraries, compilers, other files and environment variables, collectively called dependencies, which are assumed to be available and are therefore often poorly documented. The Bioconda and BioContainers communities have created a complete ecosystem that allows bioinformatics software to be installed and executed in an isolated and controlled environment. It also provides infrastructure and basic guidelines to create and distribute bioinformatics containers, with a special focus on omics technologies. These cross-platform containers can be integrated into more comprehensive bioinformatics pipelines and differe...
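Containers built from Bioconda recipes are commonly published under `quay.io/biocontainers` with a tag of the form `<version>--<build>`, so one image pins both the upstream release and the packaging build. The sketch below parses and formats such references; treat the exact convention as something to confirm against the registry, and note that the build string `h1234567_0` is a made-up example.

```python
from typing import NamedTuple


class BiocontainerImage(NamedTuple):
    repository: str
    tool: str
    version: str
    build: str


def parse_biocontainer(image: str) -> BiocontainerImage:
    """Split 'registry/namespace/tool:version--build' into its parts."""
    repository, sep, tag = image.rpartition(":")
    # A '/' after the last ':' means the colon belonged to a host:port, not a tag.
    if not sep or "/" in tag:
        raise ValueError(f"no tag in image reference: {image!r}")
    version, sep, build = tag.partition("--")
    if not sep:
        raise ValueError(f"tag does not follow version--build: {tag!r}")
    tool = repository.rsplit("/", 1)[-1]
    return BiocontainerImage(repository, tool, version, build)


def format_biocontainer(tool: str, version: str, build: str,
                        namespace: str = "quay.io/biocontainers") -> str:
    """Inverse of parse_biocontainer for the assumed naming convention."""
    return f"{namespace}/{tool}:{version}--{build}"


if __name__ == "__main__":
    image = format_biocontainer("samtools", "1.9", "h1234567_0")  # hypothetical build
    print(image, parse_biocontainer(image))
```

Pinning the full `version--build` tag, rather than a floating tag, is what makes a pipeline step reproducible across machines.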
The state of practice in pattern-matching tools, and the gap that can be observed with the actual modelling needs of those in charge of analysing genomic structures, clearly show the need for higher-level languages to describe and search for these structures in genomic sequences. It therefore appears necessary to offer new tools for defining expressive models of families of biological sequences, models based on both the content and the structure of the sequences. This article presents Logol, a pattern-matching application designed to analyse potentially large sequences with realistic biological patterns. Logol consists of a pattern description language and the associated software suite, which make it possible to actually analyse sequences (DNA, RNA or proteins) with these patterns. The language, based on a high-level grammatical formalism,...
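To illustrate what a pattern based on both content and structure means, here is a brute-force sketch (not Logol's language or algorithm) that finds stem-loops: a stem of fixed length, a loop whose length lies in a range, then the reverse complement of the stem. Logol expresses such constraints declaratively; this code only shows the kind of match it targets, with exact matching and DNA alphabet as simplifying assumptions.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq: str, stem_len: int, loop_min: int,
                    loop_max: int) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, stem, loop_len) for every stem + loop + revcomp(stem)."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len - loop_min + 1):
        target = revcomp(seq[i:i + stem_len])
        # The spacer (loop) length is variable, as in a grammar-based pattern.
        for loop in range(loop_min, loop_max + 1):
            j = i + stem_len + loop
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, j + stem_len, seq[i:i + stem_len], loop))
    return hits


if __name__ == "__main__":
    # Stem GGAC, loop TTTT, then GTCC (reverse complement of GGAC).
    print(find_stem_loops("AAGGACTTTTGTCCAA", 4, 3, 5))
```

The nested loop over spacer lengths is exactly the cost that dedicated tools try to avoid on large sequences, which motivates a specialised engine rather than ad hoc scripts.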
IFB, the French ELIXIR Node, is a national service infrastructure that provides services and resources in bioinformatics [1]. IFB's goal is to offer scientific users and developers a scalable, flexible and user-friendly computing facility associated with a large storage capacity, as needed for current life science data processing. To analyze heterogeneous biological data, bioinformaticians require hundreds of different specialized software tools, including well-established tools as well as research prototypes. In addition, these tools are used alone or in workflows, through GUIs or command lines, for production, testing or development. Thus, providing an up-to-date and complete set of tools requires huge resources. To offer an efficient service for this expected diversity of uses, we propose a software architecture and a cloud model that provide solutions for tool packaging, rapid deployment and multi-channel software distribution. We describe here the set of technical components that ...
Considering eScience as "enhancing" science through ICT, the Virtual Research Environment (VRE) represents the eScience application "tool". By bringing scientists together with data, software and processing resources through the web, a VRE aims to facilitate collaborative tasks and meet community needs. eBiogenouest: a Western story. At the request of scientists, we provided a Galaxy server in late 2012 and created the Galaxy User Group Grand Ouest (GUGGO). HUBzero [1]: our VRE's gateway. This collaborative space is intended to help users establish new collaborations. With the Galaxy analysis platform and our metadata management environment, they can pursue their work in an integrated environment.
Journal of Proteome Research
BioContainers is an open-source project that aims to create, store, and distribute bioinformatics software containers and packages. The BioContainers community has developed a set of guidelines to standardize software containers, including their metadata, versions, licenses and software dependencies. BioContainers supports multiple packaging and container technologies such as Conda, Docker, and Singularity. BioContainers provides over 9,000 bioinformatics tools, including more than 200 proteomics and mass spectrometry tools. Here, we introduce the BioContainers Registry and RESTful API to make containerized bioinformatics tools more findable, accessible, interoperable, and reusable (FAIR). The BioContainers Registry provides a fast and convenient way to find and retrieve bioinformatics tool packages and containers. In doing so, it will increase the use of bioinformatics packages and containers while promoting replicability and reproducibility in research.
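A client of a RESTful registry API typically starts by building a search URL. The sketch below does only that, with no network access. The base URL `https://api.biocontainers.pro/ga4gh/trs/v2` and the `name`/`limit` parameters are assumptions modelled on a GA4GH TRS-style endpoint; check them against the registry's own API documentation before use.

```python
from urllib.parse import urlencode

# Assumed TRS-style base URL; verify against the registry's documentation.
TRS_BASE = "https://api.biocontainers.pro/ga4gh/trs/v2"


def tools_query_url(base: str = TRS_BASE, **filters) -> str:
    """Build a /tools search URL, dropping filters whose value is None."""
    params = {key: value for key, value in filters.items() if value is not None}
    # Sorting keeps the URL deterministic, which helps caching and testing.
    query = urlencode(sorted(params.items()))
    url = f"{base.rstrip('/')}/tools"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(tools_query_url(name="samtools", limit=10))
```

The resulting URL could then be fetched with `urllib.request.urlopen` and decoded with `json.loads`; keeping URL construction separate from the HTTP call makes the query logic testable offline.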