Anatomy of a Data Science Software Toolkit That Uses Machine Learning to Aid ‘Bench-to-Bedside’ Medical Research—With Essential Concepts of Data Mining and Analysis Explained (original) (raw)

Data science and machine learning are buzzwords of the early 21st century. Now pervasive through human civilization, how do these concepts translate to use by researchers and clinicians in the life-science and medical field? Here, we describe a software toolkit, just large enough in scale, so that it can be maintained and extended by a small team, optimised for problems that arise in small/medium laboratories. In particular, this system may be managed from data ingestion statistics preparation predictions by a single person. At the system’s core is a graph type database, so that it is flexible in terms of irregular, constantly changing data types, as such data types are common during explorative research. At the system’s outermost shell, the concept of ’user stories’ is introduced to help the end-user researchers perform various tasks separated by their expertise: these range from simple data input, data curation, statistics, and finally to predictions via machine learning algorithm...