Data-Intensive Research in South Africa

Research Data Management in South Africa: How We Shape Up

Australian Academic & Research Libraries, 2014

This paper explores views expressed during the Library and Information Association of South Africa (LIASA) workshop held in cooperation with the UK's Digital Curation Centre (DCC) in March 2014. The event provided an ideal opportunity to assess librarians' views on the changing research data management landscape and to consider how these changes might affect the role of academic librarians in South Africa. The paper compares these views with experience gained through the DCC's work supporting universities in the UK.

Data-Intensive Research Workshop Report

2010

Authors: All of the workshop's participants (see Appendix A). Editors: Malcolm Atkinson, David De Roure, Jano van Hemert, Shantenu Jha, Ruth McNally, Bob Mann, Stratis Viglas and Chris Williams.

Contents (excerpt): Why DIR?; Status of the Field; ... 1.4 Motivation and Strategies for Data-Intensive Biology; 1.5 Analysing and Modelling Large-Scale Enterprise Data; ... 1.7 Data Analysis Theme Opening; 1.8 Data- ...

Data-Intensive Research Workshop (15-19 March 2010) Report

2010

Data-Intensive Research is any research, in any discipline, where careful thought about how to use data is essential for success. Later chapters expand on this definition and demonstrate the diversity of ways in which handling and interpreting data can be challenging. Chapter 1 gives a fuller introduction and shows why it is timely to discuss this topic. The Data-Intensive Research workshop was run by the e-Science Institute (http://esi.ed.ac.uk) at the University of Edinburgh during the week of 15-19 March 2010. Over the course of the week approximately 100 people took part; they are the authors of this report (see Appendix A). The workshop was organised by those shown in Table 1 and followed the timetable given in Appendix B. Various web resources were built before, during and after the workshop, as shown in Table 2. This report is a first step in communicating the enthusiasm, understanding and sense of direction that developed during the workshop. All participants contributed to this report through breakout groups, the final panel and many informal discussions, as well as via email lists, the wiki and tweeting. The input of the 30 speakers (see Table 3) is incorporated directly into the report, particularly in Chapters 1 to 4, which correspond approximately to the programmes for Monday to Thursday. These days viewed data-intensive research from four viewpoints: (a) the introduction to and context of data-intensive research, (b) challenges emerging from the increasing volumes and sources of data, (c) challenges arising from the complexity of data, and (d) challenges in supporting researchers interacting with data. Friday's programme drew together all of the week's activities to digest and summarise them, to consolidate and review our understanding, and to begin producing this report; it was a primary input into Chapter 5.
At the outset, the organisers had planned to stimulate bridge-building between technical and disciplinary silos by clustering challenges and disciplines into days and by setting up cross-cutting themes that ran throughout the week. The resulting matrix, with an additional row for the social and ethical issues that emerged during the workshop, is shown in Table 4. Ruth McNally kindly agreed to be co-opted onto the editorial team to take care of that theme. Two other themes were nascent: (a) text-mining applications, particularly the integration of data from text with other data, and (b) training and ramps to better enable the adoption of data-intensive methods and the appreciation of data-intensive results. If anyone wishes to develop a theme section covering either of these, it will be added to Chapter 5. Although the speakers and other participants produced most of the ideas, the editors take full responsibility for the selection and presentation of the text and figures in this report, except where we explicitly quote a person or group. We would like to acknowledge the support of the e-Science Institute's events team and technical support staff, who provided a smoothly run environment conducive to research discussions during the preparatory period and throughout the week. We also thank Jo Newman,

Special issue for data intensive eScience

Distributed and Parallel Databases, 2012

Data-intensive science is enabled by the data deluge and has been called the "fourth paradigm" of scientific discovery. New fields are being born, such as drug discovery based on large-scale study of correlations in published papers, and the study of climate implications from data on the accelerating pace of change in previously quiescent ice sheets. Astronomy and the search for fundamental particles at the Large Hadron Collider drive mainstream aspects of data-intensive science, with many petabytes of data derived from large, advanced instruments. Medical imaging and genomics produce as much data, but from many distributed instruments. It is projected that there will be 24 billion devices on the Internet by 2020. Most of this "Internet of Things" will be small sensors producing streams of information that will be processed, integrated with other streams and turned into knowledge. The deluge and its impact are pervasive. Alongside the move to a data-driven world, we are also in the midst of an evolution in the landscape of computing hardware. We now live in a world of massively multi-core and GPU processing systems, very large main-memory systems, fast networking components, fast solid-state drives, and large data centers that consume massive amounts of energy. The computing paradigm is changing, calling for new programming models, new data structures and more attention to fault tolerance, while also making computing much easier to access. It is clear that many aspects of how we have dealt with data processing have to change in this new world.

A Roadmap for Building Data Science Capacity for Health Discovery and Innovation in Africa

Frontiers in Public Health, 2021

Technological advances now make it possible to generate diverse and complex data of varying sizes in a wide range of applications, from business to engineering to medicine. In the health sciences in particular, data are being produced at an unprecedented rate across the full spectrum of scientific inquiry, spanning basic biology, clinical medicine, public health and health care systems. Leveraging these data can accelerate scientific advances, health discovery and innovation. However, data are only the raw material required to generate new knowledge, not knowledge in themselves, just as a pile of bricks is not a building. To solve complex scientific problems, appropriate methods, tools and technologies must be integrated with domain expertise to generate and analyze big data. This integrated, interdisciplinary approach is what has come to be widely known as data science. Although the discipline of data science has been rapidly evolving over the past c...

Engaging a Data Revolution: Open Science Data Hubs and the New Role for Universities in Africa

Open Information Science, 2019

This paper presents a new ideology for engaging Africa in a data revolution. It explores the idea of creating open science data hubs (OSDH) at flagship universities in Africa to preserve and share both internally and externally produced data. Although its technical treatment is limited, the objective is to explore the practicalities of how and why such an endeavor should be undertaken in Africa. The paper argues that the African university is uniquely placed to play this new role in today's technological world and discusses the characteristics and foundational pillars necessary to set up such a program. The arguments presented challenge Africa to be smart and adopt clever solutions to its problems of data generation, collection and access, by recognizing the value of its intellectuals and institutions of higher learning, giving them a new role, and involving them in the generation, preservation and sharing of data and knowledge that can inform policy formulation.

The African Open Science Platform: The Future of Science and the Science of the Future

2018

The African Open Science Platform's mission is to put African scientists at the cutting edge of contemporary, data-intensive science as a fundamental resource for a modern society. Its building blocks are:
• a federated hardware, communications and software infrastructure, including policies and enabling practices, to support Open Science in the digital era;
• a network of excellence in Open Science that supports scientists and other societal actors in accumulating and using modern data resources to maximise scientific, social and economic benefit.
These objectives will be realised through seven related strands of activity:
Strand 0: Register and portal for African and related international data collections and services.
Strand 1: A federated network of computational facilities and services.
Strand 2: Software tools and advice on policies and practices of research data management.
Strand 3: A Data Science Institute at the cutting edge of data analytics and AI.