The value of data: considering the context of production in data economies

Proceedings of the ACM 2011 conference on Computer supported cooperative work - CSCW '11, 2011

In this paper we argue that how scientific collaborations share data is bound up in the ways in which they produce and acquire that data. We draw on ethnographic work with two robotic space exploration teams to show how each community's norms of "data-sharing" are best understood as arising not from the context of the use or exchange of data, but from the context of data production. Shifting our perspective back to the point of production suggests that digital artifacts are embedded in a broader data economy.

Opening up a Dark Habitat and Opening up Data: The Co-emergence of Scientific Collaboration, Infrastructure for Data-sharing, and Data-sharing Practices

British Sociological Association Annual Conference 2014

Allied to the movement promoting Open Access publishing is the Open Data movement, which aims to facilitate and encourage the open sharing of research data amongst scientists across multiple disciplines and institutions. Studies of scientists’ data practices link barriers to data-sharing with lack of appropriate infrastructure, cultural issues regarding norms and reward structures, and lack of trust amongst researchers. However, there have been fewer studies of actual instances of successful data-sharing. Furthermore, little attention has been paid to the implications of successful data-sharing for the structures of collaborative scientific work. This paper will present findings from a longitudinal ethnographic case study of a large, distributed, multidisciplinary collaborative project studying subseafloor microbial life. This project aims to build a community involving researchers from disparate backgrounds, and to develop infrastructure for the exchange of knowledge, methods, and data. This case study is therefore an ideal opportunity for studying the co-emergence of scientific collaboration, infrastructure for data-sharing, and data-sharing practices. By carefully deconstructing a single observed instance of data-sharing, this paper shows that the sharing of data between researchers in different institutions or disciplines is a rare and fragile accomplishment that involves the interplay of multiple factors. These include high levels of trust between researchers, alignment of researchers’ interests, and opportunism in exploiting possibilities afforded by infrastructures, all underpinned by serendipity. Conversely, when data-sharing does occur, it promotes new scientific work across disciplinary and institutional boundaries, reconfiguring the structure of the collaboration.

Who’s Got the Data? Interdependencies in Science and Technology Collaborations

Computer Supported Cooperative Work (CSCW), 2012

Science and technology always have been interdependent, but never more so than with today's highly instrumented data collection practices. We report on a long-term study of collaboration between environmental scientists (biology, ecology, marine sciences), computer scientists, and engineering research teams as part of a five-university distributed science and technology research center devoted to embedded networked sensing. The science and technology teams go into the field with mutual interests in gathering scientific data. "Data" are constituted very differently between the research teams. What are data to the science teams may be context to the technology teams, and vice versa. Interdependencies between the teams determine the ability to collect, use, and manage data in both the short and long terms. Four types of data were identified, which are managed separately, limiting both reusability of data and replication of research. Decisions on what data to curate, for whom, for what purposes, and for how long, should consider the interdependencies between scientific and technical processes, the complexities of data collection, and the disposition of the resulting data.

Working Data Together: The Accountability and Reflexivity of Digital Astronomical Practice

Social Studies of Science, 2014

Drawing on ethnomethodology, this article considers the sequential work of astronomers who combine observations from telescopes at two observatories in making a data set for scientific analyses. By witnessing the induction of a graduate student into this work, it aims at revealing the backgrounded assumptions that enter it. I find that these researchers achieved a consistent data set by engaging diverse evidential contexts as contexts of accountability. Employing graphs that visualize data in conventional representational formats of observational astronomy, experienced practitioners held each other accountable by using an ‘implicit cosmology’, a shared (but sometimes negotiable) characterization of ‘what the universe looks like’ through these formats. They oriented to data as malleable, that is, as containing artifacts of the observing situation which are unspecified initially but can be defined and subsequently removed. Alternating between reducing data and deducing astronomical phenomena, they ascribed artifacts to local observing conditions or computational procedures, thus maintaining previously stabilized phenomena reflexively. As researchers in data-intensive sciences are often removed from the instruments that generated the data they use, this example demonstrates how scientists can achieve agreement by engaging stable ‘global’ data sets and diverse contexts of accountability, allowing them to bypass troubling features and limitations of data generators.

If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology

PLoS ONE, 2013

Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are research domains in which data tend to be local in character, minimally structured, and minimally documented. We report on a ten-year study of the Center for Embedded Network Sensing (CENS), a National Science Foundation Science and Technology Center. We found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data. Few repositories exist to accept data in CENS research areas. Data sharing tends to occur only through interpersonal exchanges. CENS researchers obtain data from repositories, and occasionally from registries and individuals, to provide context, calibration, or other forms of background for their studies. Neither CENS researchers nor those who request access to CENS data appear to use external data for primary research questions or for replication of studies. CENS researchers are willing to share data if they receive credit and retain first rights to publish their results. Practices of releasing, sharing, and reusing of data in CENS reaffirm the gift culture of scholarship, in which goods are bartered between trusted colleagues rather than treated as commodities.

Domesticating data: Traveling and value-making in the data economy

Social Studies of Science, 2023

Data are versatile objects that can travel across contexts. While data's travels have been widely discussed, little attention has been paid to the sites from where and to which data flow. Drawing upon ethnographic fieldwork in two connected data-intensive laboratories and the concept of domestication, we explore what it takes to bring data 'home' into the laboratory. As data come and dwell in the home, they are made to follow rituals, and as a result, data are reshaped and form ties with the laboratory and its practitioners. We identify four main ways of domesticating data. First, through storytelling about the data's origins, data practitioners draw the boundaries of their laboratory. Second, through standardization, staff transform samples into digital data that can travel well while ruling what data can be let into the home. Third, through formatting, data practitioners become familiar with their data and at the same time imprint the data, thus making them belong to their home. Finally, through cultivation, staff turn data into a resource for knowledge production. Through the lens of domestication, we see the data economy as a collection of homes connected by flows, and it is because data are tamed and attached to homes that they become valuable knowledge tools. Such domestication practices also have broad implications for staff, who in the process of 'homing' data, come to belong to the laboratory. To conclude, we reflect on what these domestication processes, which silence unusual behaviours in the data, mean for the knowledge produced in data-intensive research.

Science Friction: Data, Metadata, and Collaboration

Social Studies of Science 41

When scientists from two or more disciplines work together on related problems, they often face what we call 'science friction'. As science becomes more data-driven, collaborative, and interdisciplinary, demand increases for interoperability among data, tools, and services. Metadata (usually viewed simply as 'data about data', describing objects such as books, journal articles, or datasets) serve key roles in interoperability. Yet we find that metadata may be a source of friction between scientific collaborators, impeding data sharing. We propose an alternative view of metadata, focusing on its role in an ephemeral process of scientific communication, rather than as an enduring outcome or product. We report examples of highly useful, yet ad hoc, incomplete, loosely structured, and mutable, descriptions of data found in our ethnographic studies of several large projects in the environmental sciences. Based on this evidence, we argue that while metadata products can be powerful resources, usually they must be supplemented with metadata processes. Metadata-as-process suggests the very large role of the ad hoc, the incomplete, and the unfinished in everyday scientific work.

Data Management and Data Sharing in Science and Technology Studies

Science, Technology, & Human Values

This paper presents reports on discussions among an international group of science and technology studies (STS) scholars who convened at the US National Science Foundation (January 2015) to think about data sharing and open STS. The first report, which reflects discussions among members of the Society for Social Studies of Science (4S), relates the potential benefits of data sharing and open science for STS. The second report, which reflects discussions among scholars from many professional STS societies (i.e., European Association for the Study of Science and Technology [EASST], 4S, Society for the History of Technology [SHOT], History of Science Society [HSS], and Philosophy of Science Association [PSA]), focuses on practical and conceptual issues related to managing, storing, and curating STS data. As is the case for all reports of such open discussions, a scholar’s presence at the meeting does not necessarily mean that they agree with all aspects of the text to follow.

What Are Data? The Many Kinds of Data and Their Implications for Data Re-Use

Journal of Computer-Mediated Communication, 2007

One key feature of e-science is to encourage archiving and release of data so that they are available in digitally-processable forms for re-use almost from the point of collection. This assumes particular processes of translation by which data can be made visible in transportable and intelligible forms. It also requires mechanisms by which data quality and provenance can be trusted once "disconnected" from their producers. By analyzing the "life stages" of data in four academic projects, we show that these requirements create difficulties for disciplines where tacit knowledge and craft-like methods are deeply embedded in researchers, as well as for disciplines producing non-digital heterogeneous data or data derived from people rather than from material phenomena. While craft practices and tacit knowledges are a feature of most scientific endeavors, some disciplines currently appear more inclined to attempt to formalize or at least record these knowledges. We discuss the implications this has for the e-science objective of widespread data re-use.