Beckett Sterner | Arizona State University

Papers by Beckett Sterner

Taxonomy for Humans or Computers? Cognitive Pragmatics for Big Data

Criticism of big data has focused on showing that more is not necessarily better, in the sense that data may lose their value when taken out of context and aggregated together. The next step is to incorporate an awareness of pitfalls for aggregation into the design of data infrastructure and institutions. A common strategy minimizes aggregation errors by increasing the precision of our conventions for identifying and classifying data. As a counterpoint, we argue that there are pragmatic trade-offs between precision and ambiguity that are key to designing effective solutions for generating big data about biodiversity. We focus on the importance of theory-dependence as a source of ambiguity in taxonomic nomenclature and hence a persistent challenge for implementing a single, long-term solution to storing and accessing meaningful sets of biological specimens. We argue that ambiguity does have a positive role to play in scientific progress as a tool for efficiently symbolizing multiple aspects of taxa and mediating between conflicting hypotheses about their nature. Pursuing a deeper understanding of the trade-offs and synthesis of precision and ambiguity as virtues of scientific language and communication systems then offers a productive next step for realizing sound, big biodiversity data services.

Moving Past the Systematics Wars

It is time to escape the constraints of the Systematics Wars narrative and pursue new questions that are better positioned to establish the relevance of the field in this time period to broader issues in the history of biology and history of science. To date, the underlying assumptions of the Systematics Wars narrative have led historians to prioritize theory over practice and the conflicts of a few leading theorists over the less-polarized interactions of systematists at large. We show how shifting to a practice-oriented view of methodology, centered on the trajectory of mathematization in systematics, demonstrates problems with the common view that one camp (cladistics) straightforwardly "won" over the other (phenetics). In particular, we critique David Hull's historical account in Science as a Process by demonstrating exactly the sort of intermediate level of positive sharing between phenetic and cladistic theories that undermines their mutually exclusive individuality as conceptual systems over time. It is misleading, or at least inadequate, to treat them simply as holistically opposed theories that can only interact by competition to the death. Looking to the future, we suggest that the concept of workflow provides an important new perspective on the history of mathematization and computerization in biology after World War II.

Pathways to Pluralism about Biological Individuality

What are the prospects for a monistic view of biological individuality given the multiple epistemic roles the concept must satisfy? In this paper, I examine the epistemic adequacy of two recent accounts based on the capacity to undergo natural selection. One is from Ellen Clarke, and the other is by Peter Godfrey-Smith. Clarke’s position reflects a strong monism, in that she aims to characterize individuality in purely functional terms and refrains from privileging any specific material properties as important in their own right. I argue that Clarke’s functionalism impairs the epistemic adequacy of her account compared to a middle-ground position taken by Godfrey-Smith. In comparing Clarke’s and Godfrey-Smith’s accounts, two pathways emerge to pluralism about biological individuality. The first develops from the contrast between functionalist and materialist approaches, and the second from an underlying temporal structure involved in using evolutionary processes to define individuality.

The Normative Structure of Mathematization in Systematic Biology

Studies in History and Philosophy of Biological and Biomedical Sciences, Apr 2014

We argue that the mathematization of science should be understood as a normative activity of advocating for a particular methodology with its own criteria for evaluating good research. As a case study, we examine the mathematization of taxonomic classification in systematic biology. We show how mathematization is a normative activity by contrasting its distinctive features in numerical taxonomy in the 1960s with an earlier reform advocated by Ernst Mayr starting in the 1940s. Both Mayr and the numerical taxonomists sought to formalize the work of classification, but Mayr introduced a qualitative formalism based on human judgment for determining the taxonomic rank of populations, while the numerical taxonomists introduced a quantitative formalism based on automated procedures for computing classifications. The key contrast between Mayr and the numerical taxonomists is how they conceptualized the temporal structure of the workflow of classification, specifically where they allowed meta-level discourse about difficulties in producing the classification.

The Epistemology of Causal Selection: Insights from Systems Biology

Causal Reasoning in Biology, Minnesota Studies in Philosophy of Science

Among the many causes for an event, how do we distinguish the important ones? Are there ways to distinguish among causes on principled grounds that integrate both practical aims and objective knowledge? Psychologist Tania Lombrozo has suggested that causal explanations “identify factors that are ‘exportable’ in the sense that they are likely to subserve future prediction and intervention” (Lombrozo 2010, 327). Hence portable causes are more important precisely because they provide objective information to prediction and intervention as practical aims. However, I argue that this is only part of the epistemological dimension of causal selection. Recent work on portable causes has implicitly assumed them to be portable within the same causal system at a later time. As a result, it has appeared that the objective content of causal selection includes only facts about the causal structure of that single system. In contrast, I present a case study from systems biology in which scientists are searching for causal factors that are portable across rather than within causal systems. By paying careful attention to how these biologists find portable causes, I show that the objective content of causal selection can extend beyond the immediate systems of interest. In particular, knowledge of the evolutionary history of gene networks is necessary for correctly identifying causal patterns in these networks that explain cellular behavior in a portable way.

The Practical Value of Biological Information for Research

Philosophy of Science, 2014

Until recently, there has been a general skepticism among philosophers about the practical scientific value of biological information as a concept. Criticism appeared to pin biological information from multiple directions, such as the inadequacy of information theory for grounding semantic properties and the diminishing relevance of the Central Dogma. In the past several years, some philosophers have proposed a more positive view of ascribing information as an exercise in scientific modeling. I argue for an alternative and complementary role for biological information in guiding empirical data collection for the sake of evolutionary theorizing. Carl Bergstrom and Martin Rosvall have recently made a similar claim in their proposed transmission account (Bergstrom and Rosvall 2011a; Bergstrom and Rosvall 2011b). I clarify and expand on their suggestion that one could take a “diagnostic” approach to biological information, in which the concept is defined operationally in terms of a procedure for collecting empirical cases. I suggest that skepticism about the concept’s practical value originated in a misplaced theory-centrism that is still perpetuated in ways by the more recent modeling-based accounts.

Well-Structured Biology: Numerical Taxonomy and Its Methodological Vision for Systematics

The Evolution of Phylogenetic Systematics, 2013

What does it look like when a group of scientists set out to re-envision an entire field of biology in symbolic and formal terms? I analyze the founding and articulation of Numerical Taxonomy between 1950 and 1970, the period when it set out a radical new approach to classification and founded a tradition of mathematics in systematic biology. I argue that introducing mathematics in a comprehensive way also requires re-organizing the daily work of scientists in the field. Numerical taxonomists sought to establish a mathematical method for classification that was universal to every type of organism, and I argue this intrinsically implicated them in a qualitative re-organization of the work of all systematists. I also discuss how Numerical Taxonomy’s re-organization of practice became entrenched across systematic biology even as opposing schools produced their own competing mathematical methods. In this way, the structure of the work process became more fundamental than the methodological theories that motivated it.

Object Spaces: An Organizing Strategy for Biological Theorizing

Biological Theory, 2009

A classic analytic approach to biological phenomena seeks to refine definitions until classes are sufficiently homogenous to support prediction and explanation, but this approach founders on cases where a single process produces objects with similar forms but heterogeneous behaviors. I introduce object spaces as a tool to tackle this challenging diversity of biological objects in terms of causal processes with well-defined formal properties. Object spaces have three primary components: (1) a combinatorial biological process such as protein synthesis that generates objects with parts that are modular, independent, and organized according to an invariant syntax; (2) a notion of “distance” that relates the objects according to rules of change over time as found in nature or useful for algorithms; (3) mapping functions defined on the space that map its objects to other spaces or apply an evaluative criterion to measure an important quality, such as parsimony or biochemical function. Once defined, an object space can be used to represent and simulate the dynamics of phenomena on multiple scales; it can also be used as a tool for predicting higher-order properties of the objects, including stitching together series of causal processes. Object spaces are the basis for a strategy of theorizing, discovery, and analysis in biology: as heuristic idealizations of biology, they help us transform inchoate, intractable problems into articulated, well-structured ones. Developing an object space is a research strategy with a long, successful history under many other names, and it offers a unifying but not overreaching approach to biological theory.
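The three components listed in this abstract can be made concrete with a toy example. The sketch below is an illustration only, not the paper's formalism: the `ObjectSpace` class, the choice of fixed-length RNA-like strings as objects, Hamming distance as the "distance", and GC fraction as the evaluative mapping are all assumptions introduced here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator, List, Sequence


@dataclass(frozen=True)
class ObjectSpace:
    """Toy object space (hypothetical structure, for illustration only)."""
    alphabet: Sequence[str]                # modular, independent parts
    length: int                            # invariant syntax: fixed-length strings
    distance: Callable[[str, str], int]    # component (2): relates objects
    evaluate: Callable[[str], float]       # component (3): mapping function

    def objects(self) -> Iterator[str]:
        # Component (1): a combinatorial process generating every object.
        for parts in product(self.alphabet, repeat=self.length):
            yield "".join(parts)

    def neighbors(self, obj: str, radius: int = 1) -> List[str]:
        # Objects reachable from `obj` within `radius` under the distance.
        return [o for o in self.objects() if 0 < self.distance(obj, o) <= radius]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq: str) -> float:
    """Assumed evaluative criterion: fraction of G or C symbols."""
    return sum(base in "GC" for base in seq) / len(seq)


# A space of all three-letter strings over an RNA-like alphabet.
space = ObjectSpace(alphabet="ACGU", length=3, distance=hamming, evaluate=gc_fraction)
```

With these assumptions, `space.objects()` enumerates 64 strings, `space.neighbors("AUG")` returns the nine single-substitution variants, and `space.evaluate` maps any object to a number, which is the sense in which a mapping function sends objects to another space.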

Discriminative learning for protein conformation sampling

Protein structure prediction without using templates (i.e., ab initio folding) is one of the most challenging problems in structural biology. In particular, conformation sampling poses a major bottleneck for ab initio folding. This article presents CRFSampler, an extensible protein conformation sampler built on a probabilistic graphical model, Conditional Random Fields (CRFs). Using a discriminative learning method, CRFSampler can automatically learn more than ten thousand parameters quantifying the relationship among primary sequence, secondary structure, and (pseudo) backbone angles. Using only compactness and self-avoiding constraints, CRFSampler can efficiently generate protein-like conformations from primary sequence and predicted secondary structure. CRFSampler is also very flexible in that a variety of model topologies and feature sets can be defined to model the sequence-structure relationship without worrying about parameter estimation. Our experimental results demonstrate that, using a simple set of features, CRFSampler can generate decoys of much higher quality than the most recent HMM model.

Predicting and Annotating Catalytic Residues: An Information Theoretic Approach

We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.
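The abstract does not spell out how the KL distance between sequence profiles is computed, so the following is a minimal sketch under stated assumptions rather than the paper's definition. It assumes a profile is a list of per-position amino-acid distributions built from aligned, equal-length sequence windows, that a uniform pseudocount avoids zero probabilities, and that the distance is the symmetrized KL divergence summed over positions. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Distribution = Dict[str, float]


def column_distribution(residues: Sequence[str], pseudocount: float = 1.0) -> Distribution:
    """Smoothed amino-acid frequencies for one profile position."""
    # Ignore symbols outside the 20 standard amino acids (e.g. 'X' or gaps).
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}


def build_profile(windows: Sequence[str], pseudocount: float = 1.0) -> List[Distribution]:
    """Build a profile from equal-length windows centred on a catalytic residue."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    return [
        column_distribution([w[i] for w in windows], pseudocount)
        for i in range(length)
    ]


def kl_divergence(p: Distribution, q: Distribution) -> float:
    """KL divergence D(p || q) in nats; pseudocounts keep every term finite."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def profile_distance(a: List[Distribution], b: List[Distribution]) -> float:
    """Assumed distance: symmetrized KL divergence summed over positions."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(0.5 * (kl_divergence(p, q) + kl_divergence(q, p)) for p, q in zip(a, b))
```

For example, `profile_distance(build_profile(["GDEAS", "GDEGS"]), build_profile(["WHKLY", "WHRLY"]))` gives a positive value, while the distance from a profile to itself is zero. Symmetrizing is a design choice made here because plain KL divergence is asymmetric and so is not a distance in the usual sense.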
