Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology

Images are paramount in the documentation of morphological data. Production and reproduction costs have traditionally limited how many illustrations taxonomists could afford to publish, and much comparative knowledge continues to be lost as generations turn over. Digital images are now cheap to produce and easy to disseminate electronically, but they pose problems of maintenance, curation, sharing, and use, particularly in long-term data sets involving multiple collaborators and institutions. We propose an efficient linkage of images to phylogenetic data sets via (1) an ontology of morphological terms; (2) an underlying, fine-grained database of specimens, images, and associated metadata; (3) fixation of the meaning of morphological terms (homolog names) by ostensive references to particular taxa; and (4) formalization of images as standard views. The ontology provides the intellectual structure and basic design of these relationships and enables intelligent queries that populate phylogenetic data sets with images. The database itself documents primary morphological observations, their vouchers, and associated metadata, rather than conventional data set cells, and thereby keeps the data maintainable when characters are redefined or specimens are reidentified. This design minimizes the need to reexamine specimens and the loss of information or data quality, and it mirrors the data models of web-based repositories for images, specimens, and taxonomic names. Confusion and ambiguity in the meanings of technical morphological terms are reduced by ostensive definitions pointing to features in particular taxa, which may serve as references for globally unique identifiers of characters. 
Finally, the concept of standard views (an image illustrating one or more homologs in a specific sex and life stage, in a specific orientation, using a specific device and preparation technique) enables efficient, dynamic linkage of images to the data set and automatic population of matrix cells with images independently of scoring decisions.
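A minimal Python sketch of how such a linkage might be modeled, under stated assumptions. All class, field, and function names (`Homolog`, `StandardView`, `with_parts`, `cell_images`), the part-of traversal policy, and the example data are hypothetical illustrations, not the schema used in this work. Images attach to specimens through vouchers and to homologs through standard views, so the images for a matrix cell are computed by query rather than stored in the cell, and no scoring decision is consulted.

```python
# Hypothetical schema: names and the part-of query policy are assumptions.
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Homolog:
    term_id: str
    name: str
    reference_taxon: str           # ostensive definition: the feature "as in" this taxon
    part_of: Optional[str] = None  # parent term in an assumed part-of hierarchy

@dataclass
class Specimen:
    voucher: str
    taxon: str                     # current identification; may change later

@dataclass(frozen=True)
class StandardView:
    homologs: frozenset            # term_ids of the homologs the view illustrates
    sex: str
    life_stage: str
    orientation: str
    device: str
    preparation: str

@dataclass(frozen=True)
class Image:
    image_id: str
    voucher: str                   # links the image to a specimen, not to a matrix cell
    view: StandardView

@dataclass(frozen=True)
class Character:
    char_id: str
    homolog: str                   # term_id of the homolog this character describes

def with_parts(term_id, ontology):
    """Return term_id plus every term that is transitively part_of it."""
    found = {term_id}
    changed = True
    while changed:
        changed = False
        for h in ontology.values():
            if h.part_of in found and h.term_id not in found:
                found.add(h.term_id)
                changed = True
    return found

def cell_images(taxon, character, ontology, specimens, images):
    """Images for one (taxon, character) cell, derived on demand from vouchers and views."""
    terms = with_parts(character.homolog, ontology)
    vouchers = {s.voucher for s in specimens.values() if s.taxon == taxon}
    return sorted(i.image_id for i in images
                  if i.voucher in vouchers and i.view.homologs & terms)

# Hypothetical example data
ontology = {
    "T1": Homolog("T1", "male palp", reference_taxon="Taxon_X"),
    "T2": Homolog("T2", "palpal tibia", reference_taxon="Taxon_X", part_of="T1"),
}
specimens = {"V1": Specimen("V1", "Taxon_A"), "V2": Specimen("V2", "Taxon_B")}
images = [
    Image("I1", "V1", StandardView(frozenset({"T1"}), "male", "adult",
                                   "ventral", "stereomicroscope", "alcohol")),
    Image("I2", "V2", StandardView(frozenset({"T2"}), "male", "adult",
                                   "retrolateral", "SEM", "critical-point dried")),
]
palp_char = Character("C1", homolog="T1")
```

Because images point to vouchers and homologs rather than to cells, reidentifying a specimen (for example `specimens["V2"].taxon = "Taxon_A"`) moves image `I2` into the `Taxon_A` cell the next time `cell_images` is called, with no edit to any image record. Including parts of a homolog in the query is only one possible policy; a stricter design might require an exact homolog match.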