The Species Analyst Project

[February 17, 2001] The Species Analyst, based at the University of Kansas Natural History Museum and Biodiversity Research Center, "is a research project developing standards and software tools for access to the world's natural history collection and observation databases. The primary mechanism for accessing these data is currently through this web site at the Species Analyst Web Interface. Alternative tools are being developed that permit direct access to these data. The tools are in active development, but beta versions may be retrieved from the Species Analyst technical development web site. The Species Analyst uses XML for all its native data caching and retrieval from the remote data sources; the remote data sources are still mostly Z39.50 servers."

Project goals: "The Species Analyst is a research project developing standards, agreements and software tools facilitating the discovery, exchange, use, and analysis of natural history specimen records and observations. Natural history collections are a valuable resource that, combined, provides approximately 3 billion records documenting the global distribution of species over the last 300 years or so. This information is housed in a large number of collections, most of which use different database management systems and different database schemas, and have been captured into electronic form to varying degrees. The standards and software developed on the Species Analyst project provide a mechanism for combining most of these collection databases into a single, coherent, and readily searchable virtual database."

The Species Analyst relies heavily upon the fusion of the ANSI/NISO Z39.50 standard for information retrieval (ISO 23950) and XML. Z39.50 provides an excellent framework for distributed query and retrieval of information both within and across information domains. However, its use is limited by the somewhat obscure nature of its implementation. All of the tools used by the Species Analyst transform Z39.50 result sets into an XML format that is convenient to process further, either for viewing or for data extraction. This fusion of Z39.50 and XML brings standards-based information retrieval to the desktop by extending the capabilities of tools that users already know, such as Microsoft's Internet Explorer and Excel and ESRI's ArcView.

ITIS XML: "The Species Analyst utilizes the information contained in the ITIS*ca database in a variety of ways. The contents of this folder provide information about the XML output option of ITIS and how to extract data from the ITIS XML output. ITIS*ca is the Canadian implementation of the Integrated Taxonomic Information System (ITIS). The goal in this subproject is to define an XML interface to ITIS that may be utilized for programmatic access to the contents of ITIS database implementations." As of June 30, 2000, a new ITIS XML standard was under development.

Mechanism for Searching the ITIS*ca Database: "The ITIS*ca database is accessible through HTTP, hence queries are formatted as URLs. A general query syntax is not yet available for searching the database; however, there are two mechanisms for locating taxon records in XML format from ITIS*ca. The first is by using a taxon name or vernacular name as the search term. The second is by using a Taxonomic Serial Number (TSN). TSN searches may also be used to locate parent and child records of a given TSN, which is convenient for navigating the taxonomic database."

ZX Sample Scripts for loading an XML document from a Z39.50 target using ZX: "Using ZX, one can perform a complete Z39.50 search and retrieval simply by specifying a URL. The resulting document can be easily reformatted using XSL, or the data elements may be extracted using XSL or by using the XMLDOM. This simple piece of JavaScript shows how to load an XML document into the XMLDOM for further processing. Note that the URL can be any URL that generates a valid XML document..."
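The URL-based approach described above (search by name or by TSN, then pull data elements out of the returned XML) can be sketched in Python rather than the JavaScript/XMLDOM of the original samples. This is a minimal sketch under stated assumptions: the endpoint `BASE_URL`, the `name` and `tsn` parameter names, and the element and attribute names in `SAMPLE_XML` are all hypothetical, because the actual ITIS*ca and ZX URL syntax and output schema are not given here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the real ITIS*ca / ZX URL syntax is not given in the text.
BASE_URL = "http://example.org/itis/xml"


def build_query_url(*, name=None, tsn=None):
    """Build a query URL from exactly one of a taxon/vernacular name or a TSN."""
    if (name is None) == (tsn is None):
        raise ValueError("provide exactly one of name or tsn")
    # Parameter names "name" and "tsn" are assumptions for illustration.
    params = {"name": name} if name is not None else {"tsn": str(tsn)}
    return f"{BASE_URL}?{urlencode(params)}"


# Made-up response document; element and attribute names are illustrative only.
SAMPLE_XML = """<taxa>
  <taxon tsn="1001" parent="1000">
    <name>Examplea prima</name>
    <vernacular>first example</vernacular>
  </taxon>
  <taxon tsn="1002" parent="1000">
    <name>Examplea secunda</name>
  </taxon>
</taxa>"""


def extract_taxa(xml_text):
    """Parse an XML document and extract one dict per <taxon> element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "tsn": taxon.get("tsn"),
            "parent": taxon.get("parent"),
            "name": taxon.findtext("name"),
        }
        for taxon in root.iter("taxon")
    ]


if __name__ == "__main__":
    print(build_query_url(tsn=1001))
    for row in extract_taxa(SAMPLE_XML):
        print(row)
```

Against a live service, the document text would come from something like `urllib.request.urlopen(url).read()` instead of the inline sample; the parsing step is the same for any URL that returns well-formed XML.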

Darwin Core profile, version 1.3, for natural history collections and observation data sets: The Darwin Core (DwC) is "a profile describing the minimum set of standards for search and retrieval of natural history collections and observation databases. Natural history collections and observation data sets represent sets of observations, with each record detailing the observation of an organism, ideally at a specific geo-temporal location. In the case of collections, the observation is permanent in that the organism was collected from the field and preserved in a curated collection intended to last indefinitely. Collected specimens can be prepared in various ways, and several preparations from a single organism are not unusual (skin, skeleton, and perhaps microscope slides), thus there may be several records for a single organism, each representing the organism prepared using different techniques, but all records referring to a single observation event. Conversely, some collection records may represent a collection object that contains many organisms, for example in ichthyology, where the contents of a trawl may be sorted by taxon and lumped into a single collection container. Observation data sets catalog the observation of an organism, also at a specific geo-temporal location, but in this case the organism observed is not collected, and hence the observation record is the only information recorded about the organism. In both cases a taxonomic identification of the organism is attempted, with obvious consequences for accuracy of identification (a specimen available for identification to several experts compared with a potentially fleeting glimpse of an organism in the field)... Record content is also defined in this profile along with suggested syntaxes for encoding result set records in GRS-1, XML, SUTRS, and MARC. Although the profile was originally intended for use with the Z39.50 protocol through Z39.50 servers such as ZBigServer, it is also applicable for defining searches and XML content generated by databases served using HTTP, such as with ZHTTP."
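The record model described in the profile, where one observation event may yield several specimen records (one per preparation) and each record can be encoded as XML, can be illustrated with a small Python sketch. The field names, the `<record>` wrapper element, and the `observation_id` linking key are assumptions made for illustration; they are not the element names defined by Darwin Core 1.3.

```python
from dataclasses import dataclass, asdict
import xml.etree.ElementTree as ET


@dataclass
class SpecimenRecord:
    # Illustrative field names only, not the profile's actual element names.
    catalog_number: str
    scientific_name: str
    preparation: str       # e.g. skin, skeleton, microscope slide
    observation_id: str    # hypothetical key shared by records from one event
    latitude: float
    longitude: float
    year_collected: int


def to_xml(record):
    """Encode one record as a flat XML element, one child per field."""
    elem = ET.Element("record")
    for field_name, value in asdict(record).items():
        ET.SubElement(elem, field_name).text = str(value)
    return ET.tostring(elem, encoding="unicode")


def group_by_observation(records):
    """Group records that refer to the same observation event."""
    groups = {}
    for record in records:
        groups.setdefault(record.observation_id, []).append(record)
    return groups


if __name__ == "__main__":
    # Made-up data: two preparations of a single collected organism.
    skin = SpecimenRecord("M-1", "Examplea prima", "skin", "obs-42", 38.95, -95.25, 1998)
    skeleton = SpecimenRecord("M-2", "Examplea prima", "skeleton", "obs-42", 38.95, -95.25, 1998)
    print(to_xml(skin))
    for obs_id, recs in group_by_observation([skin, skeleton]).items():
        print(obs_id, [r.preparation for r in recs])
```

A flat, one-element-per-field layout is used because it maps directly onto a tabular database row; a real provider would substitute the element names and syntax that the profile specifies.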

Becoming a Data Provider: "Any institution with specimen or observation collections is welcome to participate in the public distribution of their records using the components available through the Species Analyst. The server software and client tools are available free of charge for non-commercial use. Installing ZBigServer is a fairly straightforward process that typically requires a 'one-two cups of coffee' period of time. If you are familiar with your database and how it works, and the content of the database is in reasonably good shape, then the whole process can take as little as 15 minutes. Since The Species Analyst clients all generate XML output, there is no technical reason you cannot use a web server to provide the same basic functionality as ZBigServer. In fact, a set of active server pages (for Microsoft's IIS or personal web server) is being developed which will let you serve specimen records as XML right from your web server. That is in the future, though; right now the easiest and quickest way to start providing data is with ZBigServer. Z39.50 also appears to be more efficient as a mechanism for search and retrieval of data, so there are some performance reasons for choosing Z39.50 over HTTP. Any Z39.50 server that can generate records in the GRS-1 record syntax can participate; there are no peculiar or proprietary features of ZBigServer that exclude the use of other Z39.50 servers."
