SIOExplorer Digital Library Infrastructure
Digital Library Infrastructure
It is encouraging to compare the state of the art when the SIOExplorer project started with the present condition. In mid-2001, most SIO cruise data resided on tapes in boxes, on small disk drives scattered across various workstations, or on paper. There were no formal databases in use, virtually no metadata anywhere, and the experts were rapidly retiring. We now have an organized collection of about 1000 cruises, a second-generation streamlined archiving process, and multiple high-performance servers and RAID storage systems, including one in the high-bandwidth SDSC machine room. This has taken a group effort by a dedicated team of technicians, programmers, computer scientists, archivists, librarians, researchers and students, not only at SIO but also at WHOI and LDEO.
SIOExplorer Infrastructure
For each arbitrary digital object (ADO) published in SIOExplorer, descriptive metadata is stored both in an associated file and in a database record.
Authoritative metadata is stored in an ASCII MIF (Metadata Interchange Format) file. The MIF file is associated with an ADO by filename. For example, if the data filename is data.txt, the authoritative metadata is stored in data.txt.mif. Each MIF file follows an MTF (Metadata Template File). Metadata values that can be standardized, such as data type, port, and chief scientist, are selected from the controlled vocabulary file.
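The filename pairing and the controlled-vocabulary rule can be illustrated with a minimal Python sketch. It is not SIOExplorer code: the function names, the field names and the vocabulary terms are all hypothetical, and only the data.txt to data.txt.mif naming convention comes from the description above.

```python
from pathlib import Path

# Hypothetical controlled vocabulary, for illustration only. The real terms
# live in the SIOExplorer controlled vocabulary file, which is not shown here.
VOCABULARY = {
    "data_type": {"bathymetry", "navigation"},
    "port": {"San Diego", "Honolulu"},
}


def mif_path_for(data_path):
    """Return the MIF path paired with a data file: data.txt -> data.txt.mif."""
    p = Path(data_path)
    # Append ".mif" to the full filename rather than replacing the extension.
    return p.with_name(p.name + ".mif")


def uncontrolled_values(metadata, vocabulary=VOCABULARY):
    """Return (field, value) pairs whose value is not in the vocabulary.

    Fields that have no vocabulary entry are treated as free text and skipped.
    """
    return [
        (field, value)
        for field, value in metadata.items()
        if field in vocabulary and value not in vocabulary[field]
    ]
```

Calling mif_path_for("data.txt") yields a path named data.txt.mif, and uncontrolled_values flags any standardized field whose value falls outside the assumed vocabulary.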
The metadata stored in the database enables discovery of data in each of the SIOExplorer data access points.
Technical Description of the ADO/MIF Approach
Simply put, any related files, often files of different kinds, can be put in their own directory. They are then automatically tarred together into what we call an ADO (arbitrary digital object, encapsulated in a *.tar wrapper), with a corresponding *.mif (metadata interchange format) file. This preserves the original source files just as they were created and, at the same time, puts everything into a standard form for automated processing. The files can be voluminous or tiny; size makes no difference.
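The packaging step can be sketched with Python's standard tarfile module. This is only an illustration of wrapping a directory in a *.tar file, not the project's archiving software: the function names and the choice to name the archive after the directory are assumptions, and the sketch does not generate the *.mif file.

```python
import tarfile
from pathlib import Path


def build_ado(source_dir, out_dir):
    """Wrap every file under source_dir, unmodified, in <dirname>.tar.

    Hypothetical helper: naming the archive after the directory is an
    assumption, not a documented SIOExplorer convention.
    """
    source = Path(source_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    ado_path = out / (source.name + ".tar")
    # Mode "w" writes an uncompressed tar, so member contents are stored
    # byte for byte as they were created.
    with tarfile.open(ado_path, "w") as tar:
        tar.add(source, arcname=source.name)
    return ado_path


def list_ado(ado_path):
    """Return the sorted names of the regular files inside an ADO tar."""
    with tarfile.open(ado_path, "r") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

Because the tar is uncompressed and the members are added as-is, the original files can be recovered exactly, whatever their type or size.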
The contents of the *.mif file are read and loaded into a metadata catalogue that is the equivalent of the card catalogue of a conventional library. The file follows a specific format and controlled vocabulary defined by what we call a metadata template file (*.mtf), which has both canonical blocks and native blocks. The ADO and *.mif pairs are meant to be archival, transportable, interoperable products that are independent of any given software implementation, such as a database or digital library. They are analogous to books and their card catalogue information: independent of any particular library, but easily acquired and managed by any of them.
The canonical blocks are the minimal required metadata and are common to all data files from a given cruise. All blocks are linked by a common ADOidentifier that is a DOI (digital object identifier). The native blocks are optional and designed to accommodate specific instrument classes or derived products. All of these fields are populated automatically by software written to harvest the metadata from various standard sources.
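The relationship between canonical and native blocks can be sketched as a simple consistency check. The block names, the dictionary layout and the identifier strings are hypothetical, since the actual MTF layout is not given here. Only the rules themselves come from the description above: canonical blocks are required, native blocks are optional, and all blocks share one identifier.

```python
# Hypothetical names for the required canonical blocks; the real list is
# defined by the metadata template file (*.mtf), which is not shown here.
REQUIRED_CANONICAL = {"cruise", "object"}


def check_blocks(blocks):
    """Return a list of problems found in one object's metadata blocks.

    Each block is assumed to be a dict with the keys "name", "kind"
    ("canonical" or "native") and "ado_id" (the shared identifier).
    """
    problems = []
    # All blocks must be linked by one common identifier.
    if len({b["ado_id"] for b in blocks}) > 1:
        problems.append("blocks do not share one identifier")
    # Canonical blocks are mandatory; native blocks are optional.
    present = {b["name"] for b in blocks if b["kind"] == "canonical"}
    for name in sorted(REQUIRED_CANONICAL - present):
        problems.append(f"missing canonical block: {name}")
    return problems
```

An empty list means the record satisfies both rules. Native blocks never trigger a problem through their absence, which matches their optional role.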