Recording Intention in RASHdb

Simply recording an entity's name, attributes and relationships does not necessarily capture its real-world semantics or intention. For example, SWISS-PROT and PIR both have an attribute called "keywords" whose domain is the set of strings. Their documentation, however, reveals that the two attributes have different intentions, or roles, within a database entry. SWISS-PROT keywords are intended to act as a complete index for the information in a protein's annotation. In contrast, PIR's keywords are a "catch-all" for annotation that cannot be placed in the more specialised fields of an entry. When comparing and reconciling schemas, it is therefore important to know the intention of each schema element rather than relying on schema information alone.
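To make the "keywords" example concrete, here is a minimal Python sketch. It assumes a hypothetical model in which each schema element records its source database, name, domain, and a short data-intention label paraphrased from the database's documentation; the class, function and label names are illustrative, not taken from RASHdb. Matching on name and domain alone pairs the two attributes, while comparing the recorded intentions keeps them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaElement:
    """A schema attribute plus its documented intention (hypothetical model)."""
    database: str
    name: str
    domain: str
    intention: str  # illustrative label paraphrased from the documentation


swissprot_kw = SchemaElement("SWISS-PROT", "keywords", "string",
                             "complete index of the annotation")
pir_kw = SchemaElement("PIR", "keywords", "string",
                       "catch-all for annotation without a specialised field")


def match_by_schema(a: SchemaElement, b: SchemaElement) -> bool:
    # Uses schema information only: same name and same domain.
    return a.name == b.name and a.domain == b.domain


def match_by_intention(a: SchemaElement, b: SchemaElement) -> bool:
    # Uses the recorded intention, independent of the element names.
    return a.intention == b.intention


print(match_by_schema(swissprot_kw, pir_kw))     # True
print(match_by_intention(swissprot_kw, pir_kw))  # False
```

The schema-only comparison reports a match that the documentation contradicts, which is the gap that recorded intention is meant to close.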

As well as recording the data intention of schema elements, it will also be useful to record their biological intention. Two entities, each from a different database but both representing a physical protein sequence, may have different representations, especially in their names. This makes some of the BioCompass queries difficult to ask. A consistent mechanism, over and above schema element names, is therefore required for recording biological intention.

The solution to recording both kinds of intention is to use controlled vocabularies. Two controlled vocabularies will exist for annotating the intentions of schema objects within the RASHdb database.
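The text does not specify the contents or storage of the two vocabularies, so the following is only a minimal Python sketch under stated assumptions. It models each controlled vocabulary as an `Enum` (the names `DataIntention` and `BiologicalIntention`, every term in them, and the element names "seq", "residues" and "organism" are all hypothetical), annotates each schema object with one term from each, and then answers a BioCompass-style question, "which schema objects represent a protein sequence?", by vocabulary term rather than by name.

```python
from dataclasses import dataclass
from enum import Enum


class DataIntention(Enum):
    # Hypothetical terms for the role an element plays within an entry.
    PRIMARY_DATA = "primary data"
    DESCRIPTIVE_ANNOTATION = "descriptive annotation"
    ANNOTATION_INDEX = "annotation index"
    CATCH_ALL = "catch-all annotation"


class BiologicalIntention(Enum):
    # Hypothetical terms for the real-world biological concept represented.
    PROTEIN_SEQUENCE = "protein sequence"
    SOURCE_ORGANISM = "source organism"


@dataclass(frozen=True)
class AnnotatedElement:
    database: str
    name: str  # names may differ freely between databases
    data_intention: DataIntention
    biological_intention: BiologicalIntention

    def __post_init__(self):
        # Reject annotations that are not terms from the controlled vocabularies.
        if not isinstance(self.data_intention, DataIntention):
            raise TypeError("data_intention must be a DataIntention term")
        if not isinstance(self.biological_intention, BiologicalIntention):
            raise TypeError("biological_intention must be a BiologicalIntention term")


catalogue = [
    AnnotatedElement("SWISS-PROT", "seq", DataIntention.PRIMARY_DATA,
                     BiologicalIntention.PROTEIN_SEQUENCE),
    AnnotatedElement("PIR", "residues", DataIntention.PRIMARY_DATA,
                     BiologicalIntention.PROTEIN_SEQUENCE),
    AnnotatedElement("PIR", "organism", DataIntention.DESCRIPTIVE_ANNOTATION,
                     BiologicalIntention.SOURCE_ORGANISM),
]


def find_by_biological_intention(elements, term):
    # Match on the vocabulary term only, ignoring element names entirely.
    return [(e.database, e.name) for e in elements
            if e.biological_intention is term]


print(find_by_biological_intention(catalogue, BiologicalIntention.PROTEIN_SEQUENCE))
# [('SWISS-PROT', 'seq'), ('PIR', 'residues')]
```

Using enumerations rather than free text is what makes each vocabulary controlled in this sketch: an annotation such as the plain string "primary data" is rejected with a `TypeError` when the element is created, so every stored intention is guaranteed to be a term that queries can match exactly.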