Semantic Heterogeneity in Bioinformatics Resources

Causes 1 to 4 are included within the standard view of semantic heterogeneity. Hammer and McLeod point out that cause 5 (tools) is really orthogonal to the first four. Batini et al. give the causes of semantic heterogeneity as follows:

  1. Different perspectives: this is the same as different modellers differing in how they conceptualise the same data;
  2. Equivalent constructs: this is the same as the differing capabilities of the schema representation languages;
  3. Incompatible design specifications: for the same data domain, application developers may have slightly differing purposes and constraints, which can lead to differing schemata.

Semantic heterogeneity therefore concerns how data is represented, in structural and organisational terms, within a database. It captures, for example, whether some data is represented as an entity in one schema but only as an attribute of an entity in another. The definition extends to the data types used (integer, real or string, for example), the units used (centimetres or inches) and the precision (two or four decimal places; mark or grade in an exam). It does not extend to the instances placed within the schemata of different databases: this description of semantic heterogeneity does not cover the fact that SWISS-PROT entry P21598 is equivalent (not identical) to PIR entry S13142. Separate techniques are required to resolve such instance conflicts.
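
The data type, unit and precision differences described above can be illustrated with a minimal Python sketch. The two record types, their field names and the conversion function are hypothetical and are not taken from SWISS-PROT, PIR or any other real schema; they only show one way such a representation mismatch might be reconciled.

```python
from dataclasses import dataclass

CM_PER_INCH = 2.54  # exact by definition of the international inch


@dataclass
class LengthA:
    """Hypothetical schema A: length as a real number in centimetres."""
    length_cm: float


@dataclass
class LengthB:
    """Hypothetical schema B: length as a string holding a value in inches."""
    length_in: str


def b_to_a(record: LengthB) -> LengthA:
    """Convert a schema B record into the representation used by schema A."""
    inches = float(record.length_in)      # data type: string -> real
    centimetres = inches * CM_PER_INCH    # units: inches -> centimetres
    # precision: schema A is assumed to keep two decimal places
    return LengthA(length_cm=round(centimetres, 2))
```

Note that the conversion works purely at the level of representation: it says nothing about whether two particular records in the two databases describe the same thing, which is the instance-level problem excluded above.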

A greyer area is encountered when considering whether the SWISS-PROT keyword `loop' in this entry is the same as the PIR keyword `p-loop', or whether the SWISS-PROT feature key `DISULFID' is equivalent to the term `disulfide' in PIR. It is easier to reconcile two large collections of keywords as a separate exercise from the reconciliation of data organisation. Sometimes a value in one database can correspond to an attribute or entity in another: for example, some of the feature key values from SWISS-PROT map onto record names in the feature table of PIR. These cases can be classified by their type of semantic heterogeneity if the value is assumed to be an attribute. The boundary between data and metadata is somewhat blurred and can depend upon perspective. Tackling this problem is a feature of the RASH schema comparison process.
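
As a rough illustration of reconciling keyword collections as a separate exercise, the sketch below keeps a hand-curated lookup table from SWISS-PROT terms to PIR terms. It contains only the two pairs mentioned in the text, and treating them as fixed, database-wide equivalences is an assumption made for the example. The names `SWISSPROT_TO_PIR_TERMS` and `to_pir_term` are hypothetical and do not belong to RASH or to either database.

```python
from typing import Optional

# Hand-curated equivalences (assumed for illustration).
# Only the two pairs named in the text are listed here.
SWISSPROT_TO_PIR_TERMS = {
    "loop": "p-loop",
    "DISULFID": "disulfide",
}


def to_pir_term(swissprot_term: str) -> Optional[str]:
    """Return the PIR term recorded for a SWISS-PROT term, or None if unmapped."""
    return SWISSPROT_TO_PIR_TERMS.get(swissprot_term)
```

Returning `None` for unmapped terms, rather than guessing, leaves the decision about unknown keywords to a curator.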

Classification of Semantic Heterogeneities

This classification of the semantic heterogeneities existing in database schemata has been taken from Won Kim. It has been adapted, by Won Kim, from an earlier, purely relational form to one that accommodates an object view. Here, the word entity is used for class, entity and table alike. Similarly, attribute is used for field, attribute and method. The classification is as follows:

  1. One to one entity conflict:
    1. Entity name:
      • different names for equivalent entities;
      • same name for different entities;
    2. Entity structure conflict:
      • missing attribute;
      • missing, but implicit attribute;
    3. Entity constraints;
    4. Entity inclusion;
  2. Many to many entity conflict:
    • Composition of two or more one to one entity conflicts;
  3. One to one attribute conflict:
    1. Attribute name:
      • different name for equivalent attribute;
      • same name for different attribute;
    2. Attribute constraints:
      1. integrity constraints;
      2. data type;
      3. composition;
    3. Default values;
    4. Attribute inclusion;
    5. Methods;
  4. Many to many attribute conflict:
    • Composition of two or more one to one attribute conflicts;
  5. Entity to attribute conflict:
    • Composition of two or more one to one entity and one to one attribute conflicts;
  6. Domain representation conflict:
    1. Different expression denoting same information;
    2. Different units;
    3. Different levels of precision.
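
Two of the categories above, the missing attribute (1.2) and the data type conflict (3.2.2), can be sketched as a simple comparison of two entity definitions. This is an illustrative sketch only, not the RASH process or Won Kim's method. It assumes each entity is described by a hypothetical mapping from attribute name to data type name, and that naming conflicts have already been resolved so that equal names denote equivalent attributes.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Conflict(Enum):
    MISSING_ATTRIBUTE = "entity structure conflict: missing attribute"
    DATA_TYPE = "attribute constraint conflict: data type"


def compare_entities(
    a: Dict[str, str], b: Dict[str, str]
) -> List[Tuple[Conflict, str]]:
    """List the conflicts between two entities given as {attribute: data type}."""
    conflicts = []
    for name in sorted(a.keys() | b.keys()):
        if name not in a or name not in b:
            # Attribute present in only one of the two entities.
            conflicts.append((Conflict.MISSING_ATTRIBUTE, name))
        elif a[name] != b[name]:
            # Attribute present in both, but declared with different types.
            conflicts.append((Conflict.DATA_TYPE, name))
    return conflicts


# Usage with two made-up entity definitions:
entity_a = {"accession": "string", "length": "integer"}
entity_b = {"accession": "string", "length": "real", "keywords": "string"}
report = compare_entities(entity_a, entity_b)
```

Here `report` holds one missing attribute conflict for `keywords` and one data type conflict for `length`. The remaining categories, such as naming conflicts or implicit attributes, need knowledge of meaning that a purely syntactic comparison like this cannot supply.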

Some Examples of Bioinformatics Schema Heterogeneity