XML-Data
W3C Note 05 Jan 1998
This version:
http://www.w3.org/TR/1998/NOTE-XML-data-0105
Latest version:
http://www.w3.org/TR/1998/NOTE-XML-data
Authors:
Andrew Layman, Microsoft Corporation
Edward Jung, Microsoft Corporation
Eve Maler, ArborText
Henry S. Thompson, University of Edinburgh
Jean Paoli, Microsoft Corporation
John Tigue, DataChannel
Norbert H. Mikula, DataChannel
Steve De Rose, Inso Corporation
Status of this Document
This document is a NOTE made available by the World Wide Web Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE. A list of current NOTEs can be found at: http://www.w3.org/TR/.
This document is a submission to the W3C. Please see Acknowledged W3C Submissions regarding its disposition.
Contents:
- Introduction
- The Schema Element Type
- The ElementType Declaration
- Properties and Content Models
- Default Values
- Aliases and Correlatives
- Class Hierarchies
- Elements which are References
- Attributes as References
- Constraints & Additional Properties
- Using Elements from Other Schemas
- XML-Specific Elements
- The internal and external entity declaration element type: intEntityDcl and extEntityDcl
- The external declarations element type: extDcls
- Datatypes
- Mapping between Schemas
- Appendix A: Examples
- Appendix B : An XML DTD for XML-Data schemas
Acknowledgements
We thank Paul Grosso (ArborText), Sharon Adler (Inso Corporation), Anders Berglund (Inso Corporation), and François Chahuneau (AIS/Berger-Levrault) for their help and contributions to this proposal.
Introduction
Schemas define the characteristics of classes of objects. This paper describes an XML vocabulary for schemas, that is, for defining and documenting object classes. It can be used for classes which are strictly syntactic (for example, XML) or those which indicate concepts and relations among concepts (as used in relational databases, KR graphs and RDF). The former are called "syntactic schemas"; the latter, "conceptual schemas."
For example, an XML document might contain a "book" element which lexically contains an "author" element and a "title" element. An XML-Data schema can describe such syntax. However, in another context, we may simply want to represent more abstractly that books have titles and authors, irrespective of any syntax. XML-Data schemas can describe such conceptual relationships. Further, the information about books, titles and authors might be stored in a relational database, in which XML-Data schemas describe row types and key relationships.
One immediate implication of the ideas in this paper is that XML document types can now be described using XML itself, rather than DTD syntax. Another is that XML-Data schemas provide a common vocabulary for ideas which overlap between syntactic, database and conceptual schemas. All features can be used together as appropriate.
Schemas are composed principally of declarations for:
- Concepts
- Classes of objects
- Class hierarchies
- Properties
- Constraints
- Relationships
- Indicated by primary key to foreign key matching
- Indicated by URI
- XML DTD Grammars and Compatibility
- grammatical rules governing the valid nesting of the elements and attributes
- attributes of elements
- internal and external entities, represented by intEntityDcl and extEntityDcl
- notations, represented by notationDcl
- Datatypes giving parsing rules and implementation formats.
- Mapping rules allowing abbreviated grammars to map to a conceptual data model.
The Schema Element Type
All schema declarations are contained within a schema element, like this:
<s:schema id='ExampleSchema'>
The namespace of the vocabulary described in this document is named "urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/".
The ElementType Declaration
The heart of an XML-Data schema is the elementType declaration, which defines a class of objects (or "type of element" in XML terminology). The _id_ attribute serves a dual role: it identifies the definition and also names the specific class.
Within an elementType, the description subelement may be used to provide a human-readable description of the element's purpose.
<elementType id="author">
    <description>The person, natural or otherwise, who wrote the book.</description>
</elementType>
Properties and Content Models
Subelements within elementType define characteristics of the class's members. An XML "content model" is a description of the contents that may validly appear within a particular element type in a document instance.
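As a minimal sketch of such declarations, the fragment below defines an author type whose content is a string and a Book type that requires one or more authors. Two things are assumptions: the `href="#..."` reference form follows the prose of this document (the surviving `elementTypeEquivalent` snippet uses `type` instead), and the `xmlns:s` declaration uses the later XML Namespaces syntax so that the fragment is well-formed on its own.

```xml
<s:schema id="ExampleSchema"
          xmlns:s="urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/">
  <!-- author holds character data only -->
  <elementType id="author">
    <string/>
  </elementType>
  <!-- a Book must contain at least one author -->
  <elementType id="Book">
    <element href="#author" occurs="ONEORMORE"/>
  </elementType>
</s:schema>
```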
The example above defines two elements, author and book, and says that a book has one or more authors. The author element may contain a string of character data (but no other elements). For example, the following is valid:
<Book>
    <author>Henry Ford</author>
    <author>Samuel Crowther</author>
</Book>
Within an elementType, various specialized subelements (element, group, any, empty, string etc.) indicate which subelements (properties) are allowed/required. Ordinarily, these imply not only the cardinality of the subelements but also their sequence. (We discuss a means to relax sequence later.)
Element
Element indicates the containment of a single element type (property). Each element contains an href attribute referencing another _elementType_, thereby including it in the content model syntactically, or declaring it to be a property of the object class conceptually. The element may be required or optional, and may occur multiple times, as indicated by its occurs attribute having one of the four values "REQUIRED", "OPTIONAL", "ZEROORMORE" or "ONEORMORE". It has a default of "REQUIRED".
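A minimal sketch of a declaration that uses element and occurs follows. The `title` and `author` types are assumed to be declared elsewhere in the same schema, and the `href="#..."` reference form is taken from the prose rather than from a surviving example.

```xml
<elementType id="Book">
  <!-- occurs="OPTIONAL": a Book may carry at most one title -->
  <element href="#title" occurs="OPTIONAL"/>
  <!-- occurs="ONEORMORE": at least one author is required -->
  <element href="#author" occurs="ONEORMORE"/>
</elementType>
```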
The example above describes a book element type. Here, each instance of a book may contain a title, and must contain one or more authors.
<Book>
    <author>Henry Ford</author>
    <author>Samuel Crowther</author>
    <title>My Life and Work</title>
</Book>
When we discuss type hierarchies, later, we will see that an element type may have subtypes. If so, inclusion of an element type in a content model permits elements of that type directly and all its subtypes.
Empty, Any, String, and Mixed Content
Empty and any content are expressed using predefined elements _empty_ and any. (Empty may be omitted.) _String_ means any character string not containing elements, known as "PCDATA" in XML. Any signals that any mixture of subelements is legal, but no free characters. Mixed content (a mixture of parsed character data and one or more elements) is identified by a mixed element, whose content identifies the element types allowed in addition to parsed character data. When the content model is mixed, any number of the listed elements are allowed, in any order.
<s:schema>
...
<Book>
    <author>Henry Ford</author>
    <author>Samuel Crowther</author>
    <title>My Life and<titlePart>Work</titlePart></title>
</Book>
Here, book is defined to have an optional title and one or more authors. The name element has content model of any, meaning that free text is not allowed, but any arrangement of subelements is valid. The content model of title is mixed, allowing a free intermixture of characters and any number of titleParts. The author, name and titlePart elements have a _content model_ of string.
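The declarations this paragraph describes might be sketched as follows. The nesting of element inside mixed and the `href` references are assumptions drawn from the prose, and the `xmlns:s` declaration uses later namespace syntax. The name type is shown with the any content model.

```xml
<s:schema xmlns:s="urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/">
  <!-- any: subelements in any arrangement, but no free text -->
  <elementType id="name"><any/></elementType>
  <!-- string: character data only -->
  <elementType id="author"><string/></elementType>
  <elementType id="titlePart"><string/></elementType>
  <!-- mixed: character data interleaved with any number of titleParts -->
  <elementType id="title">
    <mixed><element href="#titlePart"/></mixed>
  </elementType>
  <elementType id="Book">
    <element href="#title" occurs="OPTIONAL"/>
    <element href="#author" occurs="ONEORMORE"/>
  </elementType>
</s:schema>
```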
Group
Group indicates a set or sequence of elements, allowing alternatives or ordering among the elements by use of the groupOrder attribute. The group as a whole is treated similarly to an element.
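As an illustrative sketch, the group below makes a preface and an introduction optional as a pair while fixing their order. The groupOrder value "SEQ" and the occurs attribute on group are assumptions; this section names only "AND" and "OR" explicitly.

```xml
<elementType id="Book">
  <element href="#author" occurs="ONEORMORE"/>
  <!-- either both appear, preface first, or neither appears -->
  <group groupOrder="SEQ" occurs="OPTIONAL">
    <element href="#preface"/>
    <element href="#introduction"/>
  </group>
</elementType>
```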
In the above example, if a preface or introduction appears, both must, with the preface preceding the introduction. Each of the following is valid:
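<Book>
    <author>Henry Ford</author>
</Book>
<Book>
    <author>Henry Ford</author>
    <preface>Prefatory text</preface>
    <introduction>This is a swell book.</introduction>
</Book>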
Sometimes a schema designer wants to relax the ordering restrictions among elements, allowing them to appear in any order. This is indicated by setting the groupOrder attribute to "AND":
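A sketch of the relaxed form, under the same assumptions as before (`href` references, occurs on group): only the groupOrder value changes.

```xml
<elementType id="Book">
  <element href="#author" occurs="ONEORMORE"/>
  <!-- AND: both members are still required together, in either order -->
  <group groupOrder="AND" occurs="OPTIONAL">
    <element href="#preface"/>
    <element href="#introduction"/>
  </group>
</elementType>
```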
Now the following is also valid:
<Book>
    <author>Henry Ford</author>
    <introduction>This is a swell book.</introduction>
    <preface>Prefatory text</preface>
</Book>
Finally, a schema can indicate that any one of a list of elements (or groups) is needed. For example, either a preface or an introduction. The groupOrder attribute value "OR" signals this.
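A sketch of the alternative form follows; the `href` reference spelling is again an assumption, and the group is left at the default occurrence so that exactly one member is needed.

```xml
<elementType id="Book">
  <element href="#author" occurs="ONEORMORE"/>
  <!-- OR: exactly one of the listed members appears -->
  <group groupOrder="OR">
    <element href="#preface"/>
    <element href="#introduction"/>
  </group>
</elementType>
```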
Now each of the following is valid:
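<Book>
    <author>Henry Ford</author>
    <preface>Prefatory text</preface>
</Book>
<Book>
    <author>Henry Ford</author>
    <introduction>This is a swell book.</introduction>
</Book>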
Open and Closed Content Models
XML typically does not allow an element to contain content unless that content was listed in the model. This is useful in some cases, but overly restrictive in others, in which we would like the listed content model to govern the cardinality and other aspects of whichever subelements are explicitly named, while allowing other subelements to appear in instances as well.
The distinction is effected by the content attribute taking the values "OPEN" and "CLOSED." The default is "OPEN" meaning that all element types not explicitly listed are valid, without order restrictions. (This idea has a close relation to the Java concept of a final class.)
For example, the following instance data for a book, including the unmentioned element copyrightDate would be valid given the content models declared so far, because they have all been open.
<Book>
    <author>Henry Ford</author>
    <author>Samuel Crowther</author>
    <title>My Life and Work</title>
    <copyrightDate>1922</copyrightDate>
</Book>
However, had the content model been declared closed, as follows, the _copyrightDate_ element would be invalid.
<elementType id="Book" content="CLOSED">
A closed content model does not allow instances to contain any elements or attributes beyond those explicitly listed in the elementType declaration.
Default Values
An element with occurs of REQUIRED or OPTIONAL (but not ONEORMORE or ZEROORMORE) can have a default value specified.
**adult**
The default value is implied for all element instances in which it is syntactically omitted.
To indicate that the default value is the only allowed value, the _presence_ attribute is set to "FIXED".
ADULT
Presence has values of "IMPLIED," "SPECIFIED," "REQUIRED," and "FIXED" with the same meanings as defined in XML DTD.
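A sketch of a default and a fixed value follows. The `default` subelement name, the `ageGroup` and `AdultBook` names, and the placement of presence on element are assumptions made for illustration; the occurs values and the "FIXED" presence value come from the text above.

```xml
<s:schema xmlns:s="urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/">
  <elementType id="Book">
    <element href="#author" occurs="ONEORMORE"/>
    <!-- "adult" is implied whenever ageGroup is omitted from an instance -->
    <element href="#ageGroup" occurs="OPTIONAL">
      <default>adult</default>
    </element>
  </elementType>
  <elementType id="AdultBook">
    <!-- FIXED: "ADULT" is the only value an instance may carry -->
    <element href="#ageGroup" occurs="OPTIONAL" presence="FIXED">
      <default>ADULT</default>
    </element>
  </elementType>
</s:schema>
```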
Aliases and Correlatives
ElementTypes can be known by different names in different languages or domains. The equivalence of several names is effected by the sameAs attribute, as in
<elementTypeEquivalent id="livre" type="#Book"/>
<elementTypeEquivalent id="auteur" type="#author"/>
Elements are used to represent both primary object types (nouns) and also properties, relations and so forth. Relations are often known by two names, each reflecting one direction of the relationship. For example, husband and wife, above and below, earlier and later, etc. The correlative element identifies such a pairing.
This indicates that "wrote" is another name for the "author" relation, but from the perspective of the person, not the book. That is, the two fragments below express the same fact:
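<person><name>Henry Ford</name></person>
<Book><title>My Life and Work</title><author>Henry Ford</author>...</Book>

<person><name>Henry Ford</name><wrote>My Life and Work</wrote></person>
<Book><title>My Life and Work</title></Book>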
A correlative may be defined simply to document the alternative name for the relation. However, it may also be used within a content model where it permits instances to use the alternative name. Further, it may be used to establish constraints on the relation, indicate key relationships, etc.
Class Hierarchies
ElementTypes can be organized into categories using the _superType_ attribute, as in
This simply indicates that, in some fashion, PencilsI'veBoughtRecently and _BooksI'veBoughtRecently_ are subsets of ThingsI'veBoughtRecently. It implies that every valid instance of the subset is a valid instance of the superset. The superset type must have an open content model.
There are restrictions that should be followed, based on the principle that all instances of the species (subtype) must be instances of the genus (supertype):
- The genus type must have content="OPEN".
- It must have either no groups or only groups with groupOrder="AND" (that is, no order constraints).
- You can add new elements and attributes.
- Occurs cardinality can be decreased but not increased.
- Ranges and other constraints are cumulative, that is, all apply (though the exact effect of this depends on the semantics of the constraint type).
- Default values can be made FIXED defaults.
To indicate that the content model of the subset should inherit the content model of a superset, we use a particular kind of superType called "genus" of which only one is allowed per ElementType. This copies the content model of the referenced element type and permits addition of new elements to it. Further, sub-elements occurring in the superset type, if declared again, are replaced by the newer declarations.
The above has the same effect as
Elements which are References
ElementTypes and the content model elements defined so far are sufficient to declare a tree structure of elements. However, some elements such as "author" are not only usable on their own, they also act as references to other elements. For example, "Henry Ford" is the value of the author subelement of a book element. "Henry Ford" is also the value of the _name_ element in a person element, and it can be used to connect these two.
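<Book>
    <author>Henry Ford</author>
    <author>Samuel Crowther</author>
    <title>My Life and Work</title>
</Book>
<person><name>Henry Ford</name></person>
<person><name>Samuel Crowther</name></person>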
In this capacity, such subelements are often referred to as _relations_ when using "knowledge representation" terminology or "keys" when using database terms. (The meanings of "relation" and "key" are slightly different, but the fact which the terms recognize is the same.)
To make such references explicit in the schema, we add declarations for _keys_ and foreign keys.
The key element within person tells us that a person can be uniquely identified by his name. The foreignKey element within the author element definition says that the contents of an author element are a foreign key identifying a person by name.
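The key and foreign key described here might be declared as in the sketch below. The `keyPart` element, the `id`, `range` and `key` attributes, and the identifier `k1` are assumptions; this text fixes only the element names key, foreignKey and foreignKeyPart.

```xml
<s:schema xmlns:s="urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/">
  <elementType id="name"><string/></elementType>
  <elementType id="person">
    <element href="#name"/>
    <!-- a person is uniquely identified by its name -->
    <key id="k1"><keyPart href="#name"/></key>
  </elementType>
  <elementType id="author">
    <string/>
    <!-- the content of author is matched against key k1 of person -->
    <foreignKey range="#person" key="#k1"/>
  </elementType>
</s:schema>
```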
An uninformed user agent can still display the string "Henry Ford" even if it cannot determine that it is supposed to be a person. A savvy agent that reads the schema can do more. It can locate the actual person.
This is the information needed for a join in database terminology.
This mechanism not only handles the typical way in which properties are expressed in databases, it also handles all cases in which the contents of an element are to be interpreted as strings from a restricted vocabulary, such as enumerations, XML nmtokens, etc.
<Book>
    <author>Henry Ford</author>
    <author>Samuel Crowther</author>
    <title>My Life and Work</title>
    <lccn>HD9710.U54 F58 1973</lccn>
    <dewey>629.2/092/4 B</dewey>
    <isbn>0405050887</isbn>
    <series>Business</series>
</Book>
Although not shown here, presumably lccn, dewey and _isbn_ are declared in the schema to be foreign keys to corresponding fields of catalog records. Series is a foreign key to a categorization of books, of which "Business" is one category.
Keys can contain URIs, as in
<Book>
    <author>http://SSA.gov/blab/people/Henry+Ford</author>
    <author>http://SSA.gov/blab/people/Samuel+Crowther</author>
    <title>My Life and Work</title>
</Book>
This is indicated in the schema by a datatype of "URI".
One-to-Many Relations
Element relations are binary. That is, we never express an n-to-1 relationship directly. We do not, for example, list within books a single relation that somehow resolves to all the authors. Instead, we always write the relationship on the 1-to-n side, but allow multiple occurrences of the _subelement_, for example, allowing books to have multiple occurrences of author.
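<person><name>Henry Ford</name></person>
<person><name>Samuel Crowther</name></person>
<person><name>Harvey S. Firestone</name></person>
<Book>
    <author>Henry Ford</author>
    <author>Samuel Crowther</author>
    <title>My Life and Work</title>
</Book>
<Book>
    <author>Harvey S. Firestone</author>
    <author>Samuel Crowther</author>
    <title>Men and Rubber</title>
</Book>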
This example shows a book with several persons as author, and also a person who is author of several books. We discussed such many-to-many relations further under the topic of correlatives.
Multipart Keys
When the foreignKey element does not have foreignKeyPart sub-elements (as it does not above) then the entirety of the element's contents (e.g. "Henry Ford") should be used as the key value. However, for multipart foreign keys, or cases where the element has several sub-elements, foreignKeyPart is used, as shown below.
...
My Life and Work Henry Ford
Attributes as References
An alternative way to express a reference is with an attribute.
Henry Ford
Samuel Crowther
My Life and Work
This allows us to link a book to a person, through the author relation, using an attribute of the relation. This exactly parallels the construction we saw above under "multipart keys," where a subelement of author contained the author's name. Here, an attribute of author contains the name. We can express this in our schema as
A widely-used variant of this is to use a URI as a foreign key:
**** **** My Life and Work
In this case, we are using the href attribute to contain a URI. This is a particular kind of foreign key, where the range is any possible resource, and where that resource is not identified by some combination of its properties but instead by a name-resolution service. We indicate this by using an attribute element, with dt="URI".
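A sketch of such a declaration follows. The use of a `name` attribute to name the declared attribute is an assumption; the attribute element and dt="URI" come from the text.

```xml
<elementType id="author">
  <!-- href carries a URI that identifies the person being referenced -->
  <attribute name="href" dt="URI"/>
</elementType>
```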
Constraints & Additional Properties
Min and Max Constraints
Elements can be limited to restricted ranges of values. The min and _max_ elements define the lower and upper bounds.
<min>0</min><max>131</max>
Such intervals are half-open (that is, the min value is in the interval, and the max value is the smallest value not in the interval).
This rule leads to the simplest calculation in most cases, and is unambiguous with respect to precision. In the above example, it is clear by these rules that 130.9999 is in the interval and 131 is not. However, had we said "all numbers from 0 to 130.99," in practice we would have some ambiguity regarding the status of 130.9999. Our interpretation would depend on the precision that we inferred for the original statement. The issue is particularly ambiguous for dates. (What exactly does "From December 5 to December 8" mean?) The use of half-open intervals for representation does not, however, put any requirements on how processors must display intervals. For example, dates in some contexts display differently than their storage. That is, the interval <min>1997-12-05</min><max>1997-12-09</max> might be displayed as "December 5 through December 8".
In certain cases this rule for a half-open interval is impractical (for example, what letter follows "z" in the Latin alphabet?). If so, use _maxInclusive_:
<min>A</min><maxInclusive>Z</maxInclusive>
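Placed in full declarations, the two kinds of bound might look like the sketch below. The element type names `age` and `initial` are hypothetical, and placing min, max and maxInclusive directly inside elementType is an assumption.

```xml
<s:schema xmlns:s="urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/">
  <!-- half-open interval: 0 is allowed, 130.9999 is allowed, 131 is not -->
  <elementType id="age">
    <string/>
    <min>0</min>
    <max>131</max>
  </elementType>
  <!-- closed at the top: both A and Z are allowed -->
  <elementType id="initial">
    <string/>
    <min>A</min>
    <maxInclusive>Z</maxInclusive>
  </elementType>
</s:schema>
```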
Domain and Range Constraints
We can use the domain and range elements to add constraints to an element's use or value. The domain element, if present, indicates that the element may only be used as a property of certain other elements. That is, syntactically it may appear only in the content model of those other element types. It constrains the sorts of schemas that can be written with the element.
The domain property above permits author elements to be used only within elements which are either books or subsets of _books_. Use of domain is optional. If omitted, there is simply no restriction.
The range element is used with elements which are references and declares a restriction on the types of elements to which the relation may refer. Graphically, it describes the target end of a directed edge. Each _range_ element references one elementType, any of which are valid. In this case, below, we have said that an author element must have an href attribute which is a URI reference to an element whose type is Person or a subset of Person.
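The domain and range constraints discussed in this section might be combined as in the following sketch. The `href` attribute on domain and range and the `name` attribute on attribute are assumptions; the element names and the Book and Person types come from the text.

```xml
<elementType id="author">
  <!-- author may appear only inside Book or a subset of Book -->
  <domain href="#Book"/>
  <!-- the reference itself is carried in an href attribute holding a URI -->
  <attribute name="href" dt="URI"/>
  <!-- the referenced element must be a Person or a subset of Person -->
  <range href="#Person"/>
</elementType>
```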
Other useful properties
Element and attribute types can have an unlimited amount of further information added to them in the schema due to the open nature of XML with namespaces.
Using Elements from Other Schemas
A schema may use elements and attributes from other schemas in content models. For example, a subelement named "http://books.org/date" could be used within a book element as follows:
<s:schema>
This can be abbreviated by adopting the rule that namespace-qualified names may be used within the href attribute value of an _element_ or attribute element.
<s:schema>
XML-Specific Elements
Attributes
XML-Data schemas contain a number of facilities to match features of XML DTDs or to support certain characteristics of XML. The XML syntax allows that certain properties can be expressed in a form called "attributes." To support this, an elementType can contain attribute declarations, which are divided into attributes with enumerated or notation values, and all other kinds.
An attribute may be given a default value. Whether it is required or optional is signaled by presence. (Presence ordinarily defaults to IMPLIED, but if it is omitted and there is an explicit default, presence is set to SPECIFIED.) See the DTD at the end of this document for syntactic details.
Attributes with enumerated (and notation) values permit a values attribute, a space-separated list of legal values. The values attribute is required when the atttype is ENUMERATION or NOTATION, else it is forbidden. In these cases, if a default is specified, it must be one of the specified values.
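A sketch of both kinds of attribute declaration follows. The `name` and `default` attributes and the attribute names `copyright` and `ageGroup` are assumptions for illustration; atttype, values and the ENUMERATION keyword come from the text above.

```xml
<elementType id="Book">
  <element href="#title" occurs="OPTIONAL"/>
  <element href="#author" occurs="ONEORMORE"/>
  <!-- an ordinary attribute: no values list is permitted -->
  <attribute name="copyright"/>
  <!-- an enumerated attribute: the default must be one of the listed values -->
  <attribute name="ageGroup" atttype="ENUMERATION"
             values="child adult" default="adult"/>
</elementType>
```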
describes an instance such as
My Life and Work Henry Ford
Attributes may also reference elementTypes, meaning that one may use the element type but with attribute syntax. This allows an attribute to explicitly have the same name and semantics even when used on different element types. There are of course some limits: The attribute can still occur only once in an instance, and it cannot contain other elements. However, this allows the semantics of the element type to be employed in attribute syntax.
describes an instance such as
The internal and external entity declaration element type: intEntityDcl and extEntityDcl
This and the next two declarations cover entities. Entities are a shorthand mechanism, similar to macros in a programming language.
Language Technology Group
Here as elsewhere, following XML, systemId must be a URI, absolute or relative, and publicId, if present, must be a Public Identifier (as defined in ISO/IEC 9070:1991, Information technology -- SGML support facilities -- Registration procedures for public text owner identifiers). If a notation is given, it must be declared (see below) and the entity will be treated as binary, i.e., not substituted directly in place of references.
The external declarations element type: extDcls
Although we allow an external entity with declarations to be included, we recommend a different declaration for schema modularization. The extDcls declaration gives a clean mechanism for importing (fragments of) other schemas. It replaces the common SGML idiom of declaring an external parameter entity and then immediately referring to it, and has the same import, namely, that the text referred to by the combination of systemId and publicId is included in the schema in place of the extDcls element, and that replacement text is then subject to the same validity constraints and interpretation as the rest of the schema.
Note that in many cases the desired effect may be better represented by referencing elements (and attributes) from the other schema or subclassing from them.
Datatypes
A datatype indicates that the contents of an element can be interpreted both as a string and, more specifically, as an object such as a number or a date. That is, the element's contents can be parsed or interpreted to yield an object more specific than a string.
That is, we distinguish the "type" of an element from its "datatype." The former gives the semantic meaning of an element, such as "birthday" indicating the date on which someone was born. The "datatype" represents the parser class needed to decode the element's contents into an object type more specific than "string." For example, "19541022" is the 22nd of October, 1954 in ISO 8601 date format. (That is, ISO 8601 parsing rules will decode "19541022" into a date, which can then be stored as a date rather than a string.)
For example, we would like an XML author to be able to say that the contents of a "size" element is an integer, meaning that it should be parsed according to numeric parsing rules and that it can be stored in integer format. In some contexts an API can expose it as an integer rather than a string.
shirt 8
There are two main contexts for datatypes. First, when dealing with database APIs, such as ODBC, all elements with the same name typically contain the same type of contents. For example, all sizes contain integers or all birthdays contain dates. We will return to this case shortly.
Second, and by contrast, the type of the content may vary widely from instance to instance. The softer we make our software, the more often these flexible cases occur. For example, size could contain the integer 8, or the word "small" or even a formula for computing the size.
We expose the datatype of an element instance by use of a _dt:dt_ attribute, where the value of the attribute is a URI giving the datatype. (The URI might be explicitly in URI format or might rely on the XML namespace facility for resolution.) For example, we might find a document containing something like:
shirt 8 shoes large suit =(shirtsize*1.05) + 3
Clearly this technique works for the heterogeneous typing in the above example. It also works for the database case where all elements of the same type have the same datatype.
shirt 8 shoes 6 suit 12
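Written out, the database case might look like the sketch below, with the same dt:dt value repeated on every size element. The wrapper names `items`, `item` and `name` are hypothetical, and the `xmlns:dt` declaration uses later namespace syntax; the values and the "int" datatype come from the text.

```xml
<items xmlns:dt="urn:uuid:C2F41010-65B3-11d1-A29F-00AA00C14882/">
  <!-- every size repeats the same datatype annotation -->
  <item><name>shirt</name><size dt:dt="int">8</size></item>
  <item><name>shoes</name><size dt:dt="int">6</size></item>
  <item><name>suit</name><size dt:dt="int">12</size></item>
</items>
```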
As written above, this is inefficient. Fortunately, XML allows us to put attributes with default or fixed values in schemas, so we could say once that all _size_ elements have a datatype with value "int". Having done so, our instance just looks like:
shirt 14 shoes 6 suit 16
In a DTD, we can set a fixed attribute value, so that all _size_ elements have datatype "int", or we can set it as a _default_ attribute value so that it is an integer except where explicitly noted otherwise.
shirt 14 shoes large suit 16
XML DTDs today allow such attributes. For example, a DTD can say that all _shirt_ elements have integer datatype by the following:
XML-Data schemas allow the equivalent, though with specialized syntax:
Elements use datatype subelements to give the datatype so that an optional presence attribute of the datatype element can indicate whether the datatype is fixed or merely a default. Attributes can also have datatypes. Because there is no possibility of their being anything other than a fixed type, the datatype of an attribute is signalled by a dt attribute:
How Typed Data is Exposed in the API
Different APIs to typed data will use the datatype attribute differently. The basic XML parser API should expose all element contents as strings regardless of any datatype attribute. (It might also contain supplementary methods to read values as more specific types such as "integer," thereby getting more efficiency.) An ODBC interface could use the datatype attribute to expose each type of element as a column, with the column's datatype determined by the element type's datatype.
Complex Data Types
If a datatype requires a complex structure for storage, or an object-based storage, this is also handled by the dt:dt attribute, where the datatype's storage format can be a structure, Java class, COM++ class, etc. For example, if an application needed to have an element stored in a "ScheduleItem" structure and using some private format, it could note this like
M*D1W4B19971022;100
The datatype does not require a private format. It could also use subelements and attributes such as
* 1 4 19971022 100
In the case of the graph-oriented interfaces (e.g. XML/RDF) the mapping from the XML tree to a graph should add a wrapping node for each non-string data type. The datatype property gives the type of that node. For example, the following two are graphically equivalent:
8 dt:int8
Versioning of Instances
Adding an attribute to an element does not change the other attributes or pose any special versioning problems. For example, an application written to expect an instance to contain "19541022" is not harmed if the schema reveals that this is ISO 8601 format. Versioning within datatypes should be handled by the author's making sure that subtypes of datatypes retain all the characteristics of the supertype.
If a down-level application is given a datatype it cannot process, it should expose the element contents as a supertype of the indicated datatype. In practice, this will usually mean that unrecognized datatypes will be the same as "dt:string". However, there are cases in which a type will be promoted, for example exposing a boolean in a byte or word rather than a bit, exposing a floating point number in a language's native format, etc.
The Datatypes Namespace
The datatype attribute "dt" is defined in the namespace named "urn:uuid:C2F41010-65B3-11d1-A29F-00AA00C14882/". (See the XML Namespaces Note at the W3C site for details of namespaces.) The full URN of the attribute is "urn:uuid:C2F41010-65B3-11d1-A29F-00AA00C14882/dt".
You will have noticed that the value of the attribute, as used in the examples above, is not lexically a full URI. For example, it reads "int" or "string" etc. Datatype attribute values are abbreviated according to the following rule: If it does not contain a colon, it is a datatype defined in the datatypes namespace "urn:uuid:C2F41010-65B3-11d1-A29F-00AA00C14882/". If it contains a colon, it is to be expanded to a full URI according to the same rules used for other names, as defined by the XML Namespaces Note. For example
8 shirt
has two datatypes whose full names are "urn:uuid:C2F41010-65B3-11d1-A29F-00AA00C14882/integer" and "http://zoosports.com/dt?clothing".
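The abbreviation rule can be illustrated with the sketch below, which yields the two full names just given. The element names `item`, `size` and `category` and the prefix `zoo` are hypothetical, and the `xmlns` declarations use later namespace syntax in place of the mechanism in the XML Namespaces Note.

```xml
<item xmlns:dt="urn:uuid:C2F41010-65B3-11d1-A29F-00AA00C14882/"
      xmlns:zoo="http://zoosports.com/dt?">
  <!-- no colon: resolved against the datatypes namespace -->
  <size dt:dt="integer">8</size>
  <!-- colon: the prefix expands to give http://zoosports.com/dt?clothing -->
  <category dt:dt="zoo:clothing">shirt</category>
</item>
```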
What a datatype's URI Means
Datatypes are identified by URIs. The URI is simply a reference to a section of a document that defines the appropriate parser and storage format of the element. To make this broadly useful, this document defines a set of common data types including all common forms of dates, plus all basic datatypes commonly used in SQL, C, C++, Java and COM (including strings).
The best form of such a document is that it should itself be an XML-Data schema where each datatype is an element declaration. For this purpose we define a syntax subelement which can be used in lieu of a content model. We also define an objecttype subelement. Each has a URI as its value. This integrates data types with element types in general.
<schema:elementType id="int">
<schema:elementType id="date.iso8601">
The objecttype sub-element can reference a structure, Java class, COM++ coClass, etc. The syntax subelement identifies a parser which can decode the element's content (and/or attributes) into the object type given the storage type URI. Input to the parser is the element object exposing all its attributes and content tree (that is, the subtree of the grove beginning with the element containing the dt attribute). The objectType attribute in particular is assumed available to the parser so that a single parser can support several objecttypes.
Having said this, all basic data types should be built into the parsers for efficiency and in order to ground the process. For these, the datatype elements serve only to formally document the storage types and parsers, and to give higher-level systems (such as RDF) a more formal basis for datatypes.
I do not currently propose that we attempt to write any universal notation for parsing rules. Certain popular kinds of formats, particularly dates, are not easily expressed in anything but natural language or code, and the parsers must be custom written code. In other words, the URIs for the basic syntax and objecttype elements probably resolve only to text descriptions.
Structured Data Type Attributes
Attributes in XML cannot have structure. I will separately propose some techniques to avoid this problem, specifically that the XML API should contain a method that treats attributes and subelements indistinguishably, and also that the content which is an element's value can be syntactically separated from content which is an element's properties.
Specific Datatypes
This includes all highly-popular types and all the built-in types of popular database and programming languages and systems such as SQL, Visual Basic, C, C++ and Java(tm).
Name | Parse type | Storage type | Examples |
---|---|---|---|
string | pcdata | string (Unicode) | Ὁμώνυμα λέγεται ὧν ὄνομα μόνον κοινόν, ὁ δὲ κατὰ τοὔνομα λόγος τῆς οὐσίας ἕτερος, οἷον ζῷον ὅ τε ἄνθρωπος καὶ τὸ γεγραμμένον. |
number | A number, with no limit on digits, may potentially have a leading sign, fractional digits, and optionally an exponent. Punctuation as in US English. | string | 15, 3.14, -123.456E+10 |
int | A number, with optional sign, no fractions, no exponent. | 32-bit signed binary | 1, 58502, -13 |
float | Same as for "number." | 64-bit IEEE 754 | .314159265358979E+1 |
fixed.14.4 | Same as "number" but no more than 14 digits to the left of the decimal point, and no more than 4 to the right. | 64-bit signed binary | 12.0044 |
boolean | "1" or "0" | bit | 0, 1 (1=="true") |
dateTime.iso8601 | A date in ISO 8601 format, with optional time and no optional zone. Fractional seconds may be as precise as nanoseconds. | Structure or object containing year, month, day, hour, minute, second, nanosecond. | 19941105T08:15:00301 |
dateTime.iso8601tz | A date in ISO 8601 format, with optional time and optional zone. Fractional seconds may be as precise as nanoseconds. | Structure or object containing year, month, day, hour, minute, second, nanosecond, zone. | 19941105T08:15:5+03 |
date.iso8601 | A date in ISO 8601 format. (no time) | Structure or object containing year, month, day. | 19541022 |
time.iso8601 | A time in ISO 8601 format, with no date and no time zone. | Structure or object exposing day, hour, minute | |
time.iso8601.tz | A time in ISO 8601 format, with no date but optional time zone. | Structure or object containing day, hour, minute, zonehours, zoneminutes. | 08:15-05:00 |
i1 | A number, with optional sign, no fractions, no exponent. | 8-bit binary | 1, 255 |
i2 | " | 16-bit binary | 1, 703, -32768 |
i4 | " | 32-bit binary | |
i8 | " | 64-bit binary | |
ui1 | A number, unsigned, no fractions, no exponent. | 8-bit unsigned binary | 1, 255 |
ui2 | " | 16-bit unsigned binary | 1, 703 |
ui4 | " | 32-bit unsigned binary | |
ui8 | " | 64-bit unsigned binary | |
r4 | Same as "number." | IEEE 754 4-byte float | |
r8 | " | IEEE 754 8-byte float | |
float.IEEE.754.32 | " | IEEE 754 4-byte float | |
float.IEEE.754.64 | " | IEEE 754 8-byte float | |
uuid | Hexadecimal digits representing octets, optional embedded hyphens which should be ignored. | 128-bit Unix UUID structure | F04DA480-65B9-11d1-A29F-00AA00C14882 |
uri | Universal Resource Identifier | Per W3C spec | http://www.ics.uci.edu/pub/ietf/uri/draft-fielding-uri-syntax-00.txt http://www.ics.uci.edu/pub/ietf/uri/ http://www.ietf.org/html.charters/urn-charter.html |
bin.hex | Hexadecimal digits representing octets | no specified size | |
char | string | 1 Unicode character (16 bits) | |
string.ansi | string containing only ascii characters <= 0xFF. | Unicode or single-byte string. | This does not look Greek to me. |
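To make a few of the parse types in the table concrete, here is a hedged Python sketch of decoders for fixed.14.4, ui2, uuid and bin.hex. The function names are hypothetical, and the sketch makes simplifying assumptions that the table does not state: exponents are not accepted for fixed.14.4, and a Python `Decimal` stands in for the 64-bit storage type.

```python
import re
import uuid
from decimal import Decimal

# Optional sign, up to 14 integer digits, optionally a point and up to 4 digits.
_FIXED_14_4 = re.compile(r"[+-]?([0-9]{0,14})(?:\.([0-9]{0,4}))?")


def parse_fixed_14_4(text):
    """Decode fixed.14.4 (assumption: no exponent is allowed in this sketch)."""
    text = text.strip()
    m = _FIXED_14_4.fullmatch(text)
    if m is None or not (m.group(1) or m.group(2)):
        raise ValueError(f"not a fixed.14.4 value: {text!r}")
    return Decimal(text)


def parse_ui2(text):
    """Decode ui2: unsigned, no fraction, no exponent, fits in 16 bits."""
    text = text.strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"not an unsigned integer: {text!r}")
    value = int(text)
    if value > 0xFFFF:
        raise ValueError(f"ui2 out of range: {text!r}")
    return value


def parse_uuid(text):
    """Decode uuid: hexadecimal digits, ignoring any embedded hyphens."""
    digits = text.strip().replace("-", "")
    if not re.fullmatch(r"[0-9A-Fa-f]{32}", digits):
        raise ValueError(f"not a uuid: {text!r}")
    return uuid.UUID(hex=digits)


def parse_bin_hex(text):
    """Decode bin.hex: pairs of hexadecimal digits, each pair one octet."""
    return bytes.fromhex(text.strip())
```

For instance, `parse_uuid("F04DA480-65B9-11d1-A29F-00AA00C14882")` and the same string with the hyphens removed yield equal values, since the hyphens are ignored.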
All of the dates and times above reading "iso8601.." actually use a restricted subset of the formats defined by ISO 8601. Years, if specified, must have four digits. Ordinal dates are not used. Of formats employing week numbers, only those that truncate year and month are allowed (5.2.3.3 d, e and f).
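The restrictions on dates can be illustrated with a small Python sketch for date.iso8601. It is an assumption-laden example rather than a complete implementation: the function name is hypothetical, only calendar dates with a four-digit year are accepted (in the basic form YYYYMMDD or the extended form YYYY-MM-DD), and the permitted week-number formats are not handled.

```python
import re
from datetime import date

_BASIC = re.compile(r"([0-9]{4})([0-9]{2})([0-9]{2})")       # YYYYMMDD
_EXTENDED = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")  # YYYY-MM-DD


def parse_date_iso8601(text):
    """Decode a calendar date with a four-digit year.

    Ordinal dates (YYYYDDD) and two-digit years match neither pattern
    and are rejected. Week-number formats are outside this sketch.
    """
    text = text.strip()
    m = _BASIC.fullmatch(text) or _EXTENDED.fullmatch(text)
    if m is None:
        raise ValueError(f"not a four-digit-year calendar date: {text!r}")
    year, month, day = (int(g) for g in m.groups())
    # date() itself raises ValueError for impossible dates such as month 13.
    return date(year, month, day)
```

The table's example value `19541022` decodes to a structure exposing year 1954, month 10 and day 22, matching the storage type given for date.iso8601.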
Mapping between Schemas
Certain uses of data emphasize syntax, others "conceptual" relations. Syntactic schemas often have fewer elements compared to explicitly conceptual ones. Further, it is usually easier to design a schema that merely covers syntax rather than designing a well-thought-out conceptual data model. An effect of this is that many practical schemas will not contain all the elements that a conceptual schema would, either for reasons of economy or because the initial schema was simply syntactic. But it is useful to make the implicit explicit over time so that more generic processors can make use of the data.
For example, the following schema is essentially syntax:
with instances looking like this
Paradise Lost Milton
On the other hand, a conceptual schema could look like this:
If fully explicit, its instances would look something like this:
Milton Paradise Lost Milton
In any case, what we want to express is a diagram such as this:
To do this, we will add mapping information into the syntactic schema which tells us how to interpolate the implied elements (and also to map author to creator), thereby creating a conceptual data model.
A more complex case could involve mapping several properties to a common implied node. For example, suppose we wanted a street element and a city element both to imply the same address node.
Mary Poppins 17 Cherry Tree Lane London
That is, rather than creating two address nodes, we want to create only a single one, and subordinate both the street and city to it. If the conceptual schema has elements livesAt, address, street and city, we could write a mapping thus:
...definitions of name, street and city...
Elements may be repeated, so mapping rules need to accommodate repetitions. If someone has two addresses in the grammatical syntax, these need to map to two addresses in the graph while still keeping the structure correct.
Mary Poppins 17 Cherry Tree Lane London One Park Lane London
Mappings within groups are handled together. Since street and city are in a single group, each occurrence of such a group results in one address.
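The intended effect of group-wise mapping can be sketched in Python. This is not the mapping notation itself, only an illustration of its result under stated assumptions: the function name `person_to_graph` is hypothetical, the instance markup (a Person element with name, street and city children) is assumed, and each group is assumed to be a street followed by a city.

```python
import xml.etree.ElementTree as ET


def person_to_graph(person):
    """Build a nested structure with one address node per street/city group."""
    graph = {"name": person.findtext("name"), "livesAt": []}
    current = None  # the address node for the group being read, if any
    for child in person:
        if child.tag == "street":
            # A street opens a new group, hence a new implied address node.
            current = {"street": child.text}
            graph["livesAt"].append(current)
        elif child.tag == "city" and current is not None:
            # The city belongs to the same group and closes it.
            current["city"] = child.text
            current = None
    return graph


person = ET.fromstring(
    "<Person><name>Mary Poppins</name>"
    "<street>17 Cherry Tree Lane</street><city>London</city>"
    "<street>One Park Lane</street><city>London</city></Person>"
)
graph = person_to_graph(person)
```

Two occurrences of the group give two address nodes under livesAt, each holding its own street and city, rather than four separate nodes or one merged node.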
Text markup can also be handled by mapping. Suppose that for some reason we choose to mark up the number portion of a street address:
Mary Poppins < streetNumber>17 Cherry Tree Lane London
If this should be reflected in the graph,
We can do that with mapping such as:
<elementType id="street">
...Person defined as before...
Appendix A: Examples
Some data:
bk:booksAndAuthors Henry Ford 1863
<Person>
<name>Harvey S. Firestone</name>
</Person>
<Person>
<name>Samuel Crowther</name>
</Person>
<Book>
<author>Henry Ford</author>
<author>Samuel Crowther</author>
<title>My Life and Work</title>
</Book>
<Book>
<author>Harvey S. Firestone</author>
<author>Samuel Crowther</author>
<title>Men and Rubber</title>
<ecom:price>23.95</ecom:price>
</Book>
The schema for http://company.com/schemas/books:
<s:schema>
<elementType id="name">
<string/>
</elementType>
<elementType id="birthday">
<string/>
<dataType dt="date.ISO8601"/>
</elementType>
<elementType id="Person">
<element type="#name" id="p1"/>
<element type="#birthday" occurs="OPTIONAL">
<min>1700-01-01</min><max>2100-01-01</max>
</element>
<key id="k1"><keyPart href="#p1" /></key>
</elementType>
<elementType id="author">
<string/>
<domain type="#Book"/>
<foreignKey range="#Person" key="#k1"/>
</elementType>
<elementType id="writtenWork">
<element type="#author" occurs="ONEORMORE"/>
</elementType>
<elementType id="Book" >
<genus type="#writtenWork"/>
<superType href="http://www.ecom.org/schemas/ecom/commercialItem"/>
<superType href="http://www.ecom.org/schemas/ecom/inventoryItem"/>
<group groupOrder="SEQ" occurs="OPTIONAL">
<element type="#preface"/>
<element type="#introduction"/>
</group>
<element href="http://www.ecom.org/schemas/ecom/price"/>
<element href="ecom:quantityOnHand"/>
</elementType>
<elementTypeEquivalent id="livre" type="#Book"/>
<elementTypeEquivalent id="auteur" type="#author"/>
Appendix B : An XML DTD for XML-Data schemas
<!-- Subtype of ElementType that is explicitly a relation. -->