The list of schema languages is long and needs to include languages developed for SGML (the language used before XML was born) to be complete. The list that I propose is far from exhaustive, and includes only the major proposals that have influenced the schema languages I see as the most promising.
DTDs were mandatory for any SGML application, and a simplified version of the SGML DTD was introduced in the XML 1.0 Recommendation. Even though a DTD is not mandatory for an application to read and understand an XML document, many developers highly recommend writing DTDs for any XML application.
The W3C XML Schema Working Group received many proposals that were contributed as notes:
XML-Data, submitted as a note (http://www.w3.org/TR/1998/NOTE-XML-data) in January 1998 by Microsoft, DataChannel, Arbortext, Inso Corporation, and the University of Edinburgh, included most of the basic concepts later developed by W3C XML Schema. Although the details were not fully developed, the note covered a lot of ground that was kept out of W3C XML Schema, such as internal and external entity definitions and the mapping to RDF (Resource Description Framework) and OOP structures.
XML-Data-Reduced (XDR), submitted in July 1998 (http://www.ltg.ed.ac.uk/~ht/XMLData-Reduced.htm) by Microsoft and the University of Edinburgh, was presented to "refine and subset those ideas down to a more manageable size in order to allow faster progress toward adopting a new schema language for XML" (mappings were left out). XDR was implemented by Microsoft and used by the BizTalk framework.
DCD (Document Content Description for XML), also submitted in July 1998 (http://www.w3.org/TR/NOTE-dcd) by Textuality, Microsoft, and IBM, was a "subset of the XML-Data Submission (XML-Data) and expressed it in a way which is consistent with the ongoing W3C RDF (Resource Description Framework) effort." Mapping considerations were left out, but the language took care to be consistent with RDF through features such as "Interchangeability of Elements and Attributes."
SOX (Schema for Object-Oriented XML) was developed by Veo Systems/Commerce One and submitted as a note in September 1998 (a second version was submitted in July 1999; see http://www.w3.org/TR/NOTE-SOX) as "informed by the XML 1.0 specification as well as the XML-Data submission (XML-Data), the Document Content Description submission (DCD), and the EXPRESS language reference manual (ISO-10303-11)." SOX was strongly influenced by OOP language design and included concepts of interface and implementation, but it was also influenced by DTDs and included support for "parameters." SOX is widely used by Commerce One.
DDML (Document Definition Markup Language or XSchema) was the "result of contributions from a large number of people on the XML-Dev mailing list, coordinated by a smaller group of editors" (Ronald Bourret, John Cowan, Ingo Macherius, and Simon St. Laurent) and was submitted as a note in January 1999 (http://www.w3.org/TR/NOTE-ddml). Its purpose was to "encode the logical (as opposed to physical) content of DTDs in an XML document." The note paid great attention to defining the conversions back and forth between DTDs and DDML, and it also included an "experimental" chapter proposing "Inline DDML Elements." DDML made a clear distinction between structures and data and left datatypes out.
W3C XML Schema, published as a Recommendation in May 2001 (http://www.w3.org/TR/xmlschema-0, http://www.w3.org/TR/xmlschema-1, and http://www.w3.org/TR/xmlschema-2), acknowledges the influence of DCD, DDML, SOX, XML-Data, and XDR in its list of references. It appears to have picked pieces from each of these proposals but is also a compromise between them. The main sponsors of the two languages still actively used and developed (Microsoft for XDR and Commerce One for SOX) both announced that they would support W3C XML Schema for their new developments. W3C XML Schema will most likely become the only surviving member of this family in the long term.
The RELAX NG family is a more traditional marriage between grammar-based XML schema languages that have chosen to unite their strengths.
First published in March 2000 as a Japanese ISO Standard Technical Report written by Murata Makoto, Regular Language description for XML Core (RELAX; see http://www.xml.gr.jp/relax) is both simple ("Tired of complicated specifications? You just RELAX !") and built on a solid mathematical foundation (the adaptation of the hedge automata theory to XML trees). It was approved as an ISO/IEC Technical Report in May 2001.
XDuce (http://xduce.sourceforge.net) was first announced in March 2000. "XDuce ('transduce') is a typed programming language that is specifically designed for processing XML data. One can read an XML document as an XDuce value, extract information from it or convert it to another format, and write out the result value as an XML document." Although it is not meant to be a schema language, its typing system has influenced the schema languages.
Published by James Clark in January 2001, TREX (Tree Regular Expressions for XML; see http://thaiopensource.com/trex) is "basically the type system of XDuce with an XML syntax and with a bunch of additional features." The names and content models of the elements used to define the tree patterns of a TREX schema have been carefully chosen, and TREX schemas are usually as easy to read as a plain text description. The simplicity of the structure of the language also allows the resurrection of a consistent treatment of elements and attributes, a feature lost since DCD.
Announced in May 2001, RELAX NG (RELAX New Generation) is a merger of RELAX and TREX, developed by an OASIS TC (http://www.oasis-open.org/committees/relax-ng), coedited by James Clark and Murata Makoto. "The key features of RELAX NG are that it is simple, easy to learn, uses XML syntax, does not change the information set of an XML document, supports XML namespaces, treats attributes uniformly with elements so far as possible, has unrestricted support for unordered content, has unrestricted support for mixed content, has a solid theoretical basis, and can partner with a separate datatyping language (such W3C XML Schema Datatypes)." RELAX NG is now an official specification of the OASIS RELAX NG Technical Committee and will probably progress to become an ISO/IEC International Standard as part of DSDL.
Schematron (http://www.ascc.net/xml/resource/schematron/schematron.html), which was first proposed in September 1999 by Rick Jelliffe of the Academia Sinica Computing Centre, is an unusual schema language. It defines validation rules using XPath expressions. Schematron is also described in the ISO DSDL project.
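The rule-based idea can be illustrated with a minimal sketch. This is not Schematron syntax and not a Schematron implementation: it only mimics the notion of a rule as a context path plus a test path, using the limited XPath subset in Python's standard xml.etree.ElementTree module. The sample document, the rule list, and the validate function are all hypothetical names introduced for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (an assumption, not taken from the text).
DOC = """
<library>
  <book id="b1"><title>First</title><author>A. Writer</author></book>
  <book id="b2"><title>Second</title></book>
  <book id="b3"/>
</library>
"""

# Each rule is (context path, test path, message). The test path is
# evaluated relative to every node selected by the context path, and
# the rule fails for a node when the test path selects nothing.
RULES = [
    (".//book", "title", "a book must have a title"),
    (".//book", "author", "a book must have an author"),
]


def validate(root, rules):
    """Return a list of (book id, message) pairs for every failed rule."""
    errors = []
    for context, test, message in rules:
        for node in root.findall(context):
            if node.find(test) is None:
                errors.append((node.get("id"), message))
    return errors


root = ET.fromstring(DOC)
errors = validate(root, RULES)
for book_id, message in errors:
    print(f"{book_id}: {message}")
```

Unlike a grammar-based schema, nothing here describes the overall structure of the document: each rule checks one constraint independently, and a document is considered valid when no rule reports a failure.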
Starting from the observations that instance documents are usually much easier to understand than the schemas that describe them, and that schema languages often need to give examples of instance documents to help human readers to understand their syntax, I proposed Examplotron (http://examplotron.org) in March 2001, to define "schemas by example" using sample instance documents as actual schemas.
Copyright © 2002 O'Reilly & Associates. All rights reserved.