Creating More Building Blocks Using Object-Oriented Features (XML Schema)

We have already seen many features that have been borrowed from object-oriented languages. In this chapter, we will see substitution groups (similar to subclasses), abstract elements and datatypes (similar to abstract classes), and final datatypes (similar to final classes).

12.1. Substitution Groups

In many cases, a vocabulary needs the ability to accept a variety of different content models. We have two options: we can try to do it using a single generic element name, or we can define a schema smart enough to deal with the possible content model. Since we cannot define multiple different content models for the same element (because of the Consistent Declaration Rule), we can either use xsi:type attributes in the instance documents, or we can define a content model wide enough to accommodate all the possibilities. Such a model would likely be wide enough to also accept combinations that we do not want.

The easiest solution for accommodating different types with W3C XML Schema is to use a different element name for each case. We already saw that the xs:choice compositor allows us to build such constructs, where a node in an instance document can accept an element chosen from a list. However, this list is fixed in the complex type definition, and we have also seen that it cannot be extended, since the rules for complex type derivation by extension do not allow it. Substitution groups offer a flexible way to create xs:choice compositors out of single element definitions or references, as well as a way to extend them. More simply, they are lists of elements that can be used in place of each other within documents. One important thing to note before we start, though, is that substitution groups apply only to global elements.

Substitution groups can be seen as extensible element groups. Before introducing them, let's look again at the "traditional" element groups to highlight the differences between these two concepts. Since the Recommendation is especially fuzzy on the extensibility of element groups and the restriction of substitution groups, I have chosen to present a conservative interpretation, which should be free of interoperability issues. I will discuss the different interpretations at the end of the chapter.

12.1.1. Using a "Traditional" Group

Let's come back to the definition of a name. (After all, universal names are one of the most controversial subjects in standardization circles, so it's no surprise that we can use them as examples!) Instead of playing with datatypes, we may just use different element names, and say that a name is either a simple name, such as:

<simple-name>
  Snoopy
</simple-name>

or a full name, such as:

<full-name>
  <last>
    Schulz
  </last>
  <first>
    Charles
  </first>
  <middle>
    M
  </middle>
</full-name>

We have already seen how we can define a flexible schema that will match these documents. A good idea is to create a group with an xs:choice compositor that allows one of those two elements and can be reused in all the elements in which a name needs to be included. The logical steps are to define the two elements (full-name and simple-name), to create a group, and to use it in the definition of the author and character elements:

<xs:element name="full-name">
  <xs:complexType>
    <xs:all>
      <xs:element name="first" type="string32" minOccurs="0"/>
      <xs:element name="middle" type="string32" minOccurs="0"/>
      <xs:element name="last" type="string32"/>
    </xs:all>
  </xs:complexType>
</xs:element>
          
<xs:element name="simple-name" type="string32"/>
          
<xs:group name="name">
  <xs:choice>
    <xs:element ref="simple-name"/>
    <xs:element ref="full-name"/>
  </xs:choice>
</xs:group>
          
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
          
<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>

Note that we are able to use xs:all in this case because the elements involved are isolated in the full-name element. This is also a good time to mention that xs:all doesn't mean the order is not significant, but only that every ordering is valid. In this case, writing the following:

<full-name>
  <first>
    Eric
  </first>
  <last>
    van der Vlist
  </last>
</full-name>

or:

<full-name>
  <last>
    van der Vlist
  </last>
  <first>
    Eric
  </first>
</full-name>

may express whether I prefer to be called "Eric van der Vlist" or "van der Vlist Eric." Applications that only want access to the individual components of this full-name can ignore the order, but those that need to reassemble the full name must respect the document order.
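To tie the group-based declarations of this section back to instance documents, here is a sketch of two fragments they should accept, one for each branch of the name group. The contents of born, dead, and qualification and the id values are assumptions made for illustration, since the declarations of those components are not shown in this section.

```xml
<!-- An author named through the full-name branch of the group -->
<author id="CMS">
  <full-name>
    <first>Charles</first>
    <middle>M</middle>
    <last>Schulz</last>
  </full-name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
</author>

<!-- A character named through the simple-name branch -->
<character id="Snoopy">
  <simple-name>Snoopy</simple-name>
  <born>1950-10-04</born>
  <qualification>extroverted beagle</qualification>
</character>
```

Because the group is an xs:choice, each fragment carries exactly one of the two name elements. Supplying both, or neither, should be rejected.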

12.1.2. Substitution Groups

12.1.2.1. Using substitution groups

Let's see how we can define the same content model using substitution groups. The first thing to do is to define an element that both full-name and simple-name can be derived from. In this case, we have a simple type on one hand and a complex type with complex content on the other, and we cannot find a type that can be extended to both. We have no other choice but to start with the universal type, which accepts any content model. Known as xs:anyType, this very special type is also the default type when no type is specified, so we can define a generic name element without giving any type definition to keep it as open as possible:

<xs:element name="name"/>

This element will be what is known as the head of the substitution group. Without declaring anything on this head element, other elements can declare that they can be used wherever the head element is referenced in the schema. These elements are known as the members of the substitution group. The one restriction on the members is that their types must be valid derivations of the type of the head element. This declaration is made through a substitutionGroup attribute that references the head element in each interchangeable element, for instance:

<xs:element name="simple-name" type="string32"
  substitutionGroup="name"/>
             
<xs:element name="full-name" substitutionGroup="name">
  <xs:complexType>
    <xs:all>
      <xs:element name="first" type="string32" minOccurs="0"/>
      <xs:element name="middle" type="string32" minOccurs="0"/>
      <xs:element name="last" type="string32"/>
    </xs:all>
  </xs:complexType>
</xs:element>

The effect of these declarations is that these two elements can be used wherever the head is used in the schema, such as in the definition of the character and author elements:

<xs:element name="character">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="qualification"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
             
<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
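The constraint on member types is invisible in the example above, because every type derives from xs:anyType. The following sketch makes it visible with a typed head. The element names code, isbn, serial, and shelf are hypothetical and are not part of the chapter's schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Head of the group, this time with an explicit type -->
  <xs:element name="code" type="xs:string"/>

  <!-- Accepted as a member: its type is a restriction of xs:string -->
  <xs:element name="isbn" substitutionGroup="code">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{9}[0-9X]"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- Would be rejected as a member: xs:integer is not derived from xs:string
  <xs:element name="serial" type="xs:integer" substitutionGroup="code"/>
  -->

  <!-- Any reference to the head also accepts isbn -->
  <xs:element name="shelf">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="code"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With an untyped head such as name, this check always succeeds, which is why simple-name and full-name can share a head despite having unrelated content models.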

12.1.2.2. Abstract elements

If we keep our schema as it stands, the head itself may be used in instance documents, and since our head element allows any content, this is probably not what we want. We need a mechanism similar to the abstract types we saw when we encountered the same kind of problem with xsi:type in Chapter 7, "Creating Complex Datatypes". We declare the head element as abstract by adding the abstract attribute to its definition, which then becomes:

<xs:element name="name" abstract="true"/>
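As an illustration, the following instance fragment uses the head element directly. It should be accepted while name is a concrete element of type xs:anyType, and rejected once the head is declared abstract. The id value and the content of born are assumptions made for the example.

```xml
<!-- Uses the head itself: valid with a concrete head, invalid with abstract="true" -->
<author id="CMS">
  <name>Charles M. Schulz</name>
  <born>1922-11-26</born>
</author>
```

Once the head is abstract, the author must be named through one of the members of the group, simple-name or full-name.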

12.1.2.3. Trees of substitution groups

What if our French offices define a composed-name element that is similar to the full name without its middle subelement? We may just add this element directly to our substitution group, but defining it as having the name element as its head will not clearly show the similarities between this new element and the full-name element. Furthermore, some applications might need to specify that they accept either full-name or composed-name. The solution is to use full-name as the head of a new substitution group. To do this, we need to define the type of the full-name element as global to show the explicit derivation between the two elements:

<xs:complexType name="full-name-type">
  <xs:all>
    <xs:element name="first" type="string32" minOccurs="0"/>
    <xs:element name="middle" type="string32" minOccurs="0"/>
    <xs:element name="last" type="string32"/>
  </xs:all>
</xs:complexType>
              
<xs:element name="full-name" substitutionGroup="name"
  type="full-name-type"/>
             
<xs:element name="composed-name" substitutionGroup="full-name">
  <xs:complexType>
    <xs:complexContent>
      <xs:restriction base="full-name-type">
        <xs:all>
          <xs:element name="first" type="string32" minOccurs="0"/>
          <xs:element name="last" type="string32"/>
        </xs:all>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

We have now defined not only two substitution groups (with name and full-name as heads), but also a tree of substitution groups, since the allowed substitutions for name include not only full-name and simple-name, but also composed-name!
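The benefit of the intermediate head can be sketched with a content model that references full-name rather than name. The signatory element below is hypothetical and is introduced only for this illustration.

```xml
<!-- Hypothetical element that wants a structured name only -->
<xs:element name="signatory">
  <xs:complexType>
    <xs:sequence>
      <!-- Accepts full-name and composed-name, the members of the inner group -->
      <xs:element ref="full-name"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Under these assumptions, a signatory may contain full-name or composed-name, but not simple-name, which belongs only to the group whose head is name.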

12.1.3. Traditional Declarations or Substitution Groups?

If we look back at the two solutions that we used to solve the same issue, we see that substitution groups are more extensible than a traditional group that uses an xs:choice compositor. While the element group can only be derived using an xs:redefine inclusion, the substitution group can be extended with new possible elements simply by defining them. These elements can be defined in any namespace; the only constraint is that their types must be the same as, or valid derivations of, the head element's type. (This restriction is justified to ensure that applications are not too surprised by an unexpected content model.)
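As a minimal sketch of such an extension, the schema below adds a member to the name group from a separate namespace. The file name library.xsd, the namespace URI, and the pen-name element are assumptions made for illustration. The sketch also assumes that the original schema has no target namespace, as the unprefixed references in this chapter suggest.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/pseudonyms">

  <!-- Import the original, no-namespace schema (file name assumed) -->
  <xs:import schemaLocation="library.xsd"/>

  <!-- New member of the group whose head is name; with no default
       namespace declared, the unprefixed value refers to the
       no-namespace head element -->
  <xs:element name="pen-name" type="xs:token" substitutionGroup="name"/>

</xs:schema>
```

Neither the author nor the character declaration needs to change. An instance document would use the new element with its own namespace, for example by binding a prefix to http://example.org/pseudonyms.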

There is another difference to note, though. We have seen that the derivation of a content model using an xs:choice compositor cannot extend the scope of the choice and add new alternatives. The situation for substitution groups is almost the opposite. Although the Recommendation says that substitution groups should be validated as choices, it doesn't define the order of the elements in the equivalent choice, and the validity of a restriction depends on that order. I therefore do not advise restricting substitution groups in practice, since it may lead to interoperability issues between schema processors.

We then have a paradoxical situation where one of the mechanisms (xs:choice) can only be restricted while the other (substitution groups) can only be extended, even though the Recommendation states that these two mechanisms are equivalent as far as validation is concerned. This characteristic needs to be taken into account when choosing between them.

The differences between these two features are summarized in Table 12-1. "Not advised" stands for "may work with some schema processors but relies on a liberal interpretation of the Recommendation, which may lead to interoperability issues."

Table 12-1. Element versus substitution groups

Feature	Element groups with an `xs:choice` compositor.	Substitution groups.
Definition	Centralized, using `xs:group` and `xs:choice`.	Spread over global element definitions, using the `substitutionGroup` attribute.
Constraints on the choices	No constraint: the elements can be totally different.	The type of the elements needs to be an explicit derivation of the type of the head.
Allows global elements	Yes.	Yes.
Allows local elements	Yes.	No.
Restriction to remove choices	Yes, through `xs:redefine`.	Not advised.
Extension to add choices	Not advised.	Yes, by adding new elements with the same head element.
Extension to add new elements in sequence	Yes, through `xs:redefine`.	No.

12.1.4. Fuzzy Recommendation

Both the extension of xs:choice during element group redefinitions and the restriction of substitution groups are very fuzzy in the Recommendation and require some explanation.

12.1.4.1. Extension of `xs:choice` through group redefinitions

Let's return to our group, which is defined as:

<xs:group name="name">
  <xs:choice>
    <xs:element ref="simple-name"/>
    <xs:element ref="full-name"/>
  </xs:choice>
</xs:group>

There doesn't seem to be anything in the Recommendation that explicitly forbids redefining this group to add another element to the choice by writing:

<xs:redefine schemaLocation="foo.xsd">
  <xs:group name="name">
    <xs:choice>
      <xs:group ref="name"/>
      <xs:element ref="bar"/>
    </xs:choice>
  </xs:group>
</xs:redefine>

However, the effect of this redefinition is to allow a new element (bar) to be accepted as an alternative to simple-name and full-name. Although this would be a nice feature, the principles of redefinition by restriction (i.e., when the content of the group is restricted during a redefinition) are the same as the principles of complex type derivation by restriction. The intention of the Working Group seems to be to model redefinition by extension on complex type derivation by extension, which explicitly forbids the addition of new particles in an xs:choice.

Although some schema processors do support this feature and some specialists consider it fine, I do not advise using it, since it seems to violate the intent (if not the wording) of the Recommendation.

12.1.4.2. Restricting substitution groups

The restriction of the substitution groups is quite the opposite. The intent of the Working Group seems to be to allow such restrictions while the wording of the Recommendation makes its result undefined.

The Recommendation clearly specifies that during the check to determine if a particle is a valid restriction of another particle, substitution groups should be treated as xs:choice, which is a clear indication that substitution groups could be restricted through complex type derivations by restriction. To illustrate this, let's take the definition of the complex type of the element author, using the substitution group whose head is name, as defined previously:

<xs:complexType name="authorType">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>

If substitution groups are treated like xs:choice, and assuming that our head isn't defined as abstract, this definition is equivalent to:

<xs:complexType name="authorType">
  <xs:sequence>
    <xs:choice>
      <xs:element ref="name"/>
      <xs:element ref="simple-name"/>
      <xs:element ref="full-name"/>
    </xs:choice>
    <xs:element ref="born"/>
    <xs:element ref="dead" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute ref="id"/>
</xs:complexType>

It should be possible to derive this complex type by restriction by writing, for instance:

<xs:complexType name="restrictedAuthorType">
  <xs:complexContent>
    <xs:restriction base="authorType">
      <xs:sequence>
        <xs:choice>
          <xs:element ref="simple-name"/>
          <xs:element ref="full-name"/>
        </xs:choice>
        <xs:element ref="born"/>
        <xs:element ref="dead" minOccurs="0"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

However, the Recommendation also states that during the derivation by restriction of an xs:choice compositor, "there is a complete order-preserving functional mapping" between the particles used to define the derived and original xs:choice. It does not, though, define the order of the particles when substitution groups are mapped into xs:choice. Depending on the order chosen by the schema validator to build the xs:choice out of the substitution group, our derivation can thus be either valid or invalid!
