22.1. The XPath Data Model
XPath views each XML document as a tree of
nodes. Each node has one of seven types:
- Root
-
Each document has exactly one root node,
which is the root of the tree. This node contains one comment node
child for each comment outside the document element, one
processing-instruction node child for each processing instruction
outside the document element, and exactly one element node child for
the document element. It does not contain any representation of the
XML declaration, the document type declaration, or any whitespace
that occurs before or after the document element. The root node has no
parent node. The root node's value is the value of
the document element.
- Element
-
An element node represents an
element. It has a name, a namespace URI, a parent node, and a list of
child nodes, which may include other element nodes, comment nodes,
processing-instruction nodes, and text nodes. An element node also
has a list of attributes and a list of in-scope namespaces, none of
which are considered to be children of the element. The value of an
element node is the complete, parsed text between the
element's start- and end-tags that remains after all
tags, comments, and processing instructions are removed and all
entity and character references are resolved.
- Attribute
-
An attribute node represents an
attribute. It has a name, a namespace URI, a value, and a parent
element. However, although elements are parents of attributes,
attributes are not children of their parent elements. The biological
metaphor breaks down here. xmlns and
xmlns:prefix attributes
are not represented as attribute nodes. An attribute
node's value is the normalized attribute value.
- Text
-
Each text node represents the maximum
possible contiguous run of text between tags, processing
instructions, and comments. A text node has a parent node but does
not have children. A text node's value is the text
of the node.
- Namespace
-
A namespace node represents a
namespace in scope on an element. In general, each namespace
declaration by an xmlns or
xmlns:prefix attribute
produces multiple namespace nodes in the document tree, one for each
element within the scope of the declaration. Like
attribute nodes, each namespace node has a parent element but is not
the child of that parent. The name of a namespace node is the prefix
(the empty string for the default namespace).
The value of a namespace node is the namespace URI.
- Processing instruction
-
A processing-instruction node represents a
processing instruction. It has a target, data, a parent node, and no
children. The name of a processing-instruction node is its target.
The value of a processing-instruction node is the data of the
processing instruction, not including any initial whitespace.
- Comment
-
A comment node represents a comment.
It has a parent node and no children. The value of a comment node is
the string content of the comment, not including the
<!-- and --> delimiters.
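The value rules in the list above can be sketched in Python. This is only an approximation: it relies on the standard library's xml.dom.minidom, which builds a DOM tree rather than a true XPath tree, so namespace nodes are not covered. The string_value helper and the SAMPLE document are hypothetical names invented for this illustration.

```python
from xml.dom import minidom, Node

# Hypothetical sample document: one comment and one processing
# instruction outside the document element, plus mixed content inside it.
SAMPLE = """<?xml version="1.0"?>
<!-- catalog -->
<?render mode="fast"?>
<book id="b1">Tom &amp; <b>Jerry</b><!-- aside --> forever&#33;</book>"""

CHARACTER_DATA = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def string_value(node):
    """Approximate the XPath value of a DOM node (sketch, not a full XPath engine)."""
    if node.nodeType == Node.DOCUMENT_NODE:
        # Root node: its value is the value of the document element.
        return string_value(node.documentElement)
    if node.nodeType == Node.ELEMENT_NODE:
        # Element: descendant text in document order; comments and
        # processing instructions contribute nothing.
        return "".join(
            string_value(child)
            for child in node.childNodes
            if child.nodeType == Node.ELEMENT_NODE
            or child.nodeType in CHARACTER_DATA
        )
    if node.nodeType in CHARACTER_DATA:
        return node.data
    if node.nodeType == Node.ATTRIBUTE_NODE:
        return node.value
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        # Data only, without the target or any initial whitespace.
        return node.data.lstrip(" \t\r\n")
    if node.nodeType == Node.COMMENT_NODE:
        return node.data
    raise TypeError("unsupported node type: %r" % node.nodeType)


doc = minidom.parseString(SAMPLE)
book = doc.documentElement

print(string_value(doc))                         # value of the root node
print(string_value(book))                        # value of the element node
print(string_value(book.getAttributeNode("id"))) # value of the attribute node
for child in doc.childNodes:                     # comment, PI, document element
    print(child.nodeName, repr(string_value(child)))
```

The first two lines printed are identical, which reflects the rule that the root node's value is the value of the document element. The comment inside the book element does not appear in either value.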
The XML declaration and the document type declaration are not
included in XPath's view of an XML document. All
entity references, character references, and CDATA sections are
resolved before the XPath tree is built. The references themselves
are not included as a separate part of the tree.
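As a rough illustration of how references and CDATA sections disappear into ordinary text, the following Python sketch merges adjacent character data into the maximal runs that XPath would treat as single text nodes. It assumes the standard library's xml.dom.minidom as a stand-in parser; minidom may keep a CDATA section as its own DOM node, so the hypothetical text_runs helper joins neighboring pieces itself.

```python
from xml.dom import minidom, Node


def text_runs(element):
    """Return the maximal contiguous runs of text among an element's children.

    Adjacent DOM Text and CDATASection nodes are joined, approximating
    the single text node that the XPath data model would contain.
    """
    runs, current = [], []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            current.append(child.data)
        elif current:
            # A tag, comment, or processing instruction ends the run.
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


# Hypothetical input mixing an entity reference, a CDATA section,
# and a character reference, followed by an empty element.
cdata_doc = minidom.parseString(
    "<p>a &lt; b <![CDATA[& c < d]]> &#38; e<br/>tail</p>"
)
runs = text_runs(cdata_doc.documentElement)
print(runs)
```

Everything before the br element collapses into one run, with no trace of which characters came from a reference or a CDATA section; the text after br forms a second run.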