Contents:
Motivations for XML
XML Syntax
The Document Type Definition (DTD)
The XML Parser
Example: Generating an XML Invoice from Oracle
PLSXML Utilities and Demos
XML and iFS
Extensible Markup Language (XML) is an emerging standard closely related to Standard Generalized Markup Language (SGML), the granddaddy of all markup languages, an ISO standard that the U.S. government adopted for creating complex documents. Realizing that SGML was simply too complicated for his purposes, Tim Berners-Lee (the inventor of the Web) used SGML to define a much simpler language, HTML, and the rest is history.
Now that the Web has matured, however, developers are starting to miss some of SGML's capabilities. XML is an attempt to find a middle ground between the complexities of SGML and the ease of use of HTML. Like HTML, XML employs a tag-based syntax to mark up ASCII text. Unlike HTML, which controls the appearance of a document, XML describes the meaning and structure of a document by defining a syntax and grammar for creating new tags. XML is extensible because it lets you define your own tag vocabulary (as long as it follows the rules of the XML specification) for creating meaningful documents.
Although it's currently being touted as "HTML done right," XML is actually a lot more. It has a number of potential uses as a tool for integrating disparate systems and building electronic commerce systems. The XML specification provides an open framework for exchanging complex, structured documents (such as purchase orders, invoices, insurance claims, etc.) among different computer systems. In one fell swoop, XML eliminates network dependencies such as TCP/IP or IPX, protocol dependencies such as SQL*Net or ODBC, hardware dependencies such as Intel or Alpha, operating system dependencies such as Windows NT or Unix, and even database dependencies such as Oracle or SQL Server. In fact, the implications of XML are so profound that it even threatens the Fort Knox of the database world -- delimited flat files!
While you might expect XML to be enormously complicated, it's really just a formal implementation of a wonderfully simple idea: that the structure and meaning of a document's contents should be indicated inside, not outside, the text of the document itself. An example can help make this idea clear. Suppose you receive the following comma-delimited file:
876514234,05/21/1999,Megaplex Industries
PN-5324,Super Duper Widget,5,19.99
PN-6354,Not So Super Duper Widget,2,9.99
119.93
While it's clear that this file contains some sort of structured information, we have no way to tell exactly what it might be; about all we know for certain is that the first line might contain a date. This is the problem with delimited files. Until you have the file's columnar layout, its secret decoder ring, you can't do anything meaningful with it.
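To make the point concrete, here is a minimal sketch in Python (our choice for illustration; it is not a language this chapter otherwise uses). Splitting the file into fields is trivial, but attaching meaning to those fields requires a layout supplied from outside the file. The column names in HEADER_LAYOUT and ITEM_LAYOUT are assumptions on our part, which is exactly the problem: nothing in the file confirms them.

```python
import csv
import io

# The delimited file from the text, one record per line.
RAW = """876514234,05/21/1999,Megaplex Industries
PN-5324,Super Duper Widget,5,19.99
PN-6354,Not So Super Duper Widget,2,9.99
119.93
"""

# Hypothetical "decoder ring": these names are guesses made by the reader,
# not information carried by the file itself.
HEADER_LAYOUT = ["invoice_number", "date", "customer"]
ITEM_LAYOUT = ["item_num", "item_name", "quantity", "price"]


def parse_rows(text):
    """Split the text into rows of raw strings; csv can do no more on its own."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def apply_layout(rows):
    """Attach meaning to the rows using the externally supplied layout."""
    header = dict(zip(HEADER_LAYOUT, rows[0]))
    items = [dict(zip(ITEM_LAYOUT, row)) for row in rows[1:-1]]
    total = rows[-1][0]
    return {"header": header, "items": items, "total": total}


rows = parse_rows(RAW)
invoice = apply_layout(rows)

print(rows[1])                           # anonymous strings, no meaning attached
print(invoice["items"][0]["item_name"])  # meaningful only because we guessed the layout
```

If the sender reorders two columns, parse_rows still succeeds and apply_layout silently mislabels the data, since the file gives the program nothing to check the layout against.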
Now suppose you receive the same information in XML format:
<?xml version="1.0"?>
<!DOCTYPE INVOICE SYSTEM "invoice.dtd">
<INVOICE>
  <INVOICE_NUMBER>876514234</INVOICE_NUMBER>
  <DATE>05/21/1999</DATE>
  <CUSTOMER>Megaplex Industries</CUSTOMER>
  <INVOICE_ITEMS>
    <ITEM>
      <ITEM_NAME ITEM_NUM="PN-5324">Super Duper Widget</ITEM_NAME>
      <QUANTITY>5</QUANTITY>
      <PRICE>19.99</PRICE>
    </ITEM>
    <ITEM>
      <ITEM_NAME ITEM_NUM="PN-6354">Not So Super Duper Widget</ITEM_NAME>
      <QUANTITY>2</QUANTITY>
      <PRICE>9.99</PRICE>
    </ITEM>
  </INVOICE_ITEMS>
  <TOTAL>119.93</TOTAL>
</INVOICE>
The XML version leaves no doubt about the file's purpose or structure: it's an invoice consisting of two items. Knowing this, we can deduce the structure of the original file. The first line contains basic information: the invoice number, the invoice date, and the customer to whom it is being sent. The next two lines are invoice items, each consisting of a part number, a name, an order quantity, and a unit cost. The last line is the invoice total.
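Because the tags name each value, a program can pull out what it needs without any outside layout. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree module (our choice for illustration, not a tool this chapter relies on). The summarize function and its result keys are hypothetical names. Note that ElementTree is a non-validating parser: it reads the DOCTYPE line but does not fetch invoice.dtd or check the document against it.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The invoice document; the XML declaration must be the very first characters.
INVOICE_XML = """<?xml version="1.0"?>
<!DOCTYPE INVOICE SYSTEM "invoice.dtd">
<INVOICE>
  <INVOICE_NUMBER>876514234</INVOICE_NUMBER>
  <DATE>05/21/1999</DATE>
  <CUSTOMER>Megaplex Industries</CUSTOMER>
  <INVOICE_ITEMS>
    <ITEM>
      <ITEM_NAME ITEM_NUM="PN-5324">Super Duper Widget</ITEM_NAME>
      <QUANTITY>5</QUANTITY>
      <PRICE>19.99</PRICE>
    </ITEM>
    <ITEM>
      <ITEM_NAME ITEM_NUM="PN-6354">Not So Super Duper Widget</ITEM_NAME>
      <QUANTITY>2</QUANTITY>
      <PRICE>9.99</PRICE>
    </ITEM>
  </INVOICE_ITEMS>
  <TOTAL>119.93</TOTAL>
</INVOICE>"""


def summarize(xml_text):
    """Look up each value by tag name and recompute the invoice total."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.find("INVOICE_ITEMS").findall("ITEM"):
        name = item.find("ITEM_NAME")
        items.append({
            "item_num": name.get("ITEM_NUM"),   # attribute on ITEM_NAME
            "name": name.text,
            "quantity": int(item.findtext("QUANTITY")),
            "price": Decimal(item.findtext("PRICE")),
        })
    # Decimal avoids binary floating-point rounding on currency values.
    computed = sum(i["quantity"] * i["price"] for i in items)
    return {
        "customer": root.findtext("CUSTOMER"),
        "items": items,
        "stated_total": Decimal(root.findtext("TOTAL")),
        "computed_total": computed,
    }


result = summarize(INVOICE_XML)
print(result["customer"])
print(result["computed_total"] == result["stated_total"])
```

The lookups depend only on tag names, so the sender could reorder the elements inside an ITEM without breaking the program.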
The difference between the first file and the second is that the XML file contains a decoder ring within its own text, making the meaning of each element in the document explicitly clear. While XML certainly doesn't eliminate the need for comma-delimited files (for example, they will always be useful for loading data in bulk), the previous example shows how it could be used in an electronic commerce setting to exchange invoice data. XML, combined with encryption and digital signature technologies,[1] offers a reasonably straightforward way for businesses to exchange information simply and securely.
[1] A digital signature is an encrypted checksum (the output of a hash function) computed over a document; it guarantees the document's integrity and authenticity. Integrity means that no one has tampered with the file, and authenticity means the file is actually from the person who claims to have sent it. Phil Zimmermann's Pretty Good Privacy (PGP) is a widely available and popular encryption system that can produce a digital signature.
NOTE: Of course, to take advantage of XML's full potential, everyone must adopt a standard set of domain-specific tags and nomenclature. Although this is probably a greater challenge than XML's technical aspects (since it requires people to agree on something!), several industries' experiences with SGML give some hope, at least, that this can happen. Companies in the semiconductor industry (Intel, Hitachi, Texas Instruments, etc.) have adopted an SGML standard for exchanging chip data.
This chapter will help you get your feet wet with XML by showing you how to generate XML documents using WebDB or OAS. We'll start with a brief discussion of the motivations behind XML, then move on to the major skills you'll need to generate XML from the Oracle database: creating syntactically correct XML documents and formally defining rules that they must follow. From there, we'll cover how a program called an XML parser is used to check the structure of the document and, if it's valid, break it into a hierarchical structure called a document tree. After that, we'll write a PL/SQL program to generate the invoice we looked at earlier. Finally, we'll examine the future directions of XML and how it relates to Oracle8i's Internet File System.
You probably noticed that the invoice example looks remarkably similar to a standard HTML document, except that there are a lot of new tags. These similarities are intentional. The XML specification was created in response to the evolution (some would say devolution) of HTML.
HTML started as a simple way to define the structure of a document. The <head> and <body> tags separate descriptive information from the main text. The header tags (<h1>, <h2>, etc.) break the text into logical sections, much like the A and B headings in an outline. The emphasis tag <em> denotes particularly important information.
As the Web has evolved, however, the original intent of these tags has been lost. They are now used to control a document's appearance, rather than its structure. Browser vendors have exacerbated this trend by adding new tags explicitly for formatting. Some of these tags have been good (<table>), some not so good (<blink>), but the net effect is that HTML no longer has much to say about the purpose of the information it presents.
While this trend is not particularly important for many applications, such as creating attractive user interfaces for our PL/SQL systems, there are several reasons why it has been a change for the worse:
HTML is no longer simple.
HTML designers place more emphasis on a document's appearance than on its content.
HTML documents are very difficult for computers to understand.
The last of these problems is probably one of the most important motivations for XML. As the Web becomes increasingly automated, it has become more and more important that software "robots" understand and interpret a variety of documents. If we're ever going to make a search engine smart enough so that the query "Where can I buy a leather attaché case?" doesn't turn up links to an S&M site, we must create online catalogs a computer can easily parse and understand. HTML is simply not designed to provide this type of information. XML is.
Copyright (c) 2000 O'Reilly & Associates. All rights reserved.