The Internet (or an intranet) is a network that links different computers together. Before we can start writing web applications, we must understand how the output from these systems actually gets from the server to the browser, which means that we have to learn a little about how the Internet and the Web work.
OAS and WebDB use standard Internet conventions and protocols to send resources to a client. The most important parts of this interchange are:
A TCP/IP network to connect the server to the client
A software communication port to serve as a collection point for incoming requests
A transfer protocol called HTTP to govern how server and client communicate
A client program called a web browser to allow users to request and receive resources from the OAS or WebDB server
A uniform resource locator (URL) to allow the browser to find a particular resource
A MIME type to tell the browser what to do with resources once received from the OAS or WebDB server
The following sections briefly describe each of these parts.
Browsers connect to an OAS or WebDB server using the TCP/IP networking protocol. Although there are a number of different types of networking protocols, such as DECNet or IPX, web systems only work with TCP/IP. Fortunately, more and more operating systems have this functionality built in, including Unix, Windows 95, Windows 98, Windows NT, OS/2, and Linux.
Every machine on a TCP/IP network is identified by a unique four-part IP address. Each number in the address can range from 0 to 255, and the four numbers are separated by periods. For example, 253.4.99.17 might be the address for the machine running the human resources department's web server.
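As a rough illustration of the four-part format, the following Python sketch uses the standard `ipaddress` module to check that a string has four numbers between 0 and 255. The function names `is_valid_ipv4` and `octets` are illustrative choices and have nothing to do with OAS or WebDB.

```python
import ipaddress

def is_valid_ipv4(text):
    """Return True if text is a well-formed four-part IP address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        # Raised for a wrong number of parts or a part outside 0-255.
        return False

def octets(text):
    """Return the four numbers of the address as a list of integers."""
    return list(ipaddress.IPv4Address(text).packed)

if __name__ == "__main__":
    print(is_valid_ipv4("253.4.99.17"))
    print(octets("253.4.99.17"))
```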
Most TCP/IP networks have a special class of servers called Domain Name Servers (DNSs). Their job is to map meaningful, easy-to-remember hostnames to IP addresses. For example, assigning the name "HR" to the address 253.4.99.17 in the DNS allows users to refer to the human resources server as "HR," rather than by its actual IP address.
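A client program typically performs this lookup through the operating system's resolver. The Python sketch below shows one way to do it with the standard `socket` module; the function name `resolve` is an assumption made for illustration, and a name such as "HR" would resolve only on a network whose DNS actually defines it, so the demonstration uses `localhost` instead.

```python
import socket

def resolve(hostname):
    """Ask the system resolver (DNS, hosts file, and so on) for an IP address."""
    return socket.gethostbyname(hostname)

if __name__ == "__main__":
    try:
        print(resolve("localhost"))
    except socket.gaierror as err:
        # The name is not known to the resolver on this machine.
        print("lookup failed:", err)
```

If the string passed in is already a dotted IP address, `gethostbyname` returns it unchanged.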
A software port (as opposed to a physical hardware port) is a common reference point on the server that is used to exchange messages. Each TCP/IP-based networking application, like OAS or WebDB, is assigned a specific port that it monitors for incoming requests. Client programs that need to communicate with the server connect to the server's assigned port. Once connected, the two systems exchange information according to a standard protocol (HTTP, FTP, etc.). Each port is identified by a port number, its ordinal position in the range of all ports. On Unix systems, for example, there are 65,536 different ports (numbered 0 through 65,535).
TIP: As a security precaution on Unix systems, a user with root privilege must start programs that use the first 1024 ports (0 through 1023). Less privileged users can use ports numbered 1024 and higher.
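The idea of a server monitoring a port while a client connects to it can be sketched in a few lines of Python. This is a minimal illustration using a throwaway echo server on the local machine, not a description of how OAS or WebDB listen for requests; the function names `echo_once` and `round_trip` are hypothetical.

```python
import socket
import threading

def echo_once(listener):
    """Accept one client on the listening port and echo its message back."""
    conn, _addr = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024))

def round_trip(message):
    """Start a one-shot server on a free port, connect to it, return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the operating system to pick any free unprivileged port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]

        worker = threading.Thread(target=echo_once, args=(listener,))
        worker.start()

        # The client must name both the machine and the port it wants.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply

if __name__ == "__main__":
    print(round_trip(b"hello"))
```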
A transfer protocol is a convention that governs how systems exchange information. Take, for example, a phone conversation. When you call someone, you (hopefully!) don't start blurting out whatever comes to mind as soon as they pick up the receiver. Instead, your conversation follows a set pattern that civilized society has agreed upon to make communication more efficient:
I initiate a conversation by calling you.
You say "Hello."
I identify myself.
We exchange a message.
We say "Goodbye."
We hang up.
This sort of formalized exchange is the idea behind a protocol: it lets the sender and receiver know the order in which communications will occur. While computers use much more formalized systems than humans, the idea is basically the same.[1] OAS and WebDB follow a standard Internet protocol called HyperText Transfer Protocol (HTTP) to communicate with client web browsers. OAS supports HTTP 1.0 and HTTP 1.1, while WebDB supports only HTTP 1.0.
[1] Sometimes it's almost identical; SMTP communications begin when the client says "HELO" to the server!
By convention, several special TCP ports are associated with specific protocols. For example, port 21 is usually used for FTP, port 25 is used for SMTP (a common email protocol), and port 80 is used for HTTP.
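To make the HTTP "conversation" more concrete, the Python sketch below builds the text a browser might send for a simple HTTP 1.0 request and picks apart the first line of a server's reply. It is a simplified illustration rather than a complete HTTP implementation; the function names, the `Host` header (optional in HTTP 1.0), and the sample server name `hr` are assumptions.

```python
def build_request(path, host):
    """Return the bytes of a minimal HTTP 1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, resource, version
        f"Host: {host}",         # a header naming the server
        "",                      # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_status_line(response):
    """Split the first line of a response into (version, code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

if __name__ == "__main__":
    print(build_request("/index.html", "hr"))
    sample = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
    print(parse_status_line(sample))
```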
Protocols vary in complexity. Unlike client/server protocols, such as SQL*Net or Net8, HTTP is relatively simple because it is stateless, meaning that the client and server terminate their connection once their conversation is complete. Unlike client/server systems, which maintain state by keeping open a continuous connection to the database, HTTP systems are connected only in bursts and not for the duration of the session.
Because the client and server forget everything that happened during previous connections, developers must take explicit steps to maintain information, or state, from page to page. In other words, there are no global variables in a web application; they are all local. Anything you want to retain from screen to screen has to be stored and retrieved in every page. For example, if you're building a web-based threaded discussion list that begins with a login screen, you must manually program it to remember the login information. We'll discuss strategies for doing this in Chapter 7, The PL/SQL Toolkit, and Chapter 8, Developing Applications.
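One common way to carry state from page to page is to pass it along in every link. The Python sketch below illustrates the idea in general terms; it is not the approach from the later chapters, and the page name `show_thread` and the parameter names are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

def next_page_link(page, state):
    """Build a link that hands the current state to the next page."""
    return f"{page}?{urlencode(state)}"

def read_state(query_string):
    """Recover the state on the next page from the incoming query string."""
    return {name: values[0] for name, values in parse_qs(query_string).items()}

if __name__ == "__main__":
    link = next_page_link("show_thread", {"user": "scott", "thread": "42"})
    print(link)
    # Each new request starts with no memory, so the page rebuilds its state.
    print(read_state(link.split("?", 1)[1]))
```

Because nothing is remembered between requests, every page that needs the login name must both read it from the request and write it into the links it generates.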
Users request information from a WebDB or OAS server using a web browser such as Microsoft Internet Explorer or Netscape Communicator. The browser is responsible for presenting web content on these servers to the user. In the early days of the Web, a browser could handle only basic HTML and text documents, but the explosion of web content has turned the browser into an information kiosk, multimedia center, and minicomputer all rolled into one. For example, most modern browsers can display an HTML document filled with pictures, sounds, and even movies. With the advent of Java, the browser has become a virtual machine, a computer within a computer capable of running Java programs.
There are a number of browsers on the market, and each one behaves slightly differently. For example, the appearance of any given HTML document often varies from browser to browser. To differentiate their product from the competition, browser vendors add features that work only with their browser. You should test your content on a number of different browsers, even if your company has adopted a standard, since many users refuse to give up browsers to which they are fanatically attached. Additionally, more and more people are dialing in from home, and they will often have older (or, depending on your company, newer) software than your company standard.
Uniform Resource Locators (URLs) are used to request a resource from an OAS or WebDB server independently of the operating system used on the machine. A URL abstracts the machine name, resource path, and resource name into a string with the following syntax:
protocol://server:port/path/resource?query_string
protocol: Specifies the network protocol that the browser and the server use to communicate. The most common values are HTTP and FTP.
server: Identifies the name of the machine that hosts the resource. Although you can use the machine's IP address, it's better to use the name defined in the DNS since it helps isolate the URL from network reconfiguration.
port: Specifies the TCP port used by the OAS or WebDB server. If the port is omitted from an HTTP URL, then port 80 is used by default.
path: Specifies the virtual directory or schema containing the resource. The path usually maps to either a virtual directory mapping on the web server or, in Oracle web servers, to a Database Access Descriptor (DAD), a logical name used to map a procedure call in a URL to the database schema in which it resides.
resource: Typically specifies the actual name of the file to return. If the name is omitted, the listener returns a default file, if one is available. The name of the default file varies: index.html is used on many Unix systems, and default.htm is usually used on Windows NT systems. On Oracle web servers, the resource can also correspond to a PL/SQL procedure.
query_string: Optionally passes parameters to dynamic resources. The string begins with a question mark (?) and is followed by ampersand (&)-delimited sets of name/value pairs. Each name/value pair consists of a parameter name followed by an equals sign (=) and a value for the parameter. The parameter name must match a name in the procedure's formal parameter list. The parameter value must be encoded by converting its nonalphanumeric characters to their hexadecimal equivalents; converted characters are preceded with a percent sign (%). The exceptions to this rule are the underscore character, which is left alone, and the space, which can be converted to either a plus (+) sign or %20. For example, "w/in second(s)" converts to "w%2Fin+second%28s%29".
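The parts of a URL and the encoding of parameter values can be explored with Python's standard `urllib.parse` module. The sketch below is only an illustration: the server name `hr`, the path `hr_dad`, the procedure `show_emp`, and its parameters are made up, and the fallback to port 80 assumes an HTTP URL.

```python
from urllib.parse import urlsplit, parse_qs, quote_plus

def describe(url):
    """Break a URL into the parts discussed above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        "port": parts.port or 80,         # assumes HTTP when no port is given
        "path_and_resource": parts.path,
        "parameters": parse_qs(parts.query),  # decodes + and %xx sequences
    }

if __name__ == "__main__":
    # Encode a parameter value before placing it in the query string.
    value = quote_plus("w/in second(s)")
    print(value)
    print(describe(f"http://hr:8080/hr_dad/show_emp?note={value}&dept=10"))
```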
You can omit the server, port, and path sections from hyperlinks (links the user clicks to go to other locations) inside other documents, which allows you to create relative, rather than absolute, URLs. Relative URLs are like relative directories in a filesystem: they let you describe the location of one resource in relation to the current resource. Most resources don't stand on their own; they are part of a larger hierarchical site that usually begins with a "home" and branches out from there. There are practical as well as aesthetic reasons to define a site's structure using relative rather than absolute URLs. For example, if a site is moved to a new host, the server section on all the links in the site must be changed to the new hostname. This is very tedious work. If the site is defined using relative URLs, however, the relative structure of its pages is unaffected by the move.
You create a relative URL by omitting the server, port, and path section from the URLs for hyperlinks and for ACTION attributes in HTML forms. The omitted information is filled in with the server and path information for the current resource, just as the file path information in an operating system command can be assumed from the current directory. The "current directory" of a URL is called the base URL.
For example, if a page's URL is http://betty/somepage.html, links on that page to other resources on the site do not have to explicitly include "betty" in the URL. Instead, they can simply begin with the path and name of other resources. The server part, "betty," is implied by the base URL. You can even include new subdirectories off the base URL.
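Browsers resolve relative URLs against the base URL automatically, and the same rules can be tried out with `urljoin` from Python's standard library. In this sketch the base URL comes from the example above, while the linked resource names (`otherpage.html`, `reports/summary.html`, `/hr/index.html`) are invented for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "http://betty/somepage.html"

def absolute(link, base=BASE_URL):
    """Resolve a relative link the way a browser would for the given page."""
    return urljoin(base, link)

if __name__ == "__main__":
    for link in ("otherpage.html", "reports/summary.html", "/hr/index.html"):
        print(link, "->", absolute(link))
```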
Every resource is associated with a MIME type that tells the browser what to do with the resource once the transfer is complete (e.g., display it in the main window, launch a file viewer, and so on). MIME, which stands for Multipurpose Internet Mail Extensions, is a standard for exchanging various types of files (such as images, text, and video) over the Internet so that each computer platform, whether NT, Unix, or VMS, will interpret and correctly handle the resource's contents.
MIME types describe the data format using two parts. The first part, the type, identifies the resource's general format, such as text, image, or audio. The second part, the subtype, identifies the resource's specific data format. For example, the subtypes for the image type include gif and jpeg. The type and subtype are delimited with a slash (/); for example, a picture's full MIME type could be image/gif or image/jpeg.
The default for WebDB and OAS is text/html.
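Servers commonly derive a MIME type from a file's extension. The Python sketch below uses the standard `mimetypes` module to show the type/subtype structure; it illustrates the general idea and is not how OAS or WebDB assign types. The function names and filenames are hypothetical, and the `text/html` fallback simply mirrors the default mentioned above.

```python
import mimetypes

def split_mime(mime_type):
    """Split a MIME type such as 'image/gif' into (type, subtype)."""
    type_, subtype = mime_type.split("/", 1)
    return type_, subtype

def guess(filename, default="text/html"):
    """Guess a MIME type from the file extension, falling back to a default."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default

if __name__ == "__main__":
    for name in ("logo.gif", "photo.jpeg", "no_extension"):
        print(name, "->", split_mime(guess(name)))
```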
Browsers must be configured to handle each MIME type. Almost all browsers can display text/plain, text/html, and image/jpeg documents without any extra configuration. When a browser receives a document with a MIME type it doesn't recognize, it asks the user to select a helper program to display the document. This is similar to selecting a file association based on a file extension in Windows (e.g., mapping the .doc extension to the Microsoft Word application). Once the user makes an association, subsequent requests for that MIME type are opened using the associated application.
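The browser's behavior can be pictured as a lookup table keyed by MIME type. The following Python sketch is a simplified model under that assumption, not the logic of any real browser; the table contents, the function names, and the `application/msword` type are chosen for illustration.

```python
def default_handlers():
    """Return the MIME types this model browser can display on its own."""
    return {
        "text/plain": "display in browser",
        "text/html": "display in browser",
        "image/jpeg": "display in browser",
    }

def action_for(mime_type, handlers):
    """Look up what to do with a resource; unknown types prompt the user."""
    return handlers.get(mime_type, "ask user for a helper program")

def remember_helper(mime_type, program, handlers):
    """Record the user's choice so later requests skip the prompt."""
    handlers[mime_type] = f"open with {program}"

if __name__ == "__main__":
    handlers = default_handlers()
    print(action_for("application/msword", handlers))
    remember_helper("application/msword", "Microsoft Word", handlers)
    print(action_for("application/msword", handlers))
```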
I've covered a lot of important material in this section. It might be useful to summarize it by making an analogy to the telephone system network, as shown in Table 2.1.
| Term | Analogy |
|---|---|
| Resources | A resource is like the message you want to transmit during the call. It's the actual information you want to send or receive. |
| TCP/IP network | The TCP/IP network is like the standards used by the phone company to route your call from your phone to the person that you're calling. |
| Port | The port is like the circuit that's opened across the network. It is the conduit through which the message is sent. |
| HTTP | HTTP is like the "hello" and "goodbye" parts of your conversation, the agreed-upon convention that governs how the conversation takes place. |
| Web browser | The web browser is like the telephone, the component that allows you to place the call. |
| URL | The URL is like the phone number, the convention that associates a particular resource with an abstract location. |
| MIME type | The MIME type is like the "nature" of the conversation (i.e., business, pleasure, etc.). It is the specific classification of the message; additionally, it implies a specific action that must be taken based on the message. |
Copyright (c) 2000 O'Reilly & Associates. All rights reserved.