Things are going so well here at Butterthlies, Inc. that we are hard put to keep up with the flood of demand. Everyone, even the cat, is hard at work typing in orders that arrive incessantly by mail and telephone.
Then someone has a brainstorm: "Hey," she cries, "let's use the Internet to take the orders!" The essence of her scheme is simplicity itself. Instead of letting customers read our catalog pages on the Web and then, drunk with excitement, phone in their orders, we provide them with a form they can fill out on their screens. At our end we get a chunk of data back from the Web, which we then pass to a script or program we have written. This brings us into the world of scripting, where the web site can take a much more active role in interacting with users. These tools make Apache a foundation for building applications, not just publishing web pages.
While many sites act as simple repositories, providing users with a collection of files they can retrieve and navigate through with hyperlinks, web sites are capable of much more sophisticated interactions. Sites can collect information from users through forms, customize their appearance and their contents to reflect the interests of particular users, or let users interact with a wide variety of information sources. Sites can also serve as hosts for services provided not to browsers but to other computers, as "web services" become a more common part of computing.
Apache provides a solid foundation for applications, using its core web server to manage HTTP transactions and a wide variety of modules and interfaces to connect those transactions to programs. Developers can create logic that manages a much more complex flow of information than just reading pages, they can use the development environment of their choice, as well as Apache services for HTTP, security, and other web-specific aspects of application design. Everything from simple inclusion of changing information to sophisticated integration of different environments and applications is possible.
In publishing a site, we've been focusing on only one method of the HTTP protocol, GET. Apache's basic handling of GET is more than adequate for sites that just need to publish information from files, but HTTP (and Apache) can support a much wider range of options. Developers who want to create interactive sites will have to write some programs to supply the basic logic. However, many useful tasks are simple to create, and Apache is quite capable of supporting much more complex applications, including applications that connect to databases or other information sources.
Every HTTP request must specify a method. This tells the server how to handle the incoming data. For a complete account, see the HTTP 1.1 specification (http://www.w3.org/Protocols/rfc2616/rfc2616.html). Briefly, however, the methods are as follows:
Note that servers do not have to implement all these methods. See RFC 2068 for more detail. The most commonly used methods are GET and POST, which handle the bulk of interactions with users.
Forms are the most common type of interaction between users and web applications, providing a much wider set of possibilities for user input than simple hypertext linking. HTML provides a set of components for collecting information from users, which HTTP then transmits to the server using your choice of methods. On the server side, your application processes the information sent from the form and generally replies to the user as you deem appropriate.
Creating the form is a simple matter of editing our original brochure to turn it into a form. We have to resist the temptation to fool around, making our script more and more beautiful. We just want to add four fields to capture the number of copies of each card the customer wants and, at the bottom, a field for the credit card number.
The catalog, now a form with the new lines marked:
<!-- NEW LINE - <explanation> -->
looks like this:
<html>
<body>
<FORM METHOD="POST" ACTION="cgi-bin/mycgi.cgi">
<!-- see text -->
<h1> Welcome to Butterthlies Inc</h1>
<h2>Summer Catalog</h2>
<p> All our cards are available in packs of 20 at $2 a pack.
There is a 10% discount if you order more than 100.
</p>
<hr>
<p>
Style 2315
<p align="center">
<img src="bench.jpg" alt="Picture of a bench">
<p align="center">
Be BOLD on the bench
<p>How many packs of 20 do you want? <INPUT NAME="2315_order" >
<!-- new line -->
<hr>
<p>
Style 2316
<p align="center">
<img src="hen.jpg" alt="Picture of a hencoop like a pagoda">
<p align="center">
Get SCRAMBLED in the henhouse
<p>How many packs of 20 do you want? <INPUT NAME="2316_order" >
<HR>
<p>
Style 2317
<p align="center">
<img src="tree.jpg" alt="Very nice picture of tree">
<p align="center">
Get HIGH in the treehouse
<p>How many packs of 20 do you want? <INPUT NAME="2317_order">
<!-- new line -->
<hr>
<p>
Style 2318
<p align="center">
<img src="bath.jpg" alt="Rather puzzling picture of a batchtub">
<p align="center">
Get DIRTY in the bath
<p>How many packs of 20 do you want? <INPUT NAME="2318_order">
<!-- new line -->
<hr>
<p> Which Credit Card are you using?
<ol>
<li>Access <INPUT NAME="card_type" TYPE="checkbox" VALUE="Access">
<li>Amex <INPUT NAME="card_type" TYPE="checkbox" VALUE="Amex">
<li>MasterCard <INPUT NAME="card_type" TYPE="checkbox" VALUE="MasterCard">
</ol>
<p>Your card number? <INPUT NAME="card_num" SIZE=20>
<!-- new line -->
<hr>
<p align=right>
Postcards designed by [email protected]
<hr>
<br>
Butterthlies Inc, Hopeful City, Nevada, 99999
</br>
<p><INPUT TYPE="submit"><INPUT TYPE="reset">
<!-- new line -->
</FORM>
</body>
</html>
This is all pretty straightforward stuff, except perhaps for the line:
<FORM METHOD="POST" ACTION="/cgi-bin/mycgi.cgi">
which on Windows might look like this:
<FORM METHOD="POST" ACTION="mycgi.bat">
The tag <FORM> introduces the form; at the bottom, </FORM> ends it. The METHOD attribute tells Apache how to return the data to the CGI script we are going to write, in this case using POST.
In the Unix case, the ACTION attribute tells Apache to use the URL cgi-bin/mycgi.cgi (which the server may internally expand to /usr/www/cgi-bin/mycgi.cgi, depending on server configuration) to do something about it all:
It would be good if we wrote perfect HTML, which this is not. Although most browsers allow some slack in the syntax, they don't all allow the same slack in the same places. If you write HTML that deviates from the standard, you have to expect that your pages will behave oddly somewhere, sometime. To make sure you have not done so, you can submit your pages to a validator — for instance, http://validator.w3.org.
For more information on the many HTML features used to create forms, see HTML & XHTML: The Definitive Guide by Chuck Musciano and Bill Kennedy (O'Reilly, 2002).
While HTML forms are likely the most common use for application logic on web servers, there are many other cases where users interact with applications without necessarily filling out forms. Large sites often use content-management systems to store the information the site presents in databases, generating content regularly even though it may look to users exactly like an ordinary site with static files. Even smaller sites may use tools like Cocoon (discussed in Chapter 19) to manage and generate content for users.
Many sites create customized experiences for their users, making suggestions based on prior visits to the site or information users have provided previously. These sites typically use "cookies," a mechanism that lets sites store a tiny amount of information on the user's computer and that the browser will report each time the user visits the site. Cookies may last for a single session, expiring when the user quits the browser, or they may last longer, expiring at some preset date. Cookies raise a number of privacy issues, but are frequently used in applications that interact with users over more than a single transaction. Using mechanisms like this, a web site might in fact generate every page a user sees, customizing the entire site.
Building complex web applications is well beyond the scope of this book, which focuses on the Apache server you would use as their foundation. For more on web-application design in general, see Information Architecture for the World Wide Web by Louis Rosenfeld and Peter Morville (O'Reilly, 2002). For more on application design in specific environments, see the books referenced in the environment-specific chapters.
Copyright © 2003 O'Reilly & Associates. All rights reserved.