What Does a Web Server Do?
How Apache Works
Apache and Networking
How HTTP Clients Work
What Happens at the Server End?
Planning the Apache Installation
Windows?
Which Apache?
Installing Apache
Building Apache 1.3.X Under Unix
New Features in Apache v2
Making and Installing Apache v2 Under Unix
Apache Under Windows
Apache is the dominant web server on the Internet today, filling a key place in the infrastructure of the Internet. This chapter will explore what web servers do and why you might choose the Apache web server, examine how your web server fits into the rest of your network infrastructure, and conclude by showing you how to install Apache on a variety of different systems.
The whole business of a web server is to translate a URL either into a filename, and then send that file back over the Internet, or into a program name, and then run that program and send its output back. That is the meat of what it does: all the rest is trimming.
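As a rough illustration of that translation step, here is a minimal Python sketch. It is not how Apache itself is implemented: the directory names, the /cgi-bin/ prefix, and the index.html default are all assumptions chosen for the example, and the function does nothing to guard against hostile paths such as ones containing "..".

```python
from pathlib import PurePosixPath

# Hypothetical site layout: every value here is an assumption for illustration.
DOCUMENT_ROOT = PurePosixPath("/usr/www/site/htdocs")
PROGRAM_DIR = PurePosixPath("/usr/www/site/cgi-bin")
PROGRAM_PREFIX = "/cgi-bin/"


def translate(url_path):
    """Map a URL path to ("program", path) or ("file", path)."""
    if url_path.startswith(PROGRAM_PREFIX):
        # The URL names a program: the server would run it and return its output.
        name = url_path[len(PROGRAM_PREFIX):]
        return ("program", str(PROGRAM_DIR / name))
    if url_path.endswith("/"):
        # A directory URL is served from an assumed default page.
        url_path += "index.html"
    # Otherwise the URL names a file to send back as-is.
    return ("file", str(DOCUMENT_ROOT / url_path.lstrip("/")))
```

Calling translate("/") would yield a file under the document root, while translate("/cgi-bin/order") would yield a program to run.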
When you fire up your browser and connect to the URL of someone's home page — say the notional http://www.butterthlies.com/ we shall meet later on — you send a message across the Internet to the machine at that address. That machine, you hope, is up and running; its Internet connection is working; and it is ready to receive and act on your message.
URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:
<scheme>://<host>/<path>
So, in our example, <scheme> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <path> is /, traditionally meaning the top page of the host.[2] The <host> may contain either an IP address or a name, which the browser will then convert to an IP address. Using HTTP 1.1, your browser might send the following request to the computer at that IP address:
[2]Note that since a URL has no predefined meaning, this really is just a tradition, though a pretty well entrenched one in this case.
GET / HTTP/1.1
Host: www.butterthlies.com
The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message has four parts: a method (an HTTP method, not a URL method), which in this case is GET but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI), here /; the version of the protocol we are using; and a series of headers that modify the request (in this case, a Host header, which is used for name-based virtual hosting: see Chapter 4). It is then up to the web server running on that host to make something of this message.
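To make the parts of the URL and of the request concrete, the following Python sketch builds the request a client might send for a URL and then splits it back into method, URI, protocol version, and headers. The function names build_request and parse_request are invented for this example, and the parsing is deliberately simplified (no request body, no repeated or folded headers).

```python
from urllib.parse import urlsplit


def build_request(url):
    """Build a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)        # scheme, host, and path of the URL
    path = parts.path or "/"     # an empty path means the top page
    return f"GET {path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"


def parse_request(raw):
    """Split a request into (method, uri, version, headers)."""
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


request = build_request("http://www.butterthlies.com/")
method, uri, version, headers = parse_request(request)
```

Here method, uri, and version come from the first line of the request, and headers holds the Host header that name-based virtual hosting relies on.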
The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom or just a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.
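A toy version of such a program can be sketched in a few lines of Python. This is only an illustration of "listen, accept, act" and bears no relation to Apache's real design: the function names respond and serve are made up here, only GET is handled, and the server deals with one connection at a time.

```python
import socket


def respond(raw):
    """Build a reply for one raw HTTP request (bytes in, bytes out)."""
    request_line = raw.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split(" ")
    if len(parts) != 3:
        status, body = "400 Bad Request", b"bad request\n"
    elif parts[0] != "GET":
        status, body = "501 Not Implemented", b"method not supported\n"
    else:
        status, body = "200 OK", f"you asked for {parts[1]}\n".encode("latin-1")
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n\r\n"
    )
    return head.encode("latin-1") + body


def serve(port, host="127.0.0.1"):
    """Listen on host:port and answer connections one after another."""
    with socket.create_server((host, port)) as listener:
        while True:
            conn, _addr = listener.accept()   # wait for a client to connect
            with conn:
                # Read a single chunk; a real server would read until the
                # end of the headers and honor timeouts.
                conn.sendall(respond(conn.recv(65536)))
```

Running serve(8080) and pointing a browser at http://127.0.0.1:8080/ should be enough to see the exchange; the port number is an arbitrary choice.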
What do we want a web server to do? It should:
Run fast, so it can cope with a lot of requests using a minimum of hardware.
Support multitasking, so it can deal with more than one request at once and so that the person running it can maintain the data it hands out without having to shut the service down. Multitasking is hard to arrange within a program: the only way to do it properly is to run the server on a multitasking operating system.
Authenticate requesters: some may be entitled to more services than others. When we come to handling money, this feature (see Chapter 11) becomes essential.
Respond to errors in the messages it gets with answers that make sense in the context of what is going on. For instance, if a client requests a page that the server cannot find, the server should respond with a "404" error, which the HTTP specification defines as "Not Found."
Negotiate a style and language of response with the requester. For instance, it should — if the people running the server can rise to the challenge — be able to respond in the language of the requester's choice. This ability, of course, can open up your site to a lot more action. There are parts of the world where a response in the wrong language can be a bad thing.
Support a variety of different formats. On a more technical level, a user might want JPEG image files rather than GIF, or TIFF rather than either of those. They might want text in DVI format rather than PostScript.
Be able to run as a proxy server. A proxy server accepts requests from clients, forwards them to the real servers, and then sends the real servers' responses back to the clients. There are two reasons why you might want a proxy server:
The proxy might be running on the far side of a firewall (see Chapter 11), giving its users access to the Internet.
The proxy might cache popular pages to save reaccessing them.
Be secure. The Internet world is like the real world, peopled by a lot of lambs and a few wolves.[3] The aim of a good server is to prevent the wolves from troubling the lambs. The subject of security is so important that we will come back to it several times.
[3]We generally follow the convention of calling these people the Bad Guys. This avoids debate about "hackers," which to many people simply refers to good programmers, but to some means Bad Guys. We discover from the French edition of this book that in France they are Sales Types -- dirty fellows.
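The language-negotiation item in the list above can be sketched as follows. HTTP clients can state their preferences in an Accept-Language header, with optional q weights; the Python below is a simplified reading of that header, not Apache's negotiation algorithm. The function names, the default of "en", and the decision to ignore the "*" wildcard are assumptions made for the example.

```python
def parse_accept_language(header):
    """Return (tag, weight) pairs from an Accept-Language value, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        weight = 1.0                      # no q parameter means full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0              # treat a malformed weight as "not wanted"
        prefs.append((tag.strip().lower(), weight))
    # sorted() is stable, so equal weights keep the order the client sent.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_language(header, available, default="en"):
    """Pick the best language the site can offer for this header."""
    for tag, weight in parse_accept_language(header):
        if weight <= 0:
            continue
        if tag in available:
            return tag
        primary = tag.split("-")[0]       # fall back from "fr-ch" to "fr"
        if primary in available:
            return primary
    return default
```

A site offering English and French pages might call choose_language(header, {"en", "fr"}) for each request and serve the matching variant.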
Apache has more than twice the market share of its next competitor, Microsoft. This is not just because it is freeware and costs nothing. It is also open source,[4] which means that the source code can be examined by anyone so inclined, and thousands of pairs of eyes scan it for mistakes. Because of this constant examination by outsiders, it is substantially more reliable[5] than any commercial software product that can only rely on the scrutiny of a closed list of employees. This is particularly important in the field of security, where apparently trivial mistakes can have horrible consequences.
[4]For more on the open source movement, see Open Sources: Voices from the Open Source Revolution (O'Reilly & Associates, 1999).
[5]Netcraft also surveys the uptime of various sites. At the time of writing, the longest running site was http://wwwprod1.telia.com, which had been up for 1,386 days.
Anyone is free to take the source code and change it to make Apache do something different. In particular, Apache is extensible through an established technology for writing new Modules (described in more detail in Chapter 20), which many people have used to introduce new features.
Apache suits sites of all sizes and types. You can run a single personal page on it or an enormous site serving millions of regular visitors. You can use it to serve static files over the Web or as a frontend to applications that generate customized responses for visitors. Some developers use Apache as a test-server on their desktops, writing and trying code in a local environment before publishing it to a wider audience. Apache can be an appropriate solution for practically any situation involving the HTTP protocol.
Apache is freeware. The intending user downloads the source code and compiles it (under Unix) or downloads the executable (for Windows) from http://www.apache.org or a suitable mirror site. Although it sounds difficult to download the source code and configure and compile it, it only takes about 20 minutes and is well worth the trouble. Many operating system vendors now bundle appropriate Apache binaries.
The result of Apache's many advantages is clear. There are about 75 web-server software packages on the market. Their relative popularity is charted every month by Netcraft (http://www.netcraft.com). As of July 2002, Netcraft's June survey of active sites, shown in Table 1-1, found that Apache ran nearly two-thirds of the sites surveyed (continuing a trend that has been apparent for several years).
Developer | May 2002 sites | May 2002 percent | June 2002 sites | June 2002 percent |
---|---|---|---|---|
Apache | 10411000 | 65.11 | 10964734 | 64.42 |
Microsoft | 4121697 | 25.78 | 4243719 | 24.93 |
iPlanet | 247051 | 1.55 | 281681 | 1.66 |
Zeus | 214498 | 1.34 | 227857 | 1.34 |
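The two claims made about these figures (more than twice Microsoft's share, and nearly two-thirds of surveyed sites) can be checked with a little arithmetic. The Python below simply restates the June 2002 figures from Table 1-1; the variable names are arbitrary.

```python
# June 2002 figures copied from Table 1-1: (sites, percent of surveyed sites).
june_2002 = {
    "Apache": (10964734, 64.42),
    "Microsoft": (4243719, 24.93),
    "iPlanet": (281681, 1.66),
    "Zeus": (227857, 1.34),
}

apache_sites, apache_percent = june_2002["Apache"]
microsoft_sites, microsoft_percent = june_2002["Microsoft"]

# How many Apache sites there are for every Microsoft site.
ratio = apache_sites / microsoft_sites

# How far Apache's share falls short of two-thirds, in percentage points.
shortfall = 100 * 2 / 3 - apache_percent

print(f"Apache runs {ratio:.2f} times as many sites as Microsoft")
print(f"Apache is {shortfall:.2f} points short of two-thirds")
```

The ratio comes out between 2.5 and 2.6, and the shortfall is a little over two percentage points.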
Copyright © 2003 O'Reilly & Associates. All rights reserved.