How to Configure the Apache Server in Linux

How to configure the Apache web server on Ubuntu and how to access the Apache server remotely
GregDeamons, New Zealand, Professional
Published Date: 03-08-2017
Chapter 11
Configuring Apache

In this chapter:
• Installing Apache Software
• Configuring the Apache Server
• Understanding an httpd.conf File
• Web Server Security
• Managing Your Web Server

Web servers provide the leading method for delivering information over an IP network. The Web is best known for providing information over the global Internet, yet it can just as effectively provide information to internal staff as it does to external customers. All but the smallest networks can benefit from a well-run web server, which can advertise products and offer support services to external customers, as well as coordinate and disseminate information to users within your organization. The Web is the single most effective tool for delivering on-demand information to end users.

Most Unix web servers are built with Apache software. Apache is freely available web server software with origins in the National Center for Supercomputing Applications (NCSA) web server, the first widely used web server. Because of these “ancient” roots, Apache has undergone years of testing and development. Because it is the most widely deployed web server software on the Internet, you will probably use Apache to build your Unix web server.

In this chapter, we focus on installing and configuring an Apache server. The large number of configuration options can make Apache configuration appear more complex than it really is. This chapter provides an example of a simple configuration to get Apache up and running quickly. Our focus is configuration and administration of the service, not the design of the content provided by the service; web page design is beyond the scope of this book. If you’re lucky, your organization has trained web designers; if you’re not so lucky, you may be expected to take on this artistic task yourself.
O’Reilly has books that can help you: try HTML and XHTML: The Definitive Guide, by Chuck Musciano and Bill Kennedy, or Web Design in a Nutshell, by Jennifer Niederst.

This is the Title of the Book, eMatter Edition Copyright © 2010 O’Reilly & Associates, Inc. All rights reserved.

Installing Apache Software

The Apache server software is bundled with many Unix systems. Frequently, Apache is installed as part of the initial operating system installation. For example, the initial installation of a Red Hat system presents a screen that allows the user to select the Apache software by clicking on an icon labeled Apache Web Server. Frequently, users select the Apache server software even when they don’t plan to run a web server. You might be surprised to find an Apache server installed and running on client desktop workstations. Try a ps test:

    $ ps ax | grep httpd
    321 ? S 0:00 httpd
    324 ? S 0:00 httpd
    325 ? S 0:00 httpd
    326 ? S 0:00 httpd
    329 ? S 0:00 (httpd)
    330 ? S 0:00 (httpd)
    331 ? S 0:00 (httpd)
    332 ? S 0:00 (httpd)
    333 ? S 0:00 (httpd)
    334 ? S 0:00 (httpd)
    335 ? S 0:00 (httpd)
    2539 p1 D 0:00 grep http

The daemon that Apache installs to provide web services is the Hypertext Transfer Protocol daemon (httpd). Use the process status (ps) command to check for all processes in the system, and the grep command to display only those with the name httpd. Running this test on a freshly installed system will show you if Apache is installed and running.

If Apache is running, start the Netscape web browser and enter “localhost” in the search box. Figure 11-1 shows the result on a sample Red Hat 7 system. Not only is Apache installed and running, it is configured and responding with a web page. Users of desktop Linux systems are sometimes surprised to find out they are running a fully functional web server. Of course, if you’re the administrator of a web server system, this is exactly what you want to see: Apache installed, up, and running.
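The ps test above can be wrapped in a small helper when you want a count rather than a listing. The following is only a sketch: the function name count_httpd is made up for illustration, and it relies on the common '[h]ttpd' pattern trick so that the grep process itself is never counted.

```shell
#!/bin/sh
# count_httpd (hypothetical helper): read `ps` output on stdin and print
# how many lines mention httpd.
# The pattern '[h]ttpd' matches the text "httpd" but not the literal text
# "[h]ttpd", so the grep command line that appears in the ps listing is
# not counted.
count_httpd() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so "|| true" keeps the helper from reporting a failure in that case.
    grep -c '[h]ttpd' || true
}

# Typical use on a live system (ps options differ between Unix flavors):
#   ps ax | count_httpd
```

A result of 0 suggests that no httpd daemon is running; it does not tell you whether the software is installed.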
If the Apache software is not installed on your system, you need to install the package. The easiest way to install optional software on a Linux system is to use a package manager. Several good ones are available. Most Linux systems support the Red Hat Package Manager (rpm), so we’ll use that in the following example.

Figure 11-1. Default Apache server web page

Using the Red Hat Package Manager

Use the Red Hat Package Manager to install needed software, remove unneeded software, and check what software is installed. rpm has many options for the developers who build the packages, but for a network administrator, rpm comes down to three basic commands:

rpm --install package
    The --install option installs software.
rpm --uninstall package
    The --uninstall option removes software.
rpm --query
    The --query option lists a software package that is already installed. Use --all with the --query option to list all installed packages.

You must know the name of a package to install it with rpm. To find the full name of the Apache package, mount the Linux CD-ROM and look in the RPMS directory. Here is an example from a Red Hat 7.2 system:

    # cd /mnt/cdrom/RedHat/RPMS
    # ls *apache*
    apache-1.3.20-16.i386.rpm apacheconf-0.8.1-1.noarch.rpm

This example assumes that the CD-ROM was mounted on /mnt/cdrom. It shows that two Apache software packages are included in the Red Hat distribution: the web server software and a Red Hat configuration tool.
Install apache-1.3.20-16.i386.rpm with this command to get the web server software:

    # rpm --install apache-1.3.20-16.i386.rpm

After installing the package, check that it is installed with this rpm command:

    # rpm --query apache
    apache-1.3.20-16

Once the Apache package is installed, make sure the httpd daemons are started at boot time. On a Red Hat system, the script /etc/init.d/httpd starts the daemons. Use chkconfig or a similar command to add the script to the boot process. The following example adds the httpd startup script to the boot process for runlevels 3 and 5:

    # chkconfig --list httpd
    httpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    # chkconfig --level 35 httpd on
    # chkconfig --list httpd
    httpd 0:off 1:off 2:off 3:on 4:off 5:on 6:off

The first chkconfig command lists the status of the httpd script for every runlevel. The response shows that httpd is off for all seven runlevels, meaning that the script is not run. We want to start the web server at runlevel 3, which is the multiuser runlevel, and at runlevel 5, which is the default runlevel for this Red Hat system. The second chkconfig command does this. The --level argument specifies that runlevel 3 and runlevel 5 are affected; note that the 3 and the 5 are run together with no intervening spaces. The httpd on argument says that the httpd script should be executed for those two runlevels. The last chkconfig command again lists the status of the httpd script for all runlevels. This time it shows that httpd will be executed for runlevel 3 and runlevel 5.

The next time this Red Hat system reboots, the web server will be running. To start the web server without rebooting, invoke the httpd script from the command line:

    # /etc/init.d/httpd start
    Starting httpd: [ OK ]

Installing Apache on a Linux system is straightforward. It is often installed during the initial system setup; if not, it can usually be installed from the CDs that came with the system.
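When you check many services, reading the runlevel list by eye gets tedious. As a rough sketch, the helper below (runlevels_on is a made-up name, not part of chkconfig) reads one line in the name, runlevel:state format shown above and prints only the runlevels that are on.

```shell
#!/bin/sh
# runlevels_on (hypothetical helper): read lines such as
#   httpd 0:off 1:off 2:off 3:on 4:off 5:on 6:off
# on stdin and print, for each line, the runlevels whose state is "on".
runlevels_on() {
    awk '{
        sep = ""
        # Field 1 is the service name; the rest are runlevel:state pairs.
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")
            if (kv[2] == "on") {
                printf "%s%s", sep, kv[1]
                sep = " "
            }
        }
        print ""
    }'
}

# Typical use on a Red Hat style system:
#   chkconfig --list httpd | runlevels_on
```

An empty result means the startup script is not enabled for any runlevel, which is the state shown by the first chkconfig listing in the example.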
Installing Apache on a Solaris system is just as simple because Solaris 8 also includes Apache as part of the operating system. If your Unix system does not include Apache, download it from the Internet.

Downloading Apache

Apache is available from http://www.apache.org in both source and binary forms. The Apache source is available for Unix systems in both compressed and zipped tarballs. You can download and compile the source, but the easiest way to get Apache is as a precompiled binary. Figure 11-2 shows just some of the versions of Unix for which precompiled httpd server daemons are available.

Figure 11-2. Binary distributions at the Apache web site

The binaries are listed by operating system. Assume you have a FreeBSD system. Click on the freebsd link, and you’re presented with a long list of zipped tarballs. Each tarball relates to a different version of FreeBSD and contains an Apache binary distribution. Select the binary that is appropriate for your version of FreeBSD and download it to a working directory. Make a backup copy of the current daemon and extract the new daemon with tar. The software should now be installed and ready to run with the configuration files from your current configuration.

Configuring the Apache Server

Apache configuration traditionally involves three files:

httpd.conf
    This is the primary configuration file. It traditionally contains configuration settings for the HTTP protocol and for the operation of the server. This file is processed first.
srm.conf
    This file traditionally contains configuration settings to help the server respond to client requests. The settings include how to handle different MIME types, how to format output, and the location of HTTP documents and Common Gateway Interface (CGI) scripts. This file is processed second.
access.conf
    This file traditionally defines access control for the server and the information the server provides. This file is processed last.

All three files have a similar structure: they are all written as ASCII text files, comments begin with a #, and the files are well commented. Most of the directives in the files are written in the form of an option followed by a value.

We say that these three files traditionally handle Apache configuration, but common practice today has diverged from that approach. There is overlap in the function of the three files. You may think you know where a certain value should be set, only to be surprised to find it in another file. In fact, any Apache configuration directive can be used in any of the configuration files; the traditional division of the files into server, data, and security functions was essentially arbitrary. Some administrators still follow tradition, but it is most common for the entire configuration to be defined in the httpd.conf file. This is the recommended approach, and the one we use in this chapter.

Different systems put the httpd.conf file in different directories. On a Solaris system, the file is stored in the /etc/apache directory; on a Red Hat system, it is found in the /etc/httpd/conf directory; and on Caldera systems, in the /etc/httpd/apache/conf directory. The Apache manpage should tell you where httpd.conf is located on your system; if it doesn’t, look in the script that starts httpd at boot time. The location of the httpd.conf file will either be defined by a script variable or by the -f argument on the httpd command line.
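Because the configuration files are plain text, with comments that begin with # and directives written as an option followed by a value, ordinary text tools are enough to inspect them. The sketch below is illustrative only: active_lines and directive_value are made-up helper names, the matching is case-sensitive for simplicity, and runs of whitespace inside a value are collapsed to single spaces.

```shell
#!/bin/sh
# active_lines FILE (hypothetical helper): print the lines of an
# Apache-style configuration file that are neither blank nor comments.
active_lines() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# directive_value FILE NAME (hypothetical helper): print the value of the
# first active directive called NAME, or nothing if it is not set.
directive_value() {
    active_lines "$1" |
        awk -v name="$2" '$1 == name {
            $1 = ""                 # drop the directive name
            sub(/^[ \t]+/, "")      # drop the separator left behind
            print
            exit
        }'
}

# Typical use, with the path that applies on your system:
#   directive_value /etc/apache/httpd.conf ServerAdmin
```

Counting the output of active_lines is also a quick way to see how many lines of a heavily commented file actually configure anything.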
Of course, a very simple way to locate the file is with the find command, as in this Caldera Linux example:

    # find / -name httpd.conf -print
    /etc/httpd/apache/conf/httpd.conf

Once you find httpd.conf, customize it for your system. The Apache configuration file is large and complex; however, it is preconfigured, so your server will run with only a little input from you. Edit the httpd.conf file to set the web administrator’s email address in ServerAdmin and the server’s hostname in ServerName. With those small changes, the httpd configuration provided with your Unix system will probably be ready to run. Let’s look at a Solaris 8 example.

Configuring Apache on Solaris

The first step to configure Apache on a Solaris system is to copy the file httpd.conf-example to httpd.conf:

    # cd /etc/apache
    # cp httpd.conf-example httpd.conf

Use an editor to put valid ServerAdmin and ServerName values into the configuration. In the Solaris example, we change ServerAdmin from:

    ServerAdmin you@your.address

to:

    ServerAdmin webmaster@www.wrotethebook.com

We change ServerName from:

    ServerName new.host.name

to:

    ServerName www.wrotethebook.com

Once these minimal changes are made, the server can be started. The easiest way to do this on a Solaris system is to run the /etc/init.d/apache script file. The script accepts three possible arguments: start, restart, and stop. Since httpd is not yet running, there is no daemon to stop or restart, so we use the start command:

    # /etc/init.d/apache start
    httpd starting.
    # ps -ef | grep '/httpd'
    nobody 474 473 0 12:57:27 ? 0:00 /usr/apache/bin/httpd
    nobody 475 473 0 12:57:27 ? 0:00 /usr/apache/bin/httpd
    nobody 476 473 0 12:57:27 ? 0:00 /usr/apache/bin/httpd
    root 473 1 0 12:57:26 ? 0:00 /usr/apache/bin/httpd
    nobody 477 473 0 12:57:27 ? 0:00 /usr/apache/bin/httpd
    nobody 478 473 0 12:57:27 ? 0:00 /usr/apache/bin/httpd
    root 501 358 0 13:10:04 pts/2 0:00 grep /httpd

After running the apache startup script, run ps to verify that the httpd daemon is running. In this example, several copies of the daemon are running, just as we saw earlier in the Linux example. This group of daemons is called the swarm, and we’ll examine the Apache configuration directives that control the size of the swarm later. The DynaWeb (dwhttpd) daemon, which is used to display the AnswerBook, may also appear in the ps list on Solaris systems that run an AnswerBook2 server.

Now that the daemons are running, run Netscape. Enter “localhost” in the location box, and you should see something like Figure 11-3.

Figure 11-3. Default web page on a Solaris server

Our Solaris Apache server is now up, running, and serving data. Of course, this is not really the data we want to serve our clients. There are two solutions to this problem: either put the correct data in the directory that the server is using, or configure the server to use the directory in which the correct data is located.

The DocumentRoot directive points the server to the directory that contains web page information. By default, the Solaris server gets web pages from the /var/apache/htdocs directory, as you can see by checking the value for DocumentRoot in the httpd.conf file:

    # grep 'DocumentRoot' httpd.conf
    DocumentRoot "/var/apache/htdocs"
    # ls /var/apache/htdocs
    apache_pb.gif index.html

The /var/apache/htdocs directory contains only two files. The GIF file is the Apache feather graphic seen at the bottom of the web page in Figure 11-3. The index.html file is the HTML document that creates this web page. By default, Apache looks for a file named index.html and uses it as the home page if a specific page has not been requested.
You can put your own index.html file in this directory, along with any other supporting files and directories you need, and Apache will start serving your data. Alternatively, you can edit the httpd.conf file to change the value in the DocumentRoot directive to point to the directory where you store your data. The choice is yours. Either way, you need to create HTML documents for the web server to display.

Although the Solaris server can run after modifying only two or three configuration directives, you still need to understand the full range of Apache configuration. Given the importance of web services for most networks, Apache is too essential for you to ignore. To properly debug a misconfigured web server, you need to understand the entire httpd.conf file. The following sections examine this file in detail.

Understanding an httpd.conf File

It’s helpful to know the default configuration when you’re called upon to correct the configuration of someone else’s system. In this section we examine the values set in the default configuration on a Solaris 8 system. (The default Solaris 8 configuration file is listed in Appendix F.)

Here we focus on the directives that are actually used in the Solaris 8 configuration, and a few others that show important Apache features. There are some other directives that we don’t discuss. If you need additional information about any directive, there are many places to look. The full httpd.conf file contains many comments, which explain the purpose of each directive and are an excellent source of information. The Apache web site (http://www.apache.org) provides online documentation. Two excellent books on Apache configuration are Apache: The Definitive Guide, by Ben and Peter Laurie (O’Reilly), and Linux Apache Web Server Administration, by Charles Aulds (Sybex).
However, you’ll probably find more information about the httpd.conf file than you need for an average configuration right here in this chapter.

The httpd.conf file that comes with Solaris has 160 active configuration lines. To tackle that much information, the following sections organize the configuration directives into different groups. Note that the configuration file itself organizes directives by scope: global environment directives, main server directives, and virtual host directives. (Virtual hosts are explained later in this chapter.) Although that organization is great for httpd when it is processing the file, it’s not so great for a human reading the file. Here, related directives are grouped by function to make the individual directives more understandable. Once you understand the individual directives, you will understand the entire configuration.

We start our look at the httpd.conf file with the directives that load dynamically loadable modules. These modules must be loaded before the directives they provide can be used in the configuration, so it makes sense to discuss loading the modules before we discuss the features they provide. Understanding dynamically loadable modules is a good place to start understanding Apache configuration.

Loading Dynamic Shared Objects

The two directives that appear most in the Solaris httpd.conf file are LoadModule and AddModule. Together, they make up more than 60 of the 160 active lines in the httpd.conf file. All 60 of these lines configure the Dynamic Shared Object (DSO) modules used by the Apache web server.

Apache is composed of many software modules. Like kernel modules, DSO modules can be compiled into Apache or loaded at runtime. Running httpd with the -l command-line option lists all the modules compiled into Apache.
The following example is from a Solaris 8 system:

    # /usr/apache/bin/httpd -l
    Compiled-in modules:
      http_core.c
      mod_so.c

Some systems may have many modules compiled into the Apache daemon. Solaris and Red Hat systems are delivered with only the following two modules compiled in:

http_core.c
    This is the core module. It is always statically linked into the Apache kernel, and it provides the basic functions that must be found in every Apache web server. This module is required; all other modules are optional.
mod_so.c
    This module provides runtime support for Dynamic Shared Object modules. It is required if you plan to dynamically link in other modules at runtime. If modules are loaded through the httpd.conf file, this module must be installed in Apache to support those modules. For this reason it is often statically linked into the Apache kernel.

In addition to these statically linked modules, Solaris uses many dynamically loadable modules. The LoadModule and AddModule directives are used in the httpd.conf file to load DSOs. First, each module is identified by a LoadModule directive. For example, this line in the Solaris httpd.conf file identifies the module that tracks users through the use of cookies:

    LoadModule usertrack_module /usr/apache/libexec/mod_usertrack.so

The LoadModule directive is followed by the module name and the path of the shared object file.

Before a module can be used, it must be added to the list of modules that are available to Apache. The first step in building the new module list is to clear the old one. This is done with the ClearModuleList directive. ClearModuleList has no arguments or options. It occurs in the httpd.conf file after the last LoadModule directive and before the first AddModule directive.

The AddModule directive adds a module name to the module list.
The module list must include all optional modules, both those compiled into the server and those that are dynamically loaded. On our sample Solaris system, that means that there is one more AddModule directive in the httpd.conf file than there are LoadModule directives. The extra AddModule directive handles mod_so.c, which is the only optional module compiled into Apache on our sample system.

Mostly, however, LoadModule and AddModule directives occur in pairs: there is one AddModule directive for every LoadModule directive. For example, the following AddModule directive in the Solaris httpd.conf file adds the usertrack_module defined by the LoadModule directive shown previously to the module list:

    AddModule mod_usertrack.c

The AddModule directive is followed by the name of the source file for the module being loaded. Notice that this is the name of the source file that produced the object module, not the module name seen in the LoadModule directive. This name is identical to the object filename except for the extension. In the LoadModule directive, which uses the shared object extension .so, the object filename is mod_usertrack.so. AddModule uses the source filename extension .c, so the module name is mod_usertrack.c.

Table 11-1 lists all the modules referenced by AddModule directives in the Solaris 8 httpd.conf file.

Table 11-1. DSO modules loaded in the Solaris configuration

Module            Function
mod_access        Enables allow/deny type access controls.
mod_actions       Enables the use of user-defined handlers for specific MIME types or access methods.
mod_alias         Allows references to documents and scripts outside the document root.
mod_asis          Defines file types returned without headers.
mod_auth          Enables user authentication.
mod_auth_anon     Enables anonymous logins.
mod_auth_dbm      Enables use of a DBM authentication file.
mod_autoindex     Enables automatic index generation.
mod_cern_meta     Enables compatibility with old CERN web servers.
mod_cgi           Enables execution of CGI programs.
mod_digest        Enables MD5 authentication.
mod_dir           Controls formatting of directory listings.
mod_env           Allows CGI scripts and server-side includes (SSI) to inherit all shell environment variables.
mod_expires       Sets the date for the Expires: header.
mod_headers       Enables customized response headers.
mod_imap          Processes image map files.
mod_include       Processes SSI files.
mod_info          Enables use of the server-info handler.
mod_log_config    Enables use of custom log formats.
mod_mime          Provides support for MIME files.
mod_mime_magic    Determines the MIME type of a file from its content.
mod_negotiation   Enables MIME content negotiation.
mod_perl          Provides support for the Perl language.
mod_proxy         Enables web caching.
mod_rewrite       Enables URI-to-filename mapping.
mod_setenvif      Sets environment variables from client information.
mod_so            Provides runtime support for dynamic shared objects (DSOs).
mod_speling       Automatically corrects minor spelling errors.
mod_status        Provides web-based access to the server-status report.
mod_unique_id     Generates a unique request identifier for each request.
mod_userdir       Defines where users can create public web pages.
mod_usertrack     Provides user tracking through a unique identifier called a cookie.
mod_vhost_alias   Provides support for name-based virtual hosts.

The http_core.c module is an integrated part of Apache. It is not installed with LoadModule and AddModule commands.

If you decide to add modules to your configuration, do so very carefully. The order of the LoadModule and AddModule directives in the httpd.conf file is critical. Don’t change things without knowing what you’re doing.
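Since LoadModule names a .so file and AddModule names the matching .c file, a mismatch between the two lists can be spotted mechanically. The following is a rough sketch under that naming assumption; unpaired_modules is a hypothetical helper, and it only compares basenames without checking the order of the directives.

```shell
#!/bin/sh
# unpaired_modules FILE (hypothetical helper): print the basename, without
# extension, of every shared object named on a LoadModule line in FILE that
# has no AddModule line naming the matching .c source file.
unpaired_modules() {
    awk '
        $1 == "LoadModule" {
            n = split($3, parts, "/")     # $3 is the path to the .so file
            so = parts[n]
            sub(/\.so$/, "", so)
            loaded[so] = 1
        }
        $1 == "AddModule" {
            src = $2                      # $2 is the .c source file name
            sub(/\.c$/, "", src)
            added[src] = 1
        }
        END {
            for (m in loaded)
                if (!(m in added))
                    print m
        }
    ' "$1" | sort
}

# Typical use, with the path that applies on your system:
#   unpaired_modules /etc/apache/httpd.conf
```

Modules that are compiled in, such as mod_so.c on the sample system, appear only on AddModule lines and are therefore never reported by this check.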
Before proceeding with a new installation, read the documentation that comes with your new module and the modules documentation found in the manual/mod directory of the Apache distribution. See the previously mentioned book Linux Apache Web Server Administration for detailed advice about adding new modules.

Once the DSOs are loaded, the directives that they provide can be used in the configuration file. Let’s continue looking at the Solaris httpd.conf file by examining some of the basic configuration directives.

Basic Configuration Directives

This section covers six different directives. The directives as they appear in the sample configuration we created for our Solaris system are:

    ServerAdmin webmaster@www.wrotethebook.com
    ServerName www.wrotethebook.com
    UseCanonicalName On
    ServerRoot "/var/apache"
    ServerType standalone
    Port 80

Two of the basic directives, ServerAdmin and ServerName, were touched upon earlier in the chapter. ServerAdmin defines the email address of the web server administrator. This is set to a bogus value, you@your.host, in the default Solaris configuration. You should change this to the full email address of the real web administrator before starting the server.

ServerName defines the hostname returned to clients when they read data from this server. In the default Solaris configuration, the ServerName directive is commented out, which means that the “real” hostname is sent to clients. Thus, if the name assigned to the first network interface is crab.wrotethebook.com, then that is the name sent to clients. Many Apache experts suggest defining an explicit value for ServerName in order to document your configuration and to ensure that you get exactly the value you want.
Earlier, we set ServerName to www.wrotethebook.com, so that even though the web server is running on crab, the server will be known as www.wrotethebook.com during web interactions. Of course, www.wrotethebook.com must be a valid hostname configured in DNS. (See Chapter 8, where www is defined as a nickname for crab in the wrotethebook.com zone file.)

A configuration directive related to ServerName is UseCanonicalName, which defines how httpd builds “self-referencing” URLs. A self-referencing URL contains the name of the server itself in the hostname portion of the URL. For example, on the server www.wrotethebook.com, a URL that starts with http://www.wrotethebook.com would be a self-referencing URL. The hostname in the URL should be a canonical name, which is a name that DNS can resolve to a valid IP address. When UseCanonicalName is set to on, as it is in the default Solaris configuration, the value in ServerName is used to identify the server in self-referencing URLs. For most configurations, leave it set to on. If it is set to off, the value that came in the query from the client is used.

The ServerRoot option defines the directory that contains important files used by httpd, including error files, log files, and the three configuration files: httpd.conf, srm.conf, and access.conf. In the Solaris configuration, ServerRoot points to /var/apache. This is surprising in that the Solaris httpd configuration files are actually located in /etc/apache, so clearly something else is at work.

Solaris uses the -f option on the httpd command line to override the location of the httpd.conf file at runtime. httpd is started at boot time using the script /etc/init.d/apache. That script defines a variable named CONF_FILE that contains the value /etc/apache/httpd.conf. This variable is used with the httpd command that launches the web server, and it is this variable that defines the location of the configuration file on a Solaris system.
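If you need to confirm which configuration file a startup script passes to httpd, you can read the variable straight out of the script. This sketch assumes that the script sets the variable with an ordinary shell assignment of the form CONF_FILE=path at the start of a line; the exact layout of the Solaris script is not shown in this chapter, and conf_file_from_script is a made-up helper name.

```shell
#!/bin/sh
# conf_file_from_script SCRIPT (hypothetical helper): print the value of
# the first CONF_FILE=... assignment found in SCRIPT, without quotes.
# Assumption: the assignment is a plain shell assignment on its own line.
conf_file_from_script() {
    sed -n 's/^[[:space:]]*CONF_FILE=//p' "$1" | head -n 1 | tr -d '"'
}

# Typical use on the sample Solaris system:
#   conf_file_from_script /etc/init.d/apache
#
# The path found this way is the one handed to httpd with -f, for example:
#   /usr/apache/bin/httpd -f /etc/apache/httpd.conf
```

If the helper prints nothing, the script probably sets the location some other way, and reading the script by hand is the safer course.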
The ServerType option defines how the server is started. If the server starts from a startup script at boot time, the option is set to standalone. If the server is run on demand by inetd, the option is set to inetd. The default Solaris configuration sets ServerType to standalone, which is the best value; web servers are usually in high demand, so it is best to start them at boot time. It is possible, of course, for a user to set up a small, rarely used web site on a desktop workstation, in which case running the server from inetd may be desirable. But the web server you create for your network should be standalone.

Port defines the TCP port number used by the server. The standard port number is 80. On occasion, private web servers run on other port numbers. For example, Solaris runs the AnswerBook2 server on port 8888. Other popular alternative ports for special-purpose web sites are 8080 and 8000. If you change the port number, you must then tell your users the nonstandard port number. For example, http://jerboas.wrotethebook.com:8080 is a URL for a web site running on TCP port 8080 on host jerboas.wrotethebook.com.

When ServerType is set to inetd, it is usually desirable to set Port to something other than 80. The reason for this is that the ports under 1024 are “privileged” ports. If 80 is used, httpd must be run from inetd with the userid root. This is a potential security problem, as an intruder might be able to exploit the web site to get root access. Using port 80 is okay when ServerType is standalone because the initial httpd process does not provide direct client service. Instead it starts several other HTTP daemons, called the swarm, to provide client services. The daemons in the swarm do not run with root privilege.
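The two rules in this discussion, that ports under 1024 are privileged and that a nonstandard port must appear in the URL, are easy to express in a few lines of shell. The helpers below are illustrative only; needs_root and site_url are made-up names and do not come from Apache.

```shell
#!/bin/sh
# needs_root PORT (hypothetical helper): succeed if binding PORT requires
# root privilege, that is, if PORT is below 1024.
needs_root() {
    [ "$1" -lt 1024 ]
}

# site_url HOST PORT (hypothetical helper): print the URL users must type.
# The port is left out only when it is the standard HTTP port, 80.
site_url() {
    if [ "$2" -eq 80 ]; then
        printf 'http://%s\n' "$1"
    else
        printf 'http://%s:%s\n' "$1" "$2"
    fi
}

# Example:
#   if needs_root 8080; then echo "start as root"; else echo "any user"; fi
#   site_url jerboas.wrotethebook.com 8080
```

A check like needs_root is most useful in the inetd case described above, where choosing an unprivileged port lets httpd run without root.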
Managing the Swarm In the original web server design, the server would create separate processes to han- dle individual requests. This placed a heavy load on the CPU when the server was busy and had a major negative impact on responsiveness. It was possible for the entire system to be overwhelmed by httpd processes. Apache uses a different approach. A swarm of server processes starts at boot time (the ps command earlier in the chapter shows several httpd processes running on the Solaris system), and all the processes in the swarm share the workload. If all the per- sistent httpd processes become busy, spare processes are started to share the work. Five directives in the Apache configuration control how the swarm of server child processes is managed. They are: MinSpareServers This directive sets the minimum number of idle server processes that must be maintained. In the Solaris configuration, this is set to 5, which is the default value used in the Apache distribution. When the number of idle processes drops below 5, another process is created to maintain the correct number of idle processes. 346 Chapter 11: Configuring Apache This is the Title of the Book, eMatter Edition Copyright © 2010 O’Reilly & Associates, Inc. All rights reserved. Download from Wow eBook www.wowebook.comFive is a good value for an average server; it allows a burst of up to five quick requests to be handled without making the client wait for a child process to start. A lightly used server might have a lower number, and a heavily used server could benefit from a higher number. However, you don’t want too many idle servers waiting around for requests that may never come. MaxSpareServers This directive sets the maximum number of idle server processes that may be maintained. It prevents too many idle servers from sitting around with nothing to do. If the number of idle servers exceeds MaxSpareServers, the excess idle servers are killed. 
In the Solaris configuration, MaxSpareServers is set to 10, which is the default value that ships with the Apache distribution. Set this value to about twice the value set for MinSpareServers.

StartServers
This directive defines the number of httpd daemons started at boot time. In the Solaris configuration, it is set to 5. The effect of this directive can be seen in the output of the ps command earlier in this chapter, which showed that six httpd daemons were running. One of these is the parent process that manages the swarm; the other five are the child processes that actually handle client requests for data.

MaxClients
This directive sets the maximum number of client connections that can be serviced simultaneously. Connection requests beyond the number set by MaxClients are not serviced until a child process becomes available. Solaris sets this to 150, which is the most commonly used value. MaxClients prevents the server from consuming all system resources when it receives an overwhelming number of client requests. MaxClients should be increased only if you have an extremely powerful system with fast disks and a large amount of memory. It is generally best to handle additional clients by adding additional servers. The upper limit for MaxClients is set by HARD_SERVER_LIMIT, which is compiled into Apache. The default for HARD_SERVER_LIMIT is 256.

MaxRequestsPerChild
This directive defines the number of client requests a child process can handle before it must terminate. Solaris sets MaxRequestsPerChild to 0, which means “unlimited”: a child process can keep handling client requests for as long as the system is up and running. This directive should always be set to 0, unless you know for a fact that the library you used to compile Apache has a memory leak.

The User and Group directives define the UID and GID under which the swarm of httpd processes is run.
When httpd starts at boot time, it runs as a root process, binds to port 80, and then starts a group of child processes that provide the actual web services. These child processes are the ones given the UID and GID defined in the file. The UID and GID should provide the least possible system privileges to the web server. On the Solaris system, this is the user nobody and the group nobody. The previous ps command output shows this clearly. One httpd process belongs to root and five other httpd processes belong to the user nobody. An alternative to using nobody is to create a userid and groupid just for httpd. If you do this, set the file permissions granted to the new user account very carefully. The advantage of creating a special user and group for httpd is that you can use group permissions for added protection, and you won’t be completely dependent on the world permissions granted to nobody.

Defining Where Things Are Stored

The DocumentRoot directive defines the directory that contains the web server documents. For security reasons, this is not the same directory that holds the configuration files. As we saw earlier, the Solaris setting for DocumentRoot is:

    DocumentRoot "/var/apache/htdocs"

To apply directives to a specific directory, create a container for those directives. Three of the httpd.conf directives used to create containers are:

<Directory pathname>
The Directory directive creates a container for directives that apply to the directory identified by pathname. Any configuration directives that occur after the <Directory> directive and before the next </Directory> statement apply only to the specified directory.

<Location document>
The Location directive creates a container for directives that apply to a specific document.
Any configuration directives that occur after the <Location> directive and before the next </Location> statement apply only to the specified document.

<Files filename>
The Files directive creates a container for directives that apply to the file identified by filename. Any configuration directives that occur after the <Files> directive and before the next </Files> statement apply only to the specified file. filename can refer to more than one file if it contains the Unix wildcard character * or ?. Additionally, if the Files directive is followed by an optional ~ (tilde), the filename field is interpreted as a regular expression.

Directories and files are easy to understand: they are parts of the Unix filesystem that every system administrator knows. Documents, on the other hand, are specific to the web server. The screenful of information that appears in response to a web query is a document; it can be made up of many files from different directories. The Location container provides an easy way to refer to a complex document as a single entity. We will see examples of Location and Files containers later in this chapter. Here we look at Directory containers.

The Solaris configuration defines a Directory container for the server’s root directory and for the DocumentRoot:

    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>

    <Directory "/var/apache/htdocs">
        Options Indexes FollowSymLinks
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>

Each Directory container starts with a <Directory> directive and ends with a </Directory> tag. Both containers shown here enclose configuration statements that apply to only a single directory.
The purpose of the directives inside these containers is covered later in the section “Web Server Security.” For now, it is sufficient to understand that containers are used inside the httpd.conf file to limit the scope of various configuration directives.

The Alias directive and the ScriptAlias directive both map a URL path to a directory on the server. For example, the Solaris configuration contains the following three directives:

    Alias /icons/ "/var/apache/icons/"
    Alias /manuals/ "/usr/apache/htdocs/manual/"
    ScriptAlias /cgi-bin/ "/var/apache/cgi-bin/"

The first line maps the URL path /icons/ to the directory /var/apache/icons/. Thus a request for www.wrotethebook.com/icons/ is served from the directory /var/apache/icons/ on the server. The second directive maps the URL path /manuals/ to the directory /usr/apache/htdocs/manual/.

You may have several Alias directives to handle several different mappings, but you will usually have only one ScriptAlias directive. The ScriptAlias directive functions in exactly the same way as the Alias directive, except that the directory it points to contains executable CGI programs. Therefore, httpd grants this directory execution privileges. ScriptAlias is particularly important because it allows you to maintain executable web scripts in a directory separate from the DocumentRoot. CGI scripts are the single biggest security threat to your server; maintaining them separately allows you to have tighter control over who has access to the scripts.

The Solaris configuration has containers for the /var/apache/icons directory and the /var/apache/cgi-bin directory, but none for the /usr/apache/htdocs/manual directory. Just because a directory is defined inside the httpd.conf file does not mean that a Directory container must be created for that directory.
The /var/apache/icons and the /var/apache/cgi-bin containers are shown here:

    <Directory "/var/apache/icons">
        Options Indexes MultiViews
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>

    <Directory "/var/apache/cgi-bin">
        AllowOverride None
        Options None
        Order allow,deny
        Allow from all
    </Directory>

These containers enclose AllowOverride, Options, Order, and Allow statements, all of which relate to security. Most of the directives found in containers have security implications, and have been placed in containers to provide special security settings for a file, document, or directory. All of the directives used in the containers shown above are covered in the “Web Server Security” section later in this chapter.

The UserDir directive enables personal user web pages and points to the directory that contains the user pages. UserDir usually points to public_html, and it does in the Solaris configuration. With this default setting, users create a directory named public_html in their home directories to hold their personal web pages. When a request comes in for www.wrotethebook.com/~sara, for example, it is mapped to the directory /export/home/sara/public_html. An alternative is to define a full pathname on the UserDir directive line such as /export/home/userpages. Then the administrator creates the directory and allows each user to store personal pages in subdirectories of this directory, so that a request for www.wrotethebook.com/~sara will map to /export/home/userpages/sara. The advantage of this approach is that it makes it easier for you to monitor the content of user pages. The disadvantage is that a separate user web directory tree must be created and protected separately, whereas a web folder within the user’s home directory will inherit the protection of that user’s home.
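The two UserDir styles described above can be sketched as follows. This is an illustrative fragment only: the home directory /export/home/sara is taken from the example in the text, and the comments assume the usual convention that a user directory is requested with a leading tilde, as in /~sara. Only one of the two settings would normally be active.

```apache
# Style 1: pages live inside each user's home directory.
# A request for /~sara is served from /export/home/sara/public_html
UserDir public_html

# Style 2 (alternative, shown commented out): one central tree
# created and protected by the administrator.
# A request for /~sara is served from /export/home/userpages/sara
#UserDir /export/home/userpages
```

With the second style, remember that the central tree needs its own permissions and, if special settings are required, its own Directory container.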
The PidFile and ScoreBoardFile directives define the paths of files that relate to process status. The PidFile is the file in which httpd stores its process ID, and the ScoreBoardFile is the file where httpd writes process status information.

The DirectoryIndex option defines the name of the file retrieved if the client’s request does not include a filename. Our Solaris system has the following value for this option:

    DirectoryIndex index.html

Given the value defined for DocumentRoot and this value, if the server gets a request for http://www.wrotethebook.com, it gives the client the file /var/apache/htdocs/index.html. If it gets a request for http://www.wrotethebook.com/books/, it gives the client the file /var/apache/htdocs/books/index.html. The DocumentRoot is prepended to every request, and the DirectoryIndex is appended to any request that doesn’t end in a filename.

Earlier in this chapter, we saw from an ls of /var/apache/htdocs that the directory contains a file named index.html. But what if it didn’t? What would Apache send to the client? If the file index.html is not found in the directory, httpd sends the client a listing of the directory, if the configuration permits it. A directory listing is allowed if the Options directive in the Directory container for the directory contains the keyword Indexes. (More on Options later.) If a directory index is allowed, several different directives control how that directory listing is formatted.

Creating a Fancy Index

The keyword FancyIndexing is used on the IndexOptions directive line to enable a “fancy index” of the directory when Apache is forced to send the client a directory listing. When fancy indexing is enabled, httpd creates a directory list that includes graphics, links, and other advanced features.
The Solaris configuration enables fancy indexing with the IndexOptions directive, and it contains about 20 extra lines to help configure the fancy index. Solaris uses the following directives to define the graphics and features used in the fancy directory listing:

IndexIgnore
Identifies the files that should not be included in the directory listing. Files can be specified by name, partial name, extension, or by standard wildcard characters.

HeaderName
Specifies the name of a file that contains information to be displayed at the top of the directory listing.

ReadmeName
Specifies the name of a file that contains information to be displayed at the bottom of the directory listing.

AddIconByEncoding
Points to the icon used to represent a file based on its MIME encoding type.

AddIconByType
Points to the icon used to represent a file based on its MIME file type.

AddIcon
Points to the icon used to represent a file based on its extension.

DefaultIcon
Points to the icon file used to represent a file that has not been given an icon by any other option.

Defining File Types

MIME file types and file extensions play a major role in helping the server determine how a file should be handled. Specifying MIME options is also a major part of the Solaris httpd.conf file. The directives involved are:

DefaultType
Defines the MIME type that is used when the server cannot determine the type of a file. In the Solaris configuration this is set to text/plain. Thus, when a file has no file extension, the server assumes it is a plain-text file.

AddEncoding
Maps a MIME encoding type to a file extension. The Solaris configuration contains two AddEncoding directives:

    AddEncoding x-compress Z
    AddEncoding x-gzip gz tgz

The first directive maps the extension Z to the MIME encoding type x-compress.
The second line maps the extensions gz and tgz to MIME encoding type x-gzip.

AddLanguage
Maps a MIME language type to a file extension. The Solaris configuration contains mappings for six languages, e.g., .en for English and .fr for French.

LanguagePriority
Sets the priority of the language encoding used when preparing multiviews, and the language used when the client does not specify a preference. In the Solaris configuration, the priority is English (en), French (fr), and German (de). This means that English, French, and German views will be prepared if multiviews are used. The client will be sent the English version if no language preference is specified.

AddType
Maps a MIME file type to a file extension. The Solaris configuration has only one AddType directive; it maps MIME type application/x-tar to the extension .tgz. A configuration can have several AddType directives.

Another directive that is commonly used to process files based on the filename extension is the AddHandler directive. This directive maps a file handler to a file extension. A file handler is a program that knows how to process a specific file type. For example, the handler cgi-script is able to execute CGI files. The Solaris configuration does not define any optional handlers, so all the AddHandler directives are commented out.

Performance Tuning Directives

The KeepAlive directive enables the use of persistent connections. Without persistent connections, the client must make a new connection to the server for every link the user selects. Because HTTP runs over TCP, every connection requires a connection setup, adding time to every file retrieval. With persistent connections, the server waits to see if the client has additional requests before it closes the connection. Therefore, the client does not need to create a new connection to request a new document.
The KeepAliveTimeout defines the number of seconds the server holds a persistent connection open waiting to see if the client has additional requests. The Solaris configuration turns KeepAlive on and sets KeepAliveTimeout to 15 seconds. MaxKeepAliveRequests defines the maximum number of requests that will be accepted on a “kept-alive” connection before a new TCP connection is required.
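Put together, the persistent-connection settings described above might look like the following fragment. The KeepAlive and KeepAliveTimeout values are the ones the text gives for the Solaris configuration; the MaxKeepAliveRequests value is not stated in the text, so 100 is used here purely as an assumed placeholder.

```apache
# Allow more than one request per TCP connection
KeepAlive On

# Seconds to hold an idle persistent connection open
# while waiting for the client's next request
KeepAliveTimeout 15

# Requests accepted on one kept-alive connection before the client
# must open a new one (assumed value, not taken from the Solaris file)
MaxKeepAliveRequests 100
```

A longer timeout saves clients more connection setups but ties up a child process for each idle connection, so it trades off against the swarm limits discussed earlier.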