PHHTTPD
Zach Brown
   Copyright © 2000 by Zach Brown
     _________________________________________________________________
   
   Table of Contents
   1. [1]Introduction
          
        [2]Architectural Overview
        [3]Supported Systems
                
   2. [4]Configuration File
          
        [5]Overview
        [6]Global config section
        [7]Virtual Servers
                
   3. [8]Logging
          
        [9]Overview
        [10]Configuration
        [11]Format and Strange Behaviour
                
   4. [12]Run Time Facilities
          
        [13]Overview
        [14]Log Rotating
        [15]Status Reporting
     _________________________________________________________________
   
Chapter 1. Introduction
   phhttpd is an HTTP accelerator. It serves fast static HTTP fetches
   from a local file-system and passes slower dynamic requests back to a
   waiting server. It features a lean networking I/O core and an
   aggressive content cache that help it perform its job efficiently.
     _________________________________________________________________
   
Architectural Overview
   phhttpd features a very slim I/O core. It does all its networking work
   using non-blocking system calls driven by whatever event model is most
   appropriate for the host operating system. This allows a single
   execution context to handle as many client connections as the event
   model dictates.
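
   As a rough illustration of the idea (not phhttpd's actual code, which
   is driven by real-time SIGIO signals on Linux), the sketch below uses
   Python's selectors module to let one execution context service several
   connections with non-blocking reads. The names run_event_loop,
   make_echo_handler and demo_echo are made up for this example.

```python
import selectors
import socket


def run_event_loop(sel, max_events):
    """Dispatch readiness events until max_events have been handled."""
    handled = 0
    while handled < max_events:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became ready within the timeout
        for key, _mask in events:
            handler = key.data  # the callback registered with the socket
            handler(key.fileobj)
            handled += 1
    return handled


def make_echo_handler(sel):
    """Build a callback that echoes whatever a ready socket has to say."""
    def handle(conn):
        data = conn.recv(4096)  # socket is readable, so this will not block
        if data:
            conn.send(data)  # small payload, fits in the socket buffer
        else:
            sel.unregister(conn)
            conn.close()
    return handle


def demo_echo():
    """Serve three connections from a single thread and return the replies."""
    sel = selectors.DefaultSelector()
    handler = make_echo_handler(sel)
    servers, clients = [], []
    for _ in range(3):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, handler)
        servers.append(server_side)
        clients.append(client_side)
    for i, client in enumerate(clients):
        client.sendall(b"request %d" % i)
    run_event_loop(sel, max_events=3)
    replies = [client.recv(4096) for client in clients]
    for sock in servers:
        sel.unregister(sock)
    for sock in servers + clients:
        sock.close()
    sel.close()
    return replies
```

   The selector stands in for "whatever event model is most appropriate";
   swapping the event source does not change the shape of the loop.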
   
   phhttpd's job is to serve static content as quickly as it possibly
   can. To do this it maintains a cache of content in memory. When a
   request is serviced, phhttpd saves a reference to the on disk content
   and whatever HTTP headers are dependent on the content. Next time a
   request for this content is received, phhttpd can service it very
   quickly. This cache can be prepopulated at run time, or can
   be built dynamically as requests come in. Its size may also be capped
   by the administrator so that it doesn't overwhelm a system.
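
   A minimal sketch of a size-capped response cache follows. It assumes
   least-recently-used eviction, which this document does not promise;
   phhttpd's real pruning policy may differ. The class ResponseCache and
   its methods are hypothetical, and a bytes object stands in for the
   open file descriptor and mmap()ed region that phhttpd actually keeps.

```python
from collections import OrderedDict


class ResponseCache:
    """Keep at most max_entries cached responses, evicting the oldest use."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, path):
        """Return (headers, body) for path, or None on a cache miss."""
        entry = self._entries.get(path)
        if entry is not None:
            self._entries.move_to_end(path)  # mark as recently used
        return entry

    def put(self, path, headers, body):
        """Store a response and prune until the cap is respected."""
        self._entries[path] = (headers, body)
        self._entries.move_to_end(path)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop least recently used

    def __len__(self):
        return len(self._entries)
```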
   
   phhttpd is a threaded stand alone daemon. The number of threads is
   currently statically defined at run time. Incoming connections are
   evenly balanced among the running threads, regardless of what content
   they may be serving. Connections are served by the thread that
   accepted them until the transfer is done.
     _________________________________________________________________
   
Supported Systems
   phhttpd is currently only expected to build and run on Linux systems
   using glibc2.1 under a kernel that supports passing POLL* information
   over real-time SIGIO signals. This means later 2.3.x kernels or a
   2.2.x kernel that has been patched.
   
   I badly want this to change. If you're interested in doing porting
   work to other operating systems, please do let me know.
     _________________________________________________________________
   
Chapter 2. Configuration File
Overview
   phhttpd uses an XML config file format to express how it should behave
   while running. More information on XML may be found near
   [16]http://www.w3.org/XML/
   
   phhttpd's configuration centers around the concept of virtual servers.
   For us, a virtual server may be thought of as the merging of a
   document tree and the actions phhttpd takes while serving that
   content.
   
   phhttpd.conf may be thought of as having two main sections: the global
   section, which defines properties that are consistent across the
   entire running phhttpd server, and multiple virtual sections, each of
   which describes properties that apply only to one virtual server.
   There is only one global section, while multiple virtual sections are
   allowed.
     _________________________________________________________________
   
Global config section
   The global section defines properties of the running server that don't
   apply to a single virtual server. It should be enclosed in
   
   Global config entities
   
   cache max=NUM
          Sets the maximum number of cached responses that will be held
          in memory. Each cached response holds a minimal amount of
          memory. More importantly, each cached response holds an open
          file descriptor to the file with real content and an mmap()ed
          region of that content. phhttpd will start pruning the cache
          when it notices either of these two resources coming under
          pressure, but has no way to easily deduce that it is running low
          on memory. The administrator may set this value to set an upper
          bound on the number of responses to keep in memory.
          
   control file=PATH
          This specifies the file that will be used to talk with
          phhttpd_ctl.
          
   globallog file=PATH
          This specifies the file to which global messages will be
          logged.
          
   mime file=PATH
          This specifies the file that contains the mapping of file
          extensions to MIME types. It should be of the form:
          
text/sgml                       sgml sgm
video/mpeg                      mpeg mpg mpe

   timeout inactivity=NUM
          Controls various network connection timeouts. 'inactivity' sets
          the amount of time that a connection can be idle before phhttpd
          will forcibly disconnect it. inactivity defaults to 0, which
          lets the connections idle until TCP timeouts take effect.
          
   sendfile
          Enabling this option tells phhttpd to use sendfile() rather
          than write()ing from an mmap()ed region. Avoiding calling
          mmap() will shorten the amount of time it takes to build cached
          responses.
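
   The mime file format shown in the list above (a MIME type followed by
   its file extensions) could be read with something like the sketch
   below. The function name parse_mime_types is made up, and skipping
   lines that start with '#' is an assumption borrowed from the common
   mime.types convention rather than something this document states.

```python
def parse_mime_types(text):
    """Map each file extension to the MIME type listed on its line."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # '#' comments are an assumption
            continue
        fields = line.split()
        mime_type, extensions = fields[0], fields[1:]
        for extension in extensions:
            mapping[extension] = mime_type
    return mapping


SAMPLE = """\
text/sgml                       sgml sgm
video/mpeg                      mpeg mpg mpe
"""
```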
     _________________________________________________________________
   
Virtual Servers
   A Virtual Server can be thought of as the abstraction of serving up a
   content tree ("docroot" in Apache speak). There are a set of
   attributes that are used to define a virtual server. These attributes
   are used to decide which virtual server will process a client's
   request. Then there are attributes which define how the content is
   served.
   
   A virtual server must have a docroot. The virtual tag in the config
   file has a docroot attribute that must be set.
   
        ...
   There can be as many virtual sections in the configuration file as one
   likes.
   
   Virtual config entities
   
   md5
          This enables the generation of the Content-MD5: header. This
          greatly increases the cost of creating a cached response for
          this virtual, because the MD5 function must be applied to the
          entire content of the response. Once the response is created,
          though, there is no per-request overhead.
          
   prepop
          This will cause phhttpd to traverse the entire docroot at
          initialization time and prepare cached responses for all the
          files it finds. This happens in the background during normal
          operation, so there is no dramatic increase in the time it
          takes for phhttpd to start serving connections.
          
   name
          This tag surrounds the string that will be used to identify the
          server. This string will be compared to the Host: header given
          in the request from the client, or will be compared to the
          'host part' of the full URL if that was given. This will be
          used in combination with the network address and port pair to
          determine if a request should be served by a virtual server.
          
   listen v4=DOT.TED.QU.AD port=PORT
          This virtual server will be chosen to serve an incoming request
          if that request was made to the network address specified in
          this entity. There can be as many of these as one likes in a
          given virtual server, and '*' may be specified for either
          parameter to indicate that all addresses or ports should match.
          
   logs
          The logs section of the virtual server defines the per virtual
          log files that should be written to during operation. See the
          following section on logging.
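
   The way the name and listen entities combine to pick a virtual server
   might be sketched as follows. Everything here is hypothetical: the
   Listen and VirtualServer types, the first-match-wins order, the
   case-insensitive comparison and the stripping of a ":port" suffix from
   the Host: value are all assumptions, and what phhttpd does when no
   virtual server matches is not covered by this document.

```python
from dataclasses import dataclass, field


@dataclass
class Listen:
    address: str  # dotted quad, or "*" for any address
    port: str     # port number as text, or "*" for any port


@dataclass
class VirtualServer:
    docroot: str
    names: list = field(default_factory=list)
    listens: list = field(default_factory=list)


def listen_matches(listen, address, port):
    """True if the local address and port satisfy this listen entity."""
    return listen.address in ("*", address) and listen.port in ("*", str(port))


def select_virtual(servers, address, port, host):
    """Return the first server matching both the address pair and the name."""
    host = host.split(":", 1)[0].lower()  # assumption: ignore any port suffix
    for server in servers:
        if not any(listen_matches(l, address, port) for l in server.listens):
            continue
        if host in (name.lower() for name in server.names):
            return server
    return None  # behaviour on no match is not specified in this document
```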
     _________________________________________________________________
   
Chapter 3. Logging
   "All kids love log!"
     _________________________________________________________________
   
Overview
   phhttpd maintains log buffers for each log it writes to. Logged
   events are put in these buffers at reporting time rather than being
   immediately written to disk. These logs are written as they are filled
   during normal operation, or at regular intervals. This greatly reduces
   the performance impact of keeping detailed logs.
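
   The buffering described above can be sketched like this. It is an
   illustration only: the class BufferedLog, its capacity and interval
   parameters, and the tick() method that a main loop would call
   periodically are all invented for the example.

```python
import io
import time


class BufferedLog:
    """Collect log lines in memory; write when full or after an interval."""

    def __init__(self, sink, capacity=4096, interval=5.0, clock=time.monotonic):
        self.sink = sink
        self.capacity = capacity  # flush once this many characters are pending
        self.interval = interval  # flush pending lines at least this often
        self.clock = clock
        self._pending = []
        self._pending_size = 0
        self._last_flush = clock()

    def log(self, line):
        """Queue one entry; only touch the sink when the buffer fills."""
        self._pending.append(line + "\n")
        self._pending_size += len(line) + 1
        if self._pending_size >= self.capacity:
            self.flush()

    def tick(self):
        """Called at regular intervals to write out stale entries."""
        if self._pending and self.clock() - self._last_flush >= self.interval:
            self.flush()

    def flush(self):
        self.sink.write("".join(self._pending))
        self._pending.clear()
        self._pending_size = 0
        self._last_flush = self.clock()
```

   Many small log() calls turn into a few large writes, which is where
   the reduced performance impact comes from.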
     _________________________________________________________________
   
Configuration
   phhttpd keeps interesting logs on a virtual server granularity.
   Logging is enabled by including, in the logs section of a virtual, an
   entity for each log source that should be kept. There is an entity
   for each source of logging, and attributes to that entity define
   where it is logged to. It looks something like this:
        
        ...
   mode is the octal permissions mode of the file that is to be opened.
   As it is parsed by dumb routines, a leading 0 is highly recommended.
   file is the file the logged events will be written to. The LOG_SOURCE
   is one of:
   
   access Successfully answered requests
   agent The value given in the 'User-Agent' HTTP request header
   referer The string given in the 'Referer' HTTP request header
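
   To see why the leading 0 on mode matters, the sketch below mimics the
   base-detection rules of C's strtol() when called with base 0. Whether
   phhttpd parses the attribute this way is an assumption; the document
   only says the routines are "dumb". The function name parse_mode is
   hypothetical.

```python
def parse_mode(text):
    """Parse a number the way strtol(text, NULL, 0) would pick its base."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)  # 0x prefix selects hexadecimal
    if text.startswith("0") and len(text) > 1:
        return int(text, 8)   # leading 0 selects octal
    return int(text, 10)      # anything else is read as decimal
```

   Under these rules "0644" yields the intended permissions, while "644"
   is read as decimal 644, which is octal 1204 and almost certainly not
   what the administrator meant.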
     _________________________________________________________________
   
Format and Strange Behaviour
   phhttpd log entries are contained within a single line in a text file.
   They contain the time the log entry was written, an opaque token that
   is associated with the connection that caused the log entry, followed
   by the actual entry.
   
   The contents of the 'referer' and 'agent' log entries are simply the
   string that was given with the header. The contents of the 'access'
   log are a little more interesting. It has the decoded relative URL that
   was asked for, followed by the total bytes that were transferred, and
   the time in seconds that it took to transfer.
387f7a45 387f7a45800210ac8910500 /index.html - 2132 0
   is an entry from an 'access' log.
   
   The first field is the time in seconds since the Unix epoch, a.k.a.
   time_t. The second field is associated with the client connection that
   caused the log entry. It is constant for the duration of the
   connection, and is written to all the log entries, of whatever type,
   that are generated. This allows a log parser to do more complete
   connection granularity analysis. As it happens, this opaque token is
   currently built up of the time the client was connected, its remote
   and local network address, etc, but these values must _not_ be parsed
   as they may change in the future.
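
   A log parser for 'access' entries might start from the sketch below.
   It assumes the time field is hexadecimal, which is inferred from the
   sample entry and not stated outright, and it takes the byte count and
   duration from the last two fields because the '-' field in the sample
   is not explained here. A decoded URL containing spaces would need
   more care. AccessEntry and parse_access_line are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AccessEntry:
    timestamp: int   # seconds since the Unix epoch
    token: str       # opaque connection token, deliberately left unparsed
    url: str
    bytes_sent: int
    seconds: int


def parse_access_line(line):
    """Split one 'access' log line into its documented fields."""
    fields = line.split()
    return AccessEntry(
        timestamp=int(fields[0], 16),  # assumption: hex-encoded time_t
        token=fields[1],
        url=fields[2],
        bytes_sent=int(fields[-2]),
        seconds=int(fields[-1]),
    )


SAMPLE_LINE = "387f7a45 387f7a45800210ac8910500 /index.html - 2132 0"
```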
   
   Entries generated by a thread will be written in chronological order.
   If, however, multiple threads are sharing an output file the resulting
   entries may not be written in chronological order. It is up to the
   parsing programs to use the 'time' field to sort by, if they care
   about chronological order.
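
   A parsing program that cares about order could re-sort a shared log
   file on the time field, as in this small sketch. It again assumes the
   first field is a hexadecimal time, as the sample 'access' entry
   suggests; sort_by_time is a made-up helper name.

```python
def sort_by_time(lines):
    """Return log lines ordered by their leading time field.

    sorted() is stable, so entries logged within the same second keep
    the relative order in which they appeared in the file.
    """
    return sorted(lines, key=lambda line: int(line.split(None, 1)[0], 16))
```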
     _________________________________________________________________
   
Chapter 4. Run Time Facilities
Overview
   While phhttpd is running it listens to a 'control' socket for messages
   from the administrator. The currently provided phhttpd_ctl program
   allows the administrator to minimally interact with phhttpd. This
   provides both control and status reporting.
   
   phhttpd_ctl always wants a --control argument that specifies the
   control socket of the running phhttpd daemon. This should match the
   file given in the control entity of the config file's global section.
     _________________________________________________________________
   
Log Rotating
   phhttpd can be told to rotate its logs so that existing logs may be
   processed.
   
   The --rotate argument to phhttpd_ctl tells phhttpd to rename the
   existing files to a unique name, open new files with the previously
   used names, then close the renamed logs and start using the newly
   created files. phhttpd_ctl will output the names of the newly created
   files which will be safe to use once the command exits.
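
   The rename, open, close sequence that --rotate triggers can be
   sketched as below. This shows the ordering only and is not phhttpd's
   code; rotate_log is a hypothetical helper, and the unique-name scheme
   (timestamp plus process id) is invented for the example. Renaming a
   file that is still open relies on Unix semantics.

```python
import os
import tempfile
import time


def rotate_log(path, current):
    """Rotate the log at path; return (fresh_handle, rotated_path)."""
    # 1. Rename the existing file to a unique name (scheme is an assumption).
    rotated = "%s.%d.%d" % (path, int(time.time()), os.getpid())
    os.rename(path, rotated)
    # 2. Open a new file under the previously used name.
    fresh = open(path, "a")
    # 3. Close the renamed log; it is now safe for others to process.
    current.close()
    return fresh, rotated
```

   Because the new file exists before the old handle is closed, there is
   never a moment with no log file to write to.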
   
   The --reopen argument to phhttpd_ctl tells phhttpd to close the
   existing file logs and reopen the files with the filenames that were
   configured. This implies that an external entity has moved the files
   to new names and wants phhttpd to stop using them.
     _________________________________________________________________
   
Status Reporting
   The --status argument to phhttpd_ctl tells phhttpd to return a quick
   status blurb about the server. It contains miscellaneous information
   about the running state of the server.
References
   1. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN10
   2. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN13
   3. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN18
   4. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN22
   5. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN24
   6. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN30
   7. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN74
   8. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN110
   9. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN114
  10. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN117
  11. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN136
  12. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN144
  13. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN146
  14. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN153
  15. file://localhost/export/sunsite/users/gferg/work/00_phhttpd-HOWTO.html#AEN163
  16. http://www.w3.org/XML/