Anirudh Zala's Blog: cache

13 Mar 2008

Lighttpd vs. Apache

#1 Lighttpd overview

Lighttpd is an open source web server (similar to Apache) that serves web pages. It was developed by Jan Kneschke, a MySQL developer, as a proof of concept for the C10K problem, that is, handling 10,000 connections in parallel on one server. Hence the immediate reason for the birth of Lighttpd was to overcome weaknesses of the Apache web server, such as its high memory footprint.

The prefork model that Apache uses consumes a lot of memory (normally more than 20 MB) per process. This means that once we multiply it by the number of processes running simultaneously, the server's RAM gets exhausted quickly. Lighttpd beats Apache here with a very low memory footprint (just 6 MB), which leaves more RAM free and so means faster output from the web server. The response is faster still when static content is delivered. In Netcraft's latest web server survey, we can see Lighttpd among the top 5 web servers currently used on the Internet.
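To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. The 20 MB and 6 MB figures come from the text above; the function name and the choice of 100 simultaneous processes are assumptions made only for illustration, and real footprints depend on loaded modules and on memory shared between processes.

```python
def prefork_memory_mb(processes: int, mb_per_process: float = 20.0) -> float:
    """Approximate RAM used by a prefork pool: one full process per request in flight."""
    return processes * mb_per_process


# 100 simultaneous Apache prefork processes at roughly 20 MB each (assumed load)
apache_mb = prefork_memory_mb(100)
# One event-driven Lighttpd process at roughly 6 MB (figure from the text)
lighttpd_mb = 6.0

print(f"Apache prefork, 100 processes: ~{apache_mb:.0f} MB")
print(f"Lighttpd, single process: ~{lighttpd_mb:.0f} MB")
```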

#2 How to set it up

The usual and preferred installation instructions can be found on the Lighttpd installation page (see Links). For Yum users, a single command, yum install zlib pcre lighttpd lighttpd-fastcgi, will do almost everything.

If you want to start and stop Lighttpd manually, you're done. To install Lighttpd as a service like Apache, edit and install the init script (only if you have installed Lighttpd from source):

# sed -e 's/FOO/lighttpd/g' doc/rc.lighttpd > lighttpd.init
# chmod a+rx lighttpd.init
# cp lighttpd.init /etc/init.d/lighttpd
# cp -p doc/sysconfig.lighttpd /etc/sysconfig/lighttpd
# install -Dp ./doc/lighttpd.conf /etc/lighttpd/lighttpd.conf
# chkconfig lighttpd on

If you have installed Lighttpd using Yum, then just follow the last step. You can also start and stop the Lighttpd service with /etc/init.d/lighttpd start|stop|restart|condrestart|reload|status or service lighttpd start|stop|restart|condrestart|reload|status.

To just test lighttpd.conf, run command lighttpd -t -f /PATH/TO/CONF/lighttpd.conf

#3 Differences between Apache and Lighttpd

#3.1 General

The main difference between Apache and Lighttpd is the serving model: Lighttpd is event-driven, while Apache is threaded or pre-forked.

Apache provides different multiprocessing modules (MPMs) for different runtime environments. The prefork model that Apache uses creates a number of processes when the service starts and manages them in a pool. However, each process requires a lot of memory to handle requests, so the more processes there are, the more memory is required. That is, simultaneous Apache processes quickly eat the available RAM.

Lighttpd, on the other hand, uses a single process, a single thread and non-blocking I/O. For that it uses the fastest event handler available on the target system, such as poll, epoll, kqueue or /dev/poll. This difference makes Lighttpd faster than Apache at serving static files.
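The single-process, single-thread model can be sketched with Python's standard selectors module, which also picks the best event mechanism the system offers (epoll, kqueue, poll or select). This is only a minimal illustration of the idea, not Lighttpd's implementation; the function name serve and the handler callback are assumptions made for this example.

```python
import selectors
import socket


def serve(connections, handler):
    """Serve several already-connected sockets from one thread until all peers close."""
    sel = selectors.DefaultSelector()      # epoll, kqueue, poll or select
    for conn in connections:
        conn.setblocking(False)            # non-blocking I/O
        sel.register(conn, selectors.EVENT_READ)
    open_count = len(connections)
    while open_count:
        # Wait until at least one socket has data, then handle only the ready ones.
        for key, _events in sel.select(timeout=1.0):
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(handler(data))
            else:                          # empty read: the peer closed its side
                sel.unregister(conn)
                open_count -= 1
    sel.close()
```

No process or thread is created per connection, so memory use stays roughly flat as the number of connections grows.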

However, the biggest difference between the two is how they support scripting languages (especially PHP). Apache has the upper hand here because it supports the easy-to-use shared module version, CGI and FastCGI, while Lighttpd supports only FastCGI at this moment.

#3.2 Configuration level

There is a visible difference between the styles of the configuration files of Lighttpd (lighttpd.conf) and Apache (httpd.conf). The syntax of lighttpd.conf looks more like that of php.ini, while httpd.conf has an XML-like syntax. Here is an example of some basic configuration:

#3.2.1 Basic Configuration

Apache:

DocumentRoot /var/www/html

CustomLog /var/www/logs/access common

ErrorLog /var/www/logs/error

User apache

Group apache

Lighttpd:

server.document-root="/var/www/html"

accesslog.filename="/var/www/logs/access"

server.errorlog="/var/www/logs/error"

server.username="apache"

server.groupname="apache"

server.modules=("mod_accesslog")

#3.2.2 Virtual Hosts

Below is an example of the difference between virtual hosts in Apache and Lighttpd. The example is shown for a project called myproject.

Apache:

NameVirtualHost *

<VirtualHost *:80>
 ServerName 'www.myproject.com'
 DocumentRoot '/web/myproject/web'
 ErrorLog '/web/logs/myproject_error'
</VirtualHost>

Include conf.d/virtualhosts/*.conf

Lighttpd:

$HTTP["host"] == "www.myproject.com" {
 server.document-root="/web/myproject/web"
 server.errorlog="/web/myproject_error"
}

#3.2.3 Authentication and Authorization

Lighttpd, at this moment, does not support .htaccess files, so all settings must be specified in the lighttpd.conf file or in the configuration files that it includes. However, it understands Apache user files for basic and digest authentication. Group file support is not implemented yet but is planned. Here is an example of authentication and authorization:

Apache:

<Directory ~>
  AuthName "Authentication required to access this area."
  AuthType Basic
  AuthUserFile /web/myproject/docs/valid.users
  Order deny,allow
  Require valid-user
</Directory>
 

Lighttpd:

auth.backend="htpasswd"
auth.backend.htpasswd.userfile="/web/myproject/docs/valid.users"
auth.require=( "~" =>
               (
                 "method"  => "basic",
                 "realm"   => "Authentication required to access this area.",
                 "require" => "valid-user"
               )
             )

In summary, the configuration file of the Lighttpd server behaves like an active script in which you can declare variables, write logic, compute values based on conditions and so on, much like in a programming script. This makes the configuration file lively and agile.

#4 How to run PHP under Lighttpd

#4.1 Configuring PHP under Lighttpd

Apache processes PHP internally, i.e. as the shared module mod_php, while Lighttpd runs PHP under FastCGI. Although Apache also supports FastCGI, running PHP under FastCGI with Apache is neglected and rarely done. With Lighttpd, however, FastCGI is the only option, so PHP must be compiled with the FastCGI option (though it is not needed with Apache). For more information, please read http://trac.lighttpd.net/trac/wiki/TutorialLighttpdAndPHP. Below is an example of the difference between running PHP under Apache and under Lighttpd.

Apache:

LoadModule php5_module modules/libphp5.so
AddType application/x-httpd-php .php

Lighttpd:

server.modules=( ..., "mod_fastcgi", ... )
fastcgi.server=( ".php" =>
                 (
                   (
                     "socket" => "/tmp/php-fastcgi.socket",
                     "bin-path" => "/usr/bin/php-cgi",
                     "broken-scriptfilename" => "enable",
                     "bin-environment" =>
                     (
                       "PHP_FCGI_CHILDREN" => "2",
                       "PHP_FCGI_MAX_REQUESTS" => "5000"
                     ),
                     "min-procs" => 1,
                     "max-procs" => 2,
                     "idle-timeout" => 60
                   )
                 )
               )

You may need to adjust the path of php-cgi according to your setup. Please note that the server.modules directive already exists, listing other modules, at the top of the configuration file; the line above only indicates that mod_fastcgi should be enabled in lighttpd.conf.

Then one directive needs to be set as follows in the php.ini configuration file, if it exists. If it doesn't, there is nothing to do.

cgi.fix_pathinfo=1

The last 4 directives of the Lighttpd configuration above (bin-environment, min-procs, max-procs and idle-timeout) tune how the PHP processes are run.

#4.2 Application wise changes

As FastCGI runs as a separate process, we can't set PHP directives in the web server's configuration file (i.e. lighttpd.conf). This is one of the biggest drawbacks of FastCGI, and a reason why PHP is rarely run under it. Moreover, in FastCGI mode your PHP scripts get limited support from the web server, which may force you to change or rewrite them. Hence running PHP under Lighttpd is not recommended at this moment (Lighttpd currently supports only FastCGI mode), because it lacks features that PHP applications rely on, such as setting php.ini options in the server configuration file for all hosts or on a per-host basis.

Moreover, many benchmarks show that PHP runs slower under FastCGI than as a shared module on Apache.

However, if PHP has to run on Lighttpd, then 2 major changes are required. They are:

Move all PHP-related settings either into php.ini directly or into the configuration or global file of the application.
Remove all variables and settings specific to the Apache web server from the application.

Once these changes are done, the application must be tested heavily to find out whether any of its functionality got broken.

#5 mod_uploadprogress and Prototype

This feature will be available in Lighttpd version 1.5.0, which is not released yet, hence I cannot test it or write more about it. More information about this module can be found in its documentation (see Links). However, once it is released it should be one of the finest modules of the Lighttpd server, because it can be easily integrated with front-end applications using JSON.

#6 Lighttpd and output compression

Lighttpd provides output compression for static data through the mod_compress module. This means that before sending static content to the client, mod_compress compresses it and saves it at a specified path. This compressed and cached copy is served directly from the cache location when a similar request is made by the same or a different client, thus saving valuable bandwidth and reducing response time.

Lighttpd supports 3 types of compression, viz. deflate, gzip and bzip2. The limitation of compressing and caching is that Lighttpd cannot compress files larger than 128 MByte or smaller than 128 Bytes.

To enable compression we need to set 3 directives in lighttpd.conf file. They are:

compress.cache-dir="/var/www/cache/myproject/"
compress.filetype=("text/plain", "text/html")
compress.max-filesize=1024    # in kBytes, i.e. 1 MB

However, since there is already an upper file size limit of 128 MByte, the last directive does not have to be declared. When compressing various types of static data, keep in mind that if no file type or a wrong file type is listed, no file will get compressed.
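The rules described above can be summarised in a small Python sketch. It models only what the text states (the 128 Bytes and 128 MByte limits, the file type list and the optional maximum size); it is not mod_compress's actual code, the function name is hypothetical, and it assumes the configured maximum is expressed in kBytes.

```python
MIN_SIZE = 128                        # bytes; smaller files are not compressed
HARD_MAX_SIZE = 128 * 1024 * 1024     # 128 MByte built-in upper limit


def should_compress(size_bytes, mime_type, filetypes, max_filesize_kb=None):
    """Return True if a static file would qualify for compression under the rules above."""
    if mime_type not in filetypes:    # no or wrong file type: nothing is compressed
        return False
    limit = HARD_MAX_SIZE
    if max_filesize_kb is not None:   # optional tighter limit (assumed to be in kBytes)
        limit = min(limit, max_filesize_kb * 1024)
    return MIN_SIZE <= size_bytes <= limit


types = ["text/plain", "text/html"]
print(should_compress(20_000, "text/html", types))   # ordinary HTML page
print(should_compress(20_000, "image/png", types))   # type not listed
```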

You may need to create the cache folder manually and give it the necessary write permissions. The cached content is not cleared automatically, hence it is left to the developer to clean it up periodically. A command like the following can be used to remove content that is older than a week.

$ find /var/www/cache/myproject/ -type f -mtime +7 | xargs -r rm
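If the cleanup has to run from a script rather than from the shell, the same idea can be sketched in Python. This is an approximation of the find command above (find rounds file ages down to whole days, this sketch compares exact timestamps); the function name is an assumption.

```python
import os
import time


def remove_older_than(cache_dir, days=7, now=None):
    """Delete regular files under cache_dir whose modification time is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    removed = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed


# Example (path taken from the configuration above):
# remove_older_than("/var/www/cache/myproject/", days=7)
```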

To compress dynamic content, we need to rely on PHP itself, as PHP natively supports good compression of dynamic content through zlib. For that, the following directive is to be set in php.ini or in an equivalent configuration file.

zlib.output_compression=On

Please note that when zlib.output_compression is enabled, the standard output_handler directive must be left empty. If an additional output handler is needed, it has to be named in zlib.output_handler instead of output_handler, in the following way (HANDLER_NAME is a placeholder):

output_handler=
zlib.output_handler=HANDLER_NAME

#7 Lighttpd and caching

#7.1 Caching overview

Caching is another way to gain better performance in serving content and to reduce the response time of your PHP scripts. There are several kinds of caching software available for PHP. Some important ones are Zend Platform, APC (APC GUI), XCache, eAccelerator, ionCube Encoder and PHP Accelerator. Certain web servers like Lighttpd provide built-in modules for caching static content at the web server level. They are mod_expire, mod_mem_cache and mod_cml. Hence by combining caching of static and dynamic content effectively, we can gain a lot of speed in serving content. However, these mechanisms are not all alike.

The independent packages mentioned above do opcode/bytecode caching, i.e. they cache your PHP script in its compiled state, so that when a new request arrives for the same script, the cache serves the compiled version of the code directly instead of reading the file from disk again and then compiling it. Of these 6, eAccelerator, XCache and APC are the most widely used. Benchmarks also show that XCache and APC do better than the others. We will learn more about XCache in a short while.
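The idea behind opcode caching can be illustrated with a small Python analogy: compile a script once, keep the compiled object in memory, and recompile only when the file changes. This is a conceptual sketch, not how XCache or APC are implemented; the names _code_cache and load_compiled are made up for this example.

```python
import os

_code_cache = {}   # path -> (mtime, compiled code object)


def load_compiled(path):
    """Return compiled code for `path`, recompiling only when the file has changed."""
    mtime = os.path.getmtime(path)
    cached = _code_cache.get(path)
    if cached and cached[0] == mtime:
        return cached[1]              # cache hit: skip reading and compiling
    with open(path, "r", encoding="utf-8") as fh:
        code = compile(fh.read(), path, "exec")
    _code_cache[path] = (mtime, code)
    return code
```

Each later request for the same unchanged script then costs a file stat and a dictionary lookup instead of a read plus a compile.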

As said earlier, only a good combination of static and dynamic caching gives a considerable boost in performance, so we should try to cache as much content as possible. To cache static content, the integrated modules of the web server are the best candidates. In the case of Lighttpd they are mod_expire and mod_mem_cache (the latter, however, is not provided by default).

#7.2 mod_expire

Mod_expire controls the Expires header in the response header of HTTP/1.0 messages. It is useful to set it for static files that should be cached, like images, style sheets etc. To use this module, it first needs to be enabled in the server.modules directive array. Then the module-specific directives are set in the server's configuration file. Expiry values use the following syntax:

<access|modification> <number> <years|months|days|hours|minutes|seconds>

Some examples could be like:

Cache contents of folder images for 2 hours.

expire.url = ( "/images/" => "access 2 hours" )

Cache contents of all sub-folders of images folder for 2 hours.

$HTTP["url"] =~ "^/images/" {
expire.url = ( "" => "access 2 hours" )
}

Values can be hours, months, days etc. depending upon requirement.
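To see what such a rule means in practice, the following Python sketch turns a rule like "access 2 hours" into the value of an Expires header. It only illustrates the syntax above and is not mod_expire's code; the function name is hypothetical and the lengths chosen for months (30 days) and years (365 days) are assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

UNIT_SECONDS = {
    "seconds": 1, "minutes": 60, "hours": 3600, "days": 86400,
    "months": 30 * 86400,    # assumed month length
    "years": 365 * 86400,    # assumed year length
}


def expires_header(rule, access_time, modification_time):
    """Compute an HTTP date from '<access|modification> <number> <unit>'."""
    base_name, number, unit = rule.split()
    if base_name not in ("access", "modification"):
        raise ValueError(f"unknown base: {base_name}")
    base = access_time if base_name == "access" else modification_time
    expiry = base + timedelta(seconds=int(number) * UNIT_SECONDS[unit])
    return format_datetime(expiry, usegmt=True)


now = datetime.now(timezone.utc)
print(expires_header("access 2 hours", now, now))
```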

#7.3 mod_mem_cache

Mod_mem_cache is a plugin that stores the content of files in memory for faster serving. That is, it keeps the specified file types in memory and serves them directly from there instead of reading them from disk, thus saving disk access time. This is a 3rd party module, hence it is not included in the official distribution of Lighttpd.

This module doesn't look very promising for effective caching, as memory should be used for processing data rather than for storing it. Moreover, memory should not be occupied by files that can reside on, and be easily managed on, disk. For example, when we have thousands of images to serve, it is not advisable to store them all in memory just to serve them faster. More information about this plugin can be found on its project page.

#7.4 mod_cml (Cache Meta Language)

Mod_cml is another caching module, similar to mod_expire, which Lighttpd provides to cache the static content of dynamic pages. The difference between mod_expire and mod_cml is that mod_cml can cache fragments of static content that are part of dynamic content. For example, a dynamic page called index.php might include static pieces like menu.html and banner.html which are not an integral part of index.php. In such a case, mod_cml can cache these 2 static pieces and deliver them directly from the cache.

But this type of caching cannot be handled by the Lighttpd web server and mod_cml on their own, hence we need to write some code in PHP or in special CML scripts for mod_cml, which are written in the Lua programming language.

To use mod_cml, the Lua programming language and libmemcache-1.3.x must be installed. Additionally, Lighttpd must be compiled with the 2 options --with-lua and --with-memcache.

#7.5 XCache

XCache is a newly emerging candidate in the market for caching PHP scripts. It is independent software and not a module of Lighttpd. However, it has been written by developers of Lighttpd.

XCache is an open-source opcode cacher, which means that it accelerates the performance of PHP on servers. It removes the compilation time of PHP scripts by caching their compiled state in shared memory (RAM) and using the compiled version straight from RAM. This can speed up page generation by up to 5 times, as it also optimizes many other aspects of PHP scripts and reduces server load. Some of the good features of XCache are:

Optimized opcode cache.
Using a generator to produce C code, reduces human mistake greatly.
Running stable on PHP_4_3/PHP_4_4
Supported and tested on all latest php cvs branches, such as PHP_4_3 PHP_4_4 PHP_5_0 PHP_5_1 PHP_5_2 HEAD (6.x)
Alpha supported for in-alpha-php6, with Unicode enabled.
Read-only Cacher Protection that prevents the cache from being corrupted by php-core/extension or any code other than XCache itself.
Atomic get/set/inc/dec API operation on var cache for php programmers.
Optimizer
Encoder/Decoder(Loader)
Administrator Script, which can:
  view statistics
  show whether the cache was auto-disabled on corruption (AutoDisableOnCorrupted)
  view the cached php/variable list
  clear the cache

The last feature allows the administrator to view statistics and cached PHP variables and to manage the caching behaviour of XCache.

#7.5.1 Installing XCache

The standard way to install XCache is from source. Get your desired version of XCache from the XCache download page. Then follow the steps below to install it.

# tar -zxf xcache-*.tar.gz
# cd xcache
# phpize
# ./configure --enable-xcache
# make
# su
# make install
# cat xcache.ini >> /etc/php.ini

To make sure XCache is properly installed, run below command.

$ php-cgi -v

It will show a string like with XCache vX.X, Copyright (c) XXXX-XXXX, by XXX. The same can be checked in the output of the phpinfo() function. Once XCache is installed, you can edit xcache.ini, which contains the various caching-related directives. However, editing or changing it is not mandatory. A complete explanation of all the directives can be found at http://trac.lighttpd.net/xcache/wiki/PhpIni.

#7.5.2 Configuring Administrator panel

The XCache Administrator panel is an important web interface through which you can monitor and operate your opcode cache and see how well (or badly) it is doing. Since this page is protected by http-auth, certain values must be provided in xcache.ini. For that, set the 2 directives below.

xcache.admin.user='USER'
xcache.admin.pass='MD5(PASSWORD)'

where USER is the name of the user you wish to use and MD5(PASSWORD) is the MD5 hash of the password that you wish to use for that USER.
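The MD5 string for xcache.admin.pass can be produced with any MD5 tool. As a small sketch, here it is with Python's standard hashlib module; the function name and the sample password "secret" are placeholders, not values from XCache.

```python
import hashlib


def xcache_admin_pass(password: str) -> str:
    """Return the hex MD5 digest of the password, as expected in xcache.admin.pass."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Prints the 32-character hex string to paste into xcache.ini
print(xcache_admin_pass("secret"))
```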

To set up the web interface, copy xcache/admin/ (the whole directory) to your web document-root or a sub-directory of it, then request it from your browser. An http-auth prompt will pop up where you have to provide the above USER and PASSWORD (as a normal string, not the MD5 string). However, when XCache is installed from an rpm based utility, it may be necessary to create an alias in the web server instead of copying the script. To do so, add the directive below to your server configuration file.

Apache:

Alias /xcache-admin/ /usr/share/xcache/admin/

Lighttpd:

alias.url += ("/xcache-admin/" => "/usr/share/xcache/admin/")

Gaining a performance boost from caching is tricky. Unless it is used carefully, it cannot give the required boost. As we know, we can't cache everything (especially dynamic content), so we should try to cache whatever is left. This can be achieved with the various types of caching discussed above. Static content is well cached by clients and, if not, can be cached by web servers. PHP scripts can be cached using opcode caching software like APC, XCache etc., while the static parts of dynamic data can be cached by modules like mod_cml (for the Lighttpd web server only).

#8 Summary

In summary, the Lighttpd web server is surely worth a look and worth using for serving static data. For dynamic content like PHP scripts it is not optimized (because it supports only FastCGI), hence we have to wait until it supports the shared module version of PHP. At this moment it is widely used to serve static content only. So it will take time for it to really start competing with Apache.

However, certain modules like mod_secdownload, mod_compress, mod_geoip, mod_trigger_b4_dl, mod_uploadprogress, mod_useronline etc. are peculiar to Lighttpd and can help it stand firmly alongside the currently popular web servers.

#9 Links

http://www.onlamp.com/pub/a/onlamp/2007/04/05/the-lighttpd-web-server.html
http://survey.netcraft.com/Reports/0703/
http://schlitt.info/applications/blog/index.php?/archives/504-Apache-vs.-Lighttpd-echo-performance.html
http://trac.lighttpd.net/trac/wiki/TutorialInstallation
http://trac.lighttpd.net/trac/wiki/TutorialConfiguration
http://trac.lighttpd.net/trac/wiki/Docs:ConfigurationOptions
http://trac.lighttpd.net/trac/wiki/TutorialLighttpdAndPHP
http://trac.lighttpd.net/trac/wiki/Docs:ModUploadProgress
http://trac.lighttpd.net/trac/wiki/Docs:ModCompress
http://blog.lighttpd.net/articles/2006/08/01/mod_uploadprogress-is-back
http://trac.lighttpd.net/trac/wiki/Docs:ModCML
http://www-128.ibm.com/developerworks/library/os-php-fastapps1/index.html
http://trac.lighttpd.net/xcache/wiki/Faq

2 Jan 2008

Varnish accelerator

#1 Introduction

Varnish is a high performance HTTP accelerator (more precisely a Reverse proxy server) designed for content-heavy dynamic web sites. In contrast to other HTTP accelerators, many of which began life as client-side proxies or origin servers, Varnish was designed from the ground up as an HTTP accelerator. The Varnish web site claims that Varnish is ten to twenty times faster than the popular Squid cache on the same hardware.

Varnish is installed within the neighbourhood of one or more webservers. All connections coming from the Internet addressed to one of the webservers are routed through the proxy server, which may either deal with the request itself or pass the request wholly or partially to the main webserver.

There are various reasons to install reverse proxies. They are:

Security: the proxy server is an additional layer of defence and therefore protects the webservers further up the chain.
Encryption / SSL acceleration: when secure websites are created, the SSL encryption is sometimes not done by the webserver itself, but by a reverse proxy that is equipped with SSL acceleration hardware.
Load distribution: the reverse proxy can distribute the load to several servers, each server serving its own application area. In the case of reverse proxying in the neighbourhood of webservers, the reverse proxy may have to rewrite the URLs in each webpage (translation from externally known URLs to the internal locations).
Caching static content: A reverse proxy can offload the webservers by caching static content, such as images. Proxy caches of this sort can often satisfy a considerable amount of website requests, greatly reducing the load on the central web server.
Compression: the proxy server can optimize and compress the content to speed up the load time.
Spoon feeding: if a program is producing the webpage on the webservers, the webservers can produce it, serve it to the reverse-proxy, which can spoon-feed it however slowly the clients need and then close the program rather than having to keep it open while the clients insist on being spoon fed.

#2 Architecture

Varnish is heavily threaded, with each client connection being handled by a separate worker thread. When the configured limit on the number of active worker threads is reached, incoming connections are placed in an overflow queue; only when this queue reaches its configured limit will incoming connections be rejected.
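The admission policy described above (worker thread, then overflow queue, then rejection) can be written down as a tiny decision function. This Python sketch models only the description in the paragraph; it is not Varnish's code, and the function and parameter names are assumptions.

```python
def admit(active_workers, max_workers, queued, max_queue):
    """Decide what happens to a new incoming connection."""
    if active_workers < max_workers:
        return "worker"      # a worker thread handles the connection
    if queued < max_queue:
        return "queued"      # worker limit reached: wait in the overflow queue
    return "rejected"        # both limits reached: the connection is refused


print(admit(active_workers=10, max_workers=100, queued=0, max_queue=50))
print(admit(active_workers=100, max_workers=100, queued=50, max_queue=50))
```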

The principal configuration mechanism is VCL (Varnish Configuration Language), a DSL used to write hooks which are called at critical points in the handling of each request. Most policy decisions are left to VCL code, making Varnish far more configurable and adaptable than most other HTTP accelerators. When a VCL script is loaded, it is translated to C, compiled to a shared object by the system compiler, and linked directly into the accelerator.

A number of run-time parameters control things such as the maximum and minimum number of worker threads, various timeouts etc. A command-line management interface allows these parameters to be modified, and new VCL scripts to be compiled, loaded and activated, without restarting the accelerator.

In order to reduce the number of system calls in the fast path to a minimum, log data is stored in shared memory, and the task of filtering, formatting and writing log data to disk is delegated to a separate application.

#3 Installation

Here we will go through a quick installation process. Please get the latest version of Varnish from its download page or check it out from the repository.

#3.1 Prerequisites

The following tools are required to build Varnish:

A recent version of GCC.
A POSIX compatible make.
Recent versions of GNU autotools like automake, autoconf, libtool.

The latest versions of most operating systems are likely to contain the items mentioned above.

#3.2 Configuring and Building

$ ./autogen.sh

You may see some error messages. Check if configure and Makefile.in were generated. If they weren't, you probably need newer versions of the GNU autotools. If they were, run autogen.sh again; any error messages it still shows the second time around are most likely caused by bugs in autoconf macros installed by other software you have on your machine, and can safely be ignored.

Next, run configure. In most cases, the defaults are correct and you do not need to specify any command-line options, except perhaps --prefix. If you plan on hacking the Varnish sources, however, you will most likely want to turn on stricter error checks and dependency tracking:

$ ./configure

OR

$ ./configure --enable-debugging-symbols --enable-developer-warnings --enable-dependency-tracking

If configure completes without any errors, simply run below two commands to compile and install Varnish.

$ make
$ make install

For more information, please see the Varnish installation documentation.

#3.3 Enabling Varnish caching

Varnish comes with a management console (telnet HOST/IP PORT), a caching process that runs as a child of the management process (varnishd), and utilities for logging (varnishlog and varnishncsa), cache statistics (varnishstat), histograms (varnishhist) and log entry ranking (varnishtop).

The following commands can be used to enable Varnish caching on your servers.

$ varnishd -a www.example.com:80 -b www.example.com:8080
$ varnishd -a www.example.com:80 -f /usr/local/etc/varnish/myconf.vcl
$ varnishd -a www.example.com:80 -b www.example.com:8080 -T www.example.com:6082
$ varnishd -a www.example.com:80 -f /usr/local/etc/varnish/myconf.vcl -T www.example.com:6082

The 1st command denotes that the website www.example.com actually runs on port 8080 on the Apache web server, but is served through Varnish on port 80, which is the default port for HTTP. This is a must for a production server, but on a development and/or test server the ports could be exactly the reverse, because during development and testing you may want to run your websites without caching.

Sometimes we might want to use different caching policies (like caching documents that have cookies), which are written in a special configuration syntax called VCL. In that case the 2nd command is useful: it tells Varnish to use a modified configuration file rather than the default one. When the -f switch is used, the -b switch cannot be used with it, because the values of the -b switch are now given in the configuration file.

Once caching is started, it can be controlled from the management console, from which caching can be started and stopped and various configuration values can be set and unset. For that, 2 steps are needed:

start Varnish with the -T switch as shown in command 3 or 4, and
use the Telnet utility to open the management console on the given port of the given host (like telnet www.example.com 6082).

Please note that to start and stop caching you should not just kill the process; instead use the management console to control caching for a particular host.

Varnish stores its log in memory, hence to dump it to a regular file on disk, use the varnishlog or varnishncsa utilities. For more information on how to use these and the other utilities, please check their man pages.

#4 VCL

#4.1 Description

VCL is an acronym for Varnish Configuration Language. In a VCL file, you configure how Varnish should behave. It is like Apache web server's httpd.conf and PHP's php.ini configuration files.

#4.2 Syntax

The VCL syntax is very simple, and deliberately similar to C and Perl. Blocks are delimited by curly braces, statements end with semicolons, and comments may be written as in C, C++ or Perl according to your own preferences.

In addition to the C-like assignment (=), comparison (==) and boolean (!, && and ||) operators, VCL supports regular expression and ACL matching using the ~ operator.

Unlike C and Perl, the backslash (\) character has no special meaning in strings in VCL, so it can be freely used in regular expressions without doubling.

Assignments are introduced with the set keyword. There are no user-defined variables; values can only be assigned to variables attached to backend, request or document objects. Most of these are typed, and the values assigned to them must have a compatible unit suffix.

VCL has if tests, but no loops.

The contents of another VCL file may be inserted at any point in the code by using the include keyword followed by the name of the other file as a quoted string.

#4.3 How to

#4.3.1 refresh (purge) document when it gets changed on server?

Refreshing is often called purging a document. There are 2 different ways in Varnish to refresh (purge) any document/s:

From the management console you can type the commands below to purge the desired documents. Regular expressions are allowed in the syntax, so many documents can be purged with a few commands.

url.purge ^/$

url.purge .*html$

In VCL we can write logic to purge a document when the request method is PURGE. This means that any document that needs to be purged has to be requested with the PURGE method to remove it from the cache. This is the most convenient and practical way to keep fresh copies of documents in the cache. It can also be automated, so the server administrator need not manually purge large numbers of documents.

Define all hosts from which purge requests will be accepted. This is a good precaution so that not everyone can purge what is in the cache.

acl purge
{
"myhost"; "123.456.789.1";
}

When request is received.

sub vcl_recv
{
if (req.request == "PURGE")
{
    if (!client.ip ~ purge)
    {
      error 405 "Not allowed.";
    }
    lookup;
}
}

When cache is hit (i.e document is to be served from cache).

sub vcl_hit
{
if(req.request == "PURGE")
{
    set obj.ttl = 0s;
    error 200 "Purged.";
}
}

When cache is missed (i.e document is to be served directly from backend server).

sub vcl_miss
{
if(req.request == "PURGE")
{
    error 404 "Not in cache.";
}
}
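With VCL like the above in place, an application can purge a document by sending an HTTP request with the PURGE method to Varnish. The following Python sketch uses the standard http.client module; the function name, its parameters and the host names in the usage comment are assumptions for illustration, and the request will only be accepted from a host listed in the purge ACL.

```python
import http.client


def purge(varnish_host, port, path, site_host=None, timeout=5.0):
    """Send 'PURGE <path>' to Varnish and return the HTTP status code."""
    headers = {"Host": site_host} if site_host else {}
    conn = http.client.HTTPConnection(varnish_host, port, timeout=timeout)
    try:
        conn.request("PURGE", path, headers=headers)
        response = conn.getresponse()
        response.read()                # drain the body so the socket closes cleanly
        return response.status
    finally:
        conn.close()


# Example (hypothetical host): 200 = purged, 404 = not in cache, 405 = not allowed
# status = purge("www.example.com", 80, "/index.html")
```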

#4.3.2 cache documents even when cookies are present?

When request is received.

sub vcl_recv
{
if (req.request == "GET" && req.http.cookie)
{
    lookup;
}
}

Fetch document from backend server.

sub vcl_fetch
{
if (resp.http.Set-Cookie)
{
    insert;
}
}

#4.3.3 support multiple sites running on separate backends in the same Varnish instance?

Define all backend WWW servers which are to be used for caching.

backend www
{
set backend.host = "www.example.com";
set backend.port = "8080";
}

Define all backend Image servers which are to be used for caching.

backend images
{
set backend.host = "images.example.com";
set backend.port = "8080";
}

When request is received.

sub vcl_recv
{

if (req.http.host ~ "^(www.)?example.com$")
{
    set req.backend = www;
}
elsif (req.http.host ~ "^images.example.com")
{
    set req.backend = images;
}
else
{
    error 404 "Unknown virtual host";
}
}
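The host matching in vcl_recv can be tried out offline with a short Python sketch before loading the VCL. It assumes the intended site name is example.com, uses Python's re module rather than Varnish's regex engine, and escapes the dots so that they match only a literal dot; the names ROUTES and pick_backend are made up for this example.

```python
import re

# (pattern, backend name) pairs, checked in order like the if/elsif chain above
ROUTES = [
    (re.compile(r"^(www\.)?example\.com$"), "www"),
    (re.compile(r"^images\.example\.com"), "images"),
]


def pick_backend(host):
    """Return the backend name for a Host header, or None for an unknown host."""
    for pattern, backend in ROUTES:
        if pattern.search(host):
            return backend
    return None    # the VCL above answers 404 "Unknown virtual host" here


for host in ("www.example.com", "images.example.com", "other.test"):
    print(host, "->", pick_backend(host))
```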

#4.3.4 force a minimum TTL for all documents?

Fetch document from backend server.

sub vcl_fetch
{
if (obj.ttl < 120s)
{
    set obj.ttl = 120s;
}
}

#5 Performance

While Varnish is designed to reduce contention between threads to a minimum, its performance will only be as good as that of the system's pthreads implementation. Additionally, a poor malloc implementation may add unnecessary contention and thereby limit performance. On FreeBSD (using libthr) and Linux (using native threads), it is believed that performance is limited only by hardware.

When the requested document is in cache, response time is typically measured in microseconds. This is significantly better than most HTTP servers, so even sites consisting mostly of static content will mostly benefit from Varnish.

#6 Limitations

Current versions of Varnish do not understand the HTTP Vary: header, which can lead to problems with sites that support content negotiation.
The HTTP Host: header is always included in the object hash, so sites that can be accessed under several different names will have multiple copies of the same content cached.
The default policy of Varnish does not allow caching of documents that carry cookies or sessions, so websites that depend heavily on cookies and sessions cannot use Varnish out of the box for dynamic documents. To solve this problem, the VCL has to be tweaked as shown in section 4.3.2.
Varnish's internal caching mechanism does not obey even the minimum requisite client-side HTTP caching pragmas. It also fails to obey other established caching headers, and end users cannot add support for them through configuration, because there is no mechanism to control cache behavior based on the web server's HTTP headers, only on client headers. As a result, preventing caching of files that lack an ETag response header is very hard to implement.
Varnish refuses to start if your /tmp is mounted noexec, because it attempts to compile a "shared lib" and load it from /tmp. Such problems are very hard to detect because neither the startup script nor the log files give any indication.
Proper documentation for Varnish and VCL is lacking. There is some documentation in the man pages, but it is accessible only when you have Varnish installed on your PC.
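
The effect of including the Host: header in the object hash can be sketched in a few lines of Python. This is an illustration only: the cache_key function and the use of SHA-256 are assumptions made for the example, not a description of how Varnish computes its hash.

```python
import hashlib


def cache_key(host, url, include_host=True):
    """Build a cache key from the URL and, optionally, the Host header.

    Hypothetical helper for illustration; SHA-256 is an arbitrary choice.
    """
    parts = [url]
    if include_host:
        parts.append(host.lower())
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

With include_host=True, the same page requested as www.example.com and as example.com yields two different keys, hence two cached copies. With include_host=False, both names share one key.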

Most of these limitations have been or are being addressed in the development version.

#7 Conclusion

Web accelerators (here, caching software) are not "install and forget" software. Their behaviour and effectiveness need constant monitoring and inspection. Software like Varnish has limitations, as shown in section 6, which must be kept in mind before using it. There are also other things to take care of in your project in order to use caching most effectively.

Caching of documents is implemented for the GET and HEAD methods only. Hence your project should serve as many documents as possible through these 2 methods.
The URL structure should be caching friendly.
For dynamic documents, session IDs should not be appended to the URL. They differ every time they are generated, so the same document ends up in the cache in many versions, one per session ID, which makes caching less effective.
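
The session ID point can be made concrete with a small Python sketch that derives one cache key from URLs that differ only in their session parameter. The parameter names in SESSION_PARAMS and the normalize_url function are hypothetical, chosen for illustration, and the sketch assumes the session ID travels in the query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; adjust to what the application uses.
SESSION_PARAMS = {"PHPSESSID", "sid"}


def normalize_url(url):
    """Return the URL without session parameters, with the query sorted."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    query.sort()  # same parameters in a different order map to one key
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), "")
    )
```

Two requests for the same page with different session IDs then map to the same normalized URL, so a cache keyed on it stores a single copy.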

#8 Links

http://varnish.linpro.no/
http://phk.freebsd.dk/pubs/varnish.pdf
http://rudd-o.com/archives/2007/07/02/why-the-varnish-cache-sucks-with-bonus-varnish-dev-whining-about-me/
http://projects.linpro.no/pipermail/varnish-misc/2007-July/000577.html
http://www.version2.dk/artikel/3084
http://varnish.projects.linpro.no/wiki/StatsExplained
http://varnish.projects.linpro.no/wiki/FAQ

6 Sept 2006

Various types of caching

#1 Caching overview

Caching is one of the important techniques for serving content faster and reducing the response time of your web sites. There are several types of caching mechanism that can be used to cache HTML, media, raw scripts, compiled scripts etc. Let's understand what can be cached at which level from the diagram below.

A request traverses 3 components, viz. the client, the Internet and the server environment. At each component, various caching mechanisms can be implemented according to requirement.

At client level, browsers are good candidates for caching static media like CSS, images, JS, videos etc. Most browsers take such data from their cache instead of requesting a fresh copy. They handle caching of such media automatically, so no special instructions or headers are required to tell the browser how to cache it. However, browsers tend to cache almost everything they render, so special HTTP headers are sent to tell them not to cache certain items (PHP usually handles this automatically). The browser cache can also be managed by the web server, by sending certain HTTP headers, if browsers are not behaving in the normal way. For that, a web server module like mod_expires (for Apache) is used to send cache-control related HTTP headers. More details about mod_expires can be found in a later section.
The Internet is the intermediary between client and server and includes gateways, ISPs, proxies and several other components. At this level, ISP and proxy caches can cache pages, but since they are not under the control of either client or server, this is not a good caching option.
The server environment is made of components like the Apache web server, the PHP scripting language, middleware such as frameworks, and a database like MySQL. For each component there are separate caching mechanisms to boost the performance of a web site. Let's discuss each in detail.

#2 Levels of caching in server environment

#2.1 Web server (Apache)

Certain web servers, such as Apache and Lighttpd, provide built-in modules for caching content at the web server level. For Apache they are mod_cache, mod_disk_cache, mod_expires, mod_file_cache and mod_mem_cache.

#2.1.1 mod_cache

This Apache module is required to implement caching of HTTP content. However, it is always used along with other caching modules like mod_disk_cache or mod_mem_cache, depending upon requirements. Detailed information about this module can be found here.

#2.1.2 mod_disk_cache

This module caches files using a disk-based storage manager, i.e. cached files are stored on disk under URI-based keys, and when the same URI is requested again the content is served directly from the cache. This mechanism is similar to how a browser caches an entire page on local disk. The only difference is that the content is stored on the server instead of the client. Detailed information about this module can be found here.

#2.1.3 mod_file_cache

This module provides two techniques for caching frequently requested static files. Through configuration directives, mod_file_cache can be directed to either open then mmap() a file, or to pre-open a file and save the file's open file handle. Both techniques reduce server load when processing requests for these files by doing part of the work (specifically, the file I/O) for serving the file when the server is started rather than during each request.

Not all platforms support both techniques, so you have to find out which technique works in your specific environment.

The mmap()ing is done only once, at server start or restart. So whenever one of the mapped files changes on the filesystem, the web server has to be restarted. If the files are modified in place without restarting the server, only the cached contents will be served. Hence files should be updated by unlinking the old copy and putting a new copy in place. Most tools, such as rdist and mv, do this. The reason this module does not detect changes to the files is that the check would need an extra stat() on every request, which is wasteful and against the intent of reducing I/O.
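
The "map once, replace by rename" behaviour can be observed with a short Python sketch. It is not Apache code and assumes a POSIX system; map_file and replace_file are hypothetical helper names. The sketch maps a file as a server might at startup, then installs a new copy under the same name by writing a temporary file and renaming it over the old one.

```python
import mmap
import os
import tempfile


def map_file(path):
    """Map a non-empty file read-only, as a server might do once at startup."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def replace_file(path, data):
    """Install new content by writing a temporary file and renaming it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, path)  # the old file is unlinked, not modified in place
```

After replace_file, a fresh open() of the path sees the new content, while the existing mapping still refers to the old, now unlinked file. That is why the server has to be restarted (remapping the file) before it serves the new copy.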

For certain systems, configuration and global files are ideal candidates for this module because they do not get changed frequently. This mechanism should not be used to cache media files as they can be effectively cached at client side. Detailed information about this module can be found from here.

#2.1.4 mod_mem_cache

In contrast to mod_disk_cache, mod_mem_cache implements memory-based caching, which provides faster access to cached content than disk. This module can operate in 2 ways:

by caching open file descriptors, or
by caching objects in heap storage.

This module is most useful when it is used to cache locally generated content or to cache backend server content for mod_proxy configured as a reverse proxy. Content is stored in and retrieved from the cache using URI-based keys.

This module can be used to store session files in memory to share sessions across different services of a particular system. Detailed information about this module can be found here.

#2.1.5 mod_expires

This module controls the setting of the Expires HTTP header and the max-age directive of the Cache-Control HTTP header in server responses. The expiration date can be set relative to either the time the source file was last modified or the time of the client access.

These HTTP headers are an instruction to the client about the document's validity and persistence. If cached, the document may be fetched from the cache rather than from the source until this time has passed. After that, the cache copy is considered "expired" and invalid, and a new copy must be obtained from the source.
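
How the two headers relate to the chosen base time can be sketched in Python. This is an illustration of the arithmetic only, not mod_expires itself: expiry_headers is a hypothetical function, and times are plain Unix timestamps in seconds.

```python
from email.utils import formatdate


def expiry_headers(base_time, lifetime_seconds, now):
    """Return Expires and Cache-Control values for base_time + lifetime.

    base_time is either the file's modification time or the access time,
    depending on which base is configured; now is the request time.
    """
    expires_at = base_time + lifetime_seconds
    # max-age is the time remaining from the request until expiry.
    max_age = max(0, int(expires_at - now))
    return {
        "Expires": formatdate(expires_at, usegmt=True),
        "Cache-Control": "max-age=%d" % max_age,
    }
```

With the access time as base, base_time equals now, so max-age is simply the configured lifetime. With the modification time as base, max-age shrinks as the file gets older and reaches 0 once the expiry time has passed.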

This module has no effect when caching is turned off in the browser, because nothing is cached on the server side. Since most browsers handle caching of static media automatically, this module may not be as useful as it seems. Detailed information about this module can be found here.

#2.1.6 Reverse proxy (Varnish)

See here.

#2.2 Scripting language (PHP)

#2.2.1 Opcode/Bytecode cache

Opcode/Bytecode caching stores your PHP script in its compiled state, so that when a new request arrives for the same script, the cache software serves the compiled version directly from the cache rather than reading the file from disk and compiling it again. Some examples of opcode caching software are Zend Platform, APC (APC GUI), XCache, eAccelerator, ionCube Encoder and PHP Accelerator. Of these 6, eAccelerator, XCache and APC are the most widely used. These benchmarks also show that XCache and APC perform better than the others.
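
The idea can be sketched in Python, whose built-in compile() plays the role of the PHP compiler here. This is an analogy under stated assumptions, not how any of the PHP products above work internally: load_compiled is a hypothetical function, and the cache is a plain in-process dictionary that treats a changed modification time as the signal to recompile.

```python
import os

# In-process cache: script path -> (modification time, compiled code object).
_compiled = {}


def load_compiled(path):
    """Return (code, was_cached) for a script, recompiling only if it changed."""
    mtime = os.stat(path).st_mtime_ns
    cached = _compiled.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1], True  # cache hit: no read, no compile
    with open(path, "r", encoding="utf-8") as f:
        code = compile(f.read(), path, "exec")
    _compiled[path] = (mtime, code)
    return code, False
```

The first call for a script reads and compiles it. Later calls return the stored code object until the file's modification time changes.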

#2.3 Middleware (Symfony)

The Symfony framework provides 3 types of caching: an HTML cache of output, a cache of configuration scripts and a cache of translated templates. The last 2 mechanisms are automatic and handled by Symfony directly.

#2.3.1 HTML cache

The HTML cache is simply the output of a script that is sent to the browser for display. A web page is made of many sections, which are mostly built from separate templates and files. Hence, depending upon the type of output, different types of HTML caching can be enabled in a Symfony environment. Caching is enabled by changing off to on in the settings.yml file as shown below.

prod:

dev:
  .settings:
    cache: on

There are 5 types of HTML cache that can be implemented in Symfony. They are Action, Partial, Component, Page and Fragment of template. Each type is useful depending upon type of data. Enabling each type of cache requires specific settings in configuration files at various levels. For more details, please refer to Symfony API manual.

In modern web applications most data is dynamic, so great care should be taken when enabling caching for a particular action or page.

The following points should be considered before implementing the HTML cache in Symfony.

Too short a cache lifetime may not boost performance effectively, while a lifetime longer than required may annoy users because they could be viewing old data.
The page structure should be divided into sections that can be cached and sections that cannot, so that the cached parts can easily be identified and handled.
Enabling or disabling the cache should not be left to an individual programmer; it should be discussed properly before any change is made.
Static templates should not be cached, as they do not require extra processing time anyway.
The cache should be cleared regularly, either automatically or manually.

#2.3.3 Template Translations cache

Multilingual projects require separate translation files for each language. These translation files are also cached by Symfony. The location of these cached files is /..../PROJECT/cache/i18n/. This cache is handled automatically by Symfony, so nothing has to be done at code level.

#2.4 Database (MySQL)

At database level (specifically MySQL), a query cache can be used, which stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.

The query cache is extremely useful in an environment where you have tables that do not change very often and for which the server receives many identical queries. This is a typical situation for many web servers that generate many dynamic pages based on database content.
There are some points to be considered before implementing the query cache.
The query cache does not return stale data. When tables are modified, any relevant entries in the query cache are flushed.
The query cache does not work in an environment where you have multiple mysqld servers updating the same MyISAM tables.
The query cache is not used for server-side prepared statements. If you are using server-side prepared statements, keep in mind that these statements will not be satisfied by the cache.
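
The two rules above (lookup by identical statement text, flush on table modification) can be modelled in a few lines of Python. This is a toy model for illustration, not MySQL's implementation: the QueryCache class and its method names are hypothetical, and the caller is assumed to know which tables a statement reads.

```python
class QueryCache:
    """Toy query cache keyed by the exact text of a SELECT statement."""

    def __init__(self):
        self._results = {}   # statement text -> cached result
        self._by_table = {}  # table name -> statement texts that read it

    def get(self, statement):
        """Return the cached result, or None if this exact text is not cached."""
        return self._results.get(statement)

    def put(self, statement, tables, result):
        """Store a result and remember which tables it depends on."""
        self._results[statement] = result
        for table in tables:
            self._by_table.setdefault(table, set()).add(statement)

    def invalidate(self, table):
        """Flush every cached statement that reads the modified table."""
        for statement in self._by_table.pop(table, set()):
            self._results.pop(statement, None)
```

Because the key is the statement text itself, a query that differs only in letter case or spacing counts as a different statement and misses the cache. Calling invalidate for a table after any write to it is what keeps the cache from returning stale data.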

#3 Summary

Gaining a performance boost from caching is tricky. Unless it is used carefully, it cannot give the required boost. Since we can't cache everything (especially dynamic content), we should try to cache whatever is left. This can be achieved with the various types of caching discussed above. Static content is cached well by clients and, failing that, can be cached by web servers. PHP scripts can be cached using opcode caching software like APC, XCache etc. The static part of dynamic data can be cached by middleware like the Symfony framework.

However, one drawback of caching is that exact measurement of pages served, pages displayed, data transferred etc. becomes almost impossible, which may affect the rankings of a website and thus its popularity, revenue generation etc.
