Anirudh Zala's Blog: varnish

#1 Introduction

Varnish is a high-performance HTTP accelerator (more precisely, a reverse proxy server) designed for content-heavy dynamic web sites. In contrast to other HTTP accelerators, many of which began life as client-side proxies or origin servers, Varnish was designed from the ground up as an HTTP accelerator. The Varnish web site claims that Varnish is ten to twenty times faster than the popular Squid cache on the same hardware.

Varnish is installed in front of one or more web servers. All connections coming from the Internet and addressed to one of those web servers are routed through the proxy server, which may either handle the request itself or pass it, wholly or partially, to the main web server.

There are several reasons to install a reverse proxy:

Security: the proxy server is an additional layer of defence and therefore protects the webservers further up the chain.
Encryption / SSL acceleration: when secure websites are created, the SSL encryption is sometimes not done by the webserver itself, but by a reverse proxy that is equipped with SSL acceleration hardware.
Load distribution: the reverse proxy can distribute the load to several servers, each server serving its own application area. In the case of reverse proxying in the neighbourhood of webservers, the reverse proxy may have to rewrite the URLs in each webpage (translation from externally known URLs to the internal locations).
Caching static content: a reverse proxy can offload the web servers by caching static content, such as images. Proxy caches of this sort can often satisfy a considerable number of requests, greatly reducing the load on the central web server.
Compression: the proxy server can optimize and compress the content to speed up the load time.
Spoon feeding: when a program generates the page on the web server, it can hand the finished page to the reverse proxy and exit immediately. The proxy then feeds the page to the client as slowly as the client requires, so the program does not have to stay open for slow clients.

#2 Architecture

Varnish is heavily threaded, with each client connection being handled by a separate worker thread. When the configured limit on the number of active worker threads is reached, incoming connections are placed in an overflow queue; only when this queue reaches its configured limit will incoming connections be rejected.
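
The admission policy described above can be sketched as a toy model. This is only an illustration of the worker limit and overflow queue, not Varnish's actual implementation; the names `max_workers` and `queue_limit` are hypothetical and do not correspond to real Varnish parameters.

```python
from collections import deque


class AdmissionModel:
    """Toy model of the worker / overflow-queue policy (not Varnish code)."""

    def __init__(self, max_workers, queue_limit):
        self.max_workers = max_workers  # hypothetical worker thread limit
        self.queue_limit = queue_limit  # hypothetical overflow queue limit
        self.active = 0                 # connections currently owned by a worker
        self.overflow = deque()         # connections waiting for a free worker

    def accept(self, conn_id):
        """Return 'worker', 'queued' or 'rejected' for a new connection."""
        if self.active < self.max_workers:
            self.active += 1
            return "worker"
        if len(self.overflow) < self.queue_limit:
            self.overflow.append(conn_id)
            return "queued"
        return "rejected"

    def finish(self):
        """A worker is done; give it the oldest queued connection, if any."""
        if self.overflow:
            return self.overflow.popleft()  # the worker stays busy
        if self.active > 0:
            self.active -= 1
        return None


model = AdmissionModel(max_workers=2, queue_limit=1)
outcomes = [model.accept(i) for i in range(4)]
print(outcomes)  # ['worker', 'worker', 'queued', 'rejected']
```

Connections are rejected only when both the worker limit and the queue limit are exhausted, which matches the behaviour described in the paragraph above.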

The principal configuration mechanism is VCL (Varnish Configuration Language), a DSL used to write hooks which are called at critical points in the handling of each request. Most policy decisions are left to VCL code, making Varnish far more configurable and adaptable than most other HTTP accelerators. When a VCL script is loaded, it is translated to C, compiled to a shared object by the system compiler, and linked directly into the accelerator.

A number of run-time parameters control things such as the maximum and minimum number of worker threads, various timeouts etc. A command-line management interface allows these parameters to be modified, and new VCL scripts to be compiled, loaded and activated, without restarting the accelerator.

In order to reduce the number of system calls in the fast path to a minimum, log data is stored in shared memory, and the task of filtering, formatting and writing log data to disk is delegated to a separate application.

#3 Installation

This section walks through a quick installation. Download the latest version of Varnish, or check the source out from the repository.

#3.1 Prerequisites

The following tools are required to build Varnish:

A recent version of GCC.
A POSIX compatible make.
Recent versions of GNU autotools like automake, autoconf, libtool.

Recent operating system releases will most likely include all of the above.

#3.2 Configuring and Building

Start by running autogen.sh, which generates configure and Makefile.in:

$ ./autogen.sh

You may see some error messages. Check whether configure and Makefile.in were generated. If they weren't, you probably need newer versions of the GNU autotools. If they were, run autogen.sh again: any error messages it still shows the second time around are most likely caused by bugs in autoconf macros installed by other software on your machine, and can safely be ignored.

Next, run configure. In most cases, the defaults are correct and you do not need to specify any command-line options, except perhaps --prefix. If you plan on hacking the Varnish sources, however, you will most likely want to turn on stricter error checks and dependency tracking:

$ ./configure

or, if you are working on the Varnish sources:

$ ./configure --enable-debugging-symbols --enable-developer-warnings --enable-dependency-tracking

If configure completes without any errors, run the following two commands to compile and install Varnish:

$ make
$ make install

For more information, please visit this link.

#3.3 Enabling Varnish caching

Varnish ships with a management console (reached with telnet HOST PORT), the varnishd daemon, whose management process runs the caching process as a child, and a set of utilities for logging (varnishlog and varnishncsa), cache statistics (varnishstat), histograms (varnishhist) and log entry ranking (varnishtop).

Any of the following commands can be used to start Varnish caching on your server:

$ varnishd -a www.example.com:80 -b www.example.com:8080
$ varnishd -a www.example.com:80 -f /usr/local/etc/varnish/myconf.vcl
$ varnishd -a www.example.com:80 -b www.example.com:8080 -T www.example.com:6082
$ varnishd -a www.example.com:80 -f /usr/local/etc/varnish/myconf.vcl -T www.example.com:6082

In the first command, -a sets the address Varnish listens on and -b names the backend. The web site www.example.com actually runs on port 8080 (on an Apache web server, for example) but is served through Varnish on port 80, the default HTTP port. This arrangement is required on a production server. On a development or test server the ports can be swapped, because during development and testing you may want to reach the site without caching.

Sometimes you need a different caching policy, such as caching documents that carry cookies. Such policies are written in VCL, Varnish's configuration language, and the second command tells Varnish to load your own VCL file instead of the default configuration. The -f and -b switches cannot be combined, because the backend that -b would name is defined in the VCL file instead.

Once caching has started, it can be controlled from the management console, where caching can be started and stopped and configuration values can be set and unset. Two steps are needed:

Start Varnish with the -T switch, as in the third or fourth command.
Use telnet to open the management console on the given host and port (for example, telnet www.example.com 6082).

To start or stop caching, do not simply kill the process; use the management console instead.

Varnish keeps its log in shared memory. To write it to a regular file on disk, use the varnishlog or varnishncsa utility. For details on these and the other utilities, see their man pages.

#4 VCL

#4.1 Description

VCL stands for Varnish Configuration Language. A VCL file configures how Varnish should behave, much as httpd.conf does for the Apache web server and php.ini does for PHP.

#4.2 Syntax

The VCL syntax is very simple, and deliberately similar to C and Perl. Blocks are delimited by curly braces, statements end with semicolons, and comments may be written as in C, C++ or Perl according to your own preferences.

In addition to the C-like assignment (=), comparison (==) and boolean (!, && and ||) operators, VCL supports regular expression and ACL matching using the ~ operator.

Unlike C and Perl, the backslash (\) character has no special meaning in strings in VCL, so it can be freely used in regular expressions without doubling.

Assignments are introduced with the set keyword. There are no user-defined variables; values can only be assigned to variables attached to backend, request or document objects. Most of these are typed, and the values assigned to them must have a compatible unit suffix.

VCL has if tests, but no loops.

The contents of another VCL file may be inserted at any point in the code by using the include keyword followed by the name of the other file as a quoted string.

#4.3 How to

#4.3.1 refresh (purge) document when it gets changed on server?

Refreshing a document is often called purging it. Varnish offers two ways to purge documents:

From the management console, type commands such as the following to purge the desired documents. Regular expressions are allowed, so many documents can be purged with a few commands.

url.purge ^/$

url.purge .*html$

In VCL, you can write logic that purges a document when the request method is PURGE. Any document that needs to be purged is then requested with the PURGE method, which removes it from the cache. This is the most convenient and practical way to keep the cached copies fresh. It can also be automated, so the server administrator does not need to purge large numbers of documents by hand.

Define the hosts from which purge requests will be accepted. This is a sensible precaution, so that not everyone can purge what is in the cache.

acl purge
{
    "myhost";
    // placeholder address; replace with the hosts allowed to purge
    "192.0.2.1";
}

When a request is received:

sub vcl_recv
{
    if (req.request == "PURGE")
    {
        if (!client.ip ~ purge)
        {
            error 405 "Not allowed.";
        }
        lookup;
    }
}

When the cache is hit (i.e. the document would be served from the cache):

sub vcl_hit
{
    if (req.request == "PURGE")
    {
        set obj.ttl = 0s;
        error 200 "Purged.";
    }
}

When the cache is missed (i.e. the document would be fetched from the backend server):

sub vcl_miss
{
    if (req.request == "PURGE")
    {
        error 404 "Not in cache.";
    }
}
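
An application could trigger such a purge itself whenever a document changes. The following is a minimal sketch, assuming the VCL above is loaded and the caller's address is listed in the purge ACL. The function name `send_purge` and its parameters are hypothetical; only Python's standard http.client module is used.

```python
import http.client


def send_purge(cache_host, cache_port, path, site_host, timeout=5.0):
    """Ask the cache at cache_host:cache_port to purge `path` for `site_host`.

    Returns (status, reason) as sent by the cache.
    """
    conn = http.client.HTTPConnection(cache_host, cache_port, timeout=timeout)
    try:
        # The Host header should name the site being purged, because the
        # Host header is part of the object hash (see section 6).
        conn.request("PURGE", path, headers={"Host": site_host})
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status, response.reason
    finally:
        conn.close()


# Example call (hypothetical address; point it at your own Varnish listener):
#   status, reason = send_purge("127.0.0.1", 80, "/index.html", "www.example.com")
```

With the VCL above, the status returned by such a call would be expected to be 200 when the document was in the cache, 404 when it was not, and 405 when the client is not in the ACL.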

#4.3.2 cache documents even when cookies are present?

When a request is received:

sub vcl_recv
{
    if (req.request == "GET" && req.http.cookie)
    {
        lookup;
    }
}

When the document is fetched from the backend server:

sub vcl_fetch
{
    if (resp.http.Set-Cookie)
    {
        insert;
    }
}

#4.3.3 support multiple sites running on separate backends in the same Varnish instance?

Define the backend that serves the main web site:

backend www
{
    set backend.host = "www.example.com";
    set backend.port = "8080";
}

Define the backend that serves images:

backend images
{
    set backend.host = "images.example.com";
    set backend.port = "8080";
}

When a request is received:

sub vcl_recv
{
    if (req.http.host ~ "^(www.)?example.com$")
    {
        set req.backend = www;
    }
    elsif (req.http.host ~ "^images.example.com$")
    {
        set req.backend = images;
    }
    else
    {
        error 404 "Unknown virtual host";
    }
}
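
The host matching above can be tried out with ordinary regular expressions. The sketch below uses Python's re module purely as a stand-in for VCL's ~ operator; the routing table and the function name `pick_backend` are hypothetical. It also shows why escaping the dots makes the patterns stricter than the ones in the VCL example.

```python
import re

# Hypothetical routing table mirroring the vcl_recv example, with dots escaped.
ROUTES = [
    (re.compile(r"^(www\.)?example\.com$"), "www"),
    (re.compile(r"^images\.example\.com$"), "images"),
]


def pick_backend(host):
    """Return the backend name for a Host header, or None if unknown."""
    for pattern, backend in ROUTES:
        if pattern.search(host):
            return backend
    return None


print(pick_backend("www.example.com"))     # www
print(pick_backend("example.com"))         # www
print(pick_backend("images.example.com"))  # images
print(pick_backend("other.example.org"))   # None

# An unescaped dot matches any character, so a looser pattern also
# accepts a host name that was probably never intended:
loose = re.compile(r"^(www.)?example.com$")
print(bool(loose.search("wwwXexample-com")))  # True
```

A host that matches none of the patterns returns None here, which corresponds to the 404 branch of the VCL.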

#4.3.4 force a minimum TTL for all documents?

When the document is fetched from the backend server:

sub vcl_fetch
{
    if (obj.ttl < 120s)
    {
        set obj.ttl = 120s;
    }
}
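
The rule amounts to clamping the TTL from below. As a small illustration (the names `MIN_TTL_SECONDS` and `effective_ttl` are hypothetical), the same logic in Python looks like this:

```python
MIN_TTL_SECONDS = 120  # same floor as in the VCL example


def effective_ttl(backend_ttl_seconds):
    """Raise short TTLs to the floor; leave longer TTLs untouched."""
    return max(backend_ttl_seconds, MIN_TTL_SECONDS)


print(effective_ttl(30))    # 120
print(effective_ttl(3600))  # 3600
```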

#5 Performance

While Varnish is designed to reduce contention between threads to a minimum, its performance will only be as good as that of the system's pthreads implementation. Additionally, a poor malloc implementation may add unnecessary contention and thereby limit performance. On FreeBSD (using libthr) and Linux (using native threads), it is believed that performance is limited only by hardware.

When the requested document is in cache, response time is typically measured in microseconds. This is significantly better than most HTTP servers, so even sites consisting mostly of static content will benefit from Varnish.

#6 Limitations

Current versions of Varnish do not understand the HTTP Vary: header, which can cause problems for sites that support content negotiation.
The HTTP Host: header is always included in the object hash, so sites that can be reached under several names will have multiple copies of the same content cached.
The default policy does not cache documents that carry cookies or sessions, so web sites that depend heavily on cookies and sessions cannot use Varnish out of the box for dynamic documents. To work around this, adjust the VCL as shown in section 4.3.2.
Varnish's internal caching mechanism does not obey even the minimum requisite client-side HTTP caching pragmas. It also fails to obey other established caching headers, and end users cannot add support for them through configuration, because cache behaviour can be controlled only from client headers, not from the HTTP headers sent by the web server. As a result, preventing the caching of files that lack an ETag response header, for example, is very hard to implement.
Varnish refuses to start if /tmp is mounted noexec, because it compiles a shared library and tries to load it from /tmp. The problem is hard to diagnose, because neither the startup script nor the log files give any indication of the cause.
Proper documentation for Varnish and VCL is lacking. The man pages contain some documentation, but they are available only once Varnish is installed on your machine.

Most of these limitations have been or are being addressed in the development version.
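
The effect of including the Host: header in the object hash can be illustrated with a toy cache. This is a sketch only: the key function below is hypothetical and is not Varnish's real hash, and `fetch` and `backend` are illustrative names.

```python
import hashlib


def cache_key(host, url):
    """Hypothetical key: a hash over the URL and the Host header."""
    return hashlib.sha256(f"{url}#{host}".encode("utf-8")).hexdigest()


cache = {}


def fetch(host, url, backend):
    """Serve from the cache when possible, otherwise ask the backend."""
    key = cache_key(host, url)
    if key not in cache:
        cache[key] = backend(url)
    return cache[key]


def backend(url):
    """Stand-in for the real web server."""
    return f"<html>content of {url}</html>"


# The same document requested under two host names is stored twice.
fetch("example.com", "/index.html", backend)
fetch("www.example.com", "/index.html", backend)
print(len(cache))  # 2
```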

#7 Conclusion

Web accelerators (here, caching software) are not install-and-forget software. Their behaviour and effectiveness need constant monitoring. Software such as Varnish has limitations, listed in section 6, that must be kept in mind before using it. Your project also needs to take care of a few other things to use caching effectively:

Documents are cached for the GET and HEAD methods only, so as many of your project's documents as possible should be served through these two methods.
The URL structure should be caching friendly.
For dynamic documents, session IDs should not be appended to the URL. A session ID is different every time it is generated, so the same document ends up in the cache under many different URLs, which makes caching far less effective.
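
To see why session IDs in URLs hurt caching, the sketch below strips them and compares the results. The parameter names in `SESSION_PARAMS` are assumptions for illustration, and sorting the remaining parameters is an extra design choice of this sketch, not something Varnish does.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; use whatever your application emits.
SESSION_PARAMS = {"PHPSESSID", "sid", "sessionid"}


def cache_friendly_url(url):
    """Drop session-ID query parameters and sort the remaining ones."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    params.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


a = cache_friendly_url("http://www.example.com/list.php?page=2&PHPSESSID=abc123")
b = cache_friendly_url("http://www.example.com/list.php?PHPSESSID=zzz999&page=2")
print(a)       # http://www.example.com/list.php?page=2
print(a == b)  # True
```

Two visitors with different session IDs would otherwise produce two distinct URLs, and therefore two cache entries, for the same document.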

#8 Links

http://varnish.linpro.no/
http://phk.freebsd.dk/pubs/varnish.pdf
http://rudd-o.com/archives/2007/07/02/why-the-varnish-cache-sucks-with-bonus-varnish-dev-whining-about-me/
http://projects.linpro.no/pipermail/varnish-misc/2007-July/000577.html
http://www.version2.dk/artikel/3084
http://varnish.projects.linpro.no/wiki/StatsExplained
http://varnish.projects.linpro.no/wiki/FAQ


2 Jan 2008

Varnish accelerator
