Caching is one of the most important techniques for serving content
faster and reducing the response time of your web sites. Several
caching mechanisms can be used to cache HTML, media, raw scripts,
compiled scripts and so on. Let's look at what can be cached at each
level, using the diagram below.
A request passes through three components: the client, the Internet and
the server environment. Various caching mechanisms can be implemented at
each component, according to requirements.
- At the client level, browsers are good candidates for caching static media such as CSS, images, JavaScript and videos. Most browsers serve such files from their cache instead of requesting a fresh copy. They handle this automatically, so no special instructions or headers are needed to tell the browser how to cache such media. However, because browsers tend to cache almost everything they render, special HTTP headers are sent to tell them which items not to cache (PHP usually handles this automatically). The web server can also manage the browser cache by sending certain HTTP headers when browsers do not behave as expected. For that, a web server module such as mod_expires (for Apache) is used to send cache-control HTTP headers. More details about mod_expires can be found in a later section.
- The Internet sits between the client and the server and includes gateways, ISPs, proxies and several other components. ISP and proxy caches can cache pages at this level, but since they are not under the control of either the client or the server, they are not a good option for caching.
- The server environment is made up of components such as the Apache web server, the PHP scripting language, middleware such as frameworks, and a database such as MySQL. Each component has its own caching mechanisms to boost the performance of a web site. Let's discuss each in detail.
Certain web servers, such as Apache and Lighttpd, provide built-in modules for caching content at the web server level. For Apache these are mod_cache, mod_disk_cache, mod_expires, mod_file_cache and mod_mem_cache.
#2.1.1 mod_cache
This Apache module is required to implement caching of HTTP content. However, it is always used together with a storage module such as mod_disk_cache or mod_mem_cache, depending on requirements. Detailed information about this module can be found in the Apache documentation.
#2.2.2 mod_disk_cache
This module caches files using a disk-based storage manager: cached content is stored on disk under URI-based keys, and when the same URI is requested again, the content is served directly from the cache. This is similar to how a browser caches an entire page on the local disk; the only difference is that the content is stored on the server instead of the client. Detailed information about this module can be found in the Apache documentation.
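The URI-keyed lookup described above can be sketched in a few lines. This is only an illustration of the idea, not how mod_disk_cache is implemented: the class `DiskCache`, the helper `serve` and the choice of a SHA-256 hash as the file name are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


class DiskCache:
    """Toy URI-keyed disk cache (hypothetical; not mod_disk_cache itself)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, uri):
        # Hash the URI so that any URI maps to a safe, fixed-length file name.
        digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
        return self.root / digest

    def get(self, uri):
        # Return the cached bytes, or None when the URI has not been stored.
        path = self._path_for(uri)
        return path.read_bytes() if path.exists() else None

    def put(self, uri, body):
        self._path_for(uri).write_bytes(body)


def serve(cache, uri, generate):
    """Return the cached body for uri, generating and storing it on a miss."""
    body = cache.get(uri)
    if body is None:
        body = generate(uri)
        cache.put(uri, body)
    return body
```

A second call to `serve` with the same URI reads the stored file and never calls `generate`. Expiry and invalidation are left out of this sketch on purpose.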
#2.2.3 mod_file_cache
This module provides two techniques for caching frequently
requested static files. Through configuration directives, mod_file_cache
can be directed to either open then mmap() a file, or to pre-open a
file and save the file's open file handle. Both techniques reduce server
load when processing requests for these files by doing part of the work
(specifically, the file I/O) for serving the file when the server is
started rather than during each request.
Not all platforms support both techniques, so you need to find out which technique works in your specific environment.
The mmap()ing is done only once, at server start or restart. So whenever one of the mapped files changes on the filesystem, the web server has to be restarted. If the files are modified in place without restarting the server, only the cached content will be served. Hence files should be updated by unlinking the old copy and putting a new copy in place. Most tools, such as rdist and mv, do this. The reason this module doesn't check for changes to the files is that the check would need an extra stat() on every request, which is wasteful and against the intent of reducing I/O.
For certain systems, configuration and global files are ideal candidates for this module because they do not change frequently. This mechanism should not be used to cache media files, as they can be cached effectively on the client side. Detailed information about this module can be found in the Apache documentation.
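The map-once-at-start-up technique can be illustrated with Python's standard `mmap` module. This is a minimal sketch under assumptions: the class name `PreloadedFiles` is hypothetical, Python stands in for the C code of the web server, and empty files are not handled because `mmap` refuses to map them.

```python
import mmap


class PreloadedFiles:
    """Maps a fixed set of files into memory once, at start-up (hypothetical)."""

    def __init__(self, paths):
        self._maps = {}
        for path in paths:
            with open(path, "rb") as fh:
                # Length 0 maps the whole file. The map stays valid after the
                # file object is closed.
                self._maps[path] = mmap.mmap(
                    fh.fileno(), 0, access=mmap.ACCESS_READ
                )

    def read(self, path):
        # No open() and no stat() per request: copy the bytes out of the map.
        return self._maps[path][:]

    def close(self):
        for mapped in self._maps.values():
            mapped.close()
        self._maps.clear()
```

Because `read` never looks at the filesystem again, a file that is replaced on disk by unlinking or renaming keeps being served from the old map on POSIX systems until the object is rebuilt, which mirrors the restart requirement described above.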
#2.2.4 mod_mem_cache
In contrast to mod_disk_cache, mod_mem_cache implements memory-based caching of content, which provides faster access to cached content than disk. This module can be used in two ways:
- by caching open file descriptors, or
- by caching objects in heap storage.
This module is most useful when it is used to cache locally generated content or to cache backend server content for mod_proxy configured as a reverse proxy. Content is stored in and retrieved from the cache using URI-based keys.
This module can be used to store session files in memory in order to share sessions across different services of a particular system. Detailed information about this module can be found in the Apache documentation.
#2.2.5 mod_expires
This module controls the setting of the Expires HTTP header and the max-age directive of the Cache-Control HTTP header in server responses. The expiration date can be set relative to either the time the source file was last modified or the time of the client access.
These HTTP headers are an instruction to the client about the document's validity and persistence. If cached, the document may be fetched from the cache rather than from the source until this time has passed. After that, the cache copy is considered "expired" and invalid, and a new copy must be obtained from the source.
This module has no effect when caching is turned off in the browser, because it only sets headers and nothing is cached on the server side. Since most browsers handle caching of static media automatically, this module may not be as useful as it seems. Detailed information about this module can be found in the Apache documentation.
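To make the two headers concrete, here is a minimal sketch of how they could be computed. The function `expiry_headers` and its parameters are hypothetical names for this example. It assumes that the expiry time is the base time (client access or last modification) plus a lifetime, and that max-age is counted from the moment of the response; clamping max-age at zero is a choice made for this sketch.

```python
from email.utils import formatdate


def expiry_headers(base_time, lifetime_seconds, now):
    """Build Expires and Cache-Control headers (illustrative only).

    base_time is an epoch timestamp: either the time of the client access
    or the time the source file was last modified.
    """
    expires_at = base_time + lifetime_seconds
    # max-age is relative to the response time, so it shrinks when the base
    # is an older modification time.
    max_age = max(0, int(expires_at - now))
    return {
        "Expires": formatdate(expires_at, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }
```

With the access time as the base, every response gets the full lifetime. With the modification time as the base, all clients see the same absolute expiry date, however late they fetch the file.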
#2.2.6 Reverse proxy (Varnish)
#2.2.1 Opcode/Bytecode cache
Opcode (bytecode) caching stores your PHP script in its compiled state, so that when a new request arrives for the same script, the caching software serves the compiled code directly from the cache rather than reading the file from disk and compiling it again. Some examples of opcode caching software are Zend Platform, APC (APC GUI), XCache, eAccelerator, ionCube Encoder and PHP Accelerator. Of these six, eAccelerator, XCache and APC are the most widely used. Benchmarks also show XCache and APC performing better than the others.
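The idea of compiling once and reusing the compiled form can be sketched as follows. Python is used here purely as a stand-in for PHP, and the class `CompiledScriptCache` is hypothetical; real opcode caches work inside the PHP engine. Recompiling when the file's modification time changes is an assumption of this sketch.

```python
import os


class CompiledScriptCache:
    """Keeps compiled code objects in memory, keyed by script path."""

    def __init__(self):
        self._entries = {}  # path -> (mtime in ns, compiled code object)
        self.compilations = 0

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is None or entry[0] != mtime:
            # Miss, or the script changed on disk: read and compile it again.
            with open(path, "r", encoding="utf-8") as fh:
                source = fh.read()
            entry = (mtime, compile(source, path, "exec"))
            self._entries[path] = entry
            self.compilations += 1
        return entry[1]

    def run(self, path):
        # Execute the cached code object in a fresh namespace per request.
        namespace = {}
        exec(self.load(path), namespace)
        return namespace
```

Running the same script repeatedly compiles it only once; the counter `compilations` is there just to make that visible.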
#2.3 Middleware (Symfony)
The Symfony framework provides three types of data caching: an HTML cache of output, a cache of configuration scripts and a cache of translated templates. The last two mechanisms are automatic and handled by Symfony directly.
#2.3.1 HTML cache
The HTML cache is simply the output of a script that is sent to the browser for display. A web page is made up of many sections, which are mostly designed in separate templates and files. Hence, depending on the type of output, different types of HTML caching can be enabled in a Symfony environment. Caching is enabled by changing off => on in the settings.yml file, as shown below.
    prod:
    dev:
      .settings:
        cache: on
Five types of HTML cache can be implemented in Symfony: action, partial, component, page and template fragment. Each type is useful for a different kind of data. Enabling each type of cache requires specific settings in configuration files at various levels. For more details, please refer to the Symfony API manual.
In modern web applications most data is dynamic, so great care should be taken when enabling caching for a particular action or page.
In particular, consider the following points before implementing the HTML cache in Symfony.
- A cache lifetime that is too short may not boost performance effectively; similarly, a lifetime that is longer than required may annoy users, since they could be viewing old data.
- The structure of the page should be divided into sections that can be cached and sections that cannot, so that the cached parts can easily be identified and handled.
- Enabling or disabling the cache should not be left to an individual programmer; it should be discussed properly before any change is made.
- Static templates should not be cached, as they do not require extra processing time anyway.
- The cache should be cleared regularly, either automatically or manually.
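The trade-off around cache lifetime and per-section caching can be sketched with a small time-limited fragment cache. This is not Symfony's API: the class `FragmentCache`, its method names and the injectable `clock` parameter are all hypothetical and exist only to illustrate the points above.

```python
import time


class FragmentCache:
    """Caches rendered HTML fragments for a limited lifetime (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expiry time, html)

    def get_or_render(self, key, lifetime_seconds, render):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Still fresh: reuse the stored HTML without rendering again.
            return entry[1]
        html = render()
        self._entries[key] = (now + lifetime_seconds, html)
        return html

    def clear(self):
        # Manual clearing, for example after a deployment or a data import.
        self._entries.clear()
```

Each cacheable section of a page would get its own key and lifetime, while sections that must always be fresh are rendered directly and never pass through the cache.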
#2.3.3 Template Translations cache
Multilingual projects require a separate translation file for each language. These translation files are also cached by Symfony. The cached files are located in /..../PROJECT/cache/i18n/. This cache is handled automatically by Symfony, so nothing needs to be done at the code level.
#2.4 Database (MySQL)
At the database level (specifically in MySQL), a query cache can be used. It stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
- The query cache is extremely useful in an environment where you have tables that do not change very often and for which the server receives many identical queries. This is a typical situation for many web servers that generate many dynamic pages based on database content.
- There are some points to consider before enabling the query cache:
- The query cache does not return stale data. When tables are modified, any relevant entries in the query cache are flushed.
- The query cache does not work in an environment where you have multiple mysqld servers updating the same MyISAM tables.
- The query cache is not used for server-side prepared statements. If you use server-side prepared statements, keep in mind that these statements will not be served from the cache.
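The behaviour described above, where results are keyed by the exact statement text and flushed when a table is modified, can be sketched as follows. This is an application-level illustration, not MySQL's implementation: the class `QueryCache` is hypothetical, `sqlite3` stands in for MySQL so that the example runs on its own, and the caller names the tables involved because the sketch does not parse SQL.

```python
import sqlite3


class QueryCache:
    """Caches SELECT results by statement text; flushes them on writes."""

    def __init__(self, connection):
        self._conn = connection
        self._results = {}   # statement text -> list of rows
        self._by_table = {}  # table name -> set of cached statement texts
        self.hits = 0

    def select(self, statement, tables):
        if statement in self._results:
            # Identical statement text: answer without touching the database.
            self.hits += 1
            return self._results[statement]
        rows = self._conn.execute(statement).fetchall()
        self._results[statement] = rows
        for table in tables:
            self._by_table.setdefault(table, set()).add(statement)
        return rows

    def write(self, statement, table, params=()):
        self._conn.execute(statement, params)
        # Flush every cached result that depends on the modified table.
        for cached in self._by_table.pop(table, set()):
            self._results.pop(cached, None)
```

Because the key is the raw statement text, two queries that differ only in whitespace or letter case are cached separately in this sketch.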
#3 Summary
Gaining a performance boost from caching is tricky. Unless it is used carefully, it will not give the required boost. Since we cannot cache everything (especially dynamic content), we should try to cache whatever is left. This can be achieved with the various types of caching discussed above. Static content is cached well by clients; where it is not, it can be cached by web servers. PHP scripts can be cached using opcode caching software such as APC or XCache, while the static part of dynamic data can be cached by middleware such as the Symfony framework.
However, one drawback of caching is that an exact count of pages served, pages displayed, data transferred and so on becomes almost impossible, which may affect the website's rankings and thus its popularity, revenue generation and so on.
