
9 Jun 2011

Video uploading guide

#1 Introduction

This document provides information about setting up video uploading and streaming for PHP-based websites. It was prepared by studying various resources on the Internet, and the approach it describes is widely used for video uploading, processing and streaming.

#2 Video uploading stages

Video uploading involves 3 stages: uploading the video, processing it for streaming, and streaming it.

#2.1 Uploading video

Video files are generally large, so a separate page/interface is designed to handle the long upload. Through this interface, users send their video to your server, where it is streamed as an attachment to their ad.

#2.2 Processing video

Processing video involves activities such as creating video thumbnails (for promotion, preview etc.), converting the video into formats suitable for various browsers, and extracting metadata from the video for various purposes.

#2.3 Streaming video

In the 3rd stage, the converted videos are streamed through a Flash player or through the browser's built-in media player, if it supports the video type.

#3 Implementation guidelines

These guidelines mainly focus on setting up the web server, because it is the most important part and stays largely the same across most video uploading and streaming projects; processing and streaming are covered in less detail, since they vary from project to project.

#3.1 Uploading video

To upload various types of videos, we first need to set up the web server so that it can accept video files. It is also better to use a separate machine for video uploading, processing and streaming, so that the website that uses those videos does not share the load of video-related operations, which consume a lot of memory and CPU.

In this article I have decided to use Lighttpd 1.5 as the video uploading and streaming server, mainly for 2 reasons:
  1. it is specially designed to serve static content;
  2. it has modules/plugins that report upload progress directly to the calling script, which makes it convenient for developers to build the upload interface with minimal coding.
There are also 2 alternative solutions: Apache + apache-upload-progress-module, and Nginx + nginx-upload-progress-module & nginx-upload-module. However, there is not much feedback available about these 2 solutions, so I decided not to use them and stuck with lighttpd, since it is popular and trusted.

#3.1.1 Installing and configuring Lighttpd

For RPM-based distributions, use the following command to install the packages related to the lighttpd server:

yum install pcre-devel glib2-devel zlib-devel openssl-devel spawn-fcgi php php-cli

Lighttpd 1.5 is not yet available in any yum repository, so we have to compile and configure it manually as shown below:
  • Download lighttpd.
cd /tmp/
wget http://download.lighttpd.net/lighttpd/snapshots-1.5/lighttpd-1.5.0-r2698.tar.gz
tar -zxvf lighttpd-1.5.0-r2698.tar.gz
cd lighttpd-1.5.0
  • Configure and install
./configure --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --with-pcre

make
make install
  • Add the necessary user/group, directories and files
adduser -m -d /var/www -s /sbin/nologin lighttpd
mkdir /etc/lighttpd/
mkdir -p /var/log/lighttpd/
mkdir -p /web/logs/
mkdir -p /web/pages/
mkdir -p /web/video.myproject.com/
chown lighttpd:lighttpd /var/log/lighttpd /web/logs
cp doc/lighttpd.conf /etc/lighttpd/
  • Make changes as per your setup by editing “/etc/lighttpd/lighttpd.conf” file
server.modules = ("mod_rewrite",
                  "mod_access",
                  "mod_status",
                  "mod_uploadprogress",
                  "mod_proxy_core",
                  "mod_proxy_backend_fastcgi",
                  "mod_accesslog"
                )

server.max-request-size = 150000  # in kilobytes; allows uploads of roughly 150 MB
upload-progress.progress-url = "/progress"
upload-progress.remove-timeout = 10

#### mod-proxy-core module
## read mod-proxy-core.txt for more info
## for PHP don't forget to set cgi.fix_pathinfo = 1 in the php.ini
$PHYSICAL["existing-path"] =~ "\.php$" {
  proxy-core.balancer = "round-robin"
  proxy-core.allow-x-sendfile = "enable"
  proxy-core.protocol = "fastcgi"
  proxy-core.backends = ( "unix:/tmp/php-fastcgi.sock" )
  proxy-core.max-pool-size = 16
}

# setup of host specific to video upload.
$HTTP["host"] =~ "video.myproject.com" {
  server.document-root = "/web/video.myproject.com"
  server.errorlog = "/web/logs/video.myproject.com_error.log"
  #accesslog.filename = "/web/logs/video.myproject.com_access.log"
  server.error-handler-404 = "http://www.myproject.com"

  $HTTP["url"]  =~ "^/upload" {
    proxy-core.balancer = "round-robin"
    proxy-core.protocol = "fastcgi"
    proxy-core.allow-x-sendfile = "enable"
    proxy-core.backends = (
      "unix:/tmp/upload_socket_1.sock",
      "unix:/tmp/upload_socket_2.sock",
      #"unix:/tmp/upload_socket_N.sock",
    )
    proxy-core.max-pool-size = 2  # as per backend.
  }
}

In the above setup, when a video is uploaded to the /upload URI, the request is proxied over the FastCGI protocol to one of several dedicated Unix sockets, so that 2 to N uploads can be handled at a time. We do not need to decide which socket is used, since the web server distributes requests among them on its own (round-robin here). You can create more than 2 sockets to handle more concurrent video uploads.
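As a rough model of what `proxy-core.balancer = "round-robin"` does with the upload sockets, the Python sketch below hands requests to the backends in turn. It is an illustration only, not lighttpd's implementation (which, for example, also has to cope with backends that are down); the `RoundRobin` class is hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Hand out backends in turn, starting again at the top after the last one."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._backends)


if __name__ == "__main__":
    balancer = RoundRobin([
        "unix:/tmp/upload_socket_1.sock",
        "unix:/tmp/upload_socket_2.sock",
    ])
    for upload_no in range(1, 5):
        # Uploads 1 and 3 go to socket 1, uploads 2 and 4 to socket 2.
        print(upload_no, balancer.pick())
```

Adding a further `upload_socket_N.sock` entry simply lengthens the cycle.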

Here, the PHP script at /upload contains the code to move/copy the uploaded video file to the desired location, making it available for further processing. It is a normal PHP CGI script containing valid PHP code. Note that copying/renaming requires a file name, so it is better to pass it from the website as a hidden form field, so that this script can rename the video accordingly.
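The upload script in this setup is PHP, where the move itself would be done with `move_uploaded_file()`. The Python sketch below only illustrates the same logic under stated assumptions: the function names, the `incoming_dir` location and the character whitelist are hypothetical. The important point is that the hidden form field is client-controlled, so the name must be reduced to a plain file name before it is used in a path.

```python
import os
import re
import shutil


def safe_video_name(requested: str) -> str:
    """Reduce a client-supplied name to a plain file name with no directory parts."""
    base = os.path.basename(requested.replace("\\", "/"))
    # Keep letters, digits, dot, dash and underscore; replace everything else.
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    if not base:
        raise ValueError("empty or invalid file name")
    return base


def store_upload(tmp_path: str, requested_name: str, incoming_dir: str) -> str:
    """Move the uploaded temp file into incoming_dir under the sanitized name."""
    os.makedirs(incoming_dir, exist_ok=True)
    dest = os.path.join(incoming_dir, safe_video_name(requested_name))
    if os.path.exists(dest):
        raise FileExistsError(dest)  # never silently overwrite another upload
    shutil.move(tmp_path, dest)
    return dest
```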

Note that for URIs other than /upload, the dedicated php-fastcgi.sock socket is used. Also do not forget to rotate the error log :)
  • Verify the configuration syntax by running the following command
lighttpd -t -f /etc/lighttpd/lighttpd.conf
  • Create init.d file “/etc/init.d/lighttpd” as shown below
#!/bin/sh
#
# lighttpd     Startup script for the lighttpd server
#
# chkconfig: - 85 15
# description: Lightning fast webserver with light system requirements
#
# processname: lighttpd
# config: /etc/lighttpd/lighttpd.conf
# config: /etc/sysconfig/lighttpd
# pidfile: /var/run/lighttpd.pid
#
# Note: pidfile is assumed to be created
# by lighttpd (config: server.pid-file).

# Source function library
. /etc/rc.d/init.d/functions

if [ -f /etc/sysconfig/lighttpd ]; then
  . /etc/sysconfig/lighttpd
fi

if [ -z "$LIGHTTPD_CONF_PATH" ]; then
  LIGHTTPD_CONF_PATH="/etc/lighttpd/lighttpd.conf"
fi

prog="lighttpd"
lighttpd="/usr/sbin/lighttpd"
RETVAL=0

start() {
  echo -n $"Starting $prog: "
  daemon $lighttpd -f $LIGHTTPD_CONF_PATH
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog
  /usr/bin/spawn-fcgi -s /tmp/php-fastcgi.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/spawn-fcgi.pid
  /usr/bin/spawn-fcgi -s /tmp/upload_socket_1.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_1.pid
  /usr/bin/spawn-fcgi -s /tmp/upload_socket_2.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_2.pid
  # /usr/bin/spawn-fcgi -s /tmp/upload_socket_N.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_N.pid
  return $RETVAL
}

stop() {
  echo -n $"Stopping $prog: "
  killproc $lighttpd
  killproc php-cgi
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$prog /tmp/php-fastcgi.sock \
    /var/run/spawn-fcgi.pid /tmp/upload_socket_1.sock /var/run/upload_socket_1.pid \
    /tmp/upload_socket_2.sock /var/run/upload_socket_2.pid
  return $RETVAL
}

reload() {
  echo -n $"Reloading $prog: "
  killproc $lighttpd -HUP
  RETVAL=$?
  echo
  return $RETVAL
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    stop
    start
    ;;
  condrestart)
    if [ -f /var/lock/subsys/$prog ]; then
      stop
      start
    fi
    ;;
  reload)
    reload
    ;;
  status)
    status $lighttpd
    RETVAL=$?
    ;;
  *)
  echo $"Usage: $0 {start|stop|restart|condrestart|reload|status}"
  RETVAL=1
esac

exit $RETVAL

In the above init.d file, I have merged the creation of the spawn-fcgi processes with the lighttpd process, because without the “spawn-fcgi” processes your PHP scripts cannot receive requests from the web server.
  • Start lighttpd service
chmod +x /etc/init.d/lighttpd
/etc/init.d/lighttpd start

#3.1.2 Creating interface to upload videos

To create the upload interface on your website, follow this example. It explains how to create the HTML form, jQuery-based JS code and some basic stylesheets. Do not forget to validate the video file name by its extension. If you face cross-domain issues, follow this native example using an iframe.
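Checking the extension in the browser gives users quick feedback, but the same check should be repeated on the server, since a browser-side check is easy to bypass. A minimal sketch in Python, assuming a hypothetical whitelist that you would adjust to the formats your converter accepts:

```python
import os

# Hypothetical whitelist; adjust it to the formats your converter accepts.
ALLOWED_VIDEO_EXTENSIONS = {".avi", ".flv", ".mov", ".mp4", ".mpeg", ".mpg", ".wmv", ".3gp"}


def has_video_extension(filename: str) -> bool:
    """Return True if the file name ends with a whitelisted video extension."""
    _, ext = os.path.splitext(filename.strip())
    return ext.lower() in ALLOWED_VIDEO_EXTENSIONS
```

Only the last extension counts, so a name like `clip.mp4.php` is rejected. An extension check says nothing about the file's actual content.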

#3.2 Processing video

Once the video is copied/moved to the desired location, it needs to be processed by a script for various purposes, such as converting the video, extracting metadata and creating thumbnails, to prepare it for streaming.

This should be done by a separate processing script; let's call it “process.php”. We also need other software for processing. Install it on the video server using the following command:

yum install ffmpeg flvtool2 compat-readline5 php-gd php-devel libaio-devel

The process.php script is scheduled in crontab to run every minute, so that newly uploaded videos are processed as quickly as possible.
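Cron starts a new run every minute whether or not the previous one has finished, so a long conversion could otherwise be picked up twice. The Python sketch below (standing in for process.php) shows one way to structure a single cron tick, assuming uploads land in one directory and converted files are written to another as `<name>.flv`; the directory layout, lock file and function names are hypothetical. It uses `fcntl`, so it is Unix-only.

```python
import fcntl
import os


def pending_videos(incoming_dir: str, output_dir: str) -> list:
    """Uploads in incoming_dir that have no converted <name>.flv in output_dir yet."""
    pending = []
    for name in sorted(os.listdir(incoming_dir)):
        stem, _ext = os.path.splitext(name)
        if not os.path.exists(os.path.join(output_dir, stem + ".flv")):
            pending.append(os.path.join(incoming_dir, name))
    return pending


def run_once(incoming_dir: str, output_dir: str, lock_path: str, process) -> int:
    """One cron tick: exit at once if the previous run still holds the lock."""
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock; it is released when the file is closed.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return 0
        handled = 0
        for path in pending_videos(incoming_dir, output_dir):
            process(path)  # convert, create thumbnails, extract metadata...
            handled += 1
        return handled
```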

#3.2.1 Converting video

A user might upload videos from any source, so there is no guarantee that a video can be played in every browser, since not all browsers support all codecs. Hence we must convert the uploaded video into a common format. We decided to use the Flash Video (FLV) format.

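Run the following command from your PHP script to convert the video into FLV format:

ffmpeg -i INPUT_VIDEO -ar 22050 -ab 32k -ac 1 -f flv -b 700k -r 15 -s FRAME_SIZE - 2>/dev/null | flvtool2 -U stdin OUTPUT_VIDEO.flv > /dev/null

In the above command, ffmpeg writes the FLV stream to standard output and flvtool2 reads it from standard input to embed the keyframe metadata needed for streaming. In your script, you will need to set FRAME_SIZE: the -s option takes a WIDTHxHEIGHT frame size (e.g. 480x360), which should match the aspect ratio of the source video.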
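Since -s takes a frame size rather than a ratio, the script has to derive one from the source dimensions (which the metadata step can provide). The Python sketch below picks a size no wider than an assumed 480 pixels, keeps the aspect ratio, rounds to even numbers (which many encoders expect), and runs the same ffmpeg | flvtool2 pipeline without a shell. The 480-pixel limit and the function names are assumptions; the option names follow the command above (newer ffmpeg releases spell the bitrates -b:v and -b:a).

```python
import subprocess


def frame_size(src_w: int, src_h: int, max_w: int = 480) -> str:
    """Scale to at most max_w wide, keep the aspect ratio, force even dimensions."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("invalid source size")
    w = min(max_w, src_w)
    h = round(src_h * w / src_w)
    # Round both dimensions down to even numbers.
    w -= w % 2
    h -= h % 2
    return f"{w}x{h}"


def flv_pipeline(input_path: str, output_path: str, size: str):
    """Argument lists for `ffmpeg ... - | flvtool2 -U stdin OUTPUT`."""
    ffmpeg = ["ffmpeg", "-i", input_path, "-ar", "22050", "-ab", "32k", "-ac", "1",
              "-f", "flv", "-b", "700k", "-r", "15", "-s", size, "-"]
    flvtool2 = ["flvtool2", "-U", "stdin", output_path]
    return ffmpeg, flvtool2


def run_pipeline(ffmpeg_args, flvtool2_args) -> int:
    """Run the two commands connected by a pipe; return the first non-zero exit code."""
    producer = subprocess.Popen(ffmpeg_args, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL)
    consumer = subprocess.Popen(flvtool2_args, stdin=producer.stdout,
                                stdout=subprocess.DEVNULL)
    producer.stdout.close()  # so ffmpeg gets SIGPIPE if flvtool2 exits early
    consumer.wait()
    return producer.wait() or consumer.returncode
```

Usage would look like `run_pipeline(*flv_pipeline("in.avi", "out.flv", frame_size(1280, 720)))`.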

#3.2.2 Creating thumbnails

To extract a thumbnail from a video file, the following command can be used:

ffmpeg -itsoffset -4 -i VIDEO_FILE -vcodec mjpeg -vframes 1 -an -f rawvideo -s 320x240 OUTPUT.jpg

This command generates a 320×240 JPG thumbnail from the 4th second of the video (-vframes 1 writes a single frame, and -an drops the audio). You can use this example to create thumbnails at random points according to the length of the video.
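To create thumbnails at random points, the script needs the video's duration (for example from the metadata step) and can then pick offsets away from the very start and end of the clip. The Python sketch below does that and builds the same command with the offset substituted, using `mjpeg` as the codec since the output is a JPEG image. The two-second margin, the fallback for very short clips and the function names are assumptions.

```python
import random


def thumbnail_offsets(duration: float, count: int = 3, margin: float = 2.0, rng=None):
    """Pick `count` sorted random offsets (seconds), `margin` seconds from both ends."""
    rng = rng or random.Random()
    lo, hi = margin, duration - margin
    if hi <= lo:
        # Clip too short for the margin: use its midpoint instead.
        return [duration / 2] * count
    return sorted(rng.uniform(lo, hi) for _ in range(count))


def thumbnail_command(video: str, offset: float, output: str, size: str = "320x240"):
    """The thumbnail command above, with the 4-second offset replaced by `offset`."""
    return ["ffmpeg", "-itsoffset", f"-{offset:.1f}", "-i", video, "-vcodec", "mjpeg",
            "-vframes", "1", "-an", "-f", "rawvideo", "-s", size, output]
```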

#3.2.3 Extract metadata

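To extract metadata, the following command can be used:

ffmpeg -i INPUT_VIDEO

It prints a lot of metadata about the video (duration, bitrate, codecs, frame size, etc.) in text format, which can be stored in a database or used while streaming the video. Note that ffmpeg writes this information to standard error, so redirect it (2>&1) when capturing it from a script.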
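The report is free-form text, so a script typically extracts only the fields it needs with regular expressions. The Python sketch below pulls out the duration and the frame size. The regular expressions are assumptions based on the usual `Duration: HH:MM:SS.ss` and `Video: ..., WIDTHxHEIGHT` lines; the exact layout differs between ffmpeg versions, so test them against your build's output.

```python
import re
import subprocess

DURATION_RE = re.compile(r"Duration: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)")
# First WIDTHxHEIGHT pair on the "Video:" line ('.' does not cross newlines).
SIZE_RE = re.compile(r"Video: .*?\b(\d{2,5})x(\d{2,5})\b")


def parse_ffmpeg_info(text: str) -> dict:
    """Extract duration (seconds) and video frame size from `ffmpeg -i` output."""
    info = {"duration": None, "width": None, "height": None}
    m = DURATION_RE.search(text)
    if m:
        hours, minutes, seconds = m.groups()
        info["duration"] = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    m = SIZE_RE.search(text)
    if m:
        info["width"], info["height"] = int(m.group(1)), int(m.group(2))
    return info


def probe(path: str) -> dict:
    """Run `ffmpeg -i` and parse its report, which ffmpeg prints on stderr."""
    proc = subprocess.run(["ffmpeg", "-i", path], capture_output=True, text=True)
    return parse_ffmpeg_info(proc.stderr)
```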

#3.3 Streaming video

Streaming video requires support from the web server, JavaScript, a Flash player and some HTML work.

#3.3.1 Preparing server

The Lighttpd server has built-in support for streaming video files. To support streaming from keyframes, enable the required module in the server configuration file “/etc/lighttpd/lighttpd.conf”:

server.modules += ( "mod_flv_streaming" )
flv-streaming.extensions = ( ".flv" )

Restart the web server for the above changes to take effect. The server is now ready to stream FLV files with keyframe support.
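With the keyframe metadata that flvtool2 embeds, a player that wants to start at, say, second 40 looks up the byte position of the nearest preceding keyframe and requests the file with a `?start=` byte offset; the server then answers with an FLV header followed by the file from that offset. The Python sketch below models both halves of this pseudo-streaming technique. It is not lighttpd's source: the header flags (audio + video) and the function names are assumptions, and mod_flv_streaming's exact bytes may differ.

```python
import bisect

# 9-byte FLV file header ("FLV", version 1, flags 0x05 = audio + video,
# header size 9) followed by the 4-byte PreviousTagSize0 field, which is 0.
FLV_HEADER = b"FLV\x01\x05\x00\x00\x00\x09" + b"\x00\x00\x00\x00"


def keyframe_offset(times, positions, seek_seconds):
    """Byte position of the last keyframe at or before seek_seconds.

    `times` and `positions` are the parallel keyframe lists (times and
    filepositions) stored in the onMetaData tag; `times` must be sorted.
    """
    i = bisect.bisect_right(times, seek_seconds) - 1
    return positions[max(i, 0)]


def flv_response(data: bytes, start: int) -> bytes:
    """Body for `GET /video.flv?start=N`: a fresh header plus the file from byte N."""
    if start <= 0 or start >= len(data):
        return data  # no usable offset: send the whole file
    return FLV_HEADER + data[start:]
```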

#3.3.2 Streaming through HTML5

There are 2 ways to stream video: using HTML5's native “video” tag, or using a Flash player as a container.

Streaming video through HTML5 is as easy as showing an image in the browser, but unfortunately not all browsers in use support HTML5 video, since support has arrived only recently (Internet Explorer, for example, gained it only with IE9 in 2011). Moreover, even when a browser supports HTML5 video, not every browser supports every codec (another round of the browser wars). Hence, if a user uploads a video encoded with the H.264 codec, it will not be played in Firefox (and Google has announced plans to drop H.264 from Chrome). Similarly, a video encoded with the Theora codec will not be played in IE. More information about this situation can be found here.

If you still decide to use HTML5, the following HTML tag can be used:

<video src="movie.mpeg" controls="controls">
Fallback flash player based video streaming code.
</video>

That's it: this way a video file can be played without any extra JS code, provided the browser supports the video file's codec natively.

#3.3.3 Streaming through Flash player

Unfortunately, the standard solution is a Flash-player-based streaming method, which streams the video inside a Flash container. That is why we converted the video into “flv” format earlier :). Because the Flash player is installed in almost all browsers, an FLV video plays the same way everywhere, regardless of which codecs the browser itself supports.

To stream video using a Flash container, follow this excellent tutorial.

#4 Improvements
  1. In this article, I have not discussed real-time video format validation to prevent users from uploading junk files.
  2. Since the video server mostly serves video files and occasionally JS and HTML, you should deny access to all other file types.
  3. When you need more video processing and streaming features, you will need wrapper classes like phpvideotoolkit or the native ffmpeg-php extension.
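For the first point, a cheap first step is to look at the file's leading "magic" bytes instead of trusting its name. The Python sketch below checks a few well-known container signatures. The table is deliberately small and not exhaustive (older QuickTime files, for example, may not start with an `ftyp` box), and a match does not prove the file is playable, so running it through ffmpeg remains the real test. The function name is hypothetical.

```python
# Leading "magic" bytes of some common video containers: (offset, signature).
SIGNATURES = {
    "flv": [(0, b"FLV")],
    "avi": [(0, b"RIFF"), (8, b"AVI ")],
    "mp4/mov/3gp": [(4, b"ftyp")],
    "mpeg-ps": [(0, b"\x00\x00\x01\xba")],
    "matroska/webm": [(0, b"\x1a\x45\xdf\xa3")],
    "asf/wmv": [(0, b"\x30\x26\xb2\x75\x8e\x66\xcf\x11")],
}


def sniff_container(path: str):
    """Return the container whose signature matches the start of the file, else None."""
    with open(path, "rb") as f:
        head = f.read(16)
    for name, checks in SIGNATURES.items():
        if all(head[off:off + len(sig)] == sig for off, sig in checks):
            return name
    return None
```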
#5 Resources

http://flowplayer.org/plugins/streaming/pseudostreaming.html
http://en.wikipedia.org/wiki/Flash_Video#Format_details
http://praegnanz.de/html5video/
http://uakino.net/media/document/1009.pdf
http://diveintohtml5.info/video.html

6 Sept 2006

Various types of caching

    #1 Caching overview
      Caching is one of the most important techniques for serving content faster and reducing the response time of your web sites. There are several caching mechanisms that can be used to cache HTML, media, raw scripts, compiled scripts etc. Let's understand what can be cached at which level from the diagram below.


      A request traverses 3 components: the client, the Internet and the server environment. Various caching mechanisms can be implemented at each component according to requirements.
      1. At the client level, browsers are good candidates for caching static media like CSS, images, JS, videos etc. Most browsers take such data from their cache instead of requesting a fresh copy from the server. They handle caching of such media automatically, so no special instructions or headers are required to tell the browser how to cache it. However, browsers tend to cache almost everything they render, so special HTTP headers are sent to tell them not to cache certain items (PHP usually handles this automatically, e.g. it sends no-cache headers when a session is started). The browser cache can also be managed by the web server, by sending certain HTTP headers, if browsers are not behaving in the normal way. For that, a web server module like mod_expires (for Apache) is used to send cache-control-related HTTP headers. More details about mod_expires can be found in a later section.
      2. The Internet is the intermediate layer between client and server, which includes gateways, ISPs, proxies and several other components. At this level, ISP and proxy caches can cache pages, but since they are controlled by neither the client nor the server, they are not a good caching option.
      3. The server environment is made of components like the Apache web server, the PHP scripting language, middleware like frameworks and databases like MySQL. Each component has its own caching mechanisms to boost the performance of the website. Let's discuss each in detail.
      #2 Levels of caching in server environment
      #2.1 Web server (Apache)
      Certain web servers like Apache, Lighttpd etc. provide built-in modules for caching content at the web server level. For Apache, they are mod_cache, mod_disk_cache, mod_expires, mod_file_cache and mod_mem_cache.

      #2.1.1 mod_cache

      This Apache module is required to implement caching of HTTP content. It is always used together with another caching module such as mod_disk_cache or mod_mem_cache, depending on requirements. Detailed information about this module can be found here.

      #2.1.2 mod_disk_cache

      This module caches content using a disk-based storage manager, i.e. cached files are stored on disk under URI-based keys, and when the same URI is requested again, the content is served directly from the cache. This is similar to how a browser caches an entire page on the local disk; the only difference is that the content is stored on the server instead of the client. Detailed information about this module can be found here.
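The idea of URI-based keys can be sketched in a few lines of Python: hash the URI to get a file name, serve the file if it exists, otherwise fall through to the backend and store the result. This is a model of the concept only; Apache's mod_disk_cache also spreads its files over subdirectories (see its CacheDirLevels and CacheDirLength directives), but its real file naming differs and it stores headers and expiry information alongside the body, which this sketch ignores. The class name is hypothetical.

```python
import hashlib
import os


class DiskCache:
    """Store response bodies on disk under a file name derived from the URI."""

    def __init__(self, root: str):
        self.root = root

    def _path(self, uri: str) -> str:
        digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
        # Two directory levels keep any single directory from growing too large.
        return os.path.join(self.root, digest[0], digest[1], digest)

    def get(self, uri: str):
        """Return the cached body, or None on a miss (go to the backend)."""
        try:
            with open(self._path(uri), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def put(self, uri: str, body: bytes) -> None:
        path = self._path(uri)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(body)
        os.replace(tmp, path)  # readers never see a half-written entry
```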

      #2.1.3 mod_file_cache

      This module provides two techniques for caching frequently requested static files. Through configuration directives, mod_file_cache can be directed to either open then mmap() a file, or to pre-open a file and save the file's open file handle. Both techniques reduce server load when processing requests for these files by doing part of the work (specifically, the file I/O) for serving the file when the server is started rather than during each request.

      Not all platforms support both techniques, so you have to find out which technique works in your environment.

      This mmap()ing is done only once, at server start or restart. So whenever one of the mapped files changes on the filesystem, the web server has to be restarted. If the files are modified in place without restarting the server, requests may be answered with stale or even corrupted content. Hence files should be updated by unlinking the old copy and putting a new copy in place; most tools such as rdist and mv do this. This module does not track changes to the files because that would need an extra stat() on every request, which is wasteful and against the intent of reducing I/O.
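The "put a new copy in place" update can be sketched in Python with a temporary file in the same directory followed by an atomic rename (`os.replace`, the equivalent of `mv`). The function names are hypothetical. The demo also shows why a restart is needed: on POSIX systems a mapping of the old file keeps seeing the old bytes after the rename.

```python
import mmap
import os
import tempfile


def replace_file(path: str, data: bytes) -> None:
    """Write a new copy next to the old file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same filesystem, so the rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # the old inode is unlinked, not overwritten
    except BaseException:
        os.unlink(tmp)
        raise


def demo() -> None:
    """An existing mapping keeps the old content after replace_file()."""
    path = os.path.join(tempfile.mkdtemp(), "global.conf")
    with open(path, "wb") as f:
        f.write(b"old")
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        replace_file(path, b"new")
        with open(path, "rb") as fresh:
            print(mapped[:], fresh.read())  # on Linux: b'old' b'new'
```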

      For certain systems, configuration and global files are ideal candidates for this module because they do not change frequently. This mechanism should not be used to cache media files, as they can be cached effectively on the client side. Detailed information about this module can be found here.

      #2.1.4 mod_mem_cache

      As opposed to mod_disk_cache, mod_mem_cache implements memory-based caching of content, which gives faster access to cached content than disk. This module can be used in 2 ways:
      1. by caching open file descriptors, or
      2. by caching objects in heap storage.
      This module is most useful when it is used to cache locally generated content, or to cache backend server content for mod_proxy configured as a reverse proxy. Content is stored in and retrieved from the cache using URI-based keys. Note that the cache lives in the memory of each server process, so it is not suitable for sharing data such as sessions across processes or services. Detailed information about this module can be found here.

      #2.1.5 mod_expires

      This module controls the setting of the Expires HTTP header and the max-age directive of the Cache-Control HTTP header in server responses. The expiration date can be set relative to either the time the source file was last modified or the time of the client access.

      These HTTP headers are an instruction to the client about the document's validity and persistence. If cached, the document may be fetched from the cache rather than from the source until this time has passed. After that, the cache copy is considered "expired" and invalid, and a new copy must be obtained from the source.

      This module has no effect when caching is turned off in the browser, because it only sends headers to the client and nothing is cached on the server side. Since most browsers automatically handle caching of static media, this module may not be as useful as it seems. Detailed information about this module can be found here.
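The two headers for a rule such as mod_expires' `ExpiresDefault "access plus 1 hour"` (or a `modification plus ...` rule) can be computed as in the Python sketch below. It is a model of the idea, not Apache's code: the function name is hypothetical, lifetimes are given in seconds, and clamping an already-expired max-age to 0 is an assumption.

```python
from email.utils import formatdate


def expires_headers(now: float, mtime: float, lifetime: int, base: str = "access") -> dict:
    """Expires and Cache-Control headers for 'access/modification plus N seconds'.

    `now` (time of the request) and `mtime` (file modification time) are Unix timestamps.
    """
    if base == "access":
        expires = now + lifetime
    elif base == "modification":
        expires = mtime + lifetime
    else:
        raise ValueError("base must be 'access' or 'modification'")
    max_age = max(0, int(expires - now))  # seconds the client may reuse its copy
    return {
        "Expires": formatdate(expires, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }
```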

      #2.1.6 Reverse proxy (Varnish)


      #2.2 Scripting language (PHP)

      #2.2.1 Opcode/Bytecode cache

      Opcode/bytecode caching means caching your PHP scripts in their compiled state, so that when a new request arrives for the same script, the cache software serves the compiled version directly from the cache rather than reading the file from disk and compiling it again. Some examples of opcode caching software are Zend Platform, APC (APC GUI), XCache, eAccelerator, ionCube Encoder and PHP Accelerator. Of these 6, eAccelerator, XCache and APC are the most widely used. These benchmarks also show how XCache and APC perform better than the others.
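Opcode caches are C extensions inside PHP, but the principle is easy to model in Python with the built-in `compile()`: keep the compiled code per file and recompile only when the file's modification time changes. APC, for example, performs a similar stat() check on each request unless `apc.stat` is turned off. The class below is a hypothetical illustration, not how any of the listed products is implemented.

```python
import os


class CompiledCache:
    """Compile a script once and reuse the code object until the file changes."""

    def __init__(self):
        self._cache = {}  # path -> (mtime in ns, code object)
        self.compilations = 0

    def get(self, path: str):
        mtime = os.stat(path).st_mtime_ns  # the per-request stat() check
        hit = self._cache.get(path)
        if hit and hit[0] == mtime:
            return hit[1]  # served from cache: no file read, no compile
        with open(path, encoding="utf-8") as f:
            code = compile(f.read(), path, "exec")
        self.compilations += 1
        self._cache[path] = (mtime, code)
        return code
```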

      #2.3 Middleware (Symfony)

      The Symfony framework provides 3 types of data caching: HTML cache of output, cache of configuration scripts and cache of translated templates. The last 2 mechanisms are automatic and handled by Symfony directly.

      #2.3.1 HTML cache

      HTML cache is nothing but the output of a script that is sent to the browser for display. A web page is made of many sections, which are mostly designed in separate templates and files. Hence, depending on the type of output, different types of HTML caching can be enabled in the Symfony environment. Caching is enabled by changing the cache setting from off to on in the settings.yml file, as shown below.

      prod:
      dev:
        .settings:
          cache : on

      There are 5 types of HTML cache that can be implemented in Symfony. They are Action, Partial, Component, Page and Fragment of template. Each type is useful depending upon type of data. Enabling each type of cache requires specific settings in configuration files at various levels. For more details, please refer to Symfony API manual.

      In modern web applications most data is dynamic, so great care should be taken when enabling caching for a particular action or page.

      The following points should be considered before implementing HTML cache in Symfony.
      1. Setting too short a cache lifetime may not boost performance effectively; similarly, setting a longer lifetime than required may annoy users, since they could be viewing stale data.
      2. The page should be divided into sections that can and cannot be cached, so that the cached parts can easily be identified and handled.
      3. Enabling or disabling the cache should not depend entirely on any single programmer; instead, it should be discussed well before any change is made.
      4. Static templates should not be cached, as they do not require extra processing time anyway.
      5. The cache should be cleared regularly, either automatically or manually.
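Symfony configures its HTML cache declaratively (per action, partial or component, with a lifetime) rather than through hand-written code, but the trade-offs in points 1, 2 and 5 are easy to see in a small Python model: each cacheable section gets a key and a lifetime, and the cache can be cleared explicitly. The class and the injectable clock are assumptions made for illustration.

```python
import time


class FragmentCache:
    """Cache rendered HTML fragments by key for a fixed lifetime (seconds)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (expires_at, html)

    def get_or_render(self, key: str, lifetime: float, render):
        """Return the cached fragment if still fresh, otherwise render and store it."""
        now = self._clock()
        item = self._items.get(key)
        if item and item[0] > now:
            return item[1]
        html = render()
        self._items[key] = (now + lifetime, html)
        return html

    def clear(self) -> None:
        """Drop everything, e.g. after a deployment or a data change."""
        self._items.clear()
```

A short lifetime means `render()` runs often (little gain); a long one means users may see stale HTML until the entry expires or `clear()` is called.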
      #2.3.3 Template Translations cache

      Multilingual projects require separate language files for each language. These translation files are also cached by Symfony, in /..../PROJECT/cache/i18n/. This cache is handled automatically by Symfony, so nothing needs to be done at the code level.

      #2.4 Database (MySQL)

      At the database level (especially in MySQL), a query cache can be used, which stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
      The query cache is extremely useful in an environment where tables do not change very often and the server receives many identical queries. This is a typical situation for web servers that generate many dynamic pages based on database content.
      There are some points to be considered before implementing the query cache:
      • The query cache does not return stale data. When tables are modified, any relevant entries in the query cache are flushed.
      • The query cache does not work in an environment where multiple mysqld servers update the same MyISAM tables.
      • The query cache is not used for server-side prepared statements. If you are using server-side prepared statements, these statements won't be served from the cache.
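Exact-text matching and flushing on table changes can be modelled in a few lines of Python. This is a toy model, not MySQL's implementation: MySQL finds the tables by parsing the statement itself, whereas here the caller passes them in, and the class name is hypothetical.

```python
class QueryCache:
    """Toy query cache: exact-text keys, flushed per table on modification."""

    def __init__(self):
        self._results = {}   # statement text -> result rows
        self._by_table = {}  # table name -> statements that read it

    def get(self, sql: str):
        # Keys are the exact statement text, so "SELECT" and "select" differ.
        return self._results.get(sql)

    def store(self, sql: str, tables, result) -> None:
        self._results[sql] = result
        for table in tables:
            self._by_table.setdefault(table, set()).add(sql)

    def table_modified(self, table: str) -> None:
        """Flush every cached statement that read `table`, so no stale data is returned."""
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)
```

As in MySQL, a statement that differs only in letter case or spacing is a cache miss, since queries must be byte-for-byte identical to match.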
      #3 Summary

      Gaining a performance boost from caching is tricky: unless it is used carefully, it cannot give the required boost. Since we can't cache everything (especially dynamic content), we should cache whatever can be cached. This can be achieved by the various types of caching discussed above. Static content is well cached by clients, and if not, it can be cached by web servers. PHP scripts can be cached using opcode caching software like APC, XCache etc., while the static parts of dynamic pages can be cached by middleware like the Symfony framework.

      However, the only drawback of caching is that exact counting of pages served and displayed, data transferred etc. becomes almost impossible, which may affect the website's rankings and thus its popularity, revenue generation etc.