

9 Jun 2011

Video uploading guide

#1 Introduction

This document provides information about setting up video uploading and streaming for PHP-based websites. It was prepared by studying various resources on the Internet, and the approach it describes is tried and tested and close to a de facto standard for video uploading, processing and streaming.

#2 Video uploading stages

Video uploading involves three stages: uploading the video, processing it for streaming, and streaming it.

#2.1 Uploading video

Video files are generally large, so a separate page/interface is designed to handle the long upload process. Through this interface, users send their videos to your server to be streamed, for example alongside their ad.

#2.2 Processing video

Processing a video involves activities such as creating video thumbnails (for promotion, preview, etc.), converting the video into formats suitable for various browsers, and extracting metadata from the video for various purposes.

#2.3 Streaming video

In the third stage, the converted videos are streamed through a Flash player, or through the browser's built-in media player if it supports those video types.

#3 Implementation guidelines

These guidelines mainly focus on setting up the web server, because it is the most important part and remains almost the same for most video uploading and streaming purposes; processing and streaming are less critical here since they vary from project to project.

#3.1 Uploading video

To upload various types of videos, we first need to set up the web server so that it can accept video files. It is also better to use a separate machine for video uploading, processing and streaming, so that the website that uses those videos does not share the load of video-related operations, which consume a lot of memory and CPU.

In this article I have decided to use Lighttpd 1.5 as the video uploading and streaming server, mainly for two reasons:
  1. it is specially designed to serve static content,
  2. it has modules/plugins that report upload progress directly to the calling script, which makes it convenient for developers to build the upload interface with minimal coding.

There are also two alternative solutions: Apache + apache-upload-progress-module, and Nginx + nginx-upload-progress-module & nginx-upload-module. However, there was not much feedback available about these two solutions, so I decided not to use them and stuck with lighttpd, since it is popular and trusted.

#3.1.1 Installing and configuring Lighttpd

For RPM-based distributions, use the following command to install the packages needed to build lighttpd and run PHP through FastCGI:

yum install pcre-devel glib2-devel zlib-devel openssl-devel spawn-fcgi php php-cli

Lighttpd 1.5 is not yet available in any yum repository, so we have to compile and configure it manually as shown below:
  • Download lighttpd.
cd /tmp/
wget http://download.lighttpd.net/lighttpd/snapshots-1.5/lighttpd-1.5.0-r2698.tar.gz
tar -zxvf lighttpd-1.5.0-r2698.tar.gz
cd lighttpd-1.5.0
  • Configure and install
./configure --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --with-pcre

make
make install
  • Add the necessary user/group, directories and files (the directories below match the paths used in lighttpd.conf)
adduser -m -d /var/www -s /sbin/nologin lighttpd
mkdir /etc/lighttpd/
mkdir -p /var/log/lighttpd/ /web/logs/ /web/pages/ /web/video.myproject.com/
chown lighttpd:lighttpd /var/log/lighttpd /web/logs
cp doc/lighttpd.conf /etc/lighttpd/
  • Make changes as per your setup by editing “/etc/lighttpd/lighttpd.conf” file
server.modules = ( "mod_rewrite",
                   "mod_access",
                   "mod_status",
                   "mod_uploadprogress",
                   "mod_proxy_core",
                   "mod_proxy_backend_fastcgi",
                   "mod_accesslog"
                 )

server.max-request-size = 150000  # in kilobytes: approx. 146 MB per upload
upload-progress.progress-url = "/progress"
upload-progress.remove-timeout = 10

#### mod-proxy-core module
## read mod-proxy-core.txt for more info
## for PHP don't forget to set cgi.fix_pathinfo = 1 in the php.ini
$PHYSICAL["existing-path"] =~ "\.php$" {
  proxy-core.balancer = "round-robin"
  proxy-core.allow-x-sendfile = "enable"
  proxy-core.protocol = "fastcgi"
  proxy-core.backends = ( "unix:/tmp/php-fastcgi.sock" )
  proxy-core.max-pool-size = 16
}

# setup of host specific to video upload.
$HTTP["host"] =~ "video.myproject.com" {
  server.document-root = "/web/video.myproject.com"
  server.errorlog = "/web/logs/video.myproject.com_error.log"
  #accesslog.filename = "/web/logs/video.myproject.com_access.log"
  server.error-handler-404 = "http://www.myproject.com"

  $HTTP["url"]  =~ "^/upload" {
    proxy-core.balancer = "round-robin"
    proxy-core.protocol = "fastcgi"
    proxy-core.allow-x-sendfile = "enable"
    proxy-core.backends = (
      "unix:/tmp/upload_socket_1.sock",
      "unix:/tmp/upload_socket_2.sock",
      #"unix:/tmp/upload_socket_N.sock",
    )
    proxy-core.max-pool-size = 2  # as per backend.
  }
}

In the above setup, when a video is uploaded to the /upload URI, the request is proxied over the FastCGI protocol to one of several dedicated Unix sockets, so that 2 to N uploads can be handled at a time. We do not need to worry about which socket gets used, since the web server's round-robin balancer picks one on its own. You can create more than two sockets to handle more concurrent video uploads.

Here the PHP script behind /upload contains the code that moves/copies the uploaded video file to the desired location, making it available for further processing. It is a normal PHP script run through FastCGI. Since copying/renaming needs a file name, it is better to pass the name from the website as a hidden form variable so that this script can rename the video accordingly.
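The upload script itself is PHP (where basename(), preg_replace() and move_uploaded_file() would do the job), but the naming logic is language-independent. Below is a minimal Python sketch of it, assuming the website sends the desired name in a hidden form field; the INCOMING_DIR path and both function names are hypothetical. The point it shows is that a name coming from the form must be reduced to a plain file name before use, so a crafted value cannot escape the target directory.

```python
import os
import re

# Hypothetical directory where the upload script leaves videos for process.php.
INCOMING_DIR = "/web/videos/incoming"

def safe_video_name(requested_name: str) -> str:
    """Reduce a name taken from a hidden form field to a safe file name."""
    # Treat backslashes as separators too, then keep only the last path part,
    # so names like "../../etc/passwd" cannot escape the target directory.
    base = os.path.basename(requested_name.replace("\\", "/"))
    # Replace anything outside a small, safe character set.
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Leading dots would create hidden files (or "." / "..").
    base = base.lstrip(".")
    if not base:
        raise ValueError("empty or invalid video name")
    return base

def destination_path(requested_name: str, incoming_dir: str = INCOMING_DIR) -> str:
    """Full path the uploaded temporary file should be moved to."""
    return os.path.join(incoming_dir, safe_video_name(requested_name))
```

For example, a hidden field value of "my ad 42.avi" ends up as "my_ad_42.avi" inside the incoming directory.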

Note that for URIs other than /upload, the dedicated php-fastcgi.sock socket is used. Also do not forget to rotate the error log :)
  • Verify the configuration by running the following command
lighttpd -t -f /etc/lighttpd/lighttpd.conf
  • Create init.d file “/etc/init.d/lighttpd” as shown below
#!/bin/sh
#
# lighttpd     Startup script for the lighttpd server
#
# chkconfig: - 85 15
# description: Lightning fast webserver with light system requirements
#
# processname: lighttpd
# config: /etc/lighttpd/lighttpd.conf
# config: /etc/sysconfig/lighttpd
# pidfile: /var/run/lighttpd.pid
#
# Note: pidfile is assumed to be created
# by lighttpd (config: server.pid-file).

# Source function library
. /etc/rc.d/init.d/functions

if [ -f /etc/sysconfig/lighttpd ]; then
  . /etc/sysconfig/lighttpd
fi

if [ -z "$LIGHTTPD_CONF_PATH" ]; then
  LIGHTTPD_CONF_PATH="/etc/lighttpd/lighttpd.conf"
fi

prog="lighttpd"
lighttpd="/usr/sbin/lighttpd"
RETVAL=0

start() {
  echo -n $"Starting $prog: "
  daemon $lighttpd -f $LIGHTTPD_CONF_PATH
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog
  /usr/bin/spawn-fcgi -s /tmp/php-fastcgi.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/spawn-fcgi.pid
  /usr/bin/spawn-fcgi -s /tmp/upload_socket_1.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_1.pid
  /usr/bin/spawn-fcgi -s /tmp/upload_socket_2.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_2.pid
  # /usr/bin/spawn-fcgi -s /tmp/upload_socket_N.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_N.pid
  return $RETVAL
}

stop() {
  echo -n $"Stopping $prog: "
  killproc $lighttpd
  RETVAL=$?          # report lighttpd's stop status, not php-cgi's
  killproc php-cgi
  echo
  # One command continued with backslashes: remove lock, sockets and pid files.
  [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$prog /tmp/php-fastcgi.sock \
    /var/run/spawn-fcgi.pid /tmp/upload_socket_1.sock /var/run/upload_socket_1.pid \
    /tmp/upload_socket_2.sock /var/run/upload_socket_2.pid
  return $RETVAL
}

reload() {
  echo -n $"Reloading $prog: "
  killproc $lighttpd -HUP
  RETVAL=$?
  echo
  return $RETVAL
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    stop
    start
    ;;
  condrestart)
    if [ -f /var/lock/subsys/$prog ]; then
      stop
      start
    fi
    ;;
  reload)
    reload
    ;;
  status)
    status $lighttpd
    RETVAL=$?
    ;;
  *)
  echo $"Usage: $0 {start|stop|restart|condrestart|reload|status}"
  RETVAL=1
esac

exit $RETVAL

In the above init.d file, I have combined starting the spawn-fcgi processes with starting lighttpd, because without the spawn-fcgi processes your PHP scripts cannot receive requests from the web server.
  • Start lighttpd service
chmod +x /etc/init.d/lighttpd
/etc/init.d/lighttpd start

#3.1.2 Creating interface to upload videos

To create the upload interface on your website, follow this example. It explains how to create the HTML form, the jQuery-based JS code and some basic stylesheets. Do not forget to validate the video file name by its extension. If you run into cross-domain issues, follow this native example that uses an iframe.
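A minimal sketch of the extension check, assuming a whitelist of common video extensions (the list itself is an assumption; adjust it to the formats ffmpeg can convert on your server). The same check is worth running both in the browser, for quick feedback, and again in the upload script, since client-side checks can be bypassed. An extension check only looks at the name, not at the file content.

```python
import os

# Assumed whitelist; adjust to the formats your processing step can convert.
ALLOWED_EXTENSIONS = {".3gp", ".avi", ".flv", ".mov", ".mp4", ".mpeg", ".mpg", ".wmv"}

def has_video_extension(filename: str) -> bool:
    """True if the file name ends in one of the allowed video extensions."""
    # splitext looks only at the last suffix, so "clip.mp4.exe" yields ".exe".
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS
```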

#3.2 Processing video

Once the video has been copied/moved to the desired location, it needs to be processed by a script for various purposes, such as converting the video, extracting metadata and creating thumbnails for streaming.

This should be done by a separate processing script; let's call it “process.php”. Processing also needs some additional software. Install it on the video server using the following command:

yum install ffmpeg flvtool2 compat-readline5 php-gd php-devel libaio-devel

The process.php script should be added to crontab and run every minute, so that newly uploaded videos are processed as quickly as possible.
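Because cron starts a new run every minute, a long conversion can still be running when the next run starts. A minimal sketch of one run, written in Python for illustration rather than as the author's process.php, guards against that with a non-blocking file lock and treats a video as pending while no .flv with the same base name exists in the output directory. The directory layout, the lock file and the process callback are hypothetical; a crontab line would look like `* * * * * /usr/bin/php /web/scripts/process.php` (path hypothetical).

```python
import fcntl
import os

def pending_videos(incoming_dir, output_dir):
    """Uploaded files that do not have a converted .flv in output_dir yet."""
    pending = []
    for name in sorted(os.listdir(incoming_dir)):
        path = os.path.join(incoming_dir, name)
        stem = os.path.splitext(name)[0]
        if os.path.isfile(path) and not os.path.exists(
                os.path.join(output_dir, stem + ".flv")):
            pending.append(path)
    return pending

def run_once(incoming_dir, output_dir, lock_path, process):
    """One cron run: process pending videos unless a previous run holds the lock."""
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock: fails at once if another run is active.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return 0  # previous run still busy; cron will try again next minute
        count = 0
        for path in pending_videos(incoming_dir, output_dir):
            process(path)  # hypothetical callback: convert, thumbnail, metadata
            count += 1
        return count  # the lock is released when the file is closed
```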

#3.2.1 Converting video

Users might upload videos from any source, so there is no guarantee that a video can be played in every browser, since not all browsers support all codecs. Hence we must convert uploaded videos into a common format. We decided to use the Flash (FLV) format.

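Run the following command from your PHP script to convert a video into FLV format:

ffmpeg -i INPUT_VIDEO -ar 22050 -ab 32k -ac 1 -f flv -b 700k -r 15 -s FRAME_SIZE - 2>/dev/null | flvtool2 -U stdin OUTPUT_VIDEO.flv > /dev/null

In the above command, ffmpeg writes the FLV stream to stdout and flvtool2 embeds the keyframe metadata needed for streaming. The audio bitrate needs the k suffix (32k), otherwise recent ffmpeg versions read it as 32 bits/s. In your script, replace FRAME_SIZE with a width x height value such as 320x240, chosen to match the aspect ratio of the source video.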
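ffmpeg's -s option expects a frame size (width x height), so the script has to derive one that keeps the source's aspect ratio. A minimal Python sketch under these assumptions: a 320x240 bounding box, no upscaling of small videos, and dimensions rounded down to even numbers (a common, safe choice for encoders). The other options mirror the command above, with the audio bitrate written as 32k; the function names are hypothetical.

```python
import subprocess

def fit_frame_size(src_w, src_h, max_w=320, max_h=240):
    """Largest WxH inside max_w x max_h that keeps the source aspect ratio."""
    if src_w <= max_w and src_h <= max_h:
        w, h = src_w, src_h                    # already fits: do not upscale
    elif src_w * max_h >= src_h * max_w:       # width is the limiting side
        w, h = max_w, src_h * max_w // src_w
    else:                                      # height is the limiting side
        w, h = src_w * max_h // src_h, max_h
    return w - w % 2, h - h % 2                # round down to even numbers

def flv_command(src, frame_size):
    """ffmpeg arguments that write an FLV stream to stdout."""
    w, h = frame_size
    return ["ffmpeg", "-i", src, "-ar", "22050", "-ab", "32k", "-ac", "1",
            "-f", "flv", "-b", "700k", "-r", "15", "-s", f"{w}x{h}", "-"]

def convert_to_flv(src, dst, frame_size):
    """Pipe ffmpeg's output through flvtool2, like the shell pipeline."""
    ffmpeg = subprocess.Popen(flv_command(src, frame_size),
                              stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    flvtool = subprocess.Popen(["flvtool2", "-U", "stdin", dst],
                               stdin=ffmpeg.stdout, stdout=subprocess.DEVNULL)
    ffmpeg.stdout.close()  # lets ffmpeg see SIGPIPE if flvtool2 exits early
    return flvtool.wait() == 0 and ffmpeg.wait() == 0
```

For a 1920x1080 upload, fit_frame_size returns (320, 180), so ffmpeg receives -s 320x180.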

#3.2.2 Creating thumbnails

To extract a thumbnail from a video file, the following command can be used:

ffmpeg -itsoffset -4 -i VIDEO_FILE -vcodec mjpeg -vframes 1 -an -f rawvideo -s 320x240 OUTPUT.jpg

This command generates a 320×240 JPG thumbnail from the frame at the 4th second of the video (the mjpeg codec writes the frame as a JPEG image). You can adapt it to create thumbnails at random positions according to the length of the video.
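To adapt the offset to the video's length, the script needs the duration (see the metadata step) and a way to pick positions. A minimal Python sketch, assuming evenly spaced positions or a random one between 10% and 90% of the video (to skip intros and credits; the margins are an assumption). The command builder reuses the thumbnail command with mjpeg as the codec; function names are hypothetical.

```python
import random

def thumbnail_offsets(duration, count=3):
    """Evenly spaced positions (seconds) inside the video, avoiding start and end."""
    return [round(duration * (i + 1) / (count + 1), 2) for i in range(count)]

def random_offset(duration, rng=random):
    """A random position between 10% and 90% of the video (assumed safe range)."""
    return rng.uniform(duration * 0.1, duration * 0.9)

def thumbnail_command(src, offset, out, size="320x240"):
    """ffmpeg arguments for one JPEG thumbnail taken `offset` seconds in."""
    return ["ffmpeg", "-itsoffset", f"-{offset:g}", "-i", src, "-vcodec", "mjpeg",
            "-vframes", "1", "-an", "-f", "rawvideo", "-s", size, out]
```

For a 100-second video, thumbnail_offsets(100) gives three positions at 25, 50 and 75 seconds.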

#3.2.3 Extract metadata

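To extract metadata, the following command can be used:

ffmpeg -i INPUT_VIDEO

It prints a lot of metadata about the video (duration, bitrate, codecs, frame size, etc.) as text on stderr, which can be stored in a database or used while streaming the video. Since no output file is given, ffmpeg also reports an error and exits with a non-zero status, which is expected here.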
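Because the report is free text, the script has to pick out the fields it needs. A minimal Python sketch, assuming the "Duration: HH:MM:SS.xx" line and a "Video: ..., WIDTHxHEIGHT" stream line that ffmpeg commonly prints; the exact layout varies between ffmpeg versions, so treat the regular expressions as a starting point. The sample report in the usage note is a shortened illustration, not real output.

```python
import re
import subprocess

DURATION_RE = re.compile(r"Duration: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)")
FRAME_SIZE_RE = re.compile(r"Video: .*?\b(\d{2,5})x(\d{2,5})\b")

def ffmpeg_info(path):
    """Run `ffmpeg -i` and return its report, which ffmpeg writes to stderr."""
    result = subprocess.run(["ffmpeg", "-i", path], capture_output=True, text=True)
    return result.stderr  # non-zero exit is expected: no output file was given

def parse_metadata(report):
    """Extract duration (seconds) and frame size from an `ffmpeg -i` report."""
    info = {}
    m = DURATION_RE.search(report)
    if m:
        hours, minutes, seconds = m.groups()
        info["duration"] = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    m = FRAME_SIZE_RE.search(report)  # ".*?" stays on the Video stream line
    if m:
        info["width"], info["height"] = int(m.group(1)), int(m.group(2))
    return info
```

For a report containing "Duration: 00:01:23.45" and "Video: mpeg4, yuv420p, 640x480", parse_metadata returns a duration of about 83.45 seconds and a 640x480 frame size.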

#3.3 Streaming video

Streaming video requires support from webserver, JS, Flash player and some HTML work.

#3.3.1 Preparing server

Lighttpd has built-in support for streaming video files. To support streaming with keyframe seeking, enable the required module in the server configuration file “/etc/lighttpd/lighttpd.conf”:

server.modules += ( "mod_flv_streaming" )
flv-streaming.extensions = ( ".flv" )

Restart the web server to apply the above changes. The server is now ready to stream FLV files with keyframe support.
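With mod_flv_streaming, a player seeks by requesting the file with a byte offset in the start query parameter (for example video.flv?start=98000), and the server sends the file from that position. The player takes the offset from the keyframe metadata that flvtool2 embedded (the keyframes times and filepositions arrays). A minimal Python sketch of that lookup, assuming the two arrays are already decoded; the numbers and URL in the usage note are made up.

```python
from bisect import bisect_right

def seek_offset(times, filepositions, seconds):
    """Byte offset of the last keyframe at or before `seconds`.

    `times` (ascending) and `filepositions` are the parallel keyframe arrays
    stored in the FLV metadata.
    """
    i = bisect_right(times, seconds) - 1
    return filepositions[max(i, 0)]  # before the first keyframe: use the first

def seek_url(video_url, times, filepositions, seconds):
    """URL a player would request to resume playback near `seconds`."""
    return f"{video_url}?start={seek_offset(times, filepositions, seconds)}"
```

With keyframes at 0, 2, 4 and 6 seconds stored at bytes 1200, 52000, 98000 and 150000, seeking to 5 seconds requests ...?start=98000, because playback must start on a keyframe.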

#3.3.2 Streaming through HTML5

There are two ways to stream video: using HTML5's native “video” tag, or using a Flash player as the container.

Streaming video through HTML5 is as easy as showing an image in the browser, but unfortunately not all browsers support HTML5 video; older browsers, notably Internet Explorer before version 9 (released in 2011), do not. Moreover, even browsers that support HTML5 do not all support the same codecs (another round of the browser war): a video encoded with H.264 will not play in Firefox (and Google has announced plans to drop H.264 from Chrome), while a video encoded with Theora will not play in IE. More information about this situation can be found here.

However, if you still decide to use HTML5, the following HTML can be used:

<video controls="controls">
  <source src="movie.mp4" type="video/mp4">
  <source src="movie.ogv" type="video/ogg">
  Fallback flash player based video streaming code.
</video>

That's it: the browser plays the first source whose codec it supports natively, without any JS code. Browsers that do not support the video tag at all render the fallback (Flash player) code instead.

#3.3.3 Streaming through Flash player

Unfortunately, the de facto standard solution is still to use a Flash player that streams the video inside a Flash container. That is why we converted the video into FLV format earlier :) The Flash player is installed in almost all browsers and plays FLV natively, so one FLV file works nearly everywhere.

To stream video using a Flash container, follow this excellent tutorial.

#4 Improvements
  1. In this article, I have not discussed real-time video format validation to prevent users from uploading junk.
  2. Since the video server mostly serves video files and occasionally JS and HTML, you should deny access to all other files.
  3. When you require more video processing and streaming features, you will need to use wrapper classes like phpvideotoolkit and the native ffmpeg-php extension.

#5 Resources

http://flowplayer.org/plugins/streaming/pseudostreaming.html
http://en.wikipedia.org/wiki/Flash_Video#Format_details
http://praegnanz.de/html5video/
http://uakino.net/media/document/1009.pdf
http://diveintohtml5.info/video.html

9 Sept 2008

SEO guidelines

#1 About this document

This document aims to define generic search engine optimization requirements for various projects.

At the moment, this document contains general SEO guidelines. In future, when training sessions are held, it will be expanded further so that it can serve as a complete resource for almost all SEO requirements.

#2 General requirements

#2.1 Server location

The server should be located in the same country from which it will mostly be accessed. Moreover, if the service has its own domain, it should reside on a dedicated server. Wildcard DNS should not be allowed, and all subdomains, if any, should be activated separately.

#2.2 Robots.txt

The robots.txt file is to be placed in the web root directory of the site (the value of the DocumentRoot directive if the web server is Apache). It should allow search engines to crawl all directories where information related to the various entities is shown.

Personal pages that require login, such as pages listing the owner's entities or for posting/editing entities, should be blocked. Since search engines cannot log in, most of them will not index such pages anyway, so you may omit these entries from the robots.txt file.
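Before publishing robots.txt, it is worth checking that it blocks exactly what you intend. A minimal sketch using Python's standard urllib.robotparser module; the robots.txt content and all paths (/login, /my-account/, /post/, /edit/) are hypothetical examples of personal and posting/editing pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: listing and view pages stay crawlable,
# personal and posting/editing pages are blocked.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /my-account/
Disallow: /post/
Disallow: /edit/
"""

def make_parser(text):
    """Parse robots.txt content without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser
```

Calling make_parser(ROBOTS_TXT).can_fetch("Googlebot", url) then answers whether a given URL may be crawled under these rules.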

#2.3 Encoding

If the website's language uses special (non-ASCII) characters, they need to be encoded as UTF-8 in URLs and file names (for example with PHP's urlencode() or, for path segments, rawurlencode()). For full documentation of the encoding of such characters, see http://www1.tip.nl/~t876506/utf8tbl.html. As practically all browsers support the Unicode UTF-8 standard, it is not necessary to encode these characters in the actual page content. Suitable HTML entities can be looked up at http://leftlogic.com/lounge/articles/entity-lookup/ anyway.

If a user's browser overrides the UTF-8 character set with another one and the user writes a URL using the special characters rather than their encoded form, the server should respond with a 301 redirect to the correctly encoded URL. See how Wikipedia handles this for an example. This prevents links with the wrong character set from being used on external pages.
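The redirect rule can be sketched as: decode the requested path, re-encode it as percent-encoded UTF-8, and redirect with 301 if the result differs from what was requested. A minimal Python sketch, assuming that paths which are not valid UTF-8 come from a browser using ISO-8859-1 (Latin-1); the fallback charset and the function name are assumptions, and query strings are left out for brevity.

```python
from urllib.parse import quote, unquote_to_bytes

def canonical_path(raw_path, fallback_charset="latin-1"):
    """Return (canonical_path, needs_301) for a requested URL path."""
    raw = unquote_to_bytes(raw_path)  # literal non-ASCII is taken as UTF-8
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume a browser that sent a legacy charset.
        text = raw.decode(fallback_charset)
    canonical = quote(text, safe="/")  # percent-encoded UTF-8, "/" kept
    return canonical, canonical != raw_path
```

A request for "/caf%E9" (é sent as a Latin-1 byte) is answered with a 301 to "/caf%C3%A9", while the correctly encoded path needs no redirect.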

#2.4 Header responses

#2.4.1 Page not found (404 error)

Entities that are removed from the database/software should not be shown. When someone accesses a removed listing's page, the server should respond with a 404 status code (not 200) and show an error message (or optionally a separate page) saying that the entity has been deleted/expired/sold, etc. Furthermore, the relevant listing page should be shown.

#2.4.2 Redirects (301 response)

As a general rule of thumb, all redirects should be done using the 301 Moved Permanently response. The bare domain should be redirected this way (example.com -> 301 -> www.example.com), as should all other domains that contain the same information, as shown below:

www.example.net -> 301 -> www.example.com
www.example.in -> 301 -> www.example.com

Also ensure that only the specified URLs work, and add a 301 redirect rule for any non-specified URL that is requested.
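In practice these host redirects usually live in the web server configuration; the Python sketch below only illustrates the mapping, with the host names taken from the examples above and the alias set and function name as assumptions. It keeps the scheme, path and query so that every old URL maps onto the same page on the canonical host.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"
# Other host names serving the same content; all of them 301 to CANONICAL_HOST.
ALIAS_HOSTS = {"example.com", "example.net", "www.example.net",
               "example.in", "www.example.in"}

def redirect_target(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the host name
    if host not in ALIAS_HOSTS:
        return None  # already canonical, or a host this rule does not handle
    # Keep scheme, path and query; the fragment never reaches the server.
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, ""))
```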

#3 General page requirements

#3.1 Using standards

The site should comply with the World Wide Web Consortium's (http://www.w3.org/) recommendations for creating web pages (XHTML 1.0 Transitional should be enough) and, if required, with the Americans with Disabilities Act (http://www.ada.gov/).

#3.2 Page design

The pages should be designed with CSS positioning, and the content part of the page should appear in the source code as early as possible, preferably before other body content such as navigation blocks.

The navigation should be implemented with plain anchor tags and text, and the links should not redirect.

Breadcrumb navigation improves SEO through internal backlinks and improves usability, since visitors can see their location on the site. Example of breadcrumb navigation: Home => List furniture items => View table => ...

Scripts and other elements (CSS) should be put in external files. The source code should be kept clean with little or no unused code. The preferred maximum file size for HTML code is 100 KB.

#3.3 Elements of a page

The following elements should always be included (and be editable somehow) on a page which is to be indexed by search engine:
  • Page title (<title> element in the header)
  • Meta description, robots and keywords (in the header)
  • Page heading (one <h1> per page)

#3.3.1 Page title (<title> element in the header)

A page title should be as specific and concise as possible with respect to the document. This ensures its uniqueness and click-through in Search Engine Result Pages. A structure similar to "Page name | Section name | Site title - Tagline" is encouraged for clarity, uniqueness and better usability for the visitor. Focus on delivering a title that goes from specific keywords (near the beginning) to general ones. The title should be no more than 80 characters long.
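A minimal Python sketch of building such a title, assuming the parts are passed from specific to general and that, when the result is too long, the tagline is dropped first and then the most general parts (this shortening policy is an assumption, not part of the guideline). The function name is hypothetical.

```python
def build_title(parts, tagline="", max_length=80):
    """Build "Page name | Section name | Site title - Tagline" within max_length.

    `parts` go from specific to general. If the full title is too long, the
    tagline is dropped first, then the most general parts; the most specific
    part is always kept (and cut to max_length as a last resort).
    """
    parts = [p.strip() for p in parts if p and p.strip()]
    title = " | ".join(parts)
    if tagline and len(f"{title} - {tagline}") <= max_length:
        return f"{title} - {tagline}"
    while len(parts) > 1 and len(" | ".join(parts)) > max_length:
        parts.pop()  # remove the most general remaining part
    return " | ".join(parts)[:max_length]
```

For example, build_title(["Round tables", "Tables", "My furniture"], "Quality office furniture") fits within 80 characters and keeps the tagline.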

#3.3.2 Meta description, robots and keywords (in the header)

A meta description of around 150 characters should be sufficient. Although it doesn't hurt to be a little longer, it should contain the most concise information about the document. The uniqueness of this information also plays a fair role as far as Search Engine Result Pages are concerned.
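When descriptions are generated from entity text, they need to be trimmed without cutting a word in half. A minimal Python sketch, assuming a 150-character limit that includes a "..." suffix; the function name is hypothetical.

```python
def meta_description(text, limit=150):
    """Collapse whitespace and cut at a word boundary, "..." included in limit."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    budget = limit - 3                 # room for the "..." suffix
    cut = text[:budget + 1]            # one extra char shows if a word was split
    cut = cut.rsplit(" ", 1)[0] if " " in cut else cut[:budget]
    return cut.rstrip(" ,;:.") + "..."
```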

Meta keywords, on the other hand, are not really necessary, since it is the responsibility of search engine indexers to determine the nature and relevancy of the document; for accuracy, they cannot rely on what the document claims to be. The Web has moved away from relying on this sort of self-declared meta information, and today the benefit gained from meta keywords is negligible. Below are some examples of well-written meta tags:

<meta name="description" content="Suppliers of quality office furniture and accessories at discount prices.">
<meta name="keywords" content="furniture, office, store, shop, retail, discount">

#3.3.3 Page heading (one <h1> per page)

A properly structured document consists of headings, paragraphs, lists, tables and forms, and uses an external stylesheet to style them. Many search engines place more emphasis on text within heading tags (and not just on keywords provided in meta elements), so make sure headings contain keywords. Use one <h1> tag per page with the most important keywords. You can also use other heading tags (<h2>, <h3>, etc.) to provide variations and support the main heading.

Some example tags:

<h1>Tables</h1>
<h2>Round tables</h2>
<p>... information about round tables ...</p>
<h2>Square Desks</h2>
<p>... information about square desks, etc.</p>

#3.3.4 Body text

Make sure the text of your web pages contains keywords and common phrases that people might search for. Be careful with the frequency of your keywords: you want them to occur at least a few times if possible, but don't repeat them so much that the copy becomes unnatural. The idea is to discreetly spread keywords around without making it obvious.

A well-written document naturally uses keywords that are appropriate and in proportion. Search engine algorithms essentially compare similar documents to better understand the nature of a document. If a document is not well written and produces unbalanced scores, it will raise flags and may be marked as not relevant, since this indicates a document written for the machine rather than for the human reader. Keep in mind that indexing exists to assist human searches. An example of good body text:

<p>Buy office furniture at affordable prices from any of our retail stores.</p>

#3.3.5 Images and Pictures

Pictures that are not part of the page template should always include an ALT description. This description should be either generated automatically or editable (this is partly already a requirement of the Americans with Disabilities Act).
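Two of the on-page rules (one <h1> per page, ALT text on content images) are easy to check automatically on generated pages. A minimal sketch using Python's standard html.parser module; the class and function names are hypothetical, and it flags every image without a non-empty alt, so template images that legitimately use alt="" would need to be excluded, for example by auditing only the content area.

```python
from html.parser import HTMLParser

class OnPageAudit(HTMLParser):
    """Counts <h1> tags and collects <img> tags without a non-empty alt."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.images_without_alt = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... /> (XHTML).
        attrs = dict(attrs)
        if tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_without_alt.append(attrs.get("src", "?"))

def audit(html):
    """Return (number of <h1> tags, list of image sources missing ALT text)."""
    parser = OnPageAudit()
    parser.feed(html)
    parser.close()
    return parser.h1_count, parser.images_without_alt
```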

#3.4 Automation

The title element, meta description and keywords need to be generated automatically according to different templates. These templates will include page- and directory-specific elements as well as generic elements. An example of a template for the title element of the search results page could be:

[Results] - [category] - Search results - My furniture example.com

Different elements that could be included are:
  • Results = Search results pages (New, Old, All)
  • Category name = Such as Wood tables, Wood chairs, Metal chairs
  • Area = Can represent the location of the entity
  • Page number in the search results, if applicable
  • The category name or area might not be in its basic form; different grammatical forms might be needed.

In the title, meta information and headings, the keywords or key phrases are added as they are or in another grammatical form, but when they are used in URLs (URL rewriting) they may need to be encoded if another language is used:
  • Non-ASCII keywords (and phrases) included in URLs need to be percent-encoded as UTF-8 hex values (for example with PHP's urlencode()), like:
  • www.example.com/product/table/એપલ => www.example.com/product/table/%E0%AA%8F%E0%AA%AA%E0%AA%B2

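A minimal Python sketch of this automation, assuming the search results title template above (written with plain hyphens) and a "Page N - " prefix for later result pages; the prefix, the template constant and the function names are assumptions. keyword_url percent-encodes each keyword as UTF-8, which reproduces the encoded example URL.

```python
from urllib.parse import quote

# Hypothetical title template following the search results example.
TITLE_TEMPLATE = "{results} - {category} - Search results - My furniture example.com"

def search_title(results, category, page=None):
    """Fill the template; pages after the first get an (assumed) page prefix."""
    title = TITLE_TEMPLATE.format(results=results, category=category)
    if page and page > 1:
        title = f"Page {page} - {title}"
    return title

def keyword_url(base, *keywords):
    """Append keywords as path segments, percent-encoding non-ASCII as UTF-8."""
    return base.rstrip("/") + "".join("/" + quote(k, safe="") for k in keywords)
```

For example, keyword_url("http://www.example.com/product", "table", "એપલ") yields the %E0%AA%8F%E0%AA%AA%E0%AA%B2 form shown above.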
#4 Index page

The index page of your website is the most likely to get the highest number of inbound links, since it is the entry point of your website. Hence linking to other pages of the website from this page is very important. In theory, this page should link to almost all pages that start from here.

However, the number of links on such a page should be kept to around 100, so in many projects it will not be possible to display all links. In such cases the most important links should be made visible here, and the remaining pages can be linked from those pages, because our purpose is to chain all important pages together so that they get indexed.

To make this work, it is important to identify those important links. For example, if you are selling something, the home page can link to the pages that list items per product category. Similarly, if the items are bound to a geographical location and your website lists items for sale per province/city/area, links to those pages could be placed on this page.

#5 Search pages

Search pages, whether simple or extended, need not be indexed, as by default they do not contain any information to be searched for.

However, from a usability point of view, their URLs, page design and on-page information should be properly designed and implemented.

#6 Listing pages

Listing pages are the second most important pages of any website, as they display information about the entities for which the website was created. Listed entities can be of various types, ranging from items for sale to ads, jobs, etc.

Such listings may contain pagination and sorting links, depending on the results and the interests of users. It is recommended to keep pagination links as plain text links so that search engines can crawl through all available pages and index them. Sorting links, however, may be implemented using JS (Ajax), etc., so that full page requests to the server are minimized. From a search engine's point of view, it doesn't matter in which order the information is displayed.

If possible, the URL scheme of such pages should be self-descriptive. For example, for a furniture selling website, URLs can be designed like below:

www.myfurniture.com/tables/round
www.myfurniture.com/tables/square
www.myfurniture.com/tables/plastic
www.myfurniture.com/chairs/rocking
www.myfurniture.com/chairs/revolving

Text appearing on such pages should be as informative as possible, and the number of entities per list should be kept at around 10 to 30. Listing pages may also contain links to other important pages that are to be indexed.
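A minimal Python sketch of crawlable pagination for such a listing, assuming 20 entities per page (within the suggested 10 to 30) and a hypothetical ?page=N query scheme. Page 1 links to the plain listing URL so the same content is not reachable under two URLs.

```python
def page_count(total_items, per_page=20):
    """Number of listing pages; an empty listing still has one page."""
    return max(1, -(-total_items // per_page))  # ceiling division on integers

def pagination_links(base_path, total_items, per_page=20):
    """Plain-text anchor links for every page, crawlable without JS."""
    links = []
    for n in range(1, page_count(total_items, per_page) + 1):
        href = base_path if n == 1 else f"{base_path}?page={n}"
        links.append(f'<a href="{href}">{n}</a>')
    return links
```

For 45 round tables, pagination_links("/tables/round", 45) produces three links, the second pointing to /tables/round?page=2.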

#7 View pages

View pages show detailed information about the entities listed on listing pages. The title, meta description, meta keywords and H1 tag should contain information about the entity being viewed.

#8 Other pages

Other pages may include pages like Login pages, Posting/Editing entity pages etc.

#8.1 Login pages

Such pages should not get indexed as they don't contain any public searchable information.

#8.2 Posting/Editing entity pages

Pages that contain forms to be submitted are not normally indexed, as they don't display any searchable information to the general public.

The general rule of thumb is that pages which change the state of the server (data is inserted/updated, a file is created/deleted, etc.) or which are personal to users are not indexed, as they are tightly bound to the website's data.
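Besides robots.txt, a page can opt out of indexing with the robots meta tag listed among the page elements. A minimal Python sketch of applying the rule of thumb per page; the three flags and the function name are hypothetical and would come from your own page templates.

```python
def robots_meta(changes_state=False, personal=False, has_form=False):
    """Robots meta tag for a page: state-changing, personal or form pages get noindex."""
    if changes_state or personal or has_form:
        return '<meta name="robots" content="noindex">'
    return '<meta name="robots" content="index, follow">'
```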
