9 Jun 2011

Video uploading guide

#1 Introduction

This document describes how to set up video uploading and streaming for PHP based websites. The guide was prepared by studying various resources on the Internet, and the approach it describes is tried and tested and is close to a de facto standard for video uploading, processing and streaming.

#2 Video uploading stages

There are 3 stages in video uploading, viz. uploading the video, processing it for streaming, and streaming it.

#2.1 Uploading video

Video files are generally large, hence a separate page/interface is designed to handle the long upload process. Through this interface, users send their video to your server so that it can be streamed attached to their ad.

#2.2 Processing video

Processing a video involves activities such as creating thumbnails (for promotion, preview etc.), converting the video into formats suitable for various browsers, and extracting metadata from the video for various purposes.

#2.3 Streaming video

In the 3rd stage, converted videos are streamed through a Flash player or through the browser's built-in media player, if it supports those video types.

#3 Implementation guidelines

These guidelines mainly emphasize the setup of the web server, because it is the most important part and remains largely the same for most video uploading and streaming projects; processing and streaming are less critical here since they vary from project to project.

#3.1 Uploading video

To upload various types of videos, we first need to set up the web server so that it can accept video files. It is also better to have a separate machine for video uploading, processing and streaming, so that the website which uses those videos does not share the load of video related operations; such operations consume a lot of memory and CPU.

In this article I have decided to use Lighttpd 1.5 as the video uploading and streaming server, mainly for 2 reasons:
  1. it is specially designed to serve static content,
  2. it has modules/plugins which report upload progress directly to the caller script, which makes it very convenient for developers to design the interface with minimum coding.
There are 2 alternative solutions as well, viz. Apache + apache-upload-progress-module and Nginx + nginx-upload-progress-module & nginx-upload-module. However, I found little feedback about these 2 solutions, hence I decided not to use them and stuck with lighttpd since it is popular and trusted.

#3.1.1 Installing and configuring Lighttpd

For rpm based distributions, use following command to install lighttpd server related packages:

yum install pcre-devel glib2-devel zlib-devel openssl-devel spawn-fcgi php php-cli

Lighttpd 1.5 is not yet available in any yum repository hence we have to compile and configure it manually as shown below:
  • Download lighttpd.
cd /tmp/
wget http://download.lighttpd.net/lighttpd/snapshots-1.5/lighttpd-1.5.0-r2698.tar.gz
tar -zxvf lighttpd-1.5.0-r2698.tar.gz
cd lighttpd-1.5.0
  • Configure and install
./configure --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --with-pcre

make
make install
  • Add necessary user/group, directories and files
adduser -m -d /var/www -s /sbin/nologin lighttpd
mkdir /etc/lighttpd/
mkdir -p /web/logs/
mkdir -p /web/pages/
mkdir -p /var/log/lighttpd
chown lighttpd:lighttpd /var/log/lighttpd /web/logs
cp doc/lighttpd.conf /etc/lighttpd/
  • Make changes as per your setup by editing “/etc/lighttpd/lighttpd.conf” file
server.modules = ("mod_rewrite",
                  "mod_access",
                  "mod_status",
                  "mod_uploadprogress",
                  "mod_proxy_core",
                  "mod_proxy_backend_fastcgi",
                  "mod_accesslog"
                )

server.max-request-size = 150000  # value is in KB; allows uploads of roughly 146 MB
upload-progress.progress-url = "/progress"
upload-progress.remove-timeout = 10

#### mod-proxy-core module
## read mod-proxy-core.txt for more info
## for PHP don't forget to set cgi.fix_pathinfo = 1 in the php.ini
$PHYSICAL["existing-path"] =~ "\.php$" {
  proxy-core.balancer = "round-robin"
  proxy-core.allow-x-sendfile = "enable"
  proxy-core.protocol = "fastcgi"
  proxy-core.backends = ( "unix:/tmp/php-fastcgi.sock" )
  proxy-core.max-pool-size = 16
}

# setup of host specific to video upload.
$HTTP["host"] =~ "video.myproject.com" {
  server.document-root = "/web/video.myproject.com"
  server.errorlog = "/web/logs/video.myproject.com_error.log"
  #accesslog.filename = "/web/logs/video.myproject.com_access.log"
  server.error-handler-404 = "http://www.myproject.com"

  $HTTP["url"]  =~ "^/upload" {
    proxy-core.balancer = "round-robin"
    proxy-core.protocol = "fastcgi"
    proxy-core.allow-x-sendfile = "enable"
    proxy-core.backends = (
      "unix:/tmp/upload_socket_1.sock",
      "unix:/tmp/upload_socket_2.sock",
      #"unix:/tmp/upload_socket_N.sock",
    )
    proxy-core.max-pool-size = 2  # as per backend.
  }
}

In the above setup, when a video is uploaded to the /upload URI, the request is proxied over the FastCGI protocol to one of several dedicated Unix sockets, so that 2 to N uploads can be handled at a time. We do not need to worry about which socket is in use and which is free, since the web server handles that on its own. You can create more than 2 sockets to handle more concurrent video uploads.

Here the PHP upload script will contain code to move/copy the video file to the desired location, making it available for further processing. It is a normal PHP CGI script containing valid PHP code. Please note that for copying/renaming etc. you need a file name, so it is better to pass one from the website as a hidden form variable so that the script can rename the video to that name.

Please note that for URIs other than /upload, the dedicated php-fastcgi.sock will be used. Also do not forget to rotate the 404 error log :)
  • Verify installation by running following command
lighttpd -t -f /etc/lighttpd/lighttpd.conf
  • Create init.d file “/etc/init.d/lighttpd” as shown below
#!/bin/sh
#
# lighttpd     Startup script for the lighttpd server
#
# chkconfig: - 85 15
# description: Lightning fast webserver with light system requirements
#
# processname: lighttpd
# config: /etc/lighttpd/lighttpd.conf
# config: /etc/sysconfig/lighttpd
# pidfile: /var/run/lighttpd.pid
#
# Note: pidfile is assumed to be created
# by lighttpd (config: server.pid-file).

# Source function library
. /etc/rc.d/init.d/functions

if [ -f /etc/sysconfig/lighttpd ]; then
  . /etc/sysconfig/lighttpd
fi

if [ -z "$LIGHTTPD_CONF_PATH" ]; then
  LIGHTTPD_CONF_PATH="/etc/lighttpd/lighttpd.conf"
fi

prog="lighttpd"
lighttpd="/usr/sbin/lighttpd"
RETVAL=0

start() {
  echo -n $"Starting $prog: "
  daemon $lighttpd -f $LIGHTTPD_CONF_PATH
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog
  /usr/bin/spawn-fcgi -s /tmp/php-fastcgi.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/spawn-fcgi.pid
  /usr/bin/spawn-fcgi -s /tmp/upload_socket_1.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_1.pid
  /usr/bin/spawn-fcgi -s /tmp/upload_socket_2.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_2.pid
  # /usr/bin/spawn-fcgi -s /tmp/upload_socket_N.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_N.pid
  return $RETVAL
}

stop() {
  echo -n $"Stopping $prog: "
  killproc $lighttpd
  killproc php-cgi
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$prog /tmp/php-fastcgi.sock \
    /var/run/spawn-fcgi.pid /tmp/upload_socket_1.sock /var/run/upload_socket_1.pid \
    /tmp/upload_socket_2.sock /var/run/upload_socket_2.pid
  return $RETVAL
}

reload() {
  echo -n $"Reloading $prog: "
  killproc $lighttpd -HUP
  RETVAL=$?
  echo
  return $RETVAL
}

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart)
    stop
    start
    ;;
  condrestart)
    if [ -f /var/lock/subsys/$prog ]; then
      stop
      start
    fi
    ;;
  reload)
    reload
    ;;
  status)
    status $lighttpd
    RETVAL=$?
    ;;
  *)
  echo $"Usage: $0 {start|stop|restart|condrestart|reload|status}"
  RETVAL=1
esac

exit $RETVAL

In the above init.d file, I have merged the creation of the spawn-fcgi processes with the lighttpd process, because without the “spawn-fcgi” processes your PHP script cannot receive data from the web server.
  • Start lighttpd service
chmod +x /etc/init.d/lighttpd
/etc/init.d/lighttpd start
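
If you need more than 2 upload sockets, the repeated spawn-fcgi lines in start() can be generated rather than typed by hand. The following is only a sketch: the helper name upload_socket_cmds is made up for illustration, and it reuses exactly the paths and spawn-fcgi options shown in the init.d file above.

```shell
#!/bin/sh
# Hypothetical helper: print one spawn-fcgi command per upload socket (1..N),
# using the same paths and options as the init.d script above.
upload_socket_cmds() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "/usr/bin/spawn-fcgi -s /tmp/upload_socket_$i.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -P /var/run/upload_socket_$i.pid"
    i=$((i + 1))
  done
}

# Example: review the commands for 4 sockets before running them.
# upload_socket_cmds 4
```

Every socket generated this way must also be listed in proxy-core.backends in lighttpd.conf, and its socket and pid files should be removed in stop().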

#3.1.2 Creating interface to upload videos

To create the interface on your website, follow this best example. It explains how to create the HTML form, the jQuery based JS code and some basic stylesheets. Please do not forget to validate the video file name by extension. If you face cross-domain issues, follow this native example using an iframe.
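
The extension check mentioned above is worth repeating on the server before a file is processed. Below is a minimal sketch in shell; the function name and the list of accepted extensions are assumptions for illustration, so adjust them to the formats you actually accept. The same logic can be written in the PHP upload script.

```shell
#!/bin/sh
# Hypothetical check: accept a file name only if it is a plain name
# (no directory part, not hidden) ending in an allowed video extension.
is_allowed_video_name() {
  # lower-case the name so ".MP4" and ".mp4" are treated alike
  name=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case "$name" in
    */*|.*) return 1 ;;   # reject paths and hidden files
    *.avi|*.flv|*.mov|*.mp4|*.mpeg|*.mpg|*.wmv) return 0 ;;   # assumed list
    *) return 1 ;;
  esac
}

# Example:
# is_allowed_video_name "holiday.mp4" && echo "ok"
```

Note that this only looks at the name; it does not prove that the content is really a video.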

#3.2 Processing video

Once the video has been copied/moved to the desired location, it needs to be processed by a script for streaming purposes: converting the video, extracting metadata, creating thumbnails, etc.

This should be done by a separate processing script; let's call it “process.php”. We also need various other software for processing. Install it on the video server using the following command:

yum install ffmpeg flvtool2 compat-readline5 php-gd php-devel libaio-devel

The process.php script should be added to crontab and run every minute, so that newly uploaded videos are processed as quickly as possible.
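
Because a conversion can easily take longer than one minute, it helps to make sure that two runs never overlap. The sketch below is one possible way to do that with a lock directory; the wrapper name, the lock path and the script paths in the comments are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical lock location; override LOCK_DIR to suit your server.
LOCK_DIR="${LOCK_DIR:-/tmp/process_video.lock}"

# Run the given command only if no other run holds the lock.
run_exclusive() {
  # mkdir is atomic: it fails if the directory already exists
  if mkdir "$LOCK_DIR" 2>/dev/null; then
    "$@"
    status=$?
    rmdir "$LOCK_DIR"
    return $status
  fi
  echo "previous run still active, skipping" >&2
  return 1
}

# In the wrapper script (paths are placeholders):
#   run_exclusive /usr/bin/php /path/to/process.php
# Crontab entry that starts the wrapper every minute:
#   * * * * * /path/to/process_wrapper.sh
```

If a run is killed before it removes the lock directory, the directory has to be deleted by hand, so treat this as a starting point only.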

#3.2.1 Converting video

Users may upload videos from any source, so there is no guarantee that a video can be played in every browser, since not all browsers support all codecs. Hence we must convert the uploaded video into a desired format. We decided to use the Flash format.

Run the following command from your PHP script to convert the video into FLV format:

ffmpeg -i INPUT_VIDEO -ar 22050 -ab 32 -ac 1 -f flv -b 700k -r 15 -s ASPECT_RATIO - 2>/dev/null | flvtool2 -U stdin OUTPUT_VIDEO.flv > /dev/null

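In the above command, the output of ffmpeg is piped into flvtool2, which embeds the keyframe markers needed for streaming. In your script you will need to replace ASPECT_RATIO with the output frame size in WIDTHxHEIGHT form (for example 320x240), since that is what the -s option expects.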
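
To keep the proportions of the source video, the frame size passed to -s can be computed from the source dimensions. This is a small sketch with a made-up function name; it assumes you already know the source width and height, and it rounds the height to an even number, which many encoders expect.

```shell
#!/bin/sh
# Hypothetical helper: scaled_size SRC_WIDTH SRC_HEIGHT TARGET_WIDTH
# Prints TARGET_WIDTHxHEIGHT with the source proportions preserved.
scaled_size() {
  src_w="$1"; src_h="$2"; target_w="$3"
  h=$((src_h * target_w / src_w))   # integer arithmetic, rounds down
  h=$(( (h + 1) / 2 * 2 ))          # bump odd values up to the next even number
  echo "${target_w}x${h}"
}

# Example: a 1280x720 source scaled to 320 pixels wide
# scaled_size 1280 720 320
```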

#3.2.2 Creating thumbnails

To extract thumbnail from video file, following command can be used:

ffmpeg -itsoffset -4 -i VIDEO_FILE -vcodec CODEC -vframes 1 -an -f rawvideo -s 320x240 OUTPUT.jpg

This command generates a 320×240 JPG thumbnail at the 4th second of the video. You can use this example to create thumbnails at different positions according to the length of the video.
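
One simple way to pick positions according to the length of the video is to spread them evenly. The sketch below assumes the duration is already known in whole seconds; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: thumbnail_offsets DURATION_SECONDS COUNT
# Prints COUNT evenly spaced offsets (in seconds), one per line,
# skipping the very start and the very end of the video.
thumbnail_offsets() {
  duration="$1"; count="$2"
  step=$((duration / (count + 1)))
  i=1
  while [ "$i" -le "$count" ]; do
    echo $((step * i))
    i=$((i + 1))
  done
}

# Each offset can then replace the 4 in the -itsoffset option of the
# thumbnail command above, for example:
#   for t in $(thumbnail_offsets 100 4); do echo "offset: $t"; done
```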

#3.2.3 Extract metadata

To extract metadata, following command can be used:

ffmpeg -i INPUT_VIDEO

It prints a lot of metadata about the video in text format (ffmpeg writes it to standard error), which can be stored in a database or used while streaming the video.
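
As an illustration of using that text, the length of the video can be read from the line that starts with "Duration:" followed by HH:MM:SS. The function below is a sketch with a made-up name; it reads the ffmpeg output on standard input and ignores fractions of a second.

```shell
#!/bin/sh
# Hypothetical helper: read ffmpeg's informational output on stdin and
# print the video length in whole seconds (nothing if no Duration line).
duration_seconds() {
  sed -n 's/.*Duration: \([0-9]*\):\([0-9]*\):\([0-9]*\).*/\1 \2 \3/p' |
    head -n 1 |
    awk '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Example (2>&1 is needed because ffmpeg writes to standard error):
#   ffmpeg -i INPUT_VIDEO 2>&1 | duration_seconds
```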

#3.3 Streaming video

Streaming video requires support from webserver, JS, Flash player and some HTML work.

#3.3.1 Preparing server

Lighttpd server has built in streaming support to stream video files. To support streaming using keyframes, enable required module in server “/etc/lighttpd/lighttpd.conf” configuration file:

server.modules += ( "mod_flv_streaming" )
flv-streaming.extensions = ( ".flv" )

Restart web service to reflect above changes. Now server is ready to stream video files in flv format with support of keyframes.

#3.3.2 Streaming through HTML5

There are 2 ways to stream video. They are either using HTML5's native “video” tag or using Flash player as container.

Streaming video through HTML5 is as easy as showing an image in the browser, but unfortunately not all browsers support HTML5, because support only started to arrive in the latest browsers at the beginning of 2011. Moreover, even if a browser supports HTML5, not all browsers support all codecs (another round of the browser war); hence if a user uploads a video with the H.264 codec, it will not be played in Firefox and Chrome browsers. Similarly, if a video is encoded using the Theora codec then it will not be played in IE. More information about this situation can be found from here.

However if still it is decided to use HTML5 then following HTML tag can be used:

<video src="movie.mpeg" controls="controls">
Fallback flash player based video streaming code.
</video>

That's it; this way any video file can be played without any extra JS/HTML code if the browser natively supports the video file's codec.

#3.3.3 Streaming through Flash player

Unfortunately, the standard solution is still the Flash player based method, which streams video inside a Flash container. That is why we had to convert the video into “flv” format earlier :), because Flash player natively supports almost all codecs.

To stream video using flash container, follow this excellent tutorial.

#4 Improvements
  1. In this article, I have not discussed real-time video format validation to prevent users from uploading junk.
  2. Since the video server mostly serves video files and occasionally JS and HTML, you should deny access to all other file types.
  3. When you require more features for video processing and streaming, you will need to use wrapper classes like phpvideotoolkit and the native ffmpeg-php extension.
#5 Resources

http://flowplayer.org/plugins/streaming/pseudostreaming.html
http://en.wikipedia.org/wiki/Flash_Video#Format_details
http://praegnanz.de/html5video/
http://uakino.net/media/document/1009.pdf
http://diveintohtml5.info/video.html

1 Feb 2010

Quality Assurance guidelines

#1 Introduction

This document provides detailed information about the Quality Assurance plan for web based software. The intended audience of this document consists of developers, the testing team, project managers, stakeholders and clients.

#2 Quality objectives

To deliver high quality code and functionality to its customers, a team can design a detailed QA plan to be followed throughout the development life cycle and the post-development life span. To achieve this aim, the following objectives should be set by the development team.
  • No untested code should be shipped to clients.
  • Test criteria must cover at least 75% of code for testing procedure.
  • Whatever has been tested is documented for various purposes.
  • Each functionality of the software must be tested according to requirement specifications and must run in most popular browsers and platforms in similar way.
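
The 75% coverage objective above can be checked mechanically once a coverage tool reports how many lines were executed. The following is only a sketch: the function name and the way the two numbers are obtained are assumptions.

```shell
#!/bin/sh
# Hypothetical gate: coverage_ok COVERED_LINES TOTAL_LINES [MIN_PERCENT]
# Succeeds when covered/total is at least MIN_PERCENT (default 75).
coverage_ok() {
  covered="$1"; total="$2"; min="${3:-75}"
  # compare covered*100 with total*min to stay in integer arithmetic
  [ "$total" -gt 0 ] && [ $((covered * 100)) -ge $((total * min)) ]
}

# Example:
#   coverage_ok 812 1000 || echo "coverage below target"
```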
#3 Management

#3.1 Team

Quality Assurance team primarily involves Test manager, Developers and Dedicated & Ad-hoc testers. It directly reports to Project manager and/or Development manager.

#3.2 Tasks and Responsibilities

QA team performs following tasks.
  1. Doing unit testing;
  2. Doing functionality testing;
  3. Doing usability testing;
  4. Managing testing documentation;
  5. Managing testing guidelines;
  6. Reviewing test cases;
  7. Doing continuous testing for core code;
  8. Providing training to new members;
Breakdown of the above tasks by team member is as follows:
  • Developers do unit testing of code and make sure that it doesn't break even after changes are made to it for new/updated features;
  • Dedicated and Ad-hoc testers then test those newly added/updated features in various browsers and tools to make sure that the features behave according to requirement specifications;
  • Test manager checks the created unit & functional tests and re-runs them as and when needed. He also writes complex test cases to test code and features from various aspects like stress, load bearing, sanity etc.
  • Test manager is also responsible for preparing testing plans and guidelines for existing and new testers. Test manager also manages documentation of test cases and suites in the repository for proof of quality control and easier access.
  • Development and Test managers work together to run unit tests continuously on the test server for consistent quality checks.
  • Training for new testers is usually provided by the Test manager as and when needed during the development life cycle.
Details about each task are provided in the following sections.

#4 Testing overview

Software testing is the process used to help identify the correctness, completeness, security, and quality of developed computer software. Testing is a process of technical investigation, performed on behalf of stakeholders, that is intended to reveal quality-related information about the product with respect to the context in which it is intended to operate.

#4.1 Unit testing

When it comes to source code, unit testing is performed in order to be sure that there is no problem at code level. In unit testing, a unit is a part of the code, which can be a function, a method or a whole class. In short, unit testing tests the software's code piece by piece. Hence successful unit testing shows that the software is ready for functional testing.

PHPUnit is a member of the xUnit family of testing frameworks and provides both a framework that makes the writing of tests easy as well as the functionality to easily run the tests and analyse their results.

#4.2 Functionality testing

Purpose of functionality testing is to test whole system (or software sometimes) using automatic and manual tools. Whole system may include PHP code, JS code, HTML etc. In this type of testing, various components of software are tested together. Software like selenium is used to perform functionality testing.

Selenium IDE is an integrated development environment for Selenium tests. It is implemented as a Firefox extension, and allows you to record, edit, and debug tests.

#4.3 User (Usability) testing

Usability testing is for measuring how well people can use some human-made object (such as a web page, a computer interface, a document, or a device) for its intended purpose, i.e. usability testing measures the usability of the object. Usability testing focuses on a particular object or a small set of objects, whereas general human-computer interaction studies attempt to formulate universal principles. Software like ab and mysqlprofile can be used to perform usability testing.

#5 Deliverables

QA deliverables include testing guidelines, manual and automated test cases, statistical and coverage reports and overall documentation related to QA. These are well organized and kept in version control for concerned project's application.

#6 Testing Guidelines

Testing guidelines and goals help achieve highest standards in quality control. They have been prepared to assist developers and testers to write effective test cases and to document them for various purposes.

#6.1 Unit testing

To read unit testing related guidelines and principles, read section #4 in this document.

#6.2 Functionality testing

To read functional testing related guidelines and principles, read section #4 in this document.

#7 Review process

Review of test cases means that test colleagues and/or the test manager check the test cases to make sure that they are correct and that nothing in the requirements has been misunderstood or missed out. Since review is a continuous process, the test manager regularly checks previously designed test cases and re-runs them whenever needed.
Review process comprises the following things:
  1. Completeness- Do test cases contain enough conditions of code and features to be tested?
  2. Accuracy- Are test cases accurately defined and logical? Are there missing elements?
  3. Testable- Are test cases testable?
#8 Continuous testing

Continuous testing uses excess cycles on a workstation to continuously run regression tests in the background, providing rapid feedback about test failures as source code is edited. It reduces the time and energy required to keep code well-tested, and prevents regression errors from persisting uncaught for long periods of time.

For more information about continuous testing, please refer to this document.

#9 Training

Training is important part of any department during software development life cycle. Test manager regularly provides training to QA team to improve testing procedure and to make it as effective as possible.

#10 References

http://phing.info/trac/
 

17 Jul 2009

PHP project build system

#1 Pre-requisites

Before reading this document, it is required to read unit testing guidelines, which is part of this package.

#2 PHPUnit, CVS and Phing

For a typical web based software, our main requirements are to automate several routine tasks like unit testing, updating latest data from version control, loading packages on different servers, cleaning up log files and performing other daily routine task to save manual work.

Hence here we will have brief look at how Phing can be used to automate some of the above mentioned tasks. But before that let's see what Phing is and how it works.

#2.1 What is Phing

Phing (PHing Is Not GNU make) is a project build system based on Apache Ant. In the context of PHP, where it is not required to build and compile sources, the intention of Phing is to ease the packaging, deployment, and testing of applications. For these tasks, Phing provides numerous out-of-the-box operation modules ("tasks") and an easy-to-use, object-oriented model for adding our own custom tasks.
Phing can be installed using the PEAR Installer, as shown in the following command line:

$ pear channel-discover pear.phing.info
$ pear install phing/phing

For other modes of installation and more details read Phing documentation.
Note: Please do not forget to install any other dependencies that Phing might ask for during its installation.

#2.2 How it works

Phing uses XML buildfiles that contain a description of the things to do. The buildfile is structured into targets (groups of task/s) that contain the actual commands to perform (e.g. commands to copy a file, delete a directory, perform a DB query, etc.). So, to use Phing, we would first write our buildfile and then run phing specifying the target in buildfile that we want to execute.

$ phing -f [mybuildfile.xml] [mytarget]

By default Phing will look for a buildfile named build.xml (so you don't have to specify the buildfile name unless it is not build.xml) and if no target is specified Phing will try to execute the default target, as specified in the <project> tag.

A valid Phing buildfile has the following basic structure:
  1. The document prolog,
  2. Exactly one root element called <project>,
  3. Several Phing type elements (e.g. <property>, <fileset>, <patternset> etc.),
  4. One or more <target> elements containing built-in or user defined Phing task elements (e.g. <copy>, <delete>, <mkdir> etc).
It is beyond the scope of this document to explain the above structure; please read the Phing documentation.

#2.3 Phing and automated testing

Phing has several built in tasks to perform various operations. Some of them, for our purpose, are exec, delete, mkdir, coverage-setup, phpunit, phpunitreport and coverage-report.
  1. exec task can be used to execute any command of OS environment and applications;
  2. delete can be used to remove files/folders;
  3. mkdir to make new directories;
  4. phpunit to run unit tests created by developers (as described in this document);
  5. phpunitreport to generate report about unit tests;
  6. coverage-setup and coverage-report tasks are for providing detailed report of code-coverage analysis.
#2.5 Agile documentation

Continuous testing using Phing is performed every day on the test server, and 2 types of documentation are generated at the following locations for every project:
  1. tests/unittests/reports/index.html and
  2. tests/unittests/reports/coverage/index.html (for reporting purpose)
Since these reports are generated on the fly and updated after every run, they do not need to be added to version control.

#2.6 Guidelines about using Phing
  1. Phing has vast application areas, hence in the beginning it is used for limited tasks only, like updating the local repository from CVS, running unit tests, code analysis, packaging and unpacking files etc. Later it will be expanded to handle more complex tasks.
  2. A build management system such as Phing should not be used by individual developers, as it affects the whole system. Hence it is taken care of by a dedicated test/project/development manager.
  3. Phing can also be extended with custom tasks, but that requires extensive study of its existing functionality.
#3 Links

http://phing.info/docs/guide/current/
http://www.phpunit.de/manual/3.3/en/build-automation.html#build-automation.phing

26 May 2009

Unit testing guidelines

#1 Introduction

This document provides detailed information about how to unit test the code of PHP based software. The intended audience of this document consists of developers, the testing team, project managers and stakeholders.

#2 What is unit testing

Unit testing is a software verification and validation method in which a programmer tests that individual units of source code are fit for use. A unit is the smallest testable part of an application. In procedural programming a unit may be an individual program, function, procedure, etc., while in object-oriented programming, the smallest unit is a class, which may belong to a base/super class, abstract class or derived/child class.

The goal of unit testing is to isolate each part of the program and to show that the individual parts are correct. Unit testing provides a strict, written contract that the piece of code must satisfy. As a result, it affords several benefits. In this article we will follow PHPUnit as a unit testing software.

#2.1 Advantages of unit testing

#2.1.1 Facilitates change

Unit testing allows the programmer to refactor code at a later date, and make sure the module still works correctly (i.e. regression testing). This provides the benefit of encouraging programmers to make changes to the code since it is easy for the programmer to check if the piece is still working properly. Good unit test design produces test cases that cover all paths through the unit with attention paid to loop conditions. In continuous unit testing environments, through the inherent practice of sustained maintenance, unit tests will continue to accurately reflect the intended use of the executable and code in the face of any change. Depending upon established development practices and unit test coverage, up-to-the-second accuracy can be maintained.

#2.1.2 Simplifies integration

Unit testing helps eliminate uncertainty in the units themselves and can be used in a bottom-up testing style approach. By testing the parts of a program first and then testing the sum of its parts, integration testing becomes much easier.

A heavily debated matter exists in assessing the need to perform manual integration testing. While an elaborate hierarchy of unit tests may seem to have achieved integration testing, this presents a false sense of confidence since integration testing evaluates many other objectives that can only be proved through the human factor. Some argue that given a sufficient variety of test automation systems, integration testing by a human test group is unnecessary. Realistically, the actual need will ultimately depend upon the characteristics of the product being developed and its intended uses.

#2.1.3 Documentation

Unit testing provides a sort of "living document". Clients and other developers looking to learn how to use the module can look at the unit tests to determine how to use the module to fit their needs and gain a basic understanding of the API.

Unit test cases embody characteristics that are critical to the success of the unit. These characteristics can indicate appropriate/inappropriate use of a unit as well as negative behaviors that are to be trapped by the unit. A unit test case, in and of itself, documents these critical characteristics, although many software development environments do not rely solely upon code to document the product in development.

Ordinary documentation, on the other hand, is more susceptible to drifting from the implementation of the program and will thus become outdated (e.g. design changes, feature creep, relaxed practices to keep documents up to date).

#2.1.4 Separation of interface from implementation

Because some classes may have references to other classes, testing a class can frequently spill over into testing another class. A common example of this is classes that depend on a database: in order to test the class, the tester often writes code that interacts with the database. This is a mistake, because a unit test should never go outside of its own class boundary. As a result, the software developer abstracts an interface around the database connection, and then implements that interface with their own mock object. This results in loosely coupled code, minimizing dependencies in the system.

#2.2 Limitations of unit testing

Unit testing will not catch every error in the program. By definition, it only tests the functionality of the units themselves. Therefore, it will not catch integration errors, performance problems or any other system-wide issues. In addition, it may not be easy to anticipate all special cases of input the program unit under study may receive in reality. Unit testing is only effective if it is used in conjunction with other software testing activities.

It is unrealistic to test all possible input combinations for any non-trivial piece of software. A unit test can only show the presence of errors; it cannot show the absence of errors. Though these two limitations apply to any form of software test.

#3 PHPUnit

PHPUnit is an excellent unit testing tool for software written in PHP; it is derived from JUnit (its Java counterpart). To learn PHPUnit, a good understanding of OOP is a must.

Testing with PHPUnit is not a totally different activity from what you should already be doing. It is just a different way of doing it. The difference is between testing, that is, checking that your program behaves as expected, and performing a battery of tests, runnable code-fragments that automatically test the correctness of parts (units) of the software. These runnable code-fragments are called unit tests.

It should be clearly kept in mind that unit testing should be performed as soon as coding is finished, and not after days or months. When the software's code changes, the corresponding tests should also be updated and run again.

#3.1 How PHPUnit works

The PHPUnit API has several built-in methods to perform various kinds of testing on your code. The developer first writes code according to the requirements of the software. Then he needs to create test cases (or suites) using PHPUnit's built-in classes, and finally run those test cases (or suites) to be sure that the code works properly.
Test cases (or suites) are nothing but PHP scripts that test the software's code using PHPUnit's built-in tests. Hence there are 3 components in unit testing:
  1. Software's code
  2. PHPUnit's built in tests and
  3. Test cases (or suites).
#3.2 Installing PHPUnit

PHPUnit should be installed using the PEAR Installer. This installer is the backbone of PEAR, which provides a distribution system for PHP packages, and is shipped with every release of PHP since version 4.3.0.

The PEAR channel (pear.phpunit.de) that is used to distribute PHPUnit needs to be registered with the local PEAR environment:

$ pear channel-discover pear.phpunit.de

This has to be done only once. Now the PEAR Installer can be used to install packages from the PHPUnit channel: 

# pear install phpunit/PHPUnit


After the installation you can find the PHPUnit source files inside your local PEAR directory; the path is usually /usr/lib/php/PHPUnit.

Although using the PEAR Installer is the only supported way to install PHPUnit, you can install PHPUnit manually. For manual installation, do the following:
  • Download a release archive from http://pear.phpunit.de/get/ and extract it to a directory that is listed in the include_path of your php.ini configuration file.
  • Prepare the phpunit script:
    1. Rename the phpunit.php script to phpunit.
    2. Replace the @php_bin@ string in it with the path to your PHP command-line interpreter (usually /usr/bin/php).
    3. Copy it to a directory that is in your path and make it executable (chmod +x phpunit).
  • Prepare the PHPUnit/Util/Fileloader.php script:
    1. Replace the @php_bin@ string in it with the path to your PHP command-line interpreter (usually /usr/bin/php).
After successful installation of PHPUnit, you can use the command phpunit /PATH/TO/TESTSCRIPT to run your tests. For all the available options of this command, type phpunit --help.

#3.3 How to learn and use PHPUnit

The road map for learning and using PHPUnit is shown below.
  1. Get a good understanding of OOP.
  2. Read the whole PHPUnit documentation to understand what it is and how it works.
  3. Install PHPUnit to build up a testing environment.
  4. Study the code of all PHPUnit scripts, even if you don't understand much at first.
  5. Look at the testing examples provided by PHPUnit to understand how unit testing works.
  6. Now try to build your own test cases for the software on which you are working.
  7. After gaining enough experience of building test cases, try to build a test suite that can perform all tests at once (which is the goal of unit testing).
  8. Try to build more complex and fully automated test suites using PHPUnit's extensions and other utilities, or by building your own custom extensions according to the requirements of your software.
To create test cases in your live projects, you may need to reorganize your classes into packages and sub-packages. For test cases, you may create a new folder tests/unittests in your project's main directory and then create test cases (scripts) according to the packages and sub-packages defined above.

#4 Guidelines and principles

Here are the guidelines and principles for implementing unit testing, to be followed by the development team.

#4.1 How to write test cases 
  1. The best way to write test cases is to study the example given here and some more from the PHPUnit documentation.
  2. The 1st part of a test script contains code to include the necessary files/fixtures, while the 2nd part contains the test case class.
  3. In the beginning, develop simple tests for a single method or a collection of methods. After gaining some experience, build more complex test cases. Finally, go to suite level, which provides a fully automated testing environment.
  4. Create readable tests: write comments in asserts and in unit tests (UT). Write descriptive method names (even very long ones). Use local variables in asserts. Use constants. A UT should be readable like a book.
  5. Test cases must cover at least 75% of the code. The coverage achieved is shown in the reports generated by Phing when the tests are run from there. The danger of not implementing a unit test for every method is that the coverage may be incomplete. Just because we don't test every method explicitly doesn't mean that methods can get away with not being tested.
  6. You may need configuration parameters for the database, paths and locations of other dependent software to run unit tests. For that, a common file tests/unittests/config.inc.php has been created, which is to be included in your test cases. Since it is a common file for all tests, it is normally managed by the test/build manager. Hence, if that file needs to be updated, the test manager must be consulted.
  7. Moreover, this file should be included in your test cases using the include_once or require_once constructs only; otherwise there will be warnings/errors about duplicated code when all tests are run together as a suite from the build system.
  8. If you find it difficult to create/run test cases for any of your classes, methods etc., it means the code requires refactoring, that is, re-organizing the structure of the class/class package so that it can be tested properly.
  9. Since private methods of a class cannot be tested directly, the corresponding public method should be tested instead; every private method has a parent public method which uses it. For more information, see the Links section.
  10. The concepts that you may need to consider while building test cases (or suites) are Test-first programming, Code-coverage analysis, Refactoring, Incomplete tests, Agile documentation, Debugging Tests etc.
#4.2 How to organize and run them
  1. Developers should maintain the hierarchy of test cases in the same way as the hierarchy of the classes they test. For example, if a class file is grouped as Validate/Numeric.php then its corresponding test case should be grouped as Validate/NumericTest.php inside the folder tests/unittests. Moreover, the name of the test case class should follow the name of the class that is to be tested. The same is true for phpDocumentor tags in the test script. That is, test scripts must be maintained like any other code, following principles such as Keep It Simple, Stupid (KISS).
  2. Fixtures (such as images, data files etc.) of test cases should be kept in the same folder where the test cases reside. However, larger contents can be put in a separate folder.
  3. All test cases are to be run from the project's base directory only, as shown in the earlier example. This is necessary because it is required to generate agile documentation of the unit test cases and their classes.
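
The naming convention from point 1 can be sketched as a small helper. The function name testPathFor() is hypothetical; only the Validate/Numeric.php to tests/unittests/Validate/NumericTest.php mapping comes from the guideline above.

```php
<?php
// Hypothetical helper: maps a class file path to the path of its test case,
// following the Validate/Numeric.php -> tests/unittests/Validate/NumericTest.php
// convention.
function testPathFor(string $classFile, string $testRoot = 'tests/unittests'): string
{
    $info = pathinfo($classFile);
    // pathinfo() reports "." as the directory of a file without a folder part.
    $dir = ($info['dirname'] === '.') ? '' : $info['dirname'] . '/';
    return $testRoot . '/' . $dir . $info['filename'] . 'Test.php';
}

echo testPathFor('Validate/Numeric.php'), "\n"; // tests/unittests/Validate/NumericTest.php
```

A build script could use such a mapping to report classes that have no test case yet.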
#4.3 Testing instructions
  1. Test cases should not be designed in such a way that they require a specific order of execution; that is, each must run independently of the others.
  2. Test cases that test the mail system, database, file-system etc. require additional setup apart from the test script. Instructions about such setup and dependencies should be clearly mentioned in a text file (like README.txt) inside the corresponding test case's package. Such instructions can also contain information about how to run the test cases, other prerequisites, dependencies etc. wherever applicable.
  3. Overall instructions are to be kept in the tests/unittests/README.txt file only.
#4.4 Scope of unit testing
  1. The crucial issue in constructing a unit test is scope. If the scope is too narrow, then the tests will be trivial and the objects might pass the tests, but there will be no design of their interactions.
  2. Likewise, if the scope is too broad, then there is a high chance that not every component of the new code will get tested. The programmer is then reduced to testing-by-poking-around, which is not an effective test strategy.
  3. It is recommended to write test cases for the logical part only, i.e. static code/data need not be tested. Complex test cases, such as those for database, mailing etc., may require certain environmental settings. For those, the concerned senior person can be contacted about their availability.
The bottom line of unit testing is that it all depends upon the test cases that you create. The better the cases cover all possible areas to be tested, the more the test is worth. If your tests are poor, you won't gain the advantages of unit testing.

#4.5 What tests should be written

While doing unit testing, it is important to know what kinds of tests should be done. Some insight into the types of tests is given below.
  1. All positive test: This set of tests ensures that everything works as expected.
  2. All failure test: Use these tests on a one-by-one basis to ensure that every failure or exception case works.
  3. Positive sequence tests: This set of tests ensures that calls in the correct order work as expected.
  4. Negative sequence tests: This set of tests ensures that when calls are made out of order, they fail.
  5. Load tests: When appropriate, you can perform a small set of tests to determine that the performance of those tests is within an expected range. For example; 500 mails should be sent within 10-15 seconds.
  6. Resource tests: These tests ensure that the application program interface (API) properly allocates and frees resources-- for example, opening, writing, and closing a file-based API several times in a row to ensure that no files remain open.
  7. Callback tests: For APIs that have callback methods, these tests ensure that the code runs properly if callbacks are not defined. In addition, these tests ensure that the code runs properly when callbacks are defined but behave inappropriately or generate exceptions.
The first 4 types of tests are a must for every test case. The rest can be implemented according to the requirements of the code.

#4.6 Unit testing in the context of team
  1. Developers should unit test only the libraries they have developed and the libraries those depend on, i.e. they are not required to unit test the whole library, since other code might have requirements/dependencies of which they are not aware. That means the full test suite should be run by the Test/Project/Development manager only.
  2. The managers mentioned above can use build tools like Phing to automate the unit testing process whenever there are changes in the code, without running the tests manually every time. The procedure for using Phing in unit testing and other tasks is described in this document.
  3. No untested code should be committed to version control; that is, each package should have corresponding test cases.
#4.7 Re-usability of test cases/data
  1. Although it may seem like a good idea to throw random data at an interface, try to avoid it because the data is hard to debug. If data is generated randomly on each invocation, you may get an error on one pass that you don't get on another. If your test requires random data, generate the data in a file, then use that file on every run. In this way, you can have noisy data, but still be able to debug errors.
  2. In unit tests, since each test case is tightly attached to a particular piece of code, it is difficult to write re-usable test cases. However, whenever possible, test cases/data should be made re-usable and kept in a test library in order to re-use them in other projects.
  3. To create such a library, a version control system can be used to store the test cases/data in an organized way under a namespace like unit_test_library. Underneath that, several pages can be created to describe the nature of each test case/data set and how to use it in other projects. A similar practice can be used for functional test cases.
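
Point 1 can be sketched as follows: the random data is generated once, written to a file, and read back on every later run. The function name and the JSON file format are assumptions made for this sketch. The temporary directory is used only to keep the example self-contained; in a project the file would live with the other fixtures.

```php
<?php
// Hypothetical helper: returns the same "random" test data on every run.
// The data is generated once, stored in a fixture file, and re-read afterwards.
function loadOrCreateFixture(string $file, int $count): array
{
    if (!is_file($file)) {
        $values = [];
        for ($i = 0; $i < $count; $i++) {
            $values[] = mt_rand(-1000, 1000);
        }
        file_put_contents($file, json_encode($values));
    }
    return json_decode(file_get_contents($file), true);
}

$fixture   = sys_get_temp_dir() . '/random_numbers_fixture.json';
$firstRun  = loadOrCreateFixture($fixture, 20);
$secondRun = loadOrCreateFixture($fixture, 20);
var_dump($firstRun === $secondRun); // bool(true): both runs see identical data
```

Because the file is kept, a failure seen on one run can be reproduced on the next with exactly the same input.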

5 Feb 2009

Functional testing guidelines

#1 Introduction

This document provides detailed information about how to do functional testing of web based software. The intended audience of this document consists of the testing team, developers, project managers and stakeholders.

Please read the quality assurance guidelines document before reading this document.

#2 What is Functional Testing?

Functional testing is a testing strategy which needs little or no knowledge of internal design, code etc. The types of testing under this strategy focus entirely on testing the requirements and functionality of the work product/software application. Functional testing (FT) is sometimes also called Black Box Testing, Opaque Testing, Behavioral Testing, Closed Box Testing etc.

The base of the FT strategy lies in the selection of appropriate data as per functionality and testing it against the functional specifications in order to check for normal and abnormal behavior of the software.

In order to implement the FT strategy, the tester needs to be thorough with the requirement specifications of the system and, as a user, should know how the system should behave in response to a particular action.

FT follows the TDD methodology, so functional tests should be created once the testers have exercised the code for completed stories (functionalities) to verify that the functional requirements have been met. The tests should be kept synchronized with the functionality of the project.

#2.1 Advantages
  1. Tester needs little or no knowledge of implementation, including specific programming languages.
  2. Tester and programmer are independent of each other.
  3. Tests are done from a user's point of view.
  4. Will help to expose any ambiguities or inconsistencies in the specifications.
  5. Test cases can be designed as soon as the specifications are complete.
#2.2 Disadvantages
  1. Only a small number of possible inputs can actually be tested; testing every possible input stream would take nearly forever.
  2. Without clear and concise functional requirement specifications, test cases are hard to design.
  3. There may be unnecessary repetition of test inputs if the tester is not informed of test cases the programmer has already tried.
  4. May leave many program paths untested.
  5. Cannot be directed toward specific segments of code which may be very complex (and therefore more error prone).
#3 Implementation

For web based software, test cases are first designed in HTML format for reporting, documentation and verification; then their corresponding test cases are recorded and executed using the Selenium software.

Selenium is an open source tool for doing automated functional tests of web-based applications. It is based on JavaScript to a large extent. Its simplicity and robustness make it an excellent candidate for introducing automated functional testing in our project.

Selenium-IDE operates as a Firefox add-on and provides an easy-to-use interface for developing and running individual test cases or entire test suites. It has a recording feature, which keeps account of user actions as they are performed and stores them as a reusable script to play back. It also has a context menu (right-click) integrated with the Firefox browser, which allows the user to pick from a list of assertions and verifications for the selected location. Selenium-IDE also offers full editing of test cases for more precision and control.

Hence there will be 2 types of artifacts for functional test cases:
  1. Designed test cases in HTML format.
  2. Corresponding Selenium test case.
#4 Guidelines and principles

The QA team should follow the guidelines and principles below for designing, executing and documenting functional test cases.

#4.1 Designing test cases
  1. Test cases are directly mapped to the Functional Requirement Specification (FRS). Hence each FRS should have a corresponding test case.
  2. Test cases should be written in enough detail that they could be given to a new team member who would be able to quickly start to carry out the tests and find defects.
  3. Test cases should first be designed in HTML format as shown in the following table. That is, there have to be the sections Unique ID, Purpose, URL, Prerequisites (optional), Test data, Steps, Notes/Questions (optional). After that, each test case will be recorded and executed using Selenium IDE.
  4. The crucial issue in constructing a functional test case is the selection of the range of input data. If the range is too narrow, then the tests will be trivial and the objects might pass the tests.
#4.1.1 Test case format

UniqueTestCaseID: Test Case Title
Purpose:
A short sentence or two about the aspect of the system that is being tested. If this gets too long, break the test case up or put more information into the feature descriptions.
URL:
URL where test is to be performed.
Prerequisites:
Assumptions that must be met before the test case can be run. e.g., logged in, guest login allowed, user testuser exists.
Test Data:
List of variables and their possible values used in the test case. You can list specific values or describe value ranges. The test case should be performed once for each combination of values. These values are written in set notation, one per line, like;

loginID = {Valid loginID, invalid loginID, valid email, invalid email, empty}
password = {valid, invalid, empty}
Steps:
Steps to carry out the test. See step formatting rules below.
  1. visit LoginPage
  2. enter loginID
  3. enter password
  4. click login
  5. see the terms of use page
  6. click agree radio button at page bottom
  7. click submit button
  8. see PersonalPage
  9. verify that welcome message shows the correct username
Notes and Questions:
If any
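
Since the test case is to be performed once for each combination of values in the Test Data section, the number of runs grows quickly. The sketch below enumerates the combinations for the loginID and password sets shown above. The function name combinations() is hypothetical.

```php
<?php
// Builds every combination of the values listed in a Test Data section.
// $testData maps a variable name to its list of possible values.
function combinations(array $testData): array
{
    $result = [[]];
    foreach ($testData as $name => $values) {
        $next = [];
        foreach ($result as $partial) {
            foreach ($values as $value) {
                // Extend each partial combination with one more variable.
                $next[] = $partial + [$name => $value];
            }
        }
        $result = $next;
    }
    return $result;
}

$testData = [
    'loginID'  => ['valid loginID', 'invalid loginID', 'valid email', 'invalid email', 'empty'],
    'password' => ['valid', 'invalid', 'empty'],
];

$runs = combinations($testData);
echo count($runs), "\n"; // 15 (5 loginID values x 3 password values)
```

Each element of $runs is one set of values with which the steps would be carried out.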

Each step can be written very tersely using the following keywords:

login [as ROLE-OR-USER]

Log into the system with a given user or a user of the given type. Usually only stated explicitly when the test case depends on the permissions of a particular role or involves a workflow between different users.

visit LOCATION

Visit a page or screen. For web applications, LOCATION may be a hyperlink. The location should be a well-known starting point (e.g., the Login screen), drilling down to specific pages should be part of the test.

enter FIELD-NAME [as VALUE] [in SCREEN-LOCATION]

Fill in a named form field. VALUE can be a literal value or the name of a variable defined in the Test Data section. The FIELD-NAME itself can be a variable name when the UI field for that value is clear from context, e.g., enter password.

enter FIELDS

Fill in all fields in a form when their values are clear from context or when their specific values are not important in this test case.

click "LINK-LABEL" [in SCREEN-LOCATION]

Follow a labeled link or press a button. The screen location can be a predefined panel name or English phrase. Predefined panel names are based on GUI class names, master template names, or titles of boxes on the page.

click BUTTON-NAME [in SCREEN-LOCATION]

Press a named button. This step should always be followed by a see step to check the results.

see SCREEN-OR-PAGE

The tester should see the named GUI screen or web page. The general correctness of the page should be testable based on the feature description.

verify CONDITION

The tester should see that the condition has been satisfied. This type of step usually follows a see step at the end of the test case.

verify CONTENT [is VALUE]

The tester should see the named content on the current page, the correct values should be clear from the test data, or given explicitly. This type of step usually follows a see step at the end of the test case.

perform TEST-CASE-NAME

This is like a subroutine call. The tester should perform all the steps of the named test case and then continue on to the next step of this test case.

Every test case must include a verify step at the end so that the expected output is very clear. A test case can have multiple verify steps in the middle or at the end. Having multiple verify steps can be useful if you want a smaller number of long tests rather than a large number of short tests.
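
The two rules above (every step starts with one of the keywords, and the last step is a verify step) can be checked mechanically. The following is only a sketch; the function name checkSteps() and the wording of the messages are assumptions.

```php
<?php
// Keywords taken from the step formatting rules above.
const STEP_KEYWORDS = ['login', 'visit', 'enter', 'click', 'see', 'verify', 'perform'];

// Hypothetical checker: returns a list of problems found in a test case's steps.
function checkSteps(array $steps): array
{
    $problems = [];
    $lastKeyword = false;
    foreach ($steps as $index => $step) {
        // The first word of a step is its keyword.
        $lastKeyword = strtok(trim($step), ' ');
        if (!in_array($lastKeyword, STEP_KEYWORDS, true)) {
            $problems[] = sprintf('Step %d does not start with a known keyword: "%s"', $index + 1, $step);
        }
    }
    if ($lastKeyword !== 'verify') {
        $problems[] = 'The last step must be a verify step.';
    }
    return $problems;
}

$steps = [
    'visit LoginPage',
    'enter loginID',
    'enter password',
    'click login',
    'see PersonalPage',
    'verify that welcome message shows the correct username',
];
var_dump(checkSteps($steps) === []); // bool(true): no problems found
```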

#4.2 Recording and Executing test cases

Once test cases are designed, they are recorded and executed using Selenium. Selenium IDE, an easy-to-use Firefox plug-in, is the tool for developing Selenium test cases and is generally the most efficient way to develop them.

While recording and executing Selenium test cases, the following checklist is to be kept in mind.
  • Links
  1. Check that the link takes you to the page it said it would.
  2. Ensure that there are no orphan pages (pages that have no links to them).
  3. Are all referenced web sites or email addresses hyperlinked if not generated by image?
  4. Check all of your links to other websites.
  5. If some pages have been removed from our own site, set up a custom 404 page that redirects visitors to the home page (or a search page) when they try to access a page that no longer exists.
  6. Check all mailto links and whether the mail is delivered properly.
  • Forms
  1. Acceptance of invalid input.
  2. Optional versus mandatory fields.
  3. Input longer than field allows.
  4. Default values on page load/reload.
  5. Can command buttons be used for hyperlinks and Continue links?
  6. Is all the data inside combo/list boxes arranged in the required order?
  7. Are all of the parts of a table or form present? Correctly laid out? Can you confirm that selected texts are in the right place?
  8. Does a scroll-bar appear if required? 
  • Data verification and validation
  1. Is the Privacy Policy clearly defined and available for user access?
  2. At no point in time should the system behave awkwardly when invalid data is fed to it.
  3. Check to see what happens if a user deletes cookies while in site.
  4. Check to see what happens if a user deletes cookies after visiting a site.
  • Data integration
  1. Check the maximum field lengths to ensure that there are no truncated characters.
  2. If numeric fields accept negative values, can these be stored correctly in the database, and does it make sense for the field to accept negative numbers?
  3. If a particular set of data is saved to the database, check that each value gets saved fully, i.e. beware of truncation (of strings) and rounding of numeric values.
  • Date field checks
  1. Assure that leap years are validated correctly & do not cause errors/miscalculations.
  2. Assure that Feb. 28, 29, 30 are validated correctly & do not cause errors/ miscalculations.
  • Numeric fields
  1. Assure that lowest and highest values are handled correctly.
  2. Assure that numeric fields with a blank in position 1 are processed or reported as an error.
  3. Assure that fields with a blank in the last position are processed or reported as an error.
  4. Assure that both + and - values are correctly processed.
  5. Assure that division by zero does not occur.
  6. Include value zero in all calculations.
  7. Assure that upper and lower values in ranges are handled correctly.
  • Alphanumeric field checks
  1. Use blank and non-blank data.
  2. Include lowest and highest values.
  3. Include invalid characters & symbols.
  4. Include valid characters.
  5. Include data items with first position blank.
  6. Include data items with last position blank.
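
The date and numeric checks above correspond to server-side validation that the application under test would need to get right. A minimal sketch of such validation follows. The function names are hypothetical, and treating a leading or trailing blank as an error (rather than trimming it) is an assumption.

```php
<?php
// Hypothetical validators matching the date and numeric field checks above.
function isValidDate(int $year, int $month, int $day): bool
{
    // checkdate() takes month, day, year (in that order) and knows about leap years.
    return checkdate($month, $day, $year);
}

function isInRange($value, int $min, int $max): bool
{
    // Accepts an optional sign followed by digits only, so values with a
    // blank in the first or last position are reported as errors.
    if (!is_string($value) || !preg_match('/^[+-]?\d+\z/', $value)) {
        return false;
    }
    $number = (int) $value;
    return $number >= $min && $number <= $max;
}

var_dump(isValidDate(2008, 2, 29)); // bool(true): 2008 is a leap year
var_dump(isValidDate(2009, 2, 29)); // bool(false)
var_dump(isValidDate(2008, 2, 30)); // bool(false)
var_dump(isInRange(' 5', 0, 10));   // bool(false): blank in first position
var_dump(isInRange('-3', -5, 5));   // bool(true)
```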
#4.3 Organizing test cases

Once test cases are successfully designed, recorded and executed, they should be stored on the file-system so that they can be re-run for regression testing, documentation, demonstration etc. Certain standards and conventions have been designed for that, as shown below.

#4.3.1 HTML test cases
  1. An HTML test case should be named ModuleAction.html. For example, if the module is User and the action is Login, then the test case name would be UserLogin.html. Some more examples are UserAdd.html, ImageAttach.html etc.
  2. These test cases are to be organized into a single test suite so that they can be viewed at the same time. The name TestSuite.html is to be used for that. A test suite document is an organized table of contents for our test cases: it simply lists the names of all test cases that we intend to write. The suite can be organized in several ways. For example, we can list all the system components, and then list test cases under each. Or, we could list major product features, and then list test cases for each of those.
  3. The test suite is organized in a grid where the rows are types of business objects or module names and the columns are types of operations or actions. Each cell in the grid lists test cases that test one type of operation on one type of object. Each individual operation/action should also contain a hyperlink to the corresponding test case document. The format of TestSuite.html should be as follows:
Test cases by modules and actions or business objects and operations

Modules/Actions (BOs/Operations) | add | list/browse | edit | delete | search
User | UserAdd | UserList | UserEdit | UserDelete | UserSearch
Ad | AdAdd | AdList | AdEdit | AdDelete | AdSearch
Club | ClubAdd | ClubList | ClubEdit | ClubDelete | ClubSearch
Element | ElementAdd | ElementList | ElementEdit | ElementDelete | ElementSearch
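
The grid follows directly from the ModuleAction naming convention, so its cells could be generated rather than typed by hand. A minimal sketch, with a hypothetical function name and the list/browse column shortened to list:

```php
<?php
// Builds the TestSuite grid: one row per module, one test case name per action.
function buildSuiteGrid(array $modules, array $actions): array
{
    $grid = [];
    foreach ($modules as $module) {
        foreach ($actions as $action) {
            // ModuleAction naming convention, e.g. User + add => UserAdd
            $grid[$module][$action] = $module . ucfirst($action);
        }
    }
    return $grid;
}

$grid = buildSuiteGrid(
    ['User', 'Ad', 'Club', 'Element'],
    ['add', 'list', 'edit', 'delete', 'search']
);
echo $grid['Club']['delete'], "\n"; // ClubDelete
```

Appending .html to a cell value gives the file name that the cell's hyperlink would point to.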

Once all these objects are created, they should be stored in the tests/functionaltests/html/ location of the concerned project. Other artifacts like documents, instructions etc. should also be put in the same location.

Since these objects are for internal purposes, they need not be publicly visible. However, a separate domain or sub-domain can be created to view them from the web if needed.

#4.3.2 Selenium test cases
  1. A Selenium test case should also be named ModuleAction.html. For example, if the module is User and the action is Login, then the test case name would be UserLogin.html. Some more examples are UserAdd.html, ImageAttach.html etc.
  2. These test cases are to be organized into a single test suite HTML file so that they can be re-run at the same time. The name TestSuite.html is to be used for that.
  3. Once all these objects are created, they should be stored in the tests/functionaltests/selenium/ location of the concerned project. Other artifacts like documents, instructions etc. should also be put in the same location.
  4. Since these objects are for internal purposes, they need not be publicly visible. However, a separate domain or sub-domain can be created to re-run them from the web on other servers if needed.
#4.4 Scope of testing
  1. Minimum scope of overall testing procedure is to test all features in all popular browsers and platforms.
  2. Stress testing should be performed (try to overload the program with inputs to see where it reaches its maximum capacity), especially with real time systems.
  3. Crash testing should be performed to see what it takes to bring the system down.
  4. Other scope can be defined according to FRS when test plan is defined.
#4.5 What tests should be written

While doing functional testing, it is important to know what kinds of tests should be done. Some insight into the types of tests is given below.
  1. All positive tests: This set of tests ensures that everything works as expected.
  2. All failure tests: Use these tests on a one-by-one basis to ensure that every failure or exception case works.
  3. Positive sequence tests: This set of tests ensures that calls in the correct order work as expected.
  4. Negative sequence tests: This set of tests ensures that when calls are made out of order, they fail.
  5. Load tests: When appropriate, you can perform a small set of tests to determine that the performance of those tests is within an expected range. For example, while uploading an image, how large a file the software is able to handle.
The first 4 types of tests are a must for every test case. The rest can be implemented according to the FRS.

#4.6 Functional testing in the context of team

The functional testing team is also part of the development team, hence its members should be included in every meeting, discussion, planning session etc. to make them aware of the functionalities that are to be developed.

The testing strategy described here is to be performed by dedicated testers only. However, standard practice is that developers do functional testing immediately after they develop the required functionality. Then testing related to that functionality can be given to dedicated testers to test it from the user's point of view.

The FT team may not always have daily testing tasks, hence in their spare time they can also assist developers in developing new functionality if they wish.

#5 Links

9 Sept 2008

SEO guidelines

#1 About this document

This document aims to define generic search engine optimization requirements for various projects.

At this moment this document contains general SEO guidelines. In future, when training sessions are held, this document will be expanded further so that it can be used as a resource for almost all SEO requirements.

#2 General requirements

#2.1 Server location

The server should be located in the same country from which it will be accessed the most. Moreover, if the service will have its own domain then it should reside on a dedicated server. Wildcard DNS should not be allowed, and all sub-domains, if any, should be activated separately.

#2.2 Robots.txt

The robots.txt file is to be placed in the root (value of DocumentRoot directive in case web server is Apache) directory of the software. It should allow the search engines to crawl all directories where information related to various entities will be shown.

Personal pages that require login, such as pages listing the owner's entities or pages for posting/editing entities, should be blocked. Most search engines nowadays are able to detect this behavior, hence you may omit such entries from the robots.txt file.

#2.3 Encoding

If special characters are used in the language of the website, they will need to be encoded in URLs (for example using a PHP function like urlencode()) and in filenames using UTF-8 encoding. For full documentation of the encoding of such characters, please visit http://www1.tip.nl/~t876506/utf8tbl.html. As practically all browsers support the Unicode UTF-8 standard, it should not be necessary to encode the characters in the actual content. If needed anyhow, the suitable HTML entities can be taken from this address: http://leftlogic.com/lounge/articles/entity-lookup/.

There should be a 301 redirect from any URL written with the raw special characters to the URL with the encoded ones. This covers the case where someone types the URL using the special characters and their browser overrides the UTF-8 character set with some other character set. See how Wikipedia functions for an example. This prevents links with the wrong character set from being used on external pages.
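
The encoding and the redirect decision can be sketched as below. The function names are hypothetical. rawurlencode() is used here instead of urlencode() because it encodes a space as %20 rather than +, which suits URL paths. The sketch assumes the incoming bytes are already UTF-8; converting from another character set is not covered.

```php
<?php
// Hypothetical helper: percent-encodes each path segment, keeping the slashes.
function encodePath(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// Returns the encoded path to redirect to, or null when the requested
// path is already in its encoded form.
function canonicalRedirect(string $requestedPath): ?string
{
    $encoded = encodePath(rawurldecode($requestedPath));
    return ($encoded === $requestedPath) ? null : $encoded;
}

$raw = "/m\xC3\xB8bler/bord";                        // "/møbler/bord" as raw UTF-8 bytes
echo encodePath($raw), "\n";                         // /m%C3%B8bler/bord
var_dump(canonicalRedirect('/m%C3%B8bler/bord'));    // NULL: already encoded

// In a real request handler the result could be used as:
// $target = canonicalRedirect($path);
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```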

#2.4 Header responses

#2.4.1 Page not found (404 error)

Entities that are removed from the database/software should not be shown. When someone accesses a removed listing's page, the server should respond with a 404 header response (and not a 200 response) and show an error message (or optionally a separate page) saying that the entity is already deleted/expired/sold etc. Furthermore, the relevant listing page should be shown.
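
A minimal sketch of this behavior in PHP follows. The function names, the message text and the shape of the entity array are assumptions; a null entity stands for a lookup that found nothing.

```php
<?php
// Hypothetical lookup result: null when the entity no longer exists.
function statusCodeFor(?array $entity): int
{
    return ($entity === null) ? 404 : 200;
}

function renderListing(?array $entity): string
{
    if ($entity === null) {
        return 'This listing has been deleted, has expired or has been sold.';
    }
    return 'Listing: ' . htmlspecialchars($entity['title']);
}

$entity = null; // e.g. the result of a database lookup that found nothing

// Send the status code before any output, then show the message.
http_response_code(statusCodeFor($entity)); // 404 instead of the default 200
echo renderListing($entity), "\n";
```

The important point is that the error message alone is not enough: the status code has to change as well, otherwise search engines keep indexing the removed page.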

#2.4.2 Redirects (301 response)

As a general rule of thumb, all redirects should be done using the 301 Moved Permanently response. All sub-domains should be redirected this way (example.com -> 301 -> www.example.com), and also all other domains that contain the same information, as shown below:

www.example.net -> 301 -> www.example.com
www.example.in -> 301 -> www.example.com

Also make sure that only the specified URLs work, and add a 301 redirect rule for all non-specified URLs that are requested.
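
The host redirects above can be sketched in PHP as follows. The constant, the function name and the http:// scheme are assumptions; the same rule is often implemented in the web server configuration instead.

```php
<?php
// Hypothetical canonical host; every other host name is redirected to it.
const CANONICAL_HOST = 'www.example.com';

// Returns the URL to 301-redirect to, or null when the host is already canonical.
function redirectTarget(string $host, string $requestUri): ?string
{
    if (strtolower($host) === CANONICAL_HOST) {
        return null;
    }
    return 'http://' . CANONICAL_HOST . $requestUri;
}

// In a front controller this could be used as:
// $target = redirectTarget($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }

echo redirectTarget('example.com', '/furniture/tables'), "\n"; // http://www.example.com/furniture/tables
var_dump(redirectTarget('www.example.com', '/'));              // NULL
```

Passing 301 as the third argument of header() matters here, because a Location header on its own makes PHP send a 302 response.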

#3 General page requirements

#3.1 Using standards

The site should comply with the World Wide Web Consortium's (http://www.w3.org/) recommendations for creating web pages (XHTML 1.0 Transitional should be enough) and also comply with the Americans with Disabilities Act (http://www.ada.gov/) if required.

#3.2 Page design

The pages should be designed with CSS positioning and the content part of the page should appear in the source code as early as possible preferably before other body content such as navigational blocks.

The navigation should be implemented with anchor tags and text and the links should not redirect.

Breadcrumb navigation would improve SEO through internal back links, and usability in the sense that visitors would see their location on the site. Example of breadcrumb navigation: Home => List furniture items => View table => ...

Scripts and other elements (CSS) should be put in external files. The source code should be kept clean with little or no unused code. The preferred maximum file size for HTML code is 100 KB.

#3.3 Elements of a page

The following elements should always be included (and be editable somehow) on a page which is to be indexed by search engines:
  • Page title (<title> element in the header)
  • Meta description, robots and keywords (in the header)
  • Page heading (one <h1> per page)
#3.3.1 Page title (<title> element in the header)

A page title should be as specific and concise as possible with respect to the document. This will ensure its uniqueness and click-through in Search Engine Result Pages. A structure similar to "Page name | Section name | Site title - Tagline" is encouraged for clarity, uniqueness and better usability for the visitor. Focus on delivering a title that goes from specific keywords (closer to the beginning) to general ones. The title should be no longer than 80 characters.
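
One way to apply the structure and the 80-character limit together is sketched below. The function name is hypothetical, and dropping the most general parts first when the title is too long is an assumption, not a rule from this document. The sketch relies on PHP's mbstring extension so that special characters are counted as single characters.

```php
<?php
// Hypothetical helper: builds "Page name | Section name | Site title - Tagline"
// and drops the most general parts first when the length limit is exceeded.
function buildTitle(string $page, string $section, string $site, string $tagline, int $max = 80): string
{
    $candidates = [
        "$page | $section | $site - $tagline",
        "$page | $section | $site",
        "$page | $section",
        $page,
    ];
    foreach ($candidates as $title) {
        if (mb_strlen($title, 'UTF-8') <= $max) {
            return $title;
        }
    }
    // Even the page name alone is too long: cut it at the limit.
    return mb_substr($page, 0, $max, 'UTF-8');
}

echo buildTitle('Round tables', 'Tables', 'Example Furniture', 'Quality office furniture'), "\n";
// Round tables | Tables | Example Furniture - Quality office furniture
```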

#3.3.2 Meta description, robots and keywords (in the header)

An HTML meta description of around 150 characters should be sufficient, although it doesn't hurt if it is a little longer. This data should contain the most concise information about the document. The uniqueness of this information also plays a fair role as far as Search Engine Result Pages are concerned.

Meta keywords, on the other hand, are not really necessary, since it is the responsibility of the search engine indexers to determine the nature and the relevancy of the document. For the purposes of accuracy, they can't rely on what the document claims to be. There comes a transition on the Web which provides this sort of meta information about the document. Today, the results gained from meta keywords are negligible. Some examples of well written meta tags are shown below:

<meta name="description" content="Suppliers of quality office furniture and accessories at discount prices.">
<meta name="keywords" content="furniture, office, store, shop, retail, discount">
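
If these tags are generated from editable fields, the values should be escaped before being written into the header. The following is a minimal sketch; render_meta_tags() is a hypothetical helper name, not an existing function.

```php
<?php
// Hypothetical helper: renders the description and keywords meta tags
// from editable values, escaping them for use inside HTML attributes.
function render_meta_tags(string $description, array $keywords): string
{
    $desc = htmlspecialchars(trim($description), ENT_QUOTES, 'UTF-8');
    $keys = htmlspecialchars(implode(', ', $keywords), ENT_QUOTES, 'UTF-8');

    return '<meta name="description" content="' . $desc . '">' . "\n"
         . '<meta name="keywords" content="' . $keys . '">';
}

echo render_meta_tags(
    'Suppliers of quality office furniture and accessories at discount prices.',
    ['furniture', 'office', 'store', 'shop', 'retail', 'discount']
);
```

Called with the values shown, this produces the two meta tags from the example above.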

#3.3.3 Page heading (one <h1> per page)

A properly structured document will consist of headings, paragraphs, lists, tables, and forms, and use an external stylesheet to style them. Many search engines place more emphasis on text within heading tags (and not just on keywords provided in meta elements), so make sure your headings use keywords. Use one <h1> tag per page with the most important keywords. You can also use other heading tags (<h2>, <h3> etc.) to provide variations and support the main heading.

Some examples of heading tags are:

<h1>Tables</h1>
<h2>Round tables</h2>
<p>... information about round tables ...</p>
<h2>Square Desks</h2>
<p>... information about square desks, etc.</p>

#3.3.4 Body text

Make sure the text of your web pages contains keywords and common phrases which people might search for. Be careful with the frequency of your keywords: you want them to occur at least a few times if possible, but don't repeat yourself so much that the copy becomes unnatural. The idea is to discreetly spread keywords around without making it obvious.

A well-written document will naturally use keywords that are appropriate and in proportion. Search engine algorithms essentially compare similar documents to get a better understanding of the nature of the document. If a document is not well written and gives unbalanced scores, search engines may flag it and possibly mark it as not relevant, since this indicates a document written for the machine and not for the human reader. Keep in mind that indexing is in place to assist human searches. An example of good body text could be:

<p>Buy office furniture at affordable prices from any of our retail stores.</p>

#3.3.5 Images and Pictures

When pictures that are not part of the page template are used, they should always include an ALT description. This description should either be automated or editable (this is partly already a requirement of the Americans with Disabilities Act).
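
One possible way to automate the ALT description is to fall back to a text derived from the file name whenever the editor has not supplied one. This is only a sketch; alt_from_filename() and render_image() are hypothetical names, and a real project may prefer a description stored with the entity.

```php
<?php
// Hypothetical helper: derives a readable ALT text from an image file name,
// e.g. "round-oak_table.jpg" becomes "Round oak table".
function alt_from_filename(string $src): string
{
    $name = pathinfo($src, PATHINFO_FILENAME);
    return ucfirst(str_replace(['-', '_'], ' ', $name));
}

// Hypothetical helper: uses the editor-supplied ALT text when present,
// otherwise the automated one.
function render_image(string $src, string $alt = ''): string
{
    if (trim($alt) === '') {
        $alt = alt_from_filename($src);
    }
    return '<img src="' . htmlspecialchars($src, ENT_QUOTES, 'UTF-8')
         . '" alt="' . htmlspecialchars($alt, ENT_QUOTES, 'UTF-8') . '">';
}

echo render_image('/images/round-oak_table.jpg');
```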

#3.4 Automation

The title element, the meta description and the meta keywords need to be generated automatically according to different templates. These templates will include page-specific and directory-specific elements as well as generic elements. An example of a template for the title element of a page called Search results page could be:

[Results] - [category] - Search results – My furniture example.com

Different elements that could be included are:
  • Results = Search results pages (New, Old, All)
  • Category name = Such as Wood tables, Wood chairs, Metal chairs
  • Area = Represents the location of the entity
  • Page number in the search results, if applicable

The category name or area might not be in its basic form; different grammatical forms might be needed.
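
Filling such a template can be sketched with a simple placeholder replacement. The function name fill_title_template() is an assumption made for this example; the placeholder syntax follows the [Results] and [category] markers of the template above.

```php
<?php
// Hypothetical helper: replaces [name] placeholders in a template with
// the given values. Placeholders without a value are left untouched.
function fill_title_template(string $template, array $values): string
{
    $map = [];
    foreach ($values as $name => $value) {
        $map['[' . $name . ']'] = $value;
    }
    return strtr($template, $map);
}

$template = '[Results] - [category] - Search results - My furniture example.com';

echo fill_title_template($template, [
    'Results'  => 'New',
    'category' => 'Wood tables',
]);
```

The same function could be reused for the meta description and keywords templates by passing a different template string.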

In the title, meta information and headings, keywords or key phrases are added as is or in another grammatical form. When the URL is generated automatically (URL rewriting), however, it may need some encoding if a language other than English has been used:
  • Non-ASCII keywords (and phrases) included in URLs need to be percent-encoded as hex values (for example with a PHP function such as urlencode()), like:
  • www.example.com/product/table/એપલ => www.example.com/product/table/%E0%AA%8F%E0%AA%AA%E0%AA%B2
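
A minimal sketch of this encoding step is shown below. It uses rawurlencode() rather than urlencode(), because rawurlencode() encodes a space as %20 instead of +, which suits path segments; for the keyword above both functions give the same result. The helper name encode_path() is an assumption, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: percent-encodes each path segment separately so
// that the "/" separators themselves are not encoded.
function encode_path(array $segments): string
{
    return implode('/', array_map('rawurlencode', $segments));
}

echo 'www.example.com/' . encode_path(['product', 'table', 'એપલ']);
// prints www.example.com/product/table/%E0%AA%8F%E0%AA%AA%E0%AA%B2
```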

#4 Index page

The index page of your website is likely to receive the highest number of inbound links, since it is the entry point of your website. Hence linking to the other pages of the website from this page becomes very important. In theory, this page should host links to almost all pages that are reachable from here.

However, the number of links on such a page should be kept to around 100, so in many projects it may not be possible to display all links. In such cases, the most important links should be made visible here, and the remaining pages can be linked from those pages, because our purpose is to chain all important pages together so that they get indexed.

To make this work, it becomes important to identify those important links. For example, if you are selling something, then the home page can link to the pages that list items per product category. Similarly, if the items are bound to a geographical location and your website displays lists of items for sale per province/city/area, then links to those pages could be placed on this page.

#5 Search pages

Search pages, whether simple or extended, need not be indexed, as by default they do not contain any information to be searched for.

However, from a usability point of view, their URLs, page design and on-page information should be properly designed and implemented.

#6 Listing pages

Listing pages are the second most important pages of any website, as they display information about the entities for which the website was created. Listed entities can be of various types, such as items for sale, ads, jobs etc.

Such listings may contain pagination and sorting links, depending on the results and the interests of users. It is recommended to keep pagination links as plain text links so that search engines can crawl through all available pages and index them. Sorting links, however, may be implemented using JS (Ajax) etc. so that additional queries to the server are minimized. From a search engine's point of view, it doesn't matter in what order the information is displayed.
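
Plain text pagination links of this kind could be produced roughly as follows. This is a sketch only: render_pagination() and the "page" query parameter are assumptions chosen for this example.

```php
<?php
// Hypothetical helper: renders pagination as ordinary <a> links so that
// crawlers can follow them without running any JavaScript.
function render_pagination(string $baseUrl, int $current, int $total): string
{
    $links = [];
    for ($page = 1; $page <= $total; $page++) {
        if ($page === $current) {
            // The current page is shown as text, not as a link.
            $links[] = '<strong>' . $page . '</strong>';
        } else {
            $url = htmlspecialchars($baseUrl . '?page=' . $page, ENT_QUOTES, 'UTF-8');
            $links[] = '<a href="' . $url . '">' . $page . '</a>';
        }
    }
    return implode(' ', $links);
}

echo render_pagination('/tables/round', 2, 3);
```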

If possible, the URL scheme of such pages can be made self-explanatory. For example, for a furniture selling website, URLs can be designed like below:

www.myfurniture.com/tables/round
www.myfurniture.com/tables/square
www.myfurniture.com/tables/plastic
www.myfurniture.com/chairs/rocking
www.myfurniture.com/chairs/revolving
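
URLs of this shape could be derived from category names with a small slug function. The names slugify() and listing_url() are hypothetical, and this sketch only handles ASCII names; non-ASCII names would need the percent-encoding described in section 3.4.

```php
<?php
// Hypothetical helper: lower-cases a name and replaces every run of
// characters other than a-z and 0-9 with a single hyphen.
function slugify(string $name): string
{
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($name));
    return trim($slug, '-');
}

// Hypothetical helper: builds a listing URL from a host and category names.
function listing_url(string $host, array $categories): string
{
    return $host . '/' . implode('/', array_map('slugify', $categories));
}

echo listing_url('www.myfurniture.com', ['Tables', 'Round']);
// prints www.myfurniture.com/tables/round
```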

The text appearing on such pages should be as informative as possible, and the number of entities per list should be kept to around 10 to 30. Listing pages may also contain links to other important pages which are to be indexed.

#7 View pages

View entity pages show detailed information about the entities listed on listing pages. The title, meta description, meta keywords and H1 tag should contain information about the entity being viewed.

#8 Other pages

Other pages may include pages like Login pages, Posting/Editing entity pages etc.

#8.1 Login pages

Such pages should not get indexed as they don't contain any public searchable information.

#8.2 Posting/Editing entity pages

Pages that contain forms to be submitted are not normally indexed, as they don't display any searchable information to the general public.

The general rule of thumb is that pages which change the state of the server (data is inserted/updated, a file is created/deleted etc.) or pages which are personal to users are not indexed, as they are tightly integrated with the data of the website.
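
One way to apply this rule is through the robots meta tag mentioned in section 3.3. The sketch below is an assumption about how a project might organise it: the page type names and the function render_robots_meta() are made up for this example.

```php
<?php
// Hypothetical helper: returns a robots meta tag depending on the page type.
// The page type names in $nonIndexable are assumptions for illustration only.
function render_robots_meta(string $pageType): string
{
    $nonIndexable = ['login', 'post', 'edit'];

    if (in_array($pageType, $nonIndexable, true)) {
        return '<meta name="robots" content="noindex, nofollow">';
    }
    return '<meta name="robots" content="index, follow">';
}

echo render_robots_meta('login');
```

Keeping the list of page types in one place makes it easier to review which pages are excluded from indexing.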

#9 Resources

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769
http://help.yahoo.com/l/us/yahoo/search/basics/basics-18.html