#1 About this document
This document defines generic search engine optimization (SEO) requirements for various projects.
At present it contains general SEO guidelines. It will be expanded further during future training sessions, so that it can serve as a reference for nearly all SEO requirements.
#2 General requirements
#2.1 Server location
The server should be located in the same country as most of its visitors. Moreover, if the service has its own domain, it should reside on a dedicated server. Wildcard DNS should not be used, and any subdomains should be activated individually.
#2.2 Robots.txt
The robots.txt file must be placed in the web root directory (the value of the DocumentRoot directive when the web server is Apache). It should allow search engines to crawl all directories in which information about the site's entities is shown.
Personal pages that require a login, such as pages listing the owner's entities or pages for posting and editing entities, should be blocked. Most search engines nowadays can recognize such pages on their own, so these entries may be omitted from the robots.txt file.
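A minimal robots.txt along these lines blocks the login-only pages while leaving listing and view pages crawlable. The paths `/login`, `/account/`, `/post/` and `/edit/` are hypothetical placeholders, not part of any existing project. The sketch below uses Python's standard `urllib.robotparser` to check the rules before they are deployed:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the paths are placeholders for the
# login-only pages (owner's listings, posting and editing entities).
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /account/
Disallow: /post/
Disallow: /edit/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` under the parsed rules."""
    return parser.can_fetch(agent, url)


rules = build_parser(ROBOTS_TXT)
for path in ("/tables/round", "/account/my-listings", "/post/new"):
    url = "http://www.myfurniture.com" + path
    print(path, "allowed" if is_crawlable(rules, url) else "blocked")
```

Running it reports the listing URL as allowed and the two login-only URLs as blocked.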
#2.3 Encoding
If the website's language uses special (non-ASCII) characters, they need to be encoded in URLs and file names using UTF-8 (for example with a PHP function such as urlencode()). A full table of these encodings is available at http://www1.tip.nl/~t876506/utf8tbl.html. Since practically all browsers support the Unicode UTF-8 standard, the characters do not need to be encoded in the page content itself. If HTML entities are needed anyway, suitable ones can be looked up at http://leftlogic.com/lounge/articles/entity-lookup/.
If a visitor types a URL containing the raw special characters instead of their encoded form, and the visitor's browser sends them in a character set other than UTF-8, the server should answer with a 301 redirect to the correctly encoded URL. See how Wikipedia handles this for an example. This prevents links with the wrong character set from being used on external pages.
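One way to decide when this redirect is needed is sketched below. It rests on an assumption made here, not in the document: percent-encoded bytes that are not valid UTF-8 were produced by a browser using Latin-1. The function name `canonical_path` is hypothetical. A request handler would send a 301 response with the returned path in its `Location` header, and serve the page normally when the function returns `None`.

```python
from typing import Optional
from urllib.parse import quote, unquote_to_bytes


def canonical_path(raw_path: str) -> Optional[str]:
    """Return the canonical UTF-8 percent-encoded form of raw_path,
    or None if raw_path is already in that form."""
    data = unquote_to_bytes(raw_path)  # raw characters become UTF-8 bytes
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 came from a browser
        # that used Latin-1 instead of UTF-8.
        text = data.decode("latin-1")
    canonical = quote(text, safe="/")  # UTF-8, upper-case hex digits
    return None if canonical == raw_path else canonical


print(canonical_path("/product/caf%E9"))     # Latin-1 encoded "é"
print(canonical_path("/product/café"))       # raw, unencoded "é"
print(canonical_path("/product/caf%C3%A9"))  # already canonical
```

The first two calls both yield `/product/caf%C3%A9`, the target of the 301 redirect; the third yields `None`.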
#2.4 Header responses
#2.4.1 Page not found (404 error)
Entities that have been removed from the database should no longer be shown. When someone requests the page of a removed entity, the server should respond with a 404 status code (not 200) and show an error message, or optionally a separate page, stating that the entity has been deleted, has expired, has been sold, etc. In addition, the relevant listing page should be shown.
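A minimal sketch of this behaviour as a plain WSGI application (Python's standard web server interface, PEP 3333). The `ACTIVE` and `REMOVED` dictionaries are hypothetical stand-ins for a database lookup, and linking to the listing page is only one possible reading of "the relevant listing page should be shown":

```python
from typing import Callable, Dict, Iterable

# Hypothetical data: paths of live entities, and removed entities mapped
# to the listing page they used to appear on.
ACTIVE: Dict[str, str] = {"/view/oak-table": "Oak dining table"}
REMOVED: Dict[str, str] = {"/view/teak-chair": "/chairs/rocking"}

HTML = [("Content-Type", "text/html; charset=utf-8")]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI application: 200 for live entities, 404 otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in ACTIVE:
        start_response("200 OK", list(HTML))
        return [f"<h1>{ACTIVE[path]}</h1>".encode("utf-8")]
    if path in REMOVED:
        # Still a 404, but the message explains why and links to the listing.
        body = (
            "<p>This item has been deleted, has expired or has been sold.</p>"
            f'<p><a href="{REMOVED[path]}">See similar items</a></p>'
        )
        start_response("404 Not Found", list(HTML))
        return [body.encode("utf-8")]
    start_response("404 Not Found", list(HTML))
    return [b"<p>Page not found.</p>"]
```

For a local test, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000.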
#2.4.2 Redirects (301 Moved Permanently)
As a general rule of thumb, all redirects should use the 301 Moved Permanently response. Alternative host names should be redirected this way (example.com -> 301 -> www.example.com), as should all other domains that contain the same information, as shown below:
www.example.net -> 301 -> www.example.com
www.example.in -> 301 -> www.example.com
Also make sure that only the specified URLs work, and add a 301 redirect rule for every non-specified URL that would otherwise be reported as missing.
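The host redirects above could be sketched as WSGI middleware. The host names come from the examples in this section; the wrapper name `redirect_aliases` is hypothetical, and in practice the same rule is often configured in the web server itself rather than in application code:

```python
from typing import Callable, Iterable
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"
# Alternative host names from the examples above; all serve the same content.
ALIAS_HOSTS = {"example.com", "www.example.net", "www.example.in"}


def redirect_aliases(app: Callable) -> Callable:
    """WSGI middleware: 301-redirect requests for alias hosts to CANONICAL_HOST."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host in ALIAS_HOSTS:
            scheme = environ.get("wsgi.url_scheme", "http")
            # WSGI delivers PATH_INFO decoded as Latin-1; re-encode it for the header.
            path = quote(environ.get("PATH_INFO", "/").encode("latin-1"), safe="/")
            location = f"{scheme}://{CANONICAL_HOST}{path}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)

    return wrapper
```

Keeping the path and query string in the `Location` header sends visitors (and link value) to the same page on the canonical host rather than to its home page.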
#3 General page requirements
#3.1 Using standards
The site should comply with the World Wide Web Consortium's (http://www.w3.org/) recommendations for creating web pages (XHTML 1.0 Transitional should be enough) and, if required, with the Americans with Disabilities Act (http://www.ada.gov/).
#3.2 Page design
Pages should be laid out with CSS positioning, and the main content should appear in the source code as early as possible, preferably before other body content such as navigation blocks.
Navigation should be implemented with text anchor (<a>) links, and the links should not redirect.
Breadcrumb navigation benefits SEO by adding internal links back to parent pages, and benefits usability by showing visitors where they are on the site. Example of breadcrumb navigation: Home => List furniture items => View table => ...
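A breadcrumb trail built from plain text anchors could be rendered as in the sketch below. The function name `breadcrumb_html` and the example URLs are hypothetical; the current page is shown as text rather than a link:

```python
from html import escape
from typing import List, Tuple


def breadcrumb_html(trail: List[Tuple[str, str]]) -> str:
    """Render (label, url) pairs as plain text anchor links.

    The last pair is the current page, so it is shown as text, not a link.
    The trail must contain at least one entry.
    """
    links = [f'<a href="{escape(url)}">{escape(label)}</a>' for label, url in trail[:-1]]
    links.append(escape(trail[-1][0]))
    return " =&gt; ".join(links)


print(breadcrumb_html([
    ("Home", "/"),
    ("List furniture items", "/furniture"),
    ("View table", "/view/oak-table"),
]))
```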
Scripts and stylesheets (CSS) should be placed in external files. The source code should be kept clean, with little or no unused code. The preferred maximum size of an HTML file is 100 KB.
#3.3 Elements of a page
The following elements should always be included (and be editable in some way) on every page that is meant to be indexed by search engines:
- Page title (<title> element in the header)
- Meta description, robots and keywords (in the header)
- Page heading (one <h1> per page)
#3.3.1 Page title
The page title should be as specific and concise as possible with respect to the document. This ensures its uniqueness and encourages click-through in search engine result pages (SERPs). A structure similar to "Page name | Section name | Site title - Tagline" is encouraged for clarity, uniqueness and better usability for the visitor. Order the title from specific keywords (near the beginning) to general ones. The title should be no longer than 80 characters.
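The suggested structure and the 80-character limit can be combined in a small helper. This is a sketch: the name `build_title` and the choice to drop the most general parts first (tagline, then site title, then section) before truncating are assumptions, based on the specific-to-general ordering described above:

```python
from typing import Optional

MAX_TITLE_LENGTH = 80  # upper limit suggested above


def build_title(page: str, section: str, site: str, tagline: Optional[str] = None) -> str:
    """Build "Page name | Section name | Site title - Tagline", dropping the
    most general parts first while the title is longer than MAX_TITLE_LENGTH."""
    candidates = [f"{page} | {section} | {site}", f"{page} | {section}", page]
    if tagline:
        candidates.insert(0, f"{page} | {section} | {site} - {tagline}")
    for title in candidates:
        if len(title) <= MAX_TITLE_LENGTH:
            return title
    return page[:MAX_TITLE_LENGTH]  # even the page name alone is too long
```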
#3.3.2 Meta description, robots and keywords (in the header)
A meta description of around 150 characters should be sufficient. It does not hurt to write a little more, but the description should summarize the document as concisely as possible. Its uniqueness also plays a fair role in search engine result pages.
Meta keywords, on the other hand, are not really necessary, since it is the job of search engine indexers to determine the nature and relevance of a document; for the sake of accuracy, they cannot rely on what the document claims about itself. The Web has been moving away from this kind of self-declared meta information, and today the benefit gained from meta keywords is negligible. Examples of well-written meta tags:
<meta name="description" content="Suppliers of quality office furniture and accessories at discount prices.">
<meta name="keywords" content="furniture, office, store, shop, retail, discount">
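Since these tags are later generated automatically, a helper along the following lines could render them. It is a sketch under stated assumptions: the function names are hypothetical, "noindex" is used for pages that should stay out of the index, and the self-closing `/>` follows the XHTML 1.0 recommendation in section 3.1:

```python
from html import escape
from typing import Optional, Sequence

DESCRIPTION_LENGTH = 150  # approximate target from the text above


def trim_description(text: str, limit: int = DESCRIPTION_LENGTH) -> str:
    """Collapse whitespace and cut at a word boundary so the result,
    including the trailing "...", is at most `limit` characters long."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    cut = text[: limit - 3].rsplit(" ", 1)[0]
    return cut.rstrip(",;:") + "..."


def meta_tags(description: str, indexable: bool = True,
              keywords: Optional[Sequence[str]] = None) -> str:
    """Render description, robots and (optional) keywords meta tags."""
    robots = "index, follow" if indexable else "noindex"
    tags = [
        f'<meta name="description" content="{escape(trim_description(description))}" />',
        f'<meta name="robots" content="{robots}" />',
    ]
    if keywords:  # optional: meta keywords carry little weight today
        tags.append(f'<meta name="keywords" content="{escape(", ".join(keywords))}" />')
    return "\n".join(tags)
```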
#3.3.3 Page heading (one <h1> per page)
A properly structured document consists of headings, paragraphs, lists, tables and forms, and uses an external stylesheet to style them. Many search engines give extra weight to text within heading tags (not just to keywords provided in meta elements), so make sure headings contain keywords. Use one <h1> tag per page containing the most important keywords. Other heading tags (<h2>, <h3>, etc.) can provide variations and support the main heading.
For example:
<h1>Tables</h1>
<h2>Round tables</h2>
<p>... information about round tables ...</p>
<h2>Square Desks</h2>
<p>... information about square desks, etc.</p>
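The one-<h1> rule can be checked automatically with Python's standard `html.parser`. This is a sketch; the class and function names are hypothetical, and reporting skipped heading levels (for example <h1> followed directly by <h3>) is an extra check added here, not a requirement from the text:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every <h1>..<h6> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[int, str]] = []
        self._level: Optional[int] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level, self._text = int(tag[1]), []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def heading_problems(page: str) -> List[str]:
    """Report a missing or repeated <h1> and skipped heading levels."""
    collector = HeadingCollector()
    collector.feed(page)
    collector.close()
    problems = []
    h1_count = sum(1 for level, _ in collector.headings if level == 1)
    if h1_count != 1:
        problems.append(f"expected one <h1>, found {h1_count}")
    previous = 1  # the page outline starts at the <h1> level
    for level, text in collector.headings:
        if level > previous + 1:
            problems.append(f"<h{level}> '{text}' skips a level after <h{previous}>")
        previous = level
    return problems
```

Applied to the example markup above, it returns an empty list.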
#3.3.4 Body text
Make sure the text of your web pages contains keywords and common phrases that people might search for. Be careful with keyword frequency: keywords should occur at least a few times if possible, but not so often that the copy becomes unnatural. The idea is to spread keywords discreetly, without making it obvious.
A well-written document naturally uses keywords that are appropriate and in proportion. Search engine algorithms essentially compare similar documents to better understand the nature of a document. If a document is poorly written and produces unbalanced scores, it may raise flags and be marked as not relevant, because it looks like a document written for machines rather than for human readers. Keep in mind that indexing exists to assist human searchers. An example of good body text:
<p>Buy office furniture at affordable prices from any of our retail stores.</p>
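The document gives no numeric threshold for keyword frequency, so the sketch below only reports how often each keyword phrase occurs and what share of all words it represents, leaving the judgment to the editor. The function name `keyword_stats` and the word-based matching are assumptions:

```python
import re
from typing import Dict, Iterable, Tuple


def keyword_stats(text: str, phrases: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Count each keyword phrase in `text` and its share of all words.

    Matching is case-insensitive and word-based, so "office furniture"
    matches the two consecutive words, not substrings of longer words.
    """
    words = re.findall(r"\w+", text.lower())
    stats = {}
    for phrase in phrases:
        target = phrase.lower().split()
        if not target:
            continue  # ignore empty phrases
        n = len(target)
        hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
        stats[phrase] = (hits, hits / len(words) if words else 0.0)
    return stats


body = "Buy office furniture at affordable prices from any of our retail stores."
print(keyword_stats(body, ["office furniture", "retail", "tables"]))
```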
#3.3.5 Images and Pictures
Images that are not part of the page template should always include an alt description. This description should be either generated automatically or editable. (This is partly already a requirement of the Americans with Disabilities Act.)
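Missing alt descriptions can be found with a short audit script. In this sketch, `TEMPLATE_PREFIX` is a hypothetical location for template images, which are exempt as stated above; every other image without a non-empty `alt` attribute is reported:

```python
from html.parser import HTMLParser
from typing import List

TEMPLATE_PREFIX = "/static/template/"  # hypothetical location of template images


class MissingAltFinder(HTMLParser):
    """Collect the src of every non-template <img> without a non-empty alt."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        if src.startswith(TEMPLATE_PREFIX):
            return  # template images are exempt
        if not (attributes.get("alt") or "").strip():
            self.missing.append(src)


def images_missing_alt(page: str) -> List[str]:
    """Return the src of each content image that lacks an alt description."""
    finder = MissingAltFinder()
    finder.feed(page)
    finder.close()
    return finder.missing
```

`HTMLParser` routes self-closing tags such as `<img ... />` through `handle_starttag` as well, so both XHTML and HTML markup are covered.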
#3.4 Automation
The title element, the meta description and the meta keywords need to be generated automatically from templates. These templates include page- and directory-specific elements as well as generic elements. An example of a title template for the search results page could be:
[Results] - [category] - Search results - My furniture example.com
Different elements that could be included are:
- Results = Search results pages (New, Old, All)
- Category name = For example Wood tables, Wood chairs, Metal chairs
- Area = Can represent the location of the entity
- Page number within the search results, if applicable
- The category name or area might not be in its basic form; different grammatical forms might be needed.
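Filling such a template could look like the sketch below. The function name `search_title`, the "Page N" wording and the English "in" used to attach the area are illustrative assumptions; other languages may need different grammatical forms, as noted above:

```python
from typing import Optional

SITE = "My furniture example.com"  # taken from the example template above


def search_title(results: str, category: str, area: Optional[str] = None, page: int = 1) -> str:
    """Fill the "[Results] - [category] - Search results - site" template.

    The area and page number are optional elements; "in" is an
    illustrative English grammatical form that other languages may change.
    """
    parts = [results, f"{category} in {area}" if area else category]
    if page > 1:
        parts.append(f"Page {page}")  # keeps paginated titles unique
    parts += ["Search results", SITE]
    return " - ".join(parts)
```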
In the title, meta information and headings, keywords or key phrases are added as is or in another grammatical form. When the URL is generated automatically (URL rewriting), however, it may need encoding if another language is used:
- Non-ASCII keywords (and phrases) included in URLs need to be encoded as hexadecimal values (for example with a PHP function such as urlencode()), like:
- www.example.com/product/table/એપલ => www.example.com/product/table/%E0%AA%8F%E0%AA%AA%E0%AA%B2
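Python's `urllib.parse.quote`, which encodes text as UTF-8 by default, reproduces the example above. The helper name `product_url` is hypothetical. Note that `quote` turns spaces into `%20`, as PHP's `rawurlencode()` does, whereas PHP's `urlencode()` turns them into `+`:

```python
from urllib.parse import quote


def product_url(category: str, keyword: str) -> str:
    """Build a product URL, percent-encoding each path segment as UTF-8.

    safe="" also encodes "/" inside a segment; spaces become %20.
    """
    return f"www.example.com/product/{quote(category, safe='')}/{quote(keyword, safe='')}"


print(product_url("table", "એપલ"))
```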
#4 Index page
The index page of your website is the most likely to receive the highest number of inbound links, since it is the site's entry point. Linking other pages of the website from this page is therefore very important. In theory, this page should link to nearly all pages that can be reached from it.
However, the number of links on such a page should be kept to around 100, so in many projects it will not be possible to show all of them. In such cases, the most important links should be made visible here and the remaining pages linked from those pages, because the goal is to chain all important pages together so that they get indexed.
To make this work, it is important to identify those important links. For example, if you sell products, the home page can link to the pages that list items per product category. Similarly, if items are tied to a geographical location and your website lists items for sale per province, city or area, links to those pages could be placed on this page.
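One way to split listing pages between the index page and deeper pages is sketched below. It assumes, purely for illustration, that a page's importance can be approximated by how many items it lists; the function name `split_home_links` is hypothetical:

```python
from typing import Dict, List, Tuple

MAX_HOME_LINKS = 100  # approximate limit mentioned above


def split_home_links(listing_pages: Dict[str, int],
                     limit: int = MAX_HOME_LINKS) -> Tuple[List[str], List[str]]:
    """Choose which listing pages to link from the index page.

    Assumption (not from the original text): a page's importance is
    approximated by how many items it lists. Pages beyond `limit` must be
    linked from the chosen pages instead, so the chain still reaches them.
    """
    ranked = sorted(listing_pages, key=lambda url: (-listing_pages[url], url))
    return ranked[:limit], ranked[limit:]
```

Ties are broken by URL so that the selection stays stable between builds.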
#5 Search pages
Search pages, whether simple or advanced, need not be indexed, since by default they do not contain any information to be searched for.
From a usability point of view, however, their URLs, page design and on-page information should still be properly designed and implemented.
#6 Listing pages
Listing pages are the second most important pages of any website, as they display information about the entities the website was created for. Listed entities can be of many kinds, such as items for sale, ads or jobs.
Such listings may contain pagination and sorting links, depending on the results and the users' interests. It is recommended to keep pagination links as plain text links so that search engines can crawl through and index all available pages. Sorting links, however, may be implemented with JavaScript (Ajax) or similar, to minimize additional requests to the server; from a search engine's point of view, the order in which information is displayed does not matter.
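Plain text pagination links could be rendered as in the sketch below. The `?page=` query parameter and the function name `pagination_html` are hypothetical; page 1 reuses the base URL so that it is reachable under a single address:

```python
from html import escape


def pagination_html(base_url: str, current: int, total_pages: int) -> str:
    """Render every result page as a plain text link crawlers can follow.

    Page 1 uses base_url itself so it has one URL; the ?page= parameter is
    a hypothetical scheme. The current page is shown as text, not a link.
    """
    items = []
    for page in range(1, total_pages + 1):
        if page == current:
            items.append(f"<strong>{page}</strong>")
            continue
        url = base_url if page == 1 else f"{base_url}?page={page}"
        items.append(f'<a href="{escape(url)}">{page}</a>')
    return " ".join(items)
```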
If possible, the URL scheme of such pages can be made self-descriptive. For example, a furniture website could use URLs like these:
www.myfurniture.com/tables/round
www.myfurniture.com/tables/square
www.myfurniture.com/tables/plastic
www.myfurniture.com/chairs/rocking
www.myfurniture.com/chairs/revolving
Text on such pages should be as informative as possible, and the number of entities per list should be kept at around 10 to 30. Listing pages may also contain links to other important pages that are to be indexed.
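Keeping each listing page within that range could be sketched as follows. The default of 20 entities per page and the function name `page_items` are assumptions; any requested page size is clamped to the 10 to 30 range:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
MIN_PER_PAGE, MAX_PER_PAGE = 10, 30  # range suggested above


def page_items(items: Sequence[T], page: int, per_page: int = 20) -> List[T]:
    """Return the entities for a 1-based page, keeping per_page within 10..30."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    per_page = max(MIN_PER_PAGE, min(MAX_PER_PAGE, per_page))
    start = (page - 1) * per_page
    return list(items[start:start + per_page])
```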
#7 View pages
View pages show detailed information about the entities listed on listing pages. The title, meta description, meta keywords and <h1> tag should contain information about the entity being viewed.
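Deriving these fields from a single entity record keeps them consistent with each other. The `Entity` fields below are hypothetical, and the title follows the "Page name | Section name | Site title" structure from section 3.3.1:

```python
from dataclasses import dataclass
from typing import Dict

SITE = "My furniture example.com"  # site name from the title template example


@dataclass
class Entity:
    """Hypothetical entity record; real projects will have their own fields."""
    name: str
    category: str
    summary: str


def view_page_fields(entity: Entity) -> Dict[str, str]:
    """Derive the title, meta tags and <h1> of a view page from one entity."""
    return {
        "title": f"{entity.name} | {entity.category} | {SITE}",
        "description": entity.summary,
        "keywords": f"{entity.name.lower()}, {entity.category.lower()}",
        "h1": entity.name,
    }
```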
#8 Other pages
Other pages include, for example, login pages and pages for posting or editing entities.
#8.1 Login pages
Such pages should not be indexed, as they contain no publicly searchable information.
#8.2 Posting/Editing entity pages
Pages that contain forms to be submitted are normally not indexed, as they do not display any searchable information to the general public.
As a general rule of thumb, pages that change the state of the server (for example by inserting or updating data, or by creating or deleting files) and pages that are personal to users are not indexed, as they are tightly integrated with the website's data.
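This rule can also be enforced per response with the `X-Robots-Tag` HTTP header, which major search engines such as Google honour as an alternative to the robots meta tag. The sketch below is WSGI middleware; the path prefixes are hypothetical, and treating every method other than GET and HEAD as state-changing is an assumption:

```python
from typing import Callable, Iterable

# Hypothetical path prefixes of login-only and posting/editing pages.
PRIVATE_PREFIXES = ("/login", "/account/", "/post/", "/edit/")


def mark_private_pages(app: Callable) -> Callable:
    """WSGI middleware adding "X-Robots-Tag: noindex" to pages that should
    not be indexed: private pages and requests that change server state."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "/")
        method = environ.get("REQUEST_METHOD", "GET")
        private = path.startswith(PRIVATE_PREFIXES) or method not in ("GET", "HEAD")

        def start(status, headers, exc_info=None):
            if private:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            if exc_info is None:
                return start_response(status, headers)
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapper
```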
#9 Resources
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769
http://help.yahoo.com/l/us/yahoo/search/basics/basics-18.html