11 Jun 2007

CVS best practices

Best practices often require a thorough knowledge of the technology used in software development. To use CVS fully, we need to understand tags, the trunk, branches, merging and so on, so that we can minimize or eliminate problems that arise out of insufficient use of CVS. This document shows how to use CVS effectively to minimize such problems. The policies below are meant to be followed whenever possible.

#1 Sandbox

The developer sandbox is where each developer keeps his or her working copy of the code base. In CVS this is referred to as the working directory. This is where they build, test and debug the modules that they are working on. A sandbox can also be the area where the staging build or the production build is done. Changes made in the work area are checked into the CVS repository. In addition, changes made in the repository by others have to be updated in the sandbox on a regular basis.

The best practices related to the developer's sandbox are:

#1.1 Keep System clocks in Sync

CVS tracks changes to source files by using the timestamps on the files. If the date and time on the client systems are not in sync, CVS can get confused. System clocks must therefore be kept in sync by using a central time server or a similar mechanism.

CVS is designed from the ground up to handle multiple time zones. As long as the host operating system has been set up and configured correctly, CVS will be able to track changes correctly.

#1.2 Stay in sync with the repository

To gain the benefits of working within a sandbox as mentioned above, the developer must keep his or her sandbox in sync with the main repository. A regular cvs update with the appropriate tag or branch name will ensure that the sandboxes are kept up to date.
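As a rough sketch of what a regular sync might look like, the shell functions below wrap a preview and an update. The function names are illustrative and not part of CVS, and the commands assume they are run from the top of the sandbox.

```shell
# Preview what an update would change without touching any files.
# -n: do not modify the working copy; -q: be less verbose.
preview_update() {
    cvs -n -q update
}

# Bring the sandbox up to date.
# -d: create directories added by others; -P: prune empty directories.
# With an argument, follow that tag or branch (the tag becomes sticky).
sync_sandbox() {
    if [ -n "${1:-}" ]; then
        cvs -q update -d -P -r "$1"
    else
        cvs -q update -d -P
    fi
}
```

Calling sync_sandbox with no argument updates along whatever tag or branch the sandbox is already on; passing a branch name switches the sandbox to that branch.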

#1.3 Do not share the sandbox

Sandboxes have to be unique for each developer or purpose. They should not be used for multiple things at the same time. A sandbox can be a working area for a developer or the build area for the final release. If such sandboxes are shared, the owner of the sandbox will not be aware of the changes made to the files, resulting in confusion.

In CVS, the sandbox is created automatically when a working copy of a CVS project is checked out with the cvs checkout [options] MODULES command. In very large projects, it does not make sense for developers to check out the entire source into the local sandbox. In such cases, they can check out only the modules they are working on.
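A minimal sketch of a partial checkout follows. The function name, the repository location and the module path in the usage note are assumptions made for illustration only.

```shell
# Check out one module (or a sub-directory of it) into a fresh sandbox.
#   $1 = repository location (CVSROOT)
#   $2 = module name or path inside the repository
checkout_module() {
    cvs -d "$1" checkout "$2"
}
```

For example, checkout_module :pserver:alice@cvs.example.com:/cvsroot project/docs would fetch only the docs part of a hypothetical module named project.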

#1.4 Do not work outside the sandbox

The sandbox can be thought of as a controlled area within which CVS can track changes made to the various source files. Changes that other developers commit are brought into the developer's sandbox by CVS whenever the sandbox is updated. Thus the developer who stays within the sandbox gains the benefits of concurrent development.

#1.5 Cleanup after completion

Make sure that the sandbox is cleaned up after completion of work on the files. Clean up can be done in CVS by using the cvs release [-d] [DIRECTORIES] command. This ensures that no old version of the files exists in the development sandbox.
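The cleanup step could be wrapped as in the sketch below. The function name is hypothetical; the command is meant to be run from the directory that contains the sandbox, not from inside it.

```shell
# Release a finished sandbox directory.
# cvs release reports uncommitted changes and asks for confirmation;
# -d then deletes the working directory.
#   $1 = sandbox directory to release
cleanup_sandbox() {
    cvs release -d "$1"
}
```

Because cvs release asks before deleting, uncommitted work is not thrown away silently.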

#1.6 Check-in often

To help other developers keep their code in sync with yours, you must check in (commit) your code to the CVS repository often. The best practice is to check in as soon as a piece of code is completed, reviewed and tested, using the command cvs commit [options] [-m LOG_MESSAGE | -F FILE] [-r revision] [FILES].

CVS promotes concurrent development. Concurrent development is possible only if all the other developers are aware of the ongoing changes on a regular basis. This awareness can be termed "situation awareness". One of the "bad" practices that commonly occurs is the sharing of files between developers by email. This works against most of the best practices mentioned above. To share updates between two developers, CVS must be used as the communication medium. This ensures that CVS is aware of the changes and can track them, so that an audit trail can be established if necessary.

When you commit a change to the repository, make sure your change reflects a single purpose: the fixing of a specific bug, the addition of a new feature, or some particular task. In CVS, each committed file gets its own new revision number, so it is the shared log message (or a tag) that ties the files of one change together. You can mention these revision numbers in bug databases, or pass them to cvs update -j should you want to undo the change or port it to another branch.
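One possible shape for a review-then-commit routine is sketched below, together with a way to back out a single-file change by merging it in reverse. All function names are made up for this example.

```shell
# Show the pending changes as a unified diff before committing.
review_changes() {
    cvs -q diff -u "$@"
}

# Commit the given files with a single-purpose log message.
#   $1 = log message, remaining arguments = files
commit_change() {
    msg=$1
    shift
    cvs commit -m "$msg" "$@"
}

# Undo a change to one file by merging the difference from the newer
# revision back to the older one into the working copy.
#   $1 = file, $2 = revision to undo, $3 = revision before it
undo_change() {
    cvs update -j "$2" -j "$3" "$1"
}
```

After undo_change, the working file still has to be reviewed and committed like any other change.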

#1.7 Add/Commit data in the proper way

CVS does not handle directories well. Once a directory has been added, it cannot be removed from the repository in the normal way, so be careful when dealing with directories.

Moreover, CVS tends to exclude empty directories when checking out a module, which means that any directory that is empty at check-out time will not be included in the checked-out copy of the module. This problem often occurs when a directory is used to store temporary files that do not need to be kept in CVS. If such a directory is missing from the checked-out module, your local sandbox might not work as expected. To solve this problem, an empty file called .keepme can be added to the otherwise empty directory.
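The .keepme workaround might be scripted roughly as follows. The function name is hypothetical, and the sketch assumes the directory itself has already been added to CVS.

```shell
# Put a placeholder file into an otherwise empty directory and
# schedule it for addition, so the directory survives checkouts.
#   $1 = directory that is already under CVS control
keep_empty_dir() {
    (
        cd "$1" || exit 1
        touch .keepme
        cvs add .keepme
    )
}
```

The placeholder only reaches the repository once it is committed, for example with cvs commit -m "Keep directory" .keepme from inside that directory.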

#1.8 Use the issue-tracker wisely

Try to create as many two-way links between CVS changesets and your issue-tracking (gForge, Bugzilla, Mantis etc.) database as possible:

If possible, refer to a specific issue ID in every commit log message. When appending information to an issue (to describe progress, or to close the issue), name the revision number(s) responsible for the change.
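One way to encourage issue references is a small wrapper that refuses to commit without one. This is only a sketch: the "#<digits>" convention and both function names are assumptions, and your issue tracker may use a different ID format.

```shell
# Succeed only if the log message mentions an issue ID such as "#1234".
has_issue_id() {
    printf '%s\n' "$1" | grep -Eq '#[0-9]+'
}

# Commit only when the log message references an issue.
#   $1 = log message, remaining arguments = files
commit_for_issue() {
    msg=$1
    shift
    if has_issue_id "$msg"; then
        cvs commit -m "$msg" "$@"
    else
        echo "log message must reference an issue ID, e.g. #1234" >&2
        return 1
    fi
}
```

The revision numbers that cvs commit prints for each file can then be pasted into the issue to complete the link in the other direction.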

#2 Branching and Merging

Branching in CVS splits a project's development into separate, parallel histories. Changes made on one branch do not affect the other branches. Branching can be used extensively to maintain multiple versions of a product for providing support and new features.

Merging converges the branches back to the main trunk. In a merge, CVS calculates the changes made on the branch between the point where it diverged from the trunk and the branch's tip (its most recent state), then applies those differences to the project at the tip of the trunk.

#2.1 Know when to create branches

This is a hotly debated question, and it really depends on the culture of your software project. Rather than prescribe a universal policy, we'll describe three common ones here.

#2.1.1 The Never-Branch system

(Often used by nascent projects that don't yet have runnable code.) Users commit their day-to-day work on /trunk. Occasionally /trunk "breaks" (doesn't compile, or fails functional tests) when a user begins to commit a series of complicated changes.

Pros: Very easy policy to follow. New developers have a low barrier to entry. Nobody needs to learn how to branch or merge.

Cons: Chaotic development, code could be unstable at any time.

Note: this sort of development is a bit less risky in Subversion than in CVS. Because Subversion commits are atomic, it's not possible for a checkout or update to receive a "partial" commit while somebody else is in the process of committing.

#2.1.2 The Always-Branch system

(Often used by projects that favor heavy management and supervision.) Each user creates/works on a private branch for every coding task. When coding is complete, someone (original coder, peer, or manager) reviews all private branch changes and merges them to /trunk.

Pros: /trunk is guaranteed to be extremely stable at all times.

Cons: Coders are artificially isolated from each other, possibly creating more merge conflicts than necessary. Requires users to do lots of extra merging.

#2.1.3 The Branch-When-Needed system

Users commit their day-to-day work on /trunk.

Rule #1: /trunk must compile and pass regression tests at all times. Committers who violate this rule are publicly humiliated.

Rule #2: a single commit (changeset) must not be so large as to discourage peer-review.

Rule #3: if rules #1 and #2 come into conflict (i.e. it's impossible to make a series of small commits without disrupting the trunk), then the user should create a branch and commit a series of smaller changesets there. This allows peer-review without disrupting the stability of /trunk.

Pros: /trunk is guaranteed to be stable at all times. The hassle of branching/merging is somewhat rare.

Cons: Adds a bit of burden to users' daily work: they must compile and test before every commit.

#2.2 Assign ownership to Trunk and Branches

The main trunk of the source tree and each of its branches should have an owner assigned, who will be responsible for the following.

#2.2.1 Keeping the list of configurable items for the branch or trunk

The owner will be the maintainer of the contents list for the branch or trunk. This list should contain the item name and a brief description of the item. This list is essential since new artifacts are always added to or removed from the repository on an ongoing basis. This list will be able to track the new additions/deletions to the repository for the respective branch.

#2.2.2 Establishing a working policy for the branch or trunk

The owner will establish policies for check-in and check-out. The policy will define when code can be checked in (after coding, after review, etc.) and who is responsible for merging changes to the same file and resolving conflicts (the author or the person who most recently changed the file).

#2.2.3 Identifying and documenting policy deviations

Policies, once established, tend to have exceptions. The owner will be responsible for identifying the workaround and documenting it for future use.

#2.2.4 Merging with the trunk

The branch owner will be responsible for ensuring that the changes in the branch can be successfully merged with the main trunk at a reasonable point in time.

#2.3 Tag each release

As part of the release process, the entire code base must be tagged (with the cvs tag [options] SYMBOLIC_TAG [FILES] command) with an identifier that uniquely identifies the release. A tag gives a label to the collection of revisions represented by one developer's working copy (usually, that working copy is completely up to date so the tag name is attached to the "latest and greatest" revisions in the repository).

The identifier for the tag should provide enough information to identify the release at any point in time in the future. One suggested tag identifier is of the form:

release_{major version #}_{minor version #}

Check out the entire codebase using the tag, and then go through a build / deploy / test process before making the actual release. This ensures that what "leaves the door" is a verified and tested codebase.
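The tagging and verification steps could be sketched as below. The function names and the module name in the usage note are hypothetical; the tag format follows the release_{major}_{minor} form suggested above.

```shell
# Build the tag name for a release, e.g. release_2_1.
#   $1 = major version, $2 = minor version
release_tag() {
    printf 'release_%s_%s\n' "$1" "$2"
}

# Tag the revisions in the current sandbox.
# -c makes cvs refuse to tag files that have uncommitted modifications.
tag_release() {
    cvs tag -c "$(release_tag "$1" "$2")"
}

# Check out exactly the tagged revisions for the build/test pass.
#   $1 = module, $2 = major version, $3 = minor version
checkout_release() {
    cvs checkout -r "$(release_tag "$2" "$3")" "$1"
}
```

For instance, tag_release 2 1 followed by checkout_release myproject 2 1 in a clean directory gives a tree that contains only what was tagged.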

#2.4 Create a branch after each release

After each software release, once the CVS repository is tagged, a branch has to be immediately created. This branch will serve as the bug fix baseline for that release. This branch is created only if the release is not a bug fix or patch release in the first place. Patches that have to be made for this release at any point in time in the future will be developed on this branch. The main trunk will be used for ongoing product development.

With this arrangement, the changes in the code for the ongoing development will be on the main trunk and the branch will provide a separate partition for hot fixes and bug fix releases. The identifier for the branch name can be of the form.

A branch can be created using the cvs tag -b BRANCH_NAME command.
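As an alternative to tagging from a sandbox, the branch can be rooted directly at the release tag with cvs rtag, which operates on the repository. The sketch below assumes CVSROOT is set (or that the command is run inside an existing sandbox); the function names are hypothetical and the branch name is whatever your naming policy dictates.

```shell
# Create a branch rooted at an existing release tag.
#   $1 = release tag, $2 = new branch name, $3 = module
branch_from_release() {
    cvs rtag -b -r "$1" "$2" "$3"
}

# Check out a sandbox that tracks the branch.
#   $1 = branch name, $2 = module
checkout_branch() {
    cvs checkout -r "$1" "$2"
}
```

A sandbox checked out this way commits to the branch, so patch work stays separate from the trunk.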

#2.5 Make bug fixes to branches only

This practice extends the previous practice of creating a separate branch after a major release. The branch will serve as the code base for all bug fixes and patch releases that have to be made. Thus, there is a separate repository "sandbox" where the hot fixes and patches can be developed apart from the mainstream development.

This practice also ensures that bug fixes done to previous releases do not mysteriously affect the mainstream version. In addition, new features added to the mainstream version do not creep into the patch release accidentally.

#2.6 Make patch releases from branches only

Since all the bug fixes for a given release are done on its corresponding branch, the patch releases are made from the branch. This ensures that there is no confusion about the feature set released as part of the patch release. After the patch release is made, the branch has to be tagged using the release tagging practice (see Tag each release).

#2.7 Merge branch with the trunk after release

After each release from a branch, the changes made to the branch should be merged (with the cvs update -j BRANCH command) into the trunk. This ensures that all the bug fixes made for the patch release are properly incorporated into future releases of the application.

This merge could be time consuming, depending on the number of changes made to the trunk and to the branch being merged. In fact, it will probably produce a lot of conflicts in CVS that have to be resolved by hand. After the merge, the trunk code base must be tested to verify that the application is in proper working order. This must be kept in mind while preparing the project schedule.

If changes accumulate on a branch over a long period, they can be merged to the main trunk on a regular basis even before the release is made. The merges should be timed to coincide with logical points in the branch's evolution. To ensure that duplicate merging does not occur, the following practice can be adopted.
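A merge into the trunk might proceed roughly as in this sketch, run from a trunk sandbox. The function names are illustrative; the second helper simply looks for the conflict markers that CVS leaves in files it could not merge cleanly.

```shell
# Bring the sandbox to the tip of the trunk (-A clears sticky tags),
# then merge the branch's changes into the working copy.
#   $1 = branch name
merge_branch_to_trunk() {
    cvs -q update -A -d -P && cvs -q update -j "$1"
}

# List the given files that still contain conflict markers.
files_with_conflicts() {
    grep -l '^<<<<<<< ' "$@"
}
```

Nothing is committed by the merge itself: conflicts are resolved in the working copy, the trunk is rebuilt and tested, and only then is the result checked in.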

In addition to the branch tag, a tag called {branch_name}_MERGED should be created. This is initially at the same level as the last release tag for the branch. This tag is then "moved" after each intermediate merge by using the -F option of the tag command. This eliminates duplicate merging issues during intermediate merges.
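The intermediate-merge bookkeeping could look like the sketch below. The function names are hypothetical, the first function is meant for a trunk sandbox, and the second assumes CVSROOT is set or that it is run inside a sandbox.

```shell
# Merge only the branch changes made since the last recorded merge.
#   $1 = branch name
merge_since_last() {
    cvs -q update -j "${1}_MERGED" -j "$1"
}

# Move the marker tag to the current tip of the branch.
# -F lets the existing tag be moved to different revisions.
#   $1 = branch name, $2 = module
mark_merged() {
    cvs rtag -F -r "$1" "${1}_MERGED" "$2"
}
```

One caveat with this simple ordering: anything committed to the branch between the merge and the tag move would be marked as merged without having been merged, so it is safest to run both steps while the branch is quiet.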

#3 Summary

Not all of the policies above have to be followed exactly as stated. There are always exceptions, depending on the type of project and the amount of work to be done, so each policy should be applied in the way that best suits the situation.