Web Content Management with Perforce

Laura Wingerd


Abstract

Perforce is recognized as the fast, low-overhead, high-throughput solution in software configuration management (SCM). What's not as obvious is how Perforce solves the problem of web content management (WCM). Perforce is used in a wide range of WCM applications by organizations using an intranet for internal documentation; companies whose product is web content, not software; and individuals, companies, and organizations with external web sites. This paper surveys the Perforce deployment models currently in use for web content management, and identifies the features that make Perforce a suitable WCM solution.

1. How Perforce Works

Perforce uses a client/server architecture. The Perforce Server maintains a repository of versioned files (the depot) and a database of SCM information (the metadata). A Perforce client is any user or application that communicates with the server. A single Perforce Server can support a large number of local and geographically distributed clients. Perforce clients get files from the depot, modify them, and submit changed files back to the depot. Working copies of the files are stored locally in client workspaces.

2. A Simple WCM Approach

Web servers read files from a filesystem. Perforce client workspaces can be mapped to any filesystem. The simplest implementation of Perforce as a web content manager, therefore, is to establish a dedicated Perforce client workspace that is the web site filesystem. Developers and web content authors work on web files in their own workspaces and submit modified files to the depot. The dedicated workspace (i.e., the web site filesystem) is automatically synchronized with the depot every few minutes by P4FTP, a system service, or a daemon process.
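
As an illustration of the daemon-process variant, the following minimal Python sketch runs "p4 sync" for the dedicated workspace at a fixed interval. The workspace name "website", the server address "perforce:1666", and the function names are assumptions made for this example, not part of any Perforce product. The command runner and the sleep function are injectable so the loop can be exercised without a Perforce installation.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def sync_command(client: str, port: str = "perforce:1666") -> List[str]:
    """Build the p4 command line that syncs one client workspace to head.

    The port value is a placeholder; substitute the real server address.
    """
    return ["p4", "-p", port, "-c", client, "sync"]


def run_sync_loop(
    client: str,
    interval_seconds: float,
    iterations: int,
    run: Callable[[Sequence[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Run the sync command `iterations` times and return the exit codes.

    The loop pauses between runs but not after the last one.
    """
    exit_codes = []
    for i in range(iterations):
        exit_codes.append(run(sync_command(client)))
        if i < iterations - 1:
            sleep(interval_seconds)
    return exit_codes
```

A real service would loop indefinitely and log failures; the bounded iteration count here only keeps the sketch easy to test.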

This allows web content developers concurrent yet controlled access to web pages, while insulating the web site itself. The only agents with direct access to the web site filesystem are the automated synchronizer and the web server. Managing the web site filesystem with Perforce has these benefits:

  • Perforce lets content developers group multi-file changes into a single unit called a changelist. Changes involving multiple files are easy to track and review.

  • Web content developers can use Perforce reporting features to tell exactly which file versions are on the web site, which versions are in their own workspaces, and which versions are in each other's workspaces. Complete file histories are also readily available.

  • While most web authoring tools provide facilities to put files directly onto web sites, few provide a way to return to previous versions. With Perforce, web sites can be resynchronized with previous file versions.

  • Web content reviewers have instant access to previous web site versions with P4Web's Back-in-Time Browsing.

  • Perforce makes it possible for developers to work in parallel and without file locking. Web content developers can use Perforce's merge tools, or third-party merge tools of their choice, in the course of their work instead of having to wait for exclusive file access.

  • Perforce passwords and depot protections can be used to fine-tune access permissions authors have on files.

  • Perforce distinguishes and stores a wide variety of file types, including Unicode, binary and compressed binary, and Unix- and Macintosh-specific types.

  • Web content developers can work entirely inside a firewall. Aside from the web server, the only agent that needs extramural access to the web site filesystem is the service that synchronizes the files.
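
Two of the benefits listed above, reporting what is on the web site and returning to earlier versions, can be sketched as p4 command lines. The Python helpers below only build the argument lists; the workspace name "website", the depot path, and the helper names are hypothetical and chosen for illustration.

```python
from typing import List


def have_command(client: str) -> List[str]:
    """List the file revisions currently synced into a workspace."""
    return ["p4", "-c", client, "have"]


def site_files_command(depot_path: str, site_client: str) -> List[str]:
    """List the revisions held by the web site workspace, from any client.

    Relies on a client workspace name being usable as a revision symbol.
    """
    return ["p4", "files", f"{depot_path}@{site_client}"]


def rollback_command(client: str, depot_path: str, changelist: int) -> List[str]:
    """Resynchronize a workspace to the depot state as of a changelist."""
    if changelist <= 0:
        raise ValueError("changelist numbers are positive integers")
    return ["p4", "-c", client, "sync", f"{depot_path}@{changelist}"]
```

For example, rollback_command("website", "//depot/www/...", 1234) yields a sync that returns the web site filesystem to the files as they stood when changelist 1234 was submitted.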

3. Web Deployment Staging

In the simple model described above, an author's changes are visible to web browsers as soon as his files are submitted to the depot and synchronized to the web site filesystem. There is no distinction between an author's work in progress and published web content. While acceptable for a small intranet, this model would be unsatisfactory for most web sites.

Web content at production sites typically passes through distinct deployment stages on its way from development to publication. Furthermore, web "content" seen by users may in fact be generated dynamically; the actual content under SCM control consists of web application server source files, database procedures, and the like. Each stage of deployment requires rebuilding the components that produce content. The web deployment staging of managed content, whether dynamic or static, can be controlled by Perforce with labels, branches, or a combination of the two.

3.1 Using Labels to Control Deployment

Perforce's labeling mechanism can be used to associate a deployment stage with a particular file revision. In the simplest scenario, as authors submit content to the depot, an internal web server with access to the head revisions of files is used to test and review changes. Approved changes are labeled, and the labeled revisions are synchronized to external web site filesystems.

The advantage of using labels is that the approach is easy to understand and practically effortless to set up. However, without additional infrastructure, controlling content publication with a label leaves no audit trail. While reverting to previous versions is always possible, it's not always possible to know which versions were available on a web site at any particular time.
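
The label-based flow might look like the sketch below, which builds the p4 command lines for labeling approved revisions and for syncing an external workspace to the label. The label name "published", the depot path, and the audit-line format are assumptions; the audit line is one hypothetical form the "additional infrastructure" could take, not a Perforce feature.

```python
from datetime import datetime, timezone
from typing import List


def tag_command(label: str, depot_path: str, changelist: int) -> List[str]:
    """Apply a label to the revisions that were current as of a changelist."""
    return ["p4", "tag", "-l", label, f"{depot_path}@{changelist}"]


def publish_command(client: str, label: str, depot_path: str) -> List[str]:
    """Sync an external web site workspace to the labeled revisions."""
    return ["p4", "-c", client, "sync", f"{depot_path}@{label}"]


def audit_line(label: str, changelist: int, when: datetime) -> str:
    """Format one record of what was published and when (hypothetical format)."""
    return f"{when.isoformat()} label={label} changelist={changelist}"
```

Appending an audit line to a log each time the label moves would record which changelist was live at a given time, which the label alone does not.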

3.2 Using Branches to Control Deployment

A more sophisticated approach to controlling deployment is to branch files into separate depot locations. Each location, or branch, is designated for a particular stage of web deployment. Branches can model all web deployment stages, including component integration, quality assurance, beta test sites, custom portals, and other content variants. Perforce client views are used to associate branches with staging web site filesystems.

In the simple scenario, authors work on files in a development branch and approved changes are propagated to a published branch. External web sites get files from the published branch, and internal web sites get files from the development branch.

There are several advantages to using branches for web content management:

  • As long as web site filesystems are continually synchronized with the latest files in their respective branches, Perforce metadata provide a complete inventory and history of each web site.

  • Perforce can report aggregate information for a branch. Recent activity, pending integrations, file differences, etc., can be displayed for each branch, and thus for each stage of web content deployment.

  • Because Perforce branches look like directories, authors and web masters can easily see the staging branches in the depot, and examine the contents of each.

  • Perforce tracks propagation of files and changelists from one branch to another. A web master can tell which files need propagating between staging branches. A developer can tell if her changelist has been propagated to a particular staging branch yet.
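
A minimal sketch of promotion between staging branches follows, assuming hypothetical branch locations such as //depot/www/dev and //depot/www/published and a client workspace whose view includes the target branch. The helpers build p4 command lines only; they do not run them.

```python
from typing import List, Optional


def pending_command(source_branch: str, target_branch: str) -> List[str]:
    """Report changelists in the source branch not yet propagated to the target."""
    return ["p4", "interchanges", f"{source_branch}/...", f"{target_branch}/..."]


def promote_commands(
    source_branch: str,
    target_branch: str,
    changelist: Optional[int] = None,
) -> List[List[str]]:
    """Build the integrate, resolve, and submit steps for one promotion.

    If a changelist is given, changes up to and including it are propagated.
    """
    source = f"{source_branch}/..."
    if changelist is not None:
        source += f"@{changelist}"
    return [
        ["p4", "integrate", source, f"{target_branch}/..."],
        ["p4", "resolve", "-am"],  # automatic merge; conflicts stay unresolved
        ["p4", "submit", "-d", f"Promote {source} to {target_branch}"],
    ]
```

Because "resolve -am" leaves conflicting files unresolved, the submit step would fail in that case, and a person would need to resolve the conflicts before promoting.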

4. Direct Depot Access

The models described above rely on web site filesystems that are continually synchronized with the depot. An alternative to using a web site filesystem is to use a web server that can access Perforce depot files directly. Simple direct access can be enabled with the Perforce WebKeeper module for the Apache HTTP server, or with P4Web acting as a web server.

With direct depot access, dedicated client workspaces and web site filesystem synchronization are unnecessary. For example, one WebKeeper-enabled web server can make content in a development branch available to reviewers and testers as soon as files are submitted, while another can make published content available as soon as content is promoted into the published branch.
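
The idea of direct depot access can be illustrated without WebKeeper or P4Web by mapping a requested URL path onto a depot path and fetching the file contents with "p4 print". The sketch below is not how either product is implemented; the branch root and function names are assumptions made for the example.

```python
import posixpath
from typing import List


def depot_path_for_url(url_path: str, branch_root: str = "//depot/www/published") -> str:
    """Map a URL path to a depot path under one branch.

    Normalizing against "/" keeps ".." segments from climbing above the
    branch root. Perforce wildcard and revision characters are rejected.
    """
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    if clean == "/":
        clean = "/index.html"
    if "..." in clean or any(ch in clean for ch in "@#*%"):
        raise ValueError("path contains Perforce special characters")
    return branch_root + clean


def print_command(depot_path: str) -> List[str]:
    """Build the command that writes a depot file's contents to standard output."""
    return ["p4", "print", "-q", depot_path]
```

Pointing branch_root at a development branch on one server and at the published branch on another mirrors the two-server arrangement described above.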

5. Compatibility with Web Development Tools

Perforce imposes no restriction on the tools or environments used to create and deploy web content. Some Perforce features are particularly compatible with web development:

  • Web authoring tools (Dreamweaver, for example) typically rely on FTP to update web site files. P4FTP offers a familiar interface to Perforce for web content authors using those tools.

  • For development tools that provide programmable command menus or tool bars, P4 scripts can perform customized Perforce tasks.

  • Integrated development environments like Microsoft Developer Studio can manage Perforce tasks directly with the Perforce IDE Plug-ins installed.

  • When authoring tools do not provide an FTP or source control interface, Perforce client programs like P4Win, P4Web, and P4 can be used directly. Perforce client programs can detect new and changed files in the workspace. This allows authors to put all files involved in the web site under source control, even in the case where authoring tools create or modify files unexpectedly.

  • The Perforce C/C++ API (P4API) can be built into CGI programs, web application servers and other executables to access depot files directly for server-side processing. P4API extensions are available for a number of programming and scripting languages.

6. Conclusion

By far, the majority of Perforce customers are software development organizations. An added benefit of Perforce is that once it is installed for software development it can be used for web content management at no additional cost, and with very little administrative effort. However, Perforce is just as suitable for web content management independent of software development. As the models described above illustrate, Perforce can provide complete, unobtrusive control of web content evolution and distribution in a variety of implementations.

Appendix: Relevant Perforce Features

Some additional features of Perforce are particularly relevant for WCM purposes:

  • The operation that synchronizes the workspace with the depot (called "sync") only transfers files when needed. That is, the sync operation can be invoked as often as desired, but the server only sends files to the workspace if the workspace versions are no longer current. This efficiency applies to automated services that synchronize web site workspaces every few minutes as well as to developers keeping a local workspace up to date with copies of the latest web site files. If nothing has changed in the depot, no data transfer takes place.

  • Automated synchronization of web and FTP site files can be set up either with P4FTP or with system-supplied services.

  • When a user submits a collection of changed files, Perforce numbers and records a transaction called a changelist. Perforce enforces the integrity of each changelist so that even in the face of network or system problems, a user can never unintentionally end up with a partially submitted unit of work. A changelist number identifies not only the files and revisions associated with the changelist, but also the state of any depot file at the moment the changelist was submitted.

  • File versions can be represented literally or symbolically. E.g., "foo.html#12" is literally the twelfth version of foo.html, whereas "foo.html@reviewed" is the foo.html version identified by the symbol "reviewed." Labels are typically used as symbols, but client workspace names and changelist numbers can be used as well. For example, if a web site workspace is named "website", then "foo.html@website" indicates the version of foo.html that is on the web site.

  • Perforce uses TCP/IP for communication between client and server and offers data compression options for file transfer and depot storage. It does not rely on NFS or any other file sharing system.

  • Perforce supports clients on a wide range of platforms, including Macintosh, Windows, Linux, and nearly all Unix variants.
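
The literal and symbolic version syntax described in this appendix can be made concrete with a small parser. This Python sketch handles only the forms named above (a "#" revision number, an "@" symbol or changelist number, and a bare path); the type and function names are hypothetical, and other revision forms that Perforce accepts are deliberately out of scope.

```python
from typing import NamedTuple, Optional


class RevSpec(NamedTuple):
    path: str
    kind: str  # "head", "revision", "changelist", or "symbol"
    value: Optional[str]


def parse_revspec(spec: str) -> RevSpec:
    """Split a file specification into its path and version parts."""
    if "#" in spec:
        path, _, rev = spec.rpartition("#")
        if not rev.isdigit():
            raise ValueError("only numeric revisions are handled here")
        return RevSpec(path, "revision", rev)
    if "@" in spec:
        path, _, symbol = spec.rpartition("@")
        if not symbol:
            raise ValueError("missing symbol after '@'")
        # A numeric symbol is a changelist number. A name may be a label
        # or a client workspace; the syntax alone does not say which.
        kind = "changelist" if symbol.isdigit() else "symbol"
        return RevSpec(path, kind, symbol)
    return RevSpec(spec, "head", None)
```

Under this sketch, "foo.html#12" parses as revision 12, while "foo.html@reviewed" and "foo.html@website" both parse as symbols, reflecting that labels and workspace names share the same syntax.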