Filtering metadata during replication or edge-to-edge chaining

As part of an HA/DR solution, you typically want to ensure that all the metadata and all the versioned files are replicated. In most other use cases, particularly build servers and forwarding replicas, replicating everything transfers a great deal of redundant data.

It is often advantageous to configure your replica servers to filter data on client workspaces and file revisions. For example:

  • developers working on one project at a remote site do not typically need to know the state of every client workspace at other sites where other projects are being developed
  • build servers don’t require access to the endless stream of changes to office documents and spreadsheets associated with a typical large enterprise
  • in the case of edge-to-edge chaining, the outer edge might need only a subset of what the inner edge has

Filtering applies to new revisions

Changes made to filtering rules are not applied retroactively.

  • If a file revision is already on the replica, it remains on the replica even if the filter is later changed to exclude it.

  • If metadata for file revisions was previously filtered out and the filter is later changed to allow those revisions, the replica will still be missing any revisions created before the filter was changed, even after running the p4 verify -t command.

Therefore, filtering is more consistent and predictable if you reseed the replica or edge server after an update to the server specification.

Sensitive or unneeded librarian files can be removed by running p4 cachepurge before reseeding the replica or edge server.

Two ways to filter

Excluding database tables

The simplest way to filter metadata is by using the -T tableexcludelist option with the p4 pull command. If you know, for example, that a build server has no need to refer to any of your users' have lists or the state of their client workspaces, you can filter out db.have and db.working entirely with p4 pull -T db.have,db.working.

Excluding entire database tables is a coarse-grained way to manage how much data passes between servers. It requires some knowledge of which tables Helix Server commands are most likely to refer to, and it offers no control over which versioned files are replicated.
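If you manage pull threads from a script, a small helper can assemble the value passed to p4 configure set, including a -T table exclusion list. The following is a minimal Python sketch: the function name pull_startup_value is hypothetical, it uses only the pull -i and -T options described in this section, and it only formats text without contacting a server.

```python
from typing import Sequence


def pull_startup_value(server_id: str, thread: int, interval: int,
                       exclude_tables: Sequence[str] = ()) -> str:
    """Build 'serverid#startup.N=pull -i interval [-T tables]'.

    Hypothetical helper: it only formats text and does not contact a server.
    """
    for table in exclude_tables:
        # Expect names such as db.have or db.working; a comma would corrupt the list.
        if not table.startswith("db.") or "," in table:
            raise ValueError(f"unexpected table name: {table!r}")
    args = ["pull", "-i", str(interval)]
    if exclude_tables:
        # -T takes one comma-separated list of tables to leave out.
        args += ["-T", ",".join(exclude_tables)]
    return f"{server_id}#startup.{thread}=" + " ".join(args)


if __name__ == "__main__":
    value = pull_startup_value("builder-1669", 1, 30, ["db.have", "db.working"])
    # Use the result as: p4 configure set "<value>"
    print(value)  # builder-1669#startup.1=pull -i 30 -T db.have,db.working
```

Keeping the exclusion list in one place makes it easier to apply the same -T list to every metadata pull thread configured for a given server.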

Filtering by fields

You can have fine-grained control over what data is replicated by using the ClientDataFilter:, RevisionDataFilter:, and ArchiveDataFilter: fields of the p4 server form. These fields enable you to replicate only a subset of the server metadata and versioned files to a replica or edge.

Note

For this feature to work, the Services: field in the server spec must be set to a value other than commit-server or standard.

Example   Filtering out client workspace data and files

If workspaces for users at each of three sites are named site[123]-ws-username, a replica intended to act as a partial backup for users at site1 could be configured as follows:

ServerID:       site1-1668
Name:           site1-1668
Type:           server
Services:       replica
Address:        tcp:site1bak:1668
Description:
        Replicate all client workspace data, except the states of
        workspaces of users at sites 2 and 3.
        Automatically replicate .c files in anticipation of user
        requests. Do not replicate .mp4 video files, which tend
        to be large and impose high bandwidth costs.
ClientDataFilter:
        //...
        -//site2-ws-*/...
        -//site3-ws-*/...
RevisionDataFilter:
ArchiveDataFilter:
        //....c
        -//....mp4

When you start the replica, your p4 pull metadata thread might resemble the following:

$ p4 configure set "site1-1668#startup.1=pull -i 30"

In this configuration, only those portions of db.have that are associated with site1 are replicated. All metadata concerning workspaces associated with site2 and site3 is ignored.

All file-related metadata is replicated. All files in the depot are replicated, except for those ending in .mp4. Files ending in .c are transferred automatically to the replica when submitted.
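When several site replicas follow the same naming pattern, you might generate their specs rather than edit each form by hand. The sketch below rests on assumptions: render_server_spec and site_replica_spec are hypothetical helpers, the tab-indented layout mirrors what p4 server -o prints, and the commented-out final step assumes the text is fed to p4 server -i, which reads a server spec from standard input.

```python
from typing import Dict, List, Union

FieldValue = Union[str, List[str]]


def render_server_spec(fields: Dict[str, FieldValue]) -> str:
    """Render a server spec form as text (hypothetical helper).

    Single-line values follow the field name; list values (Description and
    the filter fields) go on the following lines, each indented by a tab.
    """
    lines: List[str] = []
    for name, value in fields.items():
        if isinstance(value, str):
            lines.append(f"{name}:\t{value}")
        else:
            lines.append(f"{name}:")
            lines.extend(f"\t{entry}" for entry in value)
        lines.append("")  # blank line between fields
    return "\n".join(lines)


def site_replica_spec(site: int, all_sites: List[int], port: int) -> str:
    """Spec for a replica backing up one site's workspaces (illustrative)."""
    server_id = f"site{site}-{port}"
    # Include everything, then exclude the workspaces of every other site.
    client_filter = ["//..."] + [
        f"-//site{other}-ws-*/..." for other in all_sites if other != site
    ]
    return render_server_spec({
        "ServerID": server_id,
        "Name": server_id,
        "Type": "server",
        "Services": "replica",
        "Address": f"tcp:site{site}bak:{port}",  # host naming from the example
        "Description": [f"Partial backup of site{site} workspaces."],
        "ClientDataFilter": client_filter,
        "RevisionDataFilter": [],
        "ArchiveDataFilter": ["//....c", "-//....mp4"],
    })


if __name__ == "__main__":
    spec = site_replica_spec(1, [1, 2, 3], 1668)
    print(spec)
    # To apply it (needs a p4 client and a reachable server; not run here):
    # import subprocess
    # subprocess.run(["p4", "server", "-i"], input=spec, text=True, check=True)
```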

To further illustrate the concept, consider a build server scenario. The ongoing work of the organization (such as code, business documents, or videos) can be stored anywhere in the depot, but this build farm is dedicated to building releasable products and has no need for the rest of the organization’s output:

Example   Replicating metadata and file contents for a subset of a depot

Releasable code is placed into //depot/releases/... and automated builds are based on these changes. Changes to other portions of the depot, as well as the states of individual workers' client workspaces, are filtered out.

ServerID:       builder-1669
Name:           builder-1669
Type:           server
Services:       build-server
Address:        tcp:built:1669
Description:
        Exclude all client workspace data
        Replicate only revisions in release branches
ClientDataFilter:
        -//...
RevisionDataFilter:
        //depot/releases/...
ArchiveDataFilter:
        //depot/releases/...

Important

To exclude a subset of paths, list the inclusionary line(s) first, followed by the exclusionary line(s). For example:

RevisionDataFilter:
    //... 
    -//depot/releases/...
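The ordering rule can be reasoned about with a simplified model, assuming the filter lines behave like other Helix Server view mappings: lines are tested in order, a path's fate is decided by the last line that matches it, ... matches any characters including /, and * matches any characters except /. This Python sketch is for reasoning about filters only, not Helix Server's implementation; it ignores %%n wildcards, quoting, and + lines, and the function name is_included is hypothetical.

```python
import re
from typing import List


def _pattern_to_regex(pattern: str):
    """Translate a depot-syntax pattern into an anchored regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(".*")      # '...' matches across directory levels
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")   # '*' stays within one path component
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def is_included(path: str, filter_lines: List[str]) -> bool:
    """Return True if the last line matching path is inclusionary.

    A path matched by no line is treated as excluded.
    """
    included = False
    for line in filter_lines:
        exclude = line.startswith("-")
        pattern = line[1:] if exclude else line
        if _pattern_to_regex(pattern).match(path):
            included = not exclude  # later lines override earlier ones
    return included


if __name__ == "__main__":
    recommended = ["//...", "-//depot/releases/..."]
    reversed_order = ["-//depot/releases/...", "//..."]
    print(is_included("//depot/releases/r1/app.c", recommended))     # False
    print(is_included("//depot/releases/r1/app.c", reversed_order))  # True
    print(is_included("//depot/main/app.c", recommended))            # True
```

Under this model, putting the exclusionary line first has no effect, because the broad //... line that follows re-includes everything it removed.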

To seed the replica, you can use a command like the following to create a filtered checkpoint:

$ p4d -r /p4/master -P builder-1669 -jd myCheckpoint

The filters specified for builder-1669 are used in creating the checkpoint. You can then continue to update the replica using the p4 pull command.
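If seeding is scripted, the same p4d invocation can be wrapped so that the server root, server ID, and checkpoint name are passed as separate arguments rather than interpolated into a shell string. A minimal Python sketch; the functions filtered_checkpoint_cmd and run_filtered_checkpoint are hypothetical, and they use only the -r, -P, and -jd options shown above. By default it performs a dry run and only prints the command.

```python
import shlex
import subprocess
from typing import List


def filtered_checkpoint_cmd(root: str, server_id: str, checkpoint: str) -> List[str]:
    """Argument list for a checkpoint filtered by server_id's server spec."""
    return ["p4d", "-r", root, "-P", server_id, "-jd", checkpoint]


def run_filtered_checkpoint(root: str, server_id: str, checkpoint: str,
                            dry_run: bool = True) -> List[str]:
    cmd = filtered_checkpoint_cmd(root, server_id, checkpoint)
    # shlex.join (Python 3.8+) quotes each argument for display only.
    print("$", shlex.join(cmd))
    if not dry_run:
        # Requires p4d on PATH and access to the master's server root.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Prints: $ p4d -r /p4/master -P builder-1669 -jd myCheckpoint
    run_filtered_checkpoint("/p4/master", "builder-1669", "myCheckpoint")
```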

When you start the replica, your p4 pull metadata thread might resemble the following:

$ p4 configure set "builder-1669#startup.1=pull -i 30"

As a result, this p4 pull thread retrieves metadata for replication that excludes all client workspace data (including the have lists) of all users.

The p4 pull -u thread(s) ignore all changes on the master except those that affect revisions in the //depot/releases/... branch, which are the only changes of interest to a build server. The only metadata available is metadata that concerns released code. All released code is automatically transferred to the build server before any requests are made, so when the build server performs a p4 sync, the sync is served locally.