Configuring clients for Unicode

When you set up a server to work in Unicode mode, the client determines which character set to use by examining the current environment, and you generally need to do nothing more to get a correct translation. For example, a UNIX client examines the LANG or LOCALE variables to determine the appropriate character set. However, there might be situations when you need to override the selection made by the client:

In each of these cases, you will need to explicitly set P4CHARSET to an appropriate value or take some other action. To get a list of the possible values for P4CHARSET, use the command:

$ p4 help P4CHARSET

Warning

Do not submit a file using a P4CHARSET value that is different from the one you used to sync it; if you do, the file is likely to be translated incorrectly. In other words, do not change the value of P4CHARSET while files are checked out.
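
One way to guard against this is to check for open files before switching character sets. The following is a minimal sketch, not part of Helix Server: the function name `safe_set_charset` is hypothetical, and it assumes that `p4 opened` writes nothing to standard output when no files are open in the current workspace.

```shell
# Hypothetical helper: change P4CHARSET only when no files are checked out.
safe_set_charset() {
    new_charset=$1
    # Assumption: with no open files, `p4 opened` prints only a notice on
    # standard error, so the captured standard output is empty.
    opened_files=$(p4 opened 2>/dev/null)
    if [ -n "$opened_files" ]; then
        echo "Files are checked out; submit or revert them before changing P4CHARSET." >&2
        return 1
    fi
    # Affects the current shell session and the commands started from it.
    export P4CHARSET="$new_charset"
}
```

For example, `safe_set_charset utf16le` changes the variable only when nothing is checked out, and otherwise returns a non-zero status and leaves P4CHARSET untouched.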

Unicode character sets and Byte Order Markers (BOMs)

Byte order markers (BOMs) are used in Unicode files to specify the order in which multi-byte characters are stored and to identify the file content as Unicode. Not all extended-character file formats use BOMs.

To ensure that such files are translated correctly by the Helix Server when the files are synced or submitted, you must set P4CHARSET to the character set that corresponds to the format used on your workstation by the applications that access them, such as text editors or IDEs. Typically the formats are listed when you save the file using the Save As... menu option.
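
If an application does not say which format it writes, the first bytes of a file can be inspected directly. The sketch below is an illustration only (the function name `detect_bom` is hypothetical). It relies on the standard Unicode BOM byte sequences and prints the P4CHARSET value that matches the BOM it finds, or `none` if the file has no BOM. A file without a BOM still needs its encoding identified by other means.

```shell
# Hypothetical helper: report which BOM, if any, a file starts with.
detect_bom() {
    # Read the first four bytes as a lowercase hex string, e.g. "efbbbf68".
    bom_hex=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    case $bom_hex in
        efbbbf*)  echo utf8-bom ;;
        # UTF-32 must be tested before UTF-16 because both little-endian
        # BOMs begin with FF FE. A UTF-16 little-endian file whose first
        # character is U+0000 would be misreported here.
        fffe0000) echo utf32le ;;
        0000feff) echo utf32be ;;
        fffe*)    echo utf16le ;;
        feff*)    echo utf16be ;;
        *)        echo none ;;
    esac
}
```

Calling `detect_bom myfile.txt` (a placeholder file name) prints a single word that can be compared against the settings listed in the table.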

The following table lists valid settings for P4CHARSET for specifying byte order properties of Unicode files.

Client Unicode format  BOM?  Big or Little-Endian  Set P4CHARSET to   Remarks
---------------------  ----  --------------------  -----------------  -------------------------------------------------------------
UTF-8                  No    (N/A)                 utf8               Suppresses Helix Server UTF-8 validation
UTF-8                  Yes   (N/A)                 utf8-bom
UTF-8                  No    (N/A)                 utf8unchecked
UTF-8                  Yes   (N/A)                 utf8unchecked-bom
UTF-16                 Yes   Per client            utf16              Synced with a BOM according to the client platform byte order
UTF-16                 Yes   Little                utf16le
UTF-16                 Yes   Big                   utf16be
UTF-16                 No    Per client            utf16-nobom
UTF-16                 No    Little                utf16le-nobom
UTF-16                 No    Big                   utf16be-nobom
UTF-32                 Yes   Per client            utf32              Synced with a BOM according to the client platform byte order
UTF-32                 Yes   Little                utf32le
UTF-32                 Yes   Big                   utf32be
UTF-32                 No    Per client            utf32-nobom
UTF-32                 No    Little                utf32le-nobom
UTF-32                 No    Big                   utf32be-nobom

If you set P4CHARSET to a UTF-8 setting, Helix Server does not translate text files when you sync or submit them. It does, however, verify that such files contain valid UTF-8 data.
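
The table above can also be expressed as a small lookup. This sketch uses a hypothetical function, `p4charset_for`, whose name and argument values (`yes`/`no`, `little`/`big`/`client`) are assumptions made for illustration. It covers only the utf8 and utf8-bom settings and the utf16 and utf32 families, not the utf8unchecked variants.

```shell
# Hypothetical helper: p4charset_for FORMAT BOM [ORDER]
#   FORMAT: utf8 | utf16 | utf32
#   BOM:    yes | no
#   ORDER:  little | big | client (default; ignored for utf8)
p4charset_for() {
    fmt=$1
    bom=$2
    order=${3:-client}
    case $fmt in
        utf8) base=utf8 ;;
        utf16|utf32)
            case $order in
                little) base=${fmt}le ;;
                big)    base=${fmt}be ;;
                client) base=$fmt ;;
                *)      return 1 ;;
            esac ;;
        *) return 1 ;;
    esac
    # The utf8 settings name the BOM explicitly (-bom), whereas the
    # utf16 and utf32 settings name its absence (-nobom).
    case "$fmt:$bom" in
        utf8:yes) echo "${base}-bom" ;;
        utf8:no)  echo "$base" ;;
        *:yes)    echo "$base" ;;
        *:no)     echo "${base}-nobom" ;;
        *)        return 1 ;;
    esac
}
```

For instance, `export P4CHARSET=$(p4charset_for utf16 no little)` sets the variable to `utf16le-nobom` for the current shell session.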

Controlling translation of server output

If you set P4CHARSET to any utf16 or utf32 setting, you must also set P4COMMANDCHARSET to a character set that is neither utf16 nor utf32; server output is displayed in that character set. "Server output" includes informational and error messages, diff output, and information returned by reporting commands.

To specify P4COMMANDCHARSET on a per-command basis, use the -Q flag. For example, to display all filenames in the depot, as translated using the winansi code page, issue the following command:

C:\> p4 -Q winansi files //...
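
On a UNIX client, the rule about P4COMMANDCHARSET can be checked before running commands. The following is a rough sketch under stated assumptions: the function name `check_command_charset` is hypothetical, and it looks only at the two environment variables, not at values stored elsewhere (for example, in a configuration file).

```shell
# Hypothetical helper: succeed if the P4CHARSET / P4COMMANDCHARSET pair
# found in the environment satisfies the rule described above.
check_command_charset() {
    case ${P4CHARSET:-} in
        utf16*|utf32*) ;;      # rule applies, keep checking
        *) return 0 ;;         # any other setting needs no companion value
    esac
    case ${P4COMMANDCHARSET:-} in
        ''|utf16*|utf32*)
            echo "Set P4COMMANDCHARSET to a character set other than utf16 or utf32." >&2
            return 1 ;;
    esac
    return 0
}
```

With `P4CHARSET=utf16le` and `P4COMMANDCHARSET=utf8` exported, the function returns success; with P4COMMANDCHARSET unset, it prints the reminder and returns a non-zero status.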

Using other Helix Server client applications

If you are using other Helix Server client applications, note how they handle Unicode environments:

  • P4V (Helix Visual Client): the first time you connect to a Unicode-mode server, you are prompted to choose the character encoding. Thereafter, P4V retains your selection in association with the connection. P4V also has a global default setting for Charset. If you set it, P4V uses that value instead of prompting you for a charset.
  • P4Eclipse will ask for a charset when connecting to a Unicode-mode server.
  • P4Merge: To configure the character encoding used by P4Merge, choose P4Merge’s File > Character Encoding... menu option. When launched from P4V, P4Merge uses P4V’s P4CHARSET instead of the one defined in its preferences.
  • P4GT and P4EXP, the Helix Plugin for File Explorer, use environmental settings and will fail with a Unicode-mode server.