Configuring clients for Unicode

When you set up a server to work in Unicode mode, the client determines which character set to use by examining the current environment; in general, you need do nothing more to get a correct translation. For example, a UNIX client examines the LANG or LOCALE variables to determine the appropriate character set. However, there might be situations in which you need to override the selection made by the client:

The automatically selected setting is producing bad translations. See Troubleshooting user workstations in Unicode installations for more information.
You want to use separate workspaces (clients), each of which needs a different character set. In this case, you must set a different P4CHARSET value for each client.
The files you check out need to be accessed by applications for which byte order is important. See Unicode character sets and Byte Order Markers (BOMs) for more information.
You need to set P4CHARSET to a utf16 or utf32 setting. See Controlling translation of server output for more information.
The files are checked out using Helix Server client applications that handle Unicode environments in different ways. See Using other Helix Server client applications for more information.

In each of these cases, you will need to explicitly set P4CHARSET to an appropriate value or take some other action. To get a list of the possible values for P4CHARSET, use the command:

$ p4 help P4CHARSET
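
For the separate-workspaces case, one possible way to give each workspace its own P4CHARSET is a per-directory P4CONFIG file. The following is a minimal sketch rather than a prescribed setup: the scratch directory, the client names demo_utf8 and demo_winansi, and the file name .p4config are all hypothetical. The final p4 check is left as a comment because it assumes that p4 is installed.

```shell
# Hypothetical layout: two workspace roots under a scratch directory.
ws_base="${TMPDIR:-/tmp}/p4charset-demo"
mkdir -p "$ws_base/ws_utf8" "$ws_base/ws_winansi"

# Tell p4 to look for a settings file with this name in the current
# directory and its parents. The name .p4config is an assumption.
export P4CONFIG=.p4config

# Each workspace root gets its own settings file with its own P4CHARSET.
printf 'P4CLIENT=demo_utf8\nP4CHARSET=utf8\n' > "$ws_base/ws_utf8/.p4config"
printf 'P4CLIENT=demo_winansi\nP4CHARSET=winansi\n' > "$ws_base/ws_winansi/.p4config"

# Show what was written.
cat "$ws_base/ws_utf8/.p4config" "$ws_base/ws_winansi/.p4config"

# With p4 installed, listing the settings from inside a workspace
# directory should show the value taken from that directory's file:
#   (cd "$ws_base/ws_utf8" && p4 set | grep P4CHARSET)
```

Because the value lives in the workspace directory and not in the shell session, switching directories switches the character set, which reduces the risk of changing P4CHARSET by accident while files are checked out.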

Warning

Do not submit a file using a P4CHARSET that is different from the one you used to sync it; otherwise, the file is translated in a way that is likely to be incorrect. In other words, do not change the value of P4CHARSET while files are checked out.

Unicode character sets and Byte Order Markers (BOMs)

Byte order markers (BOMs) are used in Unicode files to specify the order in which multi-byte characters are stored and to identify the file content as Unicode. Not all extended-character file formats use BOMs.

To ensure that such files are translated correctly by the Helix Server when the files are synced or submitted, you must set P4CHARSET to the character set that corresponds to the format used on your workstation by the applications that access them, such as text editors or IDEs. Typically the formats are listed when you save the file using the Save As... menu option.

The following table lists valid settings for P4CHARSET for specifying byte order properties of Unicode files.

Client Unicode format	BOM?	Big or Little-Endian	Set P4CHARSET to	Remarks
UTF-8	No	(N/A)	`utf8`
	Yes		`utf8-bom`
	No		`utf8unchecked`	Suppresses Helix Server UTF-8 validation
	Yes		`utf8unchecked-bom`	Suppresses Helix Server UTF-8 validation
UTF-16	Yes	Per client	`utf16`	Synced with a BOM according to the client platform byte order
	Yes	Little	`utf16le`
	Yes	Big	`utf16be`
	No	Per client	`utf16-nobom`
	No	Little	`utf16le-nobom`
	No	Big	`utf16be-nobom`
UTF-32	Yes	Per client	`utf32`	Synced with a BOM according to the client platform byte order
	Yes	Little	`utf32le`
	Yes	Big	`utf32be`
	No	Per client	`utf32-nobom`
	No	Little	`utf32le-nobom`
	No	Big	`utf32be-nobom`

If you set P4CHARSET to a UTF-8 setting, Helix Server does not translate text files when you sync or submit them, but it does verify that such files contain valid UTF-8 data.
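
If an application does not say which format it saved, the first bytes of a file can be inspected for a BOM. The sketch below is a heuristic only: the function name suggest_p4charset and the sample file path are hypothetical, and a file without a BOM cannot be identified this way. It assumes the standard BOM byte sequences (EF BB BF for UTF-8, FE FF and FF FE for UTF-16, 00 00 FE FF and FF FE 00 00 for UTF-32) and maps them to the BOM-carrying settings in the table above.

```shell
# Print the P4CHARSET setting suggested by a file's byte order marker.
suggest_p4charset() {
    # Read the first four bytes as lowercase hex with no separators.
    bom=$(head -c 4 "$1" | od -An -tx1 | tr -d '[:space:]')
    case $bom in
        # The UTF-32 patterns are tested first because the UTF-32 LE BOM
        # begins with the same two bytes as the UTF-16 LE BOM.
        0000feff) echo utf32be ;;
        fffe0000) echo utf32le ;;
        efbbbf*)  echo utf8-bom ;;
        feff*)    echo utf16be ;;
        fffe*)    echo utf16le ;;
        *)        echo "no BOM found" ;;
    esac
}

# Example: a small UTF-8 file that starts with a BOM (bytes EF BB BF).
sample="${TMPDIR:-/tmp}/bom-demo.txt"
printf '\357\273\277hello\n' > "$sample"
suggest_p4charset "$sample"    # prints: utf8-bom
```

A UTF-16 little-endian file whose first character is U+0000 would be reported as utf32le, so treat the result as a hint to confirm against the application's own save options.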

Controlling translation of server output

If you set P4CHARSET to any utf16 or utf32 setting, you must also set P4COMMANDCHARSET to a character set other than utf16 or utf32, in which you want server output displayed. "Server output" includes informational and error messages, diff output, and information returned by reporting commands.

To specify P4COMMANDCHARSET on a per-command basis, use the -Q flag. For example, to display all filenames in the depot, as translated using the winansi code page, issue the following command:

C:\> p4 -Q winansi files //...
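
As an alternative to passing -Q on every command, P4COMMANDCHARSET can be set in the environment. The sketch below, written for a UNIX shell, pairs that with a small check of the rule stated above. The function name command_charset_ok is hypothetical, and the values utf16le and utf8 are example choices only.

```shell
# Succeed when the two settings are compatible under the rule above:
# a utf16 or utf32 P4CHARSET needs a P4COMMANDCHARSET that is set and
# is itself neither a utf16 nor a utf32 setting.
command_charset_ok() {
    charset=$1
    command_charset=$2
    case $charset in
        utf16*|utf32*)
            case $command_charset in
                ''|utf16*|utf32*) return 1 ;;
                *) return 0 ;;
            esac
            ;;
        *) return 0 ;;
    esac
}

# Example values: utf16le for file content, utf8 for server output.
export P4CHARSET=utf16le
export P4COMMANDCHARSET=utf8

if command_charset_ok "$P4CHARSET" "$P4COMMANDCHARSET"; then
    echo "P4COMMANDCHARSET is usable with P4CHARSET=$P4CHARSET"
else
    echo "set P4COMMANDCHARSET to a non-utf16, non-utf32 character set" >&2
fi
```

A -Q value given on the command line still applies to that single command, so the environment setting acts as a default.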

Using other Helix Server client applications

If you are using other Helix Server client applications, note how they handle Unicode environments:

P4V (Helix Visual Client): the first time you connect to a Unicode-mode server, you are prompted to choose the character encoding. Thereafter, P4V retains your selection in association with the connection. P4V also has a global default setting for Charset. If you set this, P4V uses it instead of prompting you for a charset.
P4Eclipse will ask for a charset when connecting to a Unicode-mode server.
P4Merge: To configure the character encoding used by P4Merge, choose P4Merge’s File > Character Encoding... menu option. When launched from P4V, P4Merge uses P4V’s P4CHARSET instead of the one defined in its preferences.
P4GT and P4EXP, the Helix Plugin for File Explorer, use environmental settings and will fail with a Unicode-mode server.