Configuring clients for Unicode
When you set up a server to work in Unicode mode, the client determines
which character set to use by examining the current environment.
Generally, you need to do nothing more to get a correct translation.
For example, a UNIX client examines the LANG or LOCALE variables to
determine the appropriate character set.
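To see what the environment offers the client, you can inspect the locale name yourself. The sketch below is only illustrative: `locale_codeset` is a hypothetical helper, not the logic that `p4` uses internally, and the choice of `LC_ALL` before `LANG` is an assumption based on the usual POSIX precedence.

```shell
# Hypothetical helper: extract the codeset part of a POSIX locale name,
# for example "en_US.UTF-8" -> "UTF-8". Not the algorithm used by p4.
locale_codeset() {
    name=${1%%@*}                       # drop any "@modifier" suffix
    case $name in
        *.*) printf '%s\n' "${name#*.}" ;;  # text after the first dot
        *)   printf '%s\n' "(none)" ;;      # no codeset in the name
    esac
}

# Report the codeset named by the current environment (assumed precedence).
locale_codeset "${LC_ALL:-${LANG:-C}}"
```

If the reported codeset does not match the encoding your editor actually writes, that is a hint that an explicit P4CHARSET may be needed.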
However, there might be situations when you need to override the
selection made by the client:
- The automatically selected setting is producing bad translations.
See Troubleshooting user workstations in Unicode installations for more information.
- You want to use separate workspaces (clients), and each of these needs
to use a different character set. In this case, you must set a
different P4CHARSET value for each client.
- The files you check out need to be accessed by applications for which byte order is important.
See Unicode character sets and Byte Order Markers (BOMs) for more information.
- You need to set P4CHARSET to a utf16 or utf32 setting.
See Controlling translation of server output for more information.
- The file is checked out using Helix Server client applications that handle Unicode environments in different ways.
See Using other Helix Server client applications for more information.
In each of these cases, you will need to explicitly set P4CHARSET
to an appropriate value or take some other action.
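One way to give each workspace its own value is a per-workspace configuration file located through P4CONFIG. The following is a minimal sketch under stated assumptions: `write_p4config` is a hypothetical helper, `.p4config` is just a conventional file name, and the workspace paths and the `utf8` and `winansi` values are examples only.

```shell
# Tell p4 to look for a file with this name in the workspace root
# and its parent directories (the name itself is a convention).
P4CONFIG=.p4config
export P4CONFIG

# Hypothetical helper: pin a P4CHARSET value for one workspace root.
# Note that this overwrites any existing config file at that path.
write_p4config() {
    ws_root=$1
    charset=$2
    printf 'P4CHARSET=%s\n' "$charset" > "$ws_root/$P4CONFIG"
}

# Example usage (paths are placeholders):
#   write_p4config "$HOME/ws-unix" utf8
#   write_p4config "$HOME/ws-win"  winansi
```

For a one-off override, the global option `p4 -C charset` applies a character set to a single command without changing the stored setting.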
To get a list of the possible values for P4CHARSET, use the
following command:
$ p4 help P4CHARSET
Do not submit a file using a P4CHARSET that is different from the one
you used to sync it; the file is then translated in a way that is
likely to be incorrect. In other words, do not change the value of
P4CHARSET while files are checked out.
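A small guard can help enforce this rule before a setting is changed. The sketch below assumes that `p4 opened` prints one line per opened file on standard output and reports the "nothing opened" case on standard error; `charset_change_is_safe` is a hypothetical helper, not part of the product.

```shell
# Hypothetical guard: reads a listing of opened files on stdin and
# succeeds only if the listing is empty.
charset_change_is_safe() {
    ! grep -q .
}

# Example usage (utf8 is a placeholder value):
#   if p4 opened 2>/dev/null | charset_change_is_safe; then
#       P4CHARSET=utf8; export P4CHARSET
#   else
#       echo "Files are still checked out; submit or revert them first." >&2
#   fi
```

If the assumption about standard error does not hold on your version, the guard errs on the side of refusing the change.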
Unicode character sets and Byte Order Markers (BOMs)
Byte order markers (BOMs) are used in Unicode files to specify the order in which multi-byte characters are stored and to identify the file content as Unicode. Not all extended-character file formats use BOMs.
To ensure that such files are translated correctly by the
Helix Server when the files are synced or submitted, you must set
P4CHARSET to the character set that corresponds to the format used on
your workstation by the applications that access them, such as text
editors or IDEs. Typically the formats are listed when you save the
file using the menu option.
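If you are unsure which format an application wrote, you can check the first bytes of a file for a BOM. This sketch relies only on the standard BOM byte sequences (EF BB BF for UTF-8, FF FE and FE FF for UTF-16, FF FE 00 00 and 00 00 FE FF for UTF-32); `detect_bom` is a hypothetical helper name.

```shell
# Hypothetical helper: report which Unicode BOM, if any, starts a file.
detect_bom() {
    # Dump at most the first four bytes as contiguous lowercase hex.
    hex=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
    case $hex in
        efbbbf*)  echo "UTF-8 BOM" ;;
        fffe0000) echo "UTF-32 little-endian BOM" ;;  # before the UTF-16 test
        0000feff) echo "UTF-32 big-endian BOM" ;;
        fffe*)    echo "UTF-16 little-endian BOM" ;;
        feff*)    echo "UTF-16 big-endian BOM" ;;
        *)        echo "no BOM" ;;
    esac
}
```

A UTF-16 little-endian file whose first character is U+0000 is indistinguishable from UTF-32 little-endian by this test, and a file without a BOM can still be any of these encodings, so treat the result as a hint.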
The following table lists valid settings for P4CHARSET for
specifying byte order properties of Unicode files.
Client Unicode format | BOM? | Big or Little-Endian | Set P4CHARSET to | Remarks |
---|---|---|---|---|
UTF-8 | No | (N/A) | | Suppresses Helix Server UTF-8 validation |
UTF-8 | Yes | (N/A) | | |
UTF-8 | No | (N/A) | | |
UTF-8 | Yes | (N/A) | | |
UTF-16 | Yes | Per client | | Synced with a BOM according to the client platform byte order |
UTF-16 | Yes | Little | | |
UTF-16 | Yes | Big | | |
UTF-16 | No | Per client | | |
UTF-16 | No | Little | | |
UTF-16 | No | Big | | |
UTF-32 | Yes | Per client | | Synced with a BOM according to the client platform byte order |
UTF-32 | Yes | Little | | |
UTF-32 | Yes | Big | | |
UTF-32 | No | Per client | | |
UTF-32 | No | Little | | |
UTF-32 | No | Big | | |
If you set P4CHARSET to a UTF-8 setting, the Helix Server
does not translate text files when you sync or submit them.
Helix Server does verify that such files contain valid UTF-8 data.
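Because the server checks for valid UTF-8, it can be useful to check a file locally before submitting it. The sketch below uses `iconv` as a stand-in validator; it is an assumption that `iconv` is installed, and it is not the server's own validation routine, so results may differ in edge cases. `is_valid_utf8` is a hypothetical helper name.

```shell
# Hypothetical local check: succeeds if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example usage (the file name is a placeholder):
#   is_valid_utf8 notes.txt || echo "notes.txt is not valid UTF-8" >&2
```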
Controlling translation of server output
If you set P4CHARSET to any utf16 or utf32 setting, you must also set
P4COMMANDCHARSET to a character set other than utf16 or utf32: the one
in which you want server output displayed. "Server output" includes
informational and error messages,
diff output, and information returned by reporting commands.
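The rule above can be sketched as a small shell check. Here `needs_command_charset` is a hypothetical helper, the prefix match on `utf16` and `utf32` is an assumption about how the setting names are formed, and `utf8` is only an example display character set.

```shell
# Hypothetical guard: succeeds when the given P4CHARSET value is a
# utf16 or utf32 variant and therefore calls for P4COMMANDCHARSET.
needs_command_charset() {
    case $1 in
        utf16*|utf32*) return 0 ;;
        *)             return 1 ;;
    esac
}

P4CHARSET=utf16
export P4CHARSET
if needs_command_charset "$P4CHARSET"; then
    # utf8 is an example; pick the character set your terminal displays.
    P4COMMANDCHARSET=utf8
    export P4COMMANDCHARSET
fi
```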
To specify P4COMMANDCHARSET on a per-command basis, use the -Q flag.
For example, to display all filenames in the depot, as translated
using the winansi code page, issue the following command:
C:\> p4 -Q winansi files //...
Using other Helix Server client applications
If you are using other Helix Server client applications, note how they handle Unicode environments:
- P4V (Helix Visual Client): the first time you connect to a Unicode-mode server, you are prompted to choose the character encoding. Thereafter, P4V retains your selection in association with the connection. P4V also has a global default setting for Charset. If you set this, it will be used instead of asking you to provide a charset.
- P4Eclipse will ask for a charset when connecting to a Unicode-mode server.
- P4Merge: To configure the character encoding used by P4Merge,
choose P4Merge’s menu option. When launched from P4V, P4Merge uses
P4V’s P4CHARSET instead of the one defined in its preferences.
- P4GT and P4EXP, the Helix Plugin for File Explorer, use environmental settings and will fail with a Unicode-mode server.