Using Unicode

Encoding

Encodings are used to convert a file to either SBCS/DBCS data for the active code page or Unicode (more specifically, UTF-8) data. By default, XML files and Unicode files with signatures (UTF-8, UTF-16, and UTF-32) are automatically loaded as Unicode UTF-8 data, while more common program source files such as .c, .java, and .cs files are loaded as SBCS/DBCS active code page data.

All files can be configured to load as Unicode UTF-8 data, but this would cause some problems. Loading files that contain SBCS/DBCS data would take significantly longer, slowing down parsing by Context Tagging® and any other multi-file operations. In addition, Unicode editors cannot support all the features supported by SBCS/DBCS editors due to font limitations. For more information, see Unicode Limitations.

To provide better support for editing Unicode and non-Unicode files, two modes of editing exist: Unicode mode and SBCS/DBCS mode. Files that contain Unicode, XML, or code page data not compatible with the active code page should be opened as Unicode files.

The following are non-Unicode encodings and put the editor in SBCS/DBCS editing mode: Default; Text, SBCS/DBCS mode; Binary, SBCS/DBCS mode; and EBCDIC, SBCS/DBCS mode. In addition, the Auto Unicode, Auto Unicode2, Auto EBCDIC and Unicode, and Auto EBCDIC and Unicode2 encodings put the editor into SBCS/DBCS editing mode when the file is determined not to be Unicode. All other encodings put the editor in Unicode mode and require that the file data be converted to UTF-8.
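The rule in the preceding paragraph can be sketched as a small lookup. This is only an illustration of the decision described above, not the editor's actual implementation: the names `Mode`, `editing_mode`, and the `detected_as_unicode` flag are hypothetical, and the encoding names are copied from the text.

```python
from enum import Enum


class Mode(Enum):
    SBCS_DBCS = "SBCS/DBCS"
    UNICODE = "Unicode"


# Encodings the text lists as always non-Unicode.
ALWAYS_SBCS_DBCS = {
    "Default",
    "Text, SBCS/DBCS mode",
    "Binary, SBCS/DBCS mode",
    "EBCDIC, SBCS/DBCS mode",
}

# Auto encodings that fall back to SBCS/DBCS mode when the file
# is determined not to be Unicode.
AUTO_WITH_FALLBACK = {
    "Auto Unicode",
    "Auto Unicode2",
    "Auto EBCDIC and Unicode",
    "Auto EBCDIC and Unicode2",
}


def editing_mode(encoding: str, detected_as_unicode: bool = False) -> Mode:
    """Return the editing mode for an encoding name (hypothetical helper).

    detected_as_unicode stands in for whatever detection the editor
    performs; it only matters for the auto encodings.
    """
    if encoding in ALWAYS_SBCS_DBCS:
        return Mode.SBCS_DBCS
    if encoding in AUTO_WITH_FALLBACK and not detected_as_unicode:
        return Mode.SBCS_DBCS
    # All other encodings put the editor in Unicode mode.
    return Mode.UNICODE
```

For example, `editing_mode("Auto Unicode")` yields SBCS/DBCS mode for a file without a signature, while `editing_mode("Auto XML")` always yields Unicode mode.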

There are many encodings available, including:

Auto XML - This encoding specifies that the file encoding be determined based on XML standards and that the file be loaded as Unicode data. The encoding is determined from the encoding specified by the ?xml tag. If the ?xml tag does not specify an encoding, the file data is assumed to be UTF-8, which is consistent with XML standards. Some modifications have been applied to the standard XML encoding determination to allow for some user error. If the file has a standard Unicode signature, the Unicode signature is assumed to be correct and the encoding defined by the ?xml tag is ignored.

Auto Unicode - When this encoding is chosen and the file has a standard Unicode signature, the file is loaded as Unicode data. Otherwise the file is loaded as SBCS/DBCS data.

Auto Unicode2 - When this encoding is chosen and the file has a standard Unicode signature or looks like a Unicode file, the file is loaded as Unicode data. Otherwise, the file is loaded as SBCS/DBCS data. This option is NOT foolproof and may give incorrect results.

Auto EBCDIC - When this encoding is chosen and the file looks like an EBCDIC file, the file is loaded as Unicode data. Otherwise, the file is loaded as SBCS/DBCS data. This option is NOT foolproof and may give incorrect results. The option does attempt to support binary EBCDIC files.

Auto EBCDIC and Unicode2 - This encoding is a combination of the Auto EBCDIC and Auto Unicode2 encodings described above.
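The signature check used by Auto Unicode and the precedence order described for Auto XML can be sketched as follows. This is a minimal illustration under stated assumptions, not the editor's code: the function names are hypothetical, the tolerance for leading whitespace before the ?xml tag is an assumption, and the declaration search only works for ASCII-compatible encodings. The "looks like" heuristics of Auto Unicode2 and Auto EBCDIC are not documented here and are not attempted.

```python
import codecs
import re

# Standard Unicode signatures (byte order marks). UTF-32 is tested before
# UTF-16 because the UTF-32 LE signature begins with the UTF-16 LE one.
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches the encoding attribute of an ?xml tag at the start of the data.
# Allowing leading whitespace is an assumption made for this sketch.
XML_DECL = re.compile(
    rb'^\s*<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def signature_encoding(data: bytes):
    """Return the encoding named by a Unicode signature, or None."""
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name
    return None


def has_unicode_signature(data: bytes) -> bool:
    """Auto Unicode style test: Unicode only if a signature is present."""
    return signature_encoding(data) is not None


def auto_xml_encoding(data: bytes) -> str:
    """Auto XML style test: signature first, then ?xml tag, then UTF-8."""
    sig = signature_encoding(data)
    if sig is not None:
        return sig  # the signature wins; the ?xml encoding is ignored
    match = XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # default when no encoding is declared
```

Checking the signature before the ?xml tag mirrors the order given in the Auto XML description, so a file saved with a UTF-8 signature but a stale `encoding="ISO-8859-1"` declaration is still treated as UTF-8.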

Using Unicode

To use encodings, Unicode support is required (OEMs typically turn this feature off). Unicode is supported
