CHAPTER 9 International Languages and Character Sets

Multibyte character sets

 

Some languages, such as Japanese and Chinese, have many more than 256 characters. These characters cannot all be represented using a single byte, but can be represented in multibyte character sets. In addition, some character sets use the much larger number of characters available in a multibyte representation to represent characters from many languages in a single, more comprehensive, character set.

 

Multibyte character sets are of two types. Some are variable width, in which some characters are single-byte characters, others are double-byte, and so on. Other sets are fixed width, in which all characters in the set have the same number of bytes. Adaptive Server IQ supports only variable-width character sets.
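The difference between the two types can be seen by encoding the same characters in both kinds of character set. The following Python sketch is illustrative only: the helper name byte_widths is hypothetical, and UTF-32 is used simply as a convenient fixed-width encoding for comparison, not as one this manual discusses.

```python
def byte_widths(text, encoding):
    """Return the number of bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


sample = "A\u3042"  # Latin capital A followed by hiragana A

# Variable width: the Latin letter takes one byte, the hiragana takes two.
print(byte_widths(sample, "shift_jis"))

# Fixed width: every character takes four bytes.
print(byte_widths(sample, "utf-32-le"))
```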

Example

As an example, characters in the Shift-JIS character set are either one or two bytes long. If the value of the first byte is in the range of hexadecimal values from \x81 to \x9F or from \xE0 to \xEF (decimal values 129-159 or 224-239), the character is a two-byte character and the subsequent byte (called a follow byte) completes the character. If the first byte is outside this range, the character is a single-byte character and the next byte is the first byte of the following character.
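The first-byte rule above is enough to split a Shift-JIS byte string into characters. The following Python sketch shows one way to do so, using only the ranges given in the text. The function names are hypothetical and are not part of any Adaptive Server IQ interface.

```python
def is_lead_byte(b):
    """True if b is the first byte of a two-byte character (ranges from the text)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def split_characters(data):
    """Split a Shift-JIS byte string into a list of one- and two-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1
        # If the input ends right after a lead byte, the slice is one byte short.
        chars.append(data[i:i + width])
        i += width
    return chars


sample = "A\u30421".encode("shift_jis")  # "A", hiragana A, "1"
print(split_characters(sample))
```

Because the width of each character depends on its first byte, the string must be scanned from the start. A byte taken from the middle of the string cannot be classified without that context.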

 

The properties of any Shift-JIS character can also be read from its first byte:

Characters with a first byte in the range \x09 to \x0D, or \x20, are space characters.

Characters with a first byte in the ranges \x41 to \x5A, \x61 to \x7A, \x81 to \x9F, or \xA1 to \xEF are considered to be alphabetic (letters).

Characters with a first byte in the range \x30 to \x39 are digits.
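These first-byte ranges translate directly into a lookup function. The Python sketch below is a minimal illustration. The function name and the "other" category for bytes outside the listed ranges are assumptions made for the example.

```python
def classify_first_byte(b):
    """Classify a Shift-JIS character by its first byte, using the ranges in the text."""
    if 0x09 <= b <= 0x0D or b == 0x20:
        return "space"
    if 0x30 <= b <= 0x39:
        return "digit"
    if (0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A
            or 0x81 <= b <= 0x9F or 0xA1 <= b <= 0xEF):
        return "alpha"
    return "other"  # assumed label for bytes the text does not categorize


for byte in (0x20, 0x37, 0x71, 0x82, 0x21):
    print(hex(byte), classify_first_byte(byte))
```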

 

In building custom collations, you can specify which ranges of values for the first byte signify single- and double-byte (or more) characters, and which specify space, alpha, and digit characters. However, all first bytes of value less than \x40 (decimal 64) must be single-byte characters, and no follow bytes may have values less than \x40. This restriction is satisfied by all known current encodings.
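The restriction on byte values can be checked mechanically before a custom collation is built. The Python sketch below assumes a hypothetical representation in which the first-byte ranges of multibyte characters and the follow-byte ranges are given as lists of (low, high) pairs. It does not reflect the actual collation file format. The Shift-JIS follow-byte ranges used in the example (\x40 to \x7E and \x80 to \xFC) are not stated in this section.

```python
LIMIT = 0x40  # decimal 64


def check_ranges(lead_ranges, follow_ranges):
    """Return a list of messages for ranges that break the \\x40 restriction."""
    problems = []
    for low, high in lead_ranges:
        if low < LIMIT:
            problems.append(
                "multibyte first-byte range %#04x-%#04x starts below 0x40" % (low, high))
    for low, high in follow_ranges:
        if low < LIMIT:
            problems.append(
                "follow-byte range %#04x-%#04x starts below 0x40" % (low, high))
    return problems


# Shift-JIS: first bytes of two-byte characters, then follow bytes.
print(check_ranges([(0x81, 0x9F), (0xE0, 0xEF)], [(0x40, 0x7E), (0x80, 0xFC)]))

# A made-up follow-byte range that breaks the rule.
print(check_ranges([(0x81, 0x9F)], [(0x30, 0x39)]))
```

An empty list means the ranges satisfy the restriction.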

For more information on multibyte character sets, see “Using multibyte collations” on page 336.
