CHAPTER 9 International Languages and Character Sets

For multibyte character sets, the Encodings section lists which byte values are lead-bytes and which are valid follow-bytes.

For example, the Shift-JIS Encodings section is as follows:

Encodings:

[\x00-\x80,\xa0-\xdf,\xf0-\xff]

[\x81-\x9f,\xe0-\xef][\x40-\x7e,\x80-\xfc]

The first line following the section title lists valid single-byte characters. The square brackets enclose a comma-separated list of ranges. Each range is listed as a hyphen-separated pair of values. In the Shift-JIS collation, values \x00 to \x80 are valid single-byte characters, but \x81 is not a valid single-byte character.
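The range notation described above can be read mechanically. The following is a minimal Python sketch, not part of the product: the helper names parse_ranges and in_ranges are hypothetical, and the sketch assumes that an entry without a hyphen (such as \x20) stands for a single value.

```python
import re

def parse_ranges(spec):
    """Parse one bracketed, comma-separated list of hex ranges into (low, high) pairs."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed list: %r" % spec)
    ranges = []
    for item in body[1:-1].split(","):
        # Each value is written as a backslash, 'x', and two hex digits.
        values = [int(h, 16) for h in re.findall(r"\\x([0-9a-fA-F]{2})", item)]
        if len(values) == 1:
            # Assumption: a lone value is a range of one byte.
            ranges.append((values[0], values[0]))
        elif len(values) == 2:
            ranges.append((values[0], values[1]))
        else:
            raise ValueError("malformed range: %r" % item)
    return ranges

def in_ranges(byte, ranges):
    """Return True if the byte value falls inside any (low, high) pair."""
    return any(low <= byte <= high for low, high in ranges)

# First line of the Shift-JIS Encodings section: valid single-byte characters.
single = parse_ranges(r"[\x00-\x80,\xa0-\xdf,\xf0-\xff]")
print(in_ranges(0x80, single))  # True
print(in_ranges(0x81, single))  # False
```
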

The second line following the section title lists valid multibyte characters. It contains two bracketed lists: lead-bytes first, then follow-bytes. Any combination of one byte from the first list followed by one byte from the second list is a valid character. Therefore \x81\x40 is a valid double-byte character, but \x81\x00 is not, because \x00 is not a valid follow-byte.
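To make the lead-byte and follow-byte rule concrete, here is a short Python sketch that splits a byte string into characters using the Shift-JIS ranges from the example. It is an illustration only, not how the server itself processes collations; the names SINGLE, LEAD, FOLLOW, and split_characters are hypothetical.

```python
# Ranges copied from the Shift-JIS Encodings example.
SINGLE = [(0x00, 0x80), (0xA0, 0xDF), (0xF0, 0xFF)]  # valid single-byte characters
LEAD = [(0x81, 0x9F), (0xE0, 0xEF)]                  # valid lead-bytes
FOLLOW = [(0x40, 0x7E), (0x80, 0xFC)]                # valid follow-bytes

def _in(byte, ranges):
    return any(low <= byte <= high for low, high in ranges)

def split_characters(data):
    """Split a bytes object into characters; raise ValueError on an invalid sequence."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if _in(b, SINGLE):
            chars.append(data[i:i + 1])
            i += 1
        elif _in(b, LEAD):
            # A lead-byte must be followed by exactly one valid follow-byte.
            if i + 1 >= len(data) or not _in(data[i + 1], FOLLOW):
                raise ValueError("invalid follow-byte after 0x%02x at offset %d" % (b, i))
            chars.append(data[i:i + 2])
            i += 2
        else:
            # Not reachable with the Shift-JIS tables, which cover every byte value,
            # but kept so the function also works with other tables.
            raise ValueError("byte 0x%02x at offset %d is not valid" % (b, i))
    return chars

print(split_characters(b"A\x81\x40"))  # [b'A', b'\x81@']
```

Calling split_characters(b"\x81\x00") raises ValueError, which matches the statement that \x81\x00 is not a valid character.
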

The Properties section

The Properties section is optional, and follows the Encodings section.

If a Properties section is supplied, an Encodings section must be supplied also.

The Properties section lists the first-byte values of characters that represent alphabetic characters, digits, or spaces.

The Shift-JIS Properties section is as follows:

Properties:

space: [\x09-\x0d,\x20]

digit: [\x30-\x39]

alpha: [\x41-\x5a,\x61-\x7a,\x81-\x9f,\xe0-\xef]

This indicates that characters with first bytes \x09 to \x0d, as well as \x20, are treated as space characters; digits are found in the range \x30 to \x39 inclusive; and alphabetic characters are found in the four ranges \x41-\x5a, \x61-\x7a, \x81-\x9f, and \xe0-\xef.
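As an illustration of how these properties apply, the sketch below classifies a character by its first byte using the Shift-JIS ranges above. The names PROPERTIES and classify_first_byte are hypothetical, and returning None for a byte outside all three lists is an assumption of this sketch rather than documented behavior.

```python
# Ranges copied from the Shift-JIS Properties example.
PROPERTIES = {
    "space": [(0x09, 0x0D), (0x20, 0x20)],
    "digit": [(0x30, 0x39)],
    "alpha": [(0x41, 0x5A), (0x61, 0x7A), (0x81, 0x9F), (0xE0, 0xEF)],
}

def classify_first_byte(byte):
    """Return 'space', 'digit', or 'alpha' for a first-byte value, or None if unlisted."""
    for name, ranges in PROPERTIES.items():
        if any(low <= byte <= high for low, high in ranges):
            return name
    return None  # assumption: unlisted bytes have none of the three properties

print(classify_first_byte(0x20))      # space
print(classify_first_byte(ord("7")))  # digit
print(classify_first_byte(0x81))      # alpha
print(classify_first_byte(ord("!")))  # None
```

Because only the first byte is examined, every double-byte character whose lead-byte is in \x81-\x9f or \xe0-\xef is classified as alphabetic.
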
