7.1 Overview

This release of Wireless Edition supports single-byte, multi-byte, and fixed-width encoding schemes that are based on national, international, and vendor-specific standards.

If the character set is single-byte and includes only composite characters, the number of characters and the number of bytes are the same. If the character set is multi-byte, there is generally no such correspondence: a character can consist of one or more bytes, depending on the specific multi-byte encoding scheme.
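The difference between character count and byte count can be checked with a minimal Python sketch. The helper name `counts` and the sample strings are illustrative assumptions, and the Python codec names (`latin-1`, `utf-8`, `shift_jis`) stand in for whichever character sets a deployment actually uses.

```python
def counts(text: str, codec: str) -> tuple:
    """Return (number of characters, number of bytes) for text in the given codec."""
    return len(text), len(text.encode(codec))

# "naive" with a precomposed i-diaeresis (U+00EF), written as an escape
word = "na\u00efve"

print(counts(word, "latin-1"))    # single-byte set: (5, 5)
print(counts(word, "utf-8"))      # multi-byte set:  (5, 6)
print(counts("\u65e5\u672c\u8a9e", "shift_jis"))  # three kanji: (3, 6)
```

In the single-byte case the two numbers match; in the multi-byte cases they do not, which is why buffer and column sizes should not be derived from the character count alone.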

The counts also differ when separate character elements are combined to form a single character. For example, in Thai, up to three character elements can be combined to form one character, so one Thai character can require up to 3 bytes in TH8TISASCII or another single-byte Thai character set, and up to 9 bytes in the UTF8 character set.
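The Thai byte counts can be reproduced with a short Python sketch. It assumes that Python's `tis_620` codec is an acceptable stand-in for a single-byte Thai character set such as TH8TISASCII; the sample cluster (a consonant, a vowel sign, and a tone mark) is chosen only for illustration.

```python
# One displayed Thai character built from three elements:
# THO THAHAN (U+0E17) + SARA II (U+0E35) + MAI EK (U+0E48)
cluster = "\u0e17\u0e35\u0e48"

single_byte = cluster.encode("tis_620")  # assumed stand-in for TH8TISASCII
utf8 = cluster.encode("utf-8")

print(len(cluster))      # 3 character elements
print(len(single_byte))  # 3 bytes, one per element
print(len(utf8))         # 9 bytes, three per element
```

Note that Python's `len` on a string counts the individual elements (code points), not the combined character as it is displayed.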

7.2 Multi-byte Encoding Schemes

Multi-byte encoding schemes are needed to support the ideographic scripts used in Asian languages such as Chinese and Japanese, because these languages use thousands of characters. These schemes use either a fixed number of bytes or a variable number of bytes to represent each character.

7.2.1 Fixed-width Encoding Schemes

In a fixed-width multi-byte encoding scheme, each character is represented by a fixed number of bytes, n, where n is greater than or equal to two.
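One practical consequence of a fixed width is that the byte offset of any character can be computed directly. The sketch below illustrates this with UTF-32 (n = 4), which is chosen here only as a convenient example and is not named in this guide; the helper `nth_char_fixed` is hypothetical.

```python
def nth_char_fixed(data: bytes, index: int, width: int = 4,
                   codec: str = "utf-32-be") -> str:
    """Return the character at position index in a fixed-width encoded buffer."""
    start = index * width  # offset is a simple multiplication
    chunk = data[start:start + width]
    if len(chunk) != width:
        raise IndexError("character index out of range")
    return chunk.decode(codec)

text = "\u65e5\u672c\u8a9eabc"       # three kanji followed by "abc"
data = text.encode("utf-32-be")      # big-endian, no byte order mark

print(len(data) == 4 * len(text))    # True: every character takes 4 bytes
print(nth_char_fixed(data, 3))       # a
```

With a variable-width scheme the same lookup would require scanning the buffer from the start.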

7.2.2 Variable-width Encoding Schemes

A variable-width encoding scheme uses one or more bytes to represent a single character. Some multi-byte encoding schemes use certain bits to indicate the number of bytes that represent a character. For example, if two bytes is the maximum number of bytes used to represent a character, the most significant bit of a byte can indicate whether that byte is a single-byte character or the first byte of a double-byte character. In other schemes, control codes differentiate single-byte from double-byte characters. Another possibility is a shift-out code, which indicates that the subsequent bytes are double-byte characters until a shift-in code is encountered.
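The two mechanisms described above can be sketched in Python as follows. Both functions model simplified, hypothetical schemes with a two-byte maximum, not any specific vendor character set; the function names are illustrative, and the shift codes are assumed to be the ASCII SO (0x0E) and SI (0x0F) control characters.

```python
from typing import List

SHIFT_OUT = 0x0E  # ASCII SO: subsequent bytes are double-byte characters
SHIFT_IN = 0x0F   # ASCII SI: back to single-byte characters

def split_by_high_bit(data: bytes) -> List[bytes]:
    """Split a buffer where a set most significant bit marks a lead byte."""
    units, i = [], 0
    while i < len(data):
        width = 2 if data[i] & 0x80 else 1
        if i + width > len(data):
            raise ValueError("truncated double-byte character")
        units.append(data[i:i + width])
        i += width
    return units

def split_by_shift_codes(data: bytes) -> List[bytes]:
    """Split a buffer where SO/SI switch between double- and single-byte mode."""
    units, i, double_byte = [], 0, False
    while i < len(data):
        if data[i] == SHIFT_OUT:
            double_byte, i = True, i + 1
        elif data[i] == SHIFT_IN:
            double_byte, i = False, i + 1
        else:
            width = 2 if double_byte else 1
            if i + width > len(data):
                raise ValueError("truncated double-byte character")
            units.append(data[i:i + width])
            i += width
    return units

print(split_by_high_bit(b"A\xa4\xa2B"))  # [b'A', b'\xa4\xa2', b'B']
print(len(split_by_shift_codes(b"A\x0e\x30\x21\x30\x22\x0fB")))  # 4
```

In the high-bit scheme every byte carries its own marker, so a decoder needs no state between characters. In the shift-code scheme the decoder must remember the current mode, which is why a buffer cannot safely be split at an arbitrary offset.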

Oracle9i Application Server Wireless Edition Configuration Guide
