Character Sets, ASCII Character Set Management

Chapter 1: Product Features

Character Sets

The following tables provide conversions between the different character sets:

• CS1 - GSM to UCS2.

• CS2 - ASCII to/from UTF8.

• CS3 - UCS2 to/from UTF8.

For the full content of a specific conversion table, refer to Appendix A, Character Set Tables.

ASCII Character Set Management

The ASCII character set is a standard seven-bit code that was proposed by ANSI in 1963, and finalized in 1968. ASCII was established to achieve compatibility between various types of data processing equipment.

GSM Character Set Management

In G24-L, the GSM character set is defined as an octet stream. This means that text is displayed not as GSM characters but as the hex values of these characters.
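As an illustration of text shown as hex values, the following minimal Python sketch converts a string to the hex values of its GSM characters. It is not taken from the G24-L firmware. The function name gsm_to_hex and the three-entry GSM_SPECIAL table are hypothetical, only a small part of the GSM default alphabet is covered, and the output layout (two uppercase hex digits per octet, no separators) is an assumption. The full mapping is in Appendix A, Character Set Tables.

```python
# Partial GSM default-alphabet table (assumption: only three special
# characters are listed here; the full table is in Appendix A).
GSM_SPECIAL = {"@": 0x00, "$": 0x02, "_": 0x11}


def gsm_to_hex(text: str) -> str:
    """Return text as a string of hex digits, one octet per GSM character."""
    octets = []
    for ch in text:
        if ch in GSM_SPECIAL:
            octets.append(GSM_SPECIAL[ch])
        elif ch.isascii() and (ch.isalnum() or ch == " "):
            # Letters, digits and space have the same value in GSM as in ASCII.
            octets.append(ord(ch))
        else:
            raise ValueError(f"character {ch!r} is not covered by this sketch")
    # Assumed display format: two uppercase hex digits per octet.
    return "".join(f"{octet:02X}" for octet in octets)


print(gsm_to_hex("Hi @G24"))  # 48692000473234
```

Note that "@" becomes 00 rather than its ASCII value 40, which is why a GSM octet stream cannot be read as plain ASCII.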

UCS2 Character Set Management

UCS2 is the first officially standardized coded character set that is intended eventually to include the characters of all the written languages in the world, as well as all mathematical and other symbols.

Unicode can be characterized as the (restricted) 2-octet form of UCS2 on (the most general) implementation level 3, with the addition of a more precise specification of the bi-directional behavior of characters, as used in the Arabic and Hebrew scripts.

The 65,536 positions in the 2-octet form of UCS2 are divided into 256 rows with 256 cells in each. The first octet of a character representation denotes the row number, the second the cell number. The first row (row 0) contains exactly the same characters as ISO/IEC 8859-1. The first 128 characters are thus the ASCII characters. The octet representing an ISO/IEC 8859-1 character is easily transformed to the representation in UCS2 by placing a 0 octet in front of it. UCS2 includes the same control characters as ISO/IEC 8859 (also in row 0).
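The row and cell layout described above can be sketched in Python as follows. The function names latin1_to_ucs2 and row_and_cell are hypothetical and chosen for illustration only. The sketch assumes the row octet comes first, as stated in the text, which matches Python's "utf-16-be" codec for characters in the 2-octet range.

```python
from typing import Tuple


def latin1_to_ucs2(data: bytes) -> bytes:
    """Place a 0 octet (row 0) in front of every ISO/IEC 8859-1 octet."""
    out = bytearray()
    for octet in data:
        out += bytes((0x00, octet))
    return bytes(out)


def row_and_cell(ucs2_char: bytes) -> Tuple[int, int]:
    """Split one 2-octet character into its (row, cell) numbers."""
    if len(ucs2_char) != 2:
        raise ValueError("expected exactly two octets")
    return ucs2_char[0], ucs2_char[1]


# U+00E9 (e with acute accent) is octet E9 in ISO/IEC 8859-1.
print(latin1_to_ucs2(b"\xe9").hex())                 # 00e9
# U+03A9 (Greek capital omega) lies in row 3, cell 0xA9 = 169.
print(row_and_cell("\u03a9".encode("utf-16-be")))    # (3, 169)
```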

UTF-8 Character Set Management

UTF-8 provides a compact, efficient multi-byte encoding of Unicode. The encoding distributes the bit pattern of a Unicode code value across one, two, three, or four bytes, depending on the size of the value.
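The way the bit pattern is spread over one to four bytes can be sketched as below. The function name utf8_encode is hypothetical; in practice Python's built-in str.encode("utf-8") does the same job, and the loop at the end checks the sketch against it.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code value as one to four UTF-8 bytes."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code value")
    if code_point < 0x80:            # 7 bits: one byte, identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:           # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# One sample character for each encoded length (1, 2, 3 and 4 bytes).
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")
```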

UTF-8 encodes each ASCII character in a single byte, so text in languages that use Latin-based scripts typically needs only about 1.1 bytes per character on average.

UTF-8 is useful for legacy systems that want Unicode support because developers do not have to drastically modify text processing code. Code that assumes single-byte code units typically does not fail completely when provided UTF-8 text instead of ASCII or even Latin-1.
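A small Python sketch of this behavior follows. The helper split_fields is hypothetical and stands in for legacy code that works on single bytes. It still separates UTF-8 fields correctly, because every byte of a multi-byte UTF-8 sequence has a value of 0x80 or higher and so can never be mistaken for an ASCII delimiter. What does change is that byte counts no longer equal character counts.

```python
from typing import List


def split_fields(raw: bytes) -> List[bytes]:
    """Byte-oriented splitter written with ASCII input in mind."""
    return raw.split(b",")


# Three fields: "Zurich" with u-umlaut, a two-character CJK name, and "Oslo".
record = "Z\u00fcrich,\u6771\u4eac,Oslo".encode("utf-8")

fields = split_fields(record)
decoded = [field.decode("utf-8") for field in fields]

# The field boundaries are found correctly.
assert decoded == ["Z\u00fcrich", "\u6771\u4eac", "Oslo"]

# Lengths now count bytes, not characters: 7 bytes for 6 characters.
print(len(fields[0]), len(decoded[0]))  # 7 6
```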

April 15, 2008

G24-L AT Commands Reference Manual



Models: G24-LC G24-L