Chapter 1: Product Features
June 30, 2008 AT Commands Reference Manual 1 -19
Character Sets
The following conversion tables are provided for converting between the different character
sets:
CS1 - GSM to UCS2.
CS2 - ASCII to/from UTF8.
CS3 - UCS2 to/from UTF8.
For the full content of a specific conversion table, refer to Appendix A, Character Set Tables.

ASCII Character Set Management

The ASCII character set is a standard seven-bit code that was proposed by ANSI in 1963, and
finalized in 1968. ASCII was established to achieve compatibility between various types of data
processing equipment.

GSM Character Set Management

In G24, the GSM character set is defined as an octet stream. This means that text is displayed
not as GSM characters, but as the hex values of these characters.
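
As an illustration of the octet-stream display described above, the following Python sketch
converts a short text to the hex values of its GSM characters. It is not taken from the G24
firmware: the function name gsm_to_hex, the table GSM_SPECIAL, and the output format (two
uppercase hex digits per character, no separators) are assumptions made for this example. Only a
few code points of the GSM default alphabet are included; refer to Appendix A for the full table.

```python
# Hypothetical, partial table: a few GSM default-alphabet code points
# that differ from ASCII. The full table is much larger.
GSM_SPECIAL = {"@": 0x00, "£": 0x01, "$": 0x02, "_": 0x11}


def gsm_to_hex(text: str) -> str:
    """Return the hex string for text, one GSM octet (two hex digits) per character."""
    octets = []
    for ch in text:
        if ch in GSM_SPECIAL:
            octets.append(GSM_SPECIAL[ch])
        elif ch.isascii() and (ch.isalnum() or ch == " "):
            # Letters, digits and space have the same values in GSM and ASCII.
            octets.append(ord(ch))
        else:
            raise ValueError(f"character {ch!r} is not covered by this sketch")
    return "".join(f"{octet:02X}" for octet in octets)


print(gsm_to_hex("Hello"))  # 48656C6C6F
print(gsm_to_hex("a@b"))    # 610062
```

Note that "@" maps to 00 rather than to its ASCII value 40, which is why a conversion table is
needed instead of a plain ASCII dump.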

UCS2 Character Set Management

UCS2 is the 2-octet form of the Universal Character Set (UCS, ISO/IEC 10646), the first
officially standardized coded character set intended eventually to include the characters of all
the written languages in the world, as well as all mathematical and other symbols.
Unicode can be characterized as the (restricted) 2-octet form of UCS on (the most general)
implementation level 3, with the addition of a more precise specification of the bidirectional
behavior of characters, as used in the Arabic and Hebrew scripts.
The 65,536 positions in the 2-octet form of UCS2 are divided into 256 rows with 256 cells in
each. The first octet of a character representation denotes the row number, the second the cell
number. The first row (row 0) contains exactly the same characters as ISO/IEC 8859-1. The first
128 characters are thus the ASCII characters. The octet representing an ISO/IEC 8859-1 character
is easily transformed to the representation in UCS2 by placing a 0 octet in front of it. UCS2
includes the same control characters as ISO/IEC 8859 (also in row 0).
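
The row/cell layout and the 0-octet rule described above can be sketched in Python as follows.
The function names latin1_to_ucs2 and row_and_cell are hypothetical, and the sketch assumes that
the row octet is sent first (big-endian order), as the description above implies.

```python
def latin1_to_ucs2(data: bytes) -> bytes:
    """Place a 0 octet (row 0) in front of each ISO/IEC 8859-1 octet."""
    out = bytearray()
    for octet in data:
        out += bytes([0x00, octet])
    return bytes(out)


def row_and_cell(code: int) -> tuple:
    """Split a 16-bit UCS2 code value into (row, cell)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("a UCS2 code value must fit in two octets")
    return divmod(code, 256)


print(latin1_to_ucs2("é".encode("latin-1")).hex())  # 00e9
print(row_and_cell(0x0041))  # (0, 65)   'A' is in row 0
print(row_and_cell(0x05D0))  # (5, 208)  Hebrew letter alef
```

For characters in row 0, the result matches Python's built-in "utf-16-be" encoding, which can be
used as a cross-check.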

UTF-8 Character Set Management

UTF-8 provides a compact, efficient encoding of Unicode. It is a multi-byte encoding that
distributes the bit pattern of a Unicode code value across one, two, three, or four bytes.
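
To show how a code value's bit pattern is spread over one to four bytes, the following Python
sketch encodes a single code value by hand. The function name utf8_encode is hypothetical; in
practice, Python's built-in str.encode("utf-8") does the same job and is used here only to check
the result.

```python
def utf8_encode(code: int) -> bytes:
    """Encode one Unicode code value as UTF-8 (1 to 4 bytes)."""
    if not 0 <= code <= 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code < 0x80:
        # 0xxxxxxx: 7 bits, identical to ASCII
        return bytes([code])
    if code < 0x800:
        # 110xxxxx 10xxxxxx: 11 bits
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    if code < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: 16 bits
        return bytes([
            0xE0 | (code >> 12),
            0x80 | ((code >> 6) & 0x3F),
            0x80 | (code & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 bits
    return bytes([
        0xF0 | (code >> 18),
        0x80 | ((code >> 12) & 0x3F),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])


for ch in ("A", "é", "€"):
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(ch, utf8_encode(ord(ch)).hex())
```

The three sample characters need one, two, and three bytes respectively (41, c3a9, e282ac).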
UTF-8 encodes each ASCII character in a single byte, so text in languages that use Latin-based
scripts can be represented with only about 1.1 bytes per character on average.
UTF-8 is useful for legacy systems that need Unicode support, because developers do not have to
drastically modify text processing code. Code that assumes single-byte code units typically does
not fail completely when given UTF-8 text instead of ASCII or even Latin-1.
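
The following Python sketch illustrates the point with a byte-oriented field splitter of the kind
found in legacy code. The function name split_fields and the sample data are assumptions made for
this example. The splitter still works on UTF-8 input because every byte of a multi-byte UTF-8
sequence has a value of 0x80 or higher, so it can never be mistaken for an ASCII delimiter.

```python
def split_fields(raw: bytes) -> list:
    """Split on an ASCII comma, one byte at a time, as single-byte legacy code would."""
    fields, current = [], bytearray()
    for byte in raw:
        if byte == 0x2C:  # ASCII ','
            fields.append(bytes(current))
            current = bytearray()
        else:
            current.append(byte)
    fields.append(bytes(current))
    return fields


raw = "Zürich,Genève".encode("utf-8")
fields = split_fields(raw)
print([field.decode("utf-8") for field in fields])  # ['Zürich', 'Genève']
print(len(fields[0]), len("Zürich"))                # 7 6
```

The fields are split correctly, but the byte count (7) no longer equals the character count (6).
This is the kind of partial failure to expect from code that treats one byte as one character.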