Character Sets
1-20 AT Commands Reference Manual June 30, 2008
Unlike some legacy encodings, UTF-8 is easy to parse. Its lead and trail bytes are easily
distinguished, so moving forwards or backwards in a text string is easier in UTF-8 than in many
other multi-byte encodings.
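The following is a minimal sketch, not taken from this manual, of how lead and trail bytes can be told apart when stepping through a UTF-8 string. It relies on the fact that every trail byte has the bit pattern 10xxxxxx. The function names are hypothetical and the input is assumed to be well-formed UTF-8.

```python
def is_trail_byte(b: int) -> bool:
    """Return True if b is a UTF-8 trail byte (bit pattern 10xxxxxx)."""
    return (b & 0xC0) == 0x80


def next_char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character after the one at index."""
    i = index + 1
    # Skip the trail bytes that belong to the current character.
    while i < len(data) and is_trail_byte(data[i]):
        i += 1
    return i


def prev_char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character before index."""
    i = index - 1
    # Walk backwards over trail bytes until a lead (or single) byte is found.
    while i > 0 and is_trail_byte(data[i]):
        i -= 1
    return i


# Illustrative input: one-, two- and three-octet characters.
text = "a\u00e9\u20ac".encode("utf-8")  # b'a\xc3\xa9\xe2\x82\xac'
```

With this input, stepping forwards from index 0 visits indices 1 and 3, and stepping backwards from the end of the string (index 6) visits indices 3, 1 and 0.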
The codes in the first half of the first row in Character Set Table CS2 (UTF-8 <-> ASCII) are
replaced in this transformation format by their ASCII codes, which are octets in the range
00h to 7Fh. The other UCS-2 codes are transformed to sequences of two or three octets in the
range 80h to FFh; the longer sequences of up to six octets in the original UTF-8 definition
apply only to codes beyond the UCS-2 range. Text containing only characters in Character Set
Table CS3 (UTF-8 <-> UCS-2) is transformed to the same octet sequence, irrespective of whether
it was coded with UCS-2.
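As an illustration of the transformation described above, the sketch below converts a single UCS-2 code (0000h to FFFFh) to its UTF-8 octets. It is an assumption-laden example rather than the implementation used by the device: the function name ucs2_to_utf8 is hypothetical, and the surrogate range D800h to DFFFh is not treated specially. Within the UCS-2 range the result is one, two or three octets; longer sequences arise only for codes beyond that range, which this sketch rejects.

```python
def ucs2_to_utf8(code: int) -> bytes:
    """Convert one UCS-2 code (0x0000..0xFFFF) to its UTF-8 octet sequence."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code is outside the UCS-2 range")
    if code <= 0x7F:
        # ASCII codes are kept as a single octet in the range 00h to 7Fh.
        return bytes([code])
    if code <= 0x7FF:
        # Two octets: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # Three octets: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])
```

For example, ucs2_to_utf8(0x41) gives the single octet 41h, ucs2_to_utf8(0xE9) gives C3h A9h, and ucs2_to_utf8(0x20AC) gives E2h 82h ACh. Every octet of a multi-octet sequence lies in the range 80h to FFh.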
8859-1 Character Set Management
ISO-8859-1 is an 8-bit character set, a major improvement over plain 7-bit US-ASCII.
Positions 0 to 127 are identical to US-ASCII, and positions 128 to 159 hold some rarely
used control characters. Positions 160 to 255 hold language-specific characters.
ISO-8859-1 covers most West European languages, such as French (fr), Spanish (es), Catalan
(ca), Basque (eu), Portuguese (pt), Italian (it), Albanian (sq), Rhaeto-Romanic (rm), Dutch (nl),
German (de), Danish (da), Swedish (sv), Norwegian (no), Finnish (fi), Faroese (fo), Icelandic (is),
Irish (ga), Scottish Gaelic (gd) and English (en). Afrikaans (af) and Swahili (sw) are also included,
extending coverage to much of Africa.
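The layout of ISO-8859-1 described above can be sketched as follows. The example is illustrative only: the function names classify_latin1 and fits_latin1 are hypothetical, and the second function relies on Python's built-in "iso-8859-1" codec to check whether a text can be represented in this character set.

```python
def classify_latin1(position: int) -> str:
    """Name the ISO-8859-1 region that a position (0..255) falls in."""
    if not 0 <= position <= 255:
        raise ValueError("ISO-8859-1 positions are 0 to 255")
    if position <= 127:
        return "US-ASCII"
    if position <= 159:
        return "control"
    return "language-specific"


def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True
```

For instance, position 233 (E9h, "é") is classified as language-specific, and a French word such as "café" fits in ISO-8859-1, whereas the euro sign does not.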