Appendix E – Coding of Alpha Fields in the SIM for UCS2
Multi-Tech Systems, Inc. Wireless GSM/GPRS AT Commands (Document Number S000293I) 191
The coding can take one of the three following structures. If the ME supports UCS2 coding of alpha fields in the SIM, the
ME shall support all three coding schemes for character sets containing 128 characters or less; for character sets
containing more than 128 characters, the ME shall at least support the first coding scheme. If the alpha field record
contains GSM default alphabet characters only, then none of these schemes shall be used in that record. Within a record,
only one coding scheme, either GSM default alphabet, or one of the three described below, shall be used.
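Because only one scheme is used within a record, a reader can choose the decoder from the first byte of the alpha field alone. The following Python sketch illustrates that selection. The function name and the returned labels are hypothetical, and treating every other first byte (including an empty or all-'FF' field) as GSM default alphabet is an assumption made for illustration.

```python
def alpha_coding_scheme(field: bytes) -> str:
    """Return a label for the coding scheme used by an alpha field record.

    The labels are illustrative only; they are not defined by the specification.
    """
    ucs2_schemes = {
        0x80: "ucs2-80",  # plain 16-bit UCS2 characters
        0x81: "ucs2-81",  # 8-bit base pointer value in byte 3
        0x82: "ucs2-82",  # full 16-bit base pointer in bytes 3 and 4
    }
    if not field:
        # Assumption: an empty field is reported as GSM default alphabet.
        return "gsm-default"
    return ucs2_schemes.get(field[0], "gsm-default")
```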
1. If the first byte in the alpha string is '0x80', then the other bytes are 16 bit UCS2 characters. The more significant
byte (MSB) of the UCS2 character is coded in the lower numbered byte of the alpha field, and the less significant byte
(LSB) of the UCS2 character is coded in the higher numbered alpha field byte. In other words, byte 2 of the alpha
field contains the more significant byte (MSB) of the first UCS2 character, and byte 3 of the alpha field contains the
less significant byte (LSB) of the first UCS2 character (as shown below). Unused bytes shall be set to 'FF', and if the
alpha field has an even number of bytes, then the last (unusable) byte shall be set to 'FF'.
Example 1
Byte 1  Byte 2  Byte 3  Byte 4  Byte 5  Byte 6  Byte 7  Byte 8  Byte 9
'80'    Ch1MSB  Ch1LSB  Ch2MSB  Ch2LSB  Ch3MSB  Ch3LSB  'FF'    'FF'
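The byte order of this first scheme can be illustrated with a short Python sketch. The function name is hypothetical, and the rule that an 'FF' 'FF' pair marks the start of padding is an assumption made here (the text only states that unused bytes are set to 'FF').

```python
def decode_ucs2_80(field: bytes) -> str:
    """Decode an alpha field coded with the '80' scheme (illustrative sketch)."""
    if not field or field[0] != 0x80:
        raise ValueError("alpha field does not start with 0x80")
    body = field[1:]
    chars = []
    # Walk the body two bytes at a time: MSB in the lower numbered byte,
    # LSB in the higher numbered byte. A trailing single byte (even-length
    # field) is unusable and is skipped by the range bound.
    for i in range(0, len(body) - 1, 2):
        msb, lsb = body[i], body[i + 1]
        if msb == 0xFF and lsb == 0xFF:
            break  # assumption: an 'FF' 'FF' pair is padding, not a character
        chars.append(chr((msb << 8) | lsb))
    return "".join(chars)


# Hypothetical 9-byte field: two characters followed by four padding bytes.
sample = bytes([0x80, 0x09, 0x95, 0x00, 0x53, 0xFF, 0xFF, 0xFF, 0xFF])
text = decode_ucs2_80(sample)  # U+0995 (Bengali letter KA) followed by "S"
```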
2. If the first byte of the alpha string is set to '0x81', then the second byte contains a value indicating the number of
characters in the string. The third byte contains an 8-bit number that defines bits 15 to 8 of a 16-bit base pointer,
where bit 16 is set to zero, and bits 7 to 1 are also set to zero. These sixteen bits represent a base pointer to a
"half-page" in the UCS2 code space, to be used with some or all of the remaining bytes in the string. The fourth and
subsequent bytes in the string contain codings as follows:
• If bit 8 of the byte is set to zero, the remaining bits of the byte contain a GSM Default Alphabet character
• If bit 8 of the byte is set to one, then the remaining bits are an offset value added to the 16-bit base pointer
defined by byte 3, and the resulting 16-bit value is a UCS2 code point and defines a UCS2 character.
Example 2
Byte 1  Byte 2  Byte 3  Byte 4  Byte 5  Byte 6  Byte 7  Byte 8  Byte 9
'81'    '05'    '13'    '53'    '95'    'A6'    'XX'    'FF'    'FF'
In the above example:
• Byte 2 indicates there are 5 characters in the string
• Byte 3 indicates bits 15 to 8 of the base pointer, and indicates a bit pattern of 0hhh hhhh h000 0000 as the 16-bit
base pointer number. Bengali characters, for example, start at code position 0980 (0000 1001 1000 0000),
which is indicated by the coding '13' in byte 3 (bits 15 to 8 of that code position, 0001 0011).
• Byte 4 indicates GSM Default Alphabet character '53', i.e., "S".
• Byte 5 indicates a UCS2 character offset to the base pointer of '15', expressed in binary as 001 0101,
which, when added to the base pointer value, results in a sixteen-bit value of 0000 1001 1001 0101, i.e., '0995',
which is the Bengali letter KA.
• Byte 8 contains the value 'FF', but as the string length is 5, this is a valid character in the string, where the bit
pattern 111 1111 is added to the base pointer, yielding a sixteen-bit value of 0000 1001 1111 1111 for the
UCS2 character (i.e., '09FF').
• Byte 9 contains the padding value 0xFF.
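The arithmetic in Example 2 can be traced with a short Python sketch. The function names are hypothetical. Two assumptions are made: the GSM Default Alphabet helper is a placeholder that is only exact for codes where that alphabet coincides with ASCII (such as '53' = "S"), and the unspecified byte 'XX' is arbitrarily filled with '30' so that the field can be decoded.

```python
def gsm_default_to_char(code: int) -> str:
    # Placeholder (assumption): exact only where the GSM Default Alphabet
    # matches ASCII, e.g. digits and unaccented Latin letters. A complete
    # decoder needs the full GSM Default Alphabet table.
    return chr(code)


def decode_ucs2_81(field: bytes) -> str:
    """Decode an alpha field coded with the '81' scheme (illustrative sketch)."""
    if len(field) < 3 or field[0] != 0x81:
        raise ValueError("alpha field does not start with 0x81")
    length = field[1]
    # Byte 3 supplies bits 15 to 8; bit 16 and bits 7 to 1 are zero.
    base = field[2] << 7
    data = field[3:3 + length]
    if len(data) < length:
        raise ValueError("alpha field is shorter than the stated length")
    chars = []
    for b in data:
        if b & 0x80:
            chars.append(chr(base + (b & 0x7F)))  # offset from the base pointer
        else:
            chars.append(gsm_default_to_char(b))  # 7-bit GSM default character
    return "".join(chars)


# Example 2, with 'XX' replaced by the assumed value 0x30.
example = bytes([0x81, 0x05, 0x13, 0x53, 0x95, 0xA6, 0x30, 0xFF, 0xFF])
code_points = [hex(ord(c)) for c in decode_ucs2_81(example)]
# code_points == ['0x53', '0x995', '0x9a6', '0x30', '0x9ff']
```

Byte 8 ('FF') is decoded as a character because it falls within the stated length of 5, while byte 9 is ignored as padding.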
3. If the first byte of the alpha string is set to '0x82', then the second byte contains the length of the string (number of
characters). The third and fourth bytes contain a 16-bit number that defines the complete 16-bit base pointer to a
"half-page" in the UCS2 code space for use with some or all of the remaining bytes in the string. The fifth and
subsequent bytes in the string contain coding as follows:
• If bit 8 of the byte is set to zero, the remaining 7 bits of the byte contain a GSM Default Alphabet character
• If bit 8 of the byte is set to one, the remaining 7 bits are an offset value added to the base pointer defined in
bytes three and four, and the resultant 16 bit value is a UCS2 code point, and defines a UCS2 character.
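The third scheme differs from the second only in how the base pointer is obtained, which the following Python sketch illustrates. The function names and the sample field are hypothetical, and the GSM Default Alphabet helper is a placeholder that is only exact for codes where that alphabet coincides with ASCII.

```python
def gsm_default_to_char(code: int) -> str:
    # Placeholder (assumption): exact only where the GSM Default Alphabet
    # matches ASCII. A complete decoder needs the full alphabet table.
    return chr(code)


def decode_ucs2_82(field: bytes) -> str:
    """Decode an alpha field coded with the '82' scheme (illustrative sketch)."""
    if len(field) < 4 or field[0] != 0x82:
        raise ValueError("alpha field does not start with 0x82")
    length = field[1]
    # Bytes 3 and 4 hold the complete 16-bit base pointer, MSB first.
    base = (field[2] << 8) | field[3]
    data = field[4:4 + length]
    if len(data) < length:
        raise ValueError("alpha field is shorter than the stated length")
    chars = []
    for b in data:
        if b & 0x80:
            chars.append(chr(base + (b & 0x7F)))  # offset from the base pointer
        else:
            chars.append(gsm_default_to_char(b))  # 7-bit GSM default character
    return "".join(chars)


# Hypothetical field: 3 characters, base pointer 0x0980, one padding byte.
sample = bytes([0x82, 0x03, 0x09, 0x80, 0x53, 0x95, 0xFF, 0xFF])
code_points = [hex(ord(c)) for c in decode_ucs2_82(sample)]
# code_points == ['0x53', '0x995', '0x9ff']
```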