Appendix B

Unicode Support

Unicode Support in IBM SPSS Modeler

Both IBM® SPSS® Modeler and IBM® SPSS® Modeler Server are fully Unicode-enabled.
This makes it possible to exchange data with other applications that
support Unicode, including multi-language databases, without any loss of information that might
be caused by conversion to or from a locale-specific encoding scheme.
SPSS Modeler stores Unicode data internally and can read and write multi-language data
stored as Unicode in databases without loss.
SPSS Modeler can read and write UTF-8 encoded text files. Text file import and export
default to the locale encoding but support UTF-8 as an alternative. This setting can be
specified in the file import and export nodes, or the default encoding can be changed in the
stream properties dialog box. For more information, see the topic Setting general options
for streams in Chapter 5 on p. 55.
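The difference between a locale default and an explicit UTF-8 setting can be illustrated outside SPSS Modeler. The following is a minimal sketch in plain Python, not SPSS Modeler scripting: the helper names write_text and read_text and the sample data are hypothetical, and the locale default is taken from Python's own locale.getpreferredencoding.

```python
import locale
import os
import tempfile


def write_text(path, text, encoding=None):
    """Write text with the given encoding, or the locale default if None."""
    if encoding is None:
        # Assumption: the "locale encoding" is whatever Python reports here.
        encoding = locale.getpreferredencoding(False)
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)
    return encoding


def read_text(path, encoding=None):
    """Read text with the given encoding, or the locale default if None."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


# Hypothetical sample containing characters outside plain ASCII.
sample = "name,city\nZo\u00eb,Krak\u00f3w\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    # Choosing UTF-8 explicitly keeps every character intact.
    write_text(path, sample, encoding="utf-8")
    round_tripped = read_text(path, encoding="utf-8")
```

Reading the same file back with a different encoding than the one used to write it would either fail or yield altered characters, which is why the encoding chosen for export should match the one chosen for import.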
Statistics, SAS, and text data files stored in the locale encoding will be converted to UTF-8 on
import and back again on export. When writing to any file, if there are Unicode characters
that do not exist in the locale character set, they will be substituted and a warning will be
displayed. This should occur only where the data has been imported from a data source that
supports Unicode (a database or UTF-8 text file) and that contains characters from a different
locale or from multiple locales or character sets.
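The substitution behavior described above can be sketched in plain Python. This is an illustration of the general technique, not SPSS Modeler's implementation: the function name encode_for_locale is hypothetical, Latin-1 stands in for a locale character set, and the "?" substitute is Python's own replacement character for encoding, which may differ from the one SPSS Modeler uses.

```python
import warnings


def encode_for_locale(text, encoding):
    """Encode text, substituting unsupported characters and warning once."""
    try:
        # Succeeds when every character exists in the target character set.
        return text.encode(encoding)
    except UnicodeEncodeError:
        warnings.warn(
            f"Some characters cannot be represented in {encoding}; "
            "substituting '?'"
        )
        # errors="replace" swaps each unencodable character for "?".
        return text.encode(encoding, errors="replace")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # All characters exist in Latin-1: no substitution, no warning.
    western = encode_for_locale("caf\u00e9", "latin-1")
    # The two Japanese characters do not exist in Latin-1.
    mixed = encode_for_locale("caf\u00e9 \u6771\u4eac", "latin-1")
```

In this sketch only the second call triggers a warning, because only it contains characters from outside the target character set.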
IBM® SPSS® Modeler Solution Publisher images are UTF-8 encoded and are truly portable
between platforms and locales.

About Unicode

The goal of the Unicode standard is to provide a consistent way to encode multilingual text so that
it can be easily shared across borders, locales, and applications. The Unicode Standard, now at
version 4.0.1, defines a character set that is a superset of all of the character sets in common use
in the world today and assigns to each character a unique name and code point. The characters
and their code points are identical to those of the Universal Character Set (UCS) defined by
ISO-10646. For more information, see the Unicode Home Page (http://www.unicode.org).
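The pairing of each character with a unique name and code point can be inspected with Python's standard unicodedata module. The following is a minimal sketch; the helper name describe is hypothetical, and the names returned come from the Unicode version bundled with the Python interpreter in use rather than from version 4.0.1.

```python
import unicodedata


def describe(ch):
    """Return the code point (U+XXXX notation) and Unicode name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_a = describe("A")
e_acute = describe("\u00e9")
euro = describe("\u20ac")

# The same code point has a fixed byte sequence in UTF-8.
e_acute_utf8 = "\u00e9".encode("utf-8")

print(latin_a)
print(e_acute)
print(euro)
```

The code point identifies the character independently of any encoding; UTF-8 is one way of serializing that code point as bytes.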
© Copyright IBM Corporation 1994, 2012.