
Building CLEM Expressions
Determining the length (number of characters) for a string variable: length(STRING).
Checking the alphabetical ordering of string values: alphabefore(STRING1, STRING2).
Removing leading or trailing white space from values: trim(STRING), trim_start(STRING),
or trimend(STRING).
Extracting the first or last n characters from a string: startstring(LENGTH, STRING) or
endstring(LENGTH, STRING). For example, suppose you have a field named item that combines
a product name with a four-digit ID code (ACME CAMERA-D109). To create a new field that
contains only the four-digit code, specify the following formula in a Derive node:
endstring(4,item)
Matching a specific pattern: STRING matches PATTERN. For example, to select persons with
“market” anywhere in their job title, you could specify the following in a Select node:
job_title matches "*market*"
Replacing all instances of a substring within a string: replace(SUBSTRING, NEWSUBSTRING,
STRING). For example, to replace all instances of an unsupported character, such as a vertical
pipe ( | ), with a semicolon prior to text mining, use the replace function in a Filler node.
Under Fill in fields:, select all fields where the character may occur. For the Replace: condition,
select Always, and specify the following condition under Replace with:
replace('|',';',@FIELD)
Deriving a flag field based on the presence of a specific substring. For example, you could
use a string function in a Derive node to generate a separate flag field for each response
with an expression such as:
hassubstring(museums,"museum_of_design")
For more information, see the topic String Functions in Chapter 8 on p. 141.
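To check what the string expressions above should return before building them into a stream, it can help to mimic them outside Modeler. The following is a minimal Python sketch, not CLEM and not part of any Modeler API. The helper names are hypothetical stand-ins that copy the CLEM argument order. The wildcard behavior of matches and the handling of lengths longer than the string are assumptions here. The sample strings other than ACME CAMERA-D109 are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical Python stand-ins for the CLEM string functions discussed above.
# Argument order follows the CLEM signatures; edge-case behavior is assumed.

def startstring(length: int, string: str) -> str:
    """Return the first `length` characters of `string`."""
    return string[:max(length, 0)]

def endstring(length: int, string: str) -> str:
    """Return the last `length` characters of `string`."""
    return string[-length:] if length > 0 else ""

def matches(string: str, pattern: str) -> bool:
    """Case-sensitive wildcard match; `*` stands for any run of characters."""
    return fnmatchcase(string, pattern)

def replace(substring: str, newsubstring: str, string: str) -> str:
    """Replace every occurrence of `substring` with `newsubstring`."""
    return string.replace(substring, newsubstring)

def hassubstring(string: str, substring: str) -> bool:
    """Report whether `substring` occurs anywhere in `string`."""
    return substring in string

# The examples from the text, with illustrative field values.
item_code = endstring(4, "ACME CAMERA-D109")              # "D109"
is_marketing = matches("Senior marketing analyst", "*market*")
cleaned = replace("|", ";", "red|green|blue")              # "red;green;blue"
visited_design = hassubstring("museum_of_design;science_museum",
                              "museum_of_design")
```

In Modeler itself, the same logic is entered directly as a CLEM expression in the Derive, Select, or Filler node, as shown in the examples above.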
Handling Blanks and Missing Values
Replacing blanks or missing values is a common data preparation task for data miners. CLEM
provides you with a number of tools to automate blank handling. The Filler node is the most
common place to work with blanks; however, the following functions can be used in any node that
accepts CLEM expressions:
@BLANK(FIELD) can be used to determine records whose values are blank for a particular
field, such as Age.
@NULL(FIELD) can be used to determine records whose values are system-missing for the
specified field(s). In IBM® SPSS® Modeler, system-missing values are displayed as $null$
values.
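The difference between the two checks can be illustrated with a small Python sketch. This is an analogy only, under stated assumptions: None stands in for a system-missing ($null$) value, and a "blank" is taken to be any value that the user has declared as blank for the field (here, the made-up code -1 for Age). The function names is_null and is_blank and the sample records are hypothetical and are not part of Modeler.

```python
# Hypothetical analogues of @NULL and @BLANK; not Modeler code.

def is_null(value) -> bool:
    """True when the value is system-missing (modeled here as None)."""
    return value is None

def is_blank(value, blank_values) -> bool:
    """True when the value is one the user has declared as blank.

    Assumption: system-missing values count as blank only if None is
    included in `blank_values`.
    """
    return value in blank_values

# Illustrative records; -1 is an assumed user-defined blank code for Age.
records = [{"Age": 34}, {"Age": -1}, {"Age": None}]
age_blanks = {-1}

blank_rows = [r for r in records if is_blank(r["Age"], age_blanks)]
null_rows = [r for r in records if is_null(r["Age"])]
```

With these inputs, blank_rows holds only the record coded -1 and null_rows holds only the record with no value, which shows why the two checks are often used together when cleaning a field.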