IBM 15 Node Caches


Performance Considerations for Streams and Nodes

The following operations cannot be performed in most databases. They should be placed in the stream after the operations in the preceding list:

Operations on any nondatabase data, such as flat files

Merge by order

Balance

Distinct operations in discard mode or where only a subset of fields are selected as distinct

Any operation that requires accessing data from records other than the one being processed

State and count field derivations

History node operations

Operations involving “@” (time-series) functions

Type-checking modes Warn and Abort

Model construction, application, and analysis

Note: Decision trees, rulesets, linear regression, and factor-generated models can generate SQL and can therefore be pushed back to the database.

Data output to anywhere other than the same database that is processing the data
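The ordering advice above can be illustrated with a small Python sketch. This is a simplified model, not IBM SPSS Modeler code or its API: it assumes that only the leading run of database-capable steps is executed in the database and that everything from the first local-only step onward runs locally. The function name, the step names, and the true/false flags are hypothetical; Select and Aggregate are assumed here to be among the database-capable operations of the preceding list.

```python
from typing import List, Tuple

Step = Tuple[str, bool]  # (step name, can the database perform it?)


def split_at_first_local_step(steps: List[Step]) -> Tuple[List[Step], List[Step]]:
    """Split a stream into the leading database-capable steps and the rest.

    Simplifying assumption: once a step cannot run in the database,
    all later steps run locally as well.
    """
    for index, (_, in_database) in enumerate(steps):
        if not in_database:
            return steps[:index], steps[index:]
    return steps, []


# Hypothetical streams containing the same three steps in different orders.
poor_order = [("select", True), ("history", False), ("aggregate", True)]
good_order = [("select", True), ("aggregate", True), ("history", False)]

poor_pushed, poor_local = split_at_first_local_step(poor_order)
good_pushed, good_local = split_at_first_local_step(good_order)

print([name for name, _ in poor_pushed])  # steps left to the database
print([name for name, _ in good_pushed])
```

Under this assumption, placing the History step last leaves both the selection and the aggregation to the database, whereas placing it in the middle leaves only the selection there.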

Node Caches

To optimize stream running, you can set up a cache on any nonterminal node. When you set up a cache on a node, the cache is filled with the data that passes through the node the next time you run the data stream. From then on, the data is read from the cache (which is stored on disk in a temporary directory) rather than from the data source.

Caching is most useful following a time-consuming operation such as a sort, merge, or aggregation. For example, suppose that you have a source node set to read sales data from a database and an Aggregate node that summarizes sales by location. You can set up a cache on the Aggregate node rather than on the source node because you want the cache to store the aggregated data rather than the entire data set.
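The caching behavior described here can be sketched in plain Python. This is a conceptual illustration only, not the IBM SPSS Modeler implementation or its cache file format: the class CachedStep, the function aggregate_sales_by_location, and the sample records are all hypothetical. The first run computes the aggregate and writes it to a temporary file; later runs read that file instead of recomputing.

```python
import os
import pickle
import tempfile
from collections import defaultdict

# Made-up sales records: (location, amount).
SALES = [("north", 120.0), ("south", 80.0), ("north", 30.0)]


def aggregate_sales_by_location():
    """Stand-in for a time-consuming aggregation over the full data set."""
    totals = defaultdict(float)
    for location, amount in SALES:
        totals[location] += amount
    return sorted(totals.items())


class CachedStep:
    """Wraps a computation and caches its result on disk after the first run."""

    def __init__(self, compute):
        self.compute = compute
        self.cache_path = None   # no cache file until the first run
        self.compute_calls = 0   # counts how often the real work was done

    def run(self):
        if self.cache_path is None:
            # First run: do the work and fill the cache in a temporary file.
            result = self.compute()
            self.compute_calls += 1
            descriptor, self.cache_path = tempfile.mkstemp(suffix=".cache")
            with os.fdopen(descriptor, "wb") as handle:
                pickle.dump(result, handle)
            return result
        # Later runs: read the cached result instead of recomputing.
        with open(self.cache_path, "rb") as handle:
            return pickle.load(handle)


step = CachedStep(aggregate_sales_by_location)
first = step.run()   # computes and fills the cache
second = step.run()  # served from the cache file
print(first, second, step.compute_calls)
```

Because the cache wraps the aggregation rather than the raw records, the file holds one row per location instead of every sales record, which is the reason given above for caching at the Aggregate node.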

Note: Caching at source nodes, which simply stores a copy of the original data as it is read into IBM® SPSS® Modeler, will not improve performance in most circumstances.

Nodes with caching enabled are displayed with a small document icon at the top right corner. When the data is cached at the node, the document icon is green.
