CLEM 01 Clementine Overview -...

Data Mining SPSS Clementine 12.0 1. Clementine Overview Clementine 1. Clementine Overview Fall 2009 Instructor: Dr. Masoud Yaghini

Page 1: CLEM 01 Clementine Overview - webpages.iust.ac.irwebpages.iust.ac.ir/yaghini/Courses/Data_Mining_881/CLEM_01...Introduction Three of the common data mining tools – SPSS Clementine

Data Mining

SPSS Clementine 12.0

1. Clementine Overview

Fall 2009

Instructor: Dr. Masoud Yaghini

Outline

• Introduction
• Types of Models
• Clementine Interface
• References

Introduction

• Three of the common data mining tools:
  – SPSS Clementine
  – SAS Enterprise Miner
  – STATISTICA Data Miner
• SPSS Clementine is the most popular and the oldest data mining package on the market today.

Introduction

• SPSS Clementine is the most mature of the major data mining packages on the market today.
• Since 1993, many thousands of data miners have used Clementine to create very powerful models for business.
• It was the first data mining package to use a graphical programming user interface.
• It enables you to quickly develop data mining models and deploy them in business processes to improve decision making.

Introduction

• The Clementine package was integrated with CRISP-DM to help guide the modeling process flow.
• Clementine supports the entire data mining process.

Clementine Documentation

• Clementine User’s Guide
  – General introduction to using Clementine.
• Clementine Source, Process, and Output Nodes
  – Descriptions of all the nodes used to read, process, and output data in different formats.
• Clementine Modeling Nodes
  – Descriptions of a variety of modeling methods in Clementine.
• Clementine Applications Guide
  – The examples in this guide provide brief, targeted introductions to specific modeling methods and techniques.
• Clementine Algorithms Guide
  – Descriptions of the mathematical foundations of the modeling methods used in Clementine.
• CRISP-DM 1.0 Guide
  – Step-by-step guide to data mining using the CRISP-DM methodology.

Application Examples

• While the data mining tools in Clementine can help solve a wide variety of business and organizational problems, the application examples provide brief, targeted introductions to specific modeling methods and techniques.
• You can access the examples by choosing Application Examples from the Help menu in Clementine Client.
• The data files and sample streams are installed in the Demos folder under the product installation directory.

Demos Folder

• The data files and sample streams used with the application examples are installed in the Demos folder under the product installation directory.
• This folder can also be accessed from the Clementine 12.0 program group under SPSS Inc on the Windows Start menu.

Changing the Temp Directory

• Some operations performed by Clementine may require temporary files to be created.
• By default, Clementine uses the system temporary directory to create temp files.

• To change the location of the temporary directory, follow these steps:
  – Create a new directory called clem with a subdirectory called servertemp.
  – Open options.cfg, located in the /config directory of your Clementine installation directory.
  – Edit the temp_directory parameter in this file to read: temp_directory, "C:/clem/servertemp".
  – Restart the Clementine Server service. You can do this from the Services tab in the Windows Control Panel: stop the service and then start it again to activate the changes you made. Restarting the machine will also restart the service.
  – All temp files will now be written to this new directory.
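
The edit to options.cfg described in the steps above can also be scripted. The following is a minimal Python sketch, not an SPSS-supplied tool: the function name set_temp_directory is hypothetical, the sample option lines other than temp_directory are made up, and the code assumes the setting sits on its own line in the form temp_directory, "<path>". It works on the text of the file, so it can be tried without a Clementine installation.

```python
import re

def set_temp_directory(cfg_text: str, new_dir: str) -> str:
    """Return cfg_text with the temp_directory line pointing at new_dir.

    Assumes the option appears on its own line as: temp_directory, "<path>"
    If no such line exists, one is appended at the end.
    """
    new_line = f'temp_directory, "{new_dir}"'
    pattern = re.compile(r'^[ \t]*temp_directory[ \t]*,.*$', re.MULTILINE)
    if pattern.search(cfg_text):
        # Replace only the first matching line; other options stay untouched.
        return pattern.sub(lambda m: new_line, cfg_text, count=1)
    # Option not present: append it on its own line.
    sep = "" if cfg_text.endswith("\n") or not cfg_text else "\n"
    return f"{cfg_text}{sep}{new_line}\n"


# Made-up file contents for illustration only.
sample = 'port_number, 28045\ntemp_directory, ""\nmax_sessions, 20\n'
updated = set_temp_directory(sample, "C:/clem/servertemp")
print(updated)
```

The Clementine Server service still has to be restarted afterwards for the change to take effect.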

Types of Models

• In Clementine 12.0, models are packaged into the following modules:
  – Classification module
    • The Classification module helps organizations make predictions.
  – Association module
    • Association models associate a particular conclusion (such as the decision to buy something) with a set of conditions.
  – Segmentation module
    • The Segmentation module is recommended in cases where the specific result is unknown.

Classification Module

• The Classification module includes:
  – Decision Trees
    • C&R Tree
    • QUEST
    • CHAID
    • C5.0
  – Decision List
  – Neural Networks
  – Regression
    • Linear regression
    • Logistic regression
  – Bayesian Network
  – Support Vector Machine (SVM)

Association Module

• The Association module includes:
  – Generalized Rule Induction (GRI)
  – Apriori model
  – CARMA model
  – Sequence model
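
An association rule links a set of conditions to a conclusion. To make that concrete, the Python sketch below computes two measures commonly reported for such a rule, support and confidence, on a made-up list of shopping baskets. It only illustrates the idea and is not the GRI, Apriori, or CARMA implementation in Clementine; the data and the function name rule_stats are hypothetical. Definitions of support vary between tools; here it counts baskets that contain both sides of the rule.

```python
def rule_stats(baskets, conditions, conclusion):
    """Return (support, confidence) for the rule: conditions -> conclusion.

    support    = share of all baskets containing conditions and conclusion
    confidence = share of baskets containing the conditions that also
                 contain the conclusion
    """
    conditions = set(conditions)
    with_conditions = [b for b in baskets if conditions <= set(b)]
    with_both = [b for b in with_conditions if conclusion in b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_conditions) if with_conditions else 0.0
    return support, confidence


# Made-up transactions, one set of purchased items per customer visit.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

support, confidence = rule_stats(baskets, {"bread", "butter"}, "milk")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```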

Segmentation Module

• Clustering models focus on identifying groups of similar records and labeling the records according to the group to which they belong.
• The following models are included:
  – K-Means
  – Kohonen
  – TwoStep
  – Anomaly Detection
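
As a rough illustration of "identifying groups of similar records and labeling the records", the Python sketch below runs the basic K-Means loop on a single numeric field. It is a teaching sketch under simplifying assumptions (one field, fixed starting centers, made-up data), not Clementine's K-Means node; the function name k_means_1d is hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers.

    Returns (centers, labels), where labels[i] is the index of the
    center closest to values[i].
    """
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: label each record with its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return centers, labels


ages = [21, 23, 25, 60, 62, 64]  # made-up records
centers, labels = k_means_1d(ages, centers=[20, 30])
print(centers, labels)
```

Each record ends up labeled with the index of its group, which is the kind of output a segmentation model adds to the data.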

Clementine Modules

• The following are optional add-on modules for Clementine:
  – Text Mining option for mining unstructured data
  – Web Mining option
  – Database Modeling for integration with other data mining packages

Clementine Interface

Starting Clementine

• To start the application, choose Clementine 12.0 from the SPSS Inc program group on the Windows Start menu.

Data Stream

• Clementine employs an intuitive visual programming interface that lets you draw logical data flows the way you think of them.
• Working with data in Clementine is a three-step process:
  – First, you read data into Clementine.
  – Then, you run the data through a series of manipulations.
  – Finally, you send the data to a destination.
• This sequence of operations is known as a data stream.
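
The three steps of a data stream can be mimicked in a few lines of Python. This is only an analogy for readers who think in code: Clementine streams are built visually, and every function name, field name, and value below is made up for illustration.

```python
import csv
import io

def read_source(text):
    """Step 1: read data in (here, CSV text instead of a file or database)."""
    return list(csv.DictReader(io.StringIO(text)))

def select_records(records, min_amount):
    """Step 2a: keep only records whose amount reaches min_amount."""
    return [r for r in records if float(r["amount"]) >= min_amount]

def derive_band(records, threshold):
    """Step 2b: add a new field computed from an existing one."""
    return [
        {**r, "band": "high" if float(r["amount"]) >= threshold else "medium"}
        for r in records
    ]

def write_output(records):
    """Step 3: send the data to a destination (here, CSV text)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


raw = "customer,amount\nA,50\nB,200\nC,120\n"
# Source -> manipulations -> destination, chained like nodes in a stream.
result = write_output(derive_band(select_records(read_source(raw), 100), 150))
print(result)
```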

Stream Canvas

• Stream Canvas
  – The stream canvas is the largest area of the Clementine window and is where you build and manipulate data streams.
  – You can work on multiple streams in the same canvas or in multiple stream files.
  – Streams are created by drawing diagrams of data operations on the main canvas in the interface.

Clementine Interface

• Example of canvas: (screenshot not reproduced)

Clementine Interface

• Node
  – Each operation is represented by an icon, or node.
  – The nodes are linked together in a stream representing the flow of data through each operation.
• Streams manager
  – Streams are stored in the Streams manager, at the upper right of the Clementine window.

Nodes Palette

• Nodes Palettes
  – The palettes are groups of processing nodes listed across the bottom of the screen.
  – The palettes include Favorites, Sources, Record Ops (operations), Field Ops, Graphs, Modeling, Output, and Export. When you click one of the palette names, the list of nodes in that palette is displayed below it.
    • Example: the Record Ops palette tab.
• To add nodes to the canvas, double-click icons in the Nodes Palette or drag and drop them onto the canvas.

Nodes Palette Tabs

• Nodes Palette Tabs
  – Sources
    • Nodes bring data into Clementine.
  – Record Ops
    • Nodes perform operations on data records, such as selecting, merging, and appending.
  – Field Ops
    • Nodes perform operations on data fields, such as filtering, deriving new fields, and determining the data type for given fields.
  – Graphs
    • Nodes graphically display data before and after modeling. Graphs include plots, histograms, web nodes, and evaluation charts.

  – Modeling
    • Nodes use the modeling algorithms available in Clementine, such as neural nets, decision trees, clustering algorithms, and data sequencing.
  – Output
    • Nodes produce a variety of output for data, charts, and model results, which can be viewed in Clementine or sent directly to another application, such as SPSS or Excel.
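
The difference between record operations and field operations can be shown in plain Python, treating each record as a dictionary. This is a sketch for intuition only: the function names and sample data are hypothetical and do not correspond to Clementine node settings. Record operations act on rows (appending, merging), while field operations act on columns (filtering fields).

```python
def append_records(first, second):
    """Record op: stack two record sets that share the same fields."""
    return first + second

def merge_records(left, right, key):
    """Record op: join two record sets on a shared key field (inner join)."""
    lookup = {rec[key]: rec for rec in right}
    return [{**rec, **lookup[rec[key]]} for rec in left if rec[key] in lookup]

def filter_fields(records, keep):
    """Field op: drop every field not named in keep."""
    return [{field: rec[field] for field in keep} for rec in records]


# Made-up data for illustration.
customers_2008 = [{"id": 1, "name": "Ann"}]
customers_2009 = [{"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 1, "total": 40}, {"id": 3, "total": 75}]

everyone = append_records(customers_2008, customers_2009)
joined = merge_records(everyone, orders, key="id")
slim = filter_fields(joined, keep=["name", "total"])
print(slim)
```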

Clementine Managers

• The upper right window holds three managers:
  – Streams
  – Outputs
  – Models
• You can switch the display of the manager window to show any of these items.

Clementine Managers

• Streams tab
  – The Streams tab displays the active streams, which you can choose to work on in a given session.
  – You can use the Streams tab to open, rename, save, and delete the streams created in a session.

Clementine Managers

• Outputs tab
  – The Outputs tab shows all graphs and tables output during a session.
  – You can display, save, rename, and close the tables, graphs, and reports listed on this tab.

Clementine Managers

• Models tab
  – The Models tab shows all the trained models you have created in the session.
  – These models can be browsed directly from the Models tab or added to the stream in the canvas.

Clementine Projects

• On the lower right side of the window is the projects tool, used to create and manage data mining projects.
• There are two ways to view projects you create in Clementine:
  – The CRISP-DM view
  – The Classes view

Clementine Projects

• The CRISP-DM tab provides a way to organize projects according to the CRISP-DM methodology.
• For both experienced and first-time data miners, using the CRISP-DM tool will help you to better organize and communicate your efforts.

Clementine Projects

• The Classes tab provides a way to organize your work in Clementine categorically, by the types of objects you create.
• This view is useful when taking inventory of data, streams, and models.

Clementine Toolbars

Using Shortcut Keys

Automating Clementine

• Because advanced data mining can be a complex and sometimes lengthy process, Clementine includes several types of coding and automation support:
  – Clementine Language for Expression Manipulation (CLEM)
  – Scripting
  – Batch mode

Automating Clementine

• Clementine Language for Expression Manipulation (CLEM)
  – CLEM is a language for analyzing and manipulating the data that flows along Clementine streams.
  – Data miners use CLEM extensively in stream operations to perform tasks as simple as deriving profit from cost and revenue data or as complex as transforming Web-log data into a set of fields and records with usable information.
  – For more information, see What Is CLEM? in Chapter 7 of the Clementine® 12.0 User’s Guide.
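
The two kinds of task mentioned above, a simple derivation and a Web-log transformation, are sketched below in Python to show the shape of the work. This is not CLEM syntax: the field names, the simplified log format, and the function names are all assumptions made for illustration.

```python
import re

def derive_profit(record):
    """Simple derivation: add a profit field computed from revenue and cost."""
    return {**record, "profit": record["revenue"] - record["cost"]}

# Pattern for a made-up, simplified web-log line: host, timestamp, request, status.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3})'
)

def parse_log_line(line):
    """More involved transformation: turn one raw log line into named fields."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # line does not follow the assumed format
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


profit_record = derive_profit({"revenue": 1200, "cost": 950})
log_record = parse_log_line('10.0.0.7 [12/Oct/2009:10:15:00] "GET /index.html" 200')
print(profit_record)
print(log_record)
```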

Automating Clementine

• Scripting
  – Scripting is a powerful tool for automating processes in the user interface and working with objects in batch mode.
  – Scripts can perform the same kinds of actions that users perform with a mouse or a keyboard.
  – You can set options for nodes and perform derivations using a subset of CLEM.
  – You can also specify output and manipulate generated models.
  – For more information, see Scripting Overview in Chapter 2 of the Clementine® 12.0 Scripting and Automation Guide.

Automating Clementine

• Batch mode
  – Batch mode enables you to use Clementine in a non-interactive manner by running Clementine with no visible user interface.
  – Using scripts, you can specify stream and node operations as well as modeling parameters and deployment options.
  – For more information, see Introduction to Batch Mode in Chapter 7 of the Clementine® 12.0 Scripting and Automation Guide.

References

• Integral Solutions Limited, Clementine® 12.0 User’s Guide, 2007.

The end
