Data Mining using SPSS Modeler 2nd Session

IBM Taiwan, Claire Lin

Agenda

Data Mining Process

Business Understanding

Data Understanding: Live Demo and Exercise

Data Preparation and Manipulation: Live Demo and Exercise

What is Data Mining?

Data mining is the analysis step of the Knowledge Discovery in Databases (KDD) process. It encompasses a number of techniques for extracting useful information from (large) data files, without necessarily having preconceived notions about what will be discovered.

The goal of data mining is to extract information from a data set and transform it into an understandable structure for further use.

Data mining process

Cross-Industry Standard Process for Data Mining (CRISP-DM)

What can SPSS Modeler do?

Input raw data

Data understanding

Check missing data

Check anomalous and outlier data

Data preparation

Filter, derive, reclassify nodes

Modeling

Output

Business Understanding

Determining business objectives

Finding what people will buy together with zongzi (sticky rice dumplings) during the Dragon Boat Festival

Predicting who is likely not to renew a contract for mobile phone service

Assessing the situation

Determining data mining goals

Producing a project plan

Data Understanding

Need to understand

What your data resources are

What the characteristics of those resources are

Includes

Collecting initial data

Describing data

Exploring data

Verifying data quality

Missing Data

Anomalous Data

Data Understanding - Missing Data

Blank: Contains no information

White space if the field is a string, and a null value (non-numeric) if the field is numeric

Empty string: A string field may be empty, which means that it contains nothing (this is common in databases)

Value blanks: Represent missing or invalid information
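The categories above can be sketched in Python as a small classifier. This is only an illustration, not how SPSS Modeler implements missing-value handling: the function name `classify_missing`, the use of `None` to stand in for a null value, and the user-defined blank codes (`-99`, `"N/A"`) are all assumptions made for the example.

```python
def classify_missing(value, blank_codes=(-99, "N/A")):
    """Return a label for the kind of missing data `value` represents.

    None stands in for a null value. blank_codes is a hypothetical set of
    values the analyst has declared to mean "missing or invalid".
    """
    if value is None:
        return "null"
    if isinstance(value, str):
        if value == "":
            return "empty string"
        if value.strip() == "":
            return "white space"
    if value in blank_codes:
        return "value blank"
    return "valid"


samples = [None, "", "   ", -99, "N/A", 42, "Taipei"]
labels = [classify_missing(v) for v in samples]
```

Checking for the empty string before white space matters here, because an empty string also strips to nothing and would otherwise be reported as white space.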

Data Understanding - Anomalous Data

What is Anomalous Data?

Far from the center of the distribution

The center is measured by the mean or median, with the standard deviation as a measure of spread

Far from other values

Whether or not they are close to the center of the distribution
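The two notions of "anomalous" above can be illustrated with a short Python sketch. The sample data, the cut-off of 2 standard deviations, the gap threshold of 10, and the helper names `z_scores` and `nearest_gap` are assumptions chosen for the example, not values taken from SPSS Modeler.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def nearest_gap(values):
    """Distance from each value to the closest other value."""
    return [
        min(abs(v - w) for j, w in enumerate(values) if j != i)
        for i, v in enumerate(values)
    ]


# Case 1: a value far from the center of the distribution.
ages = [23, 25, 27, 24, 26, 25, 24, 26, 25, 95]
far_from_center = [v for v, z in zip(ages, z_scores(ages)) if abs(z) > 2]

# Case 2: a value that sits at the mean but is far from every other value.
bimodal = [1, 2, 3, 50, 97, 98, 99]
gaps = nearest_gap(bimodal)
isolated = [v for v, g in zip(bimodal, gaps) if g > 10]
```

In the second data set the value 50 equals the mean, so a z-score alone would never flag it; only the distance to its neighbours shows that it is unusual.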

Data Understanding

Anomaly detection

SPSS Modeler User Interface

Data Sources

Database: ODBC source

Var. File: free-field text file

Fixed File: fixed-field text file

Statistics File/SAS File/Excel File

Data Understanding

The Data Audit node

Provides a report on

Missing values

Outliers and extreme values

Information on a field’s distribution
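As a rough picture of what such a per-field report contains, the sketch below summarises one numeric field in plain Python. It is not the Data Audit node's implementation: `audit_field` is a hypothetical helper, `None` stands in for a missing value, and the thresholds of 3 and 5 standard deviations for outliers and extremes are assumptions exposed as parameters.

```python
import statistics


def audit_field(values, outlier_sd=3.0, extreme_sd=5.0):
    """Summarise one numeric field: missing count, range, mean, outliers."""
    valid = [v for v in values if v is not None]
    mean = statistics.mean(valid)
    sd = statistics.stdev(valid)  # sample standard deviation
    outliers = extremes = 0
    for v in valid:
        distance = abs(v - mean) / sd if sd > 0 else 0.0
        if distance > extreme_sd:
            extremes += 1
        elif distance > outlier_sd:
            outliers += 1
    return {
        "valid": len(valid),
        "missing": len(values) - len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": mean,
        "outliers": outliers,
        "extremes": extremes,
    }


# Twenty identical bills, one unusually large bill, and two missing entries.
monthly_bill = [10] * 20 + [100, None, None]
report = audit_field(monthly_bill)
```

With these numbers the bill of 100 lies between 3 and 5 standard deviations from the mean, so it is counted as an outlier but not as an extreme.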

Data Understanding

Anomaly detection models identify outliers or unusual cases by using clustering analysis

Each record is assigned an anomaly index

The anomaly index is the ratio of a record's group deviation index to the average of that index over the cluster the record belongs to

Cases with an index value greater than 2 could be good anomaly candidates
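The ratio described above can be sketched in a few lines of Python. This is a simplified stand-in, not Modeler's algorithm: cluster labels are assumed to be given already, and the squared Euclidean distance to the cluster centroid is used in place of the group deviation index. The function name `anomaly_indices` and the sample records are hypothetical.

```python
from collections import defaultdict


def anomaly_indices(records, clusters):
    """Ratio of each record's deviation to the average deviation in its cluster.

    Deviation here is the squared distance to the cluster centroid, used as a
    simple substitute for the group deviation index.
    """
    members = defaultdict(list)
    for rec, c in zip(records, clusters):
        members[c].append(rec)
    centroids = {
        c: [sum(col) / len(recs) for col in zip(*recs)]
        for c, recs in members.items()
    }
    deviations = [
        sum((x - m) ** 2 for x, m in zip(rec, centroids[c]))
        for rec, c in zip(records, clusters)
    ]
    totals = defaultdict(float)
    for d, c in zip(deviations, clusters):
        totals[c] += d
    averages = {c: totals[c] / len(members[c]) for c in members}
    return [
        d / averages[c] if averages[c] > 0 else 0.0
        for d, c in zip(deviations, clusters)
    ]


records = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 12),
           (10, 10), (10, 12), (12, 10), (12, 12)]
clusters = ["a", "a", "a", "a", "a", "b", "b", "b", "b"]
indices = anomaly_indices(records, clusters)
candidates = [rec for rec, idx in zip(records, indices) if idx > 2]
```

An index of 1 means a record deviates exactly as much as the typical member of its cluster, which is why a cut-off such as 2 picks out records that are unusual relative to their own group rather than relative to the whole data set.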

Data Understanding – Outlier Data Live Demo

Live Demo

SPSS Modeler UI

Read data into SPSS Modeler

Check missing data

Check anomalous and outlier data

Data Audit Node

Anomaly Node

Live Demo & Exercise I

Data Preparation and Manipulation

Objective: Construct the final dataset for modeling

Record Operations

Select a subset of records from the dataset

Sort the data
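The two record operations above correspond to filtering rows and ordering them. The sketch below shows the idea in plain Python rather than in Modeler; the `customers` records and their field names are invented for the example.

```python
# Hypothetical customer records, one dict per row.
customers = [
    {"id": 1, "region": "north", "spend": 120},
    {"id": 2, "region": "south", "spend": 340},
    {"id": 3, "region": "north", "spend": 560},
    {"id": 4, "region": "north", "spend": 80},
]

# Select: keep only the records that satisfy a condition.
selected = [row for row in customers if row["region"] == "north"]

# Sort: order the selected records by a field, largest spend first.
ordered = sorted(selected, key=lambda row: row["spend"], reverse=True)
ordered_ids = [row["id"] for row in ordered]
```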

Field Operations

Type: Specifies field metadata and properties

Type: Description

Continuous: Used to describe numeric values, such as a range of 0–100 or 0.75–1.25. A continuous value can be an integer, real number, or date/time.

Categorical: Used for string values.

Nominal: Used to describe data with multiple distinct values, each treated as a member of a set.

Ordinal: Used to describe data with multiple distinct values that have an inherent order.

Flag: Used for data with two distinct values that indicate the presence or absence of a trait, such as true and false, Yes and No, or 0 and 1.

Filter: Filters or renames fields

Derive: Modifies data values or creates new fields

Reclassify
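To make the Filter, Derive, and Reclassify operations concrete, here is a minimal Python sketch of what each one does to a single record. It illustrates the concepts only and is not Modeler scripting: the helper functions, the field names, and the mapping of plan names to "low" and "high" are all assumptions.

```python
def filter_fields(row, drop=(), rename=None):
    """Filter: remove unwanted fields and rename others."""
    rename = rename or {}
    return {rename.get(k, k): v for k, v in row.items() if k not in drop}


def derive(row, name, formula):
    """Derive: add a new field computed from existing ones."""
    new_row = dict(row)
    new_row[name] = formula(row)
    return new_row


def reclassify(row, field, mapping):
    """Reclassify: map one set of category values onto another."""
    new_row = dict(row)
    new_row[field] = mapping.get(row[field], row[field])
    return new_row


rows = [
    {"cust_id": 1, "notes": "call back", "income": 52000, "debt": 13000, "plan": "basic"},
    {"cust_id": 2, "notes": "", "income": 60000, "debt": 30000, "plan": "premium"},
]

prepared = []
for row in rows:
    row = filter_fields(row, drop=("notes",), rename={"cust_id": "customer_id"})
    row = derive(row, "debt_ratio", lambda r: r["debt"] / r["income"])
    row = reclassify(row, "plan", {"basic": "low", "lite": "low", "premium": "high"})
    prepared.append(row)
```

Each helper returns a new dictionary instead of changing its input, so the original rows stay available for comparison, much as upstream data is left unchanged when a node is added to a stream.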

Live Demo & Exercise II

Thank You
