Data Mining 2008

31
SSAS 2008 Data Mining Lynn Langit/MSDN Developer Evangelist Southern California http://blogs.msdn.com/SoCalDevGal

description

This is the deck I used at San Diego SQL UG in May 2008

Transcript of Data Mining 2008

Page 1: Data Mining 2008

SSAS 2008 Data Mining

Lynn Langit/MSDN Developer Evangelist Southern California

http://blogs.msdn.com/SoCalDevGal

Page 2: Data Mining 2008

Session Prerequisites Working SQL Server 2005 Developer Understanding of OLAP concepts Working SQL Server Analysis Server

2005 Developer Interest in or basic knowledge of Data

Mining concepts

Page 3: Data Mining 2008

Session Objectives and Agenda Understand what, why, when & how of

SQL Server 2008 Data Mining Examine the core functionality of the Data

Mining Extensions Hear about the new and/or advanced

functionality of Data Mining

Page 4: Data Mining 2008

Predictive AnalyticsPredictive Analytics

PresentatioPresentationn

ExplorationExploration DiscoveryDiscovery

PassivePassive

InteractiInteractiveve

ProactivProactivee

Role of SoftwareRole of Software

Business Business InsightInsight

Canned reportingCanned reporting

Ad-hoc reportingAd-hoc reporting

OLAPOLAP

Data miningData mining

What and Why Data Mining?

Page 5: Data Mining 2008

Cubes vs. Data Mining

Page 6: Data Mining 2008

DM - From Scenarios to Tasks

Page 7: Data Mining 2008

From Tasks to Techniques

Page 8: Data Mining 2008

BI for Everyone

Enterprise – Performance PointEnterprise – Performance Point

Individual – Excel Individual – Excel

Project – Share PointProject – Share Point

Page 9: Data Mining 2008

Microsoft’s Predictive Analytics

Data Mining SQL extensionsData Mining SQL extensions(DMX)(DMX)

Application Application DeveloperDeveloper

Data Mining Data Mining SpecialistSpecialist

Microsoft Dynamics CRMMicrosoft Dynamics CRMAnalytics FoundationAnalytics Foundation

SQL Server 2005 SQL Server 2005 Business Intelligence Development StudioBusiness Intelligence Development Studio

Microsoft SQL Server 2008 Analysis ServicesMicrosoft SQL Server 2008 Analysis Services

Information Information WorkerWorker

Data Mining Add-ins for Data Mining Add-ins for the 2007 Microsoft Office systemthe 2007 Microsoft Office system

Microsoft SQL Server 2008 Data MiningMicrosoft SQL Server 2008 Data Mining

BI AnalystBI Analyst

Custom Custom AlgorithmsAlgorithms

Page 10: Data Mining 2008

Data Mining Add-ins for Office 2007

Table Analysis Tools for Excel 2007Table Analysis Tools for Excel 2007

Data Mining Template for Visio 2007Data Mining Template for Visio 2007

Data Mining Client for Excel 2007Data Mining Client for Excel 2007

Information Information WorkerWorker

BI AnalystBI Analyst

Data Mining Data Mining SpecialistSpecialist

Page 11: Data Mining 2008

SSASSSAS(Data(Data

Mining)Mining)ExcelExcel

SSAS SSAS (DSV)(DSV)QueryQueryExcelExcel

SSISSSISSSASSSASSSRSSSRSExcelExcelYour AppsYour Apps

SSISSSISSSASSSASExcelExcel

Business Business UnderstandiUnderstandi

ngng

Data Data UnderstandiUnderstandi

ngng

Data Data PreparationPreparation

ModelingModeling

EvaluationEvaluation

DeploymentDeployment

DataData

Microsoft Data Mining Lifecycle CRISP-DM

www.crisp-dm.org

Page 12: Data Mining 2008

Understand & Prepare specifics

Page 13: Data Mining 2008

Demo 1 – Explore / Clean / Partition Data

Page 14: Data Mining 2008

Modeling Specifics

Page 15: Data Mining 2008

Demo 3 – Modeling

Page 16: Data Mining 2008

Evaluation Specifics

Page 17: Data Mining 2008

Demo 4 – Evaluation

Page 18: Data Mining 2008

Data Mining – Logical Model

Mining ModelMining Model

Mining ModelMining Model

Training DataTraining Data

DB dataDB dataClient dataClient dataApplication dataApplication data

Data MiningData MiningEngineEngine

To To PredictPredict

Predicted DataPredicted Data

Mining ModelMining ModelDB dataDB dataClient dataClient dataApplication dataApplication data““Just one rowJust one row””

Data MiningData MiningEngineEngine

algorithmalgorithm

Page 19: Data Mining 2008

Analysis ServicesAnalysis ServicesServerServer

Mining ModelMining Model

Data Mining AlgorithmData Mining Algorithm DataDataSourceSource

Data Mining - Physical Model

Your ApplicationYour Application

OLE DB/ ADOMD/ XMLAOLE DB/ ADOMD/ XMLA

DeploDeployy

BI Dev BI Dev StudioStudio (Visual (Visual Studio)Studio)

App DataApp Data

Page 20: Data Mining 2008

Data Mining Interfaces – APIs

Analysis Server (msmdsrv.exe)

OLAP Data Mining

Server ADOMD.NET

.Net Stored Procedures Microsoft Algorithms Third Party Algorithms

XMLAXMLAOver TCP/IPOver TCP/IP

OLEDB for OLAP/DM ADO/DSO

XMLAXMLAOver HTTPOver HTTP

Any Platform, Any Device

C++ App VB App .Net App

AMO

Any App

ADOMD.NET

WANWAN

DM Interfaces

Page 21: Data Mining 2008

Configuration Model Creation/Management

Database Administrators Session Mining Models

Model Application Permissions on models Permissions on data sources

Page 22: Data Mining 2008

Deployment Browse

Copy to Excel Drillthrough

Query Default Advanced

Excel Services Manage models and structures

Export/Import Rename

Connection Database Trace

Page 23: Data Mining 2008

Excel Functions*

DMPREDICTTABLEROW ( Connection, ModelName, PredictionResult, TableRowRange[, string CommaSeparatedColumnNames])

DMPREDICT ( Connection, Model, PredictionResult,

Value1, Name1, [...,Value32, Name32])

DMCONTENTQUERY (Connection, Model, PredictionResult[, WhereClause])

Page 24: Data Mining 2008

Data Mining Extensions (DMX)

CREATE MINING MODELCREATE MINING MODEL CreditRiskCreditRisk

(CustID(CustID LONG KEY, LONG KEY,

Gender TEXT DISCRETE,Gender TEXT DISCRETE,

Income Income LONG LONG CONTINUOUS,CONTINUOUS,

Profession TEXT DISCRETE,Profession TEXT DISCRETE,

RiskRisk TEXT DISCRETE PREDICT) TEXT DISCRETE PREDICT)

USINGUSING Microsoft_Decision_Trees Microsoft_Decision_Trees

INSERT INTOINSERT INTO CreditRisk CreditRisk

(CustId, Gender, Income, (CustId, Gender, Income, Profession, Risk)Profession, Risk)

Select Select

CustomerID, Gender, Income, CustomerID, Gender, Income, Profession,RiskProfession,Risk

From CustomersFrom Customers

SelectSelect NewCustomers.CustomerID, NewCustomers.CustomerID, CreditRisk.Risk, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)PredictProbability(CreditRisk.Risk)

FROMFROM CreditRisk CreditRisk PREDICTION JOINPREDICTION JOIN NewCustomersNewCustomers

ONON CreditRisk.Gender=NewCustomer.GenderCreditRisk.Gender=NewCustomer.Gender

ANDAND CreditRisk.Income=NewCustomer.Income CreditRisk.Income=NewCustomer.Income

AND AND CreditRisk.Profession=NewCustomer.ProfessionCreditRisk.Profession=NewCustomer.Profession

Page 25: Data Mining 2008

DMX Column Expressions Predictable Columns Source Data Columns Functions - Predict

“Workhorse”Discrete scalar valuesContinuous scalar valuesAssociative nested tablesSequence nested tablesTime SeriesOverloaded to

PredictAssociationPredictSequencePredictTimeSeries

PredictProbability PredictSupport PredictHistogram Cluster ClusterProbability GetNodeId IsInNode

Arithmetic operators Stored Procedure Subselect

Select from nested tables

Page 26: Data Mining 2008

Data Mining Interfaces – XMLA ++

Analysis Server (msmdsrv.exe)

OLAP Data Mining

Server ADOMD.NET

.Net Stored Procedures Microsoft Algorithms Third Party Algorithms

XMLAXMLAOver TCP/IPOver TCP/IP

OLEDB for OLAP/DM ADO/DSO

XMLAXMLAOver HTTPOver HTTP

Any Platform, Any Device

C++ App VB App .Net App

AMO

Any App

ADOMD.NET

WANWAN

DM Interfaces

Page 27: Data Mining 2008

New to SQL Server 2008 Microsoft Time Series algorithm improved

ARIMA plus ARTxp method, and a blending algorithm = better results

New prediction mode allows adding new data to time series models

Holdout Support added Easily partition data into training and test sets that are stored in mining structure & available to

query after processing

Ability to build mining models based on filtered subsets added Results in less structures, i.e. can just filter existing

Drillthrough functionality extended makes all mining structure columns available, not just columns included in the model

allows you to build more compact models

Cross-validation added allows users to quickly validate their modeling approach by automatically building temporary

models and evaluating accuracy measures across K folds. The feature is available through a new cross-validation tab under Accuracy Charts in BIDS, in addition to being accessible programmatically via a stored procedure call.

Page 28: Data Mining 2008

Summary

Data Mining in SQL Server 2008 is mature, powerful and accessible

Can use Excel 2007 Familiar client for BI – OLAP cubes AND Data Mining

models Model Creators / Users Excel Data or Server Data

SSAS and Excel both support the full DM Cycle Data Understanding Data Preparation Modeling Validation Deployment

Page 29: Data Mining 2008

DM Webcasts

Fri, 02 Nov 2007MSDN Webcast: Build Smart Web Applications with SQL Server Data Mining (Level 200)Thu, 08 Nov 2007MSDN Webcast: Building Adaptive Applications with SQL Server Data Mining (Level 300)Mon, 19 Nov 2007MSDN Webcast: Extending and Customizing SQL Server Data Mining (Level 300)Fri, 30 Nov 2007MSDN Webcast: Creating Visualizations for SQL Server Data Mining (Level 300)Thu, 01 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 1 of 3): Your First Project with SQL Server Data Mining (Level 200)Thu, 15 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 2 of 3): Understand SQL Server Data Mining Add-ins for the 2007 Office System (Level 200)Thu, 29 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 3 of 3): Use Predictive Intelligence to Create Smarter KPIs (Level 200)

Page 30: Data Mining 2008

BI Resources from Lynn Langit

1. “Foundations of SQL Server 2005 Business Intelligence”(published by APress in April 2007)

2.2. http://blogs.msdn.com/SoCalDevGal

3.3. “Building Business Intelligence “Building Business Intelligence Solutions with SQL Server 2008” Solutions with SQL Server 2008” (MSPress Fall 2008)(MSPress Fall 2008)

Page 31: Data Mining 2008

DM Resources

Technical Communities, Webcasts, Blogs, Chats & User Groupshttp://www.microsoft.com/communities/default.mspx

Microsoft Developer Network (MSDN) & TechNet http://microsoft.com/msdn http://microsoft.com/technet

Trial Software and Virtual Labshttp://www.microsoft.com/technet/downloads/trials/default.mspx

Microsoft Learning and Certificationhttp://www.microsoft.com/learning/default.mspx

SQL Server Data Mininghttp://www.sqlserverdatamining.comhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspxhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspx