Excel – Not a Bad Data Mining Client At All
Allan Mitchell
SQL Server MVP
Konesans Limited
www.SQLIS.com
Who am I
• SQL Server MVP
• SQL Server Consultant
• Joint author on Wrox Professional SSIS book
• Worked with SQL Server since version 6.5
• www.SQLDTS.com and www.SQLIS.com
Today’s Schedule
• Mostly Demos
• Data Mining Add-In for Excel 2007
  – Added XL Functions
  – Visualisation Methods
Today’s Schedule
• Added XL Functions - Not a lot of people know these exist
  – DMPREDICT
  – DMPREDICTTABLEROW
  – DMCONTENTQUERY
  – Only exist after the add-in is installed
Today’s Schedule
• Visualisation Methods
  – Accuracy Charts
  – Classification Matrix
  – Profit Charts
  – Folding (X-Validation)
  – Calculator (if we get time)
Excel Functions
• DMPREDICT
• Can take a variable number of arguments, the minimum being 3.
• The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
• The second parameter is the name of the mining model that will execute the prediction.
• The third parameter is the requested predicted entity (in general a predictable column, but it could also be any prediction function).
• The function may also take up to 32 pairs of arguments. Each pair contains the value and the name of an input, in that order (value followed by name).
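The argument order described above can be sketched with a small Python helper that assembles the text of a DMPREDICT call. This is only an illustration of the layout (connection, model, predicted entity, then value/name pairs); the model name "Classify Bike Buyer" and the column names are hypothetical, and the formula itself only works in Excel once the add-in is installed.

```python
def quote(text: str) -> str:
    """Quote a string literal for an Excel formula (inner quotes are doubled)."""
    return '"' + text.replace('"', '""') + '"'


def dmpredict_formula(model, target, inputs=(), connection=""):
    """Build the text of a DMPREDICT call.

    inputs is a sequence of (value_reference, column_name) pairs. Each value
    is emitted as-is (for example a cell reference such as A2) and each name
    is quoted. The value comes first, then the name.
    """
    inputs = list(inputs)
    if len(inputs) > 32:
        raise ValueError("DMPREDICT accepts at most 32 value/name pairs")
    # The three mandatory arguments: connection, model, predicted entity.
    args = [quote(connection), quote(model), quote(target)]
    for value_ref, name in inputs:
        args.extend([value_ref, quote(name)])
    return "=DMPREDICT(" + ", ".join(args) + ")"


# Hypothetical model and column names, for illustration only.
formula = dmpredict_formula(
    "Classify Bike Buyer",
    "Bike Buyer",
    [("A2", "Age"), ("B2", "Commute Distance")],
)
print(formula)
# =DMPREDICT("", "Classify Bike Buyer", "Bike Buyer", A2, "Age", B2, "Commute Distance")
```

The empty first argument selects the current (active) connection, as noted in the slide.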
Excel Functions
• DMPREDICTTABLEROW
• The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
• The second parameter is the name of the mining model that will execute the prediction.
• The third parameter is the requested predicted entity (in general a predictable column, but it could also be any prediction function).
• The fourth parameter is a range of cells to be passed as inputs.
• The fifth parameter (optional) is a comma-separated list of column names to be used as names for the inputs.
Excel Functions
• DMPREDICTTABLEROW
• If the range of cells comes from an Excel List object
• Column headers are taken from the list
• 5th parameter not necessary
  – Unless column name != model column name
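The rule for the optional fifth parameter can be sketched in Python as follows. This is one reading of the slide, not the add-in's own logic: the helper emits the comma-separated name list only when the sheet's column headers differ from the names the model expects. The model, range, and column names are hypothetical.

```python
def quote(text: str) -> str:
    """Quote a string literal for an Excel formula (inner quotes are doubled)."""
    return '"' + text.replace('"', '""') + '"'


def table_row_formula(model, target, input_range, headers, model_columns,
                      connection=""):
    """Build the text of a DMPREDICTTABLEROW call.

    headers are the column headers found on the sheet; model_columns are the
    names the mining model expects, in the same order. The fifth argument is
    added only when the two lists differ.
    """
    args = [quote(connection), quote(model), quote(target), input_range]
    if list(headers) != list(model_columns):
        # Headers do not match the model, so name the inputs explicitly.
        args.append(quote(",".join(model_columns)))
    return "=DMPREDICTTABLEROW(" + ", ".join(args) + ")"


# Hypothetical names: the sheet says "Distance", the model says "Commute Distance".
mismatched = table_row_formula(
    "Classify Bike Buyer", "Bike Buyer", "A2:C2",
    headers=["Age", "Gender", "Distance"],
    model_columns=["Age", "Gender", "Commute Distance"],
)
matched = table_row_formula(
    "Classify Bike Buyer", "Bike Buyer", "A2:C2",
    headers=["Age", "Gender", "Commute Distance"],
    model_columns=["Age", "Gender", "Commute Distance"],
)
print(mismatched)
print(matched)
```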
Excel Functions
• DMCONTENTQUERY
• The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
• The second parameter is the name of the mining model whose content will be queried.
• The third parameter is the requested content column.
• The fourth parameter is a WHERE clause to be appended to the content query.
DEMO: Data Mining Excel functions
Excel Add-In
• Great way of visualising Data Mining
• Takes away some of the mystery
• Easy to use
• Some wizards
• Freedom vs. flexibility
Accuracy Charts
• Compare 1-n models against
  – Another model
  – Best model
  – Thumb in the air model/no model/chance
Accuracy Charts
• Interpreting
  – How does a model compare with other models
  – What is the cumulative gain
  – Lift
• The real thing we want to see is...
  – By how much do we beat the “chance” model
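Cumulative gain and lift can be illustrated with a short Python sketch. It assumes a binary target, a list of model scores, and a chance model that finds positives in proportion to the share of cases examined; the scores and outcomes are made-up sample data, and this is not the add-in's own calculation.

```python
def cumulative_gain(scores, actuals):
    """Return (fraction_of_cases, fraction_of_positives_found) points.

    Cases are ranked by descending score, as an accuracy chart does.
    """
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(actuals)
    found = 0
    points = []
    for count, (_, actual) in enumerate(ranked, start=1):
        found += actual
        points.append((count / len(ranked), found / total_positives))
    return points


def lift_at(points, fraction):
    """Lift = model gain divided by chance gain at the same population share.

    The chance model's gain equals the population share itself.
    """
    for population, gain in points:
        if population >= fraction:
            return gain / population
    raise ValueError("fraction must be between 0 and 1")


# Made-up sample data: 10 cases, 4 of them positive.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

points = cumulative_gain(scores, actuals)
print(points[1])            # gain after the top 20% of cases
print(lift_at(points, 0.2)) # how many times better than chance at 20%
```

With this sample, the top 20% of cases contain half of the positives, so the model beats the chance model by a factor of 2.5 at that point.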
DEMO: Accuracy Charts
Classification Matrix
• What are we interested in
  – How well did my model predict outcomes
  – False Positive
  – False Negative
  – True Positive
  – True Negative
Classification Matrix
| | Predicted TRUE | Predicted FALSE |
|---|---|---|
| Actual TRUE | True Positive | False Negative (type 2 error) |
| Actual FALSE | False Positive (type 1 error) | True Negative |
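The four cells of the matrix can be counted with a few lines of Python. This is a minimal sketch on made-up actual and predicted values, not output from the add-in.

```python
from collections import Counter


def classification_matrix(actuals, predictions):
    """Count true/false positives and negatives for a binary target."""
    counts = Counter({"TP": 0, "FN": 0, "FP": 0, "TN": 0})
    for actual, predicted in zip(actuals, predictions):
        if actual and predicted:
            counts["TP"] += 1   # predicted TRUE, actually TRUE
        elif actual and not predicted:
            counts["FN"] += 1   # type 2 error
        elif predicted:
            counts["FP"] += 1   # type 1 error
        else:
            counts["TN"] += 1   # predicted FALSE, actually FALSE
    return counts


# Made-up sample data.
actuals = [True, True, True, False, False, False, False, True]
predictions = [True, False, True, True, False, False, False, True]

matrix = classification_matrix(actuals, predictions)
accuracy = (matrix["TP"] + matrix["TN"]) / len(actuals)
print(dict(matrix), accuracy)
```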
Classification Matrix
• A misclassification is not always a bad thing
• Consider
  – Predicted possibility of disease
  – Extra care/treatment given
  – Real result is “No disease”
  – Example of false positive
  – Is it such a bad thing?
DEMO: Classification Matrix
Profit Charts
• Closely follows lift/cumulative gain chart
• Apply costs to efforts
Profit Charts
• Apply costs to
  – Initial/Fixed outlay
  – Cost per case
  – Return per case
• Target predictable column
• Target Outcome
• Count of cases to use
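How the three cost inputs turn a gain curve into a profit curve can be sketched in Python. This is a simplified assumption (profit = return on positives found, minus cost per contacted case, minus fixed outlay) on made-up data; the add-in's own calculation may scale results differently.

```python
def profit_curve(scores, actuals, fixed_cost, cost_per_case, return_per_case):
    """Profit after acting on the top n cases, for n = 1..len(scores).

    Cases are ranked by descending model score, as in a lift chart.
    """
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    profits = []
    positives_found = 0
    for n, (_, actual) in enumerate(ranked, start=1):
        positives_found += actual
        profit = (return_per_case * positives_found
                  - cost_per_case * n
                  - fixed_cost)
        profits.append(profit)
    return profits


# Made-up sample data: 10 cases, 4 with the target outcome.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

profits = profit_curve(scores, actuals,
                       fixed_cost=3, cost_per_case=2, return_per_case=5)
best_n = profits.index(max(profits)) + 1
print(profits)
print(best_n, max(profits))
```

With these sample figures profit peaks after the top 4 cases and then falls, because each further case costs more than it is expected to return.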
DEMO: Profit Chart
X-Validation/Folding/Rotation Estimation
• Validates your model
• Tests whether the model is generally applicable
• Large variations in results between partitions
  – Model not generally applicable
  – May need tuning
X-Validation/Folding/Rotation Estimation
• Stratified K-Fold Cross Validation
• Creates K folds
  – Representative partitions
• Holds one partition out
• Trains model with others
• Tests with holdout partition
• Repeat K times, each time with a different holdout/test partition
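The steps above can be sketched in plain Python. This is a minimal illustration, not the add-in's implementation: folds are built by dealing each class's cases round-robin so every fold keeps roughly the overall class mix, and the "model" is a hypothetical majority-class baseline standing in for a real mining model.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign case indices to k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal round-robin per class
    return folds


def cross_validate(labels, k, train_and_score):
    """Hold each fold out once, train on the rest, score on the holdout."""
    folds = stratified_folds(labels, k)
    results = []
    for holdout in range(k):
        test_idx = folds[holdout]
        train_idx = [i for f, fold in enumerate(folds) if f != holdout
                     for i in fold]
        results.append(train_and_score(train_idx, test_idx))
    return results


# Made-up labels: 4 positives, 8 negatives.
labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]


def majority_baseline(train_idx, test_idx):
    """Hypothetical stand-in model: always predict the training majority."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    hits = sum(1 for i in test_idx if labels[i] == majority)
    return hits / len(test_idx)


folds = stratified_folds(labels, 4)
fold_scores = cross_validate(labels, 4, majority_baseline)
spread = max(fold_scores) - min(fold_scores)
print(folds)
print(fold_scores, spread)
```

A large spread between the per-fold scores is the warning sign mentioned earlier: the model may not be generally applicable and may need tuning.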
DEMO: X-Validation/Folding/Rotation Estimation
Prediction Calculator
• Set costs and profits associated with
  – Getting the prediction right
  – Getting the prediction wrong
• See profit curves
• See profit threshold scores
• Pad for entering new data
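The idea of a profit threshold score can be sketched in Python: assign a profit or cost to each kind of right and wrong prediction, then pick the score threshold that maximises total profit. The cost figures and sample data are made up, and this is an assumption about the general technique rather than the Prediction Calculator's exact method.

```python
def profit_at_threshold(scores, actuals, threshold,
                        tp_profit, fp_cost, fn_cost, tn_profit=0):
    """Total profit when every case scoring >= threshold is predicted TRUE."""
    total = 0
    for score, actual in zip(scores, actuals):
        predicted = score >= threshold
        if predicted and actual:
            total += tp_profit      # right: true positive
        elif predicted:
            total -= fp_cost        # wrong: false positive
        elif actual:
            total -= fn_cost        # wrong: false negative
        else:
            total += tn_profit      # right: true negative
    return total


def best_threshold(scores, actuals, **costs):
    """Try each observed score as a threshold and keep the most profitable."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: profit_at_threshold(scores, actuals, t, **costs))


# Made-up sample data and costs.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
actuals = [1, 1, 0, 1, 0, 0, 1, 0]
costs = dict(tp_profit=10, fp_cost=4, fn_cost=6)

threshold = best_threshold(scores, actuals, **costs)
print(threshold, profit_at_threshold(scores, actuals, threshold, **costs))
```

Because a missed positive costs more than a false alarm in this sample, the most profitable threshold is low (0.3), which echoes the earlier point that a false positive is not always a bad thing.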
Prediction Calculator
• Cloud Version available
• Print version available for later data entry
• Easy to use
• Easy to understand
DEMO: Prediction Calculator
Thank you…
[email protected]