Big Data Analytics
Module 4 – Data Mining and Predictive Analytics Including Mahout
Saptak Sen, Microsoft
Bill Ramos, Advaiya
Agenda
• Overview of predictive analytics & data mining
• How Microsoft supports predictive analytics
• How Mahout fits into the picture
• Demos
Data Mining
Predicting future performance from historical data
• Recommendation engines
• Advertising analysis
• Weather forecasting for business planning
• Social network analysis
• IT infrastructure and web app optimization
• Legal discovery and document archiving
• Pricing analysis
• Fraud detection
• Churn analysis
• Equipment monitoring
• Location-based tracking and services
• Personalized insurance
Predictive analytics should address the likelihood of something happening in the future, even if it is just an instant later.*
*Source: Ventana Research, Predictive Analytics Benchmark Research Report, March 2012.
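As a minimal illustration of predicting future values from historical data, the sketch below fits a straight-line trend to past observations by ordinary least squares and extrapolates it forward. The sales figures are hypothetical, and a linear trend is a deliberately simple stand-in; it is not one of the time series algorithms that SSAS or the Excel add-in actually use.

```python
def fit_linear_trend(values):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Slope = covariance(t, y) / variance(t); intercept passes through the means.
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps=1):
    """Extrapolate the fitted trend `steps` periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


history = [100, 104, 109, 113, 118]  # hypothetical monthly sales
print(forecast(history, steps=2))
```

A real forecasting model would also account for seasonality and report uncertainty; the point here is only that the prediction is derived entirely from the historical series.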
Data mining tools in SQL Server Analysis Services (SSAS)
• Rich data mining algorithms for clustering, classification, time series forecasting, and more
• Rich developer experience
Analysis Services Data Mining Algorithms
• Classify: Decision Trees, Logistic Regression, Naïve Bayes, Neural Networks
• Estimate: Decision Trees, Linear Regression, Logistic Regression, Neural Networks
• Cluster: Clustering
• Forecast: Time Series
• Associate: Association Rules, Decision Trees
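To show what the "Classify" task involves, here is a minimal sketch of a categorical Naïve Bayes classifier with add-one (Laplace) smoothing. It is a generic textbook version, not the Microsoft Naïve Bayes implementation in SSAS, and the attributes (age band, income band) and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][attribute_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)  # distinct values observed per attribute
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the label maximizing log P(label) + sum of log P(value | label)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing so an unseen value does not zero out the product.
            numerator = value_counts[label][i][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (age band, income band) -> purchase outcome.
rows = [("young", "high"), ("young", "low"), ("middle", "high"),
        ("senior", "low"), ("senior", "high"), ("young", "high")]
labels = ["buys", "no", "buys", "no", "buys", "buys"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("young", "high")))
```

Naïve Bayes assumes the attributes are independent given the class, which is what lets it multiply (here, add in log space) one probability per attribute.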
Data mining add-in for Excel
• Ease of use through Excel
• Rich data mining algorithms for clustering, prediction, forecasting, market basket analysis, and more
• Scalable through integration with SSAS
Algorithms: Data Mining Add-in for Excel
Menu item: data mining algorithm
• Analyze Key Influencers: Naïve Bayes
• Detect Categories: Clustering
• Fill From Example: Logistic Regression
• Forecast: Time Series
• Highlight Exceptions: Clustering
• Scenario Analysis – Goal Seek: Logistic Regression
• Scenario Analysis – What If: Logistic Regression
• Prediction Calculator: Logistic Regression
• Shopping Basket Analysis: Association Rules
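Shopping Basket Analysis is based on association rules, which are scored by support (the fraction of baskets containing the items) and confidence (how often the consequent appears when the antecedent does). The sketch below enumerates single-item rules by brute force; it is not the Microsoft Association Rules algorithm, and the baskets and thresholds are hypothetical.

```python
from itertools import permutations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item -> item rules."""
    n = len(baskets)
    sets = [set(basket) for basket in baskets]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(1 for s in sets if itemset <= s) / n

    rules = []
    for antecedent, consequent in permutations(items, 2):
        both = support({antecedent, consequent})
        if both < min_support:
            continue
        confidence = both / support({antecedent})
        if confidence >= min_confidence:
            rules.append((antecedent, consequent, both, confidence))
    return rules


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
    ["bread", "butter", "milk"],
]
for a, c, s, conf in pair_rules(baskets):
    print(f"{a} -> {c}: support={s:.2f}, confidence={conf:.2f}")
```

Production algorithms such as Apriori prune candidate itemsets by support before counting larger ones; brute-force enumeration is only practical for small item catalogs.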
Demo 1: Excel Data Mining Add-In
[Diagram: Windows Azure HDInsight (batch, speed, and serving layers), Microsoft Excel with the Excel Data Mining Add-in, and flat files (.txt, .dat, .xlsx, etc.)]
Mahout
• Scalable machine learning algorithms on the Hadoop platform
• Algorithms for clustering, classification, and batch-based collaborative filtering using the MapReduce paradigm
• Supports a wide range of use cases, from email spam filtering to fraud detection to book or movie recommendations
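Batch collaborative filtering on MapReduce can be pictured as counting how often pairs of items appear together in users' histories. The sketch below emulates the map, shuffle, and reduce phases in plain Python on one machine and then recommends unseen items by summed co-occurrence. It is written in the spirit of Mahout's distributed item-based recommender but is not Mahout's code or API; the users and books are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def map_phase(user_items):
    """Mapper: for each user's items, emit ((item_a, item_b), 1) per ordered pair."""
    for items in user_items.values():
        for a, b in permutations(sorted(set(items)), 2):
            yield (a, b), 1


def shuffle(pairs):
    """Group emitted values by key, as the Hadoop shuffle would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the counts for each item pair."""
    return {key: sum(values) for key, values in grouped.items()}


def recommend(user, user_items, cooccurrence, top_n=2):
    """Score items the user lacks by summed co-occurrence with items they have."""
    owned = set(user_items[user])
    scores = defaultdict(int)
    for (a, b), count in cooccurrence.items():
        if a in owned and b not in owned:
            scores[b] += count
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


user_items = {
    "u1": ["book_a", "book_b", "book_c"],
    "u2": ["book_a", "book_b"],
    "u3": ["book_b", "book_c", "book_d"],
    "u4": ["book_a", "book_d"],
}
cooccurrence = reduce_phase(shuffle(map_phase(user_items)))
print(recommend("u2", user_items, cooccurrence))
```

On a Hadoop cluster the mapper and reducer run in parallel over partitions of the data, which is what lets this batch approach scale to large rating sets.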
[Diagram: Mahout components: Applications, Examples, Clustering, Recommenders, Vector Similarity, Pattern Mining, Classification, Regression, Genetic, Dimension Reduction, Matrices, Collocations]
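Clustering is one of the Mahout capabilities named above, and k-means is a classic example. The sketch below is a single-machine version of Lloyd's k-means on 2-D points; it does not use MapReduce or Mahout's API. The points are hypothetical, and seeding with the first k points is a simplifying assumption made so the result is deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(p) for p in points[:k]]  # naive deterministic seeding
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8.5, 9)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

Mahout distributes the same two steps: mappers assign points to centroids and reducers average each cluster's points, repeating until the centroids stop moving.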
Demo 2: Mahout
[Diagram: Demo 2 workflow on Windows Azure HDInsight (batch, speed, and serving layers): flat files (.txt, .dat, .xlsx, etc.) are converted to Mahout input, a Mahout job is run from the Hadoop Command Window, and the job writes an output file; HDInsight consoles]
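For Mahout's recommenders, the input is typically a text file of userID,itemID,preference lines, and the IDs must be numeric. The "convert to Mahout input" step could therefore look like the sketch below, which assumes a hypothetical tab-delimited flat file of user name, item title, and rating, maps the names to integer IDs, and writes CSV. The column layout and function name are illustrative assumptions, not part of the demo.

```python
import csv
import io


def to_mahout_input(lines, delimiter="\t"):
    """Convert 'user<TAB>item<TAB>rating' lines into 'userID,itemID,preference' rows.

    String names are mapped to consecutive integer IDs (hypothetical scheme);
    the mappings are returned so recommendations can be translated back.
    """
    user_ids, item_ids = {}, {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        user, item, rating = (field.strip() for field in row)
        uid = user_ids.setdefault(user, len(user_ids) + 1)
        iid = item_ids.setdefault(item, len(item_ids) + 1)
        writer.writerow([uid, iid, float(rating)])
    return out.getvalue(), user_ids, item_ids


flat = ["alice\tThe Hobbit\t5", "bob\tDune\t4", "alice\tDune\t3", ""]
text, users, items = to_mahout_input(flat)
print(text)
```

Keeping the name-to-ID mappings matters because the Mahout job's output file refers to users and items only by their numeric IDs.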
Learn more
• Data Mining (SSAS): http://msdn.microsoft.com/en-us/library/bb510516.aspx
• Microsoft SQL Server 2012 SP1 Data Mining Add-ins for Microsoft Office 2013: http://www.microsoft.com/en-us/download/details.aspx?id=35578
• Mahout on Windows Azure - Machine Learning Using Microsoft HDInsight: http://social.technet.microsoft.com/wiki/contents/articles/15102.mahout-on-windows-azure-machine-learning-using-microsoft-hdinsight.aspx
Questions?