Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
© 2014 VMware Inc. All rights reserved.
Virtualized Big Data Platform @ VMware Corp IT
Rajit Saha
Hadoop Development Lead
VMware Corp IT Data Solution and Delivery
An Enterprise Data Warehouse meets an Elephant
Business Use Case for Big Data Analytics @ VMware BI Space
Personalized Marketing & Customer Targeting
- Personalized Campaign Content Strategy
MyVMware Log Analytics
- Combine user-level data (logins and other activities) with Clickstream Data and Product Data
VMware Product List Price Optimization and Deal Analytics for VMware Pricing Team
- Complex ETL, Bigger Joins
- Flattening Star Schema Tables
- Propensity Modeling
EDW
GSS Service Request Logs Analytics
- High Volume ~400TB
- A lot of variety of data
- Complex parsing
- Deeper learning of VMware product issues
- Build a highly intelligent recommendation system to fix customer issues with faster turnaround time
Clickstream Data Analytics
• Path analysis – first user visit to buying a product
• Propensity Modeling
• Predictive Analytics - which product a user will buy
• Customer Lifetime Value Analysis
• 554 columns, 1.5B rows, 20TB of data (2 yrs)
BIG DATA: Variety, Volume, Velocity
Components of Big Data Cluster
• This Big Data Cluster is fully virtualized, based on vSphere 6.0 and VMware Big Data Extensions 2.2
• We used EMC Isilon 7.2.0.2 with two patches for HDFS storage
• We used Pivotal Big Data Suite 3.0 for Hadoop 2.6 and HAWQ 1.3
• We used Pivotal Spring XD 1.2 for data ingestion to Hadoop
• We integrated this with Alpine Data Lab 5.4 for running
  - Deeper analytic functions
  - Machine learning algorithms
• We integrated HUE 2.6 as a GUI-based Hive/Pig query execution client
On-Prem Big Data Production Datacenter
Apache Ambari – The Hadoop Cluster Management Console
Management & Monitoring
- HDFS
- YARN/MapReduce
- Hive
- HAWQ
- Spring XD
Data Processing Pipeline – Click Stream Data
[Diagram labels: Clickstream; ftps.vmware.com; raw data files; firewall; Daily push of Clickstream Logs; Data Ingestion to Isilon HDFS via Spring XD; Lookup Logs; Clickstream Logs; python; Adv. Analytics; Users]
• Data Cleaning
• Better Consumable Structured Data
• Data Partitioning
• Schema Building
• Faster Analytic Power
- Daily, 2M Clickstream records (~10GB) are ingested from Adobe Omniture to Isilon HDFS
- 1.5 billion records, 554 columns and ~20TB of data
- Data cleanup and preprocessing using Pig, Hadoop Streaming and Python scripts
- Fit the data into the Hive/HAWQ schema
- End users (data scientists) consume via HUE/pgAdmin/Alpine Data Lab
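The cleanup step with Hadoop Streaming and Python scripts might look roughly like the sketch below. It is not the deck's actual script: the tab-separated layout, the `clean_record` and `run_mapper` names, and the choice to drop rows with the wrong column count are all assumptions made for illustration. Only the 554-column figure comes from the slide.

```python
import sys

# Column count quoted on the slide; the real column names are not given,
# so this sketch only validates the number of fields (an assumption).
EXPECTED_COLUMNS = 554


def clean_record(line, expected_columns=EXPECTED_COLUMNS):
    """Return a cleaned tab-separated record, or None if the row is malformed."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != expected_columns:
        return None  # drop truncated or over-long rows
    # Trim whitespace; write empty values as \N, Hive's default NULL
    # marker for text-format tables.
    cleaned = [field.strip() or "\\N" for field in fields]
    return "\t".join(cleaned)


def run_mapper(lines, out, expected_columns=EXPECTED_COLUMNS):
    """Map-only cleanup: read raw lines, write the rows that pass validation."""
    for line in lines:
        record = clean_record(line, expected_columns)
        if record is not None:
            out.write(record + "\n")
```

In a streaming job, the script would end with a call such as `run_mapper(sys.stdin, sys.stdout)`, since Hadoop Streaming feeds each input split to the mapper on standard input and collects standard output.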
Data Consumption – pgAdmin3 (via HAWQ Database) ...
And visualize the results ..
[Chart: Top 20 Countries with unique vmware.com Visits on 2015 Q1. Legend: usa, jpn, deu, gbr, chn, ind, can, fra, aus, kor, esp, bra, ita, nld, rus, che, twn, pol, mex, swe]
[Chart: Top 20 Countries with unique vmware.com Visitors on 2015 Q1. Legend: usa, jpn, deu, gbr, chn, ind, can, fra, aus, kor, esp, bra, ita, nld, rus, che, twn, pol, mex, swe]
Disclaimer: This is based on a synthesized dataset for demo purposes, not real data
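A per-country share of unique visitors like the one charted above could be computed along these lines. This is a sketch only: the `(visitor_id, country_code)` record shape and the function name are hypothetical, and shares are taken relative to the top N countries so that they add up to roughly 100%, as in a share chart.

```python
from collections import defaultdict


def unique_visitor_share(records, top_n=20):
    """records: iterable of (visitor_id, country_code) pairs (assumed shape).

    Returns [(country, percent)] for the top_n countries by distinct
    visitors, with percentages relative to those top_n countries.
    """
    visitors_by_country = defaultdict(set)
    for visitor_id, country in records:
        visitors_by_country[country].add(visitor_id)  # sets deduplicate repeat hits

    counts = [(country, len(ids)) for country, ids in visitors_by_country.items()]
    # Largest first; ties broken alphabetically for a stable order.
    ranked = sorted(counts, key=lambda item: (-item[1], item[0]))[:top_n]
    total = sum(count for _, count in ranked)
    if total == 0:
        return []
    return [(country, round(100.0 * count / total, 1)) for country, count in ranked]
```

A visitor seen from two countries is counted once in each, which is a simplification.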
Data Consumption – HUE
Hive query to find unique visits to the VMware site, 2015 Q1
[Chart: Unique Visits in 2014 and 2015, month-wise. X axis: Month (2014-01 to 2015-07); Y axis: Visit Count (0 to 14,000,000); series: visits]
Disclaimer: This is based on a synthesized dataset for demo purposes, not real data
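The Hive query itself is not included in the transcript. As an illustration of the month-wise distinct count behind such a chart, here is a small Python sketch; the `(visit_id, iso_timestamp)` record shape and the function name are assumptions, not the schema used in the deck.

```python
from datetime import datetime


def monthly_unique_visits(records):
    """records: iterable of (visit_id, iso_timestamp) pairs (assumed shape).

    Returns {"YYYY-MM": distinct_visit_count}, ordered by month, which is
    the same result a COUNT(DISTINCT ...) grouped by month would give.
    """
    visits_by_month = {}
    for visit_id, timestamp in records:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        visits_by_month.setdefault(month, set()).add(visit_id)
    return {month: len(ids) for month, ids in sorted(visits_by_month.items())}
```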
Advanced Data Analytics by Alpine Data Lab
Time Series Analysis on Jan 2015 Clickstream Data
Take Away …
At VMware IT, we have established that an Enterprise Big Data Analytics Platform can be successfully built and run on top of VMware Virtual Infrastructure with EMC Isilon and PHD 3.0, with great performance.
Thank You
Q&A