SymEx 2015 - Agile Process for Big Data Analytic
pmi-indonesia-chapter -
• More than 17 years of experience in the IT industry, with a theoretical physics background. He started as a scientific programmer at the University of Indonesia’s semiconductor lab and later worked as a software engineer and architect at various software companies. He joined SRIN in 2014 to lead development of various mobile apps and middleware platforms, and to conduct research projects on predictive data analytics using deep machine learning technologies. Prior to SRIN, he spent 10 years at Microsoft Indonesia as Director of the Developer Ecosystem (DX) division.
Context of “Big Data” Science
Scope of Data Analytic
Project Management Complexities
Team Structure and R&R
Agile Principles and Process Model
Common Execution Issues
Q&A
Volume: Exceeds the physical limits of vertical scalability
Velocity: Decision window is small compared to the data change rate
Variety: Many different formats make integration expensive
Variability: Many options or variable interpretations confound analysis
By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.
- Gartner, Mark Beyer, “Information Management in the 21st Century”
Data, Data, .. Everywhere: New Data Sources; Larger Data Volumes
New Data Management Technologies: Hadoop + Spark + Tool Ecosystem
New Era of Data Analytic: Descriptive, Predictive & Prescriptive; Data-Driven Organization
10x increase every five years
85% from new data types
[Figure: growth of data Volume, Velocity, and Variety over time, 1940 to 2015]
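As a rough worked example, the "10x increase every five years" figure can be converted into a yearly rate. This is a minimal sketch that assumes steady compound growth, which the slide does not state; the function names are illustrative only.

```python
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Yearly multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)


def doubling_time_years(yearly_factor: float) -> float:
    """Years needed to double at a constant yearly multiplier."""
    return math.log(2.0) / math.log(yearly_factor)


# Assumption: "10x every five years" means constant compound growth.
factor = annual_growth_factor(10.0, 5.0)
print(f"yearly multiplier: {factor:.3f}")
print(f"yearly growth: {(factor - 1.0) * 100:.1f}%")
print(f"doubling time: {doubling_time_years(factor):.2f} years")
```

Under that assumption, data volume grows by a bit under 60 percent per year and doubles roughly every year and a half.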
Cloud Data Storage is Unlimited
Quincy, WA; Chicago, IL; San Antonio, TX; Dublin, Ireland; Generation 4 DCs
Generic Tasks
1. Define Analytic Requirement
2. Set Up Infrastructure
3. Collect Data
4. Data Modeling
5. Data Processing
6. Model Deployment
7. Monitoring
8. Evaluation
9. Etc.
Analytics types by Complexity and Value
Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What will happen?
Prescriptive Analytics: How can we make it happen?
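To make the gap between "What happened?" and "What will happen?" concrete, here is a small sketch in plain Python. The sales figures are invented for illustration, and a straight-line trend is only one simple stand-in for a predictive model; it is not a method taken from the slides.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented purely for illustration.
monthly_sales = [100.0, 110.0, 120.0, 130.0]


def describe(values):
    """Descriptive analytics: summarize what already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...: (slope, intercept)."""
    xs = range(len(values))
    x_bar = mean(xs)
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_next(values):
    """Predictive analytics: extrapolate the fitted trend one step ahead."""
    slope, intercept = fit_trend(values)
    return slope * len(values) + intercept


summary = describe(monthly_sales)
forecast = predict_next(monthly_sales)
print(summary)
print(forecast)
```

The descriptive step only needs the data that exists; the predictive step adds a model and therefore an assumption (here, that the trend is linear) that has to be validated.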
Vision Analytics
Recommendation engines
Advertising analysis
Weather forecasting for business planning
Social network analysis
Legal discovery and document archiving
Pricing analysis
Fraud detection
Churn analysis
Equipment monitoring
Location-based tracking and services
Personalized Insurance
Advanced computation based on machine learning and predictive analytics is a core capability needed throughout future business.
Pull-based Batch Loads
Enterprise Data Models
Complex ETL Logic
Poorly Suited to Non-Relational Data
Emergent design is difficult
CRISP-DM - Cross Industry Standard Process for Data Mining.
Framework for Guidance
Process Model
Non-proprietary
Experience Base
Application/Industry neutral
Tool neutral
Focus on business issues as well as technical analysis
Business Understanding: Determine Business Objectives; Assess Situation; Determine Data Mining Goals; Produce Project Plan
Data Understanding: Collect Initial Data; Describe Data; Explore Data; Verify Data Quality
Data Preparation: Select Data; Clean Data; Construct Data; Integrate Data; Format Data
Modeling: Select Modeling Technique; Generate Test Design; Build Model; Assess Model
Evaluation: Evaluate Results; Review Process; Determine Next Steps
Deployment: Plan Deployment; Plan Monitoring & Maintenance; Produce Final Report; Review Project
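One way a team might track progress against the CRISP-DM phases and tasks listed above is a simple ordered checklist. The phase and task names come from the slides; the `next_task` helper and the idea of a linear checklist are assumptions made for illustration, since CRISP-DM itself allows moving back and forth between phases.

```python
# Phase and task names follow the CRISP-DM breakdown in the slides.
CRISP_DM = {
    "Business Understanding": [
        "Determine Business Objectives", "Assess Situation",
        "Determine Data Mining Goals", "Produce Project Plan",
    ],
    "Data Understanding": [
        "Collect Initial Data", "Describe Data",
        "Explore Data", "Verify Data Quality",
    ],
    "Data Preparation": [
        "Select Data", "Clean Data", "Construct Data",
        "Integrate Data", "Format Data",
    ],
    "Modeling": [
        "Select Modeling Technique", "Generate Test Design",
        "Build Model", "Assess Model",
    ],
    "Evaluation": [
        "Evaluate Results", "Review Process", "Determine Next Steps",
    ],
    "Deployment": [
        "Plan Deployment", "Plan Monitoring & Maintenance",
        "Produce Final Report", "Review Project",
    ],
}


def next_task(done):
    """Hypothetical helper: first (phase, task) not in `done`, else None."""
    for phase, tasks in CRISP_DM.items():
        for task in tasks:
            if task not in done:
                return phase, task
    return None


print(next_task({"Determine Business Objectives"}))
```

In an agile setting, a sprint could pull a few of these tasks at a time and revisit earlier phases when evaluation results call for it.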
Common Issues
Learning curve for data science and data engineering.
We can’t design insights; we discover them through exploration.
Low data quality leads to fewer insights from the data.
The result is not good enough.
Key Strategies
Extra dedicated time to learn before project sprints (e.g., MOOCs).
Add capabilities to explore data, iterate, and publish intermediate results.
Improve data quality based on feedback.
Build-Measure-Release iteration.
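Improving data quality from feedback usually starts with measuring it. The sketch below is one possible minimal check, written in plain Python; the field names and sample rows are hypothetical, and a real project would likely track more dimensions than missing values and duplicates.

```python
def quality_report(rows, required_fields):
    """Count missing values per field and exact duplicate rows."""
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0
    for row in rows:
        for field in required_fields:
            # Treat None and the empty string as missing.
            if row.get(field) in (None, ""):
                missing[field] += 1
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Hypothetical sample records, invented for illustration.
sample = [
    {"user_id": 1, "country": "ID"},
    {"user_id": 2, "country": ""},
    {"user_id": 1, "country": "ID"},
    {"user_id": None, "country": "SG"},
]

report = quality_report(sample, ["user_id", "country"])
print(report)
```

Publishing a report like this after each iteration gives the team an intermediate result to discuss and a baseline to improve against in the next sprint.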