Video Analytics on Hadoop webinar victor fang-201309
![Page 1: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/1.jpg)
A NEW PLATFORM FOR A NEW ERA
![Page 2: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/2.jpg)
© Copyright 2013 Pivotal. All rights reserved.
What You Can Do With Hadoop Webinar Series
Unstructured Data – Video Analytics
September 6, 2013
Dr. Chunsheng (Victor) Fang, Sr. Data Scientist
Annika Jimenez, Global Head of Data Science Services
Nikesh Shah, Sr. Product Marketing Manager
![Page 3: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/3.jpg)
What You Will Learn
Pivotal Data Science Lab Services
New Emerging Trends for Unstructured Data
Video Analytics on Hadoop
Analytics with SQL
![Page 4: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/4.jpg)
Pivotal Platform
Cloud Storage
Virtualization
Data & Analytics Platform
Cloud Application Platform
Data-Driven Application Development
Pivotal Data Science Labs
![Page 5: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/5.jpg)
Pivotal Data Science
![Page 6: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/6.jpg)
Data Science Value Chain
Stages: Instrumentation, Logs, Capture, Store, Transform and Prepare, Access, Model Development, Deploy, Applications, Process Change
Roles: Product Engineer, Platform Engineer, DBA, Data Engineer/Programmer, Data Engineer, Data Scientist, Platform Engineer, Application Developer, PMO
![Page 7: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/7.jpg)
How We Help Our Customers
1. Data Science Strategy Definition
2. Point Proof-of-Value Model Development
3. Multiple Model Development + Apps
4. DSIC Transformation to “Predictive Enterprise”
5. Also:
– Algorithm development
– Pushing the envelope in problem-solving
Pivotal Data Science Labs
![Page 8: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/8.jpg)
Pivotal Data Science Knowledge Development
![Page 9: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/9.jpg)
Pivotal Data Science Dream Team
Derek Lin – Network Security, Fraud Detection, Speech and Language Processing (Principal Scientist at RSA, M.S. in Signal Processing, USC)
Hulya Farinas – Optimization, Resource Allocation in Healthcare (Modeler at M-Factor, IBM, Ph.D. in Operations Research, University of Florida)
Kaushik Das – Mathematical Modeling in Energy, Retail and Telco (Director of Analytics at M-Factor, M.S. in Mineral Engineering, UC Berkeley)
Sarah Aerni – Genomics and Machine Learning (Ph.D. in Biomedical Informatics, Stanford)
Mariann Micsinai – Next Generation Sequencing (Market Risk Management Associate at Lehman Brothers, Ph.D. in Computational Biology, NYU and Yale)
Victor Fang – Imaging and Graph Analytics, Machine Learning (Sr. Scientist at Riverain Medical, SDE at Amazon.com, Ph.D. in Computer Sciences, University of Cincinnati)
Emily Kawaler – Clinical Informatics and Machine Learning (M.S. in Computer Sciences, University of Wisconsin-Madison)
Anirudh Kondaveeti – Trajectory Data Mining and Machine Learning (Ph.D. in Computing & Dec. Systems Eng, Arizona State University)
Hong Ooi – Insurance and Finance Risk Modeling (Statistician at ANZ, Ph.D. in Statistics, Australian National University)
Michael Brand – Text, Speech and Video Research for Retail, Finance and Gaming (Chief Scientist at Verint Systems, M.S. in Applied Mathematics, Weizmann Institute)
Kee Siong Ng – Data Mining in Healthcare (Sr. Data Miner at Medicare Australia, Ph.D. in Computer Science, and Postdoctoral Fellow, Australian National University)
Noelle Sio – Digital Media Analytics and Mathematical Modeling (Sr. Analyst at eHarmony, Fox Interactive Media (Myspace), M.S. in Applied Mathematics, Cal Poly Pomona)
Jin Yu – Stochastic Optimization, Robust Statistics in Machine Learning, Computer Vision (Research Associate at U of Adelaide, Ph.D. in Machine Learning, Australian National University)
Rashmi Raghu – Computational Methods and Analysis (Ph.D. in Mechanical Engineering, Stanford)
Woo Jung – Bayesian Inference and Demand Analysis (Sr. Statistician at M-Factor, M.S. in Statistics, Stanford)
Jarrod Vawdrey – Marketing Analytics & SAS (Analytics Consultant at Aspen Marketing, B.S. in Mathematics, Kennesaw State University)
Niels Kasch – Text Analytics and NLP (Ph.D. in Computer Science, UMBC)
Vivek Ramamurthy – Online Learning, Stochastic Modeling, Convex Optimization (Ph.D. in Operations Research, UC Berkeley)
Srivatsan Ramanujam – NLP and Text Mining (Natural Language Scientist at Sony, Salesforce.com, M.S. in Computer Sciences, UT Austin)
Alexander Kagoshima – Time Series, Statistics and Machine Learning (M.S. in Economics/Computer Science, TU Berlin)
![Page 10: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/10.jpg)
Data Science Labs: Packaged Services
LAB PRIMER (2-Week Strategy)
• Customized Analytics Roadmap
• 1-day Moderated Brainstorming Session
• Prioritized Opportunities
• Architectural Recommendations
LAB 600 (6-Week Lab)
• Prof. Services (Data Load)
• Data Science Model Building
• Project Management
• Ready-to-Deploy Model(s)
LAB 1200 (12-Week Lab)
• Prof. Services (Data Load)
• Data Science Model Building
• Project Management
• Ready-to-Deploy Model(s)
LAB 100 (2-Week Lab)
• On-site Pivotal Analytics Training
• Rapid Model/Insight Build on Customer Data (2 weeks)
![Page 11: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/11.jpg)
Approach: Data Science Lab 1200
Week: 1 2 3 4 5 6 7 8 9 10 11 12
Data Exploration
Features Building
Model Development
Code QA and Scoring
Model Optimization & Validation
Data Loaded
Insights Presentation
Training
Preliminary Model Review
Feature Review
Data Review
Documentation
![Page 12: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/12.jpg)
Program Management; Data Architecture and Engineering; Data Scientists; Training and Skills Development
Facilitate data loading processes from source systems to Pivotal Data Fabric
Coordinate data needs with Data Scientists
Best practice education for analytics performance
Data migration to support new applications
Oversight and communication plans
Organizational alignment
Risk mitigation
Resource planning
Prioritize deliverables
Socialize progress of overall initiative
Instill data collaboration culture
Execute Data Science Lab engagements around revenue generation or cost saving efforts
Hands on education with new data analysis techniques
Introduce new analytics tools and methodologies
Identify candidates for deeper data science training
Create training curriculum
Recruiting Methodology
Parallel computing techniques defined and demonstrated
Build institutional knowledge for client data science team
Data Science Innovation Center (DSIC)
Key Principles
• Building a predictive enterprise is, first and foremost, about building a human infrastructure.
• Analytics is an iterative knowledge discovery process and needs to be managed as such.
• Discovery starts from asking the right questions – that can be as important as finding answers to those questions.
![Page 13: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/13.jpg)
Large Scale Video Analytics Platform on Hadoop
Dr. Chunsheng (Victor) Fang, Sr. Data Scientist
![Page 14: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/14.jpg)
Pivotal Video Analytics Taskforce
Chunsheng (Victor) Fang, Ph.D. – Sr. Data Scientist
Regunathan Radhakrishnan, Ph.D. – Sr. Data Scientist
Derek Lin – Principal Data Scientist
Sameer Tiwari – Hadoop Architect
Kenneth Dowling & Michael Nemesh – DCA Admin
![Page 15: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/15.jpg)
Industry Use Case
Surveillance Video Anomaly Detection
![Page 16: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/16.jpg)
Anomaly Detection in Surveillance Video
Detect anomalous objects in a restricted perimeter.
A typical large enterprise collects terabytes of video per day.
Hadoop MapReduce runs computer vision algorithms in parallel and captures violation events.
Post-incident monitoring is enabled by Hadoop / HAWQ.
![Page 17: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/17.jpg)
Unstructured Video Data Workflow
Unstructured data as input
ETL: Distributed Video Transcoder
Analytics: Distributed Video Analytics
Structured Insights in relational database for advanced analytics
Unstructured Data -> ETL -> Analytics -> Structured Insights
![Page 18: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/18.jpg)
Real World Video Data Benchmark
Surveillance Videos (i-LIDS) from United Kingdom Home Office
– Library of HiDef CCTV video footage based around ‘scenarios’ central to the government’s requirements.
– The footage accurately represents real operating conditions and potential threats.
Anomaly Detection: Sterile zone dataset
Sample frames: Night, Day
![Page 19: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/19.jpg)
Most Common Video Standards
MPEG & ITU: responsible for many video standards
MPEG-2 (1995): Widely adopted, DVDs, Digital TV broadcast, set-top boxes
![Page 20: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/20.jpg)
Intro to MPEG Standard
MPEG standard encodes video frames
– Redundancy in time: inter-frame encoding
– Redundancy in space: intra-frame encoding
Motion compensation
– I-frame: (Key frame) intra-frame encoding
– P-frame: (Predicted frame) Predicting regions of current frame from previous frame
– B-frame: (Bi-predictive frame) Predicting regions of current frame using both previous and next frame
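As a rough illustration of the three frame types, the sketch below walks a group of frames given in display order and lists which anchor frames (I or P) each frame would be predicted from. The `frame_references` helper and the `IBBPBBP` pattern are illustrative assumptions, not something shown in the webinar, and real encoders may choose references differently.

```python
def frame_references(gop):
    """Map each frame index of a display-order GOP string to the frame indices it predicts from."""
    # Anchor frames (I and P) are the only frames used as references in this sketch.
    anchors = [i for i, kind in enumerate(gop) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(gop):
        previous = [a for a in anchors if a < i]
        following = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []                              # intra-coded: decodable on its own
        elif kind == "P":
            refs[i] = previous[-1:]                   # nearest previous anchor
        elif kind == "B":
            refs[i] = previous[-1:] + following[:1]   # nearest anchors on both sides
        else:
            raise ValueError("unknown frame type: " + kind)
    return refs


if __name__ == "__main__":
    for index, deps in frame_references("IBBPBBP").items():
        print(index, deps)
```

Only the I-frame has an empty dependency list, which is why a decoder that starts in the middle of a stream has to wait for the next key frame.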
![Page 21: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/21.jpg)
Distributed Video Transcoder on Hadoop
Distributed MapReduce MPEG Transcoder
![Page 22: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/22.jpg)
Motivation of Distributed Video Transcoding
Can we decode the individual frames from an arbitrary block in Hadoop File System (HDFS)?
HDFS splits any file into blocks, typically 64MB or 128MB.
Each block can be processed in parallel by a customized MapReduce function.
Most video file standards are not Hadoop-friendly.
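A small sketch of why block boundaries are a problem for video: blocks are cut at fixed byte offsets, with no regard for where a frame or header begins. The `block_ranges` helper and the 5 GiB file size are hypothetical values chosen for illustration.

```python
def block_ranges(file_size, block_size=64 * 1024 * 1024):
    """Return (start, end) byte offsets of each fixed-size block; end is exclusive."""
    return [(start, min(start + block_size, file_size))
            for start in range(0, file_size, block_size)]


if __name__ == "__main__":
    ranges = block_ranges(5 * 1024 ** 3)   # hypothetical 5 GiB video file
    print(len(ranges), "blocks; second block starts at byte", ranges[1][0])
```

Every block after the first starts at an arbitrary byte of the video bitstream, so a mapper cannot assume it begins on a decodable frame.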
![Page 23: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/23.jpg)
Decoding MPEG-2 with MapReduce
Two key observations
– Video header information: available only at the header in the bitstream
– Group of Pictures (GOP) header repeats
Steps to decode arbitrary blocks
– Step 1: Configure each mapper to extract the header information from each file;
  ▪ Totals ~20 videos at 5GB
– Step 2: Start searching for GOP header in each block in parallel;
– Step 3: Decode frames into a suitable image format (JPEG, BMP, etc);
– Step 4: Consolidate all time-stamped frames into Hadoop Sequence File.
  ▪ Reduces to sequence file at 500MB
Transcoding MPEG-2 video into Hadoop-friendly format
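Step 2 above can be sketched as a byte search. In an MPEG-2 video elementary stream the GOP header begins with the start code `0x000001B8`, so a mapper could scan its block for that pattern and begin decoding at the first hit. The function name and the synthetic block below are assumptions for illustration; a real mapper would also need to handle a start code that straddles two blocks and a GOP that runs past the end of its block.

```python
GOP_START_CODE = b"\x00\x00\x01\xb8"  # MPEG-2 group_start_code


def find_gop_offsets(block):
    """Return the byte offsets of every GOP start code found inside one block."""
    offsets = []
    pos = block.find(GOP_START_CODE)
    while pos != -1:
        offsets.append(pos)
        pos = block.find(GOP_START_CODE, pos + 1)
    return offsets


if __name__ == "__main__":
    # Synthetic bytes standing in for the contents of one HDFS block.
    block = b"\xff" * 5 + GOP_START_CODE + b"\x11" * 10 + GOP_START_CODE + b"\x22"
    print(find_gop_offsets(block))
```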
![Page 24: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/24.jpg)
Distributed Video Analytics Platform on Hadoop
![Page 25: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/25.jpg)
Object Detection with Gaussian Mixture Model
Video data is much noisier than we realize.
You don’t notice it because your visual cortex can denoise.
A computer needs good statistical models (e.g. GMM) for robustness.
Distribution of pixel intensities over time
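A minimal sketch of the idea, assuming a per-pixel mixture of Gaussians in the spirit of adaptive background modeling: each pixel keeps a few Gaussians over its intensity history, and a new value that fits none of the dominant ones is treated as foreground. This is not the webinar's implementation. The class name, the parameters and the fixed learning rate `alpha` are simplifying assumptions.

```python
import math


class PixelGMM:
    """Mixture of Gaussians over one pixel's intensity history (simplified)."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0,
                 match_sigmas=2.5, bg_weight=0.7):
        self.k = k
        self.alpha = alpha
        self.init_var = init_var
        self.min_var = min_var
        self.match_sigmas = match_sigmas
        self.bg_weight = bg_weight
        self.components = []  # each entry is [weight, mean, variance]

    def update(self, x):
        """Fold intensity x into the model; return True if x looks like foreground."""
        # Find the first component within match_sigmas standard deviations of x.
        matched = None
        for comp in self.components:
            if (x - comp[1]) ** 2 <= self.match_sigmas ** 2 * comp[2]:
                matched = comp
                break

        # Decay every weight, then reinforce and adapt the matched component.
        for comp in self.components:
            comp[0] *= 1.0 - self.alpha
        if matched is not None:
            matched[0] += self.alpha
            diff = x - matched[1]
            matched[1] += self.alpha * diff
            matched[2] = max(self.min_var,
                             matched[2] + self.alpha * (diff * diff - matched[2]))
        else:
            # No match: start a low-weight component, replacing the weakest if full.
            new = [self.alpha, float(x), self.init_var]
            if len(self.components) < self.k:
                self.components.append(new)
            else:
                weakest = min(range(self.k), key=lambda i: self.components[i][0])
                self.components[weakest] = new

        total = sum(comp[0] for comp in self.components)
        for comp in self.components:
            comp[0] /= total

        if matched is None:
            return True
        # Background = the most reliable components that together cover bg_weight.
        ranked = sorted(self.components,
                        key=lambda c: c[0] / math.sqrt(c[2]), reverse=True)
        covered = 0.0
        for comp in ranked:
            if comp is matched:
                return False
            covered += comp[0]
            if covered > self.bg_weight:
                break
        return True


if __name__ == "__main__":
    model = PixelGMM()
    for i in range(50):
        model.update(100 + (i % 3 - 1) * 2)      # noisy but stable background
    print(model.update(200), model.update(100))  # sudden bright value, then background
```

The very first sample is always reported as foreground because the model starts empty; a real system would usually spend some frames learning the background before raising events.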
![Page 26: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/26.jpg)
Typical Video Analytics Workflow
Video/image data are highly unstructured
Hadoop has proven excellent at extracting structured insights from Big Data
A typical workflow:
ANALYTIC RESULT
Foreground Extraction
Background Stat Model
Visual Key
Composite Key
Feature Extraction/Classification
((Key, Time), Loc)
![Page 27: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/27.jpg)
Use Case 1: Anomaly Detection
Extracting structured info from Unstructured data
Computer vision algorithms fit into Mapper/Reducer framework
Intermediate (Key, Value)
– (RestrictedArea, IntrusionEvent(Time, ViolatorImage) )
MapReduce diagram: HDFS, Map (x3), Reduce (x2), HDFS / GPDB. Example event time: 2012-09-01 07:00:00
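The (RestrictedArea, IntrusionEvent(Time, ViolatorImage)) pattern can be simulated in plain Python. In this sketch the mapper receives detections that some upstream vision step has already produced, and the reducer collects the events for each area in time order. The area name, coordinates, `crop-*` image references and function names are all hypothetical; an actual job would run on the Hadoop MapReduce API instead of in-process.

```python
from collections import defaultdict

# Hypothetical restricted areas: name -> (x0, y0, x1, y1) in pixel coordinates.
RESTRICTED_AREAS = {"gate": (0, 0, 100, 50)}


def map_frame(timestamp, detections):
    """Mapper: emit (area, (timestamp, image_ref)) for each detection inside an area."""
    for x, y, image_ref in detections:
        for area, (x0, y0, x1, y1) in RESTRICTED_AREAS.items():
            if x0 <= x < x1 and y0 <= y < y1:
                yield area, (timestamp, image_ref)


def reduce_area(area, events):
    """Reducer: return the intrusion events of one area ordered by time."""
    return area, sorted(events)


def run_job(frames):
    """Tiny in-process stand-in for the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for timestamp, detections in frames:
        for key, value in map_frame(timestamp, detections):
            grouped[key].append(value)
    return dict(reduce_area(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    frames = [
        ("2012-09-01 07:00:01", [(10, 10, "crop-2")]),
        ("2012-09-01 07:00:00", [(20, 20, "crop-1"), (500, 500, "crop-x")]),
    ]
    print(run_job(frames))
```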
![Page 28: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/28.jpg)
Use Case 2: Trajectory Analysis
Tracking multiple objects in Big Data video archives
Building high-level summaries, e.g. time series of moving trajectories
(Figure: frames at times T1, T2, T3, T4, T5, T6)
![Page 29: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/29.jpg)
Use Case 2: Trajectory Analysis “Map”
Map
Foreground Extraction
Background Stat Model
Visual Key
Composite Key
Feature Extraction/Classification
((VisKey, time), loc)
Emit(K,V)
![Page 30: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/30.jpg)
Use Case 2: Trajectory Analysis “Reduce”
Reduce
Aggregate
User-defined trajectory model
(Object, Trajectory)
2nd Sort on Composite key
((VisKey, time), loc)
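The secondary sort on the composite key can be sketched as follows: records keyed by (VisKey, time) are ordered by the full key but grouped by VisKey alone, so each reduce call sees one object's locations already in time order. In Hadoop this ordering would come from the framework's sort and grouping configuration. Here a plain `sorted` call stands in for it, and the function name and sample records are assumptions.

```python
from itertools import groupby


def reduce_trajectories(records):
    """Turn ((vis_key, time), loc) records into {vis_key: [loc, ...]} ordered by time."""
    # Sorting by the whole composite key stands in for the framework's secondary sort.
    ordered = sorted(records, key=lambda record: record[0])
    # Grouping by the first key field mimics grouping on the natural key only.
    return {
        vis_key: [loc for _, loc in group]
        for vis_key, group in groupby(ordered, key=lambda record: record[0][0])
    }


if __name__ == "__main__":
    records = [
        (("car", 2), (5, 5)),
        (("person", 1), (0, 0)),
        (("car", 1), (4, 4)),
        (("car", 3), (6, 7)),
    ]
    print(reduce_trajectories(records))
```

A user-defined trajectory model, as named on the slide, would then consume each ordered list of locations.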
![Page 31: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/31.jpg)
Video Analytics Platform
Supports Video ETL
– Support standard formats: MPG, AVI, MP4.
– Sequence file in HDFS
Image Processing Toolkit
– Support standard formats (e.g. JPEG, BMP, PNG)
– Color space conversion
– Edge/key point detection
– Morphological processing
– Filtering: convolutional, median, etc.
PHD MapReduce for scalable computer vision algorithms
HAWQ SQL for high level analytics
![Page 32: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/32.jpg)
Video Analytics Demo
![Page 33: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/33.jpg)
Performance Quick Facts
Processing one 720x576 video frame takes 103 milliseconds (near real time, even in Java)
Detection algorithm: scales linearly with the number of processors
Impacts:
• Enhance public security
• Improve security officers’ productivity
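The two performance figures combine into a simple back-of-the-envelope estimate. The sketch below assumes ideal linear scaling, as the slide claims for the detection algorithm. The 25 frames-per-second video rate and the worker count in the example are hypothetical inputs, not numbers from the webinar.

```python
FRAME_MS = 103.0  # per-frame processing time quoted on the slide


def frames_per_second(workers=1, frame_ms=FRAME_MS):
    """Aggregate throughput, assuming perfectly linear scaling across workers."""
    return workers * 1000.0 / frame_ms


def hours_to_process(video_hours, video_fps, workers):
    """Wall-clock hours needed to analyze video_hours of footage recorded at video_fps."""
    total_frames = video_hours * 3600 * video_fps
    return total_frames / frames_per_second(workers) / 3600


if __name__ == "__main__":
    print(round(frames_per_second(), 2), "frames/s on one worker")
    print(round(hours_to_process(24, 25, 64), 2), "hours for one day of video on 64 workers")
```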
![Page 34: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/34.jpg)
Querying the Analytics Results
Average speed of the red car yesterday, using a window function
SELECT sqrt(power(avg(abs(x_diff)), 2) + power(avg(abs(y_diff)), 2)) * FPS_MPS_FACTOR
FROM (
    SELECT X - lag(X, 1) OVER (ORDER BY TIME) AS x_diff,
           Y - lag(Y, 1) OVER (ORDER BY TIME) AS y_diff
    FROM SANMATEO
    WHERE TARGET = /* target value missing in the transcript */
      AND TIME > (CURRENT_TIMESTAMP - INTERVAL '1' DAY)
      AND TIME < CURRENT_TIMESTAMP
) x_tmp;
RESULT:
7.2 mph
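To make the window-function logic of the query concrete, here is a plain Python equivalent under stated assumptions: positions are (time, x, y) tuples, `lag(..., 1)` becomes a pairing of each row with its predecessor, and `factor` stands in for the query's `FPS_MPS_FACTOR`, whose value the slides do not give. The first row has no predecessor, which mirrors SQL's NULL lag being skipped by `avg`.

```python
from math import sqrt


def average_speed(points, factor=1.0):
    """Mirror the query: lag-1 differences ordered by time, then combine the averages."""
    ordered = sorted(points)                      # ORDER BY TIME
    pairs = list(zip(ordered, ordered[1:]))       # each row with its lag-1 row
    if not pairs:
        return None                               # fewer than two rows: nothing to average
    x_diff = [abs(cur[1] - prev[1]) for prev, cur in pairs]
    y_diff = [abs(cur[2] - prev[2]) for prev, cur in pairs]
    avg_x = sum(x_diff) / len(x_diff)
    avg_y = sum(y_diff) / len(y_diff)
    return sqrt(avg_x ** 2 + avg_y ** 2) * factor


if __name__ == "__main__":
    track = [(0, 0, 0), (1, 3, 4), (2, 6, 8)]     # hypothetical (time, x, y) samples
    print(average_speed(track))
```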
![Page 35: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/35.jpg)
More Use Cases
Most computer vision algorithms are embarrassingly parallel
No data sharing between processes
– Feature extraction
– Object detection/classification
Video Categorization for user-generated content
– Find trending topics in YouTube videos with topic modeling
Object Detection
– Detect known categories of objects, e.g. face, bar code, vehicle.
Object Search
– Given a known object, use template matching to locate the object
Haar-like + AdaBoost Cascade Face Detector
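The object search case can be sketched with the simplest form of template matching: slide the template over the image and keep the position with the smallest sum of squared differences. This is a generic illustration and not the toolkit's implementation. Images are plain nested lists, and the function name is an assumption.

```python
def match_template(image, template):
    """Return (row, col) of the window with the smallest sum of squared differences."""
    image_h, image_w = len(image), len(image[0])
    templ_h, templ_w = len(template), len(template[0])
    best_score, best_pos = None, None
    for r in range(image_h - templ_h + 1):
        for c in range(image_w - templ_w + 1):
            # Each window is scored independently of every other window.
            score = sum((image[r + i][c + j] - template[i][j]) ** 2
                        for i in range(templ_h) for j in range(templ_w))
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


if __name__ == "__main__":
    image = [[0, 0, 0, 0],
             [0, 0, 9, 8],
             [0, 0, 7, 6],
             [0, 0, 0, 0]]
    print(match_template(image, [[9, 8], [7, 6]]))
```

Because windows, and likewise frames, are scored without shared state, the work splits cleanly across mappers.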
![Page 36: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/36.jpg)
Summary
Hadoop: a great tool for data scientists to crunch Unstructured Big Data!
Hadoop extracts Structured insights from Unstructured video with customized computer vision algorithms.
Scalable framework with ease of experimenting, developing, deploying!
Pivotal HD demonstrates large scale video analytics use cases:
– Anomaly detection
– Trajectory analysis
– More …
![Page 37: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/37.jpg)
Q&A
![Page 38: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/38.jpg)
More Information
Pivotal Blog Site August 12, 2013
Large Scale Video Analytics
Contact the Data Science Lab Services
![Page 39: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/39.jpg)
Thank You
![Page 40: Video Analytics on Hadoop webinar victor fang-201309](https://reader034.fdocuments.in/reader034/viewer/2022051412/54c637e54a79594e588b45a7/html5/thumbnails/40.jpg)
A NEW PLATFORM FOR A NEW ERA