Panel Discussion – Data Analytics and Computing Challenges
Venkat Gudivada, East Carolina University, Greenville, NC, USA
Torsten Ullrich, Fraunhofer Austria Research GmbH, Austria
Maaike de Boer, TNO & Radboud University, the Netherlands
Nuccio Piscopo, Engineering Ingegneria Informatica S.p.A., Italy
Jolon Faichney, Griffith University, Australia
Florence Nicol, ENAC, France
Gudivada Data Analytics and Computing Challenges 1/11
Evolution of Data Analytics
SQL Analytics: RDBMS, OLTP, and OLAP
Business Analytics: Business Intelligence (BI), Data Warehousing (OLAP Cubes, OLAP Servers), and Data Mining
Visual Analytics
Big Data Analytics
Cognitive Analytics
Traffic Analytics, Text Analytics, Spatial Analytics, Risk Analytics, and Graph Analytics
Data Science
Types of Data Analytics
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Data Challenges for Data Analytics
Data quality
Data provenance
Differential privacy
Big data-driven machine learning applications pose unique challenges
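To make the differential privacy item concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is an illustration only, not something taken from the panel: the `ages` list, the `epsilon` value, and the function names are all assumptions chosen for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential samples with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    # A counting query has sensitivity 1: adding or removing one record
    # changes the true count by at most 1, so the noise scale is 1 / epsilon.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data, used only to exercise the functions.
ages = [23, 35, 41, 29, 62, 57, 33]
rng = random.Random(0)
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(f"noisy count of ages >= 40: {noisy:.2f}")
```

A smaller `epsilon` gives stronger privacy but a noisier answer, which is one way the privacy requirement interacts with the data quality item on the same slide.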
Machine Learning Challenges
Data sparsity in feature space
Data correlations
Parallelization
Decision trees, Bagging/Bootstrapped Aggregation, Random Forests, and Boosted Trees
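As a rough illustration of bagging (bootstrapped aggregation), the sketch below trains several one-feature threshold classifiers ("stumps") on bootstrap samples and combines them by majority vote. The toy data set, the stump learner, and all function names are assumptions made for this example; real systems would typically use a library implementation of decision trees.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a threshold classifier that predicts 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(points, n_models, rng):
    """Train each stump on a bootstrap sample drawn with replacement."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def predict(models, x):
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Hypothetical (feature, label) pairs.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
models = bagged_stumps(data, n_models=25, rng=random.Random(42))
print(predict(models, 0.2), predict(models, 5.0))
```

Because each model is trained independently on its own sample, the training loop is a natural candidate for the parallelization item listed above.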
Computing Challenges for Data Analytics
High volume data
Streaming data
Real-time analytics
In-memory analytics
Incremental computation
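One way to picture incremental computation over streaming data is an online update of the mean and variance, so that each new value is folded in without revisiting earlier ones. The sketch below uses Welford's algorithm; the class name and the sample stream are assumptions for illustration.

```python
class RunningStats:
    """Mean and sample variance maintained one observation at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the updated mean.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical stream
    stats.update(value)
print(stats.mean, stats.variance)
```

Only three numbers are kept in memory regardless of stream length, which is why this style of update suits high-volume and real-time settings.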
Panel Summary - Data Analytics Challenges
Data quality, differential privacy, and provenance
Data heterogeneity
Information extraction from multimedia big data
Reproducibility of analysis
Leveraging open and linked data
Functional data analysis to overcome the inadequacy of multivariate statistical techniques
ALLDATA 2015, 17-25 April, Barcelona, Spain
Anomaly Detection Using Deep Learning
Dr Jolon Faichney
School of Information and Communication Technology
Griffith University, Australia
What is Anomaly Detection?
• Historically, faults were detected by analysing logs
• Today, logs are too large to analyse manually in real time
• Changes in data may indicate that a fault will occur before it has occurred
• What is considered an anomaly may change over time
Machine Temperature
[Figure: machine temperature over time, annotated with an intended shutdown, a symptom, and a failure]
Amazon Web Services
• Can you pick the anomalies?
Anomaly Algorithms
• Etsy.com
◦ Skyline
◦ A set of simple detectors and a voting scheme
• Twitter
◦ AdVec
◦ Can detect short- and long-term trends
• Numenta
◦ HTM
◦ Hierarchical Temporal Memory
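The idea of "a set of simple detectors and a voting scheme" can be sketched as follows. This is not Skyline's actual code: the three detectors, their thresholds, the `consensus` parameter, and the sample readings are assumptions chosen to show the voting pattern.

```python
import statistics


def three_sigma(history, value):
    # Flags values more than three standard deviations from the mean.
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    return stdev > 0 and abs(value - mean) > 3 * stdev


def median_deviation(history, value):
    # Flags values far from the median, measured in median absolute deviations.
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    return mad > 0 and abs(value - median) > 6 * mad


def range_check(history, value):
    # Flags values outside the range seen so far.
    return value > max(history) or value < min(history)


DETECTORS = (three_sigma, median_deviation, range_check)


def is_anomalous(history, value, consensus=2):
    # A point is reported only when at least `consensus` detectors agree.
    votes = sum(bool(detector(history, value)) for detector in DETECTORS)
    return votes >= consensus


# Hypothetical temperature readings.
history = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7]
print(is_anomalous(history, 20.0), is_anomalous(history, 35.0))
```

Requiring agreement among detectors trades some sensitivity for fewer false alarms from any single noisy heuristic.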
Hierarchical Temporal Memory
Anomaly Data Set
• NAB – Numenta Anomaly Benchmark
◦ AWS CloudWatch
◦ Machine Temperature Sensor
◦ NYC Taxi
◦ Tweets
◦ Traffic
◦ AdExchange
◦ Artificial Data
Results
| Detector | Standard | Reward Low FP | Reward Low FN |
|---|---|---|---|
| Numenta HTM | 64.7 | 56.5 | 69.3 |
| Twitter AdVec | 47.1 | 33.6 | 53.5 |
| Template Matching | 41.02 | 43.15 | 38.44 |
| Etsy Skyline | 35.7 | 27.1 | 44.5 |
| Random | 16.8 | 5.8 | 25.9 |
| Null | 0 | 0 | 0 |
Topics for Discussion
• Can machines reliably find anomalies?
• Can machine learning be implemented for real-time anomaly detection at scale?
THE FUTURE OF MULTIMEDIA SYSTEMS
Panel on Data Analytics and Computing Challenges | Maaike de Boer
MULTIMEDIA SYSTEMS NEED TO BE SELF-EXPLAINABLE DESPITE (POSSIBLY) LOWER PERFORMANCE
2 | The Future of Multimedia Systems
High-performing deep learning systems
vs.
Lower-performing explainable systems
Or can we use the best of both (and how)?
SCALABLE SOLUTIONS
Assume a user query in a multimedia system has no match among the pre-trained detectors (the words used to index an item)
What to do?
• We should pre-train as many concept detectors as possible (as opposed to a few high-performing detectors) to have some match
• We should focus on semantic decomposability of a query
• Other suggestions?
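A very small sketch of what decomposing a query and matching its parts against available concept detectors might look like. Everything here is hypothetical: the detector vocabulary, the hand-written `RELATED` map standing in for a semantic similarity model, and the stopword list are assumptions, not part of any system described in the panel.

```python
# Hypothetical set of concepts for which pre-trained detectors exist.
DETECTORS = {"dog", "ball", "grass", "person", "car"}

# Hypothetical stand-in for a semantic similarity model:
# maps a query term to the closest available concept.
RELATED = {"puppy": "dog", "frisbee": "ball", "park": "grass"}

STOPWORDS = {"a", "the", "in", "on", "with"}


def decompose(query: str) -> list:
    """Split a free-text query into candidate concept terms."""
    return [word for word in query.lower().split() if word not in STOPWORDS]


def match_detectors(query: str):
    """Map each query term to a detector, directly or via a related concept."""
    matched, unmatched = {}, []
    for term in decompose(query):
        if term in DETECTORS:
            matched[term] = term
        elif term in RELATED:
            matched[term] = RELATED[term]
        else:
            unmatched.append(term)
    return matched, unmatched


print(match_detectors("a puppy with a frisbee in the park"))
print(match_detectors("dog on skateboard"))
```

The `unmatched` list makes the gap explicit: a larger detector vocabulary shrinks it, while better decomposition and similarity matching make more use of the detectors that already exist.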
www.eng.it
Nuccio Piscopo
Data Scientist - Big Data & Analytics Competency Center
Engineering Ingegneria Informatica S.p.A.
Data Analytics and Computing Challenges
Panel at AllData Conference, Venice – April 26, 2017
Big Data: the Source-Service
Big Data Analytics:
• Logic (data intelligence) moves to the functional programming paradigm
• Data transfer from structured/unstructured/semi-structured sources runs on dataframes
… so, might data modeling change by design elements/constructs?
Metadata:
• Vector construct V = (v1, v2, v3, …). Vector elements map the heterogeneous data topology.
Metamodel:
• Set of vectors covering the sources' morphology through explicit formal specifications of the terms and relationships in the data source domain (ontology)
Prescriptive Metamodel Framework – Ontology vs. Vectors
Structured/Unstructured
Category: dataflow, datasource, dataset, spare source
Element: time, sourcetype, entitytype, provenance, destination, ext, …
Construct: vector F = [f_{i,j}], i, j ∈ N
Category: record, table, spare info
Element: data records, fields record, spare field
Construct: vector R = [r_{i,j}], i, j ∈ N
Category: Datafile size, frequency, transferring method, owner, approvals, …
Element: dataflow properties, source properties
Construct: vector P = [p_{i,j}], i, j ∈ N
Functional Layer - Vector Identifier
Ontology
Data Mapping Layer - Metamodel
⟨F| = (Source, Date, Category, Type, Destination, Version, …)
⟨R| = (Checkdate, AthleteID, Age, Height, Weight, …)
…
Category: metadata
Element: vectors
Construct: metamodel M_v = {f_{i,j}, r_{i,q}, p_{i,l}}
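One possible reading of the vector constructs above is sketched below with plain data classes. The field names follow the ⟨F| and ⟨R| examples on the slide; the class names, the `metamodel` helper, and all sample values are placeholders invented for illustration, and the framework itself may represent these constructs quite differently.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class FlowVector:
    """Metadata vector F describing a dataflow (fields from the slide)."""
    source: str
    date: str
    category: str
    type_: str
    destination: str
    version: str


@dataclass(frozen=True)
class RecordVector:
    """Metadata vector R describing a record layout (fields from the slide)."""
    checkdate: str
    athlete_id: str
    age: int
    height: float
    weight: float


def metamodel(*vectors):
    """Collect the element names of each vector into one mapping."""
    return {type(v).__name__: [f.name for f in fields(v)] for v in vectors}


# Placeholder values with no real meaning.
f = FlowVector("sensor-feed", "2017-04-26", "dataflow", "csv", "data-lake", "1")
r = RecordVector("2017-04-26", "A-001", 30, 1.80, 75.0)

print(astuple(f))
print(metamodel(f, r))
```

Keeping the vectors immutable (`frozen=True`) fits the functional programming direction mentioned earlier in the deck, though that pairing is a design choice of this sketch.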
Prescriptive Analytics – Machine-Learning
Ingestion Layer
Sources Layer
Data-Lake
Forward-Looking / Backward-Looking
Distillation, Data Quality, Source morphology changes, Data topology anomalies, …
Service variability, Crowding fluctuation, Variables behaviour, Data gauges instability, …
Methods / Algorithms: MBBEFD, SVM, SOM-Kohonen, LinkedMatrix, Cstab, …, Density-Based, Connectivity, State Space, …
Machine-Learning
Understanding (“prescriptively”) Services Changes
Prescriptive Analytics – Engine
Service Layer
Ingestion Layer
Sources Layer
Prescriptive on-the-fly analytics:
• Simulation by vectors metamodels
• Aggregation status by dataframes
• Verify variables behaviour
• Verify services gauges deviations
Bulk analytics:
• Compare function by prescriptive directions
• Start conditional statistics
• Verify deviations on mass-crowding
• Trace data aggregation instability
Data-Lake
Conclusion – Future Work
The Prescriptive Metamodel Framework introduces vector data modelling as an extended construct for dataframe metamodels in Big Data systems. Analytics running on vector metadata enable on-the-fly tracking of service gauge changes and machine-learning analytics through a new formalism. Prescriptive analytics runs in both forward-looking and backward-looking modes.
Future Work:
• Consolidate the framework as a Prescriptive Analytics Solution
• Extend Vector Modeling into a general construct of Data Models for structured, unstructured and semi-structured information
• Extend the vector mathematical method as a practice for Big Data analytics
All trademarks, trade names, service marks and logos referenced herein belong to their respective companies/offices. Data and cases included in the presentation are trial examples with no real values.
@EngineeringSpa
Engineering Ingegneria Informatica Spa
gruppo.engineering
www.eng.it
Documentation and Traceability
Data Analytics and Computing Challenges
Torsten Ullrich
Fraunhofer Austria Research GmbH, Visual Computing & Technische Universität Graz, Austria
Panel on ALLDATA & MMEDIA & KESA
Documentation and Traceability
Image Sources: Prana Fistianduta, CC3.0, Wikimedia Commons; Huntster, NASA, Wikimedia Commons;Emgonzalez, Public Domain, Wikimedia Commons; Chaos, CC3.0 / GNU Free Documentation License, Wikimedia Commons
Institut für ComputerGraphik und WissensVisualisierung
Documentation and Traceability
Open Access, Open Data & Open Science
Open Problems: future reproducibility
1 physical layer / hardware layer
2 hardware abstraction layer
3 operating system call interface
4 system libraries & software frameworks
5 application layer & system environment
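As one small, practical step toward future reproducibility, an analysis can record a snapshot of the layers it ran on. The sketch below uses only Python's standard library; the grouping of fields into layers, the function name, and the choice of `pip` as the example package are assumptions for illustration. The hardware abstraction layer is not directly visible from Python, so it is not captured here.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages=()):
    """Record what can be queried about the layers an analysis ran on."""
    libraries = {}
    for name in packages:
        try:
            libraries[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            libraries[name] = None  # package not installed in this environment
    return {
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
        },
        "operating_system": {
            "system": platform.system(),
            "release": platform.release(),
        },
        "libraries": libraries,
        "application": {
            "python": platform.python_version(),
            "executable": sys.executable,
        },
    }


snapshot = environment_snapshot(packages=["pip"])
print(json.dumps(snapshot, indent=2))
```

Storing such a snapshot next to the results documents the environment, but it does not by itself guarantee that the same environment can be rebuilt later, which is the open problem the slide points to.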