Polyglot Processing - An Introduction 1.0
![Page 1: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/1.jpg)
POLYGLOT PROCESSING – AN INTRODUCTION
Dr. Mohan K. Bavirisetty
Chief Scientist
Modern Renaissance
![Page 2: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/2.jpg)
Agenda
1. Big Data Landscape
2. Lambda vs. Kappa Architecture
3. Spark vs. Storm vs. Flink
4. Demo 1 – Apache Spark
5. Demo 2 – Storm, Kafka and Redis
6. Demo 3 – Flink with Data Stream API?
7. Summary
8. Questions
“The purpose of computing is insight, not numbers.” – Richard Hamming
![Page 3: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/3.jpg)
BIG DATA LANDSCAPE
![Page 4: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/4.jpg)
What is Big Data?
Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Source: Gartner Research
![Page 5: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/5.jpg)
![Page 6: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/6.jpg)
What is a Real-time Analytics Platform?
![Page 7: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/7.jpg)
3 Common Kinds of Workloads
1. Batch Operations
2. Micro-batch Operations
3. Real-time Streaming
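As a rough illustration of how the three workload kinds differ, the sketch below computes the same running total over one small list of events in three ways: all at once (batch), in fixed-size chunks (micro-batch), and one event at a time (streaming). The event values, the chunk size and the function names are invented for this example and are not taken from any of the engines discussed in this deck.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical event values; purely illustrative.
EVENTS = [3, 1, 4, 1, 5, 9, 2, 6]


def batch_total(events: List[int]) -> int:
    """Batch: the full data set is available, so process it in one pass."""
    return sum(events)


def micro_batch_totals(events: Iterable[int], size: int) -> Iterator[int]:
    """Micro-batch: emit an updated running total after every `size` events."""
    it = iter(events)
    total = 0
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        total += sum(chunk)
        yield total


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    """Streaming: emit an updated running total after every single event."""
    total = 0
    for value in events:
        total += value
        yield total


if __name__ == "__main__":
    print(batch_total(EVENTS))
    print(list(micro_batch_totals(EVENTS, 3)))
    print(list(streaming_totals(EVENTS)))
```

All three approaches agree on the final total; they differ in how soon and how often intermediate results become visible.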
“Evidence-based decision-making (aka Big Data) is not just the latest fad, it's the future of how we are going to guide and grow business.” – Kristian Hammond, CTO, Narrative Science
![Page 8: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/8.jpg)
8 Requirements of Real-time Computing
Keep Data Moving
Allow SQL Queries
Handle Stream Imperfections
Generate Predictable Outcomes
Integrate Streaming Data and Stored Data
Guarantee Data Safety and Availability
Partition and Scale Applications Automatically
Process and Respond Instantaneously
![Page 9: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/9.jpg)
How do major data engines compare?
![Page 10: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/10.jpg)
Real-time Streaming Architecture
![Page 11: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/11.jpg)
Berkeley Data Analytics Stack
![Page 12: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/12.jpg)
Polyglot
• One who is versed in many languages
Polyglot Programming
• Different languages, frameworks and services
• Example: Java with Scala, Clojure inside Trident
Polyglot Persistence
• Capacity to store data in multiple formats
• Structured, document, log, GPS
Polyglot Processing
• Refers to the capability to process any kind of data, any kind of workload, any kind of workflow
![Page 13: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/13.jpg)
![Page 14: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/14.jpg)
LAMBDA VS. KAPPA ARCHITECTURES
![Page 15: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/15.jpg)
Lambda Architecture
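The Lambda Architecture is commonly described as three layers: a batch layer that recomputes views from the full master data set, a speed layer that covers data that arrived since the last batch run, and a serving layer that merges both at query time. The sketch below shows that shape in plain Python, not in any particular engine. The class name, method names and page-view data are hypothetical, and the batch run is treated as instantaneous, which a real system cannot assume.

```python
from collections import Counter
from typing import List


class LambdaCounts:
    """Toy page-view counter with a batch view and a real-time view."""

    def __init__(self) -> None:
        self.master: List[str] = []              # append-only master data set
        self.batch_view: Counter = Counter()     # rebuilt from scratch by the batch layer
        self.realtime_view: Counter = Counter()  # updated per event by the speed layer

    def ingest(self, page: str) -> None:
        """New data goes to both the master data set and the speed layer."""
        self.master.append(page)
        self.realtime_view[page] += 1

    def run_batch(self) -> None:
        """Batch layer: recompute the view over all data, then reset the speed layer."""
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()

    def query(self, page: str) -> int:
        """Serving layer: merge the batch view with the real-time view."""
        return self.batch_view[page] + self.realtime_view[page]


if __name__ == "__main__":
    counts = LambdaCounts()
    for page in ["home", "home", "cart"]:
        counts.ingest(page)
    counts.run_batch()
    counts.ingest("home")
    print(counts.query("home"))
```

The same counting logic appears twice, once in `ingest` and once in `run_batch`. Maintaining two code paths for one result is the main cost usually attributed to this architecture.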
![Page 16: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/16.jpg)
What is Apache Storm?
Apache Storm is a free and open source distributed real-time computation system. It makes it easy to reliably process unbounded streams of data.
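Storm models a computation as a topology of spouts (stream sources) and bolts (processing steps). The sketch below imitates that shape with plain Python generators to show what processing an unbounded stream tuple by tuple looks like. It does not use Storm's API; the function names and sample sentences are invented for illustration.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout() -> Iterator[str]:
    """Spout stand-in: an endless source of sentences (hypothetical sample data)."""
    return cycle(["the cow jumped", "the moon rose"])


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt stand-in: split each incoming sentence into words."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt stand-in: keep running counts and emit (word, count) for each word."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def run_topology(max_tuples: int) -> Dict[str, int]:
    """Wire spout -> split -> count and stop after `max_tuples` outputs for the demo."""
    latest: Dict[str, int] = {}
    stream = count_bolt(split_bolt(sentence_spout()))
    for word, count in islice(stream, max_tuples):
        latest[word] = count
    return latest


if __name__ == "__main__":
    print(run_topology(6))
```

The source never ends, so the demo cuts the stream off after a fixed number of tuples. A real topology would keep running and would distribute each step across workers.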
![Page 17: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/17.jpg)
Why Apache Storm?
Storm is fast, horizontally scalable, fault-tolerant, easy to set up and operate, and programming-language agnostic.
![Page 18: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/18.jpg)
Apache Storm
![Page 19: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/19.jpg)
Apache Storm can be used to realize an APM Use Case
![Page 20: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/20.jpg)
Apache Spark
Apache Spark is a fast and general engine for large-scale data processing.
• Spark is fast
• Spark is easy
• Spark is extensible
![Page 21: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/21.jpg)
Lambda Implementation with Spark
![Page 22: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/22.jpg)
Kappa Architecture
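The Kappa Architecture is usually described as dropping the separate batch layer: all data lives in an append-only, replayable log, a single stream job builds the serving view, and reprocessing means replaying the log through a new version of that job. The sketch below is a minimal illustration in plain Python. `EventLog`, `build_view`, the two job versions and the sample events are hypothetical names and data, not part of any engine's API.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]   # (user, purchase amount); illustrative schema
View = Dict[str, int]


class EventLog:
    """Append-only log that can be replayed from any offset."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def replay(self, from_offset: int = 0) -> List[Event]:
        return list(self._events[from_offset:])


def build_view(log: EventLog, job: Callable[[View, Event], None]) -> View:
    """Run one stream job over the whole log to build (or rebuild) its view."""
    view: View = {}
    for event in log.replay():
        job(view, event)
    return view


def job_v1(view: View, event: Event) -> None:
    """Version 1 of the job: count purchases per user."""
    user, _amount = event
    view[user] = view.get(user, 0) + 1


def job_v2(view: View, event: Event) -> None:
    """Version 2 of the job: total purchase amount per user."""
    user, amount = event
    view[user] = view.get(user, 0) + amount


if __name__ == "__main__":
    log = EventLog()
    for e in [("ann", 5), ("bob", 7), ("ann", 3)]:
        log.append(e)
    print(build_view(log, job_v1))
    print(build_view(log, job_v2))  # reprocessing: same log, new job
```

Changing the logic requires only a new job and a replay; there is no second batch code path to keep in sync.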
![Page 23: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/23.jpg)
Apache Flink
![Page 24: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/24.jpg)
Apache Flink has a unified runtime engine
![Page 25: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/25.jpg)
![Page 26: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/26.jpg)
DEMONSTRATION
![Page 27: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/27.jpg)
SUMMARY
![Page 28: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/28.jpg)
Summary
• Big Data challenges are being met with new and innovative approaches and architectures.
• Lambda Architecture is a pragmatic near-term solution. Fidelity is already implementing it.
• Kappa Architecture could turn out to be the long-term elegant solution to Polyglot Processing.
• Apache Spark, Storm and Flink have their strengths and niche areas of applicability.
• Apache SAMOA, Apache Zeppelin and Tachyon add further value by providing additional capabilities.
![Page 29: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/29.jpg)
Working Toward Analytics Mastery
[Figure: Maturity vs. Time; stages shown: Descriptive, Predictive, Preventive/Prescriptive]
![Page 30: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/30.jpg)
Next Stage of Data Explosion
![Page 31: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/31.jpg)
QUESTIONS?
We do not learn by inference and deduction and the application of mathematics to philosophy, but by direct intercourse …
- Henry David Thoreau
![Page 32: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/32.jpg)
THANK YOU
![Page 33: Polyglot Processing - An Introduction 1.0](https://reader035.fdocuments.in/reader035/viewer/2022062412/589bcaa31a28ab92618b46c1/html5/thumbnails/33.jpg)
Appendix - References and Resources
• 8 Requirements of Real-time Stream Processing http://cs.brown.edu/~ugur/8rulesSigRec.pdf
• Design Patterns for Real-Time Streaming Analytics http://strataconf.com/big-data-conference-ca-2015/public/schedule/detail/38774
• Big Data: Principles and Best Practices of Scalable Real-time Data Systems. http://bit.ly/1LscB7z
• Real-time Stream Processing Next-Step for Apache Flink http://www.confluent.io/blog/2015/05/06/real-time-stream-processing-the-next-step-for-apache-flink/
• SAMOA – Scalable Advanced Massive Online Analysis http://jmlr.csail.mit.edu/papers/volume16/morales15a/morales15a.pdf
• Lambda Architecture http://lambda-architecture.net/
• Kappa Architecture http://www.kappa-architecture.com/
• Apache Spark http://spark.apache.org/
• Apache Storm https://storm.apache.org/
• Apache Flink https://flink.apache.org/
• Apache SAMOA https://samoa.incubator.apache.org/
• Apache Zeppelin https://zeppelin.incubator.apache.org/
• Tachyon http://tachyon-project.org/