Scalability and Big Data at Senzari
Uploaded by chris-boos
![Page 1: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/1.jpg)
SCALABILITY AND DATA ANALYTICS MATTER
HCB (@boosc)
![Page 2: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/2.jpg)
Agenda
• Buzzword bingo
• Data
• Analytics
• Scalability
• Distributed and parallel concepts
• Technology and tools
• Senzari and big data
![Page 3: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/3.jpg)
Buzzword Bingo
Big Data, Data Engineer
H-Space
Hadoop, Cassandra, HBase, Pig, redis.io, Eucalyptus
Machine Learning, Support Vector Machines
Gaussian Processes, Swarm Intelligence
Genetic Algorithms
Agents/Bots
R+Natural Language Processing
Clustering, Core Dataset
NoStats
![Page 4: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/4.jpg)
Data, lots of it
![Page 5: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/5.jpg)
One iPhone has 79 times more CPU power than was used in the Apollo missions
![Page 6: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/6.jpg)
What we can do
![Page 7: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/7.jpg)
Data
![Page 8: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/8.jpg)
Knowledge pyramid
• Data Processing (1950s-1960s): Data
Data:
Unfiltered, Research, Creation, Gathering
![Page 9: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/9.jpg)
Knowledge pyramid
• Data Processing (1950s-1960s): Data
• Information Management (1970s-1980s): Information
Information:
Organized Data, Patterns, Presentation
![Page 10: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/10.jpg)
Knowledge pyramid
• Data Processing (1950s-1960s): Data
• Information Management (1970s-1980s): Information
• Knowledge Management (1990s): Knowledge
Knowledge:
Useful Patterns, Predictability, Conversation
![Page 11: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/11.jpg)
Knowledge pyramid
• Data Processing (1950s-1960s): Data
• Information Management (1970s-1980s): Information
• Knowledge Management (1990s): Knowledge
• Knowledge Ecology (2000s): Intelligence
Intelligence:
Choice, Understanding, Decision
![Page 12: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/12.jpg)
Knowledge pyramid
• Data Processing (1950s-1960s): Data
• Information Management (1970s-1980s): Information
• Knowledge Management (1990s): Knowledge
• Knowledge Ecology (2000s): Intelligence
• Systems Thinking (2010s): Wisdom
Wisdom:
Evaluation, Interpretation, Retrospective
![Page 13: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/13.jpg)
Knowledge pyramid
• Data Processing (1950s-1960s): Data
• Information Management (1970s-1980s): Information
• Knowledge Management (1990s): Knowledge
• Knowledge Ecology (2000s): Intelligence
• Systems Thinking (2010s): Wisdom
Yield
![Page 14: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/14.jpg)
Why you need big data
• Data Processing (1950s-1960s): Data
• Information Management (1970s-1980s): Information
• Knowledge Management (1990s): Knowledge
• Knowledge Ecology (2000s): Intelligence
• Systems Thinking (2010s): Wisdom
Yield
You are here!
![Page 15: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/15.jpg)
Analytics
![Page 16: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/16.jpg)
Even in simple datasets, common statistics (average, min, max, distribution) fail
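A minimal sketch of that point, using two made-up datasets (the numbers are illustrative assumptions, not data from the talk). Both produce the same average, minimum and maximum, yet their shapes are clearly different once you count the values:

```python
from statistics import mean

# Two hypothetical datasets, invented for this illustration.
steady = [0, 5, 5, 5, 10]      # most values sit in the middle
polarized = [0, 0, 5, 10, 10]  # most values sit at the extremes

def summary(values):
    """Return the (average, min, max) triple of a dataset."""
    return mean(values), min(values), max(values)

def histogram(values):
    """Count how often each value occurs, i.e. the shape of the data."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return counts

# The summaries are identical, the histograms are not.
same_summary = summary(steady) == summary(polarized)
same_shape = histogram(steady) == histogram(polarized)
print(same_summary, same_shape)
```

Looking only at the summary triple, the two datasets are indistinguishable, which is why plotting comes first.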
![Page 17: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/17.jpg)
Finding clusters, evaluating outliers and interpreting white noise
![Page 18: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/18.jpg)
Two tips for looking at data:
1. Plot it
2. Remove all labels
![Page 19: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/19.jpg)
Scalability
![Page 20: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/20.jpg)
Cloud Computing Is
When the IT guys are finally able to explain to business
people what they were talking about 20 years ago!
![Page 21: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/21.jpg)
=
![Page 22: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/22.jpg)
Computation on demand
+ Pay as you go
![Page 23: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/23.jpg)
BASE (Basically Available, Soft State, Eventual Consistency)
not
ACID (Atomicity, Consistency, Isolation, Durability)
![Page 24: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/24.jpg)
How to scale (AWS Example)
• Do not allocate instances manually
• Each component needs to be independent
• Plan for failure
• Actively provoke failure
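The last two points can be sketched in a few lines. This is only an illustration under stated assumptions: `provoke_failure`, `call_with_retry`, `InjectedFailure` and `lookup` are hypothetical names, not part of AWS or of the platform described in this talk. One wrapper makes a component fail on purpose, the other shows a caller that is written to expect those failures:

```python
import random

class InjectedFailure(Exception):
    """Raised on purpose to simulate a component going down."""

def provoke_failure(func, failure_rate, rng):
    """Wrap func so that a share of calls fails deliberately."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure(func.__name__)
        return func(*args, **kwargs)
    return wrapper

def call_with_retry(func, attempts, *args, **kwargs):
    """Plan for failure: retry a few times before giving up."""
    last_error = None
    for _ in range(attempts):
        try:
            return func(*args, **kwargs)
        except InjectedFailure as error:
            last_error = error
    raise last_error

def lookup(key):
    """Hypothetical component that would normally call another service."""
    return key.upper()

# Seeded generator so the injected failures are reproducible.
flaky_lookup = provoke_failure(lookup, failure_rate=0.5, rng=random.Random(42))
result = call_with_retry(flaky_lookup, 20, "track")
print(result)
```

Running the wrapped component in test or staging environments shows early whether callers really survive an outage.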
![Page 25: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/25.jpg)
Human Software
• Click Workers and Mechanical Turks are not just cheap labour
• They allow programmers to hand tasks that cannot be handled algorithmically over to humans
• Use them to
• Do things too complicated for machine learning
• Pre-populate machine learning spaces
![Page 26: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/26.jpg)
Distributed and parallel concepts
![Page 27: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/27.jpg)
Imperative Programming
• Step-by-step explanation of what to do
• Explains WHAT to do rather than the RESULTS you want
• Always necessary for basic algorithms
![Page 28: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/28.jpg)
Functional Programming I
• Combine results to become a program
• Allows dynamic distribution
• Map-Reduce is only one way of doing it!
![Page 29: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/29.jpg)
Functional Programming II
F(G(H(A, B), C), D)
getMusicLikes(getFriends(facebookID))
instead of
for i in getFriends(facebookID): getMusicLikes(i)
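A minimal Python sketch of the difference, assuming two hypothetical lookup functions backed by in-memory dictionaries (the real calls would go to a social network API, which is not shown here). Because each friend's likes are fetched by an independent function call, the runtime is free to spread the calls over workers; a thread pool stands in for that distribution:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the social graph and the music likes.
FRIENDS = {"alice": ["bob", "carol"]}
LIKES = {"bob": ["Daft Punk"], "carol": ["Miles Davis", "Daft Punk"]}

def get_friends(user_id):
    """Return the friends of a user (empty list if unknown)."""
    return FRIENDS.get(user_id, [])

def get_music_likes(user_id):
    """Return the music likes of a user (empty list if unknown)."""
    return LIKES.get(user_id, [])

def music_likes_of_friends(user_id):
    """Apply get_music_likes to every friend; calls may run in parallel."""
    friends = get_friends(user_id)
    with ThreadPoolExecutor() as pool:
        # pool.map keeps the results in the same order as the inputs.
        return list(pool.map(get_music_likes, friends))

likes = music_likes_of_friends("alice")
print(likes)
```

The loop version fixes the order of execution; the mapped version only states which function is applied to which data, so the scheduling can change without touching the program.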
![Page 30: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/30.jpg)
Technology and tools
![Page 31: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/31.jpg)
Data Storage
• Cassandra - for write performance
• HBase - for read performance
• Redis.io - for predictable operation time
![Page 32: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/32.jpg)
Other Data Storage
• MongoDB - NoSQL for beginners (close to SQL, but scalability is very manual)
• sones GraphDB - graph DB (Windows-based)
• CouchDB, etc. - nice concepts, lots of great ideas, but communities too small
![Page 33: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/33.jpg)
Distributed Computing
• Hadoop
• ZooKeeper as DLS (distributed lock service)
![Page 34: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/34.jpg)
Languages
• Erlang
• Haskell
• Scala
• Lisp
• Prolog
• Mathematica
![Page 35: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/35.jpg)
Do You Have to Learn Erlang? No, Use Hadoop Streaming With Python
[Diagram: Program 1 writes lines to STDOUT, which are read by Program 2]
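As a hedged illustration of the idea, here is a word count written in the Hadoop Streaming style. In a real streaming job the mapper and the reducer are separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, and Hadoop sorts the mapper output by key before the reducer sees it. The sketch below keeps both steps as functions in one file and simulates the sort locally; the sample lines are made up:

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by word."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

# Local simulation of the pipeline: map, sort by key, reduce.
sample = ["Big data big tools", "Data tools"]
result = list(reducer(sorted(mapper(sample))))
for line in result:
    print(line)
```

The only contract between the two programs is lines of text, which is why any language that can read and write text streams can take part.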
![Page 36: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/36.jpg)
Check out my tool list: http://www.hcboos.net/100-links/
![Page 37: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/37.jpg)
Senzari and big data
![Page 38: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/38.jpg)
The AMP3 Platform: Adaptable Music Parallel Processing Platform
![Page 39: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/39.jpg)
Behind AMP
![Page 40: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/40.jpg)
Technologies
• AWS: EC2, S3, EBS, SNS, ELB
• Cassandra + Hadoop + Solandra
• ZooKeeper
• Dynamic scaling server (Lich Lord)
• Asynchronous messaging system
• Modules built in Python
![Page 41: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/41.jpg)
Effects
• Built on top of a Python platform
• Fully automated scaling
• Fully distributed data processing
• Message channels allow code decoupling
• Message channels allow replay
• Message channels allow outtasking
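The decoupling and replay points can be illustrated with a toy in-process channel. This is a sketch under assumptions, not the AMP3 messaging system: `MessageChannel` and the message fields are hypothetical, and a production system would be asynchronous and persistent. Producers only know the channel, consumers only register a handler, and because every message is logged a new consumer can be fed the full history later:

```python
class MessageChannel:
    """Toy publish/subscribe channel that keeps a log of all messages."""

    def __init__(self):
        self._log = []
        self._subscribers = []

    def subscribe(self, handler):
        """Register a callable that receives every future message."""
        self._subscribers.append(handler)

    def publish(self, message):
        """Record the message, then hand it to all current subscribers."""
        self._log.append(message)
        for handler in self._subscribers:
            handler(message)

    def replay(self, handler):
        """Feed all previously published messages to a handler."""
        for message in self._log:
            handler(message)

channel = MessageChannel()
played = []
channel.subscribe(played.append)
channel.publish({"event": "play", "track": "t1"})
channel.publish({"event": "skip", "track": "t2"})

# A consumer added later can catch up by replaying the log.
late_consumer = []
channel.replay(late_consumer.append)
print(len(played), len(late_consumer))
```

The publisher never references a consumer directly, so modules can be added, replaced or moved to other workers without changing the producing code.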
![Page 42: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/42.jpg)
Thank You for Your Time
![Page 43: Scalability and Big Data at Senzari](https://reader033.fdocuments.in/reader033/viewer/2022051609/547c39f5b479599d508b4622/html5/thumbnails/43.jpg)
Credits
• "Big Data Just Beginning to Explode" by CSC, http://www.csc.com/insights/flxwd/78931-big_data_just_beginning_to_explode
• "Social media network connections among Twitter users" by Marc Smith, http://www.flickr.com/photos/marc_smith/
• Asteroid datasets by Bruce Gary, http://brucegary.net/POVENMIRE/x.htm