Big Data Processing Utilizing Open-source Technologies - May 2015
Big-Data Processing Utilizing Open-Source Technologies
32 Slides
Amir Sedighi
Rayanesh Dadegan Data Solutions Ltd.
May 2015
Amir Sedighi - May 2015 2
References
● http://www.slideshare.net/BernardMarr/140228-big-data-slide-share?qid=017848e2-9e2a-4dc3-963c-52b6a90fba2a&v=default&b=&from_search=1
● http://www.forbes.com/fdc/welcome_mjx.shtml
● ZYMR Spark Your Real-Time Big Data Analytics
● http://dataconomy.com
● https://datakulfi.wordpress.com/2013/03/27/big-data-open-source-technology-landscape/
● http://www.slideshare.net/andrefaria/big-data-abc?qid=1ac97e4a-4acc-460a-b3f8-9122f7210440&v=qf1&b=&from_search=12
● https://wiki.apache.org/hadoop/PoweredBy
● Making Sense Of Streaming Processing by Martin Kleppmann
Data Explosion
● Big-Data means that everything we do increasingly leaves a digital trace, which we (or others) can gather, use, and analyze.
– Data Providers
  ● Business Companies
  ● People
Volume, Velocity, Variety
● “There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” (Eric Schmidt)
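To give the quoted figure a sense of scale, the following is a rough back-of-the-envelope calculation, not a measurement. It assumes a decimal exabyte (10^18 bytes) and a constant creation rate over the two days; both are assumptions made for illustration only.

```python
# Rough arithmetic on the quoted figure: 5 exabytes created every 2 days.
# Assumption: 1 exabyte = 10**18 bytes (decimal), constant rate over the period.
EXABYTE = 10**18
TERABYTE = 10**12

bytes_per_two_days = 5 * EXABYTE
seconds_in_two_days = 2 * 24 * 60 * 60  # 172,800 seconds

# Average creation rate implied by the quote, in terabytes per second.
rate_tb_per_second = bytes_per_two_days / seconds_in_two_days / TERABYTE

print(f"about {rate_tb_per_second:.1f} TB per second")
```

Under these assumptions the quote implies an average of roughly 29 TB of new information per second.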
Big-Data Processing
How to set up a Big-Data processing platform using commodity machines?
Vertical or Horizontal?
Scale Up vs Scale Out
Big-Data Processing Open-Source Technology Stack
Map-Reduce
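The slide itself carries only the title, so the following is a minimal single-process sketch of the Map-Reduce idea using the classic word-count example. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and a real cluster would run the map and reduce steps in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for an input split handled by one mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input: two small "splits".
splits = ["big data big ideas", "open source big data"]
counts = word_count(splits)
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, both steps can be spread across commodity machines, which is what makes the model a fit for scaling out.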
Hadoop Framework
Apache Hadoop Main Projects
SQL on Hadoop
● Apache Hive
● Apache Drill (Dremel)
● Cloudera Impala
● Facebook Presto
● Apache Kylin
More Map-Reduce (YARN)
● Apache Spark
● Apache Flink (Stratosphere)
● Apache Hama
● Apache Tez (DAG, Complex Data Processing)
Service Programming
● Apache Thrift
● Apache ZooKeeper
● Apache Avro
● Kryo
Data Stores
● Data Stores
– Key-Value
– Graph
– Columnar
– Document Store
– In-Memory
Data Transfer
● Apache Flume
● Apache Sqoop
Search
● Elasticsearch
● Apache Solr
Log Management
● ELK
● Logstash
● Fluentd
Machine Learning
● Apache Mahout
● MLlib
● GraphX
Messaging and Queuing
● Apache Kafka
● ZeroMQ
Stream Processing
● Apache Storm
● Apache Samza
● Apache Spark
Data Processing
Transient Query
– Issued once, then forgotten
Persistent Data
– Stored until deleted by user or apps
Stream Processing
Transient Data
– Deleted as the window slides forward
Persistent Queries
– Generate up-to-date answers as time goes on
Windows
– Time Based
– Count Based
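The two window types can be sketched in plain Python. This is an illustration under stated assumptions, not the API of Storm, Samza, or Spark: the class names `CountWindow` and `TimeWindow`, the running-average query, and the sample events are all hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class CountWindow:
    """Count-based window: keeps only the last `size` values."""

    def __init__(self, size):
        # deque(maxlen=...) drops the oldest value automatically.
        self.items = deque(maxlen=size)

    def add(self, value):
        self.items.append(value)
        # The persistent query: re-evaluated on every new event.
        return sum(self.items) / len(self.items)


class TimeWindow:
    """Time-based window: keeps values with timestamp in (now - span, now]."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.items.append((timestamp, value))
        # Transient data: expire events as the window slides forward.
        while self.items[0][0] <= timestamp - self.span:
            self.items.popleft()
        return sum(v for _, v in self.items) / len(self.items)


# Hypothetical event stream of (timestamp in seconds, value).
events = [(0, 10), (1, 20), (2, 30), (5, 40)]

count_window = CountWindow(size=2)
time_window = TimeWindow(span=3)

count_avgs = [count_window.add(value) for _, value in events]
time_avgs = [time_window.add(ts, value) for ts, value in events]

print(count_avgs)
print(time_avgs)
```

In both classes the query (the average) lives as long as the window object, while the data is discarded once it falls out of the window, which is the reverse of the transient-query, persistent-data model.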
● http://recommender.ir
● http://helio.ir
Thank You!
Find this slide here:
http://www.slideshare.net/AmirSedighi
LinkedIn:
http://www.linkedin.com/in/amirsedighi
Blog:
http://hexican.com
Email:
Twitter:
@amirsedighi