Distributed Data Processing Workshop - SBU
![Page 1: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/1.jpg)
1
Distributed Data Processing Workshop
Shahid Beheshti Campus
School of Computer Science and Engineering
Course: Distributed Databases
Instructor: Dr. Hadi Tabatabaei
Presented by: Abolfazl Sedighi, Aban 1393 (October-November 2014)
![Page 2: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/2.jpg)
Distributed Data Processing
School of Computer Science and Engineering
A. Sedighi
@amirsedighi
Hexican.com
![Page 3: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/3.jpg)
3
Every Game needs its Playing Yard
![Page 4: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/4.jpg)
4
Every Game needs its Playing Yard
![Page 5: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/5.jpg)
5
What can I do on a Single Machine?
● MVC Programming
● Regular Biz Apps
● 100 GBs Data
● Web Surfing
● ...
![Page 6: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/6.jpg)
6
Linux Cluster
![Page 7: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/7.jpg)
7
![Page 8: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/8.jpg)
8
![Page 9: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/9.jpg)
9
Introduction
This is a four-session, hands-on, step-by-step
tutorial on setting up a Linux cluster on your
machine (notebook or PC) to try a few
big-data processing frameworks and tools.
![Page 10: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/10.jpg)
10
What are we going to do?
● Your notebook or PC is enough to get started.
– Setting up your Linux cluster.
● Distributed log management and real-time search engines.
– What is Elasticsearch?
– Elasticsearch on the cluster.
– Monitoring and usage.
● The most popular distributed data processing framework.
– What is Apache Hadoop?
– Apache Hadoop on the cluster.
– Usage scenarios.
![Page 11: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/11.jpg)
11
What Will We Learn?
● Leveraging our knowledge of Big-Data.
● Getting familiar with distributed data processing.
● Maximizing availability and reliability.
● Increasing data storage capacity.
● Leveraging data processing performance.
● Data locality is a silver bullet.
● Increasing cluster utilization.
● Taming giants by giving them a try.
![Page 12: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/12.jpg)
12
Preparing the Linux Cluster - VirtualBox
![Page 13: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/13.jpg)
13
Preparing the Cluster - Hosting
● VirtualBox
– Memory Size, Disk Capacity and CPU cores.
– Network Interfaces.
● NAT, provides Internet.
● Host-Only, provides cluster communication.
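The same hosting settings can be applied from the host's command line with VBoxManage instead of the GUI. This is a minimal sketch under assumptions: the VM name `node1`, the 2048 MB / 2 core sizing, and the host-only network name `vboxnet0` are illustrative and not taken from the slides (check `VBoxManage list hostonlyifs` for the real name). By default the script only prints the commands; set `DRY_RUN=0` to run them.

```shell
#!/bin/sh
# Sketch: set a VM's resources and its two network adapters.
# VM name, sizes, and adapter name are assumptions for illustration.
VM="${VM:-node1}"
HOSTONLY_IF="${HOSTONLY_IF:-vboxnet0}"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

configure_vm() {
    # Memory size in MB and number of CPU cores.
    run VBoxManage modifyvm "$VM" --memory 2048 --cpus 2
    # Adapter 1: NAT, gives the guest Internet access.
    run VBoxManage modifyvm "$VM" --nic1 nat
    # Adapter 2: host-only, carries node-to-node cluster traffic.
    run VBoxManage modifyvm "$VM" --nic2 hostonly --hostonlyadapter2 "$HOSTONLY_IF"
}

configure_vm
```

The VM must be powered off when `modifyvm` is applied.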
![Page 14: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/14.jpg)
14
Preparing the Cluster – Adding a Host-Only Network
![Page 15: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/15.jpg)
15
Preparing the Cluster – Adding a NAT Interface
![Page 16: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/16.jpg)
16
Preparing the Cluster – Adding a Host-Only Interface
![Page 17: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/17.jpg)
17
Preparing the Cluster – First Node
● Creating a Linux machine inside VirtualBox.
● Installing Linux. (I've used Ubuntu 12.04)
– Check Samba
– Check OpenSSH
● Give the first node everything the clones will need.
– An “install” folder on it.
– Prerequisites such as Java installed on it.
● Shutting down the first node.
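On Ubuntu, the first-node preparation might look like the sketch below. The package names (`openssh-server`, `samba`, `openjdk-7-jdk`) and the `~/install` location are assumptions; the slides only say that OpenSSH, Samba, Java, and an “install” folder should be present. The script prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: prepare the first node before it is cloned.
# Package names and the install path are assumptions, not from the slides.
INSTALL_DIR="${INSTALL_DIR:-$HOME/install}"
PACKAGES="openssh-server samba openjdk-7-jdk"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_first_node() {
    run sudo apt-get update
    # $PACKAGES is left unquoted on purpose so it splits into separate names.
    run sudo apt-get install -y $PACKAGES
    # Folder for the framework archives used in later sessions.
    run mkdir -p "$INSTALL_DIR"
}

prepare_first_node
# Afterwards, shut the node down before cloning it.
```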
![Page 18: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/18.jpg)
18
Preparing the Cluster – Cloning, the VirtualBox side
● Cloning the first node. (tutorial)
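Cloning can also be scripted with `VBoxManage clonevm`. A minimal sketch, assuming the powered-off first node is registered as `node1` and the clones are called `node2` and `node3` (all hypothetical names). By default the commands are only printed; set `DRY_RUN=0` to run them.

```shell
#!/bin/sh
# Sketch: clone a powered-off base VM into additional cluster nodes.
# The VM names are assumptions for illustration.
BASE_VM="${BASE_VM:-node1}"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clone_nodes NAME... : create one registered clone per given name.
clone_nodes() {
    for name in "$@"; do
        run VBoxManage clonevm "$BASE_VM" --name "$name" --register
    done
}

clone_nodes node2 node3
```

Unless told otherwise, `clonevm` assigns new MAC addresses to the clone's network adapters, so the guest's network settings need adjusting on first boot.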
![Page 19: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/19.jpg)
19
Preparing the Cluster – Cloning, the Linux side
● Turning the new node on.
● Network configuration
– sudo nano /etc/hosts
– sudo nano /etc/hostname
– sudo nano /etc/network/interfaces
– sudo rm /etc/udev/rules.d/70-persistent-net.rules
● sudo reboot
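The files edited above need a unique hostname and a unique host-only address per node. The udev rule is removed because it still maps the original node's MAC addresses to interface names, so the clone's adapters would otherwise come up under different names. The sketch below only prints candidate file contents for review. The `192.168.56.x` subnet, the `nodeN` naming, and `eth1` as the host-only interface are assumptions to adapt to your setup.

```shell
#!/bin/sh
# Sketch: print per-node network settings for a cloned node.
# Subnet, node names, and the interface name are assumptions.
SUBNET="${SUBNET:-192.168.56}"
FIRST_HOST="${FIRST_HOST:-101}"

# node_ip N : address of the N-th node (1-based) on the host-only network.
node_ip() {
    echo "$SUBNET.$((${FIRST_HOST} + $1 - 1))"
}

# hosts_entries COUNT : lines to add to /etc/hosts on every node.
hosts_entries() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "$(node_ip "$i") node$i"
        i=$((i + 1))
    done
}

# interfaces_stanza N : static host-only stanza for /etc/network/interfaces.
interfaces_stanza() {
    echo "auto eth1"
    echo "iface eth1 inet static"
    echo "    address $(node_ip "$1")"
    echo "    netmask 255.255.255.0"
}

hosts_entries 3
interfaces_stanza 2
```

For the second node, `/etc/hostname` would then contain just `node2`.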
![Page 20: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/20.jpg)
20
Preparing the Cluster – No Password Login
● Do this:
– ssh-keygen
– ssh-copy-id -i ~/.ssh/id_rsa.pub user@host
● Or this:
– ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
– scp ~/.ssh/id_rsa.pub user@host:~/master_key
– ssh user@host
– cat ~/master_key >> ~/.ssh/authorized_keys
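With several nodes, the key copy can be looped and the result verified. A minimal sketch, assuming the account name `user` and the node names `node2` and `node3` (placeholders, as on the slide). `BatchMode=yes` makes `ssh` fail rather than prompt, so a successful run indicates that password-free login works. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: push the master's public key to every node, then verify login.
# The account and node names are placeholders for illustration.
CLUSTER_USER="${CLUSTER_USER:-user}"
NODES="${NODES:-node2 node3}"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Append the public key to authorized_keys on each node.
distribute_key() {
    for host in $NODES; do
        run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$CLUSTER_USER@$host"
    done
}

# Run a no-op remotely; with BatchMode a password prompt becomes a failure.
check_login() {
    for host in $NODES; do
        run ssh -o BatchMode=yes "$CLUSTER_USER@$host" true
    done
}

distribute_key
check_login
```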
![Page 21: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/21.jpg)
21
Preparing the Cluster – Distributed Shell
● Do it like a Commander
– Installing DSH (Optional)
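A possible DSH setup, sketched under assumptions: the node names are hypothetical, and the per-user machine list is assumed to live in `~/.dsh/machines.list`, which is where `dsh -a` looks for it. The function below only writes that list; the install and run commands are shown as comments.

```shell
#!/bin/sh
# Sketch: write the list of cluster nodes that "dsh -a" will target.
# Node names are assumptions for illustration.
DSH_DIR="${DSH_DIR:-$HOME/.dsh}"

# write_machines NODE... : store one host name per line in machines.list.
write_machines() {
    mkdir -p "$DSH_DIR"
    printf '%s\n' "$@" > "$DSH_DIR/machines.list"
}

# Typical use on the master node:
#   sudo apt-get install dsh
#   write_machines node1 node2 node3
#   dsh -a -M -c -r ssh -- uptime
# -a: all listed machines, -M: prefix output with the machine name,
# -c: run concurrently, -r ssh: use ssh as the remote shell.
```

Because DSH runs the command over ssh, it relies on the password-free login configured in the previous step.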
![Page 22: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/22.jpg)
22
Preparing the Cluster – Enjoy it
● To scale your cluster just repeat the cloning step.
![Page 23: Distributed Data Processing Workshop - SBU](https://reader033.fdocuments.in/reader033/viewer/2022060202/559b9f4d1a28ab14448b47d8/html5/thumbnails/23.jpg)
23
Next?
● An introduction to distributed log management and analytical search engines.
– How does Elasticsearch work?
– Workshop.
● An introduction to Apache Hadoop.
– How does Apache Hadoop work?
– Workshop.