Hota hadoop

File Systems for Cloud Computing. Chittaranjan Hota, PhD, Faculty Incharge, Information Processing Division, Birla Institute of Technology & Science-Pilani, Hyderabad Campus, Jawahar Nagar, Shameerpet, Ranga Reddy District, Hyderabad, AP, India. [email protected]. 16th March 2013, Computer Sc Dept., Utkal University, Vani Vihar, Bhubaneswar.

Transcript of Hota hadoop

Page 1: Hota hadoop

File Systems for Cloud Computing

Chittaranjan Hota, PhD
Faculty Incharge, Information Processing Division

Birla Institute of Technology & Science-Pilani, Hyderabad Campus
Jawahar Nagar, Shameerpet, Ranga Reddy District, Hyderabad, AP, India

[email protected]

16th March 2013
Computer Sc Dept., Utkal University, Vani Vihar, Bhubaneswar

Page 2: Hota hadoop

Growth of the Internet

Source: Cisco VNI Global Forecast, 2011-2016

Source: Internet world stats

Page 3: Hota hadoop

Golden era in Computing

Cloud Futures 2011, Redmond

Page 4: Hota hadoop

Cloud computing: Is it a hype?

• From $41 billion in 2011 to $241 billion in 2020

Page 5: Hota hadoop

Scaling up…

SETI

Page 6: Hota hadoop

What is Cloud Computing?

Page 7: Hota hadoop

Files

• Permanent storage
• Information sharing
• Files have data and attributes

Page 8: Hota hadoop

What a Distributed File System Provides

• Provides access to data stored at servers using file system interfaces

• What are the file system interfaces?
o Open a file, check status on a file, close a file
o Read data from a file
o Write data to a file
o Lock a file or part of a file
o List files in a directory, delete a directory
o Delete a file, rename a file, add a symbolic link to a file, etc.
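As a local illustration of the interface operations listed above, the following sketch exercises several of them with Python's standard library against a temporary directory. A distributed file system offers the same kind of calls while the data lives on servers. The file names are made up for the example, and locking and symbolic links are left out to keep the sketch portable.

```python
import os
import tempfile


def demo_file_interface():
    """Exercise open/write/read/status/list/rename/delete on a scratch directory."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "notes.txt")  # hypothetical file name

        # Open a file, write data to it, close it.
        with open(path, "w") as f:
            f.write("hello dfs")

        # Check status on a file.
        results["size"] = os.stat(path).st_size

        # Open a file, read data from it, close it.
        with open(path) as f:
            results["data"] = f.read()

        # Rename a file, then list files in the directory.
        renamed = os.path.join(root, "renamed.txt")
        os.rename(path, renamed)
        results["listing"] = os.listdir(root)

        # Delete a file.
        os.remove(renamed)
        results["after_delete"] = os.listdir(root)
    return results


print(demo_file_interface())
```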

Page 9: Hota hadoop

DFS Design Issues

• Mounting
• Caching
• Hints
• Bulk data transfer
• Replica management
• Writing policies

Page 10: Hota hadoop

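NFS architecture

[Figure: NFS architecture. Client computer: application programs make system calls into the UNIX kernel, where the virtual file system sends operations on local files to the UNIX file system (or another local file system, labeled PC DOS) and operations on remote files to the NFS client. The NFS client reaches the NFS server across the network using RPC for remote operations. Server computer: the NFS server runs in the UNIX kernel and accesses the UNIX file system through the virtual file system.]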

Page 11: Hota hadoop

Google File System

Metadata: namespace, access control, mapping of files to chunks, and current location of chunks

[Figure: GFS architecture, with interaction steps numbered 1 to 4]
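The two mappings named in the metadata line (files to chunks, and chunks to their current locations) can be pictured as a pair of dictionaries. This is a minimal sketch, not Google's implementation: the class name `MasterMetadata`, its methods, and the chunk handles and server names are all hypothetical, and namespace handling and access control are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class MasterMetadata:
    # file path -> ordered list of chunk handles
    file_to_chunks: dict = field(default_factory=dict)
    # chunk handle -> set of servers currently holding a replica
    chunk_locations: dict = field(default_factory=dict)

    def add_chunk(self, path, handle, servers):
        """Append a chunk to a file and record where its replicas live."""
        self.file_to_chunks.setdefault(path, []).append(handle)
        self.chunk_locations[handle] = set(servers)

    def locate(self, path, chunk_index):
        """Return the handle and replica locations of the chunk_index-th chunk."""
        handle = self.file_to_chunks[path][chunk_index]
        return handle, sorted(self.chunk_locations[handle])


meta = MasterMetadata()
meta.add_chunk("/logs/a", "c-001", ["cs1", "cs2", "cs3"])
meta.add_chunk("/logs/a", "c-002", ["cs2", "cs4", "cs5"])
print(meta.locate("/logs/a", 1))
```

A client would ask for a location once and then talk to the chunk servers directly, which is why only these small mappings need to be kept centrally.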

Page 12: Hota hadoop

HDFS Design

• Files stored as blocks
o Default 64 MB

• Reliability through replication
o Each block replicated across 3+ DataNodes

• Single NameNode coordinates access and metadata
o Centralized management

• No data caching
o Little benefit due to large data sets and streaming reads
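To make the block and replication figures concrete, the sketch below works out how many blocks and block replicas a file needs under the 64 MB block size and replication factor of 3 given above. The function name `block_layout` is hypothetical, and the raw-storage figure assumes that a partially filled last block only occupies as much space as it actually holds.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # default block size from the slide
REPLICATION = 3       # replication factor from the slide


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return block count, last-block size, replica count and raw bytes stored."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    last_block = file_size - (num_blocks - 1) * block_size if num_blocks else 0
    return {
        "blocks": num_blocks,
        "last_block_bytes": last_block,
        "block_replicas": num_blocks * replication,
        "raw_bytes": file_size * replication,
    }


# A 200 MB file: three full 64 MB blocks plus one 8 MB block.
print(block_layout(200 * MB))
```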

Page 13: Hota hadoop

Commodity Hardware

Page 14: Hota hadoop

HDFS Architecture

[Figure: HDFS architecture. An HDFS-aware application sits on top of the POSIX API and the HDFS API. The POSIX API leads to the regular VFS with local and NFS-supported files and its specific drivers; the HDFS API leads to a separate HDFS view. Below these, the network stack connects to the HDFS NameNode and the HDFS DataNodes.]

Page 15: Hota hadoop

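HDFS Architecture

[Figure: HDFS architecture. Clients send metadata ops to the Namenode, which holds the metadata (name, replicas, …). Clients read blocks from and write blocks to Datanodes spread across Rack1 and Rack2. The Namenode issues block ops to the Datanodes, and blocks are replicated between Datanodes.]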

Page 16: Hota hadoop

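HDFS File Read

[Figure: HDFS file read. On the client node, the HDFS client works through DistributedFileSystem and FSDataInputStream, which contact the NameNode (namenode) and three DataNodes (datanode).]

1: open (HDFS client to DistributedFileSystem)
2: get block location (DistributedFileSystem to NameNode)
3: read (HDFS client to FSDataInputStream)
4: read (FSDataInputStream to the first DataNode)
5: read (FSDataInputStream to the next DataNode)
6: close (HDFS client to FSDataInputStream)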
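Steps 1 to 6 of the read path can be mimicked with a small in-memory model. This is only a sketch of the control flow, not the Hadoop client: the Python classes `NameNode` and `DataNode` and the function `read_file` are hypothetical stand-ins, and the block contents are toy data.

```python
class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.blocks = {}  # path -> list of (block_id, [datanode names])

    def get_block_locations(self, path):
        return self.blocks[path]


class DataNode:
    """Holds the block contents."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.store[block_id]


def read_file(path, namenode, datanodes):
    # Steps 1-2: open the file and ask the NameNode for block locations.
    locations = namenode.get_block_locations(path)
    data = b""
    # Steps 3-5: read each block in order from a DataNode holding a replica.
    for block_id, holders in locations:
        data += datanodes[holders[0]].read_block(block_id)
    # Step 6: close (nothing to release in this in-memory model).
    return data


nn = NameNode()
dns = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
dns["dn1"].store["b0"] = b"hello "
dns["dn2"].store["b0"] = b"hello "
dns["dn3"].store["b1"] = b"hdfs"
nn.blocks["/demo.txt"] = [("b0", ["dn1", "dn2"]), ("b1", ["dn3"])]
print(read_file("/demo.txt", nn, dns))
```

Note that file data never passes through the NameNode in this model; it only answers the location query, which matches the separation shown in the figure.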

Page 17: Hota hadoop

Hadoop Clusters

Page 18: Hota hadoop

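Rack Awareness

[Figure: network topology tree with data centers d1 and d2, racks r1 and r2, and nodes n1 and n2, annotated with distances from a given node.]

d=0: the same node
d=2: a different node on the same rack
d=4: a node on a different rack in the same data center
d=6: a node in a different data center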
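The distances 0, 2, 4 and 6 fall out of counting the hops from each node up to their closest common ancestor in the tree. The sketch below assumes every location is written as a path of the form /datacenter/rack/node; the function name `distance` and the sample paths are illustrative.

```python
def distance(a, b):
    """Tree distance between two locations given as '/datacenter/rack/node'."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Hops from each node up to the closest common ancestor.
    return (len(pa) - common) + (len(pb) - common)


print(distance("/d1/r1/n1", "/d1/r1/n1"))  # same node
print(distance("/d1/r1/n1", "/d1/r1/n2"))  # same rack
print(distance("/d1/r1/n1", "/d1/r2/n1"))  # same data center, other rack
print(distance("/d1/r1/n1", "/d2/r1/n1"))  # other data center
```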

Page 19: Hota hadoop

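HDFS Write

[Figure: HDFS file write. On the client node, the HDFS client works through DistributedFileSystem and FSDataOutputStream, which contact the NameNode (namenode) and a pipeline of three DataNodes (datanode).]

1: create (HDFS client to DistributedFileSystem)
2: create (DistributedFileSystem to NameNode)
3: write (HDFS client to FSDataOutputStream)
4: write packet (FSDataOutputStream to the first DataNode, then forwarded along the pipeline)
5: ack packet (returned back along the pipeline)
6: close (HDFS client to FSDataOutputStream)
7: complete (DistributedFileSystem to NameNode)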
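The packet and ack traffic of steps 4 and 5 can be sketched as a loop over the DataNode pipeline. This is a simplified, sequential model under stated assumptions: the function `pipeline_write` and the DataNode names are hypothetical, there are no failures, and a packet counts as acknowledged only once every DataNode in the pipeline has stored it.

```python
def pipeline_write(packets, pipeline):
    """Send each packet down the DataNode pipeline and collect the acks."""
    stores = {name: [] for name in pipeline}
    acked = []
    for seq, packet in enumerate(packets):
        # Step 4: the packet is forwarded from one DataNode to the next.
        for name in pipeline:
            stores[name].append(packet)
        # Step 5: acks travel back up the pipeline; the packet is
        # acknowledged once all DataNodes hold it.
        if all(len(stores[name]) == seq + 1 for name in pipeline):
            acked.append(seq)
    return stores, acked


stores, acked = pipeline_write([b"p0", b"p1"], ["dn1", "dn2", "dn3"])
print(acked)
print(stores["dn3"])
```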

Page 20: Hota hadoop

Replica Placement

[Figure: replicas placed across nodes and racks within a data center]
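The slide does not spell out the placement rule. As an assumption, the sketch below follows the policy commonly described for HDFS with three replicas: the first on the writer's node, the second on a node in a different rack, and the third on a different node in that same remote rack. The function `place_replicas`, the topology layout and the node names are hypothetical, and the code assumes at least one remote rack with two or more nodes.

```python
import random


def place_replicas(writer, topology, rng=random):
    """Pick three (rack, node) targets for one block.

    writer:   (rack, node) of the client writing the block
    topology: dict mapping rack name -> list of node names
    """
    w_rack, w_node = writer
    first = (w_rack, w_node)  # replica 1: the writer's own node
    remote = [r for r in topology if r != w_rack and len(topology[r]) >= 2]
    rack2 = rng.choice(remote)  # replicas 2 and 3 share one remote rack
    n2, n3 = rng.sample(topology[rack2], 2)  # two distinct nodes
    return [first, (rack2, n2), (rack2, n3)]


topology = {"r1": ["n1", "n2"], "r2": ["n3", "n4"], "r3": ["n5", "n6"]}
print(place_replicas(("r1", "n1"), topology, random.Random(0)))
```

Spreading replicas over two racks survives the loss of a whole rack, while keeping two of them on one rack limits the cross-rack traffic of the write pipeline.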

Page 21: Hota hadoop

Computational Grids

[Source: IBM TJ Watson Research Center]

Page 22: Hota hadoop

Load Distribution

Page 23: Hota hadoop

Map/Reduce
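The Map/Reduce model is often introduced with a word count. The single-process sketch below shows the three phases (map, shuffle, reduce) on toy input. It is not the Hadoop API: the function names and the sample documents are chosen for illustration, and a real job would run the map and reduce phases in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["a b a", "b c"]))
```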

Page 24: Hota hadoop

SLURM

Page 25: Hota hadoop
Page 26: Hota hadoop
Page 27: Hota hadoop
Page 28: Hota hadoop

Crowd Sourcing

Page 29: Hota hadoop

Foxtrot: Associating audio with locations

Page 30: Hota hadoop

Allen Telescope Array 

Search for Extra Terrestrial Intelligence

Page 31: Hota hadoop

Thank You!