NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
Transcript of NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
MapR, Architecture, Philosophy and Applications
NY HUG – October 2011
Outline
• Architecture (MapR)
• Philosophy
• Applications (Machine learning)
Map-Reduce, the Original Mission
[Diagram: input → shuffle → output]
Bottlenecks and Issues
• Read-only files
• Many copies in I/O path
• Shuffle based on HTTP
  • Can't use new technologies
  • Eats file descriptors
• Spills go to local file space
  • Bad for skewed distribution of sizes
MapR Areas of Development
[Diagram: Map-Reduce, Storage Services, Ecosystem, HBase, Management]
MapR Improvements
• Faster file system
  • Fewer copies
  • Multiple NICs
  • No file descriptor or page-buf competition
• Faster map-reduce
  • Uses distributed file system
  • Direct RPC to receiver
  • Very wide merges
MapR Innovations
• Volumes
  • Distributed management
  • Data placement
• Read/write random-access file system
  • Allows distributed meta-data
  • Improved scaling
  • Enables NFS access
• Application-level NIC bonding
• Transactionally correct snapshots and mirrors
MapR's Containers
• Each container contains:
  • Directories & files
  • Data blocks
• Replicated on servers
• No need to manage them directly
Files and directories are sharded into blocks, which are placed into mini NameNodes (containers) on disks. Containers are 16-32 GB segments of disk, placed on nodes.
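To make the layering concrete, here is a toy Python sketch of files sharding into blocks that live inside containers; the class names, block size, and placement policy are illustrative assumptions, not MapR internals.

# A toy model of the layering described above; names and the placement
# policy are assumptions, not MapR internals.
from dataclasses import dataclass, field

BLOCK_SIZE = 8 * 1024  # assumed block granularity

@dataclass
class Container:
    """A 'mini NameNode': holds the metadata and blocks of its shard."""
    id: int
    blocks: dict = field(default_factory=dict)  # (path, block index) -> bytes

def write_file(containers, path, data):
    """Shard a file into blocks and spread them across containers."""
    for i in range(0, len(data), BLOCK_SIZE):
        c = containers[hash((path, i)) % len(containers)]  # stand-in placement
        c.blocks[(path, i // BLOCK_SIZE)] = data[i:i + BLOCK_SIZE]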
MapR's Containers
• Each container has a replication chain
• Updates are transactional
• Failures are handled by rearranging the replication chain
Container locations and replication
[Diagram: the CLDB maps each container to its replication chain across nodes N1, N2, N3]
The container location database (CLDB) keeps track of the nodes hosting each container and the replication chain order.
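Conceptually the CLDB is a small table mapping each container id to an ordered replication chain. A toy sketch of the lookup and of the failure handling described above (names assumed; the real CLDB is a replicated service, not a dict):

# Toy container location database: container id -> ordered replication chain.
cldb = {
    1: ["N1", "N2"],
    2: ["N3", "N2"],
    3: ["N1", "N3"],
}

def lookup(container_id):
    """Clients ask the CLDB for the chain; writes go to the chain head."""
    return cldb[container_id]

def handle_failure(node):
    """Failures are handled by rearranging the replication chain."""
    for cid, chain in cldb.items():
        if node in chain:
            chain.remove(node)
            # ... scheduling of a replacement replica would go here ...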
MapR Scaling
• Containers represent 16-32 GB of data
  • Each can hold up to 1 billion files and directories
  • 100M containers = ~2 exabytes (a very large cluster)
• 250 bytes of DRAM to cache a container
  • 25 GB to cache all containers for a 2 EB cluster
  • But not necessary; can page to disk
  • A typical large 10 PB cluster needs 2 GB
• Container reports are 100x-1000x smaller than HDFS block reports
  • Serve 100x more data nodes
  • Increase container size to 64 GB to serve a 4 EB cluster
• Map/reduce not affected
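A quick back-of-envelope check of those headline numbers, assuming a ~20 GB average container:

# Back-of-envelope check of the scaling claims above.
containers = 100_000_000            # 100M containers
avg_container = 20 * 2**30          # assumed ~20 GB average (slides say 16-32 GB)
cache_per_container = 250           # bytes of CLDB DRAM per cached container

print(f"capacity:   {containers * avg_container / 2**60:.1f} EiB")        # ~1.9 EiB
print(f"CLDB cache: {containers * cache_per_container / 2**30:.1f} GiB")  # ~23 GiB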
MapR's Streaming Performance
[Bar charts: read and write throughput in MB per sec (0-2250), MapR vs. Hadoop, on 11 x 7200 rpm SATA and 11 x 15K rpm SAS; higher is better]
Tests: (i) 16 streams x 120 GB, (ii) 2000 streams x 1 GB
Terasort on MapR
[Bar charts: elapsed time in minutes for 1.0 TB (0-60) and 3.5 TB (0-300) sorts, MapR vs. Hadoop; lower is better]
10+1 nodes: 8 core, 24 GB DRAM, 11 x 1 TB SATA 7200 rpm
HBase on MapR
[Bar chart: records per second (0-25000) for Zipfian and uniform key distributions, MapR vs. Apache; higher is better]
YCSB random read with 1 billion 1K records; 10+1 node cluster: 8 core, 24 GB DRAM, 11 x 1 TB 7200 rpm
Small Files (Apache Hadoop, 10 nodes)
[Chart: file-create rate (files/sec) vs. number of files (millions), out of the box vs. tuned]
Op: create file, write 100 bytes, close
Notes: NN not replicated; NN uses 20 GB DRAM; DN uses 2 GB DRAM
MUCH faster for some operations
[Chart: create rate vs. number of files (millions); same 10 nodes as above]
What MapR is not
• Volumes != federation
  • MapR supports > 10,000 volumes, all with independent placement and defaults
  • Volumes support snapshots and mirroring
• NFS != FUSE
  • Checksum and compress at gateway
  • IP fail-over
  • Read/write/update semantics at full speed
• MapR != maprfs
Philosophy
Physics of startup companies
For startups
• History is always small
• The future is huge
• Must adopt new technology to survive
• Compatibility is not as important
  • In fact, incompatibility is assumed
Startup phase
Absolute growth still very large
Physics of large companies
For large businesses
• Present state is always large
• Relative growth is much smaller
• Absolute growth rate can be very large
• Must adopt new technology to survive
  • Cautiously!
  • But must integrate technology with legacy
• Compatibility is crucial
The startup technology picture
[Diagram: old computers and software, current computers and software, and expected hardware and software growth; no compatibility requirement between them]
The large enterprise picture
[Diagram: a proof-of-concept Hadoop cluster, a long-term Hadoop cluster, and current hardware and software, which must all work together]
What does this mean?
• Hadoop is very, very good at streaming through things in batch jobs
• HBase is good at persisting data in very write-heavy workloads
• Unfortunately, the foundation of both systems is HDFS, which does not export or import well
Narrow Foundations
[Diagram: Map/Reduce, HBase, Pig, and Hive sit on HDFS, while OLAP, OLTP, web services, and sequential file processing sit on RDBMS and NAS]
Big data is heavy and expensive to move.
Narrow Foundations
• Because big data has inertia, it is difficult to move
  • It costs time to move
  • It costs reliability because of more moving parts
• The result is many duplicate copies
One Possible Answer
• Widen the foundation
• Use standard communication protocols
• Allow conventional processing to share with parallel processing
Broad Foundation
[Diagram: the same stack, with MapR added as a common foundation alongside RDBMS, NAS, and HDFS, so Map/Reduce, HBase, Pig, Hive, OLAP, OLTP, web services, and sequential file processing can share the same data]
New Capabilities
Export to the world
[Diagram: a bank of NFS servers exports the cluster to outside NFS clients]
Local server
[Diagram: an application on a client machine uses a local NFS server to reach the cluster nodes]
Universal export to self
[Diagram: a task on a cluster node mounts the cluster through an NFS server running on that same node]
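Because the node mounts the cluster on itself, a task can reach cluster data with ordinary POSIX file I/O. A minimal sketch, with an assumed mount point and path:

# Plain file I/O against the distributed file system through the local
# NFS mount; no Hadoop API involved. The path is an assumption.
with open("/mapr/my.cluster/tmp/task-output.csv", "w") as f:
    f.write("t,error\n0,0.50\n1,0.42\n")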
Nodes are identical
[Diagram: every cluster node runs its own NFS server and its own tasks; all nodes are identical]
Application architecture
• High-performance map-reduce is nice
• But algorithmic flexibility is even nicer
Hybrid model flow
[Diagram: input → feature extraction and down sampling (map-reduce) → SVD / PageRank / spectral (map-reduce) → down stream modeling (compute model: ??) → deployed model]
Hybrid model flow
[Diagram: the same flow, with the down stream modeling step now sequential instead of map-reduce]
Sharded text indexing
• Mapper assigns document to shard
  • Shard is usually hash of document id
• Reducer indexes all documents for a shard
  • Indexes created on local disk
  • On success, copy index to DFS
  • On failure, delete local files
• Must avoid directory collisions
  • Can't use shard id!
• Must manage and reclaim local disk space
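A minimal sketch of the mapper's shard assignment and a collision-safe reducer work directory; the shard count and the helper names are assumptions, not a complete indexing job:

# Shard assignment by hash of document id, plus a work directory that
# avoids collisions by including the unique task attempt id.
import hashlib, os, tempfile

NUM_SHARDS = 16  # assumed

def shard_of(doc_id: str) -> int:
    """Shard is a hash of the document id."""
    return int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % NUM_SHARDS

def map_document(doc_id, text):
    # Emit (shard, document); all documents for a shard meet at one reducer.
    yield shard_of(doc_id), (doc_id, text)

def reducer_workdir(shard: int, attempt_id: str) -> str:
    # Naming the directory after the shard alone collides when a failed
    # attempt is re-run, so the attempt id goes into the name too.
    path = os.path.join(tempfile.gettempdir(), f"index-{shard}-{attempt_id}")
    os.makedirs(path, exist_ok=True)
    return path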
Sharded text indexing
[Diagram: input documents → map (assign documents to shards) → reducers (index text to local disk, then copy the index to the distributed file store) → clustered index storage → search engines, which typically must copy the index to local disk before it can be loaded]
Conventional data flow
[Diagram: the same flow as above]
Failure of a reducer causes garbage to accumulate on the local disk. Failure of a search engine requires another download of the index from clustered storage.
Simplified NFS data flows
[Diagram: input documents → map → reducers index into their task work directories via NFS → clustered index storage → search engines read the mirrored index directly]
Failure of a reducer is cleaned up by the map-reduce framework.
Simplified NFS data flows
[Diagram: the same flow, with mirrors feeding multiple search engines]
Mirroring allows exact placement of index data. Arbitrary levels of replication are also possible.
K-means
• Classic E-M based algorithm
• Given cluster centroids:
  • Assign each data point to nearest centroid
  • Accumulate new centroids
  • Rinse, lather, repeat
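For reference, the whole E-M loop fits in a few lines. A plain NumPy sketch of the algorithm itself (the map-reduce formulation appears a few slides later):

# Classic k-means E-M loop: assign to nearest centroid, then re-average.
import numpy as np

def kmeans(points, centroids, iters=20):
    for _ in range(iters):
        # E-step: index of the nearest centroid for every point.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        # M-step: each new centroid is the mean of its assigned points;
        # an empty cluster keeps its old centroid.
        centroids = np.array([
            points[nearest == k].mean(axis=0) if (nearest == k).any() else centroids[k]
            for k in range(len(centroids))])
    return centroids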
K-means, the movie
[Diagram: input + centroids → assign to nearest centroid → aggregate new centroids → repeat]
Parallel Stochastic Gradient Descent
[Diagram: input → train sub-models → average models → model]
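The diagram reduces to "train sub-models independently, then average them". A minimal sketch under that reading, with a generic least-squares linear model standing in for the real learner:

# Parallel SGD by model averaging; everything here is illustrative.
import numpy as np

def sgd(X, y, w, lr=0.01, epochs=5):
    """Plain stochastic gradient descent on one shard (squared loss)."""
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w = w - lr * (xi @ w - yi) * xi
    return w

def parallel_sgd(shards, dim):
    # In the real flow each sub-model trains in its own mapper.
    models = [sgd(X, y, np.zeros(dim)) for X, y in shards]
    return np.mean(models, axis=0)  # average models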
Variational Dirichlet Assignment
[Diagram: input → gather sufficient statistics → update model → repeat]
Old tricks, new dogs
• Mapper
  • Assign point to cluster
  • Emit cluster id, (1, point)
• Combiner and reducer
  • Sum counts, weighted sum of points
  • Emit cluster id, (n, sum/n)
• Output to HDFS
(Diagram notes: centroids are read from HDFS to local disk by the distributed cache, and the mapper reads them from local disk; results are written by map-reduce.)
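In code, the mapper and combiner of this job look roughly as follows; this is a sketch of the emit format only, not a complete Hadoop job:

# Mapper: assign point to cluster, emit cluster id, (1, point).
# Combiner/reducer: sum counts and weighted sums, emit cluster id, (n, sum/n).
import numpy as np

def mapper(point, centroids):
    cid = int(np.linalg.norm(centroids - point, axis=1).argmin())
    yield cid, (1, point)

def combiner(cid, values):
    # values are (count, mean) pairs; count * mean recovers the point sum.
    n = sum(c for c, _ in values)
    weighted = sum(c * np.asarray(p) for c, p in values)
    yield cid, (n, weighted / n)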
Old tricks, new dogs
• Mapper
  • Assign point to cluster
  • Emit cluster id, (1, point)
• Combiner and reducer
  • Sum counts, weighted sum of points
  • Emit cluster id, (n, sum/n)
• Output to MapR FS (instead of HDFS)
(Diagram notes: centroids are read directly via NFS; results are written by map-reduce.)
Poor man's Pregel
• Mapper runs the loop below
• The I/O lines (read, write, synchronize) can use conventional I/O via NFS

while not done:
    read and accumulate input models
    for each input:
        accumulate model
    write model
    synchronize
    reset input format
emit summary
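One runnable shape for that loop, assuming sub-models are exchanged as JSON files in an NFS-mounted directory; the path, the file naming, and the crude file-based barrier are all assumptions:

# Sketch of the loop above: each task accumulates everyone's models,
# computes locally, writes its model with plain file I/O, then synchronizes.
import glob, json, os, time

SHARED = "/mapr/my.cluster/tmp/pregel"  # assumed NFS-mounted directory

def barrier(round_id, task_id, num_tasks):
    """Crude synchronize step: wait until every task has checked in."""
    open(os.path.join(SHARED, f"done-{round_id}-{task_id}"), "w").close()
    while len(glob.glob(os.path.join(SHARED, f"done-{round_id}-*"))) < num_tasks:
        time.sleep(1)

def run(task_id, num_tasks, compute, rounds):
    model = {}
    for r in range(rounds):
        incoming = {}
        # Read and accumulate the models every task wrote last round.
        for path in glob.glob(os.path.join(SHARED, "model-*.json")):
            with open(path) as f:
                for k, v in json.load(f).items():
                    incoming[k] = incoming.get(k, 0.0) + v
        model = compute(incoming)  # local work over this task's input
        # Write model with conventional I/O, then synchronize.
        with open(os.path.join(SHARED, f"model-{task_id}.json"), "w") as f:
            json.dump(model, f)
        barrier(r, task_id, num_tasks)
    return model  # emit summary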
Click modeling architecture
[Diagram: input → feature extraction and down sampling (map-reduce) → data join with side-data, now via NFS → sequential SGD learning]
Click modeling architecture
[Diagram: the same flow, with map-reduce cooperating with NFS to feed several sequential SGD learning processes in parallel]
Trivial visualization interface
• Map-reduce output is visible via NFS
• Legacy visualization just works

$ R
> x <- read.csv("/mapr/my.cluster/home/ted/data/foo.out")
> plot(error ~ t, x)
> q(save='n')
Conclusions
• We used to know all this
• Tab completion used to work
• 5 years of work-arounds have clouded our memories
• We just have to remember the future