Building the world with Elastic Map Reduce
apps on maps...
Building the World with Elastic Map Reduce
Oliver Norton, Technical Director
Tim Jenks, Technical Lead
appsonmaps.com
AS3 SDK – Apps in Browser
JS API – Embed on your website
iOS SDK – Mobile, coming Q4 2012
social media
social commerce
traffic updates
journey planning
worldflightclub.com
flying like a bird
World Flight Club on YouTube
flipping the bird!
Flipper on YouTube
photographic maps
Photograph-based maps…
layered data
We fuse layered data to procedurally generate our maps (using AWS Elastic MapReduce)
streamed
All built & served from off-the-shelf Amazon Web Service infrastructure
pipeline
data size
Over 2TB Data
Terrain:
  GB (10m, ¼ million km²)
  US (10m, 40x GB)
Buildings:
  GB (full coverage)
  US (120 cities)
Roads:
  GB (¼ million miles)
  US (4 million miles)
Processing this can start to be expensive $$$
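To give a feel for the terrain figures above, here is a rough back-of-envelope sketch. It assumes "10m" means one height sample per 10 m x 10 m grid cell, which the slide does not state explicitly; the function and constant names are illustrative only.

```python
# Rough terrain sample counts.
# ASSUMPTION: "10m" on the slide means a 10 m grid spacing, one sample per cell.
GB_AREA_KM2 = 250_000        # "¼ million km²" from the slide
US_AREA_MULTIPLIER = 40      # "40x GB" from the slide
GRID_SPACING_M = 10          # assumed meaning of "10m"


def samples_for_area(area_km2: float, spacing_m: float) -> int:
    """Number of grid cells needed to cover area_km2 at the given spacing."""
    cells_per_km = 1000 / spacing_m
    return round(area_km2 * cells_per_km * cells_per_km)


gb_samples = samples_for_area(GB_AREA_KM2, GRID_SPACING_M)
us_samples = samples_for_area(GB_AREA_KM2 * US_AREA_MULTIPLIER, GRID_SPACING_M)

print(f"GB terrain samples: {gb_samples:,}")   # 2,500,000,000
print(f"US terrain samples: {us_samples:,}")   # 100,000,000,000
```

Under that assumption the US terrain alone is on the order of a hundred billion samples, before buildings and roads are layered on top.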
before
• Limited scalability -> 60 desktop-spec machines
• Multi-TB SAN with a £10k/year maintenance cost
• In-house build system that needed maintaining
• 10 Mbit/s symmetric internet connection to upload TBs of data
• Only 3 developers knew how to run builds
• Electricity costs -> who knows…
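The upload bottleneck can be put in numbers with a quick calculation. This is a sketch that takes the "Over 2TB" figure from the data size slide as exactly 2 decimal terabytes and assumes the link runs at full rate with no protocol overhead; both are simplifying assumptions, and the names are illustrative.

```python
# How long does 2 TB take over a 10 Mbit/s uplink?
# ASSUMPTIONS: 2 TB = 2 * 10**12 bytes; the link is fully utilised (efficiency=1.0).
DATA_BYTES = 2 * 10**12           # "Over 2TB Data" from the data size slide
LINK_BITS_PER_SEC = 10 * 10**6    # 10 Mbit/s from the slide
SECONDS_PER_DAY = 86_400


def upload_days(data_bytes: int, link_bits_per_sec: int, efficiency: float = 1.0) -> float:
    """Days needed to push data_bytes through the link at the given efficiency."""
    seconds = data_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / SECONDS_PER_DAY


days = upload_days(DATA_BYTES, LINK_BITS_PER_SEC)
print(f"{days:.1f} days")  # 18.5 days
```

Even in this best case a single full upload ties up the office connection for more than two weeks, which is why having the data already in the cloud matters.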
now
On Amazon Elastic Map Reduce
now
• Scalability -> 800 m1.large instances
• Off-the-shelf tech that’s discoverable (Hadoop, mrjob)
• Maintenance reduced
• Data is already in cloud (source, and destination)
• More predictable costs (and happier costs, with spot pricing)
• DevOps benefits: Now any engineer can write and run jobs, not just 3
pipeline
AWS S3
AWS EMR
AWS S3
AWS CloudFront
AWS EC2
mrjob
• mrjob from Yelp
• http://github.com/Yelp/mrjob
800 machines in 20 lines

from mrjob.job import MRJob


class MyMapReduceJob(MRJob):

    def mapper_init(self):
        self.__mapper = ...  # wire up mapper (elided on the slide)

    def mapper(self, key, line):
        # perform map work
        for key, value in self.__mapper.map(line, None):
            yield str(key), value

    def reducer_init(self):
        self.__reducer = ...  # wire up reducer (elided on the slide)

    def reducer(self, key, values):
        # perform reduce work
        result = self.__reducer.reduce(key, values)
        if result:
            yield key, "built successfully"
        else:
            yield key, "failed"


if __name__ == '__main__':
    MyMapReduceJob.run()
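The job above leaves the mapper and reducer objects elided. The sketch below shows one shape they might take and runs the same map, group, reduce flow in memory, so it works without mrjob or a cluster. Everything here is an assumption for illustration: TileMapper, TileReducer, run_locally, the "tile_id,layer" input format and the rule that a tile needs all three layers are hypothetical and not taken from the talk.

```python
from collections import defaultdict


class TileMapper:
    """Hypothetical stand-in for the object wired up in mapper_init."""

    def map(self, line, context):
        # ASSUMED input format: "<tile_id>,<layer_name>" per line.
        tile_id, layer = line.strip().split(",")
        yield tile_id, layer


class TileReducer:
    """Hypothetical stand-in for the object wired up in reducer_init."""

    # Layer names borrowed from the data size slide; the rule itself is assumed.
    REQUIRED_LAYERS = {"terrain", "buildings", "roads"}

    def reduce(self, key, values):
        # A tile "builds" only if every required layer arrived for it.
        return self.REQUIRED_LAYERS.issubset(set(values))


def run_locally(lines):
    """Map, group by key, then reduce: an in-memory stand-in for Hadoop's phases."""
    mapper, reducer = TileMapper(), TileReducer()
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper.map(line, None):
            grouped[str(key)].append(value)
    results = {}
    for key in sorted(grouped):
        ok = reducer.reduce(key, grouped[key])
        results[key] = "built successfully" if ok else "failed"
    return results


sample_input = [
    "tile-1,terrain",
    "tile-1,buildings",
    "tile-1,roads",
    "tile-2,terrain",
]
results = run_locally(sample_input)
print(results)  # {'tile-1': 'built successfully', 'tile-2': 'failed'}
```

With mrjob the grouping step is handled by Hadoop, and the same job class can be pointed at EMR by selecting the runner with `-r emr` on the command line rather than changing the code.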
amazon emr
Processing Complexity X Data Size
thanks
appsonmaps.com
Twitter: @eeGeo