Hadoop: today and tomorrow
© Hortonworks Inc. 2012
Hadoop: Today and Tomorrow
Steve Loughran, Hortonworks
stevel at hortonworks.com
@steveloughran
London, April 2012
About me:
• HP Labs:
  – Deployment, cloud infrastructure, Hadoop-in-Cloud
• Apache: member and committer
  – Ant (author, Ant in Action), Axis 2
  – Hadoop
    – Dynamic deployments
    – Diagnostics on failures
    – Cloud infrastructure integration
• Joined Hortonworks in 2012
  – UK based: R&D + customer engagement
About Hortonworks
Hadoop at Yahoo!
40K+ Servers
170PB Storage
5M+ Monthly Jobs
1000+ Active Users
From developing and running the world's largest Hadoop clusters to advancing open source Apache Hadoop for the broader market
HDP, training & support
2011
Where is Hadoop?
• Today: Hadoop 1.x
  – Status & Roadmap
• Tomorrow: Hadoop 2.x
  – YARN
  – HDFS HA
•Enterprise integration
Releases slowed with Hadoop take-up
• 64 Releases
• Branches from the last 2.5 years:
  – 0.20.{0,1,2}: Stable release without security
  – 0.20.2xx.y: Stable release with security
  – 0.21.0: released, unstable, deprecated
  – 0.22.0: orphan, unstable, lack of community
Now: two release branches, one dev
Hadoop 1.x
• Stable, used in production systems
• The one to use today
Hadoop 2.0
• The successor
• Not quite ready for use
Hadoop 2.x "trunk"
• Where features & fixes first go in
• If you want to help, start here
Today: Hadoop 1.x
• A stable Hadoop release from the ASF
  – Merges various Hadoop 0.20.* branches (security, HBase support, …)
  – A stable branch for patching and back-porting
• Highlights:
  – Security
  – HBase support (“append” operation)
  – WebHDFS
  – “new” MapReduce APIs complete & usable
  – Distribution packaging includes RPM files
WebHDFS: fast direct HTTP access
~:$ GET http://nnode:50070/webhdfs/v1/results/part-r-00000.csv?op=open
GATE4,eb8bd736445f415e18886ba037f84829,55000,2007-01-14,14:01:54,
GATE4,ec58edcce1049fa665446dc1fa690638,8030803000,2007-01-14,13:52:31,
GATE4,b6f07ce00f09035a6683c5e93e3c04b8,30000,2007-01-28,12:41:11,
GATE4,a1bc345b756090854e9dd0011087c6c0,30000,2007-01-28,12:59:33,
...
Potential Uses:
Out of cluster access to HDFS
Cross-cluster, cross version HDFS access
Native filesystem clients
dfs.webhdfs.enabled=true
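The URL in the GET example above follows a fixed pattern: NameNode host and HTTP port, the /webhdfs/v1 prefix, the HDFS path, then an op query parameter. A minimal Python sketch of that pattern is below. The helper name webhdfs_url is hypothetical, and the host "nnode" and port 50070 are simply the values from the slide. The sketch only builds the URL and makes no network call.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op="OPEN", **params):
    """Build a WebHDFS URL of the form shown on the slide (hypothetical helper)."""
    # All WebHDFS paths live under the fixed /webhdfs/v1 prefix.
    hdfs_path = quote("/" + path.lstrip("/"))
    # The operation and any extra parameters travel in the query string.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{hdfs_path}?{query}"


if __name__ == "__main__":
    # Host, port and file path are the example values from the slide.
    print(webhdfs_url("nnode", 50070, "/results/part-r-00000.csv"))
```

The slide writes op=open in lower case; the sketch uses the upper-case spelling on the assumption that the operation name is matched case-insensitively. A real client would also need to follow the HTTP redirect that the NameNode issues for reads, which points at a DataNode holding the data.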
Hortonworks Data Platform HDP1
Based on Hadoop 1.0, adds
– HCatalog for table and schema management
– Open APIs for metadata, data movement, app & job management
– Consumable “standard Hadoop” stack:
Hadoop 1.0.x core (HDFS, MapReduce)
Pig 0.9.x data flow programming language
Hive 0.8.x SQL-like language
HBase 0.92.x column table datastore
HCatalog 0.3.x table and schema management
ZooKeeper 3.4.x coordinator
Post-SQL KVS & Column Tables
Project Voldemort
Analysis tooling maturing
DataFu
Pig
Ingress
facebook / scribe
Fluentd
Kafka
Keep an eye on the graph layer
Apache Giraph
Hama
Workshop: Beyond MapReduce
Tomorrow: Hadoop 2.0
• HDFS Federation
  – Clear separation of Namespace and Block Storage
  – Snapshots
  – Improved scalability and isolation
• HDFS HA
  – Active/Standby failover of Namenodes
• Next Generation MapReduce architecture (aka YARN)
  – New architecture enables other application types to plug in
  – Resource Manager a foundation for HA and fault tolerance
• Performance!
In beta 2012
HDFS HA
[Diagram]
• NN Active and NN Standby
• FailoverController Active and FailoverController Standby
  – Monitor health of NN, OS, HW
  – Cmds to NN
  – Heartbeat to ZK
• ZK (three nodes)
• DN (three nodes)
  – Block reports to Active & Standby
  – DN fencing: update cmds from one
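The failover flow in the diagram (monitor the active NameNode's health, fence it on failure, promote the standby) can be illustrated with a toy model. This is a sketch under stated assumptions, not Hadoop's actual failover controller: the class names NameNode and ToyFailover are hypothetical, health is a plain boolean rather than a real NN/OS/HW probe, and ZooKeeper coordination is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    name: str
    healthy: bool = True
    state: str = "standby"  # "active", "standby" or "fenced"


class ToyFailover:
    """Toy active/standby failover; not Hadoop's real controller code."""

    def __init__(self, active, standby):
        active.state = "active"
        standby.state = "standby"
        self.active = active
        self.standby = standby

    def check(self):
        """Run one monitoring round and return the current active NN."""
        if self.active.healthy:
            return self.active
        if not self.standby.healthy:
            raise RuntimeError("no healthy NameNode available")
        # Fence the failed active first, so that only one NN can ever
        # issue update commands, then promote the standby.
        self.active.state = "fenced"
        self.standby.state = "active"
        self.active, self.standby = self.standby, self.active
        return self.active


if __name__ == "__main__":
    nn1, nn2 = NameNode("nn1"), NameNode("nn2")
    controller = ToyFailover(nn1, nn2)
    nn1.healthy = False  # simulate a failed health check
    print(controller.check().name, nn1.state)
```

Fencing before promotion is the ordering that matters here: it is what prevents two NameNodes from both acting as active at the same time.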
YARN: foundation of a datacentre OS
Multiple topology-aware applications in a single cluster
Microsoft embraces Hadoop
Good for enterprises & developers
Great for end users!
Oracle accepts NoSQL
May 2011: “Don't be risking your data on NoSQL databases.”
Sept 2011: “Oracle NoSQL Database provides network-accessible multi-terabyte distributed key/value pair storage with predictable latency.”
• Oracle need compatible SQL & NoSQL business plans
• & to justify high-end servers over “commodity” x86 boxes
• Could drive Hadoop-centric JVM development
Open Source “Enterprise” Tooling
Application Layer
• Spring Data for Hadoop in Beta
• Cascading → Apache 2.0 License
OS Layer
• RedHat building Hadoop story
• Canonical assisting Hadoop packaging
What does all this mean?
facebook: 45 PB, Yahoo! 180+PB
Hadoop has the momentum
• Platform: stable version & evolving version
• Tooling & layers: ecosystem
• Commercial training and support
• Adoption by enterprise vendors
Hadoop is the Big Data Platform
Get involved with the Apache project!
• Join the -user mailing lists
  – [email protected]
  – [email protected]
  – [email protected]
• File bug reports in JIRA
• Contribute to the documentation
• Add: patches, tests, features, …
Questions?
hortonworks.com