Data infrastructure and Hadoop at LinkedIn
![Page 1: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/1.jpg)
Big data and Hadoop
September 2012
Hari Shankar Menon
Software engineer
![Page 2: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/2.jpg)
About me
• LinkedIn Engineering Data warehouse team
• Previously, software engineer @Clickable
  – Worked on building the reporting and analytics platform on Hadoop and HBase
• Hadoop and open-source enthusiast
![Page 3: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/3.jpg)
Agenda
• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges
![Page 4: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/4.jpg)
Our mission
Connect the world’s professionals to make them more productive and successful
![Page 5: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/5.jpg)
LinkedIn by numbers
• LinkedIn members: 175M+
• LinkedIn members (millions) by year: 2004: 2, 2005: 4, 2006: 8, 2007: 17, 2008: 32, 2009: 55, 2010: 90
• 85% of Fortune 100 companies use LinkedIn to hire
• Company Pages: >2M **
• New members joining: ~2/sec
• Professional searches in 2011: ~4.2B
* as of Nov 4, 2011
** as of June 30, 2011
![Page 6: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/6.jpg)
• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges
![Page 7: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/7.jpg)
What is big data?
* Chart from Philip Russom, Research Director, TDWI
![Page 8: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/8.jpg)
Infrastructure technologies
• Primary data store (front-end)
• Distributed key-value store
• Document-oriented store
• Distributed pub-sub messaging
• Database change replication: Databus
• Search technologies: SenseiDB, Zoie, Bobo
![Page 9: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/9.jpg)
Open source
http://data.linkedin.com/opensource
![Page 10: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/10.jpg)
• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges
![Page 11: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/11.jpg)
• What is Hadoop
• Evolution of Hadoop
• Impact
![Page 12: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/12.jpg)
Hadoop@LinkedIn
• Recommendation systems
  – Generating recommendations
  – Modeling
  – A/B testing
  – Grandfathering
• Data warehouse/ETL
  – Raw data storage
  – Aggregations
  – Heavy lifting
• Data sciences
  – Strategic analyses
  – Experimentation sandbox
![Page 13: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/13.jpg)
The Recommendations opportunity
Pandora Search for People
Events You May Be Interested In
Groups browse maps
• Relevance/Latency
• Offline computation
• Caching
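
The "offline computation" and "caching" bullets can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration: the slide does not describe the actual pipeline, the shared-connection score is only a stand-in for a real relevance model, and `compute_recommendations_offline` and `RecommendationCache` are hypothetical names. The expensive ranking runs as a batch step, and the serving path only does a key lookup.

```python
from collections import defaultdict


def compute_recommendations_offline(connections, top_n=3):
    """Batch step: rank each member's non-connections by shared connections."""
    recs = {}
    for member, friends in connections.items():
        counts = defaultdict(int)
        for friend in friends:
            for candidate in connections.get(friend, ()):
                # Skip the member themselves and people they already know.
                if candidate != member and candidate not in friends:
                    counts[candidate] += 1
        # Highest shared-connection count first; ties broken by name.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        recs[member] = [candidate for candidate, _ in ranked[:top_n]]
    return recs


class RecommendationCache:
    """Serving step: a read-only lookup over the precomputed results."""

    def __init__(self, precomputed):
        self._store = dict(precomputed)

    def get(self, member):
        # Low latency: no ranking happens at request time.
        return self._store.get(member, [])


# Hypothetical toy connection graph.
connections = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
}
cache = RecommendationCache(compute_recommendations_offline(connections))
```

In this sketch a request for an unknown member simply returns an empty list; a real system would need a fallback and a refresh schedule for the batch step.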
![Page 14: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/14.jpg)
Improving recommendations
• Mathematical modeling
• A/B Testing
• Grandfathering
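
As an illustration of the A/B testing bullet, here is a minimal sketch of deterministic bucket assignment. It is an assumption, not the method from the talk: hashing the member id with the experiment name is one common way to keep a member in the same variant across visits, and `assign_bucket` and its parameters are hypothetical names.

```python
import hashlib


def assign_bucket(member_id, experiment, treatment_share=0.5):
    """Map a member to 'treatment' or 'control' for one experiment.

    The same (experiment, member_id) pair always lands in the same bucket,
    and different experiments split members independently of each other.
    """
    key = f"{experiment}:{member_id}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    # Turn the first 32 bits of the hash into a fraction in [0, 1).
    fraction = int(digest[:8], 16) / 0x100000000
    return "treatment" if fraction < treatment_share else "control"


# Hypothetical usage: roughly half of the members see the new model.
buckets = [assign_bucket(member_id, "new-ranking-model") for member_id in range(10000)]
treatment_fraction = buckets.count("treatment") / len(buckets)
```

Because assignment depends only on the hash, no per-member state has to be stored to keep the split stable.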
![Page 15: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/15.jpg)
Hadoop in the Data warehouse
• Source of truth
• Lower retention
• Ad-hoc analysis

• Longer retention
• Complex transformations
• Algorithmic computations
![Page 16: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/16.jpg)
Hadoop in Data Sciences
• Deep dives
• Sandbox
• Hackday projects
![Page 17: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/17.jpg)
Data Insights - 1
Job migration after financial collapse
![Page 18: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/18.jpg)
Data Insights - 2
![Page 19: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/19.jpg)
Data Insights - 3
![Page 20: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/20.jpg)
• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges
![Page 21: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/21.jpg)
Challenges
1. User adoption of new technologies
2. Real-time processing
3. Graph/Network algorithms
4. Making data accessible
![Page 22: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/22.jpg)
User adoption
![Page 23: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/23.jpg)
Real-time processing
• Challenges
  – Random reads/writes
  – Warm-up time
• Solutions
  – Parts of the problem that can be moved offline?
  – HBase, Voldemort
![Page 24: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/24.jpg)
Map-reduce-incompatible problems
• Graph problems
• Traditional joins
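
To suggest why a traditional join sits awkwardly in map-reduce, here is a minimal pure-Python sketch of a reduce-side join. It only imitates the map, shuffle, and reduce phases in memory and does not use the Hadoop API; the function names and the sample member/view data are hypothetical. What a database expresses as a single JOIN needs records tagged by source, a full shuffle on the join key, and a per-key pairing step.

```python
from collections import defaultdict
from itertools import product


def map_phase(members, views):
    """Emit (join key, tagged value) pairs so the reducer can tell sources apart."""
    for member_id, name in members:
        yield member_id, ("member", name)
    for member_id, page in views:
        yield member_id, ("view", page)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Pair every member record with every view record sharing the key (inner join)."""
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "member"]
        pages = [value for tag, value in groups[key] if tag == "view"]
        for name, page in product(names, pages):
            yield key, name, page


# Hypothetical sample data: member 3 has a view but no member record.
members = [(1, "ann"), (2, "bob")]
views = [(1, "/jobs"), (2, "/groups"), (1, "/inbox"), (3, "/home")]
joined = list(reduce_phase(shuffle(map_phase(members, views))))
```

Every record from both inputs crosses the shuffle here, which hints at why joins and multi-hop graph traversals tend to be expensive in this model.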
![Page 25: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/25.jpg)
Making data accessible
• Hadoop Tons of data
![Page 26: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/26.jpg)
Finally!
No Silver bullet
Hadoop Offline processing
Scalability by design
![Page 27: Data infrastructure and Hadoop at LinkedIn](https://reader034.fdocuments.in/reader034/viewer/2022050815/54566fd2af795953128b45b5/html5/thumbnails/27.jpg)
www.linkedin.com/in/harisreekumar
www.linkedin.com/company/linkedin/careers