Building a Business on Open Source Distributed Computing
Upload: oleksiy-kovyrin
8/14/2019 Building a Business on Open Source Distributed Computing
Building a Business on Open Source Distributed Computing
company: www.visibletechnologies.com
blog: www.roadtofailure.com
twitter: @lusciouspear
Sunday, December 20, 2009
-
Social Media and Scaling
Scalability Matters Now.
SM produces large, complex data
Anyone can collect the web
Make a Twitter in a few days
Easy to get TBs of data
Big Data enabling new fields for companies
-
What Visible Does
BI and Brand Management on Social Media
Listen, Monitor, Engage
-
Old Product: RDBMS
A few MSSQL servers on boxes
Lots of ETL
Several TB, inserts slow, deletes impossible, random fail
-
Why RDBMS Bad
Nonlinear scale cost
Used as a storage abstraction
Mainly Select, Join, Group, Count
Specialized Scale-Out ones meh
Impedance Mismatch - Try to be High-Throughput, Low-Latency
Swiss-army knife, unstable, transactions, advanced SQL, tuning
-
Why OSS?
Previously all MS
It exists!
Scaling + Licensing = No
Can't build a platform without source
It's Enterprise Now!
-
Goals for New Platform
Golden Timeline
Search/Analyze *any* data
Linear Cost
Not Hacked Together
Collect the Social Internet
-
HOW TO SCALE
What makes you special?
What are you willing to sacrifice?
How will you structure the data?
-
Avoiding Impedance Mismatch
Most problems can be divided into High or Low latency
Get a lot of data eventually, or a little now
MapReduce vs. Sharding/Indexing
-
Ecosystem
[Diagram] Components: Hadoop DFS, HBase, Hive, MapReduce, Cascading, Pig, Katta/Applications, Zookeeper
[Diagram] Layers: Unstructured Storage, Structured Storage, Raw Processing, Compiled Processing
-
Simple Workflow
[Diagram] Steps: Collect, Semantic Analysis, Unstructured Analysis, Store in HBase, Structured Analysis, Indexing, Store in Hadoop, Pull Indexes, Load/Replicate Shards, Search
[Diagram] Technology labels: Hadoop; Hadoop + HBase; Lucene + Solr + Katta
-
Unstructured Processing Cluster
[Diagram] Steps: Collect, Semantic Analysis, Unstructured Analysis
[Diagram] Labels: Internet, XML, HTML, HBase Records, Structured Store
-
Hadoop + MR
Special: Crunch web-scale data fast
Sacrifice: Low-Latency, Transactions, Random Access, Updates
Structure: Chunked flat files
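
The trade-off on this slide can be sketched in a few lines of plain Python. This is a toy illustration of the map, shuffle, and reduce steps over chunked flat files, not the Hadoop API. The record layout (tab-separated content ID and post date) and all function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: each chunk is a list of flat-file lines,
# "content_id<TAB>post_date". The field layout is an assumption.
CHUNKS = [
    ["00BAC189\t20090718", "00BAC18A\t20090718"],
    ["00BAC18B\t20090719"],
]

def map_chunk(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (post_date, 1) for every record in one chunk."""
    for line in lines:
        _content_id, post_date = line.split("\t")
        yield post_date, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapped values by key, as a framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the ones emitted for a single key."""
    return key, sum(values)

def run_job(chunks: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Run the whole job in-process; chunks are read front to back, never updated."""
    mapped = (pair for chunk in chunks for pair in map_chunk(chunk))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

posts_per_day = run_job(CHUNKS)
```

Every chunk is scanned in full and nothing is updated in place, which is why this style gives up low latency and random access in exchange for throughput.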
-
Structured Processing Cluster
[Diagram] Steps: Store in HBase, Structured Analysis, Indexing, Store in Hadoop
[Diagram] Labels: HBase Records, Unstructured Cluster, Search Cluster, Lucene Index, Sharded Lucene Index, Enriched Data
-
Document Structure
ContentID: 00BAC189
Title: Iron Maiden Rules
Body: I think Janick Gers is an amazing guitarist blah blah
PostDT: 20090718
ParentID: 0FDEADBEEF
Permalink: www.roadtofailure.com/post?=20
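
One way to carry this record through the pipeline is as a small typed structure. The sketch below is an assumption-laden illustration: the field types, the snake_case names, and the reading of PostDT as a YYYYMMDD date are guesses from the example values, not something the slide states.

```python
from dataclasses import asdict, dataclass
from datetime import date, datetime

@dataclass(frozen=True)
class Document:
    """Hypothetical in-memory form of the record shown on the slide."""
    content_id: str
    title: str
    body: str
    post_dt: date
    parent_id: str
    permalink: str

def parse_post_dt(raw: str) -> date:
    """Parse PostDT, assuming the YYYYMMDD layout seen in the example value."""
    return datetime.strptime(raw, "%Y%m%d").date()

doc = Document(
    content_id="00BAC189",
    title="Iron Maiden Rules",
    body="I think Janick Gers is an amazing guitarist blah blah",
    post_dt=parse_post_dt("20090718"),
    parent_id="0FDEADBEEF",
    permalink="www.roadtofailure.com/post?=20",
)

# A plain dict is convenient when handing the record to a store or an indexer.
record = asdict(doc)
```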
-
HBase
Special: Scalable random/sequential access almost as fast as RDBMS
Sacrifice: Joins, Secondary Indexes, Transactions (kind of)
Structure: BigTable - column oriented
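
The BigTable-style layout can be pictured as a map sorted by row key, where each row holds "family:qualifier" columns. The class below is a toy stand-in written for this illustration, not the HBase client API. The class name, column names, and row keys are all hypothetical.

```python
import bisect

class TinyColumnStore:
    """Toy stand-in for a BigTable-style table; not the HBase client API."""

    def __init__(self):
        self._keys = []   # row keys kept sorted, so range scans are sequential
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Write one cell, creating the row on first use."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key, column):
        """Random access: one row, one column; None if absent."""
        return self._rows.get(row_key, {}).get(column)

    def scan(self, start_key, stop_key):
        """Sequential access: rows with start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]

table = TinyColumnStore()
table.put("00BAC189", "content:title", "Iron Maiden Rules")
table.put("00BAC189", "meta:post_dt", "20090718")
table.put("00BAC18A", "content:title", "Another post")
table.put("00BAD000", "content:title", "Out of range")
```

Lookups and scans both go through the row key. There is no join and no secondary index, so a query by title would need a separate structure, which is the sacrifice the slide names.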
-
Search Cluster
[Diagram] Steps: Pull Indexes, Load/Replicate Shards, Search
[Diagram] Labels: Lucene Indexes from HDFS, Lucene Indexes
-
Search
-
Katta + Solr
Special: Sharded search
Sacrifice: Consistency, high-throughput
Structure: Reverse index
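
Sharded search over a reverse (inverted) index can be sketched as follows. This is a plain-Python illustration and uses none of the Katta, Solr, or Lucene APIs. The CRC32-based shard assignment, the whitespace tokenizer, and the class names are assumptions chosen to keep the example short.

```python
import zlib
from collections import defaultdict

class Shard:
    """One index shard: maps each term to the IDs of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return self.postings.get(term.lower(), set())

class ShardedIndex:
    """Routes each document to one shard and fans every query out to all shards."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Deterministic routing; the hashing scheme is an assumption of this sketch.
        shard_no = zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)
        self.shards[shard_no].add(doc_id, text)

    def search(self, term):
        hits = set()
        for shard in self.shards:
            hits |= shard.search(term)
        return sorted(hits)

index = ShardedIndex(num_shards=3)
index.add("00BAC189", "Iron Maiden Rules")
index.add("00BAC18A", "Janick Gers rules the stage")
index.add("00BAC18B", "Unrelated post")
matches = index.search("rules")
```

Each query touches every shard, and a shard only knows about the documents it has already loaded, which is where the consistency and throughput sacrifices come from.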
-
BI
Group, Sort, Filter, Count, Sum
Semi-additive (Avg) rare but not hard
MapReduce Jobs
Faceted Search
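
A short sketch of why an average is "not hard" in a MapReduce-style job: each chunk produces a partial (sum, count) per group, the partials are merged, and the division happens only at the end. The records, brand names, and function names here are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (brand, comment_count) pairs split across two chunks.
CHUNKS = [
    [("acme", 4), ("acme", 6), ("globex", 10)],
    [("acme", 2), ("globex", 20)],
]

def partial_aggregate(records):
    """Per-chunk step: accumulate [sum, count] for each group."""
    partial = defaultdict(lambda: [0, 0])
    for group, value in records:
        partial[group][0] += value
        partial[group][1] += 1
    return partial

def merge(partials):
    """Combine step: sums and counts are additive, so partials simply add up."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (value_sum, count) in partial.items():
            total[group][0] += value_sum
            total[group][1] += count
    return total

merged = merge(partial_aggregate(chunk) for chunk in CHUNKS)

# The average itself is not additive, so it is derived only after merging.
averages = {group: value_sum / count for group, (value_sum, count) in merged.items()}
```

Averaging the per-chunk averages would give the wrong answer whenever chunks hold different numbers of records, which is why the job carries sums and counts instead.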
-
Examples
-
Challenges
Scaling Search
Understanding Latency
What do we need now? Can customers wait for big data?
Monitoring
-
Recap: Rules for Scaling
RDBMS is not a Swiss-Army Knife
Know your sacrifices
Know your specialness
Know your data structure
Ponder Latency
-
What Next?
HBase Analytics?
What would make a bank trust it
Teach people to think about data
-
...
-
The End
company: www.visibletechnologies.com
blog: www.roadtofailure.com
twitter: @lusciouspear