Post on 24-Jun-2015
Calpont InfiniDB® Accelerating Data Insights
Huge Data Analytics: Calpont InfiniDB Columnar DBMS Empowers New Research with The World’s First Searchable Genotype Database
Strata Conference 2012
Calpont Proprietary and Confidential
InfiniDB® Scalable. Fast. Simple. Copyright © 2012 Calpont. All Rights Reserved.
Today’s Agenda
• Introduction of today’s speakers
• What is InfiniDB?
• Announced today: InfiniDB 3
• Huge Data Analytics: InfiniDB Empowers New Research with The World’s First Searchable Genotype Database
• Questions
• More information and resources
Today’s Presenters
Fernanda Foertter, HPC Administrator / Scientific Programmer, Genus plc
Jim Tommaney, Chief Technology Officer, Calpont Corporation
What is InfiniDB?
Calpont Corporation
• Company
  o Privately held and backed
  o Offices: Dallas (Headquarters), Silicon Valley
• Business
  o Scale-out MPP analytic database
  o MySQL Columnar + Map Reduction
  o Commercial Open Core model
• Products
  o InfiniDB Enterprise: Forthcoming 4th major release
  o InfiniDB Community: Modified Open Source license
Calpont Mission: To provide a highly scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
Innovative Companies Turning to InfiniDB
What is InfiniDB?
Scalable
Fast
Simple
What is InfiniDB?
Full-Featured SQL
Familiar MySQL Look and Feel
Big Data Analytics Engine
Game Changing Performance
Focus on Analytics Workloads
InfiniDB is …
• Engineered for large queries
• Engineered for ad-hoc flexibility
• Analytics, not OLTP
• Unique combination of columnar + map-reduce
InfiniDB – Two Tier Architecture
Purpose built for big data analytics.
• User Module (UM): Understands SQL.
• Performance Module (PM): Operates on data blocks.
or … Single Server
InfiniDB Performance Foundations
The Power and Scale of Map-Reduce plus
Transformational I/O Efficiency
Power and Scalability of Map-Reduce
SQL operations are mapped to Performance Module threads:
• Parallel/Distributed Data Access
• Parallel/Distributed Joins (Inner, Outer)
• Parallel/Distributed Sub-queries (From, Where, Select)
• Parallel/Distributed Group By, Distinct, and Aggregation
• Extensible with Parallel/Distributed User Defined Functions
Results are returned to User Module in Reduce Phase
Map ↓↓↓↓↓ Reduce ↑↑↑↑↑
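The map and reduce phases described above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not InfiniDB's implementation: the function names, the thread pool standing in for Performance Module threads, and the sample data blocks are all assumptions made for the example. Each worker aggregates its own block (map), and a single caller merges the partial results (reduce), much as a SUM ... GROUP BY might be split up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_partial_sums(block):
    """Map phase: one worker aggregates its own block of (key, value) rows."""
    partial = defaultdict(int)
    for key, value in block:
        partial[key] += value
    return dict(partial)


def reduce_partials(partials):
    """Reduce phase: merge the per-worker partial sums into one result."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def grouped_sum(blocks, workers=4):
    """Compute SUM(value) GROUP BY key over independent data blocks."""
    # The thread pool is a hypothetical stand-in for Performance Module threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partial_sums, blocks))
    return reduce_partials(partials)


# Hypothetical data blocks, each holding (group key, value) rows.
blocks = [
    [("A", 1), ("B", 2)],
    [("A", 3), ("C", 4)],
    [("B", 5), ("C", 6)],
]
result = grouped_sum(blocks)
print(result)
```

Because each block is aggregated independently, adding workers or blocks does not change the merge logic; only the partial results travel back to the caller.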
InfiniDB is not: … a Hadoop-style map-reduce framework.
InfiniDB is: … a custom-built and highly optimized map-reduce framework for queries.
Transformational I/O Efficiency
Techniques to Avoid Unnecessary I/O
  o Vertical Partitioning: read only the columns required
  o Horizontal Partitioning: focus on the rows required
  o Just-in-time materialization
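The three techniques above can be illustrated with a small Python sketch. Everything here is an assumption made for the example (the extent size, the min/max bookkeeping, the table contents, and the function names); it shows the general idea rather than how InfiniDB stores data. Columns are held separately, each column is split into row ranges with a recorded minimum and maximum, and the output column is fetched only for rows that survive the filter.

```python
EXTENT_ROWS = 4  # hypothetical number of rows per extent


def build_extents(values):
    """Split one column into extents and record each extent's min and max."""
    extents = []
    for start in range(0, len(values), EXTENT_ROWS):
        chunk = values[start:start + EXTENT_ROWS]
        extents.append(
            {"start": start, "min": min(chunk), "max": max(chunk), "rows": chunk}
        )
    return extents


# Hypothetical table, stored one column at a time (vertical partitioning).
table = {
    "animal_id": build_extents([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]),
    "year": build_extents([2004] * 4 + [2005] * 4 + [2006] * 4),
    "weight": build_extents([90, 95, 88, 91, 97, 99, 93, 94, 101, 98, 96, 100]),
}


def select_where_equals(table, filter_col, target, output_col):
    """Return output_col values for rows where filter_col == target."""
    extents_read = 0
    matching_rows = []
    for ext in table[filter_col]:
        # Horizontal partitioning: skip extents whose range cannot match.
        if target < ext["min"] or target > ext["max"]:
            continue
        extents_read += 1
        for offset, value in enumerate(ext["rows"]):
            if value == target:
                matching_rows.append(ext["start"] + offset)
    # Just-in-time materialization: read the output column only for matches.
    output = []
    for row in matching_rows:
        ext = table[output_col][row // EXTENT_ROWS]
        output.append(ext["rows"][row % EXTENT_ROWS])
    return output, extents_read


weights, extents_read = select_where_equals(table, "year", 2005, "weight")
print(weights, extents_read)
```

In this sketch the `animal_id` column is never touched, and only one of the three `year` extents is scanned.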
Transformational I/O Efficiency
Techniques for Efficient I/O
  o Columnar compression reduces I/O from disk
  o Global data buffer cache can reduce disk I/O
  o Real-time decompression accelerates reads from disk
  o Avoidance of Random I/O
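As a rough illustration of why columnar compression cuts I/O, the sketch below compresses a single column of low-cardinality values with Python's standard `zlib` module. The choice of `zlib`, the synthetic column of 0/1/2 values, and the column length are assumptions for the example; the slides do not say which compression algorithm InfiniDB uses.

```python
import random
import zlib

random.seed(0)  # fixed seed so the synthetic column is reproducible

# Hypothetical column: 100,000 values drawn from {0, 1, 2}, one byte each.
column = bytes(random.choice((0, 1, 2)) for _ in range(100_000))

# zlib is a stand-in here, not the algorithm the product uses.
compressed = zlib.compress(column)
restored = zlib.decompress(compressed)

ratio = len(compressed) / len(column)
print(f"raw bytes: {len(column)}, compressed bytes: {len(compressed)}")
print(f"compressed size is {ratio:.0%} of the raw size")
```

A column holds values of one type and a narrow range, so it tends to compress well; fewer bytes on disk means fewer bytes to read, at the cost of decompressing them in memory.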
Simple - Automatic Everything
• Vertical Partitioning
• Horizontal Partitioning
• Compression
• Compression Algorithm Selection
• Distribution of data across disk resources
• Distribution of work across CPU resources
InfiniDB 3 Announced Today
InfiniDB 3: It is Now Possible...
InfiniDB 3
Where I Work
Breeding Values
Genetic Evaluation
Phenotype: Meat Quality
Selection for Lean Growth
1980 2005
Halothane Gene (1991)
• Gene is associated with:
  o High carcass yield
  o Stress triggers hyperthermia
  o Poor meat quality
X (Nn/nn)
(NN)
DNA Marker Use
Timeline, 1990 to 2009:
• 1991: HAL
• 1994: ESR
• 1998: RN & MC4R
• 1999: FUT1 & PRKAG3
• 2003: MIS
• 2004: Large-scale SNP discovery
Phases:
• 1991 - 2002: Single genes, QTL, candidate genes
• Subsequently: large-scale SNP discovery, genome scans, sequencing
Sudden Data Growth
[Chart: Porcine SNP Panel Density. Number of SNPs (axis 0 to 70,000) by year, 2004 to 2009]
Sudden Data Growth
Sample Collection
[Chart: Animals (cumulative), axis 0 to 16,000,000, by year, 1991 to 2011]
[Chart: Tissue (cumulative), axis 0 to 3,500,000, by year, 1991 to 2011]
Genetic Evaluation
EBV: Lean Yield, Meat Quality, Robustness, Feed efficiency, etc.
Index = a1 × EBV1 + a2 × EBV2 + . . .
where a1, a2, . . . are the economic weights
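The index formula is a weighted sum, which a short Python sketch can make concrete. The trait names follow the slide, but the EBV values and economic weights below are invented numbers chosen only so the arithmetic is easy to follow, and the function name is hypothetical.

```python
def selection_index(ebvs, weights):
    """Compute Index = a1 * EBV1 + a2 * EBV2 + ... over matching traits."""
    if ebvs.keys() != weights.keys():
        raise ValueError("each trait needs both an EBV and an economic weight")
    return sum(weights[trait] * ebvs[trait] for trait in ebvs)


# Invented example values for one animal (not real data).
ebvs = {"lean_yield": 2.0, "meat_quality": 1.0, "feed_efficiency": -0.5}
weights = {"lean_yield": 3.0, "meat_quality": 2.0, "feed_efficiency": 4.0}

index = selection_index(ebvs, weights)
print(index)  # 3.0*2.0 + 2.0*1.0 + 4.0*(-0.5) = 6.0
```

Ranking animals by such an index lets several traits be weighed against each other in a single number.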
Data Pipeline
Genomic Data Deluge
Project: Genotyping DB
The Need
• Accumulating SNP chip data
• Difficulty searching through
• Next Gen Sequencing
• Cheaper SNP chips
• LOTS of animals
• Other projects needed the data

Other Considerations
• Store large data…BIG data
• Scalable
• Alternative to Oracle
• Minimally impact infrastructure
• Easy for scientists to use
What Do Vendors Provide for Genotype Data?
nothing
Think Outside the (Vendor’s) Box…
All Databases are Not Created Equal
All Vehicles are Not Created Equal
Genomic Data
SNP Data
Animal ID   SNP1   SNP2   SNP3   …   SNP65K
1           0      1      2      1   2
2           1      1      0      0   0
3
4
5           1      2      2      0   2
…
XXXX
Single Research Cohort
What about selection and cohort comparisons?
Column Bases Make More Sense
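A small Python sketch suggests why a column layout suits cohort comparisons on SNP data. The layout, the `cohort` column, the sample genotypes, and the function name are all assumptions made for the example, not the schema used in the project. The point is that a comparison on one SNP touches only two columns, however many SNP columns the table holds.

```python
from collections import Counter

# Hypothetical column-oriented table: one list per column, aligned by position.
columns = {
    "animal_id": [1, 2, 3, 4, 5, 6],
    "cohort": ["A", "A", "A", "B", "B", "B"],
    "snp1": [0, 1, 1, 2, 2, 1],
    "snp2": [1, 1, 0, 0, 2, 2],
}


def genotype_counts_by_cohort(columns, snp):
    """Count each genotype value of one SNP column within each cohort."""
    counts = {}
    # Only the cohort column and the requested SNP column are read.
    for cohort, genotype in zip(columns["cohort"], columns[snp]):
        counts.setdefault(cohort, Counter())[genotype] += 1
    return counts


counts = genotype_counts_by_cohort(columns, "snp1")
for cohort, genotype_counts in sorted(counts.items()):
    print(cohort, dict(genotype_counts))
```

With a row layout, the same query would have to read every SNP value of every animal just to reach the one column of interest.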
InfiniDB: Parallel Columnar DB
Complicated Searches are Faster!
Scales for a Fraction of the Cost
• Compression: Up to 75%
• Speed vs RDBMS: 15X faster
• Scalability: 100’s TB, parallel queries/ingest
• Cost vs Oracle: 25%
Future Projects: Imputation
$150 $150
$15 $15
Caution: Data multiplies in a BIG way
Conclusions
• Helps to have a deep understanding of the scientific problems being solved
• Have a good understanding of the data access pattern
• Tool should solve 80% of the highest use patterns
• Use combination of software, hardware knowledge to improve performance
• Think “out of the vendor box”, especially where research is cutting edge
• Take the lead to show new tools users may not even be aware they want/need
Questions
More Information on InfiniDB
Visit us at:
  o www.Calpont.com
  o www.InfiniDB.org
  o Visit Booth #414 to register to win an iPad 3
Enter for a Chance to Win an iPad 3