©2017 Cambridge Semantics Inc. All rights reserved. Company Confidential
Anzo Smart Data Lake™ - Accelerating Insight
Disrupting the Analytics Time-to-Value Function
Barry Zane, Vice President, [email protected]
Ben Szekely, Vice President, Solution [email protected]
Big Data and Analytics Industry Trends
• We are graduating from pieced-together ETL, Hadoop and BI solutions and consolidating around complete, end-to-end solutions
– Forward-thinking customers are looking to product vendors for innovation, delivery and accountability for value.
– Consolidation of partnerships and acquisitions
Cloud Computing Trends
• Cloud computing is a transformative cost saver for analytics as demand for access to all data grows
– Think beyond infrastructure balance-sheet savings
– Pay only for the analytics compute you use, as business demand rises and peaks.
The importance of “Time-to-Value”
• Time-to-value from data is becoming the key driver for analytics strategy, with self-service assumed
[Figure: Effort plotted against Time to Value across the stages Analyst Request, IT Data Prep, IT Data Extraction, IT Data Enrichment and Data Discovery]
Key Risks
Rising costs from vendor lock-in of data formats and storage, analytics tools, and cloud infrastructure.
Anzo Smart Data Lake: Accelerating Insight
[Diagram: from Disparate Sources to Insight through the Enterprise Knowledge Graph, with the labels Automated Ingestion, Rich Models, Data on Demand, Exploratory Analytics, Knowledge Discovery, Scalability, Security and Governance]
Disrupting the Time-to-Value Function
[Figure: Insights and Value plotted against Time and Resource Investments, comparing Traditional BI and Analytics Tool Chains with Anzo Smart Data Lake across IT Build and Deployment and successive Add New Data steps]
Anzo Smart Data Lake
A Graph-based Platform to Disrupt the Analytics Time-to-Value Function
Connectors, Models, Rules, Analytics & Tools
ASDL Customer Fingerprint - Intellectual Property
[Architecture diagram. Sources: Enterprise Data Sources, Enterprise Data Lakes. Components: Data Ingestion & Mapping; Automated ETL Generation; Collaborative Mapping; Text Processing; ELT, Model-Based Data Integration; Data Cataloging; Data & Model Governance; Active Metadata Management; Role-Based Security; Discovery & Analytics; Automated Query Generation; User Dashboards and Custom UI/UX; Self-Serve Live Extracts; In-Memory MPP Query; Graphmarts on Demand; Document Search. Outcomes: Actionable Insights, “Last Mile” Analytics]
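The "Automated Query Generation" component implies that analysts pick what they want in a UI and the platform writes the query for them. As a minimal sketch of that idea, assuming a hypothetical `Selection` record (class, group-by predicate, measure predicate, aggregate) that is not part of any Anzo API, the code below renders a standard SPARQL 1.1 `GROUP BY` query:

```python
from dataclasses import dataclass

# Hypothetical record of what an analyst selected in a dashboard UI.
@dataclass
class Selection:
    subject_class: str      # prefixed class name, e.g. "ex:Order"
    dimension: str          # predicate to group by
    measure: str            # predicate whose values are aggregated
    aggregate: str = "SUM"  # one of the SPARQL 1.1 aggregates below

ALLOWED_AGGREGATES = {"COUNT", "SUM", "AVG", "MIN", "MAX"}

def build_query(sel: Selection, prefixes: dict) -> str:
    """Render a SPARQL 1.1 aggregate query for one UI selection."""
    agg = sel.aggregate.upper()
    if agg not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {sel.aggregate}")
    lines = [f"PREFIX {p}: <{iri}>" for p, iri in sorted(prefixes.items())]
    lines += [
        f"SELECT ?dim ({agg}(?val) AS ?result)",
        "WHERE {",
        f"  ?s a {sel.subject_class} ;",
        f"     {sel.dimension} ?dim ;",
        f"     {sel.measure} ?val .",
        "}",
        "GROUP BY ?dim",
        "ORDER BY DESC(?result)",
    ]
    return "\n".join(lines)

print(build_query(Selection("ex:Order", "ex:region", "ex:amount"),
                  {"ex": "http://example.org/"}))
```

The sketch does no escaping or validation of the prefixed names it is given; a production generator would need both, plus knowledge of the model to offer only valid predicates.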
Anzo Smart Data Lake – Cloud Deployment
ASDL cloud deployment in Amazon Web Services or Google Cloud Platform
Cloud automation is a significant and strategic component of the Cambridge Semantics roadmap, including deployment, elastic scale and high availability. Our cloud mission is to offer customers lower costs in development, maintenance and operations, using cloud resources efficiently as business needs determine.
[Diagram: the ASDL components from the previous slide deployed in the cloud between Enterprise Data Sources/Enterprise Data Lakes and Actionable Insights/“Last Mile” Analytics, with Elastically Scaled Ingestion, Elastically Scaled Analytics and Scalable Encrypted Storage]
Cloud-delivered ASDL offers faster deployment and on-demand scale
Large-Scale Graph Analytics
PROBLEM
Graph is a simple, clean model for standard analytic queries, and it allows you to do more.
But Graph has historically had terrible performance on standard analytics queries against large-scale data.
If you can’t run the standard “data warehouse” queries at scale, you won’t get to the algorithms that only Graph can perform!
SOLUTION
Build a Graph engine designed for large-scale analytics.
Leverage parallel computing (lots of hardware), scaling to hundreds of servers.
Extend the SPARQL language to backfill functionality present in SQL.
Deploy through a user interface that automatically writes the SPARQL and visualizes the results.
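The "lots of hardware" point rests on the usual massively parallel processing (MPP) pattern: split the data across workers, aggregate locally, then merge the partial results. The Python sketch below shows that generic scatter/aggregate/gather pattern on triples; it is an illustration of the technique, not Anzo's engine, and the function names are hypothetical.

```python
from collections import Counter
from zlib import crc32

def partition(triples, n_workers):
    """Hash-partition triples by subject so each worker owns one shard."""
    shards = [[] for _ in range(n_workers)]
    for s, p, o in triples:
        # crc32 is stable across processes, unlike Python's salted hash() for str.
        shards[crc32(s.encode("utf-8")) % n_workers].append((s, p, o))
    return shards

def local_count(shard, predicate):
    """Partial aggregate computed independently on one worker's shard."""
    return Counter(o for _, p, o in shard if p == predicate)

def parallel_count(triples, predicate, n_workers=4):
    """Scatter, aggregate locally, then gather and merge the partial counts."""
    shards = partition(triples, n_workers)
    # In an MPP engine each local_count call would run on its own server;
    # here they run sequentially to keep the sketch self-contained.
    partials = [local_count(shard, predicate) for shard in shards]
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

orders = [("o1", "region", "EU"), ("o2", "region", "US"), ("o3", "region", "EU")]
print(parallel_count(orders, "region"))
```

Because counting is associative, the merged result is the same for any number of workers, which is what lets this kind of query scale out by adding servers.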
Analytic Landscape
ROLAP - Relational online analytics
• Broad adoption, 45 years of technology evolution
• Based on declarative SQL for business analysts
• Formal ANSI/ISO standard since 1986
GOLAP - Graph-based online analytics
• Narrow adoption, accelerating over the past 15 years
• Based on declarative SPARQL for business analysts
• Formal W3C standard since 2008
Hadoop (Spark) - Offline batch analytics
• Growing adoption since its creation in 2005 (Spark: 2012)
• All queries programmed in Java/Scala/Python…
• Apache and community standards
• Limited only by programmers’ talents and available APIs
GOLAP is a Real Relational Data Warehouse, Really
Relational databases are predefined “rectangular” tables: rows with columns.
– Very natural for subjects (aka rows) with a number of known attributes common to all or most of the subjects.
– Allows columns to be links (aka keys) to other tables’ subjects.
Challenged by:
– Sparsity
– One-to-many needs a separate “join table”
– You need to understand the data in advance
Graphs are truly relational, really. Just a little different from the points above!
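To make the sparsity and one-to-many points concrete, here is a minimal Python sketch (an illustration only, not how Anzo stores data) that flattens relational-style rows into subject/predicate/object triples. The `person` table, its columns and the `rows_to_triples` helper are hypothetical.

```python
def rows_to_triples(table, key, rows):
    """Flatten rows (dicts) into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"{table}/{row[key]}"
        for column, value in row.items():
            if column == key or value is None:
                continue  # sparsity: a missing fact is simply not stored
            # One-to-many: repeat the predicate instead of using a join table.
            values = value if isinstance(value, list) else [value]
            for v in values:
                triples.append((subject, column, v))
    return triples

people = [
    # In SQL, "hobby" would need its own join table and "spouse" would be a NULL column.
    {"id": 1, "name": "Joe", "spouse": None, "hobby": ["chess", "golf"]},
    {"id": 2, "name": "Mary", "spouse": None, "hobby": ["golf"]},
]
for t in rows_to_triples("person", "id", people):
    print(t)
```

Adding a brand-new attribute later is just another triple with a new predicate, so the schema does not have to be known in advance.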
RDF/SPARQL… like RDB/SQL, but...
Standard SQL-style aggregates, joins, etc., plus simple and powerful relationship capabilities.
“How is Joe related to Mary?”
– In SQL Relational
• Are they spouses?
• Are they siblings?
• Are they friends?
• Do they have the same hobby?
• … enumerate the choices; the query EXPLODES with degrees of separation
– In SPARQL Graph
• How is Joe related to Mary?
• … you can directly specify degrees of separation
Pretty exciting: essentially all the power of SQL, but you can do more, with more diverse data, where the data tells you about itself rather than you knowing it in advance.
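The “How is Joe related to Mary?” question amounts to finding paths between two nodes without naming the relationship types up front. SPARQL 1.1 property paths (operators such as `*` and `+`) can test that kind of connectivity; the plain-Python breadth-first search below is a separate, minimal sketch that also returns the connecting paths, bounded by a `max_hops` degree of separation. The function name and sample data are hypothetical.

```python
from collections import deque

def find_paths(triples, start, goal, max_hops=3):
    """Return every simple path from start to goal using at most max_hops edges."""
    # Index edges in both directions; a leading '^' marks an inverse traversal.
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append(("^" + p, s))
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue  # degree-of-separation limit reached
        visited = {start} | {step[2] for step in path}
        for pred, nxt in adj.get(node, []):
            if nxt not in visited:  # keep paths simple (no cycles)
                queue.append((nxt, path + [(node, pred, nxt)]))
    return paths

facts = [
    ("Joe", "hobby", "golf"),
    ("Mary", "hobby", "golf"),
    ("Joe", "friendOf", "Ann"),
    ("Ann", "siblingOf", "Mary"),
]
for p in find_paths(facts, "Joe", "Mary", max_hops=2):
    print(p)
```

No predicate is enumerated in the query itself: the shared hobby and the friend-of-a-sibling link both fall out of the same search, which is the contrast the slide draws with SQL.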
The Smart Data Lake is the “database”
• Data cached in HDFS or AWS/GCP buckets
• Multiple Graph Query Engine instances, usually on subsets
• Ephemeral in-memory operation
• Short-term instances: load, query, toss
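The “load, query, toss” lifecycle can be pictured as a scoped session over a subset of the lake. The Python sketch below assumes a hypothetical toy `InMemoryGraph` class and `ephemeral_graph` helper (neither is an Anzo API): it loads only the triples a filter selects, serves pattern-match queries, and discards everything when the block exits.

```python
from contextlib import contextmanager

class InMemoryGraph:
    """Toy stand-in for one in-memory graph query engine instance."""

    def __init__(self):
        self.triples = []

    def load(self, triples):
        self.triples.extend(triples)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

@contextmanager
def ephemeral_graph(source, keep):
    """Load only the subset selected by keep, yield the engine, then discard it."""
    engine = InMemoryGraph()
    engine.load(t for t in source if keep(t))  # load
    try:
        yield engine                           # query
    finally:
        engine.triples.clear()                 # toss: nothing outlives the session

lake = [("Joe", "hobby", "golf"), ("Mary", "hobby", "golf"), ("Joe", "worksFor", "Acme")]
with ephemeral_graph(lake, lambda t: t[1] == "hobby") as g:
    print(g.match(o="golf"))
```

The source data in the lake is untouched; only the short-lived in-memory copy is thrown away, which is what makes spinning up many such instances on different subsets cheap.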
Thank You