Post on 31-Dec-2016
Real-Time Analytics at Salesforce.com
Donovan Schneider, Principal Architect
SDForum May, 2010
Agenda
Motivation
Our approach
Making it work
Conclusions and future directions
Evolution of Business Intelligence: Canned reports ➜ Ad-hoc query ➜ DW ➜ Real-Time Cloud Analytics
More than 50 percent of data warehouse projects will have limited acceptance, or be an outright failure
Real-time. Always.
Accessible By Mere Mortals
Flexible. In Sync. Reportable.
Our Vision for CRM Analytics
Deliver Insight That is Accessible, Real-time, and Trustworthy
What Drives Actionable Insight?
Responsive
Relevant
Easy to Use
Actionable
Reliable
Reporting and Analytics that are…
Increased User Adoption
Business user friendly
Powerful capabilities to answer real-world business questions
Fast performance, timely insight when needed
Integrated into the CRM to enable actions from insight
Accurate & consistent results
Real-Time Visibility
User Adoption
Actionable Insight
And Our Customers Use It. A Lot!
12M+ reports, 2.5M+ run per day
750K dashboards, 700K views per day
Agenda
Motivation
Our approach
Making it work
Conclusions and future directions
We take a fundamentally different approach than most
Complicated, Out of Date, & Rigid
- Painfully slow unless against DW
- Data is never fresh against DW
- Changes to CRM propagate slowly
- ETL process is complicated and expensive
Easy, Real-Time and Flexible
- Usable by all, from rep to SVP
- Real-time, all the time
- Flexible & customizable
- Powerful w/o complicated DW
- One sharing model
[Diagram labels: Single tenant CRM; ERP; HR; Other Systems; ETL Processes; DW; DW Reporting (1x / day); 72,000+ Companies; Other Clouds; Real-Time Reporting]
Is a Data Warehouse Needed for CRM Analytics?
Why people think they need a DW
- Performance: pre-aggregation was the only way to get decent reporting/analytics performance out of OLTP/CRM
- Requirement to combine multiple data sources in 1 report/dashboard
  - CRM systems were hard to integrate with external data sources
  - And then, they were not built at all for BI
- Business View of the Data
  - Corporate wide ontology
  - Single view of the customer
  - Historical data capture
Why we don’t need one
- Force.com API
  - 200M+ API calls/day
  - 10M records/hour
  - Consumes External Web Services
- Entire System is Business Driven
  - Business people configure the system with their business terms, with no IT translation required
  - Sales, Service, Analytics Clouds are all on same platform
  - History Tables and Analytical snapshots
- Cloud Computing Scale & Multi-tenant Optimization Engine
And DW based architecture makes the system out of date, rigid and expensive
Analytics
Dashboards Reports List Views Search
Agenda
Motivation
Our approach
Making it work
Conclusions and future directions
Building a Multi-tenant Cloud Platform is Hard!
Lots of Pieces to Assemble!
- Relational / Text / Non-relational
- Application Services / Lifecycles
- Caching and Performance
- Scalability and Reliability
- Infrastructure and Backups
- Release Processes
Development Lifecycle
Brief Review of Force.com Multi-tenancy
Real-time App Composition
Massive Shared Database
Shared General Purpose Kernel
True Multi-tenancy: Why Share Everything?
~15 Databases ~100 Servers
2 Mirrors
100,000’s of Unique Applications
1 Code Base
Force.com Data Architecture
Shared Metadata Cache
Bulk Processing Engine
Multi-Tenant-Aware Query Optimizer
Runtime Application Generator
Full-Text Search Engine
Real-time App Composition
Sharing Relational Data Structures is Hard
Your Definitions
Your Data
Your Optimizations
- Indexes: pivot table for non-unique indexes
- UniqueFields: pivot table for unique indexes
- Relationships: pivot table for foreign keys
- MRUIndex: pivot table for most-recently-used
- FallBackIndex: pivot table for Name field index
- …others…
Harrah’s Data
Dell’s Products
Your Rep’s Data
Flex Schema on Steroids: Everyone’s Data
Flex Column: Multiple Data Types
ID Tenant Data 2
1000001 Harrah’s $190
1000002 Harrah’s $250
1000003 Harrah’s $680
1000004 Harrah’s Poker
1000005 Harrah’s Black Jack
1000006 Harrah’s Craps
1000007 Dell Display
1000008 Dell Laptop
1000009 Dell Server
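A minimal sketch of the flex-column idea shown above, assuming the column is stored as untyped text and that per-tenant metadata says how to read it. The `obj` labels, `FIELD_TYPES`, `decode`, and `tenant_rows` are hypothetical names invented for illustration; the slide does not describe the actual storage format.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Row:
    id: int
    tenant: str
    obj: str    # hypothetical: which tenant-defined object the row belongs to
    data2: str  # flex column, stored untyped as text


# Rows from the slide; the `obj` labels are invented for illustration.
ROWS = [
    Row(1000001, "Harrah's", "Bet", "$190"),
    Row(1000002, "Harrah's", "Bet", "$250"),
    Row(1000003, "Harrah's", "Bet", "$680"),
    Row(1000004, "Harrah's", "Game", "Poker"),
    Row(1000005, "Harrah's", "Game", "Black Jack"),
    Row(1000006, "Harrah's", "Game", "Craps"),
    Row(1000007, "Dell", "Product", "Display"),
    Row(1000008, "Dell", "Product", "Laptop"),
    Row(1000009, "Dell", "Product", "Server"),
]

# Hypothetical metadata: the logical type of Data 2 per (tenant, object).
FIELD_TYPES = {
    ("Harrah's", "Bet"): "currency",
    ("Harrah's", "Game"): "text",
    ("Dell", "Product"): "text",
}


def decode(row):
    """Interpret the flex value using the tenant's own field definition."""
    kind = FIELD_TYPES[(row.tenant, row.obj)]
    if kind == "currency":
        return Decimal(row.data2.lstrip("$"))
    return row.data2


def tenant_rows(tenant):
    """Every query is scoped to one tenant, even though storage is shared."""
    return [row for row in ROWS if row.tenant == tenant]
```

The point of the sketch is that one physical column can hold currency for one object and text for another, because the type lives in metadata rather than in the table definition.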
[Figure: shared data table with columns ID, Data 1, Data 2; rows 10002 to 10033 hold placeholder text]
Flex Schema: Everyone’s Optimizations
Multi-Tenant Table
ID | Tenant | Data 2
1000001 | Harrah’s | $190
1000002 | Harrah’s | $250
1000003 | Harrah’s | $680
1000004 | Harrah’s | Poker
1000005 | Harrah’s | Black Jack
1000006 | Harrah’s | Craps
1000007 | Dell | Display
1000008 | Dell | Laptop
1000009 | Dell | Server
Multi-tenant Index (sync copy of the table)
Tenant | Text | Number
Harrah’s | | $190
Harrah’s | | $250
Harrah’s | | $680
Harrah’s | Poker |
Harrah’s | Black Jack |
Harrah’s | Craps |
Dell | Display |
Dell | Laptop |
Dell | Server |
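A rough sketch of the sync copy, assuming each flex value is copied into a typed Text or Number column of a tenant-keyed index. Guessing the type from the value is a simplification made here; a real system would presumably take it from tenant metadata. `to_index_row`, `INDEX`, and `ids_with_number_at_least` are hypothetical names, and the linear scan only stands in for an index range lookup.

```python
from decimal import Decimal, InvalidOperation

# (id, tenant, flex value) rows from the slide.
DATA = [
    (1000001, "Harrah's", "$190"),
    (1000002, "Harrah's", "$250"),
    (1000003, "Harrah's", "$680"),
    (1000004, "Harrah's", "Poker"),
    (1000005, "Harrah's", "Black Jack"),
    (1000006, "Harrah's", "Craps"),
    (1000007, "Dell", "Display"),
    (1000008, "Dell", "Laptop"),
    (1000009, "Dell", "Server"),
]


def to_index_row(row_id, tenant, value):
    """Copy one flex value into the typed column it belongs in.

    Returns (tenant, text, number, row_id); exactly one of text/number is set.
    """
    try:
        return (tenant, None, Decimal(value.lstrip("$")), row_id)
    except InvalidOperation:
        return (tenant, value, None, row_id)


# Keep the index ordered by tenant first, so one tenant's entries are contiguous.
INDEX = sorted(
    (to_index_row(*row) for row in DATA),
    key=lambda e: (e[0], e[1] or "", e[2] or Decimal(0)),
)


def ids_with_number_at_least(tenant, minimum):
    """Numeric comparison works because the Number column is typed."""
    return [
        row_id
        for t, _text, number, row_id in INDEX
        if t == tenant and number is not None and number >= minimum
    ]
```

Typed index columns are what make numeric range filters and sorts meaningful, which the untyped flex column alone cannot offer.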
Reporting Index Optimization
Multi-Tenant Table
ID | Tenant | Data 2
1000001 | Harrah’s | $190
1000002 | Harrah’s | $250
1000003 | Harrah’s | $680
1000004 | Harrah’s | Poker
1000005 | Harrah’s | Black Jack
1000006 | Harrah’s | Craps
1000007 | Dell | Display
1000008 | Dell | Laptop
1000009 | Dell | Server
Reporting Index (sync copy of the table)
Tenant | Data 2 | Data 7 | … | Data k
Dell | Display
Dell | Laptop
Dell | Server
But How Do You Make the Queries Fast?
Real-time App Composition
Shared Metadata Cache
Bulk Processing Engine
Multi-Tenant-Aware Query Optimizer
Runtime Application Generator
Full-Text Search Engine
A Real World Question
Michael Dell wants to know if Servers are selling well in the West.
How will Force.com answer this question quickly?
[Figure: shared data table with columns ID, Data 1, Data 2, filled with placeholder text]
Visibility
Indexes
Millions of Sales Line Items
The fastest path to the answer
[Figure: shared data table with columns ID, Data 1, Data 2, filled with placeholder text]
M. Dell
Servers
West
Multi-tenant Query Optimizer
1. Run pre-queries
   - Check user visibility
   - Check filter selectivity
2. Build query based on results of pre-queries
3. Execute query
User Visibility = # of rows that the user can access
Filter Selectivity = how specific is this filter?
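The pre-query step can be sketched as follows, assuming a tiny in-memory table in place of the shared database. The line items, the `visible_to` sets (standing in for the sharing model), and all function names are hypothetical; the slides do not show how the real pre-queries are written.

```python
# Hypothetical line items; `visible_to` stands in for the sharing model.
LINE_ITEMS = [
    {"id": 1, "product": "Server", "region": "West", "visible_to": {"mdell", "rep1"}},
    {"id": 2, "product": "Server", "region": "East", "visible_to": {"mdell", "rep2"}},
    {"id": 3, "product": "Laptop", "region": "West", "visible_to": {"mdell", "rep1"}},
    {"id": 4, "product": "Display", "region": "West", "visible_to": {"mdell"}},
    {"id": 5, "product": "Server", "region": "West", "visible_to": {"mdell", "rep2"}},
]


def user_visibility(rows, user):
    """Pre-query 1: number of rows that the user can access."""
    return sum(1 for row in rows if user in row["visible_to"])


def filter_matches(rows, **criteria):
    """Pre-query 2: number of rows the filter keeps (fewer means more specific)."""
    return sum(1 for row in rows if all(row[k] == v for k, v in criteria.items()))


def run_pre_queries(rows, user, **criteria):
    """Collect the measurements used to build the final query."""
    return {
        "total_rows": len(rows),
        "visible_rows": user_visibility(rows, user),
        "matching_rows": filter_matches(rows, **criteria),
    }


def execute(rows, user, **criteria):
    """Final query: rows that pass both the sharing check and the filter."""
    return [
        row["id"]
        for row in rows
        if user in row["visible_to"]
        and all(row[k] == v for k, v in criteria.items())
    ]
```

For example, `run_pre_queries(LINE_ITEMS, "rep1", product="Server", region="West")` reports that the rep sees 2 of 5 rows and the filter keeps 2 of 5, which is the kind of input the optimizer needs before it commits to a query shape.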
Multi-tenant Query Optimizer
Shared Visibility
Shared Indexes
[Figure: two copies of the shared data table (ID, Data 1, Data 2), filled with placeholder text]
Stop
Go
Multi-tenant Optimizer Statistics
Acting on pre-queries
Pre-query selectivity measurements (User, Filter) and the final database query they force:
User | Filter | Construct final database query, forcing…
Low | Low | …nested loops join; drive using view of rows that the user can see
Low | High | …use of index related to filter
High | Low | …ordered hash join; drive using data table
High | High | …use of index related to filter
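The decision table above can be written as a lookup, sketched here under stated assumptions. The plan strings come from the slide; the thresholds and the `classify_*` helpers are hypothetical, since the deck does not say where the Low/High cut-offs sit.

```python
# Plan choices copied from the slide, keyed by (user visibility, filter selectivity).
PLANS = {
    ("low", "low"): "nested loops join; drive using view of rows that the user can see",
    ("low", "high"): "use of index related to filter",
    ("high", "low"): "ordered hash join; drive using data table",
    ("high", "high"): "use of index related to filter",
}


def classify_visibility(visible_rows, total_rows, threshold=0.1):
    """'high' when the user can see a large share of rows (threshold is assumed).

    Assumes total_rows > 0.
    """
    return "high" if visible_rows / total_rows > threshold else "low"


def classify_selectivity(matching_rows, total_rows, threshold=0.01):
    """'high' when the filter keeps only a small share of rows (threshold is assumed).

    Assumes total_rows > 0.
    """
    return "high" if matching_rows / total_rows <= threshold else "low"


def choose_plan(visible_rows, matching_rows, total_rows):
    """Map the two pre-query measurements to the plan the final query forces."""
    key = (
        classify_visibility(visible_rows, total_rows),
        classify_selectivity(matching_rows, total_rows),
    )
    return PLANS[key]
```

A user who sees 50 of 1,000,000 rows with a filter that keeps half of them lands in the Low/Low row, so the query is driven from the rows the user can see rather than from the data table.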
Report Execution
[Diagram labels: Application Server (x3); Pre-queries; SQL; Joins, Filters, Hints; Aggregations; Sorts, Aggregations; Filters; rows; cache]
Agenda
Motivation
Our approach
Making it work
Conclusions and future directions
Conclusions
Real-time BI is possible
A data warehouse is not required
Interesting technical challenges
– Cannot rely on database’s cost-based optimizer
– Sophisticated sharing models great for customers but technically challenging
– Real-time data limits caching applicability
– Protect tenants from each other
We need help
Technical Direction
Increased focus on usability
Expanded analytical capabilities
Collaboration
Scalability