Phoenix - A High Performance Open Source SQL Layer over HBase
James Taylor, Principal Member of the Technical Staff, Salesforce.com, @JamesPlusPlus
Safe harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter.
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Adam Torman, Director, Product Management, Salesforce.com, @atorman
Agenda
▪ What is Phoenix?
▪ What is HBase?
▪ Why use HBase?
▪ Why use Phoenix?
▪ Use Cases
▪ Under the hood
▪ Roadmap
▪ Q&A
What is Phoenix?
▪ BSD-licensed open source project
• https://github.com/forcedotcom/phoenix
▪ High performance SQL engine
• Over HBase data
• Not based on map-reduce
• Targets low latency applications
▪ Turns your key value store into a database
▪ Powers the big data use cases at Salesforce.com
Phoenix Demo
▪ Phoenix Stock Analyzer
▪ Fortune 500 companies
▪ 10 years of historical stock prices
• Over 2 billion rows
▪ Demonstrates support for low latency applications
What is HBase?
▪ Part of the Apache Hadoop ecosystem
▪ Runs on top of HDFS
▪ Key/value store
• Sparse
• Consistent
• Distributed
• Multidimensional
• Sorted
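The "sparse, multidimensional, sorted" properties can be illustrated with a toy in-memory model. This is only a sketch, not HBase code: the class `TinySortedStore`, its methods, and the `org|date` row-key layout are hypothetical names chosen for illustration. It assumes each value is addressed by row key, column, and timestamp, and that rows are kept in row-key order so range scans are cheap.

```python
from bisect import bisect_left, insort


class TinySortedStore:
    """Toy model of a sparse, sorted, multidimensional map (hypothetical, not HBase)."""

    def __init__(self):
        self._row_keys = []  # row keys kept in sorted order
        self._cells = {}     # row key -> {column -> {timestamp -> value}}

    def put(self, row_key, column, timestamp, value):
        if row_key not in self._cells:
            insort(self._row_keys, row_key)
            self._cells[row_key] = {}
        self._cells[row_key].setdefault(column, {})[timestamp] = value

    def get(self, row_key, column):
        """Return the newest version of a cell, or None if the cell was never written."""
        versions = self._cells.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_key, stop_key):
        """Yield row keys in sorted order with start_key <= key < stop_key."""
        i = bisect_left(self._row_keys, start_key)
        while i < len(self._row_keys) and self._row_keys[i] < stop_key:
            yield self._row_keys[i]
            i += 1


store = TinySortedStore()
store.put("org2|2013-06-02", "txns", 1, 7)
store.put("org1|2013-06-01", "txns", 1, 10)
store.put("org1|2013-06-01", "txns", 2, 12)      # newer version of the same cell
store.put("org1|2013-06-03", "io_time", 1, 150)  # this row has no "txns" column

latest_txns = store.get("org1|2013-06-01", "txns")
org1_rows = list(store.scan("org1|", "org1|~"))
missing = store.get("org1|2013-06-03", "txns")
```

Rows that never wrote a column simply have no entry for it (sparse), and the scan touches only the contiguous slice of keys between the start and stop keys (sorted).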
HBase distribution of data
Why use HBase?
▪ If you have lots of data
• Scales linearly
• Shards automatically
▪ If you can live without transactions
• But there’s work being done here…
▪ If your data changes
▪ If you need strict consistency
Why use Phoenix?
▪ Gives folks an API they already know
▪ Reduces the amount of code users need to write
▪ Performs optimizations transparent to the user
• Aggregation
• Skip scanning
• Secondary indexing
• Query optimization
▪ Leverages existing tooling
• SQL client
• OLAP engine
Phoenix versus Hive performance
Phoenix versus Impala performance
Use cases
▪ Data Archival
• Archive big data off Oracle and into HBase while maintaining query-ability of data
▪ Platform Monitoring
• Enable customers to track performance metrics of their platform applications
Why Phoenix is important
• Scalable, low latency app development starts with Phoenix
• Phoenix worries about physical scale and fast performance so app developers don’t have to
• Looks, tastes, feels like SOQL to a force.com developer
• Under the covers, turns HBase into a database that we understand
• Several key customer use cases:
• Data Archive
• Monitoring, Audit, and Compliance
Archive Problem Set
• Field History Tracking grows unbounded
• Enterprise customers require long term storage of ‘cold’ data
• Data retention policies can require years of data to be kept around
Archive Pilot Demonstration
field history retention
Field history is the basis for data audit trail
Policy-driven data retention – 5, 7, 10… years
Increased limits to track history on many fields
data lifecycle management
Time policy driven data lifecycle from live to archive state
Configurable behavior across custom schema, accessibility & archive data model
Maintain and assure operational efficiency
Retain access and visibility across data lifecycle
Winter ’14 & Spring ’14 - Pilots
Spring ’14 & Summer ’14 - Pilots
Archive Roadmap
Monitoring Use Case
• Security, Compliance, and Audit
• Product Support and Limits Analysis
• Product Usage and Management
For EU data compliance, I need to know who outside of Europe accessed data, when, and from where
Before I can take an ex-employee to court for downloading the client list, I need to know when, where, and how they did it
I want to take action when I detect an intrusion, identity fraud, or data leakage
I need to analyze limits consumption to ensure mission critical apps don’t run out of resources
What is the status of my batch import or sandbox copy?
Before I invest in more licenses, I want to know how many people are actually using it
Identity Fraud
Custom Event Schema
Queries made possible by Phoenix

Query the number of logins over the span of a week
    SELECT Count(LoginTime), UserId
    FROM LoginEvent
    WHERE LoginTime > 2013-03-04T17:38:39.000Z AND LoginTime <= 2013-06-04T17:38:39.000Z
    GROUP BY UserId

List all [custom] correlation ids for all users over the past week
    SELECT Username, UserId, cCorrelationId
    FROM LoginEvent
    WHERE LoginTime > 2013-03-04T17:38:39.000Z AND LoginTime <= 2013-06-04T17:38:39.000Z

Query login counts per user and browser - a low count may indicate an anomaly or just a change in browser
    SELECT Count(Id), Browser, UserId, Username
    FROM LoginEvent
    WHERE LoginTime > 2013-03-04T17:38:39.000Z AND LoginTime <= 2013-06-04T17:38:39.000Z
    GROUP BY UserId, Username, Browser

Query login counts per user and status - a high count of failed or invalid password attempts may indicate a brute force attack
    SELECT Count(Id), Status, UserId, Username
    FROM LoginEvent
    WHERE LoginTime > 2013-03-04T17:38:39.000Z AND LoginTime <= 2013-06-04T17:38:39.000Z
    GROUP BY UserId, Username, Status
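To see how the per-status login count could surface a brute force pattern, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database. SQLite is only a stand-in here, not Phoenix or HBase. The sample rows, the `Failed`/`Success` status values, and the threshold of 3 are invented for illustration, and timestamps are stored as ISO-8601 strings so that string comparison matches time order. `Username` is included in the GROUP BY so that every selected column is either grouped or aggregated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, not Phoenix
conn.execute(
    "CREATE TABLE LoginEvent ("
    "Id INTEGER PRIMARY KEY, UserId TEXT, Username TEXT, Status TEXT, LoginTime TEXT)"
)

# Hypothetical sample events; the last row falls outside the queried time range.
rows = [
    ("005A", "alice", "Success", "2013-03-05T09:00:00.000Z"),
    ("005B", "bob", "Failed", "2013-03-06T10:00:00.000Z"),
    ("005B", "bob", "Failed", "2013-03-06T10:00:05.000Z"),
    ("005B", "bob", "Failed", "2013-03-06T10:00:09.000Z"),
    ("005B", "bob", "Success", "2013-03-06T10:01:00.000Z"),
    ("005A", "alice", "Failed", "2013-07-01T08:00:00.000Z"),
]
conn.executemany(
    "INSERT INTO LoginEvent (UserId, Username, Status, LoginTime) VALUES (?, ?, ?, ?)",
    rows,
)

query = """
SELECT COUNT(Id), Status, UserId, Username
FROM LoginEvent
WHERE LoginTime > ? AND LoginTime <= ?
GROUP BY UserId, Username, Status
ORDER BY UserId, Status
"""
counts = conn.execute(
    query, ("2013-03-04T17:38:39.000Z", "2013-06-04T17:38:39.000Z")
).fetchall()

# Flag users whose failed-login count reaches an (assumed) threshold of 3.
suspicious = [(user, n) for n, status, user, _ in counts if status == "Failed" and n >= 3]
```

With this sample data, only the user with three failed attempts inside the time range is flagged; a real deployment would pick the threshold and time window from its own policy.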
Custom Events
Collect, query, and report on canned and custom events defined by our customers
Define custom time series based metrics to discover anomalies and summarize user interactions
API first, but not API only - declarative reporting user interface
Custom client side event publishing
Summer ’14 & Winter ’15 - Pilots
Platform Monitoring Roadmap
Phoenix under the hood
Product Metrics HTable
▪ Row key: ORG_ID, DATE, FEATURE
▪ Key values: TXNS, IO_TIME, RESPONSE_TIME
● Scan
➢ Start key: ORG_ID (:1) + DATE (:2)
➢ End key: ORG_ID (:1) + DATE (:3)
● Filter
➢ Filter: IO_TIME > 100
● Aggregation
➢ Intercepts scan on region server
➢ Builds map of distinct FEATURE values
➢ Returns one row per distinct group
➢ Client does final merge

SELECT feature, SUM(txns)
FROM product_metrics
WHERE org_id = :1
AND date >= :2 AND date <= :3
AND io_time > 100
GROUP BY feature
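The scan, filter, and aggregation steps above can be sketched as a two-phase computation: each region builds a partial map of FEATURE to summed TXNS, and the client merges the partial maps. This is an illustrative model only, not Phoenix source code. The function names, the tuple layout, and the sample rows are hypothetical, and the key-range check is done per row here, whereas a real scan would be bounded by its start and end keys.

```python
from collections import defaultdict

# Each row is (org_id, date, feature, txns, io_time); the layout is assumed for this sketch.


def region_scan_aggregate(region_rows, org_id, start_date, end_date, min_io_time):
    """Region-side step: restrict to the key range, apply the filter, sum txns per feature."""
    partial = defaultdict(int)
    for row_org, row_date, feature, txns, io_time in region_rows:
        # Key range: same org, date between the start and end bounds (inclusive).
        if row_org != org_id or not (start_date <= row_date <= end_date):
            continue
        # Filter: io_time > min_io_time.
        if io_time > min_io_time:
            partial[feature] += txns
    return dict(partial)  # one entry per distinct feature seen in this region


def client_merge(partials):
    """Client-side step: combine the per-region maps into the final result."""
    merged = defaultdict(int)
    for partial in partials:
        for feature, total in partial.items():
            merged[feature] += total
    return dict(merged)


# Two hypothetical regions holding slices of the table.
regions = [
    [
        ("org1", "2013-06-01", "Reports", 10, 120),
        ("org1", "2013-06-01", "Dashboards", 5, 90),   # dropped by the io_time filter
    ],
    [
        ("org1", "2013-06-02", "Reports", 7, 300),
        ("org1", "2013-06-02", "Dashboards", 4, 150),
        ("org2", "2013-06-02", "Reports", 99, 500),    # outside the key range
    ],
]

partials = [
    region_scan_aggregate(rows, "org1", "2013-06-01", "2013-06-02", 100)
    for rows in regions
]
result = client_merge(partials)
```

Because each region returns at most one entry per distinct feature, the client merges small maps instead of pulling every matching row over the network.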
Phoenix Roadmap
▪ Apache incubator project
▪ Joins
▪ Multi-tenant tables
▪ Cost-based query optimizer
▪ OLAP extensions
• WINDOW, PARTITION OVER, RANK
▪ Monitoring and management
▪ Transactions