Louisville User Group Meeting
April 25, 2012
Lori Pieper
Maximize WebFOCUS Performance with Hyperstage
Agenda
The “Big Data” Business Challenge
Pivoting Your Perspective
Introducing WebFOCUS Hyperstage
How does it work?
So what’s the big deal?
Demonstration
Wrap Up and Q&A
Copyright 2007, Information Builders. Slide 3
The “Big Data” Business Challenge
Traditional Data Warehousing
Labor intensive, heavy indexing, aggregations and partitioning
Hardware intensive: massive storage; big servers
Expensive and complex
More Data, More Data Sources
More Kinds of Output Needed by More Users, More Quickly
Limited Resources and Budget
Real time data
Multiple databases
External Sources
Data Warehousing Challenges
How Performance Issues are Typically Addressed, by Pace of Data Growth
[Bar chart comparing High Growth and Low Growth organizations on a 0% to 100% scale. Responses, as listed: Don't Know / Unsure; Upgrade networking infrastructure; Archive older data on other systems; Upgrade/expand storage systems; Upgrade server hardware/processors; Tune or upgrade existing databases. Bar values as extracted: 7%, 21%, 30%, 33%, 54%, 66% and 4%, 32%, 44%, 60%, 70%, 75%.]
Source: Keeping Up with Ever-Expanding Enterprise Data (Joseph McKendrick, Unisphere Research, October 2010)
When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem.
IT managers try to mitigate these response times ...
Limitations of “Traditional” Solutions
Adding indexes:
Increases disk space requirements
  Sum of index space requirements can even exceed the source DB
Index management
  Increases load times to build the index
  Predefines a fixed access path
Reports run slowly if you haven’t “anticipated” the reporting needs correctly
Limitations of “Traditional” Solutions
Building OLAP cubes:
Cube technology has limited scalability
  Number of dimensions is limited
  Amount of data is limited
Cube technology is difficult to update (for example, to add a dimension)
  Usually requires a complete rebuild
  Cube builds are typically slow
  New design results in a new cube
Reports run slowly if you haven’t “anticipated” the reporting needs correctly
Pivoting Your Perspective: Turn Row-based into Column-based
The Ubiquity of Rows …
Row-based databases are ubiquitous because so many of our most important business systems are transactional.
Row-oriented databases are well suited for transactional environments, such as a call center where a customer’s entire record is required when their profile is retrieved and/or when fields are frequently updated.
Why is Row-based Limiting for Analytics?
Disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.
Example: a table with 30 columns and 50 million rows
Why is Column-based Perfect for Analytics?
Row Oriented (1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
Works well if all the columns are needed for every query
Efficient for transactional processing if all the data for the row is available
Column Oriented (1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
Works well with aggregate results (sum, count, avg.)
Only columns that are relevant need to be touched
Consistent performance with any database design
Allows for very efficient compression
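The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration using the four employee records from the slide; the function names and the "values read" counter are assumptions made for the example and do not describe how Hyperstage is implemented.

```python
# The four employee records from the slide, stored row by row.
rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# The same data stored column by column: one list per column.
columns = {
    "employee_id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "location": [r[2] for r in rows],
    "sales": [r[3] for r in rows],
}


def total_sales_row_store(table):
    """Sum sales in a row layout, counting every field value that is read."""
    total = 0
    values_read = 0
    for record in table:
        values_read += len(record)  # the whole record is fetched
        total += record[3]          # only the sales field is actually needed
    return total, values_read


def total_sales_column_store(cols):
    """Sum sales in a column layout; only the sales column is touched."""
    sales = cols["sales"]
    return sum(sales), len(sales)


row_total, row_reads = total_sales_row_store(rows)
col_total, col_reads = total_sales_column_store(columns)
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts return the same total, but the row layout reads every field of every record while the column layout reads only the sales values. With 30 columns and 50 million rows, that gap is what drives the disk I/O difference.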
Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000

Data stored in rows: 1 Smith New York 50,000 | 2 Jones New York 65,000 | 3 Fraser Boston 40,000 | 4 Fraser Boston 70,000
Data stored in columns: 1 2 3 4 | Smith Jones Fraser Fraser | New York New York Boston Boston | 50,000 65,000 40,000 70,000
Introducing Hyperstage
Hyperstage is a high-performance analytic data store designed to handle business-driven queries on large volumes of data with minimal IT intervention. It achieves outstanding query performance with less hardware, no database tuning, and easy migration.
Easy to implement and manage, Hyperstage provides the answers to your business users’ needs at a price you can afford.
But really… What is it?
Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses.
How is it architected?
Hyperstage Engine
Knowledge Grid
Compressor
BulkLoader
Unmatched Administrative Simplicity:
• No indexes
• No data partitioning
• No materialized views
Hyperstage adds data compression of 10:1 to 40:1 so you can manage large amounts of data using a much smaller disk footprint.
Powerful Data Compression:
• Store terabytes of data with only gigabytes of disk space
Hyperstage adds a bulk loader plus an easy-to-use extraction and load tool, called HyperCopy, making data loading a breeze.
Includes embedded ETL:
• Easy and seamless migration of existing analytical databases
• No change in query or application required
How Does it Work?
Smarter Architecture
No maintenance
No query planning
No partition schemes
Easy “load and go”
Data Packs – data stored in manageably sized, highly compressed data packs
Knowledge Grid – statistics and metadata “describing” the super-compressed data
Column Orientation
WebFOCUS Hyperstage Engine
Data compressed using algorithms tailored to data type
Data Packs and Compression
Data Packs
Each data pack contains 65,536 (64K) data values
Compression is applied to each individual data pack
The compression algorithm varies depending on data type and data distribution
Compression
Results vary depending on the distribution of data among data packs
A typical overall compression ratio seen in the field is 10:1
Some customers have seen results as high as 40:1
Patent-pending compression algorithms
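A rough sketch of the data pack idea in Python follows. It splits one column into packs of 65,536 values and compresses each pack on its own. The `zlib` compressor is only a stand-in chosen for the example, since the actual Hyperstage algorithms are not described in these slides. The sample salary column and all function names are hypothetical.

```python
import struct
import zlib

PACK_SIZE = 65_536  # values per data pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Cut one column into consecutive packs of at most pack_size values."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Serialize a pack of integers as 64-bit values and compress it.

    zlib is a stand-in here, not the engine's real compression algorithm.
    """
    raw = struct.pack(f"<{len(pack)}q", *pack)
    return zlib.compress(raw)


def decompress_pack(blob):
    """Reverse compress_pack and return the original list of integers."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


# Hypothetical column: 200,000 salaries drawn from only 50 distinct values.
salaries = [40000 + (i % 50) * 1000 for i in range(200_000)]

packs = split_into_packs(salaries)
blobs = [compress_pack(p) for p in packs]

raw_bytes = 8 * len(salaries)
compressed_bytes = sum(len(b) for b in blobs)
print(len(packs), raw_bytes, compressed_bytes)
```

Because each pack is compressed independently, a query can decompress a single pack without touching the others. The ratio printed here depends entirely on the made-up data and says nothing about the ratios quoted on the slide.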
Data Organization and the Knowledge Grid ….
This Knowledge Grid layer = 1% of the compressed volume
Data Pack Nodes (DPN): A separate DPN is created for every data pack in the database to store basic statistical information.
Character Maps (CMAPs): Every data pack that contains text gets a matrix that records the occurrence of every possible ASCII character.
Histograms: A histogram with 1024 MIN-MAX intervals is created for every data pack that contains numeric data.
Pack-to-Pack Nodes (PPN): PPNs track relationships between data packs when tables are joined. Query performance gets better as the database is used.
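To make the Knowledge Grid structures more concrete, here is a minimal Python sketch of a Data Pack Node and a character map. The slides only say that a DPN holds "basic statistical information", so the specific fields (minimum, maximum, sum, count) are an assumption. The character map is simplified to a set of (position, character) pairs, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataPackNode:
    """Assumed per-pack statistics; the real DPN contents are not specified."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_dpn(pack):
    """Compute the summary statistics for one numeric data pack."""
    return DataPackNode(min(pack), max(pack), sum(pack), len(pack))


def build_cmap(text_pack):
    """Record which character occurs at which position in a text pack."""
    return {(pos, ch) for value in text_pack for pos, ch in enumerate(value)}


def cmap_may_contain(cmap, literal):
    """False means the literal cannot be in the pack; True means it might be."""
    return all((pos, ch) in cmap for pos, ch in enumerate(literal))


# Hypothetical numeric pack: no salary exceeds 50000, so a filter such as
# salary > 50000 can skip this pack using the DPN alone.
salary_pack = [42000, 48000, 45500]
dpn = build_dpn(salary_pack)
can_skip_salary_pack = dpn.maximum <= 50000

# Hypothetical text pack: the character map rules out 'Finance' immediately
# because no value in the pack starts with 'F'.
job_pack = ["Sales", "Support", "Shipping"]
cmap = build_cmap(job_pack)
print(can_skip_salary_pack)
print(cmap_may_contain(cmap, "Shipping"), cmap_may_contain(cmap, "Finance"))
```

In both cases the answer comes from the small summary structure, not from the compressed data. A character map of this kind can report a possible match that turns out to be absent, but it never rules out a value that is actually present.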
WebFOCUS Hyperstage Example: Query and Knowledge Grid
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';
Columns involved: salary, age, job, city
For each condition, the Knowledge Grid marks every data pack as one of: Completely Irrelevant, Suspect, All values match
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Louisville'
5. Eliminate all pack rows that have been flagged as irrelevant
6. Finally, identify the pack that needs to be decompressed
Result: all other packs are ignored; only one pack is decompressed and scanned.
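The walkthrough above can be sketched as a small Python program. This is an illustration under stated assumptions, not the Hyperstage engine. Each "pack row" is a hypothetical handful of employee records, MIN/MAX values stand in for the Data Pack Nodes, and a set of distinct values stands in for the character maps. The statistics are computed on the fly here, whereas a real engine would keep them in the Knowledge Grid.

```python
from enum import Enum


class Status(Enum):
    IRRELEVANT = "completely irrelevant"
    SUSPECT = "suspect"
    ALL_MATCH = "all values match"


def classify_greater_than(lo, hi, threshold):
    """Classify a pack for `column > threshold` from its MIN/MAX only."""
    if hi <= threshold:
        return Status.IRRELEVANT
    if lo > threshold:
        return Status.ALL_MATCH
    return Status.SUSPECT


def classify_less_than(lo, hi, threshold):
    """Classify a pack for `column < threshold` from its MIN/MAX only."""
    if lo >= threshold:
        return Status.IRRELEVANT
    if hi < threshold:
        return Status.ALL_MATCH
    return Status.SUSPECT


def classify_equals(distinct_values, literal):
    """Classify a text pack for `column = literal` (stand-in for a CMAP)."""
    if literal not in distinct_values:
        return Status.IRRELEVANT
    if distinct_values == {literal}:
        return Status.ALL_MATCH
    return Status.SUSPECT


def combine(statuses):
    """AND the per-column results for one pack row."""
    if Status.IRRELEVANT in statuses:
        return Status.IRRELEVANT
    if all(s is Status.ALL_MATCH for s in statuses):
        return Status.ALL_MATCH
    return Status.SUSPECT


def count_matching(pack_rows):
    """Return (count(*), number of pack rows that had to be scanned)."""
    total, scanned = 0, 0
    for rows in pack_rows:
        salaries = [r[0] for r in rows]
        ages = [r[1] for r in rows]
        status = combine([
            classify_greater_than(min(salaries), max(salaries), 50000),
            classify_less_than(min(ages), max(ages), 65),
            classify_equals({r[2] for r in rows}, "Shipping"),
            classify_equals({r[3] for r in rows}, "Louisville"),
        ])
        if status is Status.IRRELEVANT:
            continue                # pack row ignored entirely
        if status is Status.ALL_MATCH:
            total += len(rows)      # counted without looking at the data
            continue
        scanned += 1                # only suspect pack rows are "decompressed"
        total += sum(
            1 for s, a, j, c in rows
            if s > 50000 and a < 65 and j == "Shipping" and c == "Louisville"
        )
    return total, scanned


# Hypothetical pack rows: (salary, age, job, city)
pack_rows = [
    [(40000, 30, "Sales", "Boston"), (45000, 41, "Sales", "Boston")],
    [(60000, 35, "Shipping", "Louisville"), (72000, 50, "Shipping", "Louisville")],
    [(55000, 70, "Shipping", "Louisville"), (48000, 29, "Shipping", "Louisville"),
     (90000, 44, "Shipping", "Louisville")],
    [(80000, 30, "Finance", "Louisville")],
]

matches, scanned_packs = count_matching(pack_rows)
print(matches, scanned_packs)
```

With this sample data, two pack rows are discarded from the statistics alone, one is counted in full without being read, and only the single suspect pack row is scanned row by row.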
Hyperstage – So what’s the big deal?
WebFOCUS Hyperstage: The Big Deal…
“Load and Go”
No indexes
No partitions
No views
No materialized aggregates
Value proposition
Low IT overhead
Reduced I/O = faster response times
Ease of implementation
Fast time to market
Less hardware
Lower TCO
Some Real World Results
Insurance Company
Query performance issues with SQL Server - insurance claims analysis
Compression achieved 40:1
Most queries running 3X faster in Hyperstage
Large Bank
Query performance issues with SQL Server - web traffic analysis
Compression achieved 10:1
Queries that ran for 10 to 15 minutes in SQL Server ran in under a second in Hyperstage
Government Application
Query performance issues with Oracle - Federal Loan/Grant Tracking
Compression achieved 15:1
Queries that ran for 10 to 15 minutes in Oracle ran in 30 seconds in Hyperstage
Demonstration …
Q&A