Louisville User Group Meeting April 25, 2012 Lori Pieper Maximize WebFOCUS Performance with...

Louisville User Group Meeting

April 25, 2012

Lori Pieper

Maximize WebFOCUS Performance with Hyperstage

Agenda

The “Big Data” Business Challenge
Pivoting Your Perspective
Introducing WebFOCUS Hyperstage
How does it work?
So what’s the big deal?
Demonstration
Wrap Up and Q&A

Copyright 2007, Information Builders.

The “Big Data” Business Challenge


Traditional Data Warehousing

Labor intensive, heavy indexing, aggregations and partitioning

Hardware intensive: massive storage; big servers

Expensive and complex

More Data, More Data Sources

More Kinds of Output Needed by More Users, More Quickly

Limited Resources and Budget

Real time data

Multiple databases

External Sources

Data Warehousing Challenges

Source: Keeping Up with Ever-Expanding Enterprise Data (Joseph McKendrick, Unisphere Research, October 2010)

How Performance Issues are Typically Addressed – by Pace of Data Growth

[Bar chart comparing High Growth and Low Growth organizations on a 0% to 100% axis. Responses: Don't Know / Unsure; Upgrade networking infrastructure; Archive older data on other systems; Upgrade/expand storage systems; Upgrade server hardware/processors; Tune or upgrade existing databases. Bar labels in extraction order: 7%, 21%, 30%, 33%, 54%, 66% and 4%, 32%, 44%, 60%, 70%, 75%.]

When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem

IT managers try to mitigate these response times …

Limitations of “Traditional” Solutions

Adding indexes:
Increases disk space requirements
Sum of index space requirements can even exceed the source DB
Index management
Increases load times to build the index
Predefines a fixed access path

Reports run slowly if you haven’t “anticipated” the reporting needs correctly

Limitations of “Traditional” Solutions

Building OLAP Cubes:
Cube technology has limited scalability
Number of dimensions is limited
Amount of data is limited
Cube technology is difficult to update (add Dimension)
Usually requires a complete rebuild
Cube builds are typically slow
New design results in a new cube

Reports run slowly if you haven’t “anticipated” the reporting needs correctly


Pivoting Your Perspective: Turn Row-based into Column-based

Row-based databases are ubiquitous because so many of our most important business systems are transactional.

Row-oriented databases are well suited for transactional environments, such as a call center where a customer’s entire record is required when their profile is retrieved and/or when fields are frequently updated.

The Ubiquity of Rows …

But disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.

[Figure: a table of 30 columns by 50 million rows]

Why is Row-based Limiting for Analytics?

Row Oriented: (1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
Works well if all the columns are needed for every query
Efficient for transactional processing if all the data for the row is available

Column Oriented: (1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
Works well with aggregate results (sum, count, avg.)
Only columns that are relevant need to be touched
Consistent performance with any database design
Allows for very efficient compression

Why is Column-based Perfect for Analytics?

Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000

[Figure: the table above shown twice, once labeled "Data stored in rows" and once labeled "Data stored in columns"]
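The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration using the four employee records from the slides; the function names and the "values read" counter are assumptions made for the example, not part of Hyperstage.

```python
# The four employee records from the slides, stored row by row.
rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# The same data pivoted into one list per column.
columns = {
    "employee_id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "location": [r[2] for r in rows],
    "sales": [r[3] for r in rows],
}


def total_sales_row_store(rows):
    """Sum sales in a row layout; returns (total, number of values read)."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched to reach one field
        total += row[3]
    return total, values_read


def total_sales_column_store(columns):
    """Sum sales in a column layout; only the sales column is read."""
    sales = columns["sales"]
    return sum(sales), len(sales)


row_total, row_reads = total_sales_row_store(rows)
col_total, col_reads = total_sales_column_store(columns)
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts give the same total, but the row layout touches every field of every record while the column layout touches only the one column the aggregate needs.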


Introducing Hyperstage

Hyperstage is a high performance analytic data store designed to handle business-driven queries on large volumes of data, with minimal IT intervention, achieving outstanding query performance with less hardware, no database tuning and easy migration.

Introducing WebFOCUS Hyperstage ….

Easy to implement and manage, Hyperstage provides the answers to your business users’ needs at a price you can afford.


But really…What is it?

Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses.


How is it architected?

Hyperstage Engine

Knowledge Grid

Compressor

BulkLoader

Unmatched Administrative Simplicity:
• No indexes
• No data partitioning
• No materialized views

Hyperstage adds data compression of 10:1 to 40:1 so you can manage large amounts of data using a much smaller disk footprint.



Powerful Data Compression:
• Store terabytes of data with only gigabytes of disk space

Hyperstage adds a bulk loader plus an easy-to-use extraction and load tool, called HyperCopy, making data loading a breeze.



Includes embedded ETL:
• Easy and seamless migration of existing analytical databases
• No change in query or application required


How Does it Work?

Smarter Architecture

No maintenance
No query planning
No partition schemes
Easy “load and go”

Data Packs – data stored in manageably sized, highly compressed data packs

Knowledge Grid – statistics and metadata “describing” the super-compressed data

Column Orientation

WebFOCUS Hyperstage Engine

Data compressed using algorithms tailored to data type

How does it work?

Data Packs
Each data pack contains 65,536 data values
Compression is applied to each individual data pack
The compression algorithm varies depending on data type and data distribution

Compression
Results vary depending on the distribution of data among data packs
A typical overall compression ratio seen in the field is 10:1
Some customers have seen results as high as 40:1

[Figure labels: 64K data packs; Patent Pending Compression Algorithms]

Data Packs and Compression
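As a rough sketch of the idea, the Python below splits one column into packs of 65,536 values and compresses each pack on its own. The slides do not describe the actual compression algorithms, so `zlib` is used purely as a stand-in, and the function names and the sample column are hypothetical.

```python
import struct
import zlib

PACK_SIZE = 65536  # values per data pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Split one column's values into consecutive fixed-size packs."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Compress one pack of integers independently of all other packs.

    zlib is only a stand-in; the real, type-specific algorithms are not
    described in the presentation.
    """
    raw = struct.pack(f"<{len(pack)}q", *pack)  # 8 bytes per value
    return zlib.compress(raw)


def decompress_pack(blob):
    """Restore the original list of integers from a compressed pack."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


# Hypothetical column: 200,000 values with few distinct values.
column = [(i // 1000) % 50 for i in range(200_000)]
packs = split_into_packs(column)
blobs = [compress_pack(p) for p in packs]

raw_bytes = 8 * len(column)
compressed_bytes = sum(len(b) for b in blobs)
print(len(packs), "packs; ratio about", round(raw_bytes / compressed_bytes), ": 1")
```

The ratio printed here depends entirely on the made-up data and on `zlib`, so it says nothing about the 10:1 to 40:1 figures quoted for Hyperstage; the point is only that each pack is compressed and decompressed independently.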

Data Organization and the Knowledge Grid ….


This knowledge grid layer = 1% of the compressed volume

Data Pack Nodes (DPN): A separate DPN is created for every data pack in the database to store basic statistical information.

Character Maps (CMAPs): Every Data Pack that contains text creates a matrix that records the occurrence of every possible ASCII character.

Histograms: A histogram is created for every Data Pack that contains numeric data; each histogram has 1024 MIN-MAX intervals.

Pack-to-Pack Nodes (PPN): PPNs track relationships between Data Packs when tables are joined. Query performance gets better as the database is used.
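A minimal Python sketch of two of these structures follows. The slides only say that a DPN stores "basic statistical information" and that a histogram has 1024 MIN-MAX intervals, so the specific fields (minimum, maximum, total, count), the equal-width intervals and every name below are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataPackNode:
    """Basic statistics for one data pack (the field choice is an assumption)."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_dpn(pack):
    """Compute the per-pack statistics once, at load time."""
    return DataPackNode(min(pack), max(pack), sum(pack), len(pack))


def _interval(value, lo, hi, n):
    """Index of the equal-width interval of [lo, hi] that holds value."""
    if hi == lo:
        return 0
    return min((value - lo) * n // (hi - lo), n - 1)


def build_histogram(pack, n=1024):
    """Record which of n intervals between min and max contain any value."""
    lo, hi = min(pack), max(pack)
    present = [False] * n
    for v in pack:
        present[_interval(v, lo, hi, n)] = True
    return lo, hi, present


def may_contain(histogram, value):
    """False means the value is certainly absent; True means it may be present."""
    lo, hi, present = histogram
    if value < lo or value > hi:
        return False
    return present[_interval(value, lo, hi, len(present))]


salaries = [30000, 42000, 58000, 61000]  # hypothetical tiny pack
dpn = build_dpn(salaries)
hist = build_histogram(salaries)
print(dpn)
print(may_contain(hist, 58000), may_contain(hist, 50000))
```

Because 50000 lies between the pack's minimum and maximum, the DPN alone cannot rule it out; the histogram can, since no value falls in that interval.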

WebFOCUS Hyperstage Example: Query and Knowledge Grid

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Louisville';

[Diagram: Data Packs for the columns salary, age, job and city, each marked as Completely Irrelevant, Suspect, or All values match]

WebFOCUS Hyperstage Example: salary > 50000

1. Find the Data Packs with salary > 50000

WebFOCUS Hyperstage Example: age < 65

2. Find the Data Packs that contain age < 65

WebFOCUS Hyperstage Example: job = 'Shipping'

3. Find the Data Packs that have job = 'Shipping'

WebFOCUS Hyperstage Example: city = 'Louisville'

4. Find the Data Packs that have city = 'Louisville'

WebFOCUS Hyperstage Example: Eliminate Pack Rows

5. Eliminate all pack rows that have been flagged as irrelevant

[Diagram: all packs in the eliminated pack rows are ignored]

WebFOCUS Hyperstage Example: Decompress and scan

6. Finally, identify the pack that needs to be decompressed

[Diagram: only this pack will be decompressed; all other packs are ignored]
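The six steps above can be sketched in Python. This is a simplified illustration that uses only per-pack minimum and maximum values; the pack size of three, the sample data and all names are hypothetical, and a real store would read the min/max from precomputed metadata rather than recompute them.

```python
from enum import Enum


class Status(Enum):
    IRRELEVANT = 0  # no value in the pack can match
    SUSPECT = 1     # some values may match; the pack must be read
    ALL_MATCH = 2   # every value in the pack matches


def classify_greater(lo, hi, bound):
    """Classify a pack with value range [lo, hi] against `value > bound`."""
    if hi <= bound:
        return Status.IRRELEVANT
    return Status.ALL_MATCH if lo > bound else Status.SUSPECT


def classify_less(lo, hi, bound):
    """Classify a pack with value range [lo, hi] against `value < bound`."""
    if lo >= bound:
        return Status.IRRELEVANT
    return Status.ALL_MATCH if hi < bound else Status.SUSPECT


def classify_equal(lo, hi, target):
    """Classify a pack with value range [lo, hi] against `value = target`."""
    if target < lo or target > hi:
        return Status.IRRELEVANT
    return Status.ALL_MATCH if lo == hi else Status.SUSPECT


# column -> (pack classifier, constant, row-level test), mirroring the WHERE clause
PREDICATES = {
    "salary": (classify_greater, 50000, lambda v: v > 50000),
    "age": (classify_less, 65, lambda v: v < 65),
    "job": (classify_equal, "Shipping", lambda v: v == "Shipping"),
    "city": (classify_equal, "Louisville", lambda v: v == "Louisville"),
}


def count_matches(pack_rows):
    """Return (matching row count, number of packs that had to be read)."""
    count = 0
    packs_read = 0
    for pr in pack_rows:
        # Steps 1-4: classify each column's pack (min/max recomputed here for brevity).
        status = {
            col: classify(min(pr[col]), max(pr[col]), const)
            for col, (classify, const, _) in PREDICATES.items()
        }
        # Step 5: one irrelevant pack eliminates the whole pack row.
        if Status.IRRELEVANT in status.values():
            continue
        # Step 6: only suspect packs are read; all-match packs need no scan.
        suspects = [c for c, s in status.items() if s is Status.SUSPECT]
        packs_read += len(suspects)
        count += sum(
            1 for i in range(len(pr["salary"]))
            if all(PREDICATES[c][2](pr[c][i]) for c in suspects)
        )
    return count, packs_read


pack_rows = [
    {"salary": [30000, 42000, 50000], "age": [25, 40, 61],
     "job": ["Shipping", "Sales", "Shipping"],
     "city": ["Louisville", "Louisville", "Boston"]},
    {"salary": [52000, 61000, 75000], "age": [30, 66, 45],
     "job": ["Shipping", "Shipping", "Sales"],
     "city": ["Louisville", "Louisville", "Louisville"]},
    {"salary": [55000, 80000, 90000], "age": [33, 44, 55],
     "job": ["Shipping", "Shipping", "Shipping"],
     "city": ["Boston", "Chicago", "Boston"]},
]
print(count_matches(pack_rows))
```

In this sample the first pack row is eliminated by salary and the third by city, so only the age and job packs of the second pack row are read. Minimum and maximum alone are a weak filter for text equality, which is where a structure such as the character maps described earlier would help.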


Hyperstage – So what’s the big deal?

WebFOCUS Hyperstage: The Big Deal…

No indexes
No partitions
No views
No materialized aggregates

Value proposition
Low IT overhead
Reduced I/O = faster response times
Ease of implementation
Fast time to market
Less hardware
Lower TCO

“Load and Go”

Some Real World Results

Insurance Company
Query performance issues with SQL Server - Insurance claims analysis
Compression achieved 40:1
Most queries running 3X faster in Hyperstage

Large Bank
Query performance issues with SQL Server - Web traffic analysis
Compression achieved 10:1
Queries that ran for 10 to 15 minutes in SQL Server ran in sub-seconds in Hyperstage

Government Application
Query performance issues with Oracle - Federal Loan/Grant Tracking
Compression achieved 15:1
Queries that ran for 10 to 15 minutes in Oracle ran in 30 seconds in Hyperstage



Demonstration …

Q&A

