Oracle Challenges Parallelism Limitations Parallelism is the ability for a single query to be run...

Oracle Challenges

• Parallelism Limitations

Parallelism is the ability for a single query to be run across multiple processors or servers. Large queries need parallelism to break the job up across many processors and so increase performance.
  – Oracle is not designed for extreme parallelism

    • Parallelism is a part-time feature, set object by object and query by query
    • It is not designed to let every query run in parallel
    • It is not designed to let any single query run across all processors and server nodes
    • It can support a few wide (high-parallelism) queries or many narrow (low- or no-parallelism) queries, but not both at the same time
    • When it runs out of parallel processes, queries fall back to serial mode at a fraction of normal performance

  – Our ETL has design and operational issues resulting from these parallelism limits
    • ETL schedules contain artificial dependencies to ensure enough parallel processes are available
    • Big ETL workflows that start during heavy workload often get no parallelism, run very slowly, and have to be restarted later when parallel processes are available
  – Report queries can show inconsistent performance because of parallelism issues
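To make the "few wide or many narrow" trade-off and the fall-back to serial mode concrete, here is a toy Python model of a fixed pool of parallel server processes. It is a sketch under stated assumptions, not Oracle's actual allocation algorithm: the pool size of 64 is hypothetical, each parallel query is assumed to need 2 × DOP servers (a producer set and a consumer set), and a query that finds the pool short is downgraded to whatever is left, or to serial when nothing is left. The names `admit`, `Grant`, and `POOL_SIZE` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical pool size. In a real Oracle instance the ceiling comes from
# initialization parameters such as PARALLEL_MAX_SERVERS; this model does not read them.
POOL_SIZE = 64


@dataclass
class Grant:
    query: str
    requested_dop: int
    effective_dop: int  # 1 means the query ended up running serially


def admit(queries, pool_size=POOL_SIZE):
    """Hand out parallel servers first come, first served, with no queueing.

    Assumption: each parallel query needs 2 x DOP servers (producer and
    consumer sets). A late arrival does not wait; it simply gets what is
    left, down to serial execution when the pool is empty.
    """
    available = pool_size
    grants = []
    for name, dop in queries:
        granted = min(2 * dop, available)
        granted -= granted % 2  # servers are handed out in producer/consumer pairs
        available -= granted
        grants.append(Grant(name, dop, max(granted // 2, 1)))
    return grants


if __name__ == "__main__":
    # A few wide queries: the third one finds the pool empty and runs serially.
    for g in admit([("etl_a", 16), ("etl_b", 16), ("report_c", 16)]):
        print(g)
    # Many narrow queries: 16 fit at DOP 2, the rest fall back to serial.
    narrow = admit([(f"q{i}", 2) for i in range(20)])
    print(sum(g.effective_dop == 2 for g in narrow), "of", len(narrow), "ran at DOP 2")
```

With a 64-server pool, DOP 16 admits only two queries at full width while DOP 2 admits sixteen. For a fixed pool, raising the DOP directly lowers how many queries can run in parallel at the same time, and in this model anything beyond that runs serially rather than waiting.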

• Manual SQL Tuning Required
  – Most current ETL uses custom override SQL that is heavily hand-tuned to achieve the best performance
  – Ad hoc reporting doesn’t allow for manual tuning, so query plans are often so sub-optimal that the queries are effectively unusable
  – When the optimizer chooses poor join plans, queries often fail because they run out of temp space

• Load Complexities with Partition Exchange Loading
  – Large, complex database procedures have been written to create, prep, index, gather stats on, and exchange in partitions
  – Partition exchange procedures have proven very challenging for DBAs to support and to enhance for new needs
  – Partition exchange procedures have to be called from ETL mappings, adding complexity to ETL development and support
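For readers unfamiliar with partition exchange loading, the sequence those procedures orchestrate is roughly: build an empty staging table shaped like the target, bulk-load it, index it to match the target's local indexes, gather statistics, then swap it into the target partition with `ALTER TABLE ... EXCHANGE PARTITION`. The Python sketch below only generates that Oracle SQL as strings so the steps are visible; it is not our procedures' code and executes nothing. The table, partition, column, and index names (`sales`, `sales_stage`, `p_2012_01`, `cust_id`) and the source query are hypothetical, and a real implementation must also match every local index and constraint on the target and deal with any global indexes.

```python
def partition_exchange_steps(target, partition, stage, source_query, index_cols):
    """Return the ordered Oracle SQL statements for one partition exchange load.

    All names passed in are hypothetical; no statement is executed here.
    """
    steps = [
        # 1. Empty, non-partitioned staging table with the target's column layout.
        f"CREATE TABLE {stage} AS SELECT * FROM {target} WHERE 1 = 0",
        # 2. Direct-path bulk load into the staging table.
        f"INSERT /*+ APPEND */ INTO {stage} {source_query}",
        "COMMIT",
    ]
    # 3. One index per target local index, so INCLUDING INDEXES can swap them.
    for i, col in enumerate(index_cols, start=1):
        steps.append(f"CREATE INDEX {stage}_ix{i} ON {stage} ({col})")
    # 4. Optimizer statistics on the staged data (anonymous PL/SQL block).
    steps.append(
        f"BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, "
        f"tabname => '{stage.upper()}'); END;"
    )
    # 5. Dictionary-level swap of staging table and partition. WITHOUT VALIDATION
    #    trusts that every staged row really belongs in this partition.
    steps.append(
        f"ALTER TABLE {target} EXCHANGE PARTITION {partition} "
        f"WITH TABLE {stage} INCLUDING INDEXES WITHOUT VALIDATION"
    )
    return steps


if __name__ == "__main__":
    for sql in partition_exchange_steps(
        target="sales",
        partition="p_2012_01",
        stage="sales_stage",
        source_query="SELECT * FROM sales_src WHERE sale_date < DATE '2012-02-01'",
        index_cols=["cust_id"],
    ):
        print(sql)
```

Even in this stripped-down form there are five distinct steps with naming and ordering rules between them, which hints at why the production procedures, with error handling and index matching, became hard to support.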

• Complex Configuration
  – Database configuration settings have to be custom-configured by our people for our multi-vendor hardware stack
  – Configuration has proven to be very complex
  – Oracle RAC adds further complexity with the configuration of disk cluster management software to support its shared-everything nature
  – Interconnect traffic and speed to support the shared cache are always a concern with RAC
  – RAC has been associated with a high number of bugs at our company over the years

• End Results
  – Cannot use all of the hardware to run complex business queries and ETL fast
  – New summarized marts or tables are needed to answer new questions, adding overall complexity and greatly slowing business responsiveness
  – Cannot support large numbers of concurrent queries when tuned to handle large queries (current state)
  – Cannot effectively support complex queries created by MicroStrategy
  – Very high complexity in support and development

External Findings

• The MPP shared-nothing architecture of Teradata and Netezza is best practice
  – Supports extreme parallelism for extreme performance with monster queries and ETL
  – Performs by applying lots of inexpensive hardware to the problem… brute force
  – Each process and processor works completely independently and autonomously on its own data set
  – Everything runs fully parallel across all processors/disks
  – Workload is managed through queue management, not by spawning more processes contending for the same data
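The shared-nothing idea in the bullets above can be illustrated with a small Python sketch of a distributed GROUP BY: rows are hash-distributed on the grouping key to a fixed number of segments, each segment aggregates only its own slice with no shared state, and a coordinator merges the partial results. This is a conceptual model, not how Teradata or Netezza are implemented; the segment count, the CRC32 hash, and all function names are assumptions, and threads stand in for independent nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical number of shared-nothing segments


def segment_of(key: str, n: int = NUM_SEGMENTS) -> int:
    """Deterministic hash distribution: every key maps to exactly one segment."""
    return crc32(key.encode("utf-8")) % n


def distribute(rows, n=NUM_SEGMENTS):
    """Split (key, amount) rows into n private slices, one per segment."""
    slices = [[] for _ in range(n)]
    for key, amount in rows:
        slices[segment_of(key, n)].append((key, amount))
    return slices


def local_aggregate(slice_rows):
    """Runs on one segment and touches only that segment's rows."""
    totals = Counter()
    for key, amount in slice_rows:
        totals[key] += amount
    return totals


def parallel_sum_by_key(rows, n=NUM_SEGMENTS):
    slices = distribute(rows, n)
    # Threads stand in for independent nodes; no slice is read by two workers.
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(local_aggregate, slices))
    # Rows were distributed on the grouping key, so each key lives on exactly
    # one segment and the merge is a plain union with no re-aggregation.
    merged = {}
    for part in partials:
        merged.update(part)
    return merged


if __name__ == "__main__":
    sales = [("east", 10), ("west", 5), ("east", 7), ("north", 1), ("west", 2)]
    print(parallel_sum_by_key(sales))
```

Because each segment owns its data outright, adding segments adds capacity without adding contention for shared data. The trade-off is that a join or grouping on a different key requires redistributing rows across segments first.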

• Most highly satisfied MicroStrategy customers run against Teradata or Netezza
• Appliances eliminate complex custom configurations, reduce TCO, and give customers “One Throat to Choke”
• Database software built for data warehousing implicitly handles tasks such as storage/data layout, partitioning, parallelism, and high-efficiency loading
• An optimizer built from the ground up as cost-based is best for BI query plans
  – Oracle evolved from a rule-based optimizer
  – Netezza and Teradata have been cost-based from the beginning
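The difference between a rule-based and a cost-based optimizer can be shown with a toy join-method choice in Python. The rule-based function applies a fixed precedence (an available index always wins), while the cost-based function compares estimated costs derived from row counts. The cost formulas and the `INDEX_PROBE_COST` constant are made-up illustrations, far simpler than any real optimizer's model, and are not how Oracle, Teradata, or Netezza actually compute costs.

```python
from dataclasses import dataclass


@dataclass
class JoinInput:
    outer_rows: int       # estimated rows driving the join
    inner_rows: int       # estimated rows in the joined-to table
    inner_has_index: bool  # index on the inner table's join column


# Hypothetical cost of one index lookup, in the same toy units as a row scanned.
INDEX_PROBE_COST = 3


def rule_based_choice(j: JoinInput) -> str:
    # Fixed precedence: an available index always wins,
    # no matter how many rows will drive the lookups.
    return "nested_loop" if j.inner_has_index else "hash_join"


def estimated_costs(j: JoinInput) -> dict:
    # Hash join: scan each input once (toy model ignores memory and spills).
    costs = {"hash_join": j.outer_rows + j.inner_rows}
    if j.inner_has_index:
        # Nested loop: one index probe per outer row.
        costs["nested_loop"] = j.outer_rows * INDEX_PROBE_COST
    return costs


def cost_based_choice(j: JoinInput) -> str:
    costs = estimated_costs(j)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    small_driver = JoinInput(outer_rows=100, inner_rows=10_000_000, inner_has_index=True)
    big_driver = JoinInput(outer_rows=50_000_000, inner_rows=1_000_000, inner_has_index=True)
    for j in (small_driver, big_driver):
        print(j, rule_based_choice(j), cost_based_choice(j))
```

On the second input the fixed rule still picks nested loops and would drive 50 million index probes, while the cost comparison switches to a hash join. A cost-based choice is only as good as its row estimates, which is why gathering statistics matters.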

• TPC-H data warehousing benchmarks use a workload of many small queries, which gives high marks to conventional OLTP databases like Oracle but is not representative of what we need

Benefits to application development

• More efficient and effective data modeling
  – Less pre-aggregation and restructuring of data is needed to meet reporting performance requirements
  – Data models can focus on meeting business needs instead of being performance-focused
• More efficient ETL development
  – High load and transformation performance means fewer issues with load windows and with loading history
  – No custom-built partition management and exchange logic
  – No extra partition load logic in ETL
  – Minimized SQL tuning

• More flexible and effective reporting
  – Extreme improvement in conventional reporting performance
  – Ability to support complex analysis in MicroStrategy with good response times
  – Ability to do analysis across very large data sets

• High-performance querying against atomic data
  – Complete and timely data analysis for projects
  – Fast prototyping of marts and reporting
  – One-off questions answered by large queries, not large BI projects