

TPC BENCHMARK DS

Standard Specification

Version 2.5.0

June, 2017

Transaction Processing Performance Council (TPC)

www.tpc.org

[email protected]

© 2017 Transaction Processing Performance Council

All Rights Reserved


Legal Notice

The TPC reserves all right, title, and interest to this document and associated source code as provided under U.S. and international laws, including without limitation all patent and trademark rights therein.

Permission to copy without fee all or part of this document is granted provided that the TPC copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the Transaction Processing Performance Council. To copy otherwise requires specific permission.

No Warranty

TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, THE INFORMATION CONTAINED HEREIN IS PROVIDED AS IS AND WITH ALL FAULTS, AND THE AUTHORS AND DEVELOPERS OF THE WORK HEREBY DISCLAIM ALL OTHER WARRANTIES AND CONDITIONS, EITHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY (IF ANY) IMPLIED WARRANTIES, DUTIES OR CONDITIONS OF MERCHANTABILITY, OF FITNESS FOR A PARTICULAR PURPOSE, OF ACCURACY OR COMPLETENESS OF RESPONSES, OF RESULTS, OF WORKMANLIKE EFFORT, OF LACK OF VIRUSES, AND OF LACK OF NEGLIGENCE. ALSO, THERE IS NO WARRANTY OR CONDITION OF TITLE, QUIET ENJOYMENT, QUIET POSSESSION, CORRESPONDENCE TO DESCRIPTION OR NON-INFRINGEMENT WITH REGARD TO THE WORK.

IN NO EVENT WILL ANY AUTHOR OR DEVELOPER OF THE WORK BE LIABLE TO ANY OTHER PARTY FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO THE COST OF PROCURING SUBSTITUTE GOODS OR SERVICES, LOST PROFITS, LOSS OF USE, LOSS OF DATA, OR ANY INCIDENTAL, CONSEQUENTIAL, DIRECT, INDIRECT, OR SPECIAL DAMAGES WHETHER UNDER CONTRACT, TORT, WARRANTY, OR OTHERWISE, ARISING IN ANY WAY OUT OF THIS OR ANY OTHER AGREEMENT RELATING TO THE WORK, WHETHER OR NOT SUCH AUTHOR OR DEVELOPER HAD ADVANCE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.

Trademarks

TPC Benchmark, TPC-DS and QphDS are trademarks of the Transaction Processing Performance Council.

Acknowledgments

Developing a TPC benchmark for a new environment requires a huge effort to conceptualize, research, specify, review, prototype, and verify the benchmark. The TPC acknowledges the work and contributions of the TPC-DS subcommittee member companies in developing the TPC-DS specification.

The TPC-DS subcommittee would like to acknowledge the contributions made by the many members during the development of the benchmark specification. It has taken the dedicated efforts of people across many companies, often in addition to their regular duties. The list of significant contributors to this version includes Susanne Englert, Mary Meredith, Sreenivas Gukal, Doug Johnson, Lubor Kollar, Murali Krishna, Bob Lane, Larry Lutz, Juergen Mueller, Bob Murphy, Doug Nelson, Ernie Ostic, Raghunath Othayoth Nambiar, Meikel Poess, Haider Rizvi, Bryan Smith, Eric Speed, Cadambi Sriram, Jack Stephens, John Susag, Tricia Thomas, Dave Walrath, Shirley Wang, Guogen Zhang, Torsten Grabs, Charles Levine, Mike Nikolaiev, Alain Crolotte, Francois Raab, Yeye He, Margaret McCarthy, Indira Patel, Daniel Pol, John Galloway, Jerry Lohr, Jerry Buggert, Michael Brey, Nicholas Wakou, Vince Carbone, Wayne Smith, Dave Steinhoff, Dave Rorke, Dileep Kumar, Yanpei Chen, John Poelman, and Seetha Lakshmi.

Document Revision History

Date         Version   Description
08-28-2015   2.0.0     Mail ballot version
11-12-2015   2.1.0     Includes FogBugz entries 937, 991, 1002, 1033, 1053, 1060, 1121, 1128, 1135, 1136
06-09-2016   2.2.0     Includes FogBugz entries 1571, 1559, 1539, 1538, 1537, 1531, 1502, 1501, 1480, 1479, 1474, 1473, 1472, 1470, 1393, 1322, 1263
08-05-2016   2.3.0     Includes FogBugz entries 1676, 1627, 1531, 1501 and 616
02-24-2017   2.4.0     Includes FogBugz entries 1728, 1697, 1696 and 1654
06-08-2017   2.5.0     Includes FogBugz entries 1756, 1894, 1909, 1912, 1980 and 1981

TPC Membership (as of June 2017)

Full Members

Associate Members

Table of Contents

0 PREAMBLE
0.1 Introduction
0.2 General Implementation Guidelines
0.3 General Measurement Guidelines
0.4 Workload Independence
0.5 Associated Materials
1 Business and Benchmark Model
1.1 Overview
1.2 Business Model
1.3 Data Model and Data Access Assumptions
1.4 Query and User Model Assumptions
1.5 Data Maintenance Assumptions
2 Logical Database Design
2.1 Schema Overview
2.2 Column Definitions
2.3 Fact Table Definitions
2.4 Dimension Table Definitions
2.5 Implementation Requirements
2.6 Data Access Transparency Requirements
3 Scaling and Database Population
3.1 Scaling Model
3.2 Test Database Scaling
3.3 Qualification Database Scaling
3.4 dsdgen and Database Population
3.5 Data Validation
4 Query Overview
4.1 General Requirements and Definitions for Queries
4.2 Query Modification Methods
4.3 Substitution Parameter Generation
5 Data Maintenance
5.1 Implementation Requirements and Definitions
5.2 Refresh Data
5.3 Data Maintenance Functions
6 Data Accessibility Properties
6.1 The Data Accessibility Properties
7 Performance Metrics and Execution Rules
7.1 Definition of Terms
7.2 Configuration Rules
7.3 Query Validation
7.4 Execution Rules
7.5 Output Data
7.6 Metrics
8 SUT AND DRIVER IMPLEMENTATION
8.1 Models of Tested Configurations
8.2 System Under Test (SUT) Definition
8.3 Driver Definition
9 PRICING
9.1 Priced System
9.2 Allowable Substitution
10 FULL DISCLOSURE
10.1 Reporting Requirements
10.2 Format Guidelines
10.3 Full Disclosure Report Contents
10.4 Executive Summary
10.5 Availability of the Full Disclosure Report
10.6 Revisions to the Full Disclosure Report
10.7 Derived Results
10.8 Supporting Files Index Table
10.9 Supporting Files
11 AUDIT
11.1 General Rules
11.2 Auditor's Check List
11.3 Clause 4 Related Items
11.4 Clause 5 Related Items
11.5 Clause 6 Related Items
11.6 Clause 7 Related Items
11.7 Clause 8 Related Items
11.8 Clause 9 Related Items
11.9 Clause 10 Related Items

PREAMBLE

Introduction

The TPC Benchmark DS (TPC-DS) is a decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance. The benchmark provides a representative evaluation of the System Under Test's (SUT) performance as a general purpose decision support system.

This benchmark illustrates decision support systems that:

Examine large volumes of data;

Give answers to real-world business questions;

Execute queries of various operational requirements and complexities (e.g., ad-hoc, reporting, iterative OLAP, data mining);

Are characterized by high CPU and IO load;

Are periodically synchronized with source OLTP databases through database maintenance functions.

Run on Big Data solutions, such as RDBMSs as well as Hadoop/Spark-based systems.

A benchmark result measures query response time in single-user mode, query throughput in multi-user mode, and data maintenance performance for a given hardware, operating system, and data processing system configuration under a controlled, complex, multi-user decision support workload.
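For illustration only (this example is not part of the specification and is not one of the TPC-DS query templates), a reporting query of the kind characterized above typically joins a large fact table to several smaller dimension tables and aggregates the result. The sketch below assumes the store_sales, date_dim, and item tables of the TPC-DS schema defined in Clause 2; the particular columns, predicates, and year range are illustrative assumptions only.

-- Illustrative sketch only: total extended sales price per year and item
-- category over an assumed three-year window, in the general style of a
-- reporting query against the store sales channel.
SELECT d.d_year,
       i.i_category,
       SUM(ss.ss_ext_sales_price) AS total_sales
FROM   store_sales ss
       JOIN date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
       JOIN item i     ON ss.ss_item_sk = i.i_item_sk
WHERE  d.d_year BETWEEN 2000 AND 2002   -- example year range, not a prescribed value
GROUP  BY d.d_year, i.i_category
ORDER  BY d.d_year, total_sales DESC;

An ad-hoc or iterative OLAP variant of the same business question would vary the grouping columns or predicates between executions; the benchmark itself draws its queries from parameterized templates with generated substitution parameters, as described in Clause 4.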

While separated from the main text for readability, comments and appendices are a part of the standard and their provisions must be enforced.

General Implementation Guidelines

The purpose of TPC benchmarks is to provide relevant, objective performance data to industry users. To achieve that purpose, TPC benchmark specifications require benchmark tests be implemented with systems, products, technologies and pricing that:

a) Are generally available to users;

b) Are relevant to the market segment that the individual TPC benchmark models or represents (e.g., TPC-DS models and represents complex, high data volume, decision support environments);

c) Would plausibly be implemented by a significant number of users in the market segment modeled or represented by the benchmark.

In keeping with these requirements, the TPC-DS database must be implemented using commercially available data processing software, and its queries must be executed via an SQL interface.

The use of new systems, products, technologies (hardware or software) and pricing is encouraged so long as they meet the requirements above. Specifically prohibited are benchmark systems, products, technologies or pricing (hereafter referred to as "implementations") whose primary purpose is performance optimization of TPC benchmark results without any corresponding applicability to real-world applications and environments. In other words, all "benchmark special" implementations, which improve benchmark results but not real-world performance or pricing, are prohibited.

A number of characteristics shall be evaluated in order to judge whether a particular implementation is a benchmark special. It is not required that each point below be met, but that the