SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015...

27
© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. 1 SEIZE THE DATA. 2015 SEIZE THE DATA. 2015

Transcript of SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015...

Page 1: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.1 SEIZE THE DATA. 2015

SEIZE THE DATA. 2015

Page 2: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

SEIZE THE DATA. 2015

Vertica SQL on Hadoop

Aliaksei Sandryhaila, Bob Hansen / August 12, 2015

Page 3: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.3 SEIZE THE DATA. 2015

This is a rolling (up to three year) Roadmap and is subject to change without notice.

Forward-looking statements

This document contains forward looking statements regarding future operations, product development, product capabilities and availability dates. This information is subject to substantial uncertainties and is subject to change at any time without prior notification. Statements contained in this document concerning these matters only reflect Hewlett Packard's predictions and / or expectations as of the date of this document and actual results and future plans of Hewlett-Packard may differ significantly as a result of, among other things, changes in product strategy resulting from technological, internal corporate, market and other changes. This is not a commitment to deliver any material, code or functionality and should not be relied upon in making purchasing decisions.

Page 4: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.4 SEIZE THE DATA. 2015

This is a rolling (up to three year) Roadmap and is subject to change without notice.

HP confidential information

This Roadmap contains HP Confidential Information.

If you have a valid Confidential Disclosure Agreement with HP, disclosure of the Roadmap is subject to that CDA. If not, it is subject to the following terms: for a period of 3 years after the date of disclosure, you may use the Roadmap solely for the purpose of evaluating purchase decisions from HP and use a reasonable standard of care to prevent disclosures. You will not disclose the contents of the Roadmap to any third party unless it becomes publically known, rightfully received by you from a third party without duty of confidentiality, or disclosed with HP’s prior written approval.

Page 5: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.5 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Big Data gets more valuable as more kinds of data are brought together

Hadoop

HDFS

Billing

Clickstream

Telemetry

Page 6: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.6 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

VSQLH makes Vertica the power tool for your Hadoop data

Hive Pig

MapReduceHB

ase

HDFS

Billing

Clickstream

Telemetry

Page 7: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.7 SEIZE THE DATA. 2015

Flat files in HDFS can be accessed via the HDFS Connector

CREATE EXTERNAL TABLE tweet_keywords

(keyword VARCHAR(64),

created_at TIMESTAMP,

score INT)

AS COPY FROM SOURCE

Hdfs(url=‘http://n01:50070/webhdfs/v1/posts/*’)

PARSER TwitterSentiment();

HDFS

json csv

This is a rolling (up to 3 year) roadmap and is subject to change without notice

HP Vertica

Page 8: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.8 SEIZE THE DATA. 2015

Structured data in HCatalog can be accessed via the Hcatalog Connector

CREATE HADOOP SCHEMA hive WITH

HOSTNAME='hcat.vertica.com‘

HCATALOG_DB=‘hive‘

HCATALOG_USER=‘hive_user’;

HC

ata

log

HDFS

json csv

HDFS

txt orc

This is a rolling (up to 3 year) roadmap and is subject to change without notice

HP Vertica

Page 9: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.9 SEIZE THE DATA. 2015

Structured data in HCatalog can be accessed via the Hcatalog Connector

SELECT

keyword,

EXTRACT(month from created_at),

AVG(score)

FROM hive.tweet_keywords

WHERE created_at >=

now()- ‘3 months’::interval

GROUP BY 1,2

ORDER BY 2;

HC

ata

log

HDFS

json csv

HDFS

txt orc

This is a rolling (up to 3 year) roadmap and is subject to change without notice

HP Vertica

Page 10: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.10 SEIZE THE DATA. 2015

HDFS can be used as a storage tier in Vertica Enterprise

CREATE LOCATION 'webhdfs://server/user/asmith/data'

ALL NODES

SHARED

USAGE 'data'

LABEL 'archive';

SELECT set_object_storage_policy('finances',

'archive', 1900, 2014);

HC

ata

log

HDFS

json csv

HDFS

txt orc

HDFS

ROS ROS

This is a rolling (up to 3 year) roadmap and is subject to change without notice

HP Vertica

Page 11: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.11 SEIZE THE DATA. 2015

What’s new in Excavator

Page 12: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.12 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Excavator powers VSQLH to extend across your organization

Hive Pig

MapReduceHB

ase

HDFS

Billing

Clickstream

Telemetry

Finance

Marketing

Sales

Page 13: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.13 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Hadoop data is accessible across department boundaries

Access secured HDFS clusters

• HDFS connector (flat files on HDFS)

• HCatalog connector (Hive data on HDFS)

• HDFS storage (Vertica data on HDFS) HC

ata

log

HDFS

json csv

HDFS

txt orc

HDFS

ROS ROS

Page 14: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.14 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Vertica can access Kerberized Hadoop clusters

Support for Kerberos

• Session- and process-wide ticket reuse

• Multiple concurrent queries

• Transparent on a Kerberized Vertica cluster

HDFS

Kerberos serverClient

ID

Page 15: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.15 SEIZE THE DATA. 2015

Kerberos setup in Vertica takes a few quick steps

ALTER DATABASE mydb SET KerberosKeytabFile = '/etc/krb5.keytab';

ALTER DATABASE mydb SET KerberosServiceName = 'vertica';

ALTER DATABASE mydb SET KerberosRealm = 'example.com';

CREATE AUTHENTICATION 'gss_auth' METHOD 'gss' HOST '0.0.0.0/0';

GRANT AUTHENTICATION 'gss_auth' TO Public;

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Page 16: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.16 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Work with native Hadoop data directly

Vertica supports ORC

• ORC – Optimized Row Columnar format

• Introduced in Hive 0.11

• Supported by all major Hadoop distributions (Apache, HDP, CDH, MapR)

• Advantages− open-source (third-party APIs available)− column-major− efficient (metadata, encoding, compression)− forward- and backward-compatible

• Similar to RCFile, Parquet− support for these and other formats can be added in the future

Page 17: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.17 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Benefit from ORC similarly to Vertica’s native format

ORC format

• Tables stored in column-major blocks of rows (stripes)

• Metadata stored in file footer, stripe footer and index

• Full data type support− primitive SQL types− complex Hive types

• Encoding (RLE, delta, dictionary)

• Compression (zlib, snappy)

Page 18: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.18 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Read ORC conveniently and fast

Built-in ORC reader

• No UDL setup needed – ORC is natively supported by Vertica

• Create and query ORC tables− Stream as external tables

CREATE EXTERNAL TABLE t(…) AS COPY FROM “…” ORC;

− Load into Vertica tablesCREATE TABLE t(…);

COPY t FROM “…” ORC;

Page 19: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.19 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Read ORC conveniently and fast

Access data on HDFS and local FS

• Read from HDFSCOPY t FROM “webhdfs://hostname:port/data/sales/orders.orc” ORC;

• Read from local FSCOPY t FROM “/data/sales/orders.orc” ORC;

• No need for serialization/deserialization

SerDe

orc

Before

orc

Excavator

Page 20: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.20 SEIZE THE DATA. 2015

Use Vertica as a stable SQLH engine

Stability of a mature RDBMS

• ANSI SQL support

• Resource management

• Workload distribution

Vertica Impala (CDH 5.3)

Supported unmodified

TPC-DS* queries

* Transaction Processing Performance Council Decision Support benchmark (99 business analytics queries)

94

56

This is a rolling (up to 3 year) roadmap and is subject to change without notice

5

Page 21: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.21 SEIZE THE DATA. 2015

Achieve competitive performance on Hadoop

Performance optimizations

• Predicate pushdown

• Column pruning

• Co-located data processing

• Sideways Information Passing

3 TB (scale factor 3K)9-node cluster (32x Intel Xeon, 256M RAM)CDH 5.3

This is a rolling (up to 3 year) roadmap and is subject to change without notice

0x

1x

2x

3x

4x

5x

6x

7x

8x

Rel

ativ

e sp

eed

-up

(lar

ger

is b

ette

r)

Queries (slowest to fastest)

TPC-DS performance vs Impala

Page 22: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.22 SEIZE THE DATA. 2015

Up to 7x

faster

ComparableUp to 2x

slower

43 queries more!

0x

1x

2x

3x

4x

5x

6x

7x

8x

Rel

ativ

e sp

eed

-up

Queries

TPC-DS performance vs Impala

VSQLH is reliable and performantThis is a rolling (up to 3 year) roadmap and is subject to change without notice

• Work in progress: additional ~1.5x speed-up

Page 23: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.23 SEIZE THE DATA. 2015

Contribute to VSQLH

Open-source VSQLH components

• ORC C++ API

− Developed in collaboration with Hortonworks

− Became a top-level Apache project in June 2015

− Check out orc.apache.org

• HDFS C++ API

− Developed in collaboration with Hortonworks

− Integrated into Apache Hadoop in July 2015

• APIs for other file formats and filesystems can be integrated into Vertica

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Page 24: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.24 SEIZE THE DATA. 2015

Where will we go from here?

Page 25: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.25 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

VSQLH will be even faster

REST

HDFS HDFSorc

orcorc

orcorc

orc

orcorc

orc

EE EE EE

Optimize

Plan

Native access to HDFS Hcatalog Optimizer

Page 26: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.26 SEIZE THE DATA. 2015

This is a rolling (up to 3 year) roadmap and is subject to change without notice

Q & A

HC

ata

log

HDFS

json csv

HDFS

txt orc

HDFS

ROS ROS0x

1x

2x

3x

4x

5x

6x

7x

8x

Rel

ativ

e sp

eed

-up

Queries

TPC-DS performance vs Impala

Page 27: SEIZE THE DATA. 2015 - Hewlett Packard Enterpriseh41382. · 2015-08-12 · SEIZE THE DATA. 2015 This is a rolling (up to 3 year) roadmap and is subject to change without notice Hadoop

SEIZE THE DATA. 2015