SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Seoul Data Engineering Camp (SDEC 2011), Seoul, South Korea, June 27-28. Slides dated Thursday, June 30, 2011.

description

Telecom companies today typically store their data in a database (DB) or data warehouse (DW), process it through ETL, and run statistics and analysis with OLAP tools or data mining engines. With the data explosion that followed the spread of smartphones, however, traditional DB and DW storage can no longer cope with this “Big Data”. An alternative now being introduced is to store the data in Hadoop and use Hive for ETL and ad hoc queries; China Mobile is the most frequently cited example. So far this approach has been adopted mainly by new projects, which face few barriers to applying the Hive data model and HQL. Replacing an existing database with Hadoop and Hive is far harder when a large number of tables and SQL queries already exist. NexR is migrating a telecom company’s data from Oracle DB to Hadoop and converting a large number of existing Oracle SQL queries to Hive HQL. Although HQL’s syntax is similar to ANSI SQL, it lacks many basic functions and has little support for Oracle analytic functions such as rank(), which are used heavily in statistical analysis. Differences in data handling, such as the treatment of null values, are a further obstacle. In this presentation we share our experience converting Oracle SQL to Hive HQL and developing additional functions with MapReduce, and we introduce several ideas and trials for improving Hive performance.

http://sdec.kr/

Transcript of SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Page 1: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Seoul Data Engineering Camp
SDEC 2011

Seoul, South Korea
June 27-28

Thursday, June 30, 2011

Page 2: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Replacing Legacy Telco DB/DW to Hadoop and Hive

JunHo Cho
NexR

Page 8: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Agenda

• Motivation for Hive and Hadoop

• Hive Internal

• Oracle Migration Use Case

• Hive Optimization

• Future Work

Page 16: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Telco Data

Page 24: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Telco DW & ETL

[Diagram of the legacy pipeline. Components: Data Sources, Collect Server, Data Converting, Batch ETL, RDBMS Server, Raw Data, Summary Table, Dimension Table, Near-RT Search, OLAP. Problems marked on the diagram: Bottleneck (in four places), Availability, Scalability, Expensive.]

Page 27: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Divide & Conquer

Pages 28-38: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

OpenSource

• Storage & Computing

• Collection

• Search

• Analysis

• Coordination

Page 39: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

NexR Data Platform

[Diagram. Components: Data Sources, HDFS, Index, Raw Data, Summary Table, Dimension Table, Real-Time & Batch Indexing, Near RT Search & Monitoring, Batch ETL, Collection Platform, Analysis Platform, Search Platform, OLAP, Advanced Analytics.]

Page 44: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

What is Hive?

• A system for managing and querying structured data built on top of Hadoop
  • Map-Reduce for execution
  • HDFS for storage
  • Metadata in an RDBMS

• Key Building Principles
  • SQL is a familiar language
  • Extensibility - Types, Functions, Formats, Scripts
  • Performance

Page 45: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Why Hive?

Page 47: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Count call records per phone?

Page 54: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Mapper

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class CallCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  // Emit (first token of the line, 1); the first token serves as the phone key.
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString().toLowerCase());
    if (itr.hasMoreTokens()) {  // skip blank lines
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}

Reducer

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class CallCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  // Sum the 1s emitted for each key.
  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}

Driver

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class CallCount {
  public static void main(String[] args) {
    JobConf conf = new JobConf(CallCount.class);

    // specify output types
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // specify input and output dirs
    FileInputFormat.addInputPath(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    // specify a mapper
    conf.setMapperClass(CallCountMapper.class);

    // specify a reducer (also used as the combiner)
    conf.setReducerClass(CallCountReducer.class);
    conf.setCombinerClass(CallCountReducer.class);

    try {
      JobClient.runJob(conf);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Page 56: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

SELECT pnum, count(pnum) FROM cdr GROUP BY pnum;
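
The MapReduce classes and the one-line HQL query compute the same thing: a count of records per phone number. As a rough illustration only, the plain-Java sketch below does that grouping in memory, with no Hadoop or Hive involved. The class name CallCountSketch, the sample rows, and the assumption that the phone number is the first whitespace-separated field are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CallCountSketch {

    // "Map" step: pull the grouping key out of one CDR line.
    // Assumption: the phone number is the first whitespace-separated field.
    static String phoneNumberOf(String cdrLine) {
        return cdrLine.trim().split("\\s+")[0];
    }

    // "Shuffle + reduce" step: group by key and add 1 per record,
    // which is what GROUP BY pnum with count(pnum) expresses.
    static Map<String, Integer> countCallsPerPhone(List<String> cdrLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : cdrLines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            counts.merge(phoneNumberOf(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<String> cdr = Arrays.asList(
            "010-1111-2222 2011-06-27T09:00 35",
            "010-3333-4444 2011-06-27T09:01 120",
            "010-1111-2222 2011-06-27T09:05 12");
        System.out.println(countCallsPerPhone(cdr));
        // prints {010-1111-2222=2, 010-3333-4444=1}
    }
}
```

The point of the slide pair is that Hive generates the equivalent of the map, shuffle, and reduce steps from the single SELECT statement.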

Page 59: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

History of Hive

• Hive development cycle is fast and the developer community is growing rapidly

• Product release cycle is accelerating

Release timeline (month/year):

• 03/08: project started

• 04/09: 0.3.0

• 12/09: 0.4.0

• 02/10: 0.5.0

• 10/10: 0.6.0

• 03/11: 0.7.0

• 06/11: 0.7.1

Page 60: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Who uses Hive?

http://wiki.apache.org/hadoop/Hive/PoweredBy

Page 68: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Use cases for Hive

• Report and ad hoc query

• Log Analysis

• Social Graph Analysis

• Data mining and analysis

• Machine Learning

• Dataset cleaning

• Data Warehouse

Pages 69-74: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Hive Architecture

[Diagram. Components: UI, Driver, Compiler, MetaStore, Execution Engine, Hadoop. Labels on the diagram: HQL, Works, Result, ORM, DDL.]

The build starts with a query (page 70):

select col1 from tab1 where ...

and ends with result rows (page 74):

a 123344
b 121211
c 342434

Page 75: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Hive Internal

[Diagram. Labels: Web UI, Hive CLI, JDBC; Browse, Query, DDL; Hive QL; Parser, Plan, Optimizer, Task; MetaStore; Thrift API; TSOperator, FSOperator, SELOperator, ...; UDF/UDAF (substr, sum, average); SerDe; Input/OutputFormat; RCFile; User Script; ExecMapper/ExecReducer; Map Reduce; StorageHandler; HDFS, HBase, DB.]

Page 83: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Parser

Select col1,col2 From tab1 Where col3 > 5

AST produced by the parser:

TOK_QUERY
  TOK_FROM
    TOK_TABNAME
  TOK_INSERT
    TOK_DESTINATION
      TOK_DIR
        TOK_TMP_FILE
    TOK_SELECT
      TOK_SELEXPR
        TOK_TABLE_OR_COL
      TOK_SELEXPR
        TOK_TABLE_OR_COL
    TOK_WHERE
      >
        TOK_TABLE_OR_COL
        5

QB (filled in step by step across pages 78-83): tab1; insclause-0; col1, col2
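
To make the tree on the Parser slide easier to read, the sketch below builds the same token tree by hand in plain Java and prints it as a LISP-style string. It is only an illustration: Hive's real parser is generated from a grammar, whereas here the tree is hard-coded. The AstSketch and Node names are hypothetical, and the leaf names (tab1, col1, col2, col3) are filled in from the query text rather than taken from the slide's tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AstSketch {

    // One AST node: a token plus zero or more children.
    static final class Node {
        final String token;
        final List<Node> children;

        Node(String token, Node... children) {
            this.token = token;
            this.children = Arrays.asList(children);
        }

        // Leaves print as bare tokens, inner nodes as "(token child child ...)".
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return token;
            }
            StringBuilder sb = new StringBuilder("(").append(token);
            for (Node child : children) {
                sb.append(' ').append(child);
            }
            return sb.append(')').toString();
        }
    }

    static Node n(String token, Node... children) {
        return new Node(token, children);
    }

    // Hand-built tree for: Select col1,col2 From tab1 Where col3 > 5
    static Node exampleAst() {
        return n("TOK_QUERY",
            n("TOK_FROM", n("TOK_TABNAME", n("tab1"))),
            n("TOK_INSERT",
                n("TOK_DESTINATION", n("TOK_DIR", n("TOK_TMP_FILE"))),
                n("TOK_SELECT",
                    n("TOK_SELEXPR", n("TOK_TABLE_OR_COL", n("col1"))),
                    n("TOK_SELEXPR", n("TOK_TABLE_OR_COL", n("col2")))),
                n("TOK_WHERE",
                    n(">", n("TOK_TABLE_OR_COL", n("col3")), n("5")))));
    }

    // Depth-first walk that collects every column named under TOK_TABLE_OR_COL.
    static List<String> columns(Node node) {
        List<String> found = new ArrayList<>();
        collect(node, found);
        return found;
    }

    private static void collect(Node node, List<String> found) {
        if (node.token.equals("TOK_TABLE_OR_COL") && !node.children.isEmpty()) {
            found.add(node.children.get(0).token);
            return;
        }
        for (Node child : node.children) {
            collect(child, found);
        }
    }

    public static void main(String[] args) {
        Node ast = exampleAst();
        System.out.println(ast);
        System.out.println(columns(ast)); // prints [col1, col2, col3]
    }
}
```

A walk like columns() is the kind of traversal that later phases rely on when they pull the table name, select list, and predicate out of the tree.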

Page 94: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Plan

Select col1,col2 From tab1 Where col3 > 5

Operators generated from the QB, one per clause (added one at a time across pages 86-94):

• TOK_FROM -> TableScanOperator

• TOK_WHERE -> FilterOperator

• TOK_SELECT -> SelectOperator

• TOK_DESTINATION -> FileSinkOperator
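
The operator chain on the Plan slide can be pictured as a pipeline in which each operator handles one row and pushes it to its child. The plain-Java sketch below mimics that idea for the example query on in-memory rows. It is a simplified stand-in, not Hive's operator classes: the Operator interface, the helper methods, and the sample rows are hypothetical, and the file sink just collects rows into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class OperatorPipelineSketch {

    // Every operator receives one row and may forward it to its child.
    interface Operator {
        void process(Map<String, Integer> row);
    }

    // FilterOperator stand-in: forwards only rows that satisfy the WHERE predicate.
    static Operator filter(Predicate<Map<String, Integer>> where, Operator child) {
        return row -> {
            if (where.test(row)) {
                child.process(row);
            }
        };
    }

    // SelectOperator stand-in: keeps only the projected columns.
    static Operator select(List<String> columns, Operator child) {
        return row -> {
            Map<String, Integer> out = new LinkedHashMap<>();
            for (String column : columns) {
                out.put(column, row.get(column));
            }
            child.process(out);
        };
    }

    // FileSinkOperator stand-in: collects rows in memory instead of writing a file.
    static Operator sink(List<Map<String, Integer>> target) {
        return target::add;
    }

    // TableScanOperator stand-in: pushes every row of the table into the chain.
    static void tableScan(List<Map<String, Integer>> table, Operator child) {
        for (Map<String, Integer> row : table) {
            child.process(row);
        }
    }

    static Map<String, Integer> row(int col1, int col2, int col3) {
        Map<String, Integer> r = new LinkedHashMap<>();
        r.put("col1", col1);
        r.put("col2", col2);
        r.put("col3", col3);
        return r;
    }

    // Runs: Select col1,col2 From tab1 Where col3 > 5
    static List<Map<String, Integer>> run(List<Map<String, Integer>> tab1) {
        List<Map<String, Integer>> result = new ArrayList<>();
        Operator plan = filter(r -> r.get("col3") > 5,
            select(Arrays.asList("col1", "col2"), sink(result)));
        tableScan(tab1, plan);
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows, for illustration only.
        List<Map<String, Integer>> tab1 =
            Arrays.asList(row(1, 10, 3), row(2, 20, 7), row(3, 30, 9));
        System.out.println(run(tab1));
        // prints [{col1=2, col2=20}, {col1=3, col2=30}]
    }
}
```

The filter sits before the select because the predicate needs col3, which the projection then drops.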

Pages 97-109: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Optimizer

Select col1,col2 From tab1 Where col3 > 5

Operator tree: TableScanOperator, FilterOperator, SelectOperator, FileSinkOperator

ColumnPruner (entries for FIL, SEL, TS; shared Context)

tab1 {col1, col2, col3, col4, col5, col6, col7}

Columns shown as the build progresses: first col1, col2 (page 105); then col1, col2, col3 (pages 107 and 109, next to FilterOperator).
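
The ColumnPruner build suggests the following idea: walk the operator tree from the sink back to the table scan, collect the columns each operator needs, and read only those from the table. The plain-Java sketch below illustrates that idea for the example query. It is an assumption-laden simplification, not Hive's ColumnPruner: the Op class, the columnsToRead method, and the linear operator list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ColumnPrunerSketch {

    // A node of a linear operator chain: its name and the columns it references itself.
    static final class Op {
        final String name;
        final List<String> referenced;

        Op(String name, String... referenced) {
            this.name = name;
            this.referenced = Arrays.asList(referenced);
        }
    }

    // Walk from the sink back to the table scan, accumulating the columns that
    // each operator, and everything after it, needs. Then keep only those
    // table columns, in table order.
    static List<String> columnsToRead(List<String> tableColumns, List<Op> scanToSink) {
        Set<String> needed = new LinkedHashSet<>();
        for (int i = scanToSink.size() - 1; i >= 0; i--) {
            needed.addAll(scanToSink.get(i).referenced);
        }
        List<String> pruned = new ArrayList<>();
        for (String column : tableColumns) {
            if (needed.contains(column)) {
                pruned.add(column);
            }
        }
        return pruned;
    }

    // Select col1,col2 From tab1 Where col3 > 5, on a seven-column table.
    static List<String> example() {
        List<String> tab1 = Arrays.asList(
            "col1", "col2", "col3", "col4", "col5", "col6", "col7");
        List<Op> plan = Arrays.asList(
            new Op("TableScanOperator"),
            new Op("FilterOperator", "col3"),
            new Op("SelectOperator", "col1", "col2"),
            new Op("FileSinkOperator"));
        return columnsToRead(tab1, plan);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [col1, col2, col3]
    }
}
```

With a columnar file format such as RCFile, skipping col4 through col7 at scan time means those columns never need to be read or deserialized.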

Page 112: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

TaskFactory

QB

TS - GenMRTableScan1

FS - GenMRFileSink1

TaskTask Select col1,col2 From tab1 Where col3 > 5

2011년 6월 30일 목요일

Page 113: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

TaskFactory

QB

TS - GenMRTableScan1

FS - GenMRFileSink1

FetchTask

TaskTask Select col1,col2 From tab1 Where col3 > 5

2011년 6월 30일 목요일

Page 114: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

TaskFactory

QB

FilterOperator

TableScanOperator

SelectOperator

FileSinkOperator

FilterOperator

TS - GenMRTableScan1

FS - GenMRFileSink1

FetchTask

TaskTask Select col1,col2 From tab1 Where col3 > 5

2011년 6월 30일 목요일

Page 115: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

TaskFactory

QB

FilterOperator

TableScanOperator

SelectOperator

FileSinkOperator

FilterOperator

TS - GenMRTableScan1

FS - GenMRFileSink1

FetchTask

TaskTask Select col1,col2 From tab1 Where col3 > 5

2011년 6월 30일 목요일

Page 116: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

TaskFactory

QB

FilterOperator

TableScanOperator

SelectOperator

FileSinkOperator

FilterOperator

FS - GenMRFileSink1

FetchTask

MapRedTask

TaskTask Select col1,col2 From tab1 Where col3 > 5

2011년 6월 30일 목요일

Page 122: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

TaskFactory

QB

FilterOperator

TableScanOperator

SelectOperator

FileSinkOperator

FilterOperator

FetchTask

MapRedTask

Task

Task

Select col1,col2 From tab1 Where col3 > 5

MapRedTask
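
The operator names on these slides suggest a pipeline in which rows flow from a table scan through a filter and a projection into a file sink. The following is only a minimal Python sketch of that idea for the sample query; the function names and the sample rows are hypothetical and do not reflect Hive's actual Java classes or plan generation.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]

def table_scan(table: Iterable[Row]) -> Iterator[Row]:
    # Stand-in for TableScanOperator: emit every row of the table.
    for row in table:
        yield row

def filter_op(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    # Stand-in for FilterOperator: keep rows that satisfy the WHERE predicate.
    for row in rows:
        if predicate(row):
            yield row

def select_op(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    # Stand-in for SelectOperator: project the requested columns.
    for row in rows:
        yield {c: row[c] for c in columns}

def file_sink(rows: Iterable[Row]) -> List[Row]:
    # Stand-in for FileSinkOperator: collect the output (a list instead of a file).
    return list(rows)

# Hypothetical contents of tab1.
tab1 = [
    {"col1": 1, "col2": 10, "col3": 3},
    {"col1": 2, "col2": 20, "col3": 7},
    {"col1": 3, "col2": 30, "col3": 9},
]

# Select col1,col2 From tab1 Where col3 > 5
result = file_sink(
    select_op(
        filter_op(table_scan(tab1), lambda r: r["col3"] > 5),
        ["col1", "col2"],
    )
)
```

Each stage only consumes the output of the previous one, which is why a chain like this can run inside a single map task.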

2011년 6월 30일 목요일

Page 123: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Map Reduce

Hive Internal

Web UI Hive CLI JDBC

Hive QL

Browse, Query, DDL

MetaStore

Thrift API

TSOperator

FILOperator

FILOperator

HDFS

HBase, DB

StorageHandler

...

Parser

Plan

Optimizer

Task

UDF

SerDe

Input/OutputFormat

RCFile

User Script

ExecMapper/ExecReducer

SELOperator

FSOperator

2011년 6월 30일 목요일

Page 125: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Oracle Migration to Hive

2011년 6월 30일 목요일

Page 128: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Oracle to Hive

• BulkLoad -> Load
• DDL -> DDL
• SQL -> HQL (ANSI-SQL)
• Statistic Function -> Built-In/UDF/UDAF
• Analytic Function -> HQL + UDF, Pig, MapReduce

No Update, No Insert, No Low Latency

2011년 6월 30일 목요일

Page 129: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Oracle SQL

2011년 6월 30일 목요일

Page 138: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Data Model

Hive Entity     Sample      HDFS LOC
Table           Log         /hive/Log
Partition       time=hour   /hive/Log/time=1h
Bucket          phone-num   /wh/Log/time=1h/part-$hash(phone-num)
ExternalTable   customer    /app/meta/dir (arbitrary location)

2011년 6월 30일 목요일

Page 139: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Data Model

HDFS: Table > Partition > Bucket (part-001)

Paths: /hive/Log, /hive/Log/time=1h, /hive/Log/time=1h/part-0001

MetaStore (MetaStore DB): Data Location, Bucketing Info, Partitioning Info

2011년 6월 30일 목요일
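
The table above maps each Hive entity to a directory or file in HDFS. As a rough illustration, the sketch below builds such a path for one row. The hash function (crc32), the bucket count, and the zero-padded file name are assumptions made for the example; Hive uses its own hash function and file naming.

```python
import zlib

def bucket_index(key: str, num_buckets: int) -> int:
    # crc32 is a deterministic stand-in hash (assumption, not Hive's hash).
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def bucket_path(table_dir: str, partition_col: str, partition_val: str,
                key: str, num_buckets: int) -> str:
    # table directory / partition directory / bucket file
    idx = bucket_index(key, num_buckets)
    return f"{table_dir}/{partition_col}={partition_val}/part-{idx:04d}"

# Hypothetical phone number and bucket count.
path = bucket_path("/hive/Log", "time", "1h", "010-1234-5678", 32)
```

Rows with the same phone number always land in the same bucket file of a partition, which is what makes bucketing useful for sampling and joins on that key.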

Page 143: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Column Data Types

• Primitive Types

• int type : tinyint, smallint, int, bigint

• boolean, float, double, string

• Nest-able Collections

• array : value(any-type)

• map : key(primitive) and value(any-type)

• User-defined types

• structures with attributes

2011년 6월 30일 목요일

Page 152: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

DataType Convert

NUMBER(n) -> TINYINT/INT/BIGINT
NUMBER(n,m) -> FLOAT/DOUBLE
VARCHAR2 -> STRING
DATE -> STRING, "yyyy-MM-dd HH:mm:ss" format

2011년 6월 30일 목요일
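
The mapping above could be automated when converting Oracle DDL. The sketch below is one possible way to do it, not the converter used in the project. The digit thresholds for TINYINT/INT/BIGINT and the choice of DOUBLE over FLOAT are assumptions; the function names are hypothetical.

```python
import re
from datetime import datetime

def oracle_to_hive_type(oracle_type: str) -> str:
    t = oracle_type.strip().upper()
    m = re.fullmatch(r"NUMBER\((\d+)\)", t)
    if m:
        digits = int(m.group(1))
        # Assumed thresholds: any value with this many digits fits the type.
        if digits <= 2:
            return "TINYINT"
        if digits <= 9:
            return "INT"
        return "BIGINT"  # caution: more than 18 digits can overflow BIGINT
    if re.fullmatch(r"NUMBER\(\d+\s*,\s*\d+\)", t):
        return "DOUBLE"  # assumption: prefer DOUBLE over FLOAT
    if t.startswith("VARCHAR2") or t == "DATE":
        return "STRING"
    raise ValueError(f"unsupported Oracle type: {oracle_type}")

def date_to_hive_string(value: datetime) -> str:
    # Python equivalent of the "yyyy-MM-dd HH:mm:ss" pattern.
    return value.strftime("%Y-%m-%d %H:%M:%S")
```

Storing dates in this fixed-width format keeps string comparison consistent with chronological order.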

Page 153: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Oracle DML

• HIVE supports ANSI-SQL

• Sub-Queries in FROM clause

• Join query : equi-join/inner-join , outer-join

2011년 6월 30일 목요일

Page 158: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Range Operator

BETWEEN ~ AND ~

Oracle:
SELECT * from Employee WHERE salary BETWEEN 100 AND 500;

Hive:
SELECT * from Employee WHERE salary >= 100 AND salary <= 500;
SELECT * from Employee WHERE BETWEEN(salary,100,500);

2011년 6월 30일 목요일

Page 163: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

IN / EXISTS Clause

IN / EXISTS SubQuery

Oracle:
SELECT * from Employee e WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d)
SELECT * from Employee e WHERE EXISTS (SELECT 1 FROM Dept d WHERE e.DeptNo=d.DeptNo)

Hive:
SELECT * from Employee e LEFT SEMI JOIN Dept d ON (e.DeptNo=d.DeptNo)

2011년 6월 30일 목요일
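
A semi join returns each left-hand row at most once, which is why it can stand in for IN / EXISTS where a plain inner join cannot. The sketch below illustrates the difference with Python's sqlite3 module. SQLite is only a stand-in for Hive here (an assumption) and has no LEFT SEMI JOIN, so the example contrasts IN / EXISTS with an inner join on hypothetical data in which Dept contains a duplicate DeptNo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Name TEXT, DeptNo INTEGER);
CREATE TABLE Dept (DeptNo INTEGER, Name TEXT);
INSERT INTO Employee VALUES ('kim', 10), ('lee', 20), ('park', 30);
INSERT INTO Dept VALUES (10, 'Research'), (10, 'Research-2'), (20, 'Sales');
""")

# IN subquery: each matching employee appears once.
in_rows = conn.execute(
    "SELECT e.Name FROM Employee e "
    "WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d) ORDER BY e.Name"
).fetchall()

# EXISTS subquery: same result as IN.
exists_rows = conn.execute(
    "SELECT e.Name FROM Employee e "
    "WHERE EXISTS (SELECT 1 FROM Dept d WHERE e.DeptNo = d.DeptNo) ORDER BY e.Name"
).fetchall()

# Plain inner join: 'kim' is duplicated because DeptNo 10 occurs twice in Dept.
join_rows = conn.execute(
    "SELECT e.Name FROM Employee e JOIN Dept d ON (e.DeptNo = d.DeptNo) ORDER BY e.Name"
).fetchall()
```

When rewriting IN / EXISTS for Hive, the semi join form keeps the one-row-per-employee behaviour that an ordinary JOIN would lose.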

Page 167: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

NOT IN Clause

NOT IN SubQuery

Oracle:
SELECT * from Employee e WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)

Hive:
SELECT e.* from Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo=d.DeptNo) WHERE d.DeptNo IS NULL

2011년 6월 30일 목요일

Page 171: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

NOT EXISTS Clause

NOT EXISTS SubQuery

Oracle:
SELECT * from Employee e WHERE NOT EXISTS (SELECT 1 FROM Dept d WHERE e.DeptNo=d.DeptNo)

Hive:
SELECT e.* from Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo=d.DeptNo) WHERE d.DeptNo IS NULL

2011년 6월 30일 목요일
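
The LEFT OUTER JOIN ... IS NULL rewrite can be checked on a small data set. The sketch below uses Python's sqlite3 module as a stand-in for both Oracle and Hive (an assumption) with hypothetical rows. It also shows one caveat: when Employee.DeptNo is NULL, NOT IN drops the row while NOT EXISTS and the outer join form keep it, so the NOT IN rewrite matches only when the join key has no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Name TEXT, DeptNo INTEGER);
CREATE TABLE Dept (DeptNo INTEGER);
INSERT INTO Employee VALUES ('kim', 10), ('lee', 20), ('park', 30);
INSERT INTO Dept VALUES (10), (20);
""")

NOT_IN = ("SELECT e.Name FROM Employee e "
          "WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d) ORDER BY e.Name")
NOT_EXISTS = ("SELECT e.Name FROM Employee e WHERE NOT EXISTS "
              "(SELECT 1 FROM Dept d WHERE e.DeptNo = d.DeptNo) ORDER BY e.Name")
ANTI_JOIN = ("SELECT e.Name FROM Employee e "
             "LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) "
             "WHERE d.DeptNo IS NULL ORDER BY e.Name")

def run(sql):
    return [name for (name,) in conn.execute(sql)]

# Without NULL keys, all three forms agree.
no_null = (run(NOT_IN), run(NOT_EXISTS), run(ANTI_JOIN))

# Add an employee whose DeptNo is NULL and run the queries again.
conn.execute("INSERT INTO Employee VALUES ('choi', NULL)")
with_null = (run(NOT_IN), run(NOT_EXISTS), run(ANTI_JOIN))
```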

Page 178: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

LIKE Clause

LIKE / NOT LIKE

Oracle:
SELECT * from Employee e WHERE name LIKE '%steve'
SELECT * from Employee e WHERE name NOT LIKE '%steve'

Hive:
SELECT e.* from Employee e WHERE name LIKE '%steve'
SELECT e.* from Employee e WHERE NOT name LIKE '%steve'

2011년 6월 30일 목요일

Page 182: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

JOIN Operator (1/4)

SELF JOIN

Oracle:
SELECT * FROM Employee e1, Employee e2 WHERE e1.ID = e2.Id

Hive:
SELECT * FROM Employee e1 JOIN Employee e2 ON (e1.ID = e2.Id)

2011년 6월 30일 목요일

Page 186: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

JOIN Operator (2/4)

CROSS JOIN (Cartesian Product)

Oracle:
SELECT emp.Name, dept.Name FROM Employee emp, Dept dept

Hive:
SELECT emp.Name, dept.Name FROM Employee emp JOIN Dept dept

2011년 6월 30일 목요일

Page 190: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

JOIN Operator (3/4)

LEFT OUTER JOIN

Oracle:
SELECT * FROM Emp, Dept WHERE Emp.deptNo = Dept.deptNo(+)

Hive:
SELECT * FROM Emp LEFT OUTER JOIN Dept ON Emp.deptNo = Dept.deptNo

2011년 6월 30일 목요일

Page 194: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

JOIN Operator (4/4)

RIGHT OUTER JOIN

Oracle:
SELECT * FROM Emp, Dept WHERE Emp.deptNo(+) = Dept.deptNo

Hive:
SELECT * FROM Emp RIGHT OUTER JOIN Dept ON Emp.deptNo = Dept.deptNo

2011년 6월 30일 목요일
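
In Oracle's legacy syntax the (+) marks the table whose columns may be padded with NULLs, so the other table is the one that is fully preserved. The sketch below shows the resulting row sets with Python's sqlite3 module as a stand-in engine (an assumption) and hypothetical data. Because older SQLite versions lack RIGHT OUTER JOIN, the right outer join is expressed as a left outer join with the tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp (Name TEXT, deptNo INTEGER);
CREATE TABLE Dept (deptNo INTEGER, Name TEXT);
INSERT INTO Emp VALUES ('kim', 10), ('lee', 40);
INSERT INTO Dept VALUES (10, 'Research'), (20, 'Sales');
""")

# Emp.deptNo = Dept.deptNo(+)  ->  Emp LEFT OUTER JOIN Dept (all Emp rows kept)
left_rows = conn.execute(
    "SELECT e.Name, d.Name FROM Emp e "
    "LEFT OUTER JOIN Dept d ON e.deptNo = d.deptNo ORDER BY e.Name"
).fetchall()

# Emp.deptNo(+) = Dept.deptNo  ->  Emp RIGHT OUTER JOIN Dept (all Dept rows kept),
# written here as Dept LEFT OUTER JOIN Emp.
right_rows = conn.execute(
    "SELECT e.Name, d.Name FROM Dept d "
    "LEFT OUTER JOIN Emp e ON e.deptNo = d.deptNo ORDER BY d.Name"
).fetchall()
```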

Page 195: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Oracle Function

2011년 6월 30일 목요일

Page 199: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Condition Function

CASE

Oracle:
CASE expr WHEN cond1 THEN r1 [WHEN cond2 THEN r2]* [ELSE r] END

Hive:
CASE expr WHEN cond1 THEN r1 [WHEN cond2 THEN r2]* [ELSE r] END

2011년 6월 30일 목요일

Page 212: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Math Function

Oracle -> Hive
ROUND -> ROUND
CEIL -> CEIL/CEILING
MOD -> PMOD
POWER -> POW/POWER
SQRT -> SQRT
SIN/COS -> SIN/COS

2011년 6월 30일 목요일
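
One mapping above deserves care: Oracle's MOD returns a remainder with the sign of the dividend, while Hive's PMOD returns a non-negative remainder for a positive divisor, so the two differ for negative inputs. The sketch below models both in Python under that understanding. The function names are hypothetical and the PMOD model covers positive divisors only.

```python
def oracle_mod(a: int, b: int) -> int:
    # Oracle MOD: returns a when the divisor is 0, otherwise a remainder
    # that carries the sign of the dividend a.
    if b == 0:
        return a
    r = abs(a) % abs(b)
    return r if a >= 0 else -r

def hive_pmod(a: int, b: int) -> int:
    # For a positive divisor, Python's % already yields the non-negative
    # remainder that PMOD is meant to return.
    if b <= 0:
        raise ValueError("this sketch covers positive divisors only")
    return a % b

same_for_positive = oracle_mod(7, 3) == hive_pmod(7, 3)
differs_for_negative = (oracle_mod(-7, 3), hive_pmod(-7, 3))
```

If the converted queries can see negative dividends, the results should be compared on real data before relying on PMOD as a drop-in replacement.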

Page 223: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Character Function

Oracle -> Hive
SUBSTR -> SUBSTR
TRIM -> TRIM
LPAD/RPAD -> LPAD/RPAD
LTRIM/RTRIM -> LTRIM/RTRIM
REPLACE -> REGEXP_REPLACE

2011년 6월 30일 목요일
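
Mapping REPLACE to REGEXP_REPLACE changes the meaning of the search argument from a literal string to a regular expression, so metacharacters such as "." need escaping. The sketch below illustrates the pitfall with Python's re module, used here only as an analogy for the Java regular expressions that Hive relies on (an assumption). The helper names are hypothetical.

```python
import re

def oracle_replace(text: str, search: str, replacement: str) -> str:
    # Literal replacement, as Oracle's REPLACE does.
    return text.replace(search, replacement)

def regexp_replace_literal(text: str, search: str, replacement: str) -> str:
    # Escape the search string so the regex engine treats it literally;
    # the lambda keeps backslashes in the replacement from being interpreted.
    return re.sub(re.escape(search), lambda _m: replacement, text)

literal = oracle_replace("a.b.c", ".", "-")
escaped = regexp_replace_literal("a.b.c", ".", "-")
naive = re.sub(".", "-", "a.b.c")  # unescaped "." matches every character
```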

Page 230: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

NULL Function

Oracle -> Hive
COALESCE -> COALESCE
NVL -> Custom UDF
NVL2 -> Custom UDF

2011년 6월 30일 목요일

Page 231: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Custom UDF Function

• Condition Function
  • DECODE
• Null Comparison Function
  • NVL / NVL2
• Type Conversion
  • TO_NUMBER
  • TO_CHAR
  • TO_DATE

2011년 6월 30일 목요일
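
The semantics that such custom UDFs need to reproduce are small. The following Python sketch models NVL, NVL2, and DECODE with None standing in for NULL. It illustrates the Oracle behaviour only and is not the UDF code written for the project, which would be implemented in Java against Hive's UDF interfaces.

```python
def nvl(value, default):
    # NVL(a, b): b when a is NULL, otherwise a.
    return default if value is None else value

def nvl2(value, if_not_null, if_null):
    # NVL2(a, b, c): b when a is not NULL, otherwise c.
    return if_null if value is None else if_not_null

def decode(expr, *args):
    # DECODE(expr, s1, r1, s2, r2, ..., [default]).
    # An odd number of trailing arguments means the last one is the default.
    if len(args) % 2 == 1:
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None
    for i in range(0, len(pairs), 2):
        # Oracle's DECODE treats NULL as equal to NULL; None == None is True.
        if pairs[i] == expr:
            return pairs[i + 1]
    return default
```

For example, decode("M", "M", "Male", "F", "Female", "Unknown") returns "Male", and any other code falls through to "Unknown".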

Page 232: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Oracle Analytic Function

2011년 6월 30일 목요일

Page 236: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Analytic Function

1. Joins, WHERE, and GROUP BY clauses are performed.
2. The analytic functions are applied to the result set.
3. The ORDER BY clause is processed.

2011년 6월 30일 목요일

Page 237: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Analytic Function

Rank salary in dept

name  dept        salary
------------------------
a     Research    100
b     Research    100
c     Sales       200
d     Sales       300
e     Research    50
f     Accounting  200
g     Accounting  300
h     Accounting  400
i     Research    10

2011년 6월 30일 목요일

Page 242: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Analytic Function

Map: a Research 100, b Research 100, c Sales 200
Map: d Sales 300, e Research 50, f Accounting 200
Map: g Accounting 300, h Accounting 400, i Research 10

DISTRIBUTE BY dept

Reduce
Reduce

2011년 6월 30일 목요일

Page 243: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Analytic Function

Map, Map, Map

DISTRIBUTE BY dept

Reduce: g Research 300, h Research 400, e Research 300, i Research 10
Reduce: c Sales 200, g Accounting 300, h Accounting 400, d Sales 300, f Accounting 200

2011년 6월 30일 목요일

Page 244: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Analytic Function

Map, Map, Map

SORT BY dept, salary

Reduce: i Research 10, g Research 300, e Research 300, h Research 400
Reduce: c Sales 200, d Sales 300, f Accounting 200, g Accounting 300, h Accounting 400

2011년 6월 30일 목요일

Page 246: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Analytic Function

Map, Map, Map

RANK(dept,salary)

Reduce: i Research 10 (1), g Research 300 (2), e Research 300 (3), h Research 400 (4)
Reduce: c Sales 200 (1), d Sales 300 (2), f Accounting 200 (1), g Accounting 300 (2), h Accounting 400 (3)

2011년 6월 30일 목요일

Page 251: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Analytic Function

RANK

Oracle:
SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM emp

Hive:
SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary) FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept SORT BY dept, salary DESC) e

RANK(arg1,arg2) - Custom UDF

2011년 6월 30일 목요일
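
The Hive query above works because each reducer sees the rows of a department together and in salary order, so a UDF that remembers the previous row can assign ranks in a single pass. The Python sketch below simulates that flow on the sample table. The class and function names, the crc32 partitioner, and the tie handling (equal salaries share a rank, as Oracle's RANK does) are assumptions; the slides do not show how the custom UDF treats ties.

```python
import zlib
from collections import defaultdict

class RankUDF:
    """Stateful rank over rows that arrive grouped by key and sorted by value."""

    def __init__(self):
        self.last_key = None
        self.last_value = None
        self.row_in_group = 0
        self.rank = 0

    def evaluate(self, key, value):
        if key != self.last_key:
            # New department: restart counting.
            self.last_key = key
            self.row_in_group = 1
            self.rank = 1
        else:
            self.row_in_group += 1
            if value != self.last_value:
                # Ties keep the previous rank; a new value jumps to the row number.
                self.rank = self.row_in_group
        self.last_value = value
        return self.rank

def rank_by_dept(rows, num_reducers=2):
    # DISTRIBUTE BY dept: every row of a department goes to the same reducer.
    reducers = defaultdict(list)
    for name, dept, salary in rows:
        reducers[zlib.crc32(dept.encode("utf-8")) % num_reducers].append((name, dept, salary))
    ranked = {}
    for part in reducers.values():
        # SORT BY dept, salary DESC within each reducer.
        part.sort(key=lambda r: (r[1], -r[2]))
        udf = RankUDF()  # one UDF instance per reducer
        for name, dept, salary in part:
            ranked[name] = udf.evaluate(dept, salary)
    return ranked

emp = [
    ("a", "Research", 100), ("b", "Research", 100), ("c", "Sales", 200),
    ("d", "Sales", 300), ("e", "Research", 50), ("f", "Accounting", 200),
    ("g", "Accounting", 300), ("h", "Accounting", 400), ("i", "Research", 10),
]
ranks = rank_by_dept(emp)
```

The result does not depend on which reducer a department lands on, only on the guarantee that a department is never split across reducers.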

Page 252: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Hive Optimization & Future Work

2011년 6월 30일 목요일

Page 260: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Tuning Parameter

• Hadoop Tuning
  • mapred.job.reuse.jvm.num.tasks
  • mapred.child.java.opts
  • mapred.min.split.size / mapred.max.split.size
  • dfs.block.size
• Hive Tuning
  • hive.input.format = CombineHiveInputFormat

2011년 6월 30일 목요일
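
The split size and block size parameters interact to determine how many map tasks a job starts. The sketch below uses the commonly described rule split = max(min, min(max, block)). Treat it as an approximation under that assumption: the exact computation depends on the Hadoop version and input format, and CombineHiveInputFormat groups small files differently. The function names are hypothetical.

```python
import math

MB = 1024 * 1024

def split_size(min_split: int, max_split: int, block_size: int) -> int:
    # mapred.min.split.size, mapred.max.split.size, dfs.block.size (bytes)
    return max(min_split, min(max_split, block_size))

def estimated_map_tasks(file_size: int, min_split: int, max_split: int,
                        block_size: int) -> int:
    # Rough estimate: one map task per split of a single large file.
    return math.ceil(file_size / split_size(min_split, max_split, block_size))

NO_LIMIT = 2**63 - 1

default_tasks = estimated_map_tasks(1024 * MB, 1, NO_LIMIT, 64 * MB)
fewer_tasks = estimated_map_tasks(1024 * MB, 128 * MB, NO_LIMIT, 64 * MB)
more_tasks = estimated_map_tasks(1024 * MB, 1, 32 * MB, 64 * MB)
```

Raising the minimum split size lowers the task count and per-task overhead, while lowering the maximum split size increases parallelism.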

Page 261: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

UDF/UDAF

• Develop UDFs to optimize the number of MR jobs
• Extend GenericUDF to avoid Java reflection
• Avoid creating new objects in UDFs

2011년 6월 30일 목요일

Page 265: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Future Work

• HiveQL SQL Compliance
  • HIVE-282 - IN statement for WHERE clauses
  • HIVE-192 - Add TIMESTAMP column type
  • HIVE-1269 - Support Date/Datetime/Time/Timestamp Primitive Types
• Analytic Function
  • HIVE-896 - Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive
  • HIVE-952 - Support analytic NTILE function
• Optimization
  • HIVE-1694 - Accelerate GROUP BY execution using indexes
  • HIVE-482 - Optimize Group By + Order By with the same keys

2011년 6월 30일 목요일

Page 270: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Hive

A system for managing and querying structured data built on top of Hadoop

Oracle 2 Hive

• data model
• ANSI-SQL
• built-in function / custom UDF
• analytic function

2011년 6월 30일 목요일

Page 273: SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

Questions?

2011년 6월 30일 목요일