Oracle Big Data Cloud Service - MANDY SANDHU’S BLOG · • choose Oracle Big data Cloud service...

Oracle Big Data Cloud Service Presented by : Mandeep Kaur Sandhu Senior Oracle DBA Download these slides from : mandysandhu.com


Oracle Big Data Cloud Service

Presented by : Mandeep Kaur Sandhu, Senior Oracle DBA

Download these slides from : mandysandhu.com

Goals

• Introduction to Big Data
• Oracle Big Data deployment models
• Oracle Big Data Cloud Service
• Core Principles
• Access and Admin tasks
• Data Management tools
• Event Hub
• Conclusion

What is Big Data?

• Volume: terabytes to zettabytes
• Velocity: batch to streaming data
• Variety: structured to structured and unstructured

• Big data is a term that describes large or complex datasets
• Traditional data processing systems fail to analyse this data
• Big data helps identify the value of data

What is Hadoop?

An open-source software platform for distributed storage and processing: highly scalable, reliable and available

• Logically distributed file system
• Framework for processing
• Designed to run on small or large machines for parallel processing
• Allows resource growth
• Avoids vendor lock-in

Two Components

HDFS and MapReduce

HDFS stores the data in the cluster:
• NameNode
• DataNode

MapReduce

Programming model for processing large data sets

• Map: takes a set of data and converts it into another set of data
• Reduce: takes the output of Map as input and combines it into a smaller set
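The Map and Reduce steps described on this slide can be sketched in a few lines of plain Python. This is only an illustration of the programming model on a single machine, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and the grouping step that a real framework performs between Map and Reduce is simulated with a dictionary.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: turn each input line into (word, 1) pairs."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single, smaller result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run Map, then the grouping step, then Reduce."""
    return reduce_phase(shuffle(map_phase(lines)))
```

Calling `word_count(["big data", "Big cluster"])` counts each word across both lines. On a real cluster the Map and Reduce functions would run in parallel on many nodes, each over its own share of the data.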

Oracle Big Data Deployment Models

• Oracle Big Data Cloud at Customer (BDCC): Oracle Big Data Cloud Service model delivered in your data centre, behind your firewall
• Oracle Big Data Appliance X6: on-premises engineered system designed to deliver predictable Hadoop infrastructure
• Oracle Big Data Cloud Service (BDCS): Oracle public cloud infrastructure with cluster nodes and data sources

BDCS - Core Principles

Operational Efficiency
• Out-of-the-box installation
• Automated cluster management
• Cloudera Manager

Security
• Data is encrypted at rest and in motion
• Authorization and authentication
• Network firewall

Versatility
• Cloudera distribution: Apache Hadoop Enterprise Data Hub
• Install and operate third-party software

BDCS - Features

Highly Efficient Cluster Management
• Fault tolerant: HA Hadoop infrastructure
• Fully tested Hadoop upgrades

Cluster Nodes
• A cluster is a collection of nodes
• Permanent nodes
• Edge nodes
• Compute nodes

Permanent Nodes

• Master or data node
• Last for the lifetime of the cluster
• Each node has:
  • 32 OCPUs
  • 256 GB RAM
  • 48 TB storage
  • Full Cloudera distribution, with licence and support
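As a rough sizing aid, the per-node figures above can be multiplied out for a given node count, for example the three-node starter pack mentioned later in this deck. The sketch below is plain arithmetic on the slide's numbers; `NodeSpec` and `cluster_totals` are hypothetical names, and the storage figure is raw capacity only, since the slides do not say how much of it is usable after HDFS replication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node resources as listed on the Permanent Nodes slide."""
    ocpus: int = 32
    ram_gb: int = 256
    storage_tb: int = 48


def cluster_totals(node_count, spec=NodeSpec()):
    """Sum the per-node figures across node_count permanent nodes."""
    return {
        "ocpus": node_count * spec.ocpus,
        "ram_gb": node_count * spec.ram_gb,
        "raw_storage_tb": node_count * spec.storage_tb,
    }
```

For a three-node cluster, `cluster_totals(3)` gives 96 OCPUs, 768 GB of RAM and 144 TB of raw storage.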

Edge Nodes

• Empty nodes: OS and disk
• Hadoop client configs
• Interface between the Hadoop cluster and the outside network
• Permanent node

Note: No DataNode role

Compute Nodes

• CPU and memory
• No disks
• Temporary nodes
• An existing cluster is required before compute nodes can be added
• A cluster can be extended with up to 15 cluster compute nodes
• No HDFS data

BDCS – Included Software

• Oracle Linux 6 and Oracle Java (JDK 8)
• Cloudera Enterprise (Data Hub Edition)
  • CDH 5.x with support for YARN and MR2
  • Cloudera Impala
  • HBase
  • Cloudera Search
  • Apache Spark
• Oracle R Distribution
• Oracle Big Data Spatial and Graph

BDCS – Additional Component

Oracle Big Data SQL Cloud Service
• Unified SQL access
• Dedicated instances

[Diagram labels: Oracle cloud, Cloudera, 12c]

Oracle BDCS – Service Instance

• Log in to Oracle Cloud
• Choose Oracle Big Data Cloud Service
• Starter pack 1 –> 3 nodes
• Additional nodes can be added later
• Big Data SQL node

Oracle BDCS – Service Cluster

• Go to the Oracle Big Data service instance
• Create a service cluster
• Provide tags and an instance name

Oracle BDCS – Service Cluster

• Select Big Data Appliance system (service instance)
• SSH keys

Oracle BDCS – Admin page

• Starter pack 1 –> 3 instances
• Lowest IP address –> master node
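The "lowest IP address" rule is easy to get wrong when addresses are compared as text, because `"10.0.0.10"` sorts before `"10.0.0.9"` as a string. The sketch below shows a numeric comparison with Python's standard `ipaddress` module. The helper name `lowest_ip` and the sample addresses are made up for illustration, and it is an assumption that the admin page's rule refers to numeric order.

```python
import ipaddress


def lowest_ip(addresses):
    """Return the numerically lowest address from a list of IPv4 strings."""
    return str(min(ipaddress.ip_address(address) for address in addresses))


# Hypothetical addresses for a three-instance starter pack.
nodes = ["10.0.0.10", "10.0.0.9", "10.0.0.11"]
master_candidate = lowest_ip(nodes)
```

Here `master_candidate` is `"10.0.0.9"`, whereas a plain string `min(nodes)` would pick `"10.0.0.10"`.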

Oracle BDCS – Connect

• You can connect via the opc user
• CLI: bdacli
• Overall information about the cluster

Access Cloudera console

• Open the Cloudera console
• Username/password

Administrative Tasks

• Add nodes in one-node increments, up to a total of 60 nodes
• Four permanent Hadoop nodes: additional edge nodes allowed
• Extend/shrink the service

Hue – Group/user and File upload

• Open the Cloudera console, then Hue
• Same account details as CM
• Add group
• Add user
• Upload file

Big Data Manager Console

• GUI-based console
• Login username: bigdatamgr
• Explore jobs and stored data
• Usage and health of the cluster
• YARN jobs

Oracle Big Data Manager - Notebook

• Zeppelin notebooks: interactive analysis using R and Python

Data Management - odcp

odcp
• Command-line tool for copying large files
• Takes the input and splits it into chunks
• Uses Spark to provide parallel transfer

Examples:

odcp hdfs:///user/mandy/bigdata01.csv hdfs:///user/mandy/bigdata01.csv_copy

odcp hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy

odcp hdfs:///user/mandy/bigdata01.csv s3://aserver/bigdata01.csv_copy

odcp s3://user/mandy/bigdata01.csv s3://mandy01/bigdata01.csv_copy
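The "split into chunks, then transfer in parallel" idea behind odcp can be illustrated with a small, self-contained sketch. This is not odcp's implementation: it copies an in-memory byte string instead of a remote file, uses a thread pool as a stand-in for Spark executors, and the names `split_into_chunks` and `parallel_copy` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(total_size, chunk_size):
    """Return (offset, length) pairs that together cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, total_size - offset))
        for offset in range(0, total_size, chunk_size)
    ]


def parallel_copy(source, chunk_size, workers=4):
    """Copy a bytes object chunk by chunk, using threads as stand-in workers."""
    target = bytearray(len(source))

    def copy_chunk(chunk):
        offset, length = chunk
        # Each worker writes only its own byte range, so ranges never overlap.
        target[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every chunk and re-raises any worker exception.
        list(pool.map(copy_chunk, split_into_chunks(len(source), chunk_size)))
    return bytes(target)
```

Because every chunk is described by an offset and a length, the chunks can be transferred in any order and still land in the right place, which is what makes the copy safe to parallelise.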

Data Management - odiff

odiff
• Oracle distribution diff: compares large data sets
• Compatible with the Cloudera distribution
• Minimum block size to compare: 5 MB
• Maximum: 2 GB

Examples:

/usr/bin/odiff hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy

/usr/bin/odiff -V hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy

/usr/bin/odiff -d hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
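Comparing large data sets block by block, as the block-size limits above suggest, can be sketched as follows. This is an illustration of the general technique only and makes no claim about how odiff works internally: the function names are hypothetical, SHA-256 is an arbitrary choice of digest, and the block size in the usage note is tiny so the behaviour is easy to follow.

```python
import hashlib


def block_digests(data, block_size):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def differing_blocks(left, right, block_size):
    """Return the indices of blocks whose digests differ between two inputs."""
    left_digests = block_digests(left, block_size)
    right_digests = block_digests(right, block_size)
    longest = max(len(left_digests), len(right_digests))
    diffs = []
    for index in range(longest):
        # A block missing on one side counts as a difference.
        left_digest = left_digests[index] if index < len(left_digests) else None
        right_digest = right_digests[index] if index < len(right_digests) else None
        if left_digest != right_digest:
            diffs.append(index)
    return diffs
```

With a block size of 4, `differing_blocks(b"aaaabbbbcccc", b"aaaaXbbbcccc", 4)` reports only block 1, so just that block would need to be inspected or re-copied.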

Data Management

bda-oss-admin
• Manages data and resources
• Can set the environment variables
• Configures the cluster with a storage provider

Examples:

bda-oss-admin --cm-username admin --cm-password abce1234

bda-oss-admin restart_cluster

#!/bin/bash
export CM_ADMIN="my_CM_admin_username"

Data Management – bdm-cli

bdm-cli
• Big Data command-line interface to copy data and manage copy jobs
• Duplicates the odcp commands

bdm-cli copy

bdm-cli create_job

Data ingest options

Direct ingest into Oracle BDCS (Oracle Big Data Cloud Service) from the customer data centre:
• Flume
• SCP (SSH protocol)
• Common ingests using Flume or ETL work
• VPN and FastConnect

Apache Kafka

• Open-source stream processing
• Real-time streaming
• High-throughput, low-latency platform

Use cases:
• Stream processing: IoT, anomaly detection
• Data integration: data lakes, HDFS, object storage
• Log aggregation: click streams, server logs
• Messaging: traditional apps, microservices

Oracle Event Hub Cloud Service

• Fully managed streaming data platform
• Provides the world’s most popular message broker (Kafka)
• Flexible
  • Available in fully managed and dedicated deployment options
  • Elastic: horizontally and vertically
• Access
  • REST API access
  • SSH access to the Kafka cluster

Conclusion

• Start your big data journey now
• Build and populate a data lake
• Help the business solve problems by using data
• Register for the Oracle Cloud free trial

https://cloud.oracle.com/tryit


Thank you for your time!!

Follow and subscribe:

Blog: mandysandhu.com
Twitter: @mandysandhu14
LinkedIn: kaurmandeep88