Post on 30-Sep-2020

DBAs and R - At the Intersection of Oracle and Unstructured Data

Introduction

Today’s Objectives
• Understand the R language.
• Understand data visualization and its value.
• Learn the basic constructs of R.
• See R in action via a demo.
• Learn how Oracle is integrating R into its relational database product line.

Robert Dawson – Meta7

• Oracle Master Consultant, Meta7
• AVP, Enterprise Databases, Oppenheimer Funds, Denver, CO
• Oracle DBA, Janus Capital, Denver, CO
• Oracle Application DBA, Blue Cross Blue Shield, Denver, CO

Ashokkumar Sivasankaran – ACXIOM

• Senior Team Leader and Database Architect, Acxiom ITO
• OCE RAC Expert & OCP Database Administrator 7.3 to 11g
• ITIL V3 Foundation Certified
• Member, Chicago Oracle User Group
• Chicago “RAC Attack” Instructor

About You.

About You. How do you learn?
• Verbal learner: Do you like to read and access content on media?
• Visual learner: Do you like to digest information from charts, diagrams, timelines or maps?
• Kinesthetic learner: Do you enjoy hands-on activities involving movement?

Think about these three questions:
• What is your learning style?
• What is the learning style of your boss?
• What is the learning style of your “customer”?

The United States of Data(bases)

The Mainframe Colonies

The Relational Heartland

The NoSQL Outpost

The Hadoop States

Somewhere at the Intersection of Relational and Unstructured…

The Big Data Story

The Big Data Landscape

R is NOT a Big Data Tool
1. It’s a Data Tool.
2. Leveraged by Data Scientists, Analysts, Developers, Engineers, Planners and Researchers.
3. Open Source.
4. Processes large sets of data fast!

What Data Tools are we using?

Telling the Visual Data Story…

“The most common data display is a noun accompanied by a number. For example, a medical patient's current level of glucose is reported in a clinical record as a word and number.” – Edward Tufte

Source: http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR

Stay Relevant.

The Data-Driven Organization

Warby Parker - New York online sunglass company (www.warbyparker.com)

Carl Anderson @LeapingLlamas

The Data-Driven Organization

“People want to move from a culture of reporting to a culture of analytics.”

Are you a Data-Driven DBA?

Data questions about your Oracle databases:
1. What are the average IOPS for your databases during peak load?
2. How many average active sessions does your primary DB support?
3. What is the typical HCC compression ratio for your Exadata storage?
4. How many executions does your top SQL complete every day?
5. What is the average DOP for your SQL statements?
6. What is your average and max CPU utilization?
7. How many hours do you spend performing refreshes a month?
8. How many Oracle cores are you utilizing today?

How are you making your next Hardware purchase decision? How do you know you are ready to expand?

What’s your learning Style?

Verbal Learner

Visual Learner

Kinesthetic learner

You, Boss, Customer

Traditional DBA Reporting Tools
• AWR Reports
• Vendor Tools
• OEM Graphics
• ADDM Report

R – AWR Reporting PDF File Load.

R In Action: Demo

R Language Basics
• Implements the S language, which was developed at Bell Labs; R itself was created by Ross Ihaka and Robert Gentleman at the University of Auckland
• Open Source
• Runs on Windows, OS X, Linux, Unix
• Interpreted language (not compiled)
• Session-based
• http://www.r-project.org/
• 5,000 packages available on CRAN (Comprehensive R Archive Network)

Key Components of R
1. Simple data: vectors
2. Compound data stored in: data.frame, matrix, list
3. Functional programming
4. Shared code: packages
5. Graphics functions: qplot() and ggplot() (from the ggplot2 package), hist() (base graphics)
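The components listed above can be sketched in a few lines of base R. This is only an illustrative sketch: the variable names (`wait_ms`, `waits`, `info`, `to_seconds`) and the sample values are assumptions made for the example, not part of the original demo.

```r
# Simple data: an atomic vector (every element has the same type).
wait_ms <- c(1, 2, 4, 8, 16)

# Compound data: a data.frame is table-like, one column per variable.
waits <- data.frame(
  event = c("db file scattered read", "db file sequential read", "log file sync"),
  count = c(3, 255, 585),
  stringsAsFactors = FALSE
)

# A matrix holds one type in two dimensions (filled column by column).
m <- matrix(1:6, nrow = 2)

# A list can mix types and sizes.
info <- list(name = "awr", rows = nrow(waits))

# Functional programming: functions are values and can be passed to other functions.
to_seconds <- function(ms) ms / 1000
wait_s <- sapply(wait_ms, to_seconds)

print(wait_s)
```

Graphics functions such as hist(wait_ms) work on the same objects; qplot() and ggplot() would first require installing and loading ggplot2.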

Things to Remember about R-Basics

R is not Perl, Sed or Awk.
1. Data.Frames = Tables
2. Package-based.
3. Use help()
4. Graphics are ‘Packages’
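For a DBA, the "Data.Frames = Tables" point may be easiest to see next to the SQL it resembles. The sketch below uses a small hypothetical data frame (`awr`, with made-up rows whose column names mirror the AWR demo data) and base R only; the SQL in the comments is a rough analogy, not an exact equivalence.

```r
# Hypothetical sample rows; column names mirror the AWR demo data.
awr <- data.frame(
  SNAP_ID    = c(8195, 8195, 8196, 8196),
  WAIT_CLASS = c("User I/O", "Commit", "User I/O", "System I/O"),
  WAIT_COUNT = c(255, 585, 100, 70)
)

# Roughly: SELECT * FROM awr WHERE wait_class = 'User I/O'
user_io <- awr[awr$WAIT_CLASS == "User I/O", ]

# Roughly: SELECT snap_id, wait_count FROM awr ORDER BY wait_count DESC
ordered <- awr[order(-awr$WAIT_COUNT), c("SNAP_ID", "WAIT_COUNT")]

# Roughly: SELECT wait_class, SUM(wait_count) FROM awr GROUP BY wait_class
totals <- aggregate(WAIT_COUNT ~ WAIT_CLASS, data = awr, FUN = sum)

print(totals)

# Documentation is built in: help(aggregate) or ?aggregate opens the manual page.
```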

R Development Tools

Demo: Data Load from Excel File (5000 rows)

> library(xlsx)   # provides read.xlsx2()
> awr_data <- read.xlsx2("awr-io-waits.csv.xlsx", 1,
+     colClasses = c(snap_id="numeric", wait_class="character", event_name="character",
+                    wait_time_milli="numeric", wait_count="numeric"))
> str(awr_data)
'data.frame':	5000 obs. of  5 variables:
 $ SNAP_ID        : num  8195 8195 8195 8195 8195 ...
 $ WAIT_CLASS     : Factor w/ 3 levels "Commit","System I/O",..: 3 3 3 3 3 3 3 3 3 2 ...
 $ EVENT_NAME     : Factor w/ 4 levels "db file scattered read",..: 1 1 2 2 2 2 2 2 2 3 ...
 $ WAIT_TIME_MILLI: num  1 2 1 2 4 8 16 32 64 1 ...
 $ WAIT_COUNT     : num  3 1 255 23 33 100 118 70 16 585 ...

Key things to remember:
✓ Columns are variables
✓ Rows are observations
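If the xlsx package is not available, a similar load can be sketched with base R's read.csv(). The example below is an assumption-laden stand-in for the demo: it reads three rows (taken from the head() output of the demo data) from an inline string rather than from the real file, and the name `awr_sample` is hypothetical.

```r
# Three rows copied from the demo's head() output, supplied inline instead of from a file.
csv_text <- "SNAP_ID,WAIT_CLASS,EVENT_NAME,WAIT_TIME_MILLI,WAIT_COUNT
8195,User I/O,db file scattered read,1,3
8195,User I/O,db file scattered read,2,1
8195,User I/O,db file sequential read,1,255"

# colClasses fixes the type of each column up front, much like column types in DDL.
awr_sample <- read.csv(
  text = csv_text,
  colClasses = c(SNAP_ID = "numeric", WAIT_CLASS = "character",
                 EVENT_NAME = "character", WAIT_TIME_MILLI = "numeric",
                 WAIT_COUNT = "numeric")
)

str(awr_sample)           # structure: one line per column (variable)
print(nrow(awr_sample))   # number of rows (observations)
print(names(awr_sample))  # column (variable) names
```

With a real export, the `text =` argument would be replaced by a file path.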

Demo: Data Head

> head(awr_data)
  SNAP_ID WAIT_CLASS              EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT
1    8195   User I/O  db file scattered read               1          3
2    8195   User I/O  db file scattered read               2          1
3    8195   User I/O db file sequential read               1        255
4    8195   User I/O db file sequential read               2         23
5    8195   User I/O db file sequential read               4         33
6    8195   User I/O db file sequential read               8        100

Demo: Simple Table Group by w/ Pie Graph

> table(awr_data$WAIT_CLASS)

    Commit System I/O   User I/O 
      1197       1378       2425 
> pie(table(awr_data$WAIT_CLASS))
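The table() call above behaves much like a GROUP BY with COUNT(*). As a rough sketch of what the pie chart then displays, the snippet below rebuilds a column with the same three counts shown on the slide and converts them to percentage shares; the names `wait_class`, `counts` and `shares` are assumptions for the example.

```r
# Rebuild a WAIT_CLASS column with the counts shown on the slide (5000 rows in total).
wait_class <- rep(c("Commit", "System I/O", "User I/O"),
                  times = c(1197, 1378, 2425))

# Roughly: SELECT wait_class, COUNT(*) FROM awr_data GROUP BY wait_class
counts <- table(wait_class)

# Share of each class in percent; these proportions are the slice sizes of the pie.
shares <- round(100 * prop.table(counts), 1)

print(counts)
print(shares)
```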

Demo: Group by Pie Chart

> table(awr_data$EVENT_NAME)

 db file scattered read db file sequential read log file parallel write           log file sync 
                    714                    1711                    1378                    1197 
> pie(table(awr_data$EVENT_NAME))

Demo: Table Graphic Plot

> plot(awr_data$EVENT_NAME)

Demo: Table Plot Multicolumn.

> plot(awr_data$EVENT_NAME,awr_data$WAIT_COUNT)
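When plot() is given a factor and a numeric vector, as in the call above, base R draws one box plot per factor level. The sketch below computes the numbers behind such a plot without opening a graphics device; the six rows in `awr` are hypothetical sample values, not the demo data.

```r
# Hypothetical sample rows; the real demo data has 5000.
awr <- data.frame(
  EVENT_NAME = factor(c("log file sync", "log file sync", "log file sync",
                        "db file sequential read", "db file sequential read",
                        "db file sequential read")),
  WAIT_COUNT = c(585, 300, 15, 255, 23, 100)
)

# plot = FALSE returns the box plot statistics instead of drawing them.
stats <- boxplot(WAIT_COUNT ~ EVENT_NAME, data = awr, plot = FALSE)

# Row 3 of stats$stats is the median of each group.
medians <- setNames(stats$stats[3, ], stats$names)

# A plain total per event, similar to SUM(wait_count) ... GROUP BY event_name.
totals <- tapply(awr$WAIT_COUNT, awr$EVENT_NAME, sum)

print(medians)
print(totals)
```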

Oracle R Extension: Use R with Oracle

Some Limitations Data Analysts Face with R
1. Memory-based processing.
2. Data extraction is time-consuming and painful!
3. Data security is not included in the program.
4. Programming is “ad hoc”, not “production-ready”.
5. Users are typically not “IT”.

Oracle doesn’t have these limitations.

The Oracle R Products

Oracle R Distribution

Oracle R Enterprise (AA)

Oracle R Advanced Analytics for Hadoop (Connectors)

ROracle (Package)

R on TechNet
