Visualizing database performance hotsos 13-v2

Visualizing Database Performance with R

Gwen Shapira, Senior Consultant
February 2013

About Me
– Oracle ACE Director
– Member of Oak Table
– 14 years of IT
– Performance Tuning
– Troubleshooting
– Hadoop
– Presents, Blogs, Tweets
– @gwenshap

About Pythian

• Recognized Leader:

– Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server

– Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments

• Expertise:

– Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2 Microsoft MVPs.

– Pythian holds 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC

• Global Reach & Scalability:

– Around the clock global remote support for DBA and consulting, systems administration, special projects or emergency response

Will Talk About:

• Data pre-processing tools
• Visualization tools and techniques
• How to make great looking charts
• What makes visuals effective
• How to avoid visualization mistakes

Will NOT Talk About:

• How to collect performance data
• Cool ASH queries
• How to program in R
• Statistics
• Machine Learning
• What the data actually means
• How to explain the results to your boss

Why Visualize?

• Yet another analysis tool
• But more fun
• Highly effective
• Communications tool, too
• But not at the same time

Reveal Structure in Data

Visualization Tools

R Studio

Getting Data In Shape

Use the DB, Luke

Aggregate

Filter

Getting DB Data to R

library(RJDBC)

# Load the Oracle JDBC driver from the local jar
drv <- JDBC("oracle.jdbc.driver.OracleDriver",
            "/Users/grahn/code/jdbc/ojdbc6.jar")

# Connect with host:port:SID, user name and password
conn <- dbConnect(drv,
                  "jdbc:oracle:thin:@zulu.us.oracle.com:1521:orcl",
                  "grahn", "grahn")

# import the data into a data.frame
lfs <- dbGetQuery(conn,
  "select SAMPLE_ID, TIME_WAITED from ashdump where EVENT='log file sync' order by SAMPLE_ID")

With R

"NAME","SNAP_TIME","BYTES"
"free memory",12-03-09 00:00:00,645935368
"KGH: NO ACCESS",12-03-09 00:00:00,325214880
"db_block_hash_buckets",12-03-09 00:00:00,186650624
"free memory",12-03-09 00:00:00,134211304
"shared_io_pool",12-03-09 00:00:00,536870912
"log_buffer",12-03-09 00:00:00,16924672
"buffer_cache",12-03-09 00:00:00,21676163072
"fixed_sga",12-03-09 00:00:00,2238472
"JOXLE",12-03-10 04:00:01,27349056
"free memory",12-03-10 04:00:01,105800192
"free memory",12-03-10 04:00:01,192741376
"PX msg pool",12-03-10 04:00:01,8192000

Reshape

shared_pool <- read.csv("~/shapira/shared_pool.csv")

install.packages("reshape")
library(reshape)

# One row per SNAP_TIME, one column per NAME, keeping the maximum value
max_shared_pool <- cast(shared_pool, SNAP_TIME ~ NAME, max)

Time                 free memory   log_buffer   buffer_cache
12-03-09 00:00:00    645935368     16924672     21676163072
12-03-09 04:00:00    192741376
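The same long-to-wide reshaping can be sketched in base R, without the reshape package. This is only an illustration, not the method from the slides: it uses tapply() in place of cast(), and the data frame holds a few rows copied from the CSV excerpt above.

```r
# A few long-format rows copied from the CSV excerpt (not the full file)
shared_pool <- data.frame(
  NAME = c("free memory", "free memory", "log_buffer", "buffer_cache",
           "free memory", "free memory"),
  SNAP_TIME = c("12-03-09 00:00:00", "12-03-09 00:00:00", "12-03-09 00:00:00",
                "12-03-09 00:00:00", "12-03-10 04:00:01", "12-03-10 04:00:01"),
  BYTES = c(645935368, 134211304, 16924672, 21676163072, 105800192, 192741376),
  stringsAsFactors = FALSE
)

# One row per SNAP_TIME, one column per NAME, keeping the largest BYTES value.
# Combinations that never occur in the data come back as NA.
max_wide <- with(shared_pool, tapply(BYTES, list(SNAP_TIME, NAME), max))
print(max_wide)
```

The result is a matrix rather than a data frame; as.data.frame(max_wide) converts it if later steps need one.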

With R

out of scale

Select Subset of data

max_shared_pool <- subset(max_shared_pool, select = -c(buffer_cache))

par(mar=c(4,6,2,1))   # widen the left margin for the horizontal labels
boxplot(max_shared_pool/1024/1024, xlab="Size in MBytes", horizontal=TRUE, las=1)
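A quick way to see why one column is "out of scale" is to compare per-column maxima before dropping it. The sketch below uses the values from the reshaped table above; the column names with underscores and the NA entries are assumptions made so the example is self-contained.

```r
# Values taken from the reshaped table; NA marks cells with no value shown
sizes <- data.frame(
  free_memory  = c(645935368, 192741376),
  log_buffer   = c(16924672, NA),
  buffer_cache = c(21676163072, NA)
)

# Convert bytes to MBytes and look at the largest value in each column
mb <- sizes / 1024 / 1024
col_max <- sapply(mb, max, na.rm = TRUE)
print(col_max)

# Drop the column that dwarfs the others, as on the slide
trimmed <- subset(mb, select = -c(buffer_cache))
print(names(trimmed))
```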

With R

More Subsets

SAMPLE_ID   TIME_WAITED   WAIT_CLASS    EVENT
10526629    14929         User I/O      cell single block physical read
10465699    21572         Concurrency   library cache: mutex X
10465699    65938         Concurrency   library cache: mutex X

new <- subset(old, row filter, column filter)

phys_io <- subset(ash, WAIT_CLASS == "User I/O", select = -c(EVENT))

Filtering Data

SAMPLE_ID   TIME_WAITED   WAIT_CLASS
10526629    14929         User I/O
10526629    5015          User I/O

Another Filtering Syntax

short_waits <- subset(ash, ash$TIME_WAITED < 10000)

short_waits <- ash[ash$TIME_WAITED < 10000,]

SAMPLE_ID TIME_WAITED WAIT_CLASS EVENT

Not a Typo!
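The two filtering forms can be compared side by side on a tiny data frame. This is a sketch built from the four sample rows shown on the earlier slides. "Not a Typo!" presumably refers to the trailing comma: in ash[rows, ] the empty slot after the comma means "keep all columns", and without it R would treat the condition as a column selector.

```r
# Four sample rows taken from the tables on the earlier slides
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938),
  WAIT_CLASS  = c("User I/O", "User I/O", "Concurrency", "Concurrency"),
  stringsAsFactors = FALSE
)

# Form 1: subset() evaluates the condition inside the data frame
by_subset <- subset(ash, TIME_WAITED < 10000)

# Form 2: logical row index, then a comma, then an empty column index
by_index <- ash[ash$TIME_WAITED < 10000, ]

# Compare the two results
print(all.equal(by_subset, by_index))
```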

Summarize with DDPLY

install.packages("plyr")
library(plyr)

# One summary row per SAMPLE_ID: sample count, mean and max wait
ash2 <- ddply(ash, "SAMPLE_ID", summarise,
              N=length(TIME_WAITED),
              mean=mean(TIME_WAITED),
              max=max(TIME_WAITED))

SAMPLE_ID   N   MEAN    MAX
10526629    2   9972    14929
10465699    2   43755   65938
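For readers without plyr installed, the same split-apply-combine idea can be sketched in base R. This is an illustrative alternative rather than what ddply does internally; it uses the four sample rows from the earlier slides.

```r
# Sample rows from the earlier slides
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938)
)

# Split by SAMPLE_ID, summarize each piece, then bind the pieces together
pieces <- lapply(split(ash, ash$SAMPLE_ID), function(g) {
  data.frame(SAMPLE_ID = g$SAMPLE_ID[1],
             N    = nrow(g),
             mean = mean(g$TIME_WAITED),
             max  = max(g$TIME_WAITED))
})
ash2 <- do.call(rbind, pieces)
rownames(ash2) <- NULL
print(ash2)
```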

Cheating for DBAs

library(sqldf)

# sqldf runs the query in SQLite by default, so the average function is avg()
ash2 <- sqldf("select SAMPLE_ID, count(*) N, avg(TIME_WAITED), max(TIME_WAITED) from ash where WAIT_CLASS = 'User I/O' group by SAMPLE_ID")

When all else fails

Text is text.

Frits Hoogland converts 10046 trace to CSV for R with SED:

s/^\(WAIT\)\ #\([0-9]*\):\ nam='\(.*\)'\ ela=\ *\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ obj#=\([0-9\-]*\)\ tim=\([0-9]*\)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/
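The same extraction can also be sketched inside R with regexec() and regmatches(). This is an assumption-laden illustration, not Frits Hoogland's script: the trace line below is made up to resemble a 10046 WAIT line, and the pattern captures eight fields (it skips the literal WAIT keyword that the sed expression also captures).

```r
# Hypothetical trace line, written to resemble a 10046 WAIT entry
line <- "WAIT #3: nam='db file sequential read' ela= 123 file#=1 block#=2 blocks=1 obj#=456 tim=1234567890"

# Capture: cursor number, event name, elapsed time, three parameters, obj#, tim
pattern <- paste0(
  "^WAIT #([0-9]+): nam='(.*)' ela= *([0-9]+)",
  " [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*)",
  " obj#=([0-9-]*) tim=([0-9]*)$"
)

# regmatches() returns the full match first, followed by each captured group
fields <- regmatches(line, regexec(pattern, line))[[1]]

# Drop the full match and join the groups with a pipe, as the sed output does
csv <- paste(fields[-1], collapse = "|")
print(csv)
```

From there, read.table(text = csv, sep = "|") would turn such lines into a data frame.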

Exploring Data

Directions to Explore

• Shape of data
• Correlations
• Changes over time

The Goal of Analysis is a Story

• Who
• What
• When
• Where
• Why
• Why
• Why
• Why
• Why

Boxplot

• Initial step
• Identify outliers
• Compare groups
• Summarize

75% of exports take less than 600m

For Example:

How it's done

ash <- read.csv('~/Downloads/ash1.csv')

boxplot(ash$TIME_WAITED/1000000 ~ ash$WAIT_CLASS, xlab="Wait Class",ylab="Time Waited (s)",cex.axis=1.2)
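The numbers a boxplot draws can be inspected directly with boxplot.stats(), which helps when reading statements like "75% of values are below X" off the chart. The wait times below are hypothetical values in microseconds chosen for illustration; only the first four appear on the earlier slides.

```r
# Hypothetical wait times in microseconds, with one obvious outlier
time_waited <- c(5015, 9000, 12000, 14929, 21572, 65938, 700000)

bs <- boxplot.stats(time_waited)

# bs$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)

# bs$out holds the points drawn individually as outliers
print(bs$out)

# The third quartile: 75% of the values fall at or below this
print(quantile(time_waited, 0.75))
```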

Scatter Plot

• Incredibly versatile
• Use to:
– Show changes over time
– Show correlations
– Highlight trends
– Find model
– Pretty much everything

Log Data

How it's done

install.packages("ggplot2")
library(ggplot2)

# Raw wait times
ggplot(ash, aes(SAMPLE_ID, TIME_WAITED, color=factor(WAIT_CLASS))) + geom_point()

# Log-transformed wait times (keep the + at the end of the line, not the start)
ggplot(ash, aes(SAMPLE_ID, log(TIME_WAITED), color=factor(WAIT_CLASS))) + geom_point()

Only “Small Waits”

500us Physical

Filtering

small_waits <- ash[ash$TIME_WAITED<15000,]

ggplot(small_waits,aes(SAMPLE_ID,TIME_WAITED,color=factor(WAIT_CLASS))) + geom_point()

Smoothing

ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,color=factor(WAIT_CLASS))) + geom_smooth()

ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,color=factor(WAIT_CLASS))) + geom_point() + geom_smooth()
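To get a feel for what a smoother does to noisy samples, here is a base R sketch using lowess(). It is a different (though related) local-regression smoother from the one geom_smooth() picks, and the data is simulated, so treat it as an illustration of the idea only.

```r
set.seed(1)

# Simulated data: a rising trend plus random noise (not real ASH samples)
sample_id   <- 1:100
time_waited <- 50 + 0.5 * sample_id + rnorm(100, sd = 5)

# lowess() returns the smoothed curve as a list with x and y components
sm <- lowess(sample_id, time_waited)

# The smoothed values vary less from point to point than the raw ones
print(sd(diff(time_waited)))
print(sd(diff(sm$y)))
```

Plotting the raw points with plot(sample_id, time_waited) and adding lines(sm) overlays the curve on the scatter, similar in spirit to geom_point() + geom_smooth().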

Data over Time

11gR2!

Finding Correlation

Regression (is not Causation)

How?

concurr2 <- ddply(concurr, .(SAMPLE_ID), summarise,
                  N=length(TIME_WAITED), mean=mean(TIME_WAITED),
                  max=max(TIME_WAITED))

ggplot(concurr2,aes(N,max/1000000))+geom_point()+geom_smooth(method=lm)+xlab("Number of Samples")+ylab("Max Time Waited (s)")
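geom_smooth(method=lm) fits a straight line; the same fit can be examined numerically with lm(). The sketch below uses made-up summary values (the column name max_s and all numbers are hypothetical), and a good fit here still says nothing about causation.

```r
# Hypothetical per-sample summary: number of waits and max wait in seconds
concurr2 <- data.frame(
  N     = c(1, 2, 3, 4, 5, 6),
  max_s = c(0.02, 0.05, 0.06, 0.09, 0.11, 0.12)
)

# Fit max wait as a linear function of the number of samples
fit <- lm(max_s ~ N, data = concurr2)

# Intercept and slope of the fitted line
print(coef(fit))

# Share of variance explained by the line
r2 <- summary(fit)$r.squared
print(r2)
```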

Heatmap

• Values as “blocks” in a matrix

• Clearer than scatter plot for large amounts of data

• Shows less information

• Performance data made sexy

Heatmap

How?

# Group by WAIT_CLASS as well, so the column is still there for the filter and the plot
ash2 <- ddply(ash, .(SAMPLE_ID, WAIT_CLASS), summarise,
              N=length(TIME_WAITED), mean=mean(TIME_WAITED),
              max=max(TIME_WAITED))
ash2 <- ash2[ash2$WAIT_CLASS %in% c("Concurrency","User I/O","Other"),]

ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) + geom_tile(aes(fill = log(N))) + scale_fill_gradient(low = "green", high = "red")
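Underneath, a heatmap is just a matrix of values, one cell per (row, column) pair. A minimal base R sketch of that matrix, using made-up sample IDs and wait classes, looks like this:

```r
# Hypothetical samples: small IDs and wait classes chosen for illustration
ash <- data.frame(
  SAMPLE_ID  = c(1, 1, 1, 2, 2, 3),
  WAIT_CLASS = c("User I/O", "User I/O", "Concurrency",
                 "User I/O", "Other", "Concurrency"),
  stringsAsFactors = FALSE
)

# Count samples per wait class (rows) and sample ID (columns)
counts <- table(ash$WAIT_CLASS, ash$SAMPLE_ID)
print(counts)
```

Each cell of counts corresponds to one tile; passing the matrix to image() or heatmap() would color the cells by value.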

Presenting Your Data

“Even irrelevant neuroscience information in an explanation of a psychological phenomenon may interfere with people’s abilities to critically consider the underlying logic of this explanation.”

Numerical quantities focus on expected values –

graphical summaries on unexpected values

--John Tukey

Our goal is an interesting presentation.

What is “Interesting”?

• Surprise
• Beauty
• Stories
• Visuals
• Counterintuitive
• Variety

Bad Visualizations Lie

1. Omit important data
2. Distort data
3. Misleading
4. Confusing
5. Fake correlations and bad models

Bad vs. Good Visuals

Eye-API

• Good:
– distances
– locations
– length
– high contrast
• Bad:
– shades
– relative area
– angles

Good or Bad?

#1 Mistake – Throw a line on Data

Avoid Pie Charts

Infographics always have Pie Charts

Which is better?

Creativity is Allowed

Make it Beautiful – for Geeks

• Contrast
• Reduce noise
• Few colors
• Few fonts
• Lots of Data
• More Signal
• Less Noise

IMPORTant R Libraries

• reshape
• plyr
• ggplot2
• sqldf
• http://blog.revolutionanalytics.com/2013/02/10-r-packages-every-data-scientist-should-know-about.html

Other Visualization Tools

• R + R Studio
• Excel
• Gephi
• JIT, D3.js
• ggobi

Thank you – Q&A

To contact us

sales@pythian.com

1-877-PYTHIAN

To follow us

http://www.pythian.com/blog

http://www.facebook.com/pages/The-Pythian-Group/163902527671

@pythian

http://www.linkedin.com/company/pythian
