Visualizing database performance hotsos 13-v2

Post on 26-Jan-2015

115 views 4 download

Tags:

description

 

Transcript of Visualizing database performance hotsos 13-v2

Visualizing Database Performance with R

Gwen Shapira, Senior ConsultantFebruary, 2013

About Me– Oracle ACE Director– Member of Oak Table– 14 years of IT

– Performance Tuning– Troubleshooting– Hadoop

– Presents, Blogs, Tweets

– @gwenshap

© 2013 Pythian2

About Pythian• Recognized Leader:

– Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server

– Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments

• Expertise:

– Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2 Microsoft MVPs.

– Pythian holds 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC

• Global Reach & Scalability:

– Around the clock global remote support for DBA and consulting, systems administration, special projects or emergency response

© 2013 Pythian3

Will Talk About:

• Data pre-processing tools• Visualization tools and techniques• How to make great looking charts• What makes visuals effective• How to avoid visualization mistakes

Will NOT Talk About:

• How to collect performance data• Cool ASH queries• How to program in R• Statistics• Machine Learning• What the data actually means• How to explain the results to your boss

Why Visualize?

• Yet another analysis tool• But more fun• Highly effective

• Communications tool, too• But not at the same time

© 2013 Pythian6

Reveal Structure in Data

Visualization Tools

R Studio

© 2013 Pythian9

© 2013 Pythian10

Getting Data In Shape

Use the DB, Luke

© 2013 Pythian11

Aggregate

Scale

Filter

Getting DB Data to Rlibrary(RJDBC)drv <-JDBC("oracle.jdbc.driver.OracleDriver",

"/Users/grahn/code/jdbc/ojdbc6.jar")

conn<-dbConnect(drv,

"jdbc:oracle:thin:@zulu.us.oracle.com1521:orcl","grahn","grahn")

# import the data into a data.framelfs <-dbGetQuery(conn,

"select SAMPLE_ID, TIME_WAITED from ashdump where EVENT='log file sync’ order by SAMPLE_ID")

© 2013 Pythian12

With R"NAME","SNAP_TIME","BYTES""free memory",12-03-09 00:00:00,645935368"KGH: NO ACCESS",12-03-09 00:00:00,325214880"db_block_hash_buckets",12-03-09 00:00:00,186650624"free memory",12-03-09 00:00:00,134211304"shared_io_pool",12-03-09 00:00:00,536870912"log_buffer",12-03-09 00:00:00,16924672"buffer_cache",12-03-09 00:00:00,21676163072"fixed_sga",12-03-09 00:00:00,2238472"JOXLE",12-03-10 04:00:01,27349056"free memory",12-03-10 04:00:01,105800192"free memory",12-03-10 04:00:01,192741376"PX msg pool",12-03-10 04:00:01,8192000

© 2013 Pythian13

Reshapeshared_pool <- read.csv(~/shapira/shared_pool.csv")install.packages("reshape")library(reshape)max_shared_pool<-

cast(shared_pool,SNAP_TIME ~ NAME,max)

© 2013 Pythian14

Time free memory log_buffer buffer_cache12-03-09 00:00:00 645935368 16924672 21676163072

12-03-09 04:00:00 192741376

With R

© 2013 Pythian15

out of scale

Select Subset of data

max_shared_pool <- subset(max_shared_pool, select = -c(buffer_cache))

boxplot((max_shared_pool)/1024/1024,xlab="Size in MBytes",horizontal=TRUE,las=1,par(mar=c(4,6,2,1))

)© 2013 Pythian16

With R

© 2013 Pythian17

More SubsetsSAMPLE_ID TIME_WAITED WAIT_CLASS EVENT

10526629 14929 User I/O cell single block physical read

10526629 5015 User I/O cell single block physical read

10465699 21572 Concurrency library cache: mutex X

10465699 65938 Concurrency library cache: mutex X

© 2013 Pythian18

new <- subset (old, row filter, column filter)

phys_io <- subset(ash, WAIT_CLASS ==

“User I/O”, select = -

c(EVENT))

Filtering Data

© 2013 Pythian19

SAMPLE_ID TIME_WAITED WAIT_CLASS

10526629 14929 User I/O

10526629 5015 User I/O

Another Filtering Syntax

© 2013 Pythian20

short_waits <- subset(ash, ash$TIME_WAITED < 10000)

short_waits <- ash[ash$TIME_WAITED < 10000,]

SAMPLE_ID TIME_WAITED WAIT_CLASS EVENT

10526629 5015 User I/O cell single block physical read

Not a Typo!

Summarize with DDPLYinstall.packages(”plyr")library(plyr)

ash2 <- ddply(ash, ”SAMPLE_ID”, summarise,N=length(TIME_WAITED),

mean=mean(TIME_WAITED),max=max(TIME_WAITED));

© 2013 Pythian21

SAMPLE_ID N MEAN MAX

10526629 2 9972 14929

10465699 2 43755 65938

Cheating for DBAs

© 2013 Pythian22

library(sqldf)

ash2 = sqldf('select SAMPLE_ID, count(*) N, mean(TIME_WAITED), max(TIME_WAITED)from ash where WAIT_CLASS=“User I/O”group by SAMPLE_ID')

When all else fails

Text is text.Frits Hoogland converts 10046 trace to CSV for R with SED:

s/^\(WAIT\)\ #\([0-9]*\):\ nam='\(.*\)'\ ela=\ *\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ obj#=\([0-9\-]*\)\ tim=\([0-9]*\)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/

© 2013 Pythian23

Exploring Data

© 2013 Pythian24

Directions to Explore

• Shape of data• Correlations• Changes over time

© 2013 Pythian25

The Goal of Analysis is a Story

• Who • What• When• Where• Why • Why• Why• Why• Why

© 2013 Pythian26

Boxplot

• Initial step• Identify outliers• Compare groups• Summarize

© 2013 Pythian27

Fail?

75% of exports take

less than 600m

For Example:

© 2013 Pythian28

WHAT?

How its done?

ash <- read.csv('~/Downloads/ash1.csv')

boxplot(ash$TIME_WAITED/1000000 ~ ash$WAIT_CLASS, xlab="Wait Class",ylab="Time Waited (s)",cex.axis=1.2)

© 2013 Pythian29

Scatter Plot

• Incredibly versatile• Use to:

– Show changes over time– Show correlations– Highlight trends– Find model– Pretty much everything

© 2013 Pythian30

© 2013 Pythian31

WHAT?

Log Data

© 2013 Pythian32

How its done?

install.packages("ggplot2")library(ggplot2)ggplot(ash,

aes(SAMPLE_ID,TIME_WAITED, color=factor(WAIT_CLASS)))+geom_point();ggplot(ash,

aes(SAMPLE_ID,log(TIME_WAITED), color=factor(WAIT_CLASS)))

+geom_point();

© 2013 Pythian33

Only ”Small Waits”

© 2013 Pythian34

500us Physical

IO?

Filtering

small_waits <- ash[ash$TIME_WAITED<15000,]

ggplot(small_waits,aes(SAMPLE_ID,TIME_WAITED,color=factor(WAIT_CLASS))) + geom_point()

© 2013 Pythian35

Smoothing

© 2013 Pythian36

Smoothing

ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,color=factor(WAIT_CLASS))) + geom_smooth()

ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,color=factor(WAIT_CLASS))) + geom_point() + geom_smooth()

© 2013 Pythian37

Data over Time

© 2013 Pythian38

11gR2!

Finding Correlation

© 2013 Pythian39

© 2013 Pythian40

Regression (is not Causation)

How?concurr2 <- ddply(concurr,.(SAMPLE_ID), summarise,

N=length(TIME_WAITED), mean=mean(TIME_WAITED),

max=max(TIME_WAITED));

ggplot(concurr2,aes(N,max/1000000))+geom_point()+geom_smooth(method=lm)+xlab("Number of Samples")+ylab("Max Time Waited (s)")

© 2013 Pythian41

Heatmap

• Values as “blocks” in a matrix

• Clearer than scatter plot for large amounts of data

• Shows less information

• Performance data made sexy

© 2013 Pythian42

Heatmap

© 2013 Pythian43

How?ash2 <- ddply(concurr,.(SAMPLE_ID), summarise,N=length(TIME_WAITED), mean=mean(TIME_WAITED),

max=max(TIME_WAITED))ash2 <- ash2[ash2$WAIT_CLASS %in% c("Concurrency","User I/O","Other"),]

ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) + geom_tile(aes(fill = log(N))) + scale_fill_gradient(low = ”green”, high = ”red")

© 2013 Pythian44

Presenting Your Data

© 2013 Pythian45

FACT

“Even irrelevant neuroscience information in an explanation of a psychological phenomenon may interfere with people’s abilities to critically consider the underlying logic of this explanation.”

© 2013 Pythian46

47

Numerical quantities focus on expected values –

graphical summaries on unexpected values

--John Tukey

Our goal is an interesting presentation.

What is “Interesting”?

• Surprise• Beauty• Stories• Visuals• Counterintuitive• Variety

© 2013 Pythian48

Bad Visualizations Lie

1. Omit important data2. Distort data3. Misleading 4. Confusing 5. Fake correlations and Bad models

© 2013 Pythian49

Bad vs. Good Visuals

© 2013 Pythian50

Eye-API

• Good:– distances– locations– length– high contrast

• Bad:– shades– relative area– angles

© 2013 Pythian51

Good or Bad?

© 2013 Pythian52

© 2013 Pythian53

#1 Mistake – Throw a line on Data

© 2013 Pythian54

© 2013 Pythian55

Avoid Pie Charts

© 2013 Pythian56

Infographics always have Pie Charts

© 2013 Pythian57

Which is better?

© 2013 Pythian58

Creativity is Allowed

© 2013 Pythian59

Make it Beautiful – for Geeks

• Contrast• Reduce noise• Few colors• Few fonts• Lots of Data• More Signal• Less Noise

© 2013 Pythian60

IMPORTant R Libraries

• reshape• plyr• ggplot2• sqldf• http://blog.revolutionanalytics.com/

2013/02/10-r-packages-every-data-scientist-should-know-about.html

© 2013 Pythian61

Other Visualization Tools

• R + R Studio• Excel• Gephi• JIT, D3.js• Excel• ggobi

© 2013 Pythian62

Thank you – Q&A

To contact us

sales@pythian.com

1-877-PYTHIAN

To follow us

http://www.pythian.com/blog

http://www.facebook.com/pages/The-Pythian-Group/163902527671

@pythian

http://www.linkedin.com/company/pythian

© 2013 Pythian63