Exadata Performance Troubleshooting Methodology · Title: DOAG_2016_presentation_final Author: Jim...
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Exadata Performance Troubleshooting Methodology
James Viscusi, Consulting Member of Technical Staff
Andrew Bulloch, Architect
Server Technologies - Maximum Availability Architecture Team
Safe Harbor Statement
THE FOLLOWING IS INTENDED TO OUTLINE OUR GENERAL PRODUCT DIRECTION. IT IS INTENDED FOR INFORMATION PURPOSES ONLY, AND MAY NOT BE INCORPORATED INTO ANY CONTRACT. IT IS NOT A COMMITMENT TO DELIVER ANY MATERIAL, CODE, OR FUNCTIONALITY, AND SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISIONS. THE DEVELOPMENT, RELEASE, AND TIMING OF ANY FEATURES OR FUNCTIONALITY DESCRIBED FOR ORACLE'S PRODUCTS REMAINS AT THE SOLE DISCRETION OF ORACLE.
The questions
How do I monitor my Exadata environment? What parameters are most important?
What thresholds do I set to monitor my Exadata using Enterprise Manager?
How do I diagnose a performance problem involving Exadata?
Agenda
Level Setting – Exadata
What to do before problems occur?
What do we do when problems occur?
What is Exadata?
• First and foremost, Exadata is a platform for running Oracle databases in a highly available and performant manner
• The hardware and software stack is tightly integrated. The components are tested by Oracle and work together, which makes the solution extremely performant.
• Each generation of Exadata is designated X2, X3, X4, etc. The naming is iterative: the number increases by one with each hardware release. Available on Intel or SPARC chipsets
• The second part of the name is either -2 or -8, indicating the number of sockets on each compute node
What makes up an Exadata Database Machine?
• Storage/Cell Servers
• Compute/Database Servers
• Infiniband Switches
• Ethernet Switch
What do we do before problems occur?
• Configure Enterprise Manager metric extensions
• Understand Key Performance Indicators (KPIs) for Exadata
• View KPIs using Systems and Services in Enterprise Manager
• Configure Adaptive Thresholds in EM 13 for Exadata KPIs
What are Metrics in Enterprise Manager?
• A metric is a stored piece of information used to monitor a target
• Types of metrics
 – Information collected by the EM Agent
 – Information derived from data stored in the repository
• Metric Extensions – metrics that are custom-defined by users
 – Can be server side or repository side
Metrics and Thresholds
• Enterprise Manager has a comprehensive set of metrics that allow thresholds to be defined on all target types
 – Thresholds allow for alerting if a chosen metric crosses a certain value
• Server (Compute Node) Metrics – monitored like any other host target (memory, I/O, CPU, network)
• Cell Server Metrics – incidents are created for all alerts received from the cell (SNMP based)
• Database Metrics – Database Time Spent Waiting, Throughput, Efficiency
 – One problem: Enterprise Manager monitors many metrics!
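To make "crossing a threshold" concrete, here is a minimal Python sketch of how a collected value could map to an alert severity, assuming a metric with a warning and a critical threshold and a "greater than" comparison (as with CPU utilization). The function name and values are hypothetical; this models the idea and is not Enterprise Manager's code.

```python
def evaluate(value, warning, critical):
    """Map a collected metric value to a severity (hypothetical model).

    Assumes a 'greater than' comparison, so higher values are worse.
    A threshold of None means that level is not configured.
    """
    if critical is not None and value > critical:
        return "CRITICAL"
    if warning is not None and value > warning:
        return "WARNING"
    return "CLEAR"


# Example: CPU utilization (%) with warning at 80 and critical at 95
for sample in (42.0, 85.5, 97.2):
    print(sample, evaluate(sample, warning=80.0, critical=95.0))
```

Checking critical first means a value above both thresholds is reported once, at the higher severity.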
Key Performance Indicators
• What is a KPI?
 – A quantifiable measurement used to determine server health or performance
• A set of KPIs has been defined for:
 – Compute Nodes
 – Storage Servers
 – Infiniband Switches
• KPIs are defined and explained in:
 – http://www.oracle.com/technetwork/database/availability/exadata-storage-server-kpis-2855362.pdf
 – Also see MOS Note 2094648.1
Compute and Infiniband KPIs
Compute Nodes
• CPU Utilization
• Memory Utilization
• Load Average
• Swap Utilization
Infiniband Switches
• CPU Usage
• Memory Percent Used
• Root filesystem usage
• SSH Session Count
Storage Server Key Performance Indicators
• Use Metric Extensions to create compound metrics
• KPIs for a Storage Server aggregate read and write data
 – Create Metric Extensions (again, see MOS 2094648.1)
• Disk IOPS
• Disk Throughput
• Response Time
• I/O Load
• Cell Health
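As a rough illustration of what aggregating read and write data into a compound KPI can mean, the Python sketch below combines per-disk read and write figures into Disk IOPS, Disk Throughput, and Response Time. The field names are hypothetical, and weighting each latency by its I/O count is an assumption made for this sketch; the actual metric extension definitions are in MOS 2094648.1.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    # Hypothetical per-interval read/write figures for one cell disk
    read_iops: float
    write_iops: float
    read_mbps: float
    write_mbps: float
    read_latency_ms: float   # average latency per read
    write_latency_ms: float  # average latency per write


def compound_kpis(samples):
    """Aggregate read and write figures across disks into compound KPIs."""
    total_iops = sum(s.read_iops + s.write_iops for s in samples)
    throughput = sum(s.read_mbps + s.write_mbps for s in samples)
    # Weight each latency by the number of I/Os it applies to
    weighted_ms = sum(s.read_iops * s.read_latency_ms +
                      s.write_iops * s.write_latency_ms for s in samples)
    response_ms = weighted_ms / total_iops if total_iops else 0.0
    return {"disk_iops": total_iops,
            "disk_throughput_mbps": throughput,
            "response_time_ms": response_ms}


disks = [DiskSample(100, 50, 10, 5, 2.0, 4.0),
         DiskSample(200, 150, 20, 15, 1.0, 3.0)]
print(compound_kpis(disks))
```

A weighted average keeps a lightly used disk with high latency from dominating the cell-level response time.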
Enterprise Manager Services
• Using Metric Extensions with Services gives a holistic view of the storage grid
 – Incidents are created whenever warning or critical thresholds are crossed
However… (another often-asked question)
• Using thresholds in Enterprise Manager allows users to be alerted when metrics show an issue
 – e.g., CPU usage exceeds a specified amount
• KPIs do not have universal values. They can differ depending on many factors
 – Customer Requirements
 – Environment Usage
• Defining one set of thresholds that works for every customer/environment is not feasible
Adaptive Thresholds (new in EM 13)
• Use the collected metrics to make a data-driven recommendation for each specific system
 – Analyze the data over a 1-4 week window
• Not all metrics are eligible (but the ones we need are!)
• Two methods of collecting the data, described in the paper
 – Dynamic
 – Guided
• Companion paper to the KPI paper
 – http://www.oracle.com/technetwork/database/availability/exadata-adaptive-thresholds-3102556.pdf
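The slides do not show how the recommendation is computed. As a minimal sketch, assuming warning and critical thresholds are placed at high percentiles of the values collected over the 1-4 week window, the Python below derives per-system thresholds from that history. The 95th/99th percentile defaults and the function names are assumptions for illustration, not Enterprise Manager's algorithm.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def suggest_thresholds(history, warn_pct=95.0, crit_pct=99.0):
    """Suggest (warning, critical) thresholds from metric values collected over a window."""
    if not history:
        raise ValueError("need at least one collected value")
    return percentile(history, warn_pct), percentile(history, crit_pct)


# Hypothetical CPU utilization samples collected over the analysis window
cpu_history = [float(v) for v in range(1, 101)]
print(suggest_thresholds(cpu_history))
```

Because the thresholds come from each system's own history, a busy OLTP cluster and a quiet test cluster end up with different values, which is the point made on the previous slide.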
Customizing Adaptive Threshold collections
Adaptive Threshold Data analytics
Adaptive Threshold Final Setting
AWR Baselines
• A collection of AWR snapshots used for performance comparisons
• Baselines are retained within the AWR even after the retention period for the snapshot data has been reached
• Exadata should have both a moving window baseline and a static baseline in place to capture different workloads
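The two retention rules above can be modeled in a few lines of Python. This is only a sketch of the behavior, not how AWR stores snapshots (in the database itself, baselines are managed through the DBMS_WORKLOAD_REPOSITORY package): a moving window baseline is the set of snapshots inside the last N days, while snapshots pinned by a static baseline survive the normal retention purge. Snapshot IDs, dates, and window sizes are hypothetical.

```python
from datetime import datetime, timedelta


def moving_window(snapshots, now, window_days):
    """IDs of snapshots inside a moving window ending at `now`."""
    start = now - timedelta(days=window_days)
    return [snap_id for snap_id, taken in snapshots if start <= taken <= now]


def purge(snapshots, now, retention_days, baseline_ids):
    """Drop snapshots older than the retention period unless a static baseline pins them."""
    cutoff = now - timedelta(days=retention_days)
    return [(snap_id, taken) for snap_id, taken in snapshots
            if taken >= cutoff or snap_id in baseline_ids]


# One hypothetical snapshot per day for ten days
base = datetime(2016, 11, 1)
snaps = [(i, base + timedelta(days=i)) for i in range(1, 11)]
now = base + timedelta(days=10)

print(moving_window(snaps, now, window_days=3))
print([sid for sid, _ in purge(snaps, now, retention_days=5, baseline_ids={1, 2})])
```

Snapshots 1 and 2 stand in for a static baseline captured during a known-good workload: they outlive the retention period, while the moving window always reflects the most recent days.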
What to do when problems occur?
• Review
• Rule Out Hardware
• Compare
• Drill down
A checklist can be very useful!
Check the Obvious
• DB Machine Home Page
 – Contains a lot of useful information at a glance
• Incident Manager
• Alert Logs
• Grid Infrastructure
• ASM
• Databases
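Reviewing alert logs usually starts with looking for ORA- errors. The Python sketch below assumes the alert log lines have already been read (file locations for the Grid Infrastructure, ASM, and database alert logs vary by installation and are left to the caller) and counts ORA- error codes so the most frequent ones stand out. The function name and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# ORA- error codes are five digits, e.g. ORA-00600
ORA_ERROR = re.compile(r"\bORA-(\d{5})\b")


def summarize_ora_errors(lines):
    """Count ORA- error codes appearing in alert log lines."""
    counts = Counter()
    for line in lines:
        for code in ORA_ERROR.findall(line):
            counts["ORA-" + code] += 1
    return counts


# Made-up sample lines standing in for an alert log
sample = [
    "Errors in file example_trace.trc:",
    "ORA-00600: internal error code",
    "Thread 1 advanced to log sequence 42",
    "ORA-07445: exception encountered: core dump",
    "ORA-00600: internal error code",
]
for code, n in summarize_ora_errors(sample).most_common():
    print(code, n)
```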
Incident Manager (Contd.)
What Changed?
If there is a problem, what has changed? And who might know?
Considerations
• Patch levels (everywhere!)
• Schema
• Tunable OS parameters
• Resource Management Plans
• Code Changes
• ADDM Comparison Report
Compare Configurations – Exadata Level
Compare Configurations – Database Level
• EM Job to compare one ‘reference’ database against one or more other databases
• Job can be scheduled on a repetitive basis, or run ad-hoc
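Conceptually, the database-level comparison diffs each database's settings against the "reference" database. The Python sketch below shows that idea with plain dictionaries, assuming the settings have already been collected as name/value pairs; it is not the EM job's implementation. The database names and values are hypothetical (cell_offload_processing and parallel_max_servers are real initialization parameters, used here only as examples).

```python
def compare_to_reference(reference, others):
    """Report settings that differ from a reference configuration.

    reference: {setting: value}
    others:    {db_name: {setting: value}}
    Returns    {db_name: {setting: (reference_value, other_value)}};
    a missing setting shows up as None on that side.
    """
    report = {}
    for db_name, config in others.items():
        keys = sorted(set(reference) | set(config))
        diffs = {k: (reference.get(k), config.get(k))
                 for k in keys if reference.get(k) != config.get(k)}
        if diffs:
            report[db_name] = diffs
    return report


ref = {"cell_offload_processing": "TRUE", "parallel_max_servers": "160"}
others = {
    "PROD2": {"cell_offload_processing": "FALSE", "parallel_max_servers": "160"},
    "PROD3": dict(ref),
}
print(compare_to_reference(ref, others))
```

Databases that match the reference are left out of the report, so only drift needs attention.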
Compare Configurations – Schema Level
• Capture schema baselines
• Compare schemas
 – With a baseline
 – Between different databases
• Synchronize schemas
Engineered Systems Health checks - OraChk
So, where are we?
Ruled out hardware issues…
Ruled out configuration changes…
So… drill down into the hardware and the running SQL
Drill-Down Paths
After all changes, including any environmental ones, have been ruled out…
• Start with AWR, ADDM and ASH
• Identify outliers / worst performing (Pareto)
• Review top wait events
• SQL Tuning Advisor
• If resource constrained consider resource limiting strategies
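Identifying the worst performers with a Pareto approach means ranking SQL by its contribution (for example, DB time taken from AWR or ASH) and keeping the few statements that account for most of the total. A minimal Python sketch, assuming per-SQL DB time has already been extracted into a dictionary; the 80% cut-off and the SQL IDs are illustrative assumptions.

```python
def pareto_top(db_time_by_sql, share=0.8):
    """Smallest set of SQL IDs, largest first, that covers `share` of total DB time."""
    total = sum(db_time_by_sql.values())
    selected, covered = [], 0.0
    ranked = sorted(db_time_by_sql.items(), key=lambda kv: kv[1], reverse=True)
    for sql_id, db_time in ranked:
        if total == 0 or covered >= share * total:
            break
        selected.append(sql_id)
        covered += db_time
    return selected


# Hypothetical DB time (seconds) per SQL ID for the problem interval
db_time = {"sql_a": 600, "sql_b": 250, "sql_c": 100, "sql_d": 50}
print(pareto_top(db_time))
```

The selected statements are natural candidates for the top wait event review and the SQL Tuning Advisor.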
Compare ADDM
• Full ADDM analysis across two AWR snapshot periods
• Detects causes, measures effects, then correlates them
 ➢ Causes: workload changes, configuration changes
 ➢ Effects: regressed SQL, reaching resource limits (CPU, I/O, memory, interconnect)
• Makes actionable recommendations along with quantified impact
Figure: Compare Period ADDM analyzes AWR Snapshot Period 1 and AWR Snapshot Period 2 and produces an analysis report (example findings: SQL Commonality, Regressed SQL, I/O Bound, Undersized SGA)
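The "Regressed SQL" and "SQL Commonality" findings can be approximated by comparing per-execution elapsed time for SQL seen in both periods. This Python sketch is a simplification for illustration, not ADDM's method: commonality here is just the fraction of distinct SQL IDs present in both periods, and the 1.5x regression factor is an arbitrary assumption.

```python
def compare_periods(period1, period2, regression_factor=1.5):
    """Compare average elapsed time per execution for SQL seen in two periods.

    period1, period2: {sql_id: average elapsed seconds per execution}
    Returns (common SQL IDs, regressed SQL IDs, commonality ratio).
    """
    common = sorted(set(period1) & set(period2))
    regressed = [s for s in common if period2[s] > regression_factor * period1[s]]
    all_ids = set(period1) | set(period2)
    commonality = len(common) / len(all_ids) if all_ids else 1.0
    return common, regressed, commonality


# Hypothetical per-execution timings from two AWR snapshot periods
p1 = {"q1": 1.0, "q2": 2.0, "q3": 0.5}
p2 = {"q1": 1.1, "q2": 4.0, "q4": 3.0}
print(compare_periods(p1, p2))
```

Low commonality points to a workload change; high commonality with regressed SQL points to something that changed underneath the same statements.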
SQL Advisors
• SQL Tuning Set
 – Capture of SQL in the database
 – Ideally run after a representative workload has been run
• Access Advisor
 – Analyses access patterns of SQL in the cache, or from a defined workload, and gives recommendations on how to (re)structure the database objects
• Performance Analyzer
 – Analyses before and after images of SQL in a SQL Tuning Set (testing scenarios)
 – Compares results
• Tuning Advisor
 – Requires SQL Tuning Sets (collections of SQL from the database)
 – Analyses the SQL from a SQL Tuning Set and gives recommendations
Summary
How do I monitor my Exadata environment? What parameters are most important?
KPIs for Exadata Compute, Cells and Infiniband Switches
What thresholds do I set to monitor my Exadata using Enterprise Manager?
Enterprise Manager 13 - Adaptive Thresholds
How do I diagnose a performance problem involving Exadata?
Incident Review, Baseline comparison, Drill Down
Want to Learn More?
Whitepapers and Links:
http://www.oracle.com/goto/maa (Enterprise Manager)
Engineered Systems Manageability Section
http://www.oracle.com/technetwork/database/availability/exadata-health-resource-usage-2021227.pdf
– and others…
https://blogs.oracle.com/EMMAA/
Contact Me
Email: [email protected]
Twitter: @jviscusi
LinkedIn: Jim Viscusi