Posted on 15-Dec-2015
Contact Info
Edward Trettel
Northwest Airlines, Inc.
edd.trettel@nwa.com
edd.trettel@comcast.net
612-726-7434 (w)
763-780-3941 (h)
763-438-6244 (m)
THE UNSUCCESSFUL SELF-TREATMENT OF A CASE OF "WRITER'S BLOCK"
FORECASTING DATABASE DISK SPACE REQUIREMENTS: A POOR MAN’S APPROACH
Personality Inventory
• Which shape do you prefer?
– Linear thinker. Precise. Accurate. Analytical. Tactical.
– Holistic thinker. Creative. Artistic. Strategic.
– Obsessed with Queuing Theory
• Northwest Airlines is the world's fourth-largest airline
• With its global travel partners, NWA serves:
– Over 750 destinations
– In 120 countries
– On 6 continents
• The U.S. system spans 49 states and DC
• Hub cities:
– Amsterdam
– Detroit
– Memphis
– Minneapolis/St. Paul
– Tokyo
“Necessity, who is the mother of invention.”
Plato, The Republic
• NWA’s Distributed Database Environment:
– Sybase, Oracle, UDB, MS SQL Server
– 200 Database Instances
– 400 Databases
• Hosting Operating Systems:
– Sun Solaris
– IBM AIX
– Windows Server
The Problem
• 1,660 pager events per year on distributed database issues.
• 1,041 (62.7%) of these were for databases exceeding their 95%-full disk space limit.
• At 100% full, the database stops processing.
• Disk space for these 400 databases was being managed reactively, day to day, by staff inspecting the individual values inside each database through DBMS-specific interfaces.
Databases, Tablespaces, and File Systems
[Diagram labels: File System, Tablespace]
Scope of the Problem
• 200 Database Instances
• Hosting 400 Databases
• Consisting of 2,800 tablespaces
• Made up of 5,100 OS files
These resources were managed reactively using good ol’ “IEB-eyeball”.
The Solution
October 2000 to Present
• Gather a small number of database disk space size metrics from each of these databases on an automatic, daily, unattended basis and put them into a database.
• Apply regression analysis techniques to determine whether there are consistent growth (or decline) rates over the course of a year.
• Create forecasts on a per-data-holder basis, projecting six months into the future.
• Leverage other descriptive statistics as well.
• Provide for multidimensional analyses.
The “Data Holder” Concept
• Introduced as a common construct across the disparate Sybase, Oracle, and UDB architectures.
• Refers to:
– “Tablespaces” (and “datafiles” and “containers”) in Oracle and UDB
– Database devices and databases in Sybase
The Collected Data
• The date and time of the collection
• The instance name
• The DBMS type (Sybase, Oracle, UDB)
• The tablespace or database name
• The number of bytes_allocated to this tablespace or database
• The number of bytes_free in this tablespace or database
• The number of bytes_used (derived as the difference between the number allocated and the number free)
(All gathered into a single table in a database)
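A minimal sketch of that single table follows, using Python's built-in sqlite3 module purely as a stand-in for the actual repository database. The table name db_space_history, the collected_at column, and the sample values are hypothetical; dbms_prd_name, db_instance, data_holder_name, and bytes_used mirror the field names shown on the pivot chart slides.

```python
import sqlite3
from datetime import datetime

# Hypothetical table; SQLite stands in for the real repository database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS db_space_history (
    collected_at     TEXT    NOT NULL,  -- date and time of the collection
    db_instance      TEXT    NOT NULL,  -- instance name
    dbms_prd_name    TEXT    NOT NULL,  -- 'Sybase', 'Oracle' or 'UDB'
    data_holder_name TEXT    NOT NULL,  -- tablespace or database name
    bytes_allocated  INTEGER NOT NULL,
    bytes_free       INTEGER NOT NULL,
    bytes_used       INTEGER NOT NULL   -- derived: allocated minus free
)
"""


def store_observation(conn, collected_at, instance, dbms, holder,
                      bytes_allocated, bytes_free):
    """Insert one collected row, deriving bytes_used from the raw values."""
    bytes_used = bytes_allocated - bytes_free
    conn.execute(
        "INSERT INTO db_space_history VALUES (?, ?, ?, ?, ?, ?, ?)",
        (collected_at.isoformat(timespec="seconds"), instance, dbms, holder,
         bytes_allocated, bytes_free, bytes_used),
    )
    return bytes_used


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# "USERS" is an illustrative data holder name, not taken from the source.
store_observation(conn, datetime(2005, 10, 3, 6, 0), "crmop1", "Oracle",
                  "USERS", 100_000_000_000, 13_000_000_000)
conn.commit()
print(conn.execute(
    "SELECT data_holder_name, bytes_used FROM db_space_history").fetchall())
# prints [('USERS', 87000000000)]
```

Storing the derived bytes_used alongside the raw values keeps later pivot queries simple; a view computing bytes_allocated - bytes_free would work equally well.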
The Collectors
• Straightforward SELECT statements
• Gather size information from an instance’s data dictionary or “system catalog”
• Zero maintenance: they have run daily and unattended for 6 years
• Simplicity
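The deck does not show the collector SQL. As an illustration only, assuming an Oracle instance, a single SELECT against the standard data dictionary views DBA_DATA_FILES and DBA_FREE_SPACE can return allocated and free bytes per tablespace. The query text, the helper rows_to_observations, and the sample rows are hypothetical, and the Sybase and UDB collectors would need their own catalog queries.

```python
# Assumed Oracle collector query. DBA_DATA_FILES and DBA_FREE_SPACE are
# standard Oracle data dictionary views. NVL covers tablespaces with no
# free extents, which have no rows in DBA_FREE_SPACE.
ORACLE_SPACE_QUERY = """
SELECT df.tablespace_name,
       df.bytes_allocated,
       NVL(fs.bytes_free, 0) AS bytes_free
  FROM (SELECT tablespace_name, SUM(bytes) AS bytes_allocated
          FROM dba_data_files
         GROUP BY tablespace_name) df
  LEFT OUTER JOIN
       (SELECT tablespace_name, SUM(bytes) AS bytes_free
          FROM dba_free_space
         GROUP BY tablespace_name) fs
    ON df.tablespace_name = fs.tablespace_name
"""


def rows_to_observations(rows, instance, dbms="Oracle"):
    """Convert (tablespace_name, bytes_allocated, bytes_free) result rows
    into records for the central table, deriving bytes_used."""
    return [
        {
            "db_instance": instance,
            "dbms_prd_name": dbms,
            "data_holder_name": name,
            "bytes_allocated": allocated,
            "bytes_free": free,
            "bytes_used": allocated - free,
        }
        for name, allocated, free in rows
    ]


# With any DB-API 2.0 connection to the instance, a collector would run:
#   cursor.execute(ORACLE_SPACE_QUERY)
#   records = rows_to_observations(cursor.fetchall(), "crmop1")
# Here a hand-written result set stands in for the database call.
sample_rows = [("USERS", 100_000_000_000, 13_000_000_000),
               ("SYSTEM", 2_000_000_000, 0)]
records = rows_to_observations(sample_rows, "crmop1")
print([(r["data_holder_name"], r["bytes_used"]) for r in records])
# prints [('USERS', 87000000000), ('SYSTEM', 2000000000)]
```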
Regression Analysis
• Used MS OLAP Services
• Regressed Bytes Used against “time” on a per-data-holder, per-instance basis
• Used a year’s worth of daily observations
y = mx + b
Bytes Used = m(date) + b, where m is the slope and b is the constant
• Where will we be six months from now?
y′ = m(current date + 180 days) + b
• Will we have a surplus or a deficit in disk space then?
Surplus (or deficit) = current Bytes Allocated - y′
100GB - 87GB = 13GB (surplus)
78GB - 87GB = -9GB (deficit)
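The slide's arithmetic can be reproduced with an ordinary least-squares line. This sketch uses plain Python rather than MS OLAP Services, treats each date as its day number, and runs on a synthetic year of observations chosen so that the forecast lands on the 87GB of the example; the function names are hypothetical.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def forecast_surplus(history, bytes_allocated, today, horizon_days=180):
    """history: (date, bytes_used) pairs for one data holder on one instance.

    Returns (y_prime, surplus): the forecast bytes used horizon_days after
    `today`, and bytes_allocated - y_prime (negative means a deficit)."""
    xs = [d.toordinal() for d, _ in history]  # dates as day numbers
    ys = [used for _, used in history]
    m, b = fit_line(xs, ys)
    y_prime = m * (today.toordinal() + horizon_days) + b
    return y_prime, bytes_allocated - y_prime


# Synthetic year of daily observations growing by 0.1 GB per day.
today = date(2005, 10, 3)
history = [(today - timedelta(days=k), 69_000_000_000 - 100_000_000 * k)
           for k in range(365)]

for allocated in (100_000_000_000, 78_000_000_000):
    y_prime, surplus = forecast_surplus(history, allocated, today)
    print(round(y_prime / 1e9), round(surplus / 1e9))
# prints:
# 87 13
# 87 -9
```

Real data will not sit exactly on a line, which is why the fit quality of each data holder's series matters when deciding how far to trust y′.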
Benefits of Having the Data
• A number of derived measures:
– Bytes Used: bytes_allocated - bytes_free
– Percent Used: (bytes_used / bytes_allocated) * 100
– Percent Free: (bytes_free / bytes_allocated) * 100
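These measures are simple enough to compute at query time. A small sketch, with a hypothetical function name, that also checks one observation against the 95%-full limit mentioned under The Problem:

```python
def derived_measures(bytes_allocated, bytes_free):
    """Compute the derived measures for one data holder observation."""
    bytes_used = bytes_allocated - bytes_free
    return {
        "bytes_used": bytes_used,
        # Multiplying the integers by 100 first avoids an extra
        # floating-point rounding step.
        "percent_used": 100 * bytes_used / bytes_allocated,
        "percent_free": 100 * bytes_free / bytes_allocated,
    }


obs = derived_measures(100_000_000_000, 4_000_000_000)
print(obs)
# prints {'bytes_used': 96000000000, 'percent_used': 96.0, 'percent_free': 4.0}
print(obs["percent_used"] > 95)  # past the 95%-full paging limit
# prints True
```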
Correlation
• The coefficient of determination (R², the square of the Pearson product-moment correlation coefficient r)
• Values ranged all over the place, from 0 to 1.
• Since R² equals the proportion of the variance observed in the dependent variable (bytes used) that is accounted for by the independent variable (time), we were able to assess the reliability (and usability) of our forecasts.
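A sketch of computing R² for one data holder's series in plain Python (the deck used MS OLAP Services). The USABLE_R2 cutoff is a hypothetical value for illustration; the presentation does not say what threshold, if any, was applied.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit of ys on xs,
    computed as the square of Pearson's r. Returns None when either
    variable has zero variance, because r is undefined in that case."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    r = sxy / math.sqrt(sxx * syy)
    return r * r


# Hypothetical cutoff for trusting a forecast; the deck gives no number.
USABLE_R2 = 0.7

steady = r_squared([0, 1, 2, 3], [10, 20, 30, 40])   # steady growth
erratic = r_squared([0, 1, 2, 3], [5, 1, 1, 5])      # no linear trend
print(steady, erratic)
# prints 1.0 0.0
print(steady >= USABLE_R2, erratic >= USABLE_R2)
# prints True False
```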
Beyond Forecasting: Additional Insights Provided by the Data
• Pivot tables of Time (along the x-axis) vs. Bytes Used (along the y-axis) were constructed along these dimensions:
– DBMS Name (Sybase, Oracle, UDB)
– Instance Name
– Data Holder Name
• This permitted slicing and dicing the data in a number of ways.
Questions Asked and Answered
• What’s the pattern of bytes_used over the past year for:
– All Oracle instances?
– All Sybase instances?
– All UDB instances?
– Oracle and Sybase combined?
– Oracle and UDB combined?
– Sybase and UDB combined?
– Sybase and Oracle and UDB combined?
Questions Asked and Answered
• What’s the pattern of bytes_used over the past year for:
– Any individual instance?
– Any combination of instances? (Note that this permits any combination of instances of interest, regardless of the DBMS hosting them.)
Questions Asked and Answered
• What’s the pattern of bytes_used over the past year for:
– Any individual data holder? (Note that one must also enter an instance name for this to be meaningful. Otherwise, it shows the total for all data holders with that name, regardless of instance.)
– Any combination of data holders?
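The pivot-table questions above amount to filtering on some combination of dimensions and summing bytes_used per week. A minimal sketch, assuming one snapshot per data holder per week and ISO week numbering for dbs_year and dbs_week (both assumptions); the function names and the instance names, data holder names, and byte counts in the sample rows are invented.

```python
from collections import defaultdict
from datetime import date


def _matches(actual, wanted):
    """None means '(All)'; a string must match exactly; any other
    collection (set, list, tuple) selects a combination of values."""
    if wanted is None:
        return True
    if isinstance(wanted, str):
        return actual == wanted
    return actual in wanted


def pivot_bytes_used(rows, dbms_prd_name=None, db_instance=None,
                     data_holder_name=None):
    """Sum bytes_used per (dbs_year, dbs_week) over rows that pass the
    filters. Assumes one snapshot per data holder per week."""
    wanted = {"dbms_prd_name": dbms_prd_name, "db_instance": db_instance,
              "data_holder_name": data_holder_name}
    totals = defaultdict(int)
    for row in rows:
        if all(_matches(row[key], value) for key, value in wanted.items()):
            year, week, _ = row["collected_on"].isocalendar()
            totals[(year, week)] += row["bytes_used"]
    return dict(sorted(totals.items()))


week = date(2005, 1, 3)  # ISO year 2005, week 1
rows = [
    {"collected_on": week, "dbms_prd_name": "Oracle", "db_instance": "inst_a",
     "data_holder_name": "USERS", "bytes_used": 50},
    {"collected_on": week, "dbms_prd_name": "UDB", "db_instance": "inst_b",
     "data_holder_name": "USERS", "bytes_used": 30},
    {"collected_on": week, "dbms_prd_name": "Sybase", "db_instance": "inst_c",
     "data_holder_name": "master", "bytes_used": 20},
]
print(pivot_bytes_used(rows))                                   # {(2005, 1): 100}
print(pivot_bytes_used(rows, dbms_prd_name={"Oracle", "UDB"}))  # {(2005, 1): 80}
print(pivot_bytes_used(rows, data_holder_name="USERS"))         # {(2005, 1): 80}
print(pivot_bytes_used(rows, db_instance="inst_a",
                       data_holder_name="USERS"))               # {(2005, 1): 50}
```

The third call shows the caveat from the last slide: filtering on data_holder_name alone adds together every data holder called USERS, whatever instance it lives in.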
Pivot Chart of Bytes Used
[Pivot chart: “Database Sizes: Bytes in Use” (Bytes in Use by Database). Sum of Bytes_Used by year and week of year (dbs_year, dbs_week), 2004 week 40 through 2005 week 40. Filters: dbms_prd_name (All), db_instance crmop1, data_holder_name (All).]
Pivot Chart of Percent Used
[Pivot chart: “Database Sizes: Percent Data_Holder in Use” (Percent Data Holder Full). Sum of Percent_Full by year and week of year (dbs_year, dbs_week), 2004 week 40 through 2005 week 40. Filters: dbms_prd_name (All), db_instance faiop3, data_holder_name FAIS_INDEXE.]
(Only meaningful if a data_holder_name is selected and that data_holder_name is unique across DBMSs and instances. Supply further dbms_prd_name and db_instance criteria to ensure uniqueness.)
Benefits
• With this data now published monthly on the intranet, its consumers have gained considerable insight into the seasonal and other variations in their data usage patterns.
• The work group that is responsible for acquiring disk space for the entire IS organization can now set realistic budget values for next year’s disk space requirements, based upon the higher-level rollups of the bytes_used data.
• Pager call reduction: the 1,041 pages that were previously issued per year for database disk space problems dropped to only a handful.
Benefits – The Sequel
• The rates of growth of the various applications or business systems at the organization are now quantified and published. This allows the IT organization to compare those rates across applications, year over year, and so on.
• The organization can now identify any anomalous rates that might indicate that an application change (intended or not) or business driver variation was having a significant impact on the rate at which data was being accrued in a database.
• Descriptive statistics can be compared between data holders to better understand their central tendencies and dispersion characteristics.
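As an illustrative sketch of such a comparison, the following summarizes day-over-day growth in bytes_used for a data holder using Python's statistics module. The function name and the two sample series are hypothetical; they show two data holders with the same mean growth but very different dispersion.

```python
import statistics


def growth_summary(daily_bytes_used):
    """Descriptive statistics of day-over-day growth for one data holder.

    daily_bytes_used: one bytes_used value per day, oldest first; at least
    three values are needed because stdev requires two differences."""
    deltas = [later - earlier
              for earlier, later in zip(daily_bytes_used, daily_bytes_used[1:])]
    return {
        "mean": statistics.mean(deltas),      # central tendency
        "median": statistics.median(deltas),
        "stdev": statistics.stdev(deltas),    # dispersion (sample stdev)
        "range": max(deltas) - min(deltas),
    }


# Two hypothetical data holders with the same average daily growth.
steady = growth_summary([100, 110, 120, 130, 140])
bursty = growth_summary([100, 100, 120, 120, 140])
print(steady["mean"] == bursty["mean"],
      round(steady["stdev"], 2), round(bursty["stdev"], 2))
# prints True 0.0 11.55
```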
“Statistics is the grammar of science.”
Karl Pearson
British mathematician and statistician
(1857–1936)
Questions?
Thank you!!!
Pizza!!!!!!!!!