An Ontology-Based Autonomic System for Improving Data Warehouses by Cache Allocation Management
Vlad Nicolicin-Georgescu, Henri Briand, Remi Lehn and Vincent Benatier
Knowledge and Experience Management Workshop FG-WM 2009, 22/09/2009
www.sp2.fr
http://www.polytech.univ-nantes.fr/COD/

Contents
1. Introduction
2. Problematic
3. Knowledge Management
4. Autonomic Computing
5. Combining the Elements
6. Results
7. Conclusions and Future Directions

Introduction
• Decision Support Systems
  - Computerized systems whose main goal is to analyze a series of facts and propose actions regarding those facts (Business Intelligence)
• Their core is the analytical (derived) data, organized as a data warehouse (the architecture) built with the help of data marts (the bricks) (Inmon, 2005)
• The challenge: managing data warehouses efficiently (cost, performance and resource scaling)

Problematic – Industrial
• Enterprises' decision support systems: at the end of the first year, up to 90% of data warehouse efforts are considered failures (Frolick and Lindsey, 2003)
• The main causes
  - Bad management: manual configuration, manual maintenance operations, bad scaling of system resources
  - Bad performance due to inefficient sharing of common resources between groups and conglomerates
  - Increase of the data warehouse size with time
  - Any of the data may be accessed at any time: 'Give me what I want so I can tell you what I really want'

Problematic – Industrial
• High costs of data warehouse maintenance (due to the previous causes) translate into:
  - Need for an increase in a system's hardware resources (normal cost)
  - Need for decisional experts to configure and maintain data warehouses (more costly)

Problematic – Industrial
• Example
  - 10 data warehouses sharing RAM
  - 1 data warehouse requires 20 GB of RAM -> 200 GB of RAM in total
    • Highly costly (sometimes not a problem)
    • Architecturally impossible (stuck!)
• How to reallocate and manage?
  - To manage them, the enterprise relies on an expert to configure and maintain how the memory is allocated, based on each data warehouse's needs: priority, usage period, changes in the architecture, etc.
  - The problem repeats recursively
  - Too hard to sustain due to cost and human limits

Problematic – Scientific
• How to manage decision support systems efficiently:
  - How to formalize unstructured data from different sources (software vendors' readme files, forums, HTML pages, ...)
  - How to render various processes (RAM allocation between groups of data warehouses) autonomic, based on the formalized knowledge
  - Finding suitable algorithms for resource allocation and parameter configuration (cache memory) in groups of data warehouses

Problematic – Scientific
• Building knowledge bases for decision support systems: ontologies and ontology-based rules
• Autonomic computing based on the knowledge bases and on algorithms for improving data warehouse performance
• Combining the notions of knowledge formalization with the notions of autonomic computing for data warehouse management

Knowledge Management
• Manage the data warehouse to improve its performance
• Divide the knowledge in the knowledge base to express a decision support system

Knowledge Management – Data Warehouse Performance
• The measure of performance: query response time for data retrieval operations
• Analytical data is presented, as opposed to operational data, as being relaxed with respect to retrieval time (Inmon, 2005)
  - True: if the operations concerned are aggregation and calculation operations (e.g. run during the night)
  - Not so true: when performing data retrieval tasks for report generation (daytime usage of the data warehouse)

Knowledge Management – Data Warehouse Performance
• Several propositions for query response time improvement:
  - (Malik et al, 2008): how to physically design databases through caches; database and architecture oriented
  - (Saharia and Babad, 2000): determining which data is most likely to be accessed so it can be stored in caches; works well for improving a single data warehouse and concerns the data requested rather than how to modify the data warehouse parameters

Knowledge Management – Knowledge Division
• Our proposition for dividing knowledge to represent a decision support system
• Three main types
  - Architectural
  - Configuration and performance
  - Experience and advice/best practices

Knowledge Management – Knowledge Division
• Architectural information
  - What components are part of a decision support system
  - How these entities are linked and how they exchange
  - What common resources are characteristic of each entity and shared between the

Knowledge Management – Knowledge Division
• Configuration and performance indicators (for Essbase multidimensional cubes)
  - For each data warehouse: index file size and data file size (how much space it occupies on disk)
  - Three types of caches: index, data file and data cache
  - Query response time on data retrieval operations

Knowledge Management – Knowledge Division
• Experience and best practices
  - More delicate because of its subjectivity and the unstructured form in which the information is found
  - Represents all knowledge concerning decision support system and data warehouse management (in any form)
  - Comes from several sources
  - Formalized as a rule knowledge base, such as Event Condition Rules (Huebscher and McCann, 2008)

Autonomic Computing
• Previous propositions for representing self-managing systems:
  - Inspired by the functioning of the human body (Wang, 2007)
  - Self-healing systems, to be further elaborated into self-X systems (Ghosh et al., 2007)
  - Proposition made by IBM in 2001, later refined towards its currently known form (IBM, 2001)

Autonomic Computing
• Autonomic computing: the ability of an IT infrastructure to adapt and change in accordance with business policies and objectives, guiding systems to be (IBM, 2001):
  - Self-configuring
  - Self-healing
  - Self-optimizing
  - Self-protecting

Autonomic Computing – Autonomic Computing Manager
• Autonomic Computing Manager: automates the self-X functions and externalizes these functions according to the behavior defined by the management interfaces (IBM, 2001). The MAPE-K loop:
  (figure: the MAPE-K loop)

Autonomic Computing – Autonomic Computing Manager
• We propose implementing the loop on each level of the decision support system architecture
• Each entity has its own individual loop and is related only to its superior entities
• Each entity's manager has two 'responsibilities':
  - Its individual self-management
  - The management of its direct children
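
The two responsibilities of a manager can be sketched in a few lines of Python. This is only an illustration under assumptions: the class `AutonomicManager`, its four placeholder phase methods, and the entity names and indicator values are hypothetical and do not come from the slides, which implement the managers with ontology-based rules and Java programs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AutonomicManager:
    """Hypothetical manager attached to one entity of the architecture."""
    name: str
    children: List["AutonomicManager"] = field(default_factory=list)
    knowledge: Dict[str, float] = field(default_factory=dict)  # the K in MAPE-K

    def monitor(self) -> Dict[str, float]:
        # Placeholder: a real manager would collect indicators from the entity.
        return dict(self.knowledge)

    def analyze(self, observed: Dict[str, float]) -> bool:
        # Placeholder: decide whether anything needs attention.
        return bool(observed)

    def plan(self, needs_change: bool) -> List[str]:
        # Placeholder: no concrete actions are defined in this sketch.
        return ["review " + self.name] if needs_change else []

    def execute(self, actions: List[str], log: List[str]) -> None:
        log.extend(actions)

    def run_loop(self, log: List[str]) -> None:
        # Responsibility 1: individual self-management.
        self.execute(self.plan(self.analyze(self.monitor())), log)
        # Responsibility 2: management of the direct children only.
        for child in self.children:
            child.run_loop(log)


# Made-up hierarchy: one application managing two data warehouses.
dw1 = AutonomicManager("DW1", knowledge={"avg_response_time": 12.0})
dw2 = AutonomicManager("DW2", knowledge={"avg_response_time": 4.0})
app = AutonomicManager("application", children=[dw1, dw2],
                       knowledge={"shared_ram_gb": 40.0})
log: List[str] = []
app.run_loop(log)
```

Each manager only ever calls its direct children, so the hierarchy is traversed one level at a time, parent first.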

Autonomic Computing – Autonomic Computing Manager
• Revisiting the decision support system's schema

Autonomic Computing – Algorithms: Self-Improvement
• Self-improvement algorithm:
  - Specific to the individual loop of each data warehouse
  - Executed at the end of each day, when statistics on the usage of the data warehouse are gathered and its parameters can be changed
  - Tries to improve the cache allocation for a data warehouse by repeatedly decreasing the cache values up to a certain limit:
    • Step: the amount of cache decrease at each time period (CV: cache value)
      CV1 = CV0 - (CVmax - CV0) * step
    • Delta: the threshold at which the algorithm stops, i.e. the impact that a cache modification has. If (RT1 - RT0) / RT0 < delta, then we accept the new cache proposition (RT: average query response time)
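
A minimal Python sketch of one daily iteration follows. It applies the step formula and the delta test exactly as transcribed from the slide. The function names, the `measure` callback and the toy response-time model are assumptions made for illustration, and all numeric values are made up.

```python
from typing import Callable, Tuple


def propose_cache(cv0: float, cv_max: float, step: float) -> float:
    """Next cache value, following the slide: CV1 = CV0 - (CVmax - CV0) * step."""
    return cv0 - (cv_max - cv0) * step


def accept(rt0: float, rt1: float, delta: float) -> bool:
    """Accept the proposal when the relative response-time change is below delta."""
    return (rt1 - rt0) / rt0 < delta


def daily_step(cv0: float, cv_max: float, rt0: float,
               measure: Callable[[float], float],
               step: float, delta: float) -> Tuple[float, float]:
    """One end-of-day iteration: propose a smaller cache, keep it only if accepted."""
    cv1 = propose_cache(cv0, cv_max, step)
    rt1 = measure(cv1)  # hypothetical: average query response time with cache cv1
    if accept(rt0, rt1, delta):
        return cv1, rt1
    return cv0, rt0


def toy_measure(cv: float) -> float:
    # Made-up model (assumption): response time inversely proportional to cache size.
    return 1000.0 / cv


rejected = daily_step(100.0, 200.0, 10.0, toy_measure, step=0.1, delta=0.05)
accepted = daily_step(100.0, 200.0, 10.0, toy_measure, step=0.1, delta=0.2)
```

With these numbers the proposal is a cache value of 90, which raises the toy response time by about 11%. It is therefore rejected when delta is 0.05 and accepted when delta is 0.2.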

Autonomic Computing – Group-Improvement Algorithm
• Group improvement algorithm
  - Specific to each application (seen as a group of data warehouses)
  - Has the role of periodically reallocating caches between the data warehouses in the group, depending on their average performance
  - 'The catch': through a small sacrifice (delta) by some data warehouses, others obtain an important performance gain
  - How to distinguish between performance and non-performance data warehouses?

Autonomic Computing – Group-Improvement Algorithm
• Performance data warehouse: its average query response time is under the average response time of the group
• Non-performance data warehouse: its average query response time is above the group average (a data warehouse exactly at the average can go in either category)
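
The classification rule can be written down directly. The sketch below is an illustration only: the function name and the response-time values are made up, and placing a data warehouse that sits exactly at the group average in the non-performance category is an arbitrary choice, since the slide allows either. The reallocation itself is not detailed in the slides and is not attempted here.

```python
from typing import Dict, List, Tuple


def classify(avg_rt: Dict[str, float]) -> Tuple[float, List[str], List[str]]:
    """Split a group of data warehouses around the group's average response time."""
    group_avg = sum(avg_rt.values()) / len(avg_rt)
    # Under the group average: performance data warehouses.
    performance = sorted(name for name, rt in avg_rt.items() if rt < group_avg)
    # At or above the group average: non-performance (ties placed here by assumption).
    non_performance = sorted(name for name, rt in avg_rt.items() if rt >= group_avg)
    return group_avg, performance, non_performance


# Made-up average query response times, in seconds.
group_avg, performance, non_performance = classify({"DW1": 12.0, "DW2": 4.0})
```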

Combining the Elements
• Bringing Knowledge Management, Autonomic Computing and Algorithms together
• Knowledge bases are formalized with the help of OWL ontologies and ontology-based rules
• Autonomic Computing Managers are implemented with the help of ontology-based rules and Java programs
• Algorithms are formalized by ontologies, rules and Java programs

Combining the Elements – Knowledge Base
• Ontology: explicit formal specifications of the terms in the domain and the relations among them (Gruber, 1992)
• It expresses:
  - The hierarchical inclusion relations between entities (taxonomy)
  - The inter-entity concept relations, which make it much more powerful than a taxonomy
• Used with several knowledge formalization approaches

Combining the Elements – Knowledge Base
• OWL:
  - W3C recommendation, in an XML-based format, for ontology representation
  - Evolved from RDF
• It provides the main concepts of:
  - Individual: an instance of 'something', the actual concept itself (e.g. John, Mary, Bob)
  - Class: a group of individuals belonging to the same set and having common properties (e.g. John, Mary and Bob are Human; John and Bob are Men)
  - Property: a characteristic of an individual that makes it different from others and allows it to belong to a class
    • Data type property: links an individual to a literal value (John is 30 years old)
    • Object property: links an individual to other individuals (John is the friend of Mary, Mary hates Bob)
• Sentence representation: (subject, predicate, object), e.g. (John, hasAge, 30)
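
The (subject, predicate, object) representation can be illustrated with plain Python tuples. This sketch does not use an OWL library and is not the representation used by the authors. Apart from `hasAge`, which appears on the slide, the property names `isFriendOf` and `hates` are hypothetical, and `rdf:type` is used here simply as a conventional label for class membership.

```python
from typing import List, Tuple, Union

Triple = Tuple[str, str, Union[str, int]]

triples: List[Triple] = [
    # Class membership of the individuals.
    ("John", "rdf:type", "Human"),
    ("Mary", "rdf:type", "Human"),
    ("Bob", "rdf:type", "Human"),
    ("John", "rdf:type", "Men"),
    ("Bob", "rdf:type", "Men"),
    # Data type property: individual -> literal value.
    ("John", "hasAge", 30),
    # Object properties: individual -> individual (hypothetical names).
    ("John", "isFriendOf", "Mary"),
    ("Mary", "hates", "Bob"),
]


def objects(subject: str, predicate: str) -> list:
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def members(cls: str) -> List[str]:
    """All individuals declared as instances of the given class."""
    return [s for s, p, o in triples if p == "rdf:type" and o == cls]


john_age = objects("John", "hasAge")
men = members("Men")
```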

Combining the Elements – Knowledge Base
• Used to formalize the first two types of information: architectural and configuration/performance
• The 'static' aspect of the approach
  - An OWL representation of a data warehouse

Combining the Elements – Autonomic Computing
• The dynamic part of the knowledge management aspect
• The rules formalize:
  - The passage between the four states of the Autonomic Computing Manager
  - How the knowledge base in the middle of the loop connects with each state
  - How the two algorithms are implemented over the loop
• We base our approach on previous work on using autonomic computing with ontologies (Stojanovic et al., 2004)

Combining the Elements – Autonomic Computing
• Autonomic Computing Manager loop phases applied to the levels of the decision support system

Combining the Elements – Algorithms
• Described using Jena ontology-based rules
  - Example: the individual self-improving algorithm of the data warehouse

Results
• Scenario:
  - Oracle Hyperion Essbase BI solution
  - An Essbase application with two data warehouses (DW1 and DW2)
  - A period of 14 days to see how each data warehouse improves and how the application reallocates the memory
  - A random series of queries (from a given pool) is run on each data warehouse each day
  - The individual self-improvement algorithm runs every day
  - The group reallocation algorithm runs every 4 days
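
The timing of the two algorithms over the test period can be laid out as a small schedule. This is a sketch under an assumption: the slides only say that the group algorithm runs every 4 days, so placing it at the end of days 4, 8 and 12 is a guess, and the function name and task labels are hypothetical.

```python
from typing import Dict, List


def schedule(days: int = 14, group_every: int = 4) -> Dict[int, List[str]]:
    """Map each day of the test period to the algorithms run at the end of it."""
    plan: Dict[int, List[str]] = {}
    for day in range(1, days + 1):
        tasks = ["individual"]           # runs every day
        if day % group_every == 0:       # assumption: days 4, 8, 12
            tasks.append("group")
        plan[day] = tasks
    return plan


plan = schedule()
group_days = [day for day, tasks in plan.items() if "group" in tasks]
```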

Results
• At the end of day 5 we have a good response time/cache allocation ratio
• The data warehouses improve themselves quickly (individual algorithm) and then oscillate around this point (DW2)
• At the end of the 6th day:
  - DW2 loses 2% in response time
  - DW1 gains around 80%
  - The application has reduced its memory consumption by 60%

Conclusions & Future Directions – Conclusions
• We have presented a problem common in enterprises today: knowledge management in decision support systems
• We have presented how we can formalize data warehouses with the help of ontologies and ontology-based rules
• We have seen how we can enable autonomy by using Autonomic Computing
• We presented the results of a test on a real application

Conclusions & Future Directions – Future Directions
• Extension of the parameters used for data warehouse performance: calculation time, aggregation time, etc.
• Introduction of Service Level Agreement (SLA) notions for defining data warehouse usage specifications
• Extension of the knowledge base so it can be enriched in an autonomic way
• Introduction of attenuation in the algorithms to avoid oscillation

Remarks… Questions… Propositions…

References
• Mark N. Frolick and Keith Lindsey. Critical factors for data warehouse failure. Business Intelligence Journal, Vol. 8, No. 3, 2003.
• Debanjan Ghosh, Raj Sharman, H. Raghav Rao, and Shambhu Upadhyaya. Self-healing systems – survey and synthesis. Decision Support Systems, Vol. 42: p. 2164–2185, 2007.
• T. Gruber. What is an ontology? Academic Press Pub., 1992.
• M.C. Huebscher and J.A. McCann. A survey on autonomic computing – degrees, models and applications. ACM Computing Surveys, Vol. 40, No. 3, 2008.
• IBM Corporation. An architectural blueprint for autonomic computing. IBM Corporation, 2001.
• IBM Corporation. Autonomic computing: powering your business for success. International Journal of Computer Science and Network Security, Vol. 7, No. 10: p. 2–4, 2005.
• W.H. Inmon. Building the data warehouse, fourth edition. Wiley Publishing, 2005.
• S.S. Lightstone, G. Lohman, and D. Zilio. Toward autonomic computing with DB2 universal database. ACM SIGMOD Record, Vol. 31, Issue 3, 2002.
• A. Mateen, B. Raza, and T. Hussain. Autonomic computing in SQL Server. In 7th IEEE/ACIS International Conference on Computer and Information Science, 2008.
• L. Stojanovic, J. Schneider, A. Maedche, S. Libischer, R. Studer, Th. Lumpp, A. Abecker, G. Breiter, and J. Dinger. The role of ontologies in autonomic computing systems. IBM Systems Journal, Vol. 43, No. 3: p. 598–616, 2004.
• V. Markl, G. M. Lohman, and V. Raman. LEO: An autonomic optimizer for DB2. IBM Systems Journal, Vol. 42, No. 1, 2003.
• A. N. Saharia and Y.M. Babad. Enhancing data warehouse performance through query caching. The DATA BASE Advances in Informatics Systems, Vol. 31, No. 3, 2000.
• Yingxu Wang. Toward Theoretical Foundations of Autonomic Computing. Int'l Journal of Cognitive Informatics and Natural Intelligence, 1(3), 1-16, July-September 2007.