DESIGN AND IMPLEMENTATION
OF
DATA ANALYSIS COMPONENTS
A Thesis
Presented to
The Graduate Faculty of The University of Akron
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
Grace C. Shiao
May, 2006
DESIGN AND IMPLEMENTATION
OF
DATA ANALYSIS COMPONENTS
Grace C. Shiao
Thesis
Approved:
  Advisor: Dr. Chien-Chung Chan
  Committee Member: Dr. Xuan-Hien Dang
  Committee Member: Dr. Zhong-Hui Duan
  Department Chair: Dr. Wolfgang Pelz

Accepted:
  Dean of the College: Dr. Ronald F. Levant
  Dean of the Graduate School: Dr. George R. Newkome
  Date: _________________
ABSTRACT
This thesis describes the design and implementation of data analysis
components. Many features of modern database systems facilitate the decision-making
process. In particular, Online Analytical Processing (OLAP) and data mining are
increasingly used in a wide range of applications. OLAP allows users to analyze
data from a wide variety of viewpoints. Data mining is the process of selecting,
exploring, and modeling large amounts of data to discover previously unknown patterns
for business advantage. Microsoft® SQL Server™ 2000 Analysis Services provides a rich
set of tools to create and maintain OLAP and data mining objects. To use these tools,
however, users must fully understand the underlying architectures and specialized
technical terms that are unrelated to the data analysis itself. These development
complexities prevent data analysts from using the tools effectively. In this work, we
developed several components that can serve as the foundation of analytical
applications. Using these components in software applications hides the technical
complexities and provides tools to build OLAP and mining models and to access
information from these models. Developers can also reuse these components without
coding from scratch. The reusability of these components enhances application
reliability and reduces development costs and time.
DEDICATION
Dedicated to my late parents
Mr. and Mrs. K. C. Chang
Who taught me the value of Education
And
Opened my eyes to the Power of Knowledge
ACKNOWLEDGEMENTS
First of all, I want to thank my adviser Dr. Chien-Chung Chan for his guidance and
support throughout my graduate research. His feedback helped to strengthen my research
skills and contributed greatly to this thesis. I want to thank my thesis committee
members, Dr. Xuan-Hien Dang and Dr. Zhong-Hui Duan, for their guidance and
encouragement. In addition, I want to thank the faculty members of the Department of
Computer Science for building the foundation of my computer knowledge.
I also want to thank my late parents and wish they had been able to see this
finished manuscript. I appreciate both of them for their love, support, and encouragement
in my life. I thank my husband S. Y. for his love and support through these years, and
my daughter Ming-Hao and my son Ming-Jay for their love, humor, and understanding.
Lastly, I thank the Mighty God for all His grace and blessing in my life.
TABLE OF CONTENTS
                                                                                   Page
LIST OF TABLES............................................................................................................. ix
LIST OF FIGURES ............................................................................................................ x
CHAPTER
I. INTRODUCTION.......................................................................................................... 1
1.1 What is Online Analytical Processing (OLAP)? .................................................... 2
1.2 Data Mining ............................................................................................................ 3
1.3 Statement of the Problem........................................................................................ 3
1.4 Motivations and Contributions ............................................................................... 3
1.5 Organization of the Thesis ...................................................................................... 5
II. MICROSOFT SQL SERVER 2000 ANALYSIS SERVICES ..................................... 7
2.1 Overview................................................................................................................. 7
2.2 Architecture............................................................................................................. 7
2.2.1 Server Architecture .......................................................................................... 7
2.2.2 Client Architecture........................................................................................... 9
2.3 OLAP Cube............................................................................................................. 9
2.4 Analysis Manager ................................................................................................. 11
2.4.1 Creating the Basic Cube Model ..................................................................... 12
2.4.2 Browsing a Cube............................................................................................ 23
2.4.3 Building the Data Mining Models ................................................................. 24
III. DESIGN OF DATA ANALYSIS COMPONENTS................................................ 32
3.1 Component-Based Development .......................................................................... 33
3.2 What Is a Component?.......................................................................................... 33
3.3 The cubeBuilder Component ................................................................................ 34
3.4 The cubeBrowser Component............................................................................... 37
3.4.1 Browsing OLAP objects ................................................................................ 38
3.4.1.1 Retrieving Information of Cube Schema ............................................. 39
3.4.1.2 Analytical Querying of Cube Data ...................................................... 41
3.5 The DMBuilder Component ................................................................................. 43
3.6 Conclusions........................................................................................................... 47
IV. CASE STUDIES AND RESULTS............................................................................ 48
4.1 A Case Study of the Heart Disease Datasets ........................................................ 48
4.1.1 Heart Disease Sample File ............................................................................. 49
4.1.2 Software Implementation............................................................................... 49
4.2 Implementation of the cubeBuilder Component................................................... 50
4.2.1 Creating a New Cube ..................................................................................... 51
4.2.2 The Fact Table and Measures Selections....................................................... 52
4.2.3 Adding Dimensions to the Cube .................................................................... 52
4.2.4 Processing and Building the New Cube......................................................... 53
4.2.5 The Results..................................................................................................... 54
4.3 Implementation of the cubeBrowser Component ................................................. 56
4.3.1 Connection to the Analysis Server................................................................. 56
4.3.2 Retrieving the Cardio Cube Data................................................................... 57
4.3.3 Displaying the Cardio Cube Data .................................................................. 59
4.3.4 Drill-down and Drill-up Capacities ............................................................... 60
4.4 Implementation of the DMBuilder component ..................................................... 62
V. DISCUSSIONS AND FUTURE WORKS ................................................................. 67
5.1 Contributions and Evaluations.............................................................................. 67
5.2 Future Works ........................................................................................................ 70
BIBLIOGRAPHY............................................................................................................. 71
APPENDICES .................................................................................................................. 73
APPENDIX A. DATASET USED FOR CASE STUDIES......................................... 74
APPENDIX B. APPLICATION INTERFACE OF OLAP CUBE BUILDER ........... 76
APPENDIX C. SOURCE CODE OF CUBEBUILDER ............................................. 77
APPENDIX D. SOURCE CODE OF CUBEBROWSER ........................................... 84
APPENDIX E. SOURCE CODE OF DMBUILDER.................................................. 88
LIST OF TABLES
Table                                                                              Page
2.1 Storage options supported by Analysis Services ..................................... 19
2.2 Summary of cube process options ........................................................................... 22
3.1 Values of the connection string ............................................................................... 41
3.2 Listings of properties required for OLAP mining model objects ............................ 46
LIST OF FIGURES
Figure                                                                             Page
2.1 Analysis Services architecture ................................................................ 8
2.2 The star and snowflake schemas........................................................................... 10
2.3 Screenshot of the Analysis Manager .................................................................... 11
2.4 Screenshot of the database dialog box of Cube Wizard ....................................... 13
2.5 Screenshot of the Provider for the Data Link dialog box ..................................... 13
2.6 Screenshot of the Connection tab of the Data Link dialog box ............................ 14
2.7 Screenshot of the "Select a fact table" dialog box with a selected fact table........ 15
2.8 Screenshot of the "Defining measures" dialog box. ............................................. 15
2.9 Screenshot of the Dimension Wizard ................................................................... 16
2.10 Screenshot of the "Select Dimension Table" dialog box ...................................... 17
2.11 Screenshot of the "Select levels" dialog box ........................................................ 17
2.12 Screenshot of the "Dimension Finish" dialog box................................................ 18
2.13 Screenshot of the "Storage Design Wizard" for selecting storage options .......... 19
2.14 Screenshot of the "Set aggregation options" dialog box....................................... 20
2.15 Screenshot of the "Process" window .................................................................... 21
2.16 Screenshot of the "Process a cube" dialog box..................................................... 22
2.17 Screenshot of the "Cube Browser" and sample results......................................... 23
2.18 Screenshot of "Select source type" dialog box ..................................................... 25
2.19 Screenshot of "Select source cube" window......................................................... 26
2.20 Screenshot of the selecting mining model technique ........................................... 26
2.21 Screenshot of the "Select case" dialog box for specifying a case of analysis....... 27
2.22 Screenshot of the "Select predicted entity" window............................................. 28
2.23 Screenshot of the "Select training data" window.................................................. 29
2.24 Screenshot of the "Saving the data model" of the Mining Model Wizard........... 30
2.25 Screenshot of the "Model execution diagnostics" window................................... 30
2.26 Screenshot of the content details of a created mining model ............................... 31
3.1 Architecture of the component cubeBuilder ......................................................... 35
3.2 Relationship of cubeBrowser to the Analysis Server ........................................... 38
3.3 The basic workflow of browsing OLAP cube data using cubeBrowser............... 40
3.4 The architecture and logic relations of DMBuilder with DSO ............................ 44
3.5 Flow Logic of the DMBuilder Component …...................................................... 45
4.1 Relationship of the heart disease test data ............................................................ 49
4.2 Screenshot of the cardio cube builder interface.................................................... 50
4.3 Screenshot of the “Data Source/Cube” section..................................................... 51
4.4 Screenshot of sample entries for both sections of "Data Source/Cube" and "Specify Fact/Measures" ............................................................................. 51
4.5 Screenshot of sample entries of “Specify Fact/Measure” section ........................ 52
4.6 Screenshot of the “Add Dimensions to Cube” section ......................................... 53
4.7 Screenshot of sample entries for cube dimension................................................. 53
4.8 Screenshot of the “Process/Build Cube” section .................................................. 54
4.9 Screenshot of the cardio test database object before building the new cardio cube ............................................................................................ 55
4.10 Screenshot of the cardio test database object after building the sample "cube1" ........................................................................................... 55
4.11 Screenshot of the web form BrowseCube.aspx .................................... 56
4.12 Screenshot of listing of available cube ................................................................. 57
4.13 Screenshot of specifying cube entry and measures………….. ............................ 57
4.14 Screenshot of selections of measures and the pre-defined view options.............. 58
4.15 Screenshot of selections of location for Pain-Type option ................................... 59
4.16 Screenshot of selections of pain-type for Patient option ...................................... 59
4.17 Results of cube data for Pain-Type option with test country................................ 59
4.18 Results of cube data for the angina chest pains per patient test city..................... 60
4.19 Screenshot of drill-down to the test center level of Patient option....................... 61
4.20 Screenshot of drill-up to the country’s level of Patient option ............................. 61
4.21 Screenshot of the main interface DMMBuilder .................................................... 62
4.22 Screenshot of the “Server/Database” section........................................................ 63
4.23 Screenshot of Mining model setup ....................................................................... 63
4.24 Screenshot of setting the mining model role ........................................................ 64
4.25 Screenshot of setting properties and algorithm for the mining model.................. 64
4.26 Screenshot of setting the attributes of analytical column ..................................... 65
4.27 Screenshot of the cardio mining model using Microsoft Decision Trees Algorithm ................................................................................... 66
B.1 Screenshot of the OLAP cube builder interface for the power users .................... 76
CHAPTER I
INTRODUCTION
Data are not only valuable assets but also strategic resources in today’s
competitive environment. Organizations around the world are accumulating vast and
growing amounts of data in different database formats. Companies need to understand
the effectiveness of their marketing efforts and to manage the large volumes of data
created each day. These challenges require a well-defined database system that can
bring together disparate data of different dimensionality and
granularity. Making the data meaningful is no small task, especially given the different
aspects of data analysis. Companies need quality analysis of operational information to
understand their business strengths and weaknesses. Business analysis focuses on the
effective use of data and information to drive positive business actions. With good and
accurate data analysis, business decision makers can make well-informed decisions for
the future of their organizations. Business Intelligence (BI) tools allow companies
to automate their analysis, strategy, and forecasting functions to make better business
decisions. Online Analytical Processing (OLAP) and data mining models are key features
of BI tools that help companies extract data from operational systems, summarize data
into working totals, find hidden patterns in data for future analysis and prediction,
and intuitively present these results to end users [1, 2].
1.1 What is Online Analytical Processing (OLAP)?
The standard definition of OLAP provided by the OLAP Council [2] is:
“A category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user”.
The functionality of OLAP, according to the definition of the OLAP Council, lets
the users complete the following tasks [2]:
• Calculations and modeling applied across dimensions, through hierarchies and/or across members
• Trend analysis over sequential time periods
• Slicing subsets for on-screen viewing
• Drill-down to deeper levels of consolidation
• Reach-through to underlying detail data
• Rotation to new dimensional comparisons in the viewing area.
Therefore, OLAP performs multidimensional analysis of enterprise data and
provides the capabilities for complex calculations, trend analysis and very sophisticated
data modeling. In addition, OLAP enables end-users to perform ad hoc analysis of data
in multiple dimensions, thereby providing the insight and understanding they need for
better decision making.
An OLAP structure created from the operational data is called an OLAP cube [1, 2].
OLAP cubes are data processing units consisting of the facts and the dimensions from the
database. They provide multidimensional views and analytical querying capabilities.
Therefore, OLAP technology can provide fast answers for complex querying on
operational data for decision-making management.
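The cube structure just described can be sketched with a toy in-memory example. The following Python sketch is purely illustrative and is not part of the components developed in this thesis; the dimension names and patient counts are invented.

```python
from collections import defaultdict

# Toy fact table: each row holds dimension values and one numeric measure.
# All values here are invented for illustration.
fact_table = [
    {"country": "USA", "city": "Cleveland", "pain_type": "angina", "patients": 12},
    {"country": "USA", "city": "Cleveland", "pain_type": "asymptomatic", "patients": 7},
    {"country": "USA", "city": "Long Beach", "pain_type": "angina", "patients": 9},
    {"country": "Switzerland", "city": "Zurich", "pain_type": "angina", "patients": 5},
]

def aggregate(rows, dims, measure):
    """Summarize the measure over the chosen dimensions (one cube 'view')."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)

# Viewing the data by country is one "side" of the cube ...
by_country = aggregate(fact_table, ["country"], "patients")
# ... while slicing on pain_type and aggregating by city is another.
angina = [r for r in fact_table if r["pain_type"] == "angina"]
angina_by_city = aggregate(angina, ["city"], "patients")
```

A real OLAP server precomputes and stores many such aggregations so that any requested view can be answered quickly rather than rescanned from the fact rows.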
1.2 Data Mining
Data Mining is defined as the automated extraction of hidden predictive information
from database systems [3, 4]. Generally, it is the process of analyzing data from different
perspectives and discovering patterns and regularities in sets of data. Specifically, the
hidden patterns and the correlations discovered in the data can provide strategic business
advantages for decision-making in organizations.
1.3 Statement of the Problem
Microsoft® Analysis Services, shipped with SQL server™ 2000, is the OLAP
database engine and is able to build multidimensional cubes [1, 5]. It also provides the
application programs to browse the cube data and tools to support data mining algorithms
for discovering trends in data and predicting future results. The implementation of
Analysis Services is heavily wizard-oriented in building and managing data cubes and
data mining models. Although many features are also available through the predefined editors,
the wizard-intensive process still requires users to fully understand the cube structure and
associated objects in the definition process. The complexity of cube development makes
it difficult for end-users with little technical experience to gain access to these analysis
tools.
1.4 Motivations and Contributions
In reality, most decision-makers within an enterprise want to be able to use the
insights gained from their data for more tactical decision-making purposes. However,
they are generally not interested in spending time building cubes or mining models to
answer their business questions. Analysis Services provides intensive wizards and editors in
the development of OLAP cubes and the mining models. It has been designed to be
flexible for all levels of users, but users have difficulty learning to use these features
effectively and creating useful models for decision making. The best solution is to design
a specific front-end interface to meet the user’s requirements with the ability to cross-
analyze data even through a single click and to mask the underlying complexities of the
applications from the users.
Analysis applications contain sensitive and confidential information that should be
protected against unauthorized access and made available only to appropriate decision
makers. Analysis Services automatically creates an OLAP Administrators group in the
operating system. A member of the OLAP Administrators group has complete access to
the analysis objects. A user that is not a member of the OLAP Administrators group has
read- or write-access to the extent permitted based on dimension-level or cell-level
security but performs no administrative tasks. However, the active user must be a
member of the OLAP Administrators group to use Analysis Manager. Therefore, a
non-Administrator user cannot explore cube information through Analysis Manager. One
goal of this thesis is to construct a client-application interface that uses
Multidimensional Expressions (MDX) and ActiveX® Data Objects (Multidimensional)
(ADO MD) to query OLAP data and resolve this access conflict [1, 6].
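For context, the MDX queries that such a client interface would issue are plain strings sent to the server through ADO MD. The sketch below shows what a minimal two-axis MDX SELECT might look like; the cube, measure, and dimension names are invented, and the helper function is a hypothetical illustration rather than the cubeBrowser code.

```python
def build_mdx(cube, measures, rows_dim):
    """Compose a minimal two-axis MDX SELECT statement (sketch only)."""
    cols = ", ".join(f"[Measures].[{m}]" for m in measures)
    return (
        f"SELECT {{ {cols} }} ON COLUMNS, "
        f"{{ {rows_dim}.Members }} ON ROWS "
        f"FROM [{cube}]"
    )

# Such a string would typically be executed through an ADO MD CellSet
# against the Analysis Server; names here are placeholders.
query = build_mdx("Cube1", ["Patient Count"], "[Pain-Type]")
```

The point of generating MDX in the client layer is that the query runs with the connected user's read permissions, so no OLAP Administrators membership is needed to browse cube data.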
The main contributions of this thesis are as follows:
• Development of a component, cubeBuilder, that lets software developers design
application interfaces which build OLAP cube models to meet users’ analytical
requirements
• Development of a component, DMBuilder, that lets developers design specific
user interfaces for creating data mining models with which users can uncover
previously unknown patterns
• Development of a component, cubeBrowser, that lets developers design a client
interface for browsing cube data for users outside the OLAP Administrators group.
In addition, these data analysis components not only help software developers
build specific applications without coding from scratch, but also hide the
complexities of development from less technically oriented users.
1.5 Organization of the Thesis
This thesis covers the work on the development of the data analysis components,
cubeBuilder, cubeBrowser and DMBuilder for OLAP and mining model solutions. This
thesis is organized as follows:
Chapter II provides an overview of Microsoft SQL Server Analysis Services
including its fundamental operations and architectures in the functionality of OLAP and
Data Mining model. The step-by-step processes used to create an OLAP cube, to browse
the existing cube data and to create a data mining model with Analysis Manager are also
illustrated and described in Chapter II.
Chapter III focuses on the development of the design and the structures of the
analysis components for OLAP and mining model solutions.
Chapter IV describes the implementation of these analysis components in
desktop and web-based application interfaces for OLAP cube and mining model systems.
It also describes a case study with the heart disease dataset to demonstrate the application
of the analysis components.
Chapter V presents a summary of the work that has been done in this thesis. It also
compares the functionalities between the analysis components and Analysis Manager in
the aspects of building of OLAP cube and mining model. The directions of future work
and the conclusion of this thesis are also presented in Chapter V.
CHAPTER II
MICROSOFT SQL SERVER 2000 ANALYSIS SERVICES
2.1 Overview
Microsoft® SQL Server™ 2000 Analysis Services provides a fully functional OLAP
environment, which includes both OLAP and data mining functionality [5]. It is a suite
of decision support engines and tools. It can also function as an intermediate layer that
converts relational warehouse data into a form, also called a cube, that makes creating
analytical reports fast and flexible.
2.2 Architecture
The architecture of Analysis Services can be divided into two portions: the server
and the client, as shown in Figure 2.1. The server portion, including the engines,
provides the functionality and power, while the client portion has interfaces for front-end
applications [5].
2.2.1 Server Architecture
The primary component of Analysis Services is the Analysis Server. The Analysis
Server operates as a Microsoft Windows NT or Windows 2000 service and is
specifically designed to create and maintain multidimensional data structures [5, 6]. It
also provides multi-dimensional data values to client queries and manages connections to
the specified data sources and local access security.

[Figure 2.1 Analysis Services architecture: on the server side, the Analysis Manager, a snap-in of the Microsoft Management Console (MMC), works through Decision Support Objects (DSO) against the Analysis Server, which manages data sources, cubes, and mining models; on the client side, client applications reach the server through ADO MD and the PivotTable Service.]

Figure 2.1 illustrates the Analysis Manager, a snap-in console in Analysis Services,
which communicates with the server through the Decision Support Objects (DSO)
component tool. The DSO is a set of programming instructions for applications to work
with the Analysis Services [7].
2.2.2 Client Architecture
The client side of Analysis Services primarily provides an access interface,
the PivotTable Service, between the server and custom applications, as
shown in Figure 2.1 [6, 7]. The PivotTable Service communicates with the Analysis Server
and provides interfaces for client applications to access OLAP data and data mining data
on the server [6, 7]. It exposes an OLE DB interface through which custom programs and
client tools access data managed by Analysis Services.
2.3 OLAP Cube
The primary form of data representation within the Analysis Services is the OLAP
cube [5-8]. A cube is a logical construct. It is a multidimensional representation of both
detailed and summary data. Cubes are designed according to the client’s analytical
requirements. Each cube represents data values of different business entities. Each side
of the cube presents a different aspect of the data.
Cubes in the Analysis Services are built using one of two types of database schemas:
the star schema and the snowflake schema [9]. Both schemas consist of a fact table and
dimension tables. The Analysis Services aggregates data from these tables to build
cubes. As shown in Figure 2.2, the star schema consists of a fact table and several
dimension tables. Each dimension table corresponds to a column in the fact table. The
data in the dimension tables are used to form the analytical queries in the fact table.
However, in the snowflake schema, several dimension tables are joined before being
linked to the fact table.
[Figure 2.2 The star and snowflake schemas: in the star schema, each dimension table links directly to the fact table; in the snowflake schema, a layer of dimension tables is joined before being linked to the fact table.]
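The join behavior of the star schema can be sketched in a few lines. This Python fragment is only an illustration with invented table contents, not the schemas used later in the case study.

```python
# Toy star schema: the fact table stores dimension keys plus a measure,
# and each dimension table maps a key to descriptive attributes.
# All names and values here are invented for illustration.
dim_city = {1: {"city": "Cleveland", "country": "USA"},
            2: {"city": "Zurich", "country": "Switzerland"}}
dim_pain = {10: {"pain_type": "angina"},
            11: {"pain_type": "asymptomatic"}}
fact = [
    {"city_key": 1, "pain_key": 10, "patients": 12},
    {"city_key": 1, "pain_key": 11, "patients": 7},
    {"city_key": 2, "pain_key": 10, "patients": 5},
]

# Joining each fact row to its dimension rows yields the denormalized
# view that analytical queries are formed against.
joined = [
    {**dim_city[r["city_key"]], **dim_pain[r["pain_key"]],
     "patients": r["patients"]}
    for r in fact
]

# e.g., total angina patients across all cities:
angina_total = sum(r["patients"] for r in joined
                   if r["pain_type"] == "angina")
```

In a snowflake schema the `dim_city` lookup would itself be split, with the city table holding only a country key that resolves through a separate country table.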
2.4 Analysis Manager
The Analysis Manager is a tool for the Analysis Server administration in Microsoft
SQL Server 2000 Analysis Services [5-9]. It is a snap-in application within the
Microsoft Management Console (MMC), which is the common framework for hosting
administrative tools. Figure 2.3 illustrates the screenshot of the hierarchical, tree-view
representation of the server and all its components in the left pane of the console.
Figure 2.3 Screenshot of the Analysis Manager
The major functional features for the Analysis Manager are summarized as follows:
• Administering Analysis server
• Creating database and specifying data sources
• Creating and processing cubes
• Creating dimensions for the specified database
• Specifying storage options and optimizing performance
• Authorizing and managing cube security
• Browsing cube data, shared dimensions and other objects
• Creating data mining model from relational and multidimensional data
• Viewing the Mining Model.
2.4.1 Creating the Basic Cube Model
Analysis Services provides wizards and editors within the Analysis Manager to let
the user create the cube easily [6, 8]. The step-by-step instructions for building a basic
cube model in the Analysis Manager using the Cube Wizard are summarized as follows:
1. Creating an Analysis Server’s database
A database acts like a folder that holds cubes, data sources, shared dimensions,
mining model and database roles as illustrated in Figure 2.3. To create a new database on
a server, after launching the Analysis Manager, right-click the server name and then
select New Database from the pop-up menu [1, 2]. The Database dialog box appears for
the user to enter a new database name for the new cube model, as shown in Figure 2.4.
Figure 2.4 Screenshot of the database dialog box of Cube Wizard
2. Specifying the data source
After creating a new database, a data source needs to be specified for the cube. The
data source contains the information of the data used in the cube [6, 7]. The purpose of
adding a data source is to let Analysis server establish connections to the source data.
The Data Link dialog box, as illustrated in Figure 2.5, can be opened by right-clicking the
Data Source folder and selecting New Data source from the pop-up menu.
Figure 2.5 Screenshot of the Provider for the Data Link dialog box
In the Data Link dialog box shown in Figure 2.6, the user can specify a provider, the
server name, login information and a database name to connect to the Analysis server.
Figure 2.6 Screenshot of the Connection tab of the Data Link dialog box
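The values entered in the Data Link dialog correspond to key/value pairs of an OLE DB connection string. As a hedged illustration, such a string for the Analysis Services OLAP provider (MSOLAP) could be assembled as below; the server and database names are placeholders, and a real application would pass the string to an OLE DB or ADO MD connection object.

```python
def make_connection_string(server, database, provider="MSOLAP"):
    """Assemble an OLE DB-style connection string from key/value pairs."""
    parts = {
        "Provider": provider,         # the Analysis Services OLE DB provider
        "Data Source": server,        # Analysis Server machine name
        "Initial Catalog": database,  # Analysis Services database (catalog)
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())

conn_str = make_connection_string("localhost", "CardioTest")
```

Login information, when integrated security is not used, is carried by additional pairs in the same string.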
3. Selecting the fact table and the measures
The Cube Wizard and the Cube Editor are the tools to be used in the Analysis
Manager to create the OLAP cube [8]. A fact table contains the measure fields, which
consist of the numeric values for the analysis, and the key fields that are used to join to
dimension tables. The fact table should not contain any descriptive information or any
labels in addition to the measures and the index fields. Each cube must be based on only
one fact table. As shown in Figure 2.7, the panel displays all the tables in the specified
data source. After the fact table is selected and the “Next” button clicked, the Wizard
displays all of the available numeric data in the selected table, as shown in Figure 2.8.
Figure 2.7 Screenshot of the “Select a fact table” dialog box with a selected fact table
After the measures are specified from the list and the “Next” button is clicked, the Cube
Wizard asks the user to select existing dimensions or to create new ones.
Figure 2.8 Screenshot of the “Defining measures” dialog box
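The wizard's rule of offering only the numeric, non-key fields of the fact table as measure candidates can be sketched as a simple filter. The column names and types below are invented for illustration; this is not the wizard's actual implementation.

```python
# Invented fact-table schema: column name -> type.
fact_columns = {
    "patient_id": "int",      # key field joining a dimension table
    "city_key": "int",        # key field
    "cholesterol": "float",   # numeric: measure candidate
    "max_heart_rate": "int",  # numeric: measure candidate
    "diagnosis": "str",       # descriptive label: not a measure
}
key_fields = {"patient_id", "city_key"}

# Measure candidates are the numeric fields that are not join keys.
measure_candidates = [name for name, typ in fact_columns.items()
                      if typ in ("int", "float") and name not in key_fields]
```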
4. Adding dimensions and levels to the cube
Dimensions are the categories for the user to analyze and summarize the data [6-8].
In other words, dimensions are the organized hierarchies that describe the data functions
in the fact table. Two types of dimensions can be created for use in a cube. A
dimension created for use in an individual cube is called a private dimension; a shared
dimension is one that multiple cubes can use [8]. A cube must contain at least one
dimension, and the dimension must exist in the database object where a cube will be
created.
In the Analysis Manager, a new dimension can be created either with the Cube Editor
or with the Cube Wizard. If the editors are used to build the cube, then a dimension has
to be created before it can be added to a cube. However, if the Cube Wizard is used to
create a cube, then it launches the Dimension Wizard to handle the task as part of the
process of creating a cube [8]. The step-by-step process of creating a new shared
dimension with the Dimension Wizard is summarized as follows:
a. Selecting the type of dimension schema in the “Choose how you want to
create the dimension” screen, as shown in Figure 2.9.
Figure 2.9 Screenshot of the Dimension Wizard
b. Specifying the dimension table from the available table list in the “Select
the dimension table” screen, as shown in Figure 2.10.
c. Selecting the levels in the “Select the levels for your dimension” screen,
as shown in Figure 2.11.
Figure 2.10 Screenshot of the “Select Dimension table” dialog box
Figure 2.11 Screenshot of the “Select levels” dialog box
d. Specifying the new dimension name and previewing the dimension data in
the “Finish” dialog box of the Dimension Wizard, as illustrated in Figure
2.12.
Figure 2.12 Screenshot of the “Dimension Finish” dialog box
5. Setting the storage options and setting up the cube aggregations
The storage mode determines how the data is organized in the server [8, 9]. It
affects the requirements of disk-storage space and the data-retrieval performance. There
are three storage options supported by Analysis Services: Multidimensional OLAP
(MOLAP), Relational OLAP (ROLAP), and Hybrid OLAP (HOLAP). The
descriptions and storage locations of each mode are summarized in Table 2.1. The
Storage Design Wizard is used to select the option for the cube in the Analysis Manager,
as shown in Figure 2.13.
Table 2.1 Storage options supported by Analysis Services

ROLAP (Relational OLAP)
  Description: slow processing; slow query response; huge storage requirements;
  suitable for large databases or legacy data.
  Fact data: relational database server
  Aggregated values: relational database server

MOLAP (Multidimensional OLAP)
  Description: requires data duplication; pre-summarizes the data to improve
  performance in querying and displaying the data; high performance; good for
  small to medium size data sets.
  Fact data: cube
  Aggregated values: cube

HOLAP (Hybrid OLAP)
  Description: a combination of ROLAP and MOLAP; does not create a copy of the
  data; provides connectivity to a large number of relational databases; good
  when storage space is limited but faster query responses are needed.
  Fact data: relational database server
  Aggregated values: cube
Figure 2.13 Screenshot of the “Storage Design Wizard” for selecting storage options
After deciding the storage option, the next step is to specify the aggregation options
in the Set Aggregation Options dialog, as illustrated in Figure 2.14 [8, 9]. This option
allows the user to set the level of aggregation for the cube to boost the performance of
queries.
Aggregations are pre-calculated summaries of data that improve query response
time. The higher the level of the cube’s aggregation, the faster queries will be executed,
but more disk space will be needed and more time will be required to process the cube.
In the Analysis Services, there are three aggregation options for selection:
• Estimated storage reaches: specifying the maximum storage size in either
megabytes (MB) or gigabytes (GB)
• Performance gain reaches: specifying the percentage of performance gain for
the queries
• Until I click stop: selecting manual control of the balance
Figure 2.14 Screenshot of the “Set aggregation options” dialog box
6. Processing the cube
Processing the cube is required before attempting to browse the cube data, especially
after designing its storage options and aggregations, because the aggregations need to
be calculated before the user can view the cube data [8, 9].
The major activities involved in the cube processing are described in a
“Process” window, as shown in Figure 2.15, and summarized as follows:
a. Reading the dimension tables to populate the levels from the actual data
b. Reading the fact table
c. Calculating specified aggregations
d. Storing the results in the cube.
Figure 2.15 Screenshot of the “Process” window
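Conceptually, the processing steps above amount to scanning the fact table and pre-computing summary values for each dimension member. The following minimal Python sketch (with hypothetical data and field names; in practice this work is performed by the Analysis server) illustrates the idea:

```python
from collections import defaultdict

# Hypothetical fact rows; in Analysis Services these come from the fact table.
fact_rows = [
    {"location": "Cleveland", "cholesterol": 233.0},
    {"location": "Cleveland", "cholesterol": 250.0},
    {"location": "Budapest",  "cholesterol": 198.0},
]

def process_cube(rows, dimension, measure):
    """Pre-calculate per-member sums and counts, as cube processing does."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:                 # read the fact table
        key = row[dimension]
        totals[key] += row[measure]  # calculate the specified aggregation
        counts[key] += 1
    return {k: (totals[k], counts[k]) for k in totals}  # store the results

cube = process_cube(fact_rows, "location", "cholesterol")
```

Queries against the processed cube can then read these stored summaries instead of rescanning the fact table, which is why aggregations trade disk space and processing time for query speed.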
In the Analysis Manager, there are three options for processing a cube, depending
on the circumstances of the data structures. These options,
summarized in Table 2.2, can be selected in the “Process a Cube” dialog box, as shown in
Figure 2.16 [9].
Table 2.2 Summary of cube process options
Process option: Circumstance
Full process: Modifying the structure of the cube
Incremental update: Adding new data to the cube
Refresh data: Clearing out and replacing a cube’s source data
Figure 2.16 Screenshot of the “Process a cube” dialog box
2.4.2 Browsing a Cube
In the Analysis Manager, the Cube Browser is one of the methods for viewing the cube
data [5-9]. There are two ways to open the Cube Browser and load cube data into it:
a. Right-clicking the cube name in the Analysis Manager tree pane and selecting
“Browse Data” from the pop-up menu
b. Clicking “Browse Sample Data” in the last step of the Cube Wizard
The Cube Browser not only lets users view the multidimensional data in a flattened
two-dimensional grid format, as shown in Figure 2.17, but also makes it possible to drill
up or drill down through different dimensions of the data. However, the Cube Browser
cannot be used to view unprocessed cube data [6].
Figure 2.17 Screenshot of the “Cube Browser” and sample results
2.4.3 Building the Data Mining Models
Data Mining is the process of extracting hidden knowledge from large volumes of
data [10, 11]. It involves uncovering patterns, trends, and relationships from historical
data and predicting outcomes of future situations. The primary mechanism for data
mining is the data mining model, an abstract object that stores data mining information in
a series of schema rowsets. The mining model serves as the blueprint for how data
should be analyzed or processed. Once the model is processed, information associated
with the mining model not only represents what was learned from the data, but also
allows users to discover the business trends for future decision making [11]. Two data
mining algorithms are built into Microsoft SQL Server 2000 Analysis Services: Microsoft
Decision Trees and Microsoft Clustering [12, 13].
A. Decision Trees Algorithm:
The Microsoft Decision Trees algorithm uses recursive partitioning to divide the data
into a tree structure, and it continually searches for predictive factors until there is
no more data to split [10-13]. Each predictive factor used to classify the data is
represented by a node in the tree structure. This method focuses on providing information
paths for rules and patterns within the data, and it is useful in predicting exact outcomes
for future problems [12, 13].
B. Microsoft Clustering Algorithm:
The Microsoft Clustering algorithm is based on the Expectation-Maximization (EM)
algorithm [11, 12]. It uses iterative refinement techniques to group records into
neighborhoods (clusters) that exhibit similar, predictable characteristics [13]. This is
useful for uncovering relationships among data items in a large database with hundreds
of evaluated attributes.
The following steps describe the process of creating a mining model using the
mining model wizard in the Analysis Manager [13]:
1. Specifying the type of data:
In the “Select source type” window, as shown in Figure 2.18, users
can select either relational data or OLAP data to build the target mining
model.
Figure 2.18 Screenshot of the “Select source type” dialog box
2. Selecting the source cube:
In the “Select source cube” window, as shown in Figure 2.19, users need to
highlight the target cube in the list of available cubes [11, 13].
Figure 2.19 Screenshot of “Select source cube” window
3. Specifying the data mining method:
In the “Select data mining technique” window, as shown in Figure 2.20,
users can select one of the two mining algorithms provided with the Analysis
Services: Microsoft Decision Trees and Microsoft Clustering [9, 10].
Figure 2.20 Screenshot of the selecting mining model technique
4. Identifying the case base or unit of analysis
In the “Select case” window, as shown in Figure 2.21, users need to
specify the case base of the analysis for the modeling task. A case is the basic
unit of analysis for the mining task.
Figure 2.21 Screenshot of the “Select case” dialog box for specifying a case of analysis
5. Selecting the predicted entity:
In this step users must provide information for prediction used in the
mining model [12], as shown in Figure 2.22. The predicted entity can be
chosen as one of the following items:
• A measure of the source table
• A member property of the case dimension and level
• Members of another dimension in the cube
This feature provides flexibility in the process of predictive analysis using
OLAP data.
Figure 2.22 Screenshot of “Select predicted entity” window
6. Selecting the training data:
The training data is used to process the OLAP data mining model and to
define the column structure of the data mining case set. As shown in
Figure 2.23, the user should select at least one additional data item from the
training data [12, 13].
Figure 2.23 Screenshot of the “Select training data” window
7. Naming and processing the model:
After the user enters a model name and selects the “Save and process now”
check box, as shown in Figure 2.24, the wizard will process the model and
train it with data based on the specified algorithm. Figure 2.25
displays the progress of the model execution [13]. When the process is complete,
the message “Processing completed successfully” appears at the bottom of the
dialog box.
Figure 2.24 Screenshot of the “Saving the data model” of the Mining Model Wizard
Figure 2.25 Screenshot of the “Model execution diagnostics” window
After clicking the “Close” button, the OLAP Mining Model Editor is launched
and the system displays the content details of the proposed mining model, as shown in
Figure 2.26.
Figure 2.26 Screenshot of the content details of a created mining model
CHAPTER III
DESIGN OF DATA ANALYSIS COMPONENTS
Microsoft SQL Server 2000 provides OLAP functionality to build and manage
multidimensional models of data and applications for use in large enterprise
systems [1, 2]. There are three programmatic interfaces in the Analysis
Services for user applications: ActiveX Data Objects Multidimensional (ADO MD),
OLE DB for Online Analytical Processing (OLE DB for OLAP), and Decision Support
Objects (DSO) [10-14].
ADO MD is an extension to the ADO programming interface that can be used to
access multidimensional schema, to query cubes, and to retrieve the results [10]. It uses
an underlying OLE DB provider, which is Microsoft's strategic low-level application
program interface (API) for access to different data sources [11]. OLE DB for Online
Analytical Processing (OLE DB for OLAP) is a set of objects and interfaces that extend
the ability of OLE DB to provide access to multidimensional data stores [12]. DSO is
the administrative programming interface used to create and alter cubes, dimensions, and
calculated members. It can also perform other functions that are available interactively
through the Analysis Manager application [13, 14].
Using these programmatic tools provides more control over the OLAP and
data mining operations. Developers can hide the complexities of the process of creating
cubes and mining models from less technical users. ADO MD, the data abstraction tool,
allows developers to create either a local or a remote front-end interface for exploring
metadata, databases, and analysis functions. In particular, it provides an analytical tool
for end users who do not have the OLAP administrator privileges required to access cube
data with the Analysis Manager.
This chapter introduces the data analysis components developed for building and
viewing the cube and the data mining model in the Microsoft SQL Server 2000 environment.
3.1 Component-Based Development
Component-based development (CBD) is a software development methodology that
allows developers to reuse existing components. Reuse and flexibility
are the main characteristics of CBD [15, 16]. Developers no longer need to construct
software applications from scratch; they can reuse existing pre-built components
to meet application requirements. This code reuse reduces production costs
and enhances the maintainability of the software system. Flexibility is another useful
trait, which allows components to be easily replaced, modified, and maintained. Using
CBD, the process of software design is made more effective and flexible.
3.2 What Is a Component?
A software component involves three essential parts: a service interface, an
implementation, and deployment [17]. A service interface specifies the component. An
implementation implements the interface to make the component work. The deployment
is the executable file that allows the component to run and meet its requirements. Kirby
McInnis [17] has given a single comprehensive definition of a component:
“A component is a language-neutral, independently implemented package of software services, delivered in an encapsulated and replaceable container, accessed via one or more published interfaces. A component is not platform contained or application bound.”
The reuse of existing components reduces the development and maintenance costs.
It also increases the productivity since there is no need to build new applications from
scratch.
3.3 The cubeBuilder Component
The cubeBuilder component is built on top of DSO to allow developers to create
OLAP cubes programmatically without using the Analysis Manager [17, 18]. Figure 3.1
depicts the component’s architecture and its relation to the server, and shows the
workflow for creating a data cube with the component.
The sequence of operations involved in building an OLAP cube is described as
follows:
1. Connecting to an Analysis server:
The first step in the process of building an OLAP cube is to connect to an
Analysis server. A server object, clsServer, of the DSO object model is the main entry
point for accessing the Analysis server.
Figure 3.1 Architecture of the component cubeBuilder
The cubeBuilder component provides a method called ConnectToServer, which uses the
server object of DSO to connect to a computer where the Analysis server service is
running [18].
2. Creating a database object to contain dimensions and cubes:
After connecting to the Analysis server, a database object is the first object that needs
to be created in the process of building an OLAP cube. A database object is a
container for the related cubes and other objects. It consists of data sources, shared
dimensions, and database roles. It is also used to store the cubes, data mining models and
other related objects. The cubeBuilder component can either create a new database
object or open an existing database in the server.
3. Adding a data source that contains the data.
After setting up the database object, a link to a data source has to be added to the
database before constructing an OLAP cube. The data source object of DSO specifies a
source data file to be used as the source database for the cube. The cubeBuilder
component is able to handle the following tasks through the data source object of DSO:
• Setting up the connection to the data source
• Finding the specified data source
• Adding a new data source to the specified database object
• Setting the link to the specified data source
4. Creating dimensions and their levels
A dimension is a structural attribute of an OLAP cube and is an organized hierarchy
of categories that describe data in the fact table of the data warehouse system. These
categories provide users the basis for data analysis. The cubeBuilder component uses the
dimension object of DSO to create a shared dimension in the user-specified database
object. The dimension object provides a specific implementation of the DSO Dimension
interface. Through the Dimension interface, the component cubeBuilder can perform the
following tasks:
• Creating a new dimension object in the database object
• Creating a new level in the dimension and setting the associated level’s properties
5. Creating a cube and specifying dimensions and measures
The following steps illustrate the way to add a cube to the user-specified database
object by using the cubeBuilder component:
• Adding the user-specified cube name to the collections of the database object
• Specifying the data source of the cube
• Specifying the fact table of the cube
• Setting the SourceTable and EstimatedRows properties of the cube through the
AddFactTblToCube method
• Specifying the measures from the fact table for analysis
• Adding the database’s dimensions to the cube’s collections with the
AddSharedDimToCube method
6. Processing the cube to load its structure and data
After defining a cube and its measures in the database object, the cube can be
processed. The cube can be fully processed by using the ProcessCube method of the
cubeBuilder component to load the cube’s structure and data.
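The component itself is implemented in VB.Net against DSO; as a language-neutral sketch of the call sequence only, the six steps can be mocked in Python (every class, method, table, and cube name below is hypothetical and merely stands in for the corresponding DSO operation):

```python
class MockAnalysisServer:
    """In-memory stand-in for an Analysis server connection."""
    def __init__(self, host):
        self.host = host
        self.databases = {}

    def create_database(self, name):
        self.databases[name] = {"data_sources": {}, "dimensions": {}, "cubes": {}}
        return self.databases[name]

def build_cube(host):
    server = MockAnalysisServer(host)                  # 1. connect to the server
    db = server.create_database("cardio")              # 2. create a database object
    db["data_sources"]["heart"] = "heart.mdb"          # 3. add a data source
    db["dimensions"]["Location"] = ["All", "Site"]     # 4. dimension with levels
    db["cubes"]["test1"] = {                           # 5. cube, fact table, measures
        "fact_table": "cardio_tests",
        "measures": ["cholesterol", "max_heart_rate"],
        "dimensions": ["Location"],
        "processed": False,
    }
    db["cubes"]["test1"]["processed"] = True           # 6. process the cube
    return db

db = build_cube("localhost")
```

The point of the sketch is the ordering constraint: each step depends on the objects created by the previous ones, which is why the component exposes the methods in this sequence.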
3.4 The cubeBrowser Component
The component cubeBrowser can be used in software applications to access
information from the multidimensional data sources in the Microsoft Analysis Services.
It is a layer on top of ADO MD that can be used to write OLAP applications that retrieve
data from the OLAP cube. Figure 3.2 shows how the component
cubeBrowser fits into the Analysis Services architecture. The PivotTable Service is the
client side of Microsoft Analysis Services and implements OLE DB for OLAP, which
is a standard interface for returning OLAP data. OLE DB for OLAP is a high-
performance COM interface that does not support OLE Automation. ADO MD is Microsoft’s
extension to ADO for accessing and manipulating data cubes [16, 17].
Figure 3.2 Relationship of cubeBrowser to the Analysis Server
3.4.1 Browsing OLAP objects
The component cubeBrowser can be used in OLAP applications to allow end
users to browse OLAP cubes, to view the properties of the cubes and their
underlying structures, and to execute analytical queries for their business questions.
The cube schema information is one of the two options for accessing data from OLAP
cubes; it includes the concept of an OLAP database containing all the cubes and their
underlying structures. The other option is the execution of analytical
queries and the display of the queried results for business analysts [15, 16].
The basic workflow for using the component cubeBrowser to browse the
cube objects is shown in Figure 3.3 and is summarized as follows:
A. Retrieving the Information of cube schema
a. Setting up connection string and connect to server.
b. Displaying the results.
B. Execution of an analytical query
a. Setting up the direct connection to the Analysis Server.
b. Displaying the hierarchical structures of an OLAP database.
c. Constructing the MDX queries and displaying the retrieval results.
d. Illustrating the definition of a particular OLAP cube and its underlying
dimensions.
3.4.1.1 Retrieving information of cube schema
The information of the cube schema includes the concept of an OLAP database
containing all the cubes and their underlying structures. In order to get information of the
cube schema, the first step is to set up a connection to the Analysis Services engine. The
connection string consists of values for the provider, data source, initial catalog, and
other user and system information.
Table 3.1 lists the primary values needed to construct a connection string. The
provider is the name of the OLE DB for OLAP provider which is used to connect to the
OLAP engine. In the Analysis Services, the value is MSOLAP2, which is the name of
the Microsoft OLE DB Provider for OLAP Services 8.0 [19]. The data source is the
hostname of the server. The initial catalog is the particular database object in the specified
server.
Figure 3.3. The basic workflow of browsing OLAP cube data using cubeBrowser
Table 3.1 Values of the connection string

Provider: Name of the OLE DB for OLAP provider used to connect to the OLAP
engine. In the Analysis Services, this value is MSOLAP2.
Data Source: The location of the server, expressed as a hostname.
Initial Catalog: The name of the OLAP database object to be connected to.
User ID: Username used to connect to the server.
Password: Password used to connect to the server.
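OLE DB connection strings are assembled as semicolon-delimited key/value pairs. A small Python helper can build one from the Table 3.1 parameters; the default values here are only examples, and the exact provider string depends on the installed Analysis Services version:

```python
def build_connection_string(data_source, initial_catalog,
                            provider="MSOLAP.2", user_id=None, password=None):
    """Assemble an OLE DB for OLAP connection string from the
    parameters listed in Table 3.1 (example default values)."""
    parts = [f"Provider={provider}",
             f"Data Source={data_source}",
             f"Initial Catalog={initial_catalog}"]
    if user_id is not None:
        parts.append(f"User ID={user_id}")
    if password is not None:
        parts.append(f"Password={password}")
    return ";".join(parts)

conn_str = build_connection_string("myserver", "cardio")
```

In the thesis the equivalent string is passed to the ADO MD Catalog object when opening the connection.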
After construction of the connection string, the component cubeBrowser provides a
method to connect an ADO MD Catalog object to the server and the database object
specified in the connection string. The detailed hierarchical structure of a cube can be
viewed by using the method ViewCubeStrct of the component cubeBrowser after
specification of a particular cube. This method uses the CubeDef object of ADO MD to
display the definition of a particular OLAP cube and its underlying dimensions [15, 16].
In summary, by using the component cubeBrowser in conjunction with ADO MD
object in the OLAP application, the end user can retrieve the complete information about
the structure of any cube stored in the Analysis Services [20, 21].
3.4.1.2 Analytical querying of cube data
In addition to drilling down and displaying the object schema of a particular OLAP
database, the component cubeBrowser provides features to support querying an
Analysis Services cube with MDX. The result of an MDX query of a cube is
returned in a structure called a Cellset [16].
The query language for manipulating data through ADO MD is called
Multidimensional Expressions (MDX). The MDX syntax supports the definition and
manipulation of multidimensional objects and the data stored in the cubes of the Analysis
Server [22]. In addition to its query capabilities, the MDX can be used to define the cube
structures and to change the data in some cases. It also can be used in conjunction with
ADOMD to build client applications to access OLAP data for business analysts [16, 23].
The following steps are required to process an MDX query:
a. Creating a new Cellset object:
A Cellset object is used to store the results of a multidimensional MDX query in the
ADO MD object model. The Cellset object is created based on an MDX query for the
user’s analysis.
b. Establishing the connection:
To make a connection to the Analysis Services engine, it is necessary to specify the
values of the provider, data source and initial catalog of the connection string of a Cellset
object.
c. Constructing an MDX query:
The general syntax for an MDX statement is shown as follows:
SELECT <member selection> ON axis1,
       <member selection> ON axis2, ...
FROM <cube name>
WHERE <slicer>
The three clauses shown above describe the nature and scope of an MDX query. The
axis clause specifies the data information wanted and the format to display the results. A
FROM clause defines the specific cube which contains the required data. A WHERE
clause is used to specify the conditional selection for the data slicing. The component
cubeBrowser provides a function to set up the MDX query based on the user’s
specification and the analytical questions.
d. Performing the query and populating the results:
After constructing the required query, the component cubeBrowser provides a
method to open a specific Cellset object. Once the Cellset object is open, the resulting
data can be traversed along the positions of the Cellset and the value of each cell can
be displayed.
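The query construction in step c amounts to simple string assembly over the three clauses. The sketch below shows the idea in Python; the measure, dimension, and cube names are hypothetical, and in the thesis the resulting statement is handed to an ADO MD Cellset for execution:

```python
def build_mdx(columns, rows, cube, slicer=None):
    """Assemble a basic MDX SELECT statement from its three clauses:
    axis member selections, the source cube, and an optional slicer."""
    mdx = f"SELECT {columns} ON COLUMNS, {rows} ON ROWS FROM [{cube}]"
    if slicer is not None:
        mdx += f" WHERE ({slicer})"   # conditional selection for data slicing
    return mdx

query = build_mdx(columns="{[Measures].[Cholesterol]}",
                  rows="[Location].Members",
                  cube="test1")
```

Keeping the clause order fixed (axes, then FROM, then WHERE) mirrors the general MDX syntax shown above.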
3.5 The DMBuilder Component
In addition to the programmatic access to the OLAP cube resources, the Decision
Support Objects (DSO) can also be used to create and maintain data mining objects
programmatically [10, 11]. The component DMBuilder, developed in this work, acts on
top of DSO to give software developers direct programmatic access
to the data mining functionality within the Analysis Services. The analysis component
DMBuilder provides an object model for programming a varied set of objects,
including servers, databases, mining structures, and algorithms, as well as OLAP
cube objects. It also allows developers to embed data mining functionality into
applications to meet users’ mining requirements. The architecture and the logical
relations of the component DMBuilder to DSO are depicted in Figure 3.4.
Figure 3.4 The architecture and logic relations of DMBuilder with DSO
The following steps describe the basic operations involved in the process of creating
a data mining model programmatically using the developed DMBuilder component in
conjunction with DSO [17] (Figure 3.5):
1. Connecting to the target Analysis server:
The component DMBuilder can connect to the target Analysis server through the
ConnectToServer function with the user-specified server name.
2. Selecting a target database object which contains the OLAP cube data sources:
After connecting to the target server, the database object which contains the target
OLAP cube data sources can be selected and set up from the target server by using the
component’s SelectDbObj function.
Figure 3.5 Flow Logic of the DMBuilder Component
3. Creating a new data mining model:
A new data mining model object can be created by using the AddNewMiningModel
function of the DMBuilder component with the user-specified mining model name and
class type. When an OLAP mining model is created, the class type is set to sbclsOLAP.
4. Creating and assigning a mining model role:
Using the “AddMiningModel” function of the DMBuilder component, the user-specified
mining model role can be created and assigned to the new OLAP mining model object.
5. Setting the needed properties for the new mining model:
Several properties need to be set for the OLAP mining model objects.
Table 3.2 summarizes the properties required for the OLAP mining model
object. These properties can be set by using the “SetModelProperty” function of the
component DMBuilder.
Table 3.2 Listing of properties required for OLAP mining model objects [17]

Case dimension: Defines the case dimension.
Case level: Defines the case level of the case dimension. It identifies the
lowest level in the dimension.
Mining model algorithm: Defines the data mining algorithm provider. In the
Analysis Services, there are two types: Decision Trees and Microsoft Clustering.
Source cube: Defines the OLAP cube used for training data.
Subclass type: Defines the column option type. The value for OLAP mining model
objects is set to sbclsOLAP.
6. Creating a new mining model column and setting its properties:
A data mining column has several properties which need to be set for the new
mining model. The most important properties are data type, content type, and usage. In
data mining with SQL Server 2000 Analysis Services, there are four types of
column usage: input, predict, disabled, and key. The component DMBuilder provides a
function called “EnableColumnProperty” to handle this task and to send the column
metadata to the server.
7. Training and processing the mining model object:
At this point, all the necessary properties and definitions required for the target
mining model are complete. Before the mining model can be used for analysis, it needs
to be trained to find useful information or patterns in the data. This processing step is
executed in the server, and the time needed depends on the amount of
data involved and on the complexity of the analytical category. Before training and
processing, the model contains only the defined metadata; after processing, the
hidden patterns are stored in the model. The ProcessMiningModel function of the
component DMBuilder handles this task using the ProcessFull option [3, 11].
3.6 Conclusions
The analysis components cubeBuilder, cubeBrowser, and DMBuilder provide a set
of functions for creating and managing OLAP solutions and data mining models in the
Analysis server. Fully compatible with the .NET environment, these components let
developers easily embed code into user-specific applications to build and process the
target OLAP solutions and mining model systems. Using these data analysis components,
the SQL Server 2000 business intelligence features can be integrated directly into user-
friendly applications, and OLAP solutions can be created and managed
programmatically to meet users’ needs and specifications for their daily business analysis
and decision-making.
CHAPTER IV
CASE STUDIES AND RESULTS
The data analysis components developed in this thesis are applied to a case study
of a heart disease database in the Microsoft SQL Server 2000 environment. The
purpose of this case study is to provide user application interfaces, wrapped around the
analysis components, for building the OLAP cube, browsing the cube data, and
creating the mining model with the cardio test dataset. The results and implementations
of the case study are used to illustrate the advantages of using these data analysis
components for OLAP solutions and mining models. Each of the following
sections describes the practical aspects of the developed analysis components.
4.1 A Case Study of the Heart Disease Datasets
The heart disease datasets are collected from four different locations and are the
results of the heart disease diagnosis tests [24]. Each database has the same instance
format using only thirteen of a possible seventy-five attributes for analysis. Appendix A
provides the detailed descriptions of the heart disease datasets.
4.1.1 Heart Disease Sample File
The heart disease datasets are downloaded and saved as a Microsoft Access
2003 database. These database samples consist of four tables. The relationships among
these sample tables are depicted in Figure 4.1. The constructed schema resembles the
structure of a star schema.
Figure 4.1 Relationship of the heart disease test data
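The star-schema idea (one fact table of measures joined by key fields to descriptive dimension tables) can be illustrated with an in-memory SQLite sketch; the table and column names below are hypothetical stand-ins for the Access tables shown in Figure 4.1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, site TEXT);
CREATE TABLE fact_cardio (
    location_id INTEGER REFERENCES dim_location,  -- key field
    cholesterol REAL,                             -- measure
    max_heart_rate REAL                           -- measure
);
INSERT INTO dim_location VALUES (1, 'Cleveland');
INSERT INTO fact_cardio VALUES (1, 233.0, 150.0);
INSERT INTO fact_cardio VALUES (1, 250.0, 144.0);
""")
# A cube-style summary: aggregate a measure grouped by a dimension level.
rows = conn.execute("""
    SELECT l.site, AVG(f.cholesterol)
    FROM fact_cardio f JOIN dim_location l USING (location_id)
    GROUP BY l.site
""").fetchall()
```

The fact table holds only keys and numeric measures, while descriptive labels live in the dimension tables, which is exactly the separation the Cube Wizard expects when a fact table is selected.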
4.1.2 Software Implementation
The data analysis components are implemented in the Microsoft SQL Server 2000
environment using Visual Basic .NET (VB.Net) as the major programming language for
both the OLAP solutions and the mining model objects. The window front-end
applications implemented with the cubeBuilder and DMBuilder components are
stand-alone desktop software applications and require the Analysis Services to reside on
the same system. The advantage of this approach is that the runtime’s access security
will not be an issue for connection to the Analysis server. In addition, an ASP.Net web-
based application implemented with the cubeBrowser component is also developed,
using VB.Net as the major source language, for end users to browse the OLAP cube data.
4.2 Implementation of the cubeBuilder Component
The cardio cube builder interface, cardioCube, shown in Figure 4.2, implements the
cubeBuilder component in order to demonstrate the process of building an OLAP cube
from the heart disease database. The detailed procedure for building the cardio test cube
is described step by step in the following subsections.
Figure 4.2 Screenshot of the cardio cube builder interface
4.2.1 Creating a New Cube
When the form is loaded, only the “Data Source/Cube” section is visible, allowing
users to specify the name of the data source and the name of the new cube (Figure 4.3).
The data source name identifies the cardio data file used to build the cardio cube
(Figure 4.4). The name of the new cube is saved in the database object for future
reference. These names are added to the cardio database object on the Analysis server
through the SetDataSource and AddCubetoDb functions of the cubeBuilder component.
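The sequence behind this step can be sketched with the cubeBuilder functions listed in Appendix C. This is a minimal, hypothetical driver, not the application's actual code: the server name, database name, data source name, and file path are placeholders for the user's entries, and error handling is omitted.

```vbnet
' Hypothetical use of the cubeBuilder component (Appendix C).
' "LocalHost", "CardioDB", "CardioSource", and the .mdb path are placeholders.
Dim builder As New CubeBuilder("LocalHost", "Microsoft.Jet.OLEDB.4.0", _
                               "C:\data\cardio.mdb")
builder.ConnectToServer("LocalHost", builder.server)

' Locate the cardio database object on the Analysis server, creating it if needed.
Dim db As DSO.MDStore
If builder.FindDataProj("CardioDB", builder.server) Then
    db = builder.SetDataProj("CardioDB", builder.server)
Else
    db = builder.AddNewDataProj("CardioDB", builder.server)
End If

' Register the Access file as a data source, then add the new cube "test1".
Dim ds As DSO.DataSource = builder.AddNewDataSource("CardioSource", db)
builder.SetLinkDataSource(builder.GetDataLink(builder.DSProvider, builder.DataPath), ds)
Dim cube As DSO.MDStore = builder.AddNewCube(builder.server, "CardioDB", _
                                             "CardioSource", "test1")
```

The SetDataSource and AddCubetoDb functions named in the text wrap steps like these; the exact call sequence inside the application may differ.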
Figure 4.3 Screenshot of the “Data Source/Cube” section
Figure 4.4 Screenshot of sample entries for both sections of “Data Source/Cube” and “Specify Fact/Measures”
4.2.2 The Fact Table and Measures Selections
After the data source has been set and the new cube name “test1” has been added to
the target database object, the “Specify Fact/Measures” section becomes visible so that
the user can specify the fact table and the measures to be used in building the cardio
cube (Figure 4.4). The fact table lists the core features for the queries used in the
analysis; it contains a column for each measure as well as a column for each dimension.
The measures are a set of numeric data based on column values of the fact table and
are the key indicators of the user’s primary analytical interest [6]. Figure 4.5 shows
the sample entries for the selection of the fact table and measures.
Figure 4.5 Screenshot of sample entries of “Specify Fact/Measure” section
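The fact-table and measure selections map onto the AddFactTblToCube and AddMeasureToCube functions of Appendix C. A minimal sketch, assuming builder is a CubeBuilder instance and cube is the DSO.MDStore for the new cube; the table and column names are hypothetical, not the actual cardio schema:

```vbnet
' Hypothetical fact table and measure column; "cardio_fact" and "chol"
' are placeholders for the user's selections.
builder.AddFactTblToCube(cube, """cardio_fact""")

' Each selected measure becomes a numeric column of the fact table,
' aggregated with SUM (see AddMeasureToCube in Appendix C).
builder.AddMeasureToCube(cube, "Cholesterol", "serum cholesterol (mg/dl)", _
                         "cardio_fact", "chol")
```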
4.2.3 Adding Dimensions to the Cube
Dimensions are the categories by which the data are analyzed. As shown in Figure 4.6,
the “Add Dimensions to Cube” section is used to add dimensions to the cube.
Pre-defined shared dimensions are available in the cardio database object; these
dimensions can be selected and added to the cube through the AddDimToCube function
of the cubeBuilder component after clicking the “Add Dimension” button. Figure 4.7
shows sample entries for a dimension and its related key column.
Figure 4.6 Screenshot of the “Add Dimensions to Cube” section
Figure 4.7 Screenshot of sample entries for cube dimension
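Attaching a shared dimension reduces to a single component call. A sketch, assuming builder and cube as before; “Location” is a hypothetical shared dimension already defined in the cardio database object:

```vbnet
' AddShareDDimToCube (Appendix C) copies a pre-defined shared dimension
' into the cube by name and updates the cube definition.
builder.AddShareDDimToCube(cube, "Location")
```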
4.2.4 Processing and Building the New Cube
After the measures, dimensions, and fact table of the cube have been determined, the
“Process/Build Cube” section of the form becomes visible, as shown in Figure 4.8.
When the “Build Cube” button is clicked, multidimensional OLAP (MOLAP) is chosen
as the storage mode for the cardio cube [1, 6]. The storage format affects both the
disk-space requirements and the data-retrieval performance. The MOLAP mode is
chosen because it stores the fact data and the aggregations on the Analysis server in a
space-efficient, highly indexed multidimensional form [1, 5]. In addition, MOLAP mode
summarizes the transactions into multidimensional views ahead of time. Data retrieval
from these databases is extremely fast because all calculations are pre-generated when
the cube is created.
Figure 4.8 Screenshot of the “Process/Build Cube” section
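The build step can be sketched as follows, assuming builder and cube as before. Setting the storage mode directly through the DSO OlapMode property is shown for illustration only; whether the component sets it this way internally is an assumption:

```vbnet
' Choose MOLAP storage, then fully process the cube so all aggregations
' are pre-generated on the Analysis server.
cube.OlapMode = DSO.OlapStorageModes.olapmodeMolapIndex
cube.Update()
builder.ProcessCube(cube)   ' wraps DSO.ProcessTypes.processFull
```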
4.2.5 The Results
The detailed hierarchical database objects before and after the process of building
the cardio cube are depicted in Figures 4.9 and 4.10, respectively. The difference
between the two figures is that the new sample cube has been added to the cube folder
of the cardio database object. These figures do not show the cube data itself; the
detailed cube data can be accessed, as described in the following section, with the
web-based application Cardio Cube Browser, which is implemented with the
cubeBrowser component developed in this thesis.
Figure 4.9 Screenshot of the cardio test database object before building the new cardio cube
Figure 4.10 Screenshot of the cardio test database object after building the sample “cube1”
4.3 Implementation of the cubeBrowser Component
The ASP.NET web-based application, written in VB.NET, implements the
cubeBrowser component so that end-users can browse the cardio cube data.
The application’s user interface, developed in this work, is contained in a single web
form, called cubeBrowser.aspx, as shown in Figure 4.11 [20, 21]. The application
provides the following functional features for the user when retrieving cube data:
A. Connecting to the Analysis server where the target cardio cube is located
B. Retrieving the cardio cube data based on the user’s specifications
Figure 4.11 Screenshot of the web form cubeBrowser.aspx
4.3.1 Connection to the Analysis Server
To query the cardio cube, the first stage is to set up the connection to the Analysis
server, where the target cardio cube is located. Three functions in the cubeBrowser
analysis component, ConnectToServer, SetUpDatabase, and SetUpDataSource, are used
to connect to the server and to set up the database object and data sources on the
Analysis server, respectively. When the page is requested by the user, the server
processes the request and sends the page to the browser. In addition, the server connects
to the Analysis server and lists the available cubes of the cardio database object for the
user to view, as shown in Figure 4.12.
Figure 4.12 Screenshot of listing of available cube
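Using the cubeBrowser class from Appendix D, the page-load logic can be sketched as follows; the server and database names, and the ListBox control lstCubes, are hypothetical:

```vbnet
' Hypothetical Page_Load logic; "LocalHost" and "CardioDB" are placeholders.
Dim browser As New cubeBrowser("LocalHost", "CardioDB")

' Connect to the catalog and list every cube, with its dimensions and
' levels, in a list box on the page.
Dim cat As Object = browser.ConnectToCatalog(browser.GetConCatalogString())
browser.ViewCubeStruct(lstCubes, cat)
```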
4.3.2 Retrieving the Cardio Cube Data
Once a connection has been made to the OLAP data source, the multidimensional
data of the cardio cube can be queried and manipulated through the MDX query
language [22, 23]. The first step in creating the MDX query is to select the target cube
from the dropdown list (Figure 4.13). After specifying the target cube, the user selects
the measures to view, as shown in Figure 4.13.
Figure 4.13 Screenshot of specifying cube entry and measures
As shown in Figure 4.14, there are two pre-defined queries the user can select to view
the cardio cube data:
a. Pain type-location data
This option displays the cardio cube data for the different chest pain types, broken
down by geographical location, for the selected target measures (Figure 4.15).
b. Patient-pain type data
This option displays the cube data for the different patients with the selected chest
pain type for the target measures. It requires the user to select a chest pain type from
the dropdown list, as shown in Figure 4.16.
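A pre-defined query of the first kind might be issued as the following MDX statement, assuming browser is a cubeBrowser instance; the dimension, level, and measure names are placeholders, not the actual cardio cube schema:

```vbnet
' Hypothetical "pain type by location" query against the cube "test1".
Dim mdx As String = _
    "SELECT { [Measures].[Cholesterol] } ON COLUMNS, " & _
    "CROSSJOIN( [Pain Type].Members, [Location].[Country].Members ) ON ROWS " & _
    "FROM [test1]"

' ConnectToCube (Appendix D) opens the connection, executes the MDX,
' and returns the resulting cellset for display in the grid.
Dim cells As Object = browser.ConnectToCube(browser.GetCellConnectString(), mdx)
```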
Figure 4.14 Screenshot of selections of measures and the pre-defined view options
Figure 4.15 Screenshot of selections of location for Pain-Type option
Figure 4.16 Screenshot of selections of pain-type for Patient option
4.3.3 Displaying the Cardio Cube Data
After selecting the measures and specifying the view option, the user clicks the
“Browse” button; the server then processes the request and displays the cube data in a
grid format, as shown in Figures 4.17 and 4.18.
Figure 4.17 Results of cube data for Pain-Type option with test country
Figure 4.18 Results of cube data for the angina chest pains per patient test city
4.3.4 Drill-down and Drill-up Capabilities
OLAP tools organize data in multiple dimensions and hierarchies. Dimensions are
usually associated with hierarchies, which organize the data into levels. Drill-down and
drill-up are the two analytical techniques by which the user navigates among levels of
data, ranging from the most summarized (up) to the most detailed (down) [20, 21]. For
example, when viewing the cardio cube data for different cities, a drill-down operation
on the patient test center dimension would display each of the test centers, tc001 through
tc004, as shown in Figure 4.19. A drill-up operation goes in the reverse direction,
moving to a higher level and displaying the data by test country, as shown in Figure
4.20.
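In MDX terms, a drill-down can be expressed with the standard DrillDownMember function; the dimension and member names below are placeholders:

```vbnet
' Hypothetical drill-down: expand one country to its test centers
' (e.g. tc001 through tc004) while the other countries stay summarized.
Dim mdxDrill As String = _
    "SELECT { [Measures].[Cholesterol] } ON COLUMNS, " & _
    "DrillDownMember( { [Patient].[Country].Members }, " & _
    "                 { [Patient].[Country].[USA] } ) ON ROWS " & _
    "FROM [test1]"
```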
Figure 4.19 Screenshot of drill-down to the test center level of Patient option
Figure 4.20 Screenshot of drill-up to the country’s level of Patient option
4.4 Implementation of the DMBuilder component
Figure 4.21 depicts the application interface DMMBuilder, which implements the
DMBuilder component. This interface is used to create a mining model, using the
Microsoft Decision Trees algorithm, from the cardio cube data created in the previous
section. The application interface is designed as a simple Windows form, coded in
VB.NET in the MS SQL server 2000 environment [12]. As shown in Figure 4.21, the
“Mining Model Builder” form is divided into five groups. The following steps describe
how to use this application form to build the mining model from the cardio cube data.
Figure 4.21 Screenshot of the main interface DMMBuilder
Step 1: Setting up server and database information:
The first step in creating the mining model is to provide the names of the Analysis
server and the database on which the user wants to perform the mining model task,
along with the name under which the mining model’s attributes will be stored, as shown
in Figure 4.22. After the “OK” button is clicked, an empty mining model is created on
the user-specified server and added to the user-specified database object. In addition,
the “Mining Model Setup” section becomes available for the rest of the process, as
shown in Figure 4.23.
Figure 4.22 Screenshot of the “Server/Database” section
Figure 4.23 Screenshot of Mining model setup
Step 2: Setting up the mining model role:
The “Mining Model Role” screen collects the mining model role information used to
set up the security role for the new mining model, as shown in Figure 4.24. The
SetMiningRole method of the DMBuilder component performs this task.
Figure 4.24 Screenshot of setting the mining model role
Step 3: Setting up the properties of the mining model:
In this step, the user provides the data source name, the source cube name, the case
dimension, and a general description of the model, and also specifies the mining
algorithm for the target mining model (Figure 4.25). The Microsoft Decision Trees
algorithm is chosen as the prediction method [10]. After the “Add to DB” button is
clicked, this information is written into the corresponding properties of the target
mining model.
Figure 4.25 Screenshot of setting properties and algorithm for the mining model
Step 4: Adding analysis column attribute:
The properties needed for the new data mining model column are set through the
“Analysis Column Entry” form section, as shown in Figure 4.26. In this step, the user
identifies the training case and the predictive outcome for the analysis.
Figure 4.26 Screenshot of setting the attributes of analytical column
Step 5: Saving and processing data mining model:
The new mining model object can be saved to the Analysis server by clicking the
“Save DMM” button (Figure 4.26). At this point the new mining model has been
created but not yet processed. Although a mining model can be saved without being
processed, it cannot be viewed until processing is complete.
After the “Process DMM” button is clicked, the new mining model is fully processed,
and the information about the patterns and rules discovered in the training data is stored
as the mining model content. The actual data from the training dataset is not stored in
the target server database. Figure 4.27 is a screenshot of the content detail after
processing the cardio mining model.
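The five steps above can be sketched end-to-end with the clsBuildMiningModel class from Appendix E. The server, database, and model names are placeholders, and the SubClassTypes and MiningAlgorithm values shown are assumptions about how the component configures a cube-based decision-tree model:

```vbnet
' Hypothetical DMBuilder workflow; "LocalHost", "CardioDB", and
' "CardioDMM" are placeholders.
Dim dm As New clsBuildMiningModel()
Dim srv As DSO.Server
dm.ConnectToServer("LocalHost", srv)

Dim db As DSO.MDStore = srv.MDStores.Item("CardioDB")
Dim model As DSO.MiningModel
If Not dm.IsExistingModel(db, "CardioDMM") Then
    ' sbclsOlap: the model's cases come from an OLAP cube, not a relational table.
    dm.AddNewMiningModel(db, "CardioDMM", DSO.SubClassTypes.sbclsOlap, model)
End If

' Steps 3-5: choose the algorithm, save, then fully process the model so
' its discovered rules can be browsed.
model.MiningAlgorithm = "Microsoft_Decision_Trees"
model.Update()
model.Process(DSO.ProcessTypes.processFull)
```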
Figure 4.27 Screenshot of the cardio mining model using Microsoft Decision Trees Algorithm
CHAPTER V
DISCUSSIONS & FUTURE WORKS
This chapter summarizes the main contributions and conclusions of this thesis
regarding the data analysis components with OLAP solutions in the Microsoft SQL
server 2000 system. It also outlines some directions for future work based on the
current results.
5.1 Contributions and Evaluations
The main purpose of this thesis is to develop data analysis components that serve as
a foundation for developers to build user-friendly interface applications for OLAP
solutions. These analysis components can also be used to hide the underlying
complexity and specialized terminology from non-technical users in the process of
building OLAP cubes and mining model systems.
Our contributions are summarized as follows:
A detailed review of the functionality of Analysis Manager in the process of
building and viewing OLAP cubes and building data mining models
Development of the data analysis components for OLAP solutions
Development of the stand-alone desktop interfaces implemented with the
cubeBuilder and DMBuilder components
Application of the components to a case study of the cardio disease dataset,
through a user-specific application implemented with the data analysis
components developed in this work
Development of a web-based interface application implemented with the
cubeBrowser component; this interface is also used to browse the cardio cube
data created in the current work
In addition to these contributions, a detailed review of the functionality of Analysis
Manager in building and viewing OLAP cubes and in building data mining models is
included in this thesis [5, 7].
Both Microsoft Analysis Manager and the analysis components can perform the
following tasks for OLAP solutions:
A. Creating the database objects and specifying the data sources in the Analysis server
B. Building and processing the OLAP cubes
C. Creating and processing the data mining models
D. Specifying the storage options and optimizing the query performance
E. Browsing the cube data
Although Analysis Manager provides wizards and editors to help users build and
process OLAP cubes and mining models, the technical terminology and the need to
fully understand the underlying structure remain barriers to using these tools efficiently.
The analysis components, by contrast, help developers design user-friendly interface
applications that hide the technical complexities from non-technical users. The analysis
components also make it possible to assemble applications much more rapidly and
efficiently; a key to developing applications quickly is the ability to reuse existing
pre-built components to meet the user’s application requirements [6].
Analysis Manager installs the PivotTable Service on the database server, which
includes an OLE DB provider that allows connections to OLAP data sources. The
PivotTable Service is an OLE DB provider for multidimensional data and data mining
operations [7, 14]. It is the primary method of communication between a client
application and a multidimensional data source or data mining model, and it is used to
build client applications that interact with multidimensional data. It also provides
methods for online and offline data-mining analysis of multidimensional and relational
data, and it offers connectivity to the multidimensional cubes and data-mining models
managed by Analysis Services.
The major limitation of the PivotTable Service is that it must be installed on the
client machine; otherwise, the client’s PivotTable control cannot communicate with the
OLAP data sources.
To overcome this limitation, the data analysis component implemented in the
web-based OLAP browsing application interface can provide reusable business
solutions and can disseminate information more effectively. The architecture presented
here is designed to use several sophisticated technologies, including SQL 2000 Analysis
Services, the cubeBrowser component, and ASP.NET, to the best of their capacities.
5.2 Future Works
This research developed the data analysis components for OLAP solutions and
mining model systems and demonstrated their functionality with the cardio databases.
However, an analysis component for viewing mining model data, and its
implementation, have not been developed; a component for visualizing mining models
is left as future work.
The new release, Microsoft SQL server 2005, enhances many Business Intelligence
features and supports building complex business analytics with Analysis Services
[25, 26]. In addition, ADOMD.NET uses the XML for Analysis protocol to
communicate with the analytical data source [27]. More work will be needed to develop
data analysis components that use the new features of Microsoft SQL server 2005 and
ADOMD.NET to provide a user-friendly interface for OLAP solutions. In addition to
unloading the design burden from developers, such analysis components can help end
users navigate a rich, complex data set with a higher degree of confidence in their
analysis.
BIBLIOGRAPHY
[1] Mailvaganam, H. 2004. “Introduction to OLAP: Slice Dice and Drill”. Retrieved August 22, 2005 from http://www.dwreview.com/OLAP/Introduction_OLAP.html.
[2] The OLAP Council. OLAP and OLAP Server Definitions. Retrieved August, 2005 from http://altaplana.com/olap/glossary.html.
[3] Thearling, K. 1995. “From Data Mining to Database Marketing”. Data Intelligence Group.
[4] Thearling, K. 2000. An Introduction to Data Mining: Discovering Hidden Value in Your Data Warehouse. Retrieved August 18, 2005 from http://www.thearling.com/text/dmwhite/dmwhite.htm.
[5] Pearson, W. 2002. “Introduction to SQL Server 2000 Analysis Services: Creating Our First Cube”. http://www.databasejournal.com/feature/mssql/article.php/1429671.
[6] OLAP Train and Jacobson, R. 2000. Microsoft SQL Server 2000 Analysis Services Step by Step. Microsoft Press.
[7] Garcia, L. 2003. “Understanding Microsoft SQL Server 2000 Analysis Services”. http://www.phptr.com/articles/article.asp.
[8] Bertucci, P. 2002. “Microsoft SQL Server Analysis Services”. In Microsoft SQL Server 2000 Unleashed, Second Edition, Chapter 42, 1347-1392.
[9] Soni, S. and Kurtz, W. 2001. “Analysis Services: Optimizing Cube Performance Using Microsoft SQL Server 2000 Analysis Services”. Retrieved April, 2005 from http://msdn.microsoft.com/library/en-us/dnsql2k/html/olapunisys.asp.
[10] de Ville, B. 2001. “Data Mining in SQL Server 2000”. SQL Server Magazine. http://www.windowsitpro.com/SQLServer/Article/ArticleID/16175/16175.html.
[11] Charran, E. 2002. “Introduction to Data Mining with SQL Server”. Retrieved August, 2005 from http://www.sql-server-performance.com/ec_data_mining.asp.
[12] Rae, S. 2005. “Building Intelligent .NET Applications: Data-Mining Predictions”. http://www.awprofessional.com/articles/article.asp.
[13] Data Mining: http://www.megaputer.com/dm/dm101.php3.
[14] Microsoft OLE DB Programmer's Reference: http://msdn.microsoft.com/library.
[15] Brust, A. J. 1999. “Put OLAP and ADO MD to Work”. VBPJ, November 1999, 94-97.
[16] Youness, S. 2000. “Using MDX and ADOMD to Access Microsoft OLAP Data”. http://www.topxml.com/conference/wrox/2000_vegas/text/sakhr_olap.pdf.
[17] Whitney, R. 2002. “Collaboration through DSO”. http://www.windowsitpro.com/SQLServer/Article/ArticleID/26564/26564.html.
[18] Rice, F. C. 2002. “Programming OLAP Databases from Microsoft Access Using DSO”. http://msdn.microsoft.com/library/default.
[19] Microsoft OLE DB Programmer's Reference: http://msdn.microsoft.com/library.
[20] Nolan, C. 1999. “Manipulate and Query OLAP Data Using ADOMD and MDX, Part I”. Microsoft Systems Journal, August 1999.
[21] Nolan, C. 1999. “Manipulate and Query OLAP Data Using ADOMD and MDX, Part II”. Microsoft Systems Journal, September 1999.
[22] Pearson, W. 2002. “MDX in Analysis Services”. Retrieved December, 2004 from http://www.databasejournal.com/features/mssql/article.php/1495511.
[23] Pearson, W. 2002. “MDX Essentials”. Retrieved December, 2004 from http://www.databasejournal.com/features/mssql/article.php/1550061.
[24] Heart Disease Database. http://www.ics.uci.edu/~mlearn/MLSummary.html.
[25] Frawley, M. 2004. “Analysis Services Comparison: SQL 2000 vs. 2005”. Retrieved October, 2005 from http://www.devx.com/dbzone/Article/21539.
[26] Utley, C. 2005. “Solving Business Problems with SQL Server 2005 Analysis Services”. Retrieved January, 2006 from http://www.microsoft.com/technet/prodtechnol/sql/2005/solvngbp.mspx.
[27] Analysis Services Data Access Interfaces: ADOMD.NET Client Programming. Retrieved January, 2006 from http://msdn2.microsoft.com/en-us/library/ms123483.aspx.
APPENDICES
APPENDIX A
DATASET USED FOR CASE STUDIES
The database used in this work was downloaded from the website of the Repository of
Machine Learning Databases [24]. The heart-disease directory contains four databases
concerning heart disease diagnosis. The data were collected at the following locations:
1. Cleveland Clinic Foundation (Cleveland.data)
2. Hungarian Institute of Cardiology, Budapest (Hungarian.data)
3. V. A. Medical Center, Long Beach, CA (long-beach-va.data)
4. University Hospital, Zurich, Switzerland (Switzerland.data)
Each database has the same instance format. While the databases contain seventy-six
raw attributes, all published experiments refer to using a subset of fourteen of them.
The authors of the databases have requested that any publications resulting from the
use of the data include the names of the principal investigators responsible for the data
collection. They are:
A. Creators:
1. Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.
2. University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
3. University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
4. V.A. Medical Center, Long Beach, and Cleveland Clinic Foundation:
   Robert Detrano, M.D., Ph.D.
B. Donors: David W. Aha ([email protected]) Date: July, 1988.
C. Attributes
a. age: age in years
b. sex: gender (1 = male; 0 = female)
c. cp: chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic)
d. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
e. chol: serum cholesterol in mg/dl
f. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
g. restecg: resting electrocardiographic results (0 = normal; 1 = having ST-T wave abnormality; 2 = showing probable or definite left ventricular hypertrophy by Estes’ criteria)
h. thalach: maximum heart rate achieved
i. exang: exercise-induced angina (1 = yes; 0 = no)
j. oldpeak: ST depression induced by exercise relative to rest
k. slope: the slope of the peak exercise ST segment
l. ca: number of major vessels (0-3) colored by fluoroscopy
m. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
n. num (prediction attribute): diagnosis (0 = healthy; 1, 2, 3, 4 = sick)
APPENDIX B
APPLICATION INTERFACE OF OLAP CUBE BUILDER
Figure B.1 Screenshot of the OLAP cube builder interface for the power users
APPENDIX C
SOURCE CODE OF CUBEBUILDER
This section consists of the source code for the analysis component, cubeBuilder,
which was written in the Visual Basic.NET programming language.
‘Visual Basic.NET source code
Public Class CubeBuilder

    'Declarations
    Public DataServer As New DSO.Server()
    Public DataSource As DSO.DataSource
    Public DataProj As DSO.MDStore
    Public Provider As String
    Public DataPath As String
    Public SerName As String

    'Initializations
    Sub New()
    End Sub

    Sub New(ByVal inServ As String, ByVal inProv As String, ByVal inPath As String)
        SerName = inServ
        Provider = inProv
        DataPath = inPath
    End Sub
    'Class properties
    Public ReadOnly Property Server() As DSO.Server
        Get
            Return DataServer
        End Get
    End Property

    Public Property DSProvider() As String
        Get
            Return Provider
        End Get
        Set(ByVal Value As String)
            Provider = Value
        End Set
    End Property

    Public Property DataProject() As DSO.MDStore
        Get
            Return DataProj
        End Get
        Set(ByVal Value As DSO.MDStore)
            DataProj = Value
        End Set
    End Property

    'Connects to the specified server
    Public Sub ConnectToServer(ByVal servName As String, ByRef serv As DSO.Server)
        serv.Connect(servName)
    End Sub

    'Closes the connection to the server
    Public Sub CloseServerConnect(ByRef inServer As DSO.Server)
        inServer.CloseServer()
    End Sub
    'Checking the validation status of a server
    Public Function ServerValid(ByRef serv As DSO.Server) As Boolean
        If serv.IsValid Then
            Return True
        Else
            Return False
        End If
    End Function

    'Finding the target database object in the server
    Public Function FindDataProj(ByVal db As String, ByRef dServ As DSO.Server) As Boolean
        If dServ.MDStores.Find(db) Then
            Return True
        Else
            Return False
        End If
    End Function

    'Adding a new database object to the server
    Public Function AddNewDataProj(ByVal db As String, ByRef dServ As DSO.Server) As DSO.MDStore
        Return dServ.MDStores.AddNew(db)
    End Function

    'Setting the database object
    Public Function SetDataProj(ByVal db As String, ByRef dServ As DSO.Server) As DSO.MDStore
        Return dServ.MDStores.Item(db)
    End Function
    'Searching for the specified data source
    Public Function FindDataSource(ByVal ds As String, ByRef dDB As DSO.MDStore) As Boolean
        If dDB.DataSources.Find(ds) Then
            Return True
        Else
            Return False
        End If
    End Function

    'Adding a new data source
    Public Function AddNewDataSource(ByVal ds As String, ByRef dDB As DSO.MDStore) As DSO.DataSource
        Return dDB.DataSources.AddNew(ds)
    End Function

    'Setting the data source
    Public Function SetDataSource(ByVal ds As String, ByRef dDB As DSO.MDStore) As DSO.DataSource
        Return dDB.DataSources.Item(ds)
    End Function

    'Building the data link connection string
    Public Function GetDataLink(ByVal p As String, ByVal dp As String) As String
        Return "Provider=" & p & ";Data Source=" & dp & _
               ";Persist Security Info=False;Jet OLEDB:SFP=True;"
    End Function
    'Setting the data link of a data source
    Public Sub SetLinkDataSource(ByVal dLink As String, ByRef ds As DSO.DataSource)
        ds.ConnectionString = dLink
        ds.Update()
    End Sub

    'Creating a database (shared) dimension
    Public Function CreateDbaseDimension(ByRef dDbase As DSO.MDStore, _
            ByRef dataSrc As DSO.DataSource, ByVal strDim As String, _
            ByVal strDescr As String, ByVal strFromClause As String, _
            ByVal strJoin As String, ByVal strDimType As String) As DSO.Dimension
        Dim dsoNewDim As DSO.Dimension
        dsoNewDim = dDbase.Dimensions.AddNew(strDim)
        dsoNewDim.DataSource = dataSrc
        dsoNewDim.Description = strDescr
        dsoNewDim.FromClause = strFromClause
        dsoNewDim.JoinClause = strJoin
        dsoNewDim.DimensionType = strDimType
        Return dsoNewDim
    End Function

    'Adding a level to the dimension table
    Public Sub AddLeveltoDim(ByRef dsoDim As DSO.Dimension, ByVal levStr As String, _
            ByVal strDimtbl As String, ByVal ColumnStr As String, _
            ByVal ColType As Short, ByVal colSize As Integer, ByVal EstSize As Integer)
        Dim dsoLev As DSO.Level
        Dim strKeyColumn As String
        dsoLev = dsoDim.Levels.AddNew(levStr)
        strKeyColumn = """" & strDimtbl & """" & "." & """" & ColumnStr & """"
        dsoLev.MemberKeyColumn = strKeyColumn
        dsoLev.ColumnType = ColType
        dsoLev.ColumnSize = colSize
        dsoLev.EstimatedSize = EstSize
        dsoDim.Update()
    End Sub

    'Alternative method for adding a level to the dimension table
    Public Sub AddLeveltoDim1(ByRef dsoDim As DSO.Dimension, ByVal levStr As String, _
            ByVal strDimtbl As String, ByVal ColumnStr As String, ByVal ColType As String)
        Dim dsoLev As DSO.Level
        Dim strKeyColumn As String
        dsoLev = dsoDim.Levels.AddNew(levStr)
        strKeyColumn = """" & strDimtbl & """" & "." & """" & ColumnStr & """"
        dsoLev.MemberKeyColumn = strKeyColumn
        dsoLev.ColumnType = CShort(ColType)
        dsoLev.ColumnSize = 255
        dsoLev.EstimatedSize = 1
        dsoDim.Update()
    End Sub

    'Adding a new cube to the database object
    Public Function AddNewCube(ByRef dSer As DSO.Server, ByVal dDB As String, _
            ByVal DtSrc As String, ByVal dtCube As String) As DSO.MDStore
        Dim dsoCube As DSO.MDStore
        dsoCube = dSer.MDStores.Item(dDB).MDStores.AddNew(dtCube)
        dsoCube.DataSources.AddNew(DtSrc)
        dsoCube.Update()
        Return dsoCube
    End Function

    'Adding the fact table to the cube
    Public Sub AddFactTblToCube(ByRef inCube As DSO.MDStore, ByVal strFactTblName As String)
        inCube.SourceTable = strFactTblName
        inCube.EstimatedRows = 100000
    End Sub
    'Adding a shared dimension to the cube
    Public Sub AddShareDDimToCube(ByRef inCube As DSO.MDStore, ByVal strDimName As String)
        inCube.Dimensions.AddNew(strDimName)
        inCube.Update()
    End Sub

    'Adding a measure to the cube
    Public Sub AddMeasureToCube(ByRef inCube As DSO.MDStore, ByVal inMeaText As String, _
            ByVal inDescr As String, ByVal factTbl As String, ByVal inField As String)
        Dim dsoMeasure As DSO.Measure
        dsoMeasure = inCube.Measures.AddNew(inMeaText)
        dsoMeasure.Description = inDescr
        dsoMeasure.SourceColumn = "" & factTbl & "." & "" & inField & ""
        dsoMeasure.SourceColumnType = ADODB.DataTypeEnum.adDouble
        dsoMeasure.AggregateFunction = DSO.AggregatesTypes.aggSum
        inCube.Update()
    End Sub

    'Processing the cube
    Public Sub ProcessCube(ByRef iCube As DSO.MDStore)
        iCube.Process(DSO.ProcessTypes.processFull)
    End Sub

End Class
APPENDIX D
SOURCE CODE OF CUBEBROWSER
This section consists of the source code for the analysis component, cubeBrowser,
which was written in the Visual Basic.NET programming language.
‘Visual Basic.NET source code
Public Class cubeBrowser
‘Declarations
    Public cbServer As String
    Public cbDatabase As String
    Public cbDBconnect As New ADODB.Connection()
    Public cbCellset As New ADOMD.Cellset()
‘Initialization
    Public Sub New(ByVal oSer As String, ByVal oDb As String)
        cbServer = oSer
        cbDatabase = oDb
    End Sub
‘Getting the connection string for Catalog object
    Public Function GetConCatalogString() As String
        Dim strTemp As String = " "
        strTemp = strTemp & "Provider=msolap; data source=" & cbServer
        strTemp = strTemp & "; Initial Catalog=" & cbDatabase & ";"
        Return strTemp
    End Function

    'Connecting to the Catalog object
    Public Function ConnectToCatalog(ByVal conStr As String) As Object
        Dim adomdCatlog As New ADOMD.Catalog()
        adomdCatlog.let_ActiveConnection(conStr)
        Return adomdCatlog
    End Function
‘Get Cellset connection string
    Public Function GetCellConnectString() As String
        Dim strCon As String = " "
        strCon = strCon & "Provider=msolap; data source=" & cbServer
        strCon = strCon & "; database=" & cbDatabase & ";"
        Return strCon
    End Function

    'Getting a connection to the cellset
    Public Function GetConnectToCell(ByVal olapDb As ADODB.Connection) As Object
        cbCellset.ActiveConnection = olapDb
        Return cbCellset
    End Function

    'Connecting to the Database object
    Public Function ConnectToDB(ByVal oS As String) As Object
        cbDBconnect.Open(oS)
        Return cbDBconnect
    End Function

    'Connecting to the cube
    Public Function ConnectToCube(ByVal oStr As String, ByVal oMdx As String) As Object
        cbDBconnect.Open(oStr)
        cbCellset.ActiveConnection = cbDBconnect
        cbCellset.Open(oMdx)
        Return cbCellset
    End Function
‘Displaying the Cellset I
    Public Sub ViewCubeStruct(ByRef lstBox As Object, ByRef inCat As Object, ByVal inCubeName As String)
        Dim cbDef As ADOMD.CubeDef
        Dim cbDim As ADOMD.Dimension
        Dim cbHir As ADOMD.Hierarchy
        Dim cbLev As ADOMD.Level
        Dim strDim As String
        Dim strLevel As String
        Dim strTemp As String

        cbDef = inCat.cubeDefs(inCubeName)
        strTemp = "Cube: " & inCubeName
        lstBox.Items.Add(strTemp)
        For Each cbDim In cbDef.Dimensions
            strDim = " -Dimension: " & cbDim.Name
            lstBox.Items.Add(strDim)
            For Each cbHir In cbDim.Hierarchies
                For Each cbLev In cbHir.Levels
                    strLevel = " -- " & cbLev.Name
                    lstBox.Items.Add(strLevel)
                Next
            Next
        Next
    End Sub
    'Displaying the Cellset II
    Public Sub ViewCubeStruct(ByRef lstBox As Object, ByRef inCat As Object)
        Dim cbDef As ADOMD.CubeDef
        Dim cbDim As ADOMD.Dimension
        Dim cbHir As ADOMD.Hierarchy
        Dim cbLev As ADOMD.Level
        Dim strDim As String
        Dim strLevel As String
        Dim strTemp As String

        For Each cbDef In inCat.CubeDefs
            strTemp = "Cube: " & cbDef.Name
            lstBox.Items.Add(strTemp)
            For Each cbDim In cbDef.Dimensions
                strDim = " -Dimension: " & cbDim.Name
                lstBox.Items.Add(strDim)
                For Each cbHir In cbDim.Hierarchies
                    For Each cbLev In cbHir.Levels
                        strLevel = " -- " & cbLev.Name
                        lstBox.Items.Add(strLevel)
                    Next
                Next
            Next
        Next
    End Sub
'Closing the object connection
Public Sub CloseConnection(ByVal iConn As Object)
    iConn.Close()
End Sub
End Class
APPENDIX E
SOURCE CODE OF DMBUILDER
This section consists of the source code for the analysis component, DMBuilder,
which was written in the Visual Basic.NET programming language.
'Visual Basic.NET source code
Public Class clsBuildMiningModel

'Declarations
Public dsoCol As DSO.Column

'Initialization
Public Sub New()
End Sub

'Clearing the object
Public Sub ClearObject(ByRef inObj As Object)
    inObj = Nothing
End Sub

'Connecting to the Server
Public Sub ConnectToServer(ByVal strSer As String, ByRef ser As DSO.Server)
    ser = New DSO.Server()
    ser.Connect(strSer)
End Sub
'Closing the server connection
Public Sub CloseServerConnection(ByRef s As DSO.Server)
    s.CloseServer()
End Sub
'Checking the Server connection status
Public Function IsServerConnect(ByRef ser As DSO.Server) As Boolean
    If ser.IsValid Then
        Return True
    Else
        Return False
    End If
End Function
'Checking the target object's status
Public Function IsExistingModel(ByRef db As DSO.MDStore, ByVal strName As String) As Boolean
    If db.MiningModels.Item(strName) Is Nothing Then
        Return False
    Else
        Return True
    End If
End Function
'Checking the target cube's status
Public Function IsValidCube(ByRef db As DSO.MDStore, ByVal sCube As String) As Boolean
    If db.MDStores.Find(sCube) Then
        Return True
    Else
        Return False
    End If
End Function
'Adding a new mining model
Public Sub AddNewMiningModel(ByRef db As DSO.MDStore, ByVal mName As String, ByVal dtType As DSO.SubClassTypes, ByRef dMM As DSO.MiningModel)
    dMM = db.MiningModels.AddNew(mName, dtType)
End Sub
'Adding a new model role
Public Sub AddNewMMRole(ByRef dmm As DSO.MiningModel, ByVal rName As String, ByRef dRole As DSO.Role)
    dRole = dmm.Roles.AddNew(rName)
End Sub
'Setting the properties of the target mining model
Public Sub SetModelProperty(ByRef dmm As DSO.MiningModel, ByVal dtSrc As String, ByVal mDescr As String, ByVal dtType As DSO.SubClassTypes, ByVal mmAlgo As String, ByVal srcCube As String, ByVal cDim As String, ByVal mTrainQ As String)
    With dmm
        .DataSources.AddNew(dtSrc, DSO.SubClassTypes.sbclsOlap)
        .Description = mDescr
        .MiningAlgorithm = mmAlgo
        .SourceCube = srcCube
        .CaseDimension = cDim
        .TrainingQuery = mTrainQ
        .Update()
    End With
End Sub
'Enabling the column's property
Public Sub EnableColumnProperty(ByRef dmm As DSO.MiningModel, ByVal strCol As String, ByVal CheckFlag As Boolean, ByVal InputSelect As Boolean, ByVal PredictSelect As Boolean)
    dsoCol = dmm.Columns.Item(strCol)
    If CheckFlag = True Then
        dsoCol.IsInput = InputSelect
        dsoCol.IsPredictable = PredictSelect
    End If
    dsoCol.IsDisabled = False
End Sub

'Saving the target mining model
Public Sub SaveMiningModel(ByRef dMM As DSO.MiningModel)
    dMM.LastUpdated = Now
    dMM.Update()
End Sub

'Processing the mining model
Public Sub ProcessMiningModel(ByRef dsoDMM As DSO.MiningModel, ByRef dsoLockType As DSO.OlapLockTypes, _
        ByRef dsoLockDescr As String, ByVal prcType As DSO.ProcessTypes)
    With dsoDMM
        .LockObject(dsoLockType, dsoLockDescr)
        .Process(prcType)
        .UnlockObject()
    End With
End Sub

End Class