Data Mining & Housing

  • 8/2/2019 Data Mining & Housing

    ABSTRACT

    Data warehousing supports business analysis and decision making by

    creating an enterprise-wide integrated database of summarized, historical information.

    It integrates data from multiple incompatible sources. By transforming data into

    meaningful information, a data warehouse allows the business manager to perform

    more substantive, accurate and consistent analysis.

    Data mining techniques can be implemented rapidly on existing software and

    hardware platforms to enhance the value of existing information resources, and can be

    integrated with new products and systems as they are brought online. When

    implemented on high-performance client/server or parallel processing computers,

    data mining tools can effectively analyze massive databases that support querying.

    A data warehouse is, of course, a database, but one that contains summarized

    information. Integrating data mining with a data warehouse yields benefits such as

    a better querying process, performance sharing and more reliable information.

    The following sections present the concepts of data warehousing and

    data mining.

    NARAYANA ENGG. COLLEGE

    By

    D. Ajith kumar (IVth CSIT) &

    SANJAY JOSHI (IIIrd ECE)

    Contents

    1) Introduction

    Features

    Decision Support Systems

    2) Data warehouse schemas

    3) Microsoft Data Warehousing Framework

    4) Data mining working procedure

    Data warehouse with data mining

    An approach to Client/Server data warehousing

    Applications

    Conclusion

    INTRODUCTION:

    Modern organizations are under enormous pressure from recent developments in

    technology. Clearly, we need rapid access to all kinds of information. To support this,

    we need to consider the past and identify relevant trends. To

    perform any trend analysis, we must have a database.

    In most organizations you will find very large databases in operation for

    normal daily transactions. These databases are known as operational

    databases; in most cases they have not been designed to store historical data or to

    respond to analytical queries, but simply to support the applications that handle day-to-day

    transactions.

    The second type of database found in organizations is the data warehouse. It is

    designed for strategic decision support and is largely built up from the databases that

    make up the operational database. The basic characteristic of a data warehouse is that

    it contains a vast amount of data, which can mean billions of records. Smaller, local data

    warehouses are called data marts.

    A data warehouse is designed especially for decision support queries. Therefore, only

    data that is needed for decision support is extracted from the operational data and

    stored in the data warehouse, along with the time when it was retrieved from

    the operational databases.

    Data warehousing

    Need for a data warehouse:

    To summarize large volumes of data.

    To integrate data from different sources.

    To give decision makers access to past data.

    To enable people to make informed decisions.

    FEATURES :

    1. Time dependent: The warehouse contains information collected over time, which

    implies there must always be a connection between the information in the

    warehouse and the time when it was entered.

    2. Non-volatile (permanent): Data in the data warehouse is never updated

    but used only for queries. End users who want to update the data must use the

    operational database. This means that the data warehouse will always be filled with

    historical data.

    3. Subject oriented: The warehouse is built around all the existing applications of the

    operational data. The data warehouse is designed specifically for decision

    support, while the operational databases contain information for day-to-day

    use.

    4. Integrated: In a data warehouse it is essential to integrate the information and

    make it consistent; only one name must exist to describe each individual entity.
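
    The features above can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name Warehouse, the field names and the CANONICAL_NAMES mapping are hypothetical, chosen only to show time-stamped, append-only, integrated loading.

```python
from datetime import datetime, timezone

# Hypothetical mapping from source-specific field names to the single
# name the warehouse uses for the same entity (the "integrated" feature).
CANONICAL_NAMES = {"cust_id": "customer_id", "CustomerNo": "customer_id"}


class Warehouse:
    """Append-only store: rows can be added and queried, never updated."""

    def __init__(self):
        self._rows = []

    def load(self, record, loaded_at=None):
        # Integrated: rename source-specific fields to one canonical name.
        row = {CANONICAL_NAMES.get(k, k): v for k, v in record.items()}
        # Time dependent: every row carries the time it entered the warehouse.
        row["loaded_at"] = loaded_at or datetime.now(timezone.utc)
        self._rows.append(row)

    def query(self, predicate):
        # Non-volatile: callers receive copies, so stored history cannot change.
        return [dict(r) for r in self._rows if predicate(r)]


wh = Warehouse()
wh.load({"cust_id": 7, "balance": 100})      # row from one source system
wh.load({"CustomerNo": 7, "balance": 80})    # same entity, another source
history = wh.query(lambda r: r["customer_id"] == 7)
```

    Both rows remain available under the one name customer_id, so the warehouse keeps the history rather than the latest value only.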

    DECISION SUPPORT SYSTEM :

    When designing a decision support system, particular importance should be placed on

    the requirements of the end-user and the h/w and s/w products that will be required.

    The requirements of the end-users: -

    Some end-users need specific query tools so that they can build their queries

    themselves. Others are interested only in a particular part of the information. We can

    build a specific type of application around this to speed up the query process.

    H/w and s/w products of a decision support system:

    Working in a client/server environment allows you great flexibility in choosing the

    appropriate s/w for end-users, because each individual need can be catered for on a

    local workstation. The h/w requirements depend on the type of data warehouse and

    the techniques with which you want to work.

    There are two basic types of data warehouses:

    1. Enterprise data warehouses: The enterprise data warehouse

    contains corporate-wide information integrated from multiple operational data sources

    for consolidated data analysis. Typically it is composed of several subject areas, such

    as customers, products and sales, and is used for both tactical and strategic decision

    making.

    2. Data marts: Data marts contain a subset of corporate-wide data

    that is built for use by an individual department or division of an organization. Unlike

    the enterprise data warehouse, data marts are often built from the bottom up by

    departmental resources for a specific support application or group of users. Data marts

    contain summarized and often detailed data about a subject area.
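
    As a rough illustration of a data mart holding a departmental, summarized subset, the sketch below uses Python's standard sqlite3 module. The table, view and column names are hypothetical, and deriving the mart as a view over warehouse data is only one possible arrangement; as noted above, marts are often built bottom-up instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical corporate-wide table.
    CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('retail', 'Pen', 10.0),
        ('retail', 'Pen', 5.0),
        ('wholesale', 'Lamp', 40.0);

    -- The mart keeps only one department's rows, summarized per product.
    CREATE VIEW retail_mart AS
        SELECT product, SUM(amount) AS total
        FROM warehouse_sales
        WHERE department = 'retail'
        GROUP BY product;
""")

mart_rows = conn.execute("SELECT product, total FROM retail_mart").fetchall()
```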

    DATAWAREHOUSE SCHEMAS :

    A multidimensional data model identifies the dimensions, their hierarchies, the

    measure functions, etc., for the design of a data cube. The data cube itself is realized

    in the design phase, for which various schemas are employed.

    1. Star schema :

    It is a modeling paradigm in which the data warehouse contains a single large fact table

    and a set of smaller dimension tables, one for each dimension.

    [Figure: star schema. A fact table (Dim1-key, Dim2-key, Dim3-key, Summary) is linked to the tables Dim1, Dim2 and Dim3, each holding its own attributes (Dim1Attrib, Dim2Attrib, Dim3Attrib).]

    Fact table:

    It contains detailed summary data.

    Each tuple consists of a foreign key to each dimension table.

    Each tuple corresponds to only one tuple in each dimension table.

    Dimension table:

    It consists of columns that correspond to the attributes of the dimension.

    One tuple in a dimension table may correspond to more than one tuple

    in the fact table.

    A 1:N relationship exists between each dimension table and the fact table.

    It is easy to understand and makes hierarchies easy to define.

    It reduces the number of physical joins and is easy to maintain.
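
    A star schema of this shape can be sketched with Python's standard sqlite3 module. The table names (fact_sales, dim_product, dim_store, dim_time), the columns and the sample rows are illustrative assumptions, not taken from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one per dimension, denormalized.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_time    (time_key    INTEGER PRIMARY KEY, year INTEGER);

    -- Fact table: one foreign key per dimension plus the measure.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        time_key    INTEGER REFERENCES dim_time(time_key),
        amount      REAL
    );

    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
    INSERT INTO dim_store   VALUES (1, 'Nellore'), (2, 'Chennai');
    INSERT INTO dim_time    VALUES (1, 2018), (2, 2019);
    INSERT INTO fact_sales  VALUES (1, 1, 1, 10.0), (1, 2, 2, 15.0), (2, 1, 2, 40.0);
""")

# A single join from the fact table to one dimension answers
# "total sales per product category".
rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

    Each fact row points to exactly one row in every dimension table, while one dimension row (for example the product 'Pen') is referenced by several fact rows.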

    2. Snowflake schema :

    It consists of a single fact table and multiple dimension tables. The difference between

    the star schema and the snowflake schema is that in the star schema the dimension tables are

    denormalized, whereas in the snowflake schema they are normalized.

    Easier to maintain.

    Saves storage space.
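
    The normalization can be sketched by moving a repeated attribute out of a dimension table into a table of its own. The example uses Python's standard sqlite3 module; all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized dimension: the category text is stored once,
    -- and products refer to it by key.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        name         TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        amount      REAL
    );

    INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
    INSERT INTO dim_product  VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
    INSERT INTO fact_sales   VALUES (1, 10.0), (2, 5.0), (3, 40.0);
""")

# The same per-category total now needs one extra join.
snowflake_rows = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
```

    The category name is kept in a single place, which is where the storage saving comes from; the price is the additional join in queries.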

    [Figure: a fact table linked to the tables Dimension1, Dimension2 and Dimension3.]

    Microsoft Data Warehousing Framework:

    The goal of the data warehousing framework is to simplify the design, implementation

    and management of data warehousing solutions. The data warehousing framework

    describes the relationships between the various components used in the process of

    building, using and managing a data warehouse.

    The core of the Microsoft framework is a set of enabling technologies comprising

    the data transport layer and an integrated data repository. Operational data must pass

    through a cleaning and transformation stage before being placed into the data marts or

    data warehouse, in order to conform to the decisions laid out during the design stage.
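
    A cleaning and transformation stage of this kind might look like the Python sketch below. It does not use any Microsoft tool; the functions clean_record and transform, the field names and the validation rules are assumptions made purely for illustration.

```python
def clean_record(raw):
    """Return a cleaned copy of one operational record, or None if unusable."""
    name = (raw.get("name") or "").strip()
    if not name:
        return None  # reject rows without the mandatory field
    try:
        amount = float(raw.get("amount"))
    except (TypeError, ValueError):
        return None  # reject amounts that are missing or cannot be parsed
    # Conform to the (assumed) design decisions: title-case names, 2 decimals.
    return {"name": name.title(), "amount": round(amount, 2)}


def transform(records):
    """Clean every record and drop the ones that fail validation."""
    cleaned = (clean_record(r) for r in records)
    return [r for r in cleaned if r is not None]


operational = [
    {"name": "  alice ", "amount": "12.5"},
    {"name": "", "amount": "3"},        # missing name, dropped
    {"name": "bob", "amount": "n/a"},   # unparseable amount, dropped
]
staged = transform(operational)
```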

    End-user tools, including desktop productivity products, specialized analysis

    products and custom programs, are used to gain access to the information in the data

    warehouse. Ideally, user access is through a directory facility that enables the user to

    search for appropriate and relevant data to resolve business questions, and provides a

    layer of security between the users and backend systems. Finally, a variety of tools

    come into play for the management of the data warehouse environment, such as

    scheduling repeated tasks and managing multiserver networks.

    [Figure: Data Warehouse/Data Mart Design. Components shown: operational sources, data transform/cleaning, data marts or data warehouse, information directory, end-user tools, repository (persistent shared metadata), data warehouse management. Functions shown: Schema, Transform, Schedule, Repl, Info Publish, OLAP. Phases shown: Building, Using.]

    The Microsoft repository provides the integration point for the metadata shared by the

    various tools used in the data warehousing process. Shared metadata allows for the

    transparent integration of multiple tools from a variety of vendors, without the

    need for specialized interfaces between each of the products.

    Data mining

    Data mining, or knowledge discovery in databases, is the nontrivial extraction of implicit,

    previously unknown and potentially useful information from data. Data

    mining is the search for relationships and global patterns that exist in large databases

    but are hidden among vast amounts of data.

    WORKING PROCEDURE :

    Data mining software analyzes relationships and patterns in stored transaction data

    based on open-ended user queries.

    Four types of relationships are generally sought:

    Classes: Stored data is used to locate data in predetermined groups.

    Clusters: Data items are grouped according to logical relationships or consumer

    preferences.

    Associations: Data can be mined to identify associations.

    Sequential patterns: Data is mined to anticipate behaviour patterns and trends.
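
    Of the four relationship types, associations are the simplest to show in a few lines. The Python sketch below computes the usual support and confidence measures for a rule such as "bread implies butter". The function names and the sample baskets are hypothetical, and the sketch assumes the antecedent occurs in at least one transaction.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions.

    Assumes the antecedent appears in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

    Here two of the four baskets contain both items, and two of the three baskets with bread also contain butter.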

    Major Steps:

    Extract, transform and load transaction data onto the data warehouse system.

    Store and manage the data in a multidimensional database system.

    Provide data access to business analysts and information technology professionals.

    Analyze the data with application software.

    Present the data in a useful format, such as a graph or table.

    Techniques in Data Mining:

    1. Artificial Neural Networks: Non-linear predictive models that learn through

    training and resemble biological neural networks in structure.

    2. Decision Trees: Tree-shaped structures that represent sets of decisions. These

    decisions generate rules for the classification of a dataset.

    3. Genetic Algorithms: Optimization techniques that use processes such as genetic

    combination, mutation and natural selection in a design based on the concepts of

    evolution.

    4. Rule Induction: The extraction of useful if-then rules from data based on

    statistical significance.
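
    As a small illustration of rule induction, the Python sketch below implements the simple OneR heuristic: for each attribute it builds one if-then rule per value (predicting the majority class) and keeps the attribute whose rules make the fewest errors. This is a swapped-in teaching example that ranks rules by training error rather than by a statistical significance test, and the weather data is invented.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (attribute, rules) for the single attribute whose
    value -> majority-class rules make the fewest training errors."""
    best = None
    attributes = [a for a in rows[0] if a != target]
    for attr in attributes:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One if-then rule per attribute value: predict the majority class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(row[target] != rules[row[attr]] for row in rows)
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best[1], best[2]


weather = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
attribute, rules = one_r(weather, "play")
```

    On this data the rules read "if outlook is sunny then play is no" and "if outlook is rainy then play is yes"; the windy attribute cannot separate the classes and is discarded.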

    Data warehouse with data mining:

    Data mining: As is well known, in mining, enormous quantities of debris have to

    be removed before diamonds or gold can be found. The analogy that, with a computer,

    you can automatically find the one 'information-diamond' among the tons of data-debris

    in your database is of course very attractive.

    Integrating data mining into a decision support system is very helpful. The

    sole function of a data warehouse is to supply the information needed to make adequate

    decisions. In some cases you can use standard SQL tools for decision support, but if

    you want to compare millions of records and do not know exactly the type of

    information you require, or if you want to find hidden data, then you have to turn to

    data mining. In many cases you will find that you need a separate computer for data

    mining; trying to mine operational data is almost impossible because there are

    different applications with different types of attributes and different data types, but no

    historical data. With a data warehouse this problem does not exist: all the information

    has been transferred from the operational database to the data warehouse;

    furthermore, in many cases you can clean the data before commencing data mining.

    [Figure: The relationship between operational data, a data warehouse, and data marts. Elements shown: operational data, extracts from several databases, data warehouse, data marts.]

    Client/Server and data warehousing:

    Over the past few years it has proved very difficult to build effective decision

    support systems because the techniques available were not able to support the end-user

    satisfactorily. End-users would ideally like to have available all kinds of

    techniques, such as GUIs, statistical techniques, windowing mechanisms and

    visualization techniques, so that they can easily access the data being sought. This

    means that a great deal of local computing power is needed at each workstation, and

    the client/server technique is the solution to this problem. Client/server involves

    dispersing the s/w over several computers and creating an environment in which it

    appears to each end-user that they are working on just one system. The heavy load of the GUI

    or other visual techniques can be processed on the local machines and all the

    database tasks handled by a dedicated database server. In this way the database server

    can be completely optimized for the database. In some cases you can buy special

    databases that operate with a specific type of h/w. With client/server you only have to

    change the piece of s/w that is related to the end-user; the other applications do not

    require alteration. Of all the techniques currently available on the market, client/server

    represents the best choice for building a data warehouse.

    APPLICATIONS:

    Data warehousing:

    a. Sales and marketing analysis across many industries.

    b. Inventory turn and product tracking in manufacturing.

    c. Profitable lane or driver risk analysis in transportation.

    d. Claims analysis or fraud detection in insurance.

    Data mining:

    Retail/Marketing: Identifying buying patterns from customers.

    Banking: Detecting patterns of fraudulent credit card use.

    Healthcare:

    1. Identifying the behaviour of risky customers.

    2. Identifying successful medical therapies for different illnesses.

    Conclusion:

    Getting the right information to the right people at the right time is the key to making the right

    decisions. To make this possible, the data warehouse serves as the path to data

    mining.

    Contact Address:

    D. Ajith kumar,

    01711A1201, IV/IV CSIT

    NARAYANA ENGG. COLLEGE,

    NELLORE.

    Mail: [email protected]