MC0088 - Data Mining


    1. What is operational intelligence?

Ans. Operational Intelligence (OI) is a form of real-time, dynamic business analytics that delivers visibility and insight into data, streaming events, and business operations. Operational intelligence runs queries against streaming data feeds and event data to deliver real-time analytic results as operational instructions. OI gives organizations the ability to make decisions and immediately act on these analytic insights, through manual or automated actions. The purpose of OI is to monitor business activities, identify and detect situations relating to inefficiencies, opportunities, and threats, and provide operational solutions. Some definitions describe operational intelligence as an event-centric approach to delivering information that empowers people to make better decisions. OI helps to quantify the following:

- The efficiency of the business activities
- The impact of IT infrastructure and unexpected events on the business activities
- How the execution of the business activities contributes to revenue gains or losses

This is achieved by observing the progress of the business activities, computing several metrics in real time from these progress events, and publishing the metrics to one or more channels. Thresholds can also be placed on these metrics to create notifications or new events. In addition, these metrics act as the starting point for further analysis.
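As a minimal sketch (not from the original text), the following Python fragment shows the kind of real-time metric computation and thresholding described above: it keeps a rolling average over a hypothetical latency field on incoming progress events, publishes the metric, and raises a notification when an assumed threshold is crossed. The event fields, window size, and threshold are all illustrative assumptions.

    from collections import deque

    WINDOW = 60          # compute the metric over the last 60 events
    THRESHOLD_MS = 500   # assumed alerting threshold

    latencies = deque(maxlen=WINDOW)

    def publish_metric(name, value):
        # stand-in for publishing to a channel (dashboard, message bus, ...)
        print(f"metric {name} = {value:.1f}")

    def notify(message, value):
        # stand-in for creating a notification or a new event
        print(f"ALERT: {message} ({value:.1f} ms)")

    def on_progress_event(event):
        # consume one progress event and update the metric in real time
        latencies.append(event["latency_ms"])
        avg = sum(latencies) / len(latencies)
        publish_metric("avg_latency_ms", avg)
        if avg > THRESHOLD_MS:
            notify("average latency above threshold", avg)

    # feed two hypothetical events; the second pushes the average over the threshold
    on_progress_event({"latency_ms": 400})
    on_progress_event({"latency_ms": 700})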

Sophisticated OI systems also provide the ability to associate metadata with metrics, process steps, channels, and so on. With this, it becomes easy to retrieve related information: for example, to find a process step that took 60% more time than the norm, to view the acceptance/rejection trend for the customer who was denied approval in this transaction, or to launch the application that this process step interacted with.

    Features

Different operational intelligence solutions may use many different technologies and be implemented in different ways. This section lists the common features of an operational intelligence solution:

- Real-time monitoring
- Real-time situation detection
- Real-time dashboards for different user roles
- Correlation of events
- Industry-specific dashboards
- Multidimensional analysis
- Root cause analysis
- Time series and trend analysis

Comparison

OI is often linked to or compared with business intelligence (BI) or real-time business intelligence, in the sense that both help make sense out of large amounts of information. But there are some basic differences: OI is primarily activity-centric, whereas BI is primarily data-centric. (As with most technologies, each of these could be sub-optimally coerced to perform the other's task.) OI is, by definition, real-time, unlike BI, which is traditionally an after-the-fact, report-based approach to identifying patterns, and unlike real-time BI, which relies on a database as the sole source of events.


    2. What is Business Intelligence? Explain the components of BI architecture.

Ans. Business Intelligence is an environment in which business users receive data that is reliable, consistent, understandable, easily manipulated, and timely. With this data, business users are able to conduct analyses that yield an overall understanding of where the business has been, where it is now, and where it will be in the near future. Business intelligence serves two main purposes. First, it monitors the financial and operational health of the organization (reports, alerts, alarms, analysis tools, key performance indicators, and dashboards). Second, it regulates the operation of the organization, providing two-way integration with operational systems and information feedback analysis. There are various definitions given by the experts; some of them are given below:

- Converting data into knowledge and making it available throughout the organization are the jobs of processes and applications known as Business Intelligence.
- BI is a term that encompasses a broad range of analytical software and solutions for gathering, consolidating, analyzing, and providing access to information in a way that is supposed to let the users of an enterprise make better business decisions.

    Business Intelligence Infrastructure

- Business organizations can gain a competitive advantage with a well-designed business intelligence (BI) infrastructure. Think of the BI infrastructure as a set of layers that begin with the operational systems' information and metadata and end in the delivery of business intelligence to various business user communities.
- Based on the overall requirements of business intelligence, the data integration layer is required to extract, cleanse, and transform data into load files for the information warehouse (a minimal sketch of this step follows the list).
- This layer begins with transaction-level operational data and metadata about these operational systems.
- Typically this data integration is done using a relational staging database and flat-file extracts from source systems.
- The product of a good data-staging layer is high-quality data, a reusable infrastructure, and metadata supporting both business and technical users.
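As a minimal sketch of the extract-cleanse-transform step described in the list above, assuming a CSV flat-file extract and a SQLite staging database (all file, table, and column names here are hypothetical):

    import csv
    import sqlite3

    def stage_orders(extract_path="orders_extract.csv"):
        # relational staging database, as mentioned above
        conn = sqlite3.connect("staging.db")
        conn.execute("""CREATE TABLE IF NOT EXISTS stg_orders
                        (order_id TEXT, amount REAL, region TEXT)""")
        with open(extract_path, newline="") as f:
            for row in csv.DictReader(f):
                # cleanse: drop rows with no order id, normalize region names
                if not row["order_id"]:
                    continue
                region = row["region"].strip().upper()
                # transform: amounts arrive as strings, possibly with separators
                amount = float(row["amount"].replace(",", ""))
                conn.execute("INSERT INTO stg_orders VALUES (?, ?, ?)",
                             (row["order_id"], amount, region))
        conn.commit()
        conn.close()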

This information warehouse is usually developed incrementally over time and is architected to include key business variables and business metrics in a structure that meets all business analysis questions required by the business groups. Within the business intelligence life cycle, the operational systems are the starting point for the data you will later want to analyze. If you do not capture the data in the operational system, you cannot analyze it. If an operational system contains errors, those errors will only get compounded when you later aggregate and combine the data with other data.


4. What is a Neural Network? Explain in detail.

Ans. An Artificial Neural Network (ANN) is an information-processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information-processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. Neural networks are made up of many artificial neurons. An artificial neuron is an electronically modeled biological neuron. The number of neurons used depends on the task at hand; it could be as few as three or as many as several thousand. There are many different ways of connecting artificial neurons together to create a neural network, and there are different types of neural networks, each of which has different strengths particular to its applications. The abilities of different networks can be related to their structure, dynamics, and learning methods.
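As a minimal sketch, a single artificial neuron can be written as a weighted sum of its inputs passed through a nonlinear activation; the weights and inputs below are arbitrary illustration values:

    import math

    def sigmoid(x):
        # squashes the weighted sum into the range (0, 1)
        return 1.0 / (1.0 + math.exp(-x))

    def neuron(inputs, weights, bias):
        # weighted sum of the inputs, then the nonlinear activation
        s = sum(i * w for i, w in zip(inputs, weights)) + bias
        return sigmoid(s)

    # three inputs, as in the smallest networks mentioned above
    print(neuron([0.5, 0.1, 0.9], [0.4, -0.6, 0.2], bias=0.1))

Connecting many such neurons in layers, and adjusting the weights from training data, gives the different network types mentioned above.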

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an expert in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.

Other advantages include:

- Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
- Self-organization: an ANN can create its own organization or representation of the information it receives during learning time.
- Real-time operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
- Fault tolerance via redundant information coding: partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs, including:


- Sales forecasting
- Industrial process control
- Customer research
- Data validation
- Risk management
- Target marketing

ANNs are also used in specific applications such as recognition of speakers in communication, diagnosis of hepatitis, recovery of telecommunications from faulty software, interpretation of multi-meaning Chinese words, undersea mine detection, texture analysis, three-dimensional object recognition, handwritten word recognition, and facial recognition.

5. What is the partition algorithm? Explain with the help of a suitable example.

Ans. The partition algorithm is based on the observation that the frequent sets are normally very few in number compared to the set of all item sets. As a result, if we partition the set of transactions into smaller segments such that each segment can be accommodated in main memory, then we can compute the set of frequent sets for each of these partitions. It is assumed that these sets (the sets of local frequent sets) contain a reasonably small number of item sets. Hence we can read the whole database (the un-segmented one) once, to count the support of the set of all local frequent sets. The partition algorithm uses two scans of the database to discover all frequent sets. In one scan, it generates a set of all potentially frequent item sets; i.e., it may contain false positives, but no false negatives are reported. During the second scan, counters for each of these item sets are set up and their actual support is measured in one scan of the database.

The algorithm executes in two phases. In the first phase, the partition algorithm logically divides the database into a number of non-overlapping partitions. The partitions are considered one at a time and all frequent item sets for that partition are generated. Thus, if there are n partitions, Phase I of the algorithm takes n iterations. At the end of Phase I, these frequent item sets are merged to generate a set of all potential frequent item sets: the local frequent item sets of the same length from all n partitions are combined to generate the global candidate item sets. In Phase II, the actual supports for these item sets are counted and the frequent item sets are identified. The algorithm reads the entire database once during Phase I and once during Phase II. The partition sizes are chosen such that each partition can be accommodated in main memory, so that the partitions are read only once in each phase.

A partition P of the database refers to any subset of the transactions contained in the database; any two partitions are non-overlapping. We define the local support of an item set as the fraction of the transactions in the partition that contain that item set, and a local frequent item set as an item set whose local support in a partition is at least the user-defined minimum support. A local frequent item set may or may not be frequent in the context of the entire database.
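The following Python sketch illustrates the two phases on a tiny transaction database. It brute-forces the local frequent item sets inside each partition (a real implementation would use Apriori-style candidate generation per partition), and the data and minimum support are illustrative:

    from itertools import combinations

    def local_frequent(partition, minsup):
        # all item sets whose local support in this partition is at least minsup
        counts = {}
        for t in partition:
            for r in range(1, len(t) + 1):
                for items in combinations(sorted(t), r):
                    counts[items] = counts.get(items, 0) + 1
        n = len(partition)
        return {s for s, c in counts.items() if c / n >= minsup}

    def partition_algorithm(transactions, n_parts, minsup):
        size = -(-len(transactions) // n_parts)  # ceiling division
        parts = [transactions[i:i + size]
                 for i in range(0, len(transactions), size)]
        # Phase I: local frequent sets per partition, merged into global candidates
        candidates = set()
        for p in parts:
            candidates |= local_frequent(p, minsup)
        # Phase II: one more full scan counts the actual support of each candidate
        counts = {}
        for t in transactions:
            for c in candidates:
                if set(c) <= t:
                    counts[c] = counts.get(c, 0) + 1
        n = len(transactions)
        return {c: s for c, s in counts.items() if s / n >= minsup}

    db = [{"bread", "milk"}, {"bread", "milk", "eggs"},
          {"milk", "eggs"}, {"bread", "milk"}]
    print(partition_algorithm(db, n_parts=2, minsup=0.5))

Here {bread, milk} is locally frequent in both partitions and survives the Phase II count, while an item set like {bread, eggs} is a false positive from Phase I that Phase II eliminates.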


    6. Describe the following with respect to Web Mining:

    a. Categories of Web Mining (5)

Ans. Web mining is broadly defined as the discovery and analysis of useful information from the World Wide Web. Web mining is divided into three categories:

1. Web Content Mining
2. Web Structure Mining
3. Web Usage Mining

All three categories focus on the process of knowledge discovery of implicit, previously unknown, and potentially useful information from the web; each of them focuses on different mining objects of the web.

Web content mining is used by search engine algorithms to search, collate, and examine data. It targets knowledge discovery in which the main objects are the traditional collections of multimedia documents, such as images, video, and audio, which are embedded in or linked to web pages.

Web structure mining focuses on analysis of the link structure of the web, and one of its purposes is to identify the more preferable documents among the objects that are linked in some way.
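As a minimal sketch of this idea, a few power-iteration steps of a PageRank-style score over a tiny hypothetical link graph rank pages by how the others link to them (the graph and damping factor are illustrative, and PageRank is one well-known link-analysis method, not necessarily the one intended here):

    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # page -> pages it links to
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    damping = 0.85
    for _ in range(20):  # a few power-iteration steps
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new
    print(max(rank, key=rank.get))  # the most 'preferable' page, here C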

Web usage mining focuses on techniques that can predict the behavior of users while they are interacting with the WWW. It discovers user navigation patterns from web data, trying to extract useful information from the secondary data derived from the interactions of users while surfing the web.
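As a minimal sketch of usage mining, the fragment below counts page-to-page transitions per visitor from a simplified, hypothetical access log (real logs would need parsing, sessionization, and cleaning first):

    from collections import Counter

    log = [  # (visitor_id, requested_page), in time order
        ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
        ("u2", "/home"), ("u2", "/products"), ("u2", "/products/42"),
    ]

    last_page = {}
    transitions = Counter()
    for visitor, page in log:
        if visitor in last_page:
            transitions[(last_page[visitor], page)] += 1
        last_page[visitor] = page

    # the most common navigation step, here ('/home', '/products')
    print(transitions.most_common(1))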

    b. Applications of Web Mining (5)

Ans. With the rapid growth of the World Wide Web, web mining has become a very hot and popular topic in web research. E-commerce and e-services are claimed to be the killer applications for web mining, and web mining now also plays an important role in helping e-commerce websites and e-services understand how their websites and services are used and in providing better services for their customers and users. A few applications are:

    - E-commerce customer behavior analysis

    - Commerce transaction analysis

    - E-commerce website design

    - E-banking

    - M-commerce

    - Web advertisement

    - Search engine

    - Online auction

Open-source software for web mining includes RapidMiner, which provides modules for text clustering, text categorization, information extraction, named entity recognition, and sentiment analysis. RapidMiner is used, for example, in applications like automated news filtering for personalized news surveys.

The SAS data quality solution provides an enterprise solution for profiling, cleansing, augmenting, and integrating data to create consistent, reliable information.

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.