MC0088 - Data Mining


    1. What is operational intelligence?

Ans. Operational Intelligence (OI) is a form of real-time, dynamic business analytics that delivers visibility and insight into data, streaming events, and business operations. Operational intelligence runs queries against streaming data feeds and event data to deliver real-time analytic results as operational instructions. OI gives organizations the ability to make decisions and immediately act on these analytic insights, through manual or automated actions. The purpose of OI is to monitor business activities, identify and detect situations relating to inefficiencies, opportunities, and threats, and provide operational solutions. Some definitions describe operational intelligence as an event-centric approach to delivering information that empowers people to make better decisions. OI helps to quantify the following:

- The efficiency of the business activities
- The impact of IT infrastructure and unexpected events on the business activities
- How the execution of the business activities contributes to revenue gains or losses

This is achieved by observing the progress of the business activities, computing several metrics in real time from these progress events, and publishing the metrics to one or more channels. Thresholds can also be placed on these metrics to create notifications or new events. In addition, these metrics act as the starting point for further analysis.
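As a minimal sketch (not from the original text), the following Python fragment shows the kind of real-time metric computation and thresholding described above: it keeps a rolling average over a hypothetical latency field on incoming progress events, publishes the metric, and raises a notification when an assumed threshold is crossed. The event fields, window size, and threshold are all illustrative assumptions.

    from collections import deque

    WINDOW = 60          # compute the metric over the last 60 events
    THRESHOLD_MS = 500   # assumed alerting threshold

    latencies = deque(maxlen=WINDOW)

    def publish_metric(name, value):
        # stand-in for publishing to a channel (dashboard, message bus, ...)
        print(f"metric {name} = {value:.1f}")

    def notify(message, value):
        # stand-in for creating a notification or a new event
        print(f"ALERT: {message} ({value:.1f} ms)")

    def on_progress_event(event):
        # consume one progress event and update the metric in real time
        latencies.append(event["latency_ms"])
        avg = sum(latencies) / len(latencies)
        publish_metric("avg_latency_ms", avg)
        if avg > THRESHOLD_MS:
            notify("average latency above threshold", avg)

    # feed two hypothetical events; the second pushes the average over the threshold
    on_progress_event({"latency_ms": 400})
    on_progress_event({"latency_ms": 700})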

Sophisticated OI systems also provide the ability to associate metadata with metrics, process steps, channels, and so on. With this, it becomes easy to retrieve related information: for example, to find a process step that took 60% more time than the norm, to view the acceptance/rejection trend for the customer who was denied approval in this transaction, or to launch the application that this process step interacted with.

    Features

Different operational intelligence solutions may use many different technologies and be implemented in different ways. This section lists the common features of an operational intelligence solution:

- Real-time monitoring
- Real-time situation detection
- Real-time dashboards for different user roles
- Correlation of events
- Industry-specific dashboards
- Multidimensional analysis
- Root cause analysis
- Time series and trend analysis

Comparison

OI is often linked to or compared with business intelligence (BI) or real-time business intelligence, in the sense that both help make sense out of large amounts of information. But there are some basic differences: OI is primarily activity-centric, whereas BI is primarily data-centric. (As with most technologies, each of these could be sub-optimally coerced to perform the other's task.) OI is, by definition, real-time, unlike BI, which is traditionally an after-the-fact, report-based approach to identifying patterns, and unlike real-time BI, which relies on a database as the sole source of events.


    2. What is Business Intelligence? Explain the components of BI architecture.

Ans. Business Intelligence is an environment in which business users receive data that is reliable, consistent, understandable, easily manipulated, and timely. With this data, business users are able to conduct analyses that yield an overall understanding of where the business has been, where it is now, and where it will be in the near future. Business intelligence serves two main purposes. First, it monitors the financial and operational health of the organization (reports, alerts, alarms, analysis tools, key performance indicators, and dashboards). Second, it regulates the operation of the organization, providing two-way integration with operational systems and information feedback analysis. There are various definitions given by the experts; some of them are given below:

- Converting data into knowledge and making it available throughout the organization are the jobs of processes and applications known as Business Intelligence.
- BI is a term that encompasses a broad range of analytical software and solutions for gathering, consolidating, analyzing, and providing access to information in a way that is supposed to let the users of an enterprise make better business decisions.

    Business Intelligence Infrastructure

- Business organizations can gain a competitive advantage with a well-designed business intelligence (BI) infrastructure. Think of the BI infrastructure as a set of layers that begin with the operational systems' information and metadata and end in the delivery of business intelligence to various business user communities.
- Based on the overall requirements of business intelligence, the data integration layer is required to extract, cleanse, and transform data into load files for the information warehouse (a minimal sketch of this step follows the list).
- This layer begins with transaction-level operational data and metadata about these operational systems.
- Typically this data integration is done using a relational staging database and flat-file extracts from source systems.
- The product of a good data-staging layer is high-quality data, a reusable infrastructure, and metadata supporting both business and technical users.
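As a minimal sketch of the extract-cleanse-transform step described in the list above, assuming a CSV flat-file extract and a SQLite staging database (all file, table, and column names here are hypothetical):

    import csv
    import sqlite3

    def stage_orders(extract_path="orders_extract.csv"):
        # relational staging database, as mentioned above
        conn = sqlite3.connect("staging.db")
        conn.execute("""CREATE TABLE IF NOT EXISTS stg_orders
                        (order_id TEXT, amount REAL, region TEXT)""")
        with open(extract_path, newline="") as f:
            for row in csv.DictReader(f):
                # cleanse: drop rows with no order id, normalize region names
                if not row["order_id"]:
                    continue
                region = row["region"].strip().upper()
                # transform: amounts arrive as strings, possibly with separators
                amount = float(row["amount"].replace(",", ""))
                conn.execute("INSERT INTO stg_orders VALUES (?, ?, ?)",
                             (row["order_id"], amount, region))
        conn.commit()
        conn.close()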

This information warehouse is usually developed incrementally over time and is architected to include key business variables and business metrics in a structure that meets all business analysis questions required by the business groups. Within the business intelligence life cycle, the operational systems are the starting point for the data you will later want to analyze. If you do not capture the data in the operational system, you cannot analyze it. If an operational system contains errors, those errors will only get compounded when you later aggregate and combine the data with other data.


4. What is a Neural Network? Explain in detail.

Ans. An Artificial Neural Network (ANN) is an information-processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information-processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. Neural networks are made up of many artificial neurons. An artificial neuron is an electronically modeled biological neuron. The number of neurons used depends on the task at hand; it could be as few as three or as many as several thousand. There are many different ways of connecting artificial neurons together to create a neural network, and there are different types of neural networks, each of which has different strengths particular to its applications. The abilities of different networks can be related to their structure, dynamics, and learning methods.
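As a minimal sketch, a single artificial neuron can be written as a weighted sum of its inputs passed through a nonlinear activation; the weights and inputs below are arbitrary illustration values:

    import math

    def sigmoid(x):
        # squashes the weighted sum into the range (0, 1)
        return 1.0 / (1.0 + math.exp(-x))

    def neuron(inputs, weights, bias):
        # weighted sum of the inputs, then the nonlinear activation
        s = sum(i * w for i, w in zip(inputs, weights)) + bias
        return sigmoid(s)

    # three inputs, as in the smallest networks mentioned above
    print(neuron([0.5, 0.1, 0.9], [0.4, -0.6, 0.2], bias=0.1))

Connecting many such neurons in layers, and adjusting the weights from training data, gives the different network types mentioned above.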

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an expert in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.

Other advantages include:

- Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.
- Self-organization: an ANN can create its own organization or representation of the information it receives during learning time.
- Real-time operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
- Fault tolerance via redundant information coding: partial destruction of a network leads to a corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs, including:


- Sales forecasting
- Industrial process control
- Customer research
- Data validation
- Risk management
- Target marketing

ANNs are also used in specific applications such as recognition of speakers in communication, diagnosis of hepatitis, recovery of telecommunications from faulty software, interpretation of multi-meaning Chinese words, undersea mine detection, texture analysis, three-dimensional object recognition, handwritten word recognition, and facial recognition.

5. What is the partition algorithm? Explain with the help of a suitable example.

Ans. The partition algorithm is based on the observation that the frequent sets are normally very few in number compared to the set of all item sets. As a result, if we partition the set of transactions into smaller segments such that each segment can be accommodated in main memory, then we can compute the set of frequent sets for each of these partitions. It is assumed that these sets (the sets of local frequent sets) contain a reasonably small number of item sets. Hence we can read the whole database (the un-segmented one) once, to count the support of the set of all local frequent sets. The partition algorithm uses two scans of the database to discover all frequent sets. In one scan, it generates a set of all potentially frequent item sets; i.e., it may contain false positives, but no false negatives are reported. During the second scan, counters for each of these item sets are set up and their actual support is measured in one scan of the database.

The algorithm executes in two phases. In the first phase, the partition algorithm logically divides the database into a number of non-overlapping partitions. The partitions are considered one at a time and all frequent item sets for that partition are generated. Thus, if there are n partitions, Phase I of the algorithm takes n iterations. At the end of Phase I, these frequent item sets are merged to generate a set of all potential frequent item sets: the local frequent item sets of the same length from all n partitions are combined to generate the global candidate item sets. In Phase II, the actual supports for these item sets are counted and the frequent item sets are identified. The algorithm reads the entire database once during Phase I and once during Phase II. The partition sizes are chosen such that each partition can be accommodated in main memory, so that the partitions are read only once in each phase.

A partition P of the database refers to any subset of the transactions contained in the database; any two partitions are non-overlapping. We define the local support of an item set as the fraction of the transactions in the partition that contain that item set, and a local frequent item set as an item set whose local support in a partition is at least the user-defined minimum support. A local frequent item set may or may not be frequent in the context of the entire database.
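The following Python sketch illustrates the two phases on a tiny transaction database. It brute-forces the local frequent item sets inside each partition (a real implementation would use Apriori-style candidate generation per partition), and the data and minimum support are illustrative:

    from itertools import combinations

    def local_frequent(partition, minsup):
        # all item sets whose local support in this partition is at least minsup
        counts = {}
        for t in partition:
            for r in range(1, len(t) + 1):
                for items in combinations(sorted(t), r):
                    counts[items] = counts.get(items, 0) + 1
        n = len(partition)
        return {s for s, c in counts.items() if c / n >= minsup}

    def partition_algorithm(transactions, n_parts, minsup):
        size = -(-len(transactions) // n_parts)  # ceiling division
        parts = [transactions[i:i + size]
                 for i in range(0, len(transactions), size)]
        # Phase I: local frequent sets per partition, merged into global candidates
        candidates = set()
        for p in parts:
            candidates |= local_frequent(p, minsup)
        # Phase II: one more full scan counts the actual support of each candidate
        counts = {}
        for t in transactions:
            for c in candidates:
                if set(c) <= t:
                    counts[c] = counts.get(c, 0) + 1
        n = len(transactions)
        return {c: s for c, s in counts.items() if s / n >= minsup}

    db = [{"bread", "milk"}, {"bread", "milk", "eggs"},
          {"milk", "eggs"}, {"bread", "milk"}]
    print(partition_algorithm(db, n_parts=2, minsup=0.5))

Here {bread, milk} is locally frequent in both partitions and survives the Phase II count, while an item set like {bread, eggs} is a false positive from Phase I that Phase II eliminates.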


    6. Describe the following with respect to Web Mining:

    a. Categories of Web Mining (5)

Ans. Web mining is broadly defined as the discovery and analysis of useful information from the World Wide Web. Web mining is divided into three categories:

1. Web Content Mining
2. Web Structure Mining
3. Web Usage Mining

All three categories focus on the process of knowledge discovery of implicit, previously unknown, and potentially useful information from the web; each of them focuses on different mining objects of the web.

Web content mining is used by search engine algorithms to search, collate, and examine data. It targets knowledge discovery in which the main objects are the traditional collections of multimedia documents, such as images, video, and audio, which are embedded in or linked to web pages.

Web structure mining focuses on analysis of the link structure of the web, and one of its purposes is to identify the more preferable documents among the objects that are linked in some way.
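As a minimal sketch of this idea, a few power-iteration steps of a PageRank-style score over a tiny hypothetical link graph rank pages by how the others link to them (the graph and damping factor are illustrative, and PageRank is one well-known link-analysis method, not necessarily the one intended here):

    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # page -> pages it links to
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    damping = 0.85
    for _ in range(20):  # a few power-iteration steps
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new
    print(max(rank, key=rank.get))  # the most 'preferable' page, here C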

Web usage mining focuses on techniques that can predict the behavior of users while they are interacting with the WWW. It discovers user navigation patterns from web data, trying to extract useful information from the secondary data derived from the interactions of users while surfing the web.
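As a minimal sketch of usage mining, the fragment below counts page-to-page transitions per visitor from a simplified, hypothetical access log (real logs would need parsing, sessionization, and cleaning first):

    from collections import Counter

    log = [  # (visitor_id, requested_page), in time order
        ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
        ("u2", "/home"), ("u2", "/products"), ("u2", "/products/42"),
    ]

    last_page = {}
    transitions = Counter()
    for visitor, page in log:
        if visitor in last_page:
            transitions[(last_page[visitor], page)] += 1
        last_page[visitor] = page

    # the most common navigation step, here ('/home', '/products')
    print(transitions.most_common(1))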

    b. Applications of Web Mining (5)

Ans. With the rapid growth of the World Wide Web, web mining has become a very hot and popular topic in web research. E-commerce and e-services are claimed to be the killer applications for web mining, and web mining now also plays an important role in helping e-commerce websites and e-services understand how their websites and services are used and in providing better services for their customers and users. A few applications are:

    - E-commerce customer behavior analysis

    - Commerce transaction analysis

    - E-commerce website design

    - E-banking

    - M-commerce

    - Web advertisement

    - Search engine

    - Online auction

Open-source software for web mining includes RapidMiner, which provides modules for text clustering, text categorization, information extraction, named entity recognition, and sentiment analysis. RapidMiner is used, for example, in applications like automated news filtering for personalized news surveys.

The SAS data quality solution provides an enterprise solution for profiling, cleansing, augmenting, and integrating data to create consistent, reliable information.

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.