Business Intelligence - onkarsule.files.wordpress.com

Business Intelligence

Business intelligence (BI) refers to the skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context. Business intelligence may also refer to the collected information itself.

BI applications provide historical, current, and predictive views of business operations. Common functions of business intelligence applications are reporting, OLAP, analytics, data mining, business performance management, benchmarks, text mining, and predictive analytics.

OLAP

Online analytical processing, or OLAP, is an approach to quickly answering multi-dimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. Typical applications of OLAP are business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).

Databases configured for OLAP use a multidimensional data model, allowing complex analytical and ad-hoc queries to execute rapidly. They borrow aspects of navigational and hierarchical databases, which can answer such queries faster than relational databases.

Concept

At the core of any OLAP system is the concept of an OLAP cube (also called a multidimensional cube or a hypercube). It consists of numeric facts called measures, which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.

Each measure can be thought of as having a set of labels, or metadata, associated with it. A dimension is what describes these labels; it provides information about the measure.

A simple example would be a cube that contains a store's sales as a measure and Date/Time as a dimension. Each sale has a Date/Time label that describes more about that sale.

Any number of dimensions, such as Store, Cashier, or Customer, can be added to the structure by adding a column to the fact table. This allows an analyst to view the measures along any combination of the dimensions.
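As a minimal sketch of viewing a measure along any combination of dimensions, the Python below rolls up a handful of fact rows. The rows, the dimension names (store, cashier, month) and the helper roll_up are hypothetical and chosen only for illustration; a real OLAP engine would typically precompute and index such aggregates rather than scan the facts on every query.

```python
from collections import defaultdict

# Hypothetical fact rows: one measure (amount) plus its dimension labels.
SALES = [
    {"store": "North", "cashier": "Ann", "month": "2008-09", "amount": 120.0},
    {"store": "North", "cashier": "Bob", "month": "2008-09", "amount": 80.0},
    {"store": "South", "cashier": "Cy", "month": "2008-09", "amount": 50.0},
    {"store": "North", "cashier": "Ann", "month": "2008-10", "amount": 30.0},
]


def roll_up(facts, dimensions, measure="amount"):
    """Sum the measure for every combination of the requested dimensions."""
    totals = defaultdict(float)
    for row in facts:
        # The key is the tuple of labels for the chosen dimensions.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SALES, ["store"]))            # totals per store
    print(roll_up(SALES, ["store", "month"]))   # totals per store and month
    print(roll_up(SALES, []))                   # grand total
```

Passing an empty list of dimensions collapses the cube to a single grand total, while listing more dimensions gives a finer-grained view of the same measure.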

Business Intelligence (BI) - www.viplavkambli.com

For example:

    Sales Fact Table
+-----------------------+
| sale_amount | time_id |
+-----------------------+                Time Dimension
|     2008.08 |    1234 |---+    +-----------------------------+
+-----------------------+   |    | time_id | timestamp         |
                            |    +-----------------------------+
                            +--->|    1234 | 20080902 12:35:43 |
                                 +-----------------------------+
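The relationship in the diagram can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not a prescribed schema: the table names time_dim and sales_fact and the function daily_sales are hypothetical, only the first row of each table comes from the diagram, and the timestamp is rewritten in ISO format so that SQLite's date() function can parse it.

```python
import sqlite3


def daily_sales():
    """Join the fact table to the time dimension and total the sales per day."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, ts TEXT);
        CREATE TABLE sales_fact (sale_amount REAL, time_id INTEGER);
        -- The first row of each table mirrors the diagram; the rest are made up.
        INSERT INTO time_dim VALUES (1234, '2008-09-02 12:35:43');
        INSERT INTO time_dim VALUES (1235, '2008-09-03 09:10:00');
        INSERT INTO sales_fact VALUES (2008.08, 1234);
        INSERT INTO sales_fact VALUES (100.00, 1234);
        INSERT INTO sales_fact VALUES (50.50, 1235);
        """
    )
    rows = conn.execute(
        """
        SELECT date(t.ts) AS day, SUM(f.sale_amount) AS total
        FROM sales_fact AS f
        JOIN time_dim AS t ON t.time_id = f.time_id
        GROUP BY date(t.ts)
        ORDER BY day
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for day, total in daily_sales():
        print(day, round(total, 2))
```

The measure (sale_amount) lives in the fact table, and the join on time_id is what attaches the Date/Time label to each sale before it is aggregated.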

Multidimensional databases

Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data" (O'Brien & Marakas, 2009, p. 177). The structure is broken into cubes, and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions" (p. 178). Even when data is manipulated, it remains easy to access and the database stays compact. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications (O'Brien & Marakas, 2009). Analytical databases use these structures because of their ability to deliver answers quickly to complex business queries. Data can be viewed from different perspectives, which gives a broader picture of a problem than other models do (Williams, Garza, Tucker & Marcus, 1994).

Analytics:

The simplest definition of analytics is "the science of analysis". A simple and practical definition, however, would be how an entity (e.g., a business) arrives at an optimal or realistic decision based on existing data. Business managers may choose to make decisions based on past experiences or rules of thumb, or there might be other qualitative aspects to decision making; but unless data are involved in the process, it would not be considered analytics.

Common applications of analytics include the study of business data using statistical analysis in order to discover and understand historical patterns with an eye to predicting and improving business performance in the future. Also, some people use the term to denote the use of mathematics in business. Others hold that the field of analytics includes the use of operations research, statistics and probability. However, it would be erroneous to limit the field of analytics to only statistics and mathematics.

Analytics closely resembles statistical analysis and data mining, but tends to be based on modeling

involving extensive computation. Some fields within the area of analytics are enterprise decision management, marketing analytics, predictive science, strategy science, credit risk analysis and fraud analytics.

Portfolio analysis

A common application of business analytics is portfolio analysis. In this, a bank or lending agency has a collection of accounts of varying value and risk. The accounts may differ by the social status (wealthy, middle-class, poor, etc.) of the holder, the geographical location, the net value, and many other factors. The lender must balance the return on the loan with the risk of default for each loan. The question is then how to evaluate the portfolio as a whole.

For instance, the lowest-risk loans may be to the very wealthy, but there are a very limited number of wealthy people. On the other hand, there are many poor borrowers who can be lent to, but at greater risk. Some balance must be struck that maximizes return and minimizes risk.

The analytics solution may combine time series analysis with many other factors in order to make decisions on when to lend money to these different borrower segments, or decisions on the interest rate charged to members of a portfolio segment to cover any losses among members in that segment.
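The idea of setting a segment's interest rate so that it covers that segment's losses can be illustrated with a deliberately simplified one-period model. Everything here is an assumption made for the sketch: each loan either repays principal plus interest or defaults, a default loses a fixed fraction of the principal, and the function names and example probabilities are hypothetical. Real portfolio models would also account for funding costs, timing and correlation between defaults.

```python
def expected_return(rate, p_default, loss_given_default=1.0):
    """Expected one-period return per unit lent under the toy model."""
    repaid = (1.0 - p_default) * (1.0 + rate)           # borrowers who repay
    recovered = p_default * (1.0 - loss_given_default)  # salvage from defaults
    return repaid + recovered - 1.0


def break_even_rate(p_default, loss_given_default=1.0):
    """Interest rate at which the expected return of a segment is zero."""
    if not 0.0 <= p_default < 1.0:
        raise ValueError("p_default must be in [0, 1)")
    # Solve (1 - p)(1 + r) + p(1 - l) = 1 for r.
    return p_default * loss_given_default / (1.0 - p_default)


if __name__ == "__main__":
    # Hypothetical default probabilities for two borrower segments.
    for segment, p in [("low risk", 0.02), ("high risk", 0.20)]:
        print(segment, round(break_even_rate(p), 4))
```

Under these assumptions the riskier segment needs a noticeably higher rate just to break even, which is the trade-off between return and risk described above.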

Business Analytics:

According to Thomas Davenport, analytics is defined as the extensive use of data, statistical and quantitative analysis, explanatory and predictive modeling, and fact-based management to drive decision making. Analytics may be used as input for human decisions or may drive fully automated decisions. According to Davenport, business analytics represents a subset of business intelligence. The other part of business intelligence is querying, reporting, OLAP, and "alerts". The questions that business analytics can answer are more proactive and of higher value than the questions other business intelligence tools can answer. In other words, querying, reporting, OLAP, and "alert" tools can answer the questions: what happened; how many, how often, where; where exactly is the problem; what actions are needed. Business analytics can answer the questions: why is this happening; what if these trends continue; what will happen next; what is the best that can happen.

Data mining is the process of extracting hidden patterns from data. As more data is gathered, with the amount of data reportedly doubling every three years, data mining is becoming an increasingly important tool to transform this data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

While data mining can be used to uncover patterns in data samples, it is important to be aware that the use of non-representative samples of data may produce results that are not indicative of the domain. Similarly, data mining will not find patterns that may be present in the domain if those patterns are not present in the sample being "mined". There is a tendency for insufficiently knowledgeable "consumers" of the results to attribute "magical abilities" to data mining, treating the technique as a sort of all-seeing crystal ball. Like any other tool, it only functions in conjunction with the appropriate raw material: in this case, indicative and representative data that the user must first collect. Further, the discovery of a particular pattern in a particular set of data does not necessarily mean that pattern is representative of the whole population from which that data was drawn. Hence, an important part of the process is the verification and validation of patterns on other samples of data.

The term data mining has also been used in a related but negative sense, to mean the deliberate searching for apparent but not necessarily representative patterns in large amounts of data. To avoid confusion with the other sense, the terms data dredging and data snooping are often used. Note, however, that dredging and snooping can be (and sometimes are) used as exploratory tools when developing and clarifying hypotheses.

Data Mining

Humans have been "manually" extracting information from data for centuries, but the increasing volume of data in modern times has called for more automatic approaches. As data sets and the information extracted from them have grown in size and complexity, direct hands-on data analysis has increasingly been supplemented and augmented with indirect, automatic data processing using more complex and sophisticated tools, methods and models. The proliferation, ubiquity and increasing power of computer technology have aided data collection, processing, management and storage. However, the captured data needs to be converted into information and knowledge to become useful. Data mining is the process of using computing power to apply methodologies, including new techniques for knowledge discovery, to data.

Data mining identifies trends within data that go beyond simple data analysis. Through the use of sophisticated algorithms, non-statistician users have the opportunity to identify key attributes of processes and target opportunities. However, abdicating control and understanding of processes from statisticians to poorly informed or uninformed users can result in false positives, no useful results, and worst of all, results that are misleading and/or misinterpreted.

Although data mining is a relatively new term, the technology is not. For many years, businesses and governments have used increasingly powerful computers to sift through volumes of data such as airline passenger trip records, census data and supermarket scanner data to produce market research reports. (Note, however, that reporting is not always considered to be data mining.) Continuous innovations in computer processing power, disk storage, data capture technology, algorithms, methodologies and analysis software have dramatically increased the accuracy and usefulness of the extracted information.

The term data mining is often applied to the two separate processes of knowledge discovery and prediction. Knowledge discovery provides explicit information about the characteristics of the collected data, using a number of techniques (e.g., association rule mining). Forecasting and predictive modeling provide predictions of future events, and the processes may range from the transparent (e.g., rule-based approaches) through to the opaque (e.g., neural networks).

Metadata (data about the characteristics of a data set) are often expressed in a condensed, data-minable format, or one that facilitates the practice of data mining. Common examples include executive summaries and scientific abstracts.

A primary reason for using data mining is to assist in the analysis of collections of observations of behaviour. Such data are vulnerable to collinearity because of unknown interrelations. An unavoidable fact of data mining is that the (sub-)set(s) of data being analysed may not be representative of the whole domain, and therefore may not contain examples of certain critical relationships and behaviours that exist across other parts of the domain. To address this sort of issue, the analysis may be augmented using experiment-based and other approaches, such as Choice Modelling for human-generated data. In these situations, inherent correlations can be either controlled for, or removed altogether, during the construction of the experimental design.

There have been some efforts to define standards for data mining, for example the 1999 European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining standard (JDM 1.0). These are evolving standards; later versions of these standards are under development. Independent of these standardization efforts, freely available open-source software systems like RapidMiner, Weka, KNIME, and the R Project have become an informal standard for defining data-mining processes. Most of these systems are able to import and export models in PMML (Predictive Model Markup Language), which provides a standard way to represent data mining models so that these can be shared between different statistical applications. PMML is an XML-based language developed by the Data Mining Group (DMG), an independent group composed of many data mining companies. PMML version 3.2 was approved in May 2007, and was most recently updated in December 2008. Work on the next version of PMML, version 4.0, has begun.

Research and evolution

In addition to industry-driven demand for standards and interoperability, professional and academic activity has also made considerable contributions to the evolution and rigour of the methods and models; an article published in a 2008 issue of the International Journal of Information Technology and Decision Making summarises the results of a literature survey which traces and analyses this evolution.

The premier professional body in the field is the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Since 1989 it has hosted an annual international conference and published its proceedings, and since 1999 it has published a biannual academic journal titled "SIGKDD Explorations". Other computer science conferences on data mining include:

* DMIN - International Conference on Data Mining;
* DMKD - Research Issues on Data Mining and Knowledge Discovery;
* ECML-PKDD - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases;
* ICDM - IEEE International Conference on Data Mining;
* MLDM - Machine Learning and Data Mining in Pattern Recognition;

* SDM - SIAM International Conference on Data Mining

Knowledge Discovery in Databases (KDD) is the name coined by Gregory Piatetsky-Shapiro in 1989 to describe the process of finding interesting, interpreted, useful and novel data. There are many nuances to this process, but roughly the steps are to preprocess raw data, mine the data, and interpret the results.

Pre-processing

Once the objective for the KDD process is known, a target data set must be assembled. As data mining can only uncover patterns already present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined in an acceptable timeframe. A common source for data is a data mart or data warehouse.

The target set is then cleaned. Cleaning removes the observations with noise and missing data.

The clean data is reduced into feature vectors, one vector per observation. A feature vector is a summarized version of the raw data observation. For example, a black and white image of a face which is 100px by 100px would contain 10,000 bits of raw data. This might be turned into a feature vector by locating the eyes and mouth in the image. Doing so would reduce the data for each vector from 10,000 bits to three codes for the locations, dramatically reducing the size of the data set to be mined, and hence reducing the processing effort. The features selected will depend on the objectives; selecting the "right" features is fundamental to successful data mining.

The feature vectors are divided into two sets, the "training set" and the "test set". The training set is used to "train" the data mining algorithm(s), while the test set is used to verify the accuracy of any patterns found.
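The division into a training set and a test set can be sketched as a shuffled split. The function name train_test_split, the 25% test fraction and the fixed seed are assumptions made for this illustration; the text does not prescribe any particular proportion.

```python
import random


def train_test_split(vectors, test_fraction=0.25, seed=0):
    """Shuffle the feature vectors and split them into (training, test)."""
    shuffled = list(vectors)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Twenty hypothetical feature vectors, here just numbered placeholders.
    training_set, test_set = train_test_split(range(20))
    print(len(training_set), len(test_set))
```

Shuffling before splitting avoids accidentally putting, say, all of the oldest observations in one set; every vector ends up in exactly one of the two sets.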

Data mining

Data mining commonly involves four classes of task:

* Classification - Arranges the data into predefined groups. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include nearest neighbor, the naive Bayes classifier and neural networks.
* Clustering - Is like classification, but the groups are not predefined, so the algorithm will try to group similar items together.

* Regression - Attempts to find a function which models the data with the least error. Least-squares fitting is a common method; genetic programming can also be used.
* Association rule learning - Searches for relationships between variables. For example, a supermarket might gather data on what each customer buys. Using association rule learning, the supermarket can work out which products are frequently bought together, which is useful for marketing purposes. This is sometimes referred to as "market basket analysis".
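To make the classification task from the list above concrete, here is a minimal nearest-neighbor sketch for the spam example. The two numeric features per email and all of the training values are hypothetical, invented only so that the code has something to run on; math.dist requires Python 3.8 or later.

```python
import math

# Hypothetical training set: (feature vector, label). The two features could
# be, for instance, the share of capitalised words and the share of links.
TRAINING_SET = [
    ((0.9, 0.8), "spam"),
    ((0.8, 0.6), "spam"),
    ((0.1, 0.1), "legitimate"),
    ((0.2, 0.0), "legitimate"),
]


def classify(features, training_set=TRAINING_SET):
    """Return the label of the training example closest to `features`."""
    _, label = min(
        training_set,
        key=lambda example: math.dist(example[0], features),
    )
    return label


if __name__ == "__main__":
    print(classify((0.85, 0.7)))
    print(classify((0.15, 0.1)))
```

The groups ("spam", "legitimate") are fixed in advance, which is what distinguishes this from clustering, where the algorithm would have to discover the groups itself.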

Results validation

The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by the data mining algorithms are necessarily valid. It is common for the algorithms to find patterns in the training set which are not present in the general data set; this is called overfitting. To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The learnt patterns are applied to this test set and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish spam from legitimate emails would be trained on a training set of sample emails. Once trained, the learnt patterns would be applied to a test set of emails on which it had not been trained; the accuracy of these patterns can then be measured from how many emails they correctly classify. A number of statistical methods, such as ROC curves, may be used to evaluate the algorithm.
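The check described above, comparing accuracy on the training set with accuracy on an unseen test set, can be sketched as follows. The emails, the deliberately overfit "memorizer" and the simple keyword rule are all hypothetical and exist only to show how a gap between the two accuracies exposes overfitting.

```python
TRAIN = [
    ("win free money", "spam"),
    ("meeting at noon", "legitimate"),
    ("free prize inside", "spam"),
    ("lunch tomorrow?", "legitimate"),
]
TEST = [
    ("claim your free prize", "spam"),
    ("project status", "legitimate"),
    ("free money now", "spam"),
    ("see you at noon", "legitimate"),
]


def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the desired label."""
    hits = sum(1 for text, label in examples if predict(text) == label)
    return hits / len(examples)


def make_memorizer(training_set, default="legitimate"):
    """An overfit model: it only recognises emails seen during training."""
    seen = dict(training_set)
    return lambda text: seen.get(text, default)


def keyword_rule(text):
    """A more general pattern: flag any email that mentions 'free'."""
    return "spam" if "free" in text.split() else "legitimate"


if __name__ == "__main__":
    memorizer = make_memorizer(TRAIN)
    print("memorizer:", accuracy(memorizer, TRAIN), accuracy(memorizer, TEST))
    print("keyword:  ", accuracy(keyword_rule, TRAIN), accuracy(keyword_rule, TEST))
```

The memorizer looks perfect on the emails it was trained on but falls apart on new ones, while the keyword rule performs the same on both sets in this toy data; only the test set reveals the difference.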

If the learnt patterns do not meet the desired standards, then it is necessary to re-evaluate and change the preprocessing and data mining. If the learnt patterns do meet the desired standards, then the final step is to interpret the learnt patterns and turn them into knowledge.

Notable uses

Surveillance

Previous data mining programs run under the U.S. government to stop terrorism include the Total Information Awareness (TIA) program, Computer-Assisted Passenger Prescreening System (CAPPS II), Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE), Multistate Anti-Terrorism Information Exchange (MATRIX), and the Secure Flight program. These programs have been discontinued due to controversy over whether they violate the Fourth Amendment to the US Constitution, although many programs that were formed under them continue to be funded by different organizations, or under different names.

Two plausible data mining techniques in the context of combating terrorism are "pattern mining" and "subject-based data mining".

Pattern mining

"Pattern mining" is a data mining technique that involves finding existing patterns in data. In this context, patterns often means association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behaviour in terms of the purchased products. For example, an association rule "beer => crisps (80%)" states that four out of five customers that bought beer also bought crisps.

In the context of pattern mining as a tool to identify terrorist activity, the National Research Council

provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as

small signals in a large ocean of noise." Pattern mining includes new areas such as Music Information Retrieval (MIR), where patterns seen in both the temporal and non-temporal domains are imported to classical knowledge discovery search techniques.
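The percentage attached to an association rule such as "beer => crisps (80%)" is usually called the rule's confidence: the share of baskets containing the left-hand side that also contain the right-hand side. The sketch below computes it, together with the rule's support, on a small set of hypothetical baskets constructed so that four of the five beer baskets also contain crisps.

```python
# Hypothetical supermarket baskets, one set of products per customer visit.
BASKETS = [
    {"beer", "crisps"},
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"beer", "crisps", "milk"},
    {"beer", "milk"},
    {"bread", "milk"},
    {"crisps"},
]


def support(baskets, items):
    """Fraction of all baskets that contain every product in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the share with `consequent`."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent <= b)
    return both / len(with_antecedent)


if __name__ == "__main__":
    print(confidence(BASKETS, {"beer"}, {"crisps"}))
    print(support(BASKETS, {"beer", "crisps"}))
```

Confidence is not symmetric: the rule "crisps => beer" is evaluated over the baskets containing crisps, so it can have a different value from "beer => crisps" on the same data.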

Subject-based data mining

"Subject-based data mining" is a data mining technique involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: "Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum."

Games

Since the early 1960s, the availability of oracles for certain combinatorial games, also called tablebases (e.g. for 3x3-chess with any beginning configuration, small-board dots-and-boxes, small-board hex, and certain endgames in chess, dots-and-boxes, and hex), has opened up a new area for data mining: the extraction of human-usable strategies from these oracles. Current pattern recognition approaches do not seem to have the high level of abstraction required to be applied successfully. Instead, extensive experimentation with the tablebases, combined with an intensive study of tablebase answers to well-designed problems and with knowledge of prior art, i.e. pre-tablebase knowledge, is used to yield insightful patterns. Berlekamp in dots-and-boxes and John Nunn in chess endgames are notable examples of researchers doing this work, though they were not and are not involved in tablebase generation.

Business

Data mining in customer relationship management applications can contribute significantly to the bottom line. Rather than randomly contacting a prospect or customer through a call center or sending

mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood of responding to an offer. More sophisticated methods may be used to optimize resources across campaigns so that one may predict which channel and which offer an individual is most likely to respond to, across all potential offers. Finally, in cases where many people will take an action without an offer, uplift modeling can be used to determine which people will have the greatest increase in responding if given an offer. Data clustering can also be used to automatically discover the segments or groups within a customer data set.

Businesses employing data mining may see a return on investment, but they also recognize that the number of predictive models can quickly become very large. Rather than one model to predict which customers will churn, a business could build a separate model for each region and customer type. Then, instead of sending an offer to all people that are likely to churn, it may only want to send offers to customers that are likely to take the offer. And finally, it may also want to determine which customers are going to be profitable over a window of time and only send the offers to those that are likely to be profitable. In order to maintain this quantity of models, businesses need to manage model versions and move to automated data mining.

Data mining can also be helpful to human-resources departments in identifying the characteristics of their most successful employees. Information obtained, such as universities attended by highly successful employees, can help HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.

Another example of data mining, often called market basket analysis, relates to its use in retail sales. If a clothing store records the purchases of customers, a data-mining system could identify those customers who favour silk shirts over cotton ones. Although some relationships may be difficult to explain, taking advantage of them is easier. The example deals with association rules within transaction-based data. Not all data are transaction-based, and logical or inexact rules may also be present within a database. In a manufacturing application, an inexact rule may state that 73% of products which have a specific defect or problem will develop a secondary problem within the next six months.

Market basket analysis has also been used to identify the purchase patterns of the Alpha consumer. Alpha consumers are people that play a key role in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. Analyzing the data collected on these types of users has allowed companies to predict future buying trends and forecast supply demands.

Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a rich history of customer transactions on millions of customers dating back several years. Data mining tools can identify patterns among customers and help identify the customers most likely to respond to upcoming mailing campaigns.

An example of data mining related to an integrated-circuit production line is described in the paper "Mining IC Test Data to Optimize VLSI Testing." The paper describes the application of data mining and decision analysis to the problem of die-level functional test. The experiments it reports demonstrate that a system mining historical die-test data can create a probabilistic model of patterns of die failure, which is then used to decide in real time which die to test next and when to stop testing. Based on experiments with historical test data, this system has been shown to have the potential to improve profits on mature IC products.

Science and engineering

In recent years, data mining has been widely used in areas of science and engineering such as bioinformatics, genetics, medicine, education and electrical power engineering.

In the study of human genetics, an important goal is to understand the mapping between inter-individual variation in human DNA sequences and variability in disease susceptibility. In lay terms, it is to find out how changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer. This is very important for improving the diagnosis, prevention and treatment of these diseases. The data mining technique used to perform this task is known as multifactor dimensionality reduction.

In electrical power engineering, data mining techniques have been widely used for condition monitoring of high-voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation. Data clustering techniques such as the self-organizing map (SOM) have been applied to the vibration monitoring and analysis of transformer on-load tap-changers (OLTCs). Using vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms. Different tap positions generate different signals. However, there is considerable variability among normal-condition signals for the exact same tap position. SOM has been applied to detect abnormal conditions and to estimate the nature of the abnormalities.

Data mining techniques have also been applied to dissolved gas analysis (DGA) on power transformers. DGA has been available as a diagnostic for power transformers for many years. Data mining techniques such as SOM have been applied to analyse the data and to determine trends that are not obvious to standard DGA ratio techniques such as the Duval Triangle.

A fourth area of application for data mining in science and engineering is educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors which reduce their learning and to understand the factors influencing university student retention. A similar example of the social application of data mining is its use in expertise finding systems, whereby descriptors of human expertise are extracted, normalized and classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this way, data mining can facilitate institutional memory.
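The use of a self-organizing map for condition monitoring, described earlier in this section, can be illustrated with a minimal sketch. It assumes each tap-change signal has already been reduced to a short feature vector (here two hypothetical features, such as peak amplitude and duration), and all values are invented for illustration. The map is trained on normal signals only, and a new signal is scored by its distance to the closest map node; this is one simple way to flag abnormality, not necessarily the procedure used in the studies mentioned.

```python
import math
import random

def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))

def train_som(samples, n_nodes=4, epochs=200, lr0=0.5, seed=0):
    """Train a tiny 1-D self-organizing map on equal-length feature vectors."""
    rng = random.Random(seed)
    # Initialise each node's weight vector from a randomly chosen sample.
    nodes = [list(rng.choice(samples)) for _ in range(n_nodes)]
    radius0 = n_nodes / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays over time
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        for x in rng.sample(samples, len(samples)):
            bmu = best_matching_unit(nodes, x)
            for i, w in enumerate(nodes):
                # Gaussian neighbourhood on the 1-D node grid.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return nodes

def quantization_error(nodes, x):
    """Distance from x to its closest node; large values suggest an abnormal signal."""
    return min(math.dist(w, x) for w in nodes)

# Invented "normal" signals scattered around two typical operating points.
data_rng = random.Random(1)
normal = [
    [cx + data_rng.uniform(-0.05, 0.05), cy + data_rng.uniform(-0.05, 0.05)]
    for cx, cy in [(1.0, 0.2), (2.0, 0.5)]
    for _ in range(20)
]

nodes = train_som(normal)
normal_score = quantization_error(nodes, [1.0, 0.2])
abnormal_score = quantization_error(nodes, [5.0, 3.0])
print(normal_score, abnormal_score)
```

A signal far from everything seen during training receives a much larger score than a typical one. In practice a threshold on this score would have to be chosen from the spread of scores among known normal signals.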

Other applications of data mining techniques include mining biomedical data facilitated by domain ontologies, mining clinical trial data, and traffic analysis using SOM.

In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data

mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in

the WHO global database of 4.6 million suspected adverse drug reaction incidents. Recently, similar

methodology has been developed to mine large collections of electronic health records for temporal

patterns associating drug prescriptions to medical diagnoses.
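Screening for unusual reporting patterns is often framed as a disproportionality question: is a drug reported together with a reaction more often than would be expected if the two were independent? The sketch below is a generic illustration of that idea, not the Centre's actual method. The function name, the 0.5 shrinkage constant and all counts are assumptions made for the example.

```python
import math

def information_component(n_drug_event, n_drug, n_event, n_total):
    """log2 of observed over expected co-reporting, damped by a 0.5 shrinkage term.

    n_drug_event: reports mentioning both the drug and the reaction
    n_drug:       reports mentioning the drug
    n_event:      reports mentioning the reaction
    n_total:      all reports in the database
    """
    expected = n_drug * n_event / n_total
    return math.log2((n_drug_event + 0.5) / (expected + 0.5))

# Invented counts: the pair is expected 5 times under independence.
ic_signal = information_component(40, 500, 1000, 100_000)   # observed 40
ic_neutral = information_component(5, 500, 1000, 100_000)   # observed 5
print(ic_signal, ic_neutral)
```

A clearly positive value marks a drug-reaction pair that is reported together more often than expected and may deserve clinical review. A value near zero indicates no disproportion.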

Privacy concerns and ethics

Some people believe that data mining itself is ethically neutral. However, the way that data mining is used can raise questions regarding privacy, legality, and ethics. In particular, data mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE, has raised privacy concerns.

Data mining can uncover information or patterns which may compromise confidentiality and privacy obligations. A common way for this to occur is through data aggregation, in which data, possibly from various sources, is put together so that it can be analyzed. The threat to an individual's privacy arises when the compiled data allows the data miner to identify specific individuals, especially when the data was originally anonymous.

It is recommended that an individual be made aware of the following before data is collected:

* the purpose of the data collection and any data mining projects,
* how the data will be used,
* who will be able to mine the data and use it,
* the security surrounding access to the data, and in addition,
* how collected data can be updated.

One may additionally modify the data so that it is anonymous, so that individuals may not be readily identified.
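Removing names alone does not always make data anonymous, because combinations of remaining attributes can still single out a person once data sets are aggregated. One common way to reason about this, not named in the text above, is k-anonymity: every combination of identifying attributes should be shared by at least k records. The sketch below uses invented records and hypothetical field names (`age`, `postcode`, `diagnosis`) to show how coarsening values raises k.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize(record):
    """Coarsen age to a 10-year band and keep only the first 3 postcode digits."""
    decade = record["age"] // 10 * 10
    return {
        **record,
        "age": f"{decade}-{decade + 9}",
        "postcode": record["postcode"][:3] + "**",
    }

# Invented records with names already removed.
records = [
    {"age": 34, "postcode": "40001", "diagnosis": "A"},
    {"age": 36, "postcode": "40002", "diagnosis": "B"},
    {"age": 52, "postcode": "41010", "diagnosis": "A"},
    {"age": 58, "postcode": "41011", "diagnosis": "C"},
]
quasi_identifiers = ["age", "postcode"]

k_before = k_anonymity(records, quasi_identifiers)
k_after = k_anonymity([generalize(r) for r in records], quasi_identifiers)
print(k_before, k_after)
```

Before generalization every record is unique on age and postcode, so anyone who knows those two facts about a person can find that person's row. After generalization each combination covers two records. The trade-off is that coarser values are also less useful for analysis.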

Business Performance Management

Business performance management (BPM), also known as corporate performance management, enterprise performance management, operational performance management or business performance optimization, consists of a set of processes that help organizations optimize their business performance. It provides a framework for organizing, automating and analyzing the business methodologies, metrics, processes and systems that drive business performance. Some commentators see BPM as the next generation of business intelligence (BI). BPM helps businesses make efficient use of their financial, human, material and other resources.

In the past, owners sought to drive strategy down and across their organizations, struggled to transform strategies into actionable metrics, and grappled with meaningful analysis to expose the cause-and-effect relationships that, if understood, could give profitable insight to their operational decision-makers.

Corporate performance management (CPM) software and methods allow a systematic, integrated approach that links enterprise strategy to core processes and activities. "Running by the numbers" now means something: planning, budgeting, analysis and reporting can give the measurements that empower management decisions.

History

Reference to non-business performance management occurs in Sun Tzu's The Art of War. Sun Tzu claims that to succeed in war, one should have full knowledge of one's own strengths and weaknesses and full knowledge of one's enemy's strengths and weaknesses. Lack of either one might result in defeat. A certain school of thought draws parallels between the challenges in business and those of war, specifically:

* collecting data - both internal and external
* discerning patterns and meaning in the data (analyzing)
* responding to the resultant information

Prior to the start of the Information Age in the late 20th century, businesses sometimes took the trouble to laboriously collect data from non-automated sources. As they lacked computing resources to properly analyze the data, they often made commercial decisions primarily on the basis of intuition.

As businesses started automating more and more systems, more and more data became available. However, collection remained a challenge due to a lack of infrastructure for data exchange or due to incompatibilities between systems. Reports on the data gathered sometimes took months to generate. Such reports allowed informed long-term strategic decision-making. However, short-term tactical decision-making often continued to rely on intuition.

Increasing standards, automation, and technologies have led to vast amounts of data becoming available. Data warehouse technologies have set up repositories to store this data. Improved ETL and Enterprise Application Integration tools have sped up the collection of data. OLAP reporting technologies have allowed faster generation of new reports which analyze the data. Business intelligence has now become the art of sieving through large amounts of data, extracting useful information and turning that information into actionable knowledge.

In 1989 Howard Dresner, later a research analyst at Gartner, popularized "Business Intelligence" as an umbrella term to describe a set of concepts and methods to improve business decision-making by using fact-based support systems. Performance Management builds on a foundation of BI, but marries it to the planning and control cycle of the enterprise - with enterprise planning, consolidation and modeling capabilities.

Use of the term "BPM" can cause confusion with "Business Process Management", and many have started using terms like "Corporate Performance Management" or "Enterprise Performance Management".

Definition and scope

Business performance management consists of a set of management and analytic processes, supported by technology, that enable businesses to define strategic goals and then measure and manage performance against those goals. Core BPM processes include financial and operational planning, consolidation and reporting, business modeling, analysis, and monitoring of key performance indicators linked to strategy.

BPM involves consolidating data from various sources, querying and analyzing the data, and putting the results into practice.

BPM enhances processes by creating better feedback loops. Continuous and real-time reviews can help to identify and eliminate problems before they grow. BPM's forecasting abilities help companies take corrective action in time to meet earnings projections. Forecasting is characterized by a high degree of predictability, which is put to good use in answering what-if scenarios.

BPM can help in risk analysis, in predicting outcomes of merger and acquisition scenarios, and in planning to overcome potential problems.

BPM provides key performance indicators (KPIs) that help companies monitor the efficiency of projects and employees against operational targets.
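A what-if scenario of the kind mentioned above can be sketched as a small model that is re-run under different assumptions and compared against a target. The linear earnings model, the scenario names and every figure below are invented for illustration; a real BPM tool would draw them from consolidated planning data.

```python
def projected_earnings(revenue, growth, cost_ratio, fixed_costs):
    """Next-period earnings under a simple linear model (an assumption of this sketch)."""
    next_revenue = revenue * (1 + growth)
    return next_revenue * (1 - cost_ratio) - fixed_costs

# Hypothetical scenarios: each changes growth and variable-cost assumptions.
scenarios = {
    "baseline": {"growth": 0.05, "cost_ratio": 0.60},
    "price_cut": {"growth": 0.12, "cost_ratio": 0.66},
    "cost_program": {"growth": 0.05, "cost_ratio": 0.55},
}
earnings_target = 300.0

results = {
    name: projected_earnings(1000.0, fixed_costs=150.0, **params)
    for name, params in scenarios.items()
}
meets_target = {name: value >= earnings_target for name, value in results.items()}

for name in scenarios:
    print(f"{name}: {results[name]:.1f} meets target: {meets_target[name]}")
```

With these made-up numbers only the cost program reaches the earnings target, which is the kind of comparison that supports taking corrective action in time.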

Methodologies

Various methodologies for implementing BPM exist. The discipline gives companies a top-down framework by which to align planning and execution, strategy and tactics, and business unit and enterprise objectives. These may include the Six Sigma strategy, balanced scorecard, activity-based costing (ABC), Total Quality Management, economic value added, and integrated strategic measurement. The balanced scorecard is the most widely adopted performance management methodology.

Methodologies on their own cannot deliver a full solution to an enterprise's CPM needs. Many pure methodology implementations fail to deliver the anticipated benefits due to lack of integration with the fundamental CPM processes.

Metrics / Key Performance Indicators

For business data analysis to become a useful tool, an enterprise must understand its goals and objectives; essentially, it must know the desired direction of progress. To help with this analysis, key performance indicators (KPIs) are prescribed to assess the present state of the business and to prescribe a course of action.

Metrics and KPIs are critical in prioritizing what has to be measured. The methodology used helps in determining the metrics to be used by the organization. Managerial folk-wisdom says that one cannot manage what cannot be measured. Identifying the key metrics and determining how they are to be measured helps organizations monitor performance across the board without getting deluged by a surfeit of data, a scenario plaguing most companies.

More and more organizations have started to make data available more rapidly. In the past, some data only became available after a month or two, which did not help managers react swiftly enough. Recently, banks have tried to make data available at shorter intervals and have reduced delays. For example, for businesses which have higher operational/credit risk loading (for example, credit cards and "wealth management"), a large multi-national bank makes KPI-related data available weekly, and sometimes offers a daily analysis of numbers. It also provides real-time dashboards. Data can become available within 24 hours, given automation and the use of IT systems.

Most of the time, BPM simply means use of several financial/non-financial metrics/key performance indicators to assess the present state of a business and to prescribe a course of action.

Some of the areas from which top management could gain knowledge by using BPM may include:

1. customer-related numbers:
   1. new customers acquired
   2. status of existing customers
   3. attrition of customers (including breakup by reason for attrition)
2. turnover generated by segments of the customers - possibly using demographic filters
3. outstanding balances held by segments of customers and terms of payment - possibly using demographic filters
4. collection of bad debts within customer relationships
5. demographic analysis of individuals (potential customers) applying to become customers, and the levels of approval, rejections and pending numbers
6. delinquency analysis of customers behind on payments
7. profitability of customers by demographic segments and segmentation of customers by profitability
8. campaign management
9. real-time dashboard on key operational metrics
   1. overall equipment effectiveness
10. clickstream analysis on a website
11. key product portfolio trackers
12. marketing channel analysis
13. sales data analysis by product segments
14. call center metrics

The above list more or less describes what a bank might monitor, but could also refer to a telephone company or similar service-sector company.
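A few of the metrics in the list above can be computed directly from operational records. The sketch below covers attrition with a breakup by reason, delinquency, and overall equipment effectiveness in its commonly used form as the product of availability, performance and quality. The function names, the 30-day delinquency threshold and all input values are assumptions made for illustration.

```python
from collections import Counter

def attrition_report(customers_at_start, closed_accounts):
    """Attrition rate for a period plus a breakup by reason.

    closed_accounts: list of (customer_id, reason) tuples for the period.
    """
    by_reason = Counter(reason for _, reason in closed_accounts)
    rate = len(closed_accounts) / customers_at_start
    return rate, dict(by_reason)

def delinquency_rate(days_overdue, threshold_days=30):
    """Share of accounts more than threshold_days behind on payments."""
    overdue = sum(1 for days in days_overdue if days > threshold_days)
    return overdue / len(days_overdue)

def overall_equipment_effectiveness(availability, performance, quality):
    """OEE as the product of its three factors, each given as a fraction of 1."""
    return availability * performance * quality

# Invented sample data.
closed = [(101, "price"), (102, "service"), (103, "price"), (104, "moved")]
attrition, reasons = attrition_report(200, closed)
delinquency = delinquency_rate([0, 5, 45, 90, 0, 12, 31, 0])
oee = overall_equipment_effectiveness(0.90, 0.95, 0.98)

print(f"attrition {attrition:.1%}, by reason {reasons}")
print(f"delinquency {delinquency:.1%}, OEE {oee:.1%}")
```

Keeping each KPI as a small, separately defined function makes the definition explicit, which matters because the same name is often calculated differently across departments.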

Items of generic importance might include:

1. consistent and correct KPI-related data providing insights into operational aspects of a company

2. timely availability of KPI-related data

3. KPIs designed to directly reflect the efficiency and effectiveness of a business

4. information presented in a format which aids decision-making for management and decision-makers

5. ability to discern patterns or trends from organized information

BPM integrates the company's processes with CRM or ERP. Companies should become better able to

gauge customer satisfaction, control customer trends and influence shareholder value.

Application software types

People working in business intelligence have developed tools that ease the work, especially when the intelligence task involves gathering and analyzing large amounts of unstructured data.

Tool categories commonly used for business performance management include:

* OLAP — online analytical processing, sometimes simply called "analytics" (based on dimensional

analysis and the so-called "hypercube" or "cube")

* scorecarding, dashboarding and data visualization

* data warehouses

* document warehouses

* text mining

* DM — data mining

* BPO — business performance optimisation

* EIS — executive information systems

* DSS — decision support systems

* MIS — management information systems

* SEMS — strategic enterprise management software

* business dashboards

Design and implementation

Issues when implementing a BPM program might include:

* goal-alignment queries: one must first determine the short- and medium-term purpose of the

program. What strategic goal(s) of the organization will be addressed by the program? What

organizational mission/vision does it relate to? A hypothesis needs to be crafted that details how this

initiative will eventually improve results / performance (i.e. a strategy map).

* baseline queries: current information-gathering competency needs assessing. Do we have the

capability to monitor important sources of information? What data is being collected and how is it

being stored? What are the statistical parameters of this data, e.g., how much random variation does

it contain? Is this being measured?

* cost and risk queries: someone should estimate the financial consequences of a new BI initiative.

It is necessary to assess the cost of the present operations and the increase in costs associated with

the BPM initiative. What is the risk that the initiative will fail? This risk assessment should be

converted into a financial metric and included in the planning.

* customer and stakeholder queries: determine who will benefit from the initiative and who will

pay. Who has a stake in the current procedure? What kinds of customers / stakeholders will benefit

directly from this initiative? Who will benefit indirectly? What quantitative / qualitative benefits follow?

Is the specified initiative the best way to increase satisfaction for all kinds of customers, or is there a

better way? How will customer benefits be monitored? What about employees, shareholders, and

distribution channel members?

* metrics-related queries: information requirements need operationalization into clearly defined

metrics. One must decide what metrics to use for each piece of information being gathered. Are these

the best metrics? How do we know that? How many metrics need to be tracked? If this is a large

number (it usually is), what kind of system can track them? Are the metrics standardized, so they can

be benchmarked against performance in other organizations? What are the industry standard metrics available?

* measurement methodology-related queries: one should establish a methodology or a procedure

to determine the best (or acceptable) way of measuring the required metrics. What methods will be

used, and how frequently will data be collected? Are there any industry standards for this? Is this the

best way to do the measurements? How do we know that?

* results-related queries: someone should monitor the BPM program to ensure that it meets objectives. Adjustments to the program may be necessary. The program should be tested for accuracy, reliability, and validity. How can it be demonstrated that the BI initiative, and not something else, contributed to a change in results? How much of the change was probably random?
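Two of the questions above, how much random variation the data contains and how much of a change was probably random, can be approached with basic statistics. The sketch below measures the spread of a metric before an initiative and then uses a permutation test, one possible technique among several, to estimate how often a change at least as large would appear by chance. The weekly KPI values are invented.

```python
import random
import statistics

def permutation_p_value(before, after, n_permutations=5000, seed=0):
    """Share of random relabellings whose mean increase is at least the observed one."""
    rng = random.Random(seed)
    observed = statistics.fmean(after) - statistics.fmean(before)
    pooled = list(before) + list(after)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        fake_before = pooled[:len(before)]
        fake_after = pooled[len(before):]
        if statistics.fmean(fake_after) - statistics.fmean(fake_before) >= observed:
            hits += 1
    return hits / n_permutations

# Invented weekly values of one KPI before and after the initiative.
before = [98, 102, 101, 97, 100, 99, 103, 100]
after = [104, 107, 103, 106, 105, 108, 104, 106]

baseline_spread = statistics.stdev(before)      # random variation before the change
observed_change = statistics.fmean(after) - statistics.fmean(before)
p_value = permutation_p_value(before, after)

print(baseline_spread, observed_change, p_value)
```

A small p-value indicates that the change is unlikely to be random noise. It does not show that the initiative caused it, so other explanations such as seasonality or concurrent programs still have to be ruled out separately.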

Text Mining

Text mining, sometimes referred to as text data mining and roughly equivalent to text analytics, refers generally to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
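The steps described above (structuring the text, deriving patterns, interpreting the output) can be sketched for the simplest of the listed tasks, text categorization. The example structures each text as a bag of word counts and assigns the label of the most similar labelled example by cosine similarity. The stop-word list, the labels and the sample texts are all invented, and real systems use far richer linguistic features.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "for", "was", "on"}

def structure(text):
    """Parse raw text into a bag of lower-cased word counts, dropping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def categorize(text, labelled_examples):
    """Assign the label of the most similar labelled example (nearest neighbour)."""
    doc = structure(text)
    best = max(labelled_examples, key=lambda ex: cosine(doc, structure(ex[0])))
    return best[1]

# Invented labelled examples.
examples = [
    ("The delivery was late and the package arrived damaged", "complaint"),
    ("Great service and fast delivery, very happy", "praise"),
    ("How do I reset the password for my account", "question"),
]

label = categorize("My package arrived damaged", examples)
print(label)
```

The new message shares three content words with the first example and at most one with the others, so it is labelled as a complaint.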

Technology Assessment

Technology assessment (TA) is the study and evaluation of new technologies. It is based on the

conviction that new developments within, and discoveries by, the scientific community are relevant for

the world at large rather than just for the scientific experts themselves, and that technological

progress can never be free of ethical implications. Also, technology assessment recognizes the fact that scientists normally are not trained ethicists themselves and accordingly ought to be very careful

when passing ethical judgement on their own, or their colleagues', new findings, projects, or work in progress.

Technology assessment assumes a global perspective and is future-oriented rather than backward-

looking or anti-technological. ("Scientific research and science-based technological innovation is an

indispensable prerequisite of modern life and civilization. There is no alternative. For six or eight

billion people there is no way back to a less sophisticated life style."). TA considers its task to be an interdisciplinary approach to solving already existing problems and preventing potential damage caused by the uncritical application and the commercialization of new technologies. Therefore any results of technology assessment studies must be published, and particular consideration must be given to communication with political decision-makers.

Technology transfer is the process of sharing of skills, knowledge, technologies, methods of

manufacturing, samples of manufacturing and facilities among governments and other institutions to

ensure that scientific and technological developments are accessible to a wider range of users who

can then further develop and exploit the technology into new products, processes, applications,

materials or services. It is closely related to (and may arguably be considered a subset of) Knowledge

transfer. Related terms, used almost synonymously, include "technology valorisation" and "technology

commercialisation". While conceptually the practice has been utilized for many years (in ancient

times, Archimedes was notable for applying science to practical problems), the present-day volume of research, combined with high-profile failures at Xerox PARC and elsewhere, has led to a focus on the

process itself.

Many companies, universities and governmental organizations now have an "Office of Technology

Transfer" (also known as "Tech Transfer" or "TechXfer") dedicated to identifying research which has

potential commercial interest and strategies for how to exploit it. For instance, a research result may

be of scientific and commercial interest, but patents are normally only issued for practical processes,

and so someone -- not necessarily the researchers -- must come up with a specific practical process.

Another consideration is commercial value; for example, while there are many ways to accomplish

nuclear fusion, the ones of commercial value are those that generate more energy than they require

to operate.

The process to commercially exploit research varies widely. It can involve licensing agreements or

setting up joint ventures and partnerships to share both the risks and rewards of bringing new

technologies to market. Other corporate vehicles, e.g. spin-outs, are used where the host organization

does not have the necessary will, resources or skills to develop a new technology. Often these approaches are associated with raising of venture capital (VC) as a means of funding the development

process, a practice more common in the US than in the EU, which has a more conservative approach

to VC funding.

Technology transfer offices may work on behalf of research institutions, governments and even large

multinationals. Where start-ups and spin-outs are the clients, commercial fees are sometimes waived

in lieu of an equity stake in the business. As a result of the potential complexity of the technology

transfer process, technology transfer organizations are often multidisciplinary, including economists,

engineers, lawyers, marketers and scientists. The dynamics of the technology transfer process have attracted attention in their own right, and there are several dedicated societies and journals.

In recent years, there has been a marked increase in technology transfer intermediaries specialized in

their field. This was stimulated in large part by the Bayh-Dole Act and equivalent legislation in other

countries, which provided additional incentives for research exploitation.
