BA

Satinderpal Kaur MBA3(D)

What is Business Analytics?

Business Analytics (BA) is not a new phenomenon. It has been around for many years, but predominantly in companies operating in technically oriented environments. Only recently has it made its breakthrough, and we can see more and more companies, especially in the financial and telecom sectors, adopt business analytics to support business processes and improve performance. So what does business analytics refer to?

Business Analytics is translating data into the information that business owners need to make informed decisions and investments. It is the difference between running a business on a hunch or intuition versus looking at collected data and predictive analysis. It is a way of organizing and converting data into information to help answer questions about the business. It leads to better decision making by looking for patterns and trends in the data and by being able to forecast the impact of decisions before they are taken.

BA can serve the whole company, and all C-level executives can take advantage of it. For example, Chief Marketing Officers (CMOs) can use BA to gain better customer insight and enhance customer loyalty. Chief Financial Officers (CFOs) can better manage financial performance and use financial forecasts. Chief Risk Officers (CROs) can get a holistic view of risk, fraud and compliance information across the organization and take action. Chief Operating Officers (COOs) can get better insight into supply chains and operations and enhance efficiency.

Companies use business analytics to support data-driven decision making. To be successful, they need to treat their data as a corporate asset and leverage it for competitive advantage. Successful business analytics depends on data quality, on skilled analysts who understand the technologies and the business, and on organizational commitment to data-driven decision making.

Examples of BA uses include: 

- Exploring data to find new relationships and patterns (data mining)
- Explaining why a certain result occurred (statistical analysis, quantitative analysis)
- Experimenting to test previous decisions (A/B testing, multivariate testing)
- Forecasting future results (predictive modeling, predictive analytics)

Why Is Business Analytics important?

Becoming an analytics-driven organization helps companies extract insights from their enterprise data and achieve cost reduction, revenue growth and improved competitiveness. This is why business analytics is one of the top priorities for CIOs. An IBM study shows that CFOs in organizations that make extensive use of analytics report growth in revenues of 36 percent or more, a 15 percent greater return on invested capital, and twice the rate of growth in EBITDA (earnings before interest, taxes, depreciation and amortization).


Business Analytics helps you make better, faster decisions and automate processes. It helps you address key questions and ensures you stay one step ahead of your competition. Some of the basic questions in a retail environment could be:

- How big should a store be?
- What market segments should be targeted?
- How should a certain market segment be targeted in terms of products, styles, price points, store environment and location?
- Who are our customers?
- How should space in the store be allocated to the various product groups and price points for maximum profitability?
- What mitigation strategies are effective and cost efficient – for example, changes to packaging, fixtures, placement of product?
- What is the best customer loyalty program for our customers?
- What is the optimal staffing level on the sales floor?
- How many checkouts are optimal in a store?
- Would acquisition of a particular store brand improve profitability?
- Would creation of a new store brand improve profitability?

DATA MODELING

Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of the information system.

According to Hoberman, data modeling is the process of learning about the data, and the data model is the end result of the data modeling process. In other words, data modeling is a technique by which data is converted into an easily understandable form, a blueprint, that supports decision making.

For example: a company wants to build a guest house (the end result). It calls a building architect (the data modeler), who establishes what is required (the business requirements) and then draws up a plan for how to do it, the blueprint (the developed data model).

In other words

Data modeling is the formalization and documentation of existing processes and events that occur during application software design and development. Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data flows and processes, creating a blueprint for construction and/or re-engineering.


A data model can be thought of as a diagram or flowchart that illustrates the relationships between data.

There are several different approaches to data modeling, including:

1) Conceptual Data Model
2) Logical Data Model
3) Physical Data Model

1) Conceptual Data Model

A conceptual data model identifies the highest-level relationships between the different entities. Features of a conceptual data model include:

- Includes the important entities and the relationships among them.
- No attribute is specified.
- No primary key is specified.

The figure below is an example of a conceptual data model.


2) Logical Data Model

A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database. Features of a logical data model include:

- Includes all entities and the relationships among them.
- All attributes for each entity are specified.
- The primary key for each entity is specified.
- Foreign keys (keys identifying the relationship between different entities) are specified.
- Normalization occurs at this level.

3) Physical Data Model

A physical data model represents how the model will be built in the database. A physical database model shows all table structures, including column names, column data types, column constraints, primary keys, foreign keys, and relationships between tables. Features of a physical data model include:

- Specification of all tables and columns.
- Foreign keys are used to identify relationships between tables.
- De-normalization may occur based on user requirements.
- Physical considerations may cause the physical data model to be quite different from the logical data model.
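To make this concrete, here is a minimal sketch of what a physical data model might look like when expressed as SQL DDL, run through Python's built-in sqlite3 module. The tables, columns and the index are illustrative assumptions, not taken from any real system:

    import sqlite3

    # A throwaway in-memory database, for illustration only.
    conn = sqlite3.connect(":memory:")

    # Physical model: concrete tables, column data types, constraints,
    # primary keys and foreign keys (details a logical model leaves out).
    conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        salary   REAL CHECK (salary >= 0),
        dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
    );
    -- A purely physical consideration: an index added for query speed.
    CREATE INDEX idx_employee_dept ON employee(dept_id);
    """)
    conn.close()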


DATA MODELING TECHNIQUES

Data Modeling identifies the data or information a system needs to be able to store, maintain, or provide access to. Business analysts often use a combination of diagrams, textual descriptions, and matrices to model data. Each modeling technique helps us analyze and communicate different information about the data-related requirements. In this article, we’ll look at 4 different data modeling techniques and discuss when and how to use each technique.

1) Entity Relationship Diagram

An Entity Relationship Diagram (ERD) models entities, relationships, and attributes. Here’s a simple ERD (a more complex ERD is included in the Visual Model Sample Pack).


In this example:

“Customer” and “Order” are entities. The items listed inside each entity, such as “Customer Name”, are attributes of the entity. The line connecting “Customer” and “Order” shows the relationship between the two entities, specifically that a Customer can have 0 to many (or any number of) orders.

ERDs can be used to model data at multiple levels of specificity, from the low-level physical database model to mid-level logical database model, to the high-level business domain model.

An ERD is a good choice if you have multiple concepts or database tables and are analyzing the boundaries of each concept or table. By defining the attributes, you figure out what belongs with each entity. By defining the relationships, you figure out how each entity relates to the other entities in your model.
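As an illustration, the Customer/Order ERD above could be realized as tables roughly like this; a hypothetical sketch using Python's built-in sqlite3 module, with made-up names and data. The LEFT JOIN at the end shows the "0 to many" relationship in action:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE "order" (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO "order" VALUES (10, '2024-01-05', 1), (11, '2024-02-09', 1);
    """)
    # A customer can have 0 to many orders.
    for row in conn.execute("""
        SELECT c.customer_name, COUNT(o.order_id)
        FROM customer c
        LEFT JOIN "order" o ON o.customer_id = c.customer_id
        GROUP BY c.customer_name
    """):
        print(row)   # ('Asha', 2) then ('Ravi', 0)
    conn.close()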

2) Data Matrix

A Data Matrix provides more detailed information about the data model and can take a variety of different forms. Typically a Data Matrix is captured in a spreadsheet format and contains a list of attributes, along with additional information about each attribute. Some common types of additional information that might be captured in a column in a data matrix include the following:

- Data Type
- Allowable Values
- Required or Optional
- Sample Data
- Notes

A Data Matrix is a good choice when it’s necessary to analyze detailed information about each attribute in your data model. This information is often used to design and build the physical database, and so is needed by the data architect or database developer. A sample data matrix is included with the Data Model sample in the Visual Model Sample Pack. (For example, a marketer whose sales are growing in every state of India wants to know why; no single piece of information can explain it, since a number of factors, such as good quality and lower prices, may be driving the increase.)
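For illustration, a few rows of such a data matrix could be held as plain Python dictionaries before being moved to a spreadsheet; every attribute, value and note below is invented for the example:

    # One dictionary per attribute; the keys mirror the columns listed above.
    data_matrix = [
        {"attribute": "customer_name", "data_type": "text",
         "allowable_values": "any", "required": True,
         "sample_data": "Asha Mehta", "notes": "display name"},
        {"attribute": "loyalty_tier", "data_type": "text",
         "allowable_values": "bronze|silver|gold", "required": False,
         "sample_data": "gold", "notes": "drives discount rules"},
    ]
    for row in data_matrix:
        print(row["attribute"], "|", row["data_type"], "|",
              row["allowable_values"], "| required:", row["required"])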

3) Data Mapping Specification

A Data Mapping Specification shows how information stored in two different databases connect to each other. The databases are often part of two different information technology systems which may be owned by your organization, your organization and a third party vendor, or two cooperating organizations.


For example, when I worked for an online job board company, we created a data mapping specification to define how we’d import job content from some of our bigger clients who did not wish to manually input the details of each job using our employer portal.

Any time you are connecting two systems together through a data exchange or import, a data mapping specification will be a good choice. A sample data mapping specification and template are included in the Business Analyst Template Toolkit.
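At its core, a data mapping specification records which source field feeds which target field, and with what conversion. Below is a minimal, hypothetical Python sketch of that idea; the field names and transforms are assumptions, not from any real job-board system:

    # target field -> (source field, conversion to apply)
    mapping = {
        "job_title":  ("PositionName", str.strip),
        "job_city":   ("City",         str.title),
        "salary_inr": ("AnnualSalary", float),
    }

    def map_record(source_record):
        """Translate one record from the client's layout to ours."""
        return {target: convert(source_record[source])
                for target, (source, convert) in mapping.items()}

    client_row = {"PositionName": " Data Analyst ", "City": "pune",
                  "AnnualSalary": "850000"}
    print(map_record(client_row))
    # {'job_title': 'Data Analyst', 'job_city': 'Pune', 'salary_inr': 850000.0}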

4) Data Flow Diagram

A Data Flow Diagram illustrates how information flows through, into, and out of a system. Data Flow Diagrams can be created using a simple workflow diagram or one of two formal notations listed in the BABOK® Guide – the Yourdon Notation or the Gane-Sarson Notation.

A Data Flow Diagram does not tell you much about what data is created or maintained by a system, but it does tell you a lot about how the data flows through the system or a set of inter-connected systems. A Data Flow Diagram shows the data stores, data processes, and data outputs.

A Data Flow Diagram is a good choice if your data goes through a lot of processing, as it helps clarify when and how those processes are executed. Then, each data store could be modeled using an ERD and/or Data Matrix and each process using a Data Mapping Specification. Samples of data flow diagrams in all three notations are included in the Visual Model Sample Pack.

MULTIDIMENSIONAL MODELING

Dimensional Data Model

A dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would be stored differently in a dimensional model than in a 3rd normal form model. To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:

Dimension: A category of information. For example, the time dimension.

Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.

Hierarchy: The specification of levels that represents the relationships between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.

Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A date column, a store column, and a sales amount column.
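For example, the "sales amount by store by day" fact table just described might look like the following pandas sketch (store names and figures are invented); summing the measure over stores shows the granularity at work:

    import pandas as pd

    # A fact table at "sales amount by store by day" granularity.
    fact = pd.DataFrame({
        "date":         ["2024-03-01", "2024-03-01", "2024-03-02"],
        "store":        ["S1", "S2", "S1"],
        "sales_amount": [1200.0, 950.0, 1430.0],
    })
    # Rolling the measure up from day level to store level:
    print(fact.groupby("store")["sales_amount"].sum())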


SCHEMA

1) Star Schema

In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.

Sample star schema

All measures in the fact table are related to all the dimensions that the fact table is related to. In other words, they all have the same level of granularity.

A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.

Let's look at an example: Assume our data warehouse keeps store sales data, and the different dimensions are time, store, product, and customer. In this case, the figure on the left represents our star schema. The lines between two tables indicate that there is a primary key / foreign key relationship between the two tables. Note that different dimensions are not related to one another.
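A minimal sketch of such a star schema in SQL DDL, via Python's sqlite3 (all names are assumptions), makes the primary key / foreign key structure explicit; a typical query then joins the central fact table out to its dimensions:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Fact table in the middle; each dimension's primary key
    # reappears as a foreign key in the fact table.
    conn.executescript("""
    CREATE TABLE time_dim    (time_id    INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales_fact (
        time_id      INTEGER REFERENCES time_dim(time_id),
        store_id     INTEGER REFERENCES store_dim(store_id),
        product_id   INTEGER REFERENCES product_dim(product_id),
        sales_amount REAL
    );
    """)
    # Sales by store: join the fact table out to a dimension.
    query = """
        SELECT s.store_name, SUM(f.sales_amount)
        FROM sales_fact f
        JOIN store_dim s ON s.store_id = f.store_id
        GROUP BY s.store_name
    """
    print(conn.execute(query).fetchall())   # [] until rows are loaded
    conn.close()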


2) Snowflake Schema

The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

 Sample snowflake schema

For example, consider a Time Dimension that consists of 2 different hierarchies:

1. Year → Month → Day
2. Week → Day

We will have 4 lookup tables in a snowflake schema: A lookup table for year, a lookup table for month, a lookup table for week, and a lookup table for day. Year is connected to Month, which is then connected to Day. Week is only connected to Day. A sample snowflake schema illustrating the above relationships in the Time Dimension is shown to the right.
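A rough DDL sketch of those four lookup tables, again via Python's sqlite3 (table and column names are assumptions): Year connects to Month, Month to Day, and Week to Day:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE year_lu  (year_id  INTEGER PRIMARY KEY, year  INTEGER);
    CREATE TABLE month_lu (month_id INTEGER PRIMARY KEY, month TEXT,
                           year_id  INTEGER REFERENCES year_lu(year_id));
    CREATE TABLE week_lu  (week_id  INTEGER PRIMARY KEY, week  INTEGER);
    CREATE TABLE day_lu   (day_id   INTEGER PRIMARY KEY, day   TEXT,
                           month_id INTEGER REFERENCES month_lu(month_id),
                           week_id  INTEGER REFERENCES week_lu(week_id));
    """)
    conn.close()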

The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance effort needed due to the increased number of lookup tables.


3) Fact Constellation

A fact constellation schema contains multiple fact tables that share common dimension tables. Because it can be viewed as a collection of star schemas, it is also called a galaxy schema.

DATA MART

A data mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses, which are usually smaller than the corporate data warehouse. A data mart is an indexing and extraction system: instead of putting the data from all the departments of a company into one warehouse, it contains the databases of separate departments and can come up with information using multiple databases when asked.

IT managers of any growing company often face the question of whether to use data marts or to switch over to the more complex and more expensive data warehousing. These tools are easily available in the market, but pose a dilemma to IT managers.

Difference between Data Warehousing and Data Mart

It is important to note that there are big differences between these two tools, though they may serve the same purpose. Firstly, a data mart contains the programs, data, software and hardware of a specific department of a company. There can be separate data marts for finance, sales, production or marketing. All these data marts are different, but they can be coordinated. The data mart of one department is different from the data mart of another department, and though indexed, this system is not suitable for a huge database, as it is designed to meet the requirements of a particular department.


Data warehousing is not limited to a particular department; it represents the database of a complete organization. The data stored in a data warehouse is more detailed, though indexing is light as it has to store huge amounts of information. It is also difficult to manage and takes a long time to process. It follows that data marts are quick and easy to use, as they make use of smaller amounts of data. Data warehousing is also more expensive for the same reason.

DATA WAREHOUSING

A data warehouse is a collection of data marts representing historical data from different operations in the company. This data is stored in a structure optimized for querying and data analysis. Table design, dimensions and organization should be consistent throughout a data warehouse so that reports or queries across the data warehouse are consistent. A data warehouse can also be viewed as a database for historical data from different functions within a company. This is the place where all the data of a company is stored. It is actually a very fast computer system with a large storage capacity. It contains data from all the departments of the company and is constantly updated to delete redundant data. This tool can answer complex queries pertaining to the data.

DATA INTEGRATION

Data integration involves combining data from several disparate sources, which are stored using various technologies, to provide a unified view of the data. Data integration becomes increasingly important in cases of merging the systems of two companies or consolidating applications within one company to provide a unified view of the company's data assets. The latter initiative is often called a data warehouse.

Probably the most well-known implementation of data integration is building an enterprise's data warehouse. A data warehouse enables a business to perform analyses based on the data it holds, which would not be possible on the data available only in the source systems. The reason is that the source systems may not contain corresponding data; even where data are identically named, they may refer to different entities.

EXTRACT, TRANSFORM AND LOAD

The term ETL, which stands for extract, transform and load, refers to a three-stage process in database usage and data warehousing. It enables the integration and analysis of data stored in different databases and heterogeneous formats. After it is collected from multiple sources (extraction), the data is reformatted and cleansed for operational needs (transformation). Finally, it is loaded into a target database, data warehouse or data mart to be analyzed. Most extraction and transformation tools also enable loading of the data into the end target. Beyond data warehousing and business intelligence, ETL tools can also be used to move data from one operational system to another.


EXTRACT

The purpose of the extraction process is to reach the source systems and collect the data needed for the data warehouse. Usually data is consolidated from different source systems that may use a different data organization or format, so the extraction must convert the data into a format suitable for transformation processing. The complexity of the extraction process may vary, and it depends on the type of source data. The extraction process also includes selection of the data, as the source usually contains redundant data or data of little interest. For the ETL extraction to be successful, it requires an understanding of the data layout. A good ETL tool additionally enables storage of an intermediate version of the data being extracted. This is called the "staging area" and makes it possible to reload raw data in case of a later loading problem, without re-extraction. The raw data should also be backed up and archived.

TRANSFORM

The transform stage of an ETL process involves applying a series of rules or functions to the extracted data. It includes validation of records and their rejection if they are not acceptable, as well as an integration step. The amount of manipulation needed in the transformation process depends on the data: good data sources will require little transformation, whereas others may require one or more transformation techniques to meet the business and technical requirements of the target database or data warehouse. The most common processes used for transformation are conversion, clearing duplicates, standardizing, filtering, sorting, translating, and looking up or verifying data when the sources are inconsistent. A good ETL tool must enable the building of complex processes and the extension of its tool library so that custom user functions can be added.

LOAD

Loading is the last stage of the ETL process, and it loads the extracted and transformed data into a target repository. There are various ways in which ETL tools load the data. Some of them physically insert each record as a new row into the table of the target warehouse using a built-in SQL INSERT statement, whereas others link the extraction, transformation, and loading processes for each record from the source. The loading part is usually the bottleneck of the whole process. To increase efficiency with larger volumes of data, we may need to bypass SQL and database recovery mechanisms, or apply an external high-performance sort, which additionally improves performance.
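Putting the three stages together, here is a minimal, self-contained Python sketch of one ETL run; the source layout, cleaning rules and target table are all assumptions made for illustration:

    import csv, io, sqlite3

    # Extract: read raw rows from a source (a CSV string stands in
    # for a real source system here).
    raw = "name,amount\n asha ,100\nasha,100\nravi,2 50\n"
    rows = list(csv.DictReader(io.StringIO(raw)))

    # Transform: standardize, reject unacceptable records, drop duplicates.
    seen, clean = set(), []
    for r in rows:
        name = r["name"].strip().title()
        try:
            amount = float(r["amount"])
        except ValueError:
            continue                     # validation: reject the record
        if (name, amount) not in seen:   # clear duplicates
            seen.add((name, amount))
            clean.append((name, amount))

    # Load: insert the cleansed records into the target repository.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO target VALUES (?, ?)", clean)
    print(conn.execute("SELECT * FROM target").fetchall())
    # [('Asha', 100.0)]: the bad row was rejected, the duplicate dropped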


An ideal ETL architecture contains a data warehouse. (Figure: the ideal ETL architecture supporting the three major steps in ETL.)

DATA WAREHOUSE

DEFINITION

Different people have different definitions for a data warehouse. The most popular definition came from Bill Inmon, who provided the following:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older, from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

Ralph Kimball provided a more concise definition of a data warehouse:

A data warehouse is a copy of transaction data specifically structured for query and analysis.

This is a functional view of a data warehouse. Kimball did not address how the data warehouse is built, as Inmon did; rather, he focused on the functionality of a data warehouse.

Functionality of Data Warehouses

Data warehouses exist to facilitate complex, data-intensive and frequent ad hoc queries. Data warehouses must provide far greater and more efficient query support than is demanded of transactional databases. The data warehouse access component supports enhanced spreadsheet functionality, efficient query processing, structured queries, ad hoc queries, data mining and materialized views. In particular, enhanced spreadsheet functionality includes support for state-of-the-art spreadsheet applications as well as for OLAP application programs. These provide preprogrammed functionality such as the following:

Roll-up: Data is summarized with increasing generalization

Drill-down: Increasing levels of detail are revealed

Pivot: Cross tabulation (that is, rotation) is performed

Slice and dice: Performing projection operations on the dimensions

Sorting: Data is sorted by ordinal value

Selection: Data is available by value or range

Derived or computed attributes: Attributes are computed by operations on stored and derived values.
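Several of these operations can be demonstrated on a toy data set with pandas; all figures below are invented, and drill-down is simply the reverse of the roll-up shown:

    import pandas as pd

    df = pd.DataFrame({
        "region":  ["North", "North", "South", "South"],
        "store":   ["N1", "N2", "S1", "S1"],
        "quarter": ["Q1", "Q1", "Q1", "Q2"],
        "sales":   [100, 150, 120, 130],
    })
    # Roll-up: summarize with increasing generalization (store -> region).
    print(df.groupby("region")["sales"].sum())
    # Pivot: cross tabulation, rotating quarter into the columns.
    print(df.pivot_table(index="region", columns="quarter",
                         values="sales", aggfunc="sum"))
    # Slice: fix one dimension to a single value.
    print(df[df["quarter"] == "Q1"])
    # Selection: data available by value or range.
    print(df[df["sales"].between(110, 140)])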

 OLTP (ON-LINE TRANSACTION PROCESSING)

OLTP (ON-LINE TRANSACTION PROCESSING) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. In an OLTP database there is detailed and current data, and the schema used to store transactional data is the entity model (usually 3NF).

An OLTP system deals with operational data, i.e. data involved in the operation of a particular system.

Example: In a banking system, you withdraw an amount from your account. Then the Account Number, Withdrawal Amount, Available Amount, Balance Amount, Transaction Number, etc. are operational data elements.
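A minimal sketch of such a short OLTP transaction, using Python's built-in sqlite3 (the account number and amounts are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (acct_no TEXT PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO account VALUES ('A-101', 5000.0)")

    def withdraw(acct_no, amount):
        """One short OLTP transaction: check funds, then update the balance."""
        with conn:   # commits on success, rolls back on error
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE acct_no = ?",
                (acct_no,)).fetchone()
            if amount > balance:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acct_no = ?",
                (amount, acct_no))

    withdraw("A-101", 1200.0)
    print(conn.execute("SELECT balance FROM account").fetchone())  # (3800.0,)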

 OLAP (ON-LINE ANALYTICAL PROCESSING) 

OLAP (ON-LINE ANALYTICAL PROCESSING) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques. In an OLAP database there is aggregated, historical data, stored in multi-dimensional schemas (usually the star schema).

OLAP deals with historical or archival data, i.e. data that are archived over a long period of time. Data from OLTP systems are collected over a period of time and stored in a very large database called a data warehouse. Data warehouses are highly optimized for read (SELECT) operations.

Example: If we collect the last 10 years of data about flight reservations, the data can give us much meaningful information, such as trends in reservations. This may reveal useful information like the peak time of travel and what kinds of people are traveling in various classes (Economy/Business), etc.

In other words, OLAP is the ability to analyze metrics in different dimensions such as time, geography, gender, product, etc. For example, sales for the company are up. What region is most responsible for this increase? Which store in this region is most responsible for the increase? What particular product category or categories contributed the most to the increase? Answering these types of questions in order means that you are performing an OLAP analysis. Depending on the underlying technology used, OLAP can be broadly divided into two different camps: MOLAP and ROLAP. A discussion of the different OLAP types can be found in the MOLAP, ROLAP, and HOLAP section.
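That drill-down sequence of questions can be sketched with pandas on invented data; each step narrows the previous answer:

    import pandas as pd

    sales = pd.DataFrame({
        "region":   ["West", "West", "West", "East"],
        "store":    ["W1", "W1", "W2", "E1"],
        "category": ["shoes", "bags", "shoes", "shoes"],
        "amount":   [500, 200, 300, 400],
    })
    print(sales["amount"].sum())                    # company total is up
    print(sales.groupby("region")["amount"].sum())  # which region drives it?
    west = sales[sales["region"] == "West"]
    print(west.groupby("store")["amount"].sum())    # which store within West?
    print(west.groupby("category")["amount"].sum()) # which product category?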

MOLAP, ROLAP, and HOLAP:

          In the OLAP world, there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.


MOLAP:

         This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats.

Advantages:

Excellent performance:

MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.

Can perform complex calculations:

All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly. 

Disadvantages:

Limited in the amount of data it can handle:

Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself.

Requires additional investment:

Cube technologies are often proprietary and may not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources will be needed.

ROLAP:

          This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
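To illustrate, here is a small sqlite3 sketch with an invented table: the same aggregate query before and after a slice, differing only in its WHERE clause:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("North", "Q1", 100.0), ("North", "Q2", 150.0), ("South", "Q1", 90.0),
    ])
    # The unsliced aggregate...
    print(conn.execute("SELECT SUM(amount) FROM sales").fetchone())  # (340.0,)
    # ...and after slicing and dicing: each action simply contributes
    # another condition to the WHERE clause.
    print(conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ? AND quarter = ?",
        ("North", "Q1")).fetchone())   # (100.0,)
    conn.close()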

Advantages:

Can handle large amounts of data:

The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount.


Can leverage functionalities inherent in the relational database: Often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.

Disadvantages:

Performance can be slow:

Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.

Limited by SQL functionalities:

Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building into the tool out-of-the-box complex functions as well as the ability to allow users to define their own functions.

HOLAP:

          HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data. 


DATA MINING

DEFINITION OF 'DATA MINING'

A process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies as well as increase sales and decrease costs. Data mining depends on effective data collection and warehousing as well as computer processing.

Data mining is a logical process used to search through large amounts of data in order to find useful information. The goal of this technique is to find patterns that were previously unknown. Once these patterns are found, they can be used to make decisions for the development of the business.

The three steps involved are:

1) Exploration
2) Pattern identification
3) Deployment

1) Exploration: In the first step, data is cleaned and transformed into another form, and the important variables and the nature of the data, based on the problem, are determined.

2) Pattern Identification: Once the data is explored, refined and defined for the specific variables, the second step is pattern identification: identify and choose the patterns which make the best prediction.

3) Deployment: Patterns are deployed for the desired outcome.
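As a toy illustration of the three steps, the following Python sketch explores a small invented monthly sales series, identifies a simple linear trend as the pattern, and deploys it to forecast the next month; real data mining would of course use far richer techniques:

    import numpy as np

    # 1) Exploration: clean the raw series (drop the missing value).
    raw = [100.0, 104.0, None, 110.0, 115.0, 121.0]
    months = np.array([i for i, v in enumerate(raw) if v is not None])
    sales  = np.array([v for v in raw if v is not None])

    # 2) Pattern identification: fit a linear trend on all but the last
    #    point and keep it because it predicts that point well.
    slope, intercept = np.polyfit(months[:-1], sales[:-1], 1)
    print("predicted:", slope * months[-1] + intercept, "actual:", sales[-1])

    # 3) Deployment: use the chosen pattern for the desired outcome,
    #    here a forecast of next month's sales.
    print("forecast:", slope * (months[-1] + 1) + intercept)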


ADVANTAGES AND DISADVANTAGES OF DATA MINING

ADVANTAGES OF DATA MINING :

1) Marketing/Retailing:

Data mining can aid direct marketers by providing them with useful and accurate trends about their customers’ purchasing behavior.  

Based on these trends, marketers can direct their marketing attentions to their customers with more precision.

For example, marketers of a software company may advertise about their new software to consumers who have a lot of software purchasing history.

In addition, data mining may also help marketers in predicting which products their customers may be interested in buying.

Through this prediction, marketers can surprise their customers and make the customer’s shopping experience a pleasant one.

Retail stores can also benefit from data mining in similar ways. 

For example, through the trends provided by data mining, store managers can arrange shelves, stock certain items, or provide a certain discount that will attract their customers.

2) Banking/Crediting:

Data mining can assist financial institutions in areas such as credit reporting and loan information.

For example, by examining previous customers with similar attributes, a bank can estimate the level of risk associated with each given loan.

In addition, data mining can also assist credit card issuers in detecting potentially fraudulent credit card transactions.

Although data mining techniques are not 100% accurate in their predictions about fraudulent charges, they do help the credit card issuers reduce their losses.

3) Law enforcement:

Data mining can aid law enforcers in identifying criminal suspects as well as apprehending these criminals by examining trends in location, crime type, habit, and other patterns of behaviors.


4) Researchers:

Data mining can assist researchers by speeding up their data analysis process, thus allowing them more time to work on other projects.

DISADVANTAGES OF DATA MINING

1) Privacy Issues:

Personal privacy has always been a major concern. In recent years, with the widespread use of the Internet, concerns about privacy have increased tremendously. Because of privacy issues, some people do not shop on the Internet. They are afraid that somebody may gain access to their personal information and then use that information in an unethical way, thus causing them harm.

Although it is against the law to sell or trade personal information between different organizations, the selling of personal information has occurred. For example, according to the Washington Post, in 1998 CVS sold its patients’ prescription purchases to a different company.

In addition, American Express also sold its customers’ credit card purchases to another company. What CVS and American Express did clearly violated privacy law, because they were selling personal information without the consent of their customers.

The selling of personal information may also bring harm to these customers, because it is not known what the purchasing companies plan to do with the personal information they have bought.

2) Security issues:

Although companies have a lot of personal information about us available online, they do not have sufficient security systems in place to protect that information. 

For example, recently the Ford Motor Credit Company had to inform 13,000 consumers that their personal information, including Social Security numbers, addresses, account numbers and payment histories, was accessed by hackers who broke into a database belonging to the Experian credit reporting agency.

This incident illustrates that companies are willing to disclose and share your personal information, but they are not taking care of it properly. With so much personal information available, identity theft could become a real problem.


3) Misuse of information/inaccurate information:

Trends obtained through data mining, intended to be used for marketing or other ethical purposes, may be misused.

Unethical businesses or people may use the information obtained through data mining to take advantage of vulnerable people or to discriminate against a certain group of people.

In addition, data mining techniques are not 100 percent accurate; thus mistakes do happen, which can have serious consequences.
