Transcript of Business Analytics & Data Mining, by Matthew Rothmeyer.
- Slide 1
- Business Analytics & Data Mining, by Matthew Rothmeyer
- Slide 2
- Overview of BA Discussion: Business Analytics (BA) overview;
history; types of Business Analytics; real-world examples;
challenges; relation to Data Mining.
- Slide 3
- Business Analytics (BA): an overview. BA can be considered a
subset of Business Intelligence. It is a set of skills, technologies,
applications, and practices for the exploration and investigation of
past business performance to gain insight and drive business planning.
Like Business Intelligence, BA can focus either on the business as
a whole or only on segments of it. It focuses on developing new
insights and understanding of performance based on data and
statistical methods.
- Slide 4
- BA: Short History. Analytics in business dates to long before
computing. Frederick Taylor, the father of scientific management:
19th-century time management exercises used in industrial settings.
Henry Ford: assembly line pacing used to improve output and business
profitability. BA became widespread when computers were used in
decision support systems (DSS) in the 1960s; these evolved into ERP,
data warehouses, etc.
- Slide 5
- Types of Business Analytics: Reporting or Descriptive Analytics;
Affinity grouping; Clustering; Modeling or Predictive Analytics.
- Slide 6
- BA: Reporting. Based on the need to locate and distribute
business insights and experiences. Often involves ETL (extract,
transform, load) procedures used alongside a data warehousing scheme.
The data is then collected, quantified, and organized using reporting
tools. Reporting allows information describing different views of an
enterprise to come together in one place. Ex: a user could query a
production and marketing database to determine whether production of
a product could be moved closer to where the product is sold.
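The reporting query described above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the table names (production, sales), column names, and sample rows are all hypothetical and do not come from the slides.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE production (product TEXT PRIMARY KEY, plant_region TEXT);
        CREATE TABLE sales (product TEXT, region TEXT, units INTEGER);
    """)
    conn.executemany("INSERT INTO production VALUES (?, ?)",
                     [("widget", "east"), ("gadget", "west")])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [("widget", "east", 20), ("widget", "west", 90),
                      ("gadget", "west", 50), ("gadget", "east", 10)])
    return conn

def relocation_candidates(conn):
    """Products whose best-selling region differs from where they are made."""
    rows = conn.execute("""
        SELECT s.product, s.region, SUM(s.units) AS total_units, p.plant_region
        FROM sales AS s JOIN production AS p ON p.product = s.product
        GROUP BY s.product, s.region, p.plant_region
        ORDER BY s.product, total_units DESC
    """).fetchall()
    top = {}
    for product, region, _units, plant in rows:
        # Rows are sorted by units, so the first row per product is its top region.
        top.setdefault(product, (region, plant))
    return sorted(p for p, (region, plant) in top.items() if region != plant)

print(relocation_candidates(build_demo_db()))  # ['widget']
```

In a real warehouse the two tables would be filled by the ETL step rather than by hand, and the report would usually carry more context (costs, shipping distance) than this toy comparison.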
- Slide 7
- BA: Affinity grouping. A tool used by businesses and
organizations to organize ideas and data. Often takes the form of an
affinity diagram. Enables data and ideas stemming from brainstorming
to be sorted into groups. Sorting is based on their natural
relationships.
- Slide 8
- BA: Clustering. Placing a set of objects into groups (called
clusters) so that the objects in the same cluster are more similar
(in some sense or another) to each other than to those in other
clusters (Wikipedia). It is a main task of exploratory data mining
and statistical data analysis. Clustering is a general task that does
not have one set solution. Clustering can be hard or fuzzy. It can be
done by people or machines; the latter is preferred.
- Slide 9
- BA: How do we model clusters? Connectivity models: based on how
data points can be connected to other points. Density models:
defining a cluster by determining where sets of data points are
densest. Distribution models: clusters are modeled using statistical
distributions. Ex: expectation maximization.
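As a rough illustration of the connectivity idea above, the sketch below puts two points in the same cluster whenever a chain of nearby points links them. The function name, the max_gap threshold, and the sample points are assumptions made for this example; it produces hard (not fuzzy) clusters and is not meant as a production algorithm.

```python
from math import dist

def connectivity_clusters(points, max_gap):
    """Group point indices so that points within max_gap of each other,
    directly or through a chain of neighbors, share a cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        frontier = [start]
        members = [start]
        while frontier:
            current = frontier.pop()
            # Every unvisited point close enough to the current one joins the cluster.
            near = [j for j in unvisited
                    if dist(points[current], points[j]) <= max_gap]
            for j in near:
                unvisited.remove(j)
            frontier.extend(near)
            members.extend(near)
        clusters.append(sorted(members))
    return clusters

points = [(0, 0), (0, 1), (1, 1), (10, 10), (10, 11)]
print(connectivity_clusters(points, max_gap=1.5))  # [[0, 1, 2], [3, 4]]
```

The choice of max_gap decides everything here, which reflects the slide's point that clustering has no single set solution.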
- Slide 10
- BA: Predictive Analysis. Stems from the desire to predict future
events by analyzing data an enterprise has collected. Pattern
exploitation results in the identification of opportunities as well
as risks. Allows relationships in disparate data to be identified.
Helps guide decision making in a business. Is often implemented in
the form of data mining.
- Slide 11
- BA: Examples. Credit companies use business analytics to track
the credit risk of customers as well as to match customers to
offerings. Sales and offers: companies can track customer
interaction and use that information to determine appropriate product
offerings. Sales groups can use BA to optimize inventory and analyze
past sales: they could measure peak purchasing times for products, or
decide whether or not to stock poorly selling items. Give examples of
business cases where data mining might be useful, and describe how
data mining would be used: preventing credit card fraud by detecting
spending patterns; inventory management by tracking sales.
- Slide 12
- BA: Challenges. Acquiring sufficient volumes of high-quality
data: most data acquired in the field is unsorted and appears in
many different formats. When dealing with high-volume data, deciding
what is important and what is noise. Rapidly reacting storage
structures: BA can influence customer interactions, and as such that
information must be available fast. Ex: a customized sales pitch.
- Slide 13
- Business Analytics & Data Mining. Data Mining is an important
subtask of Business Analytics. Both predictive analysis and
clustering tasks utilize information retrieved from data mining.
Data mining helps handle some of the specific problems faced when
conducting Business Analytics, such as dealing with and sorting
through large data sets.
- Slide 14
- Data Mining: An Overview. What is Data Mining? History.
Applications of Data Mining: detecting data discrepancies or
outliers; relationship identification; data-function mapping for
modeling/prediction; categorizing and summarizing data. Standards.
Challenges.
- Slide 15
- Data Mining: What is it? Applying statistical analysis
techniques to data, the goal often being to determine unnoticed
patterns or to collect categorized information; it turns collected
data into understandable structures. Data Mining is often used as a
buzzword to describe processing large amounts of data. In essence,
its correct use relates to the discovery of new things through
observation. Synonymous with knowledge discovery.
- Slide 16
- Data Mining: History. Though HNC trademarked the term in 1990,
hands-on pattern extraction is centuries old; it has existed as long
as statistical analysis has. Discoveries in computer science have
increasingly shifted the field from hands-on to machine-dependent
work. This allows for: the use of data indexing and DB systems to
handle data efficiently; the application of statistical algorithms on
a large scale, possibly in a distributed manner, with less error.
- Slide 17
- Data Mining: Use and Application. Data Mining is often broken
into several different categories of tasks: detecting data
discrepancies or outliers; relationship identification; data-function
mapping for modeling/prediction; categorizing and summarizing data.
- Slide 18
- Data Mining: Finding outliers. The process of analyzing large,
mostly homogeneous sets of data and determining which sets or points
do not conform to the patterns the rest of the data seem to follow,
that is, which do not follow expected results when viewed against the
entire set of data. An outlier can be a point or set of points, but
can also be defined through other means: a period of time could yield
unexpected results. Ex: network intrusion.
- Slide 19
- Data Mining: Techniques in finding outliers. Rule based:
deciding on a set of rules that determine an outlier (or what isn't
one); these can be fuzzy or hard rules. Cluster analysis: as
mentioned earlier. Distance or standard deviation: determining an
average over a data set and marking points that aren't within a
chosen deviation or distance of it.
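The standard deviation technique above can be sketched in a few lines. The function name, the default cutoff of two standard deviations, and the sample readings are assumptions chosen for illustration; the slides do not specify a threshold.

```python
from statistics import mean, pstdev

def deviation_outliers(values, max_deviations=2.0):
    """Return the values lying more than max_deviations standard
    deviations away from the mean of the data set."""
    center = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - center) > max_deviations * spread]

readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(deviation_outliers(readings))  # [50]
```

Note that a single extreme value inflates both the mean and the deviation, so on small data sets a median-based rule may behave better than this simple version.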
- Slide 20
- Applications of Outlier Detection. Network intrusion detection:
unusual bursts of network activity. Identity theft detection: unusual
spending or customer activity. Detecting software bugs: software does
not deliver expected outputs. Sensor event detection: monitoring
patient health fluctuations in a medical setting. Preprocessing:
removing data skews based on extenuating circumstances.
- Slide 21
- Relationship Discovery: Basics. Understanding how data is
related is a key factor in trend and knowledge discovery; this is
the definition of data mining. Ex: which products are often bought
before a major forecasted storm? {hamburger buns} => {???}. With
small sets of data, or with correlations that aren't subtle (such as
the one above), identifying relationships is not as difficult. With
large data sets or subtle relations, a combination of rule
generation and data analysis can be used to expedite the process.
- Slide 22
- Relationship Discovery: How it's done. Since the number of
relationships between points of data could be boundless, two
important concepts are often introduced in relationship discovery.
Support: the amount of data within which a relationship might exist.
Confidence: the probability that data in the support will verify a
selected rule.
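For reference, the two measures are conventionally written as below in association rule mining. The symbols are assumptions introduced here, not taken from the slides: T is the set of all transactions, and X and Y are item sets. The slide's informal wording of support may not match this formulation exactly.

```latex
\operatorname{supp}(Z) = \frac{\lvert \{\, t \in T : Z \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
\operatorname{supp}(X \Rightarrow Y) = \operatorname{supp}(X \cup Y),
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
```

For instance, if 3 of 5 orders contain both hamburgers and buns and 4 of 5 contain hamburgers, the rule {hamburgers} => {buns} has support 3/5 and confidence (3/5)/(4/5) = 3/4.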
- Slide 23
- Relationship Discovery: How it's done. Generally we apply minimum
bounds to both the support of a rule and its confidence to
determine relationships. First: determine possible relationships by
setting a minimum support. Ex: orders with hamburgers, orders with
hamburger buns. Other, user-specific rules can be used here. Second:
take the remaining sets and look for patterns in the item sets such
that the occurrence rate is above the minimum confidence. Ex: how
many people bought hamburgers and buns together? Ex: we find that if
the customer is male and buys diapers, he will also buy beer:
{male, diapers} => {beer}.
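The two-step procedure above might look roughly like this for rules with a single item on each side. The function name, thresholds, and sample orders are hypothetical, and this brute-force version is only a sketch; practical tools use algorithms such as Apriori to avoid scanning every pair.

```python
from itertools import combinations

def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples
    for single-item rules that clear both minimum bounds."""
    total = len(transactions)
    baskets = [set(t) for t in transactions]

    def support(items):
        # Fraction of transactions that contain every item in `items`.
        return sum(1 for b in baskets if items <= b) / total

    # Step 1: keep only the items that appear often enough.
    all_items = sorted({item for b in baskets for item in b})
    frequent = [i for i in all_items if support({i}) >= min_support]

    # Step 2: among pairs of frequent items, keep confident rules.
    rules = []
    for a, b in combinations(frequent, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

orders = [
    ["hamburgers", "buns", "ketchup"],
    ["hamburgers", "buns"],
    ["hamburgers", "buns", "beer"],
    ["beer", "diapers"],
    ["hamburgers"],
]
for rule in mine_pair_rules(orders, min_support=0.5, min_confidence=0.8):
    print(rule)
```

With these sample orders only {buns} => {hamburgers} survives: every order with buns also has hamburgers, while the reverse rule falls short of the confidence bound.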
- Slide 24
- Matching data to functions. Often, it is desirable to match data
sets and the factors that determine them to functions. This allows
for the possibility of predicting future results. Involves learning
how dependent and independent variables in our data interact.
Dependent: the result, or where a point exists. Independent: a cause
or circumstance that determines the dependent variable. If we know
how dependent and independent variables interact, we can create a
function and run simulations to see results.
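One simple instance of matching data to a function is fitting a straight line by least squares, sketched below. The slides do not name a particular method, so the linear form, the function names, and the sample data are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.
    xs holds the independent variable, ys the dependent one."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # raises ZeroDivisionError if all xs are identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Evaluate the fitted function at a new value of the independent variable."""
    slope, intercept = model
    return slope * x + intercept

model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model)              # (2.0, 1.0)
print(predict(model, 5))  # 11.0
```

Once the function is fitted, running predict over hypothetical inputs is the "simulation" step the slide refers to.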
- Slide 25
- Uses of Function-Data Mapping. Weather forecasting: determining
what conditions lead to what kinds of weather. Stock market analysis:
when to buy and when to sell. Crime prevention: what conditions cause
or prevent crime.
- Slide 26
- Categorizing. Often we want to separate data based on a set of
predefined attributes. Very helpful in pattern recognition. Ex: a
person's political preference. The process: we synthetically generate
or measure a set of observations (data points) with known categories;
we extract properties from said observations which we believe
contribute to the category (these are called explanatory variables);
finally, we examine new data for these properties.
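The process above can be illustrated with a nearest-centroid classifier, one of the simplest ways to categorize new data from labeled observations. This technique is swapped in for illustration and is not named in the slides; the function names, the two numeric explanatory variables, and the categories "A" and "B" are hypothetical.

```python
from math import dist

def train_centroids(observations):
    """observations: list of (features, category) pairs with known categories.
    Returns the average feature vector (centroid) of each category."""
    grouped = {}
    for features, category in observations:
        grouped.setdefault(category, []).append(features)
    return {
        category: tuple(sum(column) / len(rows) for column in zip(*rows))
        for category, rows in grouped.items()
    }

def categorize(centroids, features):
    """Assign new data to the category whose centroid is closest."""
    return min(centroids, key=lambda c: dist(centroids[c], features))

known = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
    ((8.0, 8.0), "B"), ((9.0, 7.5), "B"),
]
centroids = train_centroids(known)
print(categorize(centroids, (2.0, 1.0)))  # A
print(categorize(centroids, (8.0, 9.0)))  # B
```

The quality of the result depends on choosing explanatory variables that really do separate the categories, which is the judgment step the slide describes.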
- Slide 27
- Summarizing. We almost never want to look at all of the data
individually. Having too much data can actually hinder the decision
making process; this is known as information overload. Summarizing
takes the results from data mining and transforms them into formats
that can be easily read without omitting important information.
Summarizing might: extract and display only important data; correlate
and abstract data to display trends. Formats include: reports,
graphs, dashboards, etc.
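A minimal sketch of summarizing, assuming records are plain dictionaries: many rows are reduced to a few figures per group, which is the kind of table a report or dashboard would display. The function name, field names, and sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def summarize(records, group_key, value_key):
    """Collapse many records into count, mean, min, and max per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: {"count": len(values), "mean": mean(values),
                "min": min(values), "max": max(values)}
        for group, values in groups.items()
    }

sales = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 30},
    {"region": "west", "amount": 5},
]
for region, stats in summarize(sales, "region", "amount").items():
    print(region, stats)
```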
- Slide 28
- Standards: CRISP-DM. Cross Industry Standard Process for Data
Mining: describes common practice for conducting data mining in an
enterprise setting. KDnuggets, a community resource in DM and
analytics, took polls and found CRISP-DM was the top methodology in
2002, 2004, and 2007. Six-step methodology: Business Understanding,
Data Understanding, Data Preparation, Modeling, Evaluation,
Deployment.
- Slide 29
- CRISP-DM: Explained. Business Understanding: determining the
business purpose; defining success conditions (how do we know we
succeeded? Ex: improved prediction accuracy); mapping purpose/success
conditions to data mining results (Ex: fraud prevention => detect
deviations). Data Understanding: collecting and exploring data;
defining its attributes; data quality verification.
- Slide 30
- CRISP-DM: Explained. Data Preparation: data cleaning;
normalization (fitting data within ranges); outlier removal (removing
cases that could skew the model); handling missing attributes (the
data was not obtained); formatting (changing data so that it fits
with our tools). Modeling: fitting the data to a model following the
methods previously described and then interpreting that model; assess
the accuracy of the collected data; general purpose divided into
prediction or description.
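Two of the Data Preparation steps above, handling missing attributes and normalization, can be sketched as follows. Filling gaps with the mean and scaling to the range 0 to 1 are common choices assumed here for illustration; CRISP-DM itself does not prescribe them, and the function names are hypothetical.

```python
def fill_missing(values):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    fallback = sum(known) / len(known)  # assumes at least one known value
    return [fallback if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values so the smallest maps to 0.0 and the largest to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # no spread to scale
    return [(v - low) / (high - low) for v in values]

raw = [2.0, None, 4.0]
cleaned = fill_missing(raw)
print(cleaned)                     # [2.0, 3.0, 4.0]
print(min_max_normalize(cleaned))  # [0.0, 0.5, 1.0]
```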
- Slide 31
- CRISP-DM: Explained. Evaluation: look at results and measure
them with respect to the success cases defined earlier; determine if
one has succeeded; determine next steps (how do we apply the
results?). Deployment: the execution of a strategy for using the
results of our data mining; includes preparing ways to monitor and
maintain the application of data mining results in the day-to-day;
includes some sort of final summary.
- Slide 32
- SEMMA: Sample, Explore, Modify, Model, and Assess. Proposed by SAS
Institute, a producer of BI and BA software suites. Though this
model is often considered general, SAS prefers to apply it directly
to their products. Focuses mainly on data mining and not on applying
results to business (unlike CRISP-DM). Sample: selecting the data
set. Explore: understand data through discovering relationships, both
expected and otherwise. Modify: transform and clean the data in order
to prepare it for the modeling process. Model: apply models to the
data in order to discover trends and make predictions. Assess:
evaluate the results of the modeling process to determine the
reliability of the mined data.
- Slide 33
- Challenges in data mining. Not enough or too much data:
oftentimes it is difficult for small enterprises to access sufficient
quantities of data; if the enterprise is large, however, sometimes
there is too much, and deciding what to keep is difficult. Acquiring
clean data: multiple formats or no format at all. Privacy and ethical
concerns: data aggregation, where data compiled from multiple sources
can lead to revelations that violate privacy. Ex: anonymous data is
collected and aggregated, leading to identification.
- Slide 34
- Questions for Exam. What are some of the challenges in business
analytics and data mining? Gathering enough data; protecting privacy;
verifying data integrity. How can finding outliers be useful in data
mining? We can use them to clean up results. Sometimes we wish to
isolate outliers so that we can find their cause.