Copyright © 2012, SAS Institute Inc. All rights reserved. INTRODUCTION TO DATA AND TEXT MINING ANDREW PEASE, 8 MARCH 2013

Page 1:

INTRODUCTION TO DATA AND TEXT MINING

ANDREW PEASE, 8 MARCH 2013

Page 2:

SAS® ANALYTICS

Operations Research

Quality Improvement

Data Visualization

Model Management

Discrete Event Simulation

Analysis of Means

Cluster Analysis

Matrix Programming

Spectral Analysis

Ensemble Models

Sample Size Computations

Simulation

Categorical Data Analysis

Psychometric Analysis

Genetic Algorithms

Survival Analysis

Statistical Process Control

X11 & X12 Models

Decision Trees

Analysis of Variance

Survey Data Analysis

Vector Autoregressive Models

Nonlinear Programming

Network Flow Models

Nonparametric Analysis

Content Categorization

Study Planning

ARIMA Models

Linear Programming

Interior-Point Models

Scheduling

Bayesian

R Integration

Multivariate Analysis

Neural Networks

Gradient Boosting Machines

Automated Scoring

Exploratory Data Analysis

Random Forests

Mixed Models

Design of Experiments

Predictive Modeling

Information Theory

Reliability Analysis

Constraint Programming

Discrete Event Simulation

Social Network Analysis

Ontology Management

Regression

Process Capability Analysis

Descriptive Modeling

Mixed-Integer Programming

Fractional Factorial

D-Optimal

Association & Sequence Analysis

Multinomial Discrete Choice

High Performance Forecasting

Text Analytics

Content Categorization

Ontology Management

Sentiment Analysis

Forecasting

Econometrics

Large-Scale Forecasting

Time Series Analysis

Data Mining

Scoring Acceleration

Predictive Analytics

Statistics

Statistical Analysis

Interactive Matrix Programming

Page 3:

DATA MINING IS:

Discovering patterns, trends and relationships represented in data

Developing models to understand and describe characteristics and activity based on these patterns

Using insights to help evaluate future options and make fact-based decisions

Deploying scores and results for timely, appropriate action

Timeline: Past (Observed Events) → Future (Predicted Events)

Page 4:

INDUSTRY SPECIFIC DATA MINING APPLICATIONS

Application: Credit Scoring (Banking)
What is predicted: Creditworthiness of new and existing customers
Business decision driven: How to assess and control risk within existing (or new) consumer portfolios?

Application: Market Basket Analysis (Retail)
What is predicted: Which products are likely to be purchased together?
Business decision driven: How to increase sales with cross-sell/up-sell, loyalty programs, promotions?

Application: Asset Maintenance (Utilities, Mfg., Oil & Gas)
What is predicted: The real drivers of asset or equipment failure
Business decision driven: How to minimize operational disruptions and maintenance costs?

Application: Health & Condition Mgmt. (Health Insurance)
What is predicted: Patients at risk of a chronic illness, so that a treatment program can be offered
Business decision driven: How can we reduce healthcare costs and satisfy patients?

Application: Fraud Mgmt. (Govt., Insurance, Banks)
What is predicted: Unknown fraud cases and future risks
Business decision driven: How to decrease fraud losses and lower false positives?

Application: Drug Discovery (Life Science)
What is predicted: Compounds that have desirable effects, and drug behavior during trials
Business decision driven: How to bring drugs quickly and effectively to the marketplace?
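
To make the Market Basket Analysis row concrete, here is a minimal sketch of how "which products are likely to be purchased together" can be quantified with support, confidence and lift for item pairs. It is not taken from the slides and is not a SAS procedure: the function name pair_rules, the min_support parameter and the toy baskets are all assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]     # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1 means bought together more than by chance
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical toy baskets, invented for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
    ["milk", "bread"],
]
rules = pair_rules(baskets, min_support=0.4)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests a pair worth considering for cross-sell or promotion; a lift below 1 suggests the items are bought together less often than chance would predict.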

Page 5:

DATA MINING METHODOLOGY

SEMMA

Sample → Explore → Modify → Model → Assess → Score
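
The stages named on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions, not how SAS software implements SEMMA: the synthetic data, the single input "x", the mean imputation and the one-variable threshold rule are all hypothetical choices made to keep the example short.

```python
import random
import statistics


def sample(rows, n_train, rng):
    """Sample: shuffle and split rows into a training part and a holdout part."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def explore(rows):
    """Explore: summarize the input variable, ignoring missing values."""
    values = [r["x"] for r in rows if r["x"] is not None]
    return {"mean": statistics.mean(values), "missing": len(rows) - len(values)}


def modify(rows, fill_value):
    """Modify: replace missing inputs with a value learned on the training data."""
    return [{**r, "x": fill_value if r["x"] is None else r["x"]} for r in rows]


def model(rows):
    """Model: pick the threshold on x that classifies the most training rows correctly."""
    best_threshold, best_correct = None, -1
    for t in sorted({r["x"] for r in rows}):
        correct = sum((r["x"] >= t) == r["y"] for r in rows)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def score(rows, threshold):
    """Score: apply the fitted rule to rows (labels are not needed)."""
    return [r["x"] >= threshold for r in rows]


def assess(rows, threshold):
    """Assess: accuracy of the fitted rule on labeled rows."""
    predictions = score(rows, threshold)
    return sum(p == r["y"] for p, r in zip(predictions, rows)) / len(rows)


# Synthetic data, invented for illustration: the outcome is true when x >= 5,
# and roughly 10% of the inputs are then blanked out to simulate missing values.
rng = random.Random(0)
rows = []
for _ in range(200):
    x = rng.uniform(0, 10)
    y = x >= 5
    rows.append({"x": None if rng.random() < 0.1 else x, "y": y})

train, holdout = sample(rows, 140, rng)
summary = explore(train)
train = modify(train, summary["mean"])
holdout = modify(holdout, summary["mean"])  # reuse the training mean, not the holdout's
threshold = model(train)
accuracy = assess(holdout, threshold)
new_scores = score([{"x": 2.0}, {"x": 8.0}], threshold)
print(f"threshold={threshold:.2f} holdout accuracy={accuracy:.2f} new scores={new_scores}")
```

Assessing on rows that were held out during the Sample step, rather than on the training rows, is what keeps the accuracy figure an honest estimate of how the rule will behave when it is used to score new records.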

Page 6:

G T E W V G H U I B C X A Q W E T D F G J K O I U T C M N X H G A L O J U T Q A Z C F T E R T N J H Y U O P H Y R M W S D F M N B V H J U Y T I P Q A P G F S D W V B U I N S W B C Z A L K J T M A P I O I U X F E W I Y N H K D N Q U P Q P S F T E M X T R G E O

Page 7:

SAS TEXT ANALYTICS: UNCOVERING THE TECHNOLOGY

Content Categorization

Text Mining

Sentiment Analysis

Ontology Management

Page 8:

LILLEBAELT HOSPITAL (Denmark)

HEALTHCARE

BUSINESS ISSUE

• Reduce error in patient records

• Reduce manual effort of patient record audits

RESULTS

• Creation of a database to improve clinical work in research and diagnosis

• “If data is wrong, the basis for decision making is also faulty. Therefore, the Clinically Correct Time-True Registration system makes sense even beyond our department and hospital.”

- Sten Larsen, Chief Surgeon

Page 9:

1823 HONG KONG EFFICIENCY UNIT

PUBLIC

BUSINESS ISSUE

• 1823 operates round-the-clock, including on Sundays and public holidays.

• Answers 2.65 million calls and 98,000 e-mails, including inquiries, suggestions and complaints

RESULTS

• Developed a Complaint Intelligence System that uncovers the trends, patterns and relationships inherent in the complaints

• “By decoding the 'messages' through statistical and root-cause analyses of complaints data, the government can better understand the voice of the people, and help government departments improve service delivery, make informed decisions and develop smart strategies. This in turn helps boost public satisfaction with the government, and build a quality city.”

- Efficiency Unit’s Assistant Director, W. F. Yuk

Page 10:

DATA/TEXT MINING RESEARCH CONSIDERATIONS

• Data Mining for patent research/control

• Copyright research/control

• Metadata-driven approach avoids ‘permanent’ data duplication

• Analyst needs ‘creative freedom’ in combining, transforming data

• User interfaces – programming vs point-and-click

• Cost to implement highly variable

• Future Indications

    • In-Memory

    • Big Data

    • Cloud Com

Page 11:

www.SAS.com