Copyright © 2012, SAS Institute Inc. All rights reserved.
INTRODUCTION TO DATA AND TEXT MINING
ANDREW PEASE, 8 MARCH 2013
SAS® ANALYTICS
Operations Research
Quality Improvement
Data Visualization
Model Management
Discrete Event Simulation
Analysis of Means
Cluster Analysis
Matrix Programming
Spectral Analysis
Ensemble Models
Sample Size Computations
Simulation
Categorical Data Analysis
Psychometric Analysis
Genetic Algorithms
Survival Analysis
Statistical Process Control
X11 & X12 Models
Decision Trees
Analysis of Variance
Survey Data Analysis
Vector Autoregressive Models
Nonlinear Programming
Network Flow Models
Nonparametric Analysis
Content Categorization
Study Planning
ARIMA Models
Linear Programming
Interior-Point Models
Scheduling
Bayesian
R Integration
Multivariate Analysis
Neural Networks
Gradient Boosting Machines
Automated Scoring
Exploratory Data Analysis
Random Forests
Mixed Models
Design of Experiments
Predictive Modeling
Information Theory
Reliability Analysis
Constraint Programming
Discrete Event Simulation
Social Network Analysis
Ontology Management
Regression
Process Capability Analysis
Descriptive Modeling
Mixed-Integer Programming
Fractional Factorial
D-Optimal
Association & Sequence Analysis
Multinomial Discrete Choice
High Performance Forecasting
Text Analytics
Content Categorization
Ontology Management
Sentiment Analysis
Forecasting
Econometrics
Large-Scale Forecasting
Time Series Analysis
Data Mining
Scoring Acceleration
Predictive Analytics
Statistics
Statistical Analysis
Interactive Matrix Programming
DATA MINING IS:
Discovering patterns, trends and relationships represented in data
Developing models to understand and describe characteristics and activity based on these patterns
Using insights to help evaluate future options and make fact-based decisions
Deploying scores and results for timely, appropriate action
[Figure: timeline running from the past (observed events) to the future (predicted events)]
INDUSTRY SPECIFIC DATA MINING APPLICATIONS
Application | What is Predicted? | Driven Business Decision
Credit Scoring (Banking) | Measure creditworthiness of new and existing customers | How to assess and control risk within existing (or new) consumer portfolios?
Market Basket Analysis (Retail) | Which products are likely to be purchased together? | How to increase sales with cross-sell/up-sell, loyalty programs, promotions?
Asset Maintenance (Utilities, Mfg., Oil & Gas) | Identify real drivers of asset or equipment failure | How to minimize operational disruptions and maintenance costs?
Health & Condition Mgmt. (Health Insurance) | Identify patients at risk of a chronic illness & offer treatment program | How can we reduce healthcare costs and satisfy patients?
Fraud Mgmt. (Govt., Insurance, Banks) | Detect unknown fraud cases and future risks | How to decrease fraud losses and lower false positives?
Drug Discovery (Life Science) | Find compounds that have desirable effects & detect drug behavior during trials | How to bring drugs quickly and effectively to the marketplace?
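To make the market basket row concrete, here is a minimal Python sketch of how "which products are likely to be purchased together" can be quantified with support, confidence and lift for item pairs. The transactions and the helper name pair_rules are invented for illustration; this is not the SAS implementation, only one common way to compute these measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; not data from the presentation.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_rules(baskets):
    """Return (support, confidence, lift) for every ordered item pair lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # P(rhs | lhs) / P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


rules = pair_rules(transactions)
support, confidence, lift = rules[("butter", "bread")]
print(f"butter -> bread: support={support:.2f}, "
      f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than independence would predict, which is the kind of signal a cross-sell or promotion decision could start from.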
DATA MINING METHODOLOGY
SEMMA
Sample -> Explore -> Modify -> Model -> Assess -> Score
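The SEMMA steps can be walked through on a toy problem. The Python sketch below is an illustration under stated assumptions: the records, the variable names and the threshold "model" are all hypothetical stand-ins chosen to keep each step visible, and none of it reflects how SAS tools implement the methodology.

```python
import random
import statistics

# Hypothetical records: (input_value, target). Invented for illustration.
records = [(float(v), 0) for v in range(1, 11)] + [(float(v), 1) for v in range(21, 31)]

# Sample: draw a reproducible training/validation partition.
rng = random.Random(0)
shuffled = records[:]
rng.shuffle(shuffled)
train, valid = shuffled[:14], shuffled[14:]

# Explore: compare the input's mean across the two target classes.
class_means = {
    label: statistics.mean(x for x, y in train if y == label) for label in (0, 1)
}

# Modify: standardize the input using training-set statistics only.
mu = statistics.mean(x for x, _ in train)
sigma = statistics.pstdev(x for x, _ in train)


def standardize(x):
    return (x - mu) / sigma


# Model: a deliberately simple threshold halfway between the class means.
threshold = (standardize(class_means[0]) + standardize(class_means[1])) / 2


def predict(x):
    return 1 if standardize(x) > threshold else 0


# Assess: measure accuracy on the held-out validation partition.
accuracy = sum(predict(x) == y for x, y in valid) / len(valid)

# Score: apply the fitted model to new, unlabeled inputs.
new_inputs = [3.0, 27.5]
scores = [predict(x) for x in new_inputs]
print(f"validation accuracy={accuracy:.2f}, scores={scores}")
```

In practice the Model step would be a decision tree, regression or neural network rather than a single threshold; the point here is only the order of the steps and the fact that Modify and Model are fitted on the training sample alone.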
G T E W V G H U I B C X A Q W E T D F G J K O I U T C M N X H G A L O J U T Q A Z C F T E R T N J H Y U O P H Y R M W S D F M N B V H J U Y T I P Q A P G F S D W V B U I N S W B C Z A L K J T M A P I O I U X F E W I Y N H K D N Q U P Q P S F T E M X T R G E O
SAS TEXT ANALYTICS: UNCOVERING THE TECHNOLOGY
Content Categorization
Text Mining
Sentiment Analysis
Ontology Management
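Two of the text analytics components named here, content categorization and sentiment analysis, can be illustrated with a small rule-based Python sketch. The keyword sets, the lexicon and the sample documents are hypothetical, and the approach (keyword overlap plus a word-count sentiment score) is a simplified stand-in, not a description of the SAS products.

```python
import re
from collections import Counter

# Hypothetical category rules and sentiment lexicon, invented for illustration.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "charge", "refund"},
    "service": {"staff", "waiting", "call"},
}
POSITIVE = {"helpful", "quick", "great"}
NEGATIVE = {"rude", "slow", "wrong"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text):
    """Return every category whose keyword set overlaps the document's tokens."""
    tokens = set(tokenize(text))
    return sorted(cat for cat, words in CATEGORY_KEYWORDS.items() if tokens & words)


def sentiment(text):
    """Score a document as positive token count minus negative token count."""
    counts = Counter(tokenize(text))
    return sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)


docs = [
    "The staff were rude and the waiting time was slow.",
    "Quick refund on the wrong charge, great service.",
]
for doc in docs:
    print(categorize(doc), sentiment(doc))
```

Real systems typically add linguistic rules, statistical models and a managed vocabulary on top of this; the sketch only shows the basic idea of mapping free text to categories and a polarity score.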
LILLEBAELT HOSPITAL (Denmark)
HEALTHCARE
BUSINESS ISSUE
• Reduce error in patient records
• Reduce manual effort of patient record audits
RESULTS
• Creation of a database to improve clinical work in research and diagnosis
• “If data is wrong, the basis for decision making is also faulty. Therefore, the Clinically Correct Time-True Registration system makes sense even beyond our department and hospital.”
- Sten Larsen, Chief Surgeon
1823 HONG KONG EFFICIENCY UNIT
PUBLIC
BUSINESS ISSUE
• 1823 operates round-the-clock, including on Sundays and public holidays.
• Answers 2.65 million calls and 98,000 e-mails, including inquiries, suggestions and complaints
RESULTS
• Developed a Complaints Intelligence System that uncovers the trends, patterns and relationships inherent in the complaints
• “By decoding the 'messages' through statistical and root-cause analyses of complaints data, the government can better understand the voice of the people, and help government departments improve service delivery, make informed decisions and develop smart strategies. This in turn helps boost public satisfaction with the government, and build a quality city.”
- Efficiency Unit’s Assistant Director, W. F. Yuk
DATA/TEXT MINING RESEARCH CONSIDERATIONS
• Data Mining for patent research/control
• Copyright research/control
• Metadata-driven approach avoids ‘permanent’ data duplication
• Analyst needs ‘creative freedom’ in combining, transforming data
• User interfaces – programming vs point-and-click
• Cost to implement highly variable
• Future Indications
  • In-Memory
  • Big Data
  • Cloud Com
Copyright © 2012, SAS Institute Inc. All rights reserved. www.SAS.com