© 2002 Megaputer intelligence, Inc.
Mining data with
PolyAnalyst
Your Knowledge Partner TM
www.megaputer.com

Outline
Data Mining in BI chain
PolyAnalyst overview
Learning algorithms
Additional features
Future developments

DM in Decision Making
Consider a fragment of the BI chain: Data -> Knowledge -> Decision -> Action
Data is what we can capture and store
Knowledge is what provides for informed decisions
Problem: How to get from Data to Knowledge?
Solution: Data Mining (Machine Learning)

Data Mining
"Data Mining is the process of identifying valid, novel, potentially useful, and ultimately comprehensible knowledge from databases that is used to make crucial business decisions."
-- G. Piatetsky-Shapiro, KDNuggets editor
www.kdnuggets.com
Valid, Novel, Actionable, Comprehensible

Data Mining vs. OLAP
OLAP
- Helps prove or reject your hypotheses by dissecting data along different dimensions
- But you have to guess the answer first!
Data Mining
- Automatically develops and tests numerous hypotheses by learning from historical data
- Analyzes raw data

Business Intelligence Chain
Consider direct marketing automation
Analyze data
Integrate applications

Data Mining Tasks
Predicting
Classifying
Clustering
Segmenting
Explaining
Associating
Visualizing
Link Analysis
Text Mining

Fields of application
Database marketers
- Response prediction
- Market segmentation
- Customer valuation
- Cross-sell analysis
Insurance and HMO companies
- Customer retention
- Head-cost reduction
- Premium sensitivity testing
Securities and currency traders
- Forecasting market behavior
- Trading strategy optimization
Hospitals and physicians
- Medical diagnostics
- Selecting medical intervention
Government agencies
- Fraud detection
- Revenue prediction
Retailers
- Market Basket Analysis
- Recommendation systems

What makes DM hard?
Unfamiliar concept and lack of experience
Results require interpretation by an analyst
Poor integration in existing applications
Difficulty processing very large databases
Necessity to learn a new application
High cost

Megaputer response
Challenge: Unfamiliar concept and lack of experience
Response: Collaborative Appliance Program, which combines Megaputer analysts’ expertise in data mining with the customer’s knowledge of the business project
Challenge: Results require interpretation by an analyst
Response: Simple reporting and batch processing capabilities
Challenge: Poor integration in existing applications
Response: Easy scoring of external data with a few mouse clicks
Challenge: Difficulty processing very large databases
Response: In-Place Data Mining
Challenge: Necessity to learn a new application
Response: An SDK of easy-to-integrate PolyAnalyst COM components
Challenge: High cost
Response: Flexible licensing mechanism

What is PolyAnalyst?
Multi-strategy data mining suite
The largest selection of ML algorithms for diverse business tasks
Structured data and text processing tools
Ease-of-use: friendly data manipulation and visualization
Deep integration
- Applying models to external DB through the OLE DB protocol
- Exporting models to XML
- COM components
Best Price/Performance ratio

Key differentiators of PolyAnalyst
Integrated analysis of structured (numeric and categorical) and unstructured (text) data
Easy to learn and operate visual analytical interface
The largest selection of powerful machine learning algorithms
Mouse-driven application of predictive models to data in any external system through a standard OLE DB link
Simple integration with external applications: SDK of COM components
In-Place Data Mining capabilities for processing huge databases
Step-by-step tutorials based on real-world case studies
Rich data manipulation and visualization tools
Reusable analytical scripts for batch process data mining
The best Price/Performance ratio

PolyAnalyst
Sample customers
Boeing (USA)
3M (USA)
Chase Manhattan Bank (USA)
McKinsey & Company (USA)
Siemens (Germany)
Lockheed Martin (USA)
Allstate Insurance (USA)
ICICI Bank (India)
Mars (USA)
Taco Bell (USA)
DuPont (USA)
Asea Skandia (Sweden)
France Telecom (France)
Cambridge Technology Partners (USA)
Carlson Marketing (USA)
Central Bank (Russia)
US Navy (USA)
KPN Research (Netherlands)
Alka Insurance (Denmark)
National Cancer Institute (USA)
Customer base: 300+ installations

PolyAnalyst workplace
Project navigation tree
Control buttons
Data and Results pane
Objects and Collections represented by icons
Exploration engine report fragment
PolyAnalyst log journal

PolyAnalyst provides
Access to data held in a database or data warehouse
- Numerical
- Categorical
- Yes/no
- Date
Data manipulation and visualization
14 machine learning algorithms
Convenient results reporting and output
Integration with external applications

PolyAnalyst machine learning algorithms
Your Knowledge Partner TM

“Probably one of the most impressive characteristics of PolyAnalyst is the sheer number of data mining tasks it can tackle.”
-- Mario Apicella, Technology Analyst, InfoWorld Test Center, July 3, 2000

Learning algorithms
Find Laws (SKAT algorithm)
Cluster (Localization of anomalies)
Find Dependencies (n-dimensional distributions)
Classify (Fuzzy logic modeling)
Decision Tree (Information Gain criterion)
PolyNet Predictor (GMDH-Neural Net hybrid)
Market Basket Analysis (Association rules)
Memory Based Reasoning (k-NN + GA)
Linear Regression (Stepwise and rule-enriched)
Discriminate (Unsupervised classification)
Summary Statistics (Data summarization)
Link Analysis (Visual correlation analysis)
Text Mining (Semantic text analysis)

Cluster (FC)
Identifies clusters of similar records
Selects best variables for clustering
Suggests the number of clusters
Separates clusters of records in new data sets for further investigation - preprocessing for other algorithms

Cluster (continued)
Groups of similar records

Cluster (continued)
Based on analyzing distributions in hypercubes of all variables rather than on measuring distances between points
Hence, independent of rescaling of the variable axes
Finds only clusters actually present in data, against the background of uniformly distributed cases

Classify (CL)
Fuzzy-logic based classification
The membership function is modeled by either Find Laws, PolyNet Predictor, or LR
Provides record scoring, with Lift and Gain charts used for visualization
Assigns records to one of two classes and furnishes the classification rule it used

Classify (continued)
PolyAnalyst Lift chart illustrates an increase in the response to a campaign based on the discovered model - instead of random mailing
[Lift chart: % of maximal possible response, Mass mailing vs. Targeted mailing]
PolyAnalyst Gain chart helps optimize the profit obtained in a direct marketing campaign
[Gain chart: Profit ($), Mass mailing vs. Targeted mailing]
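
The quantities behind the two charts can be sketched in a few lines of Python. This is only an illustration of the general idea, not PolyAnalyst code: the function names, the scores, the responses, and the revenue and cost figures are all made up, and "lift" is taken here to mean the share of responders reached by targeted mailing divided by the share a random mailing of the same size would reach.

```python
def cumulative_hits(scores, responded):
    """Rank records by descending model score and count responders reached so far."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    hits, running = [], 0
    for _, flag in ranked:
        running += flag
        hits.append(running)
    return hits


def lift_at(hits, k):
    """Share of responders in the top k, relative to a random mailing of size k."""
    targeted_share = hits[k - 1] / hits[-1]
    random_share = k / len(hits)
    return targeted_share / random_share


def most_profitable_depth(hits, revenue_per_response, cost_per_mail):
    """Return (k, profit) for the mailing depth k with the highest profit."""
    profits = [h * revenue_per_response - k * cost_per_mail
               for k, h in enumerate(hits, start=1)]
    best_k = max(range(1, len(hits) + 1), key=lambda k: profits[k - 1])
    return best_k, profits[best_k - 1]


# Hypothetical scored prospects: model score and whether each one responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

hits = cumulative_hits(scores, responded)
print(hits)                                   # responders reached at each depth
print(lift_at(hits, 4))                       # lift when mailing the top 4
print(most_profitable_depth(hits, revenue_per_response=10, cost_per_mail=1))
```

In this toy data, mailing beyond the fourth prospect only adds cost, which is the kind of cut-off a profit curve is meant to expose.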

Decision Tree (DT)
Intuitively classifies cases to selected categories
Based on Information Gain splitting criteria
The fastest algorithm in PolyAnalyst
Scales linearly with increasing number of records
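
For readers unfamiliar with the Information Gain criterion, here is a minimal Python sketch of the textbook definition: the entropy of the target before a split minus the weighted entropy after it. The records, field names, and labels are hypothetical, and nothing here is taken from the PolyAnalyst implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: "owner" separates the classes perfectly, "region" not at all.
rows = [
    {"owner": "yes", "region": "north"},
    {"owner": "yes", "region": "south"},
    {"owner": "no", "region": "north"},
    {"owner": "no", "region": "south"},
]
labels = ["buy", "buy", "skip", "skip"]

for feature in ("owner", "region"):
    print(feature, information_gain(rows, labels, feature))
```

A tree builder would pick the feature with the largest gain at each node, so "owner" would be chosen for the first split here.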

Decision Tree (continued)
Classification tree
Node characteristics

Decision Forest (DF)
The most efficient classification algorithm for tasks with multiple target categories
Transforms the task of categorizing data records to N classes into the problem of solving N tasks of categorizing records to two classes
Develops the best collection of N classification trees, with leaves containing probabilities of classifying records in the corresponding classes
Scales linearly with increasing number of records
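
The decomposition described above (one two-class problem per target category, each model returning a class probability) can be sketched as follows. This is a simplified stand-in, not the Decision Forest algorithm itself: each "tree" is a hypothetical depth-1 stump on a single categorical field, and the field values and class names are invented for the example.

```python
from collections import defaultdict


def fit_binary_stump(values, is_positive):
    """Depth-1 'tree': one leaf per field value, holding P(positive | value)."""
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for value, positive in zip(values, is_positive):
        counts[value][0] += int(positive)
        counts[value][1] += 1
    return {value: pos / total for value, (pos, total) in counts.items()}


def fit_one_vs_rest(values, labels):
    """Train one two-class model per category: 'this class' versus 'all others'."""
    classes = sorted(set(labels))
    return {c: fit_binary_stump(values, [label == c for label in labels])
            for c in classes}


def predict(models, value):
    """Pick the class whose two-class model gives the highest leaf probability."""
    return max(models, key=lambda c: models[c].get(value, 0.0))


# Hypothetical training data: a customer tier and the product line each one bought.
tiers = ["gold", "gold", "silver", "silver", "bronze", "bronze", "bronze"]
bought = ["A", "A", "B", "B", "C", "C", "A"]

models = fit_one_vs_rest(tiers, bought)
print(len(models), "two-class models")
print(predict(models, "bronze"))
```

With N categories the loop trains N models, which matches the N two-class tasks the slide refers to.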

Link Analysis (LK)
Reveals pairs of correlated objects
Used in Fraud Detection, Text Analysis and other correlation analysis tasks

Text Analysis (TA)
Extracts key concepts from natural language notes
Tags individual records with the main encountered concepts
Recognizes synonyms and other semantic relations
Can perform user-focused or unsupervised analysis
Integrates the analysis of text with the power of other machine learning algorithms of PolyAnalyst
Facilitates categorization of textual documents

Basket Analysis (BA)
Is used in Retailing, Fraud Detection and Medicine
Identifies in transactional data groups of products that sell well together
Finds directed association rules for each of these groups
Groups baskets containing similar sets of products
Characterized by
- Support
- Confidence
- Improvement
Based on new mathematics: works 10 to 50 times faster than traditional algorithms
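
To make the three measures concrete, the following Python sketch computes them for one directed rule using their common textbook definitions. It assumes that "Improvement" means confidence divided by the overall frequency of the consequent (often called lift). The slides do not define the term, so treat that reading, the function name, and the sample baskets as assumptions.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and improvement of the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one basket.
    """
    a, c = set(antecedent), set(consequent)
    n = len(baskets)
    n_a = sum(1 for basket in baskets if a <= basket)
    n_c = sum(1 for basket in baskets if c <= basket)
    n_both = sum(1 for basket in baskets if (a | c) <= basket)
    support = n_both / n                    # share of baskets containing both sides
    confidence = n_both / n_a               # P(consequent | antecedent)
    improvement = confidence / (n_c / n)    # confidence relative to the base rate
    return support, confidence, improvement


# Hypothetical transactions, one set of product names per basket.
baskets = [
    {"cable", "conduit"},
    {"cable", "conduit"},
    {"cable", "fuse"},
    {"fuse"},
]

print(rule_metrics(baskets, {"cable"}, {"conduit"}))
```

An improvement above 1 suggests the consequent appears more often with the antecedent than it does overall.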

Basket Analysis (continued)
Groups of products sold together well
Directed Association Rules

Basket Analysis (continued)
Works with both transactional and flat data formats
Easily finds many-to-one rules
“I would like to continue working together with Megaputer on other CTP customers’ projects (mainly Swedish and Danish Banks).”
-- Olof Goransson, Senior Data Consultant, CTP Skandinavien AB

Find Laws (FL)
Models relationships hidden in data
Presents discovered knowledge explicitly
Searches the space of all possible hypotheses
“The unique Find Laws algorithm along with an easy to use interface made PolyAnalyst the only choice for our environment.”
-- James Farkas, Senior Navigation Engineer, The Boeing Company

Find Laws (continued)
FL is based on Megaputer’s unique Symbolic Knowledge Acquisition Technology (SKAT)
A good introduction to SKAT: PCAI magazine, January 99, p. 48-52

Find Dependencies (FD)
Determines most influential variables
Detects multi-dimensional dependencies
Predicts target variable in a table format
Used as preprocessing for FL

Find Dependencies (continued)
Predicted Sales per Employee

PolyNet Predictor (PN)
Predicts values of continuous attributes
Hybrid GMDH-Neural Network method
Works well with large amounts of data
The best network architecture is built automatically

Memory Based Reasoning (MB)
Performs classification to multiple categories
Based on identifying similar cases in the previous history
Uses Genetic Algorithms to find the most suitable metric for the problem
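
The sketch below shows why the choice of metric matters for this kind of nearest-neighbor classification. It is a plain k-NN vote with a weighted Euclidean distance; the genetic search over metrics is not shown, and the weights are simply passed in by hand to stand for what such a search might tune. The data, field meanings, and function names are hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance in which each variable has its own weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_classify(history, query, weights, k=3):
    """Vote among the k historical cases closest to the query under the given metric."""
    nearest = sorted(history,
                     key=lambda case: weighted_distance(case[0], query, weights))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical history: (features, class). Only the first variable drives the class;
# the second is large-scale noise.
history = [
    ((1.0, 50.0), "low"), ((1.2, 10.0), "low"), ((0.9, 90.0), "low"),
    ((3.0, 52.0), "high"), ((3.1, 12.0), "high"), ((2.9, 88.0), "high"),
]
query = (2.8, 11.0)

print(knn_classify(history, query, weights=(1.0, 1.0)))  # noise dominates the distance
print(knn_classify(history, query, weights=(1.0, 0.0)))  # noisy variable switched off
```

The two calls disagree on the same query, which is the motivation for searching for a suitable metric rather than fixing one in advance.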

Discriminate (DS)
Determines what features of a selected data set distinguish it from the rest of the data
Requires no target variable
Can be powered by
- Find Laws
- PolyNet Predictor
- Linear Regression

Linear Regression (LR)
Incorporates categorical and yes/no variables in the analysis correctly
Stepwise Linear Regression: only influential variables included
Can be used as a preprocessing and benchmarking module

Data Analysis Project Workflow
Access data
Understand, clean and transform data
Run machine learning analysis
Visualize, report and share results
Integrate results in existing business process

Data Access
ODBC-compliant databases: Oracle, DB2, Informix, Sybase, MS SQL Server, etc.
Dedicated access
- IBM Visual Warehouse
- Oracle Express
OLE DB (can do In-Place Data Mining)
CSV or DBF files
Data can be appended to the project when necessary

Data cleansing and manipulation
SQL querying through OLE DB
Records selection according to multiple criteria
Union, intersection, or complement of data sets
Categorical values aggregation
Visual Drill-through
Exceptional records filtering
Split into n-tile percentage intervals
Random sampling
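
Two of the operations listed above are easy to picture with a short Python sketch. It assumes that an n-tile split means rank-based groups of roughly equal size (quartiles, deciles and so on), which the slide does not spell out. The function names, the seed, and the sample values are hypothetical.

```python
import random


def split_into_ntiles(values, n):
    """Assign each value a tile number 1..n by rank (tile 1 holds the lowest values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    tiles = [0] * len(values)
    for rank, index in enumerate(order):
        tiles[index] = rank * n // len(values) + 1
    return tiles


def random_sample(records, fraction, seed=0):
    """Draw a reproducible random sample containing the given fraction of records."""
    rng = random.Random(seed)
    return rng.sample(records, round(len(records) * fraction))


# Hypothetical purchase amounts for eight customers.
amounts = [10, 80, 30, 60, 20, 70, 50, 40]

print(split_into_ntiles(amounts, 4))        # quartile number for each customer
print(len(random_sample(amounts, 0.25)))    # size of a 25% sample
```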

Visualization
Histograms
Line and scatter plots with zoom and drill-through capabilities
Snake charts
Interactive 3D-charts
Interactive Rule-graphs with sliders for visualizing multi-variable relations
Frequency charts for categorical, integer, or yes/no variables
Lift and Gain charts for marketing applications

Histograms and Frequencies
Histogram displays distribution of numerical variables
Frequencies chart displays distribution of categorical and yes/no variables

2D charts and Rule-graphs
Sliders help visualize effects of other variables in more than two-dimensional models
The Find Laws model (red line) for a product market share dependence on the price predicts a dramatic change in the formula when the product goes on promotion

Snake-charts
Quickly and qualitatively compare several data sets across all their attributes
[Snake chart labels: “Low”, “High”, Compared data sets, All variables]

Interactive 3D charts
You can use the mouse to rotate the 3D-cube

Integration objectives
Use models to easily score data in various external databases
Deliver models to external applications in the format they understand - XML
Be able to analyze very large databases in their entirety
Integrate dedicated machine learning components in existing decision support systems

Applying models externally
PolyAnalyst can readily apply predictive models directly to data in any external source through a standard OLE DB protocol
PolyAnalyst can export models to XML (PMML) format for their incorporation in external decision support applications

Analyzing large databases
In-Place Data Mining
Traditional Data Mining

PolyAnalyst COM
A kit of COM-based Data Mining components
See DMReview magazine, January 2000, p. 42 and PCAI magazine, March 99, p. 16
Benefits
- Develop new applications quickly and effortlessly
- Incorporate third party components
- Choose best components from different vendors
- Extend functionality by adding new components
- Cross-platform applications
- Integration with the simplest tools (Visual Basic)

PolyAnalyst COM (continued)
Offers individual machine learning engines
Integration with external applications
Users see only the familiar interface enhanced by a few new buttons
The main program instructs PolyAnalyst on how to access the stored data
Hard analytical work is performed by integrated PolyAnalyst machine learning components behind the scenes

PolyAnalyst platforms
Standalone system:
PolyAnalyst - Windows 9x/NT/2000/XP
PolyAnalyst Pro - Windows NT/2000P/XP Pro
PolyAnalyst XL - Add-ins for MS Excel
Client/Server system:
PolyAnalyst Knowledge Server - Windows NT
Client - Windows 9x/NT/2000 or OS/2

PolyAnalyst supports medical projects at 3M
“Analytical engines do an excellent job of finding relations amongst many fields without overfitting.”
-- Timothy Nagle, Consulting Scientist, 3M Corporation, St. Paul, MN, USA

PolyAnalyst helps improve the flight control system at Boeing
“PolyAnalyst provides quick and easy access for inexperienced users to powerful modeling tools.”
-- James Farkas, Senior Navigation Engineer, The Boeing Company, Kent, WA, USA

PolyAnalyst facilitates marketing research at Indiana University
“PolyAnalyst provides a unique and powerful set of tools for data mining applications, including promotion response analysis, customer segmentation and profiling, and cross-selling analysis.”
-- Raymond Burke, E.W. Kelley Professor of BA, Kelley Business School, Indiana University, Bloomington, IN, USA

PolyAnalyst helps medical research at the University of Wisconsin-Madison
“PolyAnalyst suite enabled our researchers to search their data for rules and structure while providing a symbolic knowledge of the structure, the detail they needed.”
-- Prof. Roger L. Brown, Director of RDSU, University of Wisconsin, Madison, WI, USA

PolyAnalyst provides efficient machine learning algorithms
“PolyAnalyst focuses more effectively on data discovery than its competition.”
-- Mario Apicella, Technology Analyst, InfoWorld Test Center

Future developments
Further support for OLE DB for DM
- Nested tables
New machine learning algorithms
- Time series analysis
- Kohonen maps
Enhanced data import and manipulation
Visual development of workflow scripts
New push-button vertical applications

PolyAnalyst -- WebAnalyst
PolyAnalyst supports visual project development when used on top of WebAnalyst, a new Megaputer web-enabled enterprise server

PolyAnalyst evaluation
Download a FREE evaluation copy of PolyAnalyst from www.megaputer.com and enjoy using it hands-on, following the provided step-by-step lessons or exploring your own data.

Any Questions?
Call Megaputer at (812) 330-0110
120 W Seventh Street, Suite 310, Bloomington, IN 47404 USA
Your Knowledge Partner TM

Asea Skandia
Established 1907
Largest Swedish distributor of electrical equipment
About 1,400 employees and a turnover of SEK 5.1 billion
About ten thousand product names
Not good at CRM and DB marketing yet
Had only transactional data in a database

Groups of products offered
Home Appliances
90 Cookers, cooker fans, microwave ovens
91 Fridges/Chillers/Freezers
92 Washing machines, dishwashers, dryers
93 Sauna unit, fans
94 Small appliances
Lighting
17 IR, RF and Bus control systems
19 Light reg.. timers, plugs, CCE-con., car heaters
70 Interior light fittings
72 Industrial light fittings
73 Emergency luminaires
74 Spotlights and downlights, lighting tracks
75 Decorative interior light fittings
77 Exterior light fittings
79 Accessories and spare parts
80 Fluorescent lamps and other discharge lamps
81 Incandescent filament and halogen lamps
82 Special lamps
Ventilation and sheet metal
15 Fastening and fixings, protective equipment
16 Tools, implements, protective equipment & clothing
66 Ventilation
67 Sheet Metal for Buildings
Telecommunications
48 Low current cable
49 Data and optical fiber cable
50 Network material
51 Local data networks
52 Power Supply
53 Signalling equipment
55 Distress signal systems
57 Telephony
58 Internal communication systems
60 Aerial equipment
62 Sound and time distribution systems
63 Safety and Security Systems
64 Service Alarm Systems
Electrical Equipment
1 Power and control cables
2 Electrical installation, wiring and flexible cable
6 Material kits, cable protection, lightning equipment
7 Terminations, joints, cabinets and electrical tape
8 Contact crimping
9 Electric meters
11 Cable ladders, trays, trunking, cable trolleys
14 Conduit, boxes, glands, fire protection
15 Fastening and fixings, protective equipment
16 Tools, implements, protective equipment & clothing
18 Switch systems
20 Fuses with accessories
21 Miniature circuit breaker systems
22 Distribution board systems IP20-IP43
23 Distribution board systems IP43-IP65
25 Equipment boxes, equipment cabinets
26 Distribution board accessories
28 Switchgear components, capacitors, busbar trunking
29 Connection terminals and marking materials
31 Motor, safety, load and MCCB breakers
32 Contactors and starters
35 Motors
37 Push switches
38 Sensors, monitors and regulators
40 Relays, time relays
42 Metering instruments
43 Spare parts for consumer goods
45 Programmable control system
85 Radiators and thermostats
87 Fan heaters
88 Water heaters and electric boilers
89 Heating cable

Asea Skandia (continued)
Predicting cross-sell opportunities was possible
Closer cooperation with the client was necessary
Megaputer teamed with Cambridge Technology Partners (Sweden)
Data was disguised prior to the analysis
Asea Skandia
- Identified new opportunity
- Hired a consultant
- Helped aggregate products in groups
- Incorporated results in marketing activities
CTP
- Identified most suitable solution provider
- Worked with the client
- Collected available data
- Aggregated data in product categories
- Presented Megaputer results to the client
Megaputer
- Determined business potential of the data
- Developed data exploration strategy
- Carried out Market Basket Analysis
- Provided actionable results to CTP

PolyAnalyst MBA
Works 10-50 times faster than traditional algorithms
Easily finds many-to-one rules
“I would like to continue working together with Megaputer on other CTP customers’ projects (mainly Swedish and Danish Banks).”
-- Olof Goransson, Senior Data Consultant, CTP Skandinavien AB