SAS Enterprise Miner 13support.sas.com/.../enterprise-miner/pdfs/em131.pdfdeveloped using SAS Rapid...

Turning increasing volumes of data into useful information is a challenge for most organizations. Relationships and answers that identify key opportunities lie buried somewhere in all of that data.

• Which customers will purchase what products and when?
• Which customers are leaving and what can be done to retain them?
• How should insurance rates be set to ensure profitability?
• How can you predict failures, reduce unnecessary maintenance and increase uptime to optimize asset performance?

To get answers to complex questions and gain an edge in today’s competitive market, powerful advanced analytic solutions are required. Discovering previously unknown patterns can help decision makers across your enterprise create effective strategies. Those who choose to implement SAS® data mining into their business processes will be able to stay competitive in today’s fast-moving markets.

SAS® Enterprise Miner™ 13.1
Create highly accurate analytical models that enable you to predict with confidence

What does SAS® Enterprise Miner™ do?
It streamlines the data mining process so you can create accurate predictive and descriptive analytical models using vast amounts of data. Our customers use this software to detect fraud, minimize risk, anticipate resource demands, reduce asset downtime, increase response rates for marketing campaigns and curb customer attrition.

Why is SAS® Enterprise Miner™ important?
It offers state-of-the-art predictive analytics and data mining capabilities that enable organizations to analyze complex data, find useful insights and act confidently to make fact-based decisions.

For whom is SAS® Enterprise Miner™ designed?
It’s designed for those who need to analyze increasing volumes of data to identify and solve critical business or research issues – and help others make well-informed decisions. This includes data miners, statisticians, marketing analysts, database marketers, risk analysts, fraud investigators, engineers, scientists and business analysts.

Fact Sheet

Benefits

• Understand key relationships and develop models intuitively and quickly. The graphical user interface makes it easy for analytic professionals to interact with information at any point in the modeling cycle. Both analytical professionals and business analysts enjoy a common, easy-to-interpret visual view of the data mining process and can collaborate to solve the toughest challenges.

• Build better models more efficiently with a versatile data mining workbench. An interactive self-documenting process flow diagram environment shortens model development time. It efficiently maps the data mining process to produce the best possible results.

• Easily derive insights in a self-sufficient and automated manner. The SAS Rapid Predictive Modeler enables business analysts and subject-matter experts with limited statistical skills to automatically generate models and act on them quickly. Analytic results are provided in easy-to-understand charts for improved decision making.

• Enhance the accuracy of predictions to ensure the right decisions are made and that best actions are taken. Better-performing models enhance the stability and accuracy of predictions, which can be verified easily by visual model assessment and validation metrics. Model profiling is also supported to provide an understanding of how the predictor variables contribute to the outcome being modeled.

• Ease model deployment and scoring processes for faster results. SAS® Enterprise Miner™ automates the tedious process of scoring new data and provides complete scoring code for all stages of model development. The scoring code can be deployed in a variety of real-time or batch environments. This saves time and helps you achieve accurate results so you can make decisions that result in the most value.



Product Overview

Everyone can benefit from incorporating analytics in a secure and scalable manner. But this requires collaboration across the organization and calls for a powerful, multipurpose data mining solution that can be tailored to meet different needs.

While one analytical approach may work fine on one data collection, it may not perform well with new data sources or be able to answer new business questions. This makes it crucial to have a wide selection of analysis tools at hand. Different tools produce different models, and only when you compare models side by side can you see which data mining approach produces the best “fit.” If you start with a workbench that has limited analytical tools (e.g., only regression or only decision trees), the end result could be a model with limited predictive value.

SAS Enterprise Miner is delivered as a distributed client/server system. This provides an optimized architecture so data miners and business analysts can work more quickly to create accurate predictive and descriptive models, and produce results that can be shared and incorporated into business processes. To enhance the data mining process, this software is designed to work seamlessly with other SAS technologies, such as data integration, analytics and reporting.

An integrated, complete view of your data

Data mining is most effective when it is part of an integrated information delivery strategy – one that includes data gathered from hugely diverse enterprise sources. Call center logs, survey results, customer feedback forms, web data, time series data and transactional point-of-sale data can all be combined and analyzed with the industry’s most sophisticated data mining package. Adding SAS Text Miner lets you analyze structured and unstructured data together for more accurate and complete results.

Easy-to-use GUI

An easy-to-use, drag-and-drop interface is designed to appeal to analytic professionals. The advanced analytic algorithms are organized under core tasks that are performed in any successful data mining endeavor. The SAS data mining process encompasses five primary steps: sampling, exploration, modification, modeling and assessment (SEMMA). In each step, you perform an array of actions as the data mining project develops. By deploying nodes from the SEMMA toolbar, you can apply advanced statistics, identify the most significant variables, transform data elements with expression builders, develop models to predict outcomes, validate accuracy and generate a scored data set with predicted values to deploy into your operational applications.

A quick, easy and self-sufficient way to generate models

SAS Rapid Predictive Modeler automatically steps nontechnical users through a workflow of data mining tasks (e.g., transforming data, selecting variables, fitting a variety of algorithms and assessing models) to quickly generate predictive models for a wide range of business problems. SAS Rapid Predictive Modeler is a SAS® Enterprise Guide® or SAS Add-In for Microsoft Office (Microsoft Excel only) task and uses prebuilt SAS Enterprise Miner modeling steps. A collaborative approach allows models developed using SAS Rapid Predictive Modeler to be customized by analytic professionals using SAS Enterprise Miner.

Both classical and modern modeling techniques

SAS Enterprise Miner provides superior analytical depth with a suite of statistical, data mining and machine learning algorithms. Decision trees, bagging and boosting, time series data mining, neural networks, memory-based reasoning, hierarchical clustering, linear and logistic regression, associations, sequence and web path analysis are all included. And more. The breadth of analytical algorithms extends to industry-specific algorithms such as credit scoring, and state-of-the-art methods such as gradient boosting and least angular regression splines.

Figure 1: Perform principal component analysis for dimension reduction, a frequent intermediate step in the data mining process.


Sophisticated data preparation, summarization and exploration

Preparing data is a time-consuming aspect of all data mining endeavors. A powerful set of interactive data preparation tools is available for addressing missing values, filtering outliers and developing segmentation rules. Core data preparation tools include file importing and appending, and merging and dropping variables. Extensive descriptive summarization features and interactive exploration tools let even novice users examine large amounts of data in dynamically linked, multidimensional plots. This produces quality data mining results tailored and optimally suited to specific business problems.

Business-based model comparisons, reporting and management

Assessment features let you compare models to identify the ones that produce the best lift and overall ROI. Models generated with different algorithms can be evaluated consistently using a highly visual assessment interface. Data miners can discuss results with business domain experts for improved collaboration and better results. An innovative Cutoff node examines posterior probability distributions to define the optimal actions for solving the business problem at hand.
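As a rough illustration of the lift comparison mentioned above, the following Python sketch computes cumulative lift at a chosen depth for two sets of model scores. It is not SAS Enterprise Miner code; the function name `cumulative_lift`, the `depth` parameter and the toy data are assumptions made for this example.

```python
def cumulative_lift(scores, outcomes, depth=0.1):
    """Cumulative lift: event rate in the top `depth` fraction of cases
    (ranked by model score) divided by the overall event rate."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and equal length")
    overall_rate = sum(outcomes) / len(outcomes)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no events")
    # Rank cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(round(depth * len(ranked))))
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    return top_rate / overall_rate


# Hypothetical toy data: 10 cases, 1 = responder, 0 = non-responder.
outcomes = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
model_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.4, 0.7, 0.2, 0.3, 0.1]
model_b = [0.5, 0.9, 0.4, 0.8, 0.1, 0.2, 0.6, 0.3, 0.7, 0.1]

lift_a = cumulative_lift(model_a, outcomes, depth=0.3)
lift_b = cumulative_lift(model_b, outcomes, depth=0.3)
```

In this toy case the first model ranks all three responders in its top 30 percent while the second ranks none there, so the first model shows the higher lift at that depth.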

Open, extensible design provides flexibility

The customizable environment of SAS Enterprise Miner provides the ability to add tools and include personalized SAS code. Existing SAS models developed outside of the SAS Enterprise Miner environment can be integrated easily into the process flow environment while maintaining full control of each syntax statement. The Extension node includes interactive editor features for training and score codes. Users can edit and submit code interactively while viewing the log and output listings. Default selection lists can be extended with custom-developed tools written with SAS code or XML logic, which opens the entire world of SAS to data miners.

Open Source Integration node

You can now easily integrate R language code inside of a SAS Enterprise Miner process flow diagram. This enables you to perform data transformation and exploration as well as training and scoring supervised and unsupervised models in R. You can then seamlessly integrate the results, assess your R model and compare it to models generated by SAS Enterprise Miner.

In-database and in-Hadoop scoring delivers faster results

Scoring is the process of regularly applying a model to new data for implementation into an operational environment. This can be tedious, especially when it entails manually rewriting or converting code, which delays model implementation and can introduce potentially costly errors. SAS Enterprise Miner automatically generates score code in SAS, C, Java and PMML. The scoring code can be deployed in a variety of real-time or batch environments within SAS, on the web, or directly in relational databases or Hadoop.

Combined with a SAS Scoring Accelerator (available for Hadoop, Pivotal Greenplum, DB2, IBM Netezza, Oracle, Teradata and SAS Scalable Performance Data Server), SAS Enterprise Miner models can be published as database-specific scoring functions for execution directly in the database. Results can be passed to other SAS solutions for deployment of data mining results into real-time operational environments.

Parallelized grid-enabled workbench

Scale from a single-user system to very large enterprise solutions with the Java client and SAS server architecture. Powerful servers can be dedicated to computing, while users move from office to home to remote sites without losing access to mining projects or services. Many process-intensive tasks, such as data sorting, summarization, variable selection and regression modeling, are multithreaded, and processes can be run in parallel for distribution and workload balancing across a grid of servers or scheduled for batch processing.

Distributable data mining system suited for enterprises

SAS Enterprise Miner is deployable via a thin-client web portal for distribution to multiple users with minimal maintenance of the clients. Alternatively, the complete system can be configured on a stand-alone PC. SAS Enterprise Miner supports Windows servers and UNIX platforms, making it the software of choice for organizations with large-scale data mining projects. Model result packages can be created and registered to the SAS Metadata Server for promotion to SAS Model Manager, SAS Data Integration Studio (a component of SAS Data Integration) and SAS Enterprise Guide.

High-performance data mining

A select set of high-performance data mining nodes is included in SAS Enterprise Miner. Depending on the data and complexity of analysis, users may find performance gains in a single-machine SMP mode. In the future, as you need to process big data faster, a separately licensable product, SAS High-Performance Data Mining, lets you develop timely and accurate predictive models. High-performance data mining procedures are available for those who prefer a coding environment. Many options are provided for complete customization of your data mining programs. For more details, visit sas.com/hpdatamining.


Key Features

Intuitive interfaces
• Easy-to-use GUI for building process flow diagrams:
  • Build more, better models faster.
  • Deliverable via the web.
  • Access the SAS programming environment.
  • Provides XML diagram exchange.
  • Reuse diagrams as templates for other projects or users.
  • Directly load a specific data mining project or diagram, or choose from a Project Navigator tree that contains the most recent projects or diagrams.
• Batch processing (program development interface):
  • Encapsulates all features of the GUI.
  • SAS macro based.
  • Embed training and scoring processes into customized applications.

Scalable processing
• Server-based processing.
• Grid computing, in-database and in-memory processing options.
• Asynchronous model training.
• Ability to stop processing cleanly.
• Parallel processing – run multiple tools and diagrams concurrently.
• Multithreaded predictive algorithms.
• All storage located on servers.

Accessing and managing data
• Access and integrate structured and unstructured data sources, including time series data, market baskets, web paths and survey data as candidate predictors.
• File Import node for easy access to Microsoft Excel, comma-delimited files, SAS and other common file formats.
• Support for variables with special characters.
• SAS Library Explorer and Library Assignment wizard.
• Enhanced Explorer window to quickly locate and view table listings or develop a plot using interactive graph components.
• Drop Variables node.
• Merge Data node.
• Append node.
• Filter outliers:
  • Apply various distributional thresholds to eliminate extreme interval values.
  • Combine class values with fewer than n occurrences.
  • Interactively filter class and numeric values.
• Metadata node for modifying columns metadata such as role, measurement level and order.
• Integrated with SAS Data Integration Studio, SAS Enterprise Guide, SAS Model Manager and SAS Add-In for Microsoft Office through SAS Metadata Server:
  • Build training tables for mining.
  • Deploy scoring code.

Figure 2: Within the SAS Enterprise Miner GUI, the process flow diagram is a self-documenting template that can be easily updated or applied to new problems and shared with modelers or other analysts.

Sampling
• Simple random.
• Stratified.
• Weighted.
• Cluster.
• Systematic.
• First N.
• Rare event sampling.
• Stratified and event-level sampling in Teradata 13.

Data partitioning
• Create training, validation and test data sets.
• Ensure good generalization of your models through use of holdout data.
• Default stratification by the class target.
• Balanced partitioning by any class variable.
• Output SAS tables or views.
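The idea of partitioning with stratification by the class target can be sketched in a few lines of Python. This is only an illustration, not the partitioning logic of SAS Enterprise Miner; the function name `stratified_partition`, the 40/30/30 fractions, the seed and the toy rows are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_partition(rows, target_key, fractions=(0.4, 0.3, 0.3), seed=12345):
    """Split rows into training, validation and test lists, keeping the
    class-target proportions roughly equal across the three partitions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target_key]].append(row)

    train, validate, test = [], [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        n_train = int(round(fractions[0] * len(members)))
        n_valid = int(round(fractions[1] * len(members)))
        train.extend(members[:n_train])
        validate.extend(members[n_train:n_train + n_valid])
        test.extend(members[n_train + n_valid:])
    return train, validate, test


# Hypothetical data: 20 events and 80 non-events.
rows = [{"id": i, "target": 1 if i < 20 else 0} for i in range(100)]
train, validate, test = stratified_partition(rows, "target")
```

Because each target class is split separately, every partition keeps the 20 percent event rate of the full table, which is the point of stratifying on the target before holding data out.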

Transformations
• Simple: log, log10, square root, inverse, square, exponential and standardized.
• Binning: bucketed, quantile and optimal binning for relationship to target.
• Best power: maximize normality, maximize correlation with target and equalize spread with target levels.
• Interactions editor: define polynomial and nth degree interaction effects.


Figure 3: Develop customized transformations using the interactive Transform Variables node Expression Builder.

• Interactively define transformations:
  • Define customized transformations using the Expression Builder or SAS code editor.
  • Compare the distribution of the new variable with the original variable.
• Predefine global transformation code for reuse.
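To make the difference between bucketed and quantile binning concrete, here is a minimal Python sketch. It does not reproduce the Transform Variables node; the function names `bucket_bins` and `quantile_bins` and the sample values are hypothetical.

```python
def bucket_bins(values, n_bins):
    """Bucketed (equal-width) binning: split the value range into n_bins
    intervals of equal width and return each value's bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def quantile_bins(values, n_bins):
    """Quantile binning: rank the values and assign roughly equal counts
    to each bin. Ties are broken by original position in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Hypothetical skewed input, for example account balances.
balances = [1, 2, 3, 4, 5, 6, 7, 100]
equal_width = bucket_bins(balances, 4)
equal_count = quantile_bins(balances, 4)
```

With one extreme value in the input, equal-width buckets put seven of the eight values into the first bin, while quantile bins hold two values each. That contrast is a common reason to compare the distribution of a binned variable with the original.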

Interactive variable binning
• Quantile or bucket.
• Gini variable selection.
• Handle missing values as separate group.
• Fine and coarse classing detail.
• Profile bins by target.
• Modify groups interactively.
• Save binning definitions.

Rules Builder node
• Create ad hoc data-driven rules and policies.
• Interactively define the value of the outcome variable and paths to the outcome.

Data replacement
• Measures of centrality.
• Distribution-based.
• Tree imputation with surrogates.
• Mid-minimum spacing.
• Robust M-estimators.
• Default constant.
• Replacement Editor:
  • Specify new values for class variables.
  • Assign replacement values for unknown values.
  • Interactively cap extreme interval values to a replacement threshold.

Descriptive statistics
• Univariate statistics and plots:
  • Interval variables: n, mean, median, min, max, standard deviation, scaled deviation and percent missing.
  • Class variables: number of categories, counts, mode, percent mode and percent missing.
  • Distribution plots.
  • Statistics breakdown for each level of the class target.
• Bivariate statistics and plots:
  • Ordered Pearson and Spearman correlation plot.
  • Ordered chi-square plot with option for binning continuous inputs into nbins.
  • Coefficient of variation plot.
• Variable selection by logworth.
• Other interactive plots:
  • Variable worth plot ranking inputs based on their worth with the target.
  • Class variable distributions across the target and/or the segment variable.
  • Scaled mean deviation plots.

Graphs/visualization
• Batch and interactive plots: scatter, matrix, box, constellation, contour, needle, lattice, density and multidimensional plots; 3-D, pie and area bar charts; and histograms.
• Segment profile plots:
  • Interactively profile segments of data created by clustering and modeling tools.
  • Easily identify variables that determine the profiles and the differences between groups.
• Easy-to-use Graphics Explorer wizard and Graph Explore node:
  • Create titles and footnotes.
  • Apply a WHERE clause.
  • Choose from color schemes.
  • Easily rescale axes.
  • Surface the underlying data from standard SAS Enterprise Miner results to develop customized graphics.
  • Plots and tables are interactively linked, supporting tasks such as brushing and banding.


Figure 4: Use link analysis to evaluate relationships between nodes to visually discover new patterns.

  • Data and plots can be easily copied and pasted into other applications or saved as BMP files.
  • Interactive graphs are automatically saved in the Results window of the node.

Clustering and self-organizing maps
• Clustering:
  • User defined or automatically chooses the best clusters.
  • Several strategies for encoding class variables into the analysis.
  • Handles missing values.
  • Variable segment profile plots show the distribution of the inputs and other factors within each cluster.
  • Decision tree profile uses the inputs to predict cluster membership.
  • PMML score code.
• Self-organizing maps:
  • Batch SOMs with Nadaraya-Watson or local-linear smoothing.
  • Kohonen networks.
  • Overlay the distribution of other variables onto the map.
  • Handles missing values.

Market basket analysis
• Associations and sequence discovery:
  • Grid plot of the rules ordered by confidence.
  • Expected confidence versus confidence scatter plot.
  • Statistics line plot of the lift, confidence, expected confidence and support for the rules.
  • Statistics histogram of the frequency counts for given ranges of support and confidence.
  • Rules description table.
  • Network plot of the rules.
• Interactively subset rules based on lift, confidence, support, chain length, etc.
• Seamless integration of rules with other inputs for enriched predictive modeling.
• Hierarchical associations:
  • Derive rules at multiple levels.
  • Specify parent and child mappings for the dimensional input table.

Web path analysis
• Scalable and efficient mining of the most frequently navigated paths from clickstream data.
• Mine frequent consecutive subsequences from any type of sequence data.

Link analysis
• Converts data into a set of interconnected linked objects that can be visualized as a network of effects.
• Provides a visual model of how two variables’ levels in relational data, or two items’ co-occurrences in transactional data, are linked.
• Provides centrality measures and community information to understand linkage graphs.
• Provides weighted confidence statistics to provide next-best offer information.
• Generates cluster scores for data reduction and segmentation.

Dimension reduction
• Variable selection:
  • Remove variables unrelated to target based on a chi-square or R-square selection criterion.
  • Remove variables in hierarchies.
  • Remove variables with many missing values.
  • Reduce class variables with a large number of levels.
  • Bin continuous inputs to identify nonlinear relationships.
  • Detect interactions.
• Least Angle Regression (LARS) variable selection:
  • AIC, SBC, Mallows C(p), cross-validation and other selection criteria.
  • Plots include: parameter estimates, coefficient paths, iteration plot, score rankings and more.
  • Generalizes to support LASSO (least absolute shrinkage and selection operator).
  • Supports class inputs and targets as well as continuous variables.
  • Score code generation.
• Principal components:
  • Calculate Eigenvalues and Eigenvectors from correlation and covariance matrices.


  • Plots include: principal components coefficients, principal components matrix, Eigenvalue, Log Eigenvalue and Cumulative Proportional Eigenvalue.
  • Interactively choose the number of components to be retained.
  • Mine selected principal components using predictive modeling techniques.
• Variable clustering:
  • Divide variables into disjoint or hierarchical clusters.
  • Eigenvalue or principal components learning.
  • Includes class variable support.
  • Dendrogram tree of the clusters.
  • Selected variables table with cluster and correlation statistics.
  • Cluster network and R-square plot.
  • Interactive user override of selected variables.
• Time series mining:
  • Reduce transactional data into a time series using several accumulation methods and transformations.
  • Analysis methods include seasonal, trend, time domain, and seasonal decomposition.
  • Mine the reduced time series using clustering and predictive modeling techniques.

Figure 5: Integrate customized SAS code to create variable transformations, incorporate SAS procedures, develop new nodes, augment scoring logic, tailor reports and more.

SAS Code node
• Write SAS code for easy-to-complex data preparation and transformation tasks.
• Incorporate procedures from other SAS products.
• Develop custom models.
• Create SAS Enterprise Miner extension nodes.
• Augment score code logic.
• Support for SAS procedures.
• Batch code uses input tables of different names and locations.
• Batch code now integrates project-start code that you can use to define libraries and options.
• Easy-to-use program development interface:
  • Macro variables to reference data sources, variables, etc.
  • Interactive code editor and submit.
  • Separately manage training, scoring and reporting code.
  • SAS Output and SAS LOG.
  • Create graphics.

Consistent modeling features
• Select models based on either the training, validation (default) or test data using several criteria such as profit or loss, AIC, SBC, average square error, misclassification rate, ROC, Gini, or KS (Kolmogorov-Smirnov).
• Incorporate prior probabilities into the model development process.
• Supports binary, nominal, ordinal and interval inputs and targets.
• Easy access to score code and all partitioned data sources.
• Display multiple results in one window to help better evaluate model performance.
• Decisions node for setting target event and defining priors and profit/loss matrices.
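One of the selection criteria listed above, the KS (Kolmogorov-Smirnov) statistic, can be illustrated with a short Python sketch. This is a textbook-style calculation under simple assumptions, not the computation SAS Enterprise Miner performs; the function name `ks_statistic` and the toy scores are hypothetical.

```python
def ks_statistic(scores, outcomes):
    """Kolmogorov-Smirnov statistic: the largest gap between the empirical
    cumulative score distributions of events (1) and non-events (0)."""
    events = sorted(s for s, y in zip(scores, outcomes) if y == 1)
    non_events = sorted(s for s, y in zip(scores, outcomes) if y == 0)
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_events = sum(1 for s in events if s <= threshold) / len(events)
        cdf_non_events = sum(1 for s in non_events if s <= threshold) / len(non_events)
        best = max(best, abs(cdf_events - cdf_non_events))
    return best


# Hypothetical validation data: model scores and observed outcomes.
outcomes = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
ks = ks_statistic(scores, outcomes)
```

A larger value means the model's scores separate events from non-events more cleanly, which is why the statistic is useful when comparing candidate models on the same validation data.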

Regression
• Linear and logistic.
• Stepwise, forward and backward selection.
• Equation terms builder: polynomials, general interactions, and effect hierarchy support.
• Cross-validation.
• Effect hierarchy rules.
• Optimization techniques include: Conjugate Gradient, Double Dogleg, Newton-Raphson with Line Search or Ridging, Quasi-Newton and Trust Region.
• Dmine Regression node:
  • Fast forward stepwise least squares regression.
  • Optional variable binning to detect nonlinear relationships.
  • Optional class variable reduction.
  • Include interaction terms.
  • In-database modeling for Teradata 13.
  • PMML score code.

Decision trees
• Methodologies:
  • CHAID, classification and regression trees, bagging and boosting, gradient boosting, and bootstrap forest.
• Tree selection based on profit or lift objectives and prune accordingly.
• K-fold cross-validation.
• Splitting criterion: Prob Chi-square test, Prob F-test, Gini, Entropy and variance reduction.


• Switch targets for designing multi-objective segmentation strategies.
• Automatically output leaf IDs as inputs for modeling and group processing.
• Displays English rules.
• Calculates variable importance for preliminary variable selection and model interpretation.
• Display variable precision values in the split branches and nodes.
• Unique consolidated tree map representation of the tree diagram.
• Interactive tree capabilities:
  • Interactive growing/pruning of trees; expand/collapse tree nodes.
  • Incorporates validation data to evaluate tree stability.
  • Define customized split points, including binary or multiway splits.
  • Split on any candidate variable.
  • Copy split.
  • Tables and plots are dynamically linked to better evaluate the tree performance.
  • Easy-to-print tree diagrams on a single page or across multiple pages.
  • Interactive subtree selection.
  • User-specified display of text and statistics in the Tree node.
  • User-controlled sample size within interactive trees.
• Based on the fast ARBORETUM procedure.
• PMML score code.

Figure 6: Fit highly complex nonlinear relationships using the Neural Network node.

Neural networks
• Neural Network node:
  • Flexible network architectures with combination and activation functions.
  • 10 training techniques.
  • Preliminary optimization.
  • Automatic standardization of inputs.
  • Supports direct connections.
• Autoneural Neural node:
  • Automated multilayer perceptron building searches for optimal configuration.
  • Type and activation function selected from four different types of architectures.
  • PMML score code.
• DMNeural node:
  • Model building with dimension reduction and function selection.
  • Fast training; linear and nonlinear estimation.

Partial Least Squares node• Especiallyusefulforextracting

factors from a large number of potential correlated variables.

• Performsprincipalcomponentsregression and reduced rank regression.

• Userorautomatedselectionofthe number of the factors.

• Choosefromfivecross-validationstrategies.

• Supportsvariableselection.

Rule induction• Recursivepredictivemodeling

technique. • Especiallyusefulformodelingrare

events.

Two-stage modeling• Sequentialandconcurrentmodel-

ing for both the class and interval target.

• Chooseadecisiontree,regressionor neural network model for each stage.

• Controlhowtheclasspredictionisapplied to the interval prediction.

• Accuratelyestimatecustomervalue.

Memory-based reasoning• k-nearest neighbor technique to

categorize or predict observations. • PatentedReducedDimensionality

Tree and Scan.

Model ensembles
• Combine model predictions to form a potentially better solution.
• Methods include: Averaging, Voting and Maximum.
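The three combination methods listed above can be sketched for a single observation scored by several models. This is a simplified illustration, assuming each model returns a posterior probability for the event; the function names and the 0.5 voting threshold are assumptions, not the Ensemble node's actual options.

```python
def ensemble_average(probs):
    """Average the posterior probabilities from all models."""
    return sum(probs) / len(probs)


def ensemble_maximum(probs):
    """Take the largest posterior probability across models."""
    return max(probs)


def ensemble_vote(probs, threshold=0.5):
    """Return True when a strict majority of models predict the event.

    Each model votes for the event if its probability reaches `threshold`
    (an assumed value chosen for this sketch).
    """
    votes = sum(1 for p in probs if p >= threshold)
    return votes * 2 > len(probs)


# Posterior probabilities from three hypothetical models for one observation.
probs = [0.2, 0.6, 0.7]
avg = ensemble_average(probs)
top = ensemble_maximum(probs)
majority = ensemble_vote(probs)
```

Averaging smooths out disagreement between models, the maximum is the most optimistic combination, and voting depends only on which side of the threshold each model falls.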

Open Source Integration node
• Write code in the R language inside of SAS Enterprise Miner.


Figure 7: Analyze time series data using classical seasonal decomposition.

• SAS Enterprise Miner data and metadata are available to your R code with R results returned to SAS Enterprise Miner.
• Train and score supervised and unsupervised R models. The node allows for data transformation and exploration.
• Generate model comparisons and SAS score code for supported models.

Incremental response/net lift models
• Net treatment vs. control models.
• Binary and interval targets.
• Stepwise selection.
• Fixed or variable revenue calculations.
• Net information value variable selection.
• User can specify the treatment level of the treatment variable.
• User can specify a cost variable in addition to a constant cost.
• Penalized Net Information Value (PNIV) for variable selection.
• Separate model selection options available for an incremental sales model.
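The basic quantity behind a net treatment vs. control comparison is the difference in response rates between the two groups. The sketch below assumes a binary target coded 0/1 and uses hypothetical function names; it does not reproduce the node's modeling or variable selection.

```python
def response_rate(outcomes):
    """Share of responders in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def net_lift(treated, control):
    """Incremental response: treatment response rate minus control response rate."""
    return response_rate(treated) - response_rate(control)


# Toy outcomes for one customer segment (1 = responded, 0 = did not).
treated = [1, 1, 0, 1]
control = [1, 0, 0, 0]
lift = net_lift(treated, control)
```

A segment with a high response rate in both groups has little net lift, which is why the comparison is made against the control group rather than on treated customers alone.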

Time series data mining
• Time series data preparation:
  • Aggregate, transform and summarize transactional and sequence data.
  • Automatically transpose the time series to support similarity analysis, clustering and predictive modeling.
  • Process data with or without Time ID variables.

• Similarity analysis:
  • Useful for new product forecasting, pattern recognition and short lifecycle forecasting.
  • Computes similarity measures between the target and input series, or among input time series.
  • Similarity matrix for all combinations of the series.
  • Hierarchical clustering using the similarity matrix with dendrogram results.
  • Constellation plot for evaluating the clusters.

• Exponential smoothing:
  • Control weights decay using one or more smoothing parameters.
  • Best-fitting smoothing method (simple, double, linear, damped trend, seasonal or Winters' method) is selected automatically.

• Dimension reduction:
  • Supports five time series dimension reduction techniques: Discrete Wavelet Transform, Discrete Fourier Transform, Singular Value Decomposition, Line Segment Approximation with the Mean, and Line Segment Approximation with the Sum.

• Cross-correlation:
  • Provides autocorrelation and cross-correlation analysis for time-stamped data.
  • The Time Series Correlation node outputs time-domain statistics based on whether auto-correlation or cross-correlation is performed.
• Seasonal decomposition.
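To make the exponential smoothing item above concrete, here is a minimal sketch of the simple (single-parameter) method, in which one smoothing parameter controls how quickly the weights on older observations decay. The function name and the choice to start from the first observation are assumptions for illustration; automatic method selection is not shown.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from the
    first observation (an assumed initialization for this sketch).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        # Larger alpha puts more weight on the newest observation.
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


smoothed = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
```

With alpha close to 1 the smoothed series tracks the data closely; with alpha close to 0 it reacts slowly and older observations keep more influence.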

Survival analysis
• Discrete time to event regression with additive logistic regression.
• Event probability for time effect is modeled using cubic splines.
• Users can now enter the cubic spline basis functions as part of the stepwise variable selection procedure in addition to the main effects.
• User-defined time intervals for specifying how to analyze the data and handle censoring.
• Automatically expands the data with optional sampling.
• Supports non-time varying covariates.
• Computes survival function with holdout validation.
• Generates competing risks or subhazards.
• Score code generation with mean residual life calculation.
• Incorporate time-varying covariates into the analysis with user-specified data formats, including standard, change-time and fully expanded.
• Users can specify left-truncation and censor dates.
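Discrete time to event regression is typically fit on an expanded data set with one row per subject per time interval, which is one way to read the "expands the data" item above. The sketch below shows that expansion under assumed inputs (subject ID, number of intervals observed, and an event flag); the function name and tuple layout are hypothetical and the node's own expansion and sampling logic may differ.

```python
def expand_person_period(subjects):
    """Expand (subject_id, duration, event) records into one row per interval.

    Each output row is (subject_id, interval, event_in_interval). The event
    indicator is 1 only in the final interval of a subject who had the event;
    censored subjects contribute rows with 0 throughout.
    """
    rows = []
    for subject_id, duration, event in subjects:
        for interval in range(1, duration + 1):
            is_event = 1 if (event and interval == duration) else 0
            rows.append((subject_id, interval, is_event))
    return rows


# Subject "a" has the event in interval 3; subject "b" is censored after interval 2.
rows = expand_person_period([("a", 3, True), ("b", 2, False)])
```

A logistic regression on the event indicator, with the interval (or spline functions of it) as a predictor, then estimates the discrete-time hazard.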


Figure 8: Model the incremental impact of a marketing treatment in order to maximize the return on investment.

Group processing with the Start and End Groups nodes
• Repeat processing over a segment of the process flow diagram.
• Uses include stratified modeling, bagging and boosting, multiple targets, and cross-validation.

SAS® Rapid Predictive Modeler customized task in SAS® Enterprise Guide® or SAS® Add-In for Microsoft Office (Excel only)
• Automatically generates predictive models for a variety of business problems.
• Models can be opened, augmented and modified in SAS Enterprise Miner.
• Produces concise reports, including variable importance, lift charts, ROC charts and model scorecards for easy consumption and review.
• Ability to score the training data with option to save the scored data set.

High-performance data mining procedures
• Multithreaded procedures execute concurrently and take advantage of all available cores on your existing symmetric multiprocessing (SMP) server to speed up processing:
  • HPBIN (high-performance binning).
  • HPBNET (high-performance Bayesian networks).
  • HPCLUS (high-performance clustering).
  • HPCORR (high-performance correlation).
  • HPDECIDE (high-performance decide).
  • HPDMDB (high-performance data mining database).
  • HPDS2 (high-performance DS2).
  • HPFOREST (high-performance random forests).
  • HPIMPUTE (high-performance imputation).
  • HPNEURAL (high-performance neural networks).
  • HPREDUCE (high-performance variable reduction).
  • HPSAMPLE (high-performance sampling).
  • HPSUMMARY (high-performance data summarization).
  • HPSVM (high-performance Support Vector Machine).
  • HP4SCORE (high-performance 4Score).
• High-performance-enabled SAS Enterprise Miner nodes:
  • HP Cluster.
  • HP Data Partition.
  • HP Explore.
  • HP Forest.
  • HP GLM.
  • HP Impute.
  • HP Neural Network.
  • HP Principal Components.
  • HP Regression.
  • HP SVM.
  • HP Tree.
  • HP Transform.
  • HP Variable Selection.

Model Import node
• Register SAS Enterprise Miner models for reuse in other diagrams and projects.
• Import and evaluate external models.

Model evaluation
• Model Comparison node compares multiple models in a single framework for all holdout data.
• Automatically selects the best model based on the user-defined model criterion.
• Supports user override.
• Extensive fit and diagnostics statistics.
• Lift charts; ROC curves.
• Interval target score rankings and distributions.
• Profit and loss charts with decision selection; confusion (classification) matrix.
• Class probability score distribution plot; score ranking matrix plots.
• Cutoff node to determine probability cutoff point(s) for binary targets.
  • User override for default selection.
  • Max KS Statistic.
  • Min Misclassification Cost.
  • Maximum Cumulative Profile.
  • Max True Positive Rate.
  • Max Event Precision from Training Prior.
  • Event Precision Equal Recall.
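One of the cutoff criteria listed above, the maximum Kolmogorov-Smirnov (KS) statistic, can be sketched as choosing the probability cutoff where the true positive rate exceeds the false positive rate by the largest margin. This is an illustrative brute-force version with an assumed function name and toy data, not the Cutoff node's implementation.

```python
def max_ks_cutoff(scores, labels):
    """Return (cutoff, ks) maximizing TPR - FPR over the observed scores.

    `scores` are predicted event probabilities and `labels` are 0/1 outcomes.
    An observation is classified as an event when its score >= cutoff.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    best_cutoff, best_ks = None, -1.0
    for cutoff in sorted(set(scores)):
        tpr = sum(1 for s in events if s >= cutoff) / len(events)
        fpr = sum(1 for s in nonevents if s >= cutoff) / len(nonevents)
        ks = tpr - fpr
        # Keep the first cutoff that attains the largest separation.
        if ks > best_ks:
            best_cutoff, best_ks = cutoff, ks
    return best_cutoff, best_ks


# Toy scores in which events and non-events are perfectly separated.
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
cutoff, ks = max_ks_cutoff(scores, labels)
```

The other criteria in the list follow the same pattern: scan candidate cutoffs and keep the one that optimizes a different objective, such as misclassification cost or true positive rate.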


Figure 9: Automatically generate predictive models for a variety of business problems using the SAS Rapid Predictive Modeler task in SAS Enterprise Guide or SAS Add-In for Microsoft Office (Microsoft Excel only).

Figure 10: Build a random forest model, which consists of ensembling several decision trees. Through multiple iterations, randomly select variables for splitting while reducing the dependence on sample selection. Use out-of-bag samples to form predictions.

Reporter node
• Uses SAS Output Delivery System to create a PDF or RTF of a process flow.
• Helps document the analysis process and facilitate results sharing.
• Document can be saved and included in SAS Enterprise Miner results packages.
• Includes image of the process flow diagram.
• User-defined notes entry.

Save Data node
• Save training, validation, test, score or transaction data from a node to either a previously defined SAS library or a specified file path.
• Export JMP®, Excel 2010, CSV and tab delimited files. Default options are designed so that the node can be deployed in SAS Enterprise Miner batch programs without user input.
• Can be connected to any node in a SAS Enterprise Miner process flow diagram that exports training, validation, test, score or transaction data.

Scoring
• Score node for interactive scoring in the SAS Enterprise Miner GUI.
• Optimized score code is created by default, eliminating unused variables.
• Automated score code generation in SAS, C, Java and PMML (Version 4.0).
• SAS, C and Java scoring code captures modeling, clustering, transformations and missing value imputation code.
• Score SAS Enterprise Miner models directly inside Aster Data, DB2, Greenplum, Hadoop, IBM Netezza, Oracle and Teradata databases with SAS Scoring Accelerator.


To learn more about SAS Enterprise Miner system requirements, download white papers, view screenshots and see other related material, please visit sas.com/enterpriseminer.

Figure 11: Evaluate multiple models together in one easy-to-interpret framework using the Model Comparison node.

Model registration and management
• Register segmentation, classification or predictive models to the SAS Metadata Server. Input variables, output variables, target variables, mining function, training data and SAS score code are registered to the metadata.
• Register Model node consolidates registration steps and provides a registration mechanism that can run in SAS Enterprise Miner batch code.
• Registration of models in SAS Metadata Server enables:
  • Integration with SAS Model Manager for complete lifecycle management of models.
  • Integration with SAS Enterprise Guide and SAS Data Integration Studio for scoring models.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2014, SAS Institute Inc. All rights reserved. 101369_S114796.0414

Figure 12: Develop decision trees interactively or in batch. Numerous assessment plots help gauge overall tree stability.