A PRIMER TO MACHINE LEARNING FOR FRAUD … · Machine Learning For Fraud Prevention ... Limitations...
TABLE OF CONTENTS
Growing Need for Real-Time Fraud Identification
Machine Learning Today
Big Data Makes Algorithms More Accurate
Machine Learning For Fraud Prevention
Applying Machine Learning
Machine Learning Engines
Beyond Fraud Prevention
Limitations With Machine Learning
The Promise of Machine Learning for Fraud Prevention
GROWING NEED FOR REAL-TIME FRAUD IDENTIFICATION
Fraud attacks are becoming more sophisticated: as technology evolves, fraudsters have elevated their game on payment fraud and money laundering. With access to faster and cheaper computing, fraudsters have shifted their targets to more profitable, weaker points in the financial services chain.
Sixty-five percent of organizations with annual revenues of at least $1 billion were victims of payments fraud in 2014, compared to 56 percent of companies reporting annual revenues of less than $1 billion. [1]
Newer business models are constantly evolving, from instant delivery of goods to virtual cash to digital downloads. However, the growth in opportunities has led to a corresponding growth in online fraud and fraud losses, particularly in ecommerce, where it is 7 times more difficult to prevent fraud than in person. [2] According to the LexisNexis Fraud Multiplier, in 2015 every $100 of fraud cost a merchant $223 in true costs.
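The fraud multiplier quoted above is simple arithmetic. As a minimal sketch (the function name `true_cost_of_fraud` is hypothetical, and the 2.23 multiplier is just the $223-per-$100 figure restated):

```python
# Multiplier implied by the figure quoted above: $223 of true cost per $100 of fraud.
FRAUD_MULTIPLIER_2015 = 223 / 100


def true_cost_of_fraud(fraud_amount: float, multiplier: float = FRAUD_MULTIPLIER_2015) -> float:
    """Estimate a merchant's true cost for a given face value of fraud."""
    return fraud_amount * multiplier


# Hypothetical fraud amounts, for illustration only.
for amount in (100, 5_000, 250_000):
    print(f"${amount:,} of fraud -> about ${true_cost_of_fraud(amount):,.0f} in true costs")
```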
The ever-faster, ever-bigger cycle of attacks leads to a number of consequences:
• Magnitudes of attacks are exponentially higher: Fraudsters are employing distributed networks, internal knowledge, big data, and even machine learning to easily detect vulnerabilities and maximize the size of the attacks.
• Weakest links create the most exposure: Financial systems are interconnected and consist of a long value chain, a networked ecosystem of multiple entities connecting buyers and sellers. Fraud flows to the least-protected components.
• Unexpected attacks can be unsettling and disruptive: Organizations can go from not having a fraud problem to being devastated in just a few days (e.g., Target, Neiman Marcus).
[Figure: Percent of Organizations Subject to Attempted and/or Actual Payments Fraud in 2014. Categories: All; Revenue Less Than $1 Billion; Revenue At Least $1 Billion; Revenue At Least $1 Billion and Fewer Than 26 Payment Accounts; Revenue At Least $1 Billion and More Than 100 Payment Accounts. Plotted values: 62%, 56%, 65%, 69%, 55%.]
CONCLUSION
Fraud solutions need to be more sophisticated to keep pace with fraudsters and to react within the short window between when a fraud attack happens and when it is discovered. Organizations that want to defend themselves against fraud need a superior, faster-learning solution that can constantly evolve yet is easy to use and maintain.
MACHINE LEARNING TODAY
Machine learning as a data science for uncovering patterns and hidden insights is not an entirely new concept. It has been in play with the use of neural networks since the 1980s. The question therefore is, “Why is there a big buzz around machine learning today?”
The answer lies in the fact that advances in technology and science have enabled game-changing differences in how machine-learning algorithms have evolved and are being applied.
For example, human-generated rule sets were traditionally the most prevalent approach in fraud management and are still in practice today. But the quantum leap in computing power and the availability of big data over the last 5 years have disrupted how data is used to identify and prevent fraud. Machine learning uses artificially intelligent computer systems to autonomously learn, predict, act and explain without being explicitly programmed. Simply put, machine learning eliminates the need for preprogrammed rule sets, no matter how complex.
Machine learning enables:
• Real-time decisions: Advances in in-memory, event streaming technology allow risk scoring and decision making in the sub-second range (i.e., ultra-low latency).
• Big data set processing: Advances in distributed data processing allow analyzing more data while still maintaining real-time decisions, without trade-offs between data and latency.
• Reduced cycle time: Learning cycles are continuous, unlike batch learning, where models become out of date. With machine learning, the same transactions being scored also update (teach) the machine learning models.
• Increased effectiveness: Extremely subtle patterns and variations can be detected, with better results (e.g., precision, recall) than humans achieve in many tasks.
• Error-free processing: Enormous amounts of data can now be processed without human bias or error.
• Cost efficiencies: Addresses the long-tail distribution of “corner cases”.
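The "reduced cycle time" point, where the same transactions being scored also teach the model, can be illustrated with a minimal sketch. It assumes a plain logistic regression updated by stochastic gradient descent; the class name, the two features, and the transaction stream are all hypothetical, not a description of any particular product.

```python
import math


class OnlineLogisticModel:
    """Logistic regression updated one transaction at a time (SGD)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        """Return a risk score in (0, 1)."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, is_fraud: int):
        """One gradient step on the log loss once the true label is known."""
        error = self.score(features) - is_fraud
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


# Hypothetical stream: ([amount in thousands of dollars, is_new_device], is_fraud)
stream = [
    ([0.02, 0.0], 0), ([0.05, 0.0], 0), ([1.50, 1.0], 1), ([0.03, 0.0], 0),
    ([2.00, 1.0], 1), ([0.04, 1.0], 0), ([1.80, 1.0], 1), ([0.01, 0.0], 0),
] * 50

model = OnlineLogisticModel(n_features=2)
for features, label in stream:
    risk = model.score(features)   # score first (the real-time decision)
    model.update(features, label)  # then learn from the same transaction

print(round(model.score([1.9, 1.0]), 3), round(model.score([0.02, 0.0]), 3))
```

In practice the true label often arrives much later than the score (for example, as a chargeback), so the update step would be delayed rather than immediate.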
CONCLUSION
The application of machine learning has redefined previous strategies and tools in fraud management, delivering benefits that were not possible with traditional methods.
BIG DATA MAKES ALGORITHMS MORE ACCURATE
As businesses continue to evolve and migrate to the Internet, and as modern money is transacted electronically in an ever-growing cashless banking economy, commerce is increasingly becoming the business of big data science. Of the $11T in US personal consumption expenditures projected for 2017, an astounding 79% will be in the form of electronic payments with a face value of $8.5T, or nearly 50% of the GDP of the US. [3]
Fortunately, this rapidly expanding “dataverse” also fuels modern artificial intelligence, making big data an inextricable component of today’s fraud management. Just like IBM’s Deep Blue computer outplayed Garry Kasparov by having learned from millions of chess games, machine learning in general requires access to large amounts of data to be able to learn and generalize knowledge.
Without large amounts of data, a machine-learning algorithm cannot learn. The existence of efficient algorithms to process this data very quickly opened up the possibility for sophisticated machine learning applications such as spam detection, efficient content recommendations, self-driving cars, image recognition, natural language processing, automatic translation, and of course, fraud management.
MACHINE LEARNING FOR FRAUD PREVENTION
To understand why machine learning is important in fraud management, we need to understand the characteristics of fraud along with the associated business and technical challenges.
Fraud’s Unique Characteristics:
• Fraud has a long tail distribution: Too many unique cases to pursue.
• Fraud patterns change quickly: Slow-learning countermeasures cannot keep up.
• Fraud is adversarial: Professional opponents are actively working to subvert the system at its weakest points.
• Fraud mimics good customer behaviors: Good customers are penalized by over-intrusive countermeasures.
Machine learning directly addresses many business challenges that are time-consuming and expensive. For example, manual reviews and false positives alone account for almost 40% of the total cost of fraud prevention. According to the LexisNexis “The Total Cost of Fraud Prevention” study, merchants allocate as much as one-fourth of their fraud prevention costs to manual review. Furthermore, new customer channels (e.g., mobile, social), new products and new business lines present new risk vectors: fraud through remote channels is up to 7 times as difficult to prevent as in-person fraud.
Machine learning can:
• Reduce manual review queues through fast-iterating machine models
• Be channel-agnostic
• Easily adapt to new business lines using experiential data
• Augment human decision-making with increased precision
• Reduce false positives with behavior analysis
APPLYING MACHINE LEARNING
Machine learning models can be used to perform analytics very efficiently and deliver risk scores in real time, with greater accuracy, by leveraging large amounts of user data. Feedzai’s existing model was able to detect more than 60% of all fraud transactions for a major retailer, corresponding to more than 70% of their fraud money. When trained to include the retailer’s fraud, the model improved to detect more than 65% of fraud transactions and more than 75% of the total fraud money.
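The results above quote two different detection rates: the share of fraud transactions caught and the share of fraud money caught. A minimal sketch of how the two can differ, using made-up cases (the amounts and flags are hypothetical and are not the retailer's data):

```python
from typing import List, Tuple

# Hypothetical confirmed-fraud transactions: (amount in dollars, flagged by the model?)
fraud_cases: List[Tuple[float, bool]] = [
    (1200.0, True), (40.0, False), (850.0, True), (25.0, False),
    (300.0, True), (60.0, True), (15.0, False), (2000.0, True),
    (90.0, False), (500.0, True),
]


def detection_rates(cases):
    """Return (share of fraud transactions caught, share of fraud dollars caught)."""
    total_count = len(cases)
    total_value = sum(amount for amount, _ in cases)
    caught_count = sum(1 for _, flagged in cases if flagged)
    caught_value = sum(amount for amount, flagged in cases if flagged)
    return caught_count / total_count, caught_value / total_value


by_count, by_value = detection_rates(fraud_cases)
print(f"caught {by_count:.0%} of fraud transactions, {by_value:.0%} of fraud money")
```

When the missed cases are mostly small-value transactions, the money-weighted rate comes out higher than the transaction-count rate, which matches the pattern in the figures quoted above.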
Behavior analytics builds digital footprints, which can then be used to learn from past data in order to make predictions on future, unseen data patterns. For example, in a retail environment, intelligence about user behavior can be used to determine a customer’s buying schema: the merchandise they buy, the stores they frequently visit, the times they shop, the channel through which they shop, etc. Machine learning algorithms can then synthesize this data, collected from multiple sources both online and offline, into baseline behavior profiles. From user attributes and other data fields, machine learning algorithms can automatically learn patterns, which are then used to make predictions.
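A baseline behavior profile of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: the purchase history, field names, and the choice of a z-score on the amount are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical purchase history for one customer: (amount in dollars, hour of day, channel)
history = [
    (42.0, 18, "store"), (55.0, 19, "store"), (38.0, 12, "online"),
    (61.0, 18, "store"), (47.0, 20, "online"), (52.0, 17, "store"),
]


def build_profile(purchases):
    """Summarize past behavior into a simple baseline profile."""
    amounts = [amount for amount, _, _ in purchases]
    return {
        "mean_amount": mean(amounts),
        "std_amount": pstdev(amounts) or 1.0,  # avoid dividing by zero
        "usual_hours": {hour for _, hour, _ in purchases},
        "usual_channels": {channel for _, _, channel in purchases},
    }


def deviation_signals(profile, amount, hour, channel):
    """Compare a new purchase against the baseline profile."""
    return {
        "amount_z": (amount - profile["mean_amount"]) / profile["std_amount"],
        "unusual_hour": hour not in profile["usual_hours"],
        "new_channel": channel not in profile["usual_channels"],
    }


profile = build_profile(history)
typical = deviation_signals(profile, 50.0, 18, "store")
odd = deviation_signals(profile, 900.0, 3, "mobile")
print(typical)
print(odd)
```

Signals like these would normally be fed to a model as input features rather than used as hard rules on their own.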
Machine learning can also be used to automatically derive outcome measurements such as a statistical risk score (a measurement of the likelihood of incurring loss). The effectiveness of the statistical risk score depends on the model’s ability to detect anomalies from known patterns, identify matches to known patterns, and uncover new patterns.
CONCLUSION
Sophisticated models can reverse-engineer machine logic into human-readable language that explains model decisions.
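One way such explanations can be produced is to record the path a transaction takes through a decision tree. The sketch below assumes a tiny hand-built tree; the features, thresholds, and leaf probabilities are hypothetical, and a real tree would be learned from data.

```python
# A tiny hand-built decision tree, for illustration only.
# Internal nodes: (feature, threshold, subtree if value <= threshold, subtree otherwise).
# Leaves: a fraud probability.
TREE = ("amount", 500.0,
        ("account_age_days", 7.0, 0.30, 0.02),
        ("distance_from_home_km", 100.0, 0.15, 0.85))


def score_with_explanation(tree, transaction):
    """Walk the tree and record each decision in plain language."""
    reasons = []
    node = tree
    while isinstance(node, tuple):
        feature, threshold, low_branch, high_branch = node
        value = transaction[feature]
        if value <= threshold:
            reasons.append(f"{feature} = {value} is at most {threshold}")
            node = low_branch
        else:
            reasons.append(f"{feature} = {value} is above {threshold}")
            node = high_branch
    return node, reasons


transaction = {"amount": 1250.0, "account_age_days": 400, "distance_from_home_km": 3200.0}
risk, reasons = score_with_explanation(TREE, transaction)
print(f"risk score {risk:.2f} because " + " and ".join(reasons))
```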
MACHINE LEARNING ENGINES
Mathematical algorithms power machine learning. But the truth is that no single algorithm is universally better in all situations: choosing the best algorithm depends on the problem type, size, available resources, etc. Having said that, Random Forests (aka Ensemble of Decision Trees) and Deep Learning have been shown to perform very well in a number of scenarios, with SVMs (Support Vector Machines) a close second. Random Forests are more robust to a number of real-world problems such as missing data, noise, outliers, and errors. In addition, Random Forests allow multiple types of data (numbers of different scales, text, Booleans, etc.), scale very well, parallelize very easily, are fast to train and score, and require less effort to achieve the best results. It is no surprise that Random Forests win many machine learning competitions (as described by Kaggle.com, the world’s leading machine learning competition site and data science community).
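To make the "ensemble of decision trees" idea concrete, here is a much-simplified stand-in for a Random Forest: a bagged ensemble of one-split trees (decision stumps), each trained on a bootstrap sample and one randomly chosen feature. The function names and the toy data are hypothetical, and a real Random Forest grows deeper trees.

```python
import random


def train_stump(rows, labels, feature):
    """Find the threshold on one feature that best separates fraud from non-fraud."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        # Each side predicts its own fraud rate; measure the squared error of that prediction.
        p_left, p_right = sum(left) / len(left), sum(right) / len(right)
        error = sum((y - p_left) ** 2 for y in left) + sum((y - p_right) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, p_left, p_right)
    if best is None:  # feature is constant in this sample: fall back to the base rate
        rate = sum(labels) / len(labels)
        return feature, float("inf"), rate, rate
    return feature, best[1], best[2], best[3]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in sample], [labels[i] for i in sample], feature))
    return forest


def predict_proba(forest, row):
    """Average the stumps' fraud rates to get a score between 0 and 1."""
    votes = [p_left if row[f] <= t else p_right for f, t, p_left, p_right in forest]
    return sum(votes) / len(votes)


# Hypothetical training data: (amount in dollars, is_new_device, hour of day), is_fraud
rows = [(20, 0, 14), (35, 0, 10), (1500, 1, 3), (50, 0, 19),
        (1800, 1, 2), (40, 1, 16), (2200, 1, 4), (25, 0, 11)]
labels = [0, 0, 1, 0, 1, 0, 1, 0]

forest = train_forest(rows, labels)
high = predict_proba(forest, (1900, 1, 3))
low = predict_proba(forest, (30, 0, 15))
print(f"suspicious transaction: {high:.2f}, ordinary transaction: {low:.2f}")
```

Because every stump is trained independently, the loop in `train_forest` could run in parallel, which is the property behind the "parallelize very easily" claim above.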
ALGORITHMS: PROS AND CONS

Random Forest (aka Ensemble of Decision Trees)
Pro:
• Generalizes patterns well
• Robust to different input types (text, numbers of different scales, etc.)
• Robust to missing data
• Robust to outliers and errors
• Fast to train and score
• Trivially parallel
• Requires less tuning
• Probabilistic output (i.e., a score)
• Can adjust threshold to trade off between precision and recall
• Very good predictive power
• Found to win many machine learning competitions
Con:
• Can become complex to interpret as the number of decisions grows (inherent nature of increased capacity to make decisions), but better than all others, especially with Whitebox scoring to demystify decision nodes
• Requires labeled data

Deep Learning
Pro:
• Does not require labeled data
• Reduces feature design tasks
• Learns multiple levels of representation (e.g., eyes, head, person)
• Highly parallel
• Very good predictive power, especially in text and image classification problems
Con:
• Very slow to train, but benefits from recent architecture advances (e.g., GPUs, large clusters)
• Cannot handle different input types
• Needs input scaling
• Needs tuning
• Does not provide probability estimates
• Lack of good interpretability
• Still missing theoretical foundations

Support Vector Machines (SVM)
Pro:
• Able to detect non-linear and complex patterns
• Effective in very high dimensional spaces
• Very good predictive power
Con:
• Requires labeled data
• Cannot handle different input types
• Needs input scaling
• Cannot handle missing values
• Not scalable
• Slow
• Needs tuning
• Does not provide probability estimates
• Lack of interpretability
• Still missing theoretical foundations

Neural Networks
Pro:
• Able to represent complex patterns
• Good predictive power
Con:
• Requires labeled data
• Cannot handle different input types
• Needs input scaling
• Cannot handle missing values
• Not scalable
• Slow
• Needs tuning
• Lack of interpretability

K-Nearest Neighbors
Pro:
• Robust to missing data
• Robust to outliers
• Good predictive power
Con:
• Requires labeled data
• Cannot handle different input types
• Needs input scaling
• Cannot handle missing values
• Needs tuning
• Lack of interpretability
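The Random Forest entry notes that a probabilistic score lets you adjust the threshold to trade off precision against recall. A minimal sketch of that trade-off, using hypothetical scored cases (the scores and labels are made up for illustration):

```python
# Hypothetical (risk score, actually fraud?) pairs from a scored validation set.
scored = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False),
    (0.40, True), (0.30, False), (0.20, False), (0.10, False), (0.05, False),
]


def precision_recall(scored_cases, threshold):
    """Flag every case scoring at or above the threshold, then measure the result."""
    flagged = [is_fraud for score, is_fraud in scored_cases if score >= threshold]
    true_positives = sum(flagged)
    total_fraud = sum(is_fraud for _, is_fraud in scored_cases)
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / total_fraud if total_fraud else 0.0
    return precision, recall


for threshold in (0.85, 0.50, 0.25):
    precision, recall = precision_recall(scored, threshold)
    print(f"threshold {threshold:.2f}: precision {precision:.2f}, recall {recall:.2f}")
```

Lowering the threshold catches more fraud (higher recall) at the cost of flagging more good customers (lower precision), so the right setting depends on the relative cost of a missed fraud and a false positive.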
BEYOND FRAUD PREVENTION
Machine learning is not limited to identifying and preventing fraud in online retail environments. It can also be applied wherever large amounts of data can be used to understand and infer behavior for effective decision making.
• Account opening: Validate the authenticity of users signing up online to verify and accept more applicants
• Payment authorization: Score payment requests and authorize payments in real time
• Checkout scoring: Prevent payment chargebacks by scoring transactions during checkout
• Merchant underwriting: Protect your merchant portfolio through merchant underwriting
• Marketplace: Maintain community trust by connecting buyers and sellers
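For use cases such as payment authorization and checkout scoring, the model's risk score still has to be turned into an action. A minimal sketch, assuming two hypothetical cut-offs and three possible actions (none of these values come from the source):

```python
# Hypothetical cut-offs; a real deployment would tune these to its own costs.
REVIEW_THRESHOLD = 0.50
DECLINE_THRESHOLD = 0.90


def authorize(risk_score: float) -> str:
    """Turn a model's risk score into an authorization decision."""
    if risk_score >= DECLINE_THRESHOLD:
        return "decline"
    if risk_score >= REVIEW_THRESHOLD:
        return "review"  # route to the manual review queue
    return "approve"


for score in (0.03, 0.62, 0.97):
    print(score, authorize(score))
```

Raising `REVIEW_THRESHOLD` shrinks the manual review queue but lets more borderline transactions through unreviewed.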
LIMITATIONS WITH MACHINE LEARNING
One of the biggest obstacles to ML is the steep learning curve. Data science knowledge, plus the amount of time and data needed to create models, is beyond the reach of many risk teams. A steep learning curve means data scientists who do machine learning need to master many different tools such as R, Weka, Python, DBMSs, NoSQL data stores, Hadoop jobs, streaming systems and more. Plus, it is very hard to evolve profiles and models to reflect the ever-changing nature of business; for example, some companies deploy 1-year-old models that were trained using 2-year-old data.
The second biggest challenge is that a lot of machine learning is grounded in black-box decision-making. This is a serious limitation, as many policy execution or governance requirements need clear explanations of decisions, e.g., explaining to a customer why a transaction was blocked. Finally, the increased capacity to process big data creates an inherent tendency toward including irrelevant data. Machines lack common sense, so humans are still needed to supervise.
THE PROMISE OF MACHINE LEARNING FOR FRAUD PREVENTION
While the multiple methodologies in place today to prevent fraud have been successful at keeping fraud rates low for typical payment fraud, the evolving landscape of ecommerce and mcommerce poses newer challenges. These challenges necessitate more innovative solutions that can respond and react quickly to fraud. The computational power to process large amounts of data and make decisions in real time is imperative for businesses to react quickly to fraud attacks. In this respect, machine learning is a promising science that has potential across multiple environments. From payment fraud to abuse, machine learning can easily scale to meet the demands of big data with greater flexibility than traditional methods.
ABOUT FEEDZAI
Feedzai is founded on data science, using real-time, machine-based learning to help payment providers, banks and retailers prevent fraud in omni-channel commerce. Feedzai was designed from the ground up as a big data analytics platform, tuned specifically for the fraud management domain.
Source:
1. 2015 AFP Payments Fraud and Control Survey
2. LexisNexis “True Cost of Fraud” study, 2015
3. http://www.nilsonreport.com/publication_chart_and_graphs_archive.php?1=1&year=2013, “Personal Consumption Expenditures in the U.S.”; US GDP in 2012: $16.2T, http://data.worldbank.org/data-catalog/GDP-ranking-table