RepèRes Bayesia Consumer Segmentation Skim Conf08

Bayesian Networks: a new tool for consumer segmentation

Skim Conference – Barcelona – May 28th 2008


Summary

• Introduction to consumer segmentations

• A brief overview of Bayesian Networks

• Computing a segmentation with Bayesian Networks

• Conclusion



Introduction to consumer segmentations






Why a segmentation?

• Valuable tool to understand a market

• Homogeneous marketing targets
- people who behave the same way
- people who have homogeneous motivations / attitudes

• Groups of people to whom it is possible to speak the same language

Different marketing strategies (concepts, products, communication, advertising): MORE EFFICIENT



A good segmentation - some important features

• Homogeneous segments

• Clear differences between segments

• Stable…

• Easy to understand

• Operational / Actionable

• Fair representation of the real world

[Figure: the segmentation process: preparation stage, statistical procedure, output, interpretation / analysis]

Only a part of the whole process. How important is it?

TECHNICAL QUALITY AND OTHER VERY IMPORTANT ELEMENTS



The marketer’s dream… and cruel reality

The dream:

• Obvious groups!

• Any kind of computation should lead to the same results

The reality:

• More complicated

• Unlimited number of typologies

The procedure should guarantee a relevant clustering



Classical procedures

• A factorial analysis followed by a clustering of the individuals

• Canonical segmentation

[Figure: canonical analysis of ATTITUDES and BEHAVIOURS, projection of the individuals on the factorial axes, then clustering of the individuals]

Drawbacks: difficult to choose which statements are attitudes and which are behaviours (declarative statements); time consuming.



A brief overview of Bayesian Networks






Bayesian Networks

• A computational tool to model uncertainty, based both on
- graph theory: readability, a powerful communication tool
- and probability theory: sound computations

• Manual modelling through brainstorming: probabilistic expert systems

• Induction by automatic learning: data analysis, data mining

• Growing popularity: industry, defense, health, … and now, market research



A complete framework for Data Mining

• Parametric estimation
- Use of the database to estimate the probabilities of a given structure

• Robust missing values processing
- Expectation-Maximization (EM)
- Structural EM

• Structural learning
- Unsupervised learning to discover all the direct probabilistic relations
- Supervised learning to characterize a target variable
- Variable clustering to induce “factors” made of highly connected variables
- Probabilistic Structural Equations

and… Data Clustering to find groups of data sharing the same characteristics



Formalism: 2 distinctive parts

• Structure: directed acyclic graph

• Parameters: probability distributions associated with each node

Example: an anti-doping agency using two different tests to screen competitors



A reasoning engine

• Sound evidence propagation on the entire network
- Simulation
- Diagnosis
- And any combination of these 2 types of inference

Simulation: if a competitor is doped… there is a 99.5% chance that he is disqualified.

Diagnosis (thinking the other way round): if a competitor has been disqualified… there is a slight probability (8%) that he is nevertheless clean.
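The two directions of inference can be sketched on a toy version of the anti-doping example. This is a minimal illustration, not the network shown in the slides: the structure (doped → test 1, doped → test 2, disqualified if either test is positive) and all probabilities below are assumptions, so the results differ from the 99.5% and 8% quoted above.

```python
from itertools import product

# Hypothetical parameters (not taken from the slides).
P_DOPED = 0.10                      # prior P(doped)
P_POS = {True: 0.90, False: 0.02}   # P(test positive | doped), same for both tests


def joint(doped, t1, t2):
    """P(doped, t1, t2) from the chain rule over the DAG doped -> t1, doped -> t2."""
    p = P_DOPED if doped else 1 - P_DOPED
    for t in (t1, t2):
        p *= P_POS[doped] if t else 1 - P_POS[doped]
    return p


def disqualified(t1, t2):
    # Assumed deterministic rule: one positive test is enough.
    return t1 or t2


def prob(query, evidence):
    """P(query | evidence) by brute-force enumeration of all worlds."""
    num = den = 0.0
    for doped, t1, t2 in product([True, False], repeat=3):
        world = {"doped": doped, "t1": t1, "t2": t2, "disq": disqualified(t1, t2)}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(doped, t1, t2)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den


# Simulation: from cause to effect.
simulation = prob({"disq": True}, {"doped": True})
# Diagnosis: from effect back to cause.
diagnosis = prob({"doped": False}, {"disq": True})
print(f"P(disqualified | doped) = {simulation:.3f}")
print(f"P(clean | disqualified) = {diagnosis:.3f}")
```

Enumeration is only practical for a handful of variables; it is used here because it makes the propagation of evidence explicit.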


Segmentation with Bayesian Networks

Real case study: segmentation of women as regards shopping and fashion.

For confidentiality reasons, consumer statements and outputs have been modified.



1st stage: segmentation induction



Unsupervised learning: discovering relations between consumer statements

Usage and attitude survey conducted for a clothes retailer.

Sample = 1065 women.

234 consumer statements: attitudes and behaviours towards fashion in general, retailers, brand image…

• Heuristic search algorithm to find the best representation of the joint probability distribution.

• Minimum Description Length score to evaluate the quality of the network, based on fitness and compactness.

[Figure: induced network]
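The trade-off between fitness and compactness can be sketched with a generic two-part description length. This is a textbook form (a parameter cost of half log2 N bits per free parameter, plus the negative log-likelihood of the data), assumed here for illustration; the exact score used by the software in the case study is not given in the slides. The function name `mdl_score` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mdl_score(rows, parents, arity):
    """Two-part description length (in bits) of a discrete network; lower is better.

    rows:    list of dicts mapping variable name -> observed value
    parents: dict mapping each variable -> tuple of its parent variables
    arity:   dict mapping each variable -> number of possible values
    """
    n = len(rows)
    model_bits = data_bits = 0.0
    for var, pa in parents.items():
        # Compactness: free parameters of this node's conditional table.
        n_params = arity[var] - 1
        for p in pa:
            n_params *= arity[p]
        model_bits += 0.5 * log2(n) * n_params
        # Fit: negative log-likelihood under maximum-likelihood estimates.
        joint = Counter((tuple(r[p] for p in pa), r[var]) for r in rows)
        marg = Counter(tuple(r[p] for p in pa) for r in rows)
        for (cfg, _), count in joint.items():
            data_bits -= count * log2(count / marg[cfg])
    return model_bits + data_bits


# Toy data: two binary statements that agree 80% of the time.
rows = ([{"A": 0, "B": 0}] * 40 + [{"A": 1, "B": 1}] * 40
        + [{"A": 0, "B": 1}] * 10 + [{"A": 1, "B": 0}] * 10)
arity = {"A": 2, "B": 2}
independent = mdl_score(rows, {"A": (), "B": ()}, arity)
with_arc = mdl_score(rows, {"A": (), "B": ("A",)}, arity)
print(f"no arc: {independent:.1f} bits, arc A -> B: {with_arc:.1f} bits")
```

On this toy data the arc costs a few extra bits to describe but saves more bits when encoding the data, so a search guided by this score would keep it.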



Variables clustering and factor induction: simplifying the information

• Analysis of the network to discover groups of variables that are strongly connected and that form a “concept”
- Ascendant Hierarchical Clustering algorithm based on the arcs’ Kullback-Leibler forces (a non-linear and global measure: the contribution of the relation to the network).

• For each cluster of variables
- Creation of a latent variable summarizing the information.

42 factors computed

Example of factor 15: dimension summarizing originality.

Based on attitude statements (importance of being original, liking to stand out through clothes) and behaviours (buying brands X, Y and Z more often).

[Figure: latent variable]



Factor clustering: overview of the procedure

Segmentation of the individuals based on the main factors

• Introducing a new variable (consumer segments), which is the hidden cause of the main factors.

• Learning the probabilities with Expectation-Maximisation

• Score derived from MDL to assess the quality of the clustering
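The idea of a hidden segment variable that causes the observed factors can be sketched as a latent class model fitted by EM. This is a minimal illustration under assumptions: factors are integer-coded and conditionally independent given the segment, the number of segments is fixed in advance, and the function name `em_latent_class` and the toy data are hypothetical. It is not the implementation used in the case study.

```python
import random


def em_latent_class(rows, n_clusters, arities, n_iter=50, seed=0):
    """Fit P(c) and P(x_j | c) for a hidden cluster variable c by EM.

    rows: list of tuples of integer-coded factor values; arities[j] is the
    number of states of factor j. Returns (prior, cond, resp) where
    resp[i][c] is the posterior P(c | row i).
    """
    rng = random.Random(seed)
    # Start from random soft assignments.
    resp = []
    for _ in rows:
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        s = sum(w)
        resp.append([v / s for v in w])
    for _ in range(n_iter):
        # M-step: re-estimate parameters from expected counts (lightly smoothed).
        prior = [sum(r[c] for r in resp) / len(rows) for c in range(n_clusters)]
        cond = []
        for c in range(n_clusters):
            tables = []
            for j, k in enumerate(arities):
                counts = [1e-3] * k
                for row, r in zip(rows, resp):
                    counts[row[j]] += r[c]
                total = sum(counts)
                tables.append([v / total for v in counts])
            cond.append(tables)
        # E-step: posterior of each cluster given the observed factors.
        resp = []
        for row in rows:
            w = []
            for c in range(n_clusters):
                p = prior[c]
                for j, v in enumerate(row):
                    p *= cond[c][j][v]
                w.append(p)
            s = sum(w)
            resp.append([v / s for v in w])
    return prior, cond, resp


# Toy data: two groups of respondents with opposite answers on three factors.
rows = [(0, 0, 0)] * 30 + [(1, 1, 1)] * 30
prior, cond, resp = em_latent_class(rows, n_clusters=2, arities=[2, 2, 2])
segments = [max(range(2), key=r.__getitem__) for r in resp]
```

Each respondent ends up with a probability of belonging to each segment rather than a hard label, which is what makes a purity measure meaningful afterwards.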



Selecting the number of clusters

• Pseudo-random walk to find the best number of clusters
- example: find the best clustering with a random walk between 2 and 6 clusters, 20 iterations

• Also possible to define the desired number of clusters

• Possible to define the minimal purity of the clusters. The purity of a cluster is computed as the mean, over its points, of their probability of belonging to it.

The best segmentation is the one that minimizes the score
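The purity measure can be sketched as follows, assuming each individual is assigned to its most probable cluster and that the "probability of each cluster point" is the posterior probability of that assigned cluster. The function name `cluster_purity` and the posterior values are hypothetical.

```python
def cluster_purity(posteriors):
    """Mean posterior probability of the assigned cluster, per cluster.

    posteriors: list of per-individual lists, posteriors[i][c] = P(cluster c | individual i).
    """
    sums, counts = {}, {}
    for probs in posteriors:
        c = max(range(len(probs)), key=probs.__getitem__)  # hard assignment
        sums[c] = sums.get(c, 0.0) + probs[c]
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


# Three individuals, two clusters (illustrative values).
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
purity = cluster_purity(posteriors)
print(purity)
```

A minimum-purity constraint would then reject any clustering in which one of these per-cluster means falls below the chosen threshold.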



2nd stage: segmentation analysis



Supervised learning: focusing on consumer clusters

• LEARNING the relations between…
- THE TARGET VARIABLE = SEGMENTATION
- THE CONSTITUTIVE VARIABLES = CONSUMER STATEMENTS

[Figure: network with target variable = consumer segments]



Cluster profile: using the network to describe the consumer groups

• Identification of the key variables and associated values
- For each consumer group, we use the % of shared information to sort the variables according to their importance in the characterisation of the group.

[Figure: the 4 most contributing variables for Cluster #5. Arrows symbolize the change in the probability distribution when observing cluster #5.]

Compared with the total sample, women of cluster #5:
- Buy brand X more often
- Are older women (59 on average)
- Do not consider originality as important
- Do not like discovering new shops
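Ranking statements by shared information can be sketched with mutual information. The sketch assumes that the "% of shared information" is the mutual information between a statement and the cluster indicator, expressed as a fraction of the indicator's entropy; the slides do not give the exact definition. Statement names and data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete observations."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def shared_information(x, target):
    """Mutual information I(X; T) as a fraction of the target entropy H(T)."""
    mi = entropy(x) + entropy(target) - entropy(list(zip(x, target)))
    return mi / entropy(target)


# 1 = respondent belongs to the cluster being profiled, 0 = any other cluster.
in_cluster = [1, 1, 1, 1, 0, 0, 0, 0]
statements = {
    "buys_brand_x": [1, 1, 1, 1, 0, 0, 0, 0],     # identical to the target
    "likes_new_shops": [0, 0, 0, 1, 1, 1, 1, 0],  # partly related
    "owns_a_car": [1, 0, 1, 0, 1, 0, 1, 0],       # unrelated to the target
}
ranking = sorted(
    statements,
    key=lambda s: shared_information(statements[s], in_cluster),
    reverse=True,
)
print(ranking)
```

Because mutual information makes no linearity assumption, the same ranking works for nominal statements as well as ordinal ones.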



Generation of the cluster mapping

[Figure: map generation]

The size of a cluster is proportional to its probability.

The proximity of the clusters is a probabilistic proximity.

The darkness of the blue is proportional to the purity of the cluster (in this example all clusters have a purity > 95%).



Summarizing segmentation results

[Figure: map of the seven segments: Superstars, Fashion cheap, Classical upmarket, Functional above all, Neutral, Classical, Young manager / executive women. Axes: money devoted to clothes (from -- to ++), age, fashionable originality. Segment sizes as listed on the map: 8%, 10%, 18%, 20%, 18%, 20%, 14%.]



Going further: identifying a more compact target model

• Markov procedure to select a subset of statements to determine to which category consumers belong

• Selection of a subset of variables…

• …knowing the values of these variables makes the target independent of all the other variables

Subset of 11 variables

Overall prediction score = 68%

Interesting to quickly recruit consumer groups amongst the total population.
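A set of variables that makes the target independent of all the others is, in graph terms, the target's Markov blanket: its parents, its children, and the other parents of its children. The sketch below assumes this is what the "Markov procedure" selects; the toy graph and variable names are hypothetical and unrelated to the 11 variables of the case study.

```python
def markov_blanket(parents, target):
    """Parents, children and co-parents of `target` in a DAG.

    parents: dict mapping each node to the set of its parent nodes.
    """
    children = {n for n, pa in parents.items() if target in pa}
    blanket = set(parents.get(target, set())) | children
    for child in children:
        blanket |= parents[child]  # co-parents share a child with the target
    blanket.discard(target)
    return blanket


# Hypothetical network around the segment variable.
dag = {
    "segment": {"age"},
    "age": set(),
    "buys_brand_x": {"segment"},
    "originality": {"segment", "income"},
    "income": set(),
    "likes_new_shops": {"originality"},
}
blanket = markov_blanket(dag, "segment")
print(sorted(blanket))
```

In this toy graph `likes_new_shops` is left out: once `originality` is known, it carries no further information about the segment, which is why a short questionnaire built on the blanket can be enough to allocate new respondents.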



Conclusion






Benefits

• Our experience: a powerful tool
- Relevant typologies
- Easy to carry out

• Modelling the consumer variables: good representation of reality
- Unsupervised modelling: no strong hypothesis
- Discovering interactions between variables (behaviours / attitudes)
- Use of qualitative / quantitative variables

• Data clustering quality
- Possible to set the minimum purity of the clusters: enables the marketer to discover “niche” markets (usually less pure) or focus on mainstream groups.

• Added value in the analysis of the clusters
- Easy ranking of the key variables for each consumer cluster
- Proximity mapping to summarize results

• Development of robust models to identify consumer groups
- Interesting in the case of upcoming recruitment.



Some drawbacks. How to deal with them?

• Modelling the consumer network and computing latent variables can take a long time when the number of variables is very large.
- 234 variables and 1065 lines: 30-40 minutes
- To speed up the process, it is possible to learn a simplified network, e.g. a maximum spanning tree, or to increase the structural complexity parameter.

• Continuous variables have to be discretized
- Results will depend on the quality of the discretization.
- Possible to use K-Means to adapt the discretization to the distribution of the data.
- Expertise of the user also helps.

And most of the time in consumer research, variables are discrete!
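K-Means discretization of a single continuous variable can be sketched as follows. This is a plain one-dimensional Lloyd's algorithm with a quantile-based initialisation, written for illustration; the function names and the age values are hypothetical, and the number of bins is assumed to be chosen by the analyst.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on a single continuous variable; returns sorted centers."""
    data = sorted(values)
    # Initialise centers at evenly spaced quantiles of the data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group happens to be empty.
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def discretize(value, centers):
    """Index of the nearest center, i.e. the bin of `value`."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


# Illustrative ages with three natural groups.
ages = [19, 21, 22, 24, 38, 40, 41, 43, 58, 59, 61, 63]
centers = kmeans_1d(ages, k=3)
bins = [discretize(a, centers) for a in ages]
print(centers, bins)
```

Unlike equal-width bins, the cut points follow the gaps in the data, so each discrete state corresponds to a natural group of respondents.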



Perspectives

• Flexibility: can be used far beyond usage and attitudes surveys

• Easy to carry out

• Can be adapted to any type of data

• Well designed to process large amounts of data

• Example: segmentation of trains using a client’s internal data
- Travelers' data: 10 million individuals
- Train data (turnover, occupancy rate…): 15,000 trains
- Clustering of trains

• In the future…
- typology of clients (turnover, potential…) to feed a business strategy
- segmentation of consumers based on utilities (CBC data)



Contact

Jouffe Lionel, Managing Director

[email protected]

Craignou Fabien, Data Mining Department Manager

[email protected]
