Web Site Example


• Web site for clothing catalogue company

• Company has customer data on purchases from site, but wants to know more about all visitors to their web site

• Buys web panel data
  – From Nielsen//NetRatings or Media Metrix (not in NZ)

• E.g. Nielsen//NetRatings universe for the At Home Internet audience measurement is all individuals aged 2+ living in homes that have access to the Internet via a PC owned or leased by a household member and using a Windows operating system

Respondent Data

ID   # of Visits   Income     Sex   Age   HH Size
1    0             $87,500    1     48    2
2    5             $17,500    1     57    1
3    0             $65,000    0     28    2
4    0             $55,000    1     52    3
5    0             $55,000    1     17    3
6    0             $55,000    0     19    3
7    0             $72,500    0     39    2
8    1             $125,000   0     59    2
9    0             $22,500    0     70    1
10   0             $55,000    0     47    3

Frequency Distribution

Number of Visits   Frequency Count
0                  2046
1                  318
2                  129
3                  66
4                  38
5                  30
6                  16
7                  11
8                  9
9                  10
10+                55

Fit Poisson Model

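• R code:

  # Observed counts for 0, 1, ..., 9 and 10+ visits
  visit.dist <- c(2046,318,129,66,38,30,16,11,9,10,55)

  # Log-likelihood; the 10+ cell is treated as right-censored
  lpois <- function(lambda,data) {
    visits <- 0:9
    sum(data[1:10]*log(dpois(visits,lambda))) +
      data[11]*log(ppois(9,lambda,lower.tail=FALSE))
  }

  # Maximise by minimising the negative log-likelihood over (0, 10)
  optimise(function(param){-lpois(param,visit.dist)},c(0,10))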

• Result: maximum value of log-likelihood is achieved at λ=0.72
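To see how well the fitted model reproduces the data, the expected count in each cell can be computed from the fitted rate. The following is a minimal sketch, assuming the rounded value λ=0.72 quoted above; the names `lambda.hat`, `p.hat` and `expected` are illustrative and not from the original slides.

```r
# Observed counts for 0, 1, ..., 9 and 10+ visits (from the frequency table)
visit.dist <- c(2046,318,129,66,38,30,16,11,9,10,55)
lambda.hat <- 0.72            # fitted value quoted above (rounded)
n <- sum(visit.dist)          # panel size

# Poisson probabilities for 0..9, plus the right-tail probability for 10+
p.hat <- c(dpois(0:9, lambda.hat), ppois(9, lambda.hat, lower.tail = FALSE))
expected <- n * p.hat

print(data.frame(visits = c(0:9, "10+"),
                 observed = visit.dist,
                 expected = round(expected, 1)))
```

Comparing the first row of the output shows that the Poisson model expects far fewer non-visitors than were observed.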

Simple Poisson Model

[Figure: Fit of the Poisson Model. Observed (Obs) and expected (Exp) number of people, on a scale from 0 to 2500, by # of visits (0 to 10+).]

Nature of Heterogeneity

• Unobserved (or random) heterogeneity
  – The visiting rate λ is assumed to vary across the population according to some distribution
  – No attempt is made to explain why people differ in their visiting rates

• Observed (or determined) heterogeneity
  – Explanatory variables are observed for each person
  – We explicitly link the value of λ for each person to their values of the explanatory variables
    • E.g. Poisson regression model

Poisson Regression Model

• Let Yi be the number of times that individual i visits the web site

• Assume Yi is distributed as a Poisson random variable with mean λi

• Suppose each individual’s mean λi is related to their observed explanatory characteristics xi (including a constant term for the intercept) by

$$\lambda_i = \exp(\beta' x_i), \quad \text{or equivalently} \quad \ln(\lambda_i) = \beta' x_i$$

• Take logs of household income and age first

• R code, using glm function for Poisson regression:

  glm.siteVisits <- glm(Visits ~ logHouseholdIncome + Sex + logAge + HH.Size,
                        family=poisson(), data=siteVisits)
  summary(glm.siteVisits)

Poisson Regression Estimates

                      Coefficient   Std Error   t value
Intercept             -3.122        0.405       -7.7
log(income)            0.093        0.034        2.7
sex                    0.004        0.041        0.1
log(age)               0.589        0.055       10.8
Hhld size             -0.036        0.015       -2.3

LL (Pois. reg.) (A)   -6291.5
LL (Pois.) (B)        -6378.5
LR                    174 (df=4)

Can also fit model using maximum likelihood as for simple Poisson model, but this will not give standard errors

Poisson vs Poisson Regression

• The simple Poisson model (model B) is nested within the Poisson regression model (model A)

• So we can use a likelihood ratio test to see whether model A fits the data better

• Compute the test statistic

$$LR = -2\,(LL_B - LL_A)$$

and reject the null hypothesis of no difference if

$$LR > \chi^2_{0.05,\,df}$$
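As a worked illustration of the test, the sketch below plugs in the two log-likelihoods reported with the Poisson regression estimates (LL for model A and model B, with df=4). The variable names are illustrative only.

```r
ll.A <- -6291.5   # Poisson regression (model A)
ll.B <- -6378.5   # simple Poisson (model B)
df   <- 4         # number of extra parameters in model A

lr <- -2 * (ll.B - ll.A)                        # likelihood ratio statistic
crit <- qchisq(0.95, df)                        # 5% critical value of chi-squared
p.value <- pchisq(lr, df, lower.tail = FALSE)   # upper-tail p-value

print(c(LR = lr, critical = crit, p.value = p.value))
```

The statistic is well above the critical value, so the null hypothesis of no difference is rejected, even though the practical improvement in fit may still be small.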

Expected Number of Visits

            Person 1   Person 2
Income      $87,500    $55,000
Sex         1          0
Age         48         19
Hhld size   2          3

$$\lambda_1 = \exp(-3.126 + 0.094 \ln 87500 + 0.004 + 0.588 \ln 48 - 0.036 \times 2) = 1.164$$

$$\lambda_2 = \exp(-3.126 + 0.094 \ln 55000 + 0.588 \ln 19 - 0.036 \times 3) = 0.621$$

• So person 2 should visit the site less often than person 1
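The two expected values can be reproduced in a few lines of R. This is a sketch only: the coefficient vector `b` uses the values that appear in the worked calculation (which differ in the third decimal place from the estimates table), and the helper `expected.visits` is a hypothetical name introduced here.

```r
# Coefficients as used in the worked calculation:
# intercept, log(income), sex, log(age), household size
b <- c(-3.126, 0.094, 0.004, 0.588, -0.036)

# Expected number of visits for one person under the Poisson regression
expected.visits <- function(income, sex, age, hh.size) {
  x <- c(1, log(income), sex, log(age), hh.size)
  exp(sum(b * x))
}

lambda1 <- expected.visits(87500, 1, 48, 2)   # person 1
lambda2 <- expected.visits(55000, 0, 19, 3)   # person 2
print(round(c(person1 = lambda1, person2 = lambda2), 3))
```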

Poisson Regression Model

[Figure: Fit of Poisson Regression. Observed (Obs) and expected (Exp) number of people, on a scale from 0 to 2500, by number of visits (0 to 10+).]

Poisson Regression Fit

• Poisson regression model improves fit over simple Poisson model
  – But not by much

• Try introducing random heterogeneity instead of, or as well as, observed heterogeneity

• Possibilities include:
  – Zero-inflated Poisson model
  – Zero-inflated Poisson regression
  – Negative binomial distribution
  – Negative binomial regression

Zero-inflated Poisson Regression

• Assume that a proportion π of people never visit the site

• However other people visit according to Poisson model

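• Probability distribution:

$$P(Y_i = y) = \pi\,\delta_{y=0} + (1-\pi)\,\frac{\lambda^y e^{-\lambda}}{y!}$$

where δ equals 1 when y=0 and 0 otherwise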

Zero-inflated Poisson Model

• Note that Poisson model predicts too few zeros

• Assume that a proportion π of people never visit the site
  – Remaining people visit according to Poisson distribution

• No deterministic component

• R code:

  lzipois <- function(pi,lambda,data) {
    visits <- 1:9
    data[1]*log(pi + (1-pi)*dpois(0,lambda)) +
      sum(data[2:10]*log((1-pi)*dpois(visits,lambda))) +
      data[11]*log((1-pi)*ppois(9,lambda,lower.tail=FALSE))
  }
  optim(c(0.5,1),function(param){-lzipois(param[1],param[2],visit.dist)})

• Likelihood maximised at π=0.73, λ=2.71
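A quick way to check these estimates is to compute the expected cell counts they imply. The sketch below assumes the rounded values π=0.73 and λ=2.71 quoted above; `zip.prob` is an illustrative helper name, not part of the original code.

```r
# Observed counts for 0, 1, ..., 9 and 10+ visits
visit.dist <- c(2046,318,129,66,38,30,16,11,9,10,55)
n <- sum(visit.dist)
pi.hat <- 0.73; lambda.hat <- 2.71   # fitted values quoted above (rounded)

# Zero-inflated Poisson probability of exactly y visits
zip.prob <- function(y, pi, lambda) {
  ifelse(y == 0, pi, 0) + (1 - pi) * dpois(y, lambda)
}

# Cells 0..9, then the 10+ tail (only people who can visit contribute to it)
p.hat <- c(zip.prob(0:9, pi.hat, lambda.hat),
           (1 - pi.hat) * ppois(9, lambda.hat, lower.tail = FALSE))
expected <- n * p.hat

print(data.frame(visits = c(0:9, "10+"),
                 observed = visit.dist,
                 expected = round(expected, 1)))
```

The zero cell is now matched closely because π absorbs the excess zeros; how well the remaining cells fit is a separate question.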

Zero-Inflated Poisson Model

[Figure: Fit of the Zero-Inflated Poisson Model. Observed (Obs) and expected (Exp) number of people, on a scale from 0 to 2500, by number of visits (0 to 10+).]

Zero-inflated Poisson Regression

• Can add deterministic heterogeneity to zero-inflated Poisson (ZIP) model

• Again assume that a proportion π of people never visit the site

• However other people visit according to Poisson regression model


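$$P(Y_i = y \mid x_i) = \pi\,\delta_{y=0} + (1-\pi)\,\frac{\left(e^{\beta' x_i}\right)^{y} \exp\left(-e^{\beta' x_i}\right)}{y!}$$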

Fit ZIP Regression Model

• R code:

  siteVisits <- read.csv("visits.csv")

  # Column 2 holds the visit counts; columns 3 to 6 hold the covariates
  lzipreg <- function(param,data) {
    zpi <- param[1]
    lambda <- exp(param[2] + data[,3:6] %*% param[3:6])
    sum(log(ifelse(data[,2] == 0,zpi,0) +
      (1-zpi)*dpois(data[,2],lambda)))
  }

  optim(c(.7,2,0,-0.1,0.1,0),
        function(param){-lzipreg(param,as.matrix(siteVisits))},
        control=list(maxit=1000))

• Likelihood maximised at π=0.74, β=(1.90,-0.09,-0.13, 0.11,0.02)

ZIP Regression Predictions

[Figure: Fit of ZIP Regression Model. Observed (Obs) and expected (Exp) number of people, on a scale from 0 to 2500, by number of visits (0 to 10+).]

Simple NBD Model

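• Recall the negative binomial distribution
  – The number of visits Y made by each individual has a Poisson distribution with rate λ
  – λ has a Gamma distribution across the population

$$g(\lambda) = \frac{\beta^{\alpha} \lambda^{\alpha-1} e^{-\beta\lambda}}{\Gamma(\alpha)}, \quad \lambda > 0$$

  – At the population level, the number of visits has a negative binomial distribution

$$P(Y = y) = \frac{\Gamma(\alpha+y)}{\Gamma(\alpha)\,y!} \left(\frac{\beta}{\beta+1}\right)^{\alpha} \left(\frac{1}{\beta+1}\right)^{y}$$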
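The claim that a Gamma mixture of Poissons gives a negative binomial can be checked numerically. The sketch below integrates the Poisson probability against a Gamma density and compares the result with R's `dnbinom`. The parameter values are arbitrary illustrative choices, not estimates from the data.

```r
alpha <- 2; beta <- 1.5   # hypothetical Gamma shape and rate

# P(Y = y) obtained by integrating Poisson(y | lambda) over the Gamma density
mix.prob <- function(y) {
  integrate(function(l) dpois(y, l) * dgamma(l, shape = alpha, rate = beta),
            lower = 0, upper = Inf)$value
}

y <- 0:5
mixture <- sapply(y, mix.prob)
nbd <- dnbinom(y, size = alpha, prob = beta / (beta + 1))

print(round(cbind(y, mixture, nbd), 6))
```

The two columns agree to numerical precision, which is why `dnbinom` with `prob = beta/(beta+1)` can stand in for the mixture when writing the likelihood.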

Fitting NBD Model

• R code:

  lnbd2 <- function(alpha,beta,data) {
    visits <- 0:9
    prob <- beta/(beta+1)
    sum(data[1:10]*log(dnbinom(visits,alpha,prob))) +
      data[11]*log(1-pnbinom(9,alpha,prob))
  }
  optim(c(1,1),function(param) {-lnbd2(param[1],param[2],visit.dist)})

• Likelihood maximised for α=0.157 and β=0.197

NBD Model Predictions

[Figure: Fit of the NBD Model. Observed (Obs) and expected (Exp) number of people, on a scale from 0 to 2500, by number of visits (0 to 10+).]

NBD Regression

• Can also add deterministic heterogeneity to NBD model

• Again assume that a proportion π of people never visit the site

• However other people visit according to an NBD regression model


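$$P(Y_i = y \mid x_i) = \frac{\Gamma(\alpha+y)}{\Gamma(\alpha)\,y!} \left(\frac{\beta}{\beta+e^{\gamma' x_i}}\right)^{\alpha} \left(\frac{e^{\gamma' x_i}}{\beta+e^{\gamma' x_i}}\right)^{y}$$

where γ denotes the vector of regression coefficients

• Reduces to simple NBD model when γ=0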

NBD Regression Estimates

                  Coefficient   Std Error   t value
Intercept (β/2)   -4.047        1.102       -3.7
Theta (α)          0.139        0.007       19.1
log(income)        0.075        0.096        0.8
Sex               -0.005        0.116       -0.0
log(age)           0.890        0.141        6.3
Hhld size         -0.025        0.042       -0.6

Can also fit model using maximum likelihood, but this will not give standard errors

NBD Regression Fit

[Figure: Fit of NBD Regression. Observed (Obs) and expected (Exp) number of people, on a scale from 0 to 2500, by number of visits (0 to 10+).]

Covariates In General

• Choose a probability distribution that fits the individual-level outcome variable
  – This has parameters (a.k.a. latent traits) θi

• Think of the individual-level latent traits θi as a function of covariates x

• Incorporate a mixing distribution to capture the remaining heterogeneity in the θi
  – The variation in θi not explained by x

• Fit this model (e.g. using maximum likelihood)

New Concepts

• How to incorporate covariates in probability models
  – Poisson, zero-inflated Poisson and NBD regression models for count data

• However, getting the outcome variable distribution right was more crucial here than introducing covariates

• Importance of covariates is often exaggerated

Reach and Frequency Models

• Advertising is a major industry
  – NZ Ad expenditure reached $1.5bn in 2000
  – Many companies spend millions each year

• Crucial to understand the effects of this expenditure

• Major outcomes include how many people are reached by an ad campaign, and how many times
  – Known as reach and frequency (R&F)
  – Typically analysis is limited to calculating media exposure, not advertising exposure

Reach and Frequency Models

• Data on TV viewing, newspaper and magazine readership, radio listening etc. is routinely gathered
  – Ratings and readership figures determine the price of space in these media

• However this data typically does not enable detailed reach and frequency analysis
  – E.g. readership questions ask about the last issue read, and how many of an average 4 issues were read
  – Longitudinal data is collected on TV viewing, but item non-response causes problems with direct analysis

• Models are needed to derive complete reach and frequency analyses from the collected data

R&F Analysis Examples

Beta-Binomial Model for R&F

• If an advertiser has placed an ad in each of 10 issues of a magazine, the beta-binomial model assumes that:
  – Each person has a probability p of reading each issue
  – These probabilities follow a beta distribution

$$g(p) = \frac{1}{B(\alpha,\beta)}\, p^{\alpha-1} (1-p)^{\beta-1}$$

  – Each issue is read independently, between and across individuals

• Distribution of # issues read for each person is binomial

• The resulting aggregate exposure distribution is the beta-binomial

• Applied to R&F analysis by Metheringham (1964)

• Still widely used

• But not very accurate
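The exposure distribution implied by these assumptions can be sketched directly from the beta-binomial probability function. In the example below the helper `bbm.pmf` and the parameter values `a` and `b` are hypothetical, chosen only to illustrate the calculation of reach and average frequency for a 10-issue schedule.

```r
# Beta-binomial probability of reading k of n issues,
# with beta parameters a and b for the reading probability p
bbm.pmf <- function(k, n, a, b) {
  choose(n, k) * beta(k + a, n - k + b) / beta(a, b)
}

n.issues <- 10
a <- 0.5; b <- 2          # hypothetical values, not estimates

exposure <- bbm.pmf(0:n.issues, n.issues, a, b)   # P(0), P(1), ..., P(10)
reach <- 1 - exposure[1]                          # proportion seeing at least one issue
avg.frequency <- sum(0:n.issues * exposure) / reach   # mean exposures among those reached

print(round(exposure, 4))
print(round(c(reach = reach, avg.frequency = avg.frequency), 3))
```

Reach is simply one minus the probability of zero exposures, so the whole R&F summary follows once the two beta parameters are known.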

Typical Exposure Distribution

[Figure: typical exposure distribution, plotted by number of issues read (0 to 4) with a logarithmic axis running from 1 to 10000.]

Modified BBM

• One problem with the beta-binomial model is that it does not model loyal viewers/readers/listeners well

• By adding a point mass at 1 to the beta distribution of exposure probabilities, the BBM can be modified to accommodate loyal readers etc
  – Derived by Chandon (1976); improved by Danaher (1988), Austral. J. Statist.

Multiple Media Vehicles

• The BBM (and modified BBM) focus on exposure to one media vehicle (e.g. one magazine) over the course of an ad campaign

• Need to extend to multiple vehicles
  – Model both reading choice and times read, in one combined model

• Could assume independence
  – E.g. Dirichlet-multinomial model
    • Assumes independence of irrelevant alternatives (IIA)
  – But there are known to be correlations between different media vehicles
    • E.g. women’s magazines, business papers, programmes on TV1 vs TV3

Multiple Media Vehicles

• Models need to take correlations between media vehicles into account

• Log-linear models have been used
  – But these are computationally intensive for moderately large advertising schedules

• Canonical expansion model (Danaher 1992)
  – Uses Goodhardt and Ehrenberg’s “duplication of viewing” law to minimise need for multivariate correlations
    • Data on pairwise correlations used, but higher order joint probabilities are derived using this law
  – Higher order interactions are assumed to be zero
    • Canonical expansions are used for the joint probabilities to minimise computations

FMCG Sales/Purchasing

• Retail sales figures for fast moving consumer goods
  – Have good aggregate weekly sales figures
    • Data available down to SKU level
    • Data collected at store level
  – Know when total sales are changing over time
  – Can also investigate overall response to promotions
    • Using store level data can give more accurate results, and even allow some segmentation by chain or region
  – However sales figures cannot show us who is buying more when sales increase, or who is affected by promotions
    • Heavy buyers? Light buyers? New buyers?
    • Households with kids? Retired couples? Flatters?
    • Even when overall sales are flat, there may be hidden changes
  – Marketing activities could be made more effective using this sort of information, so how can we find out about this?

Household Purchasing Data

• Data about FMCG purchases collected from a panel of households
  – Can be collected through diaries
    • Or even weekly interviews, based on recall
  – Best method is currently to equip panel with scanners
    • These are used by each household member to record all items bought
    • ACNielsen (NZ) runs a scanner panel of over 1000 households
  – Data includes amount purchased, price, date, product details down to SKU level
  – Also have demographic characteristics of household

Common Research Questions

• Who buys my product?
  – Perhaps better answered by U&A (usage and attitudes) study

• How much do they buy?

• How often?

• Who are my heavy buyers? Light buyers? Frequent buyers?

• How many are repeat buyers?

• How does this compare to my other brands? How about my competitors?

• Are my results normal?
  – How do they compare to similar products in other categories?

Example of Purchase Data

Purchases in Week Number:

              1   2   3   4   5   6   7   8   9   10  11  12  13
Household 1   -   A   -   -   A   A   B   A   -   -   A   -
Household 2   A   -   -   A   -   B   -   -   B   -   -   -   -
Household 3   -   -   B   -   -   -   -   -   -   -   A   -   -
Household 4   -   -   -   -   -   -   -   -   -   -   -   -   -
…             .   .   .   .   .   .   .   .   .   .   .   .   .

Results for 4-Week Months

Purchases by Month

              1       2       3       …   Total
Household 1   1A      3A,1B   1A      .   5A,1B
Household 2   2A      1B      1B      .   2A,2B
Household 3   1B      -       1A      .   1A,1B
Household 4   -       -       -       .   -
Total         3A,1B   3A,2B   2A,1B   .   8A,4B

Observations

• Usually there will be a wide range of purchasing intensity among buyers of each brand
  – Also a proportion who do not buy the brand

• Instead of a whole brand, we can also look at a brand/package size combination
  – Similar findings apply at both levels

Another Example

• Data gathered from a panel of 983 households

• Purchases of Lux Flakes over a 12 week period
  – Various summary measures shown below

                                # of Purchase Occasions                   Average # of
                                1     2     3     4+    Total   %         Purchases per Buyer
All Buyers
  # Buyers                      17    3     2     0     22      100%
  # Units Bought                17    6     6     0     29      100%      1.3
Bought in Last 12 Weeks
  # Repeat Buyers               6     2     1     0     9       41%
  # Units Bought                6     4     3     0     13      45%       1.4
Didn’t Buy in Last 12 weeks
  # “New” Buyers                11    1     1     0     13      59%
  # Units Bought                11    2     3     0     16      55%       1.2
Cumulative purchases from at
least this # of purchase
occasions                       29    12    6     0     -       -

Example (continued)

• Low penetration overall
  – 22 buyers, about 2% of panel

• More than half the purchases were by “new” buyers

• The cumulative purchasing distribution looks similar to the cumulative reach distributions from the last lecture

Negative Binomial Model

• Fit NBD model – assumes Poisson process for purchase occasions, with Gamma heterogeneity

• R code:

  purchase.dist <- c(961,17,3,2)
  lnbd3 <- function(alpha,beta,data) {
    visits <- 0:3
    prob <- beta/(beta+1)
    sum(data[1:4]*log(dnbinom(visits,alpha,prob)))
  }
  optim(c(1,1),function(param) {-lnbd3(param[1],param[2],purchase.dist)})

• Likelihood maximised for α=0.045 and β=1.514

Negative Binomial Model

• Can also fit the model based on the observed values of two quantities
  – The proportion of people p0 making no purchases during the study period
  – The mean number of purchases made m (assuming that only one item is purchased at each purchase occasion)

• Then solve for α and β numerically
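One way to carry out this numerical solution is sketched below, using the Lux Flakes counts from the earlier R code. It relies on two standard NBD identities for the parameterisation used here, m = α/β and p0 = (β/(β+1))^α, which reduce the problem to a single equation in α. The variable names and the search interval passed to `uniroot` are assumptions made for illustration.

```r
# Households making 0, 1, 2, 3 purchases
purchase.dist <- c(961,17,3,2)
n  <- sum(purchase.dist)
p0 <- purchase.dist[1] / n               # observed proportion of non-buyers
m  <- sum(0:3 * purchase.dist) / n       # observed mean purchases per household

# Substituting beta = alpha / m into log(p0) = alpha * log(beta / (beta + 1))
f <- function(alpha) alpha * log(alpha / (alpha + m)) - log(p0)

alpha.hat <- uniroot(f, c(1e-6, 100), tol = 1e-12)$root
beta.hat  <- alpha.hat / m

print(round(c(alpha = alpha.hat, beta = beta.hat), 3))
```

The resulting values can be compared with the maximum likelihood estimates; the two methods need not agree exactly because they match different features of the data.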

Multivariate NBD

• Generalise to multiple time periods with durations Ti, i=1,…,t

• Various partitionings of the Ti lead to variables that are also NBD
  – E.g. divide into the first s time periods and the remaining t - s
  – The values for the latter t - s periods, conditional on those for the first s, are multivariate NBD
    • α is incremented by the total purchases from the first s periods, and the mean is updated as a weighted average of the original mean and the observed mean.

• So can easily apply empirical Bayes techniques using this model
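The updating rule can be illustrated with a small sketch. It assumes the standard Gamma-Poisson result that, after observing x purchases over a duration T1, the purchase rate has a Gamma distribution with shape α + x and rate β + T1. The parameter values are the Lux Flakes estimates quoted earlier, with one 12-week period taken as the time unit; the function name `cond.mean` is illustrative.

```r
alpha <- 0.045; beta <- 1.514   # NBD estimates quoted earlier (per 12-week period)

# Expected purchases in a future period of length T2,
# given x purchases observed over a past period of length T1
cond.mean <- function(x, T1 = 1, T2 = 1) {
  T2 * (alpha + x) / (beta + T1)
}

x <- 0:3
future <- cond.mean(x)

# Same quantity written as a weighted average of prior mean and observed mean
w <- beta / (beta + 1)
weighted <- w * (alpha / beta) + (1 - w) * (x / 1)

print(round(cbind(x, future, weighted), 4))
```

Households that bought nothing are still expected to buy a little in the next period, while buyers are shrunk back towards the population mean.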

NBD Model for Longer Periods

• Another property of the NBD is that purchases over a longer time period are also NBD (assuming that the purchasing process remains the same)

• The mean number of purchases increases in proportion to the length of the period

• But the parameter α remains fixed
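A short sketch of this scaling property follows, again using the Lux Flakes estimates quoted earlier and treating one 12-week period as the time unit. Under the Gamma-Poisson assumptions, purchases over T periods are NBD with the same α and mean Tα/β; the names `pen` and `mean.purchases` are illustrative.

```r
alpha <- 0.045; beta <- 1.514    # NBD estimates quoted earlier (per 12-week period)
periods <- c(1, 2, 4)            # lengths of the longer periods, in 12-week units

mean.purchases <- periods * alpha / beta   # mean grows in proportion to the length

# Penetration = 1 - P(no purchases); alpha stays fixed, only the mean changes
pen <- 1 - dnbinom(0, size = alpha, mu = mean.purchases)

print(round(cbind(periods, mean.purchases, penetration = pen), 4))
```

Penetration grows more slowly than the mean, because much of the extra volume over a longer period comes from households that have already bought.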

NBD Model

• The NBD model has been applied to products in a wide range of categories

• It generally fits very well

• The main exception (for diary data) is when the recording period is too short compared to the purchase frequency
  – Often people record shopping once in each period, rather than multiple times
  – Can cause problems if many people are expected to purchase once or more each period

α is Usually Constant

• Typically α will be relatively constant across different products in the same category

• This means that the heterogeneity in purchasing rates is similar across products

• However β will vary to reflect the penetrations of the different products

Multiple Brands

• So far have only looked at the NBD for purchases of a single brand

• Want to model multiple brands

• Will use a combined model for brand choice and for number of purchases
  – The NBD-Dirichlet distribution

Model for Brand Choice

• Assume that brand choices are made independently for each purchase, with individual i having a fixed probability pij of choosing brand j

• These probabilities pij are assumed to vary among people according to a Dirichlet distribution
  – This is a generalisation of the Beta distribution

The Dirichlet Distribution

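• Recall that the Beta distribution has pdf

$$g(p) = \frac{1}{B(\alpha,\beta)}\, p^{\alpha-1}(1-p)^{\beta-1} = \frac{1}{B(\alpha_1,\alpha_2)}\, \theta_1^{\alpha_1-1}\,\theta_2^{\alpha_2-1}$$

setting θ1=p, θ2=(1-p), α1=α and α2=β

• The Dirichlet distribution generalises this to k dimensions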
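For reference, the standard form of the k-dimensional Dirichlet density is written out below in the same θ and α notation. It is the textbook definition rather than a formula taken from the slides.

```latex
g(\theta_1,\dots,\theta_k)
  = \frac{\Gamma\!\left(\sum_{j=1}^{k}\alpha_j\right)}{\prod_{j=1}^{k}\Gamma(\alpha_j)}
    \prod_{j=1}^{k}\theta_j^{\alpha_j-1},
  \qquad \theta_j \ge 0,\quad \sum_{j=1}^{k}\theta_j = 1 .
```

With k=2 the normalising constant becomes Γ(α1+α2)/(Γ(α1)Γ(α2)) = 1/B(α1,α2), which recovers the Beta pdf.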

Brand Choice

• These assumptions mean that the joint distribution of brands purchased, across all consumers, is a mixture of multinomials with a Dirichlet distribution
  – i.e. a Dirichlet-multinomial distribution
  – For two brands, this is just the familiar Beta-binomial distribution

Purchasing Model

• We now turn to the purchasing process

• Assume that purchases made by individual i occur randomly and independently with mean rate λi, resulting in a Poisson process

• Also assume that the means vary across the population according to a Gamma distribution

• This means that the number of purchases will follow a negative binomial distribution

Combined Model

• Finally, assume that the purchase rates and brand choice probabilities are independent of one another

• The resulting distribution of brand purchases follows an NBD-Dirichlet distribution (often called Dirichlet for short)
  – This has k+2 parameters, one for each brand and 2 for the NBD purchase distribution
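The combined model is easiest to understand by simulating from it. The sketch below draws a purchase rate and a set of brand choice probabilities independently for each person, then generates total purchases and allocates them to brands. All parameter values are hypothetical, and the Dirichlet draws are generated by normalising independent Gamma variates, a standard construction that is assumed here rather than taken from the slides.

```r
set.seed(1)
n.people <- 5000
alpha <- 0.8; beta <- 0.4     # hypothetical NBD parameters (Gamma shape and rate)
a <- c(1.2, 0.6, 0.2)         # hypothetical Dirichlet parameters, k = 3 brands
k <- length(a)

# Purchasing process: Gamma-distributed rates, Poisson purchase counts
lambda <- rgamma(n.people, shape = alpha, rate = beta)
total  <- rpois(n.people, lambda)

# Brand choice: Dirichlet probabilities via normalised Gamma draws
g <- matrix(rgamma(n.people * k, shape = rep(a, each = n.people)), nrow = n.people)
p <- g / rowSums(g)

# Allocate each person's purchases across brands (multinomial)
purchases <- t(sapply(seq_len(n.people),
                      function(i) rmultinom(1, total[i], p[i, ])))

print(round(colMeans(purchases), 2))       # mean purchases per brand
print(round(colMeans(purchases > 0), 2))   # brand penetrations
```

Because rates and choice probabilities are independent, the expected purchases of each brand are the category mean α/β multiplied by that brand's share of the Dirichlet parameters.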

Discussion

• This model has been found to describe many aspects of buyer behaviour well, across a wide range of situations

• It assumes that buying behaviour is stationary, i.e. is not showing any trends over time

• The Dirichlet distribution assumes that the probabilities are independent apart from the constraint that they sum to 1
  – In marketing terms, this means that the market is not segmented
    • The proportion of purchases that go to one brand is independent of how the remaining purchases are spread across the other brands

Discussion (continued)

• One implication of this is an additivity property
  – Any two brands can be combined into a “super-brand” with expected purchases equal to the sum of the individual brand means

– The rest of the model is unaffected by this change

• The NBD-Dirichlet has also been applied to pack sizes, stores, TV programmes, etc

Example of Model Fit

A Single Brand

• The NBD-Dirichlet model does not give exactly an NBD distribution for a single brand
  – However in practice the difference is minimal

Duplication Between Brands

• Generally constant down columns
  – Reflects unsegmented nature of most markets

Duplication Law

• The duplication between two brands is usually proportional to the product of their penetrations
