Sampling and non sampling errors in the Italian Television Audience Measurement system
European Conference on Quality in Official Statistics - Q2008
Rome, 9-11 July 2008
Participants in the research group: De Vitiis, D’Alò, Di Consiglio, P.D. Falorsi (chief), S. Falorsi, Orsini, Pallara, Russo, Seeber, Tuoto
Speaker: Alessandro Pallara, Istituto Nazionale di Statistica
Outline of the talk
Television Audience Measurement (TAM) and the “meter panel”
Survey parameters and sampling design
Estimation of sampling error
Sources of bias in TAM estimates
Measurement errors: E&I
Panel attrition and conditioning
Comments and concluding remarks
Rome, 10 July 2008
Television Audience Measurement
Television Audience Measurement (TAM) data have a high social and economic impact. They provide essential information to:
• broadcasters, for programming policy and programme scheduling;
• broadcasters and advertising agencies, for agreeing on the price of commercial air-time and advertising campaigns.
Context and purposes of the research
Purposes (and Research reports)
1) review the current procedures for estimating daily ratings and the associated sampling errors (released June ’07);
2) evaluate the accuracy of the survey estimates with respect to the various sources of non-sampling error (Dec. ’07);
3) put forward tools and recommendations for checking the statistical quality (both sampling and non-sampling errors) of the TAM survey output (under release, July ’08).
The context for this research is the agreement signed in 2006 between the Italian NSI (Istat) and the Italian Communications Regulatory Authority (Agcom), under which Istat was commissioned to carry out a study of the statistical methodology behind the national TAM system.
Standard TAM methodology
The current worldwide standard in TAM methodology has two basic features:
• a household viewing panel sample (the people meter panel), selected according to certain household demographic characteristics (age of the householder, number of members, city size, geographical region)
• a measurement device (the people meter) that registers (a) the TV set status (i.e. which channel is tuned in, which is recorded with certainty) and (b) viewer presence, which is quite demanding on panelists (they must press their remote-control button each time they enter or leave a television viewing session)
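Survey Parameters
Let r denote a generic TV channel, T a given time interval (daypart, day, week) and U the survey population.
Main parameters
The Reach (or cover/cume) is the cumulative percentage or total (usually expressed in thousands) of a population that has been counted as viewers at least once during a specified interval.
The Audience is the average number of individuals (homes or target groups) viewing a TV channel over a given time interval (e.g. programme, daypart):
$$A_{rT} = \sum_{k \in U} y^{T}_{a,kr} = \frac{1}{T}\sum_{t=1}^{T}\sum_{k \in U} y^{t}_{a,kr}$$
where $y^{t}_{a,kr} = 1$ if individual k is viewing channel r at time unit (minute) t and 0 otherwise, and T also denotes the number of time units in the interval.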
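Survey Parameters (cont.d)
Reach (cover), where $y^{T}_{c,kr} = 1$ if individual k viewed channel r at least once during T and 0 otherwise:
$$C_{rT} = \sum_{k \in U} y^{T}_{c,kr}$$
The Share (of Audience) is defined as the percentage of Households Using Television (HUT) or Persons Viewing Television (PVT) that are tuned to a specific programme or station at a specific time.
The Rating is the size of the television audience relative to the total universe, expressed as a percentage.
$$AP_{rT} = \frac{A_{rT}}{N} \times 100 \qquad SH_{rT} = \frac{A_{rT}}{\sum_{r'=1}^{R} A_{r'T}} \times 100$$
where N is the population size and R the number of channels.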
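To make the parameter definitions concrete, the sketch below computes audience, reach, rating and share for a small, unweighted toy population. It assumes a hypothetical layout in which `viewing[k][t]` holds the channel index watched by individual k at minute t (or `None` when k is not viewing); the sampling weights used in actual TAM estimation are ignored.

```python
"""Toy computation of TAM parameters (unweighted, census-like)."""

from typing import List, Optional

# Hypothetical layout: viewing[k][t] is the channel watched by individual k
# at minute t, or None when k is not viewing.
Viewing = List[List[Optional[int]]]


def audience(viewing: Viewing, r: int) -> float:
    """A_rT: average number of individuals viewing channel r over the T minutes."""
    n_minutes = len(viewing[0])
    viewer_minutes = sum(1 for person in viewing for ch in person if ch == r)
    return viewer_minutes / n_minutes


def reach(viewing: Viewing, r: int) -> int:
    """C_rT: number of individuals who viewed channel r at least once during T."""
    return sum(1 for person in viewing if r in person)


def rating(viewing: Viewing, r: int) -> float:
    """AP_rT: audience as a percentage of the population size N."""
    return 100 * audience(viewing, r) / len(viewing)


def share(viewing: Viewing, r: int, channels: List[int]) -> float:
    """SH_rT: audience of r as a percentage of the total audience of all channels."""
    total = sum(audience(viewing, c) for c in channels)
    return 100 * audience(viewing, r) / total


viewing = [
    [0, 0, 1, None],
    [None, 1, 1, 1],
    [None, None, None, None],
    [0, 0, 0, 0],
]
```

In this toy panel channel 0 is watched for 6 person-minutes out of 4 minutes, so its audience is 1.5 persons, its reach is 2 and its rating is 37.5; relative to channels 0 and 1 its share is 60.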
Survey population, statistical units, data analyzed
Survey population: household members aged 4 or over
→ Survey estimates refer to in-home TV viewing (by persons and households, including viewing by guests of sample households), for the total population and for selected target subpopulations
Elementary data used for estimating parameters
Individual viewing statements: meter records (raw data) are converted, after data processing, into summary statements of individual viewing over time (minute by minute). Each statement contains (a) the start and end time of the viewing session, (b) the identification of the signal source and of the TV set being viewed, and (c) the identity of the viewer
Data analyzed
→ Raw and validated panel meter micro-data (daily data for 4 weeks between Sept. ’05 and June ’06)
→ Population totals of auxiliary variables and sampling weights used in the estimation procedure
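The conversion of viewing statements into minute-by-minute data can be sketched as follows. The `ViewingStatement` record and its fields are hypothetical; they only mirror items (a) to (c) above, with times expressed as minutes from the start of the day and the end minute treated as exclusive.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class ViewingStatement:
    """Hypothetical layout of one individual viewing statement."""
    viewer: str   # (c) identity of the viewer
    tv_set: int   # (b) TV set being viewed
    channel: int  # (b) signal source (channel) tuned in
    start: int    # (a) start of the session, in minutes from the start of the day
    end: int      # (a) end of the session, exclusive


def minutes_viewed(statements: List[ViewingStatement]) -> Dict[Tuple[str, int], Set[int]]:
    """Map each (viewer, channel) pair to the set of minutes in which it was viewed."""
    out: Dict[Tuple[str, int], Set[int]] = {}
    for s in statements:
        out.setdefault((s.viewer, s.channel), set()).update(range(s.start, s.end))
    return out


statements = [
    ViewingStatement("k1", 1, 5, 0, 3),
    ViewingStatement("k1", 1, 5, 10, 12),
    ViewingStatement("k2", 1, 7, 2, 4),
]
by_minute = minutes_viewed(statements)
```

Using sets means that overlapping statements for the same viewer and channel are not double-counted, which keeps the minute-level indicators binary.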
Sampling design of most TAM surveys
The TAM sampling strategy has two phases:
1. In the first phase, a face-to-face survey (the Establishment Survey, ES) is carried out each year on a two-stage stratified sample of approximately 30,000 households (2006 figure). The ES:
• provides universe estimates (for both individuals and households) that are used in the TAM estimation procedure, such as educational attainment, socio-economic status or number of children per household;
• provides a database of potential households for recruitment in the second-phase sample.
2. In the second phase, a panel of about 5,100 households (the people meter panel sample) is “broadly” randomly selected, within control strata, from ES respondents.
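A minimal sketch of the two-phase logic follows. It assumes a simple random first-phase sample (the actual ES uses a two-stage stratified design) and simple random selection within control strata in the second phase; non-response, field substitution and quota rules are deliberately left out. The `(household_id, stratum)` population layout and the allocation dictionary are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def two_phase_sample(
    population: List[Tuple[int, str]],
    phase1_n: int,
    panel_alloc: Dict[str, int],
    seed: int = 0,
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Phase 1: draw an ES sample; phase 2: draw the panel within control strata from it.

    population: hypothetical list of (household_id, control_stratum) pairs.
    panel_alloc: hypothetical number of panel households wanted per stratum.
    """
    rng = random.Random(seed)
    # Phase 1 (simplified): simple random sample instead of the two-stage stratified ES design.
    es_sample = rng.sample(population, phase1_n)
    # Group first-phase households by control stratum.
    by_stratum: Dict[str, List[int]] = defaultdict(list)
    for hh_id, stratum in es_sample:
        by_stratum[stratum].append(hh_id)
    # Phase 2: simple random selection within each control stratum.
    panel: List[int] = []
    for stratum, n_h in panel_alloc.items():
        candidates = by_stratum.get(stratum, [])
        panel.extend(rng.sample(candidates, min(n_h, len(candidates))))
    return es_sample, panel


population = [(i, "NW" if i % 2 else "SI") for i in range(1000)]
es_sample, panel = two_phase_sample(population, 300, {"NW": 20, "SI": 30})
```

Under this idealized design the inclusion probability of each panel household would be computable from the two phases; the problems listed next are precisely what makes it unknown in practice.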
The Meter panel sample
Selected characteristics of the meter panel (used for panel turnover control): active vs. lost panelists, compared with total population benchmarks (%)
Characteristic | Universe (demographic data 08/05) | Active households, 4 Sept. '05 (unweighted) | Active households, 10 June '06 (unweighted) | Lost panelists, Sept. '05 to June '06 (unweighted)
Region
C | 19,6 | 19,5 | 19,3 | 21,1
NE | 19,5 | 19,2 | 19,4 | 19,0
NW | 28,7 | 28,3 | 28,7 | 28,9
SI | 32,3 | 32,9 | 32,7 | 31,0
City size
<100,000 inh. | 74,9 | 73,9 | 74,8 | 72,6
>100,000 inh. | 25,1 | 26,1 | 25,2 | 27,4
Age of householder
≤45* | 32,6 | 32,6 | 30,5 | 36,0
46-64* | 34,9 | 34,5 | 35,8 | 32,2
≥65* | 32,4 | 32,9 | 33,7 | 31,8
Number of members
1 | 24,9 | 23,2 | 23,7 | 22,2
2 | 27,1 | 26,3 | 27,5 | 22,6
3 | 21,6 | 22,7 | 21,9 | 22,1
4 | 18,9 | 20,2 | 19,2 | 23,2
5+ | 7,5 | 7,5 | 7,6 | 9,8
* Estimated through ES
Problems with TAM sampling design in Italy
• quota sampling
• unknown selection probabilities of units in the recruitment household database (which originates from different ESs)
• rules for field substitution of non-responding households: contact rates differ between basic households and substitutes, and interviewers may influence substitutions
• very high total non-response rate (non-response to the ES plus refusal of panel recruitment): >90%
Non-respondents may differ in the amount of television they watch: light viewers are often away from home and less available for interview, and they may also feel that their cooperation matters less.
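Non-response compounds across the two phases: if a share p1 of contacted households responds to the ES and a share p2 of ES respondents accepts panel recruitment, the overall response rate is p1 × p2 (ignoring field substitution). The rates below are hypothetical, chosen only to show how quickly total non-response passes 90%.

```python
from typing import Sequence


def total_nonresponse(phase_response_rates: Sequence[float]) -> float:
    """Overall non-response when each phase recruits only among the previous phase's respondents."""
    overall_response = 1.0
    for p in phase_response_rates:
        overall_response *= p
    return 1.0 - overall_response


# Hypothetical rates: 40% respond to the ES, 20% of ES respondents accept the meter.
nr = total_nonresponse([0.4, 0.2])
```

With these illustrative rates the combined non-response is 92%, even though neither phase on its own has a non-response rate above 80%.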
Approach to measuring accuracy of TAM estimates
MSE of an estimator of an unknown population parameter:
$$MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = V(\hat{\theta}) + B^2(\hat{\theta})$$
Approach to quality assessment:
• direct (smooth) estimators of the sampling variance
• (indirect) indicators of the bias
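The decomposition can be checked numerically. The sketch simulates replicates of an estimator carrying a hypothetical additive upward bias of 0.5 (standing in, for instance, for over-representation of heavy viewers) and verifies that the mean squared error across replicates equals the variance plus the squared bias.

```python
import random
import statistics
from typing import List, Tuple


def mse_decomposition(estimates: List[float], theta: float) -> Tuple[float, float, float]:
    """Return (MSE, variance, bias) of replicated estimates of a known parameter theta."""
    mean_est = statistics.fmean(estimates)
    variance = statistics.pvariance(estimates)  # spread around the replicate mean
    bias = mean_est - theta
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    return mse, variance, bias


rng = random.Random(1)
theta = 10.0
# Each replicate: mean of 50 draws around theta plus a hypothetical bias of 0.5.
replicates = [
    statistics.fmean(rng.gauss(theta, 2.0) for _ in range(50)) + 0.5
    for _ in range(2000)
]
mse, variance, bias = mse_decomposition(replicates, theta)
```

Here the variance is close to 4/50 = 0.08 while the squared bias is about 0.25, so the bias dominates the MSE; a variance estimate alone would badly understate the error, which is why the bias indicators matter.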
Estimation of sampling variance
The model-assisted approach cannot be used because the inclusion probabilities of the observed panel units are unknown (units are selected through different sampling designs, some of which involve purposive selection, and the non-response rate is very high).
On the other hand, under suitable approximations a linear model can be found whose parameter estimates closely approximate the actual TAM estimates (details in the proceedings paper).
The sampling variance has then been estimated through a robust technique (the sandwich estimator; Valliant et al., 2000) based on the residuals of the linear model.
How robust? The estimators are model-unbiased and consistent under a fairly general variance structure, which may differ from the one used to produce the survey estimates.
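The linear model used in the study is described in the proceedings paper and is not reproduced here. As a stand-in, the sketch fits a hypothetical one-regressor model y_k = βx_k + e_k through the origin and compares the model-based variance of β̂, which assumes one common error variance, with the sandwich (heteroscedasticity-robust) variance Σx_k²e_k² / (Σx_k²)², which remains consistent when the error variance differs across units.

```python
from typing import List, Tuple


def fit_with_sandwich(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y_k = beta * x_k + e_k (no intercept).

    Returns (beta_hat, model-based variance, sandwich variance) of beta_hat.
    """
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    # Model-based variance: assumes one common error variance sigma^2 for all units.
    sigma2 = sum(e * e for e in resid) / (len(x) - 1)
    v_model = sigma2 / sxx
    # Sandwich: (X'X)^-1 (sum_k x_k^2 e_k^2) (X'X)^-1, robust to unequal error variances.
    v_sandwich = sum((xi * e) ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    return beta, v_model, v_sandwich


# Hypothetical data whose error spread grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.8, 6.6, 7.5, 11.2, 10.4]
beta, v_model, v_sandwich = fit_with_sandwich(x, y)
```

The "bread" (X'X)^-1 comes from the model, while the "meat" uses the observed squared residuals, so a misspecified variance structure does not invalidate the variance estimate; the CV of β̂ would follow as the square root of the variance divided by β̂.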
Variance estimation – an example
[Figure: Estimates of audience (each minute) and coefficient of variation (CV) for a large channel of the public network, 4 Sept. 2005. Horizontal axis: time (HH:MM); vertical axes: audience (0 to 12,000,000) and CV % (0 to 45), showing CV % estimates and CI.]
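The CV shown in the figure is the standard error expressed as a percentage of the estimate. A minimal sketch, assuming a normal-approximation confidence interval around the estimate (95% by default; the exact interval construction used in the study is not stated here):

```python
import math
from typing import Tuple


def cv_and_ci(estimate: float, variance: float, z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """CV (%) of an estimate and a normal-approximation confidence interval."""
    se = math.sqrt(variance)
    cv_pct = 100 * se / estimate
    return cv_pct, (estimate - z * se, estimate + z * se)


cv, (lower, upper) = cv_and_ci(1000.0, 2500.0)
```

An estimate of 1000 with variance 2500 has a standard error of 50, hence a CV of 5%; the same standard error on an estimate of 100 would give a CV of 50%, which is why small audiences are much less reliable.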
[Figure: Scatter plot of CV by audience size (minutes and dayparts)]
Variance estimation – an example (cont.d)
Sources of bias in TAM estimates
Potential sources of bias in the meter panel sample:
• coverage errors (e.g. non-TV homes are not included in the estimates: about 1,500,000 persons in Italy, by estimate)
• (wave) non-response
• model assumption errors
• measurement errors
• attrition and panel conditioning
Measurement Errors in meter panel data
Measurement errors: mismatch between the signal source of a TV set being viewed and the person registered as a viewer through the people meter.
Main sources of measurement error (data gathering and editing phases):
• meter statements indicating that the TV set is switched on but with no person registered as present (uncovered viewing)
• long viewing sessions without any change in registered set use or viewer presence (no viewers signing on or off, no channel switching: long/constant viewing)
• TV OFF viewing
• the same individual registered as a viewer on two or more TV sets at the same time (concurrent viewing)
• undue or wrong reassignment of uncovered viewing to a household member (processing errors)
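Two of these error types can be detected mechanically from minute-level meter data. The sketch below flags uncovered viewing (a set is on but nobody is registered) and concurrent viewing (one person registered on two or more sets in the same minute); the input layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def flag_meter_errors(
    set_on: Dict[int, Set[int]],
    presence: List[Tuple[str, int, int]],
) -> Tuple[List[Tuple[int, int]], List[Tuple[str, int]]]:
    """Flag uncovered and concurrent viewing in minute-level meter data.

    set_on: TV set -> minutes in which the set is switched on (hypothetical layout).
    presence: (person, tv_set, minute) registrations made with the people meter.
    Returns (uncovered (tv_set, minute) pairs, concurrent (person, minute) pairs).
    """
    persons_at: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    sets_of: Dict[Tuple[str, int], Set[int]] = defaultdict(set)
    for person, tv_set, minute in presence:
        persons_at[(tv_set, minute)].add(person)
        sets_of[(person, minute)].add(tv_set)
    # Uncovered viewing: set switched on, nobody registered as present.
    uncovered = sorted(
        (s, m) for s, minutes in set_on.items() for m in minutes if not persons_at.get((s, m))
    )
    # Concurrent viewing: one person registered on two or more sets in the same minute.
    concurrent = sorted((p, m) for (p, m), sets in sets_of.items() if len(sets) > 1)
    return uncovered, concurrent


uncovered, concurrent = flag_meter_errors(
    {1: {0, 1, 2}, 2: {1}},
    [("a", 1, 0), ("a", 1, 1), ("a", 2, 1)],
)
```

Flagging is only the detection step; whether flagged minutes are cancelled or reassigned to a household member is an editing decision, and wrong reassignment is itself one of the error sources listed above.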
Processing TAM data – E&I
Editing checks:
• rejection of certain panel households from the daily reporting samples because of suspected faulty compliance by panelists [excess (24-hour) viewing, long/constant viewing above set threshold values]
• cancellation of individual viewing records (concurrent viewing, overnight constant viewing, unassigned uncovered viewing)
• editing in of individual viewing records (uncovered viewing assigned to a viewer)
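The first two checks can be sketched as follows. The thresholds are hypothetical placeholders (the actual threshold values are not given here), and, unlike the procedure discussed in the concluding remarks, the sketch applies the long-viewing cut-off at any time of day rather than only overnight.

```python
from typing import Dict, List, Set, Tuple

Session = Tuple[str, str, int]  # (household, person, duration in minutes): hypothetical layout


def editing_checks(
    sessions: List[Session],
    long_threshold: int = 240,   # hypothetical cut-off for long/constant viewing
    excess_daily: int = 1440,    # 24 hours of viewing in one day
) -> Tuple[Set[str], List[Session], List[Session]]:
    """Return (rejected households, cancelled sessions, retained sessions) for one day."""
    per_person: Dict[Tuple[str, str], int] = {}
    for hh, person, minutes in sessions:
        per_person[(hh, person)] = per_person.get((hh, person), 0) + minutes
    # Reject the whole household when any member reaches the excess-viewing limit.
    rejected = {hh for (hh, _), total in per_person.items() if total >= excess_daily}
    remaining = [s for s in sessions if s[0] not in rejected]
    # Cancel single constant-viewing sessions longer than the threshold.
    cancelled = [s for s in remaining if s[2] > long_threshold]
    retained = [s for s in remaining if s[2] <= long_threshold]
    return rejected, cancelled, retained


rejected, cancelled, retained = editing_checks(
    [("h1", "a", 1440), ("h2", "b", 300), ("h2", "b", 60), ("h3", "c", 100)]
)
```

Household rejection removes a unit from the daily reporting sample altogether, whereas session cancellation only edits viewing out; the two checks therefore affect the estimates in different ways.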
Processing data - Editing and Imputation
[Figure: Percent variation of audience estimates (unweighted) from raw to validated data resulting from the treatment of uncovered viewing]
Processing data - Editing and Imputation
[Figure: Percent variation of audience estimates using different cut-off values and criteria for the deletion of records with long constant viewing]
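The sensitivity to the cut-off can be illustrated by recomputing a daily audience after deleting whole sessions longer than each cut-off and expressing the change as a percentage of the raw figure. The session durations are hypothetical, and deleting whole sessions is only one possible criterion (truncating them at the cut-off is another).

```python
from typing import Dict, List


def percent_variation(raw: float, validated: float) -> float:
    """Percent change of an estimate from raw to validated data."""
    return 100 * (validated - raw) / raw


def variation_by_cutoff(durations: List[int], cutoffs: List[int], n_minutes: int = 1440) -> Dict[int, float]:
    """Percent change in a daily audience when sessions longer than each cut-off are deleted.

    durations: hypothetical session lengths (minutes) on one channel in one day.
    The audience is viewer-minutes divided by the minutes in the day.
    """
    raw = sum(durations) / n_minutes
    return {
        c: percent_variation(raw, sum(d for d in durations if d <= c) / n_minutes)
        for c in cutoffs
    }


changes = variation_by_cutoff([60, 120, 300, 600], [240, 480, 1000])
```

With these durations a 240-minute cut-off lowers the audience by about 83%, a 480-minute cut-off by about 56%, and a 1000-minute cut-off leaves it unchanged, showing how strongly the choice of threshold can drive the validated figures.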
Panel attrition
Annual rates of panel attrition (years 2005-2006)
Reason for attrition | 2005 | 2006
House moving | 2,8% | 2,4%
Fatigue (drop-out) | 10,9% | 9,6%
Fatigue (discard) | 1,8% | 0,9%
Discard for stratification | 1,2% | 1,7%
Inability to continue | 1,2% | 1,2%
Total | 18% | 16%
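The Total row matches the sum of the component rates once rounded to whole percentages, as the quick check below shows (figures copied from the table, with decimal commas written as points). It also computes the share of attrition due to fatigue (drop-out plus discard).

```python
# Component rates (%) copied from the attrition table: (2005, 2006).
rates = {
    "House moving": (2.8, 2.4),
    "Fatigue (drop-out)": (10.9, 9.6),
    "Fatigue (discard)": (1.8, 0.9),
    "Discard for stratification": (1.2, 1.7),
    "Inability to continue": (1.2, 1.2),
}
# Sum by year, rounded to one decimal to remove floating-point noise.
totals = [round(sum(pair[year] for pair in rates.values()), 1) for year in (0, 1)]
# Fatigue (drop-out plus discard) as a percentage of total attrition, by year.
fatigue_share = [
    round(100 * (rates["Fatigue (drop-out)"][y] + rates["Fatigue (discard)"][y]) / totals[y])
    for y in (0, 1)
]
```

The exact sums are 17,9% and 15,8%, and fatigue accounts for roughly 71% of attrition in 2005 and 66% in 2006.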
Attrition rates by subgroup of population
Annual attrition rates by subgroup (indices, average of years 2005-2006)
Subgroup | Drop-out | Discard
Number in household
1 | 69 | 105
2 | 94 | 96
3 | 101 | 94
4 | 125 | 102
5+ | 150 | 113
Age of householder
≤45 | 109 | 77
46-64 | 106 | 82
65+ | 86 | 141
Region
NW | 101 | 88
NE | 111 | 64
C | 103 | 96
SI | 91 | 135
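Reading the indices as subgroup rates relative to the overall rate (base 100), which is an assumption about how they were built, an index can be converted back into an approximate subgroup rate. The sketch uses the 2005-2006 average drop-out rate from the attrition table, (10,9 + 9,6) / 2 = 10,25%, as the base.

```python
def attrition_index(subgroup_rate: float, overall_rate: float) -> float:
    """Index with base 100 = overall rate (assumed construction)."""
    return 100 * subgroup_rate / overall_rate


def implied_rate(index: float, overall_rate: float) -> float:
    """Invert the index: approximate subgroup rate implied by an index value."""
    return overall_rate * index / 100


avg_dropout = (10.9 + 9.6) / 2  # % per year, 2005-2006 average from the attrition table
rate_5plus = implied_rate(150, avg_dropout)  # households with 5+ members
rate_single = implied_rate(69, avg_dropout)  # single-person households
```

Under this reading, households with 5 or more members would drop out at roughly 15% a year against about 7% for single-person households, which is consistent with the over-representation of large households among lost panelists in the earlier panel table.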
Panel attrition and conditioning
[Figure: Months-in-sample percent distribution of households in the panel sample (installed households, 10 June ’06). Horizontal axis: months in sample.]
Installed households at 10 June ’06, time in sample (months):
# households | 5.093
Average age | 63,7
SD | 50,8
Max | 231
75th percentile | 102
Median age | 51
25th percentile | 20
Age effects
[Figure: Daily estimates of audience (thousands of individuals) of satellite TV channels by some dayparts (4 weeks between Sept. ’05 and June ’06), for households below and above the median time in sample. Panels: Day_part = 12:00:00-14:59:59, 15:00:00-17:59:59, 18:00:00-20:29:59, 20:30:00-22:29:59; horizontal axis: t; series: can11_sot, can11_sop.]
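The comparison in the figure splits households at the median time in sample. A minimal sketch of that split, using hypothetical per-household pairs (months in sample, average daily viewing minutes):

```python
import statistics
from typing import List, Tuple


def split_by_median_tenure(households: List[Tuple[int, float]]) -> Tuple[float, float, float]:
    """Return (median months in sample, mean viewing below the median, mean viewing at or above it).

    households: hypothetical (months_in_sample, average_daily_viewing_minutes) pairs.
    """
    median_months = statistics.median(m for m, _ in households)
    below = [v for m, v in households if m < median_months]
    above = [v for m, v in households if m >= median_months]
    return median_months, statistics.fmean(below), statistics.fmean(above)


median_months, mean_below, mean_above = split_by_median_tenure(
    [(10, 100.0), (20, 120.0), (60, 150.0), (100, 200.0)]
)
```

A gap between the two groups is consistent with panel conditioning, but it could also reflect selective attrition (the households that stay longest may differ from those that leave), so the split is an indicator rather than a proof.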
Comments and concluding remarks – 1
Sampling errors
• The CV decreases as the size of the estimate increases.
• Larger estimates (major networks) are quite reliable.
• Smaller estimates (local networks) are quite unreliable.
• The CV decreases slowly as the length of the time interval of the estimates increases.
Non-sampling errors
• coverage errors related to list problems (non-TV homes, non-voting resident households, ……..)
• non-standardized criteria for the substitution of non-responding households in the ES may lead to the selection of heavy-viewer households into the panel
• some evidence of an upward bias in the survey estimates: editing checks seem to be unbalanced towards editing viewing statements in rather than out, and the threshold values for treating long viewing as unrealistic result in viewing statements being cancelled only in the case of overnight viewing
• the lack of an upper limit on time in sample for panel households suggests the presence of panel attrition and conditioning, owing to changes in panelists' viewing behavior and in their compliance with the measurement device over their time in the sample
Comments and concluding remarks – 2
Recommendations for improving quality
• coincidental surveys on a regular basis, to check the actual viewing status of panelists against the registered meter data
• occasional surveys of non-respondents, to analyze whether the response mechanism is independent of viewing behavior
• introduction of a panel rotation method, with an upper limit on the time in sample of panel households
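The rotation rule in the last recommendation could work along these lines: households reaching a maximum time in sample are retired and replaced by newly recruited ones. The 60-month limit and the recruit list are hypothetical; a real scheme would also replace households within the same control strata and spread retirements over time.

```python
from typing import Dict, List, Tuple


def rotate_panel(
    panel: Dict[str, int],
    max_months: int,
    recruits: List[str],
) -> Tuple[Dict[str, int], List[str]]:
    """Retire households at or above max_months in sample and replace them with new recruits.

    panel: household -> months in sample; recruits: hypothetical list of candidate households.
    Returns (updated panel, retired households).
    """
    retired = sorted(h for h, months in panel.items() if months >= max_months)
    updated = {h: months for h, months in panel.items() if months < max_months}
    for h in recruits[: len(retired)]:
        updated[h] = 0  # new households enter with zero months in sample
    return updated, retired


updated, retired = rotate_panel({"h1": 10, "h2": 60, "h3": 72}, max_months=60, recruits=["n1", "n2", "n3"])
```

If fewer recruits than retirements are available, the panel shrinks, so in practice recruitment would have to be planned ahead of each rotation.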
Thank you for your attention!