Incorporating Reliability in a TV Recommender
Transcript of Incorporating Reliability in a TV Recommender
Verus Pronk
2
Context
• Increasing availability of TV programs
• Availability of electronic program guides (EPGs)
How about a personal TV recommender?
Applications
• Highlights in EPG
• Auto-recording/deletion on HD recorders
• Creation of personalized channels
3
Summary
Introduction
Naive Bayesian classification
An example
Reliable classification
Results
Concluding remarks
4
Introduction
Thousands of programs offered each day
People tend to browse only a limited number of channels
EPGs provide easier access
Low percentage of interesting programs
More advanced solutions required
5
Introduction
Programs are described by metadata (EPG)
User rates a number of programs as positive or negative
User profile describes the relation between them
[Diagram: TV program, user, training set, user profile, TV program recommender]
6
Introduction
Example of metadata
An Officer and a Gentleman: (
  date : Tuesday, Nov. 23, 2004;
  time : 20:30 h.;
  station : SBS 6;
  genre : drama;
  cast : Richard Gere;
  credit : Taylor Hackford;
  ...
)
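One way to hold such a metadata record in code is sketched below. The class name `ProgramMetadata`, its field names, and the `features` helper are hypothetical; they cover only the fields listed on the slide, not whatever the elided "..." contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramMetadata:
    """Hypothetical container for the EPG fields shown on the slide."""
    title: str
    date: str
    time: str
    station: str
    genre: str
    cast: str
    credit: str

    def features(self):
        """Return the feature values (x_1, ..., x_f); the title is left out."""
        return (self.date, self.time, self.station,
                self.genre, self.cast, self.credit)


record = ProgramMetadata(
    title="An Officer and a Gentleman",
    date="Tuesday, Nov. 23, 2004",
    time="20:30",
    station="SBS 6",
    genre="drama",
    cast="Richard Gere",
    credit="Taylor Hackford",
)
```

Treating each field as one categorical feature is an assumption made here for illustration; a real EPG record may carry several cast members or credits per program.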
7
Naive Bayesian classification
Given: a training set $X$; for each $x \in X$,
  $x_i \in V_i$: the $i$-th feature value of $x$
  $c(x) \in C$: the known class of $x$
Given: an instance $t$
Asked: $c(t)$
Approach: estimate $\Pr(c(t) = j)$, $j \in C$, based on the user profile calculated from $X$
8
Naive Bayesian classification
Problem issues
• Cold start
• Changing preferences
• Feature selection
• Accuracy
• Reliability
• ...
9
Naive Bayesian classification
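$$ \Pr(c(t) = j) = \Pr(c(x) = j \mid x = t) = \frac{\Pr(c(x) = j)\,\Pr(x = t \mid c(x) = j)}{\Pr(x = t)} = \frac{\Pr(c(x) = j)\,\prod_i \Pr(x_i = t_i \mid c(x) = j)}{\Pr(x = t)} $$
prior probabilities: $\Pr(c(x) = j)$
conditional probabilities: $\Pr(x_i = t_i \mid c(x) = j)$
posterior probabilities: $\Pr(c(x) = j \mid x = t)$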
10
Naive Bayesian classification
Conditional independence violation
• The BBC news is always broadcast on the BBC
• Clint Eastwood generally plays in action movies
NBC is nevertheless successfully applied in many application areas
11
Naive Bayesian classification
Priors set to $p_j$
Conditionals estimated using training set
Denominator irrelevant
$$ \Pr(c(t) = j) = \frac{\Pr(c(x) = j)\,\prod_i \Pr(x_i = t_i \mid c(x) = j)}{\Pr(x = t)} $$
12
Naive Bayesian classification
User profile
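$$ \Pr(c(t) = j) \sim p_j \prod_i \frac{N(i, t_i, j)}{N(j)} $$
$$ N(j) = |\{x \in X \mid c(x) = j\}| \quad (> 0) $$
$$ N(i, v, j) = |\{x \in X \mid c(x) = j \wedge x_i = v\}| $$
$$ \hat{c}(t) = \arg\max_{j \in C} \; p_j \prod_i \frac{N(i, t_i, j)}{N(j)} $$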
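The user profile and the resulting classifier can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names, the class labels "like" and "dislike", and the toy training set are all assumptions. The profile stores the counts N(j) and N(i, v, j), and classification picks the class with the largest value of the prior times the product of count ratios.

```python
from collections import Counter, defaultdict


def build_profile(training_set):
    """Count N(j) and N(i, v, j) from (features, class) pairs."""
    n_class = Counter()          # N(j)
    n_value = defaultdict(int)   # N(i, v, j), keyed by (i, v, j)
    for features, cls in training_set:
        n_class[cls] += 1
        for i, v in enumerate(features):
            n_value[(i, v, cls)] += 1
    return n_class, n_value


def score(t, j, priors, n_class, n_value):
    """Compute p_j * prod_i N(i, t_i, j) / N(j)."""
    s = priors[j]
    for i, v in enumerate(t):
        s *= n_value.get((i, v, j), 0) / n_class[j]
    return s


def classify(t, priors, n_class, n_value):
    """Return the class with the largest score."""
    return max(priors, key=lambda j: score(t, j, priors, n_class, n_value))


# Hypothetical toy data: (day, genre) features with a like/dislike rating.
training_set = [
    (("Tuesday", "drama"), "like"),
    (("Tuesday", "drama"), "like"),
    (("Monday", "news"), "like"),
    (("Monday", "drama"), "dislike"),
    (("Tuesday", "news"), "dislike"),
    (("Monday", "news"), "dislike"),
]
priors = {"like": 0.5, "dislike": 0.5}
n_class, n_value = build_profile(training_set)
predicted = classify(("Tuesday", "drama"), priors, n_class, n_value)
```

A feature value never seen with a class yields a zero count and therefore a zero score for that class; the slides do not say whether any smoothing is applied, so none is used here.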
13
Naive Bayesian classification
Classification error
$$ E = \Pr(\hat{c}(x) \neq c(x)) $$
$$ E_j = \Pr(\hat{c}(x) \neq j \mid c(x) = j) $$
$E$ is a convex combination of the $E_j$s
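The convex-combination statement follows from the law of total probability, conditioning on the true class. The weights are the true class probabilities, which are nonnegative and sum to 1:

```latex
E = \Pr(\hat{c}(x) \neq c(x))
  = \sum_{j \in C} \Pr(c(x) = j)\, \Pr(\hat{c}(x) \neq j \mid c(x) = j)
  = \sum_{j \in C} \Pr(c(x) = j)\, E_j .
```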
14
Naive Bayesian classification
On the prior probabilities
15
An example
feature   value               1st class   2nd class
day       Monday                     31           7
          Tuesday                    12          43
          ...                      (57)        (50)
time      20:30                      21           7
          20:35                      22          10
          ...                      (57)        (83)
genre     romance                     8          12
          drama                      17           4
          ...                      (75)        (84)
cast      Richard Gere               23           1
          Sandra Bullock              3           6
          ...                      (74)        (93)
credit    Steven Spielberg           11           2
          Taylor Hackford            18           4
          ...                      (71)        (94)
16
Training set: 100 TV programs in the first class, 100 TV programs in the second class
Program: Tue. 20:30 Drama R. Gere T. Hackford
$$ 0.2 \cdot \frac{12}{100} \cdot \frac{21}{100} \cdot \frac{17}{100} \cdot \frac{23}{100} \cdot \frac{18}{100} = 3.55 \cdot 10^{-5} $$
$$ 0.8 \cdot \frac{43}{100} \cdot \frac{7}{100} \cdot \frac{4}{100} \cdot \frac{1}{100} \cdot \frac{4}{100} = 3.85 \cdot 10^{-7} $$
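The arithmetic of this example can be reproduced with a short script. The counts and the priors 0.2 and 0.8 are taken from the slide; the labels "class 1" and "class 2" are placeholders, since the slide's own class symbols are not recoverable from the transcript.

```python
from math import prod

# Priors and per-class training-set sizes from the slide.
priors = {"class 1": 0.2, "class 2": 0.8}
n_class = {"class 1": 100, "class 2": 100}

# Counts N(i, t_i, j) for Tuesday, 20:30, drama, Richard Gere, Taylor Hackford.
counts = {
    "class 1": [12, 21, 17, 23, 18],
    "class 2": [43, 7, 4, 1, 4],
}

# Score per class: prior times the product of count ratios.
scores = {
    j: priors[j] * prod(c / n_class[j] for c in counts[j])
    for j in priors
}
predicted = max(scores, key=scores.get)
```

The first class wins by roughly two orders of magnitude, even though its prior is four times smaller.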
17
Reliable classification
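$X$ random $\Rightarrow$ $N(i, v, j)$ and $N(j)$ random and dependent
$X$ uniform $\Rightarrow$ both binomially distributed
$$ N(j) = |\{x \in X \mid c(x) = j\}| \quad (> 0) $$
$$ N(i, v, j) = |\{x \in X \mid c(x) = j \wedge x_i = v\}| $$
$$ \hat{c}(t) = \arg\max_{j \in C} \; p_j \prod_i \frac{N(i, t_i, j)}{N(j)} $$
statistical analysis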
18
Reliable classification
Theorem 1
Let $Z \sim \mathrm{Bin}(N, p)$, $0 < p < 1$, $Y_n \sim \mathrm{Bin}(n, q)$
$Z_0$: $\Pr(Z_0 = n) = \Pr(Z = n \mid Z > 0)$
$$ R = \frac{Y_{Z_0}}{Z_0} $$
Then ...
19
Reliable classification
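$$ \mathrm{E}\,R = q, $$
$$ \sigma^2(R) = q(1 - q)\,\frac{(1 - p)^N \left( H_N\!\left(\frac{1}{1 - p}\right) - H_N(1) \right)}{1 - (1 - p)^N}, $$
where
$$ H_N(x) = \sum_{n=1}^{N} \frac{x^n}{n}. $$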
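A numerical check of Theorem 1 is sketched below. Conditioning on $Z_0$ gives a mean of $q$ and a variance of $q(1-q)$ times the expectation of $1/Z_0$; the script enumerates the joint distribution exactly and compares the variance with a closed form written in terms of the partial sums $H_N(x) = \sum_{n=1}^{N} x^n / n$. The function names and the parameter values are assumptions, and the closed form is a reading of the garbled slide rather than a quotation of it.

```python
from math import comb


def moments_of_R(N, p, q):
    """Exact mean and variance of R = Y_{Z0} / Z0, enumerating all (n, y)."""
    norm = 1 - (1 - p) ** N  # Pr(Z > 0)
    mean = second = 0.0
    for n in range(1, N + 1):
        pr_n = comb(N, n) * p**n * (1 - p) ** (N - n) / norm
        for y in range(n + 1):
            pr_y = comb(n, y) * q**y * (1 - q) ** (n - y)
            r = y / n
            mean += pr_n * pr_y * r
            second += pr_n * pr_y * r * r
    return mean, second - mean * mean


def H(N, x):
    """Partial sum H_N(x) = sum_{n=1}^{N} x^n / n."""
    return sum(x**n / n for n in range(1, N + 1))


def variance_closed_form(N, p, q):
    """q(1-q) (1-p)^N (H_N(1/(1-p)) - H_N(1)) / (1 - (1-p)^N)."""
    a = (1 - p) ** N
    return q * (1 - q) * a * (H(N, 1 / (1 - p)) - H(N, 1)) / (1 - a)


mean, variance = moments_of_R(N=12, p=0.3, q=0.4)
closed = variance_closed_form(N=12, p=0.3, q=0.4)
```

For large `N` the term `H(N, 1 / (1 - p))` grows quickly, so a production version would probably fold the factor `(1 - p) ** N` into the sum before evaluating it.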
20
Reliable classification
21
Reliable classification
22
Reliable classification
Theorem 2
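Let $R_i$, $i = 1, 2, \ldots, f$, independent
$r$ constant
Then
$$ \mathrm{E}\Bigl(r \prod_i R_i\Bigr) = r \prod_i \mathrm{E}\,R_i $$
$$ \sigma^2\Bigl(r \prod_i R_i\Bigr) = r^2 \Bigl( \prod_i \bigl(\sigma^2(R_i) + \mathrm{E}^2 R_i\bigr) - \prod_i \mathrm{E}^2 R_i \Bigr) $$
(The $R_i$s are not actually independent)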
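The product rule of Theorem 2 can be checked on small discrete distributions. The sketch below compares the formula (mean of a product is the product of means; second moment of a product is the product of second moments) with a brute-force enumeration over all outcome combinations. The three distributions and the constant 0.2 are arbitrary illustrative values, and independence is imposed by construction.

```python
from itertools import product
from math import prod


def moments(dist):
    """Mean and variance of a discrete distribution [(value, prob), ...]."""
    mean = sum(v * p for v, p in dist)
    var = sum(p * (v - mean) ** 2 for v, p in dist)
    return mean, var


def product_moments_by_formula(r, dists):
    """Mean and variance of r * prod_i R_i for independent R_i."""
    ms = [moments(d) for d in dists]
    mean = r * prod(m for m, _ in ms)
    var = r**2 * (prod(v + m * m for m, v in ms) - prod(m * m for m, _ in ms))
    return mean, var


def product_moments_by_enumeration(r, dists):
    """Same quantities, computed from the full joint distribution."""
    outcomes = []
    for combo in product(*dists):
        value = r * prod(v for v, _ in combo)
        prob = prod(p for _, p in combo)
        outcomes.append((value, prob))
    return moments(outcomes)


dists = [
    [(0.1, 0.5), (0.3, 0.5)],
    [(0.2, 0.25), (0.4, 0.75)],
    [(0.5, 0.6), (0.9, 0.4)],
]
formula = product_moments_by_formula(0.2, dists)
enumerated = product_moments_by_enumeration(0.2, dists)
```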
23
Reliable classification
Back to the original problem
$$ N = |X|, \qquad p \approx \frac{N(j)}{|X|}, \qquad q \approx \frac{N(i, v, j)}{N(j)} $$
24
Reliable classification
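Standard deviation of
$$ P(t, j) = p_j \prod_i \frac{N(i, t_i, j)}{N(j)} $$
can be estimated by
$$ \sigma(t, j) = p_j \sqrt{ \prod_i \left[ \left( \frac{N(i, t_i, j)}{N(j)} \right)^{2} + \frac{N(i, t_i, j)}{N(j)} \left( 1 - \frac{N(i, t_i, j)}{N(j)} \right) \frac{\left( 1 - \frac{N(j)}{|X|} \right)^{|X|}}{1 - \left( 1 - \frac{N(j)}{|X|} \right)^{|X|}} \sum_{n=1}^{|X|} \frac{1}{n} \left( \left( 1 - \frac{N(j)}{|X|} \right)^{-n} - 1 \right) \right] - \prod_i \left( \frac{N(i, t_i, j)}{N(j)} \right)^{2} } $$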
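A sketch of how such a standard-deviation estimate could be computed is given below. It combines the two theorems under the substitutions N = |X|, p = N(j)/|X| and q = N(i, t_i, j)/N(j): each count ratio is treated as having variance q(1 - q) times the expected inverse class count, and the product rule for independent factors is then applied. The function names are hypothetical, and the input numbers are the first-class counts of the worked example with |X| = 200.

```python
from math import prod, sqrt


def inverse_count_factor(n_total, p):
    """E[1/Z | Z > 0] for Z ~ Bin(n_total, p), as a sum over n = 1..n_total."""
    a = (1 - p) ** n_total
    # a * ((1 - p) ** -n - 1) is written as (1 - p) ** (n_total - n) - a,
    # which avoids overflow for large training sets.
    total = sum(((1 - p) ** (n_total - n) - a) / n
                for n in range(1, n_total + 1))
    return total / (1 - a)


def score_and_sigma(prior, counts, n_class, n_total):
    """Return P(t, j) and an estimate of its standard deviation."""
    q = [c / n_class for c in counts]
    factor = inverse_count_factor(n_total, n_class / n_total)
    score = prior * prod(q)
    variance = (prod(qi * qi + qi * (1 - qi) * factor for qi in q)
                - prod(qi * qi for qi in q))
    return score, prior * sqrt(max(variance, 0.0))


# Worked example: prior 0.2, N(j) = 100, |X| = 200, five feature counts.
score, sigma = score_and_sigma(0.2, [12, 21, 17, 23, 18], 100, 200)
```

With only 100 training programs per class, the estimated standard deviation comes out at the same order of magnitude as the score itself, which is why interval overlap is worth checking before committing to a class.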
25
Reliable classification
Confidence intervals for $P(t, j)$:
$$ P(t, j) \pm \kappa\,\sigma(t, j) $$
Two approaches
A: Fix $\kappa$ and don’t classify if intervals overlap: coverage
B: Choose $\kappa$ such that intervals just do not overlap: explicit notion of confidence
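Both approaches can be sketched once a score and a standard deviation are available per class. In the code below the interval half-width is a multiplier times the standard deviation; the name `kappa` for that multiplier, the function names, and the example numbers are all assumptions made for illustration. Approach A abstains when the best class's interval overlaps any other; approach B reports the largest multiplier at which the intervals still just clear each other.

```python
def classify_with_abstention(scores, sigmas, kappa):
    """Approach A: return the best class, or None if intervals overlap."""
    best = max(scores, key=scores.get)
    lower = scores[best] - kappa * sigmas[best]
    for j in scores:
        if j != best and scores[j] + kappa * sigmas[j] >= lower:
            return None  # overlap: do not classify this instance
    return best


def classify_with_confidence(scores, sigmas):
    """Approach B: return the best class and the multiplier at which its
    interval just touches the nearest competing interval."""
    best = max(scores, key=scores.get)
    kappas = []
    for j in scores:
        if j == best:
            continue
        width = sigmas[best] + sigmas[j]
        gap = scores[best] - scores[j]
        kappas.append(gap / width if width > 0 else float("inf"))
    return best, min(kappas)


# Hypothetical scores and standard deviations for two classes.
scores = {"a": 0.30, "b": 0.10}
sigmas = {"a": 0.05, "b": 0.05}

narrow = classify_with_abstention(scores, sigmas, kappa=1.0)
wide = classify_with_abstention(scores, sigmas, kappa=3.0)
best, confidence = classify_with_confidence(scores, sigmas)
```

Under approach A the fraction of instances that still receive a class is the coverage; under approach B every instance is classified and the multiplier itself serves as the confidence value.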
26
Results
Simulation: TV recommender
Training sets: Briarcliff data
Prior probabilities: set such that $E_j \approx E$ for each class $j$
Confidence levels: $\kappa$ = 0, 0.1, 0.2, ..., 1
Training set sizes: 100, 400
Approach A: offset classification error against coverage
27
Results
28
Results
29
Concluding remarks
• Reliability adds another dimension to classification
• Our approach is explicit and robust
• Separates difficult from easy instances
• Also applicable to other domains
  – medical diagnosis
  – biometrics (e.g. face recognition)
Acknowledgements
Srinivas Gutta, Wim Verhaegh, Dee Denteneer