Transcript of "Bayes Decision Rule and Naïve Bayes Classifier" (CSE 6740, Fall 2013, lecture 8)

Page 1

Bayes Decision Rule and Naïve Bayes Classifier

Machine Learning I CSE 6740, Fall 2013

Le Song

Page 2

Gaussian Mixture Model

A density model 𝑝(𝑋) may be multi-modal: model it as a mixture of uni-modal distributions (e.g. Gaussians)

Consider a mixture of 𝐾 Gaussians

$$p(X) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(X \mid \mu_k, \Sigma_k)$$

Learn the mixing proportions $\pi_k \in [0,1]$ (with $\sum_{k=1}^{K} \pi_k = 1$) and the mixture component parameters $\mu_k, \Sigma_k$.

[Figure: a bimodal density modeled as a mixture of two Gaussian components, $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$]
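As a quick illustration, a mixture density like this is easy to evaluate numerically. Below is a minimal sketch in Python using NumPy/SciPy; the parameter values are made up for illustration and are not from the lecture.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters for a K = 2 mixture (made-up values).
pis    = [0.4, 0.6]                                   # mixing proportions, sum to 1
mus    = [np.zeros(2), np.array([3.0, 3.0])]          # component means
sigmas = [np.eye(2), 0.5 * np.eye(2)]                 # component covariances

def gmm_density(x, pis, mus, sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=sig)
               for pi, mu, sig in zip(pis, mus, sigmas))

print(gmm_density(np.array([1.0, 1.0]), pis, mus, sigmas))
```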

Page 3

For each data point 𝑥𝑖:

Randomly choose a mixture component $z_i \in \{1, 2, \ldots, K\}$ with probability $p(z_i \mid \pi) = \pi_{z_i}$

Then sample the actual value of $x_i$ from a Gaussian distribution: $p(x_i \mid z_i) = \mathcal{N}(x \mid \mu_{z_i}, \Sigma_{z_i})$

Joint distribution over $x$ and $z$: $p(x, z) = p(z)\, p(x \mid z) = \pi_z \, \mathcal{N}(x \mid \mu_z, \Sigma_z)$

Marginal distribution 𝑝(𝑥)

$$p(x) = \sum_{z=1}^{K} p(x, z) = \sum_{z=1}^{K} \pi_z \, \mathcal{N}(x \mid \mu_z, \Sigma_z)$$

Imagine generating your data this way…

[Figure: an example draw: first $z_1 = 1$ is chosen, then $x_1$ is sampled from component 1]
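This two-stage generative process translates directly into code. A minimal sketch, reusing the illustrative `pis`, `mus`, `sigmas` from the previous snippet:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gmm(m, pis, mus, sigmas):
    """First draw z_i ~ Categorical(pi), then x_i ~ N(mu_{z_i}, Sigma_{z_i})."""
    zs = rng.choice(len(pis), size=m, p=pis)          # mixture component per point
    xs = np.stack([rng.multivariate_normal(mus[z], sigmas[z]) for z in zs])
    return xs, zs
```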

Page 4

Bayes rule

$$P(z \mid x) = \frac{P(x \mid z)\, P(z)}{P(x)} = \frac{P(x, z)}{\sum_{z} P(x, z)}$$

(posterior = likelihood $\times$ prior / normalization constant)

Prior: $p(z) = \pi_z$. Likelihood: $p(x \mid z) = \mathcal{N}(x \mid \mu_z, \Sigma_z)$.

Posterior: $p(z \mid x) = \dfrac{\pi_z \, \mathcal{N}(x \mid \mu_z, \Sigma_z)}{\sum_{z'} \pi_{z'} \, \mathcal{N}(x \mid \mu_{z'}, \Sigma_{z'})}$

Page 5

Details of EM

We want to learn the parameters that maximize the log-likelihood of the data:

$$\ell(\theta; D) = \sum_{i=1}^{m} \log \sum_{z_i=1}^{K} p(x_i, z_i \mid \theta)$$

This objective is nonconvex and difficult to maximize directly!

Expectation step (E-step): What do we take expectation over?

$$f(\theta) = E_{q(z_1, z_2, \ldots, z_m)}\left[\sum_{i=1}^{m} \log p(x_i, z_i \mid \theta)\right]$$

Maximization step (M-step): how to maximize? $\theta^{t+1} = \arg\max_\theta f(\theta)$

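Putting the two steps together gives the usual alternating loop. A schematic sketch; the `e_step` and `m_step` helpers are hypothetical here and are fleshed out on the next pages:

```python
def em(X, pis, mus, sigmas, n_iters=100):
    """Alternate E- and M-steps; each iteration cannot decrease l(theta; D)."""
    for _ in range(n_iters):
        tau = e_step(X, pis, mus, sigmas)   # E-step: responsibilities (next page)
        pis, mus, sigmas = m_step(X, tau)   # M-step: closed-form updates (page 8)
    return pis, mus, sigmas
```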

Page 6

E-step: what is $q(z_1, z_2, \ldots, z_m)$?

$q(z_1, z_2, \ldots, z_m)$: the posterior distribution of the latent variables,

$$q(z_1, z_2, \ldots, z_m) = \prod_{i=1}^{m} p(z_i \mid x_i, \theta^t)$$

For each data point 𝑥𝑖, compute 𝑝(𝑧𝑖 = 𝑘|𝑥𝑖) for each 𝑘

$$\tau_k^i = p(z_i = k \mid x_i) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{k'} \pi_{k'} \, \mathcal{N}(x_i \mid \mu_{k'}, \Sigma_{k'})}$$
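A concrete version of the hypothetical `e_step` from the loop sketch above, computing the responsibility matrix $\tau$ (rows are data points, columns are components):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, mus, sigmas):
    """tau[i, k] = p(z_i = k | x_i) under the current parameters."""
    m, K = X.shape[0], len(pis)
    tau = np.empty((m, K))
    for k in range(K):
        tau[:, k] = pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k])
    tau /= tau.sum(axis=1, keepdims=True)   # normalize each row (Bayes rule)
    return tau
```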

Page 7

E-step: compute the expectation

$$f(\theta) = E_{q(z_1, z_2, \ldots, z_m)}\left[\sum_{i=1}^{m} \log p(x_i, z_i \mid \theta)\right] = \sum_{i=1}^{m} E_{p(z_i \mid x_i, \theta^t)}\left[\log p(x_i, z_i \mid \theta)\right]$$

In the case of a Gaussian mixture:

$$f(\theta) = \sum_{i=1}^{m} E_{p(z_i \mid x_i, \theta^t)}\left[\log \pi_{z_i} - \tfrac{1}{2}(x_i - \mu_{z_i})^\top \Sigma_{z_i}^{-1} (x_i - \mu_{z_i}) - \tfrac{1}{2}\log|\Sigma_{z_i}| + c\right]$$

$$= \sum_{i=1}^{m} \sum_{k=1}^{K} \tau_k^i \left[\log \pi_k - \tfrac{1}{2}(x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k) - \tfrac{1}{2}\log|\Sigma_k| + c\right]$$

Page 8

M-step: maximize $f(\theta)$

$$f(\theta) = \sum_{i=1}^{m} \sum_{k=1}^{K} \tau_k^i \left[\log \pi_k - \tfrac{1}{2}(x_i - \mu_k)^\top \Sigma_k^{-1} (x_i - \mu_k) - \tfrac{1}{2}\log|\Sigma_k| + c\right]$$

For instance, suppose we want to find the $\pi_k$, subject to $\sum_{k=1}^{K} \pi_k = 1$.

Form the Lagrangian:

$$L = \sum_{i=1}^{m} \sum_{k=1}^{K} \tau_k^i \log \pi_k + \text{other terms} + \lambda\left(1 - \sum_{k=1}^{K} \pi_k\right)$$

Take partial derivative and set to 0

$$\frac{\partial L}{\partial \pi_k} = \sum_{i=1}^{m} \frac{\tau_k^i}{\pi_k} - \lambda = 0 \;\Rightarrow\; \pi_k = \frac{1}{\lambda} \sum_{i=1}^{m} \tau_k^i \;\Rightarrow\; \lambda = m$$

(Summing $\pi_k = \frac{1}{\lambda}\sum_i \tau_k^i$ over $k$ and using $\sum_k \pi_k = 1$ and $\sum_k \tau_k^i = 1$ gives $\lambda = m$, hence $\pi_k = \frac{1}{m}\sum_{i=1}^{m} \tau_k^i$.)
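A sketch of the corresponding `m_step`. The $\pi$ update is exactly the one just derived; the $\mu$ and $\Sigma$ updates come from setting the analogous derivatives of $f(\theta)$ to zero (not derived on these slides, but they are the standard EM-for-GMM formulas):

```python
import numpy as np

def m_step(X, tau):
    """Closed-form maximizers of f(theta) given responsibilities tau (m x K)."""
    m, K = tau.shape
    Nk = tau.sum(axis=0)                    # effective number of points per component
    pis = Nk / m                            # pi_k = (1/m) * sum_i tau_k^i
    mus = [tau[:, k] @ X / Nk[k] for k in range(K)]
    sigmas = []
    for k in range(K):
        d = X - mus[k]                      # centered data
        sigmas.append((tau[:, k][:, None] * d).T @ d / Nk[k])
    return pis, mus, sigmas
```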

Page 9

EM graphically

[Figure: the lower bound $f(\theta)$ and the log-likelihood $\ell(\theta; D)$; the E-step builds $f(\theta)$ to touch $\ell(\theta; D)$ at the current $\theta^t$ (value $\ell(\theta^t; D)$), and the M-step moves to the maximizer $\theta^{t+1}$ of $f(\theta)$]

Page 10

Experiments with mixture of 3 Gaussians


Page 11

Why do we need density estimation?

Learn the “shape” of the data cloud

Assess the likelihood of seeing a particular data point

Is this a typical data point? (high density value)

Is this an abnormal data point / outlier? (low density value)

Building block for more sophisticated learning algorithms

Classification, regression, graphical models …


Page 12

Outlier detection

Declare data points in low-density regions as outliers.

First fit a density $p(x)$, then set a threshold $t$.

Flag the data points in regions where the density is smaller than $t$, i.e., $p(x) \le t$.
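A minimal sketch of this two-stage recipe, using the hypothetical `gmm_density` from earlier as the fitted density; the threshold value is purely illustrative:

```python
import numpy as np

def flag_outliers(X, density, t):
    """Return a boolean mask: True where the fitted density is at most t."""
    p = np.array([density(x) for x in X])
    return p <= t

# Usage sketch:
# outliers = flag_outliers(X, lambda x: gmm_density(x, pis, mus, sigmas), t=1e-3)
```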

Page 13

Geometric method for outlier detection

Rather than first fitting a density and then thresholding, directly find the boundary between inliers and outliers.

[Figure: 2-D data points enclosed by a boundary separating inliers from outliers]

Page 14

Minimum enclosing ball

Find the smallest ball that includes all data points.

Parameterized with a center 𝑐 and radius 𝑟

Optimization problem (quadratically constrained quadratic programming)

$$\min_{r, c} \; r^2 \qquad \text{s.t.} \quad (x_i - c)^\top (x_i - c) \le r^2, \quad \forall i = 1, \ldots, m$$

[Figure: ball of radius $r$ centered at $c$ enclosing the data]
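Before deriving the solution by hand, note the problem can be handed to an off-the-shelf convex solver. A sketch using cvxpy (my tool choice, not the lecture's), writing the constraint in the equivalent second-order cone form $\|x_i - c\| \le r$:

```python
import cvxpy as cp
import numpy as np

def min_enclosing_ball(X):
    """Solve min r^2 s.t. (x_i - c)^T (x_i - c) <= r^2 for all i."""
    m, d = X.shape
    c = cp.Variable(d)
    r = cp.Variable(nonneg=True)
    constraints = [cp.norm(X[i] - c) <= r for i in range(m)]
    cp.Problem(cp.Minimize(r), constraints).solve()   # minimizing r also minimizes r^2
    return c.value, r.value
```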

Page 15

Find center 𝑐 and radius 𝑟

Lagrangian of the problem:

$$L = r^2 + \sum_{i=1}^{m} \alpha_i \left[(x_i - c)^\top (x_i - c) - r^2\right], \qquad \alpha_i \ge 0$$

Set partial derivative of 𝐿 to 0

$$\frac{\partial L}{\partial r} = 2r - 2\left(\sum_{i=1}^{m} \alpha_i\right) r = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i = 1$$

$$\frac{\partial L}{\partial c} = \sum_{i=1}^{m} \alpha_i (-2 x_i + 2c) = 0 \;\Rightarrow\; c = \sum_{i=1}^{m} \alpha_i x_i$$

($c$ and $r$ are not yet in explicit form; they depend on the unknown multipliers $\alpha_i$)

Page 16

Manipulate the Lagrangian

$$L = r^2 + \sum_{i=1}^{m} \alpha_i \left[(x_i - c)^\top (x_i - c) - r^2\right]$$

$$L = r^2 \left(1 - \sum_{i=1}^{m} \alpha_i\right) + \sum_{i=1}^{m} \alpha_i x_i^\top x_i - 2 c^\top \sum_{i=1}^{m} \alpha_i x_i + \left(\sum_{i=1}^{m} \alpha_i\right) c^\top c$$

Substituting $\sum_{i=1}^{m} \alpha_i = 1$ and $c = \sum_{i=1}^{m} \alpha_i x_i$:

$$L = \sum_{i=1}^{m} \alpha_i x_i^\top x_i - \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j x_i^\top x_j$$

Page 17

Express Lagrangian 𝐿 in 𝛼

$$\max_{\alpha} \; L(\alpha) = \sum_{i=1}^{m} \alpha_i x_i^\top x_i - \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j x_i^\top x_j$$

$$\text{s.t.} \quad \sum_{i=1}^{m} \alpha_i = 1, \quad \alpha_i \ge 0$$

A quadratic programming problem, $L(\alpha) = b^\top \alpha - \alpha^\top A \alpha$, with linear constraints, where $A_{ij} = x_i^\top x_j$ and $b_i = x_i^\top x_i$.
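This dual QP can likewise be solved numerically. A sketch with cvxpy (assuming a recent version that exposes `psd_wrap`, which marks the Gram matrix as positive semidefinite despite floating-point noise); the center is then recovered as $c = \sum_i \alpha_i x_i$:

```python
import cvxpy as cp
import numpy as np

def meb_dual(X):
    """Solve max b^T alpha - alpha^T A alpha  s.t.  sum(alpha) = 1, alpha >= 0."""
    A = X @ X.T                        # A_ij = x_i^T x_j (Gram matrix, PSD)
    b = np.diag(A)                     # b_i = x_i^T x_i
    alpha = cp.Variable(X.shape[0], nonneg=True)
    objective = cp.Maximize(b @ alpha - cp.quad_form(alpha, cp.psd_wrap(A)))
    cp.Problem(objective, [cp.sum(alpha) == 1]).solve()
    c = alpha.value @ X                # recover the center: c = sum_i alpha_i x_i
    return alpha.value, c
```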

Page 18

Experiment with minimum enclosing ball

[Figure: 2-D data points with the computed minimum enclosing ball]

Page 19

Classification

Represent the data

A label is provided for each data point, e.g., $y \in \{-1, +1\}$

Classifier


Page 20

Outline

What is, theoretically, the best classifier?

Bayes decision rule for minimum error.

What is the difficulty of achieving the minimum error?

What do we do in practice?


Page 21

Decision-making: divide high-dimensional space

Distributions of samples from normal (positive class) and abnormal (negative class) tissues

Page 22

Class conditional distribution

In the classification setting, we are given the label for each data point.

Class conditional distributions: $P(x \mid y = 1)$, $P(x \mid y = -1)$ (how to compute?)

Class priors: $P(y = 1)$, $P(y = -1)$ (how to compute?)

$$p(x \mid y = 1) = \mathcal{N}(x; \mu_1, \Sigma_1) \qquad p(x \mid y = -1) = \mathcal{N}(x; \mu_2, \Sigma_2)$$
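One answer to "how to compute?": estimate each class prior and class-conditional Gaussian from the labeled sample by maximum likelihood. A minimal sketch (the helper name is my own):

```python
import numpy as np

def fit_class_conditionals(X, y):
    """Per-class prior P(y) and Gaussian class-conditional (mu, Sigma)."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        params[label] = {
            "prior": len(Xc) / len(X),           # P(y = label)
            "mean": Xc.mean(axis=0),             # mu
            "cov": np.cov(Xc, rowvar=False),     # Sigma (sample covariance)
        }
    return params
```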

Page 23

How to come up with a decision boundary?

Given the class conditional distributions $P(x \mid y = 1)$, $P(x \mid y = -1)$, and the class priors $P(y = 1)$, $P(y = -1)$:

$$P(x \mid y = 1) = \mathcal{N}(x; \mu_1, \Sigma_1) \qquad P(x \mid y = -1) = \mathcal{N}(x; \mu_2, \Sigma_2)$$

[Figure: the two class-conditional Gaussians, with the decision boundary between them marked "?"]

Page 24

Use Bayes rule

$$P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)} = \frac{P(x, y)}{\sum_{y} P(x, y)}$$

(posterior = likelihood $\times$ prior / normalization constant)

Prior: $P(y)$. Likelihood (class conditional distribution): $p(x \mid y) = \mathcal{N}(x \mid \mu_y, \Sigma_y)$.

Posterior: $P(y \mid x) = \dfrac{P(y)\, \mathcal{N}(x \mid \mu_y, \Sigma_y)}{\sum_{y'} P(y')\, \mathcal{N}(x \mid \mu_{y'}, \Sigma_{y'})}$

Page 25

Bayes Decision Rule

Learning: the prior $p(y)$ and the class conditional distribution $p(x \mid y)$

The posterior probability of a test point:

$$q_i(x) := P(y = i \mid x) = \frac{P(x \mid y = i)\, P(y = i)}{P(x)}$$

Bayes decision rule:

If $q_i(x) > q_j(x)$, then $y = i$; otherwise $y = j$

Alternatively:

If the likelihood ratio $\ell(x) = \dfrac{P(x \mid y = i)}{P(x \mid y = j)} > \dfrac{P(y = j)}{P(y = i)}$, then $y = i$; otherwise $y = j$

Or look at the log-likelihood ratio $h(x) = -\ln \dfrac{q_i(x)}{q_j(x)}$
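A sketch of the rule in code for Gaussian class conditionals, reusing the hypothetical `fit_class_conditionals` output. Since the normalizer $P(x)$ is shared, comparing the $q_i(x)$ reduces to comparing $P(x \mid y = i)\, P(y = i)$:

```python
from scipy.stats import multivariate_normal

def bayes_classify(x, params):
    """Return the label i maximizing q_i(x), i.e. P(x | y = i) * P(y = i)."""
    scores = {label: p["prior"] * multivariate_normal.pdf(x, mean=p["mean"], cov=p["cov"])
              for label, p in params.items()}
    return max(scores, key=scores.get)
```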

Page 26

Example: Gaussian class conditional distribution

Depending on the Gaussian distributions, the decision boundary can be very different

Decision boundary: $h(x) = -\ln \dfrac{q_i(x)}{q_j(x)} = 0$

Page 27

Is the Bayesian decision rule optimal?

Optimal with respect to what criterion?

Bayes error: expected probability of a sample being incorrectly classified

Given a sample, we compute the posterior distributions $q_1(x) := P(y = 1 \mid x)$ and $q_2(x) := P(y = 2 \mid x)$; the probability of error is

$$r(x) = \min\{q_1(x), q_2(x)\}$$

Bayes error:

$$E_{p(x)}[r(x)] = \int r(x)\, p(x)\, dx$$

Page 28

Bayes error

The Bayes error:

$$\epsilon = E_{p(x)}[r(x)] = \int r(x)\, p(x)\, dx = \int \min\{q_1(x), q_2(x)\}\, p(x)\, dx$$

$$= \int \min\left\{\frac{P(y = 1)\, p(x \mid y = 1)}{p(x)},\; \frac{P(y = 2)\, p(x \mid y = 2)}{p(x)}\right\} p(x)\, dx$$

$$= \int \min\{P(y = 1)\, p(x \mid y = 1),\; P(y = 2)\, p(x \mid y = 2)\}\, dx$$
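For one-dimensional class conditionals, this last integral is easy to approximate on a grid. A sketch for two 1-D Gaussian classes with made-up parameters:

```python
import numpy as np
from scipy.stats import norm

# Illustrative 1-D example (made-up priors and Gaussians).
p1, p2 = 0.5, 0.5
xs = np.linspace(-10.0, 10.0, 10001)               # integration grid
integrand = np.minimum(p1 * norm.pdf(xs, loc=-1.0, scale=1.0),
                       p2 * norm.pdf(xs, loc=+1.0, scale=1.0))
bayes_error = integrand.sum() * (xs[1] - xs[0])    # Riemann-sum approximation
print(bayes_error)                                 # lower bound for any classifier
```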

Page 29

Bayes error II

$$\epsilon = \int \min\{P(y = 1)\, p(x \mid y = 1),\; P(y = 2)\, p(x \mid y = 2)\}\, dx$$

$$= \int_{L_1} P(y = 1)\, p(x \mid y = 1)\, dx + \int_{L_2} P(y = 2)\, p(x \mid y = 2)\, dx$$

[Figure: the curves $P(y = 1)\,p(x \mid y = 1)$ and $P(y = 2)\,p(x \mid y = 2)$, with $L_1$ and $L_2$ the regions over which each term is integrated]

Page 30

More on Bayes error

Bayes error is the lower bound on the probability of classification error.

Bayes decision rule is the theoretically best classifier, the one that minimizes the probability of classification error.

However, computing the Bayes error or the Bayes decision rule is in general a very complex problem. Why?

Need density estimation

Need to do integrals, e.g., $\int_{L_1} P(y = 1)\, p(x \mid y = 1)\, dx$

Page 31

What do people do in practice?

Use simplifying assumptions for $p(x \mid y = 1)$:

Assume 𝑝 𝑥 𝑦 = 1 is Gaussian

Assume 𝑝 𝑥 𝑦 = 1 is fully factorized

Rather than modeling the density $p(x \mid y = 1)$, directly go for the decision boundary $h(x) = -\ln \dfrac{q_i(x)}{q_j(x)}$:

Logistic regression

Support vector machine

Neural networks


Page 32

Naïve Bayes Classifier

Use Bayes decision rule for classification

$$P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}$$

But assume 𝑝 𝑥 𝑦 = 1 is fully factorized

$$p(x \mid y = 1) = \prod_{i=1}^{d} p(x_i \mid y = 1)$$

In other words, the variables corresponding to each dimension of the data are independent given the label.

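A minimal Gaussian naïve Bayes sketch built on this factorization: each class stores only per-dimension means and variances, and the per-dimension log-likelihoods are summed. Helper names are my own:

```python
import numpy as np
from scipy.stats import norm

def fit_naive_bayes(X, y):
    """Per-class prior plus per-dimension mean/std (fully factorized Gaussian)."""
    return {label: {"prior": np.mean(y == label),
                    "mean": X[y == label].mean(axis=0),
                    "std": X[y == label].std(axis=0)}
            for label in np.unique(y)}

def predict_naive_bayes(x, params):
    """argmax_y  log P(y) + sum_i log p(x_i | y)."""
    scores = {label: np.log(p["prior"]) + norm.logpdf(x, p["mean"], p["std"]).sum()
              for label, p in params.items()}
    return max(scores, key=scores.get)
```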