Clustering Seasonality Patterns in the Presence of Errors

Advisor: Dr. Hsu. Graduate: You-Cheng Chen. Authors: Mahesh Kumar, Nitin R. Patel, Jonathan Woo.


Transcript of Clustering Seasonality Patterns in the Presence of Errors

Page 1: Clustering Seasonality Patterns in the Presence of Errors

Clustering Seasonality Patterns in the Presence of Errors

Advisor: Dr. Hsu
Graduate: You-Cheng Chen
Authors: Mahesh Kumar, Nitin R. Patel, Jonathan Woo

Page 2: Clustering Seasonality Patterns in the Presence of Errors

Outline

Motivation
Objective
Introduction
Seasonality Estimation
Distance Function
Experimental results
Conclusions
Personal opinion

Page 3: Clustering Seasonality Patterns in the Presence of Errors

Motivation

Most traditional clustering algorithms assume that the data is provided without measurement error

Page 4: Clustering Seasonality Patterns in the Presence of Errors

Objective

To present a clustering method that incorporates the information contained in the error estimates of the data, together with a new distance function that is based on the distribution of those errors.

Page 5: Clustering Seasonality Patterns in the Presence of Errors

Introduction

Defining a good distance or dissimilarity function is a critical step in any distance-based clustering method.

Problem: Most traditional clustering methods assume that the data are free of error, but errors are natural in any data measurement.

Example: Sample average

Page 6: Clustering Seasonality Patterns in the Presence of Errors

Introduction

This study and results are focused on time-series clustering in the retail industry

This study assumes that each point comes from a multidimensional Gaussian distribution

Page 7: Clustering Seasonality Patterns in the Presence of Errors

Seasonality Estimation (1/4)

Seasonality is defined as the normalized underlying demand of a group of similar merchandise as a function of the time of year, after taking into account other factors that impact sales, such as discounts, inventory, promotions and random effects.

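$$\mathrm{Sale}_{it} = f_I(I_{it}) \cdot f_P(P_{it}) \cdot f_Q(Q_{it}) \cdot f_R(R_{it}) \cdot \mathrm{PLC}_i(t - t_i^0) \cdot \mathrm{Seas}_{it} \qquad (1)$$

After removing the effects of all these nonseasonal factors from (1):

$$\mathrm{Sale}_{it} = \mathrm{PLC}_i(t - t_i^0) \cdot \mathrm{Seas}_{it}$$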

Page 8: Clustering Seasonality Patterns in the Presence of Errors

Seasonality Estimation (2/4)

Let S be a set of items following similar seasonality. S consists of items having a variety of PLCs (product life cycles) that differ in shape and duration.

Theorem 1:

Page 9: Clustering Seasonality Patterns in the Presence of Errors

Seasonality Estimation (3/4)

If we take the average of weekly sales of all items in S then it would nullify the effect of PLCs as suggested by the following equations.

Page 10: Clustering Seasonality Patterns in the Presence of Errors

Seasonality Estimation (4/4)

Seasonality values, $\mathrm{Seas}_t$, can be estimated by appropriate scaling of the weekly sales average, $\mathrm{Sale}_t$.

The above procedure provides us with a large number of seasonal patterns, one for each set S, along with estimates of associated errors.
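The averaging-and-scaling step can be sketched roughly as follows. The slides do not state the scaling rule or how the errors are estimated, so two assumptions are made here: the averages are scaled so that the seasonality values have mean 1, and the error of each weekly value is taken to be the standard error of the mean across items. The function name `estimate_seasonality` is hypothetical.

```python
import math

def estimate_seasonality(weekly_sales):
    """Estimate (Seas_t, sigma_t) pairs for one set S of items.

    weekly_sales: one list per item, each holding T weekly sales values
    from which nonseasonal effects are assumed to be already removed.
    """
    n = len(weekly_sales)          # number of items in S (needs n >= 2)
    T = len(weekly_sales[0])       # number of weeks
    # Weekly sales average across all items in S.
    means = [sum(item[t] for item in weekly_sales) / n for t in range(T)]
    # Assumption: error of each weekly average = standard error of the mean.
    errors = []
    for t in range(T):
        var = sum((item[t] - means[t]) ** 2 for item in weekly_sales) / (n - 1)
        errors.append(math.sqrt(var / n))
    # Assumption: scale so that the seasonality values average to 1.
    scale = sum(means) / T
    return [(m / scale, e / scale) for m, e in zip(means, errors)]

pattern = estimate_seasonality([[10.0, 20.0, 30.0], [20.0, 40.0, 60.0]])
```

Each returned pair has the same (value, error) shape as the seasonalities compared by the distance function later in the talk.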

Page 11: Clustering Seasonality Patterns in the Presence of Errors

Distance Function(1/4)

Consider two seasonalities:

$A_i = \{(x_{i1}, \sigma_{i1}), (x_{i2}, \sigma_{i2}), \ldots, (x_{iT}, \sigma_{iT})\}$
$A_j = \{(x_{j1}, \sigma_{j1}), (x_{j2}, \sigma_{j2}), \ldots, (x_{jT}, \sigma_{jT})\}$

We define the similarity between two seasonalities as follows: under the null hypothesis $H_0: A_i \sim A_j$, the similarity between $A_i$ and $A_j$ is the probability of accepting the hypothesis.

The distance $d_{ij}$ between $A_i$ and $A_j$ is defined as (1 - similarity), which is the probability of rejecting $H_0$.

Page 12: Clustering Seasonality Patterns in the Presence of Errors

Distance Function(2/4)

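Consider the t-th samples of both seasonalities, $A_{it} = (x_{it}, \sigma_{it})$ and $A_{jt} = (x_{jt}, \sigma_{jt})$.

$$x_{it} - x_{jt} \sim N\left(\mu_{it} - \mu_{jt},\ (\sigma_{it}^2 + \sigma_{jt}^2)^{1/2}\right) \qquad (1)$$

If $A_i \sim A_j$ then $\mu_{it} = \mu_{jt}$, and consequently the statistic

$$\frac{x_{it} - x_{jt}}{\sqrt{\sigma_{it}^2 + \sigma_{jt}^2}}$$

follows a t-distribution.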

Page 13: Clustering Seasonality Patterns in the Presence of Errors

Distance Function(3/4)

Finally, the distance is

$$d_{ij} = \chi^2_{T-1}\left(\sum_{t=1}^{T} \frac{(x_{it} - x_{jt})^2}{\sigma_{it}^2 + \sigma_{jt}^2}\right)$$

Comparison with Euclidean Distance

$d_{ij}$ is monotonically increasing with respect to

$$\sum_{t=1}^{T} \frac{(x_{it} - x_{jt})^2}{\sigma_{it}^2 + \sigma_{jt}^2}$$
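A minimal sketch of the distance computation, using only the Python standard library. It assumes that the chi-square term in the distance is the cumulative distribution function with T - 1 degrees of freedom, which is the reading consistent with the distance being a rejection probability that increases with the summed statistic. The names `chi2_cdf` and `error_distance` are hypothetical, and the series used for the CDF is only suitable for moderate values of the statistic.

```python
import math

def chi2_cdf(x, dof):
    """Chi-square CDF via the series for the regularized lower
    incomplete gamma function P(dof/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    # P(a, z) = z^a e^(-z) / Gamma(a+1) * sum_n z^n / ((a+1)...(a+n))
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total and n < 10000:
        n += 1
        term *= z / (a + n)
        total += term
    log_prefix = a * math.log(z) - z - math.lgamma(a + 1.0)
    return min(1.0, math.exp(log_prefix) * total)

def error_distance(a_i, a_j):
    """Distance between two seasonalities given as lists of (x, sigma)
    pairs of equal length T (T >= 2)."""
    T = len(a_i)
    stat = sum((xi - xj) ** 2 / (si ** 2 + sj ** 2)
               for (xi, si), (xj, sj) in zip(a_i, a_j))
    # Assumption: d_ij is the chi-square CDF with T - 1 degrees of freedom.
    return chi2_cdf(stat, T - 1)

a = [(1.0, 0.1), (2.0, 0.2), (3.0, 0.1)]
near = [(1.05, 0.1), (2.0, 0.2), (3.1, 0.1)]
far = [(2.0, 0.1), (3.0, 0.2), (4.0, 0.1)]
d_near, d_far = error_distance(a, near), error_distance(a, far)
```

Because the CDF is monotone, ranking pairs by the summed statistic alone gives the same order as ranking them by the distance when all seasonalities have the same length T.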

Page 14: Clustering Seasonality Patterns in the Presence of Errors

Distance Function(4/4)

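Comparison with Euclidean Distance

If all σ’s were the same and equal to σ, then the rank order of $d_{ij}$ would become the rank order of (1), which is the same as the rank order of the Euclidean distance, (2).

$$\frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_{it} - x_{jt})^2 \qquad (1)$$

$$\sum_{t=1}^{T} (x_{it} - x_{jt})^2 \qquad (2)$$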
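The rank-order claim can be checked on a small made-up example. The sketch below compares the error-weighted statistic with the squared Euclidean distance: with equal σ the two give the same ordering, and with unequal σ they can disagree. All data values and function names are illustrative assumptions, not taken from the paper.

```python
def weighted_statistic(a_i, a_j):
    """Sum over t of (x_it - x_jt)^2 / (sigma_it^2 + sigma_jt^2)."""
    return sum((xi - xj) ** 2 / (si ** 2 + sj ** 2)
               for (xi, si), (xj, sj) in zip(a_i, a_j))

def squared_euclidean(a_i, a_j):
    """Sum over t of (x_it - x_jt)^2, ignoring the errors."""
    return sum((xi - xj) ** 2 for (xi, _), (xj, _) in zip(a_i, a_j))

# Case 1: every sigma is the same, so both measures rank alike.
s = 0.3
ref = [(1.0, s), (2.0, s), (3.0, s)]
candidates = {
    "a": [(1.1, s), (2.1, s), (3.0, s)],
    "b": [(1.5, s), (2.0, s), (3.5, s)],
    "c": [(0.0, s), (2.0, s), (3.0, s)],
}
rank_weighted = sorted(candidates, key=lambda k: weighted_statistic(ref, candidates[k]))
rank_euclid = sorted(candidates, key=lambda k: squared_euclidean(ref, candidates[k]))

# Case 2: q is farther in Euclidean terms, but its first value is very
# noisy, so the error-weighted statistic treats it as the closer one.
ref2 = [(1.0, 0.1), (2.0, 0.1)]
p = [(1.5, 0.1), (2.0, 0.1)]
q = [(2.0, 2.0), (2.0, 0.1)]
```

In the second case the large σ on the first value of `q` discounts its difference from `ref2`, which is the behaviour the error-based distance is designed to provide.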

Page 15: Clustering Seasonality Patterns in the Presence of Errors

Clustering Algorithm
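The content of this slide did not survive in the transcript, so the following is only a generic sketch of how an error-aware agglomerative clustering loop might look, not the authors' hError algorithm. It repeatedly merges the closest pair under the error-weighted statistic. The merge rule (inverse-variance weighted average) and all names are assumptions made for illustration.

```python
def weighted_statistic(a, b):
    """Error-weighted squared difference between two (x, sigma) series."""
    return sum((xa - xb) ** 2 / (sa ** 2 + sb ** 2)
               for (xa, sa), (xb, sb) in zip(a, b))

def merge(a, b):
    """Assumed merge rule: inverse-variance weighted average per week."""
    merged = []
    for (xa, sa), (xb, sb) in zip(a, b):
        wa, wb = 1.0 / sa ** 2, 1.0 / sb ** 2
        x = (wa * xa + wb * xb) / (wa + wb)
        sigma = (1.0 / (wa + wb)) ** 0.5
        merged.append((x, sigma))
    return merged

def cluster(patterns, k):
    """Greedy agglomerative clustering down to k clusters.
    Returns a list of clusters, each a sorted list of input indices."""
    clusters = [(list(p), [idx]) for idx, p in enumerate(patterns)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = weighted_statistic(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        new = (merge(clusters[i][0], clusters[j][0]),
               clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(new)
    return [sorted(members) for _, members in clusters]

patterns = [
    [(1.0, 0.1), (2.0, 0.1)],
    [(1.05, 0.1), (2.0, 0.1)],
    [(5.0, 0.1), (1.0, 0.1)],
    [(5.1, 0.1), (1.0, 0.1)],
]
result = cluster(patterns, 2)
```

The statistic is used directly instead of the full distance because, for series of equal length, both give the same ordering of pairs.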

Page 16: Clustering Seasonality Patterns in the Presence of Errors

Experimental Results (1/6)

Simulated Data

Figure 5: Individual (prior to clustering) seasonality estimates with associated errors

Page 17: Clustering Seasonality Patterns in the Presence of Errors

Experimental Results (2/6)

Figure 6: Seasonalities obtained by hError

Page 18: Clustering Seasonality Patterns in the Presence of Errors

Experimental Results (3/6)

Figure 7: Seasonalities obtained by kmeans and Ward’s method using Euclidean distances

Page 19: Clustering Seasonality Patterns in the Presence of Errors

Experimental Results (4/6)

Clustering Method    Average # misclassifications    Average Estimation Error
hError               0.87                            2.0182
Ward’s method        2.63                            4.7021
kmeans               2.94                            5.0337

Table 1: Average # misclassifications and Average Estimation Error for different clustering methods

Page 20: Clustering Seasonality Patterns in the Presence of Errors

Experimental Results (5/6)

$$\mathrm{ForecastError} = \frac{\sum_{t=1}^{T} \left| \mathrm{ActualSale}_t - \mathrm{ForecastSale}_t \right|}{\sum_{t=1}^{T} \mathrm{ActualSale}_t}$$

Clustering Method    Average Forecast Error %
hError               18.7
Ward’s               23.9
Kmeans               24.2
No clustering        31.5

Table 2: Average Forecast Error (Retailer Data)
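For reference, the forecast error measure on this slide can be computed as below. The sketch assumes the measure is the sum of absolute differences between actual and forecast sales divided by total actual sales (the absolute value is an assumption, since the formula is damaged in the transcript). The function name and the sample numbers are made up.

```python
def forecast_error(actual, forecast):
    """Sum of |actual - forecast| over all weeks, divided by total
    actual sales. Multiply by 100 to express it as a percentage."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("total actual sales must be nonzero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total_actual

# Illustrative data only: absolute errors 10 + 20 + 10 over 400 units sold.
err = forecast_error([100.0, 200.0, 100.0], [90.0, 220.0, 110.0])
```

With these sample values the error is 40 / 400, that is 10%.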

Page 21: Clustering Seasonality Patterns in the Presence of Errors

Experimental Results (6/6)

Page 22: Clustering Seasonality Patterns in the Presence of Errors

Conclusions

The distance function $d_{ij}$ is invariant to the scale of the data, and the proposed clustering method (hError) obtains better clusters than Ward’s method and kmeans in the experiments.

Page 23: Clustering Seasonality Patterns in the Presence of Errors

Personal Opinion

The concept of incorporating information about errors in the distance function is very good and can be used in many other clustering applications.