Page 1

Penalized Trimmed Squares and Quadratic Mixed Integer Programming for Deleting Outliers in Fuzzy Linear Regression

Peiman Pazhoheshfar
Young Researchers Club, Azad University of Tafresh, Iran
P.Pazhohesh@gmail.com

Eleventh International Conference on Fuzzy Set Theory and Applications (FSTA 2012)

Page 2

Outline

1) Introduction

2) Fuzzy regression models

3) A mathematical programming approach

4) Quadratic mixed integer programming for penalized trimmed squares (PTS)

5) Numerical example

6) Conclusion

Page 3

1- Introduction

• The use of statistical linear regression is bounded by some strict assumptions about the given data.

• Fuzzy regression was introduced as an extension of conventional regression and is used to estimate the relationships among variables.

Page 4

1- Introduction

The goal of fuzzy regression (FR) analysis is to find a regression model that fits all observed fuzzy data within a specified fitting criterion.

Two approaches to FR:

1. Minimizing fuzziness as the optimality criterion.

• Simple to program and compute.

• Produces estimation ranges that are too wide to be of much help in applications.

2. Least squares of errors as the fitting criterion, minimizing the total squared error of the output.

Page 5

1- Introduction

In fuzzy linear regression models, data often contain outliers and bad influential observations.

If the data are contaminated with a single outlier or only a few, identifying such observations is not difficult.

Detection of outliers can identify system faults and fraud before they escalate with potentially catastrophic consequences.

Sources of outliers:

• Mechanical faults

• Changes in system behavior

• Instrument error

• Human error

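Page 6

2- Fuzzy regression models

$$\tilde{y} = \tilde{A}_0 X_0 + \tilde{A}_1 X_1 + \dots + \tilde{A}_N X_N$$

$$X = [X_0, X_1, \dots, X_N]^T, \qquad \tilde{A} = [\tilde{A}_0, \tilde{A}_1, \dots, \tilde{A}_N]^T, \qquad \tilde{A}_j = (\alpha_j, c_j)$$

$$\tilde{Y}_i = (\alpha_0, c_0) + (\alpha_1, c_1) X_1 + \dots + (\alpha_N, c_N) X_N$$

$$\mu_{\tilde{A}_j}(a_j) = \begin{cases} 1 - \dfrac{|\alpha_j - a_j|}{c_j}, & \alpha_j - c_j \le a_j \le \alpha_j + c_j, \\ 0, & \text{otherwise,} \end{cases} \qquad j = 1, 2, \dots, N$$

Figure: triangular membership function of $\tilde{A}_j$, running from $\alpha_j - c_j$ to $\alpha_j + c_j$.

$\alpha_j$ is its central value and $c_j$ is the spread value.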
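The triangular membership function of a fuzzy coefficient can be sketched in a few lines of Python. This is only an illustration: the function name `triangular_membership` and the handling of a zero spread are assumptions, not part of the slides.

```python
def triangular_membership(a: float, center: float, spread: float) -> float:
    """Membership of a in the symmetric triangular fuzzy number (center, spread)."""
    if spread <= 0:
        # Assumption: a zero spread is treated as a crisp number,
        # with full membership only at the center.
        return 1.0 if a == center else 0.0
    if center - spread <= a <= center + spread:
        return 1.0 - abs(center - a) / spread
    return 0.0


# Hypothetical coefficient with central value 2.0 and spread 0.5.
at_center = triangular_membership(2.0, 2.0, 0.5)     # full membership
half_way = triangular_membership(2.25, 2.0, 0.5)     # halfway down one side
outside = triangular_membership(3.0, 2.0, 0.5)       # outside the support
```

Membership falls linearly from 1 at the central value to 0 at a distance of one spread on either side.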

Page 7

2- Fuzzy regression models

$$\text{Minimize } c_0 + c_1 + c_2 + \dots + c_k$$

subject to:

$$\sum_{j=0}^{k} \alpha_j x_{ij} + (1 - H) \sum_{j=0}^{k} c_j x_{ij} \ge y_i + (1 - H) e_i, \qquad i = 1, \dots, n,$$

$$\sum_{j=0}^{k} \alpha_j x_{ij} - (1 - H) \sum_{j=0}^{k} c_j x_{ij} \le y_i - (1 - H) e_i, \qquad i = 1, \dots, n,$$

$$\alpha_j \text{ free}, \qquad c_j \ge 0, \qquad j = 0, \dots, k.$$

The terms $c_j x_j$ are supposed to be non-negative, because the fuzziness in estimated intervals usually increases for larger values of the independent variables.

The results are scale dependent, and many spreads $c_j$ might equal zero.

The total vagueness of the given data should be minimized.

To remedy this problem, the sum of spreads of the estimated intervals can be used as the objective function in place of the sum of spreads of the FR model's coefficients.

Page 8

2- Fuzzy regression models

$$\text{Minimize } \sum_{i=1}^{n} \sum_{j=0}^{k} c_j x_{ij}$$

subject to:

$$\sum_{j=0}^{k} \alpha_j x_{ij} + (1 - H) \sum_{j=0}^{k} c_j x_{ij} \ge y_i + (1 - H) e_i, \qquad i = 1, \dots, n,$$

$$\sum_{j=0}^{k} \alpha_j x_{ij} - (1 - H) \sum_{j=0}^{k} c_j x_{ij} \le y_i - (1 - H) e_i, \qquad i = 1, \dots, n,$$

$$\alpha_j \text{ free}, \qquad c_j \ge 0, \qquad j = 0, \dots, k.$$

Each H-certain estimated interval is required to contain the corresponding H-certain observed interval.

This results in large coefficient spreads $c_j$ if any observation of the dependent variable has a large spread $e_i$, or if there are outliers.
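The inclusion requirement and the interval-spread objective can be checked numerically for a candidate solution. The sketch below is not a solver: it only evaluates a given pair of coefficient vectors, it assumes non-negative inputs so that no absolute values are needed, and the function names and the small data set are hypothetical.

```python
def total_interval_spread(c, X):
    """Objective: sum over observations i and coefficients j of c_j * x_ij."""
    return sum(cj * xij for row in X for cj, xij in zip(c, row))


def satisfies_h_constraints(alpha, c, X, y, e, H, tol=1e-9):
    """True if every H-certain estimated interval contains the observed one."""
    for row, yi, ei in zip(X, y, e):
        center = sum(aj * xij for aj, xij in zip(alpha, row))
        spread = sum(cj * xij for cj, xij in zip(c, row))
        upper_ok = center + (1 - H) * spread >= yi + (1 - H) * ei - tol
        lower_ok = center - (1 - H) * spread <= yi - (1 - H) * ei + tol
        if not (upper_ok and lower_ok):
            return False
    return True


# Hypothetical data: x_i0 = 1 is the intercept column, y_i = 2 * x_i1.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
e = [0.5, 0.5, 0.5]          # observed spreads
alpha = [0.0, 2.0]           # candidate central values

wide_enough = satisfies_h_constraints(alpha, [0.5, 0.0], X, y, e, H=0.0)
too_narrow = satisfies_h_constraints(alpha, [0.25, 0.0], X, y, e, H=0.0)
objective = total_interval_spread([0.5, 0.0], X)
```

A candidate whose estimated spread is smaller than the observed spread fails the check, which is the mechanism that forces large $c_j$ when an observation is an outlier.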

Page 9

3- A mathematical programming approach

Penalized Trimmed Squares (PTS):

PTS is defined by minimizing a convex objective function (loss function), which is the sum of squared residuals and penalty costs for discarding bad observations.

The robust estimate is obtained as the unique optimum solution of the convex mathematical formulation called QMIP (quadratic mixed integer programming).

Assumptions:

• Crisp input

• Crisp output

• Relation between input and output = fuzzy function

Page 10

3- A mathematical programming approach

The basic idea is to insert fixed penalty costs into the loss function for possible deletion.

Only observations whose deletion reduces the loss by more than their penalty cost are deleted from the data set.

The proposed PTS estimator minimizes the sum of:

• the k squared residuals in the clean data, and

• the penalties for deleting the remaining observations.

M observations = k (clean data) + M-k (outliers)

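Page 11

3- A mathematical programming approach

$$\alpha^t = (\alpha_0, \alpha_1, \dots, \alpha_N), \qquad c^t = (c_0, c_1, \dots, c_N)$$

Figure: triangular membership function of $\tilde{A}$, running from $\alpha_j - c_j$ to $\alpha_j + c_j$.

$$Y_i^L = \sum_{j=0}^{N} (\alpha_j - c_j) X_{ij}$$

$$Y_i^{h=1} = \sum_{j=0}^{N} \alpha_j X_{ij}$$

$$Y_i^U = \sum_{j=0}^{N} (\alpha_j + c_j) X_{ij}$$

$$\tilde{Y}_i = (Y_i^L, \; Y_i^{h=1}, \; Y_i^U)$$

$$\mu_{\tilde{Y}_i}(Y_i) = \begin{cases} 1 - \dfrac{|Y_i - X_i^t \alpha|}{c^t |X_i|}, & X_i \ne 0, \\ 1, & X_i = 0, \; Y_i = 0, \\ 0, & X_i = 0, \; Y_i \ne 0, \end{cases} \qquad i = 1, 2, \dots, M$$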
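As a small worked illustration of the estimated fuzzy output, the following Python sketch computes the lower bound, center, and upper bound for one input row and evaluates the membership of a crisp value. It assumes non-negative inputs; clipping the membership at zero and the treatment of a zero spread are assumptions added here, and the function names and numbers are hypothetical.

```python
def fuzzy_prediction(alpha, c, x):
    """Return (Y_L, Y_center, Y_U) for one input row x, assuming x_j >= 0."""
    center = sum(aj * xj for aj, xj in zip(alpha, x))
    spread = sum(cj * xj for cj, xj in zip(c, x))
    return center - spread, center, center + spread


def output_membership(y, alpha, c, x):
    """Membership of the crisp value y in the estimated fuzzy output for x."""
    if all(xj == 0 for xj in x):
        return 1.0 if y == 0 else 0.0
    center = sum(aj * xj for aj, xj in zip(alpha, x))
    spread = sum(cj * abs(xj) for cj, xj in zip(c, x))
    if spread == 0:
        # Assumption: a zero spread is treated as a crisp prediction.
        return 1.0 if y == center else 0.0
    # Assumption: values outside the support get membership 0, not a negative number.
    return max(0.0, 1.0 - abs(y - center) / spread)


# Hypothetical coefficients and input row (first entry is the intercept column).
alpha = [1.0, 2.0]
c = [0.5, 0.25]
x = [1.0, 2.0]

bounds = fuzzy_prediction(alpha, c, x)          # lower, center, upper
mu = output_membership(5.5, alpha, c, x)        # halfway between center and upper bound
```

Requiring this membership to be at least h for every observation is what produces the pair of inequality constraints in the programming models.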

Page 12

3- A mathematical programming approach

$$1 - \frac{|Y_i - X_i^t \alpha|}{c^t |X_i|} \ge h, \qquad i = 1, 2, \dots, M.$$

The above analysis leads to the following quadratic programming problem:

$$\text{Min } \sum_{i=1}^{k} \Big( \sum_{j=0}^{N} \alpha_j x_{ij} - y_i \Big)^2 + \sum_{i=k+1}^{M} (c\sigma)^2$$

subject to:

$$\sum_{j=0}^{N} \alpha_j x_{ij} + (1 - H)^{1/2} \sum_{j=0}^{N} c_j x_{ij} \ge y_i, \qquad i = 1, \dots, M,$$

$$\sum_{j=0}^{N} \alpha_j x_{ij} - (1 - H)^{1/2} \sum_{j=0}^{N} c_j x_{ij} \le y_i, \qquad i = 1, \dots, M,$$

$$\alpha_j \in R, \qquad c_j \ge 0, \qquad j = 0, 1, 2, \dots, N, \qquad x_{i0} = 1.$$

The value $c\sigma$ can be interpreted as a threshold for the allowable size of the residuals.

Page 13

3- A mathematical programming approach

The constant c is the well-known robust cut-off parameter; here it acts as a cut-off between data outliers and prediction vagueness.

A threshold of 2.5σ or 3σ is reasonable under Gaussian conditions.

The penalty cost is defined a priori, and the estimator's performance is very sensitive to this penalty, which regulates the robustness and the efficiency of the estimator.

The term $(c\sigma)^2$ can be interpreted as a penalty cost for deleting any observation, where σ is a robust residual scale and c is a cut-off parameter.
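To make the penalty concrete, the sketch below computes a robust residual scale and the resulting penalty cost in Python. The slides do not say which robust scale is used; the median absolute deviation scaled by 1.4826 is an assumption chosen here because it is a common robust estimate of σ under Gaussian conditions. The residual values and function names are hypothetical.

```python
from statistics import median


def mad_scale(residuals):
    """Robust scale: 1.4826 * median absolute deviation (an assumed choice of sigma)."""
    med = median(residuals)
    return 1.4826 * median(abs(r - med) for r in residuals)


def deletion_penalty(sigma, c=2.5):
    """Penalty cost (c * sigma)^2 for deleting one observation."""
    return (c * sigma) ** 2


# Hypothetical residuals from a preliminary fit; the last one is far from the rest.
residuals = [0.1, -0.2, 0.05, -0.1, 0.15, 5.0]

sigma = mad_scale(residuals)
penalty = deletion_penalty(sigma, c=2.5)
# An observation exceeds the threshold when |residual| > c * sigma.
exceeds = [abs(r) > 2.5 * sigma for r in residuals]
```

Because the median ignores the single large residual, the threshold stays close to the scale of the clean residuals, and only the last observation exceeds it.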

Page 14

3- A mathematical programming approach

The aim is to construct a regression estimator that combines a high breakdown point with good efficiency.

For this purpose, appropriate penalties for high-leverage observations are developed, in order to:

• Unmask multiple outliers

• Delete bad high-leverage outliers while keeping all good high-leverage points

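Page 15

4- Quadratic mixed integer programming for PTS

Data: $(X_0, Y_0), (X_1, Y_1), \dots, (X_M, Y_M)$

Robust penalties = $(q\sigma)^2$

$$\text{Min } \sum_{i=1}^{M} \Big[ (1 - \delta_i) \Big( \sum_{j=0}^{N} \alpha_j x_{ij} - y_i \Big)^2 + \delta_i (q\sigma)^2 \Big]$$

subject to:

$$\sum_{j=0}^{N} \alpha_j x_{ij} + (1 - H)^{1/2} \sum_{j=0}^{N} c_j x_{ij} \ge y_i, \qquad i = 1, \dots, M,$$

$$\sum_{j=0}^{N} \alpha_j x_{ij} - (1 - H)^{1/2} \sum_{j=0}^{N} c_j x_{ij} \le y_i, \qquad i = 1, \dots, M,$$

$$\alpha_j \in R, \qquad c_j \ge 0, \qquad j = 0, 1, 2, \dots, N, \qquad x_{i0} = 1,$$

$$\delta_i \in \{0, 1\}, \qquad i = 1, 2, \dots, M,$$

$$\delta_i = \begin{cases} 1, & \Big( \sum_{j=0}^{N} \alpha_j x_{ij} - y_i \Big)^2 \ge (q\sigma)^2, \\ 0, & \text{otherwise.} \end{cases}$$

If $\delta_i = 1$, the residual is reduced to zero and the loss function is penalized with $(q\sigma)^2$.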
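The trade-off between squared residuals and deletion penalties can be illustrated on a tiny data set. The Python sketch below does not solve a QMIP: it swaps in a brute-force enumeration of every 0/1 deletion pattern, fits only the crisp central line by ordinary least squares, and ignores the spread constraints. The function names, the data, and the penalty value are all hypothetical.

```python
from itertools import product


def ols_line(xs, ys):
    """Closed-form least-squares intercept and slope for y = a0 + a1 * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None  # slope is undefined when all x values coincide
    a1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - a1 * mx, a1


def pts_brute_force(xs, ys, penalty):
    """Try every deletion pattern delta and return (loss, delta, (a0, a1))."""
    best = None
    for delta in product((0, 1), repeat=len(xs)):
        kept = [i for i, d in enumerate(delta) if d == 0]
        if len(kept) < 2:
            continue  # need at least two points to fit a line
        fit = ols_line([xs[i] for i in kept], [ys[i] for i in kept])
        if fit is None:
            continue
        a0, a1 = fit
        ssr = sum((a0 + a1 * xs[i] - ys[i]) ** 2 for i in kept)
        loss = ssr + penalty * sum(delta)  # squared residuals + deletion penalties
        if best is None or loss < best[0]:
            best = (loss, delta, (a0, a1))
    return best


# Hypothetical data on the line y = 2x, with one gross outlier at x = 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 30.0, 12.0]

loss, delta, (a0, a1) = pts_brute_force(xs, ys, penalty=1.0)
```

Enumeration grows as 2^M, so it is only practical for a handful of observations; that growth is the reason a mixed integer formulation and solver are used for realistic data sizes.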

Page 16

5- Numerical example

How does the proposed method perform in fuzzy regression analysis in comparison with other methods?

Example

Tanaka et al. (1989) designed an example to illustrate their regression model for the case of a crisp independent variable and a fuzzy dependent variable.

This example has fuzzy observations only for the dependent variable.

Methods used for comparison:

• Diamond (1988)

• Kim and Bishu (1998)

• Savic and Pedrycz (1991)

• Kao and Chyu (2003)

• Nasrabadi et al. (2003)

Page 17

5- Numerical example

Error in estimation, by method:

| i | x_i | (y_i, e_i) | Tanaka et al. | Diamond | Kim-Bishu | Kao-Chyu | Nasrabadi et al. | Proposed |
|---|-----|------------|---------------|---------|-----------|----------|------------------|----------|
| 1 | 1 | (8.0, 1.8) | 3.350 | 2.207 | 2.07 | 2.217 | 2.564 | 2.805 |
| 2 | 2 | (6.4, 2.4) | 2.850 | 3.050 | 3.025 | 3.024 | 2.813 | 2.170 |
| 3 | 3 | (9.5, 2.6) | 1.522 | 1.092 | 1.042 | 1.082 | 0.718 | 0.551 |
| 4 | 4 | (13.5, 2.6) | 2.257 | 2.844 | 2.902 | 2.812 | 3.062 | 2.073 |
| 5 | 5 | (13, 2.4) | 2.415 | 0.950 | 0.850 | 0.954 | 0.614 | 0.65 |
| Total error | | | 12.39 | 10.143 | 10.026 | 10.089 | 9.771 | 8.249 |

Tanaka et al. (1989) discussed three types of fuzzy regression models: the Min problem, the Max problem, and the Conjunction problem. For the sake of simplicity, the results of the Min problem at h = 0 are used for comparison.

Page 18

6- Conclusion

A new methodology for deleting outliers in fuzzy linear regression is presented, which reduces the problem to a quadratic mixed integer program.

The approach is shown to perform well when compared with other models in the fuzzy regression literature.

Page 19

Thanks for your attention