Peiman Pazhoheshfar, Young Researchers Club, Azad University of Tafresh, Iran
Eleventh International Conference on Fuzzy Set Theory and Applications (FSTA 2012)
Penalized Trimmed Squares and Quadratic Mixed Integer Programming for Deleting Outliers in Fuzzy Linear Regression
Outline
1) Introduction
2) Fuzzy regression models
3) A mathematical programming approach
4) Quadratic mixed integer programming for penalized trimmed squares (PTS)
5) Numerical Example
6) Conclusion
• The use of statistical linear regression is restricted by strict assumptions about the given data.
• Fuzzy regression, an extension of conventional regression, is introduced for estimating the relationships among variables.
1- Introduction
The goal of fuzzy regression (FR) analysis is to find a regression model that fits all observed fuzzy data within a specified fitting criterion.
Two approaches to FR:
1. Minimizing fuzziness as the optimality criterion.
• Simple to program and compute.
• Produces estimated ranges that are too wide to be of much help in applications.
2. Least squares of errors as the fitting criterion, minimizing the total squared error of the output.
1- Introduction
In fuzzy linear regression models, the data often contain outliers and bad influential observations.
If the data are contaminated with a single outlier or only a few outliers, identifying such observations is not difficult.
Detection of outliers can identify system faults and fraud before they escalate with potentially catastrophic consequences.
1- Introduction
Sources of outliers:
• Mechanical faults
• Changes in system behavior
• Instrument error
• Human error
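Fuzzy linear regression model:
$$\tilde{Y}_i = (\alpha_0, c_0) + (\alpha_1, c_1) X_1 + \dots + (\alpha_N, c_N) X_N$$
$$X = [X_0, X_1, \dots, X_N]^T, \qquad \tilde{A} = [\tilde{A}_0, \tilde{A}_1, \dots, \tilde{A}_N]^T, \qquad \tilde{A}_j = (\alpha_j, c_j)$$
[Figure: triangular membership function of $\tilde{A}_j$, with support from $\alpha_j - c_j$ to $\alpha_j + c_j$]
$$\mu_{\tilde{A}_j}(a_j) = \begin{cases} 1 - \dfrac{|\alpha_j - a_j|}{c_j}, & \alpha_j - c_j \le a_j \le \alpha_j + c_j \\ 0, & \text{otherwise} \end{cases} \qquad j = 0, 1, \dots, N$$
$$\tilde{y} = \tilde{A}_0 X_0 + \tilde{A}_1 X_1 + \dots + \tilde{A}_N X_N$$
Here $\alpha_j$ is the central value of $\tilde{A}_j$ and $c_j$ is its spread.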
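The membership function of a symmetric triangular fuzzy coefficient with central value $\alpha_j$ and spread $c_j$ can be sketched in a few lines of Python. The function name and the handling of a zero spread are assumptions made for illustration; they are not part of the slides.

```python
def triangular_membership(a, center, spread):
    """Membership of the crisp value `a` in the symmetric triangular
    fuzzy number (center, spread): 1 - |center - a| / spread inside the
    support, 0 outside it."""
    if spread <= 0:
        # Assumed convention: a zero spread means a crisp coefficient.
        return 1.0 if a == center else 0.0
    return max(0.0, 1.0 - abs(center - a) / spread)


# Hypothetical coefficient with center 2.0 and spread 0.5.
print(triangular_membership(2.0, 2.0, 0.5))   # 1.0 at the center
print(triangular_membership(2.25, 2.0, 0.5))  # 0.5 halfway to the edge
print(triangular_membership(3.0, 2.0, 0.5))   # 0.0 outside the support
```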
2- Fuzzy regression models
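Minimize $c_0 + c_1 + c_2 + \dots + c_k$
subject to:
$$\sum_{j=0}^{k} \left(\alpha_j + (1-H)\, c_j\right) x_{ij} \ge y_i + (1-H)\, e_i, \qquad i = 1, \dots, n,$$
$$\sum_{j=0}^{k} \left(\alpha_j - (1-H)\, c_j\right) x_{ij} \le y_i - (1-H)\, e_i, \qquad i = 1, \dots, n,$$
$$\alpha_j \ \text{free}, \quad c_j \ge 0, \qquad j = 0, \dots, k.$$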
2- Fuzzy regression models
The spreads $c_j$ are supposed to be non-negative, because the fuzziness in estimated intervals usually increases for larger values of the independent variables.
The results are scale dependent, and many spreads may be equal to zero. The total vagueness of the given data should be minimized.
To remedy this problem, the sum of spreads of the estimated intervals can be used as the objective function in place of the sum of spreads of the FR model's coefficients.
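Minimize $\sum_{i=1}^{n} \sum_{j=0}^{k} c_j x_{ij}$
subject to:
$$\sum_{j=0}^{k} \left(\alpha_j + (1-H)\, c_j\right) x_{ij} \ge y_i + (1-H)\, e_i, \qquad i = 1, \dots, n,$$
$$\sum_{j=0}^{k} \left(\alpha_j - (1-H)\, c_j\right) x_{ij} \le y_i - (1-H)\, e_i, \qquad i = 1, \dots, n,$$
$$\alpha_j \ \text{free}, \quad c_j \ge 0, \qquad j = 0, \dots, k.$$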
Each H-certain estimated interval must contain the corresponding H-certain observed interval.
This results in large coefficient spreads $c_j$ if any observation of the dependent variable has a large spread $e_i$ or if there are outliers.
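The inclusion requirement can be checked numerically for a candidate set of coefficients. The sketch below assumes symmetric triangular outputs $(y_i, e_i)$, uses $|x_{ij}|$ in the spread term (identical to $x_{ij}$ for non-negative inputs), and evaluates the sum-of-interval-spreads objective. All function names, the coefficients, and the data are hypothetical illustrations.

```python
def estimated_interval(alpha, c, x, H):
    """H-certain estimated interval: center -/+ (1 - H) * spread."""
    center = sum(a * xj for a, xj in zip(alpha, x))
    spread = (1.0 - H) * sum(cj * abs(xj) for cj, xj in zip(c, x))
    return center - spread, center + spread


def observed_interval(y, e, H):
    """H-certain interval of the observed fuzzy output (y, e)."""
    return y - (1.0 - H) * e, y + (1.0 - H) * e


def covers(alpha, c, x, y, e, H):
    """True if the estimated interval contains the observed interval."""
    lo_est, hi_est = estimated_interval(alpha, c, x, H)
    lo_obs, hi_obs = observed_interval(y, e, H)
    return lo_est <= lo_obs and hi_obs <= hi_est


def total_spread(c, X):
    """Sum of the spreads of the estimated intervals over all rows of X."""
    return sum(cj * abs(xj) for x in X for cj, xj in zip(c, x))


# Hypothetical coefficients; each row of X starts with x_i0 = 1.
alpha, c = (5.0, 1.0), (2.0, 0.5)
X = [(1.0, 1.0), (1.0, 2.0)]
print(covers(alpha, c, X[1], y=7.0, e=2.0, H=0.0))   # ordinary point
print(covers(alpha, c, X[1], y=12.0, e=1.0, H=0.0))  # outlying point
print(total_spread(c, X))
```

An outlying observation fails the check unless the spreads are enlarged, which is how a single bad point inflates every estimated interval.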
2- Fuzzy regression models
Penalized Trimmed Squares (PTS):
PTS is defined by minimizing a convex objective function (loss function), which is the sum of the squared residuals and the penalty costs for discarding bad observations.
The robust estimate is obtained as the unique optimum solution of a convex mathematical program, a quadratic mixed integer program (QMIP).
Assumptions:
• Crisp input
• Crisp output
• Relation between input and output: fuzzy function
3- A mathematical programming approach
The basic idea is to insert fixed penalty costs into the loss function for possible deletion.
Only observations whose deletion reduces the loss by more than their penalty cost are deleted from the data set.
The proposed PTS estimator minimizes the sum of:
• the k squared residuals in the clean data, and
• the penalties for deleting the remaining observations.
3- A mathematical programming approach
M observations = k (clean data) + M - k (outliers)
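For a fixed set of residuals, the deletion rule amounts to charging each observation the smaller of its squared residual and the penalty cost. The following Python sketch illustrates that rule only; the function name and the sample residuals are hypothetical, and in the actual estimator the coefficients are optimized jointly with the deletion decisions.

```python
def pts_loss(residuals, penalty):
    """Penalized trimmed squares loss for fixed residuals.

    An observation is deleted only when its squared residual exceeds the
    penalty, so deleting it lowers the loss. Returns (loss, deleted)."""
    loss = 0.0
    deleted = []
    for i, r in enumerate(residuals):
        squared = r * r
        if squared > penalty:
            loss += penalty      # pay the fixed penalty instead
            deleted.append(i)
        else:
            loss += squared      # keep the observation
    return loss, deleted


# Hypothetical residuals; penalty (c * sigma)^2 with c = 2.5, sigma = 1.
loss, deleted = pts_loss([0.5, -1.0, 4.0, 0.0], penalty=6.25)
print(loss, deleted)  # 7.5 [2]
```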
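$$Y_i^{L} = \sum_{j=0}^{N} (\alpha_j - c_j)\, X_{ij}$$
$$Y_i^{h=1} = \sum_{j=0}^{N} \alpha_j X_{ij}$$
$$Y_i^{U} = \sum_{j=0}^{N} (\alpha_j + c_j)\, X_{ij}$$
$$\tilde{Y}_i = (Y_i^{L},\; Y_i^{h=1},\; Y_i^{U})$$
$$\mu_{\tilde{Y}_i}(Y_i) = \begin{cases} 1 - \dfrac{|Y_i - X_i^t \alpha|}{c^t |X_i|}, & X_i \ne 0 \\ 1, & X_i = 0,\ Y_i = 0 \\ 0, & X_i = 0,\ Y_i \ne 0 \end{cases} \qquad i = 1, 2, \dots, M$$
$$c^t = (c_0, c_1, \dots, c_N)$$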
3- A mathematical programming approach
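$$\alpha^t = (\alpha_0, \alpha_1, \dots, \alpha_N)$$
[Figure: triangular fuzzy coefficient $\tilde{A}_j$ with support from $\alpha_j - c_j$ to $\alpha_j + c_j$]
$$1 - \frac{|Y_i - X_i^t \alpha|}{c^t |X_i|} \ge h, \qquad i = 1, 2, \dots, M$$
$$\text{Min} \quad \sum_{i=1}^{k} \Big( \sum_{j=0}^{N} \alpha_j X_{ij} - y_i \Big)^2 + \sum_{i=k+1}^{M} (c\sigma)^2$$
subject to:
$$\sum_{j=0}^{N} \alpha_j x_{ij} + (1-H)^{1/2} \sum_{j=0}^{N} c_j |x_{ij}| \ge y_i, \qquad i = 1, \dots, M,$$
$$\sum_{j=0}^{N} \alpha_j x_{ij} - (1-H)^{1/2} \sum_{j=0}^{N} c_j |x_{ij}| \le y_i, \qquad i = 1, \dots, M,$$
$$\alpha_j \in R, \quad c_j \ge 0, \quad j = 0, 1, 2, \dots, N, \qquad x_{i0} = 1.$$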
3- A mathematical programming approach
The above analysis leads to the following quadratic programming problem:
The value $c\sigma$ can be interpreted as a threshold for the allowable size of the residuals.
The constant $c$ is the cut-off parameter familiar from robust statistics; here it separates outlying data from prediction vagueness.
$2.5\sigma$ or $3\sigma$ is a reasonable threshold under Gaussian conditions.
The penalty cost is defined a priori, and the estimator's performance is very sensitive to this penalty, which regulates the robustness and the efficiency of the estimator.
The term $(c\sigma)^2$ can be interpreted as the penalty cost for deleting any observation, where $\sigma$ is a robust residual scale and $c$ is a cut-off parameter.
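The slides do not say how the robust residual scale $\sigma$ is obtained. As an illustration only, the sketch below assumes the normalized median absolute deviation (MAD), a common robust scale estimate, and uses the cut-off $c = 2.5$ mentioned above. The function names and the sample residuals are hypothetical.

```python
from statistics import median


def mad_scale(residuals):
    """Assumed robust scale: 1.4826 * median absolute deviation, which is
    consistent with the standard deviation under Gaussian errors."""
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    return 1.4826 * mad


def penalty_cost(residuals, c=2.5):
    """Penalty (c * sigma)^2 for deleting one observation."""
    sigma = mad_scale(residuals)
    return (c * sigma) ** 2


# Hypothetical residuals from a preliminary fit; the last one is suspect.
residuals = [-1.0, -0.5, 0.0, 0.5, 1.0, 8.0]
penalty = penalty_cost(residuals)
flagged = [i for i, r in enumerate(residuals) if r * r > penalty]
print(round(penalty, 4), flagged)
```

Because the MAD is barely affected by the single large residual, the threshold stays tight and only that observation exceeds the penalty.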
3- A mathematical programming approach
The aim is to construct a regression estimator that combines a high breakdown point with good efficiency.
For this purpose, appropriate penalties for high-leverage observations are developed to:
• unmask multiple outliers;
• delete bad high-leverage outliers while keeping all good high-leverage points.
3- A mathematical programming approach
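Observations: $(X_0, Y_0), (X_1, Y_1), \dots, (X_M, Y_M)$
$$\text{Min} \quad \sum_{i=1}^{M} \Big[ \Big( \sum_{j=0}^{N} \alpha_j X_{ij} - y_i \Big)^2 + (q\sigma)^2\, \delta_i \Big]$$
subject to:
$$\sum_{j=0}^{N} \alpha_j x_{ij} + (1-H)^{1/2} \sum_{j=0}^{N} c_j |x_{ij}| \ge y_i, \qquad i = 1, \dots, M,$$
$$\sum_{j=0}^{N} \alpha_j x_{ij} - (1-H)^{1/2} \sum_{j=0}^{N} c_j |x_{ij}| \le y_i, \qquad i = 1, \dots, M,$$
$$\alpha_j \in R, \quad c_j \ge 0, \quad j = 0, 1, 2, \dots, N, \qquad x_{i0} = 1,$$
$$\delta_i \in \{0, 1\}, \qquad i = 1, 2, \dots, M,$$
$$\delta_i = \begin{cases} 1, & \Big( \sum_{j=0}^{N} \alpha_j x_{ij} - y_i \Big)^2 > (q\sigma)^2 \\ 0, & \text{otherwise.} \end{cases}$$
Robust penalties $= (q\sigma)^2$
If $\delta_i = 1$, the residual is reduced to zero and the loss function is penalized with $(q\sigma)^2$.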
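The effect of the 0/1 deletion variables can be illustrated on a tiny crisp data set by brute force: try every subset of kept observations, fit least squares on it, and add one penalty per deleted point. This sketch is a deliberate simplification. It ignores the fuzzy spreads and the possibility constraints, uses a single regressor, and enumerates subsets instead of calling a QMIP solver, so it is only practical for very small M. All names and data are hypothetical.

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least squares for y = a0 + a1 * x; None if x is constant."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a1 = sxy / sxx
    return mean_y - a1 * mean_x, a1


def pts_brute_force(points, penalty, min_keep=2):
    """Return (loss, (a0, a1), deleted_indices) minimizing
    squared residuals of kept points + penalty per deleted point."""
    m = len(points)
    best = None
    for k in range(min_keep, m + 1):
        for keep in combinations(range(m), k):
            kept = [points[i] for i in keep]
            coef = fit_line(kept)
            if coef is None:
                continue
            a0, a1 = coef
            sse = sum((a0 + a1 * x - y) ** 2 for x, y in kept)
            loss = sse + (m - k) * penalty
            if best is None or loss < best[0]:
                deleted = [i for i in range(m) if i not in keep]
                best = (loss, coef, deleted)
    return best


# Hypothetical data on the line y = x, with the last point contaminated.
data = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0), (5, 20.0)]
loss, coef, deleted = pts_brute_force(data, penalty=6.25)
print(loss, coef, deleted)
```

On this data the search deletes only the contaminated point and pays a single penalty, whereas a fit on all five points would be pulled far from the line through the first four.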
4- Quadratic mixed integer programming for PTS
How does the proposed method perform in fuzzy regression analysis in comparison with other methods?
This example has fuzzy observations only for the dependent variable.
Example
Tanaka et al. (1989) designed an example to illustrate their regression model for the case of a crisp independent variable and a fuzzy dependent variable.
• Diamond (1988)
• Kim and Bishu (1998)
• Savic and Pedrycz (1991)
• Kao and Chyu (2003)
• Nasrabadi et al. (2003)
5- Numerical example
i   x_i   (y_i, e_i)     Error in estimation
                         Tanaka et al.   Diamond   Kim-Bishu   Kao-Chyu   Nasrabadi et al.   Proposed
1   1     (8.0, 1.8)     3.350           2.207     2.07        2.217      2.564              2.805
2   2     (6.4, 2.4)     2.850           3.050     3.025       3.024      2.813              2.170
3   3     (9.5, 2.6)     1.522           1.092     1.042       1.082      0.718              0.551
4   4     (13.5, 2.6)    2.257           2.844     2.902       2.812      3.062              2.073
5   5     (13, 2.4)      2.415           0.950     0.850       0.954      0.614              0.65
Total error              12.39           10.143    10.026      10.089     9.771              8.249
In the study of Tanaka et al. (1989), three types of fuzzy regression models were discussed: the Min problem, the Max problem, and the Conjunction problem. For the sake of simplicity, the results of the Min problem at h = 0 are used for comparison.
5- Numerical example
A new methodology for deleting outliers in fuzzy linear regression is presented, which reduces the problem to a quadratic mixed integer program.
The approach is shown to perform well when compared to other models in fuzzy regression literature.
6- Conclusion
Thanks for your attention