Red Wine Quality Assessment
![Page 1: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/1.jpg)
Red Wine Quality Evaluation
Weiyang Bi
Shilin Wang
Zheng Xue
![Page 2: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/2.jpg)
Data Description
• Source:
Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez
A. Cerdeira, F. Almeida, T. Matos and J. Reis, Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal, @2009
![Page 3: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/3.jpg)
Data Description
• The dataset covers the red variant of the Portuguese "Vinho Verde" wine.
• Due to privacy and logistical issues, only physicochemical (input) and sensory (output) variables are available.
![Page 4: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/4.jpg)
Dataset
Missing values check
> nrow(data[!complete.cases(data),])
[1] 0
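The same check can be tried on a small example. The sketch below is only an illustration: the data frame `toy` and its values are made up, and only `complete.cases()` comes from the slide.

```r
# Toy data frame standing in for the wine data (values are made up).
toy <- data.frame(
  alcohol = c(9.4, 9.8, NA, 10.5),
  pH      = c(3.51, 3.20, 3.26, NA),
  quality = c(5, 5, 6, 7)
)

# Number of rows with at least one missing value.
n_incomplete <- sum(!complete.cases(toy))

# Missing values per column, useful for locating the gaps.
na_per_column <- colSums(is.na(toy))

print(n_incomplete)
print(na_per_column)
```

A result of 0 for the first count, as on the slide, means no row needs to be dropped or imputed.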
![Page 5: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/5.jpg)
Attribute information
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
![Page 6: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/6.jpg)
Correlation matrix
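A correlation matrix like the one on this slide can be computed with base R's `cor()`. The sketch below uses simulated columns. The names mirror three of the attributes, but the values and the relationships between them are assumptions made only so the code runs.

```r
set.seed(1)
n <- 200

# Simulated stand-in columns; not the real wine measurements.
alcohol <- rnorm(n, mean = 10.4, sd = 1)
density <- 1.0 - 0.001 * alcohol + rnorm(n, sd = 0.0005)
quality <- round(5.6 + 0.4 * (alcohol - 10.4) + rnorm(n, sd = 0.7))
toy <- data.frame(alcohol, density, quality)

# Pearson correlation matrix, rounded for display.
cor_mat <- round(cor(toy), 2)
print(cor_mat)

# Correlation of each input with the output, sorted by absolute size.
inputs <- setdiff(colnames(cor_mat), "quality")
cor_with_quality <- cor_mat[inputs, "quality"]
print(cor_with_quality[order(abs(cor_with_quality), decreasing = TRUE)])
```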
![Page 7: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/7.jpg)
R code for training set and test set
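B <- 20                                  # number of random train/test splits
for (i in 1:B) {
  set.seed(i)                            # make split i reproducible
  indexes <- sample(1:nrow(data), size = 1000, replace = FALSE)
  train <- data[indexes, ]               # 1000 randomly chosen rows
  test <- data[-indexes, ]               # all remaining rows
  # train and test are overwritten on each pass, so any code that uses split i belongs inside this loop
}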
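One way to keep all 20 splits available after the loop finishes is to store them in a list. The sketch below is an alternative arrangement, not the authors' code. The helper `make_splits` and the data frame `toy` are hypothetical, and the 1599 rows match the size of the public red wine file rather than anything stated on the slides.

```r
# Return B random train/test splits of `df`, each with `n_train` training rows.
make_splits <- function(df, B = 20, n_train = 1000) {
  stopifnot(n_train < nrow(df))
  lapply(seq_len(B), function(i) {
    set.seed(i)  # same seeding scheme as the loop above
    idx <- sample(seq_len(nrow(df)), size = n_train, replace = FALSE)
    list(train = df[idx, , drop = FALSE], test = df[-idx, , drop = FALSE])
  })
}

# Toy stand-in with simulated values.
toy <- data.frame(x = rnorm(1599), quality = sample(3:8, 1599, replace = TRUE))
splits <- make_splits(toy)

# Row counts of the first three splits.
print(sapply(splits[1:3], function(s) c(train = nrow(s$train), test = nrow(s$test))))
```

Each element of `splits` holds one `train`/`test` pair, so a model can be fitted per split with `lapply()` and the error rates averaged afterwards.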
![Page 8: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/8.jpg)
Methods
Three methods were applied to the data set:
1) CART
2) Bagging
3) Random Forest
![Page 9: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/9.jpg)
Classification and Regression Trees (CART)
![Page 10: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/10.jpg)
CP Table
![Page 11: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/11.jpg)
Pruned Tree
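The CP table and pruning steps can be sketched with the `rpart` package. The slides do not say which package or response coding was used, so this is an assumption-laden illustration: the data are simulated, the binary label `good` is assumed, and pruning at the minimum cross-validated error is one common rule, not necessarily the one the authors applied.

```r
library(rpart)

set.seed(1)
n <- 500

# Simulated stand-in data; column names mirror attributes from the slides.
toy <- data.frame(
  alcohol          = rnorm(n, 10.4, 1),
  volatile.acidity = runif(n, 0.1, 1.2),
  sulphates        = runif(n, 0.3, 1.5)
)
score <- toy$alcohol - 10.4 - 2 * (toy$volatile.acidity - 0.6) + rnorm(n)
toy$good <- factor(ifelse(score > 0, "good", "bad"))  # assumed binary label

# Grow a deliberately large tree (small cp), then inspect the CP table.
fit <- rpart(good ~ ., data = toy, method = "class",
             control = rpart.control(cp = 0.001, xval = 10))
printcp(fit)

# Prune at the cp value with the lowest cross-validated error (xerror).
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# Variables actually used in splits of the pruned tree.
used <- setdiff(unique(as.character(pruned$frame$var)), "<leaf>")
print(used)
```

The variables that survive pruning are what a slide such as "Variable Selection" would report for CART.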
![Page 12: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/12.jpg)
![Page 13: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/13.jpg)
Variable Selection
[Figure: variables selected by the tree: alcohol, total sulfur dioxide, volatile acidity, sulphates, residual sugar]
![Page 14: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/14.jpg)
Number of splits
![Page 15: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/15.jpg)
Error rate
![Page 16: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/16.jpg)
Bagging
![Page 17: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/17.jpg)
Merging data
![Page 18: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/18.jpg)
CP Table
![Page 19: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/19.jpg)
Misclassification Rate
![Page 20: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/20.jpg)
Misclassification rate of 100 bagged trees
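Bagging fits one tree per bootstrap resample of the training set and predicts by majority vote. The sketch below shows that idea with `rpart` on simulated data. The variable names, the binary label, and the 400/200 split are assumptions for illustration, not the authors' setup.

```r
library(rpart)

set.seed(2)
n <- 600
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$good <- factor(ifelse(toy$x1 + 0.5 * toy$x2 + rnorm(n) > 0, "good", "bad"))
train <- toy[1:400, ]
test  <- toy[401:600, ]

# Fit one tree (default rpart settings) per bootstrap resample of the training set.
n_trees <- 100
trees <- lapply(seq_len(n_trees), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(good ~ ., data = boot, method = "class")
})

# Each tree votes; the bagged prediction is the majority class per test row.
votes <- sapply(trees, function(tr) as.character(predict(tr, test, type = "class")))
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

# Single tree on the full training set, for comparison.
single_pred <- predict(rpart(good ~ ., data = train, method = "class"),
                       test, type = "class")

misclass <- c(single = mean(single_pred != test$good),
              bagged = mean(bagged_pred != test$good))
print(misclass)
```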
![Page 21: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/21.jpg)
ROC Graph
[Figure: ROC curves; x-axis: 1 - Specificity, y-axis: Sensitivity, both from 0.0 to 1.0]
Best single tree: 0.64
Bagged 100 trees: 0.644
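The slide does not label the two values in the legend. If they are areas under the ROC curves, which is an assumption, they could be computed along the following lines. The helpers `roc_curve` and `auc` are hypothetical base-R functions, and the scores and labels are made up.

```r
# ROC points from scores and 0/1 labels (1 = positive class).
roc_curve <- function(score, label) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thresholds, function(th) mean(score[label == 1] >= th))  # sensitivity
  fpr <- sapply(thresholds, function(th) mean(score[label == 0] >= th))  # 1 - specificity
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the curve by the trapezoidal rule.
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Made-up predicted scores and true labels.
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
label <- c(1,   1,   0,   1,   0,    1,   0,   0)

roc <- roc_curve(score, label)
print(roc)
print(auc(roc))
```

An area of 0.5 corresponds to random guessing and 1.0 to perfect separation, so two curves can be compared by a single number each.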
![Page 22: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/22.jpg)
Frequency Table
![Page 23: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/23.jpg)
Evaluation of Variable Importance
![Page 24: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/24.jpg)
Random Forest
![Page 25: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/25.jpg)
Data Structure
![Page 26: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/26.jpg)
Random Forest Fit
![Page 27: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/27.jpg)
Random Forest Plot
![Page 28: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/28.jpg)
Importance
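A random forest fit and its variable importance can be sketched with the CRAN package `randomForest`. The slides do not confirm which package was used, so treat that as an assumption. The data and variable names below are simulated, and the block is skipped when the package is not installed.

```r
# Assumes the CRAN package 'randomForest'; the model code is skipped if it is missing.
rf_available <- requireNamespace("randomForest", quietly = TRUE)

set.seed(3)
n <- 400
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
toy$good <- factor(ifelse(toy$x1 + 0.5 * toy$x2 + rnorm(n, sd = 0.5) > 0,
                          "good", "bad"))

if (rf_available) {
  rf <- randomForest::randomForest(good ~ ., data = toy, ntree = 200,
                                   importance = TRUE)

  # Out-of-bag error after the last tree: an internal estimate of test error.
  oob_error <- rf$err.rate[rf$ntree, "OOB"]
  print(oob_error)

  # Permutation importance (mean decrease in accuracy), largest first.
  imp <- randomForest::importance(rf, type = 1)
  print(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE])
}
```

In the same package, `varImpPlot(rf)` draws the importance ranking and `partialPlot()` draws a partial dependence curve for one variable.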
![Page 29: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/29.jpg)
Relative Variable Importance
![Page 30: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/30.jpg)
Partial Dependence Plot
[Figure: four partial dependence plots, one each for alcohol, sulphates, volatile acidity, and total sulfur dioxide; y-axis: partial dependence]
![Page 31: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/31.jpg)
CART Bagging and RF Comparison
Variable selection
• CART: alcohol, total sulfur dioxide, volatile acidity, sulphates, residual sugar
• Bagging: alcohol, sulphates, volatile acidity, total sulfur dioxide, density, fixed acidity, residual sugar, citric acid, pH, free sulfur dioxide, chlorides
• Random Forest: alcohol, sulphates, volatile acidity, total sulfur dioxide, density, chlorides, fixed acidity, free sulfur dioxide, pH, citric acid, residual sugar
![Page 32: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/32.jpg)
CART Bagging and RF Comparison
![Page 33: Red Wine Quality Assessment](https://reader035.fdocuments.in/reader035/viewer/2022062302/58aa17611a28ab8a488b6eb5/html5/thumbnails/33.jpg)
Conclusion
• In this case, random forest is the best prediction tool: it has a lower estimated test error rate than both CART and bagging.