Ch. Eick
COSC 6335 Fall 2013
Post Analysis Project1
Christoph F. Eick
Post Analysis Project1

Disclaimer
- The main purpose of these slides is not to criticize groups but rather to learn how to do a better job when analyzing data and interpreting data mining results.
- Most of you do not have much experience in these tasks.
- Learning without making errors is impossible; therefore, students can benefit from discussing the errors of other students.

Visualization
- Use large, high-resolution displays; some students used displays that did not reveal much because their density was too high.
- The quality of the visualization impacts what you are able to see.
- If you compare displays, put them next to each other!
- Use the same coordinate system/scale in displays you compare.
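One way to follow the "same scale" advice is to compute the axis limits once, over all the data that will be compared, and reuse them for every display. The Python sketch below is only an illustration: the helper name `shared_limits`, the `pad` parameter, and the sample points are hypothetical and do not come from the project.

```python
def shared_limits(datasets, pad=0.05):
    """Return ((xmin, xmax), (ymin, ymax)) covering every (x, y) point
    in every dataset, widened by a fraction `pad` of each range."""
    xs = [x for data in datasets for x, _ in data]
    ys = [y for data in datasets for _, y in data]

    def widen(lo, hi):
        margin = (hi - lo) * pad
        return lo - margin, hi + margin

    return widen(min(xs), max(xs)), widen(min(ys), max(ys))


# Hypothetical points for two scatter plots that will be compared.
class_a = [(80, 22.0), (95, 27.5), (110, 30.1)]
class_b = [(120, 28.4), (160, 35.2), (190, 41.0)]

# One pair of limits, to be reused for both displays.
xlim, ylim = shared_limits([class_a, class_b], pad=0.0)
print(xlim, ylim)
```

The resulting limits can then be passed to each plot, for example through the `xlim` and `ylim` arguments of R's `plot`, so that displays placed next to each other share one coordinate system.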
Post Analysis Project1 Part2

Interpretation
- Scatter plot: the key question is whether the attribute or pair of attributes provides some evidence for the dominance of a particular class in a particular region of the attribute space, not whether the attribute pair clearly separates the classes.
- Vague interpretation of quantitative results; e.g. “Att1 seems to be more important than Att2” versus “the fact that the regression coefficient of Att1 is 12 times as large as the regression coefficient of Att2 suggests that attribute Att1 has a much stronger impact on class membership”.
- Overlooking patterns in displays; e.g. regions that are dominated by one class, or only looking for patterns in the E/W direction when there are also clear patterns in the N/S direction.
- Not giving summaries at all, or giving very “quick” summaries.
Some Displays
Discuss Scatter Plots generated by Group 8
Regression Results
No Scaling:
R2: Multiple R-squared: 0.286, Adjusted R-squared: 0.282
Coefficients:

| (Intercept) | V2 | V3 | V6 | V7 |
| --- | --- | --- | --- | --- |
| -0.9930791 | 0.0066490 | 0.0006933 | 0.0126270 | 0.1399540 |

With Scaling:
Mean Value:

| GlucoseConc | BloodP | BMI | Pedigree |
| --- | --- | --- | --- |
| 121.6867628 | 72.4051842 | 32.4574637 | 0.4718763 |

Coefficients:

| Intercept | scale(GlucoseConc) | scale(BloodP) | scale(BMI) | scale(Pedigree) |
| --- | --- | --- | --- | --- |
| 0.343923 | 0.204457 | 0.008583 | 0.086987 | 0.046509 |
The fact that the R2 is only about 0.28 suggests that the results are suggestive but do not indicate a strong finding about the importance of the attributes.
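The unscaled and scaled results are linked by simple arithmetic. For a least-squares fit, standardizing a predictor (subtracting its mean and dividing by its standard deviation) multiplies its coefficient by that standard deviation, and the intercept becomes the unscaled prediction at the attribute means. The Python sketch below checks this using only the numbers printed on the slide. It assumes that V2, V3, V6 and V7 correspond to GlucoseConc, BloodP, BMI and Pedigree in that order, which the slide does not state explicitly, and that both fits used the same rows; the variable names are illustrative.

```python
# Values copied from the slide. Mapping V2/V3/V6/V7 to the named
# attributes is an assumption based on their order.
names = ["GlucoseConc", "BloodP", "BMI", "Pedigree"]
raw_intercept = -0.9930791
raw_coef = [0.0066490, 0.0006933, 0.0126270, 0.1399540]
means = [121.6867628, 72.4051842, 32.4574637, 0.4718763]
scaled_intercept = 0.343923
scaled_coef = [0.204457, 0.008583, 0.086987, 0.046509]

# scaled coefficient = raw coefficient * standard deviation, so the
# ratio gives the standard deviation each attribute would need to have.
implied_sd = [s / r for s, r in zip(scaled_coef, raw_coef)]

# Intercept of the scaled model = raw prediction at the attribute means.
implied_intercept = raw_intercept + sum(b * m for b, m in zip(raw_coef, means))

# Rank attributes by the magnitude of their scaled coefficient.
ranking = sorted(zip(names, scaled_coef), key=lambda p: abs(p[1]), reverse=True)

for name, sd in zip(names, implied_sd):
    print(f"{name:12s} implied sd = {sd:.3f}")
print(f"implied intercept = {implied_intercept:.4f} (slide: {scaled_intercept})")
print("ranking by scaled coefficient:", [n for n, _ in ranking])
```

The implied intercept comes out near 0.342, close to but not identical with the 0.3439 on the slide; the gap may reflect rounding or a slightly different set of rows. On the raw scale, Pedigree has the largest coefficient only because it is measured in small units. On the scaled coefficients the order is GlucoseConc, BMI, Pedigree, BloodP, which is why scaling makes the coefficients easier to compare.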
Box Plots Thanks to Group 10!
Post Analysis Project1 Part3

Statistical Summaries
- If there were minor disagreements, I took away 1 point.
- If the results did not make any sense, I took away a lot of points (this only happened once).
- If it was not clear how the results were generated (no R code, incomplete R code, or lack of explanation), I also took away points.

Other
- You were also supposed to interpret the histograms, but the project specification failed to ask you to do that! Discuss another example in Review2.

Importance of Attributes
- GC is definitely very helpful for diagnosing diabetes (scatter plot, regression); e.g. if it is quite low, it is very unlikely that the person has diabetes (useful for a diabetes test).
- BMI (box plot, scatter plot, regression coefficients) and, to a lesser extent, Pedigree have some usefulness in diagnosing diabetes.
- No group presented evidence that DBP has any usefulness in diagnosing diabetes, although it has a weak positive correlation of 0.28 with BMI.
Post Analysis Project1 Part4

Linear Regression
- If you do not scale the data, interpretation of the observed coefficients is quite complicated (see previous slide).
- Lack of quantitative assessment of results.

Star Plots
- What is, in your opinion, the usefulness of this technique?
- I myself have difficulties making sense of those, but some of you seem to like star plots much more...

Conclusion/Other Findings
- Half of the groups gave quite short conclusions, and most summaries are somewhat vague; e.g. they do not write about:
  - The importance/usefulness of the attributes
  - The usefulness of the employed techniques
  - Knowledge about diabetes generated in Project1
  - …

Project Weights Fall 2013
- Project2>Project3??>Project4 Project1