Ch. Eick Christoph F. Eick. Ch. Eick Post Analysis Project1 Disclaimer The main purpose of these...

Ch. Eick COSC 6335 Fall 2013 Post Analysis Project1 Christoph F. Eick


Page 1:

Ch. Eick

COSC 6335 Fall 2013

Post Analysis Project1

Christoph F. Eick

Page 2:

Post Analysis Project1

Disclaimer

The main purpose of these slides is not to criticize groups but rather to learn how to do a better job when analyzing data and interpreting data mining results.

Most of you do not have much experience in these tasks. Learning without making errors is impossible; therefore, students can benefit from discussing the errors of other students.

Visualization

Use large, high-resolution displays; some students used displays that did not reveal much because the points were plotted too densely.

The quality of the visualization impacts what you are able to see.

If you compare displays, put them next to each other!

Use the same coordinate system and scale in displays you compare.
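One simple way to follow the "same scale" advice is to compute a single axis range from all the data that will be shown side by side and apply it to every display. The Python sketch below is a hypothetical helper (the name `shared_limits` and the `pad` parameter are illustrative assumptions, and the values are synthetic), not part of the project specification.

```python
def shared_limits(*datasets, pad=0.05):
    """Return one (low, high) range that covers every value in all datasets.

    Applying the same range to every display being compared keeps their
    scales identical. `pad` widens the range by this fraction on each side
    so that extreme points are not drawn on the border (illustrative default).
    """
    values = [v for data in datasets for v in data]
    if not values:
        raise ValueError("at least one non-empty dataset is required")
    low, high = min(values), max(values)
    margin = (high - low) * pad
    return low - margin, high + margin


if __name__ == "__main__":
    # Synthetic values for two displays that will be placed side by side.
    display_a = [70, 85, 99, 120]
    display_b = [110, 140, 160, 197]
    print(shared_limits(display_a, display_b))
```

With matplotlib, `plt.subplots(1, 2, sharex=True, sharey=True)` enforces shared axes directly; in R, passing the same `xlim` and `ylim` to each `plot()` call has the same effect.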

Page 3:

Post Analysis Project1 Part 2

Interpretation

Scatter plots: the key question is whether the attribute or pair of attributes provides some evidence for the dominance of a particular class in a particular region of the attribute space, not whether the attribute pair clearly separates the classes.

Vague interpretation of quantitative results; e.g., “Att1 seems to be more important than Att2” versus “the fact that the regression coefficient of Att1 is 12 times as large as the regression coefficient of Att2 suggests that attribute Att1 has a much stronger impact on class membership”.

Overlooking patterns in displays; e.g., regions that are dominated by one class, or looking for patterns only in the E/W direction when there are also clear patterns in the N/S direction.

Not giving summaries at all, or giving only very “quick” summaries.

Page 4:

Some Displays

Page 5:

Discuss Scatter Plots generated by Group 8

Page 6:

Regression Results

No Scaling:

Multiple R-squared: 0.286, Adjusted R-squared: 0.282

Coefficients:
(Intercept): -0.9930791
V2: 0.0066490
V3: 0.0006933
V6: 0.0126270
V7: 0.1399540

With Scaling:

Mean values:
GlucoseConc: 121.6867628
BloodP: 72.4051842
BMI: 32.4574637
Pedigree: 0.4718763

Coefficients:
Intercept: 0.343923
scale(GlucoseConc): 0.204457
scale(BloodP): 0.008583
scale(BMI): 0.086987
scale(Pedigree): 0.046509

An R-squared of only about 0.28 (the model explains only about 28% of the variance in class membership) means that the results are suggestive but do not indicate a strong finding about the importance of the attributes.
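To turn a statement such as “GlucoseConc seems more important than BloodP” into a quantitative one, the two coefficient sets on this slide can be compared directly. The Python sketch below copies the numbers from the slide; the mapping V2 = GlucoseConc, V3 = BloodP, V6 = BMI, V7 = Pedigree is an assumption (it matches the order in which the scaled attributes are listed), not something the slide states. It also relies on the fact that R's `scale()` centers an attribute and divides it by its standard deviation, so a scaled coefficient equals the unscaled coefficient times that standard deviation when both models are fit to the same rows. The helper names are hypothetical.

```python
# Coefficients copied from the regression slide. ASSUMPTION: V2, V3, V6, V7 in
# the unscaled model are GlucoseConc, BloodP, BMI, Pedigree (in that order).
UNSCALED = {"GlucoseConc": 0.0066490, "BloodP": 0.0006933,
            "BMI": 0.0126270, "Pedigree": 0.1399540}
SCALED = {"GlucoseConc": 0.204457, "BloodP": 0.008583,
          "BMI": 0.086987, "Pedigree": 0.046509}


def rank_by_magnitude(coefs):
    """Attribute names ordered by absolute coefficient, largest first."""
    return sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)


def ratio(coefs, a, b):
    """How many times as large the coefficient of `a` is as that of `b`."""
    return coefs[a] / coefs[b]


def implied_std(unscaled, scaled):
    """scale(x) = (x - mean) / sd, so scaled coef = unscaled coef * sd.

    Dividing one by the other therefore recovers each attribute's sd
    (valid only if both models were fit to the same rows).
    """
    return {name: scaled[name] / unscaled[name] for name in scaled}


if __name__ == "__main__":
    print("unscaled ranking:", rank_by_magnitude(UNSCALED))
    print("scaled ranking:  ", rank_by_magnitude(SCALED))
    print(f"GlucoseConc / BloodP (scaled): {ratio(SCALED, 'GlucoseConc', 'BloodP'):.1f}")
    print(f"Pedigree / GlucoseConc (unscaled): {ratio(UNSCALED, 'Pedigree', 'GlucoseConc'):.1f}")
    for name, sd in implied_std(UNSCALED, SCALED).items():
        print(f"implied sd of {name}: {sd:.2f}")
```

Under this assumption, the unscaled model makes Pedigree look about 21 times as influential as GlucoseConc, while in the scaled model the GlucoseConc coefficient is about 4.4 times that of Pedigree and about 24 times that of BloodP. This reversal is why unscaled coefficients are hard to interpret.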

Page 7:

Box Plots (thanks to Group 10!)

Page 8:

Post Analysis Project1 Part 3

Statistical Summaries

If there were minor disagreements, I took away 1 point.

If the results did not make any sense, I took away a lot of points (this happened only once).

If it was not clear how the results were generated (no R code, incomplete R code, or lack of explanation), I also took away points.

Other

You were also supposed to interpret the histograms, but the project specification failed to ask you to do that! (Discuss another example in Review2.)

Importance of Attributes

GC (glucose concentration) is definitely very helpful for diagnosing diabetes (scatter plot, regression); e.g., if it is quite low, it is very unlikely that the person has diabetes (useful for a diabetes test).

BMI (box plot, scatter plot, regression coefficients) and, to a lesser extent, Pedigree have some usefulness in diagnosing diabetes.

No group presented evidence that DBP (diastolic blood pressure) has any usefulness in diagnosing diabetes, although it has a weak positive correlation of 0.28 with BMI.
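Both kinds of quantitative statement on this slide, a correlation such as the 0.28 between DBP and BMI and the observation that a low GC makes diabetes unlikely, can be backed by a few lines of code. The Python sketch below computes Pearson's correlation coefficient (assumed here to be the kind of correlation reported) and the fraction of positive cases below a cut-off. The function names, the threshold, and the toy records are hypothetical and are not taken from the project data, so the printed numbers say nothing about the real attributes.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def positive_rate_below(values, labels, threshold, positive=1):
    """Fraction of records labelled `positive` among those with value < threshold.

    Returns None when no record falls below the threshold.
    """
    below = [label for value, label in zip(values, labels) if value < threshold]
    if not below:
        return None
    return sum(1 for label in below if label == positive) / len(below)


if __name__ == "__main__":
    # Hypothetical toy records: glucose concentration and diabetes label (1 = yes).
    glucose = [85, 92, 99, 105, 130, 148, 160, 183]
    diabetic = [0, 0, 0, 1, 0, 1, 1, 1]
    print("positive rate for glucose < 100:", positive_rate_below(glucose, diabetic, 100))
    print("correlation of glucose with the label:", round(pearson(glucose, diabetic), 3))
```

Reporting the positive rate for a stated cut-off (and how many records fall below it) makes a claim like “if GC is quite low, diabetes is very unlikely” checkable, instead of leaving “quite low” undefined.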

Page 9:

Post Analysis Project1 Part 4

Linear Regression

If you do not scale the data, interpretation of the observed coefficients is quite complicated (see previous slide).

Lack of quantitative assessment of results.

Star Plots

What, in your opinion, is the usefulness of this technique? I myself have difficulty making sense of star plots, but some of you do seem to like them much more...

Conclusion/Other Findings

Half of the groups had quite short conclusions, and most summaries are somewhat vague; e.g., they do not write about the importance/usefulness of the attributes, the usefulness of the employed techniques, the knowledge about diabetes generated in Project1…

Project Weights Fall 2013

Project2>Project3??>Project4 Project1