Applications e
8/10/2019 Applications e
General Principles of Data Analysis
Choice of an appropriate statistical technique:
a complex issue
somewhat arbitrary
Real-life data often contain mixtures of different types of data
Two statisticians may select different methods,
depending upon what assumptions they are willing to take
into account
Extraneous factors:
availability of software and its limitations
availability of time and financial resources
Warnings
Figures allow us to calculate them
Applying different techniques and obtaining different results does not mean that something is wrong
Looking for an answer to the same question by using several
methods may lead to a better understanding
Obtaining negative results may be as informative as getting a
positive one
Obtaining no answer by using one technique does not mean
that there is no answer at all
Etc.
The choice of a statistical technique depends essentially upon the:
characteristics of the analysis question;
characteristics of the data;
characteristics of the sampling design.
Characteristics of the Analysis Question
Whether or not there is a distinction between independent and dependent
variables
Whether the nature of the research problem requires
description, exploration, estimation, or
testing of a hypothesis or model
Whether the focus of research is on 'variables' or 'objects'
Characteristics of the Data
Types of data sets
  Individuals - variables data sets
  Proximities data sets
    Variable - Variable Proximities
    Individual - Individual Proximities
Types of Variables
  Continuous or Quantitative Variables
  Discrete or Qualitative Variables
Variable types by measurement level
  Nominal-scale variables
  Ordinal-scale variables
  Interval-scale variables
  Ratio-scale variables
Techniques for problems with a distinction between independent and dependent variables
No. of dependent variables | No. of independent variables | Measurement level (dependent) | Measurement level (independent) | Analysis method
One | One | Nominal | Nominal | Non-parametric tests, Chi-square
One | One | Nominal (dichotomous) | Nominal | Multiple Classification Analysis
One | One | Nominal | Nominal (dichotomous) | Wilcoxon's two-sample test, Chi-square, Kolmogorov-Smirnov test
One | One | Interval-scale | Nominal (dichotomous) | t-test, Analysis of Variance
One | One | Interval-scale | Interval-scale | Regression Analysis
One | One | Interval-scale | Nominal | Analysis of Variance
One | More | Nominal | Interval-scale | Discriminant Analysis
One | More | Interval-scale | Nominal | Analysis of Variance, Multiple Regression Analysis, Multiple Classification Analysis
One | More | Interval-scale | Dummy | Analysis of Variance, Multiple Regression Analysis, Multiple Classification Analysis
One | More | Interval-scale | Interval-scale | Multiple Regression Analysis
Usual way of statistical problem solving
Formulate the question using the terms and logic of the specific
field of the problem (science management, pedagogy,
economics, etc.)
Reformulate the question using statistical terms and logic
Find appropriate statistical model(s) and technique(s)
Use the selected model(s) and technique(s)
Give a statistical interpretation of the results obtained
Reformulate the interpretation in terms of the original field
of application
Scientific products by country
Question in research management
Research groups have multiple outputs comprising publications,
patents, experimental materials, etc. What are the differences, if any,
in the performance of the research groups of selected countries?
Statistical question
Can we construct a reasonable productivity index using the
following measures of the scientific output?
Articles in country
Articles abroad
Original research reports
Patents
Algorithms and designs
Experimental material
Can we find a significant difference by country in the productivity
index?
Statistical model and technique
Partial order scoring for constructing the index of research output
Analysis of variance for testing the hypothesis concerning the
significance of the difference
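To make the idea of partial order scoring concrete, here is a minimal Python sketch under a simple assumption. One research group dominates another when it scores at least as high on every output measure and strictly higher on at least one. Its score is the number of groups it dominates minus the number of groups that dominate it. The function names are hypothetical, and POSCOR's actual scoring formula may differ from this count.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a >= b on every measure and a > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_scores(cases: List[Sequence[float]]) -> List[int]:
    """Hypothetical score: (#cases dominated by this case) - (#cases dominating it).

    Pairs that are better on some measures and worse on others are
    incomparable in the partial order and contribute nothing.
    """
    scores = []
    for i, a in enumerate(cases):
        others = [b for j, b in enumerate(cases) if j != i]
        scores.append(sum(dominates(a, b) for b in others)
                      - sum(dominates(b, a) for b in others))
    return scores


# Toy example with two output measures (not data from the study)
print(dominance_scores([(3, 2), (1, 1), (2, 3)]))  # prints [1, -2, 1]
```

In the toy example, (3, 2) and (2, 3) are incomparable, so each one gains a point only from dominating (1, 1).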
Use of the selected model and technique
RUN POSCOR
FILES
PRINT = POSCOR.LST
DICTIN = R2R3RU.DIC
DATAIN = R2RU.DAT
DICTOUT =POSCOR.DIC
DATAOUT =POSCOR.DAT
SETUP
POSCOR SCORES OF RU OUTPUTS
BADDATA=MD1 -
IDVAR=V2 -
TRANSVARS=(V1)
POSCOR ORDER=DESR -
ANAME= OUTPUT
VARS=(V116,V118,V122,V126,V128,V130)
RUN ONEWAY
FILES
PRINT = ONEWAY1.LST
DICTIN = POSCOR.DIC
DATAIN = POSCOR.DAT
SETUP
ANALYSIS OF VARIANCE OF RU OUTPUT
BADDATA=MD1 -
PRINT=CDICT DEPVARS=(V8) CONVARS=(R1)
RECODE
R1=RECODE V15 (40)=1, (360)=2, (410)=3, (638)=4, (844)=5, (868)=6
Use of the selected model and technique (results)
Code | N | Weight-sum | % of N | Mean | S.D. (estim.) | Sum of X | % of Sum of X | Sum of X-square
1 | 334 | 334 | 22.9 | 37.731 | 35.794 | 1.26E+04 | 16.8 | 9.02E+05
2 | 239 | 239 | 16.4 | 45.213 | 35.778 | 1.08E+04 | 14.4 | 7.93E+05
3 | 200 | 200 | 13.7 | 77.585 | 27.336 | 1.55E+04 | 20.7 | 1.35E+06
4 | 225 | 225 | 15.4 | 52.547 | 35.43 | 1.18E+04 | 15.7 | 9.02E+05
5 | 233 | 233 | 16 | 36.7 | 33.266 | 8.55E+03 | 11.4 | 5.71E+05
6 | 229 | 229 | 15.7 | 69.074 | 36.255 | 1.58E+04 | 21.1 | 1.39E+06
Total sum of squares 2048467
Between means sum of squares 330866.5
Within groups sum of squares 1717601
F(5,1454) 56.018
For 6 groups: Eta 0.4018943, Etasq 0.161519
For 6 groups: Eta(adj) 0.3982909, Etasq(adj) 0.1586357
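The ANOVA figures above can be checked from the group summaries alone. The Python sketch below uses a hypothetical helper that is not part of IDAMS. It recomputes the between-groups sum of squares as SS_between = sum n_i (mean_i - grand mean)^2 and the within-groups sum of squares as SS_within = sum (n_i - 1) s_i^2. From these it derives F = (SS_between / (k - 1)) / (SS_within / (N - k)), eta squared, and an adjusted eta squared defined as 1 - MS_within / MS_total. That last formula is our assumption for Etasq(adj); it is used here because it reproduces the printed 0.1586. The printed means and S.D.s are rounded, so the results match the ONEWAY output only approximately.

```python
from typing import Dict, Sequence


def oneway_from_summaries(n: Sequence[int], mean: Sequence[float],
                          sd: Sequence[float]) -> Dict[str, float]:
    """One-way ANOVA from per-group size, mean and estimated (n-1) S.D."""
    k = len(n)
    total_n = sum(n)
    grand_mean = sum(ni * mi for ni, mi in zip(n, mean)) / total_n
    # Between-groups SS: size-weighted squared deviations of group means
    ss_between = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, mean))
    # Within-groups SS: pooled (n_i - 1) * s_i^2, since s_i uses the n_i - 1 divisor
    ss_within = sum((ni - 1) * si ** 2 for ni, si in zip(n, sd))
    ss_total = ss_between + ss_within
    df_between, df_within = k - 1, total_n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / ss_total
    # Assumed definition of the adjusted value: 1 - MS_within / MS_total
    eta_sq_adj = 1 - (ss_within / df_within) / (ss_total / (total_n - 1))
    return {"F": f, "df_between": df_between, "df_within": df_within,
            "eta_sq": eta_sq, "eta_sq_adj": eta_sq_adj}


result = oneway_from_summaries(
    n=[334, 239, 200, 225, 233, 229],
    mean=[37.731, 45.213, 77.585, 52.547, 36.7, 69.074],
    sd=[35.794, 35.778, 27.336, 35.43, 33.266, 36.255],
)
print({key: round(value, 4) for key, value in result.items()})
```

With these inputs, F comes out close to 56.0 on (5, 1454) degrees of freedom, and eta squared is close to 0.16. Both are in line with the table.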
Statistical interpretation
The value F(5,1454) = 56.018 shows that there is a highly
significant difference by country in the constructed performance
index. We also see a medium-strength differentiation between the
countries: Eta(adj) = 0.398.
The mean values show the level of each country.
Interpretation for research management
There are two countries with a low (codes 1 and 5), two with a medium (codes 2 and 4) and two
with a high (codes 3 and 6) productivity index.
Source
P.S. Nagpaul: Guide to Advanced Data Analysis using IDAMS Software
Performance, motivation and creativity of schoolchildren
Question in psychology - pedagogy
Intellectual performance, motivation and creativity of school children can
be measured by using several indicators. Some of them are produced by
the children themselves (e.g. IQ tests), others are based on the evaluation
given by their teachers (e.g. average grade). What are the perceivable
dimensions, if any, behind these indicators?
Statistical question
In the set of the listed indicators, are there any groups within which
statistical inter-correlation, and between which statistical independence,
can be detected?
Indicators (C: produced by the children; T: based on the teachers' evaluation):
C IQ
C Achievement motivation
C Creativity test
C Creative attitude
T Average grade
T Creative behaviour
T Motivated behaviour
T Motivation index
Statistical model and technique
Pearsonian correlation between the measured indicators
Multidimensional scaling, cluster analysis
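The correlation-then-clustering step can be illustrated with a minimal Python sketch. It computes Pearson correlations between indicators, then runs average-linkage agglomerative clustering with the distance 1 - r. Average linkage on 1 - r is a swapped-in technique chosen only for illustration, and CLUSFIND's own algorithms and options may produce different trees. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(data: Dict[str, Sequence[float]]
                    ) -> List[Tuple[frozenset, frozenset, float]]:
    """Merge indicator clusters step by step; returns (cluster_a, cluster_b, distance)."""
    names = list(data)
    # Distance between two indicators: 1 - r (0 for perfectly correlated indicators)
    dist = {frozenset((a, b)): 1 - pearson(data[a], data[b])
            for a, b in combinations(names, 2)}
    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average distance over all indicator pairs across the two clusters
            d = sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((c1, c2, d))
    return merges
```

The input is a mapping from each indicator name to its scores across the children. The returned merge list shows which indicators join first. Strongly inter-correlated indicators merge at small distances, and independent groups join only at the end.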
Use of the selected model and technique
Executing PEARSON, MDSCAL and CLUSFIND in IDAMS
MDSCAL result
[Figure: MDSCAL configuration of the indicators, with groups labelled Teachers and Children]
Use of the selected model and technique
CLUSFIND result
[Figure: CLUSFIND cluster tree of the eight indicators (C IQ, C Creativity test, C Creative attitude, C Achievement motivation, T Average grade, T Creative behaviour, T Motivated behaviour, T Motivation index); levels shown: 0,75; 0,71; 0,40; 0,45; 0,27; 0,13; 0,02]
Statistical interpretation
Multidimensional scaling shows a clear separation of the indicators produced
by children and by teachers
Cluster analysis supports the finding of a separation between the variables coming from teachers and those coming from children
Pedagogical/psychological interpretation
Just one aspect: the ratings given by teachers to children are nearly the
same, independently of the evaluated ability, attitude or behaviour
dimension
Source
M. Hunya: Multidimensional statistical techniques in pedagogical studies
Data
A. Deak, B. Kozeki: Study into the effect of motivation and creativity factors on the
performance of school children
Prediction of river flow values
Question in hydrology
We have water level data on four rivers in North Africa (more
than 40 years). Can the water flow level be predicted on the basis of
data from the past? If so, with what precision?
What if the average flow level is considered instead of the individual
values?
Statistical question
Can the river flow values be predicted by using a set of values from the preceding period?
How does the prediction change if the 6-month average flow is
used?
Statistical model and technique
Autoregression model (with a lag of 12 to 36) applied to the river flow
time series
Transformation of the original data into a time series of moving averages (interval length = 6)
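The two steps can be sketched in Python as follows, under two assumptions: the autoregression is an ordinary least-squares fit with an intercept, and the moving average uses a sliding 6-month window. The helper names are hypothetical. The IDAMS time series facility may estimate the model and its "unbiased" R**2 differently; the usual adjusted R**2 is returned here as a stand-in.

```python
from typing import Tuple

import numpy as np


def moving_average(series: np.ndarray, width: int = 6) -> np.ndarray:
    """Mean of each sliding window of `width` consecutive values."""
    kernel = np.ones(width) / width
    return np.convolve(series, kernel, mode="valid")


def fit_autoregression(series: np.ndarray, lag: int) -> Tuple[np.ndarray, float, float]:
    """Least-squares fit of x_t on x_{t-1}, ..., x_{t-lag} plus an intercept.

    Returns the coefficients (intercept first), R**2 and adjusted R**2.
    """
    n = len(series)
    # Column j holds the series shifted by j + 1 steps, aligned with targets x_lag .. x_{n-1}
    lagged = np.column_stack([series[lag - j - 1:n - j - 1] for j in range(lag)])
    design = np.column_stack([np.ones(n - lag), lagged])
    target = series[lag:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    centred = target - target.mean()
    r_squared = 1 - (residuals @ residuals) / (centred @ centred)
    m = len(target)
    adj_r_squared = 1 - (1 - r_squared) * (m - 1) / (m - lag - 1)
    return coef, float(r_squared), float(adj_r_squared)


# Synthetic 12-month seasonal series (not the UNESCO river data)
months = np.arange(240)
toy = 100 + 20 * np.sin(2 * np.pi * months / 12)
coef, r2, adj_r2 = fit_autoregression(toy, lag=2)
smoothed = moving_average(toy, width=6)
```

A pure sinusoid obeys a second-order recurrence, so lag 2 fits the toy series essentially exactly. Real flow series, with irregular peaks, behave like the original series in the table that follows.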
Use of the selected model and technique
Time Series Analysis option from the IDAMS interactive facilities
Lag | R**2, original series | R**2, moving average series
12 months | 0,32 | 0,92
24 months | 0,35 | 0,93
36 months | 0,36 | not reported
Use of the selected model and technique
[Figures: original series; moving average series]
Statistical interpretation
Autoregression shows that individual values can be predicted with moderate
precision (unbiased R**2 = 0,32 - 0,36 for lags of 12 to 36 months); high peak values are very poorly reproduced.
In the case of a 6-month moving average, the prediction is nearly perfect
(unbiased R**2 = 0,92 for 12 months).
Hydrological interpretation
Although the pattern of changes can be reproduced fairly well, even three
years of past data are not at all enough to predict the height of peak flows.
But if we consider 6-month averages, they can be predicted almost with
full precision.
Data
UNESCO, Water Science Division
Business
Question concerning company management
What are the factors that influence the economic performance
of a company? Economic performance is measured by the
return on capital employed.
Statistical question
Can the return on capital be predicted by using a set of
economic and production indicators selected from those characterizing
the company?
How does the prediction change if we look for a subset of
best predictors?
Statistical model and technique
Multiple linear regression
Stepwise regression
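Stepwise regression can be sketched in Python as forward selection. At each step, the sketch adds the predictor that most reduces the residual sum of squares. It stops when that predictor's partial F statistic falls below an entry threshold. The sketch is forward-only: it never removes a variable once it has entered. The function names and the default threshold of 4.0 are assumptions, and REGRESSN's stepwise options and defaults may differ.

```python
from typing import List

import numpy as np


def residual_ss(X: np.ndarray, y: np.ndarray, cols: List[int]) -> float:
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def forward_stepwise(X: np.ndarray, y: np.ndarray, f_to_enter: float = 4.0) -> List[int]:
    """Add predictors one at a time while the best candidate's partial F >= f_to_enter."""
    n, p = X.shape
    selected: List[int] = []
    current = residual_ss(X, y, selected)
    while len(selected) < p:
        candidates = [c for c in range(p) if c not in selected]
        best = min(candidates, key=lambda c: residual_ss(X, y, selected + [c]))
        new = residual_ss(X, y, selected + [best])
        df_resid = n - len(selected) - 2  # n minus (intercept + already selected + candidate)
        partial_f = np.inf if new == 0 else (current - new) / (new / df_resid)
        if partial_f < f_to_enter:
            break
        selected.append(best)
        current = new
    return selected


# Synthetic data: only columns 1 and 4 actually drive y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 1] - 2 * X[:, 4] + 0.5 * rng.normal(size=200)
print(forward_stepwise(X, y))
```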
Use of the selected model and technique
Running REGRESSN
Results
The full regression model explains 70% of the variance of the dependent
variable (adjusted). Its standard error is about one half of
the mean, and the determinant of the correlation matrix is
.79478E-05. There are 8 variables (out of 12) with high
covariance ratio values. The stepwise regression model selects 3 variables explaining
80% of the variance. No multicollinearity (0.77647). The standard error of
the estimate of the dependent variable is 0.06135, which is quite
low: high reliability of estimation.
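The multicollinearity diagnostic quoted above can be illustrated with a short Python sketch. It computes the determinant of the predictors' correlation matrix, which equals 1 for uncorrelated predictors and approaches 0 as they become nearly linearly dependent. That is why a value such as .79478E-05 signals strong multicollinearity. The function name is hypothetical, and the data are synthetic.

```python
import numpy as np


def correlation_determinant(X: np.ndarray) -> float:
    """Determinant of the correlation matrix of the columns (predictors) of X."""
    return float(np.linalg.det(np.corrcoef(X, rowvar=False)))


rng = np.random.default_rng(1)
independent = rng.normal(size=(500, 3))
# Third column is almost an exact linear combination of the first two
collinear = np.column_stack([independent[:, :2],
                             independent[:, 0] + independent[:, 1] + 1e-3 * rng.normal(size=500)])
print(correlation_determinant(independent))  # close to 1
print(correlation_determinant(collinear))    # close to 0
```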
Statistical interpretation
Full regression model: the reliability of prediction is poor. Strong
multicollinearity is shown. The variables which contribute to the
multicollinearity can be identified.
Stepwise regression model: 3 variables explain 80% of the
variance. No multicollinearity. High reliability of estimation.
Interpretation for management
Although the full indicator set can give a good prediction, it cannot
be recommended for real use because of its poor prediction reliability.
But if we consider 3 carefully selected indicators, we can get a
fair prediction.
Source
P.S. Nagpaul, India
Education
Question concerning measurement of knowledge level
Tests are used very often in education for checking the level of
knowledge in one subject or another. Long tests with many
questions can meet the reliability requirement relatively easily.
The question is whether we can make a short interactive, adaptive test
from a long test while preserving at least nearly the original reliability.
Statistical question
Can we give a good estimate of the original test value by using a tree-structure-based prediction?
Statistical model and technique
Regression tree
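Here is a minimal Python sketch of a regression tree in the spirit of SEARCH. It recursively splits the respondents on the single question whose 0/1 answer most reduces the within-group sum of squares of the full-test score. The resulting tree is what makes the test adaptive: a respondent only answers the questions on their own path to a leaf. The depth and minimum-group-size limits are assumed values, the function names are hypothetical, and SEARCH's actual splitting and stopping rules may differ.

```python
from typing import Any, Dict, List, Sequence

Node = Dict[str, Any]


def sse(values: Sequence[float]) -> float:
    """Within-group sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(answers: List[Sequence[int]], scores: List[float],
               depth: int = 0, max_depth: int = 4, min_size: int = 5) -> Node:
    """Answers are coded 0 (wrong) / 1 (right); leaves hold the mean full-test score."""
    mean = sum(scores) / len(scores)
    if depth == max_depth or len(scores) < 2 * min_size:
        return {"estimate": mean}
    best_gain, best_q = 0.0, None
    for q in range(len(answers[0])):
        wrong = [s for a, s in zip(answers, scores) if a[q] == 0]
        right = [s for a, s in zip(answers, scores) if a[q] == 1]
        if len(wrong) < min_size or len(right) < min_size:
            continue
        gain = sse(scores) - sse(wrong) - sse(right)
        if gain > best_gain:
            best_gain, best_q = gain, q
    if best_q is None:  # no admissible split reduces the SSE: make a leaf
        return {"estimate": mean}
    groups = {0: ([], []), 1: ([], [])}
    for a, s in zip(answers, scores):
        groups[a[best_q]][0].append(a)
        groups[a[best_q]][1].append(s)
    return {
        "question": best_q,
        "if_wrong": build_tree(*groups[0], depth + 1, max_depth, min_size),
        "if_right": build_tree(*groups[1], depth + 1, max_depth, min_size),
    }


def predict(tree: Node, answer: Sequence[int]) -> float:
    """Walk down the tree; only the questions on the path need to be asked."""
    while "question" in tree:
        tree = tree["if_right"] if answer[tree["question"]] == 1 else tree["if_wrong"]
    return tree["estimate"]
```

The explained variance reported for such a tree corresponds to 1 minus the pooled within-leaf SSE divided by the total SSE of the full-test scores.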
Use of the selected model and technique
Running SEARCH
Results
Starting from a standardized test (for checking a specific verbal
aptitude) containing 20 questions, a regression tree with 3-4
questions was obtained. The regression tree contains 10 final
subgroups (leaves) with estimates for the original test value ranging
from 6,4 to 59,2. The explained variance is 90,4%.
Statistical interpretation
A very good estimate of the original test value can be given by using the
obtained regression tree.
Interpretation for test designers
Using the tree structure, a computer-assisted test can be constructed
which is much shorter, without losing the power of the original test.
Source
M. Hunya: Finding optimal interactive test structures (1982)