Applications e


  • 8/10/2019 Applications e


    General Principles of Data Analysis

    Choice of an appropriate statistical technique

    a complex issue

    somewhat arbitrary

    Real-life data often contain mixtures of different types of data

    two statisticians may select different methods depending upon what assumptions they are willing to take into account

    extraneous factors

    availability of software and its limitations

    availability of time and financial resources


    Warnings

    Figures allow us to calculate them

    Applying different techniques and obtaining different results does not mean that something is wrong

    Looking for an answer to the same question by using several methods may lead to a better understanding

    Obtaining negative results may be as informative as getting a positive one

    Obtaining no answer by using one technique does not mean that there is no answer at all

    Etc.


    The choice of a statistical technique depends essentially upon

    Characteristics of the analysis question;

    Characteristics of the data;

    Characteristics of the sampling design.

    Characteristics of the Analysis Question

    Whether there is a distinction between independent and dependent variables or not

    Whether the nature of the research problem requires:

    Description, exploration, estimation, or

    Testing of a hypothesis or model

    Whether the focus of research is on 'variables' or 'objects'


    Characteristics of the Data

    Types of data sets

    Individuals - variables data sets

    Proximities data sets

    Variable - Variable Proximities

    Individual - Individual Proximities

    Types of Variables

    Continuous or Quantitative Variables

    Discrete or Qualitative Variables

    Variable types by measurement level


    Nominal-scale variables

    Ordinal-scale variables

    Interval-scale variables

    Ratio-scale variables


    Techniques for problems with a distinction between independent and dependent variables

    No. of variables          Measurement level
    Dependent   Independent   Dependent               Independent             Analysis method
    One         One           Nominal                 Nominal                 Non-parametric tests, Chi-square
    One         One           Nominal (dichotomous)   Nominal                 Multiple Classification Analysis
    One         One           Nominal                 Nominal (dichotomous)   Wilcoxon's two sample test, Chi-square, Kolmogorov-Smirnov Test
    One         One           Interval-scale          Nominal (dichotomous)   t-test, Analysis of Variance
    One         One           Interval-scale          Interval-scale          Regression Analysis
    One         One           Interval-scale          Nominal                 Analysis of Variance
    One         More          Nominal                 Interval-scale          Discriminant Analysis
    One         More          Interval-scale          Nominal                 Analysis of Variance, Multiple Regression Analysis, Multiple Classification Analysis
    One         More          Interval-scale          Dummy                   Analysis of Variance, Multiple Regression Analysis, Multiple Classification Analysis
    One         More          Interval-scale          Interval-scale          Multiple Regression Analysis
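    The technique table can also be read as a simple lookup. The sketch below encodes its rows as a Python dictionary; the key spelling ("one", "more", "nominal", "interval" and so on) and the function name suggest_techniques are assumptions made for illustration only, and the entries merely repeat what the table lists.

```python
# Rows of the technique table, keyed by
# (number of independent variables, dependent level, independent level).
# Key spellings are hypothetical conventions of this sketch.
ANOVA_MRA_MCA = [
    "Analysis of Variance",
    "Multiple Regression Analysis",
    "Multiple Classification Analysis",
]

TECHNIQUES = {
    ("one", "nominal", "nominal"): ["Non-parametric tests", "Chi-square"],
    ("one", "nominal (dichotomous)", "nominal"): ["Multiple Classification Analysis"],
    ("one", "nominal", "nominal (dichotomous)"): [
        "Wilcoxon's two sample test", "Chi-square", "Kolmogorov-Smirnov Test"],
    ("one", "interval", "nominal (dichotomous)"): ["t-test", "Analysis of Variance"],
    ("one", "interval", "interval"): ["Regression Analysis"],
    ("one", "interval", "nominal"): ["Analysis of Variance"],
    ("more", "nominal", "interval"): ["Discriminant Analysis"],
    ("more", "interval", "nominal"): ANOVA_MRA_MCA,
    ("more", "interval", "dummy"): ANOVA_MRA_MCA,
    ("more", "interval", "interval"): ["Multiple Regression Analysis"],
}


def suggest_techniques(n_independent, dependent_level, independent_level):
    """Return the techniques the table lists for this combination, or []."""
    key = (n_independent.lower(), dependent_level.lower(), independent_level.lower())
    return list(TECHNIQUES.get(key, []))


print(suggest_techniques("One", "Interval", "Interval"))
```

    A combination that is not in the table returns an empty list, which is a reminder that the table is a starting point and not a complete decision rule.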


    Usual way of statistical problem solving

    Formulate the question using terms and logic of the specific field of the problem (science management, pedagogy, economics, etc.)

    Reformulate the question using statistical terms and logic

    Find appropriate statistical model(s) and technique(s)

    Use the selected model(s) and technique(s)

    Give a statistical interpretation to the results obtained

    Reformulate the interpretation in terms of the original field of application


    Scientific products by country

    Question in research management

    Research groups have multiple outputs comprising publications, patents, experimental materials, etc. What are the differences, if any, in the performance of the research groups of selected countries?

    Statistical question

    Can we construct a reasonable productivity index, using the following measures of the scientific output?

    Articles in country

    Articles abroad

    Original research reports

    Patents

    Algorithms and designs

    Experimental material

    Can we find a significant difference by countries in the productivity index?


    Statistical model and technique

    Partial order scoring for constructing the index of research output

    Analysis of variance for testing the hypothesis concerning the significance of the difference
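    To make the idea of partial order scoring concrete, here is a minimal sketch. It assumes that one case ranks above another when it is at least as high on every output measure and strictly higher on at least one, and it scores each case by the share of comparable cases it ranks above. This scoring rule and the function names are illustrative assumptions; the formulas used by the IDAMS POSCOR program may differ.

```python
def dominates(a, b):
    """True if case a is >= case b on every measure and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def partial_order_scores(cases):
    """Score each case from 0 to 100 by its position in the partial order.

    Illustrative rule (an assumption, not POSCOR's documented formula):
    score = 100 * below / (below + above), where 'below' counts the cases
    this case dominates and 'above' counts the cases dominating it.
    A case comparable with no other case gets the neutral score 50.
    """
    scores = []
    for i, case in enumerate(cases):
        others = [c for j, c in enumerate(cases) if j != i]
        below = sum(dominates(case, other) for other in others)
        above = sum(dominates(other, case) for other in others)
        comparable = below + above
        scores.append(50.0 if comparable == 0 else 100.0 * below / comparable)
    return scores


# Hypothetical output counts (articles, patents) for four research units.
units = [(1, 1), (2, 2), (3, 3), (0, 5)]
print(partial_order_scores(units))
```

    The last unit is incomparable with the others (fewer articles, more patents), which is exactly the situation where a partial order avoids forcing an arbitrary weighting of the outputs.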

    Use of the selected model and technique


    RUN POSCOR
    FILES
    PRINT = POSCOR.LST
    DICTIN = R2R3RU.DIC
    DATAIN = R2RU.DAT
    DICTOUT = POSCOR.DIC
    DATAOUT = POSCOR.DAT
    SETUP
    POSCOR SCORES OF RU OUTPUTS
    BADDATA=MD1 -
    IDVAR=V2 -
    TRANSVARS=(V1)
    POSCOR ORDER=DESR -
    ANAME= OUTPUT
    VARS=(V116,V118,V122,V126,V128,V130)

    RUN ONEWAY
    FILES
    PRINT = ONEWAY1.LST
    DICTIN = POSCOR.DIC
    DATAIN = POSCOR.DAT
    SETUP
    ANALYSIS OF VARIANCE OF RU OUTPUT
    BADDATA=MD1 -
    PRINT=CDICT DEPVARS=(V8) CONVARS=(R1)
    RECODE
    R1=RECODE V15 (40)=1, (360)=2, (410)=3, (638)=4, (844)=5, (868)=6


    Use of the selected model and technique (results)

    Code   N     Weight-sum   %      Mean     S.D.(estim.)   Sum of X   %      Sum of X-square
    1      334   334          22.9   37.731   35.794         1.26E+04   16.8   9.02E+05
    2      239   239          16.4   45.213   35.778         1.08E+04   14.4   7.93E+05
    3      200   200          13.7   77.585   27.336         1.55E+04   20.7   1.35E+06
    4      225   225          15.4   52.547   35.43          1.18E+04   15.7   9.02E+05
    5      233   233          16     36.7     33.266         8.55E+03   11.4   5.71E+05
    6      229   229          15.7   69.074   36.255         1.58E+04   21.1   1.39E+06

    Total sum of squares            2048467
    For 6 groups, Eta               0.4018943
    For 6 groups, Etasq             0.161519
    For 6 groups, Eta(adj)          0.3982909
    For 6 groups, Etasq(adj)        0.1586357
    Between means sum of squares    330866.5
    Within groups sum of squares    1717601
    F(5,1454)                       56.018
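    The summary statistics can be recomputed from the sums of squares and the group sizes in the output above. The sketch below does this with the standard one-way analysis of variance formulas. The degrees-of-freedom correction used for the adjusted Eta squared is an assumption about how ONEWAY computes it, although it does reproduce the printed values.

```python
import math

# Values copied from the ONEWAY output.
ss_between = 330866.5
ss_within = 1717601.0
group_sizes = [334, 239, 200, 225, 233, 229]

n_groups = len(group_sizes)
n_cases = sum(group_sizes)

# Eta squared: share of the total sum of squares lying between group means.
ss_total = ss_between + ss_within
eta_sq = ss_between / ss_total
eta = math.sqrt(eta_sq)

# F ratio: between-groups mean square over within-groups mean square.
df_between = n_groups - 1
df_within = n_cases - n_groups
f_value = (ss_between / df_between) / (ss_within / df_within)

# Adjusted Eta squared (assumed formula: usual degrees-of-freedom correction).
eta_sq_adj = 1 - (1 - eta_sq) * (n_cases - 1) / df_within
eta_adj = math.sqrt(eta_sq_adj)

print(f"F({df_between},{df_within}) = {f_value:.3f}")
print(f"Etasq = {eta_sq:.6f}, Eta = {eta:.7f}")
print(f"Etasq(adj) = {eta_sq_adj:.7f}, Eta(adj) = {eta_adj:.7f}")
```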


    Statistical interpretation

    The F(5,1454) = 56.018 value shows that there is a highly significant difference by country in the constructed performance index. We also see a medium-strength differentiation between the countries: Eta(adj) = 0.398.

    The Mean values show the level of each country.

    Interpretation for research management

    There are two countries with a low, two with a medium and two with a high productivity index.

    Source

    P.S. Nagpaul: Guide to Advanced Data Analysis using IDAMS Software


    Performance, motivation and creativity of schoolchildren

    Question in psychology - pedagogy

    Intellectual performance, motivation and creativity of school children can be measured by using several indicators. Some of them are produced by the children themselves (e.g. IQ tests), others are based on the evaluation given by their teachers (e.g. average grade). What are the perceivable dimensions, if any, behind these indicators?

    Statistical question

    In the set of the listed indicators, are there any groups within which statistical inter-correlation and between which statistical independence can be detected?

    Indicators (C = produced by the children, T = given by the teachers):

    T Average grade

    C IQ

    C Creativity test

    C Creative attitude

    T Creative behaviour

    C Achievement motivation

    T Motivated behaviour

    T Motivation index


    Statistical model and technique

    Pearsonian correlation between the measured indicators

    Multidimensional scaling, cluster analysis
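    Both multidimensional scaling and cluster analysis start from proximities between the indicators. A minimal sketch of that first step is given below: it computes Pearson correlations and turns them into dissimilarities as 1 - r. The conversion 1 - r is one common choice and an assumption here, not necessarily what the IDAMS programs use, and the indicator values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def dissimilarity_matrix(indicators):
    """Map each pair of indicator names to 1 - r (assumed conversion)."""
    names = list(indicators)
    return {
        (a, b): 1.0 - pearson(indicators[a], indicators[b])
        for a in names
        for b in names
    }


# Hypothetical scores of five children on three indicators.
data = {
    "C IQ": [95, 100, 110, 120, 130],
    "C Creativity test": [10, 12, 15, 14, 20],
    "T Average grade": [3.0, 4.5, 3.5, 4.0, 4.2],
}
for pair, d in dissimilarity_matrix(data).items():
    print(pair, round(d, 3))
```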

    Use of the selected model and technique

    Executing PEARSON, MDSCAL, CLUSFIND in IDAMS

    MDSCAL result

    [Figure: MDSCAL result plot; labels: Teachers, Children]


    Use of the selected model and technique

    CLUSFIND result

    [Figure: CLUSFIND clustering diagram. Indicators in order: C IQ, C Creativity test, C Creative attitude, C Achievement motivation, T Average grade, T Creative behaviour, T Motivated behaviour, T Motivation index. Values shown: 0.75, 0.71, 0.40, 0.45, 0.27, 0.13, 0.02]


    Statistical interpretation

    Multidimensional scaling shows a clear separation of the indicators produced by children and by teachers

    Cluster analysis supports the finding of the separation of variables coming from teachers and children

    Pedagogical/psychological interpretation

    Just one aspect: ratings given by teachers to children are nearly the same, independently of the evaluated ability, attitude or behaviour dimension

    Source

    M. Hunya: Multidimensional statistical techniques in pedagogical studies

    Data

    A. Deak, B. Kozeki: Study into the effect of motivation and creativity factors on the performance of school children


    Prediction of river flow values

    Question in hydrology

    We have water level data on four rivers in North Africa (more than 40 years). Can the water flow level be predicted on the basis of data from the past? If so, with what precision?

    What if the average flow level is considered instead of the individual ones?

    Statistical question

    Can the river flow values be predicted by using a set of values from the preceding period?

    How does the prediction change if a 6-month average flow is used?


    Statistical model and technique

    Autoregression model (with a lag of 12 to 36) applied to the river flow time series

    Transformation of the original data into a time series of moving averages (interval length = 6)

    Use of the selected model and technique

    Time Series Analysis option from the IDAMS interactive facilities

    Lag         Original series   Moving average series
    12 months   R**2 = 0.32       R**2 = 0.92
    24 months   R**2 = 0.35       R**2 = 0.93
    36 months   R**2 = 0.36
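    The two data preparation steps behind these results can be sketched as follows: the moving-average transformation with an interval length of 6, and the arrangement of a series into lagged predictor rows for an autoregression. The function names are hypothetical, and the regression fit itself is left to whatever routine is used (in the source, the IDAMS Time Series Analysis option).

```python
def moving_average(series, window=6):
    """Return the series of means over each run of `window` consecutive values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the series length")
    total = sum(series[:window])
    averages = [total / window]
    for i in range(window, len(series)):
        # Slide the window: add the new value, drop the oldest one.
        total += series[i] - series[i - window]
        averages.append(total / window)
    return averages


def lagged_rows(series, lag):
    """Pair each value with the `lag` values that precede it.

    Each row is (predictors, target), the layout an autoregression
    with that lag would be fitted on.
    """
    return [(list(series[t - lag:t]), series[t]) for t in range(lag, len(series))]


# Hypothetical monthly flow values, for illustration only.
flows = [1, 2, 3, 4, 5, 6, 7]
print(moving_average(flows))   # two 6-month averages
print(lagged_rows(flows, 2))   # rows for a lag-2 autoregression
```

    Averaging over 6 months removes most of the month-to-month variation, so the averaged series is much smoother than the original one, which is consistent with the far higher R**2 reported for it.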


    Use of the selected model and technique

    [Figures: original series; moving average series]


    Statistical interpretation

    Autoregression shows that individual values can be predicted with moderate or average precision (unbiased R**2 = 0.32 to 0.36 for 12 to 36 months); high peak values are very poorly reproduced.

    In the case of a 6-month moving average, the prediction is nearly perfect (unbiased R**2 = 0.92 for 12 months).

    Hydrological interpretation

    Although the pattern of changes can be reproduced fairly well, even three years of data from the past are not at all enough to predict the height of peak flows.

    But if we consider 6-month averages, they can be predicted with almost full precision.

    Data

    UNESCO, Water Science Division


    Business

    Question concerning company management

    What are the factors that influence the economic performance of a company? Economic performance is measured by the return on capital employed.

    Statistical question

    Can the return on capital be predicted by using a set of economic and production indicators from those characterizing the company?

    How does the prediction change if we are looking for a subset of best predictors?

    Statistical model and technique

    Multiple linear regression

    Stepwise regression


    Use of the selected model and technique

    Running REGRESSN

    Results

    The full regression model explains 70% of the adjusted variance of the dependent variable. Its standard error is about one half of the mean; the value of the determinant of the correlation matrix is .79478E-05. There are 8 variables (out of 12) with high covariance ratio values.

    The stepwise regression model selects 3 variables for explaining 80% of the variance. No multicollinearity (0.77647). Standard error of the estimate of the dependent variable = 0.06135, which is quite low: high reliability of estimation.
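    The determinant of the correlation matrix of the predictors is a convenient multicollinearity indicator: it equals 1 when the predictors are uncorrelated and approaches 0 as they become nearly linearly dependent. The sketch below computes it by Gaussian elimination. The function name and the small matrices are illustrative assumptions, not the data of this study.

```python
def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


# Hypothetical correlation matrices of two predictors.
uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
nearly_collinear = [[1.0, 0.99], [0.99, 1.0]]
print(determinant(uncorrelated))      # no multicollinearity
print(determinant(nearly_collinear))  # close to zero: strong multicollinearity
```

    Read this way, a determinant of the order of 1E-05 for the full model signals strong multicollinearity, while a value well away from zero does not.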


    Statistical interpretation

    Full regression model: the reliability of prediction is poor. Strong multicollinearity is shown. The variables that contribute to multicollinearity can be identified.

    The stepwise regression model: 3 variables explain 80% of the variance. No multicollinearity. High reliability of estimation.

    Interpretation for management

    Although the full indicator set can give a nice prediction, it cannot be recommended for real use because of the poor prediction reliability.

    But if we consider 3 carefully selected indicators, we can get a fair prediction.

    Source

    P.S. Nagpaul, India


    Education

    Question concerning measurement of knowledge level

    Tests are very often used in education for checking the level of knowledge in one subject or another. Long tests with many questions can meet the reliability requirement relatively easily. The question is whether we can make a short, interactive, adaptive test from a long test, preserving at least nearly the original reliability.

    Statistical question

    Can we give a good estimate of the original test value by using a tree-structure-based prediction?

    Statistical model and technique

    Regression tree


    Use of the selected model and technique

    Running SEARCH

    Results

    Starting from a standardized test (for checking a specific verbal aptitude) containing 20 questions, a regression tree with 3-4 questions was obtained. The regression tree contains 10 final subgroups (leaves) with estimates for the original test value ranging from 6.4 to 59.2. The explained variance is 90.4%.
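    A regression tree of this kind is grown by repeatedly choosing the question that best separates high and low test values. The sketch below shows a single such step: it picks the question whose right/wrong split explains the largest share of the variance of the total score. It is a simplified illustration with invented data and hypothetical function names, not the algorithm of the IDAMS SEARCH program.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(answers, scores):
    """Find the question whose 0/1 answers best split the total scores.

    answers: one row per person, each a list of 0 (wrong) / 1 (right).
    scores:  the original test value of each person.
    Returns (question index, explained share of variance), or (None, 0.0)
    when no question separates the group.
    """
    total_ss = sum_of_squares(scores)
    best_question, best_share = None, 0.0
    if total_ss == 0.0:
        return best_question, best_share
    for q in range(len(answers[0])):
        wrong = [s for row, s in zip(answers, scores) if row[q] == 0]
        right = [s for row, s in zip(answers, scores) if row[q] == 1]
        if not wrong or not right:
            continue  # this question does not split the group
        within_ss = sum_of_squares(wrong) + sum_of_squares(right)
        share = 1.0 - within_ss / total_ss
        if share > best_share:
            best_question, best_share = q, share
    return best_question, best_share


# Hypothetical answers of four people to two questions, and their test values.
answers = [[0, 0], [0, 1], [1, 0], [1, 1]]
scores = [1, 2, 9, 10]
question, share = best_split(answers, scores)
print(question, round(share, 4))
```

    Applying the same step again inside each subgroup yields the leaves of the tree; the mean score in each leaf is the estimate of the original test value.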


    Statistical interpretation

    A very good estimate can be given for the original test value by using the obtained regression tree.

    Interpretation for test designers

    Using the tree structure, a computer-assisted test can be constructed, which is much shorter, without losing the power of the original test.

    Source

    M. Hunya: Finding optimal interactive test structures (1982)