Kirk Allen Defense

  • 7/30/2019 Kirk Allen Defense

    Dissertation Defense

    Kirk Allen

    May 2, 2006

    The Statistics Concept Inventory:

    The Development and Analysis of a Cognitive Assessment Instrument in Statistics

    Organization

    Book One: Creation of the SCI

    Book Two: Expanding; doing more with the data

    Book Three: Re-validating

    Book Four: Summarize; speculation on the future

    My personal timeline

    Fall 2002: started grad school

    Summer/Fall 2004: decided to go straight for Ph.D.

    Fall 2005: General exams

    Spring 2006: taking my final class

    Spring 2006: graduating!

    Background

    Statistics Concept Inventory (SCI) project began in Fall 2002

    Based on the format of the Force Concept Inventory (FCI)

    Shifts focus away from problem solving, which is the typical classroom format

    Focus on conceptual understanding

    Multiple choice, around 30 items

    Force Concept Inventory

    Focuses on Newton's three laws and related concepts

    Scores and gains on initial testing were much lower than expected, which led to evaluating teaching styles

    Interactive engagement was found to be most effective at increasing student understanding

    Other Concept Inventories

    Many engineering disciplines are developing concept inventories

    e.g., thermodynamics, circuits, materials, dynamics, statics, systems & signals

    Foundation Coalition: http://www.foundationcoalition.org/home/keycomponents/concept/index.html

    Book One

    The process of creating the SCI; I would have defended this as my Master's thesis

    Traditional (approximately) five-chapter format:

    1. Introduction (short)

    2. Test Theory: the methods that were used in creating the SCI

    3. Concept inventories: descriptions of other work along similar lines

    4. Methods and Results: combined because Methods is short

    5. Preliminary conclusions (short)

    Results, Spring 2005

    Course     Level            Mean, pre   Mean, post   SD, post
    Quality    Junior IE        44.9%       --           13.4% (pre)
    Engr       Intro, calc      40.7%       44.9%        14.6%
    Math #1    Intro, calc      46.8%       44.0%        14.4%
    Math #2    Intro, calc      48.5%       45.6%        14.4%
    External   Intro, calc      --          49.8%        13.8%
    Psych      Intro, algebra   38.6%       43.9%        10.9%

    Reliability (Spring 2005)

    Course     Pre-Test Alpha   Post-Test Alpha
    Quality    0.7084           --
    Engr       0.6619           0.7744
    Math #1    0.6071           0.7676
    Math #2    0.7640           0.7079
    Psych      0.4284           0.5918
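
    The alpha values above are Cronbach's alpha, k/(k - 1) * (1 - sum of item variances / variance of total scores). A minimal Python sketch of that standard formula, assuming items scored 0/1 with one row per student; the function name `cronbach_alpha` and the toy `scores` matrix are illustrative, not SCI data or the author's code:

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a 0/1 score matrix (one row per student, one column per item)."""
    k = len(scores[0])  # number of items
    # Sum of the per-item variances across students
    item_var_sum = sum(pvariance([row[j] for row in scores]) for j in range(k))
    # Variance of the students' total scores
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 4 students x 3 items (not SCI responses)
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(scores), 4))  # 0.75
```

    Population variances are used throughout; because the same choice appears in numerator and denominator, switching to sample variances gives the same alpha.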

    Content Validity

    Content validity refers to the extent to which items are (1) representative of the knowledge base being tested and (2) constructed in a sensible manner (Nunnally)

    Focus groups ensure that the question is being properly interpreted and help develop useful distracters

    Content Validity

    Faculty survey: statistics topics were rated for their importance to the faculty, which helps provide a list of which topics to include on the SCI

    AP Statistics course outline: also consulted for topic coverage

    Gibbs criteria: identify poorly written questions

    Concurrent Validity

    For Spring 2004: three courses (1 Engr, 2 Math)

    Course           SCI Pre                 SCI Post                  SCI Gain                SCI Norm. Gain
    Engr (n=29)      r = 0.060 (p = 0.758)   r = 0.133 (p = 0.493)     r = 0.080 (p = 0.679)   r = 0.108 (p = 0.578)
    Math #1 (n=30)   r = 0.323 (p = 0.081)   r = 0.502** (p = 0.005)   r = 0.316 (p = 0.089)   r = 0.353 (p = 0.056)
    Math #2 (n=26)   r = 0.219 (p = 0.282)   r = 0.384 (p = 0.053)     r = 0.303 (p = 0.133)   r = 0.336 (p = 0.094)
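
    The table does not say what the SCI measures were correlated against, and "Norm. Gain" is not defined on the slide. A minimal sketch, assuming the common Hake-style definition g = (post - pre) / (100 - pre) on percentage scores, a plain Pearson r, and the usual t = r * sqrt(n - 2) / sqrt(1 - r^2) test of zero correlation; all function names and the student data, including the `criterion` list, are hypothetical:

```python
import math


def normalized_gain(pre_pct, post_pct):
    """Hake-style normalized gain: share of the possible improvement actually achieved."""
    return (post_pct - pre_pct) / (100.0 - pre_pct)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical pre/post SCI percentages and an external criterion for 5 students
pre = [40.0, 50.0, 35.0, 60.0, 45.0]
post = [55.0, 50.0, 48.0, 70.0, 45.0]
criterion = [78.0, 70.0, 81.0, 90.0, 65.0]  # e.g., a course grade (assumption)

gains = [normalized_gain(a, b) for a, b in zip(pre, post)]
print(round(pearson_r(gains, criterion), 3))
print(round(t_for_r(0.502, 30), 2))  # 3.07
```

    With n = 30 and r = 0.502 (Math #1, SCI post), t is about 3.07 on 28 degrees of freedom, consistent with the reported p = 0.005; the p-value itself would come from a t table or a statistics library.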

    Construct Validity

    Three-factor and four-factor FIML models with a general factor

    Descriptive, inferential, probability, and graphical sub-tests

    Graphical grouped a priori with Descriptive in the 3-factor confirmatory model

    Overall results: item uniqueness 70.1% and 70.4%

    Preference is for the four-factor model because graphical items are a separate sub-test

    More on this later!

    Item Discrimination Index

    Compares the top quartile to the bottom quartile on each item

    Generally, around 1/3 of the items fall into each of the ranges poor (< 0.20), moderate (0.20 to 0.40), and high (> 0.40)
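
    A minimal sketch of the upper-minus-lower discrimination index described above: rank students by total score, take the top and bottom 25%, and subtract the proportions answering the item correctly. The group-size rounding, tie handling, function names, and data are assumptions; the slide does not specify them.

```python
def discrimination_index(item, totals, fraction=0.25):
    """Upper-minus-lower discrimination index for one 0/1 item.

    item[i] is student i's score on the item; totals[i] is that student's total score.
    """
    n = len(totals)
    g = max(1, round(n * fraction))  # size of each extreme group
    # Rank students by total score; ties keep their original order (sorted is stable)
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:g], order[-g:]
    p_upper = sum(item[i] for i in upper) / g
    p_lower = sum(item[i] for i in lower) / g
    return p_upper - p_lower


def classify(d):
    """Bands used on the slide: poor < 0.20, moderate 0.20 to 0.40, high > 0.40."""
    if d < 0.20:
        return "poor"
    if d <= 0.40:
        return "moderate"
    return "high"


# Hypothetical data for 8 students
totals = [12, 15, 18, 20, 22, 25, 28, 30]
item = [0, 0, 1, 0, 1, 1, 1, 1]
d = discrimination_index(item, totals)
print(d, classify(d))  # 1.0 high
```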

    Item Analysis

    Discrimination index

    Alpha-if-deleted: reported by SPSS or SAS; shows how overall alpha would change if that one item were deleted

    Answer distribution: try to eliminate or improve choices which are consistently not chosen

    Focus group comments
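
    Alpha-if-deleted can be reproduced by recomputing alpha with each item left out in turn. A minimal sketch, not the SPSS or SAS routine, assuming 0/1 items; the function names and the hypothetical data (in which item 4 does not fit with the others) are illustrative:

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a 0/1 score matrix (rows = students, columns = items)."""
    k = len(scores[0])
    item_var_sum = sum(pvariance([row[j] for row in scores]) for j in range(k))
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def alpha_if_deleted(scores):
    """Alpha recomputed with each item left out in turn (needs at least 3 items)."""
    k = len(scores[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in scores]) for j in range(k)]


# Hypothetical data: items 1-3 hang together, item 4 does not
scores = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
overall = cronbach_alpha(scores)
for j, a in enumerate(alpha_if_deleted(scores), start=1):
    note = "  <- alpha rises if deleted" if a > overall else ""
    print(f"item {j}: {a:.3f}{note}")
```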

    Understanding p-values

    A researcher performs a t-test to test the following hypotheses:

    H0: …

    H1: …

    He rejects the null hypothesis and reports a p-value of 0.10. Which of the following must be correct?

    a) The test statistic fell within the rejection region at the significance level

    b) The power of the test statistic used was 90%

    c) Assuming H0 is true, there is a 10% possibility that the observed value is due to chance **

    d) The probability that the null hypothesis is not true is 0.10

    e) The probability that the null hypothesis is actually true is 0.9

    Results for 4 classes

    Choice   Pre #1   Post #1      Pre #2   Post #2       Pre #3   Post #3       Pre #4   Post #4
    a        15%      41%          32%      52%           5%       67%           17%      18%
    b        16%      18%          14%      20%           14%      0%            6%       9%
    c        41%      35% (-6%)    41%      15% (-24%)    62%      27% (-35%)    47%      42% (-5%)
    d        18%      6%           14%      12%           19%      7%            19%      24%
    e        2%       0%           0%       0%            0%       0%            11%      6%

    Analysis

    Discrimination

    Pre: 0.25, -0.17, 0.52, 0.15

    Post: 0.00, -0.14, 0.25, 0.33

    P-value question

    Problems?

    Too definitional

    p-value taught from an interpretive standpoint: when to reject or not reject the null hypothesis

    Therefore

    New question (not a replacement)

    An engineer performs a hypothesis test and reports a p-value of 0.03. Based on a significance level of 0.05, what is the correct conclusion?

    a) The null hypothesis is true.

    b) The alternate hypothesis is true.

    c) Do not reject the null hypothesis.

    d) Reject the null hypothesis **

    Results of New Question

    Discrimination better (post-test): 0.20, 0.29, 0.75, 0.12

    Still not great overall (0.19)

    Percent correct and gains low

    Post-test % correct (gain): 6% (-17%), 20% (-3%), 33% (+4%), 19% (+10%)

    Moving on

    Similar analyses were conducted for all items, a sort of bottom-up approach to developing the test

    Rightly or wrongly, the test has changed very little since Spring 2004

    No need to continually repeat the item analysis tables in such fine detail

    Let's see what else we can do with the SCI!

    Exploring Reliability

    Results (older, also presented during the Proposal) demonstrated strong relationships between:

    Alpha-if-deleted (a measure of item reliability)

    Discrimination index

    Gap: the average total score of students who answered an item correctly minus the average total score of students who answered it incorrectly

    Gap = Mean(correct) - Mean(incorrect)

    Focus groups tell us that guessing or using test-taking tricks are valid causal agents for poor item statistics

    This meshes with theory, because you would expect these items to have a Gap of zero, which lowers total-score variance
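
    The Gap statistic can be computed directly from its definition. A minimal sketch, assuming 0/1 item scores and that each student's total includes the item itself (the slide does not say); the function name and data are hypothetical. The `guessed` item spreads its correct answers evenly over high and low scorers, which is the zero-Gap pattern the focus-group explanation predicts:

```python
def item_gap(item, totals):
    """Gap = mean total score of students who got the item right
    minus mean total score of students who got it wrong."""
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    if not right or not wrong:
        return float("nan")  # undefined when everyone (or no one) answers correctly
    return sum(right) / len(right) - sum(wrong) / len(wrong)


# Hypothetical total scores for 4 students
totals = [30, 25, 15, 10]
informative = [1, 1, 0, 0]  # the two strongest students answer correctly
guessed = [1, 0, 0, 1]      # correct answers split between a high and a low scorer
print(item_gap(informative, totals))  # 15.0
print(item_gap(guessed, totals))      # 0.0
```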

    Online Test

    From the Proposal Defense: comparing online vs. paper

    Differences found:

    9 items

    Sub-test (except Probability) and overall scores

    Reliability: Probability and Inferential

    Problem with study: nearly all paper students were at one university, which had very good overall results

    Differences are still not large

    Other findings

    Order effects

    No systematic bias in question order: no correlation between percent correct and order position

    Small (but significant) downward trend in answer confidence: only about 5% of the total rating scale from the beginning of the test to the end

    Round 2

    Spring 2006, pre-test

    Two sections of the same course, taught by the same professor, took the SCI on the same day

    One paper (n=14), one online (n=16); very similar demographics

    Comparative results (next slide)

    Interesting finding: online, time and number correct were inversely related; paper, the opposite

    Not rigorously assessed

    Measure (Fall 2005, Spring 2006)

    Reliability: Probability, Inferential

    Mean: Total, Descriptive, Inferential, Graphical

    Variance: Probability

    Items: #1 (Probability), #2 (Inferential), #3 (Descriptive), #7 (Graphical), #9 (Descriptive), #15 (Descriptive), #18 (Inferential), #19 (Inferential), #21 (Probability), #22 (Inferential), #28 (Graphical), #35 (Inferential), #36 (Inferential)

    Problems

    Fall 2005: confounding with university

    Spring 2006: pre-test; small sample size

    The Problem with Educational Research

    Rigour / Control

    Statistical Power

    Chapter 8

    Part A: Lit review

    Background on difficulties

    Attitudes

    Reasoning skills: Probability

    Kahneman & Tversky

    Reasoning skills: Statistics

    Some teaching strategies

    Chapter 8

    Part B: Confidence on the SCI

    Original?

    The reviewed studies are generally very specific and in-depth on a certain topic

    Or, they are very general as to why students have difficulties (e.g., attitudes)

    Nothing provides a broad comparison identifying conceptual difficulties across statistics

    So use the SCI to do this.

    Chapter 8

    Method: after students answer each question on the online SCI, they are asked to rate their confidence in that answer.

    Results: big picture

    Results: sample item

    Ranks 10th in correct (low)

    Ranks 25th in confidence (high)

    Students are over-confident

    Which would be more likely to have 70% boys born on a given day: a small rural hospital or a large urban hospital?

    a) Rural

    b) Urban

    c) Equally likely

    d) Both are extremely unlikely

    Results: the graphs

    Results: comparison

    Kahneman & Tversky studied a very similar problem as part of the representativeness misconception of probability

    Subjects do not appreciate that large samples are more likely to be representative of the population

    Kahneman & Tversky: 20% correct, 56% equally likely (smaller N of subjects, also inexperienced)

    SCI online: 37% correct, 45% equally likely
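
    The keyed answer (rural) follows from the binomial distribution: a small sample strays from 50% boys far more often than a large one. A minimal sketch, assuming independent births with P(boy) = 0.5 and hypothetical hospital sizes of 15 and 45 births per day (the SCI item gives no sizes); the function name is illustrative:

```python
from math import comb


def prob_at_least_percent_boys(n, percent=70, p_boy=0.5):
    """P(at least `percent`% of n independent births are boys)."""
    k_min = -(-percent * n // 100)  # ceiling of percent * n / 100, in integer arithmetic
    return sum(comb(n, k) * p_boy**k * (1 - p_boy) ** (n - k) for k in range(k_min, n + 1))


# Hypothetical daily birth counts for the two hospitals
for label, n in [("small rural", 15), ("large urban", 45)]:
    print(f"{label} (n = {n}): {prob_at_least_percent_boys(n):.4f}")
```

    With these assumed sizes the small hospital's probability is about 0.059, many times the large hospital's.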

    Reliability

    Common measure: Cronbach's alpha is an under-estimate of reliability for a multi-dimensional test

    Other measures account for this:

    Theta: based on the largest eigenvalue from a principal component analysis

    Omega: based on communalities from a factor analysis, and thus depends on the number of factors
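
    Theta (often attributed to Armor) uses only the largest eigenvalue, lambda_1, of the item correlation matrix: theta = k/(k - 1) * (1 - 1/lambda_1). A minimal pure-Python sketch, assuming 0/1 items with non-zero variance and using power iteration for lambda_1 (a swapped-in numerical method; a statistics package would use a full eigen-decomposition). Omega is not sketched because it needs a fitted factor model. Function names and data are hypothetical; the last line backs out the lambda_1 implied by the reported theta = 0.8123 for 38 items:

```python
import math


def correlation_matrix(scores):
    """Pearson correlations between item columns (rows = students)."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    dev = [[row[j] - means[j] for row in scores] for j in range(k)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in dev]
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def largest_eigenvalue(m, iters=1000, tol=1e-12):
    """Power iteration for the dominant eigenvalue of a positive semi-definite matrix."""
    k = len(m)
    v = [1.0 / math.sqrt(k)] * k  # unit starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        new_lam = math.sqrt(sum(x * x for x in w))  # ||M v|| with ||v|| = 1
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def theta_reliability(scores):
    """Theta = k/(k - 1) * (1 - 1/lambda_1) for k items."""
    k = len(scores[0])
    lam1 = largest_eigenvalue(correlation_matrix(scores))
    return (k / (k - 1)) * (1 - 1 / lam1)


# Hypothetical 0/1 responses: 6 students x 4 items
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(round(theta_reliability(scores), 4))

# lambda_1 implied by the reported theta = 0.8123 with k = 38 items
print(round(1 / (1 - 0.8123 * 37 / 38), 2))  # 4.78
```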

    Results indicate multi-dimensionality. But is it meaningful?

    [Figure: omega values 0.7980, 0.8081, 0.8446, and 0.8907 (x-axis 1 to 38, y-axis 0.75 to 1.00), compared with alpha = 0.7650 and theta = 0.8123]

    Exploratory Factor Analysis (EFA)

    Many decisions to be made:

    Extraction method

    Number of factors

    Factor loadings

    Rotation method

    Simple structure: each variable ideally loads on exactly one factor; minimize the number of variables per factor

    Not the best paradigm

    Other concept inventories have done it

    Curiosity ("I wonder what all those options in SPSS mean.")

    Decisions

    Extraction method

    Principal components chosen

    Not ideal: maximizes extracted variance but does not optimize the prediction of the overall correlation structure

    Quick comparison of PC vs. ML: first-factor loadings from a four-factor solution had a mean absolute difference of 0.030 (small)

    Decisions

    Number of factors

    Eigenvalues > 1: fifteen factors

    Scree plot (next slide): one? four? nine?

    Parallel analysis: compare eigenvalues to random data; one or four
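
    Both rules on this slide can be sketched once all eigenvalues of the correlation matrix are available. A minimal pure-Python sketch, assuming item columns with non-zero variance: Jacobi rotations supply the eigenvalues, the eigenvalues-greater-than-one count is taken directly, and parallel analysis (Horn's method) keeps leading factors whose eigenvalue beats the mean eigenvalue of the same rank from random normal data of the same size. The function names, number of replications, use of the mean (rather than a percentile) as the cutoff, and the simulated one-factor data are assumptions:

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlations between columns of `data` (rows = observations)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    dev = [[row[j] - means[j] for row in data] for j in range(k)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in dev]
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def jacobi_eigenvalues(m, sweeps=100, tol=1e-12):
    """All eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in m]
    k = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if a[p][q] == 0.0:
                    continue
                angle = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                # Fold into [-pi/4, pi/4]; shifting by pi/2 keeps tan(2*angle), so a[p][q] still zeroes
                if angle > math.pi / 4:
                    angle -= math.pi / 2
                elif angle < -math.pi / 4:
                    angle += math.pi / 2
                c, s = math.cos(angle), math.sin(angle)
                for i in range(k):  # A <- A J (columns p and q)
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                for j in range(k):  # A <- J^T A (rows p and q)
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * apj - s * aqj, s * apj + c * aqj
    return sorted((a[i][i] for i in range(k)), reverse=True)


def parallel_analysis(data, reps=30, seed=0):
    """Keep leading factors whose observed eigenvalue exceeds the mean random eigenvalue."""
    n, k = len(data), len(data[0])
    observed = jacobi_eigenvalues(correlation_matrix(data))
    rng = random.Random(seed)
    mean_random = [0.0] * k
    for _ in range(reps):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
        for r, ev in enumerate(jacobi_eigenvalues(correlation_matrix(noise))):
            mean_random[r] += ev / reps
    n_factors = 0
    for obs, rnd in zip(observed, mean_random):
        if obs <= rnd:
            break
        n_factors += 1
    return n_factors, observed, mean_random


# Hypothetical one-factor data: 200 students x 6 items sharing one latent score
rng = random.Random(1)
data = []
for _ in range(200):
    f = rng.gauss(0.0, 1.0)
    data.append([f + 0.5 * rng.gauss(0.0, 1.0) for _ in range(6)])

kept, observed, random_means = parallel_analysis(data)
kaiser = sum(1 for ev in observed if ev > 1.0)  # eigenvalues-greater-than-one count
print(kept, kaiser)
```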

    [Figure: scree plot of eigenvalues (0 to 5) for components 1 to 38]

    Decisions

    Assigning items to factors

    How large does the factor loading need to be for the variable to be assigned to a given factor?

    Investigated 0.1 to 0.5

    [Figure: results for factor-loading thresholds 0.1 to 0.5 (y-axis 0 to 35)]

    Decisions

    Rotation

    Orthogonal: next slide, five-factor solution

    Oblique: involves extra parameters depending on method; following slide, promax rotation with parameter Kappa

    [Figure: five-factor solution, Unrotated vs. Equamax, Quartimax, and Varimax rotations (y-axis 0 to 35)]

    [Figure: promax rotation with Kappa = 2, 3, 4, 5, 6, 8 (y-axis 0 to 35)]

    Decisions

    Number of factors: four (unrotated) and five (rotated) best approximate simple structure

    One-dimensional structure is most likely, based on the scree plot and parallel analysis

    Factor loadings: values around 0.32 best; use 0.30 for simplicity

    Rotation: Unrotated: Varimax; Rotated: Promax with Kappa = 3

    Conclusions

    Items generally do not group in a meaningful way

    But some pairs of highly similar items grouped along the same factors

    What now?

    Confirmatory Factor Analysis (CFA)

    Presumes the analyst has a pre-conceived notion of the underlying structure

    It's probably not wise to write a test to assess a domain that you don't have a map of.

    Decisions are made a priori, with model comparisons more formal

    Models

    1. Uni-dimensional

    2. Capture finer clustering of similar items

    3. Prior work concluded general factor plus four sub-topics

    [Figure: path diagram with a general factor, Statistics (G), loading on items Q1 to Q38 (loadings w1 to w38; error terms e1 to e38)]

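    [Figure: path diagram with the general factor Statistics (G) loading on Q1 to Q38 (w1 to w38), plus a Test statistics factor (f1) loading on Q2 and Q36 (s2, s36); error terms e1 to e38]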

    Model Fit

    Overall fit (chi-square)

    H0: model fits

    Function of sample size

    Nearly always reject the null in practice

    Fit indices

    Alternate way to assess fit

    Too many to name

    Results

    Model   Chi-sq   d.f.   (p)      GFI      PGFI
    (1)     785      665    0.0009   0.8805   0.8329
    (2)     771      659    0.0017   0.8828   0.8275
    (3)     682      617    0.0355   0.8952   0.7857
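
    The PGFI column can be reproduced from the GFI and d.f. columns: scaling each GFI by d.f./703, where 703 = 38 * 37 / 2 is the degrees of freedom of an independence (null) model for 38 items, gives 0.8329, 0.8275, and 0.7857. The slide does not state this formula; the sketch below is a consistency check inferred from the reported numbers, and the function name `pgfi` is illustrative:

```python
def pgfi(gfi, df_model, n_items):
    """Parsimony GFI: GFI scaled by df_model / df_null, where the independence (null)
    model for n_items variables has n_items * (n_items - 1) / 2 degrees of freedom."""
    df_null = n_items * (n_items - 1) // 2
    return gfi * df_model / df_null


# GFI and d.f. values from the table above (38 SCI items)
models = {"(1)": (0.8805, 665), "(2)": (0.8828, 659), "(3)": (0.8952, 617)}
for name, (gfi, df) in models.items():
    print(name, f"{pgfi(gfi, df, 38):.4f}")
# (1) 0.8329
# (2) 0.8275
# (3) 0.7857
```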

    Conclusions

    One-factor model (1) is most parsimonious

    (2) is not appreciably worse; the model is too sparse based on the current SCI

    (3) provides the best overall fit, but with noticeable loss of parsimony

    Different data, different methods: I'm not throwing it out; it's still there for those so inclined

    Problems

    Used regular correlation instead of tetrachoric

    Normality violated

    Sample size too small

    Literature indicates these problems will affect the absolute magnitude of estimates but not the relative magnitudes

    So it's OK for what I did.
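
    For 0/1 items, the "regular" (Pearson) correlation is the phi coefficient, while the tetrachoric correlation estimates the correlation of assumed underlying normal variables. A minimal sketch comparing phi with the cosine-pi approximation to the tetrachoric correlation (an approximation, not the maximum-likelihood estimate a statistics package would compute); the function names and the 2x2 table are hypothetical:

```python
import math


def phi_coefficient(a, b, c, d):
    """Pearson correlation of two 0/1 items from the 2x2 table [[a, b], [c, d]]:
    a = both correct, b = first only, c = second only, d = both wrong."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation (all cells must be > 0)."""
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


# Hypothetical 2x2 table for a pair of items
a, b, c, d = 40, 10, 10, 40
print(round(phi_coefficient(a, b, c, d), 3))     # 0.6
print(round(tetrachoric_cos_pi(a, b, c, d), 3))  # 0.809
```

    Here phi is 0.6 while the approximate tetrachoric value is about 0.81, a difference in absolute magnitude of the kind the slide describes.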

    Proposal

    [Figure: proposed structure with Statistics linked to Test Statistics, Confidence Intervals, p-value, Standard Deviation, and Correlation]

    Proposal

    Could resemble the original four proposed topic areas

    "You're gonna need a bigger boat."

    Reliability Revisited

    Based on the preferred uni-dimensional model, shortening the current SCI seems reasonable

    Use objective criteria

    Discriminatory index

    Alpha-if-deleted

    Communalities

    Strong correspondence between metrics

    Reliability Revisited

    By alpha-if-deleted, 23 items is the optimal length

    Selected 25 as my preferred length, due to the correspondence between metrics and because it's a nice round number

    Cross-validation indicates a shorter SCI maintains the overall reliability:

    Full: 0.7650

    Cut: 0.7655 (simulated based on 23-item SCI)
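
    The slide does not spell out how the 23-item optimum was found. One plausible procedure, sketched here as an assumption rather than the author's method, is greedy backward elimination: at each step drop the item whose removal yields the highest alpha, and record alpha at every length. Alpha is undefined if a subset's total-score variance is zero; the function names and data are hypothetical:

```python
from statistics import pvariance


def alpha_for(scores, items):
    """Cronbach's alpha using only the listed item columns."""
    k = len(items)
    item_var_sum = sum(pvariance([row[j] for row in scores]) for j in items)
    total_var = pvariance([sum(row[j] for j in items) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def backward_eliminate(scores, min_items=2):
    """Repeatedly drop the item whose removal gives the highest alpha.

    Returns (length, alpha, items kept) for every length visited."""
    items = list(range(len(scores[0])))
    path = [(len(items), alpha_for(scores, items), items[:])]
    while len(items) > min_items:
        drop = max(items, key=lambda j: alpha_for(scores, [i for i in items if i != j]))
        items.remove(drop)
        path.append((len(items), alpha_for(scores, items), items[:]))
    return path


# Hypothetical responses: items 0-2 hang together, item 3 does not
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
path = backward_eliminate(scores)
best_len, best_alpha, best_items = max(path, key=lambda step: step[1])
print(best_len, round(best_alpha, 3), best_items)  # 3 0.6 [0, 1, 2]
```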

    Chapter 10

    Re-assess Content Validity

    Interviews

    Faculty survey

    Interviews

    Prefer interviews over focus groups because the SCI does not involve group decision-making

    Informal approach

    IE grad students

    Experienced statistics students

    Not handcuffed by the pre/post timetable

    Interviews

    Sample item (retained): text is OK, as opposed to symbols

    Unanticipated approach:

    B is the more conservative test (incorrect reasoning)

    D is what is good for the company

    A bottling company believes a machine is under-filling 20-ounce bottles. What will be the alternate hypothesis to test this belief?

    a) On average, the bottles are being filled to 20 ounces.

    b) On average, the bottles are not being filled to 20 ounces.

    c) On average, the bottles are being filled with more than 20 ounces.

    d) On average, the bottles are being filled with less than 20 ounces.

    Interviews

    A coin of unknown origin is flipped twelve times in a row, each time landing with heads up. What is the most likely outcome if the coin is flipped a thirteenth time?

    a) Tails, because even though for each flip heads and tails are equally likely, since there have been twelve heads, tails is slightly more likely

    b) Heads, because this coin has a pattern of landing heads up

    c) Tails, because in any sequence of tosses, there should be about the same number of heads and tails

    d) Heads and tails are equally likely

    Interviews

    Deleted item

    Context of coins seems ingrained: 50/50 always; fixated

    But probability has to be different

    Still answered D, though!

    Consideration of control!

    But erred for gambler's fallacy

    Recommend a new context for this item

    Faculty Survey

    Rated the importance of 87 statistics topics on a 1 to 4 scale

    24 participants

    IE faculty listserv and emailed SCI contacts

    Not at OU

    Compared with a previous survey conducted at OU

    Results

    Generally strong correspondence between old and new surveys

    Correlation: 0.69 (ranks), 0.67 (numbers)

    Scales differ: new median 2.95, old median 2.61

    Consider the two surveys in tandem, using ranks

    Based on 25 retained items
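
    One reading of "0.69 ranks, 0.67 numbers" is a Spearman correlation on topic rankings versus a Pearson correlation on mean ratings; that interpretation is an assumption. A minimal sketch of both, with tied ratings sharing average ranks; the function names and the five topic ratings are hypothetical:

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average 1-based rank for the tied run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical mean importance ratings (1 to 4 scale) for five topics in two surveys
new_survey = [3.6, 2.9, 3.1, 2.2, 3.6]
old_survey = [3.2, 2.5, 2.9, 2.4, 3.0]
print(round(spearman_rho(new_survey, old_survey), 3), round(pearson_r(new_survey, old_survey), 3))
```

    Because ranks ignore the shift in scale (new median 2.95 vs. old 2.61), a rank-based comparison suits combining the two surveys.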

    Results

    16 topics ranked in the Top 25 on both surveys; 14 of these are covered. Very good!

    9 topics in the Top 25 of the new survey but not the old; only 2 of these are covered. Not so good.

    Exactly the same for the old survey (2 of 9)

    Conclusions

    Pretty good coverage; basing results on the full 38 items is even better

    Could help to survey non-engineers to allow comparisons

    IE is the most statistically inclined engineering discipline, so that's the best audience if you are limited

    Concept Inventories

    Remember where we came from!

    What's the reference point?

    How does the SCI compare to other concept inventories, especially others in engineering?

    Process

    From the author of a physics test (not the FCI):

    Generally it's pretty good, but obviously a simplification

    Many activities occur simultaneously

    Also, I think you need to acknowledge that you enhance your validity, reliability, etc. as you feed back

    Sample Size

    Compare the SCI to other engineering concept inventories

    Uncertainty: unpublished results

    Statics is way ahead

    Speaks to the generalizability of results

    We are in good shape

    [Figure: sample sizes for the Statistics and Statics inventories (y-axis 0 to 3000), x-axis 1 to 4]

    Scores and Reliability

    Scores are low, but this is common in early-phase inventories

    Higher scores are typically found when teaching methods are assessed

    Reliability is in a similar range to other inventories: between 0.70 and 0.80

    Statistics seems more difficult to assess in one test (cf. factor analysis)

    [Figure: post-test mean score (%) and alpha by semester for Su 2003 (n = 103), Fa 2003 (n = 280), Sp 2004 (n = 94), Su 2004 (n = 16), Fa 2004 (n = 163), Sp 2005 (n = 260), Su 2005 (n = 60), Fa 2005 (n = 429); means range from 45.5 to 52.3 and alpha from 0.67 to 0.77]

    CI Suggestions

    Develop a sequence for related inventories: FCI / MBT, Statics, Dynamics / Strength of Materials

    Could Statistics fit with others? Not currently.

    Discuss who uses concept inventories: colleagues? friends? outsiders?

    Speaks to instructor, and thus student, motivation

    Analysis Techniques

    Simple: discriminatory index, percent correct, correlations. Got it!

    Advanced: factor analysis, SEM, IRT. Got it!

    It doesn't appear that anyone else has everything, although others have parts

    Other Results

    Andreas IRT (dissertation in Mathematics)

    Analyze response probability by ability level, for each response

    Could this be integrated with confidence?

    Pedagogical implications

    Contributions

    The SCI is an original creation

    Part of the larger concept inventory scheme

    Draws on and allows comparisons to literature on statistics and probability reasoning

    Analysis and synthesis of the creation process itself

    Insights into test reliability and validity

    Publications

    General development (Book One): FIE 2003, ASEE 2004 conferences

    Reliability (Chapter 6): under revision for JEE

    Online test (Chapter 7): will be acknowledged as a data source in all future publications

    Confidence (Chapter 8): FIE 2006 (draft paper accepted pending revision); there's much, much more to pull from here, possibly incorporating interviews

    Factor analysis and interviews / survey: certainly offer proposals for future research; not sure if publishable at present

    Concept Inventories (Chapter 11): JEE?

    Summary paper (Chapter 12): JEE?

    Process

    The structure of the dissertation reflects the prevailing methods and conclusions used in constructing, analyzing, and adapting the SCI.

    There is meaning in this structure. We couldn't have created the SCI without some background in test theory, cognitive research, etc.

    But, very important: the SCI also allowed us an avenue for further exploration. Chicken? Or egg?

    The methods evolved along with the instrument.

    Criticisms

    Lacks focus: I wanted to do EVERYTHING!!

    Final chapters are open-ended, more like proposals than finished products. Phase II NSF grant?

    Plus, that's life.

    No formal hypothesis

    Question: Can you design a test to assess statistics concepts?

    Hypothesis: Yes, I can!

    Conclusion: Here's how I did it.

    From the beginning.

    Increase input and participation across departments and universities: improving!

    More lit review (Kahneman & Tversky, Pollatsek, Piaget, etc.): got it now!

    Participation hindered by not teaching Intro Stats: ditto!

    But: does this introduce bias?

    The Future

    What is being taught? And how?

    Instructor surveys (easy)

    Classroom observation (difficult)

    Integrate confidence ratings with IRT

    Interviews / focus groups more often

    New items: how long has it been??