Measurement issues
Jean Bourbeau, MD
Respiratory Epidemiology and Clinical Research Unit
McGill University
Clinical Epidemiology (679)
June 19, 2006
Objectives
Define categorical and continuous variables
Define 2 sources of variation: biological and measurement error (random and bias)
Describe the classification measures and their focus: functional, descriptive and methodological
Define and discuss the advantages and disadvantages of objective and subjective health measures
Define the psychometric properties of measurement instruments: reliability, validity, responsiveness
Discuss key questions and concerns about each of the psychometric properties of an instrument: reliability, validity and responsiveness
Define and discuss minimal clinically important difference
Reading
Fletcher, Chapter 2
Outline
of Measurement issues
1. Measurements
2. Sources of variation
3. Classification
4. Health measurements
5. Measurement properties
Examples
In a 60-year-old patient after right hemicolectomy, the Dukes stage is a widely accepted, indispensable descriptive tool for planning further treatment.
Adjuvant postoperative chemotherapy is currently the recommended treatment for resected Dukes C colon cancer.
Examples
In a 20-year-old woman with right lower quadrant pain and vomiting, the likely diagnosis is appendicitis or a gynecological infection.
After excluding pelvic inflammatory disease, an experienced surgeon or gastroenterologist will diagnose appendicitis based on history, clinical findings and ultrasound.
Measurement
We need to assign numbers to certain clinical phenomena to make them manageable and “scientific”
Measurement
Measure:
•A scale or test is an instrument to measure a clinical phenomenon; a score is a value on the scale in a given patient
Measurement
The attributes or events that are measured in a research study are called « variables »
Variables are measured according to 2 types:
•Categorical
•Continuous
Categorical variables
•Also called discrete variables
•Dichotomous or polychotomous (multilevel):
- Nominal
- Ordinal
Dichotomous categorical variables
Examples:
•Vital status (alive vs dead)
•Yes or no (response to a question)
•Sex (male vs female)
Polychotomous categorical variables
Nominal:
•Named categories that bear no ordered relationship to one another
Example:
•Hair colour, race, or country of origin
Nominal scale
Hierarchy of mathematical adequacy:
•Lowest level (not a measurement but a classification)
•Uses numbers as labels for categories (such as male or female)
•No inference can be drawn from the relative size of the numbers used
Polychotomous categorical variables
Ordinal:
•Named categories that bear an ordered relationship to one another
•The intervals need not be equal
Examples:
•Ordinal pain scale that includes « pain severity »: none, mild, moderate, and severe
•Deep tendon reflex: absent, 1+, 2+, 3+, or 4+
Ordinal scale
Hierarchy of mathematical adequacy:
•Numbers are again used as labels for response categories
•Numbers reflect the increasing order of the characteristic being measured (mild, moderate, severe)
•The numeric values, and the differences between them, hold no intrinsic meaning
Continuous variables
•Also called dimensional, quantitative or interval variables
•Expressed as integers, fractions, or decimals in which equal distances exist between successive intervals
•Examples: age, blood pressure, temperature
Interval scale
Hierarchy of mathematical adequacy:
•Numbers are assigned to the response categories in such a way that a unit change represents a constant change across the range of the scale (temperature in degrees Celsius)
Ratio scale
Hierarchy of mathematical adequacy:
•With a ratio scale, it becomes possible to state how many times greater one score is than another
•This improves on the interval scale by including a true zero point
Scales
•Binary
•Rank order (small to large)
•Continuous (0 to ∞)
•Ratios
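The difference between an interval and a ratio scale can be checked with a small calculation. The sketch below is illustrative only: the two temperatures are made-up values, and the Kelvin conversion is used as an assumed example of a scale with a true zero. Differences agree on both scales, but "how many times greater" is only meaningful on the ratio scale.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale value (Celsius) to a ratio-scale value (Kelvin)."""
    return celsius + 273.15

# Two hypothetical temperatures, chosen only for illustration.
t_low_c, t_high_c = 10.0, 20.0

# Differences are meaningful on an interval scale and agree with the ratio scale.
diff_c = t_high_c - t_low_c
diff_k = celsius_to_kelvin(t_high_c) - celsius_to_kelvin(t_low_c)

# Ratios depend on where zero sits, so they differ between the two scales.
ratio_c = t_high_c / t_low_c
ratio_k = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The Celsius ratio suggests "twice as hot", which is an artifact of the arbitrary zero; only the Kelvin ratio supports a statement about how many times greater one value is than another.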
Sources of variation
2 sources of variation:
•Biological variation
•Measurement error
Biological variation
Sources:
•Dynamic nature of most biologic entities (differences in age, sex, race, or disease status)
•Temporal variation (sometimes predictable, such as the diurnal cycle of plasma cortisol)
Measurement error
2 different types:
•Random (chance error)
•Bias (systematic error)
Measurement error
Can arise from:
•The method (measuring instrument)
•Observer (the measurer)
Measurement error
We can talk about the variability between methods of making the measurement or between the observers
Repeated measurements by the same method or observer:
• Intramethod or intraobserver
Between two or more methods or observers:
• Intermethod or interobserver
Consequences of erroneous measurement
Individual:
•Makes no difference whether the error is systematic or random
Group:
•Variability in the absence of bias should not change the average group value
•However, it can have deleterious consequences when one is seeking associations or correlations between 2 measures (analytic bias)
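The group-level consequences can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original lecture: the distributions, the error sizes, and the helper `pearson` are all hypothetical choices made for illustration. Random error leaves the group mean roughly unchanged but weakens the correlation with a second measure; a constant bias shifts the mean but leaves the correlation intact.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical "true" values of one measure, and a second measure related to it.
true_x = [rng.gauss(100, 10) for _ in range(n)]
true_y = [x + rng.gauss(0, 5) for x in true_x]

# Random (chance) error: zero-mean noise added to each measurement of x.
noisy_x = [x + rng.gauss(0, 10) for x in true_x]
# Bias (systematic error): every measurement of x reads 5 units too high.
biased_x = [x + 5 for x in true_x]

mean_true = statistics.fmean(true_x)
mean_noisy = statistics.fmean(noisy_x)
mean_biased = statistics.fmean(biased_x)

r_true = pearson(true_x, true_y)
r_noisy = pearson(noisy_x, true_y)
r_biased = pearson(biased_x, true_y)

print(f"group mean: true={mean_true:.2f} noisy={mean_noisy:.2f} biased={mean_biased:.2f}")
print(f"correlation with y: true={r_true:.3f} noisy={r_noisy:.3f} biased={r_biased:.3f}")
```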
Regression toward the mean
•An individual measurement is subject to both biologic variation and measurement error
•An extremely high or low value obtained in an individual from a group is more likely to be in error than is an intermediate value
•On remeasurement, the tendency of an extreme value to become less extreme is greater than the tendency of an intermediate value to become more extreme
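Regression toward the mean can be reproduced with simulated data. The sketch below assumes a hypothetical measure (true values around 120 with measurement error of similar size); none of these numbers come from the lecture. Individuals selected for an extreme first measurement have, on average, a less extreme second measurement, even though nothing about them has changed.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 10000

# Hypothetical stable true values, each measured twice with independent random error.
true_vals = [rng.gauss(120, 10) for _ in range(n)]
first = [t + rng.gauss(0, 10) for t in true_vals]
second = [t + rng.gauss(0, 10) for t in true_vals]

# Select the individuals whose FIRST measurement falls in the top 10%.
cutoff = sorted(first)[int(0.9 * n)]
extreme = [i for i in range(n) if first[i] >= cutoff]

group_mean = statistics.fmean(first)
mean_first_extreme = statistics.fmean([first[i] for i in extreme])
mean_second_extreme = statistics.fmean([second[i] for i in extreme])

print(f"whole group, first measurement:   {group_mean:.1f}")
print(f"extreme group, first measurement: {mean_first_extreme:.1f}")
print(f"extreme group, second measurement: {mean_second_extreme:.1f}")
```

The second measurement of the extreme group sits between the first measurement and the group mean, which is why an uncontrolled before/after comparison in patients selected for extreme values can mimic a treatment effect.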
Classifications of measures
Functional classifications focus on:
• Purpose of application of the measures
Descriptive classifications focus on:
• Their scope
Methodological classifications focus on:
• Technical aspects
Functional classification
•Measures have discriminative, evaluative or predictive properties
•Choice of measure depends on the purpose(s) for which it will be used
Functional classification
Discriminative instrument:
Can discriminate between people with different levels of a particular attribute or disease
• For example:
•NYHA scale
•MRC dyspnea scale
MRC Dyspnea Scale
Grade 1 Breathless with strenuous exercise
Grade 2 Short of breath when hurrying on the level or walking up a slight hill
Grade 3 Walks slower than people of the same age on the level or stops for breath while walking at own pace on the level
Grade 4 Stops for breath after walking 100 yards
Grade 5 Too breathless to leave the house or breathless when dressing
(Severity axis: none to severe)
Functional classification
Predictive instrument:
•Can predict the probability of a clinical diagnosis (diagnostic test) or the likelihood of a future event (prognostic test)
[Figure: 5-year survival in COPD according to staging as defined by the ATS Guidelines (% predicted FEV1) and according to the level of dyspnea as evaluated by the MRC Dyspnea Scale. Nishimura K, et al. Chest 2002; 121: 1434-1440.]
Functional classification
Evaluative instrument:
Can measure change over time in the same person
•For example:
•Dyspnea subscale of the Chronic Respiratory Questionnaire (CRQ) (COPD disease-specific quality of life questionnaire)
Descriptive classification
•Large number of possible categories
•Can categorize instruments by:
•Content: domains of interest (dyspnea, fatigue, emotion)
•Generic or disease-specific
Questionnaires in COPD
General:
•Can be used in any population
•Allow cross-condition comparison
•Cover co-morbid conditions and effects of treatment
•Do not focus on HRQL/COPD
•May contain irrelevant items
•Insensitive to small changes
Disease-specific:
•Focus on relevant aspects of HRQL
•Greater sensitivity for disease changes
•Increased responsiveness
•No cross-condition comparisons
Methodological classification
•Large number of possible categories
•Can categorize by:
• Interviewer versus self-administered
•Objective versus subjective
Health measurements
Measurements may be based on:
•Laboratory or diagnostic tests (objective)
• Indicators in which the patient or the clinician makes a judgement (subjective)
Health measurements
Unfortunately subjective is also used in other ways:
•To indicate if the variable is observable or not
Examples:
•Objective indicator such as « The ability to climb stairs »
•Subjective indicators such as « pain or feelings »
Objective vs Subjective
Objective:
• More often continuous (lab data)
• Few categorical (vital status, sex and race)
Subjective:
• Greater potential for bias or variability on the part of the observer
• Many variables that are most important in caring for patients are « soft » and subjective
• For example: pain, mood, dyspnea, ability to work, HRQL
The example of CABG
Why is quality of life important in studies of CABG patients?
•Survival is better with surgery than with medical treatment for patients with left main and triple-vessel disease
•Survival is similar with either treatment in patients with less severe disease
CASS NEJM 1984; European cooperative study Lancet 1982.
Subjective vs Objective measurement
As Feinstein has emphasized:
The tendency of clinical investigators to focus on “objective” rather than “subjective” measurements can result in research that is both dehumanizing and irrelevant
Objective vs Subjective
Data traditionally considered objective or “hard” can be seen to have feet of softer clay
Example:
•X-ray or cytopathologic diagnoses have been shown to be subject to considerable intra- and interobserver variability
Subjective health measurements
May be grouped into 3 main categories:
• General feelings of well-being
• Symptoms of illness
• Adequacy of a person’s functioning
Subjective health measurements
Advantages:
• Amplify the data obtainable from morbidity and mortality statistics
• Give insights into matters of human concern such as pain, suffering, or depression
• Offer a systematic way to record the « voice of the patient »
• Do not require expensive or invasive procedures
Subjective health measurements
Disadvantages:
•Contrast sharply with the inherent reliability of mortality rates
•Seem more susceptible to bias
•Applying these measures to an entire population is more difficult or impossible
Subjective health measurements
The use of rating methods suitable for statistical analysis permits subjective health measurements to rival the quantitative strengths of the traditional “objective” indicators
Health measurements
Scientific basis:
•Subjective judgements as a valid approach to measurement derive from the field of psychophysics;
•Psychophysical principles were later incorporated into psychometrics from which most of the techniques used to develop subjective measurements of health have been derived
Psychometric properties
Definition:
•Psychometrics is the science of using standardized tests or scales to measure attributes of a person or object
Numerical estimates of health
Many scaling methods exist for:
•Translating « indicators » into numerical estimates of severity
•Once this is done, the estimates may be combined into an overall score, termed a « health index »
Psychometric properties
Criteria for a scoring system:
•Reliability
•Validity
•Responsiveness
•Minimal clinically important difference (MCID)
Reliability
Definition:
•The extent to which the same results are obtained when the measurement is repeated
Differences between repeated results may reflect either (temporal) variation or random measurement error
Reliability
Key Questions:
•Internal consistency
•Test-retest reliability (reproducibility)
Key Concern:
•Error
(error attenuates relationships between variables, and makes it more difficult to detect treatment effects)
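The two key questions can each be given a number. The sketch below is only an illustration: the scores are made up, and the choice of statistics is an assumption, since the slides do not name any. It uses Cronbach's alpha, a commonly used index of internal consistency, and a Pearson correlation between two administrations as a simple test-retest index. The helper names `pearson` and `cronbach_alpha` are hypothetical.

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` holds one list of item scores per respondent."""
    k = len(scores[0])                      # number of items in the scale
    items = list(zip(*scores))              # regroup the scores by item
    item_var = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)

# Made-up responses of 5 patients to a 3-item scale (internal consistency).
item_scores = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 3],
    [6, 5, 6],
    [4, 4, 4],
]
alpha = cronbach_alpha(item_scores)

# Made-up total scores of the same 5 stable patients on two occasions (test-retest).
test = [3, 5, 4, 6, 2]
retest = [3, 5, 5, 6, 2]
r_test_retest = pearson(test, retest)

print(f"Cronbach's alpha: {alpha:.2f}")
print(f"test-retest correlation: {r_test_retest:.2f}")
```

A correlation does not detect a systematic shift between the two occasions, so an intraclass correlation coefficient is often preferred for test-retest reliability; the Pearson coefficient is used here only to keep the sketch short.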
Validity
Definition:
•The extent to which the measurement corresponds to the « true » value (some accepted « gold standard »), or behaves as expected
Validity depends on minimizing measurement error caused by bias
Types of measurement validity
Content validity
Construct validity (convergent, discriminant)
Criterion validity (predictive, concurrent)
Cross-cultural validity
“Situational” validity
Content validity
Definition:
•The extent to which the items sampled for inclusion in the instrument adequately represent the domain of content (particular domain area) addressed by the instrument
Content validity
Key Questions:
•Theoretical foundation of the instrument
• Instrument development: primary sources of information, sources of items and scaling structure selection
•Rules applied for content validation: patient and/or clinician validation; scientific review
• Instrument is appropriate for the study under consideration
Content validity
Key concern:
•Without validity, an instrument has no meaning
Construct validity
Definition:
•The extent to which the instrument measures an abstract concept (construct) or attribute; evaluated by comparison with instruments measuring related constructs
•Convergent (come together, same concept) or discriminant with other instruments (truly measures something different from other instruments)
Criterion validity
Definition:
•Extent to which the instrument relates to an external criterion (criterion of practical value)
•Concurrent (able to correlate with a present criterion) or predictive (able to correlate with a future criterion)
Construct validity
It is important to understand that a direct test of the validity of an abstract concept such as impaired health due to disease is not possible
Construct validity
Key Questions:
•Factor structure of the measure consistent with expectations
•Scores from the instrument correlate with those of other instruments (measuring the same or related constructs)
•Score from the instrument independent of scores from instruments measuring dissimilar constructs
•Differentiate groups known to differ on the attribute being measured, e.g. on HRQL
Testing construct validity
•The most widely used method is the multitrait-multimethod matrix
•It involves testing a series of hypotheses concerning relationships between the new instrument and a range of reference measures of disease activity
Construct validity
Key concern:
•Without validity, an instrument has no meaning
Cross-cultural validity
Definition:
•The extent to which an instrument developed and tested in one cultural group is appropriate for, and behaves similarly in, another
Cross-cultural validity
Key Questions:
•Items appropriate for the culture under consideration
•Instrument translated culturally and linguistically
•Evidence of reliability and validity
“Situational” validity
Definition:
•The extent to which an instrument is appropriate for use in any given situation
“Situational” validity
Key Questions:
•Instrument should measure an appropriate outcome for the trial
•Instrument should be valid for the specific purpose of the trial
•Sufficiently reliable and responsive for this purpose
•Sample size sufficient to detect change in the outcome measure of interest
“Situational” validity
Key Issues:
•Validity can be situation specific; an instrument valid for one situation is not necessarily valid for another
•Failure to detect treatment effects may be a function of study design, rather than a limitation of the instrument
Responsiveness
Definition:
•The extent to which scores change with a given change in the condition or disease state
Key Questions:
• Instrument has been evaluated for responsiveness
• Effect sizes have been associated with the instrument in well-designed trials
Key concerns:
• The ability to track changes
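Responsiveness is often summarized with an effect size. The sketch below is illustrative and rests on assumptions: the scores are made up, and the two indices shown (mean change divided by the baseline standard deviation, and the standardized response mean, which divides by the standard deviation of the change) are common conventions rather than formulas given in the slides.

```python
import statistics

# Made-up scores for 6 patients before and after an intervention (higher = better).
baseline = [3.0, 3.5, 4.0, 2.5, 3.0, 4.5]
follow_up = [3.5, 4.5, 4.5, 3.5, 3.5, 5.0]

# Within-patient change scores.
change = [after - before for before, after in zip(baseline, follow_up)]
mean_change = statistics.fmean(change)

# Effect size: mean change relative to the spread of scores at baseline.
effect_size = mean_change / statistics.stdev(baseline)

# Standardized response mean: mean change relative to the spread of the change itself.
srm = mean_change / statistics.stdev(change)

print(f"mean change: {mean_change:.2f}")
print(f"effect size (baseline SD): {effect_size:.2f}")
print(f"standardized response mean: {srm:.2f}")
```

Both indices express change in standard deviation units, which makes instruments with different scoring ranges comparable in the same trial.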
MCID
Definition:
•The smallest difference that clinicians and patients would care about
Key Questions:
• Has the MCID been established?
• What was the method used?
Key concerns:
• The ability to detect true treatment effects
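One way to use an MCID is to compare each patient's change against it. The sketch below is a hypothetical illustration: the change scores are made up, and the threshold `mcid = 0.5` is an assumed value chosen for the example, not one established in these slides.

```python
import statistics

# Assumed threshold for illustration only; a real MCID must be established
# for the specific instrument by an accepted method.
mcid = 0.5

# Made-up change scores (follow-up minus baseline) for 6 patients.
change = [0.6, 0.2, 1.1, -0.1, 0.5, 0.4]

# A patient counts as a responder when the improvement reaches the MCID.
responders = sum(1 for c in change if c >= mcid)
proportion = responders / len(change)
mean_change = statistics.fmean(change)

print(f"mean change: {mean_change:.2f} (MCID = {mcid})")
print(f"responders: {responders}/{len(change)} ({proportion:.0%})")
```

In this made-up sample the mean change falls short of the assumed MCID although half of the patients reach it, which shows why reporting both the mean change and the proportion of responders can be informative.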
Benefits of Pulmonary Rehabilitation
[Figure: functional exercise capacity, 6-MWD (N=444); health status, CRQ dyspnea (N=519)]
Lacasse Y, et al. Cochrane Database Syst Rev 2002; 3:CD003793.
Key messages
Some simple criteria:
•The system must address a well-defined clinical phenomenon
•The scale has to have a clearly defined ranking in a hierarchical order (reasonable clinical or mathematical criteria)
•The different stages or categories have to be mutually exclusive
•The scale has to be adapted to the area of measurement where it will be applied
•Creating complex or composite scores such as quality of life requires one to address issues concerning the inner structure of a score
Key messages
Quote from McDowell and Newell:
•Ultimately the selection of a measurement contains an element of art and perhaps even luck; it is often prudent to apply more than one measurement whenever possible.
•This has the advantage of reinforcing the conclusions of the study when the results from ostensibly similar methods are in agreement, and it also serves to increase our general understanding of the comparability of the measurements we use.