Designing an Evaluation of the Effectiveness of NIH’s Extramural Loan Repayment Programs.
-
Upload
gerard-floyd -
Category
Documents
-
view
216 -
download
1
Transcript of Designing an Evaluation of the Effectiveness of NIH’s Extramural Loan Repayment Programs.
Designing an Evaluation of Designing an Evaluation of the Effectiveness of NIH’s the Effectiveness of NIH’s
Extramural Loan Repayment Extramural Loan Repayment ProgramsPrograms
2
Goals of Meeting
Review designReview design --research questions and conceptual framework--research questions and conceptual framework --choice of comparison group--choice of comparison group --data sources and outcome measures--data sources and outcome measures --methods --methods --possible options for timing and sample selection--possible options for timing and sample selection
Respond to commentsRespond to comments
Discuss proposed options and possible Discuss proposed options and possible modifications to optionsmodifications to options
3
Goals of LRPs and of Evaluation
To increase number of individuals conducting To increase number of individuals conducting research in certain fields, NIH implemented 5 research in certain fields, NIH implemented 5 extramural loan repayment programs (LRPs):extramural loan repayment programs (LRPs):
Clinical (began in 2002)Clinical (began in 2002) Clinical for those from disadvantaged backgrounds (2001)Clinical for those from disadvantaged backgrounds (2001) Pediatric (2003)Pediatric (2003) Health disparities (2001)Health disparities (2001) Contraception and infertility (1997)Contraception and infertility (1997)
Evaluation objective: assess whether programs are Evaluation objective: assess whether programs are achieving their goals of recruiting and retaining achieving their goals of recruiting and retaining researchers in these fieldsresearchers in these fields
4
Evaluation Research Questions
Do LRPs have a “recruitment effect” -- increase Do LRPs have a “recruitment effect” -- increase number of individuals who begin research careers in number of individuals who begin research careers in the designated LRP field?the designated LRP field?
Do LRPs have a “retention effect” -- increase length Do LRPs have a “retention effect” -- increase length
of time individuals conduct research in LRP field, or of time individuals conduct research in LRP field, or in in anyany biomedical field? biomedical field?
Do LRPs have a “productivity effect” -- make Do LRPs have a “productivity effect” -- make awardees more successful than they would have awardees more successful than they would have been without the program?been without the program?
5
Conceptual framework for how LRPs might affect outcomes
Extramural LRPs might affect:Extramural LRPs might affect:
Recruitment Recruitment into research field, if individuals know into research field, if individuals know about, and are motivated by, LRPs prior to choosing about, and are motivated by, LRPs prior to choosing to pursue research careerto pursue research career
Research retentionResearch retention in LRP field (or in any field) by in LRP field (or in any field) by relieving financial pressures that could otherwise relieving financial pressures that could otherwise cause individuals to leave research for higher-paying cause individuals to leave research for higher-paying positionspositions
Research productivityResearch productivity by enabling individuals to by enabling individuals to devote more time and focus to researchdevote more time and focus to research
6
Choosing a comparison group
To determine what would have happened to To determine what would have happened to extramural LRP awardees absent the extramural LRP awardees absent the program, we need a comparison group.program, we need a comparison group.
Should comparison group be “external” Should comparison group be “external” (outside the applicant pool) or “internal” (outside the applicant pool) or “internal” (from the applicant pool)?(from the applicant pool)?
7
Why external comparison group not feasible
Comparison group would need to be broadly defined because LRP Comparison group would need to be broadly defined because LRP applicants come from such a wide variety of backgrounds.applicants come from such a wide variety of backgrounds.
Recruitment might be measured by comparing all doctoral degree Recruitment might be measured by comparing all doctoral degree recipients who were barely eligible to those who were barely ineligible recipients who were barely eligible to those who were barely ineligible according to debt-to-salary ratio. But: according to debt-to-salary ratio. But: --For MDs, sample size needed would be enormous since the portion --For MDs, sample size needed would be enormous since the portion of MDs conducting research in particular field is so small.of MDs conducting research in particular field is so small.
--For PhDs, sample sizes in available data sources are not large --For PhDs, sample sizes in available data sources are not large enough to detect even maximum possible impact of the extramural enough to detect even maximum possible impact of the extramural LRPs.LRPs.
Retention might be measured with external comparison group, but Retention might be measured with external comparison group, but matching diverse backgrounds and work experiences of LRP matching diverse backgrounds and work experiences of LRP participants would be difficult.participants would be difficult.
8
Attractive Features/Possible Concerns of Internal Comparison Group
Attractive FeaturesAttractive Features
All applicants were interested in LRP and awardees / All applicants were interested in LRP and awardees / non-awardees have similar characteristics.non-awardees have similar characteristics.
Administrative data available for full sample. Administrative data available for full sample.
Possible ConcernsPossible Concerns
Selection Bias: Can we control for likelihood that funded Selection Bias: Can we control for likelihood that funded applicants are more promising researchers than non-applicants are more promising researchers than non-funded applicants?funded applicants?
Will sample sizes be large enough to detect program Will sample sizes be large enough to detect program effects?effects?
Could recruitment be measured, since all applicants Could recruitment be measured, since all applicants must have positions in field to be eligible?must have positions in field to be eligible?
9
Overcoming Selection Bias
If scoring process is known and measured, we can use statistical If scoring process is known and measured, we can use statistical models to obtain unbiased program effects. Regression models to obtain unbiased program effects. Regression discontinuity design can be used if score cut-off point or range is discontinuity design can be used if score cut-off point or range is used to make funding decisions.used to make funding decisions.
LRP scoring process is suitable for regression discontinuity design LRP scoring process is suitable for regression discontinuity design because:because:
Applicants are scored on the basis of their research potential Applicants are scored on the basis of their research potential according to standardized criteria.according to standardized criteria.
ICs seem to fund all those above a funding cut-off point or range. ICs seem to fund all those above a funding cut-off point or range. (Sometimes ICs go strictly by score in determining who to fund; (Sometimes ICs go strictly by score in determining who to fund; other times, ICs look at all scores close to “payline” and may other times, ICs look at all scores close to “payline” and may choose applicants with lower scores who are doing research in choose applicants with lower scores who are doing research in areas of particular interest.)areas of particular interest.)
10
Hypothetical Effect of Extramural LRP on Length of Time in Research Career
100 200 300 400 500
Application Scores
Mo
nth
s P
ers
iste
d in
Re
sea
rch
Po
st-L
RP
A
pp
lica
tion
Funding Cutoff
Regression Line
Program Effect
5 50
40
30
20
10
11
What size program effects could be detected with available sample sizes?
Sample sizes large enough to detect whether Sample sizes large enough to detect whether 5 LRPs collectively had effect of 10 5 LRPs collectively had effect of 10 percentage pointspercentage points
An effect of 15 percentage points could be An effect of 15 percentage points could be detected for certain subgroupsdetected for certain subgroups
Effects of 10 to 20 percentage points could Effects of 10 to 20 percentage points could be detected for the larger LRPs (and would be detected for the larger LRPs (and would be able to report outcomes for all applicants be able to report outcomes for all applicants in each LRP)in each LRP)
12
Measuring Recruitment Effects
Recruitment effect is difficult to measure Recruitment effect is difficult to measure through comparison to non-funded through comparison to non-funded applicants because they must have been applicants because they must have been funded in relevant field before applyingfunded in relevant field before applying
But, retrospective survey could gauge:But, retrospective survey could gauge:
-- whether applicants knew about LRP before -- whether applicants knew about LRP before taking research positiontaking research position
-- extent to which LRP influenced decision-- extent to which LRP influenced decision-- how they gauged chances of receiving award-- how they gauged chances of receiving award
13
Outcome Measures
Ideally, we could measure LRPs’ effect on whether individuals:Ideally, we could measure LRPs’ effect on whether individuals:
Conducted research in LRP field (and persistence)Conducted research in LRP field (and persistence) Conducted research in Conducted research in anyany field (and persistence) field (and persistence) Devoted > 50% of time to research in LRP field or any fieldDevoted > 50% of time to research in LRP field or any field Obtained an NIH R-01 grant Obtained an NIH R-01 grant Were PIs on NIH grant or any grantWere PIs on NIH grant or any grant Had NIH research funding or any research fundingHad NIH research funding or any research funding Applied for NIH funding Applied for NIH funding Had tenured academic positionHad tenured academic position Conducted research in nonprofit or government settingConducted research in nonprofit or government setting Were peer reviewers for NIH Were peer reviewers for NIH Were peer reviewers for journalsWere peer reviewers for journals Had publicationsHad publications Had their work cited Had their work cited
14
Data Sources
Applicant data from OLRSApplicant data from OLRS
Publications databases such as PubMedPublications databases such as PubMed
Funding databases, such as NIH’s IMPAC-IIFunding databases, such as NIH’s IMPAC-II
Proposed survey of past applicantsProposed survey of past applicants
15
Reasons for Proposed Survey
Critical information (e.g., field of research, non-NIH Critical information (e.g., field of research, non-NIH funding, being part of a research team) not readily funding, being part of a research team) not readily available from secondary sources. available from secondary sources.
Ability to track publications without name-matching.Ability to track publications without name-matching.
Data can be collected sooner (do not have to wait for Data can be collected sooner (do not have to wait for publication time delays).publication time delays).
Response bias can be gauged by comparing program Response bias can be gauged by comparing program effects on certain outcomes (such as whether PI on NIH effects on certain outcomes (such as whether PI on NIH grant) for full sample to sample of survey respondents.grant) for full sample to sample of survey respondents.
16
Methods
Regression discontinuity design will be used Regression discontinuity design will be used to obtain unbiased program effects.to obtain unbiased program effects.
Primary analysis would estimate combined Primary analysis would estimate combined effects for all LRPs pooled together.effects for all LRPs pooled together.
Model would control for differences between Model would control for differences between ICs and LRPs (such as scoring patterns or ICs and LRPs (such as scoring patterns or applicant characteristics).applicant characteristics).
17
Possible Subgroups Each of the larger LRPsEach of the larger LRPs
MDs vs. PhDsMDs vs. PhDs
Those who received NIH funding vs. those Those who received NIH funding vs. those who did notwho did not
Those who had higher vs. lower debt levelsThose who had higher vs. lower debt levels
Those who received their degree recently vs. Those who received their degree recently vs. longer agolonger ago
18
Options for Timing of Data Collection
Need to strike a balance between providing Need to strike a balance between providing information on research careers (which could information on research careers (which could take years) vs. providing timely information take years) vs. providing timely information to policymakers.to policymakers.
Propose measuring early outcomes 4 to 5 Propose measuring early outcomes 4 to 5 years from time of application.years from time of application.
Possibly measure long-term outcomes 7 to 9 Possibly measure long-term outcomes 7 to 9 years after application.years after application.
19
Sample Selection
Propose to include only 2003 and/or 2004 Propose to include only 2003 and/or 2004 cohorts since number of non-funded cohorts since number of non-funded applicants in 2001 and 2002 was so small.applicants in 2001 and 2002 was so small.
Sample size? Sample size? The larger the sample, the more likely we will The larger the sample, the more likely we will
detect program effects if they exist.detect program effects if they exist. Large sample is particularly important for Large sample is particularly important for
measuring effects of LRPs separately or for measuring effects of LRPs separately or for other subgroups.other subgroups.
But, collecting data on large sample will be But, collecting data on large sample will be more costly. more costly.
20
Option 1: Include all individuals from 2003 and 2004 cohorts
Able to detect smallest program impacts Able to detect smallest program impacts (about 9 percentage points for survey (about 9 percentage points for survey respondents)respondents)
Most costly because it has largest sampleMost costly because it has largest sample
Including 2004 pool means that data Including 2004 pool means that data collection, analysis would occur later than in collection, analysis would occur later than in Option 2Option 2
21
Option 2: Include only the 2003 cohort
Less costly than Option 1, involving half the Less costly than Option 1, involving half the samplesample
Data collection occurs a year earlier than Data collection occurs a year earlier than Option 1 or 3Option 1 or 3
Minimum detectable effects largest among 3 Minimum detectable effects largest among 3 options (10 to 13 percentage points) and options (10 to 13 percentage points) and reduces ability to measure subgroup impactsreduces ability to measure subgroup impacts
22
Option 3: Clinical LRP only
Middle of the three options in terms of Middle of the three options in terms of sample size and minimum detectable effects sample size and minimum detectable effects (9 to 11 percentage points)(9 to 11 percentage points)
Would only have results for one LRPWould only have results for one LRP
Data would be collected a year later than for Data would be collected a year later than for Option 2Option 2
23
Recommendations and Issues for Consideration
If OLRS desires separate estimates for large LRPs If OLRS desires separate estimates for large LRPs and for subgroups, implement Option 1and for subgroups, implement Option 1
If subgroups not a priority and/or if timeliness is a If subgroups not a priority and/or if timeliness is a priority, implement Option 2priority, implement Option 2
Option 3 is suitable if OLRS wants to detect Option 3 is suitable if OLRS wants to detect relatively small program impacts but is concerned relatively small program impacts but is concerned about cost of surveying full sample from all LRPsabout cost of surveying full sample from all LRPs
OLRS needs to consider how small an effect it needs OLRS needs to consider how small an effect it needs to be able to detect. (Would 7-percent effect be so to be able to detect. (Would 7-percent effect be so small that the program would not be considered small that the program would not be considered cost-effective?)cost-effective?)