Designing an Evaluation of the Effectiveness of NIH’s Extramural Loan Repayment Programs.



Goals of Meeting

Review design
-- research questions and conceptual framework
-- choice of comparison group
-- data sources and outcome measures
-- methods
-- possible options for timing and sample selection

Respond to comments

Discuss proposed options and possible modifications to options


Goals of LRPs and of Evaluation

To increase the number of individuals conducting research in certain fields, NIH implemented 5 extramural loan repayment programs (LRPs):
-- Clinical (began in 2002)
-- Clinical for those from disadvantaged backgrounds (2001)
-- Pediatric (2003)
-- Health disparities (2001)
-- Contraception and infertility (1997)

Evaluation objective: assess whether programs are achieving their goals of recruiting and retaining researchers in these fields


Evaluation Research Questions

Do LRPs have a “recruitment effect” -- increase number of individuals who begin research careers in the designated LRP field?

Do LRPs have a “retention effect” -- increase length of time individuals conduct research in LRP field, or in any biomedical field?

Do LRPs have a “productivity effect” -- make awardees more successful than they would have been without the program?


Conceptual framework for how LRPs might affect outcomes

Extramural LRPs might affect:

Recruitment into research field, if individuals know about, and are motivated by, LRPs prior to choosing to pursue research career

Research retention in LRP field (or in any field) by relieving financial pressures that could otherwise cause individuals to leave research for higher-paying positions

Research productivity by enabling individuals to devote more time and focus to research


Choosing a comparison group

To determine what would have happened to extramural LRP awardees absent the program, we need a comparison group.

Should comparison group be “external” (outside the applicant pool) or “internal” (from the applicant pool)?


Why external comparison group not feasible

Comparison group would need to be broadly defined because LRP applicants come from such a wide variety of backgrounds.

Recruitment might be measured by comparing all doctoral degree recipients who were barely eligible to those who were barely ineligible according to debt-to-salary ratio. But:
-- For MDs, sample size needed would be enormous since the portion of MDs conducting research in particular field is so small.
-- For PhDs, sample sizes in available data sources are not large enough to detect even maximum possible impact of the extramural LRPs.

Retention might be measured with external comparison group, but matching diverse backgrounds and work experiences of LRP participants would be difficult.


Attractive Features/Possible Concerns of Internal Comparison Group

Attractive Features
-- All applicants were interested in LRP and awardees / non-awardees have similar characteristics.
-- Administrative data available for full sample.

Possible Concerns
-- Selection Bias: Can we control for likelihood that funded applicants are more promising researchers than non-funded applicants?
-- Will sample sizes be large enough to detect program effects?
-- Could recruitment be measured, since all applicants must have positions in field to be eligible?


Overcoming Selection Bias

If scoring process is known and measured, we can use statistical models to obtain unbiased program effects. Regression discontinuity design can be used if score cut-off point or range is used to make funding decisions.

LRP scoring process is suitable for regression discontinuity design because:
-- Applicants are scored on the basis of their research potential according to standardized criteria.
-- ICs seem to fund all those above a funding cut-off point or range. (Sometimes ICs go strictly by score in determining whom to fund; other times, ICs look at all scores close to “payline” and may choose applicants with lower scores who are doing research in areas of particular interest.)
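
The regression discontinuity idea above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation's actual model. It assumes a sharp cutoff (every applicant at or above the payline is funded, following the slide's wording), a straight-line relationship between score and outcome on each side of the cutoff, and simulated data. The function name `rd_effect`, the cutoff of 300, and the 8-month jump are all hypothetical.

```python
import numpy as np

def rd_effect(scores, outcomes, cutoff):
    """Estimate the jump in outcomes at the funding cutoff (sharp RD sketch).

    Assumes applicants with scores at or above `cutoff` were funded, and
    fits a separate straight line on each side of the cutoff.
    """
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    centered = scores - cutoff               # distance from the cutoff
    funded = (centered >= 0).astype(float)   # 1 if funded, 0 otherwise
    # Columns: intercept, funded indicator, slope, change in slope if funded
    design = np.column_stack([np.ones_like(centered), funded,
                              centered, funded * centered])
    coefs, *_ = np.linalg.lstsq(design, outcomes, rcond=None)
    return coefs[1]  # coefficient on `funded` is the estimated jump

# Simulated applicants (invented values): months in research rise with
# score, plus an 8-month jump for funded applicants.
rng = np.random.default_rng(0)
scores = rng.uniform(100, 500, size=2000)
funded = scores >= 300
months = 20 + 0.05 * (scores - 300) + 8 * funded + rng.normal(0, 5, size=2000)
print(round(rd_effect(scores, months, cutoff=300), 1))
```

Where ICs fund some applicants below the payline, the cutoff is no longer sharp, and this sketch would need to be replaced by a variant that models the probability of funding near the cutoff.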


Hypothetical Effect of Extramural LRP on Length of Time in Research Career

[Figure: months persisted in research post-LRP application (vertical axis) plotted against application scores (horizontal axis, 100 to 500), with the regression line, the funding cutoff, and the program effect labeled.]


What size program effects could be detected with available sample sizes?

Sample sizes large enough to detect whether 5 LRPs collectively had effect of 10 percentage points

An effect of 15 percentage points could be detected for certain subgroups

Effects of 10 to 20 percentage points could be detected for the larger LRPs (and would be able to report outcomes for all applicants in each LRP)
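
To show how detectable effect sizes of this kind relate to sample size, here is a rough sketch of a minimum detectable effect (MDE) calculation. It assumes a simple two-sided comparison of two independent proportions under the normal approximation, with 5 percent significance and 80 percent power. The function name, the defaults, and the group sizes are illustrative assumptions, not the LRP applicant counts or the calculation behind the figures above.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(n_funded, n_nonfunded, base_rate=0.5,
                              alpha=0.05, power=0.80):
    """Smallest difference in proportions detectable with a two-sided test.

    Uses the normal approximation for two independent proportions.
    base_rate=0.5 is the most conservative choice (largest variance).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    std_err = sqrt(base_rate * (1 - base_rate)
                   * (1 / n_funded + 1 / n_nonfunded))
    return (z_alpha + z_power) * std_err

# Hypothetical group sizes, chosen only to show the pattern.
for n in (200, 400, 800):
    mde = minimum_detectable_effect(n, n)
    print(f"{n} per group: about {100 * mde:.1f} percentage points")
```

A regression discontinuity analysis generally needs a larger sample than this simple comparison to reach the same precision, because funding status is strongly correlated with score. The sketch ignores that adjustment.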


Measuring Recruitment Effects

Recruitment effect is difficult to measure through comparison to non-funded applicants because they must have been funded in relevant field before applying

But, retrospective survey could gauge:
-- whether applicants knew about LRP before taking research position
-- extent to which LRP influenced decision
-- how they gauged chances of receiving award


Outcome Measures

Ideally, we could measure LRPs’ effect on whether individuals:
-- Conducted research in LRP field (and persistence)
-- Conducted research in any field (and persistence)
-- Devoted > 50% of time to research in LRP field or any field
-- Obtained an NIH R-01 grant
-- Were PIs on NIH grant or any grant
-- Had NIH research funding or any research funding
-- Applied for NIH funding
-- Had tenured academic position
-- Conducted research in nonprofit or government setting
-- Were peer reviewers for NIH
-- Were peer reviewers for journals
-- Had publications
-- Had their work cited


Data Sources

Applicant data from OLRS

Publications databases such as PubMed

Funding databases, such as NIH’s IMPAC-II

Proposed survey of past applicants


Reasons for Proposed Survey

Critical information (e.g., field of research, non-NIH funding, being part of a research team) not readily available from secondary sources.

Ability to track publications without name-matching.

Data can be collected sooner (do not have to wait for publication time delays).

Response bias can be gauged by comparing program effects on certain outcomes (such as whether PI on NIH grant) for full sample to sample of survey respondents.
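
The response-bias check in the last point can be illustrated with a small sketch. It assumes an outcome that is known from administrative data for every applicant (here, a hypothetical flag `pi_on_nih_grant`) and uses a raw difference in rates between funded and non-funded applicants as a stand-in for the program effect, which the full analysis would estimate by regression discontinuity instead. The field names and the records are invented for illustration.

```python
def funding_gap(records, outcome="pi_on_nih_grant"):
    """Difference in outcome rates: funded minus non-funded applicants."""
    def rate(group):
        return sum(r[outcome] for r in group) / len(group)
    funded = [r for r in records if r["funded"]]
    nonfunded = [r for r in records if not r["funded"]]
    return rate(funded) - rate(nonfunded)

def response_bias_check(records):
    """Return the gap for the full sample and for survey respondents only."""
    respondents = [r for r in records if r["responded"]]
    return funding_gap(records), funding_gap(respondents)

# Invented records: (funded, responded, pi_on_nih_grant)
rows = [(True, True, True), (True, True, True), (True, False, True),
        (True, True, False), (False, True, True), (False, False, False),
        (False, True, False), (False, False, False)]
applicants = [{"funded": f, "responded": s, "pi_on_nih_grant": p}
              for f, s, p in rows]

full_gap, respondent_gap = response_bias_check(applicants)
print(f"full sample: {full_gap:.2f}, respondents: {respondent_gap:.2f}")
```

A large difference between the two gaps would suggest that respondents are not representative of the full applicant pool on this outcome.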


Methods

Regression discontinuity design will be used to obtain unbiased program effects.

Primary analysis would estimate combined effects for all LRPs pooled together.

Model would control for differences between ICs and LRPs (such as scoring patterns or applicant characteristics).
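
One way to picture a pooled model with program controls is sketched below. It is an assumption-laden illustration, not the proposed specification: scores are centered on each applicant's own program cutoff, each program gets its own intercept, and a single slope and a single funding effect are shared across programs. The function name, the program labels, and the cutoffs are hypothetical.

```python
import numpy as np

def pooled_rd_effect(scores, outcomes, cutoffs, programs):
    """Pooled sharp-RD sketch with a separate intercept per program.

    `cutoffs` holds each applicant's own program cutoff, so scores are
    centered program by program before pooling.
    """
    centered = np.asarray(scores, dtype=float) - np.asarray(cutoffs, dtype=float)
    funded = (centered >= 0).astype(float)
    programs = np.asarray(programs)
    labels = np.unique(programs)
    # One indicator column per program (these replace a single intercept)
    dummies = (programs[:, None] == labels[None, :]).astype(float)
    design = np.column_stack([funded, centered, dummies])
    coefs, *_ = np.linalg.lstsq(design, np.asarray(outcomes, dtype=float),
                                rcond=None)
    return coefs[0]  # shared coefficient on `funded`

# Invented example: two programs with different cutoffs and baselines.
scores = np.tile(np.linspace(100, 500, 41), 2)
programs = np.repeat(["clinical", "pediatric"], 41)
cutoffs = np.where(programs == "clinical", 300.0, 250.0)
baseline = np.where(programs == "clinical", 20.0, 30.0)
months = baseline + 0.05 * (scores - cutoffs) + 8 * (scores >= cutoffs)
print(round(pooled_rd_effect(scores, months, cutoffs, programs), 1))
```

Controls for IC or applicant characteristics could be added as further columns in the same design matrix.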


Possible Subgroups

Each of the larger LRPs

MDs vs. PhDs

Those who received NIH funding vs. those who did not

Those who had higher vs. lower debt levels

Those who received their degree recently vs. longer ago


Options for Timing of Data Collection

Need to strike a balance between providing information on research careers (which could take years) vs. providing timely information to policymakers.

Propose measuring early outcomes 4 to 5 years from time of application.

Possibly measure long-term outcomes 7 to 9 years after application.


Sample Selection

Propose to include only 2003 and/or 2004 cohorts since number of non-funded applicants in 2001 and 2002 was so small.

Sample size?
-- The larger the sample, the more likely we will detect program effects if they exist.
-- Large sample is particularly important for measuring effects of LRPs separately or for other subgroups.
-- But, collecting data on large sample will be more costly.


Option 1: Include all individuals from 2003 and 2004 cohorts

Able to detect smallest program impacts (about 9 percentage points for survey respondents)

Most costly because it has largest sample

Including 2004 pool means that data collection, analysis would occur later than in Option 2


Option 2: Include only the 2003 cohort

Less costly than Option 1, involving half the sample

Data collection occurs a year earlier than Option 1 or 3

Minimum detectable effects largest among 3 options (10 to 13 percentage points) and reduces ability to measure subgroup impacts
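
The link between halving the sample and a larger minimum detectable effect (MDE) can be made explicit. As a rough sketch, assuming the MDE is proportional to the standard error of the estimate and that the funded and non-funded groups shrink in the same proportion:

```latex
\mathrm{MDE} \propto \frac{1}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{\mathrm{MDE}(n/2)}{\mathrm{MDE}(n)} = \sqrt{2} \approx 1.41
```

Applied to the roughly 9 percentage points quoted for Option 1, this gives 9 × 1.41 ≈ 12.7 percentage points, which falls within the 10 to 13 point range stated for this option.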


Option 3: Clinical LRP only

Middle of the three options in terms of sample size and minimum detectable effects (9 to 11 percentage points)

Would only have results for one LRP

Data would be collected a year later than for Option 2


Recommendations and Issues for Consideration

If OLRS desires separate estimates for large LRPs and for subgroups, implement Option 1

If subgroups not a priority and/or if timeliness is a priority, implement Option 2

Option 3 is suitable if OLRS wants to detect relatively small program impacts but is concerned about cost of surveying full sample from all LRPs

OLRS needs to consider how small an effect it needs to be able to detect. (Would 7-percent effect be so small that the program would not be considered cost-effective?)
