Fully Characterizing the Type 1 R01 Prevention Research Portfolio Myles et... · 2017-06-16 ·...

Fully Characterizing the Type 1 R01 Prevention Research Portfolio

Background
The Research, Condition, and Disease Categorization (RCDC) system has been used to classify NIH grants as prevention research, but the sensitivity and specificity of RCDC's classifications are unknown. In addition, RCDC does not capture detailed characteristics of prevention research (e.g., research method or population). The NIH Office of Disease Prevention (ODP) developed a more comprehensive method to fully characterize the prevention research portfolio and, with a newly developed machine learning algorithm, to assess the accuracy of RCDC.

Taxonomy for Prevention Research
A detailed protocol provides instructions for classifying grants, including definitions of terms, topics, and examples. The coding process pairs a prevention research taxonomy for coding grant abstracts with the Prevention Abstract Classification Tool (PACT), custom software designed to capture both individual and group consensus coding.
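The poster does not describe PACT's internal data model. As a purely hypothetical sketch of what capturing individual and group consensus coding might involve, the Python class below stores each coder's "select all that apply" choices per category alongside the team's consensus, and lists the topics on which coders disagree so the team knows what to discuss. The class, method, category, and grant names are illustrative assumptions, not PACT's actual design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AbstractCoding:
    """Hypothetical record for one abstract: each coder's picks plus the team consensus."""

    grant_id: str
    # coder name -> category -> set of selected topics ("select all that apply")
    individual: dict[str, dict[str, set[str]]] = field(default_factory=dict)
    # category -> set of topics the team agreed on after discussion
    consensus: dict[str, set[str]] = field(default_factory=dict)

    def record(self, coder: str, category: str, topics: set[str]) -> None:
        """Store one coder's selections for one category."""
        self.individual.setdefault(coder, {})[category] = set(topics)

    def disagreements(self) -> dict[str, set[str]]:
        """Per category, topics chosen by some coders but not all of them."""
        categories = {c for picks in self.individual.values() for c in picks}
        contested = {}
        for category in categories:
            # A coder who left a category blank counts as selecting nothing there.
            selections = [picks.get(category, set()) for picks in self.individual.values()]
            chosen_by_any = set().union(*selections)
            chosen_by_all = set.intersection(*selections)
            if chosen_by_any - chosen_by_all:
                contested[category] = chosen_by_any - chosen_by_all
        return contested


if __name__ == "__main__":
    coding = AbstractCoding("hypothetical-grant-001")
    coding.record("coder1", "study design", {"Observational study", "Analysis of existing data"})
    coding.record("coder2", "study design", {"Observational study"})
    coding.record("coder3", "study design", {"Observational study", "Analysis of existing data"})
    print(coding.disagreements())  # {'study design': {'Analysis of existing data'}}
    # After discussion, the team records its consensus code.
    coding.consensus["study design"] = {"Observational study", "Analysis of existing data"}
```

Keeping individual and consensus codes in one record means both remain available later, for example to compute inter-rater reliability on the individual codes while analyses use the consensus codes.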

Ranell L. Myles, PhD, MPH, CHES; Jennifer Villani, PhD, MPH; Jocelyn A. Lee, PhD, MPH; Ashley J. Vargas, PhD, MPH, RDN; Jessica Y. Wu, PhD; Patricia L. Mabry, PhD; David M. Murray, PhD; Sheri D. Schully, PhD

Workflow:
1. ODP development and refinement of the taxonomy and protocol
2. Manual coding by contractor via PACT software
3. Quality control (QC) and kappa statistics by ODP staff
4. Gold standards for machine learning algorithms at the Office of Portfolio Analysis (OPA)
5. Refinement of the machine learning algorithm
6. Run the algorithm to determine sensitivity and specificity and retrieve additional prevention grants

Strategic Priority I Objectives
1. Establish a taxonomy for prevention research that ODP can apply to analyze the NIH prevention research portfolio.
2. Develop, test, and implement portfolio analysis tools to classify NIH awards based on the taxonomy for prevention research.
3. Develop and implement a process to regularly assess the progress and results of NIH investments in prevention research.

Methods
Grant Selection: All NIH Type 1 R01 grant abstracts awarded during FY12-15 that RCDC classified as prevention (RCDC+), a 5% random sample of those RCDC classified as non-prevention (RCDC-), and an additional set of abstracts identified through machine learning (N=5,467) were coded by ODP's contract staff using a team-based consensus approach. The coding identified each study's focus, population focus, study design, and type of prevention research.
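A 5% random sample like the one described above could be drawn as in the following sketch. It assumes sampling is done separately within each fiscal year (the per-year sample sizes in the Results table suggest, but do not confirm, this) and uses a fixed seed so the draw can be reproduced. The function name, grant IDs, and seed are hypothetical.

```python
import random
from collections import defaultdict


def sample_within_fiscal_years(grants, fraction=0.05, seed=2017):
    """Draw a simple random sample of `fraction` of grants within each fiscal year.

    grants: iterable of (grant_id, fiscal_year) pairs, e.g. all RCDC- awards.
    Returns {fiscal_year: sorted list of sampled grant_ids}.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed -> the same draw every run
    by_year = defaultdict(list)
    for grant_id, fiscal_year in grants:
        by_year[fiscal_year].append(grant_id)
    sample = {}
    for fiscal_year in sorted(by_year):
        ids = sorted(by_year[fiscal_year])      # sort so the draw ignores input order
        k = max(1, round(fraction * len(ids)))  # nearest whole number, at least one grant
        sample[fiscal_year] = sorted(rng.sample(ids, k))
    return sample


if __name__ == "__main__":
    # Hypothetical IDs: 100 RCDC- grants in each of FY2012-FY2015.
    grants = [(f"hypothetical-{i:04d}", 2012 + i % 4) for i in range(400)]
    for fy, ids in sample_within_fiscal_years(grants).items():
        print(fy, len(ids))
```

Recording the seed alongside the sample lets a reviewer regenerate exactly the same RCDC- subset later.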

Team-Based Coding: Contracted Research Analysts (RAs) and ODP staff were trained in the team-based coding process. RAs coded grants according to the taxonomy and protocol, and ODP staff also coded 10% of those grants for quality control. To ensure high-quality data, PACT is integrated with SAS to calculate inter-rater reliability (kappa statistics) among coders and between coders and ODP staff.
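As a rough illustration of the reliability statistic, the following Python sketch computes Cohen's kappa for two raters (for example, an RA and an ODP staff member) coding the same abstracts. It assumes a two-rater Cohen's kappa applied to one yes/no topic at a time. The poster does not specify which kappa variant PACT and SAS compute, and the labels below are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items both raters labeled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each rater's label counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical yes/no decisions on a single topic for ten abstracts.
    ra = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
    staff = ["yes", "yes", "no", "no", "no", "no", "yes", "yes", "no", "yes"]
    print(round(cohens_kappa(ra, staff), 3))
```

Here the raters agree on 9 of 10 abstracts (observed agreement 0.9), and chance agreement is 0.5, so the example prints 0.8. Even 90% raw agreement translates into a kappa only at the 0.8 level reported in the Results.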

Analysis: All RCDC+ grants and a 5% random sample of RCDC- grants were used to estimate how often RCDC misses prevention grants (false negatives). Results from the 5% sample were extrapolated to the entire RCDC- category. The sensitivity and specificity of RCDC for identifying prevention grants were then calculated from the observed true positives and false positives and the extrapolated true negatives and false negatives.
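The extrapolation and calculation described above can be sketched in Python. This is a minimal illustration, not ODP's SAS code. It assumes true positives are RCDC+ grants that manual coding judged to be prevention, false positives are RCDC+ grants judged non-prevention, and each sampled RCDC- grant is weighted by the RCDC- population size divided by the sample size (the poster does not state the exact weighting). The function name and all counts are hypothetical.

```python
def rcdc_sensitivity_specificity(tp, fp, fn_sample, tn_sample, n_rcdc_neg):
    """Sensitivity and specificity of RCDC against manual coding.

    tp, fp: RCDC+ grants manually coded as prevention / non-prevention
            (every RCDC+ grant is coded, so these counts are observed).
    fn_sample, tn_sample: sampled RCDC- grants coded as prevention / non-prevention.
    n_rcdc_neg: total number of RCDC- grants the sample was drawn from.
    """
    n_sample = fn_sample + tn_sample
    if n_sample == 0 or n_rcdc_neg < n_sample:
        raise ValueError("sample must be non-empty and no larger than the RCDC- population")
    weight = n_rcdc_neg / n_sample  # each sampled grant stands in for this many RCDC- grants
    fn = fn_sample * weight         # extrapolated false negatives
    tn = tn_sample * weight         # extrapolated true negatives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, not ODP data: a 100-grant sample from 2,000 RCDC- grants.
    sens, spec = rcdc_sensitivity_specificity(
        tp=400, fp=100, fn_sample=10, tn_sample=90, n_rcdc_neg=2000
    )
    print(round(sens, 3), round(spec, 3))
```

With these counts each sampled RCDC- grant represents 20 grants, so 10 sampled false negatives become 200, and the example prints `0.667 0.947`. Weighting by the realized population-to-sample ratio instead of a nominal 1/0.05 keeps the estimate consistent when a year's sample is not exactly 5%.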

Results
• Average kappa scores exceeded 0.8 each month.
• We estimate that 92% of all prevention Type 1 R01 grants from FY12-15 have been identified and coded with the help of the machine learning algorithm.
• The RCDC system underestimates the number of prevention grants at NIH (overall sensitivity of 57.2% for FY12-15).
• Prevention research represents ~17% of NIH Type 1 R01 awards and ~21% of Type 1 R01 dollars.

Sensitivity and Specificity (Type 1 R01s, RCDC-Identified Grants Only):

FY     Coded RCDC+ (100%)  Coded RCDC- (5%)  Total Coded  ODP Prevention  Sensitivity  Specificity
2012   783                 148               931          414             61.0%        87.4%
2013   738                 137               875          410             56.4%        87.0%
2014   825                 142               967          464             57.6%        86.5%
2015   881                 160               1041         450             54.1%        85.8%
Total  3227                587               3814         1738            57.2%        86.7%

Projected Number of Awards and Support (All Type 1 R01s, RCDC and Machine Learning):

FY     Awards  % of NIH Awards  Prevention $ (millions)  % of NIH $
2012   652     17.6%            $342                     21.4%
2013   582     17.4%            $299                     21.0%
2014   606     17.0%            $352                     22.0%
2015   624     15.8%            $360                     20.3%
Total  2,464   16.9%            $1,352                   21.2%

Study Rationale (FY12-15 Type 1 R01s, RCDC and Machine Learning):

Rationale                       Coded Awards  % Prevention
Mortality                       668           27.1%
Infectious Disease              461           18.7%
Cancer                          345           14.0%
Maternal/Paternal Child Health  344           14.0%
Other or Unclear                331           13.4%

Study Design and Type of Prevention (FY12-15 Type 1 R01s, RCDC and Machine Learning):

STUDY DESIGN
Observational study                         60.1%
Analysis of existing data                   43.1%
Randomized intervention study               27.4%
Methods research                            22.2%
Non-randomized intervention study            7.8%
Pilot/feasibility/proof-of-concept/safety    6.6%
Other or unclear                             3.0%

TYPE OF PREVENTION
Disease prevention/health promotion         69.5%
Methods research                            20.8%
Prevention of progression or recurrence     19.2%
Screening for early disease                  7.2%
Screening for risk factor                    0.9%

Coders selected all categories that apply, so the percentages within each panel sum to more than 100%.

Conclusions
This analysis is our first report based on manual coding of all RCDC-identified Type 1 R01 prevention awards, a 5% random sample of RCDC non-prevention awards, and the awards identified through machine learning. It is the most detailed and carefully validated analysis of the prevention portfolio ever conducted at NIH. The most commonly selected study rationale topics include many of the health conditions and risk behaviors expected in prevention research, such as infectious disease and cancer. A high proportion of studies used observational designs (60.1%) or analyzed existing (secondary) data (43.1%). Unsurprisingly, primary prevention was the most common type of prevention research in the portfolio.

Next Steps
ODP will continue working with the Office of Portfolio Analysis (OPA) to apply machine learning to other grant mechanisms (e.g., 2R01, R03, and R21) to gain a deeper understanding of the prevention research funded by NIH.

Coding Process
1. Read the entire grant abstract, including the Public Health Relevance section.
2. Select the appropriate topics for each category A–F, selecting all that apply.
3. Discuss the coding as a team of 3 and develop a consensus code for the abstract.