Be an All-Star Manuscript Reviewer!
Charles E. Kahn, Jr., MD, MS
Editor, Radiology: Artificial Intelligence
Conflict of Interest
• Nothing to disclose
Learning Objectives
• Understand the manuscript review process
• Describe the most important criteria to evaluate a manuscript
• Define special considerations when reviewing work about AI in
radiology
Overview
• Process
▫ Timeliness
• Content
▫ Guidelines + standards
▫ Critical thinking
• Ethics
▫ The Golden Rule
Review Timeline
• Journal Office (1-3 days)
▫ Complete manuscript?
▫ IRB/ethics statement?
• Deputy Editor (5 days)
▫ Appropriate? Novel? Good enough to review?
▫ Assign reviewers
• Reviewers (14 days)
▫ Important? Scientifically valid?
• Deputy Editor (7 days)
▫ Integrate reviews
▫ Recommend
• Editor (7 days)
▫ Make decision: Accept, Revise, or Reject
• Time to First Decision: 31 – 37 days
Time is Key
• Respond promptly!
▫ Quick “No” better than Slow “Yes”
• Provide dates you’re unavailable
• Honor the deadline
Your Review
• Strengths
▫ Timeliness of topic, novelty, size of study
• Weaknesses
▫ Scientific concerns, lack of generalizability
• Specific comments
▫ Page + line numbers
Focus Your Review
• Give constructive comments
▫ Seek to improve the work
• Help the editors
▫ Weaknesses
▫ Scientific quality
▫ Priority
“Would I want to read this article in this journal?”
The Golden Rule of Reviewing
• “Do unto others as you would have them do unto you”
▫ Be constructive
Try to improve the work
▫ Maintain confidentiality
Your “sneak peek” at another’s work is a privilege!
Don’t quote or share the content
The Rewards
• Reviewer recognition
▫ Journal appreciation
▫ Publons
• Editorial Board appointment
• Honor + Glory
Before You Begin…
• Correct manuscript category?
• Any potential conflict or bias?
• Important problem?
• Previously published?
The Review
• Abstract
• Introduction
• Methods
• Results
• Discussion
• Figures
• Tables
• References
• Summary
The Abstract
• Summarizes the manuscript appropriately
▫ Watch for discrepancies between the Abstract and the rest of the manuscript
• Stands alone
▫ Understandable without reading the manuscript
The Introduction
• Concise
• Clearly defines the purpose of the study
• Explains why the study is important
• Defines terms
• Well-defined and testable hypothesis
The Methods Section
• Reproducible methods
• Justified study design
▫ Choice of model
▫ Cohort size
• Methods test the hypothesis
The Results Section
• Clearly explained
• Order of results parallels the methods
• Reasonable? Unexpected?
• Any results not tied to methods?
The Discussion Section
• Summarize results
• Place results in context
▫ Relate to prior literature
▫ Indicate impact on the field
• Describe limitations
• Envision future work
The Discussion Section
• Concise
• Hypothesis verified?
▫ Research question answered?
• Unexpected results
• Limitations
• Next steps
Figures
• Necessary
• Understandable
• Appropriate
Hypothesis
• Quantitative vs. qualitative
• Retrospective vs. prospective
• Feasibility vs. performance
Data
• De-ID
• Data protection
• Ethics
• Preparation
• Augmentation
• Partitions
• Ground truth
Model
• Architecture
• Software
• Initialization
• Pretraining / transfer learning
• Hyperparameters
• Training rules
Evaluation
• Metrics
• Sensitivity analysis
• External testing
• Statistical analysis
Data
• Where did the data come from?
• How were variables defined?
▫ Common Data Elements (RadElement.org), where applicable
• Inclusion / exclusion criteria
• How was the quantity of data to be used determined?
• How well do the training data match the intended clinical use?
Trouble in Paradise
• Of 516 eligible published studies, only 6% (31 studies)
performed external validation
• None of the 31 studies adopted all three design features:
▫ Diagnostic cohort design
▫ Inclusion of multiple institutions
▫ Prospective data collection for external validation
Kim DW, et al. Korean J Radiol. 2019;20:405-10
doi.org/10.1016/j.jacr.2019.06.009
ACR “TOUCH-AI”
acrdsi.org/DSI-Services/Define-AI/Use-Cases/Acute-Appendicitis
radelement.org/element/RDE195
De-Identification
• Images
▫ DICOM header
Conventional + “private” fields
▫ Image data
Jewelry, implant IDs, burned-in labels, facial views
• Reports
▫ Patient + provider names
“Hide in Plain Sight” and others
▫ Dates
Parks CL, Monson KL. J Digit Imaging 2017; 30:204
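A minimal sketch of header de-identification with pydicom, as a concrete illustration of the checks above; the tag list is an illustrative assumption and is far from exhaustive, and burned-in pixel data and report text require separate handling.

import pydicom

# Illustrative (NOT exhaustive) list of conventional header fields that carry PHI
PHI_TAGS = ["PatientName", "PatientID", "PatientBirthDate",
            "ReferringPhysicianName", "InstitutionName", "AccessionNumber"]

def deidentify(path_in, path_out):
    ds = pydicom.dcmread(path_in)
    for tag in PHI_TAGS:
        if hasattr(ds, tag):
            setattr(ds, tag, "")       # blank conventional identifiers
    ds.remove_private_tags()           # drop vendor "private" fields
    ds.save_as(path_out)               # pixel data (burned-in labels) still needs review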
“Ground Truth”
• Well defined
• Who (or what) annotated the data?
▫ Qualifications / training
▫ Instructions
▫ How was inter-rater variability
measured?
▫ Was intra-rater variability assessed?
• Single vs. multiple annotation
▫ Blinding
▫ Adjudication of discrepancies
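When reviewing how inter-rater variability was measured, a commonly reported statistic is Cohen's kappa; a minimal scikit-learn sketch using hypothetical annotations from two raters:

from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical binary annotations, rater A
rater_b = [1, 0, 1, 0, 0, 1, 0, 1]   # hypothetical binary annotations, rater B

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement; 0 = chance-level agreement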
Data Preparation
• “Data wrangling”
▫ Specific software, version number, specified options
• Normalization
• Resampling
▫ Image matrix size
E.g., 512 × 512 → 224 × 224
▫ Bit depth
E.g., 16-bit grayscale → 8-bit RGB
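A minimal sketch of the resampling and bit-depth conversion above, using NumPy and Pillow; the simple min-max normalization stands in for whatever windowing and preprocessing the authors actually report.

import numpy as np
from PIL import Image

img16 = np.random.randint(0, 4096, (512, 512), dtype=np.uint16)  # stand-in pixel data

lo, hi = img16.min(), img16.max()
img8 = ((img16 - lo) / max(hi - lo, 1) * 255).astype(np.uint8)   # 16-bit -> 8-bit

resized = Image.fromarray(img8).resize((224, 224))               # 512 x 512 -> 224 x 224
rgb = resized.convert("RGB")                                     # grayscale -> 3-channel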
Data Augmentation
• Was it used? If so, how?
▫ Horizontal flip
▫ Vertical flip
▫ Translation (sliding)
▫ Rotation
▫ Affine transformation
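For example, a typical augmentation pipeline might look like the following torchvision sketch; the specific transforms and parameter values are illustrative, and a manuscript should state exactly which were used.

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05)),  # rotation + sliding
    transforms.ToTensor(),
])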
Data Partitions
• How?
▫ Any differences? If so, why?
Data source, annotation source, or preparation
• Disjoint
▫ By image, study, patient, or institution
▫ Should be at least by patient
• Example split: Training 70%, Tuning (Validation) 20%, Testing 10%
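A minimal sketch of such a patient-level (disjoint) 70/20/10 split using scikit-learn's GroupShuffleSplit; the DataFrame and its patient_id column are assumptions for illustration.

from sklearn.model_selection import GroupShuffleSplit

def split_by_patient(df, seed=0):
    # First split off 70% of patients for training
    gss = GroupShuffleSplit(n_splits=1, test_size=0.30, random_state=seed)
    train_idx, rest_idx = next(gss.split(df, groups=df["patient_id"]))
    rest = df.iloc[rest_idx]
    # Split the remaining 30% into tuning (20%) and testing (10%)
    gss2 = GroupShuffleSplit(n_splits=1, test_size=1/3, random_state=seed)
    tune_idx, test_idx = next(gss2.split(rest, groups=rest["patient_id"]))
    return df.iloc[train_idx], rest.iloc[tune_idx], rest.iloc[test_idx]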
Architectures
• LeNet
• ResNet34
• U-Net
• VGG
• LSTM
• Inception
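For instance, pretraining / transfer learning with one of these architectures often amounts to loading ImageNet weights and replacing the output head, as in this torchvision sketch (recent torchvision assumed; the two-class head is an illustrative assumption).

import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet34 and swap in a new head for a binary task
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)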
Model Evaluation
• How many models were evaluated against the test set, and how
were these models selected?
▫ Ideally, one
▫ If greater than one, justify reasoning
Metrics
• Sørensen–Dice coefficient = 2|X ∩ Y| / (|X| + |Y|)
▫ Dice similarity coefficient (DSC) = 2TP / (2TP + FP + FN)
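A minimal NumPy sketch of the Dice coefficient for two binary masks:

import numpy as np

def dice(x, y):
    # x, y: binary masks of the same shape
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    denom = x.sum() + y.sum()
    return 2 * np.logical_and(x, y).sum() / denom if denom else 1.0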
Metrics
• Jaccard index = Intersection over Union (IoU) = |X ∩ Y| / |X ∪ Y|
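And the corresponding NumPy sketch for the Jaccard index / IoU:

import numpy as np

def iou(x, y):
    # x, y: binary masks of the same shape
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    union = np.logical_or(x, y).sum()
    return np.logical_and(x, y).sum() / union if union else 1.0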
Metrics
• Hausdorff distance
▫ Measures how far apart two subsets are
The greatest of all the distances from a point in one set to the closest point in the other set
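SciPy's directed_hausdorff gives one direction; the symmetric Hausdorff distance is the maximum of the two directions. A sketch with illustrative point sets (for segmentation masks, pass the coordinates of their boundary points):

import numpy as np
from scipy.spatial.distance import directed_hausdorff

a = np.array([[0, 0], [0, 1], [1, 0]], dtype=float)  # hypothetical point set A
b = np.array([[0, 0], [2, 0]], dtype=float)          # hypothetical point set B

hd = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])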
Model Performance
• Overfitting, underfitting, and irrelevant features
Handelman GS, et al. AJR 2019; 212:38-43
arxiv.org/abs/1807.00431
Leakage
• Unintended use of known information as unknown
• Outcome leakage
▫ Independent variables can be used to infer outcomes
For example, a risk factor whose measurement window extends into the future can
inadvertently be used to predict that future outcome
• Validation leakage
▫ Ground truth from training set propagates to validation set
For example, same patient is used in both training and validation
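A simple guard against the validation-leakage example above is to verify that the partitions are disjoint by patient; a minimal sketch (the partition ID lists are hypothetical inputs):

def assert_disjoint_patients(train_ids, tune_ids, test_ids):
    # Fail loudly if any patient ID appears in more than one partition
    train, tune, test = set(train_ids), set(tune_ids), set(test_ids)
    assert not (train & tune or train & test or tune & test), \
        "Patient overlap between partitions -> validation leakage"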
K-fold Cross-validation
• Split the data into K equal parts
▫ Train the model on K−1 parts
▫ Validate on the remaining part
• Repeat the process K times
• Report average results for K-folds
• For small classes and rare categorical factors, stratified K-fold
splitting ensures equal presence of classes and factors in each fold
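A minimal scikit-learn sketch of stratified 5-fold cross-validation; the data and the logistic-regression model are placeholders for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(100, 8)            # hypothetical feature matrix
y = np.random.randint(0, 2, 100)      # hypothetical binary labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in skf.split(X, y):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])  # train on K-1 folds
    scores.append(clf.score(X[val_idx], y[val_idx]))            # validate on the held-out fold
print(f"Mean accuracy over 5 folds: {np.mean(scores):.2f}")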
Evaluation
• ROC analysis
• Calibration curve
• Confusion matrix
• Review of misclassifications
http://arogozhnikov.github.io/2015/10/05/roc-curve.html
Allen B Jr, et al. J Am Coll Radiol 2019; 16:1179
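A minimal scikit-learn sketch producing the evaluation outputs listed above (ROC analysis, calibration curve, confusion matrix); y_true and y_prob are hypothetical stand-ins for a model's test-set output.

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, confusion_matrix
from sklearn.calibration import calibration_curve

y_true = np.random.randint(0, 2, 200)                            # hypothetical ground truth
y_prob = np.clip(y_true * 0.6 + np.random.rand(200) * 0.4, 0, 1)  # hypothetical model scores

fpr, tpr, _ = roc_curve(y_true, y_prob)                          # ROC analysis
auc = roc_auc_score(y_true, y_prob)
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)  # calibration curve
cm = confusion_matrix(y_true, (y_prob >= 0.5).astype(int))       # confusion matrix at one threshold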
AI Need Not Be Superhuman
Calibration
Actual prevalence of malignancy versus estimated risk of malignancy for each decile of the probability scale.
Ayer T et al. Cancer 2010; 116:3310-21
Misclassification
• Example false positives and false negatives
https://doi.org/10.1148/ryai.2019180001
Good Science
• Hypothesis
▫ Well-defined
▫ Testable
▫ Innovative
• Methods
▫ Appropriate to stated problem
▫ Described in detail
▫ Correct metrics
• Results
▫ Appropriate level of detail
• Discussion
▫ Summarize results
▫ Place work into context
▫ Describe limitations
▫ Envision future work
http://jasonya.com/wp/wp-content/uploads/2015/04/car_peer_review_comic_12.jpg