PATHOLOGY INFORMATICS · 2019-08-05 · PATHOLOGY INFORMATICS A Proof-of-concept use of Generative...


PATHOLOGY INFORMATICS

A Proof-of-concept use of Generative Adversarial Networks (GANs) to Remove Pen Marks from Whole Slide Images

FangYao Hu¹ and Cleopatra Kozlowski²

¹Dept. of Safety Assessment, ²Dept. of Oncology Biomarker Development, Genentech Inc.

Conflict of Interest

FangYao and I are both employees and stockholders of Genentech / Roche

Outline

Introduction

Why remove marks from slides?

Methods

Preparation of training set samples

Generative Adversarial Networks (GANs)

Results

Example images

Computing GAN errors

Future Directions

Introduction

Why do pathologists mark slides?

Tumor area marked for RNA extraction

Notes on diagnosis and other metadata

Why can’t we just clean the slides?

Only the images (but not the slides) may be available

The marks may be still needed for downstream processes

Too much time required to clean

Cleaning can occasionally damage slides

Marker may be permanent

Benefits of restoring the original image

Allows other analyses (e.g. image segmentation, patient response prediction) to be applied without losing information

Solves anonymization issues (scribbles may contain patient-identifying information)

In our hands, color deconvolution was insufficient because of the marker's:

opacity variation

color gradients

GANs might be an easier, faster approach

Generative Adversarial Networks (GANs)

[Slide diagram: an input image feeds the Generator, which produces a generated image; the Discriminator compares training images with generated images and asks "Is it real or generated?" The slide labels the two networks the 'art forgery expert' and the 'art critic'.]
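The adversarial setup can be made concrete with the standard GAN losses: the discriminator is penalized for mislabeling real and generated images, while the generator is rewarded for fooling it. A minimal numpy sketch (the function names `bce` and `gan_losses` and the example probabilities are invented for illustration; the actual model used in this work is pix2pix, described later):

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy of probabilities p against a constant target (0 or 1)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def gan_losses(d_real, d_fake):
    """Standard GAN objective:
    - the discriminator wants d_real -> 1 and d_fake -> 0
    - the generator wants d_fake -> 1 (to fool the discriminator)
    """
    d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)
    g_loss = bce(d_fake, 1.0)
    return d_loss, g_loss

# A confident, correct discriminator: real images scored high, fakes scored low.
d_loss, g_loss = gan_losses(np.array([0.9, 0.95]), np.array([0.1, 0.05]))
```

Here the discriminator's loss is low and the generator's is high; it is exactly this imbalance that drives the generator toward more realistic output during training.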

Methods

Precious clinical dataset (leave untouched): Hematoxylin and Eosin (H&E) slides.

Build a training dataset of procured samples (100 slides) similar to the clinical dataset:

1. Start from a clean set of H&E tissue slides and scan them (Hamamatsu slide scanner).

2. Mark the slides up at random with the same marker used for the clinical dataset (red, green, blue and black), then scan them again.

3. Register the clean and marked images (Slidematch software, Microdimensions).

4. Extract high-magnification tiles from marked areas only (Definiens Developer software): identify marker areas, then add only data from the marked areas for GAN training.

• 86 slides (~87,000 tiles) for training
• 10 slides for validation
• 3 slides for testing
• 1 failed the registration step
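Step 4, extracting paired tiles only where marker is present, can be sketched as follows. This is a numpy illustration under stated assumptions, not the Definiens workflow actually used: the saturation-based marker detector and the thresholds (`sat_thresh`, `min_marked_frac`) are invented for the sketch.

```python
import numpy as np

def marker_mask(rgb, sat_thresh=0.4):
    """Rough pen-marker detector: strongly colored or near-black pixels stand
    out against pale H&E tissue and clear glass. Thresholds are illustrative."""
    f = rgb.astype(np.float32) / 255.0
    mx, mn = f.max(axis=-1), f.min(axis=-1)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-6), 0.0)
    return (sat > sat_thresh) | (mx < 0.25)

def paired_tiles(marked, clean, tile=512, min_marked_frac=0.05):
    """Yield (marked_tile, clean_tile) pairs from pre-registered whole-slide
    images, keeping only tiles that actually contain marker."""
    mask = marker_mask(marked)
    h, w = mask.shape
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            if mask[y:y + tile, x:x + tile].mean() >= min_marked_frac:
                yield marked[y:y + tile, x:x + tile], clean[y:y + tile, x:x + tile]

# Synthetic example: a 2x2-tile slide with one tile covered in blue pen.
clean = np.full((1024, 1024, 3), 230, dtype=np.uint8)   # pale tissue / glass
marked = clean.copy()
marked[0:512, 0:512] = (30, 30, 200)                    # blue marker stroke
pairs = list(paired_tiles(marked, clean))               # only the marked tile survives
```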

Example of Paired Tiles used for Training

[Panels: Original (unmarked) | With marks]

512×512 pixels at 0.46 µm/pixel (20×)

Situation A

Marker is present, but the underlying image is discernible by a human (enough information exists to restore the image and recognize cellular objects)

Contrast / brightness adjustment

Situation B

Marker is present; the underlying image information is mostly gone

Situation B: We don't want GANs to make up data!

[Illustration: https://dribbble.com/shots/3798731-Mona-Liza-800-X600]

"This is better." vs. "I'm sorry, I really didn't know what to put here, but at least I didn't take a wild guess?"

Train the GAN to return black pixels if it does not have enough information.

[Panels: Marked image | Original image + modification]
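This black-pixel trick amounts to modifying the training target: wherever the marker fully obscures the tissue, the clean target is set to black, so the GAN is rewarded for emitting black rather than hallucinating tissue. A minimal sketch, assuming a mask of fully obscured pixels is available; the near-black threshold used here to approximate "information mostly gone" is an invented stand-in:

```python
import numpy as np

def opaque_marker_mask(marked, dark_thresh=40):
    """Illustrative stand-in for 'information mostly gone': pixels so dark
    that the underlying tissue cannot be discerned."""
    return marked.mean(axis=-1) < dark_thresh

def blackout_target(clean, obscured_mask):
    """Training target for Situation B: the clean image, with fully obscured
    pixels set to black, so the GAN learns to emit black where it lacks
    information instead of inventing tissue."""
    target = clean.copy()
    target[obscured_mask] = 0
    return target

clean = np.full((64, 64, 3), 220, dtype=np.uint8)   # pale tissue
marked = clean.copy()
marked[16:32, 16:32] = 10                           # opaque black marker stroke
target = blackout_target(clean, opaque_marker_mask(marked))
```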

Training details

Augmentation: random cropping and rotations

GAN model: conditional GAN (requires pairs of images for training), using the pix2pix model

Parameters: 40 epochs on 4 GPUs

pix2pix implementation: Jun-Yan Zhu and Taesung Park, https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
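For paired training the augmentation must be applied identically to both images of a pair, or the input and target fall out of register. A numpy sketch of that idea (the function `augment_pair`, the crop size, and the restriction to 90-degree rotations are illustrative assumptions, not the repository's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_pair(marked, clean, crop=256):
    """Apply the same random crop and 90-degree rotation to both images of a
    training pair, so the marked input and the clean target stay pixel-aligned."""
    h, w = marked.shape[:2]
    y = int(rng.integers(0, h - crop + 1))
    x = int(rng.integers(0, w - crop + 1))
    k = int(rng.integers(0, 4))                       # number of 90-degree turns
    window = (slice(y, y + crop), slice(x, x + crop))
    return np.rot90(marked[window], k), np.rot90(clean[window], k)

marked = rng.integers(0, 256, (512, 512, 3), dtype=np.uint8)
clean = marked.copy()
m, c = augment_pair(marked, clean)
```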

Results

[Seven example slides, each showing three panels: unseen marked image | original, unmarked image | GAN-restored image. One example compares against the original with the imposed black modification.]

We need to measure algorithm performance quantitatively in order to assess improvement.

Two types of discrepancies between the original image and the GAN output:

• Position (where objects in the image are)

• Color balance (notably red vs. blue): H&E = hematoxylin/eosin, and the information is in the balance between pink and purple

Measuring GAN Error Rate and Severity

Quantify error in position: quantify the difference in pixel values, Abs[Original image (ground truth) − GAN-generated image], averaged over the RGB channels (= grayscale). Example pair (Original vs. GAN): position diff 10.168.

Quantification of GAN error rate: Position

[Histogram: count vs. position diff]

• ~18% of images were glass areas with marker: the GAN can remove marker from glass… mostly. [Panels: unseen marked image | original, unmarked image | GAN-restored image]

• ~4% of images had other issues with the ground truths (not registered, out of focus). [Panels: a registration problem and a blurred region, each with its GAN-restored image]

• Most images (2.5 < position diff < 25), 78%: mean ~10.
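The position metric described above (mean absolute grayscale difference between ground truth and GAN output) can be sketched in a few lines of numpy; the function name and the synthetic test images are illustrative, not the actual evaluation code:

```python
import numpy as np

def position_error(original, restored):
    """Mean absolute grayscale difference between the ground-truth tile and
    the GAN-restored tile: Abs[original - restored], averaged over RGB."""
    o = original.astype(np.float32).mean(axis=-1)   # grayscale = mean of RGB
    r = restored.astype(np.float32).mean(axis=-1)
    return float(np.abs(o - r).mean())

identical = np.full((8, 8, 3), 100, dtype=np.uint8)
brighter = np.full((8, 8, 3), 110, dtype=np.uint8)
```

A perfectly restored tile scores 0; a uniform 10-level brightness shift scores 10, on the same scale as the histogram above.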

Quantification of GAN error rate: Red/Blue balance

Quantify error in color: measure the change in the red:blue color-balance ratio, Abs[Original image (R/B) − GAN-generated image (R/B)]. Example pair (Original vs. GAN): color diff 0.12.

[Histogram: count vs. color diff]

[Scatter plot: color diff vs. position diff]
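The color metric can likewise be sketched: compare the red:blue ratio of the ground truth and the GAN output, since H&E information sits largely in the pink (eosin) vs. purple (hematoxylin) balance. Function name, `eps` guard, and the synthetic images are illustrative assumptions:

```python
import numpy as np

def rb_balance_error(original, restored, eps=1e-6):
    """Change in the red:blue balance between ground truth and GAN output,
    using the ratio of mean red to mean blue as a crude summary."""
    def rb_ratio(img):
        f = img.astype(np.float32)
        return f[..., 0].mean() / (f[..., 2].mean() + eps)
    return float(abs(rb_ratio(original) - rb_ratio(restored)))

pinkish = np.zeros((4, 4, 3), dtype=np.uint8)
pinkish[..., 0], pinkish[..., 2] = 200, 100     # eosin-heavy tile
bluish = np.zeros((4, 4, 3), dtype=np.uint8)
bluish[..., 0], bluish[..., 2] = 100, 200       # hematoxylin-heavy tile
```

An unchanged tile scores 0, while a pink-to-blue swing such as the "fake hematoxylin-rich cells" failure mode scores high.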

[Panels: Original image | Marked image | GAN-restored image]

Conclusions and Future Plans

Likely more work is needed before restored images can be used for other analyses.

• Some errors are due to ground-truth errors with scanning (out of focus) and poor registration
  • Correct manually, or develop an algorithm to specifically fix this

• Works quite well for green pen marks, less so for blue and red

• The most serious errors come from color balance
  • E.g. blue markers → "hematoxylin-rich" fake cells appear
  • Maybe train with greyscale images

• The algorithm needs to return more black areas where it is uncertain
  • Re-train with more black areas

Conclusions and Future Plans (continued)

• More data
  • More data augmentation
  • More marked slides

• Improve the error metric
  • Incorporate human evaluation
  • Incorporate 'machine' evaluation: test whether segmentation algorithms etc. perform similarly on restored images compared with 'real' images

• Improve the algorithm
  • Parameter tuning
  • Try other GAN networks

Acknowledgements

Jennifer Giltnane

Aicha BenTaieb

Quincy Wong

Priti Hegde

Craig Cummings

Andrew Whitehead

Doing now what patients need next


Backup

Previous work with GANs in biomedical space

Generation of virtual lung single‐photon emission computed tomography/CT fusion images for functional avoidance radiotherapy planning using machine learning algorithms

BumSup Jang and Ji Hyun Chang, J Medical Imaging and Radiation Oncology. 2019

onlinelibrary.wiley.com/doi/full/10.1111/1754-9485.12868

SHIFT: speedy histopathological-to-immunofluorescent translation of whole slide images using conditional generative adversarial networks

Erik A. Burlingame, Adam A. Margolin, Joe W. Gray, and Young Hwan Chang, Proc SPIE Int Soc Opt Eng, 2018

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6166432/