1 End-to-End Learning for Automatic Cell Phenotyping Paolo Emilio Barbano, Koray Kavukcuoglu, Marco...

1 End-to-End Learning for Automatic Cell Phenotyping
Paolo Emilio Barbano, Koray Kavukcuoglu, Marco Scoffier, Yann LeCun
April 26, 2006

2 Outline
- Image Processing with ConvNets
- Zebra Fish Identification
- Cell Identification
- Matlab Tool for Signal Processing using ConvNets

3 Problem: Signal Processing
Problem
- Identify original signal from distorted versions
Signal Structure
- Active Region (carries data)
- Inactive Region (mostly noise)
Datasets
- 4 different datasets used
- Active vs Inactive Classification
- Class 1 vs Class 2 ... Class n vs Inactive Classification (Target Classification)

4 Problem: Signal Processing
1st Dataset: Active vs Inactive
- 10 additional datasets with increasing noise levels
[Figure labels: Active Region, Inactive Region, Noisy Data]

5 OFDM Signals
- 1st Set: +1 / -1 identification problem
- 2nd Set: more general problem (8 different words)
- Walsh constants

6 Zebra Fish Screening
Goal
- Identify normal vs. curved fish
- Count number of fish per phenotype

7 Proposed Solution
- Convolutional Neural Network
Labelling Strategy
- Generate Region of Interest (ROI) Maps
- Decide window label from maps

8 Proposed Solution: Alternative Approaches
A. Use given images for training with maps
Rotated / Non-Rotated images
- Large input images
- Long training time
- Better representation for congested situations
deg steps, 12 replicated images per image
Use gray-scale sub-sampled images

9 Proposed Solution: Alternative Approaches
B. Generate Training Set with Small Images
Rotated / Non-Rotated images
- Small input images, robust training
- Single fish per image
- Poor representation for congested situations...
Four different gray-scale fish images are used (48 images)

10 Proposed Solution: Labeling Strategies
Classification Possibilities
a. fish, non-fish (background)
b. background, curved fish, straight fish
c. background, head, curved tail, straight tail
d. background, curved head, straight head, curved tail, straight tail
How to
- Generate ROI with different colors and define a mapping from colors to classes
[Figure labels: straight head, straight tail, curved tail, curved head]

11 Proposed Solution: Network Structure
Input Window Size
- 8x8 to 80x80
- variations increase feature area
Number of Feature Maps
- variations from 1-3-5-10 to 1-6-16-80
- variations from ~1000 trainable parameters to ~100K
Segmented Classification
a. Classify fish vs. non-fish
b. Classify curved vs. straight in results of (a)...

12 Results: Zebra Fish Head Identification
[Figure panels: input, 1st Conv Layer, 2nd Conv Layer, 3rd Conv Layer, classification results]

13 Results: Zebra Fish Tail Identification

14 Results
- Clear identification for uncongested images
- Confusion when several fish overlap
[Figure labels: background - correct; confusion with background; confusion head / tail; head / tail - correct]

15 Results: ROC Curves
[Figure panels: Background, Straight Tail, Curved Tail, Head]

16 Results: Counting
[Figure panels: Head, Straight Tail, Curved Tail]

17 Problem: Cell Phenotyping
Goal
- Automatically characterize phenotypes found in a multi-wavelength cell image

18 Cell Phenotyping: Dataset
Images
- Measurements taken at 3 wavelengths, saved as 16-bit images
- Converted to 8-bit RGB for visualization and compact representation
- Large dataset (> images), concentrated on Kc cells for proof of concept

19 Proposed Solution
Labeling Strategy
- Each input image has a one-to-one label map
- Label maps are color coded for nucleus, body, wall
- Simply pick the window label using the center pixel (works as well as more complicated methods)

20 Proposed Solution
Outputs of the Network
- A one-to-one output map is produced
- Output map is fed into the object recognition layer
- Recompute the map using local influence regions and apply thresholding over network confidence to eliminate noise

21 Proposed Solution
Object Recognition
- Compute connected areas to identify objects
- Walls help identify individual cells in a cluster
- Mark nuclei as the cell, disregard body and wall

22 Proposed Solution: Specifics
Two network approach
- One network identifies base elements (nucleus, body, wall)
- Second network identifies mono-nucleate and bi-nucleate cells
- Merge information from second network into first
- It may be possible to eliminate this approach

23 Results
- Trained with 128 by 128 random patches
- Tested on full size images including unknown phenotypes
- Results show that the machine is capable of identifying mono-nucleate vs multi-nucleate cells
- Continuing to train and test on more samples

24 Network Specifications
- Language: Lush
- Net: 3 layer convolutional net
- Structure: Adjustable
- Input Data (3D Lush Matrix)
- Features
- Convolution and subsampling kernels
- Output Classes
- Outputs
  - ROC Curves
  - Label maps

25 Network Specifications
Options
- Normalization (per window) / Scaling (dataset)
- Bias prevention between classes
- Mapping from labels to classes
- Select which classes to train
- Auto-labeling with threshold maps
- Internal state output
- Auto testing with trained parameters
- Suitable to run on queue scheduling clusters
- Generic input file format to specify these options

26 C-elegans egg counting
Problem
- Help biologists to sort vast amounts of microscope imaging
Create a tool
- which allows biologists to identify the parts of the image they want automatically recognized
- which, with minimal manual labeling, gets valuable counts of important elements identified by the biologists to speed their work
Datasets: C-elegans
- Genes are blocked --> eggs don't hatch
- By counting how many eggs there are in an image, we can know which gene was blocked

27 C-elegans egg counting
- 34 images of 1600x1200 pixels, eggs are labeled manually
- 1600x1200x34 => ~65 million windows can be picked. We want an automatic way to choose those most suitable for training.
- Red are the hand labeled eggs
- Green are 40x40 windows
- Blue are 100x100 windows
- 40x40 windows seem to be a good fit for the eggs we wish to count.

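The window count on slide 27 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the slide counts one candidate window per pixel position in each of the 34 images; the function name `count_windows` and the `valid_only` option (counting only windows that lie fully inside the image) are illustrative and do not come from the slides.

```python
def count_windows(width, height, n_images, window=1, valid_only=False):
    """Count candidate window positions over equally sized images.

    valid_only=False: one window per pixel position (assumed reading of the slide).
    valid_only=True:  only windows that fit entirely inside the image are counted.
    """
    if valid_only:
        per_image = max(width - window + 1, 0) * max(height - window + 1, 0)
    else:
        per_image = width * height
    return per_image * n_images


if __name__ == "__main__":
    # One window per pixel, as the slide's 1600x1200x34 product suggests.
    print(count_windows(1600, 1200, 34))  # 65280000
    # Hypothetical variant: only 40x40 windows fully inside each image.
    print(count_windows(1600, 1200, 34, window=40, valid_only=True))
```

Either way the count is in the tens of millions, which is why the slides look for an automatic way to pick training windows rather than using all of them.
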
28 C-elegans egg counting
- By picking a window where the hand labeled pixels cover between 40 and 60% of the image we are sure of getting an edge.
[Figure labels: Class 1 50% coverage; Class % coverage]

29 C-elegans egg counting
- By picking a window where the labels are >90% of a single class we are sure of getting an interior image.
- Just adjusting these scaling factors could make the tool applicable to many situations.
[Figure labels: Class 1 90% coverage; Class % coverage]

30 C-elegans egg counting
- Top two images are from training the edges: 50% class 1 coverage vs. a background of 90% class 0 coverage.
- Bottom two images are from training the interior: 90% class 1 coverage vs. a background of 90% class 0 coverage.
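The coverage rule from slides 28 and 29 can be sketched as follows. This is an illustrative sketch, not the authors' Lush implementation: the helper names `class_coverage` and `window_kind`, the list-of-lists label map, and the top-left window addressing are all assumptions. Only the thresholds (40 to 60% for an edge, more than 90% for an interior) come from the slides.

```python
def class_coverage(label_map, top, left, size, cls):
    """Fraction of pixels in a size x size window that carry label `cls`.

    label_map is a list of rows of integer class labels (hypothetical format);
    the window is addressed by its top-left corner and must lie inside the map.
    """
    hits = 0
    for row in label_map[top:top + size]:
        hits += sum(1 for value in row[left:left + size] if value == cls)
    return hits / float(size * size)


def window_kind(coverage, edge_range=(0.40, 0.60), interior_min=0.90):
    """Classify a window by coverage: 'edge', 'interior', or None (not used)."""
    if edge_range[0] <= coverage <= edge_range[1]:
        return "edge"
    if coverage > interior_min:
        return "interior"
    return None


if __name__ == "__main__":
    # Toy 4x4 label map: left half is class 1 (egg), right half class 0 (background).
    labels = [[1, 1, 0, 0] for _ in range(4)]
    print(window_kind(class_coverage(labels, 0, 0, 4, cls=1)))  # edge
    print(window_kind(class_coverage(labels, 0, 0, 2, cls=1)))  # interior
    print(window_kind(class_coverage(labels, 0, 2, 2, cls=0)))  # interior
```

The two thresholds are keyword arguments because the slides note that adjusting these factors could make the tool applicable to other situations.
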
