Monitoring Illegal Fishing through Image Classification (cs229.stanford.edu/proj2016/poster/GeislerFrostMahajan-Monitoring...)

Jason Frost, Taylor Geisler, Antariksh Mahajan

Problem
The Nature Conservancy, an NGO, monitors illegal fishing by installing cameras on fishing boats. Computer image classification can help them speed up and automate the review process. We built a multiclass classification system for 8 different species.

Feature Extraction
1. Scale-Invariant Feature Transform (SIFT)
• Identifies keypoints (shapes with directional orientations)
• Invariant to rotations and translations
• Dense SIFT (dSIFT) runs SIFT on a grid
2. VGG-16 Convolutional Neural Network
• 16-layer CNN that has won ImageNet image recognition competitions
• Pre-trained weights, outputs 1000 features
3. CIFAR-10 Convolutional Neural Network
• 10 convolutional layers
• Trained our own weights

Data
• Original dataset: 3777 images, 8 labels
• Data preprocessing and augmentation:
  • Feature-wise centering set input mean to 0
  • Random rotations, shifts, flips increased dataset size to >20,000 images
  • Rescaled to 224x224 for easier processing
• Challenge: fish are a tiny part of most images

[Figures: sample images labeled ALB and BET; training image with SIFT keypoints]

Baseline models
• Naïve Bayes (by occurrences of colors): 8%
• 1-nearest neighbor: 97%!
• But this is because similar boats catch similar fish. A more complex model is needed for greater generalizability.

Modeling
• SIFT → Fisher Encoding (FE; GMM model on SIFT to construct a visual dictionary) → Linear SVM
• SIFT → Bag of Visual Words (K-Means on SIFT) → Linear SVM
• CIFAR-10 CNN
• VGG-16 → Random Forest (15 trees), SVM (RBF, linear)

VGG-16 architecture: Image → 2(conv-64), maxpool, ReLU → 2(conv-128), maxpool, ReLU → 3(conv-256), maxpool, ReLU → 6(conv-512), maxpool, ReLU → 2(FC-4096), 1(FC-1000)

Results
Extraction Method   Model          Training Acc.   Test Acc.
VGG-16              RF             99.5%           83.4%
VGG-16              SVM (RBF)      44.3%           45.6%
VGG-16              SVM (linear)   54.1%           53.3%
CIFAR-10            CNN            95.6%           77.3%

Conclusions
• VGG-16 features fed to a random forest performed best
• However, it is difficult to identify the most important features from VGG-16
• Better and more generalizable results can be obtained by extracting the fish from the images, possibly using another CNN
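The 1-nearest-neighbor baseline is attributed to similar boats catching similar fish, which suggests that accuracy depends heavily on whether test images come from boats already seen in training. The toy sketch below illustrates that effect only; it does not reproduce the poster's experiment. Everything in it is an assumption made for illustration: the synthetic 2-D "features", the boat identifiers, the rule that each boat catches a single species, and the reduction to two labels (ALB and BET).

```python
# Toy illustration: synthetic 2-D "features", NOT the poster's data.
# Assumptions: images from one boat look alike, and each boat catches one species.

def make_samples(n_boats=8, per_boat=10):
    """Build (features, species, boat) tuples that cluster tightly by boat."""
    samples = []
    for boat in range(n_boats):
        species = "ALB" if boat % 2 == 0 else "BET"
        for k in range(per_boat):
            # Boats sit 10 units apart; images within a boat differ by < 1 unit.
            features = (10.0 * boat + 0.1 * k, 0.1 * (k % 3))
            samples.append((features, species, boat))
    return samples


def predict_1nn(query, train):
    """Return the species of the training sample closest to `query`."""
    def sq_dist(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], query))
    return min(train, key=sq_dist)[1]


def accuracy(train, test):
    """Fraction of test samples whose 1-NN prediction matches the true species."""
    hits = sum(predict_1nn(features, train) == species
               for features, species, _ in test)
    return hits / len(test)


samples = make_samples()

# Split 1: hold out every 5th image, so every test boat also appears in training.
image_test = [s for i, s in enumerate(samples) if i % 5 == 0]
image_train = [s for i, s in enumerate(samples) if i % 5 != 0]

# Split 2: hold out the last two boats entirely.
boat_test = [s for s in samples if s[2] >= 6]
boat_train = [s for s in samples if s[2] < 6]

image_split_acc = accuracy(image_train, image_test)
boat_split_acc = accuracy(boat_train, boat_test)
print(f"held-out images: {image_split_acc:.2f}")
print(f"held-out boats:  {boat_split_acc:.2f}")
```

Under these assumptions the image-level split scores 1.00 while the boat-level split scores 0.50, which is chance for two labels. The nearest neighbor is effectively recognizing the boat rather than the fish, so a split grouped by boat is one way to estimate how well a model generalizes to unseen vessels.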

