Monitoring%Illegal%Fishing%through%Image%Classificaon%cs229.stanford.edu/proj2016/poster/GeislerFrostMahajan-Monitoring... ·...
Transcript of Monitoring%Illegal%Fishing%through%Image%Classificaon%cs229.stanford.edu/proj2016/poster/GeislerFrostMahajan-Monitoring... ·...
Modeling
SIFT
Fisher Encoding (FE) (GMM Model on SIFT to construct
a visual dic:onary)
Linear SVM
Bag of Visual Words (K-‐Means on SIFT)
Linear SVM
CIFA
R-‐10
CNN
VGG-‐16 Random Forest (15 trees)
SVM (RBF, linear)
Monitoring Illegal Fishing through Image Classifica:on Jason Frost
[email protected] Taylor Geisler
[email protected] Antariksh Mahajan [email protected]
ProblemNature Conservancy, an NGO, monitors illegal fishing by installing cameras on fishing boats. Image classification using computers can help them to speed up and automate the process. We built a multiclass classification system for 8 different species.
Feature Extraction
1. Scale-Invariant Feature Transform (SIFT)• Identifies keypoints (shapes with
directional orientations)• Invariant to rotations and translations• Dense SIFT (dSIFT) runs SIFT on a grid
2. VGG-16 Convolutional Neural Network• 16-layer CNN, has won image recognition
competitions by ImageNet• Pre-trained weights, outputs 1000 features
3. CIFAR-10 Convolutional Neural Network• 10 convolutional layers• Trained our own weights
Data• Original dataset: 3777 images, 8 labels• Data preprocessing and augmentation: • Feature-wise centering set input mean to 0• Random rotations, shifts, flips increased
dataset size to >20,000 images• Rescaled to 224x224 for easier processing
• Challenge: fish are tiny part of most images
Label: ALB BET
Baseline models• Naïve Bayes (by occurrences of colors): 8%• 1-nearest neighbor: 97%!• But this is because similar boats catch
similar fish. Need more complex model for greater generalizability.
Training image with SIFT keypoints
Image 2(conv-‐64), maxpool, ReLU
2(conv-‐128), maxpool, ReLU
3(conv-‐256), maxpool, ReLU
6(conv-‐512), maxpool, ReLU
2(FC-‐4096), 1(FC-‐1000)
ResultsExtrac'on Method Model Training Acc. Test Acc.
VGG-‐16 RF 99.5% 83.4%
SVM-‐RBF 44.3% 45.6% SVM-‐Linear 54.1% 53.3%
-‐ CIFAR-‐10 95.6% 77.3%
Conclusions• VGG-16 to RF performed best• However, difficult to extract most important
features from VGG-16• Better and more generalizable results can
be obtained by extracting fish from images, possibly using another CNN