CSE 252C: Advanced Computer Visioncseweb.ucsd.edu/~mkchandraker/classes/CSE252C/... · •...

Lecture2:Correspondence,MetricLearning

CSE252C:AdvancedComputerVision

ManmohanChandraker

CSE252C,SP20:ManmohanChandraker

Virtualclassrooms• VirtuallecturesonZoom

– Onlyhostsharesthescreen

– Keepvideooffandmicrophonemuted

– Butpleasedospeakup(remembertounmute!)

• VirtualinteractionsonZoom

– Askandanswerplentyofquestions

– “Raisehand”featureonZoomwhenyouwishtospeak

– Postquestionsonchatwindow

– TAwillhelpkeeptrackofraisedhandsandchatwindow

• LecturesrecordedanduploadonKaltura

– Availableunder“MyMedia”onCanvas


Enrollmentlogistics• Ifonwaitlist,needtosend“Requesttoenroll” email

– Statewhetheryousatisfythelistedpre-requisitecourserequirements

– Explainwhetheryouhaverequiredbackground(courses,projects)

• Ifnotclearedtoenroll,orwhileonthewaitlist

– Youarewelcometoattendlectures

– TolimitTAworkload,wecangradeonlyenrolledstudents

• AllannouncementswillbepostedonPiazza

– SendemailtoTA(CCinstructor)ifdidnotgetPiazzainvite

• MSComps:Hostedasthefinalexamofthecourse

• 2unitsorS-U

– SendinstructoranemailandCCTAs

– Assignmentsoptional,gradedonfinalexamsandpresentationCSE252C,SP20:ManmohanChandraker

Coursedetails• Classwebpage:

– http://cseweb.ucsd.edu/~mkchandraker/classes/CSE252C/Spring2020/

• Instructoremail:

– [email protected]

• TAs:Zhengqin LiandYou-YiJau

– Emails:[email protected] [email protected]

• Grading

– 10%presentation

– 60%assignments

– 30%finalexam

• Aimistolearntogether,discussandhavefun!


Coursedetails

• “Lightning”presentations

– Fourstudentstopresentinoneclass

– Timelimit:5minutes

– Paperstobeassignedbyinstructor

– Orderofpresentation:alphabetic

– https://docs.google.com/spreadsheets/d/1VcdhEKmvDqLFfYIYh7x6oO

AqiVNfdbc6JLD5omBpXIA/edit#gid=0

• Sendpresentation1daybeforeclass

– Well-practicedandfluentpresentation

– Includenarrationifasynchronous

– Askandanswerquestionsafterpresentation


Coursedetails

• Presentationformat(1slideforeach):

1.Motivationandproblemdescription

2.Priorwork

3.Methodoverview

4.Methodanalysis

5.Experiments

6.Futureworkanddiscussion


• SuperPoint:Self-Supervised InterestPointDetection and Description

– https://arxiv.org/abs/1712.07629

• WarpNet:Weakly SupervisedMatching for Single-viewReconstruction


• AnchorNet:AWeakly Supervised Networkto Learn Geometry-Sensitive

Featuresfor SemanticMatching


• SIFTFlow:DenseCorrespondenceacrossDifferentScenes

– http://people.csail.mit.edu/celiu/ECCV2008/

PapersforWed,Apr15


Correspondence


WhyDoWeHaveTwoEyes?

Depthinformationlost

inimageformation

Binocular(stereo)vision

enablesdepthestimation


Howdoweperceivedepthinimages?

Relateprojectionsofthesamepoint intwoormoreimagesofthescene.



Relateintensitiesofimagepointsintwoormoreimages.



Usepriorinformation intheformofsemantics,function,affordance.


Visualcorrespondenceaids3Dreconstruction

Akeyproblem in3Dreconstruction: relatesimilarelementsacrossasetofimages

Geometric Photometric Semantic


Simple matching methodsInterestpoint:• Localizedposition

• Informativeaboutimagecontent

• Repeatableundervariations


Simple matching methods

W1(x,y):k xk pixelpatchinimage1


Descriptor:• Function appliedonW1 andW2,to

enablecomparing them

Interestpoint:• Localizedposition

• Informativeaboutimagecontent

• Repeatableundervariations


Simple matching methods• SSD (Sum of Squared Differences)




Simple matching methods• SSD (Sum of Squared Differences)

• NCC (Normalized Cross Correlation)

• What advantages might NCC have over SSD?

,

(Mean) (Standarddeviation)




Featuredistance:threshold

• Thedistancethresholdaffectsperformance

• Onlymatcheswithdistancelessthanthresholdareallowed

• Truepositives=numberofdetectedmatchesthatarecorrect

– Supposewewanttomaximizethese— howtochoosethreshold?

• Falsepositives=numberofdetectedmatchesthatareincorrect

– Supposewewanttominimizethese— howtochoosethreshold?

5075

200

feature distance

false match

true match


Choosingamatch:ratiotest

• Firstapproach:useSSD=

• Betterapproach:ratiodistance=

– f2 isbestSSDmatchtof1 inI2

– f2’issecondbestSSDmatchtof1 inI2

– Giveslargevalues(closeto1)forambiguousmatches.

I1 I2

f1 f2f2'


SIFT


Desirableproperty:invariance

Findfeaturesthatareinvarianttotransformations

– geometricinvariance:translation, rotation,scale

– photometric invariance:brightness, exposure,…


Idea of SIFT n For better image matching, need to develop an interest

operator invariant to scale and rotation.

n Also, need a descriptor robust to typical variations. The descriptor is the most-used part of SIFT.

CSE 252C, SP20: Manmohan Chandraker [Adaptedfrom:D.Lowe]

Overall Procedure at a High Level1. Scale-space extrema detection

2. Keypoint localization

3. Orientation assignment

4. Keypoint description

Search over multiple scales and image locations.

Fit a model to determine location and scale.Select keypoints based on a measure of stability.

Compute best orientation(s) for each keypoint region.

Use local image gradients at selected scale and rotationto describe each keypoint region.


1. Scale-space extrema detectionn Goal: Identify locations and scales that can be

reliably assigned under different views of the same scene or object.

n Method: search for stable features across multiple scales using a continuous function of scale.

n The scale space of an image is a function L(x,y,s)that is produced from the convolution of a Gaussian kernel (at different scales) with the input image.


Aside: Gaussian Smoothing

A1DGaussianwithmean0,variance1

A2DGaussianwithmean(0,0),variance1

CSE 252C, SP20: Manmohan Chandraker

Example: Gaussian Smoothed Scale Space


• Scalespaceaxioms:linearity,shiftinvariant,rotationinvariant,nospuriousextrema• Gaussianfilteruniquely satisfiesaxioms

Aside: Image Pyramids

Bottom level is the original image.

2nd level is derived from theoriginal image according tosome function

3rd level derived from 2nd level according to same function

And so on.


Aside: Gaussian PyramidAt each level, image is smoothed and reduced in size.

Bottom level is the original image.

At 2nd level, each pixel is the resultof applying a Gaussian mask tothe first level and then subsamplingto reduce the size.

And so on.

Apply Gaussian filter


Example: Subsampling with Gaussian Pre-filtering


Scale Space Pyramid


s+2 filtersss+1=2(s+1)/ss0

.

.si=2i/ss0

.

.s2=22/ss0

s1=21/ss0

s0

s+3imagesincludingoriginal

s+2differenceimages

The parameter s determines the number of images per octave.

2. Key point localization

n Detect maxima and minima of difference-of-Gaussian in scale space

n Each point is compared to its 8 neighbors in the current image and 9 neighbors each in the scales above and below For each max or min found,

output is the location andthe scale.

s+2 difference images.top and bottom ignored.s planes searched.


Difference of Gaussiansn Scale-space detection

q Find local maxima across scale/spaceq A good “blob” detector

[ T. Lindeberg IJCV 1998 ]CSE 252C, SP20: Manmohan Chandraker

3. Orientation assignment

n Create histogram of local gradient directions at selected scale

n Assign canonical orientation at peak of smoothed histogram

If 2 major orientations, use both.


Keypoint localization with orientation

832

729536

233x189keypoints

keypoints aftergradient threshold

keypoints afterratio threshold


4. Keypoint Descriptors

n At this point, each keypoint hasq locationq scaleq orientation

n Next is to compute a descriptor for the local image region about each keypoint that isq highly distinctiveq as invariant as possible to variations such as changes in

viewpoint and illumination


Normalizationn Rotate the window to standard orientation

n Scale the window size based on the scale at which the point was found.


SIFT Keypoint Descriptor(shown with 2 X 2 descriptors)

In implementation, 4x4 arrays of 8 bin histogram are used, a total of 128 features for one keypoint.


SIFT for Correspondence

[Codeandtutorial:https://www.vlfeat.org/overview/sift.html]CSE 252C, SP20: Manmohan Chandraker

SIFT for Correspondence

CSE 252C, SP20: Manmohan Chandraker[Codeandtutorial:https://www.vlfeat.org/overview/sift.html]

Uses for SIFT

n Feature points are used also for:q Panorama stitchingq 3D reconstruction q Motion trackingq Object recognitionq Indexing and database retrievalq Robot navigationq … many others


Cases where SIFT does not work

q Strong illumination changesq Large out-of-plane rotationsq Non-rigid deformations or articulationsq Semantic correspondence


LearningCorrespondence


Similar?

Image courtesy of [Vedaldi et al., VLFeat]

Measuringpatchsimilarity

• HandDesignedFeatures

- SIFT,SURF,ORB,FREAK,DAISY

• NeuralNetworks

- ComparePatches [Zagoruyko.etal.]

- MatchNet [Hanetal.]

- SeebyMoving [Agrawaletal.]

- StereoMatchingCost [Zbontaretal.]

Agrawal et al., ICCV 2015, Han et al., CVPR 2015, Zagoruyko et al., CVPR 2015, Zbontar et al., CVPR 2015

Similarity

CNN CNN

FC Layers

• GlobalOptimization

- Flow,DSP,etc.

Revaud et al., ICCV 2013, Liu et al. PAMI 2011Lowe et al., ICCV 1999, Bay et al., ECCV 2006, Rublee et al., ICCV 2011, Alahi et al., ICCV 2012, Tola et al., CVPR 2008

Measuringpatchsimilarity

SpatiallocalizationinclassificationCNNs

[Long et al., “Do Convnets Learn Correspondence?”, NeurIPS 2014]

SpatiallocalizationinclassificationCNNs

Idea:• Siamesenetworktodecidepatchsimilarity

• Useintermediateactivationsasfeatures.

Advantages:• Easytotrain

Issues:• Inefficient:extractfeaturesforoverlapping

regionswithinpatches

• O(n2)feed-forwardpassestocomparenpatchesineachimage.

[Zagoruyko andKomodakis, CVPR2015]

CNNsforlearningcorrespondence


- Learn a feature space directly optimized for correspondence

- Intermediate activations of patch similarity are surrogate features

- Mapping an image to a metric space- Metric Space: Distance relationship = Class membership

Needforametricincorrespondencelearning

MetricLearning


Metric

MetricLearning

• Ametricquantifiesdistancebetweenanytwomembersofaset

• Inducesasimilaritymeasure


Metric

• Usefulfor,say,k-nearestneighborsclassification• Foreverytrainingsample,wantk closestsamplestobefromthesameclass


Nearestneighbors


• Nearestneighborclassification:

• Assignlabelofnearesttrainingdatapointtoeachtestdatapoint

Voronoi partitioning offeaturespacefor2-category2Ddata

[FigurefromDuda etal.]

Noveltestexample

Closesttoapositiveexamplefromthe

trainingset,soclassify

itaspositive.

Black=negativeRed=positive


Nearestneighbors




• Extendtok-nearestneighborclassification:• Foranewpoint,findthekclosestpointsfromtrainingdata

• Labelsofthekpoints“vote”toclassify

k=5

Ifquery landshere,5NNconsistof3negatives

and2positives, sowe

classifyasnegative.

Black=negativeRed=positive

[FigurefromD.Lowe]CSE252C,SP20:ManmohanChandraker

Nearestneighbors






• Advantages:• Simpletoimplement

• Flexibletofeatureordistancechoices

• Candowellinpracticewithenoughrepresentativedata


Nearestneighbors






• Advantages:• Simpletoimplement

• Flexibletofeatureordistancechoices

• Candowellinpracticewithenoughrepresentativedata

• Limitations:• Largesearchproblemtofindnearestneighbors

• Storageofdata

• MustknowwehaveameaningfuldistancefunctionCSE252C,SP20:ManmohanChandraker

Metric



• Propertiesofametric

• Non-negativity:! ", $ ≥ 0• Identity:! ", $ = 0 ifandonlyif" = $• Symmetry: ! ", $ = !($, ")• Triangleinequality:! ", $ + ! $, + ≥ !(", +)

• Exampleofmetric:! ", $ =∥ " − $ ∥• Exampleofnon-metric:! ", $ = "2$2

Metric

Learning


Metric





• Sometimes,thechoiceofmetricistrivial

• Depthestimation:distancesin3D,soEuclideandistanceagoodmetric


Metric





• Sometimes,thechoiceofmetricistrivial

• Othertimes,itisnotimmediatelyclearhowtodefineametric

• Findthetwofacesthatareclosesttoeachother


Metric:distancesinfeaturespace

• Trainingdatacomposedof(image,label)pairs

• (Image1,dog),(Image2,cat),(Image3,dog),....

• Classifytestimageasdogorcat



• Representimagesasfeatures,canfinddistancebetweenfeatures

Feature1

Feature2

0




• Mapnewimagetofeaturespace,finddistances

??Feature1

Feature2





??

Feature1

Feature2





• Notalldimensionsareequallyuseful!

Feature1

Feature2

Useful

Notuseful

??

??





• Notalldimensionsareequallyuseful!

• Idea: changehowwemeasuredistance

Feature1

Feature2

Useful

Notuseful

??


Metriclearning

• Givendata,learnametricM thathelpspredictionusingadistancefunction

• Goal: FindametricM suchthat

• Samplesfromsameclass(positives)havelow01• Samplesfromdifferentclasses(negatives,impostors)havehigh01

• Approach: Poseasanoptimizationproblem

??


Metriclearning


• Createtwosets,similarpairsS anddissimilarpairsD


Metriclearning


• Createtwosets,similarpairsS anddissimilarpairsD• FindM suchthat


Metriclearning



• MinimizeanobjectivefunctionwithrespecttoM:


Metriclearning



• MinimizeanobjectivefunctionwithrespecttoM:

• Anoptimizationproblem:findstationarypoints!


Metriclearning



• Advancedvariant:

• Convexoptimization(semidefiniteprogram),efficientlysolvable!


Largemarginnearestneighbors

• Targetneighbors ofpointxi:pointsxjwithsimilarlabels

• Impostors arepointsxl closetoxi,butwithdifferentlabels• Goal: pull targetscloser,push impostorsfartheraway



• Targetneighbors ofpointxi:pointsxjwithsimilarlabels

• Impostors arepointsxl closetoxi,butwithdifferentlabels• Goal: pull targetscloser,push impostorsfartheraway

• Givenpointi,similarpointsj,impostorpointsl :



Localconstraints:directlyimproveneighbor quality




Testimage

Largemarginnearestneighbor

Euclideandistancenearestneighbor

Testimage

Largemarginnearestneighbor

Euclideandistancenearestneighbor

[Weinbergeretal.,2009]

Generalrecipeformetriclearning


LDAasmetriclearning


SiameseCNNsfordistancemetriclearning

• AdvantageofLDA:linearproblem!

• AdvantageofLMNN:convexproblem!

• Disadvantages:featurerepresentationislearnedindependentofthemetric

• AdvantageofCNNs:canoptimizefeaturesconditionedonsimilaritymeasure


SiameseCNNforpatchsimilarity

||D(x1)– D(x2)||2

Simo-Serra,E.,Trulls,E.,Ferraz,L.,Kokkinos, I.,Fua,P.andMoreno-Noguer,F.,2015.Discriminativelearningofdeepconvolutional

featurepointdescriptors.In ProceedingsoftheIEEEInternationalConferenceonComputerVision (pp.118-126).CSE252C,SP20:ManmohanChandraker

Issuewithusingthislossfortraining?

LossfunctionforSiameseCNNs

Thefinallossisdefinedas:

L=∑lossofpositivepairs+∑lossofnegativepairs



Bell,S.andBala,K.,2015.Learningvisual similarityforproductdesignwithconvolutional neuralnetworks.ACMTransactions onGraphics(TOG),34(4), p.98.

Wecanusedifferentlossfunctionsforthetwotypesofinputpairs.

• Typicalpositivepair (xp,xq)loss:L(xp,xq)=||xp – xq||2

(EuclideanLoss)



Bell,S.andBala,K.,2015.Learningvisual similarityforproductdesignwithconvolutional neuralnetworks.ACMTransactions onGraphics(TOG),34(4), p.98.

• Typicalnegativepair(xn,xq)loss:

L(xn,xq)=max(0,m2 - ||xn – xq||2) (HingeLoss)



• Combinedintoacontrastiveloss

• Forpairoftrainingexamplesx1 andx2 (withlabelsy1,y2):

L(x1,x2)=s12||x1 – x2||2 +(1– s12)max(0,m2 - ||x1 – x2||

2),

s12 =1wheny1 =y2,

otherwises12 =0.


TripletLossforMetricLearning

• Ateachtrainingiteration,sampleamini-batchoftriplets

• Goal:pushnegativeamarginfurtherfromanchorthanpositive

• Tripletloss:

Anchor Positive Negative



• Tripletloss:

• Gradients:



• Tripletloss:

• Gradients:

• Issues:• Fixedparameterm,whileintra-classdistancescanvary

• Inefficienttoconsideralltriplets


AlternativestoSiamesenetworks


CSE 252C: Advanced Computer Visioncseweb.ucsd.edu/~mkchandraker/classes/CSE252C/... · •...

Documents

Transcript of CSE 252C: Advanced Computer Visioncseweb.ucsd.edu/~mkchandraker/classes/CSE252C/... · •...