COMP 776: Computer Vision - Computer Sciencelazebnik/spring11/lec01_intro.pdfWhy study computer...
Transcript of COMP 776: Computer Vision - Computer Sciencelazebnik/spring11/lec01_intro.pdfWhy study computer...
TodayToday
I t d ti t t i i• Introduction to computer vision• Course overview• Course requirements• Course requirements
The goal of computer visionThe goal of computer vision
T t t “ i ” f i l• To extract “meaning” from pixels
What we see What a computer seesSource: S. Narasimhan
The goal of computer visionThe goal of computer vision
T t t “ i ” f i l• To extract “meaning” from pixels
Humans are remarkably good at this…
Source: “80 million tiny images” by Torralba et al.
Humans are remarkably good at this…
What kind of information can be extracted What kind of information can be extracted f ?f ?from an image?from an image?
M t i 3D i f ti• Metric 3D information• Semantic information
Vision as measurement deviceVision as measurement device
Real-time stereo Structure from motionReconstruction from
Internet photo collections
NASA Mars RoverNASA Mars Rover
Pollefeys et al. Goesele et al.
Object categorization
sky
building
flag
wallbannerface
bus busstreet lamp
cars slide credit: Fei-Fei, Fergus & Torralba
Scene and context categorizationtd• outdoor
• city• traffic• …
slide credit: Fei-Fei, Fergus & Torralba
Qualitative spatial information
slantedslanted
non-rigid moving objectobject
rigid moving
vertical
rigid movingrigid moving object
horizontal slide credit: Fei-Fei, Fergus & Torralba
rigid moving object
Why study computer vision?Why study computer vision?• Vision is useful: Images and video are everywhere!Vision is useful: Images and video are everywhere!
Personal photo albums Movies, news, sports
Surveillance and security Medical and scientific images
Why study computer vision?Why study computer vision?
• Vision is useful• Vision is interesting• Vision is difficult
– Half of primate cerebral cortex is devoted to visual processing– Achieving human-level visual perception is probably “AI-complete”Achieving human level visual perception is probably AI complete
Challenges or opportunities?Challenges or opportunities?
I f i b t th l l th t t f• Images are confusing, but they also reveal the structure of the world through numerous cues
• Our job is to interpret the cues!Our job is to interpret the cues!
Image source: J. Koenderink
Position and lighting cues: Cast shadowsPosition and lighting cues: Cast shadows
Source: J. Koenderink
Grouping cues: Similarity (color, texture,Grouping cues: Similarity (color, texture,proximity)proximity)p y)p y)
Grouping cues: “Common fate”Grouping cues: “Common fate”
Image credit: Arthus-Bertrand (via F. Durand)
Inherent ambiguity of the problemInherent ambiguity of the problem
M diff t 3D ld h i i t• Many different 3D scenes could have given rise to a particular 2D picture
Inherent ambiguity of the problemInherent ambiguity of the problem
M diff t 3D ld h i i t• Many different 3D scenes could have given rise to a particular 2D picture
• Possible solutions– Bring in more constraints (more images)
Use prior knowledge about the structure of the world– Use prior knowledge about the structure of the world
• Need a combination of geometric and statistical methods
Connections to other disciplinesConnections to other disciplines
Artificial Intelligence
Machine LearningRobotics
Computer Vision
Cognitive scienceNeuroscience
Computer Graphics
Image Processing
Neuroscience
Origins of computer visionOrigins of computer vision
L. G. Roberts, Machine Perception of Three Dimensional Solids,Ph.D. thesis, MIT Department of pElectrical Engineering, 1963.
Optical character recognition (OCR)
Digit recognition License plate readersg gyann.lecun.com
License plate readershttp://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Sudoku grabberhttp://sudokugrab.blogspot.com/
Source: S. Seitz, N. SnavelyAutomatic check processing
Biometrics
Fingerprint scanners on many new laptops
Face recognition systems now beginning to appear more widely
htt // ibl i i /many new laptops, other devices
http://www.sensiblevision.com/
Source: S. Seitz
Automotive safety
Mobileye: Vision systems in high-end BMW, GM, Volvo models • Pedestrian collision warning• Forward collision warning• Forward collision warning• Lane departure warning• Headway monitoring and warning Source: A. Shashua, S. Seitz
Vision-based interaction: Xbox Kinect
http://electronics.howstuffworks.com/microsoft-kinect.htmhttp://blogs.howstuffworks.com/2010/11/05/how-microsoft-
http://www.xbox.com/en-US/Live/EngineeringBlog/122910-HowYouBecometheController
kinect-works-an-amazing-use-of-infrared-light/
http://www.ismashphone.com/2010/12/kinect-hacks-more-interesting-than-the-devices-original-intention.html
Vision for robotics, space exploration
NASA'S Mars Exploration Rover Spirit captured this westward view from atop
Vision systems (JPL) used for several tasks
a low plateau where Spirit spent the closing months of 2007.
y ( )• Panorama stitching• 3D terrain modeling
Obstacle detection position tracking• Obstacle detection, position tracking• For more, read “Computer Vision on Mars” by Matthies et al.
Source: S. Seitz
The computer vision industryThe computer vision industry
A li t f i h• A list of companies here:
http://www.cs.ubc.ca/spider/lowe/vision.htmlp p
Basic InfoBasic Info• Instructor: Svetlana Lazebnik (lazebnik@cs unc edu)• Instructor: Svetlana Lazebnik ([email protected])• Office hours: By appointment, FB 244• Class webpage: http://www.cs.unc.edu/~lazebnik/spring11Class webpage: http://www.cs.unc.edu/ lazebnik/spring11
• Textbooks (suggested): F h & P C t Vi i A M d A hForsyth & Ponce, Computer Vision: A Modern ApproachRichard Szeliski, Computer Vision: Algorithms and Applications (available online)pp ( )
Course requirementsCourse requirements• Philosophy: computer vision is best experienced hands-onPhilosophy: computer vision is best experienced hands on
• Programming assignments: 50%– About four assignments– Expect the first one in the next couple of classes– Brush up on your MATLAB skills (see web page for tutorial)
• Final assignment: 30% – Recognition competitionRecognition competition– Winner gets a prize!
• Participation: 20%• Participation: 20% – Come to class regularly– Ask questions
A ti– Answer questions
Collaboration policyCollaboration policy
F l f t di i t ith h th b t di• Feel free to discuss assignments with each other, but coding must be done individually
• Feel free to incorporate code or tips you find on the Web, provided this doesn’t make the assignment trivial and you explicitly acknowledge your sourcesexplicitly acknowledge your sources
• Remember: I can Google too (and I have the copies of g ( peverybody’s assignments from the last three years this class was offered)
Course overviewCourse overview
I E l i i I f ti d iI. Early vision: Image formation and processingII. Mid-level vision: Grouping and fittingIII Multi view geometryIII. Multi-view geometryIV. RecognitionV. Advanced topicsd a ced top cs
I. Early visionI. Early vision
B i i f ti d i• Basic image formation and processing
* =
Cameras and sensorsLight and color
Linear filteringEdge detection
Light and color
Feature extraction: corner and blob detection
II. “MidII. “Mid--level vision”level vision”
Fitti d i• Fitting and grouping
Alignment
Fitting: Least squaresHough transformHough transform
RANSAC
III. MultiIII. Multi--view geometryview geometry
Stereo Epipolar geometry
Tomasi & Kanade (1993)
Projective structure from motionAffine structure from motion
IV. RecognitionIV. Recognition
Patch description and matching Clustering and visual vocabularies
Bag-of-features models ClassificationBag of features models
Sources: D. Lowe, L. Fei-Fei