Attila Tanács - University of Novi...
Attila Tanács
Department of Image Processing and Computer Graphics
University of Szeged
Mobile platforms
o Modern smartphones
Mobile imaging
o Physical constraints, software and hardware tricks
Mobile image processing
o Software packages
o Use cases
Penetration
o Billions of potential users
o More SIM cards than people
Usage
o Communication
• Basic: voice, SMS
• Internet-based: Facebook, messengers, email, …
o Source of information
• Weather, stock indices, exchange rate, news, navigation, …
o Entertainment
• Games, e-books, music, multimedia
Feature phone
o Basic functions
• Voice calls, SMS
o Closed system
o Limited developer possibilities
Smartphone
o Internet access
o Multitude of communication channels
o Expandable „knowledge”
o Open environment for developers
o App stores
First mobile phone call (1973)
o Motorola (Martin Cooper → Joel S. Engel)
Manager calculators (1980s, 1990s)
o EPOC
o Personal digital assistant, no phone!
Apple Newton platform (1987-1998)
o Handwriting recognition, digital assistant, no phone!
PDA era (1996-2008)
o Palm, Windows CE, Windows Mobile
o Still no phone, multimedia push
Modern smartphones (2007-)
o Convergence of mobile phones and PDAs
o Nokia communicators (phone with data capabilities)
o RIM Blackberry, Symbian
o Apple iPhone, Android, Windows Phone, …
• Radical market change from 2007!
Tablets, phablets
o Decline of PCs
Since 2013:
Android ~80%
iOS ~20%
Others ~ 0%
CPU
o Mainly ARM architecture (32- and 64-bit)
o 16 MHz → 2.3 GHz (quad and octa core)
o Dedicated GPU
Memory
o 1 MB → 1–3 GB
Display
o Resistive: stylus – old technology!
o Capacitive: multi-finger taps
o Resolution: 160x160 black-white → 4K, 2.55’’–7’’
o Autostereoscopic 3D displays
New sensors
o Camera, GPS, accelerometer, magnetic compass, gyroscope, microphone, magnetic field, light sensor, barometer, …
1997: Sharp, Kyocera: initial experiments (in Japan)
2000: First phone camera
o Sharp J-SH04, 0.1 MP resolution
2000: Eyemodule Digital Camera Springboard Module for Handspring Visor
o 320x240 resolution
2002: Sony Ericsson T68i
o First MMS-capable phone (with externally attachable camera)
2002: Sanyo SCP-5300
o First camera phone in USA, VGA (0.3 MP) resolution
2003: more camera phones than digital cameras
2004: Nokia sells the most digital cameras
2006: half of the mobile phones have a camera
2008: Nokia is the biggest camera manufacturer (analogue and digital all together!)
2010: almost all phones have cameras
Huge social impact!
o Family
• Sending photos and videos over any distance
• Instantly
o Politics
• Broadcast events
• Inform news channels
o Science
• Meteorology
• Tsunami
• …
Taking notes
o Opening hours, advertisements
Visual codes
o Bar codes, QR codes
Photo processing and filtering
o Montage, HDR, time lapse, removing moving objects, …
Photo sharing and upload
o Instagram, Twitter
Content-based image search, augmented reality
o Client-server applications
Image analysis
o Face detection, OCR, …
Video calls
…
Visual codes
o Product identification
o Business card scanning
o URL
o General text
Complex photo functions
o Smile/blink detection
o Post processing
o HDR imaging
o Panorama creation (even using sensor information)
o Taking 3D photos
Google Goggles (Android)
o Buildings, paintings
o Visual codes
o Business card
o Books, products
o Solving Sudoku
Google Cloud Vision API
o See later in this presentation
Source: Google System BlogSpot
Optical character recognition, scanning
o Scanning and recognizing text on the go
o OfficeLens
Translation
o Translating words and phrases in camera live view
o (Word Lens, CamTranslator,) Google Translate
Augmented Driving (iPhone)
o Lane detection, approaching vehicle warning
o iOnRoad
Analyzing skin pigments
o Doctor Mole
Motion detection
o Security applications
…
Imaging sensor
o Sensitivity (ISO)
o Resolution (MP)
o Size, type
Lens
o Shutter speed
o Focal length
• Zoom
o Aperture
Illuminance
o Amount, evenness
• → HDR (High Dynamic Range)
o Color temperature
Source: Stanford University
Larger sensor size →
o Less noise
o Shallower depth of field possible (easier subject isolation)
o Better photos
o But larger lens needed!
Few examples
o 35 mm: DSLR
o APS-C: DSLR, MILC
o 4/3’’: MILC
o 1/1.8’’: top compact
o 1/2.3’’: compact, better mobile
o 1/3.2’’: ordinary mobile
Source: Engadget
In mobile devices
o Usually small sized sensors
o Comparable to average P&S cameras
[Figure: sensor size comparison of mobile devices (smartphones, Nokia smartphone sensors), average point-and-shoot (P&S) cameras, MILC and DSLR. Source: Camera Image Sensor Comparison]
Brightness
o Amount of light getting through the lens
o Aperture (F-stop) and shutter speed both play a role
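How aperture and shutter speed combine into one brightness figure can be illustrated with the standard exposure value formula EV = log2(N² / t), where N is the f-number and t the exposure time in seconds. The snippet below is only a sketch of that textbook relation at a fixed ISO; the function name and the sample settings are illustrative assumptions, not values from the slides.

```python
import math

def exposure_value(f_number: float, shutter_seconds: float) -> float:
    """EV = log2(N^2 / t) at a fixed ISO (hypothetical helper for illustration)."""
    return math.log2(f_number ** 2 / shutter_seconds)

# Illustrative settings (assumptions, not taken from the slides).
ev_a = exposure_value(2.0, 1 / 30)   # f/2 at 1/30 s
ev_b = exposure_value(2.0, 1 / 60)   # same aperture, half the exposure time
ev_c = exposure_value(2.8, 1 / 30)   # about one stop smaller aperture

print(round(ev_b - ev_a, 3))  # halving the time raises EV by one stop
print(round(ev_c - ev_a, 3))  # f/2 -> f/2.8 is close to one stop as well
```

Each step of +1 EV means half as much light reaches the sensor, which is why a small, slow lens forces longer exposure times in dim scenes.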
Focal length
o Determines the angle of view
o Fixed
• Mobile or „pancake” lens (MILC, DSLR)
• Simpler buildup → smaller lens
o Zoom
• 3x-5x: compact, MILC, DSLR
• 5x-40x: ultrazoom (compact) cameras
• Only very few smartphones have zoom!
Source: Camera Lenses Research Project
Source: http://www.digital-photography-student.com/lens-focal-length-explained/
Smartphone cameras
High enough megapixel count, but
Cheap and small sized sensors
o Low power consumption and low price are the dominant requirements
Weak lens
o Longer exposure times (slower shutter speeds)
• Blurry photos, especially in low light situations
o Weak flash or lack thereof
• Works acceptably only in good illuminance
o Miniaturized size
• Geometric distortions
Limited parameter settings
o Mostly automatic operation, slow!
Megapixel count alone is not a good indicator of imaging quality!
Advantages
o Small, always at hand, easily available
• „The best camera is the one that’s with you!” (Chase Jarvis)
o Might replace low-end compact cameras
o Photos can be edited and shared instantly
o Good quality video
Distinguishing factor for manufacturers
o OS and other phone features are mainly similar
o Manufacturers pay attention to camera capabilities
• Especially in high-end devices
Better sensors
o BSI, Exmor, IsoCell, Ultrapixel, …
o Megapixel war is (mainly) over!
o Dedicated processing chip
o OIS (optical image stabilization)
Camera function
o Zoom lens (thicker device)
o Faster and more precise focusing using laser light
o Dual-lens systems (with different parameters)
o Better front side camera module („selfie” camera)
Software photo processing
o HDR
o Simulate depth-of-field
o Sensor-based panorama
Source: PixInfo
Source: iPhone Informer
Panorama creation
o Horizontal and sphere (StreetView-like) panoramas
High Dynamic Range photos
o Handling uneven lighting conditions
Artificial depth-of-field simulation (”lens blur”)
o Highlight what is important in the scene
o Refocus: interactively select area after image capture
Google Camera
o Taking photos using sensory help (gyroscope)
o Blending overlapping photos to eliminate blockiness
Source: downloadol.com
HDR (High Dynamic Range)
o Difficult illumination situations (dark and bright regions in the image)
o Software solution
• Series of photos with different exposure values (e.g., ±2 EV, 3-5 photos)
• Fixed camera position, static scene
• Photo fusion
• Divide the photos into small square tiles
• For each tile, select the photo with the highest information content
• Measured by entropy
• Blend the selected tiles to get smooth boundaries
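The tile-based fusion steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: images are small grayscale lists of lists, the tile size and the function names `tile_entropy` and `fuse_exposures` are hypothetical, and the final blending step is left out, so tile borders stay visible.

```python
import math
from collections import Counter

def tile_entropy(img, top, left, size):
    """Shannon entropy (in bits) of the gray-level histogram inside one square tile."""
    values = [img[y][x]
              for y in range(top, min(top + size, len(img)))
              for x in range(left, min(left + size, len(img[0])))]
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def fuse_exposures(images, tile=2):
    """For every tile, copy the pixels of the exposure whose tile has the highest entropy."""
    height, width = len(images[0]), len(images[0][0])
    fused = [[0] * width for _ in range(height)]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            best = max(images, key=lambda im: tile_entropy(im, top, left, tile))
            for y in range(top, min(top + tile, height)):
                for x in range(left, min(left + tile, width)):
                    fused[y][x] = best[y][x]
    return fused

# Two toy 2x4 exposures (made-up data): the underexposed one is clipped to black
# on the left, the overexposed one is clipped to white on the right.
under = [[0, 0, 10, 200],
         [0, 0, 90, 140]]
over = [[20, 120, 255, 255],
        [60, 180, 255, 255]]

fused = fuse_exposures([under, over], tile=2)
print(fused)
```

A clipped tile has a single gray level and therefore zero entropy, so the other exposure wins there. A real implementation would blend neighbouring tiles instead of copying them as hard blocks.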
Images: A. A. Goshtasby, András Osztroluczki and Erik Zajác
”Lens blur”
o Wide DoF: (almost) everything is sharp in the scene
• „Documentary”
o Shallow DoF: only objects at a certain distance interval are sharp, others are blurred
• Photographer can show what he/she thinks is important in the scene
• Hard to obtain using mobile camera modules due to physical constraints
Source: Google Research BlogSpot
Software solution using 1 camera (Google Camera app)
o Camera movement during exposure (a short video is captured)
• See YouTube video
o Estimating distance of object points (works for objects close to us)
o Follow-up interactive refocus using the depth map and blur strength
Source: Google Research BlogSpot
Source: Google Research BlogSpot
Two identical lenses
o 3D stereo – disappeared from markets
Lenses with two different focal lengths
o Dual-lens smartphone cameras are coming, and this is why we want one
o Can switch lenses to magnify more distant subjects without resorting to digital zoom
Lenses with different aperture
o Honor 6 Plus: F/2 and F/2.4 apertures
• It uses both cameras at the same time to capture several photos at different apertures
• Lets you adjust the focus point and aperture after the photo has been taken, with a choice of apertures from f/0.95 to f/16
Different sensors
o RGB + Black&White: better low light performance since BW is without Bayer filtering; hardware based DoF simulation
RGB-D sensor
o Color + distance
• Kinect-like sensor
o Google Project Tango
• Mapping 3D interior of rooms
• Small distances (few meters)
• No direct (sun)light
• Development kit is available (in the USA)
• Lenovo Phab 2 Pro phablet is coming (first commercial Tango device)
o Official YouTube Video
o The next mobile imaging war won't be waged over megapixels
• „Though Mountain View is focused on 3D mapping, so-called depth camera tech could dramatically improve all the pictures you take with your smartphone.”
Native APIs available
o Camera handling (photo, live view), bitmap I/O and display
o Accessing pixel data → mainly do-it-yourself processing!
• Warning: Do not reinvent the wheel!
Dedicated image processing libraries
o OpenCV
• http://www.opencv.org
• Open source (BSD licence) and multiplatform
• Computer vision, machine learning functionality
o FastCV
• https://developer.qualcomm.com/docs/fastcv/api/index.html
• From Qualcomm (the chip maker), mobile specific
>2500 optimized algorithms
o Basic algorithms
• Filtering, edge detection, histogram, Hough-transform, …
o Complex functionality
• Face detection and recognition, movement classification, object tracking, image stitching, content-based image search, …
Multiple platforms and programming languages
o >10 years of development, numerous contributors
o C++ template interface
o Python, Matlab, Java bindings
o Windows, Linux, Mac OS, Android, iOS versions
• Architecture-specific pre-built libraries
On smartphone camera live view
o Fast processing is necessary
Applications
o Face detection (and recognition)
o Content-based image search (client-server)
o Object classification (client only)
Usage
o Anonymize photos (e.g., StreetView)
o Tagging photos: detecting and recognizing participants
o Security functions
o Marketing (estimating age, gender, …)
o Detecting emotions
Main steps
o Detecting faces
o Searching for the detected face in a database (recognition)
Viola-Jones detector (2001)
o Very fast (real time), high accuracy
o Strong classifier built from weak classifiers (AdaBoost)
• Weak classifier ~ Somewhat better than random
o Cascading
• Fast rejection of trivial cases
o Haar-like features
• Intensity sums and differences inside rectangular regions
Publication
o Viola, P. and Jones, M. (2001). Rapid Object Detection using a Boosted Cascade of Simple Features. Conference on Computer Vision and Pattern Recognition.
o Viola, P. and Jones, M. (2004). Robust Real-Time Face Detection. International Journal of Computer Vision 57(2), 137-154.
Haar-like features
o Rectangular regions of different sizes (windows)
o Differences of intensity sums in windows („blacks” – „whites”)
• Eyes-face (dark-bright); eyes-nose (dark-bright-dark); …
o Can be computed efficiently
• Using integral image representation
• Overwhelmingly many possible features. Which are the good ones?
Integral image representation
o At position (x,y) we can find the sum of the intensity values to the „left” and „up”
• $ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$
• Can be computed by one pass over the image
o The sum of intensity values inside rectangle ”D” can be computed by indexing 4 array values
• $D = P_4 + P_1 - (P_2 + P_3)$
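The integral image and the four-value rectangle sum can be sketched as follows. This is an illustrative plain-Python version, not code from the slides: it pads the table with a zero row and column so that no border checks are needed (an implementation choice, slightly different from the inclusive definition above), and the names `integral_image`, `rect_sum` and `two_rect_feature` are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img[y'][x'] for all y' < y and x' < x (zero-padded table)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum  # one pass over the image
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum inside a rectangle from four look-ups: P4 + P1 - (P2 + P3)."""
    p1 = ii[top][left]                    # above-left corner
    p2 = ii[top][left + width]            # above-right corner
    p3 = ii[top + height][left]           # below-left corner
    p4 = ii[top + height][left + width]   # below-right corner
    return p4 + p1 - (p2 + p3)

def two_rect_feature(ii, top, left, height, width):
    """A simple Haar-like feature: left half sum minus right half sum."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11
print(two_rect_feature(ii, 0, 0, 3, 4))  # left two columns minus right two columns
```

Once the table is built, every rectangle sum costs four look-ups regardless of the rectangle size, which is what makes evaluating many Haar-like features per window affordable.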
Learning phase with AdaBoost
o Positive and negative annotated examples (24x24 pixels)
o Long learning time: can take hours or even days
o Cascaded: fast rejection of regions where there is no face
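The boosting idea (a strong classifier as a weighted vote of weak ones) can be illustrated with a small, generic discrete AdaBoost using threshold "stumps". This is a sketch under assumptions: it works on a made-up one-dimensional toy set rather than on Haar-like features, it omits the cascade, and all function names are hypothetical.

```python
import math

def best_stump(samples, labels, weights):
    """Pick the (error, feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for f in range(len(samples[0])):
        for thr in sorted({s[f] for s in samples}):
            for polarity in (1, -1):
                preds = [polarity if s[f] >= thr else -polarity for s in samples]
                err = sum(w for p, y, w in zip(preds, labels, weights) if p != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best

def adaboost(samples, labels, rounds):
    """Return a list of (alpha, feature, threshold, polarity) weak classifiers."""
    n = len(samples)
    weights = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(samples, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)       # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        strong.append((alpha, f, thr, pol))
        # Misclassified examples gain weight, correctly classified ones lose weight.
        for i, s in enumerate(samples):
            pred = pol if s[f] >= thr else -pol
            weights[i] *= math.exp(-alpha * labels[i] * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return strong

def classify(strong, sample):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(a * (p if sample[f] >= t else -p) for a, f, t, p in strong)
    return 1 if score >= 0 else -1

# Toy data (an assumption for illustration): no single threshold separates it.
samples = [[x] for x in range(10)]
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_round = adaboost(samples, labels, rounds=1)
three_rounds = adaboost(samples, labels, rounds=3)
errors_1 = sum(classify(one_round, s) != y for s, y in zip(samples, labels))
errors_3 = sum(classify(three_rounds, s) != y for s, y in zip(samples, labels))
print(errors_1, errors_3)
```

Each stump alone is only somewhat better than chance on the reweighted data, yet their weighted vote fits this toy training set after three rounds.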
Evaluation
o Hundreds of thousands of features
o Tasks
• Determine cascades of features
• Determine feature weights inside cascades
$o(x_1, x_2, \ldots, x_{d-1}, x_d) = \begin{cases} 1 & \text{if } w_1 x_1 + \ldots + w_d x_d + w_{d+1} > 0 \\ -1 & \text{otherwise} \end{cases}$
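The thresholded weighted sum above translates directly into code. The snippet is a minimal sketch; the function name `threshold_unit` and the example weights are assumptions chosen for illustration.

```python
def threshold_unit(x, weights):
    """Return 1 if w1*x1 + ... + wd*xd + w_{d+1} > 0, otherwise -1.

    `weights` holds d + 1 values; the last one, w_{d+1}, acts as the bias term.
    """
    # zip stops after the d inputs, so the bias weight is added separately.
    activation = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return 1 if activation > 0 else -1

weights = [1.0, -2.0, 0.5]   # hypothetical w1, w2 and bias w3
print(threshold_unit([1.0, 0.0], weights))   # 1.0 + 0.5 > 0
print(threshold_unit([0.0, 1.0], weights))   # -2.0 + 0.5 <= 0
```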
Training set examples
Source: Prof. Padhraic Smyth, University of California, Irvine
Source: University of Cambridge Digital Technology Group. Available online at http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip.
Detection in images
o Detector size of 24x24 pixels; 6060 features in 38 layers of cascade
o Scaling
• Scale the detector (not the image) by a factor of 1.25 between passes
o Position
• Moving (sliding) window
o Final detection
• Accepted where several detections overlap (e.g., at least 3)
• Post processing: merge overlaps to single detection
o Computing time
• In 2003 using 700 MHz CPU: 0.067 secs for image sizes of 384 x 288 pixels
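The scanning and merging steps can be sketched as below. This is a simplified illustration, not the detector's actual post-processing: the intersection-over-union threshold of 0.3, the greedy grouping rule and all function names are assumptions, and the detections fed into it are made up.

```python
def detector_sizes(image_w, image_h, base=24, scale=1.25):
    """Window side lengths from repeatedly scaling the 24x24 detector by 1.25."""
    sizes, size = [], float(base)
    while round(size) <= min(image_w, image_h):
        sizes.append(round(size))
        size *= scale
    return sizes

def overlaps(a, b, min_iou=0.3):
    """Intersection-over-union test for two square windows given as (x, y, side)."""
    ax, ay, a_side = a
    bx, by, b_side = b
    iw = min(ax + a_side, bx + b_side) - max(ax, bx)
    ih = min(ay + a_side, by + b_side) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return False
    inter = iw * ih
    return inter / (a_side * a_side + b_side * b_side - inter) >= min_iou

def merge_detections(windows, min_neighbors=3):
    """Greedy grouping: keep groups with enough overlapping hits and average them."""
    groups = []
    for w in windows:
        for g in groups:
            if overlaps(w, g[0]):
                g.append(w)
                break
        else:
            groups.append([w])
    return [tuple(round(sum(v) / len(g)) for v in zip(*g))
            for g in groups if len(g) >= min_neighbors]

sizes = detector_sizes(384, 288)
hits = [(100, 100, 24), (102, 101, 24), (98, 99, 24), (300, 50, 30)]  # made-up raw detections
faces = merge_detections(hits)
print(len(sizes), sizes[0], sizes[-1])
print(faces)
```

The isolated hit is discarded because it has fewer than three supporting detections, while the three overlapping hits collapse into one averaged window.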
Results
Source: Prof. Padhraic Smyth, University of California, Irvine
Object detection approaches
o opencv_haartraining (older versions)
o opencv_traincascade (newer versions)
• Haar-like features
• Local binary pattern (LBP) features
o Positive and negative examples must be provided
o Can also be trained to detect other types of objects
Face recognition
o Eigenface, Fisherfaces, and LBP histogram approaches
Image source
o Photo or video
Qualcomm Snapdragon for Android SDK
o Processing detected face
• Blinking, smile
• Face orientation
• Face recognition (from version 2.1)
• Blog
facereclib
o https://pypi.python.org/pypi/facereclib
o Comparing face recognition algorithms using standard databases
Many applications in Google Play
o Search for ”face detection”
Huge amount of image data
o Cheap digital imaging
• Digital cameras, mobile devices with camera
o Internet
• Easy sharing of photos
→ Need for automatic image retrieval methods
Decades of research
o CBIR (Content-based image retrieval)
o QBE (Query by Example)
o Paper describing earlier approaches
• Ying Liu, Dengsheng Zhang, Guojun Lu, Wei-Ying Ma. A survey of content-based image retrieval with high-level semantics. Pattern Recognition 40 (2007) pp. 262–282.
Problem
o Usually there is no direct connection between low level features and
higher order, abstract descriptions
Image database types
o Specialized
• Photos from a well defined topic
• E.g., plant leaves, flowers, cars, …
o Wide range of photos
• Heterogeneous
• E.g., photos uploaded to social networks (Flickr, FaceBook, Google
Photos, …)
Three levels of search
o Level 1
• Search based on low level features
• ”Find similar images to this one”
• Should provide a photo or sketch
o Level 2
• Inference based on features
• ”Find photos of flowers!”
o Level 3
• Search based on abstract attributes
• ”Find photos of happy people!”
At levels 2 and 3 it is important to compensate for the semantic gap!
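A Level 1 search ("find similar images to this one") can be sketched with a very simple low-level feature. The example below is an illustration under assumptions: it uses a 4-bin gray-level histogram and histogram intersection as the similarity measure, neither of which is prescribed by the slides, and the tiny image database is made up.

```python
def gray_histogram(img, bins=4):
    """Normalised histogram of 8-bit gray values (a deliberately simple low-level feature)."""
    hist = [0] * bins
    count = 0
    for row in img:
        for v in row:
            hist[min(v * bins // 256, bins - 1)] += 1
            count += 1
    return [h / count for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the two histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def query_by_example(query, database):
    """Rank database keys from most to least similar to the query image."""
    q = gray_histogram(query)
    scores = {name: histogram_intersection(q, gray_histogram(img))
              for name, img in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up 2x3 grayscale "photos" standing in for a real image database.
database = {
    "night_street": [[5, 12, 30], [8, 20, 60]],
    "snow_field": [[250, 240, 230], [245, 255, 200]],
    "gray_wall": [[120, 130, 125], [128, 122, 127]],
}
query = [[10, 25, 40], [15, 35, 70]]

ranking = query_by_example(query, database)
print(ranking)
```

The ranking reflects brightness statistics only; nothing in it knows what a street or a wall is, which is exactly the semantic gap that Levels 2 and 3 have to bridge.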
Approaches
o Image content + surrounding text (in web pages)
o Object ontologies
o Machine learning
o Providing relevance feedback
o Semantic templates
o Bag of visual words
Deep learning
o Very popular today: often gives much better results than earlier approaches
o Multilayer nonlinear processing
o Can be supervised or unsupervised
o Can learn multiple levels of abstraction (concept hierarchy)
o Hierarchical representation: higher levels are based on lower level concepts
o Needs a really large amount of data!
Usually client-server solutions
o Mobile uploads photo, server gives result
Android applications
o PlinkArt: search for paintings (Google acquired)
o Google Goggles
• Famous buildings, paintings
• Decoding visual codes
• Business card scanning
• Image based identification of books and other consumer products
• Sudoku solver
Analysing photo content
o What is visible in the image?
• Can give a list of object classes
• Can caption photos
• Google's AI is getting really good at captioning photos
https://www.engadget.com/2016/09/23/googles-ai-is-getting-really-good-at-captioning-photos/
ImageNet
o Large database of annotated photos
• Over 10 million annotations provided by the community
• Categorized into thousands of fine-grained categories (e.g., 120
categories of dog breeds)
o Challenge
• Detect objects in photos from 1000 categories
• Earlier approaches: 25% error rate; deep learning: a few percent!
Google Cloud Vision API
o Made available in 2016
• Google opens its Vision API in public beta to give any app the power of
Google Photos
o CLOUD VISION API BETA
o Quickstart
Sample Applications
o Label Tagging Using Kubernetes; Landmark Detection Using Google
Cloud Storage; Text Detection Using the Vision API
o Object Detection Using Android Device Photos
Main features of usage
o Needs internet connection
o Can be tested for free (for larger scale use you have to pay a fee)
Usage
o Accessible from smartphone apps also (client-server solution)
o Gives description of image contents
• »It quickly classifies images into thousands of categories (e.g., "sailboat", "lion", "Eiffel Tower"), detects individual objects and faces within images, and finds and reads printed words contained within images.«
o Try Google Photos!
• Search for photos containing dogs, river, … in your own photo library
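The client side of such a client-server request can be sketched as follows. This is a hedged illustration, not an official sample: it only builds the JSON body and reads a hand-written example reply, it makes no network call, and the request layout (an `images:annotate` call with a base64-encoded image and a `LABEL_DETECTION` feature) follows the public REST interface as far as I know, so it should be checked against the current Cloud Vision documentation. The helper names and the sample response are hypothetical.

```python
import base64
import json

# Assumed REST endpoint; an API key or OAuth token is required in practice.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(image_bytes, max_results=5):
    """Build the JSON body of a label-detection request (nothing is sent here)."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }

def extract_labels(response):
    """Collect (description, score) pairs from a parsed response dictionary."""
    annotations = response["responses"][0].get("labelAnnotations", [])
    return [(a["description"], a["score"]) for a in annotations]

body = build_annotate_request(b"\xff\xd8fake-jpeg-bytes")
payload = json.dumps(body)   # this string would be POSTed by the mobile client

# Hand-written reply in the assumed response shape, for illustration only.
sample_response = {"responses": [{"labelAnnotations": [
    {"description": "sailboat", "score": 0.97},
    {"description": "water", "score": 0.91},
]}]}
labels = extract_labels(sample_response)
print(labels)
```

Sending the image as base64 inside JSON keeps the mobile client simple, at the cost of a larger upload over the required internet connection.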
TensorFlow
o Google’s open source AI package
• Available on Github
o C++, Python API + bindings to other languages
o Desktop PC + mobile environment
• https://www.tensorflow.org/mobile/
• Android, iOS, Raspberry
Features
o The learning stage can take a very long time on a PC, even with massive use of GPUs
o After that, the model can be used locally on the mobile camera live view (really fast)
Android examples
o https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android/
o APK is available (based on Google’s Inception model)
• Provides TF Classify, TF Detect, TF Stylize