Online Graphics Recognition: State-of-the-Art
Liu Wenyin, Dept. of Computer Science, City University of Hong Kong
[email protected]
http://www.cs.cityu.edu.hk/~liuwy/
Outline
Introduction: Problem and Motivation
Primitive Shape Recognition
Composite Graphic Object Recognition
Document Recognition and Understanding
Performance Evaluation and User Study
Summary
Open Problems
The Problem
Input graphic objects by freehand sketching on a tablet
Instant and continuous recognition as the strokes are added:
determine the type/class and parameters (and semantics)
(optionally) beautify the appearance
find errors earlier (thanks to the feedback) and correct them
Similar problem: handwriting
Motivation
Driven by:
Pen-based interaction devices
Ubiquitous computing
Natural, humanistic, and convenient UIs
The Tablet PC is heating up the market for handwriting software
e.g., Microsoft and IBM are developing their own handwriting software
Motivation: Why Sketch-based?
Current UI for graphics input: mouse clicks on menus and buttons
Disadvantages of mouse and menus:
Inconvenient: too many clicks; commands are hard to remember/find
Unnatural: interruptive, unsuitable for quick tasks/ideas
Unsuitable for small-screen devices
Graphics input UI for creative design by sketching:
Sketching with a pen is a mode of informal, perceptual, direct interaction, especially good for creative design tasks
A continuous, non-interruptive, natural way of interacting
Productivity: instant feedback for quick correction/positioning
Sketch-based UI: Design Principles
Humanistic: works the same way humans do (informal, fast/instant feedback, ambiguity, creativity...); the machine adapts to the human, not vice versa
Efficient and intelligent: guesses the user's intention and acts accordingly; supported by graphics recognition (from sketchy to regular)
Personalized: learns the user's drawing styles, habits, and preferences from user actions (as virtual feedback)
Graphic Objects
[Figure: class diagram relating Stroke, Primitive Shape, Composite Graphic Object, Visual Token, Text, and Graphic Document, with multiplicities 1..m, 2..m, and m]
Stroke: the trajectory of the pen while it touches the tablet; the minimal unit of user input, represented by a chain of points
Primitive Shape: a simple shape, closed or open, drawn with a single stroke or multiple strokes; there is a limited number of classes, e.g., triangles, ellipses, or line/arc segments
Composite Graphic Object: consists of two or more primitive shapes; assumption: the user inputs the components of one object consecutively and adjacently
Visual Token: a component of a graphic document, just as words are components of a text document; either a composite graphic object or a free (single) primitive shape
Graphic Document: a complete document for a purpose, composed of visual tokens; its semantics are defined by the tokens, their parameters, and their spatial relations
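The hierarchy of graphic objects defined above can be sketched as plain data classes. This is a minimal illustration, not the data model of any system mentioned in this survey. The class and field names (Stroke, PrimitiveShape, kind, params, CompositeObject, GraphicDocument) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Point = Tuple[float, float]


@dataclass
class Stroke:
    """One pen-down-to-pen-up trajectory, stored as a chain of points."""
    points: List[Point]


@dataclass
class PrimitiveShape:
    """A simple shape such as 'line', 'triangle' or 'ellipse', built from 1..m strokes."""
    kind: str
    params: Dict[str, float] = field(default_factory=dict)
    strokes: List[Stroke] = field(default_factory=list)


@dataclass
class CompositeObject:
    """A graphic object made of two or more primitive shapes."""
    name: str
    components: List[PrimitiveShape]

    def __post_init__(self) -> None:
        if len(self.components) < 2:
            raise ValueError("a composite object needs at least two primitive shapes")


# A visual token is either a composite object or a free-standing primitive shape.
VisualToken = Union[CompositeObject, PrimitiveShape]


@dataclass
class GraphicDocument:
    """A complete document: an ordered collection of visual tokens."""
    tokens: List[VisualToken] = field(default_factory=list)
```

The "2+ primitive shapes" rule is enforced when a CompositeObject is constructed. Spatial relations between tokens are left out here.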
Recognition Tasks/Stages
1. Primitive shape (or stroke) recognition: simultaneous recognition; immediate recognition
2. Composite graphic object recognition: simultaneous recognition; immediate recognition
3. Document recognition and understanding: simultaneous recognition; immediate recognition
Primitive/Stroke Recognition
Determine the type and parameters (position, size, and orientation)
Regularize/beautify to the most common form (parameters), because that is what most users intend
In most cases, immediate recognition: recognize the shape after it has been completely input
Single or multiple strokes: link multiple strokes first
Gestures: visual commands (e.g., undo/redo, remove), represented by primitive shapes and requiring this level of recognition
Examples of Editing Gestures
SILK (Landay and Myers, IEEE Computer 2001)
Different groups are developing different sets of gestures; standardization is necessary
Simultaneous Recognition
Fluid Sketches (Arvo and Novins, UIST2000)
While a freehand stroke is being input, guess/suggest what the user intends to draw: simultaneous or immediate feedback
Fluid Sketches (Arvo & Novins)
Primitive Recognition: 4 Stages
Stroke curve pre-processing
Shape classification
Shape fitting
Shape regularization/beautification
Stroke Pre-Processing: Problem
Input: a freehand stroke
Output: a refined polyline
Requirement: the output is similar to the input freehand stroke but with the necessary refinement (noise reduction); link multiple strokes or segment a single stroke if necessary
Pre-Processing
[Figure: polygonal approximation with ε = 1.0 pixel and with ε = 5.0 pixels]
[Figure: typical stroke noise, a hooklet and a circlet]
[Figure: a sketchy line before and after processing]
[Figure: a sketchy line before processing, after pulling the end points, and after deleting extra points]
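The slides show polygonal approximation at two tolerances but do not name the algorithm. As a stand-in assumption, the sketch below uses the classic Douglas-Peucker simplification, with `eps` playing the role of ε (in pixels). It is not necessarily the method used in the surveyed systems, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: List[Point], eps: float) -> List[Point]:
    """Keep only the vertices that deviate more than eps from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the two endpoints.
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    # Split at the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # drop the duplicated split point
```

A larger `eps` removes more jitter and keeps fewer vertices, which matches the ε = 1.0 vs. ε = 5.0 comparison. Hooklets and circlets at the stroke ends would still need separate handling.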
Shape Classification: Problem
Input: the refined polyline
Output: the type id of a basic shape class, e.g., line, triangle, quadrangle, pentagon, hexagon, ellipse, or free curve
Requirement: correctness, i.e., the output type matches the user's intent
Shape Classification
Based on features that are extracted from the stroke's vector polyline (or image) and represent the stroke
Many pattern recognition methods can be used:
Rule-based approaches
Neural-network-based approaches
SVM-based approaches
e.g., Ernesto Tapia and Raul Rojas (ICDAR 2003)
etc.
Features Used in Recognition
Corners, which can be found by speed (Davis 2002; Calhoun et al. 2002) or curvature
Turning angle functions (Arkin et al. 1991)
Attraction force model (Jin et al. PG2002)
Stroke order and direction, especially for composite objects
Domain-specific or domain-independent knowledge
Corners: Speed & Curvature
Davis (2002); Calhoun et al. (2002)
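The speed cue can be sketched as follows: users tend to slow the pen down at corners, so slow points are corner candidates. This minimal version assumes timestamped samples. The threshold factor `k = 0.5` and the local-minimum rule are illustrative assumptions, not parameters taken from Davis (2002) or Calhoun et al. (2002), and the function names are hypothetical.

```python
import math
from typing import List, Tuple

TimedPoint = Tuple[float, float, float]  # (x, y, t), t in seconds


def pen_speeds(points: List[TimedPoint]) -> List[float]:
    """Central-difference pen speed at every interior sample; endpoints are set to 0."""
    speeds = [0.0] * len(points)
    for i in range(1, len(points) - 1):
        x0, y0, t0 = points[i - 1]
        x1, y1, t1 = points[i + 1]
        dt = t1 - t0
        speeds[i] = math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0
    return speeds


def speed_corners(points: List[TimedPoint], k: float = 0.5) -> List[int]:
    """Indices of corner candidates: the stroke endpoints plus interior local
    speed minima that are slower than k times the mean interior speed."""
    n = len(points)
    if n < 3:
        return list(range(n))
    speeds = pen_speeds(points)
    interior = speeds[1:-1]
    threshold = k * sum(interior) / len(interior)
    corners = [0]
    for i in range(1, n - 1):
        # Compare only against interior neighbors (endpoint speeds are placeholders).
        prev = speeds[i - 1] if i > 1 else math.inf
        nxt = speeds[i + 1] if i < n - 2 else math.inf
        if speeds[i] < threshold and speeds[i] <= prev and speeds[i] <= nxt:
            corners.append(i)
    corners.append(n - 1)
    return corners
```

In practice the speed candidates are usually combined with curvature candidates, since a user may draw a sharp corner without slowing down.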
Turning Angle Functions
Arkin et al. (IEEE T-PAMI 1991)
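[Figure: turning function Θ(s) of a polygon, a step function over normalized arc length s from 0 to 1, starting at angle v and ending at v + 2π]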
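The turning function of Arkin et al. can be computed directly from a polygon's edges. The sketch below assumes the vertices are listed counterclockwise, so that the final value is v + 2π for a simple polygon. Comparing two shapes would then use a distance between their turning functions, which is not shown. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def turning_function(polygon: List[Point]) -> List[Tuple[float, float]]:
    """Breakpoints (s, theta) of the turning function of a closed polygon.

    s is arc length normalized to [0, 1]; theta is the direction of the edge
    that starts at s, accumulated from edge to edge so it never wraps around.
    The final pair (1.0, theta_end) closes the polygon.
    """
    n = len(polygon)
    edges = [
        (polygon[(i + 1) % n][0] - polygon[i][0], polygon[(i + 1) % n][1] - polygon[i][1])
        for i in range(n)
    ]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)

    def turn(e1: Point, e2: Point) -> float:
        # Signed angle from edge e1 to edge e2, in (-pi, pi].
        return math.atan2(e1[0] * e2[1] - e1[1] * e2[0], e1[0] * e2[0] + e1[1] * e2[1])

    theta = math.atan2(edges[0][1], edges[0][0])  # reference angle v
    s = 0.0
    steps = [(s, theta)]
    for i in range(1, n):
        s += lengths[i - 1] / perimeter
        theta += turn(edges[i - 1], edges[i])
        steps.append((s, theta))
    # Closing turn back onto the first edge: v + 2*pi for a CCW simple polygon.
    theta += turn(edges[-1], edges[0])
    steps.append((1.0, theta))
    return steps
```

For a unit square listed counterclockwise from (0, 0), the breakpoints are s = 0, 0.25, 0.5, 0.75 with angles 0, π/2, π, 3π/2, closing at 2π.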
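Attraction Force Model
Jin, Liu, Sun & Sun (2002)
[Figure: panels (a) to (d) showing points A, B, and C of a polyline under the attraction force model]
$f_{AB} = \dfrac{\varphi(\alpha, \beta)}{\mathrm{Dis}^2(A, B)}$
where α is the inner angle of the attracted point, β is the inner angle of the attracting point, Dis(A, B) is the distance between the two points, and φ is a function of the two inner angles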
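The quantities used by the attraction force model (the inner angles of the attracted and attracting points and their distance) can be computed from a polyline as sketched below. The sketch assumes the force has the form φ(α, β) / Dis²(A, B). The slides do not give the angle term, so `phi` is a hypothetical caller-supplied function rather than the published one. "Inner angle" is taken here to mean the angle between the two edges meeting at a vertex, which is also an assumption.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def inner_angle(poly: List[Point], i: int) -> float:
    """Angle in [0, pi] at interior vertex i between its two incident edges (pi = straight)."""
    (ax, ay), (bx, by), (cx, cy) = poly[i - 1], poly[i], poly[i + 1]
    ux, uy, vx, vy = ax - bx, ay - by, cx - bx, cy - by
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu == 0 or nv == 0:
        return math.pi  # degenerate vertex: treat it as straight
    cos_a = (ux * vx + uy * vy) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_a)))


def attraction_force(
    poly: List[Point], attracted: int, attracting: int,
    phi: Callable[[float, float], float],
) -> float:
    """f_AB = phi(alpha, beta) / Dis(A, B)^2 for two interior vertices of a polyline.

    alpha: inner angle of the attracted point; beta: inner angle of the
    attracting point; phi: hypothetical angle term supplied by the caller.
    """
    alpha = inner_angle(poly, attracted)
    beta = inner_angle(poly, attracting)
    dis = math.dist(poly[attracted], poly[attracting])
    if dis == 0:
        return math.inf
    return phi(alpha, beta) / (dis * dis)
```

Because the distance is squared, halving the gap between two sharp vertices quadruples the force. This is what lets nearby spurious corners be drawn together.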
Decision Making
Rule-based: number of corners (or vertices)
Construction of classifiers:
SVM (can be used for incremental learning)
One-against-one structure: n(n-1)/2 classifiers
One-against-all structure: n classifiers
Max-win scheme
Training with samples: neural network
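Two of the decision schemes above can be sketched in a few lines: a rule on the vertex count, and max-win voting over one-against-one binary classifiers. The vertex-count thresholds are illustrative assumptions. The binary classifiers are plain callables standing in for trained SVMs, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def classify_by_vertices(num_vertices: int, closed: bool) -> str:
    """Rule-based decision on the number of vertices of the refined polyline.

    The thresholds here are illustrative assumptions, not published rules.
    """
    if not closed:
        return "line" if num_vertices <= 2 else "free curve"
    polygons = {3: "triangle", 4: "quadrangle", 5: "pentagon", 6: "hexagon"}
    return polygons.get(num_vertices, "ellipse")  # many vertices: a smooth closed curve


BinaryClassifier = Callable[[Sequence[float]], str]


def one_against_one_pairs(classes: List[str]) -> List[Tuple[str, str]]:
    """The n(n-1)/2 class pairs that each need their own binary classifier."""
    return list(combinations(classes, 2))


def max_win(features: Sequence[float],
            classifiers: Dict[Tuple[str, str], BinaryClassifier]) -> str:
    """Max-win voting: every pairwise classifier votes for one class of its pair;
    the class with the most votes wins (ties go to the first class voted for)."""
    votes = Counter(clf(features) for clf in classifiers.values())
    return votes.most_common(1)[0][0]
```

With n classes the one-against-one structure needs n(n-1)/2 classifiers (10 for five classes), while one-against-all needs only n. On the other hand, each pairwise classifier is trained on just two classes' samples.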
Shape Fitting: Problem
Input: the type id; the stroke (original and refined polyline)
Output: the fitted shape (characterized by parameters)
Requirement: the output has the lowest average distance to the input stroke
Shape Fitting
[Figure: Polygonal Fitting, panels (a) to (d); Ellipse Fitting, panels (a) and (b), marking the center point and the axis orientation relative to the x and y axes]
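The sketch below computes the fitting objective (average distance from the stroke to a candidate polygon) together with a simple ellipse estimate. The ellipse estimate is a swapped-in technique: a moment-based fit that recovers the center point and axis orientation from the points' second moments. It assumes the points are spread evenly around the whole ellipse and does not by itself minimize the average distance; it could serve as a starting point for a fit that does. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def average_distance_to_polygon(stroke: List[Point], polygon: List[Point]) -> float:
    """Fitting objective: mean distance from the stroke points to the closed polygon."""
    n = len(polygon)
    edges = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    return sum(min(_segment_distance(p, a, b) for a, b in edges) for p in stroke) / len(stroke)


def fit_ellipse_by_moments(stroke: List[Point]) -> Tuple[Point, float, float, float]:
    """Center, semi-axes (a >= b) and axis orientation (radians) from second moments.

    Assumes the points are spread evenly around the whole ellipse; then the
    variance along each axis equals half the squared semi-axis.
    """
    n = len(stroke)
    cx = sum(x for x, _ in stroke) / n
    cy = sum(y for _, y in stroke) / n
    sxx = sum((x - cx) ** 2 for x, _ in stroke) / n
    syy = sum((y - cy) ** 2 for _, y in stroke) / n
    sxy = sum((x - cx) * (y - cy) for x, y in stroke) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy * sxy), 0.0))
    lam1, lam2 = half_trace + disc, max(half_trace - disc, 0.0)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the major axis
    return (cx, cy), math.sqrt(2 * lam1), math.sqrt(2 * lam2), theta
```

For a polygon class, the same objective can be used to compare candidate vertex placements and keep the one with the lowest average distance.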
Shape Regularization: Problem
Input: the fitted shape
Output: the regularized shape (characterized by parameters)
Requirement: the output is similar to the original freehand stroke but also appears in its most beautiful form, e.g., conforming as closely as possible to the connectedness, perpendicularity, congruence, and symmetry intended by the user.
Also referred to as beautification (Igarashi et al. 1997)
Shape Regularization
Inner-Shape Regularization
Inter-Shape Regularization
Inner-Shape Regularization
Equilateral rectification: edges and axes
Parallelism rectification: edges
Special-angle rectification: 90°, 30°, 45°, 60°, 120°, etc.
Horizontal/vertical rectification: edges, axes, diagonals
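Special-angle and horizontal/vertical rectification of a single edge can be sketched as snapping its direction to the nearest listed angle. The snap is applied only within a tolerance, so that deliberately slanted edges are left alone. The 8° tolerance and the rotation about the edge midpoint are assumptions made for illustration. The angle set is 0° (horizontal) plus the angles listed above, and the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Horizontal (0 and 180) and the special angles listed above, in degrees modulo 180.
SPECIAL_ANGLES = (0.0, 30.0, 45.0, 60.0, 90.0, 120.0, 180.0)


def snap_delta(angle_deg: float, tolerance_deg: float = 8.0) -> float:
    """Rotation (degrees) that moves an edge direction onto the nearest special
    angle, or 0.0 if none lies within the tolerance."""
    a = angle_deg % 180.0  # an edge and its reverse have the same direction
    best = min(SPECIAL_ANGLES, key=lambda s: abs(a - s))
    return best - a if abs(a - best) <= tolerance_deg else 0.0


def rectify_edge(p: Point, q: Point, tolerance_deg: float = 8.0) -> Tuple[Point, Point]:
    """Rotate segment pq about its midpoint onto a special angle, keeping its length."""
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    half = math.hypot(q[0] - p[0], q[1] - p[1]) / 2
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))
    theta = math.radians(angle + snap_delta(angle, tolerance_deg))
    dx, dy = half * math.cos(theta), half * math.sin(theta)
    return (mx - dx, my - dy), (mx + dx, my + dy)
```

For example, an edge from (0, 0) to (10, 0.5) (about 2.9°) becomes horizontal with the same length. An edge at 20° is more than 8° from every special angle and stays as drawn.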
[Figure: Inner-Shape Rectification Rules, refining a fitted shape into a circle, ellipse, triangle, equilateral triangle, isosceles triangle, right triangle, quadrangle, parallelogram, trapezoid, rectangle, diamond, or square]
Inter-Shape Rectification
Affected by neighbors in a document
Size rectification
Position/orientation rectification:
alignment, intersection, tangency, concentricity, …
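Position rectification against neighboring shapes can be sketched as tolerance-based snapping. An endpoint or center snaps onto a neighbor's anchor point (connectedness, concentricity), and a coordinate snaps onto a neighbor's coordinate (alignment). The 6-pixel tolerance and the function names are hypothetical, and tangency and intersection are not covered.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def snap_point(p: Point, anchors: List[Point], tol: float = 6.0) -> Point:
    """Connectedness/concentricity: move p onto the nearest neighbor anchor
    (an endpoint, a vertex, or a center) if one lies within tol."""
    nearest = min(anchors, key=lambda a: math.dist(p, a), default=None)
    if nearest is not None and math.dist(p, nearest) <= tol:
        return nearest
    return p


def align_value(v: float, neighbor_values: List[float], tol: float = 6.0) -> float:
    """Alignment: snap one coordinate (e.g. a left edge x or a center y)
    to the closest coordinate of a neighboring shape within tol."""
    nearest = min(neighbor_values, key=lambda w: abs(v - w), default=None)
    if nearest is not None and abs(v - nearest) <= tol:
        return nearest
    return v
```

For example, a newly drawn circle whose center lands at (100.5, 99) next to an existing circle centered at (100, 100) would be made concentric with it.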
Composite Object Recognition: Problem
After recognizing the current stroke, combine the current shape with previous ones, based on their sequential and spatial relationships
Assumptions: an object consists of 2+ primitive shapes, and its components are input consecutively and adjacently
Determine or predict the type and parameters of the composite object
Regularization or beautification
Composite Object Recognition: Approaches
Classifier-based approaches: decision tree
Fonseca and Jorge (2000): fuzzy
Peng, Sun, Liu, & Cong (GREC2003)
Similarity-based approaches: based on the similarities of components and constraints
Representation: ARG (Li 2000), RAG (Lladós 2001)
Relational distance metric (Shapiro 1993)
Directional shape similarity (Liu et al. 2001)
Ernesto Tapia and Raul Rojas (GREC2003)
Directional Composite Similarity
Principles for composite similarity metrics:
Partial (for partial match)
Structural/topological
Stroke-number free
Stroke-order free
Used in our demo system (SmartSketchpad)
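Two of the principles above, partial matching and order freedom, can be illustrated with a deliberately simplified ranking. It compares only the multiset of recognized component types and ignores the structural and spatial relations that a real composite similarity metric would use. It is not the directional similarity used in SmartSketchpad. Because it works on recognized shapes rather than raw strokes, it is also insensitive to how many strokes each component took. The names and the tie-breaking order are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def match_scores(drawn: List[str], candidate: List[str]) -> Tuple[float, float]:
    """(precision, coverage) of the drawn component types against a candidate.

    Counting is done on multisets, so the result does not depend on the
    order in which the components were drawn.
    """
    if not drawn or not candidate:
        return 0.0, 0.0
    overlap = sum((Counter(drawn) & Counter(candidate)).values())
    return overlap / len(drawn), overlap / len(candidate)


def rank_candidates(drawn: List[str], library: Dict[str, List[str]], k: int = 10) -> List[str]:
    """Top-k objects for a 'smart toolbox': best explained drawing first,
    then most complete match, then by name for a stable order."""
    def key(name: str) -> Tuple[float, float, str]:
        precision, coverage = match_scores(drawn, library[name])
        return (-precision, -coverage, name)
    return sorted(library, key=key)[:k]
```

Ranking first by how much of the drawing a candidate explains, rather than by how complete the candidate is, keeps large objects in the list while only a few of their components have been drawn.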
Scenario for Composite Input
for partial input match:
Document Recognition and Understanding: Problem
Analyze the connections and relationships among the elements
Obtain and represent the semantics of the current (partial or whole) drawing as one document
Beautify and re-display it in a neat layout
Comparison to offline document recognition: similar to engineering drawings, but sketches are more cursive and engineering drawings more regular
Document Recognition and Understanding: Applications
Mainly for quick design
2D diagrams:
GUI: Landay & Myers (2001), Caetano et al. (2002)
UML diagrams: Blostein et al. (2002) …
3D object input: Igarashi et al. (SIGGRAPH 1999): TEDDY; Lipson and Shpitalni (2002)
Hsu and Lee (1994): 2.5D animations
Sketch Input for a Dog in Animation
Fabian Di Fiore & Frank Van Reeth (2002): A Multi-Level Sketching Tool for “Pencil&Paper” Animation
Document Recognition and Understanding Approaches
Gross (1994, 1996): Stretch-A-Sketch; detects and maintains spatial relationships (constraints)
represented as binary predicates, e.g., “concentric”, “contains”, “connects”, “overlap”
computed from the bounding box, size, and starting/ending points
by-product: learns composite objects from examples
Pinto-Albuquerque et al. (2000): DocSketch; syntax expressed as a fuzzy relational adjacency grammar; visual syntax analyzer
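Binary spatial predicates of the kind listed for Gross's system can be computed from bounding boxes, sizes, and start/end points. The definitions and tolerances below (a 10% relative tolerance for concentricity, 5 pixels for connection) are illustrative assumptions, not the published ones, and the names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a visual token."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> Point:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def size(self) -> float:
        return max(self.x1 - self.x0, self.y1 - self.y0)


def contains(a: Box, b: Box) -> bool:
    """Box b lies entirely inside box a."""
    return a.x0 <= b.x0 and a.y0 <= b.y0 and b.x1 <= a.x1 and b.y1 <= a.y1


def overlaps(a: Box, b: Box) -> bool:
    """The two boxes share some interior area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def concentric(a: Box, b: Box, rel_tol: float = 0.1) -> bool:
    """Centers closer than rel_tol times the larger box size."""
    return math.dist(a.center(), b.center()) <= rel_tol * max(a.size(), b.size())


def connects(ends_a: List[Point], ends_b: List[Point], tol: float = 5.0) -> bool:
    """Some start/end point of one token lies within tol of one of the other's."""
    return any(math.dist(p, q) <= tol for p in ends_a for q in ends_b)
```

Recording such predicates for each pair of tokens yields the set of constraints that later editing must maintain.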
Prototype Systems
ASSIST: Alvarado and Davis (IJCAI2001), MIT
SketchIT: Stahovich (1996)…, MIT, CMU
SILK: Landay & Myers (2001), UC Berkeley & CMU
Teddy: Igarashi et al. (SIGGRAPH 1999), U-Tokyo
Tivoli: Pedersen et al. (CHI1993), Xerox PARC
EsQUIsE: Pierre Leclercq (GREC2003)
SmartSketchpad: Liu et al. (2001, 2002)
…
ASSIST
A Shrewd Sketch Interpretation & Simulation Tool
MIT AI Lab
Christine Alvarado's Master's thesis (2000)
Alvarado and Davis (IJCAI-2001)
Each new stroke triggers a three-stage process:
Recognition: generate all possible interpretations (primitive stroke recognition and composite object, i.e., device, recognition)
Reasoning: score each interpretation
Resolution: select the current best consistent interpretation
Gesture recognition: arrows and pointing
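The resolution stage of a recognize/score/resolve loop like ASSIST's can be sketched as follows. Candidates carry the scores produced by the reasoning stage, and a consistent set is one in which no stroke is explained twice. The greedy selection is a simplifying assumption for illustration, not ASSIST's actual reasoning. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Interpretation:
    label: str
    strokes: FrozenSet[int]  # ids of the strokes this interpretation explains
    score: float             # produced by the reasoning stage


def resolve(candidates: List[Interpretation]) -> List[Interpretation]:
    """Greedy resolution: take interpretations in descending score order and
    keep each one that shares no stroke with an already accepted one."""
    accepted: List[Interpretation] = []
    used: set = set()
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if used.isdisjoint(cand.strokes):
            accepted.append(cand)
            used |= cand.strokes
    return accepted
```

Re-running the resolution after every stroke means that an earlier choice (e.g., "circle" for stroke 1) can be replaced when a better-scoring reading ("wheel") becomes available.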
SketchIT
Conceptual design for CAD (mechanical engineering) instead of precise design
Thomas F. Stahovich (1996), MIT PhD thesis; Stahovich, Davis & Shrobe (AAAI-1997)
QC-space for representing interactions among mechanical parts
Calhoun, Stahovich, et al. (2002): semantic-network-based recognizer for multi-stroke input
Stahovich, Davis & Shrobe (AI-1998): generate multiple new designs from a sketch
Experiments on Composite Object Recognition
Database of Composite Objects
In this experiment, we created 97 composite graphic objects, each composed of fewer than ten primitive shapes. The weights and thresholds we used are ε = 20, w1 = 0.4, w2 = 0.3, w3 = 0.3, and k1 = k2 = 0.5.
We randomly selected 10 objects (IDs 73, 65, 54, 88, 22, 5, 12, 81, 18, and 76) and drew them as queries. In most cases, the intended object appears in the smart toolbox (ranked among the first 10) after only a few of its components have been drawn.
[Figure: results for the queries Object 73, Object 54, Object 76, and Object 5]
User Study and Performance Evaluation
Sketches for Evaluating Different UIs
[Figure: (a) sketch 1, (b) sketch 2]
User ID | Sketch 1, sketch-based (s) | Sketch 1, traditional (s) | Sketch 2, sketch-based (s) | Sketch 2, traditional (s)
1 | 104 | 125 | 157 | 190
2 | 93 | 99 | 151 | 288
3 | 69 | 98 | 156 | 294
4 | 59 | 81 | 122 | 156
5 | 64 | 178 | 135 | 191
6 | 63 | 100 | 85 | 231
7 | 61 | 120 | 92 | 203
8 | 72 | 91 | 119 | 195
9 | 70 | 70 | 156 | 252
10 | 78 | 110 | 125 | 201
Average | 73.3 | 107.2 | 129.8 | 220.1
Drawing Time for Different Sketches Using Different UIs
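The averages in the drawing-time table can be rechecked from the per-user times. The sketch below also derives the fraction of drawing time saved by the sketch-based UI, a figure computed here from the table rather than stated on the slides. The dictionary layout and function name are hypothetical.

```python
from statistics import mean

# Drawing times in seconds for users 1-10, copied from the drawing-time table.
TIMES = {
    "sketch 1": {"sketch-based": [104, 93, 69, 59, 64, 63, 61, 72, 70, 78],
                 "traditional": [125, 99, 98, 81, 178, 100, 120, 91, 70, 110]},
    "sketch 2": {"sketch-based": [157, 151, 156, 122, 135, 85, 92, 119, 156, 125],
                 "traditional": [190, 288, 294, 156, 191, 231, 203, 195, 252, 201]},
}


def summarize(times):
    """Mean time per UI and the fraction of time saved by the sketch-based UI."""
    sb, tr = mean(times["sketch-based"]), mean(times["traditional"])
    return round(sb, 1), round(tr, 1), round(1 - sb / tr, 3)


for name, times in TIMES.items():
    print(name, summarize(times))
```

This reproduces the reported averages (73.3 s vs. 107.2 s and 129.8 s vs. 220.1 s), which corresponds to roughly 32% and 41% less drawing time. Every user was at least as fast with the sketch-based UI (user 9 tied on sketch 1).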
Open Problems
Recognition of complex editing gestures, and editing-related applications: up to several hundred different gestures
Composite object recognition for large object sets (for graphics input): up to 10,000 master objects in MS Visio; user and domain adaptation
Semantic-level understanding for creative/conceptual design: reasoning about and predicting the user's intentions
Where to Find Papers?
2002 AAAI Spring Symposium Series--Sketch Understanding
Chairs: Tom Stahovich, James Landay, Randy Davis
Online proceedings: http://automatix.inesc.pt/sketch02/
ACM Annual Conference on Human Factors in Computing Systems (SIGCHI)
ACM Annual Symposium on User Interface Software and Technology (UIST)
GREC and ICDAR
SIGGRAPH
Summary
A brief survey of online graphics recognition: problems, approaches, and applications
Supports pen-based UIs:
improves user productivity
convenient for creative tasks such as quick design ideas
users unanimously prefer the sketch-based UI
Thank You!
Contact: Liu Wenyin, [email protected]
See some of my research work at http://www.cs.cityu.edu.hk/~liuwy/
The survey paper can be found at http://www.cs.cityu.edu.hk/~liuwy/publications/GREC2003_LNCS.pdf