Online Graphics Recognition: State-of-the-Art


Liu Wenyin, Dept. of Computer Science, City University of Hong Kong

http://www.cs.cityu.edu.hk/~liuwy/

Outline

• Introduction: Problem and Motivation
• Primitive Shape Recognition
• Composite Graphic Object Recognition
• Document Recognition and Understanding
• Performance Evaluation and User Study
• Summary
• Open Problems

The Problem

Input graphic objects by freehand sketching on a tablet

Instant and continuous recognition as strokes are added:
• determine the type/class & parameters (& semantics)
• (optionally) beautify the appearance
• find (thanks to the feedback) and correct errors earlier

Similar problem: handwriting

Motivation

Driven by:
• pen-based interaction devices
• ubiquitous computing
• natural, humanistic, & convenient UIs

Tablet PCs are heating up the market for handwriting software

e.g., Microsoft and IBM are developing their own handwriting software

Motivation: Why Sketch-based?

Current UI for graphics input: mouse clicks on menus & buttons

Disadvantages of mouse and menus:
• inconvenient: too many clicks; commands are hard to remember/find
• unnatural: interruptive, unsuitable for quick tasks/ideas
• unsuitable for small-screen devices

Graphics input UI for creative design by sketching:
• sketching with a pen is an informal, perceptual, direct mode of interaction, especially good for creative design tasks
• continuous, non-interruptive, natural interaction
• productivity, with instant feedback for quick correction/positioning

Sketch-based UI: Design Principles

Humanistic:
• works the same way humans do: informal, fast/instant feedback, ambiguity, creativity...
• the machine adapts to the human, not vice versa

Efficient and intelligent:
• guess the user’s intention and act accordingly
• supported by graphics recognition: from sketchy to regular

Personalized:
• learn the user’s drawing styles, habits, and preferences from user actions (as virtual feedback)

Graphic Objects

[Figure: hierarchy of graphic objects (Graphic Document, Visual Token, Text, Composite Graphic Object, Primitive Shape, Stroke), with multiplicities 1..m, 2..m, and m]

Graphic Objects

Stroke: the trajectory of the pen while it touches the tablet; the minimal unit of user input, represented by a chain of points

Primitive Shape: a simple shape, closed or open, drawn with a single stroke or multiple strokes; there is a limited number of classes, e.g., triangles, ellipses, or line/arc segments

Composite Graphic Object: consists of two or more primitive shapes; assumption: the components of one object are input consecutively & adjacently

Visual Token: a component of a graphic document, just like a word in a text document; either a composite graphic object or a free/single primitive shape

Graphic Document: a complete document for a purpose, composed of visual tokens; its semantics are defined by the tokens, their parameters, & their spatial relations
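The hierarchy above can be sketched as plain data types. This is a minimal Python sketch under our own assumptions: the class and field names (`Stroke`, `PrimitiveShape`, `CompositeObject`, `GraphicDocument`, `relations`) are hypothetical, and the checks simply mirror the definitions (one or more strokes per primitive shape, two or more primitive shapes per composite object).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Point = Tuple[float, float]


@dataclass
class Stroke:
    """One pen-down trajectory, stored as a chain of points."""
    points: List[Point]


@dataclass
class PrimitiveShape:
    """A simple shape (e.g. 'triangle', 'ellipse') built from 1..m strokes."""
    kind: str
    strokes: List[Stroke]
    params: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.strokes:
            raise ValueError("a primitive shape needs at least one stroke")


@dataclass
class CompositeObject:
    """An object made of 2..m primitive shapes."""
    kind: str
    parts: List[PrimitiveShape]

    def __post_init__(self) -> None:
        if len(self.parts) < 2:
            raise ValueError("a composite object needs at least two primitive shapes")


# A visual token is a composite object or a free/single primitive shape.
VisualToken = Union[CompositeObject, PrimitiveShape]


@dataclass
class GraphicDocument:
    """Visual tokens plus (token index, relation name, token index) triples."""
    tokens: List[VisualToken] = field(default_factory=list)
    relations: List[Tuple[int, str, int]] = field(default_factory=list)
```

Keeping relations as index triples separates what the tokens are from how they are arranged, which is where document-level semantics live.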

Recognition Tasks/Stages

1. Primitive shape (or stroke) recognition: simultaneous recognition / immediate recognition

2. Composite graphic object recognition: simultaneous recognition / immediate recognition

3. Document recognition and understanding: simultaneous recognition / immediate recognition

Primitive/Stroke Recognition

Determine the type & parameters (position, size, & orientation)
Regularize/beautify to the most common form (parameters), because that is what most users intend

In most cases, immediate recognition: recognize a shape after it has been completely input

Single or multiple strokes: link multiple strokes first

Gestures: visual commands (e.g., undo/redo, remove), represented by primitive shapes and requiring recognition at this level

Examples of Editing Gestures

SILK (Landay and Myers, IEEE Computer 2001)

Different groups are developing different sets of gestures; standardization is necessary

Simultaneous Recognition

Fluid Sketches (Arvo and Novins, UIST2000)

While a freehand stroke is being input, guess/suggest what the user intends to draw; simultaneous or immediate feedback

Fluid Sketches (Arvo & Novins)

Primitive Recognition: 4 Stages

Stroke curve pre-processing

Shape classification

Shape fitting

Shape regularization/beautification

Stroke Pre-Processing: Problem

Input: a freehand stroke

Output: a refined polyline

Requirement: the output is similar to the input freehand stroke, but with the necessary refinement (noise reduction)
Link multiple strokes or segment a single stroke if necessary

Pre-Processing

[Figure panels: polygonal approximation with ε = 1.0 pixel and with ε = 5.0 pixels; noise artifacts: hooklet and circlet; a sketchy line before and after processing; a sketchy line before processing, after pulling the end points, and after deleting extra points]
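The slides do not name the algorithm behind the ε = 1.0 and ε = 5.0 pixel polygonal approximations. As an illustration only, the sketch below uses the classic Douglas-Peucker algorithm (an assumption on our part): ε is the maximum distance a dropped point may lie from the retained polyline, so a larger ε removes more jitter, and more detail. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep the point farthest from the chord if it exceeds epsilon, then
    recurse on both halves; otherwise replace the run by its chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


stroke = [(0, 0), (2, 0.4), (4, -0.3), (6, 0.5), (8, -0.2),
          (10, 0), (10.3, 3), (9.8, 6), (10, 10)]
print(douglas_peucker(stroke, 1.0))  # [(0, 0), (10, 0), (10, 10)]
```

With ε = 1.0 the jittery L-shaped stroke collapses to its two intended segments; with a much smaller ε more of the noise survives as extra vertices.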

Shape Classification: Problem

Input: the refined polyline

Output: the type id of a basic shape class, e.g., line, triangle, quadrangle, pentagon, hexagon, ellipse, or free curve

Requirement: correctness, i.e., the output type matches the user’s intent
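One simple instance of the rule-based decision making listed under Decision Making (counting corners or vertices) is sketched below. The rules, the closing tolerance `close_tol`, and the function name `classify_by_corners` are hypothetical; a practical classifier would also check, for example, edge straightness and aspect ratio.

```python
import math

# Hypothetical mapping from corner count of a closed polyline to a class.
CLOSED_CLASSES = {3: "triangle", 4: "quadrangle", 5: "pentagon", 6: "hexagon"}


def classify_by_corners(polyline, close_tol=10.0):
    """Map a refined polyline to a basic shape class using two rules:
    is it closed, and how many corners (vertices) does it have?"""
    start, end = polyline[0], polyline[-1]
    closed = len(polyline) > 3 and math.dist(start, end) <= close_tol
    if not closed:
        return "line" if len(polyline) == 2 else "free curve"
    corners = len(polyline) - 1  # the last point (approximately) returns to the first
    # Many corners on a closed polyline suggest a smooth closed curve.
    return CLOSED_CLASSES.get(corners, "ellipse")


print(classify_by_corners([(0, 0), (10, 0), (10, 10), (0, 10), (1, 1)]))  # quadrangle
```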

Shape Classification

Based on features that are extracted from the stroke’s vector polyline (or image) and represent the stroke

Many pattern recognition methods can be used:
• rule-based approaches
• neural-network-based approaches
• SVM-based approaches

e.g., Ernesto Tapia and Raul Rojas (ICDAR 2003)

etc.

Features Used in Recognition

Corners, which can be found by:
• speed (Davis 2002; Calhoun et al. 2002)
• curvature

Turning angle functions (Arkin et al. 1991)
Attraction force model (Jin et al. PG2002)
Stroke order and direction, especially for composite objects
Domain-specific or domain-independent knowledge
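The speed and curvature cues can be combined as follows: the pen tends to slow down at intended corners, and the direction changes sharply there. This is a hedged sketch, not the exact procedure of Davis (2002) or Calhoun et al. (2002); the thresholds `speed_ratio` and `min_turn` and the function name are hypothetical.

```python
import math


def find_corners(samples, speed_ratio=0.5, min_turn=math.radians(30)):
    """Return indices of likely corners in a timed stroke [(x, y, t), ...].
    A point is a candidate when the pen slows down (speed below speed_ratio
    times the mean speed) AND the path turns sharply (curvature cue)."""
    n = len(samples)
    if n < 3:
        return []
    speeds, turns = [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        (x0, y0, t0), (x1, y1, _), (x2, y2, t2) = samples[i - 1], samples[i], samples[i + 1]
        speeds[i] = math.hypot(x2 - x0, y2 - y0) / max(t2 - t0, 1e-9)
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        turns[i] = abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)
    mean_speed = sum(speeds[1:-1]) / (n - 2)
    candidates = [i for i in range(1, n - 1)
                  if speeds[i] < speed_ratio * mean_speed and turns[i] >= min_turn]
    # Collapse runs of adjacent candidates to the sharpest turn in each run.
    runs = []
    for i in candidates:
        if runs and i - runs[-1][-1] == 1:
            runs[-1].append(i)
        else:
            runs.append([i])
    return [max(run, key=lambda j: turns[j]) for run in runs]


# An L-shaped stroke where the pen lingers at the corner (10, 0).
stroke = [(0, 0, 0), (2, 0, 1), (4, 0, 2), (6, 0, 3), (8, 0, 4),
          (10, 0, 7), (10, 2, 10), (10, 4, 11), (10, 6, 12), (10, 8, 13)]
print(find_corners(stroke))  # [5]
```

Requiring both cues makes detection more robust than either alone: a slow but straight passage, or a fast but wobbly one, is not reported as a corner.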

Corners: Speed & Curvature

Davis (2002); Calhoun et al. (2002)

Turning Angle Functions

Arkin et al. (IEEE T-PAMI 1991)

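[Figure: turning function T(s) of a polygon over normalized arc length s from 0 to 1, starting at angle v and ending at v + 2π]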
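A turning function maps normalized arc length s to the cumulative tangent angle T(s). This makes it invariant to translation and scale, and rotation only adds a constant. The sketch below is a simplified, sampled approximation of the Arkin et al. distance, assuming simple polygons with counterclockwise vertices: rotation is removed in closed form, while the start point is searched over discrete sample offsets instead of being solved exactly. Function names are hypothetical.

```python
import math


def turning_function(vertices, samples=180):
    """Sample T(s) of a closed polygon (counterclockwise vertices) at
    `samples` evenly spaced values of normalized arc length s."""
    n = len(vertices)
    edges = [(vertices[(i + 1) % n][0] - vertices[i][0],
              vertices[(i + 1) % n][1] - vertices[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)
    # Start at the first edge's direction and accumulate the signed turn at
    # each vertex, so T ends near T(0) + 2*pi for a counterclockwise polygon.
    directions = [math.atan2(dy, dx) for dx, dy in edges]
    angles = [directions[0]]
    for i in range(1, n):
        turn = (directions[i] - directions[i - 1] + math.pi) % (2 * math.pi) - math.pi
        angles.append(angles[-1] + turn)
    values, edge, edge_end = [], 0, lengths[0] / perimeter
    for k in range(samples):
        s = (k + 0.5) / samples
        while s > edge_end and edge < n - 1:
            edge += 1
            edge_end += lengths[edge] / perimeter
        values.append(angles[edge])
    return values


def turning_distance(poly_a, poly_b, samples=180):
    """L2 distance between turning functions, minimized over rotation
    (subtract the mean difference) and over discretized start points."""
    a = turning_function(poly_a, samples)
    b = turning_function(poly_b, samples)
    best = float("inf")
    for shift in range(samples):
        # Moving the start point wraps T around; wrapped samples gain 2*pi.
        diff = [a[(k + shift) % samples]
                + (2 * math.pi if k + shift >= samples else 0.0) - b[k]
                for k in range(samples)]
        mean = sum(diff) / samples
        variance = sum(d * d for d in diff) / samples - mean * mean
        best = min(best, variance)
    return math.sqrt(max(best, 0.0))
```

A unit square and a rotated, scaled square (a diamond) come out at distance close to 0, while a square and an equilateral triangle stay clearly apart, which is the behavior a shape classifier needs from this feature.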

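Attraction Force Model

Jin, Liu, Sun & Sun (2002): the attraction force between two points A and B depends on:
• the inner angle α of the attracted point
• the inner angle β of the attracting point
• the distance Dis(A, B) between the two points

[Figure, panels (a)-(d): polyline vertices A, B, and C]

$f(A, B) = \frac{\varphi(\alpha, \beta)}{\mathrm{Dis}^2(A, B)}$, where φ depends on the two inner angles α and β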

Decision Making

Rule-based: by the number of corners (or vertices)

Construction of classifiers: SVM (can be used for incremental learning)
• one-against-one structure: n(n-1)/2 classifiers
• one-against-all structure: n classifiers
• max-win scheme
• training with samples

Neural network
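The one-against-one structure with the max-win scheme can be sketched as voting among pairwise classifiers. Here each pairwise classifier is any callable that returns one of its two classes. In an SVM-based system these would be trained binary SVMs; the toy corner-count classifiers below are hypothetical stand-ins used only to show the voting.

```python
from collections import Counter
from itertools import combinations


def max_win_predict(x, classes, pairwise):
    """One-against-one decision with max-win voting: each of the n(n-1)/2
    pairwise classifiers votes for one class of its pair; the class with the
    most votes wins (ties go to the class listed first)."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])


# Hypothetical toy classifiers: x is a corner count, and each pairwise
# classifier picks whichever of its two classes expects a closer count.
expected = {"triangle": 3, "quadrangle": 4, "ellipse": 12}
classes = list(expected)
pairwise = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - expected[a]) <= abs(x - expected[b]) else b)
    for a, b in combinations(classes, 2)
}
print(max_win_predict(4, classes, pairwise))  # quadrangle
```

With n = 3 classes there are 3 pairwise classifiers; the one-against-all alternative would train n binary classifiers instead and pick the most confident one.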

Shape Fitting: Problem

Input: the type id and the stroke (original and refined polylines)

Output: the fitted shape (characterized by parameters)

Requirement: the output has the lowest average distance to the input stroke
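The fitting requirement can be made concrete by scoring each candidate shape by the average distance from the stroke points to the shape and keeping the candidate with the lowest score. A minimal sketch for polygons, assuming the fitted polygon is given as a vertex list; the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def average_distance(stroke, polygon, closed=True):
    """Mean distance from each stroke point to the nearest polygon edge."""
    vertices = list(polygon)
    nxt = vertices[1:] + vertices[:1] if closed else vertices[1:]
    edges = list(zip(vertices, nxt))
    return sum(min(point_segment_distance(p, a, b) for a, b in edges)
               for p in stroke) / len(stroke)


stroke = [(2, 0.5), (5, -0.5), (8, 0.5), (9.5, 5)]
candidates = [[(0, 0), (10, 0), (10, 10), (0, 10)],
              [(1, 1), (11, 1), (11, 11), (1, 11)]]
best = min(candidates, key=lambda poly: average_distance(stroke, poly))
print(best)  # [(0, 0), (10, 0), (10, 10), (0, 10)]
```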

Shape Fitting

[Figure: Polygonal Fitting, panels (a)-(d); Ellipse Fitting, panels (a)-(b), showing the center point and the axis orientation in x-y coordinates]
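For ellipse fitting, one simple way to obtain the center point and the axis orientation is from the second-order moments of the stroke points. This moment-based estimate is our assumption, not necessarily the method used in the survey, and it only approximates the lowest-average-distance fit. It assumes the points are spread roughly evenly (by angle parameter) around the boundary, where the variance along a semi-axis of length r is r²/2.

```python
import math


def fit_ellipse_moments(points):
    """Estimate (center, semi-major, semi-minor, orientation in radians)
    of an ellipse from stroke points via second-order moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix: spread along the two axes.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major_var, minor_var = half_trace + root, half_trace - root
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis orientation
    return ((cx, cy), math.sqrt(2 * major_var),
            math.sqrt(2 * max(minor_var, 0.0)), angle)
```

For a stroke traced evenly around an ellipse centered at (5, 3) with semi-axes 4 and 2 rotated by 0.5 rad, this returns those values up to floating-point error; for real, unevenly paced strokes it is a starting point that an iterative fit could refine.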

Shape Regularization: Problem

Input: the fitted shape

Output: the regularized shape (characterized by parameters)

Requirement: the output is similar to the original freehand stroke but also appears in its most beautiful form:

e.g., conforming as closely as possible to the connectedness, perpendicularity, congruence, and symmetry intended by the user.

Also referred to as beautification (Igarashi et al. 1997)

Shape Regularization

Inner-Shape Regularization

Inter-Shape Regularization

Inner-Shape Regularization

Equilateral rectification: edges and axes

Parallelism rectification: edges

Special angle rectification: 90°, 30°, 45°, 60°, 120°, etc.

Horizontal/vertical rectification: edges, axes, diagonals

[Figure: rectification rules leading from a fitted shape to circle, ellipse, triangle, equilateral triangle, isosceles triangle, right triangle, quadrangle, parallelogram, trapezoid, diamond, rectangle, and square]

Inner-Shape Rectification Rules
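Special angle and horizontal/vertical rectification can be sketched as snapping an edge direction to the nearest special angle within a tolerance. The angle set (0° for horizontal, the listed 30°, 45°, 60°, 90°, 120°, and, as our own additions, 135° and 150°), the 5° tolerance, and the function names are hypothetical.

```python
import math

# Hypothetical set: 0 (horizontal), the listed special angles, plus 135 and 150.
SPECIAL_ANGLES = (0, 30, 45, 60, 90, 120, 135, 150)


def snap_delta(angle_deg, tolerance=5.0, special=SPECIAL_ANGLES):
    """Correction (degrees) that moves an undirected edge angle onto the
    nearest special angle, or 0.0 if none lies within `tolerance`."""
    a = angle_deg % 180.0
    best = min(special + (180,), key=lambda s: abs(a - s))  # 180 == 0 for edges
    return best - a if abs(best - a) <= tolerance else 0.0


def rectify_edge(p, q, tolerance=5.0):
    """Rotate edge pq about p by the snapping correction, keeping its length
    and its original sense of direction."""
    raw = math.atan2(q[1] - p[1], q[0] - p[0])
    new = raw + math.radians(snap_delta(math.degrees(raw), tolerance))
    length = math.dist(p, q)
    return (p[0] + length * math.cos(new), p[1] + length * math.sin(new))


print(rectify_edge((0, 0), (10, 0.5)))  # nearly horizontal edge becomes horizontal
```

Working modulo 180° treats an edge and its reverse alike, so an edge drawn right-to-left snaps to horizontal just as one drawn left-to-right does.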

Inter-Shape Rectification

Affected by neighbors in a document:
• size rectification
• position/orientation rectification: alignment, intersection, tangency, concentricity, …
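A simple form of inter-shape rectification is vertex snapping: vertices of a newly recognized shape that fall close to vertices of neighboring shapes are moved onto them, so touching shapes become connected. This is an illustrative sketch; the snapping radius and the function name are assumptions.

```python
import math


def snap_to_neighbors(shape, neighbors, radius=8.0):
    """Move each vertex of `shape` onto the nearest vertex of a previously
    drawn shape if it lies within `radius`; leave other vertices alone."""
    anchors = [v for other in neighbors for v in other]
    snapped = []
    for v in shape:
        nearest = min(anchors, key=lambda a: math.dist(a, v), default=None)
        if nearest is not None and math.dist(nearest, v) <= radius:
            snapped.append(nearest)
        else:
            snapped.append(v)
    return snapped


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
roof = [(1, 11), (10.5, 9), (5, 20)]
print(snap_to_neighbors(roof, [square], radius=3.0))  # [(0, 10), (10, 10), (5, 20)]
```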

Composite Object Recognition: Problem

After recognizing the current stroke, combine the current shape with previous ones based on their sequential & spatial relationships

Assumptions: the object consists of 2+ primitive shapes, and its components are input consecutively & adjacently

Determine or predict the type & parameters of the composite object
Regularization or beautification

Composite Object Recognition: Approaches

Classifier-based approaches (e.g., decision tree):
• Fonseca and Jorge (2000): fuzzy
• Peng, Sun, Liu, & Cong (GREC2003)

Similarity-based approaches, based on the similarities of components and constraints:
• representation: ARG (Li 2000), RAG (Lladós 2001)
• relational distance metric (Shapiro 1993)
• directional shape similarity (Liu et al. 2001)
• Ernesto Tapia and Raul Rojas (GREC2003)

Directional Composite Similarity

Principles for composite similarity metrics:
• partial (for partial match)
• structural/topological
• stroke-number free
• stroke-order free

Used in our demo system (SmartSketchpad)

Scenario for Composite Input

for partial input match:

Document Recognition and Understanding: Problem

Analyze the connections and relationships among the elements

Obtain and represent the semantics of the current (partial/whole) drawing as one document

Beautify and re-display it in a neat layout

Comparison to offline document recognition: similar to engineering drawing recognition, but sketches are more cursive, whereas engineering drawings are more regular

Document Recognition and Understanding: Applications

Mainly for quick design

2D diagrams:
• GUI: Landay & Myers (2001), Caetano et al. (2002)
• UML diagrams: Blostein et al. (2002) …

3D object input: Igarashi et al. (SIGGRAPH 1999): Teddy; Lipson and Shpitalni (2002)
Hsu and Lee (1994): 2.5D animations

Sketch Input for a Dog in Animation

Fabian Di Fiore & Frank Van Reeth (2002): A Multi-Level Sketching Tool for “Pencil&Paper” Animation

Document Recognition and Understanding Approaches

Gross (1994, 1996): Sketch-A-Sketch
• detects & maintains spatial relationships (constraints)
• represented as binary predicates, e.g., “concentric”, “contains”, “connects”, “overlap”
• computed from the bounding box, size, & start/end points
• by-product: learns composite objects from examples

Pinto-Albuquerque et al. (2000): DocSketch
• syntax as a fuzzy relational adjacency grammar
• visual syntax analyzer
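The binary predicates can be derived from bounding boxes and stroke endpoints, roughly in the spirit of the Gross approach. The geometric definitions, tolerances, and names below are our own assumptions, not the original system’s rules.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of a recognized shape."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def contains(a, b):
    return a.x0 <= b.x0 and a.y0 <= b.y0 and b.x1 <= a.x1 and b.y1 <= a.y1


def overlap(a, b):
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def concentric(a, b, tol=2.0):
    return math.dist(a.center, b.center) <= tol


def connects(ends_a, ends_b, tol=3.0):
    """True if a start/end point of one stroke is near one of the other's."""
    return any(math.dist(p, q) <= tol for p in ends_a for q in ends_b)


def spatial_relations(boxes, tol=2.0):
    """List (predicate, i, j) facts; 'overlap' excludes containment."""
    facts = []
    for i, a in enumerate(boxes):
        for j, b in enumerate(boxes):
            if i == j:
                continue
            if contains(a, b):
                facts.append(("contains", i, j))
            elif i < j and not contains(b, a) and overlap(a, b):
                facts.append(("overlap", i, j))
            if i < j and concentric(a, b, tol):
                facts.append(("concentric", i, j))
    return facts


boxes = [Box(0, 0, 10, 10), Box(3, 3, 7, 7), Box(8, 8, 12, 12)]
print(spatial_relations(boxes))
```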

Prototype Systems

• ASSIST: Alvarado and Davis (IJCAI 2001), MIT
• SketchIT: Stahovich (1996) …, MIT, CMU
• SILK: Landay & Myers (2001), UC Berkeley & CMU
• Teddy: Igarashi et al. (SIGGRAPH 1999), U-Tokyo
• Tivoli: Pedersen et al. (CHI 1993), Xerox PARC
• EsQUIsE: Pierre Leclercq (GREC2003)
• SmartSketchpad: Liu et al. (2001, 2002)
• …

ASSIST

A Shrewd Sketch Interpretation & Simulation Tool
• MIT AI Lab
• Christine Alvarado’s Master’s Thesis (2000); Alvarado and Davis (IJCAI-2001)
• Each new stroke triggers a three-stage process:

1. Recognition: generate all possible interpretations, via primitive stroke recognition & composite object (device) recognition
2. Reasoning: score each interpretation
3. Resolution: select the current best consistent interpretation

Gesture recognition: arrows and pointing

SketchIT

Conceptual design for CAD (mechanical engineering), instead of precise design

Thomas F. Stahovich (1996), MIT PhD Thesis; Stahovich, Davis & Shrobe (AAAI-1997)

QC-space for representing interaction among mechanical parts

Calhoun, Stahovich, et al. (2002): semantic-network-based recognizer for multi-stroke symbols

Stahovich, Davis & Shrobe (AI-1998): generate multiple new designs from a sketch

Experiments on Composite Object Recognition

Database of composite objects: in this experiment, we created 97 composite graphic objects, each composed of fewer than ten primitive shapes. The weights and thresholds used were ε = 20, w1 = 0.4, w2 = 0.3, w3 = 0.3, and k1 = k2 = 0.5.

We randomly selected 10 objects (IDs 73, 65, 54, 88, 22, 5, 12, 81, 18, and 76) and drew them as queries. In most cases, the intended object appears in the smart toolbox (ranked among the top 10) after only a few components are drawn.

[Figure: query results as components are drawn, for Object 73, Object 54, Object 76, and Object 5]

User Study and Performance Evaluation

Sketches for Evaluating Different UIs

[Figure: (a) sketch 1; (b) sketch 2]

          Sketch 1 drawing time (s)      Sketch 2 drawing time (s)
User ID   Sketch-based   Traditional     Sketch-based   Traditional
1         104            125             157            190
2         93             99              151            288
3         69             98              156            294
4         59             81              122            156
5         64             178             135            191
6         63             100             85             231
7         61             120             92             203
8         72             91              119            195
9         70             70              156            252
10        78             110             125            201
Average   73.3           107.2           129.8          220.1

Drawing Time for Different Sketches Using Different UIs
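As a quick check of the table, the averages and the relative time savings can be recomputed from the per-user drawing times. The data are copied from the drawing-time table; the savings percentages and per-user counts are simple derived figures, not results reported on the slides.

```python
# Drawing times in seconds for users 1-10, copied from the drawing-time table.
times = {
    ("sketch 1", "sketch-based"): [104, 93, 69, 59, 64, 63, 61, 72, 70, 78],
    ("sketch 1", "traditional"): [125, 99, 98, 81, 178, 100, 120, 91, 70, 110],
    ("sketch 2", "sketch-based"): [157, 151, 156, 122, 135, 85, 92, 119, 156, 125],
    ("sketch 2", "traditional"): [190, 288, 294, 156, 191, 231, 203, 195, 252, 201],
}
averages = {key: sum(v) / len(v) for key, v in times.items()}

for sketch in ("sketch 1", "sketch 2"):
    new = averages[(sketch, "sketch-based")]
    old = averages[(sketch, "traditional")]
    saving = 100 * (old - new) / old  # percent of traditional drawing time saved
    faster = sum(n < o for n, o in zip(times[(sketch, "sketch-based")],
                                       times[(sketch, "traditional")]))
    print(f"{sketch}: {new:.1f} s vs {old:.1f} s, "
          f"{saving:.1f}% less time, faster for {faster}/10 users")
```

This prints "sketch 1: 73.3 s vs 107.2 s, 31.6% less time, faster for 9/10 users" and "sketch 2: 129.8 s vs 220.1 s, 41.0% less time, faster for 10/10 users". User 9 took the same time with both UIs for sketch 1.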

Open Problems

Complex editing gesture recognition and editing-related applications
• up to several hundred different gestures

Composite object recognition for large object sets (for graphics input)
• up to 10,000 master objects in MS Visio
• user and domain adaptation

Semantic-level understanding for creative/conceptual design
• reasoning about & prediction of the user’s intentions

Where to Find Papers?

2002 AAAI Spring Symposium Series: Sketch Understanding
• Chairs: Tom Stahovich, James Landay, Randy Davis
• Online Proc.: http://automatix.inesc.pt/sketch02/

ACM Annual Conference on Human Factors in Computing Systems (SIGCHI)
ACM Annual Symposium on User Interface Software and Technology (UIST)
GREC and ICDAR
SIGGRAPH

Summary

Brief survey of online graphics recognition: problems, approaches, and applications

Supportive of pen-based UIs:
• improves user productivity
• convenient for creative tasks such as quick design ideas
• users unanimously prefer the sketch-based UI

Thank You! Contact: Liu Wenyin

See some of my research work at http://www.cs.cityu.edu.hk/~liuwy/
The survey paper can be found at

http://www.cs.cityu.edu.hk/~liuwy/publications/GREC2003_LNCS.pdf
