Combining Shape and Physical Models for Online Cursive Handwriting Synthesis
Jue Wang (University of Washington)Chenyu Wu (Carnegie Mellon University)Ying-Qing Xu (Microsoft Research Asia)Heung-Yeung Shum (Microsoft Research Asia)
International Journal on Document Analysis and Recognition (IJDAR) 2004
Introduction
Handwriting computing techniques (pen-based devices):
Handwriting recognition makes it possible for computers to understand the information involved in handwriting
Handwriting modulation enables handwriting editing, error correction, and script searching
Introduction
Handwriting Modeling & Synthesis
Movement-simulation techniques are based on motor models and try to model the process of handwriting production; they focus on the representation and analysis of real handwriting signals rather than on handwriting synthesis
Introduction
Shape-simulation methods consider the static shape of the handwriting trajectory; they are more practical than movement-simulation techniques when dynamic information is not available
A straightforward approach is to synthesize from collected handwritten glyphs; this work takes a learning-based cursive handwriting synthesis approach
Introduction
A successful handwriting synthesis algorithm must produce letter shapes faithful to the training samples as well as natural connections between synthesized letters
A novel cursive handwriting synthesis technique is proposed that combines the advantages of the shape-simulation and the movement-simulation methods
Outline
Sample collection and segmentation
Learning strategies
Synthesis strategies
Experimental results
Discussion and conclusion
Sample Collection
About 200 words were collected, and each letter appears more than 5 times
These handwriting samples first pass through a low-pass filter and are then re-sampled to produce equidistant points
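The preprocessing step above can be sketched as follows. The moving-average filter and the output size are illustrative assumptions; the slides do not specify which low-pass filter is used.

```python
import numpy as np

def smooth_and_resample(points, n_out=32, window=3):
    """Low-pass filter a pen trajectory (moving average, an assumed choice)
    and re-sample it so consecutive points are equidistant along the curve."""
    pts = np.asarray(points, dtype=float)
    pad = window // 2
    kernel = np.ones(window) / window
    # Moving-average filter per coordinate, with edge padding.
    sm = np.column_stack([
        np.convolve(np.pad(pts[:, d], pad, mode="edge"), kernel, mode="valid")
        for d in range(pts.shape[1])])
    # Cumulative arc length of the smoothed polyline.
    seg = np.linalg.norm(np.diff(sm, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # Interpolate at equally spaced arc-length positions.
    targets = np.linspace(0.0, s[-1], n_out)
    return np.column_stack([np.interp(targets, s, sm[:, d])
                            for d in range(sm.shape[1])])

# A toy stroke: noisy diagonal line.
stroke = [(t, t + 0.1 * ((-1) ** t)) for t in range(20)]
out = smooth_and_resample(stroke, n_out=10)
```

The output points are approximately equidistant along the smoothed curve, which is what the later piecewise representation assumes.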
Sample Segmentation
Overview:
Segmentation-based recognition method
Recognition-based segmentation (relies heavily on the performance of the recognition engine)
Level-building: simultaneously outputs the recognition and segmentation results; segmentation and recognition are merged to give an optimal result
A Two-level Framework
Framework of traditional handwriting segmentation approaches: the temporal handwriting sequence is

$$S = \{z_1, \ldots, z_T\}$$

where $z_t$ is a low-level feature that denotes the coordinates and velocity of the sequence at time $t$.
Segmentation
The segmentation problem is to find the identity string $\{I_1, \ldots, I_n\}$, with the corresponding segments of the sequence $\{S_1, \ldots, S_n\}$, $S_1 = \{z_1, \ldots, z_{t_1}\}, \ldots, S_n = \{z_{t_{n-1}}, \ldots, z_T\}$, that best explain the sequence:

$$\{I_1^*, \ldots, I_n^*\} = \arg\max p(S_1, \ldots, S_n \mid I_1, \ldots, I_n)\, p(I_1, \ldots, I_n)$$
$$= \arg\max p(I_1)\, p(S_1 \mid I_1) \prod_{i=2}^{n} p(S_i \mid I_i)\, p(I_i \mid I_{i-1})$$
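The factorized objective — $p(I_1)\,p(S_1|I_1)$ times per-position terms $p(I_i|I_{i-1})\,p(S_i|I_i)$ — can be scored incrementally, which is what makes level-building tractable. A toy sketch with hypothetical probability tables (the identities, segments, and all numbers are invented for illustration):

```python
import math

# Hypothetical toy tables (illustration only): two letter identities.
p_I1   = {"a": 0.6, "b": 0.4}                      # p(I_1)
p_next = {("a", "a"): 0.3, ("a", "b"): 0.7,
          ("b", "a"): 0.6, ("b", "b"): 0.4}        # p(I_i | I_{i-1})
p_seg  = {("S1", "a"): 0.8, ("S1", "b"): 0.2,
          ("S2", "a"): 0.3, ("S2", "b"): 0.7}      # p(S_i | I_i)

def log_score(segments, identities):
    """log of p(I_1) p(S_1|I_1) * prod_{i>=2} p(I_i|I_{i-1}) p(S_i|I_i)."""
    lp = math.log(p_I1[identities[0]]) + math.log(p_seg[(segments[0], identities[0])])
    for i in range(1, len(segments)):
        lp += math.log(p_next[(identities[i - 1], identities[i])])
        lp += math.log(p_seg[(segments[i], identities[i])])
    return lp

candidates = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
best = max(candidates, key=lambda ids: log_score(("S1", "S2"), ids))
```

Because each factor depends only on the current and previous position, the real system can extend partial hypotheses one segment at a time instead of enumerating every identity string, as this exhaustive toy version does.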
Segmentation
For the training of a writer-independent segmentation system, a low-level feature-based segmentation algorithm works well only for a small number of writers.
A script code is therefore calculated from the handwriting data as a middle-level feature.
Middle Level Feature
Five kinds of key points are extracted:
points of maximum/minimum x-coordinate (X+, X-)
points of maximum/minimum y-coordinate (Y+, Y-)
crossing points
The average direction of the interval sequence between two adjacent key points is then recorded, forming the script code.
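Extracting the extremum-type key points can be sketched as below. Crossing-point detection is omitted, and the strict-inequality test for local extrema is an assumed implementation detail.

```python
import numpy as np

def key_points(x, y):
    """Indices of local maxima/minima of the x and y coordinates along a
    stroke (four of the five key-point types; crossing points omitted)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    kp = {"X+": [], "X-": [], "Y+": [], "Y-": []}
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]: kp["X+"].append(i)
        if x[i] < x[i - 1] and x[i] < x[i + 1]: kp["X-"].append(i)
        if y[i] > y[i - 1] and y[i] > y[i + 1]: kp["Y+"].append(i)
        if y[i] < y[i - 1] and y[i] < y[i + 1]: kp["Y-"].append(i)
    return kp

# One full loop: x = cos(t), y = sin(t) over [0, 2*pi].
t = np.linspace(0, 2 * np.pi, 61)
kp = key_points(np.cos(t), np.sin(t))
```

For the circular loop, the detector finds the leftmost point (X-), the top (Y+), and the bottom (Y-); the rightmost point falls on the stroke endpoints and is not an interior extremum.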
Script Codes Examples
Middle Level Feature
Samples of each character are divided into several clusters; those in the same cluster have a similar structural topology.
Since the length of the script code might not be the same in all cases, the similarity cannot be computed directly.
The script code is therefore modeled as a homogeneous Markov chain.
Middle Level Feature
Given two script codes $T_1, T_2$, we may compute their stationary distributions $\pi_1, \pi_2$ and transition matrices $A_1, A_2$.

The similarity between two script codes is measured as

$$d(T_1, T_2) = \exp\!\Big\{ -\tfrac{\lambda_1}{2}\big[\mathrm{KL}(\pi_1, \pi_2) + \mathrm{KL}(\pi_2, \pi_1)\big] - \tfrac{\lambda_2}{2}\big[\mathrm{KL}(A_1, A_2) + \mathrm{KL}(A_2, A_1)\big] - \lambda_3 (n_1 - n_2)^2 \Big\}$$

where $n_1, n_2$ are the code lengths and the KL divergence between two distributions is

$$\mathrm{KL}(\pi_1, \pi_2) = \sum_{l=1}^{n} \pi_1(l) \log \frac{\pi_1(l)}{\pi_2(l)}$$
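A rough sketch of this similarity computation, assuming the stationary distribution and transition matrix are estimated empirically from a symbol sequence. The add-one smoothing, the weight values, and the squared length difference are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def chain_stats(seq, n_states):
    """Empirical transition matrix and stationary distribution of a symbol
    sequence (states coded 0..n_states-1), with add-one smoothing."""
    A = np.ones((n_states, n_states))            # add-one smoothing
    for a, b in zip(seq, seq[1:]):
        A[a, b] += 1
    A /= A.sum(axis=1, keepdims=True)
    # Stationary distribution: left eigenvector of A for eigenvalue 1.
    w, v = np.linalg.eig(A.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = np.abs(pi) / np.abs(pi).sum()
    return pi, A

def sym_kl(p, q):
    """Symmetrized KL divergence (both arguments strictly positive here)."""
    p, q = np.asarray(p), np.asarray(q)
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def similarity(seq1, seq2, n_states, lam=(1.0, 1.0, 0.01)):
    """Sketch of d(T1, T2): closer to 1 the better matched the stationary
    distributions, transition matrices, and code lengths."""
    pi1, A1 = chain_stats(seq1, n_states)
    pi2, A2 = chain_stats(seq2, n_states)
    kl_A = np.mean([sym_kl(A1[i], A2[i]) for i in range(n_states)])
    return np.exp(-lam[0] * sym_kl(pi1, pi2)
                  - lam[1] * kl_A
                  - lam[2] * (len(seq1) - len(seq2)) ** 2)

s = [0, 1, 2, 0, 1, 2, 0, 1, 2]
d_same = similarity(s, s, 3)
d_diff = similarity(s, [2, 2, 2, 2, 1, 0, 0, 0, 0], 3)
```

Identical codes give a similarity of exactly 1, and structurally different codes score strictly lower, matching the behavior described on the next slide.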
Middle Level Feature
The positions of $\pi_1, \pi_2$ and $A_1, A_2$ are enforced symmetrically, and the weights $\lambda_1, \lambda_2, \lambda_3$ balance the variance of the KL divergence against the difference in code length.
If both the stationary distributions and the transition matrices of two script codes match well, and their code lengths are almost the same, then $d(T_1, T_2)$ is close to 1.
Segmentation
After introducing the script code as a middle-level feature, the optimization problem becomes

$$\{I_1^*, \ldots, I_n^*\} = \arg\max p(S_1, \ldots, S_n \mid I_1, \ldots, I_n)\, p(I_1, \ldots, I_n) = \arg\max \prod_i p(S_i, T_i \mid I_1, \ldots, I_n)\, p(I_1, \ldots, I_n)$$
$$= \arg\max p(I_1)\, p(T_1 \mid I_1)\, p(S_1 \mid T_1, I_1) \prod_{i=2}^{n} p(I_i \mid I_{i-1})\, p(T_i \mid T_{i-1}, I_i)\, p(S_i \mid T_i, I_i)$$

This improves the accuracy of segmentation and dramatically reduces the computational complexity of level-building.
Graph Model
Result
Outline
Sample collection and segmentation Learning strategies Synthesis Strategies Experimental results Discussion and Conclusion
Learning Strategies
Data alignment: trajectory matching, training set alignment
Shape models
Trajectory Matching
Based on "Segmentation and reconstruction of on-line handwritten scripts" (Pattern Recognition, 1998):
Each piece is a simple arc, so points can be sampled equidistantly from it to represent the stroke.
Trajectory Matching
Landmark-point-extraction method:
pen-down and pen-up points
local extrema of curvature
inflection points of curvature
A handwriting sample can be divided into at most six pieces. Samples of the same character are mostly composed of the same number of pieces, so the pieces match each other naturally.
Trajectory Matching
A handwriting sample can be represented by a point vector

$$X = \big(x_1^1, \ldots, x_{n_1}^1, \ldots, x_1^s, \ldots, x_{n_s}^s,\; y_1^1, \ldots, y_{n_1}^1, \ldots, y_1^s, \ldots, y_{n_s}^s\big)$$

where $s$ is the number of static pieces segmented from the sample and $n_i$ is the number of points extracted from the $i$-th piece.
Trajectory Matching
The next step is to align the different vectors into a common coordinate frame: an affine transform is estimated for each sample that maps the sample into that frame.
The affine transformations include translation, rotation, and scaling.
Training Set Alignment
Iterative algorithm (Learning from One Example through Shared Densities on Transforms, IEEE CVPR 2000)

A deformable-energy-based criterion is defined as

$$E = -\frac{1}{N_s} \sum_{i=1}^{N_s} \log \exp\!\left( -\frac{\| X_i - \bar{X} \|^2}{2 V_x} \right)$$

where

$$\bar{X} = \frac{1}{N_s} \sum_{i=1}^{N_s} X_i, \qquad V_x = \frac{1}{N_s} \sum_{i=1}^{N_s} \| X_i - \bar{X} \|^2$$
Training Set Alignment - Algorithm
Maintain an affine transform matrix $U_i$ for each sample, initialized to the identity.
Compute the deformable-energy-based criterion $E$. Repeat until convergence:
For each of the six unit affine matrices [14], $A_j$, $j = 1, \ldots, 6$:
Let $U_i^{new} = A_j U_i$. Apply $U_i^{new}$ to the sample and recalculate the criterion $E$.
If $E$ has been reduced, accept $U_i^{new}$; otherwise let $U_i^{new} = A_j^{-1} U_i$ and apply again.
If $E$ has been reduced, accept $U_i^{new}$; otherwise revert to $U_i$.
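A minimal sketch of this alignment loop, with two simplifications that are assumptions of the sketch: the criterion is the total squared deviation from the mean (a stand-in for the Gaussian deformable energy), and the six unit affine matrices are approximated by small shift, scale, rotation, and shear steps.

```python
import numpy as np

def spread(X):
    """Alignment criterion: total squared deviation from the mean vector."""
    return np.sum((X - X.mean(axis=0)) ** 2)

def apply_affine(U, pts):
    """Apply a 3x3 homogeneous affine matrix U to an (n, 2) point array."""
    return pts @ U[:2, :2].T + U[:2, 2]

def congeal(samples, n_iter=20, d=0.05):
    """Greedy alignment: per sample, try each unit step A (or its inverse)
    and keep the update only if it lowers the criterion."""
    steps = [np.array([[1, 0, d], [0, 1, 0], [0, 0, 1.]]),   # x-shift
             np.array([[1, 0, 0], [0, 1, d], [0, 0, 1.]]),   # y-shift
             np.array([[1 + d, 0, 0], [0, 1, 0], [0, 0, 1.]]),  # x-scale
             np.array([[1, 0, 0], [0, 1 + d, 0], [0, 0, 1.]]),  # y-scale
             np.array([[np.cos(d), -np.sin(d), 0],
                       [np.sin(d), np.cos(d), 0], [0, 0, 1.]]),  # rotation
             np.array([[1, d, 0], [0, 1, 0], [0, 0, 1.]])]       # shear
    U = [np.eye(3) for _ in samples]
    flat = lambda: np.stack([apply_affine(U[i], samples[i]).ravel()
                             for i in range(len(samples))])
    E = spread(flat())
    for _ in range(n_iter):
        for i in range(len(samples)):
            for A in steps:
                for cand in (A @ U[i], np.linalg.inv(A) @ U[i]):
                    old, U[i] = U[i], cand
                    E_new = spread(flat())
                    if E_new < E:
                        E = E_new
                        break           # accept the candidate
                    U[i] = old          # revert
    return U, E

base = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
shifted = base + np.array([0.3, 0.0])
U, E = congeal([base, shifted])
E0 = spread(np.stack([base.ravel(), shifted.ravel()]))
```

The criterion decreases monotonically, so the two displaced squares end up closer together in the common frame than they started.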
Shape Models
By modeling the distribution of the aligned vectors, new examples can be generated that are similar to those in the training set.
As in the Active Shape Model, principal component analysis (PCA) is applied to the data (Statistical Models of Appearance for Computer Vision, draft report, 2000).
Shape Model
Formally, the covariance of the data is calculated as

$$S = \frac{1}{s} \sum_i (X_i - \bar{X})(X_i - \bar{X})^T$$

Then the eigenvectors $\phi_i$ and corresponding eigenvalues $\lambda_i$ of $S$ are computed and sorted so that $\lambda_i \geq \lambda_{i+1}$.

The training set is approximated by

$$X \approx \bar{X} + \Phi b, \qquad \Phi = (\phi_1 \mid \phi_2 \mid \cdots \mid \phi_t)$$

where $\Phi$ holds the $t$ eigenvectors corresponding to the largest eigenvalues and $b$ is a $t$-dimensional vector given by

$$b = \Phi^T (X - \bar{X})$$

By varying the elements of $b$, new handwriting trajectories can be generated from this model; limits of $\pm 3\sqrt{\lambda_i}$ are applied to the elements $b_i$.
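The shape model can be sketched with standard PCA. The toy training data and the uniform sampling of b are illustrative choices, not details from the paper.

```python
import numpy as np

def fit_shape_model(X, t=2):
    """PCA shape model: mean plus the top-t eigenvectors of the covariance."""
    X = np.asarray(X, float)
    mean = X.mean(axis=0)
    S = np.cov(X.T, bias=True)        # covariance of the aligned vectors
    w, V = np.linalg.eigh(S)          # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:t]   # keep the t largest
    return mean, V[:, order], w[order]

def sample_shape(mean, Phi, lam, rng):
    """Generate X = mean + Phi b, each b_i limited to +/- 3 sqrt(lambda_i)."""
    b = rng.uniform(-3, 3, size=len(lam)) * np.sqrt(lam)
    return mean + Phi @ b

rng = np.random.default_rng(0)
# Toy training set: noisy variations of a V-shaped stroke (flattened x, y).
base = np.array([0., 1., 2., 2., 0., 2.])
X = base + 0.1 * rng.standard_normal((50, 6))
mean, Phi, lam = fit_shape_model(X, t=2)
new = sample_shape(mean, Phi, lam, rng)
b = Phi.T @ (new - mean)   # recover b from a generated shape
```

Because the eigenvector columns are orthonormal, projecting a generated shape back into the model recovers exactly the b that produced it, which is what the later conditional-sampling step relies on.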
Outline
Sample collection and segmentation Learning strategies Synthesis Strategies Experimental results Discussion and Conclusion
Synthesis Strategies
Each individual letter in the word is generated first. Then the baselines of these letters are aligned and the letters are juxtaposed in a sequence.
Finally, each letter must be concatenated with its neighbors to form cursive handwriting, which cannot be achieved easily.
To solve this problem, a conditional sampling algorithm based on the delta log-normal model is proposed.
Individual Letter Synthesis
Delta Log-normal Model
The delta log-normal model is a powerful tool for analyzing rapid human movements. With respect to handwriting generation, the movement of a simple stroke is controlled by velocity. The magnitude of the velocity is described as (Why Handwriting Segmentation Can Be Misleading, 13th International Conference on Pattern Recognition, 1996)

$$v(t) = D_1 \Lambda(t; t_0, \mu_1, \sigma_1^2) - D_2 \Lambda(t; t_0, \mu_2, \sigma_2^2)$$

where $\Lambda$ is a log-normal function (a normal density on a logarithmic scale axis):

$$\Lambda(t; t_0, \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}\,(t - t_0)} \exp\!\left( -\frac{[\ln(t - t_0) - \mu]^2}{2\sigma^2} \right)$$

Here $t_0$ is the activation time, $D_i$ are the amplitudes of the impulse commands, $\mu_i$ the mean time delays, and $\sigma_i$ the response times of the agonist and antagonist systems.
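The velocity profile can be evaluated directly from the formula. The parameter values below are illustrative, not taken from the paper.

```python
import math

def lognormal(t, t0, mu, sigma):
    """Log-normal impulse response Lambda(t; t0, mu, sigma^2)."""
    if t <= t0:
        return 0.0
    x = t - t0
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) \
           / (sigma * math.sqrt(2 * math.pi) * x)

def delta_lognormal_speed(t, t0=0.0, D1=1.0, D2=0.4,
                          mu1=-1.6, mu2=-1.4, s1=0.3, s2=0.3):
    """v(t) = D1*Lambda1 - D2*Lambda2 (agonist minus antagonist).
    Parameter values are illustrative only."""
    return D1 * lognormal(t, t0, mu1, s1) - D2 * lognormal(t, t0, mu2, s2)

ts = [i * 0.01 for i in range(1, 100)]
vs = [delta_lognormal_speed(t) for t in ts]
```

Since each log-normal term integrates to 1, the total distance traveled approaches D1 - D2, and the speed profile has the characteristic single-peaked, bell-like shape of a rapid stroke.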
Delta Log-normal Model
The direction along the stroke can be expressed as

$$\theta(t) = \theta_0 + c_0 \int_0^t v(u)\, du$$

where $\theta_0$ is the initial direction and $c_0$ a constant. The angular velocity is calculated as the derivative of $\theta(t)$:

$$\dot{\theta}(t) = c_0\, v(t)$$

Given $v(t)$, the curvature along a stroke piece is calculated as

$$c = \lim_{\Delta t \to 0} \frac{\Delta \theta}{\Delta s} = \lim_{\Delta t \to 0} \frac{\dot{\theta}(t)\, \Delta t}{v(t)\, \Delta t} = c_0$$

The static shape of the piece is therefore an arc, characterized by its arc length, curvature $c_0$, and initial direction $\theta_0$; the arc length is determined by the amplitudes $D_1, D_2$, since $\int v = D_1 - D_2$.
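Since $\theta(t) = \theta_0 + c_0 \int v$, a stroke can be traced by integrating $dx = v\cos\theta\, dt$, $dy = v\sin\theta\, dt$; with constant speed the result is a circular arc of curvature $c_0$. The sketch below checks this numerically (explicit Euler integration is an implementation choice of the sketch).

```python
import math

def stroke_points(v, theta0, c0, dt=0.0005, T=1.0):
    """Integrate dx = v cos(theta) dt, dy = v sin(theta) dt with
    theta = theta0 + c0 * (arc length so far): a constant-curvature arc."""
    x = y = 0.0
    s = 0.0                      # arc length = integral of v
    pts = [(x, y)]
    t = dt
    while t <= T:
        speed = v(t)
        theta = theta0 + c0 * s
        x += speed * math.cos(theta) * dt
        y += speed * math.sin(theta) * dt
        s += speed * dt
        pts.append((x, y))
        t += dt
    return pts, s

# Constant unit speed for clarity: the trace should be an arc of
# curvature c0 = 2 (radius 0.5) with total turning angle c0 * length.
pts, length = stroke_points(lambda t: 1.0, theta0=0.0, c0=2.0)
```

The endpoint matches the closed-form arc position (0.5 sin 2, 0.5 (1 - cos 2)) up to the integration error.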
Delta Log-normal Model: Example [Why Handwriting Segmentation Can Be Misleading, 1996 IEEE ICPR]
Conditional Sampling
First, the trajectories of the synthesized handwriting letters are decomposed into static pieces.
The first piece of a trajectory is called the head piece, and the last piece is called the tail piece.
In the concatenation process, the trajectories of the letters are deformed to produce natural cursive handwriting by changing the descriptors of the head and the tail pieces, $S_h = (D_h, c_h, \theta_h)$ and $S_t = (D_t, c_t, \theta_t)$ (arc length, curvature, initial direction), to deformed values $S_h^*$ and $S_t^*$.
Conditional Sampling
A deformation energy of a stroke is defined over the relative changes of the head and tail parameters:

$$E_d(i) = \left(\frac{D_h^* - D_h}{D_h}\right)^2 + \left(\frac{c_h^* - c_h}{c_h}\right)^2 + \left(\frac{\theta_h^* - \theta_h}{\theta_h}\right)^2 + \left(\frac{D_t^* - D_t}{D_t}\right)^2 + \left(\frac{c_t^* - c_t}{c_t}\right)^2 + \left(\frac{\theta_t^* - \theta_t}{\theta_t}\right)^2$$

A concatenation energy between the $i$-th letter and the $(i+1)$-th letter is defined as

$$E(i, i+1) = \alpha_1 \big[E_d(i) + E_d(i+1)\big] + \alpha_2 \big[\theta_t^*(i) + c_t^*(i)\, D_t^*(i) - \theta_h^*(i+1)\big]^2 + \alpha_3 \big\| p_t(i) - p_h(i+1) \big\|^2$$

where $p_t(i)$ and $p_h(i+1)$ are the positions of the tail of the $i$-th letter and the head of the $(i+1)$-th letter. By minimizing the second and the third terms, the two letters are forced to connect with each other smoothly and naturally.
Conditional Sampling
The concatenation energy of a whole word is calculated as

$$E_c = \sum_{i=1}^{N_l - 1} E(i, i+1)$$

where $N_l$ is the number of letters in the word. We must also ensure that the deformed letters remain consistent with the shape models. The sampling energy of the $i$-th letter is calculated as

$$E_s(i) = \sum_{v=1}^{t} f\!\left( \frac{b_v^*(i)}{3\sqrt{\lambda_v}} \right), \qquad f(x) = \begin{cases} 0, & |x| \le 1 \\ x^2, & |x| > 1 \end{cases}$$

The whole energy formulation is finally given as

$$E = \omega_c E_c + \omega_s \sum_{i=1}^{N_l} E_s(i)$$

where $\omega_c$ and $\omega_s$ weight the two terms.
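The clipped penalty f and the sampling energy are straightforward to compute; the quadratic form of the penalty outside the ±3-sigma range should be read as an assumption of this sketch.

```python
def f(x):
    """Penalty that is zero inside the +/-1 box (i.e. b within 3 sigma)
    and quadratic outside (the outside form is an assumption)."""
    return 0.0 if abs(x) <= 1.0 else x * x

def sampling_energy(b, lam):
    """E_s = sum_v f(b_v / (3 * sqrt(lambda_v)))."""
    return sum(f(bv / (3.0 * lv ** 0.5)) for bv, lv in zip(b, lam))

inside = sampling_energy([1.0, -2.0], [1.0, 1.0])   # both within 3 sigma
outside = sampling_energy([6.0, 0.0], [1.0, 1.0])   # first coeff at 6 sigma
```

Deformations that keep every model coefficient within its ±3-sigma limit incur no sampling penalty, so the optimizer is free to deform letters as long as they stay plausible under the shape model.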
Synthesis-Iterative Approach
1. Randomly generate a vector b(i) for each letter.
2. Generate the trajectories S_i of the letters and calculate an affine transform T_i for each letter that moves it to its desired position.
3. For each pair of adjacent letters {S_i, S_{i+1}}, deform the pieces in these letters to minimize the concatenation energy E(i, i+1).
4. Project the deformed shapes into the model coordinate frame.
5. Update the model parameters.
6. If not converged, return to step 2.
Experimental Results
Discussion & Conclusion
Performance is limited by the samples used for training, since the shape models can only generate novel shapes within the variation of the training samples.
Although some experimental results are shown, it is still not known how to make an objective evaluation of the synthesized scripts and how to compare different synthesis approaches.
Markov chains
A Markov chain on a space X with transitions T is a random process (an infinite sequence of random variables) (x^(0), x^(1), …, x^(t), …) that satisfies

$$p\big(x^{(t)} \mid x^{(1)}, \ldots, x^{(t-1)}\big) = T\big(x^{(t)} \mid x^{(t-1)}\big)$$

That is, the probability of being in a particular state at time t given the state history depends only on the state at time t-1.
If the transition probabilities are fixed for all t, the chain is considered homogeneous.

Example: a three-state chain on {x1, x2, x3} with transition matrix

$$T = \begin{pmatrix} 0.7 & 0.3 & 0 \\ 0.3 & 0.4 & 0.3 \\ 0 & 0.3 & 0.7 \end{pmatrix}$$

(The slide also shows the corresponding state-transition diagram.)
Stationary distribution
Consider the Markov chain given above. Its stationary distribution is $\pi = (0.33,\ 0.33,\ 0.33)$, since

$$\begin{pmatrix} 0.33 & 0.33 & 0.33 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 & 0 \\ 0.3 & 0.4 & 0.3 \\ 0 & 0.3 & 0.7 \end{pmatrix} = \begin{pmatrix} 0.33 & 0.33 & 0.33 \end{pmatrix}$$
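The stationary-distribution claim can be verified numerically. Because this chain is irreducible and aperiodic, iterating it from any starting distribution also converges to the same π.

```python
import numpy as np

# Transition matrix of the three-state chain from the slide.
T = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])

pi = np.array([1 / 3, 1 / 3, 1 / 3])
left = pi @ T                  # stationary: pi T = pi

# Iterating from a point mass on the first state converges to pi.
p = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    p = p @ T
```

Left-multiplication by the row vector matches the πT = π convention used on the slide.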