Trajectory clustering - Traclus Algorithm
Trajectory Clustering
BASED ON: TRAJECTORY CLUSTERING: A PARTITION-AND-GROUP FRAMEWORK
EDITED BY: IVAN SANCHEZ
BY: JAE-GIL LEE
JIAWEI HAN
KYU-YOUNG WHANG
EDUCATIONAL SLIDES ON TRACLUS, AN ALGORITHM FOR CLUSTERING TRAJECTORY DATA CREATED BY JAE-GIL LEE, JIAWEI HAN AND KYU-YOUNG WHANG, PUBLISHED AT SIGMOD'07.
http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf
Objective
To group similar trajectories together (clustering).
A trajectory is a sequence of multidimensional points Tr = p1, p2, p3 … pn.
A point is a d-dimensional entity.
Most approaches consider only complete trajectories, and so miss valuable information about common subtrajectories.
Input: A set of trajectories S = (Tr1, Tr2, Tr3 … Tri … TrnumTraj).
Output: A set of clusters C = (C1, C2 … CnumClusters), where each cluster contains ε or more trajectories.
◦ ε is a threshold that determines the minimum number of trajectories needed to form a cluster.
◦ Each cluster is composed of a set of trajectories, e.g. C1 = (Tr3, Tr9 … Trc1max).
Approaches
DBSCAN
◦ Uses density-based clustering.
◦ Works only on entire trajectories.
Partition and Group
◦ Also uses density-based clustering, which helps to discover clusters of arbitrary shape and to filter out noise (outliers).
◦ Can discover common subtrajectories.
Partition and Group Framework
Two phases: partitioning and grouping.
Additionally calculates a representative trajectory per cluster.
Discovers common subtrajectories.
TRACLUS Algorithm
◦ Partition trajectories into segments. O(n), where n is the number of points in a trajectory.
◦ Group similar segments together (clustering). O(n log n), where n is the number of segments.
◦ Calculate a representative trajectory per cluster. O(n), where n is the number of trajectories.
A trajectory can belong to multiple clusters.
Overview
Partition Phase
Partitions a trajectory into a set of segments.
A trajectory partition is a line segment pipj, where i < j and both points belong to the same trajectory.
All segments from all trajectories are inserted into a common set D.
The grouping phase then groups similar line segments from D together, which allows common subtrajectories to be found.
Time complexity: O(n), where n is the number of points in a trajectory.
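The construction of the common set D can be sketched as follows. This is a minimal illustration, not the authors' code: the function name build_segment_set and the partition callable (which is assumed to return the indices of the characteristic points of one trajectory) are hypothetical. Each segment is stored together with the id of the trajectory it came from, since that id is needed later when counting distinct trajectories per cluster.

```python
def build_segment_set(trajectories, partition):
    """Collect the line segments of all trajectories into one set D.

    trajectories: list of trajectories, each a list of (x, y) points.
    partition:    hypothetical callable that returns the sorted indices of
                  the characteristic points of a single trajectory.
    Returns (segments, traj_ids), two parallel lists.
    """
    segments, traj_ids = [], []
    for tid, points in enumerate(trajectories):
        cps = partition(points)
        # Each pair of consecutive characteristic points yields one segment.
        for i, j in zip(cps, cps[1:]):
            segments.append((points[i], points[j]))
            traj_ids.append(tid)
    return segments, traj_ids
```

Keeping traj_ids as a parallel list is only one possible design; a small record type per segment would work equally well.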
How to partition a trajectory?
Characteristic points: points where the behavior of the trajectory changes rapidly.
From a trajectory Tr: p1, p2, p3 … pj … plen, determine a set of characteristic points {pc1, pc2, pc3, …, pcPart}.
The trajectory is partitioned at every characteristic point, and each partition is represented by a line segment between two consecutive characteristic points.
Line segment = Trajectory partition.
How to optimally partition a trajectory?
Properties:
◦ Preciseness: The difference between a trajectory and the set of its trajectory partitions should be as small as possible.
◦ Conciseness: The number of trajectory partitions should be as small as possible.
The two properties pull in opposite directions, so they are balanced using MDL (minimum description length).
The best hypothesis H to explain the data D is the one that minimizes the sum of L(H) and L(D|H).
◦ L(H): Sum of the (log2) lengths of all trajectory partitions. Measures conciseness.
◦ L(D|H): Sum of the (log2) differences between a trajectory and the set of its trajectory partitions. Measures preciseness.
◦ Finding the global optimum can be costly, so it is approximated by a set of local optima: a partition pipj is extended for as long as MDLpart(pi,pj) <= MDLnopart(pi,pj), i.e. while replacing the points between pi and pj by a single segment costs no more than keeping them.
Time complexity: O(n).
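The approximate partitioning described above can be sketched as follows. This is a minimal 2D illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, distances are measured against the infinite line through the candidate partition, and values below 1 are clamped before taking log2 so that zero distances cost 0 bits (the slides do not say how that case is handled).

```python
import math

def _log2(x):
    # Assumption: clamp to 1 so zero or sub-unit values contribute 0 bits.
    return math.log2(max(x, 1.0))

def _perp_and_angle(s, e, a, b):
    """Perpendicular and angle distance of segment a-b w.r.t. partition s-e."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    vx, vy = b[0] - a[0], b[1] - a[1]
    seg_len = math.hypot(vx, vy)
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        # Degenerate partition (s == e): fall back to point distances.
        l1, l2 = math.dist(s, a), math.dist(s, b)
        d_ang = seg_len
    else:
        # Distances from a and b to the line through s and e.
        l1 = abs(dx * (a[1] - s[1]) - dy * (a[0] - s[0])) / norm
        l2 = abs(dx * (b[1] - s[1]) - dy * (b[0] - s[0])) / norm
        dot = dx * vx + dy * vy
        # |a-b| * sin(theta) below 90 degrees, |a-b| otherwise.
        d_ang = seg_len if dot <= 0 else abs(dx * vy - dy * vx) / norm
    d_perp = (l1 * l1 + l2 * l2) / (l1 + l2) if l1 + l2 > 0 else 0.0
    return d_perp, d_ang

def mdl_part(points, i, j):
    """Cost L(H) + L(D|H) when only points i and j are characteristic."""
    cost = _log2(math.dist(points[i], points[j]))
    for k in range(i, j):
        d_perp, d_ang = _perp_and_angle(points[i], points[j],
                                        points[k], points[k + 1])
        cost += _log2(d_perp) + _log2(d_ang)
    return cost

def mdl_nopart(points, i, j):
    """Cost of keeping every original segment between i and j (L(D|H) = 0)."""
    return sum(_log2(math.dist(points[k], points[k + 1])) for k in range(i, j))

def characteristic_points(points):
    """Return the indices of the characteristic points of one trajectory."""
    n = len(points)
    if n < 2:
        return list(range(n))
    cps = [0]
    start, length = 0, 1
    while start + length < n:
        curr = start + length
        if mdl_part(points, start, curr) > mdl_nopart(points, start, curr):
            # Extending to curr is too costly: partition at the previous point.
            cps.append(curr - 1)
            start, length = curr - 1, 1
        else:
            length += 1
    cps.append(n - 1)
    return cps
```

On a straight trajectory only the two endpoints come back, while a sharp turn adds the corner point, which matches the intuition that characteristic points mark rapid changes.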
Distance Measure
Based on the projection of the points of one segment onto the other.
3 components:
◦ Perpendicular Distance: Lehmer mean of order 2 of the two perpendicular distances between the line segments.
◦ Each perpendicular distance is the Euclidean distance between an endpoint of the shorter segment and its projection onto the longer segment.
◦ Parallel Distance: The minimum distance between the projected points and the endpoints of the segment onto which the projection was made.
◦ Angle Distance: Based on the smaller intersecting angle θ between the segments, scaled by the length of the shorter segment. Helps to compare trajectories that have direction.
The distance measure can be easily calculated with vector operations.
The overall distance between two segments is the weighted sum of the 3 components (all weights default to 1).
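The three components can be computed with a few vector operations, roughly as in the sketch below. It assumes 2D points and a non-degenerate longer segment; the function name segment_distance and the weight parameters w_perp, w_par and w_ang are hypothetical names chosen for illustration. The angle component is taken as |Lj| * sin(θ) for θ below 90 degrees and |Lj| otherwise, where Lj is the shorter segment.

```python
import math

def segment_distance(seg_a, seg_b, w_perp=1.0, w_par=1.0, w_ang=1.0):
    """Weighted sum of perpendicular, parallel and angle distance (2D)."""
    def length(seg):
        return math.dist(seg[0], seg[1])

    # The longer segment (Li) receives the projections of the shorter (Lj).
    li, lj = (seg_a, seg_b) if length(seg_a) >= length(seg_b) else (seg_b, seg_a)
    (si, ei), (sj, ej) = li, lj
    dx, dy = ei[0] - si[0], ei[1] - si[1]
    len_i, len_j = length(li), length(lj)
    if len_i == 0.0:
        raise ValueError("both segments are degenerate (zero length)")

    def project(p):
        # Projection of p onto the line through si and ei.
        u = ((p[0] - si[0]) * dx + (p[1] - si[1]) * dy) / (len_i * len_i)
        return (si[0] + u * dx, si[1] + u * dy)

    ps, pe = project(sj), project(ej)

    # Perpendicular distance: Lehmer mean of order 2 of the two offsets.
    l1, l2 = math.dist(sj, ps), math.dist(ej, pe)
    d_perp = (l1 * l1 + l2 * l2) / (l1 + l2) if l1 + l2 > 0 else 0.0

    # Parallel distance: smallest gap between a projected point and an
    # endpoint of the longer segment.
    d_par = min(math.dist(ps, si), math.dist(ps, ei),
                math.dist(pe, si), math.dist(pe, ei))

    # Angle distance: |Lj| * sin(theta), or |Lj| when theta >= 90 degrees.
    vx, vy = ej[0] - sj[0], ej[1] - sj[1]
    dot = dx * vx + dy * vy
    if len_j == 0.0:
        d_ang = 0.0
    elif dot <= 0:
        d_ang = len_j
    else:
        d_ang = abs(dx * vy - dy * vx) / len_i

    return w_perp * d_perp + w_par * d_par + w_ang * d_ang
```

For example, the parallel segments (0,0)-(10,0) and (2,1)-(8,1) have perpendicular distance 1, parallel distance 2 and angle distance 0. Choosing the longer segment as the projection target keeps the measure symmetric when the lengths differ; segments of equal length would need an explicit tie-break.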
Clustering Phase
Line segments of the same cluster are close to each other according to the distance measure.
Uses density-based clustering, as in DBSCAN.
Let D be the set of all line segments:
Density-Based Clustering
Clustering Algorithm
2 parameters:
◦ ε: Radius of the neighborhood of a segment.
◦ MinLns: Minimum number of line segments in the ε-neighborhood of a core segment.
Trajectory cardinality check: a cluster whose segments come from fewer than MinLns distinct trajectories is discarded.
Turns a set of segments D into a set of clusters O.
Complexity:
◦ O(n log n), where n is the number of segments, when a spatial index is used.
◦ O(n²) otherwise, for dimensionality >= 2.
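A DBSCAN-style grouping of line segments, followed by the trajectory cardinality check, might look like the sketch below. It is an illustration under assumptions rather than the authors' code: cluster_segments, traj_ids and the dist callable are hypothetical names, the neighborhood query is a plain linear scan (so the whole run is O(n²), the case without a spatial index), and a segment already assigned to a cluster is never reassigned.

```python
from collections import deque

UNCLASSIFIED, NOISE = -2, -1

def cluster_segments(segments, traj_ids, dist, eps, min_lns):
    """Return one label per segment: a cluster id (>= 0) or NOISE.

    segments: list of line segments (the set D).
    traj_ids: id of the source trajectory of each segment (same order).
    dist:     callable giving the distance between two segments.
    """
    n = len(segments)
    labels = [UNCLASSIFIED] * n

    def neighbors(i):
        # Linear scan; the segment itself is part of its own neighborhood.
        return [j for j in range(n) if dist(segments[i], segments[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] != UNCLASSIFIED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_lns:
            labels[i] = NOISE
            continue
        # i is a core segment: start a new cluster from its neighborhood.
        queue = deque()
        for j in seeds:
            if labels[j] in (UNCLASSIFIED, NOISE):
                labels[j] = cluster_id
                if j != i:
                    queue.append(j)
        # Expand the cluster through every density-reachable core segment.
        while queue:
            m = queue.popleft()
            nbrs = neighbors(m)
            if len(nbrs) >= min_lns:
                for x in nbrs:
                    if labels[x] == UNCLASSIFIED:
                        queue.append(x)
                    if labels[x] in (UNCLASSIFIED, NOISE):
                        labels[x] = cluster_id
        cluster_id += 1

    # Trajectory cardinality check: drop clusters drawn from too few
    # distinct trajectories.
    for c in range(cluster_id):
        members = [i for i in range(n) if labels[i] == c]
        if len({traj_ids[i] for i in members}) < min_lns:
            for i in members:
                labels[i] = NOISE
    return labels
```

Cluster ids are not renumbered after the cardinality check, so gaps in the numbering are possible. Passing the distance in as a callable keeps the sketch independent of how the segment distance is implemented.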
Algorithm
Representative Trajectories
An imaginary trajectory obtained from a cluster.
Like a regular trajectory, a representative trajectory is a sequence of points.
A representative trajectory indicates the major behavior of the segments of a cluster.
Representative trajectory = Common subtrajectory.
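The slides do not spell out how the representative trajectory is computed. One way to obtain it, sketched below for 2D segments, is a sweep along the average direction of the cluster: rotate the axes so that X points along the average direction vector, sweep over the segment endpoints in X order, and wherever at least min_lns segments are crossed, emit the average of their Y values. The function name and the smoothing parameter gamma (the minimum X gap between two emitted points) are assumptions made for this illustration.

```python
import math

def representative_trajectory(segments, min_lns, gamma):
    """Sweep-line sketch of a representative trajectory for one cluster."""
    # Average direction vector of the cluster.
    vx = sum(e[0] - s[0] for s, e in segments)
    vy = sum(e[1] - s[1] for s, e in segments)
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        return []
    c, s_ = vx / norm, vy / norm

    def rot(p):
        # Rotate so the X axis is parallel to the average direction.
        return (p[0] * c + p[1] * s_, -p[0] * s_ + p[1] * c)

    def unrot(p):
        # Inverse rotation back to the original coordinates.
        return (p[0] * c - p[1] * s_, p[0] * s_ + p[1] * c)

    rotated = []
    for a, b in segments:
        ra, rb = rot(a), rot(b)
        rotated.append((ra, rb) if ra[0] <= rb[0] else (rb, ra))

    # Sweep positions: the rotated X value of every segment endpoint.
    xs = sorted(x for seg in rotated for x in (seg[0][0], seg[1][0]))
    result, prev_x = [], None
    for x in xs:
        hits = [seg for seg in rotated if seg[0][0] <= x <= seg[1][0]]
        if len(hits) < min_lns:
            continue  # too few segments crossed at this sweep position
        if prev_x is not None and x - prev_x < gamma:
            continue  # too close to the previously emitted point
        ys = []
        for (x0, y0), (x1, y1) in hits:
            t = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
            ys.append(y0 + t * (y1 - y0))
        result.append(unrot((x, sum(ys) / len(ys))))
        prev_x = x
    return result
```

For three parallel horizontal segments at y = 0, 2 and 4 spanning x = 0 to 10, with min_lns = 3, the sweep yields a two-point trajectory along y = 2.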
End =)