Clustering in artificial intelligence
1
UNSUPERVISED LEARNING
Supervised and Unsupervised Learning
ID3 and version space search are supervised learning algorithms
Unsupervised learning eliminates the teacher and requires that the learner form concepts (categories) on its own
Conceptual clustering is the problem of discovering useful categories in unclassified data (data whose categories are not pre-determined)
2
CONCEPTUAL CLUSTERING
Unsupervised Learning and Numeric taxonomy
The clustering problem begins with a collection of unclassified objects and a means for measuring the similarity of objects
The goal is to organize the objects into classes so that similar objects are in one class
Numeric taxonomy is one of the oldest approaches to the clustering problem
3
CONCEPTUAL CLUSTERING
In numeric taxonomy the objects are represented as a collection of features, each of which has a numeric value
An object is thus a vector of n feature values and can be considered as a point in n-dimensional space
The similarity of any two objects can be measured by the Euclidean distance between them in this space
Using this similarity metric, clustering algorithms build clusters in a bottom-up fashion (an agglomerative clustering strategy)
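The distance computation is straightforward; a minimal Python sketch (the feature vectors shown are hypothetical examples, not from the source):

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two objects viewed as points in
    n-dimensional feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical objects, each a vector of three numeric feature values
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```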
4
CONCEPTUAL CLUSTERING
The categories are formed by the following approach
1. Examine all pairs of objects, and select the pair with the highest degree of similarity and make that pair a cluster
2. Define the features of the cluster as some function, such as average, of the features of the component members and then replace the component objects with this cluster definition
3. Repeat this process on the collection of objects until all objects have been reduced to a single cluster
The result of this algorithm is a binary tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size
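The three steps above can be sketched in Python. The sample objects, the use of averaging as the combining function, and the tuple representation of tree nodes are illustrative assumptions, not part of the original algorithm statement:

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(objects):
    """Bottom-up (agglomerative) clustering.  Each node is a tuple
    (features, left, right); leaf nodes have left = right = None.
    A cluster's features are the average of its two children's features
    (one possible choice of combining function)."""
    clusters = [(tuple(o), None, None) for o in objects]
    while len(clusters) > 1:
        # 1. Select the pair with the highest similarity
        #    (smallest Euclidean distance)
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: distance(clusters[p[0]][0],
                                                 clusters[p[1]][0]))
        a, b = clusters[i], clusters[j]
        # 2. Define the cluster's features as the average of its members'
        merged = tuple((x + y) / 2 for x, y in zip(a[0], b[0]))
        # 3. Replace the pair with the new cluster and repeat
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, a, b))
    return clusters[0]  # root of the binary tree; its leaves are the instances

root = agglomerate([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
print(root[0])  # features of the final, all-inclusive cluster: (2.5, 2.75)
```

Note that the result is the binary tree described above: leaf nodes hold the original instances, and each internal node holds the averaged features of an increasingly large cluster.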
5
CONCEPTUAL CLUSTERING
6
CONCEPTUAL CLUSTERING
We may extend this algorithm to objects represented as sets of symbolic features. The only difficulty is in measuring the similarity of such objects
A similarity metric can be the proportion of features that any two objects have in common
object 1 = {small, red, rubber, ball}
object 2 = {small, blue, rubber, ball}
object 3 = {large, black, wooden, ball}
similarity (object 1, object 2) = 3/4
similarity (object 1, object 3) = 1/4
similarity (object 2, object 3) = 1/4
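This proportion can be computed directly from the set representation; a small sketch (Fraction is used only so the results print as fractions, and the metric assumes both objects have the same number of features):

```python
from fractions import Fraction

def similarity(a, b):
    """Proportion of features two objects hold in common.  Assumes both
    objects are described by the same number of features."""
    return Fraction(len(a & b), len(a))

obj1 = {"small", "red", "rubber", "ball"}
obj2 = {"small", "blue", "rubber", "ball"}
obj3 = {"large", "black", "wooden", "ball"}

print(similarity(obj1, obj2))  # 3/4
print(similarity(obj1, obj3))  # 1/4
print(similarity(obj2, obj3))  # 1/4
```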
7
CONCEPTUAL CLUSTERING
In defining categories we cannot give all features equal weight
In any given context, certain of an object’s features are more important than others; simple similarity metrics treat all features equally
Feature weights should be set according to the goals of the categorization
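One simple way to add such weights is sketched below, with objects recast as attribute-value pairs; the attribute names and weight values are hypothetical, chosen purely for illustration:

```python
def weighted_similarity(a, b, weights):
    """Weighted proportion of matching features: each attribute on which
    the two objects agree contributes its weight, normalised by the total
    weight.  The weights are hypothetical and would in practice be chosen
    to reflect the goals of the categorization."""
    total = sum(weights.values())
    matched = sum(w for attr, w in weights.items() if a[attr] == b[attr])
    return matched / total

obj1 = {"size": "small", "color": "red", "material": "rubber", "shape": "ball"}
obj2 = {"size": "small", "color": "blue", "material": "rubber", "shape": "ball"}

# If color matters little in this context, obj1 and obj2 are nearly identical
print(weighted_similarity(obj1, obj2, {"size": 1.0, "color": 0.1,
                                       "material": 1.0, "shape": 1.0}))
```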
8
CLUSTER/2
CLUSTER/2 forms k categories by constructing individuals around k seed objects
The parameter k is user adjustable
CLUSTER/2 evaluates the resulting clusters, selecting new seeds and repeating the process until its quality criteria are met
9
CLUSTER/2
The algorithm
• Select k seeds from the set of observed objects. This may be done randomly or according to some selection function
• For each seed, using that seed as a positive instance and all other seeds as negative instances, produce a maximally general definition that tries to cover all of the non-seed instances, until stopped by the negative instances (other seeds)
10
CLUSTER/2
The algorithm
• Classify all objects in the sample according to these descriptions. Note that this may lead to multiple classifications of other, non-seed, objects
11
CLUSTER/2
The algorithm
• Replace each maximally general description with a maximally specific description that covers all objects in the category. This decreases the likelihood that classes overlap on unseen objects
12
CLUSTER/2
The algorithm
• Classes may still overlap on given objects.
• Using a distance metric, select an element closest to the center of each class. Using these central elements as new seeds, repeat the above steps.
13
CLUSTER/2
The algorithm
• Stop when clusters are satisfactory. A typical quality metric is the complexity of the general descriptions of classes. For instance, we might prefer clusters that yield syntactically simple definitions, such as those with a small number of conjuncts
14
CLUSTER/2
The algorithm
• If clusters are unsatisfactory and no improvement occurs over several iterations, select the new seeds closest to the edge of the cluster, rather than those at the center. This favors creation of totally new clusters
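The control loop described in the steps above can be sketched as follows. This is a greatly simplified sketch: nearest-seed assignment stands in for classification by maximally general/specific descriptions, "closest to the center" is approximated by the member with the greatest total similarity to its classmates, and the edge-seeding fallback is omitted:

```python
import random

def cluster2_sketch(objects, k, similarity, iterations=10):
    """Simplified CLUSTER/2-style loop: choose k seeds, classify every
    object by its most similar seed, re-seed from each class's most
    central member, and repeat until the seeds stop changing."""
    seeds = random.sample(objects, k)           # random seed selection
    classes = {}
    for _ in range(iterations):
        # Classify all objects according to the current seeds
        classes = {i: [] for i in range(k)}
        for obj in objects:
            best = max(range(k), key=lambda i: similarity(obj, seeds[i]))
            classes[best].append(obj)
        # New seed: the element "closest to the center" of each class,
        # approximated here by greatest total similarity to its classmates
        new_seeds = []
        for i in range(k):
            members = classes[i] or [seeds[i]]
            new_seeds.append(max(members,
                key=lambda m: sum(similarity(m, o) for o in members)))
        if new_seeds == seeds:  # clusters are stable: stop
            break
        seeds = new_seeds
    return classes

random.seed(0)
objects = [frozenset(o) for o in (
    {"small", "red", "rubber", "ball"},
    {"small", "blue", "rubber", "ball"},
    {"large", "black", "wooden", "ball"},
    {"large", "brown", "wooden", "cube"},
)]
result = cluster2_sketch(objects, 2, lambda a, b: len(a & b))
```

The common-feature count serves as the similarity metric here; in the full algorithm the symbolic descriptions themselves, and their syntactic complexity, drive both classification and the stopping criterion.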
15
Assignment
Read Sections 9.6.1 & 9.6.2 of Luger