
MATHEMATICAL AND COMPUTER MODELLING

PERGAMON. Mathematical and Computer Modelling 38 (2003) 395-407. www.elsevier.com/locate/mcm

A Novel Classification Method Based on Hypersurface

QING HE AND ZHONG-ZHI SHI The Key Laboratory of Intelligent Information Processing

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, P.R. China

LI-AN REN Graduate College of University of Science and Technology of China

Beijing 100039, P.R. China

E. S. LEE* Department of Industrial and Manufacturing Systems Engineering

Kansas State University, Manhattan, KS 66506, U.S.A. [email protected]

(Received and accepted January 2003)

Abstract-The main idea of the support vector machine (SVM) classification approach is to map the data into a higher-dimensional linear space where the data can be separated by a hyperplane. Based on the Jordan curve theory, a general nonlinear classification method by the use of a hypersurface is proposed in this paper. The separating hypersurface is used directly to classify the data according to whether the number of intersections with a radial is odd or even. In contrast to the SVM approach, the proposed approach has no need for mapping from a lower-dimensional space to a higher-dimensional space.

Furthermore, the approach does not use kernel functions, and it can directly solve the nonlinear classification problem via the hypersurface. Numerical experiments showed that the proposed approach can efficiently and accurately solve classification problems with a large amount of data. © 2003 Elsevier Ltd. All rights reserved.

Keywords-Support vector machine, hypersurface, Jordan curve theory, statistical learning theory, VC dimension.

1. INTRODUCTION

Classification plays an important role in many application areas and has been investigated by various researchers using a variety of approaches. Vapnik and his coworkers [1-4] studied extensively the statistical learning theory, which can be used for both modeling and classification. Furthermore, Vapnik developed a learning algorithm, the support vector machine (SVM), which is an efficient classification algorithm for nonlinear and high-dimensional data with finite samples. The main idea of SVM is to map the nonlinear data to a higher-dimensional linear space where the data can be linearly classified by a hyperplane. The mapping is a nonlinear mapping defined

*Author to whom all correspondence should be addressed. This work is supported by the National Science Foundation of China (Nos. 60173017, 90104021) and the Natural Science Foundation of Beijing (No. 4011003).

0895-7177/03/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S0895-7177(03)00229-2


by an inner product function. Many repeated inner product computations over an m × m matrix must be carried out, where m represents the number of samples. Thus, it is almost impossible to classify problems with a large amount of data. For example, a problem with more than 4000 samples was found to be difficult to classify by using a PC [5].

Zhang and Zhang [6] proposed a geometrical classification method, where the original input space is transformed into a quadratic space by the use of a global projection function. Then, the well-known point-set covering method is used to partition the data in the transformed space. The authors also proposed a covering-deletion design algorithm. The approach solves a covering problem in distance space instead of a quadratic optimization problem as in SVM.

Widrow and Hoff [7] proposed to solve the classification problem by the use of many hyperplanes, resulting in the "Madaline". A general classification method based on a hypersurface and the Jordan curve theory is proposed in this paper. The separating hypersurface can be used directly to classify large databases. This is a new approach that has no need for mapping from lower-dimensional to higher-dimensional spaces. Furthermore, there is no need to consider kernel functions. The proposed approach can directly solve the nonlinear classification problem, and the experimental results showed that this new method could efficiently and accurately classify large databases.

2. CLASSIFICATION BASED ON SEPARATING HYPERSURFACE

In this section, the Jordan curve theory and the construction of the separating hypersurface will be summarized.

2.1. Jordan Curve Theory

Jordan curve theory forms the basic theory for the proposed classification method based on hypersurface. Jordan curve theory is stated as follows.

JORDAN CURVE THEOREM. Let X be a closed set in R^3. If X is homeomorphic to a sphere S^2, then its complement R^3 \ X has two connected components, one bounded and the other unbounded. Any neighborhood of any point on X meets both of these components.

Thus, according to the Jordan curve theory, a surface can be formed in a three-dimensional space and used as the separating hypersurface. For any given point, the following classification theorem can be used to determine whether the point is inside or outside the separating hypersurface.

CLASSIFICATION THEOREM. Let X be a closed set in R^3. If X is homeomorphic to a sphere S^2, then its complement R^3 \ X has two connected components, one called the inside and the other called the outside. For any x ∈ R^3 \ X, the point x is inside X if and only if the number of intersections between any radial from x and X is odd, and x is outside X if and only if the number of intersections between any radial from x and X is even.

Figure 1. Classification theorem.


This classification theorem is also illustrated in Figure 1. The Jordan curve theory and the classification theorem form the basis of our classification approach. The above theorems apply only to problems in three-dimensional space. The Jordan curve theory can be easily extended to problems in higher dimensions. The Jordan theorem for high-dimensional spaces is given in the following.

JORDAN THEOREM IN HIGH-DIMENSIONAL SPACE. Suppose that X ⊂ S^n is homeomorphic to a sphere S^m; then m ≤ n, otherwise X = S^n. If m < n, then the homology group of S^n \ X is

$$H_k(S^n \setminus X) \cong \begin{cases} \mathbb{Z} \oplus \mathbb{Z}, & \text{if } m = n - 1 \text{ and } k = 0, \\ \mathbb{Z}, & \text{if } m < n - 1 \text{ and } k = 0, \\ 0, & \text{otherwise.} \end{cases}$$

Specifically, if m = n - 1, then S^n \ X is composed of two connected components. If m < n - 1, then there exists only one connected component.

Based on the Jordan curve theory, an n-dimensional space can be separated by an (n - 1)-dimensional two-sided surface that is homeomorphic to the (n - 1)-dimensional sphere. Notice that the separating hypersurface may consist of more than one hypersurface. For any given point, whether it is inside or outside of the hypersurface depends on the number of intersections between the separating hypersurface and the radial from that point. However, it is not easy to construct this separating hypersurface. In the following, we propose a new approach.

2.2. Construction of Separating Hypersurface

From the Jordan curve theory, the following classification method based on the separating hypersurface can be formulated.

The training process

The training process consists of the following steps.

STEP 1. Input all training sample data. Transform the samples into a cube.

STEP 2. Equally divide the cube into smaller regions.

STEP 3. Take a smaller cube region. If the region contains samples from at most one class, go to Step 4. If the region contains samples from more than one class, treat it as the cube in Step 2 and go to Step 2.

STEP 4. Save the small regions that contain samples of only one class. Label each region according to its samples' class. The frontier vector of the region and the class vector form a string.

STEP 5. Combine connected regions, i.e., strings of the same class, and save the result as a link table; these link tables together form the separating hypersurface.
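To make the training process concrete, here is a minimal sketch in Python. It is our illustration, not the authors' implementation: the function name train, the use of NumPy, the subdivision factor grid, and the recursion cap max_depth are all assumptions, and the frontier merging of Step 5 is deliberately omitted (a companion sketch after the classification steps below shows why parity counting can do without it).

```python
import numpy as np

def train(points, labels, lo, hi, grid=10, max_depth=5, depth=0):
    """Steps 1-4: recursively partition the box [lo, hi] into grid^d cells
    and keep every cell whose samples all share one class.
    Returns a list of (cell_lo, cell_hi, class_label) triples; the union of
    the cells' frontiers plays the role of the separating hypersurface."""
    cells = []
    step = (hi - lo) / grid
    # grid index of the cell containing each sample
    idx = np.clip(((points - lo) / step).astype(int), 0, grid - 1)
    for cell in np.ndindex(*(grid,) * points.shape[1]):
        mask = np.all(idx == np.array(cell), axis=1)
        if not mask.any():
            continue                          # empty cell: contributes nothing
        cls = np.unique(labels[mask])
        c_lo = lo + np.array(cell) * step
        c_hi = c_lo + step
        if len(cls) == 1:                     # pure cell: save it (Step 4)
            cells.append((c_lo, c_hi, int(cls[0])))
        elif depth < max_depth:               # mixed cell: subdivide (Step 3)
            cells += train(points[mask], labels[mask], c_lo, c_hi,
                           grid, max_depth, depth + 1)
    return cells
```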

The classification process

The classification process consists of the following steps.

STEP 1. Input a testing sample.

STEP 2. Input the link table of one class obtained above.

STEP 3. Make a radial from the testing sample.

STEP 4. Count the number of intersections of the radial with the link table.

STEP 5. If the number of intersections between the radial and the separating hypersurface is odd, then the test sample's class is the same as the class of the link table; otherwise, go to Step 2.

STEP 6. Go to Step 1 to take the next testing sample.
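A matching sketch of the classification process, under the same assumptions as the training sketch above. Instead of merging cells into link tables, it counts crossings of the radial cell by cell: a frontier shared by two cells of the same class is then counted twice, so the parity of the total equals the parity of crossings with the merged hypersurface, which is all the odd/even test of Step 5 requires.

```python
import numpy as np

def crossings(p, c_lo, c_hi):
    """Crossings of the axis-parallel radial from p (toward +x) with the
    frontier of the box [c_lo, c_hi]: 0 or 2 if p is outside, 1 if inside."""
    if not np.all((c_lo[1:] <= p[1:]) & (p[1:] < c_hi[1:])):
        return 0                       # the radial misses the box entirely
    return int(p[0] < c_lo[0]) + int(p[0] < c_hi[0])

def classify(p, cells, classes=(0, 1)):
    """Steps 2-5: test each class's region in turn; odd parity means inside."""
    for c in classes:
        n = sum(crossings(p, lo, hi) for lo, hi, label in cells if label == c)
        if n % 2 == 1:                 # odd: p lies inside this class's region
            return c
    return None                        # even for every class: class undefined
```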

The above classification algorithm is basic and general. It can be used to classify problems with any dimension. In the following sections, numerical examples for problems with two- and three-dimensional spaces will be solved to illustrate the approach.


3. TWO-DIMENSIONAL DATA CLASSIFICATION

In this section, two-dimensional data will be numerically trained and classified. We first generate the needed data, and then apply the above-discussed training and classification procedures to test problems with both large and small data sets. Furthermore, to test the generalization ability of the approach, different sizes of training and classification samples will also be used.

3.1. Discrimination Problem with Two Spirals [6]

To illustrate the approach, two-dimensional data with two spirals will be generated. The equations for the two spirals K1 and K2 in polar coordinates are

K1: ρ = θ, (3.1)

K2: ρ = θ + π. (3.2)

The training sample data and the testing sample data can be created based on equations (3.1) and (3.2). The following detailed formulas are used to obtain the sample data.

K1: k = i · (π/180), x = k · cos(k), y = k · sin(k), (3.3)

K2: k = i · (π/180), x = k · cos(k + π), y = k · sin(k + π). (3.4)

For example, letting i_0 = 90 and i_{k+1} = i_k + 0.00025, we obtain 10,800,000 samples by the use of equations (3.3) and (3.4). These samples are composed of two classes. Similarly, letting i_0 = 90 and i_{k+1} = i_k + 0.00012, 22,500,002 samples are created. Letting i_0 = 90 and i_{k+1} = i_k + 0.00005, 54,000,000 samples are created. A sample from K1 is called a Class 0 sample and a sample from K2 is called a Class 1 sample. The two-spirals data obtained above can be separated in a two-dimensional space based on the above algorithm. The classification process can be separated into the following steps.
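The sample generation is easy to reproduce. Below is a minimal sketch; the reading k = i · π/180, the upper limit i_max = 1440 (so that k runs up to 8π, the range stated for the spirals in Section 4.1), and the function name are our reconstruction and assumptions, not the authors' code. With delta_deg = 0.00025 it yields 5,400,000 points per spiral, matching the 10,800,000 total quoted above.

```python
import numpy as np

def two_spirals(delta_deg, i0=90.0, i_max=1440.0):
    """Class 0 (K1) and Class 1 (K2) samples per equations (3.3)-(3.4)."""
    k = np.radians(np.arange(i0, i_max, delta_deg))   # k = i * pi / 180
    k1 = np.column_stack([k * np.cos(k), k * np.sin(k)])
    k2 = np.column_stack([k * np.cos(k + np.pi), k * np.sin(k + np.pi)])
    return k1, k2   # Class 0 samples, Class 1 samples
```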

3.1.1. The training algorithm

STEP 1. Input the training samples, which are composed of two classes. Let the training samples be distributed within a rectangular region.

STEP 2. Transform the region into a unit region.

STEP 3. Divide the region into 10x10 small regions as shown in Figure 2.

STEP 4. Label each small region as 0, 1, or 2 according to whether the samples in the region belong to Class 0, Class 1, or both classes, respectively.

STEP 5. Merge the frontiers of the same-class regions labeled 0, then save the result as a link table.

STEP 6. Merge the frontiers of the same-class regions labeled 1, then save the result as a link table.

STEP 7. For the regions labeled 2, go to Step 2.

STEP 8. Repeat the above steps until no region is labeled 2.

After the completion of the above procedure, the desired separating hypersurface is obtained. This desired hypersurface is constructed by using the link tables.

Figure 2. Basic algorithm.

3.1.2. The classification algorithm

After the training process is over, we can classify the testing sample as follows.

STEP 1. Input a testing sample and make a radial from the sample.

STEP 2. Input all the link tables of Class 0 obtained in the above training algorithm.

STEP 3. Count the number of intersections of the radial with the above link tables.

STEP 4. If the number of intersections of the radial with the above link tables is odd, then label the sample 0; otherwise, go to the next step.

STEP 5. Input all the link tables of Class 1 obtained in the above training algorithm.

STEP 6. Count the number of intersections of the radial with the link tables obtained in Step 5.

STEP 7. If the number of intersections of the radial with the above link tables is odd, then label the sample 1; otherwise, the sample's class cannot be determined.

STEP 8. Calculate the classification accuracy by the following formula:

correct rate = (the number of samples classified correctly) / (the number of all the testing samples). (3.5)

The above algorithm is illustrated in Figure 2. For example, point A in Figure 2 does not belong to any class because the number of intersections of the radial from A with each link table is even. The points O and X belong to different classes because the number of intersections of the radial from each of these points with the corresponding class's link table is odd.
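Putting the sketches together, a hedged end-to-end driver for the two-dimensional experiment might look as follows. It assumes the hypothetical two_spirals, train, and classify helpers from the earlier sketches and evaluates equation (3.5) in the recall setting, where the training and testing sets coincide.

```python
import numpy as np

k1, k2 = two_spirals(delta_deg=0.5)                # small set for illustration
points = np.vstack([k1, k2])
labels = np.concatenate([np.zeros(len(k1), int), np.ones(len(k2), int)])
lo, hi = points.min(axis=0), points.max(axis=0)    # Step 1: bounding rectangle

cells = train(points, labels, lo, hi)              # training algorithm (3.1.1)
pred = [classify(p, cells) for p in points]        # classification (3.1.2)
correct = sum(int(c == t) for c, t in zip(pred, labels))
print("correct rate = %.2f%%" % (100.0 * correct / len(points)))  # eq. (3.5)
```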

3.2. Large Database

With the spiral data obtained in the previous section, numerical experiments were carried out. The training results are summarized in Table 1 and illustrated in Figure 3. The three sets of training data listed in the first column of Table 1 are obtained by the use of equations (3.3) and (3.4). In Table 1, the testing sample set is the same as the training sample set. The training time is the time needed to obtain the hypersurface. The classification time is the time needed to classify all the testing samples. The recall rate is the accuracy rate when the testing sample set is the same as the training sample set.


Table 1. Training results with large samples.

Number of Training Samples | Training Time | Classification Time | Recall Rate (%)
10,800,000 | 1h, 34m, 57s | 2h, 17m, 35s | 100.00
22,500,002 | 3h, 16m, 9s | 4h, 49m, 55s | 100.00
54,000,000 | 7h, 42m, 52s | 11h, 47m, 7s | 100.00

Figure 3. Training results with large samples. (Samples Num: 27002; Correct Rate: 100.00%.)

Table 2. Testing results with large samples.

Number of Training Samples | Number of Testing Samples | Classification Time | Correct Rate (%)
10,800,000 | 22,500,002 | 4h, 7m, 4s | 100.00
22,500,002 | 54,000,000 | 11h, 25m, 3s | 100.00
54,000,000 | 67,500,002 | 14h, 37m, 6s | 100.00

In Table 2, the training samples are the same as those used in Table 1. The testing or classification samples are much larger and are obtained in essentially the same way as the training samples. The correct rate is obtained by using equation (3.5).

3.3. Small Training Sample and Large Classification Sample

To test the generalization ability, the classification or testing sample is obtained from the two-spirals data and is ten times larger than the training sample (see Table 3). As can be seen from Table 3, the approach performs very well and the classification rate is essentially 100%. Moreover, for a very large data set of the order of 10^7, the rate of classification based on this hypersurface approach is very fast. The reason is that the time needed for saving and extracting the hypersurface is very short and the need for storage is very low.

Table 3. Small training sample and large testing sample.

Number of Training Samples | Number of Testing Samples | Classification Time | Correct Classification Rate (%)
5,402 | 54,002 | 41s | 99.59
5,402 | 540,000 | 6m, 45s | 99.58
27,002 | 540,000 | 6m, 44s | 99.98
54,002 | 540,000 | 6m, 47s | 100.00
54,002 | 5,400,000 | 1h, 7m, 7s | 100.00


4. THREE-DIMENSIONAL DATA CLASSIFICATION

Numerical examples of problems with three-dimensional data will be trained and classified. Following the same approach used in the previous section, we shall first obtain the needed training and classification data, and then carry out the actual numerical training and classification. Different sizes of training and classification samples will also be used to illustrate the generalization ability of the approach.

4.1. Constructing the Training and Testing Data

The three-dimensional data will be generated, again, as two spirals. The two spirals K1 and K2 in polar coordinates are

K1: ρ = θ, π/2 ≤ θ ≤ 8π, z = ρ, (4.1)

K2: ρ = θ + π, π/2 ≤ θ ≤ 8π, z = ρ. (4.2)

The training and classification data are obtained based on equations (4.1) and (4.2). The detailed equations are

K1: k = i · (π/180), x = k · cos(k), y = k · sin(k), z = k, (4.3)

K2: k = i · (π/180), x = k · cos(k + π), y = k · sin(k + π), z = k. (4.4)

For example, letting i_0 = 90 and i_{k+1} = i_k + 0.05, we can obtain two classes of samples with 54,002 data points by equations (4.3) and (4.4). Letting i_0 = 90 and i_{k+1} = i_k + 0.2, 13,500 samples are created. Letting i_0 = 90 and i_{k+1} = i_k + 0.1, 27,002 samples are created. A sample from K1 is called a Class 0 sample and a sample from K2 is called a Class 1 sample. In the following tables, the samples are all generated in a similar manner.
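Since the three-dimensional samples differ from the two-dimensional ones only by the lift z = k, a variant of the earlier hypothetical generator suffices; the range of i and the function name are again our assumptions.

```python
import numpy as np

def three_d_spirals(delta_deg, i0=90.0, i_max=1440.0):
    """Class 0 (K1) and Class 1 (K2) samples per equations (4.3)-(4.4)."""
    k = np.radians(np.arange(i0, i_max, delta_deg))    # k = i * pi / 180
    k1 = np.column_stack([k * np.cos(k), k * np.sin(k), k])
    k2 = np.column_stack([k * np.cos(k + np.pi), k * np.sin(k + np.pi), k])
    return k1, k2
```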

The samples are shown in Figure 4. However, separating the two spirals in three-dimensional space is more difficult than in the two-dimensional case.

4.2. Training and Testing Procedure

Three-dimensional data can be classified in a similar manner. The difference between the two-dimensional and the three-dimensional case is that the sample space of the three-dimensional data is first transformed into a cube. Each unit cube is represented schematically by the record

{ ID of region, label of class, string of frontier surface }.

The structure of the training samples is

{ string of the same-level samples, label of level, string of the next-level string }.
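Read as data structures, the two records above might be sketched as follows; the field names and types are guesses from the description, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Region:
    region_id: int        # "ID of region"
    class_label: int      # "label of class" (0 or 1)
    frontier: str         # "string of frontier surface"

@dataclass
class TrainingLevel:
    same_level_samples: str               # "string of the same-level samples"
    level: int                            # "label of level"
    next_level: Optional["TrainingLevel"] # "string of the next-level string"
```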


(a) Data view. (b) Front view.

Figure 4. Three-dimensional data (Samples Num: 5402).

The steps for classifying the three-dimensional data are similar to those for the two-dimensional data. Thus, only a summary of the training and testing processes is given in the following.

STEP 1. Generate the training sample by equations (4.3) and (4.4) and then import them into the database.

STEP 2. Train the samples, record the training time, and save the classifying link table or the hypersurface.

STEP 3. Generate testing sample by equations (4.3) and (4.4) and then import them into the database.

(c) Side view.

Figure 4. (cont.)

STEP 4. Obtain the classifying link table, count the number of intersections of the radial from each testing sample with the hypersurface, classify the testing samples based on whether the intersection number is odd or even, record the testing time and the classification results, and calculate the classification accuracy rate.

4.3. Small- and Medium-Sized Samples

The testing results are listed in Table 4. The testing samples are obtained by the use of equations (4.3) and (4.4). Notice that both testing and training used the same sets of data. Thus, the correct classification rate is called the recall rate in the last column of Table 4.

Table 4. Training results with small- and medium-sized samples.

Number of Training Samples | Training Time | Classification Time | Recall Rate (%)
5,402 | 7s | 4s | 100.00
13,500 | 12s | 11s | 100.00
27,002 | 23s | 22s | 100.00
54,002 | 40s | 45s | 100.00
108,000 | 1m, 17s | 1m, 30s | 100.00
540,000 | 6m, 16s | 7m, 36s | 100.00
1,350,002 | 15m, 41s | 19m, 21s | 100.00
5,400,000 | 1h, 2m, 39s | 1h, 17m, 53s | 100.00

Table 5. Testing results with small- and medium-sized samples.

Number of Training Samples | Number of Testing Samples | Classification Time | Correct Rate (%)
5,402 | 13,500 | 12s | 99.87
13,500 | 27,002 | 23s | 99.95
27,002 | 54,002 | 45s | 99.99
54,002 | 108,000 | 1m, 30s | 100.00
108,000 | 540,000 | 7m, 30s | 100.00
540,000 | 1,350,002 | 18m, 59s | 100.00
1,350,002 | 5,400,000 | 1h, 17m, 13s | 100.00

(a) Hypersurface. (b) Class 1.

Figure 5. Hypersurface and covering map (Samples Num: 27002; Correct Rate: 100.00%).

(c) Class 2.

Figure 5. (cont.)

The hypersurface or the classifying link table is obtained from the results of the training. Saving and extracting the classifying string or the hypersurface from the results can be carried out very quickly, in a matter of a few seconds. Furthermore, the results require very little storage. The hypersurfaces are illustrated in Figure 5.

The testing results for small- and medium-sized samples are summarized in Table 5. In this case, the testing samples are much larger than the training samples.

4.4. Large-Sized Samples

The training results of large-sized samples are listed in Table 6. The sample size is very large and, as can be seen from Table 6, is of the order of 10^7. In this table, the same training and classifying samples are used.

Table 7 summarizes the testing or classifying results for large sample sizes. The testing sample size is of the order of 10^7, which is much larger than the training sample size.

4.5. Small-Sized Training Sample and Large-Sized Testing Sample

With testing samples larger than ten times the training samples, the results listed in Table 8 are obtained. As can be seen from Table 8, the results are very good and the classification errors are nearly zero. Thus, the classification method based on hypersurface has strong generalization capability.

NOTE. All the results obtained in this paper were produced on a computer with the following configuration.

(1) Main computer: Pentium III processor, 733 MHz, with 256 MB of memory.
(2) Operating system: Microsoft Access 2000.
(3) Compilation environment: Visual C++ 6.0, Service Pack 4.


Table 6. Training results, large sample size.

Number of Samples | Training Time | Classifying Time | Recall Rate (%)
10,800,000 | 2h, 6m, 23s | 2h, 34m, 45s | 100.00
22,500,002 | 4h, 23m, 18s | 5h, 22m, 26s | 100.00

Table 7. Testing results, large sample size.

Number of Training Samples | Number of Testing Samples | Classifying Time | Correct Classification Rate (%)
5,400,000 | 10,800,000 | 2h, 35m, 48s | 100.00
10,800,000 | 22,500,002 | 5h, 14m, 51s | 100.00
22,500,002 | 60,000,000 | 14h, 25m, 8s | 100.00

Table 8. Results with small-sized training and large-sized testing samples.

Number of Training Samples | Number of Testing Samples | Classifying Time | Correct Classification Rate (%)
5,402 | 54,002 | 45s | 99.82
5,402 | 540,000 | 7m, 42s | 99.81
27,002 | 540,000 | 7m, 34s | 99.98
54,002 | 540,000 | 7m, 33s | 100.00
54,002 | 5,400,000 | 1h, 15m, 59s | 100.00
54,002 | 22,500,002 | 5h, 15m, 19s | 100.00

5. DISCUSSIONS

To illustrate the effectiveness of the proposed hypersurface approach, the two-spiral discrimination problem with both two-dimensional and three-dimensional data is investigated and solved. As shown in the results, the prediction rate for all the training samples is 100%. The correct classification rate for all samples is more than 99%. Furthermore, Tables 3 and 8 show that the proposed approach has strong generalization ability. It should be noted that the classification of the two spirals needed 3000 iterations and obtained only 89.6% correct classification by the use of other approaches [8,9].

The proposed new classification method based on a hypersurface is a universal classification method for large nonlinear databases and has the following advantages.

(1) HIGH EFFICIENCY AND HIGH ACCURACY. For a large data set, say of the order of 10^7, the speed of the hypersurface approach is very fast. The reason is that the time for saving and extracting the hypersurface is very short and the need for storage is very low, which is not the case for the support vector machine (SVM) approach [1-4]. Another advantage is that the decision process by the use of the Jordan curve theorem is very simple, which reduces the optimization problem to a simple comparison process.

(2) STRONG ABILITY OF GENERALIZATION. The experimental results of training on small samples and testing on large samples show that the approach has strong generalization ability. From statistical learning theory [3], we know that the higher the VC dimension, the larger the confidence domain, and thus the difference between the real risk and the empirical risk increases. This is the problem of excessive learning. Machine learning should not just minimize the empirical risk, but also reduce the VC dimension. However, this strategy is not useful for the proposed hypersurface approach, because the hypersurface is formed by the use of linear segmentation functions. Furthermore, the approach can separate any number of samples distributed in any way; the function set has infinite VC dimension.

Furthermore, we see that the continuity of the hypersurface improves as the number of samples increases. This shows that the scale of the unit should be larger than the margin between samples. If the scale of the unit is too small, the hyperspace may be separated into different parts.


Thus, a larger unit is required in regions where the samples are scattered. On the other hand, a small unit is required in regions where the samples are densely distributed. However, the samples, in general, are seldom uniformly distributed. To solve this problem, a "local elaboration" division step is added to the proposed procedure. This local division strategy improves the generalization ability and the accuracy.

(3) ROBUSTNESS. Although noise in the data can cause classification errors, the effect of this noise can be controlled within a local region. The problem is that a noisy sample located inside a hypersurface may cause this hypersurface to transform into a complex hypersurface. Thus, the noise may cause a mistake in classification. This problem can be controlled by the use of a local small unit.

(4) SAMPLE DISTRIBUTION. The proposed approach can solve nonlinear classification problems with no restrictions on the sample. In other words, the samples can be distributed in any way in a finite region.

REFERENCES

1. V.N. Vapnik, Support vector method for function approximation, In Neural Information Processing Systems, Volume 9, MIT Press, Cambridge, MA.
2. V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, (1995).
3. V.N. Vapnik, Statistical Learning Theory, Wiley, New York, (1998).
4. V.N. Vapnik and E. Levin, Measuring the VC-dimension of a learning machine, Neural Computation 6, 851-876 (1994).
5. B. Schölkopf, C.J.C. Burges and A.J. Smola, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, (1999).
6. L. Zhang and B. Zhang, A geometrical representation of McCulloch-Pitts neural model and its applications, IEEE Transactions on Neural Networks 10 (4), 925-929 (1999).
7. B. Widrow and M. Hoff, IRE WESCON Convention Record, Part 4, pp. 96-104, Institute of Radio Engineers, New York, (1960).
8. E.B. Baum and K.J. Lang, Constructing hidden units using examples and queries, In Neural Information Processing Systems, Volume 3, (Edited by R.P. Lippmann et al.), pp. 904-910, Morgan Kaufmann, San Mateo, CA, (1991).
9. S.E. Fahlman and C. Lebiere, The cascade-correlation learning architecture, In Advances in Neural Information Processing Systems, Volume 2, (Edited by D.S. Touretzky), pp. 524-532, Morgan Kaufmann, San Mateo, CA, (1990).
10. C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (2), (1998).
11. C. Cortes and V. Vapnik, Support-vector networks, Machine Learning 20, 273-297 (1995).
12. W. Fulton, Algebraic Topology: A First Course, Springer-Verlag, New York, (1995).
13. B. Widrow and R.G. Winter, Layered neural nets for pattern recognition, IEEE Transactions on Acoustics, Speech and Signal Processing 36 (3), 1109-1118 (1988).