Source: shodhganga.inflibnet.ac.in/bitstream/10603/4331/9/09_chapter 3.pdf


CHAPTER - III

ARTIFICIAL NEURAL NETWORK CLASSIFICATION TOOL FOR DIAGNOSING DIABETES

3.1.1 Artificial Neural Network

An artificial neural network is a system closely modeled on the human brain. It contains multiple layers of simple processing elements called neurons. Each neuron is linked to some of its neighbors with connection coefficients that represent the strengths of those connections. Learning is accomplished by adjusting these strengths so that the overall network outputs appropriate results. Diagnostic systems, biochemical analysis, image analysis and drug development are areas in medicine where artificial neural networks have been used successfully.

In diagnostic systems, artificial neural networks are commonly used to detect cancer and heart problems. In biochemical analysis, they have been used to analyze blood and urine samples, track glucose levels in diabetics, determine iron levels in body fluids, and detect pathological conditions such as tuberculosis. Tumor detection in ultrasonograms, classification of chest X-rays, vessel classification in magnetic resonance images (MRI), determination of skeletal age from X-ray images, and determination of brain maturation are some of the applications where artificial neural networks are used for image analysis. They are also used as tools in the development of drugs for cancer and AIDS and in the modeling of biomolecules.

Biological Neuron

The human brain contains about 10^14 tiny cells called neurons. A neuron is composed of a cell body, a tubular axon and a multitude of hair-like dendrites. The dendrites form a very fine filamentary brush surrounding the body of the neuron. The axon is a long, thin tube that splits into branches terminating in little end bulbs that touch the dendrites of other neurons. The small gap between an end bulb and a dendrite is called a synapse, across which information is propagated. The axon of a single neuron forms synaptic connections with many other neurons. The presynaptic side of the synapse refers to the neuron that sends a signal; the postsynaptic side refers to the neuron that receives the signal. This is shown in figure 3.1.


Figure 3.1 Schematic Representation of a Biological Neuron (labeled parts: soma, nucleus, dendrites, axon hillock, axon, terminal buttons)


Artificial Neuron

An artificial neural network is a simulation of a biological neural network. Node, connection, weight and node output are the keywords used in an artificial neural network in place of neuron, synapse, synaptic efficiency and firing frequency in a biological neural network [11]. The neuron model is shown in figure 3.2.

Figure 3.2 General neuron model (inputs x1, x2, …, xn, multiplied by weights w1, w2, …, wn and summed into an activation function f)
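As a minimal sketch of this model (the function and variable names are our own, not from the thesis), a node multiplies each input by its weight, sums the products, and applies an activation function f:

```python
def neuron_output(inputs, weights, f):
    """Compute a single neuron's output: f(sum of w_i * x_i)."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return f(net)

# Example with a simple threshold activation (fires when net > 0).
step = lambda net: 1 if net > 0 else 0
print(neuron_output([1, 0, 1], [0.5, -0.2, 0.3], step))  # net = 0.8, so 1
```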


3.1.2 Various activation functions

Step function

A node uses an activation function to compute its output. With the step function, if the input to the node is greater than a threshold value, the output is a high value; otherwise the output is a low value. It has only two output levels. The step function is shown in figure 3.3.

Formula:

    f(net) = a    if net < c
    f(net) = b    if net > c

Figure 3.3 Step Function (the output jumps from the OFF level a to the ON level b at the threshold c)


Ramp Function

It has two threshold values and three output levels. If the input is greater than the upper threshold, the output is high. If the input is less than the lower threshold, the output is low. If the input lies between the two thresholds, the output lies between low and high. In this middle region the output is continuous: even a small change in the input makes a small change in the output. The ramp function is shown in figure 3.4.

Formula:

    f(net) = a                                  if net < c
    f(net) = b                                  if net > d
    f(net) = a + ((net − c)(b − a)) / (d − c)   otherwise

Figure 3.4 Ramp Function (the output rises linearly from the OFF level a at threshold c to the ON level b at threshold d)


Sigmoid Function

It is the most popular function in artificial neural networks. It is continuous and differentiable everywhere. Another name for this function is the S-shaped function. Its advantage is that its smoothness makes it easy to devise learning algorithms and to understand the behavior of large networks whose nodes compute such functions. The sigmoid function is shown in figure 3.5.

Formula:

    f(net) = tanh(x · net − y) + z

Figure 3.5 Sigmoid Function (an S-shaped curve rising from the low level a to the high level b, passing through f(net) = 0.5)
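The three activation functions above can be sketched as follows (a, b, c, d as in the figures; the logistic form 1/(1 + e^-net) stands in for the sigmoid, which is an assumption on our part since the text's formula uses a tanh variant):

```python
import math

def step(net, a=0.0, b=1.0, c=0.0):
    """Step: the ON level b above threshold c, else the OFF level a."""
    return b if net > c else a

def ramp(net, a=0.0, b=1.0, c=0.0, d=1.0):
    """Ramp: a below c, b above d, linear interpolation in between."""
    if net < c:
        return a
    if net > d:
        return b
    return a + ((net - c) * (b - a)) / (d - c)

def sigmoid(net):
    """Logistic sigmoid: a smooth S-shaped curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net))
```

For net = 0.5 with the default levels, step gives 1.0, ramp gives 0.5, and sigmoid gives about 0.62.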


3.2.1 Different Neural Network Architectures

An artificial neural network contains some number of nodes. The way the nodes are connected determines how computation takes place in the network.

Fully Connected Network

In this architecture every node is connected to every other node in the network. The connections may have positive or negative weights. If there are N nodes, there are N^2 connections in the network. A special case of the fully connected network is one in which the weight that connects one node to another equals the weight of its symmetric reverse. These networks are called fully connected symmetric networks.

Alternatively, the connection from one node to another may carry a different weight than the connection from the second node back to the first. This type of network is called a fully connected asymmetric network. Fully connected symmetric and asymmetric networks are shown in figures 3.6 and 3.7.
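A sketch of the two cases with a weight matrix (values are illustrative, not the thesis figures): W[i][j] holds the weight from node i to node j, an N-node network has N^2 such entries, and symmetry means W[i][j] equals W[j][i].

```python
def is_symmetric(W):
    """A fully connected network is symmetric when the weight from
    node i to node j equals the weight from node j back to node i."""
    n = len(W)
    return all(W[i][j] == W[j][i] for i in range(n) for j in range(n))

# N = 3 nodes gives N * N = 9 connection weights.
W_sym = [[0.0, 0.7, 0.2],
         [0.7, 0.0, 0.4],
         [0.2, 0.4, 0.0]]
W_asym = [[0.0, 0.7, 0.2],
          [1.9, 0.0, 0.4],
          [0.2, 0.8, 0.0]]
print(is_symmetric(W_sym), is_symmetric(W_asym))  # True False
```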

Figure 3.6 A Fully Connected Symmetric Network (nodes I–IV, with equal weights in both directions on each connection)



Figure 3.7 A Fully Connected Asymmetric Network (input, hidden and output nodes, with different weights in each direction on a connection)


Layered Network

In this network the nodes are partitioned into subsets called layers. There is no connection from layer J to layer K if J > K. There are no intra-layer connections in the input layer, and no computation takes place there. Connections may exist from any node in layer I to any node in layer J for J > I, and intra-layer connections may exist in the non-input layers. A layered network is shown in figure 3.8.

Figure 3.8 A Layered Network (layer 0 is the input layer, layers 1 and 2 are hidden layers, layer 3 is the output layer)


Acyclic Network

It is a subclass of layered networks that has no intra-layer connections. A connection may exist between any node in layer I and any node in layer J for I < J, but a connection is not allowed for I = J. Networks that are not acyclic are called recurrent networks. An acyclic network is shown in figure 3.9.

Figure 3.9 An Acyclic Network (layer 0 is the input layer, layers 1 and 2 are hidden layers, layer 3 is the output layer)


Feed forward Network

It is a subclass of acyclic networks in which a connection is allowed from a node in layer I only to nodes in layer I+1. It is the most commonly used type of neural network. It typically has at most four layers: the first and last layers are the input and output layers, and the layers in between are hidden layers. The input layer has connections towards the hidden layer, and from the hidden layer connections go to the output layer. A feed forward network is shown in figure 3.10.

Figure 3.10 A Feed Forward 3-2-3-2 Network (layer 0 is the input layer, layers 1 and 2 are hidden layers, layer 3 is the output layer)
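A forward pass through a small feed forward network can be sketched as follows (the weights are arbitrary illustrative values, and the logistic sigmoid is assumed as the node function):

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(layer_weights, x):
    """Propagate input x layer by layer; each row of a weight matrix
    holds the incoming weights of one node in the next layer."""
    for W in layer_weights:
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return x

# A tiny 2-2-1 network: two inputs, one hidden layer of two nodes, one output.
layer_weights = [
    [[0.5, -0.3], [0.2, 0.8]],   # input layer  -> hidden layer
    [[1.0, -1.0]],               # hidden layer -> output layer
]
output = forward(layer_weights, [1.0, 0.0])
print(output)  # a single value between 0 and 1
```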


Modular Neural Network

It consists of several modules that are interconnected. Modularity allows the neural network designer to divide a task into subtasks and logically combine them into one. In a hierarchical organization, each higher-level module processes the output of the previous-level module. In a successive refinement organization, each module performs some operations and distributes tasks to the modules at the next level. In an input modularity organization, each first-level module processes a different subset of the inputs and sends its result on to the nodes in the next layer. The different types of modular neural networks are shown in figures 3.11, 3.12 and 3.13.

Figure 3.11 Hierarchical Organization


Figure 3.12 Successive Refinement

Figure 3.13 Input Modularity


3.2.2 Various Learning Rules

In artificial neural networks, learning refers to the method of modifying the weights on the connections between nodes. Through learning, the network becomes able to perform the task correctly for new input data.

Correlation learning

It is also called "Hebbian learning" after its inventor, Hebb. The learning rule is given below:

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

For artificial neural networks, this implies a gradual increase in strength of connections among nodes having similar outputs when presented with the same input. The weight modification rule is as follows:

∆wi,j = c xi xj

where c is a small constant and xi, xj are the activation levels of nodes i and j.
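The weight modification rule can be sketched directly (the constant c and the activation values are illustrative):

```python
def hebbian_update(w, x_i, x_j, c=0.1):
    """Delta-w = c * x_i * x_j: strengthen the connection between
    two nodes in proportion to the product of their activations."""
    return w + c * x_i * x_j

w = 0.0
for _ in range(5):          # repeated co-activation of nodes i and j
    w = hebbian_update(w, 1.0, 1.0)
print(w)  # grows to about 0.5
```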

Competitive learning

In this learning, when an input pattern is presented to the network, the nodes compete with each other to be the winner with a high level of activity. The competition involves self-excitation and mutual inhibition among the nodes, and a single winner emerges. The connections between the input nodes and the winner node are then modified, increasing the likelihood that the same winner continues to win future competitions. This leads to networks in which each node specializes to be the winner for a set of similar patterns. Hamming networks use this competitive learning to retrieve the nearest stored pattern for a given input pattern.
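A winner-take-all step of this kind can be sketched as follows (the learning rate and weight values are illustrative): the node whose weight vector lies nearest the input wins, and only the winner's weights are moved toward the input, making it more likely to win again for similar patterns.

```python
def compete(weights, x, lr=0.5):
    """Pick the winning node (nearest weight vector) and move its
    weights a fraction lr of the way toward the input x."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    winner = min(range(len(weights)), key=lambda k: dist2(weights[k]))
    weights[winner] = [wi + lr * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner

nodes = [[0.0, 0.0], [1.0, 1.0]]
print(compete(nodes, [0.9, 0.8]))  # node 1 wins and moves toward the input
```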

Feedback based weight adaptation

Human beings learn based on feedback from the environment. If increasing a particular connection weight decreases performance or increases error, then that weight is decreased as the network is trained to perform better. The amount of change made at each step is very small in most networks, to ensure that the network does not stray too far from its partially evolved state. This does increase the training time of the network.


Supervised and Unsupervised learning

Artificial neural networks can be used to solve tasks that require supervised or unsupervised learning. In supervised learning, an instructor is available to indicate whether the system is performing correctly or to indicate the amount of error in the system's performance. In unsupervised learning, no instructor is available, and learning must rely on guidance obtained heuristically by the system examining different sample data. Classification problems are examples of supervised learning; clustering is an example of unsupervised learning.

An archaeologist discovers a human skeleton and has to determine whether it belonged to a man or a woman. He uses past cases of male and female skeletons as a training set, and through that learning comes to know the distinctions between male and female skeletons. This learning process is an example of supervised learning. The result of the learning process can be used to classify whether the newly discovered skeleton belonged to a man or a woman.

If the archaeologist has to determine whether a set of skeleton fragments belongs to the same dinosaur species or not, no previous data may be available to clearly identify the species of each fragment. He has to determine whether the skeletons are sufficiently similar to belong to the same species, or whether the differences between them are large enough to warrant grouping them into different species. This type of learning process is called unsupervised learning.

3.2.3 Various Applications of Neural Network

Classification

In a classification task, each instance is assigned to a specific class. The training set consists of sample instances with their related information fields and their representative classes. Each output node can stand for one class. An input pattern is determined to belong to class i if the i-th output node computes a higher value than all the other output nodes when that pattern is fed into the network. Recognizing printed or handwritten characters, classifying loan applications into creditworthy and non-creditworthy groups, and analyzing sonar and radar data to determine the nature of the source of a signal are some examples of classification tasks that can be implemented with artificial neural networks.

Clustering

It requires grouping related or similar objects together, and works on the basis of a distance measure. The distance should be minimal within a cluster (intra-cluster), and the difference between clusters (inter-cluster) should be maximal. For example, flowers may be clustered using attributes such as colour and number of petals.


Vector Quantization

Vector quantization is the process of dividing up the input space into several connected regions called Voronoi regions. Each region is represented by a single vector called a codebook vector. Every point in the input space belongs to one of the Voronoi regions and is mapped to the corresponding nearest codebook vector. The set of codebook vectors is a compressed form of the set of input data vectors, since many different input data vectors may be mapped to the same codebook vector. Implementing vector quantization with a neural network thus compresses voluminous input data into a small number of weight vectors.
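The nearest-codebook mapping can be sketched as follows (the codebook and data values are illustrative):

```python
def quantize(x, codebook):
    """Return the index of the nearest codebook vector, i.e. the
    Voronoi region that the input point x falls into."""
    def dist2(c):
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return min(range(len(codebook)), key=lambda k: dist2(codebook[k]))

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
data = [[0.1, 0.2], [0.9, 0.1], [0.2, 0.8], [0.05, 0.05]]
# Several input vectors map to the same codebook vector: compression.
print([quantize(x, codebook) for x in data])  # [0, 1, 2, 0]
```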

Pattern Association

In the pattern association task, an input pattern is used to retrieve an output pattern. There are two types. In auto-association, the input pattern is a corrupted, noisy or partial version of the desired output pattern. From the corrupted input pattern the neural network has to retrieve the uncorrupted or complete output pattern. For example, in a face recognition system the neural network accepts a corrupted image and retrieves the complete image corresponding to the given input.

In hetero-association, the output pattern may be any arbitrary pattern that is associated with a set of input patterns. The system has to retrieve the output pattern for a given input pattern; here the input and output patterns are different. For example, the system has to retrieve the name of a person when the image of that person is presented as input.

Function Approximation

Many computational models are functions mapping numerical input vectors to numerical outputs. Function approximation is the process of learning or constructing a function that generates approximately the sample outputs from the input vectors. Neural networks can perform function approximation: they can find functions suitable for mapping the input data to the corresponding outputs.

Forecasting

In real life we have problems whose outcomes can be predicted on the basis of past history. Weather forecasting and stock market prediction are examples of forecasting problems. Even though perfect prediction is not possible, neural networks can be used to obtain reasonably good predictions in a number of cases. Forecasting is a special case of function approximation in which the function values are represented as a time series, a sequence of values measured over time in discrete or continuous time units. In the stock prediction problem, for example, the time unit can be a single day or week. Based on past training examples, the neural network attempts to predict the next value in the time sequence.

A better understanding of difficult problems is often obtained by studying many related variables together rather than just one variable. A multivariate time series consists of sequences of values of several variables changing concurrently with time. Values of each variable may then be predicted with greater accuracy if variations in the other variables are also taken into account. To be successful, forecasting must be based on all available correlations and empirical interdependencies among the different temporal sequences. Feed forward and recurrent neural network architectures are used for solving forecasting problems.

Optimization

Many optimization problems arise in business and scientific modeling. The main goal of an optimization problem is to optimize (maximize or minimize) some function subject to some constraints. Arranging components on a circuit board so that the total length of the wires is minimized is an example of an optimization problem; it may have constraints, such as requiring certain components to be connected to certain others. Stochastic neural network algorithms can be used to solve optimization problems.

Search

Search problems consist of a set of states, transitions between states, and methods of making the moves needed to reach a goal. Neural networks can be used to find an optimal way to reach the goal. In game playing, for example, a neural network receives the current state of the game as input and outputs the best possible move. The network can be trained on the quality of previous moves it made in response to various input patterns.

3.3.1 Single Layer Network

A single layer network contains only a single layer where computation takes place. The input layer receives the inputs; no computation takes place in its nodes. It passes the inputs on to the nodes of the output layer through the connections. There is no hidden layer in this architecture.


Using a single layer network we can solve problems like AND and OR. These problems are solved by a linear function: during network training, the network finds a linear function that divides the problem space into two categories. A linear function is a function that has a constant rate of change. To train the network to learn such a linear function, Rosenblatt invented a learning algorithm called the perceptron. A single layer network is shown in figure 3.14.

Perceptron Training Algorithm

Algorithm Perceptron;

    Start with a randomly chosen weight vector w0;
    Let k = 1;
    While there exist input vectors that are misclassified by wk-1, do
        Let ij be a misclassified input vector;
        Let xk = class(ij) . ij, implying that wk-1 . xk < 0;
        Update the weight vector to wk = wk-1 + η xk;
        Increment k;
    End-while;

During training, the perceptron weights are changed. From the changed weights, the equation of the separating hyperplane is derived, and the derived hyperplane is used to classify new input samples. The training algorithm starts with initial random weights. Input samples are repeatedly presented and the performance of the perceptron is observed. If the performance on a given input sample is satisfactory, the weights are not changed. But if the network output differs from the desired output, the weights must be changed in such a way as to reduce the system error. Samples are presented repeatedly to train the weights. Once training is over, the network is ready to classify new inputs into two different classes.

Figure 3.14 Single Layer Network (input nodes X1 and X2 connected directly to output node Y)

The perceptron can solve only linearly separable problems. In real life we have many very complex problems that cannot be solved by a linear function. Exclusive-OR and the peninsula and island configurations are some examples of linearly non-separable problems. To solve these types of problems, multilayer networks are used.
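The perceptron algorithm above can be sketched on the linearly separable AND problem (a sketch with our own names; the bias is folded in as a fixed extra input and the learning rate η is taken as 1):

```python
def perceptron_train(samples, n_inputs, epochs=20, eta=1.0):
    """Perceptron rule: for each misclassified sample, w = w + eta * cls * x.
    Classes are -1/+1; the last weight is the bias weight."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, cls in samples:
            xb = list(x) + [1.0]                 # append the bias input
            net = sum(wi * xi for wi, xi in zip(w, xb))
            if cls * net <= 0:                   # misclassified sample
                w = [wi + eta * cls * xi for wi, xi in zip(w, xb)]
    return w

# AND gate: output +1 only when both inputs are 1.
samples = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w = perceptron_train(samples, 2)

def predict(x):
    net = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if net > 0 else -1

print([predict(x) for x, _ in samples])  # [-1, -1, -1, 1]
```

Running the same loop on the exclusive-OR patterns leaves some samples misclassified no matter how many epochs are used, which is exactly the limitation described above.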

3.3.2 Multilayer Networks

It contains a hidden layer, which is not present in single layer networks. Computation takes place in the hidden layer and the output layer. There is no computation in the input layer, whose nodes simply receive the input from the user. A multilayer network is shown in figure 3.15.

The perceptron and other one-layer networks like the adaline are seriously limited in their capabilities. Feed-forward networks with non-linear node functions can overcome these limitations. The back propagation algorithm is mainly used for training such feed-forward networks.

Backpropagation Training algorithm

Figure 3.15 Multi Layer Network (input nodes X1, X2 and X3, one hidden layer, and output node Y)

The number of input nodes equals the dimensionality of the input patterns, and the required problem output determines the number of nodes in the output layer. The number of nodes in the hidden layer depends on the problem complexity. Each hidden node and output node applies a sigmoid function to its net input:

    S(net) = 1 / (1 + e^(-net))

It is continuous, monotonically increasing, invertible and everywhere differentiable. The training set contains the input-output patterns used to train the network; the testing set contains the input-output patterns used to assess network performance. The learning rate sets the rate of weight adjustment during network training.

The back propagation algorithm trains a given feed forward multilayer neural network on a given set of input patterns with known classifications. When each entry of the sample set is presented to the network, the network's output response to the sample input is compared with the known and desired output, and an error value is calculated. Based on the error, the connection weights are adjusted. The back propagation algorithm is based on the Widrow-Hoff delta learning rule, in which the weight adjustment is driven by the mean square error of the output response to the sample input. The sample patterns are repeatedly presented to the network until the error value is minimized.

Algorithm

1. Randomly choose the initial weights.

2. While the error is too large, for each training pattern (presented in random order):

- Apply the inputs to the network.

- Calculate the output of every neuron, through the hidden layer(s), to the output layer.

- Calculate the error at the outputs.

- Use the output error to compute error signals for the pre-output layers.

- Use the error signals to compute the weight adjustments.

- Apply the weight adjustments.

- Periodically evaluate the network performance.
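A minimal sketch of these steps in pure Python, for a 2-2-1 network on the exclusive-OR patterns (initial weights, learning rate and epoch count are illustrative, patterns are presented in a fixed rather than random order for reproducibility, and this is not the thesis implementation):

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train(data, epochs=2000, lr=0.5, seed=0):
    """One hidden layer of 2 nodes, one output node; returns the mean
    squared error before and after training."""
    random.seed(seed)
    wh = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    wo = [random.uniform(-0.5, 0.5) for _ in range(3)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in wh]
        return h, sigmoid(wo[0] * h[0] + wo[1] * h[1] + wo[2])

    def mse():
        return sum((t - forward(x)[1]) ** 2 for x, t in data) / len(data)

    before = mse()
    for _ in range(epochs):
        for x, t in data:
            h, o = forward(x)
            delta_o = (t - o) * o * (1 - o)              # output error signal
            delta_h = [delta_o * wo[j] * h[j] * (1 - h[j]) for j in range(2)]
            for j in range(2):                           # adjust output weights
                wo[j] += lr * delta_o * h[j]
            wo[2] += lr * delta_o
            for j in range(2):                           # adjust hidden weights
                for i in range(2):
                    wh[j][i] += lr * delta_h[j] * x[i]
                wh[j][2] += lr * delta_h[j]
    return before, mse()

xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
before, after = train(xor)
print(round(before, 3), round(after, 3))  # the error shrinks with training
```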

The major advance of back propagation over the perceptron algorithm is in expressing how an error at a higher (outer) layer of a multilayer network can be propagated backwards to nodes at lower (inner) layers of the network. The gradients of these back propagated error measures can then be used to determine the desired weight modifications for the connections that lead into the hidden nodes. The back propagation algorithm has had a major impact and has been widely applied to a large number of problems in many disciplines, including classification, function approximation and forecasting.

3.4.1 The types of diabetes

In our research we use the Pima Indians Diabetes data set from the National Institute of Diabetes and Digestive and Kidney Diseases to train and test the artificial neural network application [14]. In this chapter we develop a tool for classifying the diabetes instances using the backpropagation algorithm. Before using the data set in our research we studied the diabetes disease, which helped us to understand the data set clearly. The related information on the disease and on the Pima Indians Diabetes data set is given below.

Diseases are of two types: infectious and constitutional. Infectious diseases are caused by bacteria or viruses coming from outside the body; constitutional diseases are caused by untoward changes occurring within the body. Diabetes mellitus is a constitutional disease. It is an outcome of leading a sedentary life and eating processed foods; the incidence of diabetes varies directly with the consumption of processed foods like biscuits, bread, cakes, chocolates, pudding and ice cream. Diabetes is a metabolic disorder, arising either from an absolute deficiency of a digestive hormone called insulin or from the inability of body cells to use the available insulin. The disorder throws the metabolism of dietary carbohydrates, fats and proteins completely into disarray [4].

The role of insulin in our body

The carbohydrates in our food are digested in the intestines. The end-products of carbohydrate-digestion are various sugars, chiefly glucose. This glucose is absorbed through the mucous membrane of intestines to enter the blood-stream. Thus the concentration of glucose in the blood rises. Insulin makes this glucose available to each and every cell of the body. Each cell in our body is a tiny engine that uses glucose as fuel to generate heat and energy. If glucose fuel is to gain entrance into the cellular engine, insulin is essential.

If the amount of glucose in the blood is greater than the cellular requirements, insulin converts it into glycogen and fat which are stored in the liver or muscles and adipose tissue respectively. The most important and obvious function of insulin is to control the concentration of glucose in the blood. After taking food, the concentration of glucose in blood rises. Insulin prevents the glucose concentration to rise above normal or physiological limits.

Page 22: ARTIFICIAL NEURAL NETWORK CLASSIFICATION TOOL FOR DIAGNOSING DIABETES, shodhganga.inflibnet.ac.in/bitstream/10603/4331/9/09_chapter 3.pdf

If insulin is inadequate or absent, the glucose in the blood cannot enter the various body cells or be converted into glycogen. Consequently, the blood-glucose level rises. When the blood passes through the kidneys, glucose is normally not allowed to escape into the urine. However, when, due to lack of insulin, the concentration of glucose increases beyond a particular level, it surpasses the reabsorptive capacity of the kidneys and spills into the urine. That is the reason why the urine of diabetics is sweet. While defining diabetes, it was said that it is a condition arising due to either

i) Deficiency of insulin or

ii) Inability of body cells to use available insulin.

The first type of diabetes is called juvenile or insulin-dependent diabetes mellitus (IDDM). It afflicts mostly children or young adults and produces acute symptoms. The second type is called non-insulin-dependent or maturity-onset diabetes mellitus (NIDDM). It mostly afflicts middle-aged persons and produces mild symptoms. In our country, almost 99% of all diabetics suffer from the latter variety of diabetes.

3.4.2 Causes of diabetes

All the causative factors of diabetes have still not been discovered. Yet the known factors are discussed below:

i) Heredity

Of all diabetics, more than 46 per cent report a family history of the disease. Some researchers believe that diabetes develops not because the person has inherited a defective chromosome from his parents but because he has not received from his parents the chromosome that imparts resistance to this disease. Even though hereditary factors do play a role in the development of diabetes, to what extent and in what way they act is still a mystery. Hereditary factors appear to become effective only when certain other exciting environmental factors, such as obesity, faulty dietary habits and inadequate physical exertion, are at work.

ii) Obesity

Overweight persons become easy victims to diabetes. Studies have shown that 60 to 85 per cent of diabetics are overweight. The more the obesity, the greater is the mortality rate due to complications of diabetes. Bodyweight which is 30% below the ideal is an almost certain guarantee against diabetes.


iii) Incorrect dietary habits

Food can maintain or save life; it can destroy life as well. Proper food serves the purpose of medicine while improper food works as poison and causes disease. For the origin of diabetes, excessive food is as much to be blamed as improper (i.e., refined and processed) food. The body has to produce more digestive juices and insulin to digest excessive food. Under the pressure of such excessive workload, the pancreas gland weakens and ultimately breaks down, leading to diabetes. A philosopher has rightly said that ‘very few people die of starvation; the rest die of overeating’. It would not be an exaggeration to say that we dig our graves with our teeth.

iv) Inadequate physical work

Because of industrialization, man has drifted away from physical labour. During physical work, muscles use up a lot of the glucose present in the blood; consequently, the workload on the pancreas is reduced. Moreover, physical labour also prevents or reduces obesity, which is intimately connected with diabetes.

v) Viral infection

A possible role of some viral infections as an aetiological factor for diabetes is also being considered by many scientists. Some children have been seen to contract diabetes after suffering from mumps, a viral infection. The viruses destroy the insulin-producing beta cells of the pancreas. Besides, the antibodies produced by the body to fight the virus also attack the beta cells and aggravate the disease.

vi) Effects of certain hormones

Some hormones produced in the body have an action opposite to that of insulin, i.e., they increase the amount of glucose in the blood. Such hormones include glucagon, growth hormone, adrenaline and thyroxine. If the secretion of these hormones is excessive, the effectiveness of insulin decreases and the blood glucose level rises.

vii) Side effects of certain drugs

Long-term use of certain drugs like cortisone (used for asthma, respiratory diseases, arthritis and skin diseases), contraceptive pills and the thyroid group of drugs can also produce diabetes by harming the pancreas.

viii) Other illness

Acute pancreatitis, a heart attack or some other illness may precipitate diabetes. This is especially applicable to persons who are carriers of diabetes or who have a family history of diabetes; in such persons, an acute illness may unmask latent diabetes. Acute pancreatitis is an important cause of diabetes in Kerala and South Africa.


Symptoms of diabetes

Diabetes affects various organs or systems of the body, giving rise to symptoms that would sometimes mislead even a physician. Maturity-onset diabetes creeps into the body so silently that the victim usually remains unaware and symptomless. On the other hand, juvenile diabetes develops suddenly and gives rise to dramatic symptoms.

The following symptoms point towards a possibility of diabetes:

i) Polyuria (excessive and frequent urination) The sugar escaping in the urine drags along with itself a large quantity of water. A diabetic, therefore, frequently passes large amounts of urine.

ii) Polydipsia (dryness of mouth and excessive thirst) This symptom is the result of efforts by the body to compensate for the fluids lost through excessive urine.

iii) Polyphagia (excessive hunger) In diabetes, glucose cannot enter the various body cells. Thus the cells starve in spite of being bathed by the glucose rich serum. They suffer from ‘poverty in the midst of plenty’. To overcome this cellular starvation, the body gives rise to abnormal and excessive hunger.

iv) Loss of weight When the cells cannot utilize glucose, the body disintegrates stored fats to provide the cells with the necessary nourishment. Therefore, the person loses weight.

v) Weakness, fatigue and body ache The body also disintegrates stored muscle protein to nourish the starving cells. This is the cause of undue weakness and fatigue.

vi) Mental fatigue and lack of concentration The brain cells depend chiefly on glucose for their nourishment. However, in diabetes they cannot utilize the available glucose, due to which the person experiences undue mental fatigue, cannot concentrate and becomes forgetful.

vii) Wound infection and delayed healing Glucose-rich blood is a good breeding medium for pus-forming micro-organisms. Moreover, diabetes also affects the small blood vessels and nerves, leading to a decrease in the blood supply of the skin and derangement of skin sensations. This is the reason why even a small wound on a diabetic person's body easily gets infected and fails to heal in time.

viii) Easy susceptibility to infections of the skin, gums and the respiratory system The glucose-rich blood of a diabetic provides optimum conditions for the rapid growth and reproduction of disease-causing micro-organisms. Besides, the hormonal imbalance causes a decrease in the natural resistance of the body against disease. Hence a diabetic easily contracts infections of the skin, gums and the respiratory tract. He commonly suffers from boils, carbuncles, pyorrhoea, coughs and colds.

ix) Frequent changes in the sharpness of vision and the spectacle numbers Changes in the glucose concentration of the internal fluid of the eyes lead to variations in their focusing power. That is the reason why a diabetic often has to change his spectacle lenses. The crystalline lens of the eye depends, for its nourishment and transparency, on the glucose dissolved in the aqueous humour. In diabetes, the nourishment of the crystalline lens is jeopardized, leading to an untimely cataract.

x) Aching or numbness of limbs and an abnormal increase or decrease in skin sensations Diabetes untowardly affects the whole nervous system to give rise to these symptoms.

xi) Sexual weakness or impotence General weakness, disintegration of muscle protein, mental depression and undesirable changes in the blood circulatory and nervous systems give rise to these symptoms.

xii) Diabetic unconsciousness (hyperglycemic coma) As stated earlier, the body disintegrates stored fats to nourish starving cells. Fat disintegration leads to the production of ketone bodies in the blood. An excessive increase of ketone bodies makes the blood acidic and gradually leads to unconsciousness. Many a time, diabetes is suspected or diagnosed only after the victim becomes unconscious.

3.4.3 Diagnosis of diabetes

The oral glucose tolerance test (OGTT) measures the body's ability to use a type of sugar, called glucose, which is the body's main source of energy. An OGTT can be used to diagnose prediabetes and diabetes. An OGTT is most commonly done to check for diabetes that occurs with pregnancy (gestational diabetes).

The oral glucose tolerance test (OGTT) is done to:

• Check pregnant women for gestational diabetes. When done for this purpose, the test is called a glucose challenge screening test, and it is usually done during the 24th to the 28th week of pregnancy. A patient has an increased chance of developing gestational diabetes if she:

o Has had gestational diabetes during a previous pregnancy.

o Has previously given birth to a baby who weighed more than 8.8lb.

o Is younger than age 25 and was overweight before getting pregnant.


• Confirm the presence of gestational diabetes if other blood glucose measurements are high.

• Screen women who have polycystic ovary syndrome (PCOS) for diabetes.

• Diagnose prediabetes and diabetes.

The patient should fast for 8 to 14 hours before the test, which is scheduled for about 8 o'clock in the morning. The patient is asked to drink a sweet liquid containing 75 g of glucose dissolved in 200 ml of water. Following this, blood glucose is measured at half-hourly intervals, with the 2-hour sample being the most important. This is the best test to check for diabetes that occurs with pregnancy.

3.5.1 Pima Indians Diabetes Data set

The National Institute of Diabetes and Digestive and Kidney Diseases provided the Pima Indians Diabetes Database for research purposes to the UCI machine learning dataset repository [14]. The dataset samples are taken from the population living near Phoenix, Arizona, USA. This diabetes check was conducted only on female patients during their pregnancy. The dataset contains 9 parameters, of which 8 are input parameters and 1 is an output parameter.

Input parameters:

The 8 input parameters are: number of times pregnant, plasma glucose level, diastolic blood pressure, triceps skin fold thickness, 2-hour serum insulin, body mass index, diabetes pedigree function and age. The output parameter is named Class and contains the value 0 or 1. Class value 1 is interpreted as "tested positive for diabetes" and class value 0 as "tested negative for diabetes".

All the input parameters have numeric values. The first parameter is the total number of times the patient was pregnant. The second parameter is the value of the oral glucose tolerance test, which is used to find the glucose level in the blood. The third parameter is the diastolic blood pressure, measured in millimetres of mercury (mm Hg). The fourth parameter is the triceps skin fold thickness, measured in millimetres (mm).

The fifth parameter is the 2-hour serum insulin test value, which indicates the amount of insulin produced in the patient's body. The sixth parameter is the patient's body mass index, calculated by the following formula:

Body Mass Index = patient weight in kg / (patient height in m)^2
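As a quick illustration, the formula can be evaluated directly; the patient values below are hypothetical, not taken from the dataset:

```python
def body_mass_index(weight_kg, height_m):
    """BMI = weight in kilograms divided by the square of height in metres."""
    return weight_kg / height_m ** 2

# Hypothetical patient: 70 kg, 1.65 m tall
print(round(body_mass_index(70, 1.65), 1))  # 25.7
```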

The seventh parameter is the diabetes pedigree function, a value derived from the family history of diabetes. Doctors believe that if both parents have diabetes then the child has nearly a 60% chance of getting diabetes; if one parent has it, the child has about a 40% chance. In total, the dataset contains 768 instances, of which 500 belong to class 0 and the remaining 268 to class 1.


The mean and standard deviation values for the eight input parameters are given below. The raw and pre-processed dataset is attached in Appendix-A. Table 3.1 below explores the Pima Indian Diabetes Dataset using descriptive statistics.

Table 3.1 Descriptive Statistics for Pima Indian Diabetes Dataset

Parameter                          Mean     Standard Deviation
1  Number of times pregnant          3.8         3.4
2  Plasma glucose level            120.9        32.0
3  Diastolic blood pressure         69.1        19.4
4  Triceps skin fold thickness      20.5        16.0
5  2-Hour serum insulin             79.8       115.2
6  Body mass index                  32.0         7.9
7  Diabetes pedigree function        0.5         0.3
8  Age                              33.2        11.8

During exploration we found that the 5th parameter, 2-hour serum insulin, has a standard deviation of 115.2, which is very high: the values in this field deviate strongly across its range. The 7th parameter, the pedigree function, has a standard deviation of 0.3, which means the values in its field do not deviate very much, since they range only between 0.085 and 2.42. During preprocessing, all fields are normalized to the range 0 to 1 before being used in the artificial neural network.
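The normalization step described above can be sketched as follows; this is an illustrative min-max scaling, not the exact preprocessing code used to produce Appendix-A:

```python
def min_max_normalize(values):
    """Scale a list of numeric field values linearly into the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Pedigree-function-like values within the stated range 0.085 to 2.42
scaled = min_max_normalize([0.085, 0.5, 1.0, 2.42])
print(scaled[0], scaled[-1])  # 0.0 1.0
```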


3.5.2 Neural Network Classification System Diagram

Neural Network Architecture Diagram

[Block diagram: Patient Input Data Records -> Trained Feed Forward Artificial Neural Network Using Backpropagation Algorithm -> Classified output class (+ive / -ive) for diabetes]

Figure 3.16 Artificial Neural Network Classification Tool for Diabetes Disease Diagnosis

[Figure: an input layer of 8 nodes, a hidden layer of 8 nodes and one output node; layer labels: Input Layer, Hidden Layer, Output Layer]

Figure 3.17 A Feed Forward 8-8-1 Backpropagation Neural Network


The classification system contains three modules. The first is the input module, which receives a new patient's input and hands it over to the second module. The second module is the trained neural network system, which classifies the given patient's case record as positive or negative for the diabetes disease. The third module is the output module, which displays the classification result. The block diagram and architecture design are represented in figures 3.16 and 3.17.

We constructed a feed forward neural network to classify the diabetes patient dataset. It consists of an input layer, one hidden layer and an output layer. The input layer contains 8 nodes to receive the patient input data. The hidden layer contains 8 hidden nodes, linked to both the input layer and the output layer, and uses the tansig (hyperbolic tangent sigmoid) activation function. The output layer contains one output node to show the result class, 1 or 0. Class 1 represents "+ive" for the diabetes disease and class 0 represents "-ive". Of the available 768 instances, after preprocessing we obtained 701 instances ready for training and testing the neural network; the training dataset contains 681 instances and the testing dataset 20 instances.
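The forward pass of this 8-8-1 network can be sketched as follows. The actual implementation is the MATLAB code in Appendix-B; this Python/NumPy sketch uses random placeholder weights rather than the trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 8)), rng.normal(size=8)   # input -> hidden (placeholders)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)   # hidden -> output (placeholders)

def classify(x):
    """Forward pass of the 8-8-1 network: tansig (tanh) hidden layer,
    logistic output node thresholded at 0.5 into class 1 (+ive) or 0 (-ive)."""
    h = np.tanh(W1 @ x + b1)                   # tansig activation
    y = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid output in (0, 1)
    return int(y[0] >= 0.5)

record = rng.uniform(size=8)   # one normalized patient record (placeholder)
print(classify(record))        # prints 0 or 1
```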

3.5.3 System Output for Test case 1 Using Matlab software

Figure 3.18 Created Neural Network Classification System Output Screen


The neural network used the backpropagation algorithm for training. We implemented this neural network classification tool using MATLAB and XLMiner software. The overall performance of the neural network is represented in the form of a confusion matrix below. The MATLAB code is attached in Appendix-B.

Table 3.2 Confusion matrix representing network performance

                    Predicted
                    Positive   Negative
Actual  Positive       5          5
        Negative       1          9

Table 3.2 shows the classification performance matrix of the created neural network. The created artificial neural network system classified 14 cases correctly out of the 20 given input patient cases, giving 70% correct classification and 30% misclassification. The testing dataset contains 10 type-1 cases (tested positive for diabetes) and 10 type-0 cases (tested negative for diabetes). From the above confusion matrix we observe that, of the 10 type-1 cases, it classified 5 correctly as type-1 and misclassified the remaining 5 as type-0. Of the 10 type-0 cases, it classified 9 correctly and misclassified the remaining 1 as type-1.
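The accuracy figures quoted above follow directly from Table 3.2 and can be checked with a few lines:

```python
# Confusion matrix counts from Table 3.2 (rows = actual, columns = predicted)
tp, fn = 5, 5    # actual positive: 5 predicted positive, 5 predicted negative
fp, tn = 1, 9    # actual negative: 1 predicted positive, 9 predicted negative

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
print(f"correct: {tp + tn}/{total}, accuracy: {accuracy:.0%}")  # correct: 14/20, accuracy: 70%
```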

We created the same neural network architecture, in the 8-8-1 network format, with the XLMiner software. Of the 532-instance diabetes dataset, 80% was assigned to the training dataset and the remaining 20% to the test dataset: 426 training instances and 106 testing instances.

3.5.4 System Output for Test case 2 using Xlminer Software

ANN Training Information

Of the total 532 instances, 426 are assigned for training and the remaining 106 for testing. We used random sampling to select the data for training and testing, with the random seed 12345. The number of hidden layers is 1, with 8 hidden nodes. The network uses the squared-error cost function; the hidden layer and output layer use the sigmoid function, and we trained for 100 epochs. Table 3.3 below gives all the related information about the partitioning and training parameters for the ANN.


Table 3.3 Dataset Partitioning and ANN Parameter Settings

Random Seed and Data Partitioning
  Data source: Sheet1!$A$2:$H$533
  Selected variables: npreg glu bp skin bmi ped age class
  Partitioning method: Randomly chosen
  Random seed: 12345
  # Training rows: 426
  # Validation rows: 106

Normalized Data
  Training data used for building the model: ['diabetes_dataset.xls']'Data_Partition1'!$C$19:$I$444
  # Records in the training data: 426
  Validation data: ['diabetes_dataset.xls']'Data_Partition1'!$C$445:$I$550
  # Records in the validation data: 106
  Input variables normalized: Yes

Input and Output Variables
  # Input variables: 7
  Input variables: npreg glu bp skin bmi ped age
  Output variable: class

ANN Parameters/Options
  # Hidden layers: 1
  # Nodes in HiddenLayer-1: 8
  Cost function: Squared error
  Hidden layer sigmoid: Standard
  Output layer sigmoid: Standard
  # Epochs: 100
  Step size for gradient descent: 0.1
  Weight change momentum: 0.6
  Error tolerance: 0.01
  Weight decay: 0
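The step size (0.1) and weight change momentum (0.6) listed in Table 3.3 enter the standard gradient-descent-with-momentum weight update, which can be sketched for a single weight as follows (the gradient value below is a made-up example):

```python
# XLMiner settings from Table 3.3: step size (learning rate) 0.1, momentum 0.6
LEARNING_RATE = 0.1
MOMENTUM = 0.6

def update_weight(weight, gradient, previous_delta):
    """One momentum update: step against the gradient plus a fraction of the
    previous step; returns (new weight, delta to carry into the next update)."""
    delta = -LEARNING_RATE * gradient + MOMENTUM * previous_delta
    return weight + delta, delta

w, delta = update_weight(0.5, gradient=0.2, previous_delta=0.0)
print(round(w, 3))  # 0.48
```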


Inter-layer connection weights of the 8-8-1 Artificial Neural Network

The 8-8-1 ANN has one input layer, one hidden layer and an output layer. Weights are given only for the hidden-layer and output-layer nodes. Table 3.4 below contains the weights for the 8 hidden-layer nodes and the output-layer nodes.

Table 3.4 Inter-Layer Node Connection Weights

Input Layer -> Hidden Layer # 1

           npreg     glu       bp        skin      bmi       ped       age       Bias Node
Node # 1   -0.4522   -0.9775    0.11865   0.16302  -0.5823   -1.0714   -0.3824   -0.0574
Node # 2   -0.25     -1.291    -0.7831    0.02103  -0.9005   -1.8599    0.85199  -1.1962
Node # 3    0.86604  -1.452     1.34197   1.67844  -2.0763    1.29752  -2.4523   -0.448
Node # 4   -1.0671   -3.8907    1.02068  -2.6326    0.05503   1.2654   -4.4535   -2.2187
Node # 5   -1.0359   -0.8156    1.43389  -0.2299    1.26618  -1.4356   -3.1668   -2.152
Node # 6   -0.5728    2.82629   0.143     1.55815   0.82498   6.25805  -0.3448    0.56364
Node # 7   -0.368    -1.6021   -1.728     0.08224  -2.0159    0.22804   1.11316  -3.0508
Node # 8   -0.6172    0.00948  -0.7307    1.9301   -1.0111   -0.3223    1.91472  -1.9049

Hidden Layer # 1 -> Output Layer

Class   Node # 1  Node # 2  Node # 3  Node # 4  Node # 5  Node # 6  Node # 7  Node # 8  Bias Node
1        0.22563  -0.313    -1.7026   -2.6973   -1.5468    2.30375  -2.0574   -1.591     0.47118
0       -0.2134    0.28843   1.69853   2.68841   1.54273  -2.3104    2.07925   1.58558  -0.4631
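To illustrate how such stored weights are applied, the sketch below propagates an input through sigmoid nodes layer by layer; the weights shown are small placeholders, not the trained values of Table 3.4:

```python
import math

def sigmoid(z):
    """Standard logistic activation used by the sigmoid nodes."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows):
    """Apply one layer: weight_rows is a list of (weights, bias) pairs, one
    per node; each node outputs sigmoid(dot(weights, inputs) + bias)."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in weight_rows]

# Placeholder network: 2 inputs -> 2 hidden nodes -> 1 output node
hidden = layer([0.3, 0.7], [([0.5, -0.2], 0.1), ([-0.4, 0.9], 0.0)])
output = layer(hidden, [([1.2, -0.8], 0.05)])
print(output[0])  # a value strictly between 0 and 1
```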


ANN Training Information

We used 100 epochs to train the artificial neural network. Figure 3.19 below shows the training curve. For the 1st epoch the error rate is 31.7, and by the 100th epoch the error rate is 15.5. In the graph, the X-axis shows the epoch number and the Y-axis shows the error rate.

Figure 3.19 ANN Training Curve for Test Case 2


Of the 106 test instances, 72 are classified correctly and the remaining 34 are classified wrongly, giving an error rate of 32.08%. Among the 34 misclassifications, 22 cases belong to class 1 (positive for diabetes) and the remaining 12 to class 0 (negative for diabetes). The performance matrix and error report are given in Tables 3.5 and 3.6.

Performance Matrix

Table 3.5 Classification Confusion Matrix for Test Case 2

Classification Confusion Matrix

                 Predicted Class
Actual Class       1      0
     1            21     22
     0            12     51

During classification, class 1 (positive for diabetes) has an error rate of 51.16 %, whereas class 0 (negative for diabetes) has an error rate of only 19.05 %.

Table 3.6 Error Report for Test Case 2

Error Report

Class      # Cases   # Errors   % Error
1             43        22       51.16
0             63        12       19.05
Overall      106        34       32.08
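The percentages in Table 3.6 follow directly from the confusion matrix counts and can be verified as:

```python
# Error report counts for Test Case 2 (Tables 3.5 and 3.6)
class1_cases, class1_errors = 43, 22   # 21 correct, 22 misclassified
class0_cases, class0_errors = 63, 12   # 51 correct, 12 misclassified

print(round(100 * class1_errors / class1_cases, 2))   # 51.16
print(round(100 * class0_errors / class0_cases, 2))   # 19.05
print(round(100 * (class1_errors + class0_errors)
            / (class1_cases + class0_cases), 2))      # 32.08
```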


System Output Screen

Figure 3.20 Xlminer Output Screen for Test Case 2

The created artificial neural network output screen is displayed in figure 3.20. It contains the classification performance matrix and the error report.


3.5.5 System Output for Test case 3 Using Xlminer Software

ANN Training Information

Of the total 532 instances, 426 are assigned for training and the remaining 106 for testing. We used random sampling to select the data for training and testing, with the random seed 11111. The number of hidden layers is 1, with 8 hidden nodes. The network uses the squared-error cost function; the hidden layer and output layer use the sigmoid function, and we trained for 100 epochs. Table 3.7 below contains the step size, weight change momentum and error tolerance parameter settings.

Table 3.7 Data Partitioning and ANN Parameters Settings

Data Partitioning with Random Seed 11111
  Data source: Sheet1!$A$2:$H$533
  Selected variables: npreg glu bp skin bmi ped age class
  Partitioning method: Randomly chosen
  Random seed: 11111
  # Training rows: 426
  # Validation rows: 106

Normalized Data Settings
  Training data used for building the model: ['diabetes_dataset.xls']'Data_Partition1'!$C$19:$I$444
  # Records in the training data: 426
  Validation data: ['diabetes_dataset.xls']'Data_Partition1'!$C$445:$I$550
  # Records in the validation data: 106
  Input variables normalized: Yes

Input and Output Variables
  # Input variables: 7
  Input variables: npreg glu bp skin bmi ped age
  Output variable: class


ANN Parameters/Options Settings
  # Hidden layers: 1
  # Nodes in HiddenLayer-1: 8
  Cost function: Squared error
  Hidden layer sigmoid: Standard
  Output layer sigmoid: Standard
  # Epochs: 100
  Step size for gradient descent: 0.1
  Weight change momentum: 0.6
  Error tolerance: 0.01
  Weight decay: 0

Inter-layer connection weights of the ANN

The 8-8-1 ANN has one input layer, one hidden layer and an output layer. Weights are given only for the hidden-layer and output-layer nodes. Table 3.8 below contains the weights for the 8 hidden-layer nodes and the output-layer nodes.

Table 3.8 Inter-Layer Node Connection Weights

Input Layer -> Hidden Layer # 1

           npreg    glu      bp       skin     bmi      ped      age      Bias Node
Node # 1   -0.12    -3.84    -0.92     2.32    -0.72    -0.50    -3.88     0.80
Node # 2   -0.94    -0.83    -2.11     0.20    -3.48     0.30     0.81    -4.02
Node # 3   -0.26    -0.56     0.17     0.43    -0.68    -0.26    -1.14    -1.03
Node # 4    0.07    -2.02     1.28    -2.56    -0.17    -1.18    -3.51    -3.08
Node # 5   -1.11    -2.13     1.71    -0.42     0.73    -0.83    -4.29    -2.42
Node # 6   -1.75     2.06     0.67     1.70     0.64     5.55    -0.60     0.87
Node # 7   -0.58    -0.80    -0.25     0.02    -0.28    -0.32     1.28    -2.02
Node # 8   -0.80     0.70     0.22     2.13    -1.19     1.30     1.29    -1.56

Hidden Layer # 1 -> Output Layer

Class   Node # 1  Node # 2  Node # 3  Node # 4  Node # 5  Node # 6  Node # 7  Node # 8  Bias Node
1       -2.00     -2.75      0.07     -2.65     -2.05      2.38     -0.68     -1.82      0.84
0        1.98      2.71     -0.01      2.64      2.03     -2.36      0.74      1.80     -0.85


ANN Training Information

We used 100 epochs to train the artificial neural network. Figure 3.21 below shows the training curve. For the 1st epoch the error rate is 31.9, and by the 100th epoch the error rate is 14.6. In the figure, the X-axis shows the epoch number and the Y-axis shows the error rate.

Figure 3.21 ANN Training Curve for Test Case 3

Of the 106 test instances, 77 are classified correctly and the remaining 29 are classified wrongly, giving an error rate of 27.36%. Among the 29 misclassifications, 20 cases belong to class 1 (positive for diabetes) and the remaining 9 to class 0 (negative for diabetes). Tables 3.9 and 3.10 show the performance matrix and error report for Test Case 3.


Figure 3.22 Xlminer Output Screen for Test Case 3

Performance Matrix

Table 3.9 Classification Confusion Matrix for Test Case 3

Classification Confusion Matrix

                 Predicted Class
Actual Class       1      0
     1            22     20
     0             9     55

During classification, class 1 (positive for diabetes) has an error rate of 47.62 %, whereas class 0 (negative for diabetes) has an error rate of only 14.06 %.


Table 3.10 Error Report for Test Case 3

Error Report

Class      # Cases   # Errors   % Error
1             42        20       47.62
0             64         9       14.06
Overall      106        29       27.36

3.6 Chapter Summary

In this chapter we described the different artificial neural network architectures and their various activation functions. We used the multilayer perceptron architecture with the backpropagation algorithm, which is widely used for classification tasks. We explored the dataset characteristics with basic descriptive statistics and found the need for a normalization step during data preprocessing. We implemented a neural network for the Pima Indian diabetes dataset with an 8-8-1 network architecture. The first test case contains 20 test instances; the second and third test cases contain 106 test instances each. The implemented system gave error rates of 30%, 32.08% and 27.36% during classification. The average error rate is 29.81% and the correct classification rate is 70.19%. That is the baseline performance of the neural network on the Pima Indian diabetes dataset. Our main objective is to improve this classification performance, which we discuss in the next chapter.
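The averages quoted in this summary can be checked with a short calculation:

```python
# Error rates of the three test cases reported in this chapter
error_rates = [30.0, 32.08, 27.36]
avg_error = sum(error_rates) / len(error_rates)
print(round(avg_error, 2), round(100 - avg_error, 2))  # 29.81 70.19
```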