
Research on the application of genetic algorithm combined with the cleft-overstep algorithm for improving the learning process of an MLP neural network with a special error surface


Cong Huu Nguyen
Faculty of Electronics, Thai Nguyen University of Technology, Thai Nguyen University, Thai Nguyen, Viet Nam

Thanh Nga Thi Nguyen
Faculty of Electrical Engineering, Thai Nguyen University of Technology, Thai Nguyen University, Thai Nguyen, Viet Nam
Abstract

The success of an artificial neural network depends greatly on the training phase. Gradient-based techniques for training neural networks are partially satisfactory and are widely used in practice. However, in several cases where the error surface is special, resembling a deep cleft, these algorithms work slowly and encounter local extreme values. The authors of this paper propose the use of a genetic algorithm in combination with the cleft-overstep algorithm to improve the training process of neural networks which have such special error surfaces, and illustrate this approach through a simple application in character recognition. First, an MLP artificial neural network with a cleft-similar error surface is trained using the back propagation algorithm and the results are analyzed. Next, the paper describes the use of the proposed method to improve the training process of the neural network in two aspects: correctness and rate of convergence. The implementation is carried out and the results are obtained in the Matlab environment.
Keywords: MLP, genetic algorithm, cleft-overstep algorithm, character recognition.

1. INTRODUCTION

The neural network training process is essentially the process of solving an optimization problem aiming at updating the weights so that the error function reaches its minimum value, or a value smaller than a preset value [5]. Popular algorithms used for neural network training are the conjugate gradient algorithm and the Levenberg-Marquardt algorithm using back propagation techniques, often simply called back propagation algorithms. The search for an optimization algorithm which minimizes convergence time and avoids local and weak minima usually starts with studying the nature of the error surface, also called the quality surface. One kind of error surface which receives much attention from scientists is a surface with multivaluedness and the ability to stretch in the parameter space. The steepest descent gradient back propagation algorithm has convergence problems with a complex cleft-similar error surface, which has elongated and curved contours forming cleft axes, and whose gradient may vary over a wide range in different areas of the parameter space [3,5] (fig. 1).

Figure 1: Cleft-similar error surface and sample of a character

This paper presents the use of GA in combination with the cleft-overstep algorithm to conquer the trajectory and shorten the time of the optimal search on a complex cleft-similar error surface. For illustration, the authors propose a neural network for recognizing the digits from 0 to 9. The sigmoid function is used to produce an error surface similar to a cleft [3] (fig. 2). To present the digits, a matrix of size 5x7 is used to encode each character (fig. 1). Each input vector x corresponds to a vector of size 35x1 with 0/1 elements. Therefore, we choose an input layer with 35 neurons. To distinguish the 10 digits, we choose an output layer with 10 neurons. The structure of the network, including the hidden layer, is illustrated in figure 2, where:
- the input vector x has size 35x1,
- the output vector y of the hidden layer has size 5x1,
- the output vector z has size 10x1,
- the weight matrix of the hidden layer, W1,1, has size 35x5,
- the weight matrix of the output layer, W2,1, has size 5x10.

Figure 2: Structure of the neural network for recognition

The function f is chosen as a sigmoid function because it is widely used for multilayer neural networks and it easily produces an error surface with narrow clefts. The equation of the sigmoid function is f(x) = 1 / (1 + exp(-x)). The error function used for neural network training is J = 0.5*(z - t)^2, where z is the output of the output neurons and t is the desired output.
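To make the data flow concrete, the following Python sketch (an illustration only, not the authors' Matlab code; all function names are chosen here, and the weight shapes follow the 35x5 and 5x10 matrices above) computes the forward pass and the training error of the 35-5-10 network:

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)), the activation used in the paper
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    """Forward pass of the 35-5-10 MLP used for digit recognition.

    x  : (35,) binary input vector encoding a 5x7 character
    W1 : (35, 5) hidden-layer weights, W2 : (5, 10) output-layer weights
    """
    y = sigmoid(x @ W1 + b1)   # hidden-layer output, size 5
    z = sigmoid(y @ W2 + b2)   # network output, size 10
    return y, z

def error(z, t):
    # J = 0.5 * sum((z - t)^2), summed over the 10 output neurons
    return 0.5 * np.sum((z - t) ** 2)

# Example: one random 5x7 character and a target "digit 3"
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=35).astype(float)
t = np.zeros(10); t[3] = 1.0
W1, b1 = rng.normal(size=(35, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 10)), np.zeros(10)
_, z = forward(x, W1, b1, W2, b2)
print("error J =", error(z, t))
```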

2. CONSTRUCTING THE TRAINING ALGORITHM

2.1. Neural network training using the back propagation algorithm

The back propagation learning algorithm for the MLP network can be described as follows [5]:
- Step 1: Provide the training set of k pairs of input vectors and target outputs.
- Step 2: Initialize the weights and set the parameters of the network.
- Step 3: Propagate the k pairs of training data from the input layer to the output layer. The calculation of the signals in each layer can be described as a^0 = p (the input), a^(m+1) = f^(m+1)(W^(m+1)·a^m + b^(m+1)) for m = 0, 1, ..., M-1, and a = a^M, where m is the layer index and a is the output signal of the network.

- Step 4: Calculate the mean square error and back propagate this error to the previous layers.
- Step 5: Update the weights in the steepest descent gradient direction.
The process from step 3 to step 5 is iterated until an acceptable mean square error is obtained. The back propagation algorithm converges to a solution which minimizes the mean square error, because it adjusts the weights and bias coefficients in the direction opposite to the gradient vector of the mean square error function. However, for the MLP network the mean square error function is usually complex and has local extreme values; thus the iterations of neural network training may reach only local extremes of the mean square error function, not the global extreme. The convergence of the training process depends on the initialization of the neural network; in particular, the chosen learning coefficient plays an important role in the speed of convergence, and the method for choosing it differs from problem to problem. Moreover, when the training process using the back propagation algorithm converges, we cannot conclude that it has converged to the optimal solution; several initializations have to be tested to make sure the optimal solution has been obtained.
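A minimal Python sketch of steps 3-5 for the network above is given below, again for illustration rather than the Matlab toolbox routines; the default learning rate 0.6 is the value listed later in the experimental section, and the function and variable names are assumptions of this sketch:

```python
import numpy as np

def train_step(x, t, W1, b1, W2, b2, lr=0.6):
    """One steepest-descent update (steps 3-5) for the 35-5-10 sigmoid MLP.

    lr is the learning coefficient; any positive step length can be used.
    """
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    # Step 3: forward propagation
    y = sigmoid(x @ W1 + b1)          # hidden layer (5,)
    z = sigmoid(y @ W2 + b2)          # output layer (10,)
    # Step 4: error and back-propagated sensitivities for J = 0.5*||z - t||^2
    delta_out = (z - t) * z * (1.0 - z)            # dJ/d(net) at the output
    delta_hid = (W2 @ delta_out) * y * (1.0 - y)   # dJ/d(net) at the hidden layer
    # Step 5: steepest-descent weight update
    W2 -= lr * np.outer(y, delta_out); b2 -= lr * delta_out
    W1 -= lr * np.outer(x, delta_hid); b1 -= lr * delta_hid
    return 0.5 * np.sum((z - t) ** 2)
```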

Figure 3 illustrates the results of the training process for the character recognition problem with different back propagation algorithms: batch gradient descent (traingd) and variable learning rate (traingdx). These methods are integrated in the Neural Network Toolbox of Matlab. Generally, the methods produce good results; however, a long training time is needed to obtain the desired correctness. To solve this problem we need an algorithm which adjusts the learning step so that the convergence time of training is shortened and local extreme values are avoided. In the following sections we present:
- the principle of the cleft-overstep algorithm for adjusting the learning coefficient of the neural network in order to increase the correctness and the speed of convergence, and
- the principle of using the genetic algorithm for designing neural networks with high correctness.

Figure 3: Results obtained from different back propagation training methods (traingd, traingdx)

2.2. The cleft-overstep algorithm for neural network training

Cleft-overstep principle

Examining the unconstrained minimization problem:
J(u) → min, u ∈ E^n   (1)
where u is the minimizing vector in an n-dimensional Euclidean space and J(u) is the target function, which satisfies
lim J(u) = +∞ as ||u|| → ∞.   (2)
The optimization algorithm for problem (1) has the following iteration equation:
u^(k+1) = u^k + α_k·s^k, k = 0, 1, ...   (3)
where u^k and u^(k+1) are the starting point and the ending point of the k-th iteration step, s^k is the vector which shows the direction of change of the variables in the n-dimensional space, and α_k is the step length. α_k is determined according to the cleft-overstep principle and is called a cleft-overstep step, and equation (3) is called the cleft-overstep algorithm.

The basic difference between the cleft-overstep method and other methods lies in the principle of step adjustment. According to this principle, the step length of the searching point at each iteration is not smaller than the smallest step length at which the target function reaches its (local) minimum value in the moving direction of that iteration. The search trajectory of the cleft-overstep principle therefore creates a geometric picture in which the searching point oversteps the cleft bottom at each iteration step. To specify the cleft-overstep principle, we examine a function of one variable at each iteration step [3]:
h(α) = J(u^k + α·s^k)   (4)
Suppose that s^k is the descent direction of the target function at the point u^k. According to condition (2), there is a smallest value α* > 0 at which h(α) reaches its minimum:
α* = arg min h(α), α > 0   (5)
If J(u), and hence h(α), is continuously differentiable, we can define the cleft-overstep step as follows:
α = α_v: h'(α_v) > 0, h(α_v) ≤ h(0)   (6)
(α_v is the overstep step, meaning that it oversteps the cleft). The variation of the function h(α) as the optimization trajectory moves from the starting point u^k to the ending point u^(k+1) is illustrated in figure 4. We can see that when α ascends from 0, passes the minimum point α* of h(α) and reaches the value α_v, the corresponding optimization trajectory moves forward parallel to s^k according to the relation u^(k+1) = u^k + α_k·s^k, k = 0, 1, ..., and takes a step of length α_k = α_v > α*. The graph also shows that, in the moving direction, the target function descends from the point u^k, but by the time it reaches the point u^(k+1) it has changed to the ascending direction. If we use moving steps according to condition (5), we may be trapped at the cleft axis and the corresponding optimization algorithm is also trapped at that point. If instead the optimization process follows condition (6), the searching point is not allowed to lie at the cleft bottom before the optimal solution is obtained, and it always draws a trajectory which oversteps the cleft bottom. In order to obtain an effective and stable iteration process, condition (6) is substituted by condition (7):
α* = arg min h(α) (α > 0),  h* ≤ h(α_v) ≤ h* + λ(h^0 − h*)   (7)
where 0 < λ < 1 is called the overstep coefficient, h* = h(α*) and h^0 = h(0).

Determining the cleft-overstep step

Choosing the length of the learning step is of grave importance in the cleft problem. If this length is too short, the running time will be long. If this length is too long, the search may run into difficulties because it becomes hard to follow the curvature of the cleft. Therefore, an adaptive learning step is essential in the search for the optimal solution of the cleft problem. Suppose that J(u) is continuous and satisfies the condition lim J(u) = +∞ when ||u|| → ∞, and that at iteration k the point u^(k-1) and the moving vector s^(k-1) have been determined; we need to determine the length of the step α_k which satisfies condition (7). In this section, we propose a simple, yet effective way to find the cleft-overstep step.
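A minimal Python sketch of such a step search is given below (an illustration only, not the authors' Matlab routine). The initial trial step 0.1 and the scaling factors 1.5 and 0.5 follow the values shown in the flow chart of figure 4; the bracketing and backtracking rule itself is an assumption of this sketch.

```python
def cleft_overstep_step(h, alpha0=0.1, grow=1.5, shrink=0.5):
    """Heuristic search for an overstep length along the direction s_k.

    h(alpha) = J(u_k + alpha * s_k). The routine enlarges the trial step while
    h still decreases, takes the first value past the one-dimensional minimum
    as the overstep candidate, and backtracks it only as far as needed so that
    h(alpha_v) <= h(0) (a simplified stand-in for condition (7)).
    """
    h0 = h(0.0)
    a = alpha0
    while h(a * grow) < h(a):        # still descending along s_k: enlarge the step
        a *= grow
    v = a * grow                     # first trial value past the 1-D minimum
    while h(v) > h0 and (v - a) > 1e-12:
        v = a + shrink * (v - a)     # pull back toward the last descending point
    return v

# Tiny usage example on a 1-D profile with its minimum near alpha = 2:
step = cleft_overstep_step(lambda a: (a - 2.0) ** 2 + 1.0)
print("chosen overstep step:", step)
```

A full implementation would test condition (7) with the overstep coefficient λ instead of the simpler h(α_v) ≤ h(0) check used here.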

[Figure 4 flow chart: iterative search for the cleft-overstep learning step, starting from the initialized point u0 and the search direction s0 with a small trial step (α = 0.1 in the chart), scaling it by the factors 1.5 and 0.5 while comparing h(α) = h(u + αs) with h(0).]
Figure 4: Diagram of the algorithm that determines the cleft-overstep learning step

2.3. Genetic algorithm (GA)

The parameters that have to be found in neural network training include the weights, the initial value of the learning rate, the momentum rate and the structure of the network. Initial parameters are initialized using a random function on [0, 1]. Network structures are described by the number of hidden layers (h) and the number of nodes in each hidden layer (n_i, i = 0, 1, ..., h). GA is regarded as a searching process based on natural selection and genetics. Here, this algorithm is used to obtain optimal values of the hidden nodes and initial values of the momentum and learning rates. Figure 5 describes the operation of a simple GA:
1. Randomly initialize a population P^0 = (a_1^0, a_2^0, ..., a_μ^0).
2. Calculate the adaptive (fitness) value f(a_i^t) of each chromosome a_i^t in the current population P^t.
3. Based on the adaptive values, create new chromosomes by selecting parent chromosomes and applying the mutation and crossover operators.
4. Replace chromosomes with weak fitness by new, better chromosomes.
5. Calculate the adaptive values f(a_i'^t) of the new chromosomes and insert them into the new population.
6. Increase the number of generations if the iteration stopping criteria are not met and iterate from step 3. When the stopping criteria are met, the output is the best chromosome.

Figure 5: Operation cycle of GA
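The cycle of figure 5 can be sketched as the following Python loop (an illustration only; the operator details described next are plugged in as function arguments, and all names here are assumptions of this sketch):

```python
def run_ga(init_chromosome, fitness, select_parents, crossover, mutate,
           pop_size=20, generations=20):
    """Skeleton of the GA cycle of figure 5 (steps 1-6), written generically.

    fitness is minimized here, since the paper uses the total sum squared
    error (TSSE) of the network as the adaptive function.
    """
    population = [init_chromosome() for _ in range(pop_size)]        # step 1
    for _ in range(generations):                                     # step 6 loop
        scored = [(fitness(c), c) for c in population]               # step 2
        scored.sort(key=lambda fc: fc[0])                            # best = lowest TSSE
        offspring = []
        while len(offspring) < pop_size // 2:                        # step 3
            p1, p2 = select_parents(scored)
            offspring.append(mutate(crossover(p1, p2)))
        # steps 4-5: replace the weakest chromosomes by the new offspring
        survivors = [c for _, c in scored[:pop_size - len(offspring)]]
        population = survivors + offspring
    return min(population, key=fitness)                              # best chromosome
```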

The genetic operators used in the experiments for this paper are as follows [1,5]:

Population initializing: This process randomly creates a μ-tuple of genes (μ is the size of the population); encoding the genes with real values, we obtain chromosomes of length L, and the set of chromosomes forms the initial population.

Adaptive function: The adaptive function used in this paper is

TSSE = f = Σ_{i=1..s} Σ_{j=1..m} (t_ij − z_ij)^2   (8)

where s is the total number of learning samples, m is the number of neurons in the output layer, f is the total sum squared error over the s samples, t_ij is the desired output and z_ij is the output of the network.

Selection: Adaptive values are calculated and selection is conducted using the roulette wheel selection method. As a result, the individuals with high fitness are selected and inserted into the next generation.

Crossover: Crossover combines characteristics of the parents to create offspring by combining corresponding segments of the parents. The crossover point is selected depending on the fitness in each generation according to the following function:

C_r = ROUND[F_fit(i, j) · L] ∈ [0, ..., L]   (9)

where ROUND[] is the function that determines the nearest satisfying integer. The greater the crossover point, the more the characteristics of the parents are represented in their offspring.

Mutation: To avoid local optimal points, individuals are changed randomly with the mutation point M_r as follows:

M_r = ROUND[(L − C_r) · M_b / L] ∈ {0, ..., M_b}   (10)

where M_b is the upper limit of the mutation point. The fitness ascends over time, and crossover for mutated chromosomes is also applied. The evolution process is implemented until the desired output is obtained.
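As a rough illustration, not the authors' code, the adaptive function (8) and the crossover and mutation points (9) and (10) can be computed as below; treating F_fit as a fitness value normalized to [0, 1] and taking M_b = 10 are assumptions of this sketch:

```python
import numpy as np

def tsse(targets, outputs):
    # Equation (8): total sum squared error over s samples and m output neurons
    return float(np.sum((np.asarray(targets) - np.asarray(outputs)) ** 2))

def crossover_point(f_fit, L):
    # Equation (9): C_r = ROUND[F_fit(i, j) * L]; f_fit assumed normalized to [0, 1]
    return int(round(f_fit * L))

def mutation_point(c_r, L, M_b):
    # Equation (10): M_r = ROUND[(L - C_r) * M_b / L], a value in {0, ..., M_b}
    return int(round((L - c_r) * M_b / L))

# Example with the chromosome length used in the experiments (L = 225);
# the fitness value 0.7 and M_b = 10 are arbitrary illustration values.
L, M_b = 225, 10
c_r = crossover_point(0.7, L)
print("crossover point:", c_r, "mutation point:", mutation_point(c_r, L, M_b))
```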


2.4. Neural network training using the combination of GA and the cleft-overstep algorithm

The algorithm which combines the cleft-overstep algorithm and back propagation for training the MLP neural network is applied to the character recognition problem as proposed in figure 6. This algorithm trains the neural network in two phases. The first phase uses the genetic algorithm with forward propagation, aiming at speeding up the entire training process: the genetic algorithm conducts a global search and provides an initial point (weight vector) near the optimum for the second phase. The fitness function (target function) for the genetic operators is the total sum squared error (TSSE) of the corresponding neural network; the problem therefore becomes an unconstrained optimization problem over the set of weight variables. In the second phase, the back propagation algorithm is used with learning steps that are changed according to the cleft-overstep algorithm proposed above.

[Figure 6 flow chart. Phase 1 (genetic algorithm): begin; initialize a population of chromosomes; forward propagation of the MLP using the sigmoid activation function; calculate the fitness function of each chromosome; apply the genetic operators to create the new generation; replace the old generation by the new one; repeat until stage 1 ends. Phase 2 (MLP algorithm and cleft-overstep): initialize the weights from the chromosome with the smallest fitness; calculate the total squared error between the real and the desired outputs; update the weights using the BP algorithm with the learning step adjusted according to the cleft-overstep algorithm; repeat until the end condition is met.]

Figure 6: Diagram of the algorithm which combines cleft-overstep and back propagation for training the MLP neural network
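Before the detailed step-by-step listing below, the two-phase hand-off can be sketched compactly. This is a simplified, self-contained Python illustration, not the authors' Matlab implementation: the GA operators are reduced to uniform crossover and sparse Gaussian mutation, biases are omitted so that a chromosome is exactly the 225 weights, and the phase-2 step adaptation only mimics the cleft-overstep idea of growing the step and shrinking it whenever the error would exceed its current value. All names and numeric settings other than those quoted from the paper are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda v: 1.0 / (1.0 + np.exp(-v))

def unpack(w):                       # 225 weights -> W1 (35x5), W2 (5x10)
    return w[:175].reshape(35, 5), w[175:].reshape(5, 10)

def tsse(w, X, T):                   # fitness: total sum squared error, eq. (8)
    W1, W2 = unpack(w)
    Z = sig(sig(X @ W1) @ W2)
    return float(np.sum((T - Z) ** 2))

def train_two_phase(X, T, pop=20, gens=20, cycles=200, goal=0.06, lr0=0.6):
    # Phase 1: GA global search over the 225-dimensional weight vector
    P = rng.normal(size=(pop, 225))
    for _ in range(gens):
        P = P[np.argsort([tsse(w, X, T) for w in P])]         # rank by fitness
        children = []
        for _ in range(pop // 2):
            i, j = rng.integers(0, pop // 2, size=2)           # parents from the better half
            c = np.where(rng.random(225) < 0.5, P[i], P[j])    # uniform crossover (simplified)
            c += rng.normal(scale=0.1, size=225) * (rng.random(225) < 0.05)  # sparse mutation
            children.append(c)
        P[pop // 2:] = children                                # replace the weakest half
    w = P[np.argmin([tsse(w, X, T) for w in P])]               # best chromosome

    # Phase 2: gradient descent whose step grows and shrinks in the cleft-overstep spirit
    lr = lr0
    for _ in range(cycles):
        W1, W2 = unpack(w)
        Y = sig(X @ W1); Z = sig(Y @ W2)
        dZ = 2.0 * (Z - T) * Z * (1 - Z)
        dY = (dZ @ W2.T) * Y * (1 - Y)
        g = np.concatenate([(X.T @ dY).ravel(), (Y.T @ dZ).ravel()])
        e0 = tsse(w, X, T)
        while tsse(w - lr * g, X, T) > e0 and lr > 1e-12:      # keep the step below h(0)
            lr *= 0.5
        w = w - lr * g
        lr *= 1.5                                              # try a longer step next time
        if tsse(w, X, T) <= goal:
            break
    return unpack(w)
```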

Implementing the algorithm in Matlab proceeds as follows:

/* Phase 1 */
Initialize chromosomes randomly for the current generation, initialize the working parameters and take the first chromosome as the best chromosome.
a) Iterating with i = 1 to the size of the population, do the following actions: initialize sub_total_fitness = 0 and sub_best_chromosome as null.
b) Iterating with j = 1 to the length of the chromosome, do the following actions:
- Implement the forward propagation procedure of the MLP neural network (using the sigmoid activation function).
- Calculate the target function (system error of the neural network).
- Calculate the total error total_fitness by accumulating sub_total_fitness.
c) Save best_chromosome to sub_best_chromosome.
d) Compare the sub_best_chromosomes to each other and set the largest sub_best_chromosome as the best_chromosome.
e) Iterating with i = 0 to the population size/2, do the following procedures: initialize sub_total_fitness as 0 and sub_best_chromosome as null.
- Iterating with j = 1 to the length of the chromosome, do the following actions:
* Select parent chromosomes by using the roulette wheel selection method.
* Apply the mutation and crossover operators.
- Iterating with k = 1 to the length of the chromosome, do the following actions:
* Implement the forward propagation of the MLP neural network.
* Calculate the values of the target function for the parents.
- Calculate sub_total_fitness by accumulating the values of the target function of each chromosome.
f) Save best_chromosome to sub_best_chromosome.
g) Replace the old generation by the new generation if the stopping criteria are met.

/* Phase 2 */
- Set best_chromosome as the initial weight vector and determine the structure of the MLP neural network.
- Calculate the real output of the MLP neural network.
- Calculate the error between the real and the desired outputs.
- Update the weights by using back propagation, adjusting the learning coefficient with the cleft-overstep algorithm.
END

3. EXPERIMENTAL RESULTS

The MLP neural network was trained using the sample character set of size 5x7 presented above. Initial values such as the number of inputs (35), the number of hidden layers (1), the number of hidden neurons (5), the different neural network training techniques, and the encoding of inputs and outputs used to initialize the weights are as described above. To test the capacity of the neural network in the recognition process, we propose a parameter for assessing the quality of the network, namely the recognition failure rate, calculated as:

FR = (number of tested characters − number of recognized characters) / number of tested characters
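For example, with purely hypothetical numbers, if 30 characters are tested and 28 of them are recognized correctly, then FR = (30 − 28)/30 ≈ 6.7%.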

Test 1: Training the MLP neural network using the pure back propagation algorithm.

Training parameters:
- Size of characters = 5 x 7
- Number of inputs = 35
- Number of outputs = 10
- Neurons in the hidden layer = 5
- Learning rate = 0.6
- Desired correctness = 90%
- Desired system error = 0.06

Training results:

Number of training cycles | Error rate (%) | TSSE
20  | 93.33 | 0.8136
60  | 60.33 | 0.6848
100 | 40.67 | 0.2834
130 | 37.33 | 0.2823
200 | 0     | 0.06

Test 2: Training the MLP neural network using the proposed combination of the genetic algorithm and the cleft-overstep back propagation algorithm.

Training parameters:
- Population size = 20
- Number of generations = 20
- Crossover probability = 0.46
- Encoding using real values
- Length of chromosome = 225
- Desired correctness = 90%
- Desired system error = 0.06

Training results:

Number of generations | Total fitness
1  | 9.5563
5  | 8.1638
10 | 6.1383
15 | 5.724
20 | 5.697

Number of training cycles | Error rate (%) | TSSE
5  | 93.33 | 0.3186
10 | 60.33 | 0.1674
15 | 40.67 | 0.1387
20 | 37.33 | 0.0864
33 | 0     | 0.0589

We can see that the system error in test 1, when pure back propagation was used, was 0.06 after 200 training cycles. In test 2 only 20 cycles were needed to obtain this result; the mean fitness value obtained was 5.679. The results from phase 1 are used to initialize the weights for phase 2. With the learning step changed according to the cleft-overstep algorithm, the system error reached 0.0589 after 33 training cycles and the correctness of the recognition process was 100%. The operation of the pure MLP neural network and of the MLP neural network which uses the cleft-overstep and genetic algorithms is illustrated in figure 7.

Figure 7: Operation of the pure MLP and of the improved MLP neural network

4. CONCLUSION

In this paper, the authors proposed using a genetic algorithm in combination with the cleft-overstep algorithm to improve the training process of neural networks which have a special error surface, and illustrated the application of the proposal in digit recognition. Through research and experiments on computers we can see that, for neural network structures whose error surfaces are cleft-similar, the back propagation technique is still usable, but the application of the genetic algorithm combined with the cleft-overstep algorithm to neural network training provides higher correctness and faster convergence than the gradient method. The results can be explained as follows:
- The result of neural network training depends greatly on the initial values of the weight vector. The use of the genetic algorithm for global searching allows a good initial weight vector to be obtained for the next phase of the training process.
- When a special cleft-similar error surface is present, training the neural network with the conjugate gradient or Levenberg-Marquardt algorithm results in slow convergence and encounters local extreme values. The cleft-overstep algorithm searches for the optimal learning step in phase 2 of the neural network training process and proved effective in overcoming those drawbacks, thus increasing the speed of convergence and the correctness of the training process.
The use of the genetic algorithm in combination with the cleft-overstep algorithm can be applied to the training of several other neural network structures with special error surfaces. Therefore, the results of this research can be applied in different fields such as communications, control and information technology. Further research is needed on determining the direction of the searching vector in the cleft-overstep algorithm and on changing the assessment criteria of the quality function in order to reduce the complexity of the computation process [7]. Nevertheless, the results obtained reflect the proposed algorithm well and may lead to its application in practice.

REFERENCES

[1] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing Company Inc., Reading, MA, 1989.
[2] D. Anthony, E. Hines, The use of genetic algorithms to learn the most appropriate inputs to neural networks, Application of the International Association of Science and Technology for Development (IASTED), June 1990, pp. 223-226.
[3] Nguyen Van Manh and Bui Minh Tri, Method of cleft-overstep by perpendicular direction for solving the unconstrained nonlinear optimization problem, Acta Mathematica Vietnamica, vol. 15, no. 2, 1990.
[4] L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
[5] L. Fausett, Fundamentals of Neural Networks, Prentice-Hall International Inc., Englewood Cliffs, NJ, 1994.
[6] R.K. Al Seyab, Y. Cao, Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation, School of Engineering, Cranfield University, Cranfield, Bedford MK43 0AL, UK, ScienceDirect, 2007.
[7] Cong Nguyen Huu, Nam Nguyen Hoai, Optimal control for a distributed parameter and delayed time system based on the numerical method, Tenth International Conference on Control, Automation, Robotics and Vision (ICARCV 2008).
