Evolutionary Neural Networks: Design Methodologies
By ai-depot | March 24, 2003
Evolution of Connection Weights
Training a given network topology generally means determining an optimal set of connection weights. This is formulated as the minimization of some network error function over the training data set by iteratively adjusting the weights. The mean square error between the target and actual output, averaged over all output nodes, serves as a good estimate of the fitness of the network configuration for the current input.
One of the major advantages of using EAs for optimizing functions is their ability to scan the global search space simultaneously instead of restricting themselves to localized regions of gradient shifts. Gradient-based algorithms are prone to getting trapped in a local minimum of the function, especially when the function is multimodal or non-differentiable. EAs work with a population of trial solutions, so the only information they need is some kind of evaluation scheme to check the performance of the solutions. They continuously weed out the bad ones and allow the better ones to grow.
Any EA requires a consistent representation of the function parameters. The genetic operators (crossover and mutation) then have to be designed in conjunction with the representation scheme. The training performance may vary from one scheme to another, and it is quite possible for the operators to disrupt the genes in a particular representation, preventing the information from passing on to the next generation. This can happen when information pertaining to some functional unit (say, a node in an ANN) is scattered across the gene: randomly breaking the gene into two parts and exchanging one of them with another gene disperses this information.
Figure 4: Binary representation of connection weights.
Encoding an ANN’s weights as a genotype can be done in two ways: binary and real-valued. In either case, the genotype is simply a concatenation of the network’s weights into a string. In the binary form, each connection weight is represented by a bit string of a certain length. These strings are concatenated to form what is called a chromosome. Each chromosome acts as an individual of the population, which is then subjected to the genetic operators. This kind of representation is simple and straightforward, but it can result in very long chromosomes and hence inefficient evolution. If too few bits are used, certain real-valued weights may not be well approximated by discrete bits. Another choice is to use a real-valued chromosome and design genetic operators that can be applied to such chromosomes. In the next section we shall see one such scheme.
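As a sketch of the binary scheme, the helpers below quantize each weight into a fixed-length bit string and concatenate them into a chromosome. The weight range and bit width are illustrative assumptions, not values from the original article.

```python
def encode_weights(weights, bits=16, w_min=-5.0, w_max=5.0):
    """Encode each weight as a fixed-length bit string and concatenate
    the strings into one chromosome. Range and precision are assumptions."""
    levels = 2 ** bits - 1
    chromosome = ""
    for w in weights:
        # Map the weight into the integer range [0, levels], then to binary.
        idx = round((w - w_min) / (w_max - w_min) * levels)
        chromosome += format(idx, f"0{bits}b")
    return chromosome

def decode_chromosome(chromosome, bits=16, w_min=-5.0, w_max=5.0):
    """Invert the encoding: split the chromosome into fixed-length
    substrings and map each back to a real-valued weight."""
    levels = 2 ** bits - 1
    weights = []
    for i in range(0, len(chromosome), bits):
        idx = int(chromosome[i:i + bits], 2)
        weights.append(w_min + idx / levels * (w_max - w_min))
    return weights
```

Note the trade-off described above: with 16 bits per weight the round trip loses only quantization error, but the chromosome length grows linearly with both the bit width and the number of connections.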
A real-valued evolutionary process
A real-valued representation of the weights can be done using an n-dimensional vector. The connection weight vector can be written as U = ( u1, u2, u3, …, un ), where u1, u2, u3, …, un are the connection weights of the network. We will also require what is called a variance vector. Let S = ( s1, s2, s3, …, sn ) represent this variance vector; its importance will become clear at a later stage. Each individual of our population will be a pair of these two vectors, (Ui, Si). Each individual defines an ANN.
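A (U, S) individual can be initialized as follows; the weight range and initial step size here are illustrative assumptions.

```python
import random

def init_individual(n, s_init=0.05):
    """An individual is a pair (U, S): n connection weights and their
    per-component variances. Initial ranges are assumptions."""
    U = [random.uniform(-1.0, 1.0) for _ in range(n)]
    S = [s_init] * n
    return U, S
```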
A fitness value now needs to be assigned to each of these individuals. We shall be using the mean square error function to evaluate the fitness. Calculating the mean square error involves summing over the square of the differences between the actual and expected output at each node, and then taking the average over all output nodes.
Error = Σ ( Output_actual − Output_expected )² / ( Number of output nodes )
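The error and its inverse-fitness assignment can be sketched directly; the small epsilon guarding against division by zero is an implementation assumption.

```python
def mean_square_error(actual, expected):
    # Average of the squared differences over all output nodes.
    return sum((a - e) ** 2 for a, e in zip(actual, expected)) / len(actual)

def fitness(actual, expected, eps=1e-12):
    # Inverse of the error: the greater the error, the lower the fitness.
    # eps avoids division by zero on a perfect match (an assumption).
    return 1.0 / (mean_square_error(actual, expected) + eps)
```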
Our objective throughout the process is to minimize this error. Hence, the fitness we assign shall be the inverse of this error: the greater the error, the lower the fitness. At this stage some form of recombination could occur, but we shall omit this step and take mutation as our primary genetic operator. For each member of the population, an offspring is created by mutation as follows.
si′ = si exp ( τ′ N(0,1) + τ Ni(0,1) )
ui′ = ui + si′ Ni(0,1)    for i = 1, 2, 3, …, n
This scheme is known as Gaussian mutation. N(0,1) is a normally distributed random value with mean 0 and variance 1, generated once for each population member. Ni(0,1) is a similar random value, but a fresh one is drawn for every component (connection weight) i of every population member. The values of τ′ and τ are commonly set to:
τ′ = 1 / sqrt ( 2 sqrt(n)) τ = 1 / sqrt (2n)
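The self-adaptive mutation above translates into a short routine; this is a sketch following the formulas in the text, with no parameters beyond them.

```python
import math
import random

def mutate(U, S):
    """Self-adaptive Gaussian mutation: one offspring per parent.
    The global draw is made once per individual; each component i
    additionally gets its own draw, as in the formulas above."""
    n = len(U)
    tau_prime = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau = 1.0 / math.sqrt(2.0 * n)
    n_global = random.gauss(0.0, 1.0)  # N(0,1): once per individual
    U_new, S_new = [], []
    for u, s in zip(U, S):
        # Ni(0,1): a fresh draw per component, both for s and for u.
        s_new = s * math.exp(tau_prime * n_global + tau * random.gauss(0.0, 1.0))
        S_new.append(s_new)
        U_new.append(u + s_new * random.gauss(0.0, 1.0))
    return U_new, S_new
```

The exponential update keeps every si′ strictly positive, so the step sizes remain valid standard deviations.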
As we can see, the variance vector Si decides how far the component weights shift from their current values. The variance values thus tell us how much the corresponding weights are liable to deviate. Weights that are contributing well towards the convergence of the actual and expected output are not changed much.
The next stage is to evaluate the fitness of the offspring and generate the next population, using a method called tournament selection. For each individual in the combined set of parents and offspring, a fixed number of opponents are chosen uniformly from that set. A comparison of fitness values is carried out, and the individual is assigned a “win” whenever its fitness is not smaller than the opponent’s. The individuals with the most wins form the next generation. The process is repeated until some halting criterion is satisfied.
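The tournament described above can be sketched as follows; the tournament size q is an assumption (the article only says “a fixed number”).

```python
import random

def tournament_select(pool, fitnesses, pop_size, q=10):
    """pool: parents + offspring; fitnesses: aligned fitness values.
    Each individual meets q opponents drawn uniformly from the pool
    and scores a win when its fitness is not smaller than the opponent's."""
    wins = []
    for f in fitnesses:
        opponents = [random.randrange(len(pool)) for _ in range(q)]
        wins.append(sum(1 for j in opponents if f >= fitnesses[j]))
    # Keep the pop_size individuals with the most wins.
    ranked = sorted(range(len(pool)), key=lambda i: wins[i], reverse=True)
    return [pool[i] for i in ranked[:pop_size]]
```

Because wins are counted against random opponents rather than by raw fitness rank, weaker individuals occasionally survive, which helps preserve diversity in the population.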
Although EAs offer an attractive way to optimize ANN weights, they are relatively slow at local fine-tuning in comparison to gradient methods. An obvious approach in this case is to integrate a local gradient search with the EA: the EA’s global search ability locates a good region of the space, and then a local search procedure locates the near-optimal point within that region. The EA finds a good set of initial weights for the local search algorithm to start from. Since EAs, unlike gradient methods, are not sensitive to initial conditions, such a hybrid training algorithm can be quite competitive in terms of faster convergence.
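A minimal sketch of the local fine-tuning stage: plain gradient descent applied to the best weight vector the EA produced. The gradient function, learning rate, and step count are all illustrative assumptions.

```python
def local_fine_tune(weights, grad_fn, lr=0.1, steps=100):
    """Gradient-descent fine-tuning of an EA-found weight vector.
    grad_fn(w) returns the error gradient at w; lr and steps are
    assumptions, not values from the article."""
    w = list(weights)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

For example, with the toy error E(w) = Σ (wi − ti)², whose gradient is 2(wi − ti), the routine pulls an EA-supplied starting point onto the target vector t.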
Tags: evolutionary algorithm, genetic algorithm, neat, neural network
Category: tutorial