Evolutionary Neural Networks: Design Methodologies
By ai-depot | March 24, 2003
Written by Rinku Dewri.
Preface
Artificial Neural Networks and Evolutionary Algorithms are both abstractions of natural processes. They are formulated into a computational model so that the learning power of neural networks and the adaptive capabilities of evolutionary processes can be harnessed in an artificial life environment. "Adaptive learning", as it is called, produces results that demonstrate how complex and purposeful behavior can be induced in a system by randomly varying its topology and governing rules. Evolutionary algorithms can help determine optimized neural network architectures, as well as provide faster mechanisms to train the network to fulfill its purpose. Adaptation being the keyword here, modifications or reformulations of the network's objectives can also be handled without much effort.
Introduction
The brain is a mesh of billions and billions of neurons. The process of neural communication, which gives the brain most of its functional complexity, has been a key area of research for many years now. There has been a great deal of effort to understand the processes of information storage and retrieval going on inside the brain. A rather deterministic procedure has been devised to explain it, which in turn inspired the foundations of an artificial model - artificial neural networks.
Every neuron is charged with its excitation potential, which is much like an electrical impulse. This electric pulse propagates down a connecting linkage, called the axon, starting at the cell body and ending at another neuron. At the end of the axon, chemicals called neurotransmitters are released because of this excitation potential. These chemical carriers bind themselves to receptors on the neighboring neurons, and then either activate the neuron or inhibit it from reaching its excitation potential.
The receptors on the receiving cell control the threshold to which a neuron has to be pushed in order to reach its excitation potential. Neurotransmitters are capable of causing the potential of a cell to go up or down, and hence the number of receptors actually determines whether a neuron gets charged or not. The cell can adjust this number of receptors depending on how frequently it has been excited. More excitations within a relatively short interval of time increase this number, making the neuron more prone to firing the next time around, whereas an inactive neuron will have fewer receptors, making it less likely to fire. The chemical makeup thus decides whether a neuron reacts to impulses or not.
It can be seen that the information stored in a neuron is the product of an evolutionary process, rather than a fixed biological step. Over time, an active neuron may lose its influence and an inactive one may gain more. The configuration of the whole network changes frequently, and adapts very well to changing or new levels of information. Moreover, this network is massively parallel, robust, fault tolerant and amenable to learning. Although it is not an easy task to design an artificial neural network with all of these attributes, the basic functionalities can always be modeled.
Artificial Neural Networks (ANN)
An ANN is an interconnected network of very simple calculating units called neurons. Every connection in the network is assigned a weight which specifies the extent of possible influence. The structure of the network determines how the neurons influence each other. The whole network can be represented using a directed graph where an incoming edge to a node acts like an input to a neuron and outgoing edges are outputs from the neuron.
Each node performs a transfer of stimuli based on its activation function. A node produces an output only if the weighted sum of its input is greater than its set threshold. More specifically, a nonlinear transfer function of this parameter is used in actual implementations. We may write,
yi = fi ( Σwij xj - Ti )
where yi is the output at node i, xj is the input from the jth node, and wij is the connection weight between nodes i and j. Ti is the threshold of the node. fi is a nonlinear function such as the Heaviside, sigmoid or Gaussian function. We shall not worry about the mathematical complexities of these functions and assume that every node has the same transfer function.
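As a concrete illustration of this formula, here is a minimal sketch of a single node's activation in Python. The sigmoid transfer function and the particular weights and threshold are illustrative choices, not values prescribed by the article:

```python
import math

def sigmoid(x):
    """A common choice for the nonlinear transfer function f."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(weights, inputs, threshold):
    """Compute y = f( sum_j(w_j * x_j) - T ) for one node."""
    net = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return sigmoid(net)

# Example: a node with two inputs, illustrative weights and threshold
y = node_output([0.8, 0.4], [1.0, 0.0], 0.5)   # net = 0.8 - 0.5 = 0.3
```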
Figure 1: A typical neural network.
An ANN consists of an input layer, an output layer and one or more hidden layers. No transfer function is applied at the input layer, and direct inputs are transferred as outputs from this layer. The input layer acts like the biological sensory system, providing information about the surrounding environment. Activations are calculated from the next layer onwards (the hidden layers) and fed into higher layers until they reach the final output layer. This kind of ANN structure is called a feedforward network - output from one layer goes to neurons in the next layer. We can also have feedback networks, commonly called recurrent networks.
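The layer-by-layer propagation described above can be sketched as follows. The layer sizes, weights and thresholds here are hypothetical, and every non-input node applies the same sigmoid transfer function, as assumed earlier:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def feedforward(layers, inputs):
    """Propagate inputs through a feedforward network.
    `layers` lists the hidden and output layers only (the input layer
    passes values through untouched); each layer is a list of
    (weights, threshold) tuples, one per node."""
    activations = inputs
    for layer in layers:
        activations = [
            sigmoid(sum(w * x for w, x in zip(ws, activations)) - t)
            for ws, t in layer
        ]
    return activations

# A tiny 2-2-1 network with illustrative weights
hidden = [([0.5, 0.5], 0.2), ([-0.3, 0.8], 0.0)]
out    = [([1.0, -1.0], 0.1)]
y = feedforward([hidden, out], [1.0, 0.0])
```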
Learning in ANNs is achieved by varying the connection weights iteratively so that the network is trained to perform certain tasks. It generally involves the minimization of some error function - say the total mean square error between the actual and expected output - under the supervision of a trainer. This is often called supervised training. However, in some cases, the exact desired output is not known. Reinforcement learning is used in such cases and training is based only on whether the actual output is correct or not. Unsupervised learning tries to find correlations among input data when no information on the correctness of the output is available. The rule followed to update the connection weights - the learning rule - determines how well the network converges towards its desired optimality.
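As a toy illustration of a supervised learning rule, the sketch below implements the classic delta rule for a single linear node. It is one possible learning rule for minimizing squared error under a trainer's supervision, not the specific method used later in this article:

```python
def delta_rule_step(weights, inputs, target, lr=0.1):
    """One supervised update for a single linear node:
    w_j <- w_j + lr * (target - y) * x_j, which reduces the
    squared error between actual and expected output."""
    y = sum(w * x for w, x in zip(weights, inputs))
    error = target - y
    return [w + lr * error * x for w, x in zip(weights, inputs)]

# Train toward target output 1.0 for the fixed input [1.0, 1.0]
w = [0.0, 0.0]
for _ in range(100):
    w = delta_rule_step(w, [1.0, 1.0], 1.0)
# the node's output sum(w) converges toward the target
```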
In this article, we shall see an evolutionary methodology to figure out an optimized ANN, both in terms of architecture and connection weights.
Evolutionary Algorithms (EA)
Evolutionary algorithms refer to a class of algorithms based on probabilistic adaptation inspired by the principles of natural evolution. They follow a stochastic search strategy on a population of individuals, each representing a possible solution to the problem. They are broadly classified into three main forms - evolution strategies, genetic algorithms, and evolutionary programming.
EAs start with a randomly generated set of trial solutions. Each of these solutions is assigned a fitness value depending on how well it solves the problem. For example, given a function to maximize, the value of the function itself can serve as the fitness measure. In the case of an ANN, where we need to minimize the mean square error, the inverse of that value (1/MSE) can be used for fitness evaluation.
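A minimal sketch of such a fitness evaluation is shown below. The small epsilon added to the denominator guards against division by zero on a perfect network; it is an implementation detail, not part of the 1/MSE formula above:

```python
def mse(actual, expected):
    """Mean square error between actual and expected outputs."""
    return sum((a - e) ** 2 for a, e in zip(actual, expected)) / len(actual)

def fitness(actual, expected, eps=1e-9):
    """Inverse-error fitness: lower error means higher fitness."""
    return 1.0 / (mse(actual, expected) + eps)

perfect = fitness([1.0, 0.0], [1.0, 0.0])   # very high fitness
poor    = fitness([0.5, 0.5], [1.0, 0.0])   # lower fitness
```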
Once all the members of the population are assigned fitness values, a selection process is carried out where better individuals (high fitness value) stand a greater chance to be selected for the next operation.
Selected individuals undergo recombination and mutation to produce new individuals. Low fitness individuals are discarded from the population and better ones are included. The whole process is repeated with this new population until some termination criterion is satisfied. The average fitness of the population is expected to increase over generations, finally converging at the point where the optimum is found.
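The loop just described (evaluate, select fitter parents, recombine, mutate, replace) can be sketched generically. Tournament selection, elitism and the toy scalar problem in the usage example are illustrative choices, not requirements of the general scheme:

```python
import random

def evolve(init_pop, fitness_fn, mutate, crossover,
           generations=50, elite=2):
    """Generic generational EA loop: evaluate fitness, keep the best
    individuals, breed the rest from fitness-biased parents."""
    pop = list(init_pop)
    for _ in range(generations):
        pop.sort(key=fitness_fn, reverse=True)
        next_pop = pop[:elite]                      # elitism: keep the best
        while len(next_pop) < len(pop):
            # size-2 tournament: fitter individuals are selected more often
            p1 = max(random.sample(pop, 2), key=fitness_fn)
            p2 = max(random.sample(pop, 2), key=fitness_fn)
            next_pop.append(mutate(crossover(p1, p2)))
        pop = next_pop
    return max(pop, key=fitness_fn)

# Toy usage: maximize f(x) = -(x - 3)^2, whose optimum is x = 3
random.seed(0)
best = evolve(
    init_pop=[random.uniform(-10, 10) for _ in range(20)],
    fitness_fn=lambda x: -(x - 3) ** 2,
    mutate=lambda x: x + random.gauss(0.0, 0.1),
    crossover=lambda a, b: (a + b) / 2.0,
)
```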
Figure 2: Genetic operations in an EA.
A suitable genetic representation of trial solutions is important in order to work with EAs. A function to map this representation back to real problem parameters is also required. We shall see how such a representation is done in the case of an ANN. Apart from this, recombination and mutation operators need to be specified in a way that the offspring generated are valid trial solutions.
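For an ANN, one common direct representation flattens all connection weights into a single real-valued genotype, with a decoding function mapping it back to the layer matrices that define the network. The sketch below shows one such hypothetical encode/decode pair:

```python
def encode(weight_matrices):
    """Flatten per-layer weight matrices into one flat genotype."""
    return [w for layer in weight_matrices for row in layer for w in row]

def decode(genotype, shapes):
    """Map the flat genotype back to layer matrices of the given
    (rows, cols) shapes - the phenotype used to build the network."""
    matrices, i = [], 0
    for rows, cols in shapes:
        matrices.append(
            [genotype[i + r * cols : i + (r + 1) * cols] for r in range(rows)]
        )
        i += rows * cols
    return matrices

# e.g. a 3-2-1 network: a 2x3 hidden matrix and a 1x2 output matrix
shapes = [(2, 3), (1, 2)]
geno = [0, 1, 2, 3, 4, 5, 6, 7]     # 2*3 + 1*2 = 8 weights
matrices = decode(geno, shapes)
```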
EAs for ANNs
The applications of EAs in ANNs are mostly concentrated on finding suitable network topologies and then training the network. EAs can quickly locate areas of high quality solutions when the domain is very large or complex. This is important in ANN design and training, where the search space is infinite, high-dimensional and multimodal.
The evolution of connection weights introduces an adaptive and global approach to training. Unlike gradient-based training methods such as backpropagation, EAs rely on probabilistic search techniques, and so, even though they explore a larger search space, they can ensure that better solutions are being generated over generations. Optimal network architectures can also be evolved to fit the task at hand. The representation and the search operators used in EAs are the two most important issues in the evolution of architectures. It has been shown that EAs relying on the crossover operator do not perform very well in searching for optimal network topologies.
Figure 3: Evolutionary design of an ANN.
It is also possible to simultaneously evolve both the structure and weights of an ANN. However, what is mostly done is to evolve the network first and then subject it to a fixed number of training steps. Although evolution in this manner is straightforward, it may result in a noisy fitness evaluation: different random initial weights may produce different training results, and different training algorithms may produce different results from the same set of initial weights. The other approach is to partially train the network before any genetic operator is applied to it, so that mutation itself incorporates some level of training, followed by a final training phase once the structure is decided.
In the next section we shall see how genetic representations, performance evaluation, and search operators can be defined for a typical EA to be used for ANN design and training. We shall be looking specifically at a Gaussian-mutation-based, real-coded evolutionary method for training, and a direct-encoding method for the evolution of architectures.
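As a preview of what Gaussian mutation on a real-coded genotype looks like, the sketch below perturbs each weight, with some probability, by zero-mean Gaussian noise. The standard deviation and mutation rate are illustrative parameter choices, not values fixed by the method:

```python
import random

def gaussian_mutate(genotype, sigma=0.1, rate=0.2):
    """Real-coded Gaussian mutation: each weight is perturbed with
    probability `rate` by noise drawn from N(0, sigma^2); the rest
    are copied unchanged."""
    return [
        g + random.gauss(0.0, sigma) if random.random() < rate else g
        for g in genotype
    ]

# Mutate a flat weight genotype of an ANN
child = gaussian_mutate([0.5, -1.2, 0.0, 2.3])
```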