Evaluation Issues

Evaluation is the process of determining the fitness of an individual. Not only is the calculation of the fitness important (as we discussed in the previous tutorial), but how it's done is equally crucial. In real-time noisy environments, evaluation can often be an error-prone process. What we strive to do is find a fitness value in practice that is close to the theoretical fitness (the result of an infinitely long evaluation, in every possible context). This section describes precautions that can be taken to improve the accuracy of this estimate.

Context Variety

The most important factor for a quality evaluation is to subject the individual to a wide variety of situations and scenarios. Once the solution is frozen and put into a 'production' bot, this helps ensure that it copes well with whatever it encounters.

There are many ways of ensuring context variety in practice, and these are discussed below.

Trials

Trials are a good way to ensure that the results are not too random and do not diverge too much from the theoretical fitness. They also improve the variety of situations in which the evaluation takes place.

The idea is to split the evaluation process into several major chunks, for example four trials. The theory is that the average fitness over the trials is closer to the theoretical fitness, since more cases will have been handled. As in statistical analysis, you can also discard the lowest and highest trial results in order to obtain a more homogeneous distribution of results.
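The averaging step can be sketched in a few lines of Python. This is only an illustration under assumptions: run_trial(individual, trial_index) is a hypothetical callback that runs one trial and returns its score, and the trim parameter (how many results to drop at each end) is a name introduced here, not something from the bot's code.

```python
from typing import Callable, Sequence


def trimmed_mean(scores: Sequence[float], trim: int = 1) -> float:
    """Average the scores after dropping the `trim` lowest and highest."""
    if not scores:
        raise ValueError("at least one trial score is required")
    if len(scores) <= 2 * trim:
        # Too few trials to trim anything; fall back to the plain mean.
        return sum(scores) / len(scores)
    kept = sorted(scores)[trim:len(scores) - trim]
    return sum(kept) / len(kept)


def evaluate(individual, run_trial: Callable[[object, int], float],
             num_trials: int = 4, trim: int = 1) -> float:
    """Run several trials (hypothetical run_trial) and combine the scores."""
    scores = [run_trial(individual, trial) for trial in range(num_trials)]
    return trimmed_mean(scores, trim)
```

With four trials and trim=1, a single lucky or disastrous trial no longer moves the final fitness, at the cost of averaging over only two results.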

As far as trials go, it's usually a case of 'more is better'. Admittedly, this is not always possible, and you'll need to find a good compromise between speed of evaluation and quality of results.

Long Times

Long evaluation times are also a good way of increasing variety: letting the evaluation drag on for ages forces the solution to deal with continuous scenarios, and not just small random selections. If the bot does badly in the early stages, a long trial will emphasise this, since the rest of the evaluation will usually be handicapped (e.g. if the bot nestles itself into a corner, there will be little option for escape).

Such long evaluations can easily be combined with a trial-based approach, in order to provide even better solutions. Keep in mind that all this will usually be done in accelerated time (usually 100x in my Quake 2 implementation), so there's little need to worry about hanging around waiting for it to finish!

Deterministic Trials

If this is possible, a great way to ensure consistency in the fitness evaluation is to submit each bot to the exact same evaluation. This implies that the solutions are all judged on the same criteria. However, this can have quite an impact on the creativity of the solutions. That is a desirable property if you want your solutions to have specific characteristics, rather than ones which diverge randomly from individual to individual.
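One simple way to approximate this, where the environment allows it, is to generate the starting conditions from a fixed seed and reuse them for every individual. The sketch below assumes a hypothetical run_scenario(individual, scenario) callback and represents a scenario as an (x, y, heading) start state; both are illustrative choices rather than anything taken from the Quake 2 implementation.

```python
import random


def make_scenarios(seed: int, count: int) -> list:
    """Generate a fixed list of (x, y, heading) start states from a seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [(rng.uniform(-1.0, 1.0),      # x position (arbitrary units)
             rng.uniform(-1.0, 1.0),      # y position (arbitrary units)
             rng.uniform(0.0, 360.0))     # heading in degrees
            for _ in range(count)]


def evaluate_population(population, run_scenario, seed: int = 1234,
                        count: int = 4) -> list:
    """Score every individual on the identical set of scenarios."""
    scenarios = make_scenarios(seed, count)  # built once, shared by all
    return [sum(run_scenario(ind, s) for s in scenarios) / len(scenarios)
            for ind in population]
```

Changing the seed between generations would bring back some variety while still keeping the comparison fair within a generation.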

For real-time computer games, it is often difficult to 'rig' the trials so that they are always the same. Hence the need for...

Custom Levels

Environments designed specifically for the learning process have nothing but advantages.

Efficiency - Many cosmetic details can be removed from the environment when a specific task is being learnt. Only the details that are useless for the task can be removed, of course, but this usually still makes for a more efficient simulation. For the obstacle avoidance, I designed a completely flat level, so my movement simulation was essentially one single trace. This speeds up the simulation by a factor of over 100! When the bots move back into the 'real' levels (with stairs and ledges...), we simply start using the complex movement simulation code again, and the bots don't even notice the difference.
Determinism - With custom levels, you can define trial scripts that let you set the challenges your bot goes up against. This can be good if you want the solution to be capable of performing a very specific task. You could also use scripts in non-custom levels, but this would require more work and hassle to set up.

Noise

This concept was briefly mentioned in the previous tutorial, but I decided to go into it in much more detail since it is of such great importance.

Firstly, noise has many great properties for training. Even when the individual is submitted to similar situations, the addition of noise will make them appear slightly different to the neural network, thereby increasing the context variety mentioned above.
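As an illustration, noise can be added to the network inputs just before they are fed in. The sketch below uses zero-mean Gaussian noise; the function name, the sigma value and the choice of a Gaussian distribution are all assumptions made for the example, not details of the original implementation.

```python
import random


def add_sensor_noise(inputs, sigma=0.05, rng=random):
    """Return a copy of the inputs with zero-mean Gaussian noise added.

    `rng` may be the random module or a random.Random instance; passing a
    seeded instance makes the noise reproducible.
    """
    return [value + rng.gauss(0.0, sigma) for value in inputs]
```

A small sigma relative to the input range keeps the situation recognisable while still making each presentation slightly different.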

Secondly, noise has advantages for the simulation itself. If the movement simulation code has small bugs, adding noise to the angles can usually overcome them. Not that my routines have any bugs of course... I was just hypothesizing ;) Also, randomness makes the movement less deterministic. This looks more human-like, and if you add large amounts of noise, the bots start to look drunk! Very funny to watch indeed. But this has great consequences for the learning of the level structure, as we will discuss in future tutorials.

Incremental Evolution

This is an idea that is slowly becoming standard for large projects. The concept is simple: we evolve the population to do something simple at first, then make the fitness function more complex and restart the evolution with the same base population.

For obstacle avoidance, we can do very nice things with this. First, we can evolve solutions for 10 seconds only, based on their ability to turn in a lazy, human-like fashion. Then we can set the evolution times to 1 minute, and add the obstacle collision penalty. Once solutions are good, we set the evolution times to 3 minutes, and wait for the best solution to emerge.

Re-evaluating all the fitnesses is a good idea anyway, since lucky breaks, where individuals wrongly have high fitness values, will be penalised and the individuals potentially dropped.
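The staged schedule above could be organised roughly as follows. This is a sketch under several assumptions: the durations are taken to be how long each individual is evaluated, evaluate(individual, stage) and breed(population, fitnesses) are hypothetical callbacks, and a fixed generation count per stage stands in for "once solutions are good", which in practice would be a judgement call or a fitness threshold.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    duration_s: float         # assumed: evaluation time per individual
    collision_penalty: bool   # whether obstacle collisions are penalised
    generations: int          # hypothetical stand-in for "once solutions are good"


STAGES = [
    Stage(duration_s=10.0, collision_penalty=False, generations=20),
    Stage(duration_s=60.0, collision_penalty=True, generations=20),
    Stage(duration_s=180.0, collision_penalty=True, generations=20),
]


def incremental_evolution(population, evaluate, breed, stages=STAGES):
    """Evolve through progressively harder stages, keeping the population."""
    fitnesses = []
    for stage in stages:
        # Re-evaluate everyone under the new fitness function first, so
        # scores from the previous, easier stage are never carried over.
        fitnesses = [evaluate(ind, stage) for ind in population]
        for _ in range(stage.generations):
            population = breed(population, fitnesses)
            fitnesses = [evaluate(ind, stage) for ind in population]
    return population, fitnesses
```

The important point is the re-evaluation at the start of each stage: the same base population carries over, but every fitness value is recomputed against the new criteria.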

