
Robotics and Autonomous Systems 19 (1996) 67-83

Challenges in evolving controllers for physical robots


Maja Matarić a,*, Dave Cliff b,1
a Volen Center for Complex Systems, Computer Science Department, Brandeis University, Waltham, MA 02254, USA
b School of Cognitive and Computing Sciences, University of Sussex, Brighton BN1 9QH, UK

Abstract
This paper discusses the feasibility of applying evolutionary methods to automatically generating controllers for physical mobile robots. We overview the state-of-the-art in the field, describe some of the main approaches, and discuss the key challenges, unanswered problems, and some promising directions.
Keywords: Evolutionary computation; Robot control; Automated synthesis; Evolving controllers; Evolving hardware; Embodied systems; ffs; Morphology; Physical robots; Genetic algorithms; Genetic programming; Simulation

1. Introduction

This paper is concerned with the distant goal of automated synthesis of robot controllers. Specifically, we focus on the problems of evolving controllers for physically embodied and embedded systems that deal with all of the noise and uncertainty present in the world. We will also address some systems that evolve both the morphology and the controller of a robot. Within the scope of this paper we define morphology as the physical, embodied characteristics of the robot, such as its mechanics and sensor organization. Given that definition, the only examples of evolving both morphology and control exist in simulation. Evolutionary methods for automated hardware design are an active subarea but are not directly relevant and are not discussed. We overview the main areas of research in applying genetic techniques to developing robot controllers in simulation, on physical systems, and in combination.

Section 2 reviews a selection of work to give an overview of the state-of-the-art in the field. Section 2.1 describes approaches to genetic programming in simulation, Section 2.2 describes approaches to evolving network controllers in simulation, Section 2.3 describes approaches to evolving both the morphology and the controller in simulation, Section 2.4 describes approaches to control in both simulation and the real world through the aid of shaping, Section 2.5 overviews various experiments with evolving in simulation and then testing on physical robots, Section 2.6 describes evolving hardware for a robot controller, Section 2.7 describes work on evolving controllers using real vision, and Section 2.8 overviews the few experiments in evolving fully on-board a physical robot. Section 3 addresses the key issues, challenges, unanswered problems, and promising directions for future research. Section 4 presents our conclusions.
* Corresponding author. Tel.: +1 617 736-2708; fax: +1 617 736-2741; e-mail: maja@cs.brandeis.edu.
1 Tel.: +44 1 273 678754; fax: +44 1 273 671320; e-mail: davec@cogs.susx.ac.uk.

0921-8890/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved


PII S0921-8890(96)00034-6

2. The state-of-the-art

In terms of evolving controllers for physical systems, genetic approaches have been applied to several different subdomains of the problem. Most of the work has used genetic techniques to evolve controllers for simulated robots. A small number of researchers has evolved controllers in simulation and then transferred them onto physical robots. Finally, an even smaller number has evolved directly on physical robotic systems in real time. This section reviews selected research from each of the areas in turn.

2.1. Genetic programming in simulation

Genetic programming (GP), introduced by John Koza [29,30], is one of the more popular approaches to evolving controllers in simulation. Unlike genetic algorithms, which typically operate on bit strings, GP manipulates higher-level primitive constructs such as Lisp programs. This abstraction of representation allows for significantly reducing the search space and can greatly accelerate the evolutionary process. However, the construction of the primitives requires additional craftiness on the part of the programmer, and it has been argued that it hides perhaps the most challenging aspect of controller design. On the other hand, it can also be argued that the use of higher-level primitives is a method for building domain knowledge into the evolutionary system in a form other than through the fitness function.

Genetic programming has been applied to evolving navigation and wall following in a simulated sonar-based robot. The goal of the project was to demonstrate that genetic programming can be used to evolve a subsumption-style [4] robot controller represented with Lisp S-expressions, and compare it to an existing robotic system developed by Matarić [35]. The genetic algorithm utilized many of Matarić's hand-crafted sensing and action primitives, including minimum overall sonar distance, minimum safe distance, edging distance, as well as control functions including move backwards by a fixed amount, turn right, turn left, and move forward [29]. Other primitives, such as stopping, were eliminated in the simulation, and all 12 simulated sonar readings were used, unlike the hand-crafted solution which only relied on the front and lateral readings. The boundary of the simulated robot arena, consisting of straight wall sections and right angles, was divided into squares using the designer-determined edging distance as a "wall-following" threshold. The fitness function was designed as a simple counter of the number of distinct near-wall squares the robot visited during its lifetime. To control complexity, the depth of initial random S-expressions was limited to 4, and that of the crossed-over expressions to 15. Consequently, the evolved expressions were not overly complex, but were nonetheless difficult to parse, although more understandable than the typical evolved network controllers we discuss later. The resulting S-expressions of the most successful individuals encoded a control algorithm that snaked the simulated robot around the walls of the entire arena.

Koza and Rice [32] applied the same GP approach to a box-pushing problem inspired by robotic work done by Mahadevan and Connell [34]. The system used 12 sonars, a bump sensor, a stuck sensor, a set of three action primitives (derived from the original robotics work), and four primitive functions based on the sensory readings. The fitness function was based on the shortest distance between the box and a wall. The evolved solution was successful in pushing the box toward the nearest wall for different initial box and robot positions. The authors experimented with an added sensory abstraction, "shortest sensor", which facilitated the simulated robot's performance.

The lack of a realistic noise model in the simulation begs the question of how general the results are. Furthermore, the importance of selecting the proper, in this case expert hand-selected, primitives is difficult to estimate. Genetic programming has subsequently been successfully applied to a variety of domains, but we have not located any literature reporting GP being used on physical robots.

In the simulation domain, Craig Reynolds has applied genetic programming to a series of problems in autonomous agent control. In [43], GP was used to develop controllers which generate coordinated motion strategies in groups of "critters": simulated autonomous agents with idealized and highly simplified sensory and motor characteristics. Reynolds [45] applied GP to evolve controllers for collision-free navigation in twisting corridors. Finally, Reynolds [44] used GP to competitively co-evolve critters which play

the children's pursuit-evasion game known as "tag" or "it".

2.2. Evolving network controllers in simulation

Research by Beer and Gallagher [3] demonstrated the use of genetic algorithms to develop continuous-time recurrent neural network controllers for artificial agents. Both the control networks and the artificial agents were simulated. Networks were evolved to produce coordinated sensory-motor behavior for two tasks: chemotaxis and locomotion.

The chemotaxis controller was responsible for guiding an agent to a source of "food" which emitted a chemical signal, the intensity of which diminished as the inverse-square of the distance from the source. The agent had a circular body, with two chemical sensors placed symmetrically either side of the center-line: the sensors produce a signal proportional to the intensity of the chemical signal at their location. The agent also had two effectors on opposite sides of the body which could move it forward in straight lines or in arcs of varying radii: the physics of movement were based on a simple model where velocity was proportional to the applied force. The controller was a six-unit fully interconnected network: two sensory neurons received signals from the chemical sensors, and two motor neurons provided output to the effectors; the remaining two were interneurons. The fitness of the agents was evaluated by monitoring their ability to move toward the source of food and, once at the source, stay there. A variety of strategies evolved: some agents used tropotaxis (i.e. moving forward while turning to the side with the stronger input stimulus); others used klinotaxis (i.e. oscillating from side to side with a bias toward the side of stronger stimulus) before switching to tropotaxis in the near vicinity of the food source. Once at the source, some agents evolved to come to a halt, while others circled the source, and still others repeatedly crossed the source and then reoriented toward it.

The locomotion controller experiments evolved parameters for a neural network governing leg movements in a simulated hexapod. Each leg was controlled by a small fully interconnected network of five units: three were output units which connected to effectors for raising/lowering the foot, swinging the leg forward, and swinging the leg backward; the other two were interneurons. All units in the network could receive input from a sensory unit giving a signal dependent on the angle made between the leg and the hexapod's body. For reasons of economy, parameters for only a single-leg controller were evolved: the subsequent controller was then replicated appropriately, one controller per leg, with sparse inter-controller connectivity: each unit in a controller made inter-leg connections with only the corresponding units in the controllers on the opposite side of the body, and the controllers of any adjacent legs on the same side of the body.

A single-leg controller network was evolved to produce appropriate stepping patterns: both in the presence of input from the angle-sensor, and with the sensor disabled (in which case the network had to act as a central pattern generator). In both cases, successful controllers evolved. In general, controllers evolved with sensory input showed severely degraded performance when the sensor was removed, but with the sensor attached were capable of fine-tuning their responses on the basis of sensory input - something the central pattern generator networks were not capable of. Results of evolving full-locomotion controllers (i.e., parameters for both the single-leg controller, and for the inter-leg connectivity) were broadly similar: the final evolved controllers exhibited tripod gait patterns akin to those used by fast-walking insects.

Beer and Gallagher compared the approach of evolving the full controllers from scratch with an incremental approach, where a single-leg controller was evolved and then, with the intra-leg parameters held fixed, the inter-leg connectivity parameters were evolved. They found that, although adequate full-locomotion controllers did result from this process, their performance was not as good as with those evolved from scratch. The converse was also true: taking leg controllers evolved from scratch in the full-locomotion experiments and comparing them with the evolved single-leg controllers showed that the full-locomotion controllers had come to be dependent on inter-leg signals to produce appropriate outputs.

2.3. Evolution of morphology and control in simulation

Cliff et al. [8] studied the evolution of visually guided robots, using a simulation model based on a

real robot. The robot was controlled by continuous-time recurrent dynamical neural networks. The genotype of the robot coded not only for the connectivity and number of neurons in the controller, but also for the physical positioning of photosensors on the robot's body. The robot was evolved to perform a simple visually guided behavior: to find its way to the center of a circular room. The resulting controller networks were analyzed using a number of techniques: qualitative studies of the functional architecture of the network [9] and the effects of noise [7]; and quantitative studies based on dynamical systems analysis [26].

Sims [47] demonstrated a methodology for jointly evolving morphologies and controllers for embodied three-dimensional creatures. The creatures had realistic mass and inertial properties, and were situated in a physically based dynamical simulation. The physical dynamics of the creatures were by far the most complex and realistic of any evolved in simulation to date. The modeling as well as the evolutionary computation were performed on a CM-5. The morphology of the creatures was represented with a directed graph, a structure particularly well suited for applying various construction operators. Graph nodes could connect to themselves, form cycles, chains, and fractals. The connections between nodes contained information about the position, orientation, scale, reflection, and termination; each of the properties could be mutated independently. The nodes contained information about the dimensions of the part, the joints connecting it with the parent node, the limits on recursing, the connections to other nodes, and the local set of neurons (the controller) equipped with a fixed set of functions. Joint angle sensors, contact sensors, and photosensors could be attached to various nodes. Effectors controlled individual degrees of freedom of the joints between nodes, and received inputs either directly from sensors or via the neurons.

Sims devised a uniform graph representation for the brain and the body of the creature so the two could be evolved together. When a creature is synthesized, individual morphological components and their neural controllers are generated and replicated, resulting in similar copies of separate subsystems (e.g., leg segments). The subsystems can be connected locally if they are adjacent in the graph hierarchy. They can also be connected through an independent non-replicated set of neurons which allow for the evolution of global synchronization. Initial genotypes are synthesized randomly, and their fitness is evaluated through a co-evolution process in which individuals compete directly. The genetic material of the winners is combined to create the next creature set. First each genome is mutated through a sequence of steps: internal node parameters are mutated, new nodes are added, the parameters of the connections are mutated, new random connections are added, and unconnected elements are removed. Genotypes with a higher number of components probabilistically undergo more mutation. The individuals are then combined through the application of three mating operators: 40% asexual, 30% crossover of nodes, and 30% grafting by connecting the different parents' nodes. The offspring then undergo another round of competition and the process is repeated.

The method produced a great variety of creature morphologies and behaviors, including creatures that jumped, slid, pushed, toppled, covered, and grasped. Sims also evolved a set of creatures that could follow a moving target. The dynamics of Sims' environment are the most complex used in simulation so far. Perhaps the only unrealistic feature is the simulator's determinism. Due to the computational complexity of the environment, non-determinism would have obviated the possibility of repeating any particular evolutionary run.

2.4. Evolution by shaping in simulation and on robots

Work by Colombetti and Dorigo [11,12] demonstrates the use of classifier systems for evolving controllers for a simulated and a real robot. In both cases, incremental learning through shaping was applied in order to accelerate convergence. In the simulated environment the distributed genetic algorithm was applied first to learning basic behaviors (chase, feed, and escape), then to learning their coordination (i.e., behavior selection) under different environmental conditions. Finally, the two types of learning were allowed to run in parallel, after the system had settled into a stable solution.

A simpler experiment was performed in the physical world, where AutonoMouse, a robot capable of sensing the direction of a light source, learned to turn and move toward the light. As in the simulation case, the on-line genetic learning algorithm was applied in

the form of shaping; in the first phase of the learning the light target was stationary and the robot was free to move about and learn to approach it. In the second learning phase, the robot was specifically presented with the conditions it failed to learn in the first phase, until satisfactory behavior was observed. Finally, in the last phase the light target was moved and the robot was able to follow it.

The role of shaping is appropriate and useful, as it significantly accelerates the learning/adaptation process. In the described systems, the user could determine, by observation, what parts of the behavior have not been learned and could direct the robot to search in the appropriate area of the behavior space. The effectiveness of the resulting behavior and learning performance, then, is less due to the evolutionary aspect of the techniques than to proper human supervision. The described work is an interesting example of combining shaping and on-line classifier systems. The effectiveness of the approach is dependent on the proper design of the behavior primitives, much like in the case of genetic programming, and on well-timed user intervention and guidance.

2.5. Evolution in simulation, testing on robots

Several groups have undertaken the job of designing faithful simulations and evaluating their effectiveness by using them to evolve controllers and then test those on a physical robot.

Jakobi [27] and Jakobi et al. [28] developed a simulator for the Khepera robot developed at EPFL in Lausanne, Switzerland. This robot is a two-wheel differential-drive platform with eight active infra-red (IR) proximity sensors, where the detector element of each sensor has some sensitivity to visible wavelengths, allowing it to be used in a passive mode to measure ambient light. The simulator system, known as Khepsim, models a single Khepera in a restricted class of environments: the user can specify some arrangement of obstacles (planar walls and upright cylinders) in a bounded space, and may also position a light-source somewhere within the space. Khepsim is based on idealized mathematical models of the Khepera's kinematics, IR sensors, and ambient light sensors. These models were refined by incorporation of constants derived from empirical studies of the robot. A genetic algorithm was used to evolve neural-network controllers for the simulated Khepera. The evolved controllers could be down-loaded onto the real robot: because of the care taken in validating the simulation, the behavior of the real robot closely matched that shown in simulation. Controllers were evolved to perform obstacle avoidance (exploring the environment without colliding with obstacles) and light-seeking (starting from a random position and orientation in the environment and moving towards a 60 W desk lamp).

A series of experiments reported in [28] explored the effects of the level of noise in the simulation. The conclusion from these experiments was that if the noise levels in the simulator differ significantly from those in the real robot, the behavior of a controller in simulation is much less likely to be transferrable to the real system. When the noise in the simulator is less than that in the real world, it is likely that the evolved controllers will rely on unrealistically accurate actuator and sensor responses. Finally, when the noise in the simulator is set very high, the evolutionary process can exploit the potential of stochastic resonance effects, where noise boosts a weak signal sufficiently to allow it to have some effect on the behavior of the robot: Jakobi et al. [28] describe one controller evolved in a high-noise simulation which, when tested in the real robot (with real noise levels lower than those used in the simulation), failed to produce the behaviors seen in the simulation because the lack of noise failed to raise the response of the sensors above the threshold.

Although the Khepsim work demonstrates that it is possible to transfer controllers evolved in simulation to real robots, Jakobi et al. emphasize the fact that much care was taken in building the simulation and setting appropriate levels of noise, and that the success of their experiments may be due to the relative simplicity of the Khepera robot; the approach may be infeasible as the complexity of the robots and interaction dynamics increases.

Nolfi et al. [41] also describe work done with the Khepera robot. The authors built a simulator using real sampled position and infra-red data in order to perform experiments in evolving neural network-based controllers for navigation. The topology of the network was simple and fixed, consisting of the individual infra-red sensor inputs, two hidden units, and two outputs for the wheel motors. Synaptic connections and thresholds were coded in the chromosomes.
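As a concrete illustration, a fixed-topology genome of this kind can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the real-valued gene encoding, the sigmoid activation, and the use of all eight of the Khepera's IR sensors as individual inputs are assumptions; the text above specifies only the topology (individual infra-red inputs, two hidden units, two motor outputs) and that synaptic connections and thresholds were coded in the chromosomes.

```python
import math

# Minimal sketch (not Nolfi et al.'s code) of a fixed-topology
# feed-forward controller whose weights and thresholds are coded
# in a flat chromosome. Layer sizes follow the text: 8 IR inputs
# (one per Khepera sensor), 2 hidden units, 2 wheel-motor outputs.
# The real-valued encoding and sigmoid activation are assumptions.

N_IN, N_HID, N_OUT = 8, 2, 2
GENOME_LENGTH = N_HID * N_IN + N_OUT * N_HID + N_HID + N_OUT  # 24 genes

def decode(chromosome):
    """Split the flat gene list into weight rows and threshold lists."""
    i = 0
    def take(n):
        nonlocal i
        genes = chromosome[i:i + n]
        i += n
        return genes
    w_ih = [take(N_IN) for _ in range(N_HID)]    # input -> hidden weights
    w_ho = [take(N_HID) for _ in range(N_OUT)]   # hidden -> output weights
    th_h = take(N_HID)                           # hidden thresholds
    th_o = take(N_OUT)                           # output thresholds
    return w_ih, w_ho, th_h, th_o

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(chromosome, ir_readings):
    """Map 8 IR readings to 2 motor activations in (0, 1)."""
    w_ih, w_ho, th_h, th_o = decode(chromosome)
    hidden = [sigmoid(sum(w * s for w, s in zip(row, ir_readings)) - t)
              for row, t in zip(w_ih, th_h)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) - t)
            for row, t in zip(w_ho, th_o)]
```

Under such an encoding the genetic algorithm manipulates only the flat gene list: a point mutation perturbs a single synaptic weight or threshold while the topology stays fixed.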

When the best evolved solutions were transferred to the physical robot and tested in the real world, the performance degraded significantly. The authors countered by allowing the adaptation process to continue on line and reported that within only a few generations the performance was again elevated to the simulation level, hypothesizing that the discrepancy between the simulated and the real world in this domain is quite small.

The authors have also evolved a grasping behavior on the Khepera, using the same simulator [42]. After some trial and error, they chose a 5-input 4-output network with no hidden units. The inputs included two frontal sensors, the average of the two left and two right side sensors, and the gripper sensor. The outputs included the two wheel velocities, and two triggers for specific procedures: pick-up and release. The fitness function was quite complex, and included elements of the robot's distance from the target object, the object's position relative to the robot, the robot's attempts to pick up an object, the presence of an object in the gripper, and the release of the object. Although not all of the actions are directly rewarded, enough of them are that the fitness formula can be said to have a shaping effect. The best controllers evolved in the simulation, when transferred to the physical robot, declined in performance, but were able to correctly perform the task two or more times without collisions or inappropriate grasps, and were thus judged to be successful.

The use of shaping and basic primitive behaviors blurs the line between evolution and learning. For instance, work by Nolfi and Parisi [42] employs basic behaviors (rather than motor velocities or even low-level actions) and a shaped fitness function, both of which have been successfully employed in robot learning [37]. While these approaches utilize the programmer's expertise and thus bias the learning/evolution process in order to simplify and accelerate it, they may be necessary to scale up the existing approaches to any realistic robotic behaviors and tasks.

Another example of evolution in simulation and subsequent transfer to the physical robot is described by Miglino et al. [39], who detail the careful process involved in the design of their simulator. They offer a thorough statistical analysis of the different types and amounts of sensor noise that affected the resulting evolved behavior in a neural controller for navigation with a LEGO robot. The robot used two optosensors and high-level actions including forward and backward by fixed amounts and fixed-degree left and right turns. A rough simulation was used in the evolutionary process, in which the fitness function rewarded the number of novel locations the robot visited. The authors found that the addition of more realistic noise into the sensor models reduced the difference between the simulated and real world performance and facilitated the transfer. However, even though the efficiency of the real robot behavior was improved through the introduction of sensor noise in the simulation, a significant difference between the simulated and real navigation trajectories persisted [41].

Work by Grefenstette and Schultz [18] also describes an application of genetic algorithms, within the classifier system SAMUEL, to the problem of learning collision-free navigation in simulation and transferring those controllers to a mobile robot. In comparison to the Khepera simulations, this work used a more complex Nomad 200 mobile robot, equipped with 20 tactile, 16 sonar, and 16 infra-red sensors. To simplify the learning process, the authors abstracted the 52 sensory inputs and other available state of the robot into the following set: four sonar sensors, four infra-red sensors, time-step, speed, distance to the goal, and angle to the goal. The output being learned was a translation rate and rotation angle for each distinct set of inputs. The task consisted of learning to reach a specific goal region from a fixed start position within a predetermined time. To prevent the system from learning an internal map of the environment, the start and goal positions were fixed, but the locations of obstacles were changed at each trial. The initial population consisted of a collection of random rules, some designed by hand and others generated automatically by mutating the hand-coded ones. Instead of the mutation and crossover used in other systems we have reviewed, SAMUEL used generalization and specialization, classifier system operators that alter parts of a rule relative to its utility rating.

Both the genome representation and the operators used in SAMUEL are at a higher level than those used in the experiments described so far. Consequently, a relatively small population size (50 rules) was evaluated, using the average performance over 20 trials, and run over 50 generations. The best five rule sets were

selected after each five generations, and tested on 50 proximately 4 k Hz, from gates which have propaga-
trials to pick the best rule set. The best performing tion delays of no more than one nonosecond. 2
individuals performed the task successfully 93.5% of Thompson demonstrates the direct evolution of a
the time, and when transferred to the physical robot control circuit which produces wall-avoidance behav-
succeeded 86% of the time. ior in differential-drive wheeled robot, using two sonar
The success of the transfer between simulation and heads pointing left and right of the robot's direction
the real world in this system is at least in part due to of travel. The controller is based on Thompson's no-
the appropriate design of the initial rule set. As a con- tion of a dynamic state machine (DSM). A DSM is
sequence, the learning system was adapting thresholds similar to a finite state machine (FSM) implemented
rather than operating at the level of raw sensory inputs by direct-addressed ROM, where a clocked register
and motor outputs. SAMUEL's design also enabled holds the current state; this, when combined with in-
the user to embed domain knowledge and heuristics put variables, gives the address input for a ROM. The
into the initial population, thus further accelerating the data outputs of that address of the ROM give the next
learning process. The general approach has been suc- state and the output variables of the FSM. A DSM
cessfully applied to other robot domains, as described differs from a direct-addressed-ROM FSM in two im-
in [46]. portant respects. First, RAM is used instead of ROM,
Yamauchi and Beer [49] describe the successful to allow the transitions and outputs of the system to
evolution of continuous-time recurrent neural net- be reconfigurable, and hence evolvable. Second, the
works which can learn to recognize landmarks from input, state, and output variables may be either syn-
temporal sequences of sonar readings also on a No- chronous, in which case their clock rate is genetically
mad 200 robot. Significantly, the networks exhibit specified, or asynchronous. Whether a variable is syn-
forms of learning without changes in the connection chronous or asynchronous is genetically specified. If
weights: the learning is solely a result of changes in any of the state variables are asynchronous, then the
the networks internzd dynamics as a consequence of circuit is not an FSM. DSMs have the potential for rich
perturbations from external (sensory) inputs. Also, Gallagher and Beer [17] have reported the transfer of evolved hexapod controllers (described earlier in this paper) to a real hexapod robot.

2.6. Evolving hardware for control

The vast majority of the work discussed so far involves using evolution to develop robot control software: even where the controller involves parallel distributed processing, the parallelism is frequently simulated on a fast serial von Neumann processor. Recently Thompson [48] reported results from experiments where robot control hardware, in the form of physical semiconductor circuits, can be evolved directly, i.e., without the need for a circuit simulator. Thompson's technique requires the use of reconfigurable hardware: he demonstrates (in simulation) that a field programmable gate array (FPGA) can be used as the substrate for evolving recurrent asynchronous networks of high-speed logic gates, without imposing modularization or clocking constraints. The FPGA demonstration involves the evolution of an oscillator circuit which produces spikes at a frequency far below the megahertz-scale spiking of the initial random population.²

² The best randomly connected circuit in the initial population produced spikes at a frequency of around 18 MHz.

Thompson has also evolved control circuitry directly on a physical robot, using a simpler form of reconfigurable hardware, the dynamic state machine (DSM), which gives evolution access to the hardware's intrinsic dynamics: reflexive input-output responses can be produced, yet at the same time internal state can be perturbed or maintained over genetically specified time scales.

Because the temporal dynamics of a DSM depend on the physical characteristics of the hardware it is built from, it is not practical to simulate them. Instead, Thompson uses a reconfigurable DSM circuit in the real robot, with direct sensory input from the sonars to the DSM and direct output from the DSM to the motors. Thus, the DSM receives raw echo signals from the sonars, and directly drives the motors. One evolved controller successfully produced wall-avoidance behavior using only 32 bits of RAM and three flip-flops.

Although the controller was evolved directly in hardware, there was some use of simulation: the robot was put on jacks so that its wheels spun in the air, while sensors monitored their rotation, and used the data to update the position of a simulated robot in a simulated environment. The simulator then synthesized the sonar signals that would result from the robot's orientation and location in its environment, and supplied these to the physical DSM.
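This jacked-up, hardware-in-the-loop arrangement (wheel sensors driving a simulated pose, from which sonar echoes are synthesized) can be sketched as follows; the differential-drive kinematics, the single-wall world, and all names are illustrative assumptions, not Thompson's implementation:

```python
import math

def update_pose(x, y, theta, d_left, d_right, wheelbase):
    """Differential-drive odometry: advance the simulated pose by the
    distances actually turned by the airborne wheels."""
    d = 0.5 * (d_left + d_right)
    theta += (d_right - d_left) / wheelbase
    return x + d * math.cos(theta), y + d * math.sin(theta), theta

def synthesize_sonar(x, y, theta, wall_x=2.0):
    """Toy echo model: range to a single wall at x = wall_x along the
    current heading (no echo if facing away from the wall)."""
    c = math.cos(theta)
    return (wall_x - x) / c if c > 1e-6 else float("inf")
```

Each control cycle would read the wheel-rotation sensors, call update_pose, and feed the output of synthesize_sonar to the physical DSM in place of a real echo.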
Thus the robot could be physically stationary while the DSM is subjected to a "virtual reality" (VR) evaluation. The authenticity of the VR results can be checked by taking the robot off the jacks and testing it in a real environment. Thompson demonstrates a close correspondence between the virtual and real behaviors of the wall-avoidance controllers evolved in this manner.

While DSMs are a simpler form of reconfigurable hardware than FPGAs, either approach to developing asynchronous control circuits requires that, for the full potential to be available, the real hardware has to be evaluated in real-time. The intrinsic dynamics of the physical circuit would be very difficult to model in simulation, and any sufficiently accurate simulation is likely to run much slower than real-time on a serial machine. In principle, fully synchronous control circuits could be evaluated at rates faster than real-time, by increasing the clock speed, but this would require the sensory-motor interactions with the environment to be simulated at an appropriately accelerated rate in evaluation, and we are not aware of anyone who has done this.

2.7. Evolving with real vision

Producing simulations of visual sensing, with bandwidth of anything more than a few pixels, that are sufficiently accurate to be worth using in evolutionary robotics can take prohibitively long. Using a real video input system is a much more attractive option. Harvey et al. [24] developed a Cartesian gantry-robot which allowed for 50 Hz frame rates from a 64 x 64 monochrome CCD camera with an umbilical video-feed cable to off-board computers. The gantry allowed for three-dimensional physical translation of the camera, and a mirror mounted on a stepper-motor allowed for the direction of view of the camera to be rotated through 360° in the horizontal ("yaw") plane without any twisting of the umbilical. The camera mounting (or "head") included eight binary touch-sensors for detecting collisions with obstacles.

Building on experience from their earlier simulation studies, discussed above, Harvey et al. evolved neural network controllers to produce a variety of visually guided behaviors. Rather than allow the neural network controllers direct access to the motors governing the x, y, z, and yaw of the camera head, code was written to provide a "virtual" differential-drive wheeled robot: the outputs of the network controllers were nominally to the left and right motors; these signals were then transformed to appropriate changes in the yaw-angle and horizontal (x, y) position of the camera head - the vertical altitude of the head was kept fixed, thereby simulating a wheeled robot on a planar surface. The mapping from left and right motor values to x, y, and yaw outputs included terms to model the momentum of the robot. Thus, while the motor side of the robot was, at least in part, a simulation, it was possible to evolve neural network controllers for real-time visual control with real video input. The gantry offered a further advantage: the head could be positioned and re-positioned very accurately, so near-identical initial conditions could be used for successive trials of the same or differing genotypes.

The visual input to the neural networks was given by a genetically specified sampling pattern: the genotype could specify a number of circular visual receptive fields; the radius and position of the center of the receptive field (in image coordinates) were read from the genome for each visual sensory input unit in the network. During an evaluation, the instantaneous input to the sensory unit was an estimate of the instantaneous average image intensity of the pixels within the unit's receptive field.

The gantry was used to demonstrate the principle of incremental evolution [23]. Rather than starting with a population of random genotypes when attempting to evolve controllers to achieve some challenging task, it is better to evolve from a population which has already been selected for a similar but less-challenging task. In [24], a sequence of evaluation tasks was used on a population of size 30 to evolve controllers capable of an elementary visual discrimination task. The sequence started with 12 generations where selection was for behavior which took the robot towards a large white rectangular target in an otherwise dark visual environment. Following this, the population was selected for ability to approach a much smaller white rectangular target: after six generations, an individual emerged which could approach the smaller target and also "pursue" it if the target was moved at a reasonable speed. The population was then subjected to 15 generations of selection for the ability to head towards a small white triangle, and avoid moving towards a white rectangle of size similar to the triangle.
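The incremental scheme, evolving on an easy task and then reusing the resulting population as the seed for a harder one, can be sketched as follows; truncation selection and the mutation operator are illustrative choices, not the SAGA machinery of [23,24]:

```python
def evolve(population, fitness, generations, mutate):
    """One stage: truncation selection; the best half survives and breeds."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: len(ranked) // 2]
        population = parents + [mutate(p) for p in parents]
    return population

def incremental(population, schedule, mutate):
    """schedule: (fitness_function, n_generations) pairs, easiest first.
    The final population of each stage seeds the next, harder stage."""
    for fitness, generations in schedule:
        population = evolve(population, fitness, generations, mutate)
    return population
```

With a population of real numbers and a deterministic mutation, two stages pull the population first toward an easy target and then toward a harder one, mirroring the rectangle-then-triangle sequence described above.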
After 15 generations, fit individuals emerged, giving a total of 33 generations in the incremental sequence. As a comparison, the same initial random population was evolved for 15 generations of selection for approaching the small white target. At the end of 15 generations, there were high-scoring individuals in the population, but further analysis indicated that these had less robust controllers than the best of the population that emerged incrementally for the same task.

One of the final controllers evolved to approach the triangular target was analyzed to see how it worked: the active part of the network relied on only two receptive fields, set at an angle which allowed for discrimination between an oriented edge of the triangular target and the vertical edges of the rectangular target.

2.8. Evolution entirely on robots

Work by Floreano and Mondada is one of the very first attempts at evolving controllers entirely on a physical robot in real-time without any human intervention. In [14,15] the authors describe successful results of evolving navigation and obstacle avoidance behaviors on a Khepera robot. This robot has proven to be the most successful platform for physical evaluation of genetic approaches to controller evolution, due to its small size and portability, as well as the ability to be tethered for external computational and battery power. The described experiments used the physical sensors and effectors of the robot, processed by an on-board microcontroller, to act and evaluate in the real world, but performed the computation on an off-board workstation that provided a significant improvement in computational and memory capacity. The robot was also equipped with specially designed hardware for interacting with an external laser positioning device that allowed for recording the robot's exact movement over time for subsequent analysis.

The controller is represented in the form of a neural network whose weights and thresholds are coded as floating point values in the genetic string. The topology of the network, a multi-layer perceptron of sigmoid units with a set of recurrent connections at the hidden layer, was static, while the weights were modified by the evolutionary process. The inputs to the network consisted of the robot's eight infra-red sensory values, and the outputs fed velocity commands directly to the two motors on the wheels (i.e., no high-level actions were used). The fitness function penalized collisions and lack of movement.

In the first set of experiments, the authors demonstrated a robust collision-free navigation behavior, but had to add a bending penalty to the fitness function in order to prevent a particularly efficient solution that kept the robot spinning in a small circle within an obstacle-free area.

In [40] the authors describe the approach applied to evolving a homing behavior. A decaying battery was simulated and the robot evolved a "recharging behavior" in which it learned an internal representation of the environment and moved toward the light (i.e., the "recharging station") which boosted its simulated battery level. The evolved behavior was also tested in an altered environment, in which the light source was removed. The robot executed a searching behavior within the appropriate region until its battery was exhausted. The evolved behavior kept the robot wandering around its world until its internal battery level reached a low level, then taking a short path toward the light.

The authors also describe the evolution of a simple grasping behavior [40], achieved by adding graspable balls to the environment, a simple gripper to the robot, and introducing gripping to the action set. The fitness function was based purely on the number of grasped objects. In order to evolve all aspects of gripping, including approaching an object, an incremental approach was employed. The system was first run on a high density of balls in the environment, then run on half-density to further refine the search and approach behaviors, much like the method used in [13]. The learned gripping behavior is more complex than most others because it involves the use of sensors with different ranges: the gripper sensors are only relevant after the infra-red sensors get the robot close enough to the ball to be grasped. This experiment was successful, but required several days of continuous evolution, even after the gripping action was reduced to a fixed action pattern to minimize complexity.

3. Issues and challenges

One of the main goals of the work on evolutionary robotics is to provide a methodology for automatically synthesizing more complex behaviors than those that can be designed by hand.
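The controller representation of Section 2.8 can be sketched as follows; the hidden-layer size, the omission of the recurrent connections of [14,15], and the exact penalty value are illustrative assumptions:

```python
import math

N_IR, N_HIDDEN, N_MOTOR = 8, 4, 2   # hidden-layer size is an assumption

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def genome_length():
    # One weight per connection plus one threshold per unit.
    return N_IR * N_HIDDEN + N_HIDDEN * N_MOTOR + N_HIDDEN + N_MOTOR

def controller(genome, ir_values):
    """Decode a flat string of floats into a feed-forward pass from eight
    infra-red readings to two wheel velocities (recurrence omitted)."""
    g = iter(genome)
    hidden = [sigmoid(sum(next(g) * v for v in ir_values) + next(g))
              for _ in range(N_HIDDEN)]
    return [sigmoid(sum(next(g) * h for h in hidden) + next(g))
            for _ in range(N_MOTOR)]

def fitness_step(left_v, right_v, collided):
    """Per-timestep score rewarding movement and penalizing collisions;
    the penalty magnitude is an arbitrary illustrative choice."""
    return abs(left_v) + abs(right_v) - (10.0 if collided else 0.0)
```

Only the flat list of floats is exposed to the genetic operators; the fixed topology is shared by every individual, exactly the situation discussed under "Genetic encodings" in Section 3.6.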
However, a survey of the results in the field to date does not show any demonstrations that have reached that goal. Much like the state-of-the-art in learning on robots, none of the evolved or learned behaviors have been particularly difficult to implement by hand. To develop evolutionary techniques to a level where they can seriously be considered for use in designing robots, there are many challenges to be overcome, critical questions to be addressed, and some promising directions to be explored. This section addresses them in turn.

3.1. Evolving on physical robots

Real time on real hardware. Evolution on physical systems takes prohibitively long. As demonstrated by the successful example of evolving collision-free navigation on a Khepera [14] at approximately 39 min per generation and a hundred generations, 65 h were required to evolve the desired behavior. While the authors present impressive experimental techniques and repeatable data, it is clear that this approach will not scale up to more complex behavior evolution, which will require both more time per individual trial and more generations.

Battery lifetime. The unavoidable need to recharge robot batteries further slows down the experimental procedure. In most of the Khepera-based experiments described, the robot was tethered, thus eliminating both the on-board power and computation problems, but tethering is not possible on all platforms and in all domains, nor does it scale up to multi-robot co-evolution experiments.

Robot lifetime. Aside from the prohibitive time overhead, the physical hardware of a robotic system cannot survive the necessary continuous testing without constant maintenance and repairs. As the complexity of behaviors scales up, the need to offload much of the experimentation and evaluation to a simulation will increase.

3.2. Evolving in simulation

Noise and error models. The difficulty of accurately simulating physical systems is well known in robotics [5]. Since it is impossible to simulate all details of a physical system, any abstraction made in a simulation may be exploited by the genetic algorithm and result in behavior that is maladaptive in the real world. As has been empirically demonstrated, too little, too much, or too inaccurate noise in a simulation creates nontransferable systems. Consequently, it has been necessary to obtain careful measurements from the robot in order to construct the simulator. The Khepera is particularly convenient for modeling due to its clean and simple design, but most robots do not share that property, and have already been found difficult to simulate even for the purposes of testing hand-crafted controllers. It would appear that the application of evolutionary methods to controller design only makes it more difficult to work in simulated domains and highlights this unsolved robotics problem.

Generality v. usefulness of simulations. As described above, the most successful simulations have been based on accurate physical measurements incorporated into the sensor and effector models, as well as into the fitness function. This not only makes the job of writing a simulation for a nontrivial robot very challenging, it also produces an extremely specialized tool that does not generalize to any other system. The investment of time required to construct very faithful simulations has proven to be largely prohibitive in robotics in general, where most researchers have instead employed quite coarse simulations that allow for some algorithmic testing, coupled with subsequent evaluation on a physical system. A similar, incremental approach has been applied to testing evolved controllers that required subsequent refinement on the real system. As the complexity of robotic systems grows and the gap between the simulation and the real system widens, the question of the value of investing in a specialized simulation will become increasingly important.

Brooks [6] suggests the application of interleaved off-line (simulated) evolution and on-line (physical) evaluation as a means of "connecting to reality". Most of the approaches we reviewed that transitioned from simulation to the real world and completed the evolutionary process there stopped at that level, rather than continuing to interleave by returning the system to simulation for evolving still higher-level behaviors. In an ideal scenario the interleaving could be performed automatically, without the need for human intervention. However, even in a simplified human-controlled form, such a continuous on-line/off-line development and evaluation process may be necessary for scaling up to more complex behaviors.
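The point about noise models can be illustrated with a sensor model whose error statistics come from measurements of the physical device rather than from guesswork; the Gaussian form and all names here are illustrative assumptions:

```python
import random

def make_sensor_model(mean_error, std_error):
    """Return a simulated sensor: true range plus noise whose mean and
    spread were estimated from calibration trials on the real sensor."""
    def read(true_range):
        return true_range + random.gauss(mean_error, std_error)
    return read
```

A model built with too little, too much, or wrongly shaped noise invites exactly the exploitation by the genetic algorithm described above, producing controllers that fail on the physical robot.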
Related approaches have been used by Floreano [13] and Harvey et al. [24], both on physical robots, to incrementally evolve, evaluate, and "freeze" behaviors before moving to the next more complex level.

3.3. Evaluation

Experimental repeatability. Difficulties with experimental repeatability, both at the level of evaluating individual genotypes and of replicating entire experiments, are inherent in robotics work, due to the noise in the sensors, effectors, and the environment resulting in a large variance across trials. Any stochastic components in the algorithms compound the problem. Simulation designers have the luxury of eliminating non-determinism in order to create repeatable trials [47], but the results are not necessarily relevant in the physical, nondeterministic environment.

The problems of variation and noise in evaluation are well known in the evolutionary computation community. This issue is particularly important in evolving architectures for autonomous agents. A recent paper by Aizawa and Wah [1] describes techniques for scheduling GAs in noisy environments: one technique allocates a given number of samples (i.e., fitness evaluations) among the members of the population in an adaptive fashion. In this method, different individuals in the population may get different numbers of evaluations: an individual may be evaluated repeatedly in order to form an accurate estimate of its underlying performance distribution. In cases where sampling fitness is costly (as is generally the case in evolving robot systems), this method can save time that would otherwise be wasted if too many samples were taken. The method also helps prevent too few samples from being taken, which could lead to inaccurate estimates of fitness and hence incorrect application of differential reproduction.

Behavior convergence. Determining when the desired behavior has been achieved on a physical robot is notoriously difficult. Consequently, much of the analysis is qualitative and based on human judgement. While human observers can apply reasonable phenomenological performance descriptions, relevant quantitative analysis is rare [11]. Average performance is difficult to establish as trials vary significantly, and due to the overhead of physical experiments as well as their variability, typically insufficient data are available for statistically significant analysis. Since this problem is endemic in physical system evaluation [38], it must also be addressed by the evolutionary robotics community.

3.4. Fitness function design

Complexity of design. The process of designing an evaluation function for behavior evaluation on a robot is delicate and laborious [40]. In the majority of cases, the necessary insights are gained through incremental augmentation over many trials in the environment. Unfortunately, most of this insight is lost, since typically only the best fitness function is reported but not the complex process that generated it.

Fitness function complexity. As with any learning and optimization technique, a genetic algorithm will exploit all of the available fitness features, some of which may be opaque to the designer. The work on evolving effective grasping behavior [42] has already demonstrated that, as more complex behaviors are evolved that involve the interaction of multiple goals and subgoals, the more complex the fitness function becomes. To date, designers have resorted to indirectly embedding all of the subgoals into the fitness function. While it may be effective, this approach eliminates the autonomy of evolution, and both fails to reduce the job of the designer and strongly biases the possible solutions.

Measurability of fitness parameters. Nolfi et al. [41] point out that the design of a fitness function in simulation, however complex, is easier than the same in the real world, where not all of the sensory information is readily available. For example, much of the navigation and homing work depends on the robot's position within the environment, which can be directly obtained in simulation, but not necessarily outside of one. Experiments have added special sensors for this purpose. While such solutions can be very elegant (e.g., see [41] for a clever example used in evolving object size discrimination), they are not general and do not scale to more complex robot behaviors.

3.5. Co-evolution

Co-evolution has been shown to be a powerful method for searching fitness landscapes [25,33].
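The adaptive sample-allocation idea of Aizawa and Wah [1], discussed under Section 3.3, can be sketched as follows; the rule of re-sampling whichever individual currently has the noisiest score is an illustrative stand-in for their scheduling techniques:

```python
import statistics

def evaluate_population(individuals, run_trial, base_trials, extra_budget):
    """Give every individual base_trials evaluations (at least two, so a
    spread can be computed), then spend extra_budget further trials on
    whichever estimate is currently the most uncertain."""
    scores = {ind: [run_trial(ind) for _ in range(base_trials)]
              for ind in individuals}
    for _ in range(extra_budget):
        noisiest = max(individuals,
                       key=lambda i: statistics.stdev(scores[i]))
        scores[noisiest].append(run_trial(noisiest))
    return {ind: statistics.fmean(s) for ind, s in scores.items()}
```

Individuals whose trials agree closely are left alone; the costly physical evaluations are concentrated where the fitness estimate is least trustworthy.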
It has been particularly successful in evolving robust cooperative and competitive behaviors in societies [33], but has not yet been applied to evolving controllers for physical robots; the co-evolutionary simulations discussed above [44,47] are not intended to result in controllers for real robots. However, as a method for accelerating evolution, co-evolution could be applied both in simulation and on a physical group of robots. A variant of the latter idea has been applied by Matarić [36] to achieve collective learning in a group of robots through a direct exchange of received reinforcement and learned information. The introduction of crossover and mutation would turn this approach into a readily testable method for on-line co-evolution, assuming the availability of the robots and inter-robot communication.

However, co-evolutionary systems can exhibit dynamics which run counter to those required of an engineering-design evolutionary system, where something approximating "continuous progress" is the desired dynamic. In co-evolutionary systems, cycles through genotype space are possible, and limited "genetic memory" can lead to an overall reduction in performance levels of the evolved controllers (see [10] for further details). Nevertheless, it may be possible to automatically detect such situations and take appropriate actions [10], in which case co-evolution promises to be an approach with considerable potential.

3.6. Genetic encodings

Using evolutionary techniques to generate robot control systems requires that parameters determining the nature of the controller are encoded in a manner suitable for use in a genetic algorithm. In most cases, this encoding is a string of characters: the "genotype" of the robot. In much evolutionary robotics research, the encodings are application-specific, and sometimes ad hoc; researchers develop an encoding scheme with features necessary for the task at hand. This is not always easy: in some cases, the refinement of an encoding scheme can represent a significant amount of work. Increasingly, the need for more general-purpose encoding schemes, applicable across a range of applications, is being recognized. Here we comment on some of the properties that may be required of such encoding schemes.

A primary requirement is that the encoding is robust with respect to the genetic operators employed: mutation of a fit individual should, on average, yield an individual of roughly the same fitness. Similarly, crossover between two parents of similar fitness should, on average, produce offspring with similar fitness.

In cases where a genetic algorithm (GA) is being used to tune or optimize a set of parameters influencing a pre-specified controller, this robustness requirement is typically the sole concern, and in many cases it does not present a serious difficulty. Yet in such cases a significant proportion of the design effort can be expended in creating the initial controller which is then configured by evolution. For example, if a GA is used to configure the weights and thresholds in a pre-specified neural network controller, then the designer needs to make a priori commitments concerning the number of units and their potential connectivity. It is possible that insufficiently many units are in the initial design, so that no satisfactory setting of the network parameters can be found. Some effort (either analytic or empirical trial-and-error) is required to determine the minimum network architecture. This issue could possibly be sidestepped by using a manifestly over-specified initial network architecture, i.e., one with many more units than are likely to be necessary, in the hope that the GA finds a solution involving a large number of zero weights, effectively deleting some of the units from the network. Yet such an approach could easily waste much time evaluating overly complex designs, and it would probably also be necessary to include terms in the evaluation function to encourage the use of fewer active units. Arriving at the correct balance between rewards for behavior and rewards for economy in controller design may not be easy.

A more appealing prospect is to have the controller, and possibly also the morphology, be (as far as is possible) entirely the product of an automatic evolutionary process, thereby minimizing the pre-commitment on important architectural issues. As was noted above, a number of researchers have studied the evolution of co-adapted morphology and control (as advocated by Brooks [6]), but only in simulation. The advantages of such co-adaptation indicate that any truly general encoding scheme should smoothly integrate specifications for both controller and morphological features. The encoding used by Sims [47] is a promising start, but has two aspects which require attention: first, the simulations were not intended as models of real robots - building a working physical version of one of Sims' evolved agents is probably beyond the limits of current robot technology; second, Sims' description of the mutation and recombination processes involved in breeding new agents makes it clear that these processes are significantly more complex than the flipping of bits or the splicing and concatenation of bit-strings that are more commonly used in GAs. This increase in the procedural complexity of the genetic operators is not a welcome prospect, but it may be unavoidable.

The encoding scheme should allow for a wide variety of possible architectures. The widest possible range is given by allowing the length of the genotypes to be variable, as in Koza's genetic programming. Harvey [20-23] has developed the species adaptation genetic algorithm (SAGA) explicitly for dealing with variable-length genotypes. The attraction of SAGA is that (assuming that the length of the genotype is in reasonable correspondence with the complexity of the resultant phenotype), if no upper limit is put on genotype length, then in principle it should be possible to start with short, simple genotypes which produce elementary behaviors sufficient for evolution to operate on, and then the length can increase until it encodes a sufficiently complex design. In this sense, to use fixed-length genotypes is to presuppose the dimensionality of the space of possible solutions explored by the genetic search, and this presupposition might be wrong. As with mutation and crossover, variations in length should be achievable without, on average, significant impact on fitness. To this end, Harvey [21] developed a specialized crossover procedure for variable-length genotypes.

In many autonomous agent applications, it is highly desirable to have some degree of symmetry, both in morphology and in the responses of the controller [6]. Using n-fold symmetry can, in principle, reduce the length of the genotype by a factor of n: for instance, with twofold (e.g., bilateral) symmetry, the left-hand side of the controller/morphology can be specified on the genome, while the right-hand side can be generated by reflection in the appropriate axial plane. Nevertheless, some mechanism for symmetry breaking is also likely to be required, as the responses of symmetric controllers can have weaknesses such as singularities on the plane of reflection.

In order that complex controller architectures can be encoded in compact genotypes, it is also highly desirable that the encoding scheme allow for repeated substructures: this need is particularly acute in evolving parallel distributed processing architectures for visually guided agents, where a particular arrangement of processing units may need to be repeated over a raster of visual input units (e.g., to achieve the same effect as convolving the input image with a difference-of-gaussians mask). Gruau [19] has developed a modular encoding scheme for neural networks, inspired by Koza's work on gene-splicing for automatically defined functions (ADFs) [31]. In Gruau's scheme, the genotype specifies a sequence of graph-rewrite operations which are applied to an initial graph consisting of a single neuron: many of the rewrite operators generate one or more new neurons via a process of "cell-division". The rewrite sequences have a binary-tree structure, with the two subtrees from a branch-node specifying sequences of rewrite rules to be applied to the two daughter networks resulting from application of a division operator. Using Koza's gene-splicing technique, the genotype is divided into a number of separate trees ordered in a hierarchy: terminals in a tree can be "subroutine" calls to trees lower in the hierarchy, allowing for modularity and repeated structures. Gruau developed this scheme to satisfy the following criteria, which he argues should be met by any modular neural network encoding scheme:

Completeness: any network can be encoded.
Compactness: encodings should be of minimal size.
Closure: any genotype should produce a meaningful network.
Modularity: the code for the network should be formed from code for subnetworks, with the possibility of module re-use, or recursion.
Scalability: the length of the encoding should only weakly reflect the complexity of the network: a fixed-length encoding should allow for a variety of networks.
Expressive power: the encoding should be capable of expressing the network connectivity, the weights, the "learning" method, and so on.

Gruau's encoding scheme exhibits these properties, and has been demonstrated to work on a number of problems, including the control of a simulated hexapod walking robot. Further empirical experience with this and other encoding schemes is required before their generality can be fully determined.
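A toy illustration of the graph-rewriting idea behind Gruau's scheme: the genotype is a binary tree of operations grown from a single cell. Only a parallel-division operator and a stop operator are shown here; real cellular encoding additionally handles links, weights, sequential division, and recursion, so this is a sketch of the control flow only:

```python
def develop(op, graph=None, cell=0):
    """Grow a network from one cell by following a program tree.
    op is ("end",) or ("par", mother_program, daughter_program)."""
    if graph is None:
        graph = {"cells": [0], "next_id": 1}
    if op[0] == "end":                    # this cell stops developing
        return graph
    if op[0] == "par":                    # parallel cell division
        daughter = graph["next_id"]
        graph["next_id"] += 1
        graph["cells"].append(daughter)
        develop(op[1], graph, cell)       # program for the mother cell
        develop(op[2], graph, daughter)   # program for the daughter cell
        return graph
    raise ValueError("unknown operator: %r" % (op,))
```

Each ("par", ..., ...) node adds one cell, so a compact genotype can unfold into a large, regularly structured network, which is precisely the property motivating repeated substructures above.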
3.7. Combinatorics of evaluation

To evaluate the fitness of an individual genotype, it is common for the individual to be subjected to a number of trials, with the fitness being calculated from a summary statistic such as the mean score. There are two primary motivations for basing fitness on the results of multiple tests: the need to counter the effects of nondeterminism (discussed previously in this paper) and the need to control for the effects of free parameters in the evaluation process. The free parameters may be, for example, the arrangement of obstacles in an environment, or the ambient lighting conditions, or the initial conditions (position and orientation) of the robot and any other agents it may need to interact with. In many applications, the setting of these free parameters may have a significant effect on the subsequently observed behavior. If a small fixed set of parameter settings is employed, there is the danger that the evolutionary process opportunistically exploits this regularity, over-fitting to the test set and failing to produce the desired behaviors in other conditions. Attempting to avoid this problem by assigning random values to the free parameters on each trial introduces a new source of stochastic variation: two identical controllers may record differing fitnesses because of chance variations in the initial conditions to which they were subject during evaluation; worse still, a good genotype may be incorrectly rated as less fit than a poorer one.

An alternative is to have sufficiently many trials with different settings of the free parameters to ensure that the final evolved system is robust with respect to variation in these parameters. But there may be no method of determining a priori how many trials is sufficient, so a period of trial-and-error experimentation may be required. For any single free parameter p, some number np of trials should be conducted with different values of p. We will ignore here the potentially problematic issue of deciding how the np trials should be distributed through the range of possible values of p.

Now assume that there are f free parameters, and that the possibility of interactions between the settings of different parameters means it is not sufficient if each parameter is varied in turn while keeping the others constant. For simplicity's sake, assume that the same number np of different value-settings is used for each parameter. Then there will be np^f trials spread throughout the f-dimensional space of possible parameter values. This number can grow prohibitively quickly.

For example, suppose there are five free parameters (f = 5), each of which is tested with four different values (np = 4), and suppose that each trial takes 15 s; use a reasonable population size of 100, and evolve for 100 generations. While all these parameters are reasonable, the result is not: there will be 4^5 = 1024 trials to control for the effects of variations in initial conditions. With each trial lasting 15 s, evaluating a single genotype will take about 4.25 h. With a population of size 100, a single generation would require about 2.5 weeks to evaluate, and the 100th generation will finish in roughly five years. This is true regardless of the nature of the trial: it could be 15 s of monitoring a noisy physical robot, or 15 s of deterministic simulation.

Of course, this is an exaggerated example: there are established statistical techniques for experiment design which offer principled methods for reducing the number of evaluations required, and often problem-specific heuristics can be introduced to cut short an individual trial, or to terminate the evaluation of a particular genotype. But there is a further issue in evolving autonomous mobile robots which seems to create an inescapable difficulty: there are, we presume, many classes of desirable behavior where it is not possible to evaluate an individual in a few seconds. Furthermore, some behaviors may, by their nature, not be evaluable until the end of the trial, thus preventing premature termination as a time-saving measure. A population of size 100, evolving for 100 generations, at the bare minimum of one trial per individual, will last 10000 times the duration of a single trial. If the trial lasts 1 min, the experiment takes a week, and so on. In the limit, evolving controllers to produce behaviors of the type "do this behavior and keep doing it for as long as possible" is likely to be fraught with difficulties: as the controllers get better, the duration of evaluating an individual will increase, and the rate of evolutionary search will slow down. While some kind of macro-parallel architecture could be employed to factor the
the parameters cannot be ruled out, so each free param- population-size out of the equations (e.g., if working in
M. Matarid, D. Chliff/Robotics and Autonomous Systems 19 (1996) 67-83 81

simulation, use one simulation per individual, spread ing the evolution of a hierarchical modular network
across as many workstations as there are individuals structure, can be developed to work successfully using
in the population), evolving for temporally extended the small CTRNN building blocks indetified by Beer,
tasks is an open problem. then evolving genuinely complex and challenging
controllers for real robots may be a possibility.
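The arithmetic of this worked example can be checked with a short back-of-the-envelope calculation. The sketch below is ours, not part of any system discussed here, and the function and parameter names are hypothetical:

```python
# Back-of-the-envelope cost of exhaustively evaluating fitness over
# all combinations of free-parameter settings (hypothetical sketch).

def evolution_wall_clock_s(f, n_p, trial_s, pop_size, generations):
    """Seconds of evaluation time when each genotype is tested on
    every combination of f free parameters, n_p values each."""
    trials_per_genotype = n_p ** f               # 4**5 = 1024 trials
    genotype_s = trials_per_genotype * trial_s   # ~4.25 h at 15 s/trial
    generation_s = genotype_s * pop_size         # ~2.5 weeks for 100
    return generation_s * generations

total_s = evolution_wall_clock_s(f=5, n_p=4, trial_s=15,
                                 pop_size=100, generations=100)
print(total_s / (3600 * 24 * 365))  # roughly 4.9 years
```

Note that reducing the trial length or the population size only scales the total linearly; the exponential term n_p ** f dominates, which is why the statistical experiment-design techniques mentioned above matter.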
3.8. Evolving from higher-level primitives

Koza's extension of genetic programming to use
gene-splicing and automatically defined functions
(ADFs), where the building blocks of evolution at
one level are themselves evolving at a lower level,
is certainly an appealing and apparently powerful
development. But we are cautious about the use of
Lisp S-expressions for robot control. The expressive
granularity of Lisp could introduce further problems
if due care is not exercised. Evolving with a restricted
set of primitives may limit the design space, possibly
to such an extent that the evolution of appropriate
controllers is a theoretical impossibility (i.e., there is
no possible configuration of the available primitives
that produces an appropriate controller). On the other
hand, evolving with larger sets of primitives intro-
duces the danger that the design space is so large that
successful evolution becomes a practical impossibil-
ity (i.e., an appropriate configuration of primitives is
extremely unlikely to be found). The latter problem
is more likely if there is considerable epistatic inter-
action between the primitives, which would seem to
be an increased possibility as the primitives become
more complex.

Neural networks are widely acknowledged to ex-
hibit tolerance in the presence of noise, and graceful
degradation with respect to component failure: these
features indicate that the fitness landscapes for neural
network controllers are likely to be smooth (i.e., show
relatively little epistasis). Furthermore, as Beer [2]
notes, a particular class of continuous-time recurrent
neural networks (CTRNNs) has the added attraction
of being universal dynamics approximators, i.e., the
trajectory of any smooth dynamical system can be ap-
proximated by such networks [16]. Beer's analysis of
the dynamics of small CTRNN circuits [2] indicates
that particular simple circuits, of one or two neurons,
could constitute very useful building blocks for evolv-
ing larger CTRNNs with rich intrinsic dynamics.

Nevertheless, Koza's ADF concept can still be
employed, in the manner demonstrated by Gruau. If
Gruau's techniques, or some similar approach involv-
ing the evolution of a hierarchical modular network
structure, can be developed to work successfully using
the small CTRNN building blocks identified by Beer,
then evolving genuinely complex and challenging
controllers for real robots may be a possibility.

4. Summary

The work we have reviewed indicates that evo-
lutionary techniques have some promise for the
automatic synthesis of both robot control systems
and the physical morphology of robots. Such work
demonstrates, by providing existence proofs, that
evolutionary techniques can in principle reduce the
human effort required to configure or design robot
systems.

But for a real reduction in human effort, the effort
expended in designing or configuring the evolutionary
system should be less than that required to manually
design or configure the robot controllers that it pro-
duces. In general, so far this has not been the case; the
behaviors produced by current evolved controllers are,
on the whole, relatively simple, and could have been
designed by hand with the same or a lesser amount of
effort. At this early stage, this is not necessarily cause
for alarm: the matter of interest is not what the con-
trollers make the robots do, but how the controllers
came to be. Yet to develop evolutionary techniques to
a level where they can seriously be considered for use
in designing robots, there are several challenges to be
overcome, and some critical questions to be addressed.
If the challenges can be successfully addressed, the
use of evolutionary techniques may become a viable
alternative to manual design.

References

[1] A.N. Aizawa and B.W. Wah, Scheduling of genetic algorithms in a noisy environment, Evolutionary Computation 2 (2) (1994) 97-122.
[2] R.D. Beer, On the dynamics of small continuous-time recurrent neural networks, Adaptive Behavior 3 (4) (1995) 471-511.
[3] R.D. Beer and J.C. Gallagher, Evolving dynamical neural networks for adaptive behavior, Adaptive Behavior 1 (1) (1991) 91-122.
[4] R.A. Brooks, A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation RA-2 (1986) 14-23.
[5] R.A. Brooks, Intelligence without reason, in: Proc. IJCAI-91 (1991).
[6] R.A. Brooks, Artificial life and real robots, in: Toward a Practice of Autonomous Systems: Proc. 1st European Conf. on Artificial Life (MIT Press, Cambridge, MA, 1992) 3-10.
[7] D. Cliff, I. Harvey and P. Husbands, Evolved recurrent dynamical networks use noise, in: S. Gielen and B. Kappen, eds., Proc. Int. Conf. on Artificial Neural Networks (Springer, Berlin, 1993) 285-288.
[8] D. Cliff, I. Harvey and P. Husbands, Explorations in evolutionary robotics, Adaptive Behavior 2 (1) (1993) 71-108.
[9] D. Cliff, P. Husbands and I. Harvey, Analysis of evolved sensory motor controllers, Technical Report CSRP 264, School of Cognitive and Computing Sciences, University of Sussex. Presented at: 2nd European Conf. on Artificial Life (1992), unpublished proceedings.
[10] D. Cliff and G.F. Miller, Tracking the red queen: Measurements of adaptive progress in co-evolutionary simulations, in: F. Morán, A. Moreno, J.J. Merelo and P. Chacón, eds., Advances in Artificial Life: Proc. 3rd Int. Conf. on Artificial Life (Springer, Berlin, 1995) 200-218.
[11] M. Colombetti and M. Dorigo, Learning to control an autonomous robot by distributed genetic algorithms, in: J.-A. Meyer, H.L. Roitblat and S.W. Wilson, eds., Proc. Simulation of Adaptive Behavior (MIT Press, Cambridge, MA, 1993) 305-311.
[12] M. Colombetti and M. Dorigo, Training agents to perform sequential behavior, Adaptive Behavior 2 (3) (1994) 247-276.
[13] D. Floreano, Patterns of interactions in shared environments, in: Toward a Practice of Autonomous Systems: Proc. 1st European Conf. on Artificial Life (1993) 347-366.
[14] D. Floreano and F. Mondada, Automatic creation of an autonomous agent: Genetic evolution of a neural-network driven robot, in: Simulation of Adaptive Behavior (1994) 421-430.
[15] D. Floreano and F. Mondada, Evolution of homing navigation in a real mobile robot, IEEE Transactions on Systems, Man, and Cybernetics (1996).
[16] K. Funahashi and Y. Nakamura, Approximation of dynamical systems by continuous time recurrent neural networks, Neural Networks 6 (1993) 801-806.
[17] J.C. Gallagher and R.D. Beer, Application of evolved locomotion controllers to a hexapod robot, Technical Report CES-94-7, Department of Computer Engineering and Science, Case Western Reserve University, 1994.
[18] J. Grefenstette and A. Schultz, An evolutionary approach to learning in robots, in: Proc. Machine Learning Workshop on Robot Learning, New Brunswick, NJ (1994).
[19] F. Gruau, Automatic definition of modular neural networks, Adaptive Behavior 3 (2) (1994) 151-183.
[20] I. Harvey, The artificial evolution of behaviour, in: J.-A. Meyer and S.W. Wilson, eds., From Animals to Animats: Proc. 1st Int. Conf. on the Simulation of Adaptive Behavior (MIT Press, Cambridge, MA, 1990).
[21] I. Harvey, The SAGA cross: the mechanics of crossover for variable-length genetic algorithms, in: R. Männer and B. Manderick, eds., Parallel Problem Solving from Nature 2 (North-Holland, Amsterdam, 1992) 269-278. Also available as University of Sussex School of Cognitive and Computing Sciences Technical Report CSRP223.
[22] I. Harvey, Species adaptation genetic algorithms: A basis for a continuing SAGA, in: F. Varela and P. Bourgine, eds., Towards a Practice of Autonomous Systems: Proc. 1st European Conf. on Artificial Life (MIT Press, Cambridge, MA, 1992) 346-354. Also available as University of Sussex School of Cognitive and Computing Sciences Technical Report CSRP221.
[23] I. Harvey, Evolutionary robotics and SAGA: the case for hill crawling and tournament selection, in: C. Langton, ed., Artificial Life III Proc., Santa Fe Institute Studies in the Sciences of Complexity, Proc. Vol. XVI (Addison-Wesley, Reading, MA, 1993). Also available as University of Sussex School of Cognitive and Computing Sciences Technical Report CSRP222, 1992.
[24] I. Harvey, P. Husbands and D. Cliff, Seeing the light: Artificial evolution, real vision, in: D. Cliff, P. Husbands, J.-A. Meyer and S.W. Wilson, eds., From Animals to Animats 3: Proc. 3rd Int. Conf. on Simulation of Adaptive Behavior (MIT Press, Cambridge, MA, 1994) 392-401.
[25] W.D. Hillis, Co-evolving parasites improve simulated evolution as an optimization procedure, Physica D 42 (1990) 228-234.
[26] P. Husbands, I. Harvey and D. Cliff, Circle in the round: state space attractors for evolved sighted robots, Robotics and Autonomous Systems 15 (1995) 83-106.
[27] N. Jakobi, Evolving sensorimotor control architectures in simulation for a real robot, Master's thesis, University of Sussex School of Cognitive and Computing Sciences, 1994 (unpublished).
[28] N. Jakobi, P. Husbands and I. Harvey, Noise and the reality gap: the use of simulation in evolutionary robotics, in: F. Morán, A. Moreno, J.J. Merelo and P. Chacón, eds., Advances in Artificial Life: Proc. 3rd Int. Conf. on Artificial Life (Springer, Berlin, 1995) 704-720.
[29] J.R. Koza, Evolution of subsumption using genetic programming, in: F.J. Varela and P. Bourgine, eds., Proc. 1st European Conf. on Artificial Life (MIT Press, Cambridge, MA, 1990) 110-119.
[30] J.R. Koza, Genetic Programming (MIT Press, Cambridge, MA, 1992).
[31] J.R. Koza, Genetic Programming II: Automatic Discovery of Reusable Programs (MIT Press, Cambridge, MA, 1994).
[32] J.R. Koza and J.P. Rice, Automatic programming of robots using genetic programming, in: Proc. AAAI-92 (MIT Press, Cambridge, MA, 1992) 194-201.
[33] H.H. Lund, Specialization under social conditions in shared environments, in: Proc. Advances in Artificial Life, 3rd European Conf. on Artificial Life (1995) 477-489.
[34] S. Mahadevan and J. Connell, Automatic programming of behavior-based robots using reinforcement learning, in: Proc. AAAI-91, Pittsburgh, PA (1991) 8-14.
[35] M.J. Matarić, Integration of representation into goal-driven behavior-based robots, IEEE Transactions on Robotics and Automation 8 (3) (1992) 304-312.
[36] M.J. Matarić, Learning to behave socially, in: D. Cliff, P. Husbands, J.-A. Meyer and S. Wilson, eds., From Animals to Animats: Int. Conf. on Simulation of Adaptive Behavior (1994) 453-462.
[37] M.J. Matarić, Reward functions for accelerated learning, in: W.W. Cohen and H. Hirsh, eds., Proc. 11th Int. Conf. on Machine Learning (Morgan Kaufmann, New Brunswick, NJ, 1994) 181-189.
[38] M.J. Matarić, Evaluation of learning performance of situated embodied agents, in: Proc. Advances in Artificial Life, 3rd European Conf. on Artificial Life (1995) 579-589.
[39] O. Miglino, H.H. Lund and S. Nolfi, Evolving mobile robots in simulated and real environments, Technical Report 95-04, Institute of Psychology, C.N.R., Rome, 1995.
[40] F. Mondada and D. Floreano, Evolution of neural control structures: some experiments on mobile robots, Robotics and Autonomous Systems (1996).
[41] S. Nolfi, D. Floreano, O. Miglino and F. Mondada, How to evolve autonomous robots: different approaches in evolutionary robotics, in: Proc. Artificial Life IV (1994) 190-197.
[42] S. Nolfi and D. Parisi, Evolving non-trivial behaviors on real robots: an autonomous robot that picks up objects, in: Proc. 4th Congress of the Italian Association for Artificial Intelligence (Springer, Berlin, 1995).
[43] C. Reynolds, An evolved, vision-based behavioral model of coordinated group motion, in: J.-A. Meyer, H. Roitblat and S. Wilson, eds., Proc. 2nd Int. Conf. on Simulation of Adaptive Behaviour (SAB92) (MIT Press, Cambridge, MA, 1993) 384-393.
[44] C. Reynolds, Competition, coevolution, and the game of tag, in: R. Brooks and P. Maes, eds., Artificial Life IV (MIT Press, Cambridge, MA, 1994) 59-69.
[45] C. Reynolds, Evolution of corridor following behavior in a noisy world, in: D. Cliff, P. Husbands, J.-A. Meyer and S.W. Wilson, eds., From Animals to Animats 3: Proc. 3rd Int. Conf. on Simulation of Adaptive Behavior (MIT Press, Cambridge, MA, 1994) 402-410.
[46] A.C. Schultz, Using a genetic algorithm to learn strategies for collision avoidance and local navigation, in: Proc. 7th Int. Symp. on Unmanned Untethered Submersible Technology, Durham, NH (1991) 213-225.
[47] K. Sims, Evolving 3D morphology and behavior by competition, in: Proc. Artificial Life IV (MIT Press, Cambridge, MA, 1994) 28-39.
[48] A. Thompson, Evolving electronic robot controllers that exploit hardware resources, in: F. Morán, A. Moreno, J.J. Merelo and P. Chacón, eds., Advances in Artificial Life: Proc. 3rd Int. Conf. on Artificial Life (Springer, Berlin, 1995) 640-656.
[49] B.M. Yamauchi and R.D. Beer, Integrating reactive, sequential, and learning behavior using dynamical neural networks, in: D. Cliff, P. Husbands, J.-A. Meyer and S.W. Wilson, eds., From Animals to Animats 3: Proc. 3rd Int. Conf. on Simulation of Adaptive Behavior (MIT Press, Cambridge, MA, 1994) 382-391.

Maja J. Matarić is an assistant professor in the Computer Science Department and the Volen Center for Complex Systems at Brandeis University. She received a Ph.D. in Computer Science and Artificial Intelligence from MIT in 1994. She has worked at NASA's Jet Propulsion Lab, the Free University of Brussels AI Lab, LEGO Cambridge Research Labs, GTE Research Labs, and the Swedish Institute of Computer Science. Her Interaction Laboratory conducts research on the dynamics of interaction in complex adaptive systems, multi-agent systems, control and learning in intelligent agents, and cognitive neuroscience modeling of visuo-motor skill learning.

Dave Cliff was born in 1966. He has a B.Sc. in Computer Science from the University of Leeds, and M.A. and D.Phil. degrees in Cognitive Science from the University of Sussex. He was a founder member (with I. Harvey and P. Husbands) of the Sussex Evolutionary Robotics Research Group. His research interests are primarily in parallel distributed processing for visual control of action in autonomous agents: both animals and artificial creatures. Dr. Cliff is currently a Lecturer in Computer Science and Artificial Intelligence at the University of Sussex, and an associate faculty member of the Sussex Center for Neuroscience.