Professional Documents
Culture Documents
Adversarial Search
Game playing
Perfect play
The minimax algorithm
alpha-beta pruning
Resource limitations
Elements of chance
Imperfect information
Type of games
Type of games
Game Playing
Many different kinds of games!
Axes:
Deterministic or stochastic?
One, two, or more players?
Perfect information (can you see the state)?
Turn taking or simultaneous action?
Deterministic Games
Deterministic, single player,
perfect information:
Deterministic Two-Player
E.g. tic-tac-toe, chess, checkers
Zero-sum games
One player maximizes result
The other minimizes result
Minimax search
Plan of attack:
Computer considers possible lines of play (Babbage, 1846)
Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon,
1950)
First chess program (Turing, 1951)
Machine learning to improve evaluation accuracy (Samuel, 1952- 57)
Pruning to allow deeper search (McCarthy, 1956)
Chess:
1.
2.
Two-player games
A game formulated as a search problem
Initial state:
Operators:
Terminal state:
Utility function:
of the
Example: Tic-Tac-Toe
13
Steps 2 and 3 in the algorithm assume that the opponent will play
perfectly.
x o
o x
x
o
x
o
1 move
x o
o x
x
o
x
o
A sub-tree
x o x
ox
o
x o x
x ox
o
x o x
x ox
o
o
x o x
x ox
o x o
x o x
x ox
oo
x ox
ox
x
o
x ox
o ox
x
o
x ox
o ox
x xo
win
lose
draw
x o x
ox
x o
x ox
ox
x oo
x o x
o ox
x o
x o x
ox
o xo
x o x
o ox
x x o
x o x
x ox
o x o
20
x o x
x ox
oo
x ox
ox
x
o
x ox
o ox
x
o
x ox
o ox
x xo
win
lose
draw
x o x
ox
x o
x ox
ox
x oo
x o x
o ox
x o
x o x
ox
o xo
x o x
o ox
x x o
x o x
x ox
o x o
20
MiniMax Example
MAX
MIN
21
22
Minimax Properties
Optimal against a perfect player. Otherwise?
Time
complexity?
max
O(b )
m
Space
complexity?
O(bm)
For
min
10
10
100
Resource Limits
max
Depth-limited search
Instead, search a limited depth of tree
-2
min
-1
-2
min
9
Example:
Suppose we have 100 seconds, can explore 10K nodes / sec
So can check 1M nodes per move
reaches about depth 8 decent chess program
Evaluation Functions
Function which scores non-terminals
Ideal
In
e.g.
Evaluation Functions
Opponent
m
If
> v then MAX will chose m so
prune tree under n
Similar for
Opponent
Player
for MIN
28
- pruning: example 1
MAX
[-,+]
MIN
30
- pruning: example 1
[-,+]
MAX
MIN
30
- pruning: example 1
[-,+]
MAX
MIN
[-,3]
31
- pruning: example 1
[-,+]
MAX
MIN
[-,3]
12
32
- pruning: example 1
[-,+]
[3,+]
MAX
MIN
[-,3]
12
33
- pruning: example 1
[-,+]
[3,+]
MAX
MIN
[3,2]
[-,3]
12
34
- pruning: example 1
[-,+]
[3,+]
MAX
MIN
[3,2]
[-,3]
12
[3,14]
14
35
- pruning: example 1
[-,+]
[3,+]
MAX
MIN
[3,2]
[-,3]
12
[3,14] [3,5]
14
36
- pruning: example 1
[-,+]
[3,+]
MAX
MIN
[3,2]
[-,3]
12
14
37
- pruning: example 1
[-,+]
[3,+]
MAX
Selected move
MIN
[3,2]
[-,3]
12
14
38
- pruning: example 2
MAX
[-,+]
MIN
40
- pruning: example 2
MAX
[-,+]
[-,2]
MIN
40
- pruning: example 2
MAX
[-,+]
[-,2]
MIN
41
- pruning: example 2
MAX
[-,+]
[2,+]
[-,2]
MIN
14
42
- pruning: example 2
MAX
[-,+]
[2,+]
MIN
[2,5]
[-,2]
14
43
- pruning: example 2
MAX
[-,+]
[2,+]
MIN
[2,5]
[2,1]
[-,2]
14
44
- pruning: example 2
[-,+]
[2,+]
MAX
MIN
[2,8]
[-,2]
[2,5]
[2,1]
14
45
- pruning: example 2
[-,+]
[2,+]
MAX
MIN
[2,8]
12
[-,2]
[2,5]
[2,1]
14
46
- pruning: example 2
[-,+] [2,+] [3,
+]
MAX
MIN
[2,8]
[2,3]
12
[-,2]
[2,5]
[2,1]
14
47
- pruning: example 2
[-,+] [2,+] [3,
+]
MAX
Selected move
MIN
[2,8]
[2,3]
12
[-,2]
[2,5]
[2,1]
14
48
- pruning: example 3
MIN
MAX
[6, ]
MIN
50
- pruning: example 3
[-,6]
MIN
MAX
[6, ]
[-,6]
MIN
50
- pruning: example 3
[-,6]
MIN
MAX
[6, ]
MIN
[-,14]
[-,6]
14
CS561 - Lecture 7-8 - Macskassy - Spring 2011
51
- pruning: example 3
[-,6]
MIN
MAX
[6, ]
MIN
[-,6]
[-,14][-,5]
14
52
- pruning: example 3
[-,6]
MIN
MAX
[6, ]
MIN
[5,6]
[-,14][-,5]
14
53
- pruning: example 3
[-,6]
MIN
MAX
[5,6]
[6, ]
MIN
[-,14][-,5]
14
[5,1]
54
- pruning: example 3
[-,6]
MIN
MAX
[5,6]
[6, ]
MIN
[-,14][-,5]
14
[5,1]
[5,4]
4
55
- pruning: example 3
[-,6] [-,5]
MIN
MAX
[5,6]
[6, ]
MIN
[-,14][-,5]
14
[5,1]
[5,4]
4
56
- pruning: example 3
[-,6] [-,5]
MIN
Selected move
MAX
[5,6]
[6, ]
MIN
[-,14][-,5]
14
[5,1]
[5,4]
4
57
- pruning: example 4
MIN
MAX
[6, ]
MIN
59
- pruning: example 4
[-,6]
MIN
MAX
[6, ]
[-,6]
MIN
60
- pruning: example 4
[-,6]
MIN
MAX
[6, ]
MIN
[-,14]
[-,6]
14
CS561 - Lecture 7-8 - Macskassy - Spring 2011
60
- pruning: example 4
[-,6]
MIN
MAX
[6, ]
MIN
[-,6]
[-,14][-,7]
14
61
- pruning: example 4
[-,6]
MIN
MAX
[6, ]
MIN
[7,6]
[-,14][-,7]
14
62
- pruning: example 4
[-,6]
MIN
Selected move
MAX
[6, ]
MIN
[7,6]
[-,14][-,7]
14
63
Opponent
m
If
> v then MAX will chose m so
prune tree under n
Similar for
Opponent
Player
for MIN
64
The algorithm
Properties algorithm
Resource limits
Standard
approach:
Use CUTOFF-TEST instead of TERMINAL-TEST
e.g., depth limit (perhaps add quiescence search)
Suppose
Evaluation Functions
Function which scores non-terminals
Ideal
In
e.g.
STOCHASTIC GAMES
Dice
As
1.2
of
reaching a given node shrinks
So value of lookahead is
diminished
So limiting depth is less
damaging
pruning is much less
effective
CS561 - Lecture 7-8 - Macskassy - Spring 2011
TDGammon
70
Expectiminimax
Example
Four-card bridge/whist/hearts hand, Max to play first
Commonsense example
Proper analysis
* Intuition that the value of an action is the average of
its values in all actual states is WRONG
With partial observability, value of an action depends on
the information state or belief state the agent is in
Can generate and search a tree of information states
Leads to rational behaviors such as
Acting to obtain information
Signalling to one's partner
Acting randomly to minimize information disclosure