
Adversarial Search

Adversarial Search
Game playing
Perfect play
The minimax algorithm
Alpha-beta pruning

Resource limitations
Elements of chance
Imperfect information

Game Playing State-of-the-Art


Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley
in 1994. Used an endgame database defining perfect play for all positions involving
8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is
now solved!
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game
match in 1997. Deep Blue examined 200 million positions per second, used very
sophisticated evaluation and undisclosed methods for extending some lines of
search up to 40 ply. Current programs are even better, if less historic.
Othello: Human champions refuse to compete against computers, which are too
good.
Go: Human champions are beginning to be challenged by machines, though the
best humans still beat the best machines. In go, b > 300, so most programs use
pattern knowledge bases to suggest plausible moves, along with aggressive
pruning.
Pacman: unknown

What kind of games?


Abstraction: to describe a game we must capture every relevant aspect of the game, e.g.:
Chess
Tic-tac-toe

Accessible environments: such games are characterized by perfect information
Search: game playing then consists of a search through possible game positions
Unpredictable opponent: introduces uncertainty, thus game playing must deal with contingency problems

Type of games

Game Playing
Many different kinds of games!
Axes:

Deterministic or stochastic?
One, two, or more players?
Perfect information (can you see the state)?
Turn taking or simultaneous action?

Want algorithms for calculating a strategy (policy) which recommends a move in each state

Deterministic Games
Deterministic, single player, perfect information:
Know the rules
Know what actions do
Know when you win
E.g. Freecell, 8-Puzzle, Rubik's Cube

it's just search!


Slight reinterpretation:
Each node stores a value: the best outcome it can reach
This is the maximal outcome of its children (the max value)
Note that we don't have path sums as before (utilities at end)

After search, can pick the move that leads to the best node
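A minimal sketch of this single-agent reinterpretation, assuming a hypothetical game object with is_terminal, utility, actions, and result methods (these names are illustrative, not from the slides):

```python
def max_value(game, state):
    """Value of a state: its utility if terminal, otherwise the best
    value reachable among its children (the max of the child values)."""
    if game.is_terminal(state):
        return game.utility(state)
    return max(max_value(game, game.result(state, a))
               for a in game.actions(state))

def best_move(game, state):
    """After search, pick the move that leads to the best child node."""
    return max(game.actions(state),
               key=lambda a: max_value(game, game.result(state, a)))
```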

Deterministic Two-Player
E.g. tic-tac-toe, chess, checkers
Zero-sum games
One player maximizes result
The other minimizes result

Minimax search

A state-space search tree
Players alternate turns
Each layer, or ply, consists of a round of moves
Choose the move to the position with the highest minimax value = best achievable utility against best play

Games vs. search problems

Unpredictable" opponent solution is a strategy specifying a move for every


possible opponent reply

Time limits unlikely to find goal, must approximate

Plan of attack:
Computer considers possible lines of play (Babbage, 1846)
Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon,
1950)
First chess program (Turing, 1951)
Machine learning to improve evaluation accuracy (Samuel, 1952-57)
Pruning to allow deeper search (McCarthy, 1956)

Searching for the next move

Complexity: many games have a huge search space

Chess:
b ≈ 35, m ≈ 100, so the number of nodes ≈ 35^100
if each node takes about 1 ns to explore, then each move will take about 10^50 millennia to calculate

Resource (e.g., time, memory) limits: the optimal solution is not feasible, thus we must approximate

1. Pruning: makes the search more efficient by discarding portions of the search tree that cannot improve the quality of the result.
2. Evaluation functions: heuristics to evaluate the utility of a state without exhaustive search.

Two-player games
A game formulated as a search problem

Initial state: board position and whose turn it is
Operators: definition of legal moves
Terminal state: conditions for when the game is over
Utility function: a numeric value that describes the outcome of the game, e.g., -1, 0, +1 for loss, draw, win (AKA payoff function)
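To make this formulation concrete, the sketch below defines a minimal game interface; the class and method names are illustrative assumptions, not from the lecture:

```python
from abc import ABC, abstractmethod

class Game(ABC):
    """A two-player, turn-taking game formulated as a search problem."""

    @abstractmethod
    def initial_state(self):
        """Board position and whose turn it is."""

    @abstractmethod
    def actions(self, state):
        """Legal moves (operators) available in this state."""

    @abstractmethod
    def result(self, state, action):
        """State reached by applying a legal move."""

    @abstractmethod
    def is_terminal(self, state):
        """Conditions for when the game is over."""

    @abstractmethod
    def utility(self, state, player):
        """Numeric outcome for player, e.g. -1, 0, +1 for loss, draw, win."""
```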

Example: Tic-Tac-Toe


The minimax algorithm

Perfect play for deterministic environments with perfect information


Basic idea: choose move with highest minimax value

= best achievable payoff against best play


Algorithm:

1. Generate the game tree completely
2. Determine the utility of each terminal state
3. Propagate the utility values upward in the tree by applying MIN and MAX operators on the nodes in the current level
4. At the root node use the minimax decision to select the move with the max (of the min) utility value

Steps 2 and 3 in the algorithm assume that the opponent will play
perfectly.

Generate Game Tree

(figure: the tic-tac-toe game tree is generated level by level from the current position; 1 ply = 1 move by one player)
A sub-tree

(figure: a sub-tree of the tic-tac-toe game tree, expanded down to terminal positions labeled win, lose, or draw)

What is a good move?

(figure: the same sub-tree; the question is which move guarantees the best outcome against best play, given terminal positions labeled win, lose, or draw)

MiniMax Example

(figure: a small game tree with alternating MAX and MIN levels, annotated with backed-up minimax values)

MiniMax: Recursive Implementation

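The pseudocode figure for this slide is not reproduced here; below is a sketch of the recursive minimax decision in Python, assuming a hypothetical game interface with actions, result, is_terminal, and utility:

```python
def minimax_decision(game, state, player):
    """Choose the move with the highest minimax value for player."""
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), player))

def max_value(game, state, player):
    # MAX node: take the best value among successors
    if game.is_terminal(state):
        return game.utility(state, player)
    return max(min_value(game, game.result(state, a), player)
               for a in game.actions(state))

def min_value(game, state, player):
    # MIN node: the opponent picks the worst value for player
    if game.is_terminal(state):
        return game.utility(state, player)
    return min(max_value(game, game.result(state, a), player)
               for a in game.actions(state))
```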

Minimax Properties
Optimal against a perfect player. Otherwise?
Time complexity? O(b^m)
Space complexity? O(bm)
For chess, b ≈ 35, m ≈ 100
Exact solution is completely infeasible

But, do we need to explore the whole tree?

Resource Limits

Cannot search to the leaves

Depth-limited search:
Instead, search only to a limited depth of the tree
Replace terminal utilities with an eval function for non-terminal positions

Guarantee of optimal play is gone

More plies makes a BIG difference

Example:
Suppose we have 100 seconds, and can explore 10K nodes/sec
So we can check 1M nodes per move
This reaches about depth 8: a decent chess program
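A minimal sketch of depth-limited minimax, where evaluate(state, player) is an assumed heuristic used in place of true terminal utilities once the depth limit is reached:

```python
def depth_limited_value(game, state, player, depth, maximizing=True):
    """Minimax value with a depth cutoff: below the limit, fall back to a
    heuristic evaluation instead of searching all the way to the leaves."""
    if game.is_terminal(state):
        return game.utility(state, player)
    if depth == 0:
        return evaluate(state, player)  # heuristic; optimality no longer guaranteed
    values = (depth_limited_value(game, game.result(state, a), player,
                                  depth - 1, not maximizing)
              for a in game.actions(state))
    return max(values) if maximizing else min(values)
```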

Evaluation Functions
Function which scores non-terminal positions

Ideal function: returns the true utility of the position
In practice: typically a weighted linear sum of features:
Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
e.g. f1(s) = (num white queens - num black queens), etc.
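A small self-contained sketch of such a weighted linear evaluation using material counts; the weights and the piece-count representation are illustrative assumptions, not from the slides:

```python
# Rough material values used as the weights w_i (illustrative only)
MATERIAL_WEIGHTS = {"Q": 9.0, "R": 5.0, "B": 3.0, "N": 3.0, "P": 1.0}

def evaluate(white_counts, black_counts):
    """Eval(s) = sum_i w_i * f_i(s), where each feature f_i is a
    material difference (white count minus black count)."""
    return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
               for p, w in MATERIAL_WEIGHTS.items())

# Example: white is up a queen and a pawn
print(evaluate({"Q": 2, "P": 8}, {"Q": 1, "P": 7}))  # 10.0
```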


Why Pacman starves


He knows his score will go up by eating the dot now
He knows his score will go up just as much by eating the dot later on
There are no point-scoring opportunities after eating the dot
Therefore, waiting seems just as good as eating

α-β pruning: general principle

(figure: a path down the tree alternating Player and Opponent levels; m is an alternative choice available to Player higher up, and n is a node below it with value v)

If α > v then MAX will choose m, so prune the tree under n
Similarly for β for MIN

α-β pruning: example 1

(figure, shown step by step: a MAX root over MIN nodes; the root interval starts at [-∞,+∞]; the first MIN node settles at 3, raising the root's bound to [3,+∞]; a later MIN node is pruned as soon as one of its leaves (value 2) shows it cannot exceed 3; the last MIN node narrows through [3,14] and [3,5] to [3,2]; the selected move leads to the MIN node of value 3)

α-β pruning: example 2

(figure, shown step by step: the same tree shape with different leaf values; the first MIN node settles at 2, giving the root bound [2,+∞]; one MIN child is cut off with the empty interval [2,1]; another narrows from [2,8] to [2,3], raising the root bound to [3,+∞]; the selected move leads to the MIN node of value 3)

α-β pruning: example 3

(figure, shown step by step: a deeper tree with MIN at the root and MAX nodes below; one MAX node is bounded at [6,+∞], making the root bound [-∞,6]; deeper MIN nodes narrow through [-∞,14], [-∞,5], [5,6], [5,1], and [5,4], tightening the root to [-∞,5]; the selected move is then shown at the root)

α-β pruning: example 4

(figure, shown step by step: the same tree shape as example 3 with different leaves; the root MIN bound becomes [-∞,6]; a deeper node narrows through [-∞,14] and [-∞,7] until one node's interval becomes [7,6], which is empty, so the remaining children are pruned; the selected move is then shown at the root)


The α-β algorithm
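The algorithm figure is not reproduced here; below is a sketch of minimax with α-β pruning, using the same hypothetical game interface (actions, result, is_terminal, utility) as the earlier minimax sketch:

```python
import math

def alphabeta_decision(game, state, player):
    """Return the move with the best α-β value for player."""
    best_move, best_val = None, -math.inf
    for a in game.actions(state):
        v = min_value(game, game.result(state, a), player, best_val, math.inf)
        if v > best_val:
            best_move, best_val = a, v
    return best_move

def max_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game, game.result(state, a), player, alpha, beta))
        if v >= beta:          # the MIN node above will never allow this branch
            return v
        alpha = max(alpha, v)
    return v

def min_value(game, state, player, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, player)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game, game.result(state, a), player, alpha, beta))
        if v <= alpha:         # the MAX node above already has a better alternative
            return v
        beta = min(beta, v)
    return v
```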

Properties of the α-β algorithm

Resource limits
Standard approach:
Use CUTOFF-TEST instead of TERMINAL-TEST
e.g., depth limit (perhaps add quiescence search)
Use EVAL instead of UTILITY
i.e., an evaluation function that estimates the desirability of a position

Suppose we have 100 seconds, and can explore 10^4 nodes/second
That gives 10^6 nodes per move ≈ 35^(8/2)
So α-β reaches depth 8: a pretty good chess program


Digression: Exact values don't matter

Behavior is preserved under any monotonic transformation of Eval
Only the order matters: payoff in deterministic games acts as an ordinal utility function
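A tiny self-contained check of this claim on a made-up two-level tree: applying a monotonic transformation (here, cubing) to the leaf payoffs leaves the minimax choice at the root unchanged.

```python
leaves = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # made-up terminal payoffs

def minimax_choice(tree):
    """Root is MAX over MIN nodes: pick the child with the largest minimum."""
    return max(range(len(tree)), key=lambda i: min(tree[i]))

transformed = [[x ** 3 for x in child] for child in leaves]   # monotonic transform
assert minimax_choice(leaves) == minimax_choice(transformed)  # same decision
```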

STOCHASTIC GAMES

Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon: about 20 legal moves per position
Depth 4 = 20 x (21 x 20)^3 ≈ 1.2 x 10^9 nodes

As depth increases, the probability of reaching a given node shrinks
So the value of lookahead is diminished
So limiting depth is less damaging
α-β pruning is much less effective

TDGammon uses depth-2 search plus a very good evaluation function

Nondeterministic games in general

In nondeterministic games, chance is introduced by dice or card-shuffling
Simplified example with coin-flipping:

Algorithm for nondeterministic games

Expectiminimax gives perfect play
Just like Minimax, except we must also handle chance nodes:

if state is a MAX node then
return the highest ExpectiMinimax-Value of Successors(state)
if state is a MIN node then
return the lowest ExpectiMinimax-Value of Successors(state)
if state is a chance node then
return the probability-weighted average of the ExpectiMinimax-Values of Successors(state)

Expectiminimax
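The expectiminimax figure is not reproduced here; below is a hedged Python sketch, where to_move returning "chance" and chance_outcomes yielding (successor, probability) pairs are assumptions about the game interface:

```python
def expectiminimax(game, state, player):
    """Value of a state in a game with MAX, MIN, and chance nodes."""
    if game.is_terminal(state):
        return game.utility(state, player)
    if game.to_move(state) == "chance":
        # Chance node: probability-weighted average over outcomes (e.g. dice rolls)
        return sum(prob * expectiminimax(game, outcome, player)
                   for outcome, prob in game.chance_outcomes(state))
    values = (expectiminimax(game, game.result(state, a), player)
              for a in game.actions(state))
    return max(values) if game.to_move(state) == player else min(values)
```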

Digression: Exact values DO matter

Behavior is preserved only by positive linear


transformation of Eval
Hence Eval should be proportional to the expected
payoff

Games of imperfect information


E.g., card games, where the opponent's initial cards are unknown
Typically we can calculate a probability for each possible deal
Seems just like having one big dice roll at the beginning of the game
Idea: compute the minimax value of each action in each deal, then choose the action with the highest expected value over all deals
Special case: if an action is optimal for all deals, it's optimal.
GIB, the current best bridge program, approximates this idea by:
1) generating 100 deals consistent with the bidding information
2) picking the action that wins the most tricks on average (a sketch of this idea follows)
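A sketch of that averaging-over-deals idea (often called determinization); sample_consistent_deal and minimax_value are hypothetical helpers, and this is not GIB's actual code:

```python
def choose_by_sampling(game, my_hand, legal_actions, n_deals=100):
    """Score each action by its average minimax value over sampled deals,
    then play the action with the best average."""
    totals = {a: 0.0 for a in legal_actions}
    for _ in range(n_deals):
        # sample_consistent_deal and minimax_value are assumed helpers
        deal = sample_consistent_deal(my_hand)
        for a in legal_actions:
            totals[a] += minimax_value(game, deal, a)
    return max(legal_actions, key=lambda a: totals[a])
```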

Example
Four-card bridge/whist/hearts hand, Max to play first

Commonsense example

Proper analysis

The intuition that the value of an action is the average of its values in all actual states is WRONG
With partial observability, the value of an action depends on the information state or belief state the agent is in
Can generate and search a tree of information states
Leads to rational behaviors such as:
Acting to obtain information
Signalling to one's partner
Acting randomly to minimize information disclosure
