From cube-lovers-errors@oolong.camellia.org Sat May 31 19:11:01 1997
Return-Path: cube-lovers-errors@oolong.camellia.org
Received: from oolong.camellia.org (localhost [127.0.0.1]) by oolong.camellia.org (8.6.12/8.6.12) with SMTP id TAA28175; Sat, 31 May 1997 19:11:01 -0400
Precedence: bulk
Errors-To: cube-lovers-errors@oolong.camellia.org
Sender: Haym Hirsh
Date: Sat, 31 May 97 19:07:01 EDT
From: Haym Hirsh
Reply-To: Haym Hirsh
To: Dan Hoey
Cc: cube-lovers@ai.mit.edu
Subject: Re: More on Korf's method
In-Reply-To: Your message of Fri, 30 May 97 23:15:17 EDT
Message-ID:
> Thanks very much for the explanation. It agrees with my understanding
> of the paper, as far as that goes. But do you have a succinct
> explanation of what makes IDA* more tractable than A*? That's the
> part I missed.
Here are a couple of attempts to explain why and how IDA* is a win over A*.
In my attempt to generate a description for the layman I tried to err
on the side of saying too much rather than too little -- my apologies
if I belabor the obvious for anyone.
The highest-level explanation is that A* may need to store a number
of intermediate results that is exponential in the length of the
solution -- e.g., b^l, where l is the solution length, b, roughly
speaking, is the number of new cubes you can get from a given cube,
aka the "branching factor", and "^" means exponentiation -- whereas
IDA* stores only a number of intermediate results that is linear in
the solution length -- e.g., b*l. The
apparent paradox is that, to do this, IDA* does redundant work --
exploring some intermediate results many times because of its inferior
"bookkeeping". However, the extra work is usually more than paid for
by the memory savings. This is particularly true for Rubik's cube,
since the higher the branching factor (roughly 18 for the cube if you
count crudely), the smaller the fraction of the work that is redundant.
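To make the b^l versus b*l contrast concrete, here is a small
illustrative calculation (the solution length of 10 is a made-up
number, just to show the scale of the gap):

```python
# Illustrative arithmetic only: the rough storage bounds b^l for A*
# versus b*l for IDA*, with b = 18 (the crude count above) and a
# hypothetical solution length l = 10.
def a_star_storage(b, l):
    return b ** l        # worst case: exponential in solution length

def ida_star_storage(b, l):
    return b * l         # worst case: linear in solution length

print(a_star_storage(18, 10))    # 3570467226624
print(ida_star_storage(18, 10))  # 180
```

Trillions of stored positions versus a few hundred, for the same
search depth.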
In a bit more detail, the difference between A* and IDA* is similar to
the difference between breadth-first search and depth-first iterative
deepening. Imagine you want to generate all cubes that are reachable
in d steps from the start. What you can do is generate all cubes that
are one step away, then generate all that are one step away from those
that are one step away (resulting in a list of all that are two steps
away), then all that are one step away from those that are two steps
away (resulting in a list of all that are three steps away), etc. At the
final step in this process you have a list of all cubes that are d-1
steps away, and you generate all cubes that are one step away from any
item in this list. This generates all cubes that are d steps away.
The process is known as breadth-first search. Its main problem is
that the list of all cubes that are d-1 steps away will have size
roughly b^(d-1).
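The level-by-level process above can be sketched as follows. The
integer state space and moves function are toy stand-ins for the
cube, chosen only so the code is self-contained:

```python
# A minimal sketch of the level-by-level generation described above.
# The state space is a toy (integers with +/-1 moves), not the cube;
# only the shape of the process is the point.
def states_at_depth(start, d, moves):
    """Expand the whole frontier d times, as in breadth-first search.

    The memory cost is the size of the frontier, which grows roughly
    like b^(d-1) by the final step.
    """
    frontier = {start}
    for _ in range(d):
        frontier = {nxt for s in frontier for nxt in moves(s)}
    return frontier

toy_moves = lambda s: [s - 1, s + 1]  # hypothetical move generator
print(sorted(states_at_depth(0, 2, toy_moves)))  # [-2, 0, 2]
```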
Depth-first search, on the other hand, generates all cubes that are
one step away from start, puts all but one of them (i.e., b-1 of them)
on a list, and takes the one that wasn't placed on the list and
(recursively) generates all cubes that are d-1 steps away from it.
When you are done with this first depth-one cube, take one of the
other cubes that are one step from start (which is one of those
stashed away in the afore-mentioned list) and do the same thing,
generating, in turn, all cubes that are d-1 steps away from it. This
continues until all the items that were put on the list have been
explored -- i.e., they have had all cubes at depth d-1 from them
returned. This is depth-first search. Because at each recursion
level it saves only b-1 things, at worst it winds up saving roughly
(b-1)*d cubes in its search.
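The same toy space gives a sketch of the depth-first version. The
(b-1) pending siblings at each of the d levels live implicitly on
the call stack here, rather than on an explicit list:

```python
# A minimal sketch of bounded depth-first search on the same toy
# integer space.  At each of the d recursion levels only the current
# node's siblings are pending, so storage is roughly (b-1)*d states
# instead of b^(d-1).
def dfs_collect(state, depth, moves, found):
    """Add to `found` every state reachable in exactly `depth` steps."""
    if depth == 0:
        found.add(state)
        return
    for nxt in moves(state):      # untried siblings wait their turn
        dfs_collect(nxt, depth - 1, moves, found)

toy_moves = lambda s: [s - 1, s + 1]  # hypothetical move generator
found = set()
dfs_collect(0, 2, toy_moves, found)
print(sorted(found))  # [-2, 0, 2]
```

It visits the same depth-d states as the breadth-first version; only
what it keeps in memory at once differs.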
Now imagine you have a cube that you know is at most (but not
necessarily exactly) d steps from the start, and you want to know what
the shortest solution to it is. One approach would be to do a
breadth-first search to depth 1 and see if you have it, continue to
depth 2 and see if you have it, etc., until you reach depth d. A
second possible approach would appear to be to use depth-first search
to depth d, but this is not guaranteed to give a shortest solution.
To see this, imagine that the cube is two moves from start, but it is
also four moves away if you make the wrong first move. If the result
of that wrong first move is the cube that depth-first search chooses
to "expand" first (with the "correct" one waiting its turn on the list
of cubes to be seen later if you haven't found your cube), you will
find your desired cube via the depth-four solution. You didn't find
the depth-two solution.
This problem with depth-first search leads to the idea of depth-first
iterative deepening. The basic idea of iterative deepening is simple.
First do a depth-first search to depth 1. If you haven't found it,
throw away all your work and start over, doing a depth-first search to
depth 2. Again, if you haven't found it, throw away all your work and
start over, doing a depth-first search to depth 3. This continues,
until you hit the right depth for finding it. This process is
guaranteed to find the shortest solution, but seems silly,
regenerating everything you did in the previous depth-first search
when you add one to the depth. The observation that makes this a win
is that the total effort spent on all the previous depths is only a
small fraction of the effort spent on the final depth-first search
(since each extra level multiplies the work by roughly b, all the
earlier iterations together cost only about 1/(b-1) of the last one).
So you pay a little in redundancy, but are rewarded with much more
modest and realistic memory requirements.
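In the same toy space, depth-first iterative deepening looks like
this; the depth-limited search is restarted from scratch at each new
bound, exactly as described:

```python
# A minimal sketch of depth-first iterative deepening on a toy
# integer space: rerun a depth-limited search at depth 0, 1, 2, ...
# so the first depth at which the goal turns up is the length of the
# shortest solution.
def depth_limited(state, goal, depth, moves):
    if state == goal:
        return True
    if depth == 0:
        return False
    return any(depth_limited(nxt, goal, depth - 1, moves)
               for nxt in moves(state))

def iddfs(start, goal, moves, max_depth):
    for d in range(max_depth + 1):   # throw away all work, start over
        if depth_limited(start, goal, d, moves):
            return d
    return None

toy_moves = lambda s: [s - 1, s + 1]  # hypothetical move generator
print(iddfs(0, 3, toy_moves, 10))  # 3
```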
The step from this to A* vs IDA* is actually not too large. The basic
idea is to use depth-first search, but instead of using a depth bound
d, don't go any farther from a cube if the sum of the number
of steps to get to it plus the guess on how many more steps are needed
to get to a solved cube exceeds some threshold. You start with a
small threshold, and slowly keep increasing it, each time starting
over again from scratch, until the threshold is just barely high
enough to find the solution. If you do this in the correct way (for
example, upping the threshold each time in the appropriate fashion
based on the values you observed in your previous iteration), you can
prove that the solution IDA* finds is the shortest possible (as long
as the solution-length guesser never over-estimates the correct value).
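The threshold loop can be sketched the same way. Everything here is
a toy stand-in, not Korf's implementation: h() plays the role of the
solution-length guesser, and in this toy space it is simply the exact
distance, which trivially never over-estimates:

```python
# A minimal sketch of the IDA* threshold loop described above, on a
# toy integer space.  Each round is a depth-first search cut off when
# f = (steps so far) + (guessed steps left) exceeds the threshold;
# the threshold is then raised to the smallest f seen over it.
def ida_star(start, goal, moves, h):
    def search(state, g, bound):
        f = g + h(state)          # steps so far + guessed steps left
        if f > bound:
            return f              # pruned: report how far over we went
        if state == goal:
            return True
        smallest_excess = float('inf')
        for nxt in moves(state):
            t = search(nxt, g + 1, bound)
            if t is True:
                return True
            smallest_excess = min(smallest_excess, t)
        return smallest_excess

    bound = h(start)
    while True:                   # start over from scratch each round
        t = search(start, 0, bound)
        if t is True:
            return bound          # with a never-over-estimating h,
                                  # this is the shortest length
        bound = t                 # raise the threshold just enough

toy_moves = lambda s: [s - 1, s + 1]  # hypothetical move generator
toy_h = lambda s: abs(4 - s)          # exact-distance guess, goal = 4
print(ida_star(0, 4, toy_moves, toy_h))  # 4
```

Raising the bound to the smallest pruned f is one "appropriate
fashion" of upping the threshold based on the previous iteration.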