Greedy search uses the estimated cost from a node to the goal to determine that node's desirability.
The heuristic function estimates distance remaining to a goal.
h(n) = estimated cost of path from node to the goal.
In general, a heuristic is a technique that improves average-case
performance but may not improve worst-case performance.
Greedy algorithms often perform very well. They tend to find
good solutions quickly, although not always optimal ones.
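Greedy best-first search can be sketched with a priority queue ordered by h(n) alone. This is a minimal illustration, assuming hashable nodes and a successor function; the names are mine, not from the notes:

```python
import heapq

def greedy_best_first(start, goal, neighbors, h):
    """Greedy best-first search: always expand the node with the lowest h(n).

    `neighbors(node)` yields successor nodes; `h(node)` estimates the
    distance remaining to the goal. Returns a path as a list, or None.
    """
    frontier = [(h(start), start)]       # priority queue keyed on h only
    came_from = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:                 # reconstruct path back to start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in came_from:
                came_from[nxt] = node
                heapq.heappush(frontier, (h(nxt), nxt))
    return None
```

Because the queue ignores path cost so far, the search is fast but the path it returns is not guaranteed to be optimal.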
A good heuristic for the route-finding problem would be
straight-line distance to the goal ("as the crow flies")
A good heuristic for the 8-puzzle is the number of tiles out of place.
A better heuristic is the sum of the distances of each tile from its
goal position ("Manhattan distance").
An even better heuristic takes into account the number of direct
adjacent tile reversals present.
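The first two 8-puzzle heuristics above are straightforward to compute. A minimal sketch, assuming the board is stored as a 9-tuple read row by row with 0 for the blank (a representation I am assuming, not one given in the notes):

```python
def misplaced(state, goal):
    """Heuristic 1: number of non-blank tiles out of place."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal):
    """Heuristic 2: sum of Manhattan distances of tiles from their goal
    positions. The blank is not counted."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        goal_idx = goal.index(tile)
        # row distance + column distance on the 3x3 board
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total
```

Note that misplaced(state, goal) ≤ manhattan(state, goal) for any state, since every out-of-place tile is at least one move from its goal square; the Manhattan heuristic dominates the misplaced-tile count.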
Heuristic 1: tiles 4, 5, 6, 7, and 8 are out of place, so estimate = 5
Heuristic 2: estimate = 0 + 0 + 0 + 1 + 1 + 2 + 1 + 1 = 6
Heuristic 3: tiles 4 and 7 form a direct reversal, so estimate = 6 + 2 × 1 = 8
A* search combines uniform-cost search and greedy search by summing their
evaluation functions:
f(n) = g(n) + h(n)
f(n) = actual distance so far + estimated distance remaining
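The f(n) = g(n) + h(n) evaluation turns the greedy sketch into A*: the priority queue is keyed on g + h, and a node's g value is updated whenever a cheaper path to it is found. A minimal sketch under the same assumptions (hashable nodes, successor function returning (node, step_cost) pairs):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search with f(n) = g(n) + h(n).

    `neighbors(node)` yields (successor, step_cost) pairs; `h(node)` is
    the heuristic estimate. Returns (cost, path) or None.
    """
    g = {start: 0}                       # cheapest known cost to each node
    came_from = {start: None}
    frontier = [(h(start), start)]       # priority queue keyed on f = g + h
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return g[goal], path[::-1]
        for nxt, cost in neighbors(node):
            new_g = g[node] + cost
            if nxt not in g or new_g < g[nxt]:   # found a cheaper path
                g[nxt] = new_g
                came_from[nxt] = node
                heapq.heappush(frontier, (new_g + h(nxt), nxt))
    return None
```

The `g` dictionary here also reflects the memory cost noted below: A* keeps every node it has generated.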
An admissible heuristic never overestimates the distance remaining to a goal.
If h occasionally overestimated, the search might never go down the
cheapest path and could return a suboptimal solution.
Examples of admissible heuristics:
- straight-line distance in route-planning
- number of tiles in the wrong position in the 8-puzzle
- sum of distances of tiles from goal position in the 8-puzzle
Unfortunately, estimates are usually not good enough for A* to avoid
having to expand an exponential number of nodes to find the optimal solution.
In addition, A* must keep all nodes it is considering in memory.
A* is still much more efficient than uninformed methods.
It is always better to use a heuristic function with higher values as
long as it does not overestimate.
A heuristic is consistent if, for every node n and every successor n':
h(n) ≤ cost(n, n') + h(n')
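The consistency condition is easy to verify edge by edge. A minimal sketch, assuming h is given as a dictionary and the graph as (n, n', cost) triples (a representation I am assuming for illustration):

```python
def is_consistent(h, edges):
    """Check h(n) <= cost(n, n') + h(n') for every edge (n, n', cost)."""
    return all(h[n] <= cost + h[nprime] for n, nprime, cost in edges)
```

For instance, with h(n) = 4, h(n') = 2, and cost(n, n') = 1, the check fails, matching the inconsistent example discussed below.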
For example, the heuristic shown below is inconsistent, because
h(n) = 4 but cost(n, n') + h(n') = 1 + 2 = 3, which is less than 4.
This makes the value of f decrease from node n to node n':
If a heuristic h is consistent, the f values along any
path will be nondecreasing:
f(n') = estimated distance from start to goal through n'
      = actual distance from start to n
        + step cost from n to n'
        + estimated distance from n' to goal
      = g(n) + cost(n, n') + h(n')
      ≥ g(n) + h(n)    because cost(n, n') + h(n') ≥ h(n) by consistency
      = f(n)
Therefore f(n') ≥ f(n), so f never decreases along a path.
If a heuristic h is inconsistent, we can tweak the f values so that
they behave as if h were consistent, using the pathmax equation:
f(n') = max(f(n), g(n') + h(n'))
This ensures that the f values never decrease along a path from the
start to a goal.
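The pathmax equation is a one-line adjustment applied when a child node is generated; a minimal sketch (the function name is my own):

```python
def pathmax_f(f_parent, g_child, h_child):
    """Pathmax: a child's f value is at least its parent's f value.

    With an inconsistent h, g(n') + h(n') can drop below f(n); taking
    the max restores nondecreasing f values along the path.
    """
    return max(f_parent, g_child + h_child)
```

For example, if the parent has f = 5 and the child's own estimate is g + h = 3 + 1 = 4, pathmax keeps f at 5 rather than letting it drop to 4.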
Given nondecreasing values of f, we can think of A* as searching
outward from the start node through successive contours of nodes, where
all of the nodes in a contour have the same f value:
For any contour, A* examines all of the nodes in the contour before
looking at any contours further out. If a solution exists, the goal node in
the closest contour to the start node will be found first.
A* is thus complete and optimal, assuming a consistent
heuristic function (or using the pathmax equation to simulate consistency).
A* is also optimally efficient, meaning that it expands only the
minimal number of nodes needed to ensure optimality and completeness, for a
given heuristic function.