The main problem with null productions is that they allow multiple prediction-completion cycles inbetween scanning steps (since null productions do not have to be matched against one or more input symbols). Our strategy will be to collapse all predictions and completions due to chains of null productions into the regular prediction and completion steps, not unlike the way recursive predictions/completions were handled in Section 4.5.
A prerequisite for this approach is to precompute, for all nonterminals X, the probability that X expands to the empty string. Note that this is another recursive problem, since X itself may not have a null production, but expand to some nonterminal Y that does.
Computation of
for all X can be cast as a system of
non-linear equations, as follows.
For each X, let
be an abbreviation for
.
For example, let X have productions
The semantics of context-free rules imply that X can only expand to
if all the RHS nonterminals in one of X's productions
expand to
.
Translating to probabilities, we obtain the equation
In other words, each production
contributes a term in which the rule probability is multiplied by
the product of the e variables corresponding to the RHS nonterminals,
unless the RHS contains a terminal (in which case the production contributes
nothing to
because it cannot possibly lead to
).
The resulting non-linear system can be solved by iterative approximation.
Each variable
is initialized to
, and then
repeatedly updated by substituting in the equation right-hand sides, until
the desired level of accuracy is attained.
Convergence is guaranteed since the
values are monotonically increasing
and bounded above by the true values
.
For grammars without cyclic dependencies among
-producing nonterminals
this procedure degenerates to simple backward substitution.
Obviously the system has to be solved only once for each grammar.
The probability
can be seen as the precomputed inner probability
of an expansion of X to the empty string, i.e., it sums the
probabilities of all Earley paths that derive
from X.
This is the justification for the way these probabilities can be
used in modified prediction and completion steps, described next.