next up previous
Next: Prediction Up: An Efficient Probabilistic Context-Free Previous: Notation

Earley Parsing

 

An Earley parser is essentially a generator that builds left-most derivations of strings, using a given set of context-free productions. The parsing functionality arises because the generator keeps track of all possible derivations that are consistent with the input string up to a certain point. As more and more of the input is revealed the set of possible derivations (each of which corresponds to a parse) can either expand as new choices are introduced, or shrink as a result of resolved ambiguities. In describing the parser it is thus appropriate and convenient to use generation terminology.

The parser keeps a set of states for each position in the input, describing all pending derivations.gif These state sets together form the Earley chart. A state is of the form

displaymath7751

where X is a nonterminal of the grammar, tex2html_wrap_inline7757 and tex2html_wrap_inline7759 are strings of nonterminals and/or terminals, and i and k are indices into the input string. States are derived from productions in the grammar. The above state is derived from a corresponding production

displaymath7752

with the following semantics:

A state with the dot to the right of the entire RHS is called a complete state, since it indicates that the left-hand side (LHS) nonterminal has been fully expanded.

Our description of Earley parsing omits an optional feature of Earley states, the lookahead string. Earley's algorithm allows for an adjustable amount of lookahead during parsing, in order to process LR(k) grammars deterministically (and obtain the same computational complexity as specialized LR(k) parsers where possible). The addition of lookahead is orthogonal to our extension to probabilistic grammars, so we will not include it here.

The operation of the parser is defined in terms of three operations that consult the current set of states and the current input symbol, and add new states to the chart. This is strongly suggestive of state transitions in finite-state models of language, parsing, etc. This analogy will be explored further in the probabilistic formulation later on.

The three types of transitions operate as follows.




next up previous
Next: Prediction Up: An Efficient Probabilistic Context-Free Previous: Notation

Andreas Stolcke
Sat Jun 29 21:49:02 PDT 1996