next up previous
Next: Probabilities from expectations Up: The Algorithm Previous: The Algorithm

Normal form for SCFGs

A grammar is in Chomsky Normal Form (CNF) if every production is of the form tex2html_wrap_inline869 or tex2html_wrap_inline871 . Any CFG or SCFG can be converted into one in CNF which generates exactly the same language, each of the sentences with exactly the same probability, and for which any parse in the original grammar would be reconstructible from a parse in the CNF grammar. In short, we can, without loss of generality, assume that the SCFGs we are dealing with are in CNF. In fact, our algorithm generalizes straightforwardly to the more general Canonical Two-Form [Graham et al.1980] format, and in the case of bigrams (n = 2) it can even be modified to work directly for arbitrary SCFGs. Still, the CNF form is convenient, and to keep the exposition simple we assume all SCFGs to be in CNF.



Andreas Stolcke
Fri Jun 28 20:57:43 PDT 1996