A grammar is in Chomsky Normal Form (CNF)
if every production is of the form
or
. Any CFG or SCFG can be
converted into one in CNF which generates exactly the same
language, each of the sentences with exactly the same probability, and for
which any parse in the original grammar would be reconstructible from
a parse in the CNF grammar. In short, we can, without loss of
generality, assume that the SCFGs we are dealing with are in CNF.
In fact, our algorithm generalizes straightforwardly to the
more general Canonical Two-Form [Graham et al.1980] format, and
in the case of bigrams (n = 2) it can even be modified to work directly
for arbitrary SCFGs.
Still, the CNF form is convenient, and to keep the exposition simple
we assume all SCFGs to be in CNF.