Transform Model File Format

HEADAPT estimates the required transformation statistics and can output either a transformed MMF or a transform model file (TMF). The advantage of storing the transforms rather than an adapted MMF is that TMFs are considerably smaller than MMFs (especially triphone MMFs). This section describes the format of the transform model file in detail.

The mean transformation matrix is stored as a block diagonal transformation matrix. The example block diagonal matrix ${\bf A}$ shown below contains three blocks. The first block represents the transformation for only the static components of the feature vector, while the second represents the deltas and the third the accelerations. This block diagonal structure assumes that, for the purposes of the transformation, the statics, deltas and accelerations are uncorrelated with one another. In practice this assumption works quite well.

\begin{displaymath}
{\bf A} \; = \; \left(
\begin{array}{ccc}
{\bf A}_s & {\bf0} & {\bf0} \\
{\bf0} & {\bf A}_{\Delta} & {\bf0} \\
{\bf0} & {\bf0} & {\bf A}_{\Delta^2}
\end{array} \right)
\end{displaymath}

This format reduces the number of transformation parameters that have to be learnt, making the adaptation process faster. It also reduces the amount of adaptation data required per transform when compared with the full case. The storage requirement is also lower: a block diagonal matrix with three equally sized blocks holds one third as many entries as the full transform matrix. Note that for convenience a full transformation matrix is also stored as a block diagonal matrix, only in this case there is a single block.
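The saving can be made concrete with a small count of matrix entries. The sketch below is illustrative only and is not part of HTK: the function name `block_diag_params` and the 39-dimensional split into three blocks of 13 are assumptions chosen for the example.

```python
def block_diag_params(block_sizes, with_bias=False):
    """Count the entries of a block diagonal mean transform.

    block_sizes: list of block dimensions (hypothetical input, e.g. [13, 13, 13]).
    with_bias:   if True, add one bias term per feature dimension.
    """
    n = sum(block_sizes)
    count = sum(size * size for size in block_sizes)
    return count + (n if with_bias else 0)


# Assumed example: 39-dimensional features split into statics, deltas, accelerations.
three_block = block_diag_params([13, 13, 13])  # 3 * 13 * 13 entries
full = block_diag_params([39])                 # a single block is the full matrix
print(three_block, full)
```

With three equal blocks the count is one third of the full case, which is where both the reduced storage and the reduced adaptation data requirement come from.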

The variance transformation is a diagonal matrix and as such is simply stored as a vector.

Figure 9.2 shows a simple example of a TMF. In this case the feature vector has nine dimensions, and the mean transform has three diagonal blocks. The TMF can be saved in ASCII or binary format. The user header is always output in ASCII.

The first two fields are speaker descriptor fields. The next field <MMFID>, the MMF identifier, is obtained from the global options macro in the MMF, while the regression class tree identifier <RCID> is obtained from the regression tree macro name in the MMF. If global adaptation is being performed, then the <RCID> field will contain the identifier global, since a tree is unnecessary in the global case. Note that the MMF and regression class tree identifiers are set within the MMF using the tool HHED.

The final two fields are optional, but HEADAPT outputs them anyway for the user's convenience. They can be edited at any time (as can all the fields if desired, but editing the <MMFID> and <RCID> fields should be avoided). The <CHAN> field should describe the environment in which the adaptation data was recorded. Examples could be a particular microphone name, telephone channel or various background noise conditions. The <DESC> field allows the user to enter any other information deemed useful. An example could be the speaker's dialect region.

[Fig. 9.2: A simple example of a TMF]

Whenever a TMF is used (in conjunction with an MMF), the MMF identifier in the MMF is checked against that in the TMF. These must match, since the TMF depends on the model set it was constructed from. Unless the <RCID> field is set to global, it is also checked for consistency against the regression tree identifier in the MMF.
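The two checks described above can be sketched as follows. This is not HTK's own code: the function name `check_tmf_against_mmf` and its arguments are hypothetical, and it assumes the identifiers have already been read from the two files as plain strings and that the comparison is an exact string match.

```python
def check_tmf_against_mmf(tmf_mmfid, tmf_rcid, mmf_mmfid, mmf_rcid):
    """Raise ValueError if a TMF does not belong with the given MMF.

    All four arguments are identifier strings (hypothetical interface).
    """
    # The MMF identifier must always match.
    if tmf_mmfid != mmf_mmfid:
        raise ValueError(
            "MMFID mismatch: TMF has %r, MMF has %r" % (tmf_mmfid, mmf_mmfid)
        )
    # The regression tree identifier is only checked for non-global transforms.
    if tmf_rcid != "global" and tmf_rcid != mmf_rcid:
        raise ValueError(
            "RCID mismatch: TMF has %r, MMF has %r" % (tmf_rcid, mmf_rcid)
        )


# Made-up identifiers, for illustration only.
check_tmf_against_mmf("mset_a", "global", "mset_a", "tree_32")   # passes
check_tmf_against_mmf("mset_a", "tree_32", "mset_a", "tree_32")  # passes
```

A global transform passes regardless of the tree identifier in the MMF, which mirrors the statement that a tree is unnecessary in the global case.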

The rest of the TMF contains a further information header, followed by all the transforms. The information header contains necessary transform set information such as the number of blocks used, node occupation threshold used, and the node occupation counts. Each transform has a regression class identifier number, the mean transformation matrix ${\bf A}$, an optional bias vector ${\bf b}$ (as in equation 9.2) and an optional variance transformation diagonal matrix ${\bf H}$ (stored as a vector). The example has both a bias offset and a variance transform.
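To illustrate how the stored pieces fit together, the sketch below applies a block diagonal mean transform and an optional bias to a nine-dimensional mean vector, matching the dimensions of the example. It assumes the adapted mean is computed as ${\bf A}$ times the mean plus ${\bf b}$; the function name `apply_block_diag`, the list-of-blocks representation and all numeric values are invented for illustration and do not reflect the on-disk TMF layout. The variance transform is left out because its application is not described in this section.

```python
def apply_block_diag(blocks, mean, bias=None):
    """Return A * mean (+ bias), where A is block diagonal.

    blocks: list of square matrices (lists of rows), one per diagonal block.
    mean:   list of floats whose length equals the sum of the block sizes.
    bias:   optional list of floats, same length as mean.
    """
    out = []
    offset = 0
    for block in blocks:
        size = len(block)
        segment = mean[offset:offset + size]
        # Each block only sees its own slice of the mean vector.
        for row in block:
            out.append(sum(a * m for a, m in zip(row, segment)))
        offset += size
    if offset != len(mean):
        raise ValueError("block sizes do not match the mean vector length")
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


# Made-up 3x3 blocks for statics, deltas and accelerations.
a_s = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
a_d = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
a_dd = [[0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5]]
mean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
bias = [1] * 9
print(apply_block_diag([a_s, a_d, a_dd], mean, bias))
```

Storing only the blocks means the zero off-diagonal entries never need to be written out or multiplied.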

