Several critical trainings are in progress this month (August 1998). This page will be kept up to date with the latest results.
The latest set of alignments, align3, was installed at ICSI on Aug 21. These alignments are based on the output of a combined RNN/ModSpecGram acoustic model, so they should be much better suited to ModSpec training. We immediately started training a 4000HU net on them; the comparison columns are from the 4000HU align2 net described further down:
| Epoch | Date | LrnRt | CV FA% | WER% | Comparison LrnRt | Comparison CV FA% | Comparison WER% | Comments |
|---|---|---|---|---|---|---|---|---|
| 1 | aug22 09:21 | 0.008 | 60.60 | | 0.008 | 59.33 | 49.9 | |
| 2 | aug23 03:11 | 0.008 | 62.13 | | 0.008 | 60.89 | 47.7 | |
| 3 | aug23 20:56 | 0.008 | 63.28 | | 0.008 | 61.62 | 47.0 | |
| 4 | aug24 14:42 | 0.008 | 63.45 | | 0.008 | 62.02 | 45.5 | |
| 5 | aug25 08:38 | 0.004 | 66.23 | 40.7 | 0.004 | 64.83 | 41.3 | BN98I models |
| 6 | aug26 02:38 | 0.002 | 67.93 | | 0.002 | 66.43 | 38.5 | |
| 7 | aug26 20:43 | 0.001 | 69.03 | | 0.001 | 67.41 | 37.1 | |
| 8 | aug27 14:43 | 0.0005 | 69.72 | | 0.0005 | 68.12 | 36.4 | |
| 9 | aug28 08:40 | 0.00025 | 70.08 | 35.6 | 0.00025 | 68.52 | 35.3 | BN98I models |
| 10 | (training complete) | | | | | | | |
| combo w/RNN1 | | | | 29.5 | | | 29.3 | BN98I models |
| combo w/RNN1 + 27hyp | | | | 27.6 | | | 27.2 | |
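The page does not say how the "combo w/RNN1" rows were produced. One common way to merge the outputs of two hybrid acoustic models is to average their phone posteriors frame by frame in the log domain and renormalize. The sketch below assumes that rule; the function name, the equal default weight and the probability floor are all hypothetical choices, not the recipe actually used here.

```python
import math
from typing import List, Sequence


def combine_log_posteriors(
    p_a: Sequence[float],
    p_b: Sequence[float],
    weight: float = 0.5,
    floor: float = 1e-10,
) -> List[float]:
    """Merge two per-frame phone posterior vectors by a weighted log-domain average.

    Hypothetical sketch: the combination rule behind the "combo w/RNN1" rows is
    not stated on this page. `weight` is the share given to p_a; `floor` keeps
    log() defined when a posterior is exactly zero.
    """
    if len(p_a) != len(p_b):
        raise ValueError("both nets must output the same phone set")
    logs = [
        weight * math.log(max(a, floor)) + (1.0 - weight) * math.log(max(b, floor))
        for a, b in zip(p_a, p_b)
    ]
    # Subtract the largest log value before exponentiating to avoid underflow.
    top = max(logs)
    unnorm = [math.exp(x - top) for x in logs]
    total = sum(unnorm)
    # Renormalize so the merged posteriors sum to 1 again.
    return [u / total for u in unnorm]


if __name__ == "__main__":
    # Toy two-phone frame: one net is confident, the other undecided.
    print(combine_log_posteriors([0.9, 0.1], [0.5, 0.5]))
```

With equal weights this is a normalized geometric mean: the merged odds are the geometric mean of the two nets' odds (9:1 and 1:1), so the toy frame comes out at 0.75 and 0.25.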
This is the large (4000HU) net we are training to hand to Cambridge for their next round of alignments. It is based on the modulation spectrogram (msg1) features, with a 28x9 input window (28 features over 9 frames). It is similar to our best-performing msg net so far, except that it is trained to the new 'align2' labels.
| Epoch | Date | LrnRt | CV FA% | WER% | Comparison LrnRt | Comparison CV FA% | Comparison WER% |
|---|---|---|---|---|---|---|---|
| 1 | aug01 12:53 | 0.008 | 59.33 | 49.9 | 0.008 | 56.58 | |
| 2 | aug02 06:45 | 0.008 | 60.89 | 47.7 | 0.008 | 57.83 | |
| 3 | aug03 00:37 | 0.008 | 61.62 | 47.0 | 0.008 | 58.47 | |
| 4 | aug03 18:35 | 0.008 | 62.02 | 45.5 | 0.008 | 59.39 | |
| 5 | aug04 12:31 | 0.004 | 64.83 | 41.3 | 0.008 | 58.97 | |
| 6 | aug05 06:26 | 0.002 | 66.43 | 38.5 | 0.004 | 61.87 | |
| 7 | aug06 00:24 | 0.001 | 67.41 | 37.1 | 0.002 | 63.43 | |
| 8 | aug06 18:31 | 0.0005 | 68.12 | 36.4 | 0.001 | 64.62 | |
| 9 | aug07 12:31 | 0.00025 | 68.52 | 35.3 | 0.0005 | 65.34 | |
| 10 | (training complete) | | | | 0.00025 | 65.74 | 37.8 |
| combo w/RNN1 | | | | 29.3 | | | 29.9 |
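The learning-rate columns in the tables above follow a recognizable pattern: the rate holds at 0.008 until the cross-validation frame accuracy stops improving much, is then halved every epoch, and training ends when the gain becomes small again. The sketch below replays a schedule of that kind (often called "newbob"). The 0.5-point threshold, the class and the function names are assumptions; the threshold was chosen because it reproduces the learning rates of every run shown in the two tables above.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class HalvingSchedule:
    """Hypothetical 'newbob'-style learning-rate schedule.

    Assumed rule: keep `lr` fixed until an epoch improves CV frame accuracy by
    less than `threshold` percentage points, then halve the rate after every
    epoch; stop the next time the improvement falls below `threshold`.
    """

    lr: float = 0.008
    threshold: float = 0.5
    ramping: bool = False
    prev_acc: Optional[float] = None

    def update(self, cv_acc: float) -> Optional[float]:
        """Record the CV accuracy (%) of the epoch just finished.

        Returns the learning rate for the next epoch, or None to stop training.
        """
        small_gain = self.prev_acc is not None and cv_acc - self.prev_acc < self.threshold
        self.prev_acc = cv_acc
        if small_gain:
            if self.ramping:
                return None  # second small gain: training is complete
            self.ramping = True  # first small gain: start ramping down
        if self.ramping:
            self.lr /= 2
        return self.lr


def replay(cv_accs: Sequence[float]) -> List[float]:
    """Return the learning rate the schedule would use for each epoch,
    given the CV accuracies observed after each one."""
    sched = HalvingSchedule()
    used = []
    for acc in cv_accs:
        used.append(sched.lr)
        if sched.update(acc) is None:
            break
    return used


if __name__ == "__main__":
    # CV FA% per epoch for the align3 4000HU net, copied from the first table.
    align3_cv = [60.60, 62.13, 63.28, 63.45, 66.23, 67.93, 69.03, 69.72, 70.08]
    print(replay(align3_cv))
```

The same rule also reproduces the learning rates of the align2 2000HU run further down, but it does not explain every run on this page: the align1 2000HU comparison run keeps halving well past the point where this rule would stop, so the actual stopping criterion may have differed or been overridden by hand.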
Since we're throwing everything we've got at the problem this week, we are also training a large plp12N net on all the data. We had already trained a (13x9):4000:54 plp12N net to the align1 labels on half the data, so we used the net from the 4th iteration of that training (bn/experiments/fosler/train5) as the starting point for this one, and set the learning rate to ramp down as if starting from epoch 6 of the earlier training (epoch 5 had given a negligible CV improvement). The comparison figures are from the equivalent epochs of the align1 half-data training.
| Epoch | Date | LrnRt | CV FA% | WER% | Comparison LrnRt | Comparison CV FA% | Comparison WER% |
|---|---|---|---|---|---|---|---|
| 6 | aug04 20:19 | 0.004 | 63.49 | 39.9 | 0.004 | 62.55 | |
| 7 | aug05 17:13 | 0.002 | 65.29 | 37.8 | 0.002 | 63.68 | |
| 8 | aug06 14:30 | 0.001 | 66.20 | 36.5 | 0.001 | 64.52 | 36.0 |
| 9 | aug07 11:30 | 0.0005 | 66.88 | 35.1 | 0.0005 | 65.06 | |
| 10 | aug08 08:25 | 0.00025 | 67.47 | 34.3 | 0.00025 | 65.34 | 35.0 |
| 11 | aug09 05:14 | 0.000125 | 67.87 | 33.7 | 0.000125 | 65.48 | |
| 12 | | 0.000063 | | | 0.000063 | 65.55 | 34.4 |
| combo w/RNN1 | | | | 29.7 | | | 30.1 |
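Reading the (13x9):4000:54 notation above as a single-hidden-layer MLP with 13 features over a 9-frame input window, 4000 hidden units and 54 output classes, the size of the nets on this page can be compared by counting weights and biases. The sketch assumes full connections between adjacent layers, one bias per hidden and output unit, and 54 outputs for the msg1 nets as well (the page states the output count only for the plp12N net); the function name is hypothetical.

```python
def mlp_params(n_feats: int, n_frames: int, hidden: int, outputs: int) -> int:
    """Weights plus biases of a one-hidden-layer MLP written (n_feats x n_frames):hidden:outputs.

    Assumes full connections between adjacent layers and one bias per hidden
    and output unit.
    """
    n_in = n_feats * n_frames
    return (n_in * hidden + hidden) + (hidden * outputs + outputs)


if __name__ == "__main__":
    plp = mlp_params(13, 9, 4000, 54)  # plp12N net described above
    msg = mlp_params(28, 9, 4000, 54)  # msg1 4000HU net; 54 outputs assumed
    print(plp, msg, round(msg / plp, 2))
```

Under these assumptions the msg1 4000HU net has about 1.23M parameters against about 688k for the plp12N net, because the 28x9 input window more than doubles the input layer; a 2000HU msg1 net comes to about 614k.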
We have only recently started working with the second set of alignments (align2), generated by Cambridge using improved pronunciation modelling and a combination of a larger number of acoustic models. This training duplicates our 'standard' msg1N 2000HU training, but uses the align2 alignments. The corresponding figures for the align1 training are shown for comparison.
| Epoch | Date | LrnRt | CV FA% | WER% | Comparison LrnRt | Comparison CV FA% | Comparison WER% |
|---|---|---|---|---|---|---|---|
| 1 | jul31 | 0.008 | 57.35 | 53.3 | 0.008 | 56.39 | 51.9 |
| 2 | | 0.008 | 58.32 | 51.3 | 0.008 | 57.60 | 50.5 |
| 3 | | 0.008 | 59.17 | 48.7 | 0.008 | 58.05 | 48.5 |
| 4 | aug01 | 0.008 | 59.86 | 48.6 | 0.004 | 61.05 | 44.5 |
| 5 | | 0.008 | 59.88 | 48.2 | 0.002 | 62.59 | 42.2 |
| 6 | aug02 | 0.004 | 62.87 | 42.8 | 0.001 | 63.73 | 39.9 |
| 7 | | 0.002 | 64.26 | 40.3 | 0.0005 | 64.39 | 39.2 |
| 8 | | 0.001 | 65.32 | 39.3 | 0.00025 | 64.75 | 39.4 |
| 9 | aug03 | 0.0005 | 65.99 | 38.6 | 0.000125 | 64.95 | 38.8 |
| 10 | | 0.00025 | 66.29 | 38.6 | 0.000063 | 65.04 | 38.8 |
| 11 | (training finished) | | | | 0.000031 | 65.08 | 38.6 |
| 12 | | | | | 0.000016 | 65.09 | 38.5 |
| 13 | | | | | 0.000008 | 65.10 | 38.5 |
| combo w/RNN1 | | | | 30.4 | | | 30.1 |
| +RNN, 27hyp | | | | 28.5 | | | 27.9 |
Per-utterance normalization gave a big win on plp features, but had a negligible effect on msg features. Further analysis shows that, for the very shortest utterances, msg features did particularly well until they were normalized. One hypothesis is that, because the msg features correspond almost directly to energy in given spectral bands, they might be strongly bimodal, with one mode for voiced speech and one for silence (particularly in longer utterances that contain pauses). If so, simple normalization could be quite harmful, since its effect would depend critically on the proportion of silence in an utterance.
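The bimodality argument can be made concrete with a toy example. The sketch assumes that per-utterance normalization means subtracting each feature dimension's mean over the utterance and dividing by its standard deviation (the page does not spell out the exact form), and it uses made-up values: a single band energy that is 10.0 in speech frames and 0.0 in silence. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def normalize_utterance(values: Sequence[float]) -> List[float]:
    """Mean/variance-normalize one feature dimension over a single utterance.

    Assumed form of per-utterance normalization; a constant dimension maps to zeros.
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def toy_utterance(n_speech: int, n_silence: int) -> List[float]:
    """Hypothetical bimodal band energy: 10.0 in speech frames, 0.0 in silence."""
    return [10.0] * n_speech + [0.0] * n_silence


if __name__ == "__main__":
    mostly_speech = normalize_utterance(toy_utterance(8, 2))
    mostly_silence = normalize_utterance(toy_utterance(2, 8))
    # The same raw speech energy (10.0) lands at very different normalized values.
    print(mostly_speech[0], mostly_silence[0])
```

The speech frames come out at 0.5 in the mostly-speech utterance but at 2.0 in the mostly-silent one. In general, with a fraction p of speech frames, speech maps to sqrt((1 - p)/p) and silence to -sqrt(p/(1 - p)), so the normalized value of a given band energy shifts with the proportion of silence.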
In Brian's experiments with NUMBERS, applying the cepstral transform to the modspec features made little or no difference, but it might at least alleviate the bimodality of the feature dimensions (if it exists). Since Brian had already written code to calculate cepstral features, it was easy to start a training on them (train19), whose progress is charted below.
This is a 2000HU net trained on 28-element feature vectors: full-order cepstral transforms of both msg1 spectra, followed by per-utterance normalization (msg1cepN). Training labels are from align2. The comparison net is the plain msg1N-align2-2kHU net from above.
| Epoch | Date | LrnRt | CV FA% | WER% | Comparison LrnRt | Comparison CV FA% | Comparison WER% |
|---|---|---|---|---|---|---|---|
| 1 | aug01 01:10 | 0.008 | 56.66 | | 0.008 | 57.35 | 53.3 |
| 2 | aug01 13:42 | 0.008 | 58.22 | | 0.008 | 58.32 | 51.3 |
| 3 | aug02 02:15 | 0.008 | 58.82 | | 0.008 | 59.17 | 48.7 |
| 4 | aug02 14:47 | 0.008 | 59.65 | 47.7 | 0.008 | 59.86 | 48.6 |
| 5 | aug03 03:21 | 0.008 | 59.79 | 47.3 | 0.008 | 59.88 | 48.2 |
| 6 | aug03 16:00 | 0.004 | 62.71 | 42.4 | 0.004 | 62.87 | 42.8 |
| 7 | aug04 05:06 | 0.002 | 64.10 | 39.9 | 0.002 | 64.26 | 40.3 |
| 8 | aug04 17:58 | 0.001 | 65.02 | 38.7 | 0.001 | 65.32 | 39.3 |
| 9 | aug05 06:31 | 0.0005 | 65.60 | 38.2 | 0.0005 | 65.99 | 38.6 |
| 10 | (training halted) | | | | 0.00025 | 66.29 | 38.6 |
| 11 | | | | | (training finished) | | |
| combo w/RNN1 | | | | 30.2 | | | 30.4 |
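The msg1cepN front end described above can be sketched as follows. The assumptions are that each msg1 spectrum has 14 bands (consistent with the 28-element total), that the transform is applied to the band values as the msg1 front end outputs them, and that "cepstral transform" means an orthonormal DCT-II across bands; Brian's actual code may use a different DCT variant or scaling, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def dct_ii(bands: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II across spectral bands.

    'Full order' keeps as many coefficients as there are bands, so the
    transform is square and invertible.
    """
    n = len(bands)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coeffs.append(
            scale * sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, x in enumerate(bands))
        )
    return coeffs


def msg1cep_frame(spec_a: Sequence[float], spec_b: Sequence[float]) -> List[float]:
    """Hypothetical msg1cep frame: full-order cepstra of both msg1 spectra, concatenated."""
    return dct_ii(spec_a) + dct_ii(spec_b)


if __name__ == "__main__":
    # Made-up band values for one frame of each 14-band spectrum.
    a = [float(i) for i in range(14)]
    b = [1.0] * 14
    frame = msg1cep_frame(a, b)
    print(len(frame))  # 28
```

Because the transform is square and orthonormal, it loses no information; it only re-expresses each spectrum as mixed dimensions before the per-utterance normalization is applied.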