The SPRACHcore software package
2007-11-01: Note: A new version of the MSG feature calculation script bedk_frontend/msgcalc.in, which fixes a problem in the original script, is available here.
The SPRACHcore package has not been updated since 2004. We now release new versions of individual SPRACHcore components instead, and we recommend that you use those. The most recent version of the core neural net code (QuickNet) has many speed improvements and other enhancements, and it is available here.
The most recent version of the dpwelib sound utilities (compatible with gcc4) is dpwelib-2009-02-24.tar.gz.
The most recent versions of feacat, feacalc, noway, and pfile_utils (a set of 18 feature file tools including pfile_klt) are here.
This overview discusses obtaining technical support and how the neural network tools fit into the big picture of the speech recognition field.
Example scripts and configuration files for speech recognition using QuickNet and the noway decoder can be found in SPRACHcore, and also here and here.
SPRACHcore package overview
One of the outcomes of the
SPRACH project
was a release of the neural network (connectionist) speech
recognition tools and speech feature extraction tools developed at ICSI and the other partners. This package is named SPRACHcore and
includes full source code for the tools. It is intended to allow anyone with access to
a Unix workstation to try out our techniques.
The package is automatically configured
using GNU Autoconf ("just type ./configure") to make it easier to
install, particularly under Linux or Solaris.
After the SPRACH project ended in 1998, we continued to
maintain and expand the SPRACHcore package with subsequent developments
and additions to the tools until 2004.
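The Autoconf-based installation mentioned above can be sketched as a short shell script. This is a minimal sketch under stated assumptions, not the package's documented procedure: the directory name `SPRACHcore-nogui`, the install prefix, and the helper name `build_sprachcore` are hypothetical, and it assumes the package follows the usual Autoconf conventions of a `--prefix` option and `make` / `make install` targets.

```shell
#!/bin/sh
# Hypothetical locations: adjust to wherever the source tarball was unpacked
# and wherever the tools should be installed.
SRC_DIR="${SRC_DIR:-SPRACHcore-nogui}"
PREFIX="${PREFIX:-$HOME/sprachcore}"

# Configure, build, and install in a subshell so that the caller's
# working directory is left unchanged.
build_sprachcore() {
    (
        cd "$SRC_DIR" || exit 1
        ./configure --prefix="$PREFIX" && make && make install
    )
}

# Only attempt the build if an executable configure script is present.
if [ -x "$SRC_DIR/configure" ]; then
    build_sprachcore
else
    echo "No configure script found in $SRC_DIR; unpack the source package first."
fi
```

Installing under a private prefix rather than a system directory avoids the need for root access and keeps the tools easy to remove later.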
SPRACHcore package details
The package comes in two main variants:
- SPRACHcore-nogui
contains all the command-line tools but not the graphical front ends.
In practice, these parts caused the most problems in installation, and
were the least valuable. In most circumstances, SPRACHcore-nogui is the
preferable choice for downloading and installing on new machines. It
still includes the simpleui command-line demonstrator.
- SPRACHcore is the full package including everything
in SPRACHcore-nogui (i.e., neural net training and recognition, feature calculation, and
soundfile manipulation), plus all the GUI components and tools for SPRACHdemo, ThislGUI, BeRP, etc.
You can download the latest version of the source packages from:
The contents of SPRACHcore-nogui are:
- intvec - Integer vector support library
- fltvec - Floating-point vector support library
- quicknet - Neural-net training and evaluation library & programs (including qnstrn and qnsfwd)
- rasta - Speech feature-calculation front-end
- dpwelib - Universal soundfile and audio IO interface lib & progs
- feacalc - Integrated feature calculation program for RASTA, PLP and other features (uses rasta code as a library)
- feacat - Universal data-file conversion/trimming utility
- pfile_utils - Programs for detailed transforms of pfile data files
- bedk_frontend - For the 'modspec' program, which calculates MSG (Modulation-filtered SpectroGram) features
- ffwd - Small, fast, C-only neural net forward-pass pkg
- labels2pfile - Tcl script for converting labels file formats
- simplebn - Script and data for a cut-down broadcast news recognizer
- dr_scripts - Perl scripts for training and aligning
(alignment also requires Softsound's efsgd, not included)
- randlines - Simple utility for randomizing text file list lines.
- noway - Steve Renals's posterior-based HMM decoder
- simpleui - simple speech demo using audio_fe, rasta, quicknet
The full SPRACHcore package further includes:
- aprl - Tcl extensions for audio processing & GUI
- guitools - more Tcl GUI extensions & support, comprising:
  - tclsh-readline - C++ compatible tclsh & wish
  - libdat - 3-dimensional data file interface lib
  - otcl - automatic Tcl wrappers for C++ classes
  - farray_otcl - floatArray classes + Tcl interface
  - pfif_otcl - pfile feature archive classes & Tcl
  - sound_otcl - soundfile & STFT Tcl interface
  - gdtcl - Tcl interface to Boutell's gifdraw
  - dpweutils_tcl - misc Tcl/Tk extensions
  - dpwetcl - various Tcl script library files
- audio_frontend - aprl-based script to capture live speech input
- berpbackend - Tcl extensions to implement restaurant query demo
- berpdemo - Front-end and data files for restaurant demo
- sprachdemo - Front-end for multilingual recognition demo (no data)
- thislgui - Tcl/Tk GUI for the Thisl Information Retrieval demo
Credits
Many different authors have contributed to the software in the
SPRACHcore package. The original construction of the package, as well
as the development of the core quicknet neural network tools, was
done by David Johnson of ICSI. The package was then maintained by
Dan Ellis (who contributed dpwelib, feacat, and feacalc) and Chuck
Wooters. Jeff Bilmes (pfile_utils, ffwd), Brian Kingsbury
(bedk_frontend, dr_scripts) and Eric Fosler-Lussier (dr_scripts)
contributed whole packages as well as being involved in maintaining
and developing the common tools, along with others too numerous to
mention. RASTA was originally programmed by Morgan and Hynek
Hermansky. NOWAY appears by kind permission of Steve Renals of
Sheffield University.
Updated: $Date: 2008/06/30 18:29:29 $
Dan Ellis <dpwe@icsi.berkeley.edu>
International Computer Science Institute, Berkeley CA