$Header: /u/drspeech/src/babylon/RCS/babylon.txt,v 1.3 1996/09/05 19:36:39 davidj Exp $

Introduction
------------

This document contains some notes on standards for the new ICSI
Realization Group speech scripts that were commenced in the summer of
1996.

General
-------

Scripts should be written in Perl5. The first line should be:

    #! /usr/local/bin/perl5

All scripts should be stored under RCS and include an RCS \$Header\$ in
their first few lines. They should be developed in a separate directory
and installed into the appropriate directory before execution. A man
page should be installed for all scripts.

Wherever possible, scripts should call Perl libraries or include simple
functions inline rather than execute many other external scripts.

File creation dates should not be used to modify actions, although
command line options can be used to skip various stages of processing.

Scripts should still work (albeit slowly) with 100,000 sentences,
1,000,000 frames in one sentence and 10,000 output classes.

Files
-----

There should be no absolute path names in the executable. All
non-user-supplied files must be referenced via the environment variable
DRSPEECH_HOME, which defaults to "/usr/local/drspeech/". The structure
of the directory at DRSPEECH_HOME is:

    <arch>/bin
    <arch>/lib
    share/bin
    share/lib

where <arch> is one of "sun4", "sun4-sunos5", "iris", "iris4-irix5" -
see below.

The script will set the PATH environment variable to be:

    /bin:/usr/bin:/usr/ucb/bin:${DRSPEECH_HOME}/<arch>/bin:\
    ${DRSPEECH_HOME}/share/bin

i.e. all programs used by default must be in the DRSPEECH_HOME tree. The
value of <arch> is set from the DRSPEECH_ARCH environment variable if
set, else it is decided automatically by using the Unix "uname" command.

Temporary files should be stored in the directory specified by
DRSPEECH_TMPDIR, or TMPDIR if that is not defined, else "/tmp" if
neither is defined.
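
The lookup order for DRSPEECH_HOME, the architecture, the temporary
directory and PATH could be sketched as below. This is an illustration
only: resolve_settings and first_set are hypothetical helpers, not part
of any drspeech package, and the sketch assumes that the architecture
name is the directory component before "/bin" in the PATH entry. How
"uname" output maps to a name such as "sun4-sunos5" is not specified
here, so the caller passes the detected name in.

```perl
use strict;
use warnings;

# Return the first argument that is defined and non-empty.
sub first_set {
    for my $v (@_) {
        return $v if defined $v && length $v;
    }
    return undef;
}

# Hypothetical helper. $env is a hash reference such as \%ENV;
# $detected_arch stands in for the name derived from "uname".
sub resolve_settings {
    my ($env, $detected_arch) = @_;

    # DRSPEECH_HOME, else the documented default; no trailing "/".
    my $home = first_set($env->{DRSPEECH_HOME}, '/usr/local/drspeech');
    $home =~ s{/+\z}{} if length($home) > 1;

    # DRSPEECH_ARCH wins over automatic detection.
    my $arch = first_set($env->{DRSPEECH_ARCH}, $detected_arch);

    # DRSPEECH_TMPDIR, else TMPDIR, else /tmp; no trailing "/".
    my $tmpdir = first_set($env->{DRSPEECH_TMPDIR}, $env->{TMPDIR}, '/tmp');
    $tmpdir =~ s{/+\z}{} if length($tmpdir) > 1;

    # Assumption: the architecture directory sits directly under home.
    my $path = join ':', '/bin', '/usr/bin', '/usr/ucb/bin',
        "$home/$arch/bin", "$home/share/bin";

    return { home => $home, arch => $arch, tmpdir => $tmpdir, path => $path };
}

my $s = resolve_settings({ TMPDIR => '/var/tmp' }, 'sun4-sunos5');
print "$s->{path}\n";
print "$s->{tmpdir}\n";
```

A real script would pass \%ENV and then assign the result to
$ENV{PATH}; taking the environment as an argument simply keeps the
sketch easy to test.
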
The script should not use more than 10MB or 64 files in the temporary
directory - if more space is needed, it should be in a user-defined
place. Temporary files should be called "drspeech_" followed by two
components: a name built on the name of the script plus something to
differentiate between multiple temporary files, and the ASCII
representation of the process ID of the Perl program as stored in the $$
variable. All temporary files must be deleted on error or on termination
due to any catchable signal.

All filenames must be user-specifiable. Where many files of a similar
form need to be specified, filenames with escape sequences should be
used, e.g. "dir/%n.txt", where %n is replaced by the sentence number. In
these situations, "%%" should be replaced by a single "%".

Command Line Arguments
----------------------

Command line arguments are of the form "-option value". If "-option" is
not recognized, an error must be raised. Lists of values are separated
with commas without spaces, e.g. "-option val,val,val". A null value may
be acceptable, e.g. "-option ''". It is not necessary that all options
have default values - in these cases not specifying the option is an
error.

All non-standard-Unix executables called must be overridable on the
command line using a "-*_prog" option, e.g. "-qnforward_prog". There
must also be a command line option allowing extra arguments to be passed
at the end of the command line to any given executable, e.g.
"-qnforward_args". Where a program is used multiple times for different
functions, multiple prefixed "_prog" and "_args" options should exist,
each defaulting from the unprefixed option, e.g. -y0_prog,
-viterbi_y0_prog, -recog_y0_prog. The default for programs should be the
program name without a path prefix.

Command line options can also be passed in a parameter file. Multiple
parameter files can be specified, and the last instance of any given
option is the one used by the program.
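
The option rules in this section (an unrecognized option is an error, a
later instance overrides an earlier one, an option without a default
must be given, and prefixed "_prog" options default from the unprefixed
one) might be sketched as follows. This is an illustration only and is
not the drspeech_args module: parse_options, list_value and the
"-labels" option in the usage lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parser.
#   $known    - hash ref: option name => default (undef means required)
#   $fallback - hash ref: prefixed option => unprefixed option it
#               defaults from
sub parse_options {
    my ($known, $fallback, @argv) = @_;
    my %opt;

    while (@argv) {
        my $flag = shift @argv;
        die "ERROR - expected an option, got '$flag'\n"
            unless $flag =~ /^-(.+)\z/s;
        my $name = $1;
        die "ERROR - unrecognized option '-$name'\n"
            unless exists $known->{$name};
        die "ERROR - option '-$name' needs a value\n" unless @argv;
        $opt{$name} = shift @argv;    # a later instance overrides
    }

    # Fill in anything not given on the command line.
    for my $name (sort keys %$known) {
        next if exists $opt{$name};
        my $source = exists $fallback->{$name} ? $fallback->{$name} : $name;
        my $value  = exists $opt{$source} ? $opt{$source} : $known->{$source};
        die "ERROR - option '-$name' must be specified\n"
            unless defined $value;
        $opt{$name} = $value;
    }
    return \%opt;
}

# Split a comma-separated list value; the null value '' gives no items.
sub list_value {
    my ($value) = @_;
    return split /,/, $value, -1;
}

my $opt = parse_options(
    { y0_prog => 'y0', viterbi_y0_prog => undef, labels => undef },
    { viterbi_y0_prog => 'y0_prog' },
    '-labels', 'a,b,c', '-y0_prog', 'my_y0', '-labels', 'x,y',
);
print join(' ', list_value($opt->{labels})), "\n";
print "$opt->{viterbi_y0_prog}\n";
```

Options read from a parameter file could be fed through the same
function by placing them in the argument list in the order they are
encountered, so that the last instance still wins.
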
The parameter file is specified using the "-params filename" option.
Blank lines are acceptable and comments beginning with "#" are stripped.

The "-help" or "-h" option gives a usage message.

Error and Logging Output
------------------------

The "-verbose val" option sets the level of diagnostic output - the
default is 1, meaning produce reasonable levels (a few lines to a few
pages) of output for tracking the progress of the job. A value of 0
means produce no diagnostic output. With "-verbose" set to 1, the values
of all command line options are logged to the diagnostic log stream at
the start of program execution.

Diagnostic output is sent to the file specified by the "-log" option,
defaulting to "-" (file no. 1, stdout). All error and warning output
should be sent to the log stream and also to STDERR. On abnormal
termination, including termination due to signals, an error message must
be produced. All output to STDERR should include the program name and
the string "ERROR" or "WARNING", e.g.

    example.pl: ERROR - something went wrong.

The return code of every Unix command executed must be checked and the
script must terminate with an appropriate error message if there is an
error return. The script should terminate returning 1 on error and 0 on
success.

Coding Conventions
------------------

Variables use the same name as each of the command line options.
Uppercase variable names are used for constants.

Packages
--------

drspeech           - basic variables
drspeech_args      - argument parsing module
drspeech_parallel  - parallelize using various mechanisms
drspeech_fileutils - useful utilities

"drspeech" package
------------------

$drspeech::home     - The directory holding speech files, excluding the
                      trailing "/"
$drspeech::arch     - The architecture - e.g. "sun4-sunos5"
$drspeech::tmpdir   - The temporary directory, excluding the trailing "/"
$drspeech::verbose  - Level of diagnostic verbosity - 0, 1 or more
$drspeech::log      - Stream used for log output
$drspeech::progname - The name of the program being executed