# $Header$

Overview
--------

This document is a reference for scientists and developers using the
speech software environment at ICSI. It describes where things are,
what things are called, some programming conventions and how to
perform standard tasks. There are no doubt many errors, unclear
wordings and omissions - please contact the current maintainer of this
document (initially David Johnson) if you see anything that can be
improved.

Directory structure
-------------------

At ICSI, the speech software is in the directory tree mounted as
"/u/drspeech/". This tree is structured in a manner similar to the
traditional Unix "/usr/" tree, with modifications based around
specific ICSI and speech requirements. The tree is essentially the
same as that used in the forthcoming "SPRACHworks" connectionist
speech software distribution.

An important distinction in "/u/drspeech/" is the difference between
directories containing changeable information (typically source
directories) and those containing comparatively static files generated
by some form of install process. A directory can only be of one form
or the other, and only installed-into directories should be used by
people other than those actively developing on a specific project.
The main installed-into directories are either architecture-specific
or under "share". Examples of installed-into directories include
"/u/drspeech/sun4/bin/" and "/u/drspeech/share/doc/". Examples of
dynamic directories include everything under "/u/drspeech/src/".

The reason for this distinction is that installed-into directories are
potentially being used at any time, and consequently you only want to
make consistent, atomic changes in those directories. How would you
feel if someone started randomly editing a script being used in your
3-week-long speech experiment?

The basic structure of "/u/drspeech/" is:

    u--drspeech-----+-src-----------+-doc
                    |               |-rasta
                    |               :.
                    +-sun4----------+-bin
                    |               |-include
                    |               |-lib
                    |               \-opt
                    +-sun4-sunos5---+-bin
                    |               |-include
                    |               |-lib
                    |               \-opt
                    +-...
                    +-share---------+-bin
                    |               +-doc
                    |               +-include
                    |               +-lib
                    |               \-man-----------+-man1
                    |                               +-man2
                    |                               :.
                    |-data----------+-timit
                    |               |-vmdigits
                    |               :.
                    +-etc
                    +-du
                    +-demos
                    :

The use of directories is as follows:

    sun4             - sun4 specific files
    sun4/bin         - sun4 executables
    sun4/lib         - sun4 libraries
    sun4/include     - sun4 architecture-specific include files
    sun4/opt         - sun4 architecture-specific binary distributions
    sun4-sunos5      - SPARC/Solaris 2 specific files
    share            - architecture-independent files
    share/bin        - portable scripts (e.g. perl, sh)
    share/doc        - documentation
    share/lib        - portable libraries (perl?)
    share/man        - man pages
    share/man/man1   - section 1 man pages (commands/executables)
    share/include    - architecture-independent header files
    src              - source directories
    src/doc          - source directories for documentation
    src/html         - source directories for WWW stuff
    data             - shared data files
    data/belldigits  - data for the Bellcore Digits corpus
    data/berp        - data for the "berp" corpus
    opt              - architecture-independent installed packages
    etc              - misc. stuff (cron tables)
    du               - disk usage details
    demos            - some example directories for common functions
    html             - WWW pages (but not yet...)

The drspeech directory tree is structured so as to support multiple
machine architectures. All architecture-specific files (specifically
executables in "bin", object file libraries in "lib" and
architecture-specific include files in "include") are stored in
different directory trees for each architecture. Architecture-independent
files are NOT stored in these trees - there is a "share" directory tree
for scripts and "portable" (e.g. perl) libraries, portable include
files, man pages and documentation. The system is set up so that
programs with GNU-style configure scripts or makefiles can be installed
with prefix=/u/drspeech/share and exec_prefix=/u/drspeech/.
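The split between architecture-specific and shared trees suggests a
simple lookup order for executables: the architecture's "bin" first,
then "share/bin". The Bourne shell sketch below illustrates this. The
function name find_tool is hypothetical, the DRSPEECH_HOME and
SPEECH_ARCH variables are those described under "Environment variables",
and the default values given here are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: locate an installed drspeech program by name.
# The defaults below are assumptions; normally both variables come
# from the user's initialization script.
DRSPEECH_HOME=${DRSPEECH_HOME:-/u/drspeech}
SPEECH_ARCH=${SPEECH_ARCH:-sun4}

# find_tool NAME
# Print the full path of NAME, preferring the architecture-specific
# bin directory over the shared one. Returns 1 if NAME is in neither.
find_tool() {
    for dir in "$DRSPEECH_HOME/$SPEECH_ARCH/bin" "$DRSPEECH_HOME/share/bin"
    do
        if [ -x "$dir/$1" ]; then
            echo "$dir/$1"
            return 0
        fi
    done
    return 1
}

# Example use: report where the program named on the command line lives.
if [ $# -gt 0 ]; then
    find_tool "$1" || echo "$1: not installed under $DRSPEECH_HOME" >&2
fi
```

Searching the architecture tree first mirrors the order used when
setting up a user's path, so a compiled replacement for a script takes
precedence without the script having to be removed.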
The following names are used for architectures:

    iris            Irix 4.x
    iris-irix5      Irix 5.x
    rap             ICSI RAP
    spert0          SPERT
    sun4            SunOS 4.x on a SPARC
    sun4-sunos5     Solaris 2.x on a SPARC

Note that although files are logically stored in "/u/drspeech/", they
may be physically stored on different disks (with references via
symbolic links). In this situation, the file structure on the other
disk should mimic the structure in "/u/drspeech/". e.g.

    sushi:/da/drspeech/data/berp/pfiles/

There may also be symbolic links present in "/u/drspeech/" for reasons
of historical compatibility. As of October 1996, these include the
following links and their replacements:

    data/DIGITS     -> data/belldigits
    data/NTIMIT     -> data/timit
    data/NUMBERS93  -> data/numbers93
    data/NUMBERS95  -> data/numbers95
    data/NUMBERS    -> data/numbers93
    data/TIMIT      -> data/timit
    demo            -> demos
    doc             -> share/doc
    include         -> share/include
    man             -> share/man
    sun4-sos4       -> sun4
    sun4-sos5       -> sun4-sunos5

All of these links will disappear eventually - please avoid using them.

Inside "/u/drspeech/data/"
--------------------------

Speech data files of general use are stored in "/u/drspeech/data/".
The following are the canonical names for the different corpora:

    belldigits      Bellcore digits database
    berp            ICSI Berp database
    noisex          A database of noises
    nswboard        The Switchboard database (second SRI alignment)
    ntimit          The NTIMIT database (telephone version of TIMIT)
    numbers93       The original OGI numbers database
    numbers95       The 1995 OGI numbers database
    phonebook       The Nynex PhoneBook database
    swboard         The Switchboard database (original SRI alignment)
    timit           The TIMIT database
    vmdigits        The Voicemail Digits database from Siemens

Under each corpus directory are the following files and directories:

    README
    src/            Source files
    doc/            Documentation files
    list/           Lists of database subsets
    trans/          Utterance ID to transcription/filename mappings
    wavfile/        Speech wave files
    wrdfile/        Word level transcriptions, one transcription per file
    phnfile/        Phone level transcriptions, one transcription per file
    ftrarch/        Feature file archives
    labarch/        Label file archives
    phone/          Phone models
    lex/            Lexicons
    lm/             Language models

These are described in detail below.

_README_

This is a brief introduction to the database. This should contain (as
a minimum):

    "description"   - a very brief description of what is in the
                      database (including references to other
                      documentation)
    "source"        - who produced the database, how it was obtained
                      and what media ICSI possesses
    "installation"  - details of the installation of the data at ICSI
    "corpus ID"     - the official corpus identifier
    "utterance IDs" - how the utterance ID is structured

_src_

The source for scripts, programs and locally produced documentation
for this database. Directories containing Makefiles of common database
maintenance functions are also stored here. Note that scripts and
programs should be specific to this database, and their names should
start with the corpus ID so they can be installed globally if
appropriate.
_dist_

A copy of the original distribution, or some subset thereof (if some
part of the distribution has been deleted because it uses significant
disk space but is unlikely to be used). Nothing in this tree should
ever be modified - make copies of directories or files elsewhere if you
need to change them. Keeping this tree read-only is important so that
we know how our version of the data differs from the same database
installed elsewhere. Typically there will be links from other
directories (e.g. "wavfile") to files in the "dist" tree - the tree
should not be referenced directly. If a file needs to be modified
(e.g. transcriptions fixed), replace the link into the dist tree with
the new version of the file.

_doc_

Corpus-specific documentation. Can contain links into "dist" and
locally produced documents. In any documentation or README files,
don't assume that the user will be using ICSI file formats (e.g.
pfiles) or ICSI programs (e.g. isr_* scripts) unless absolutely
necessary.

_list_

Lists of utterance IDs describing ordered subsets of the database,
typically ending in the ".utids" suffix. Four common divisions are
"train" (training data), "cv" (training cross validation data), "dev"
(development test data) and "test" (final test data). Other possible
subsets could be based on gender, recording quality or random
selection. Lists can be in a logical order or a random order, and
often both are needed. Random ordering is also useful for test data:
a logical ordering may, for example, leave all the long sentences at
the end, making prediction of runtimes difficult.

_trans_

Mappings from utterance IDs to words, phonemes, filenames or whatever.
Files should be ASCII with the utterance ID at the start of each line
and one line per utterance. Files should have lines for all utterances
in the database, with null or invalid values inserted where there is no
appropriate mapping (e.g. there is no phonetic transcription for a
given utterance).
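Because every line of a "trans" file starts with the utterance ID, a
subset can be pulled out in list order with standard tools. The sketch
below is one way to do it in Bourne shell with awk. The function name
select_trans and the file names in the usage note are hypothetical, and
utterance IDs are assumed to contain no white space.

```shell
#!/bin/sh
# Sketch only: print the lines of a trans file for the utterance IDs
# in a list file, in the order given by the list file.

# select_trans UTIDS TRANS
#   UTIDS - file with one utterance ID per line (e.g. from "list")
#   TRANS - file with "utid value..." on each line (e.g. from "trans")
# IDs in UTIDS that have no line in TRANS are silently skipped.
select_trans() {
    awk '
        # First file named (the trans file): index each line by its ID.
        FILENAME == ARGV[1] { line[$1] = $0; next }
        # Second file (the list): emit the stored line for each ID.
        ($1 in line)        { print line[$1] }
    ' "$2" "$1"
}

# Example use: select_trans LISTFILE TRANSFILE
if [ $# -eq 2 ]; then
    select_trans "$1" "$2"
fi
```

For example, given a list "dev.utids" and a mapping "all.wrdstrings",
"select_trans dev.utids all.wrdstrings" would print the word strings
for the development set in the order the list specifies.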
Typically, files from the "trans" directory are used in conjunction
with a file from the "list" directory, which specifies which utterances
to use and in what order.

Files with ".files", ".phnstrings" and ".wrdstrings" suffixes are often
found in this directory. "*.files" files, which map from utterance IDs
to different files such as wave files, should include a relative
filename assuming a current directory of "/u/drspeech/".

_wavfile_

The "wavfile" directory tree is used to store audio files. Directly
under "wavfile" are directories containing different `versions' of the
audio files - i.e. the same audio signal with different filtering,
padding etc. One common version directory is "dist" (which is
typically a link into the main corpus "dist" tree). There should also
be a "default" symbolic link which should point to the `best' wavfiles
for general use.

The audio files should be stored one utterance per file and may be
stored in a directory tree (sensible when there are many utterances) or
flat (all in one directory). Audio files should be stored in NIST
SPHERE format, although copies in other formats are allowed if
necessary - see sphere(3). Wave files should have the ".wav" suffix.

_phnfile_

The "phnfile" directory tree is used to store time-aligned phone-level
transcription files. The structure is the same as the _wavfile_
directory, including allowing different versions (e.g. for different
phonesets). Wherever possible, TIMIT-format transcriptions in the
ICSI56 phoneset should be available, although copies in other version
directories using different formats or phonesets are allowed.

_wrdfile_

The "wrdfile" directory tree is used to store time-aligned word-level
transcription files. The structure is the same as the _wavfile_
directory, including supporting different versions (primarily used for
corrections).
Wherever possible, the TIMIT word level transcription format should be
used, which implies a ".wrd" suffix (although in some distributions a
".txt" suffix is used). Copies in other formats or phonesets are
allowed.

_ftrarch_

The "ftrarch" directory is used to store feature archives - files
containing features for multiple utterances. As of October 1996,
feature archives are usually stored in the "pfile" format with the
".pfile" suffix. Features should be stored in pfiles with a sentence
number, frame number, the requisite number of feature columns and an
empty label column (to allow patching of labels at a later date when
using the old embedded training tools). Pfiles containing both
features and usable labels should not be stored as they may not be
supported by future tools (although they may be used internally with
experiments). All pfiles should be "indexed", i.e. contain a
"sent_table_data" section as produced using "pfile -sent_check".

Note that feature files with one utterance-per-file, although common as
intermediate files, should not be stored for extended periods.

In addition, any log files output during creation of a feature archive
should be stored alongside that file, using the same name and the
suffix ".log".

Finally, normalization files as produced e.g. by "qnnorm" should also
be stored in the same directory. As of October 1996, norms files are
typically in RAP ASCII format with the ".norms" suffix. Again, a log
file should be present indicating how the normfile was created.

_labarch_

The "labarch" directory is used to store label archives - files
containing labels for multiple utterances. As of October 1996, label
archives are usually stored in the "pfile" format with the ".pfile"
suffix. Labels should be stored in pfiles with a sentence number,
frame number and one label column. Pfiles containing both features and
usable labels should not be stored as they may not be supported by
future tools (although they may be used internally with experiments).
All pfiles should be "indexed", i.e. contain a "sent_table_data"
section as produced using "pfile -sent_check".

Historically, ASCII label files with no indication of utterance
segmentation have been stored. These should not be kept centrally:
they have no header and no indication of utterance boundaries, and they
can be easily derived from the pfile format label file (while the
reverse transformation is often impossible).

In addition, any log files output during creation of a file should be
stored alongside that file, using the same name and the suffix ".log".

File naming conventions
-----------------------

There are many standard Unix filenaming conventions, and there are good
reasons for sticking with them. Some common Unix filenaming
conventions used in the ICSI speech software are enumerated below:

    a.out           Default object file produced by compilations and
                    links
    lib*.a          Object library file
    Makefile        Makefile for a directory (preferred)
    makefile        Obsolete - see "Makefile"
    RCS/            Directory for RCS files
    README          Description of the directory in which it is stored
    INSTALL         Document describing how to build and/or install a
                    package
    NEWS            Details about changes to a package
    *,v             RCS file
    *.[1-9]         Manual page
    *.aif[fc]       SGI Audio Interchange Format files
    *.au            Sun/NeXT format audio files
    *.C             C++ source file
    *.c             C source file
    *.cc            C++ source file
    *.csh           C shell script
    *.dist          Distributed version of a file "before being changed
                    locally"
    *.dvi           TeX dvi file
    *.eps           Encapsulated Postscript file
    *.epsf          Encapsulated Postscript file with bitmap thumbnail
    *.f             Fortran source file
    *.fm            FrameMaker file
    *.gz            File compressed with "gzip"
    *.h             C/C++ header file
    *.html          Hyper text mark up language document (for use on
                    WWW)
    *.o             Object file
    *.orig          Version of a file before a change (typically by
                    "patch")
    *.pl            Perl program
    *.ps            Postscript file
    *.S             Assembler source file (requiring use of C
                    preprocessor)
    *.sh            Bourne shell script
    *.tex           LaTeX or TeX source
    *.txt           Plain ASCII file
    *.tmp           Temporary file
    *.Z             File compressed with "compress"

Other more local filenaming conventions include:

    files.mk        Parameter file for the "isr_" speech training and
                    recognition scripts
    spert.core      Core file produced by crashed SPERT program - in
                    general can be deleted
    *.abigram       ARPA-style bigram language model
    *.atrigram      ARPA-style trigram language model
    *.align         ? Alignment file as produced by y0
    *.asciiact      RAP format ASCII activation file - see rapact(1)
    *.binact        RAP format binary activation file - see rapact(1)
    *.cddur         ???
    *.dur
    *.esps          ESPS-format file - see *.sd
    *.files         Utterance ID to filename mappings
    *.fvq           ASCII data used by fvq program for vector quantized
                    clustering
    *.hexact        RAP format ASCII-hex activation file - see
                    rapact(1)
    *.label         ASCII label file as input to patchlabels(1)
    *.lex           (not sure everyone uses this)
    *.list          Generic machine-readable list
    *.lna           Binary phoneme probabilities as used by y0 and
                    noway - see lna(5).
                    NOTE: There are two types of 8 bit LNA files,
                    different only in the scaling factor used. Be sure
                    your LNA file and your program use the same scaling
                    factor!
    *.log           Output of run
    *.mk            Make file (often used for pmake-only Makefiles)
    *.nbigram       NoWay-style bigram language model
    *.ntrigram      NoWay-style trigram language model
    *.norms         BoB and QuickNet format feature file normalization
                    coefficients - see norms(5)
    *.norm          Obsolete - see "*.norms"
    *.onlftr        Online feature file - format normally used over a
                    pipe in realtime recognition systems - see
                    online_ftrs(5)
    *.pfile         "pfile" feature archive binary file format as used
                    by BoB and QuickNet - see pfile(5) and pfile(1)
    *.phn           Single utterance time-aligned phone-level
                    transcription (TIMIT-style)
    *.phonestring   Obsolete - see *.phnstring
    *.phnstring     Multiple utterance non-time-aligned phone-level
                    transcriptions with utterance ID at start
    *.phset         A list of ASCII phones, one per line
    *.priors        ASCII priors file - an ASCII list of floating point
                    priors (as used by y0 and noway) - see priors(5)
    *.prm           Parameter file - list of Unix arguments stored in a
                    file (as used by BoB, clones and noway)
    *.prons
    *.proto         "protofile" ASCII feature archive file as used by
                    old MLP programs
    *.sd            An ESPS format wave file
    *.sentid        ?
    *.sentstring    Obsolete - see *.wrdstring
                    with utterance ID at start
    *.tr            A simple ASCII file consisting of two fields - a
                    phone name and a number. Used to translate phone
                    sets.
    *.txt           Used in some corpora for word level transcriptions
                    Also plain ASCII documentation file
    *.utids         List of utterances, one utterance per line
    *.wav           NIST format speech file (not to be confused with
                    Microsoft audio files which have the same suffix)
    *.wavlist       ?
    *.weights       An ASCII weight file as used by BoB and QuickNet -
                    see weights(5)
    *.wrd           Time aligned word-level transcription (TIMIT-style)
    *.wrdstring     Multiple utterance non-time-aligned word-level
                    transcription
    *.xcom          Transcription comment file (xwaves-style)
    *.xphn          Time aligned phone-level transcription
                    (xwaves-style)
    *.xwrd          Time aligned word-level transcription
                    (xwaves-style)
    *.ybigram       Y0 format bigram language model - see bigram(5)
    *.ywpbegin      Y0 format word pair language model file - beginning
                    words
    *.ywpcont       Y0 format word pair language model file -
                    continuation words
    *.ywpend        Y0 format word pair language model file - ending
                    words

Please use names consistent with the above conventions wherever
possible. Note that the suffix qualifies the format of the data NOT
the meaning. In particular, do not use any of the above suffixes
except where they are representative of the contents of a file. It is
much better to use "_" or "-" in a filename rather than ".". Also,
avoid using suffixes to enumerate different versions - instead of
"recognise_limey.sh.wsj", use something like "recognise_limey-wsj.sh".

See also file(1).

Due to the enormous number of parameters, naming speech files in a
reasonably unique way can be difficult. The preferred method is to use
multiple fields separated by "-". Possible fields, in a suitable
order, include:

 - the standard corpus name
 - the subset of the corpus (maybe two fields)
     o for cuts, use e.g. "cut0"
     o for gender, use "mal" and "fem"
     o "train" is the training set, including cross validation
     o "dev" is the development test set
     o "test" is the final test set
 - whether the order of the utterances has been changed
     o use "r" if randomized, followed by seed if appropriate
 - the DC removal technique
     o "rmdc0" means ESPS "rmdc" command used to remove DC
     o "hpfilt" means RASTA high pass filter used to remove DC
       (assumed if not present)
 - the feature extraction technique and number of features,
   consisting of
     o "rasta" or "plp"
     o number of features, typically "8" or "12"
     o "+d" if deltas, "+dd" if double deltas
 - details of any other feature extraction options
 - the phoneset used
     o "icsi56" assumed by default
 - the number of hidden units followed by "hu"
 - the net output unit
     o "sigmoid", "softmax" or "tanh"
 - the epoch number, preceded by "epoch"
 - any other training or forward pass options
 - the file type suffix

Note that the exact fields used will depend on file type and context.
Some examples:

    timit-train-r-rasta12+dd.pfile
    ntimit-train-r-rasta8+d-400hu-softmax-epoch1.weights

Environment variables
---------------------

The DRSPEECH_HOME environment variable must be set in a user's
initialization script to point to the "/u/drspeech" directory. The
DRSPEECH_TMP environment variable can be used to override where recent
scripts store temporary files. In addition, if older speech scripts
are being used, the SPEECH_DIR environment variable must be set to
point to the same directory and SPEECH_ARCH must be set to specify the
architecture of a user's workstation (typically "sun4" or
"sun4-sunos5"). The PATH environment variable must include both
"/u/drspeech/share/bin" and "/u/drspeech//bin". To access the drspeech
man pages, the MANPATH environment variable must include
"/u/drspeech/share/man". Finally, if a SPERT board is being used, the
SPERTPATH variable must include "/u/drspeech/spert0/bin".
A suitable section of .cshrc might be:

    switch(`/bin/uname`)
    case SunOS:
        switch(`/bin/uname -r`)
        case 5*:
            setenv SPEECH_ARCH sun4-sunos5
            breaksw
        default:
            setenv SPEECH_ARCH sun4
            breaksw
        endsw
        breaksw
    case IRIX:
        switch(`/bin/uname -r`)
        case 5*:
            setenv SPEECH_ARCH iris-irix5
            breaksw
        default:
            setenv SPEECH_ARCH iris
            breaksw
        endsw
        breaksw
    default:
        setenv SPEECH_ARCH unknown
        breaksw
    endsw
    if (-d /u/drspeech) then
        setenv DRSPEECH_HOME /u/drspeech
        setenv SPEECH_DIR ${DRSPEECH_HOME}
        setenv MANPATH "${MANPATH}":${DRSPEECH_HOME}/share/man
        setenv SPERTPATH /u/drspeech/spert0/bin
        # Nested so DRSPEECH_HOME is never referenced while unset
        if (-d ${DRSPEECH_HOME}/${SPEECH_ARCH}/bin) then
            set path = (${DRSPEECH_HOME}/${SPEECH_ARCH}/bin \
                        ${DRSPEECH_HOME}/share/bin $path)
        endif
    endif

See also csh(1).

Program development overview
----------------------------

The first half of this document includes information of use to anyone
using the speech software at ICSI. The sections that follow put more
emphasis on details relevant to people who actually modify that
environment. That said, many of the guidelines are sensible for use in
developing personal software or experiments, and following them will
help when someone else is involved with your work.

The model for program development is that a directory tree is created
where experimental development takes place. This directory tree is a
shared directory, in that other users will at various times use the
same directory. Note, though, that at any one time there will normally
be only one user working in a given development directory. When
development of some files or programs reaches a stage where they are of
use to someone else, some form of verification is performed to make
sure that things are working, then the shared files are installed into
a public "installed-into" directory.
Typically the development directory tree will contain a group of C/C++
source files which combine to produce an executable, but other examples
include a related group of shell/pmake/perl scripts or a set of scripts
and data files to produce a data file made available for general use.
The operations appropriate within the development tree (including
installation) are normally enumerated in a makefile, and modifications
to files are controlled using RCS.

Using RCS to aid program development
------------------------------------

In general, any user-editable files should be stored under RCS. Using
RCS has several advantages:

 - It controls access to files, preventing inadvertent simultaneous
   modification
 - It maintains a history of changes
 - It provides a set of backups
 - It allows the source files for installed programs to be located
   (using e.g. $Header$)
 - It allows specific versions of several files to be identified as one
   "release"
 - It saves disk space

Note that RCS should be used for storing source files, Makefiles, man
pages, READMEs and all other non-trivial files.

The procedure for using RCS is:

 i)   Create a directory called "RCS" in each source directory

 ii)  Whenever new source files are created, store them under RCS using

          ci -u file1 file2 ..

      The source files will now be read only and should not be changed.
      If the file is derived from some other file, check in the
      pre-modified version first, then proceed as described below to
      change it.

 iii) When you want to edit some files, check them out of RCS

          co -l file1 file2 ..

      You can then edit them as required. One of your edits should be
      to add a comment containing the string "$Header$". When you next
      check your file out, RCS will replace this with information about
      the version and location of the file.

 iv)  When you have completed your changes, you can view them:

          rcsdiff file1 file2 .. | more

 v)   When you are happy with your changes, check them back into RCS:

          ci -u file1 file2 ..

      At this point, you will be prompted to supply a message
      describing the change. Whatever you do, PROVIDE A MEANINGFUL
      DESCRIPTION OF YOUR CHANGES. Not doing this negates most of the
      advantages of RCS.

The above commands will allow you to maintain a history of changes
using RCS. However, the big advantage of using RCS comes when you need
to analyze changes that have occurred. Some useful commands are:

 i)   To see what changes have been made to a file:

          rlog file

 ii)  Using the above information to locate the versions of interest,
      you can compare the differences between them:

          rcsdiff -r1.4 -r1.6 file

 iii) Or maybe you want to see exactly what an old version of a program
      looked like:

          co -p -r1.3 file | more

There are several other powerful features in RCS - check out the manual
pages for more details.

Some hints for RCS usage:

 - Try to make one change at a time. E.g. if you need to restructure
   something to add a new feature, first restructure, check in your
   changes, then check out again and add your new feature.
 - Most of the RCS functionality can be accessed from EMACS. Check out
   the "Version Control" node of the emacs info document.
 - If you make sure the $Header$ string exists in all installed files,
   other users can find the corresponding RCS file using the "ident"
   command (see below for how to do this with compiled files).
 - If you like to include a list of all changes made to a file in that
   file, put the string "\$Log\$" in a comment.

See also rcsintro(1), ci(1), co(1), ident(1), emacs::Version Control

Using makefiles to control program development
----------------------------------------------

All programs, including those written in a scripting language, should
have a "Makefile" in their source directory. The speech group makefile
conventions are based on the GNU makefile conventions - see the
"Managing Releases" section of file "standards.info" (e.g. by executing
"info standards").
There are many incompatible versions of make, and makefiles should be
written to work with as many versions as possible. Do not write
makefiles that only work with pmake - if you really need a variety of
advanced features, use GNU make. Writing to use only the subset of
features common to pmake, gnumake and System V make (as on Solaris)
will give a sensible balance between features and portability, although
ideally you should only use features available in all make programs!

Standard makefile targets include:

    ""          - with no target, assume "default" is the target
    default     - build every program, but don't run or install anything
    progs       - build all programs
    objs        - build all object files
    libs        - build all libraries
    check       - run test programs
    install     - install programs, libraries and documentation
    clean       - delete all derived files
    dist        - build a distribution
    version     - give the current version a name
    ensure-ci   - ensure all files are checked in under RCS

Standard makefile macros include:

Writing shell and PERL scripts
------------------------------

The language of preference for writing speech shell scripts is PERL
(perl), although the Bourne shell (sh) is adequate for trivial tasks.
Scripts that need a user interface can use TCL/Tk (wish).
Historically, there are scripts that have been written in C shell
(csh), TCL (tclsh) and pmake, although these should not be used for new
development.

# Shell scripts should explicitly set their own paths, not rely on a
# user's path. Shell scripts should also use environment variables or
# command line arguments for the names of all programs and scripts, with
# defaults provided. This allows a user to override the installed
# version of any program without having to edit the script.
#
# Code for parsing the ICSI style arguments (i.e. arg=value) has been
# written for common scripting languages - check out
# "/u/drspeech/share/lib/icsiargs.{sh,csh,pl}". These scripts also handle
# the setting up of a suitable PATH. Note that if these scripts are
# included (e.g. with the sh "source" command) they MUST be accessed
# using the SPEECH_DIR environment variable.
#
# A well written shell script should:
#
# - allow all program and path names to be overridden by the environment
#   (standard Unix utilities excepted)
# - store all temporary files in /tmp and erase them when finished or
#   on interrupt
# - return 0 on success, 1 on error
#
# An example Bourne shell script is in example.sh.

Scripts should be installed into "/u/drspeech/share/bin/" from a
directory under "/u/drspeech/src/". Typically, several shell scripts
and/or executables will be developed together in a source directory
(along with a suitable test environment!) and installed to their final
destination using a makefile. In the source directory, scripts should
have a suitable suffix (e.g. ".sh"), but should be installed in the bin
directory without a suffix. This allows easy grepping in source
directories, but also allows the functionality of the script to be
implemented in a different manner (e.g. as a compiled program) without
changing everything that references the script.

See also perl(1), sh(1), csh(1), icsiargs(3)

# Writing make and pmake scripts
# ------------------------------
#
# It is important to distinguish between scripts that are used to build
# an application program or tool, and scripts that are used as part of a
# speech experiment. The former are only used within a given source
# directory by someone who is a program developer, the latter may be used
# in a wide variety of situations by someone who is considerably less
# knowledgeable in Unix or software engineering. This section is
# directed towards the second group - people writing make or pmake
# scripts that will be used as speech tools.
#
# For scripts that only function under pmake, the following is useful to
# ensure they fail under other versions of make:
#
#     #ifndef .PMAKE
#     wrong:; @echo this Makefile works only with pmake; exit 1
#     #endif
#
# There are a set of useful standard definitions for pmake files in
# /u/drspeech/share/lib/icsimacros.mk. This should be accessed using
# the following technique:
#
#     SPEECH_DIR ?= /u/drspeech
#     SPEECH_SHLIBDIR ?= /u/drspeech/share/lib
#
#     #include "$SPEECH_SHLIBDIR/icsimacros.mk"
#
# [More needed here]
#
# See also make(1), pmake(1), /usr/local/doc/ai/pmake/tutorial.ps
#

Writing C and C++ programs
--------------------------

RCS header information should be bracketed by a check of the NO_RCSID
preprocessor symbol, so that it is left out whenever that symbol is
defined. The symbol can then be set by makefiles when passing programs
through lint or using ObjectCenter (both of which would otherwise
complain). e.g.

    #ifndef NO_RCSID
    static const char rcsid[] = "$Header$";
    #endif

Manual pages
------------

All scripts and programs, not to mention file formats and library
functions, should have manual pages. The source for the manual pages
should be in a directory accompanying the source of the program and/or
libraries. The manual pages should be installed into
"/u/drspeech/share/man/man[1-8]/*.[1-8]". Script and program
descriptions use suffix 1, functions use suffix 3 and file formats use
suffix 5. An example manual page is available in
"/u/drspeech/share/doc/manexample.1".

To view manual pages when writing them, a suitable command is:

    nroff -man manpage.1 | less

Note - using "more" instead of "less" works, but "less" is better at
displaying e.g. bold text.

See also man(1), man(7), "/u/drspeech/share/doc/manexample.1",
"/u/drspeech/share/doc/manexample.5".
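
The rule that a page's numeric suffix selects its "man[1-8]" directory
can be expressed in a few lines of Bourne shell, for example as a
helper for an "install" makefile target. The function name
man_install_path is hypothetical. The sketch assumes pages go under
"share/man", which this document lists as the replacement for the
legacy "man" link, and that DRSPEECH_HOME defaults to "/u/drspeech".

```shell
#!/bin/sh
# Sketch only: work out where a manual page source file would be
# installed, based on its numeric suffix.

# man_install_path PAGE
# Print the installed location for PAGE (e.g. foo.1 -> .../man1/foo.1).
# Returns 1 if PAGE does not end in a section suffix 1-8.
man_install_path() {
    page=$(basename "$1")
    section=${page##*.}
    case "$section" in
        [1-8])
            echo "${DRSPEECH_HOME:-/u/drspeech}/share/man/man$section/$page"
            ;;
        *)
            echo "man_install_path: no section suffix [1-8] on '$1'" >&2
            return 1
            ;;
    esac
}

# Example use: show the destination of each page named on the command
# line.
for f in "$@"; do
    man_install_path "$f"
done
```

Only the destination is printed; the copy itself is left to the
makefile so that installation remains a single, deliberate step.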