
TODE User Manual

1. …
3.6 Beam Search Decoding Options
3.6.1 dec_int_prune_window
3.6.2 dec_end_prune_window
3.6.3 dec_word_entr_pen
3.6.4 dec_delayed_lm
3.6.5 dec_verbose
Appendix A MLPW File Format
Appendix B Priors File Format
Appendix C Norms File Format
Appendix D Online Features File Format
Appendix E LNA File Format
Appendix F CTM File Format
Appendix G Noway Phone Models File Format
Appendix H ARPA Language Model File Format
Appendix I HTK HMM Model Definition File Format
Appendix J HTK MLF File Format

Chapter 1 Introduction
TODE (TOrch DEcoder) is a continuous speech recogniser based on a time-synchronous beam search algorithm that is compatible with the Torch machine learning library. Its purpose is to satisfy the general speech decoding needs of researchers at IDIAP and in the wider speech community. TODE has been designed to be a flexible recogniser with a straightforward implementation that overcomes some of the limitations of other popular decoders while maintaining an acceptable level of efficiency. The major features of TODE are:
- Efficient beam search decoder.
- Can be used with both ANN-based and GMM-based acoustic models.
- Accepts features or emission probabilities as input.
- N-gram language modelling with full back-off and caching.
- Supports many commonly used file formats (model definition, ANN weights, features, language model, etc.).
- Uses a linear…
2. …berkeley.edu>. This manpage was written by Su-Lin Wu <sulin@icsi.berkeley.edu>.
SEE ALSO
isr_train(1)
NOTES
Note that for y0 compatibility, it is necessary for the priors file to contain only numbers. Any extraneous words or lines will cause errors. Also, y0 does not currently check that the number of priors matches the number of neural network outputs.
ICSI   Last change: 1995/11/22 18:06:44

Appendix C Norms File Format
Reproduction of ICSI man page.
ICSI SPEECH SOFTWARE NORMS(5)
NAME
norms - RAP-style speech feature normalization file
DESCRIPTION
The norms file format is used to store speech feature normalization data. A norms file is typically associated with a specific pfile. norms files are used by MLP training and feed-forward programs such as bob(1), CLONES, qntrain(1) and qnforward(1).
The norms file consists of two vectors of information: a vector of means for each feature in the feature file, and a vector of the reciprocal of the standard deviation of each feature in the feature file. The format of the vectors is tagged ASCII, as produced by the RAP matrix/vector library.
FORMAT
3. …<string>). The value field is not required for boolean options. Some options are mandatory (e.g. a dictionary file must be defined). All TODE options are described in detail in the following sections. A summary of all options can be obtained by typing tode help.

3.1 General Options

3.2 input_fname
Required: Yes
Format: input_fname <string>
Summary: Describes where the feature or emission probability input file(s) are located.
Details: If the input format (see input_format below) is an archive format, i.e. lna_archive or online_ftrs_archive, then the string value denotes the actual archive file. Otherwise, the string value specifies the file that contains the filenames of the individual input files.
Default: undefined

3.2.1 input_format
Required: Yes
Format: input_format <string>
Summary: Describes the format of the input files.
Details: Valid file formats are:
- htk: HTK feature file readable by the Torch IOHTK class, with 1 utterance per file.
- lna: LNA 8-bit emission probabilities (see Appendix E), with 1 utterance per file.
- lna_archive: LNA 8-bit emission probabilities, with all utterances in a single big archive file.
- online_ftrs: Online features format (see Appendix D), with 1 utterance per file.
- online_ftrs_archive: Online features format, with all utterances in a single big archive file.
The format of input files must be compatible with the acoustic model settings und…
4. 3.3.12 am_online_norm_alpha_v
Required: No
Format: am_online_norm_alpha_v <real>
Summary: The update constant for feature variances during online normalisation.
Details: This option is only used during HMM/ANN decoding with online normalisation of features. At each time step and for each feature dimension, the existing variance value is scaled by (1 - α_v), and α_v times the square of the current feature value is added to obtain the new variance.
Default: 0.005

3.4 Lexicon Options

3.4.1 lex_dict_fname
Required: Yes
Format: lex_dict_fname <string>
Summary: Specifies the file containing the dictionary used for recognition.
Details: The dictionary file contains entries for all pronunciations that can be recognised. The format of each entry is:
    word [prior] ph1 ph2 ... phn
The prior field denotes the prior probability of a pronunciation and is optional (it defaults to 1.0 if omitted). Multiple pronunciations of the same word are permitted. All phones in each entry must be present in the phone models file (see am_models_fname).
Default: undefined

3.4.2 lex_sent_start_word
Required: No
Format: lex_sent_start_word <string>
Summary: Specifies the word that starts every result sentence.
Details: If specified, TODE constrains all output word sequences to begin with this word. The sentence start word can be the same as the silence word and the sentence end word (most commonly defined as silence).
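As an illustration of the dictionary entry format above, here is a minimal Python sketch of a reader. It assumes whitespace-separated fields and that the optional prior is recognised because the second field parses as a number; the manual does not say how TODE makes this distinction, and the names Pronunciation, parse_dict_line and load_dictionary are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class Pronunciation(NamedTuple):
    word: str
    prior: float
    phones: List[str]


def parse_dict_line(line: str) -> Pronunciation:
    """Parse one dictionary entry of the form: word [prior] ph1 ph2 ... phn."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"entry needs a word and at least one phone: {line!r}")
    word, rest = fields[0], fields[1:]
    prior = 1.0  # default when the optional prior field is omitted
    try:
        # Assumption: a second field that parses as a number is the prior.
        prior = float(rest[0])
        rest = rest[1:]
    except ValueError:
        pass  # the second field is a phone name
    if not rest:
        raise ValueError(f"entry has no phones: {line!r}")
    return Pronunciation(word, prior, rest)


def load_dictionary(lines: Iterable[str], known_phones: Set[str]) -> List[Pronunciation]:
    """Read all entries; every phone must have a model in the phone models file."""
    entries = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        pron = parse_dict_line(line)
        missing = [p for p in pron.phones if p not in known_phones]
        if missing:
            raise ValueError(f"{pron.word}: no phone model for {missing}")
        entries.append(pron)
    return entries


# Usage: two pronunciations of "the", plus one entry without a prior.
phones = {"dh", "ax", "iy", "k", "ae", "t"}
lexicon = load_dictionary(["the 0.6 dh ax", "the 0.4 dh iy", "cat k ae t"], phones)
```

Rejecting entries whose phones have no model mirrors the requirement that all phones in the dictionary appear in the phone models file.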
5. …Appendix H).
Default: undefined

3.5.2 lm_ngram_order
Required: No
Format: lm_ngram_order <integer>
Summary: Specifies the order of the N-gram to use for the language model.
Details: The value specified must be less than or equal to the order of the language model file. A value of 0 results in no language model being used during decoding. Note that for N-grams with N > 2, the language model is incorporated in an approximate way. In the trigram LM case (N = 3), when evaluating a transition from w_i to w_j, the predecessor word of w_i (say w_h), as determined by the Viterbi search, is used to retrieve the LM probability P(w_j | w_h, w_i) that gets associated with the transition between w_i and w_j.
Default: 0

3.5.3 lm_scaling_factor
Required: No
Format: lm_scaling_factor <real>
Summary: Scales language model probabilities during decoding.
Details: Whenever a language model probability is retrieved (in the log domain), it is multiplied by this factor before being incorporated in the decoding.
Default: 1.0

3.6 Beam Search Decoding Options

3.6.1 dec_int_prune_window
Required: No
Format: dec_int_prune_window <real>
Summary: Specifies the log window used for pruning hypotheses in word-interior states.
Details: Needs to be a positive log value. At each time step during decoding, a threshold is calculated by subtracting this constant from the score of the best word-interior hypothesis. Any interior state hypotheses that have scores below this threshold are pruned.
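The pruning rule for dec_int_prune_window and the lm_scaling_factor weighting can be sketched in Python as below. This is a minimal illustration, not TODE's implementation: the mapping from state index to score and the function names are hypothetical, and hypothesis scores and LM probabilities are assumed to be in the same log domain.

```python
from typing import Dict


def prune_interior(hyps: Dict[int, float], window: float) -> Dict[int, float]:
    """Drop word-interior hypotheses scoring below (best score - window).

    `hyps` maps a (hypothetical) state index to its log-domain score.
    """
    if window <= 0.0:
        raise ValueError("dec_int_prune_window must be a positive log value")
    if not hyps:
        return {}
    threshold = max(hyps.values()) - window
    return {state: score for state, score in hyps.items() if score >= threshold}


def word_transition_score(score_at_word_end: float, lm_logprob: float,
                          lm_scaling_factor: float = 1.0) -> float:
    """Score for entering w_j from the end of w_i: the log LM probability
    is multiplied by lm_scaling_factor before it is added."""
    return score_at_word_end + lm_scaling_factor * lm_logprob


# Usage: with a window of 10.0 the threshold is -120.0 - 10.0 = -130.0.
active = {10: -120.0, 11: -123.5, 12: -140.0}
survivors = prune_interior(active, window=10.0)
```

Because the threshold is relative to the best hypothesis at each time step, a wider window keeps more hypotheses alive at the cost of decoding speed.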
6. TODE User Manual
Darren C. Moore
Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP)
CP 592, rue du Simplon 4, 1920 Martigny, Switzerland
moore@idiap.ch
http://www.idiap.ch/~moore
January 31, 2003

Contents
1 Introduction
2 Installation
3 How to use
3.1 General Options
3.2 input_fname
3.2.1 input_format
3.2.2 output_fname
3.2.3 output_ctm
3.2.4 wrdtrns_fname
3.2.5 msec_step_size
3.3 Acoustic Model Options
3.3.1 am_models_fname
3.3.2 am_sil_phone
3.3.3 am_pause_phone
3.3.4 am_phone_del_pen
3.3.5 am_apply_pause_del_pen
3.3.6 am_priors_fname
3.3.7 am_mlp_fname
3.3.8 am_mlp_cw_size
3.3.9 am_norms_fname
3.3.10 am_online_norm_ftrs
3.3.11 am_online_norm_alpha_m
3.3.12 am_online_norm_alpha_v
3.4 Lexicon Options
3.4.1 lex_dict_fname
3.4.2 lex_sent_start_word
3.4.3 lex_sent_end_word
3.4.4 lex_sil_word
3.5 Language Model Options
3.5.1 lm_fname
3.5.2 lm_ngram_order
3.5.3 lm_scaling_factor…
7. …ability vector corresponding to that state, one for each state. The number of integers on this line equals the number of states. The remaining lines specify the transition probabilities, giving the transitions out of each state; prob is a floating-point number (not a log probability). An example entry for the phone aa is:
    30 3 0.50000 1 0.50000
Here aa has 2 non-null states (making 4 states in total) and is a left-to-right Viterbi model with output probabilities corresponding to acoustic probability element 1.
Note that an interword pause phone model is essential to the operation of noway. This between-word pause model will typically contain 1 non-null state that may be skipped, and will use the silence distribution. The interword pause model is placed at the root of the lexicon and corresponds to an optional pre-word pause; for edge effects, it is also the acoustic realization of sentence_end. Note that the name interword pause is currently hardwired in, and such a model must appear in the phone models file.

Appendix H ARPA Language Model File Format
Reproduction of man page downloaded from the SRI website.
Log N-gram probabilities in ARPA files that are <= -99.0 are interpreted by TODE as -∞.
Log back-off weights in ARPA files that are <= -99.0 are interpreted by TODE as 0.0.
ngram-format(5)
NAME
ngram-format - File…
8. …al label files one1.lab, one2.lab, one3.lab would be needed to identify instances of "one", even though each file contains the same entry, just "one". Using an MLF containing
    #!MLF!#
    "*/one*.lab"
    one
    .
    "*/two*.lab"
    two
    .
    "*/three*.lab"
    three
    .
    <etc.>
avoids the need for many duplicate label files.
3. A training database /db contains directories dr1, dr2, ..., dr8. Each directory contains a subdirectory called labs holding the label files for the data files in that directory. The following MLF would allow them to be found:
    #!MLF!#
    "*" -> "/db/dr1/labs"
    "*" -> "/db/dr2/labs"
    ...
    "*" -> "/db/dr7/labs"
    "*" -> "/db/dr8/labs"
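TODE accepts only a restricted MLF similar to example 2 (see the start of Appendix J). A minimal Python sketch of reading such a file and finding the transcription for a label file name follows. It assumes one word per line with no times or scores, that the first matching pattern wins, and it treats each pattern as a shell-style glob, which only approximates HTK's own pattern matching. The function names are hypothetical.

```python
import fnmatch
from typing import Dict, List


def parse_mlf(text: str) -> Dict[str, List[str]]:
    """Parse entries made of a quoted pattern line, word lines, and a '.' line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "#!MLF!#":
        raise ValueError("the first line of an MLF must be #!MLF!#")
    entries: Dict[str, List[str]] = {}
    pattern = None
    words: List[str] = []
    for ln in lines[1:]:
        if pattern is None:
            pattern = ln.strip('"')  # "*/one*.lab" becomes */one*.lab
            words = []
        elif ln == ".":
            entries[pattern] = words  # a '.' line terminates the entry
            pattern = None
        else:
            words.append(ln)  # assumption: one word per line, no times
    if pattern is not None:
        raise ValueError(f"entry {pattern!r} is not terminated by '.'")
    return entries


def lookup(entries: Dict[str, List[str]], label_path: str) -> List[str]:
    """Return the words of the first pattern (in file order) matching label_path."""
    for pattern, words in entries.items():
        if fnmatch.fnmatchcase(label_path, pattern):
            return words
    raise KeyError(label_path)


# Usage with an MLF in the style of example 2.
mlf = '#!MLF!#\n"*/one*.lab"\none\n.\n"*/two*.lab"\ntwo\n.\n'
entries = parse_mlf(mlf)
```

Since lookup is driven by pattern matching, the order of entries in the MLF need not follow the order of the input files, as noted for wrdtrns_fname.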
9. Standards, Environments and Macros   ctm(5)
    7654 A 13.30 0.5 CAN 2.806418
    7654 A 17.50 0.2 AS 0.537922
    7654 B 1.34 0.2 I 6.763
    7654 B 2.00 0.34 CAN 12.384530
    7654 B 3.40 0.5 ADD 2.806418
    7654 B 7.00 0.2 AS 0.537922
For CTM reference files, a format extension exists to permit alternation (marking alternate transcripts). The alternation uses the same file format as described above, except that three word strings, <ALT_BEGIN>, <ALT> and <ALT_END>, are used to delimit the alternation. Each tag is treated as a word, with a conversation id, channel, and begin and duration time. The alternation is begun using the word <ALT_BEGIN> and terminated using the word <ALT_END>. In between the start and end are at least 2 alternative time-marked word sequences, separated by the word <ALT>. Each word sequence can contain any number of words. An empty alternative signifies a null word. Below is an example alternate reference transcript for the words "uh" and "um":
    7654 A <ALT_BEGIN>
    7654 A 12.00 0.34 UM
    7654 A <ALT>
    7654 A 12.00 0.34 UH
    7654 A <ALT_END>
SEE ALSO
sclite(1)
BUGS/COMMENTS
Please contact Jon Fiscus at NIST with any bug reports or comments at the email address jfiscus@nist.gov or by phone at 301-975-3182. Please include the version number of sclite and any other relevant information.
Scoring Pkg   Last change: SCLITE Release 1.0
10. …deviations used to normalise features.
Details: The norms file is only used during HMM/ANN decoding with features as input. If specified, each input feature vector is normalised before it is input to the MLP. This file must be in ICSI norms format (see Appendix C). The number of means and inverse standard deviations in the file must be equal to the number of input feature vector elements. If a norms file is not specified, features are read from file and input to the MLP without modification.
Default: undefined

3.3.10 am_online_norm_ftrs
Required: No
Format: am_online_norm_ftrs
Summary: Activates online normalisation of input features.
Details: This feature is only used during HMM/ANN decoding with features as input, and when a norms file is defined. If specified, a simple first-order online mean and variance normalisation is applied to each feature dimension. The feature means and variances are updated at each time step (see am_online_norm_alpha_m and am_online_norm_alpha_v below).
Default: false

3.3.11 am_online_norm_alpha_m
Required: No
Format: am_online_norm_alpha_m <real>
Summary: The update constant for feature means during online normalisation.
Details: This option is only used during HMM/ANN decoding with online normalisation of features. At each time step and for each feature dimension, the existing mean value is scaled by (1 - α_m), and α_m times the current feature value is added to obtain the new mean.
Default: 0.005
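The normalisation options in 3.3.9 to 3.3.12 can be sketched in Python as follows. parse_norms follows the two-vector layout of Appendix C, and OnlineNormaliser applies the update rules as worded in this section (note that 3.3.12 adds the square of the raw feature value, not of its deviation from the mean). Initialising the variances from the norms file, and normalising each frame before updating the statistics, are assumptions of this sketch; all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def parse_norms(text: str) -> Tuple[List[float], List[float]]:
    """Read 'vec N' + N means, then 'vec N' + N inverse standard deviations."""
    tokens = text.split()
    vectors: List[List[float]] = []
    pos = 0
    for _ in range(2):
        if pos >= len(tokens) or tokens[pos] != "vec":
            raise ValueError("expected a 'vec' tag")
        n = int(tokens[pos + 1])
        values = [float(t) for t in tokens[pos + 2:pos + 2 + n]]
        if len(values) != n:
            raise ValueError("vector is shorter than its declared length")
        vectors.append(values)
        pos += 2 + n
    means, inv_stds = vectors
    if len(means) != len(inv_stds):
        raise ValueError("mean and inverse-stddev vectors differ in length")
    return means, inv_stds


def normalise_static(frame: Sequence[float], means: Sequence[float],
                     inv_stds: Sequence[float]) -> List[float]:
    """Plain norms-file normalisation: (x - mean) * (1 / stddev)."""
    return [(x - m) * s for x, m, s in zip(frame, means, inv_stds)]


class OnlineNormaliser:
    """First-order online mean/variance normalisation (am_online_norm_ftrs)."""

    def __init__(self, means, inv_stds, alpha_m=0.005, alpha_v=0.005):
        self.means = list(means)
        # Assumption: initial variances are taken from the norms file.
        self.vars = [1.0 / (s * s) for s in inv_stds]
        self.alpha_m, self.alpha_v = alpha_m, alpha_v

    def __call__(self, frame: Sequence[float]) -> List[float]:
        if len(frame) != len(self.means):
            raise ValueError("frame size must match the norms file")
        # Assumption: normalise with the current statistics, then update them.
        out = [(x - m) / math.sqrt(v) for x, m, v in zip(frame, self.means, self.vars)]
        for i, x in enumerate(frame):
            self.means[i] = (1.0 - self.alpha_m) * self.means[i] + self.alpha_m * x
            # As worded in 3.3.12: add alpha_v times the square of the current value.
            self.vars[i] = (1.0 - self.alpha_v) * self.vars[i] + self.alpha_v * x * x
        return out
```

With the default constants of 0.005, the statistics adapt slowly, so early frames are normalised almost entirely by the norms file values.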
11. …see Appendix B). The ordering of the prior probabilities must match the order in which phone models are defined in the models file. Any emission probability used for decoding, whether it originates from an LNA file or is computed on the fly by an MLP, is divided by its corresponding prior probability before being used in decoding calculations.
Default: undefined

3.3.7 am_mlp_fname
Required: No
Format: am_mlp_fname <string>
Summary: Specifies the file containing MLP weights.
Details: The file must be in MLPW binary format (see Appendix A). The file is required for HMM/ANN decoding when using features as input (i.e. input_format is htk, online_ftrs or online_ftrs_archive) and the models file is in Noway format.
Default: undefined

3.3.8 am_mlp_cw_size
Format: am_mlp_cw_size <integer>
Summary: Specifies the context window size to use with an MLP.
Details: Required when performing HMM/ANN decoding with features as input. The feature vector size multiplied by this number must equal the number of input units in the MLP. Note that timing output information (e.g. when using the output_ctm option) will be affected: the timings will correspond to the input feature file with the first and last (N-1)/2 vectors stripped, where N is the context window size.
Default: undefined

3.3.9 am_norms_fname
Required: No
Format: am_norms_fname <string>
Summary: Specifies the file containing means and inverse standard deviations used to normalise features.
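Two HMM/ANN processing steps described above, context-window stacking for am_mlp_cw_size and division of emission probabilities by their priors, are sketched below in Python. The MLP forward pass itself is omitted. An odd window size centred on the current frame, working in the natural-log domain, and the small floor that avoids log(0) are assumptions; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def stack_context(frames: Sequence[Sequence[float]], cw_size: int) -> List[List[float]]:
    """Concatenate cw_size consecutive feature vectors into one MLP input.

    Output vector k is centred on input frame k + (cw_size - 1) // 2, so the
    first and last (cw_size - 1) // 2 frames produce no output of their own.
    """
    if cw_size < 1 or cw_size % 2 == 0:
        raise ValueError("assumed: the context window size is a positive odd number")
    half = (cw_size - 1) // 2
    stacked = []
    for t in range(half, len(frames) - half):
        window: List[float] = []
        for vec in frames[t - half:t + half + 1]:
            window.extend(vec)
        stacked.append(window)
    return stacked


def scaled_log_likelihoods(posteriors: Sequence[float], priors: Sequence[float],
                           floor: float = 1e-30) -> List[float]:
    """Divide each emission probability by its prior (here in the log domain).

    The floor that avoids log(0) is an assumption of this sketch.
    """
    if len(posteriors) != len(priors):
        raise ValueError("one prior is needed per MLP output")
    return [math.log(max(p, floor)) - math.log(max(q, floor))
            for p, q in zip(posteriors, priors)]
```

Because output vector k is centred on input frame k + (N-1)/2, frame-based timings are shifted accordingly, which is the effect on CTM timings mentioned for am_mlp_cw_size.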
12. …eech recognizers via the NIST sclite program. Both the reference and hypothesis input files can share this format.
The ctm file format is a concatenation of time mark records for each word in each channel of a waveform. The records are separated with a newline. Each word token must have a waveform id, channel identifier (A or B), start time, duration and word text. Optionally, a confidence score can be appended for each word. Each record follows this BNF format:
    CTM :== <F> <C> <BT> <DUR> word [ <CONF> ]
Where:
    <F> -> The waveform filename. NOTE: no pathnames or extensions are expected.
    <C> -> The waveform channel. Either "A" or "B".
    <BT> -> The begin time (seconds) of the word, measured from the start time of the file.
    <DUR> -> The duration (seconds) of the word.
    <CONF> -> Optional confidence score. It is proposed that this score will be used in the future.
The file must be sorted by the first three columns: the first and the second in ASCII order, and the third in numeric order. The UNIX sort command "sort +0 -1 +1 -2 +2nb -3" will sort the words into the appropriate order.
Lines beginning with ";;" are considered comments and are ignored. Blank lines are also ignored.
Included below is an example:
    ;;
    ;; Comments follow ';;'
    ;;
    ;; The Blank lines are ignored
    ;;
    7654 A 11.34 0.2 YES 6.763
    7654 A 12.00 0.34 YOU 12.384530
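A Python sketch of producing CTM records from frame-based word segments, as a decoder must do when output_ctm is set: frame indices are converted to seconds with msec_step_size (default 10.0 ms), and records are sorted as the CTM definition requires. The input tuple layout, the two-decimal time format and the function name are assumptions, not TODE's actual output code.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical input record:
# (waveform_id, channel, start_frame, num_frames, word, confidence or None)
Record = Tuple[str, str, int, int, str, Optional[float]]


def to_ctm(records: Iterable[Record], msec_step_size: float = 10.0) -> List[str]:
    """Format frame-based word segments as sorted CTM lines."""
    step = msec_step_size / 1000.0  # frame step in seconds
    rows = []
    for wav_id, chan, start, nframes, word, conf in records:
        if chan not in ("A", "B"):
            raise ValueError("CTM channel must be 'A' or 'B'")
        rows.append((wav_id, chan, start * step, nframes * step, word, conf))
    # Columns 1 and 2 in ASCII order, column 3 numerically
    # (the same ordering as: sort +0 -1 +1 -2 +2nb -3).
    rows.sort(key=lambda r: (r[0], r[1], r[2]))
    lines = []
    for wav_id, chan, begin, dur, word, conf in rows:
        line = f"{wav_id} {chan} {begin:.2f} {dur:.2f} {word}"
        if conf is not None:
            line += f" {conf:g}"  # optional confidence column
        lines.append(line)
    return lines


ctm = to_ctm([("7654", "B", 134, 20, "I", None),
              ("7654", "A", 1200, 34, "YOU", None),
              ("7654", "A", 1134, 20, "YES", 6.763)])
```

Sorting on a numeric begin time, rather than on the formatted string, keeps "2.00" ahead of "11.34" as the CTM definition requires.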
13. …efined

3.2.2 output_fname
Required: No
Format: output_fname <string>
Summary: Specifies where decoder output will be written.
Default: stdout

3.2.3 output_ctm
Required: No
Format: output_ctm
Summary: Specifies that the output is to be written in CTM format (see Appendix F).
Default: false

3.2.4 wrdtrns_fname
Required: No
Format: wrdtrns_fname <string>
Summary: Specifies a file containing reference transcripts for all input utterances.
Details: If a reference transcription file is specified, then verbose output is provided by the decoder, showing the input file as well as the expected and actual results for each utterance. In addition, after all input files have been decoded, recognition statistics are computed and output (accuracy, insertions, substitutions, deletions). If this option is not specified, then only the recognised words are output (1 utterance per line).
If the input file format is non-archive (i.e. htk, lna or online_ftrs), then the reference transcription file can be in HTK MLF format (see Appendix J) or raw format (1 utterance per line). The ordering of utterance transcriptions in an HTK MLF file does not need to match the order of the input files. The ordering of utterances in a raw format transcription file must match the ordering of the input files. For archive input formats (i.e. lna_archive or online_ftrs_archive), the transcription file must be in raw format.
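The statistics reported when wrdtrns_fname is given (accuracy, insertions, substitutions, deletions) are conventionally obtained from a minimum edit-distance alignment of each recognised utterance against its reference. The Python sketch below uses plain unweighted Levenshtein alignment and the HTK-style accuracy (N - S - D - I) / N; the manual does not state TODE's alignment costs or accuracy formula, so treat both as assumptions. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Cost = Tuple[int, int, int, int]  # (errors, substitutions, deletions, insertions)


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Dict[str, int]:
    """Count S, D and I from a minimum edit-distance alignment of hyp against ref."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    cost: List[List[Cost]] = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        e, s, d, n = cost[i - 1][0]
        cost[i][0] = (e + 1, s, d + 1, n)  # reference words deleted
    for j in range(1, cols):
        e, s, d, n = cost[0][j - 1]
        cost[0][j] = (e + 1, s, d, n + 1)  # hypothesis words inserted
    for i in range(1, rows):
        for j in range(1, cols):
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            e, s, d, n = cost[i - 1][j - 1]
            sub = (e + miss, s + miss, d, n)
            e, s, d, n = cost[i - 1][j]
            dele = (e + 1, s, d + 1, n)
            e, s, d, n = cost[i][j - 1]
            ins = (e + 1, s, d, n + 1)
            cost[i][j] = min(sub, dele, ins)  # fewest errors; ties by tuple order
    _, subs, dels, inss = cost[-1][-1]
    return {"N": len(ref), "S": subs, "D": dels, "I": inss}


def accuracy(totals: Dict[str, int]) -> float:
    """HTK-style word accuracy in percent: (N - S - D - I) / N."""
    n = totals["N"]
    return 100.0 * (n - totals["S"] - totals["D"] - totals["I"]) / n if n else 0.0


# Usage: accumulate counts over all utterances, then compute accuracy once.
totals = {"N": 0, "S": 0, "D": 0, "I": 0}
for ref, hyp in [("a b c d".split(), "a x c d e".split()),
                 ("a b c".split(), "a c".split())]:
    for key, value in align_counts(ref, hyp).items():
        totals[key] += value
```

Unlike percent correct, this accuracy also penalises insertions, so it can become negative when the recogniser inserts many words.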
14. Appendix G Noway Phone Models File Format
Extracted from the Noway LVCSR decoder manual page. Note that the interword pause phoneme discussed on the following page is not mandatory in TODE.
phone models file
This file defines the phone models. It specifies the number of states (including entry and exit null states), the model topology, the transition probabilities and the output probability distributions associated with each state (obtained using the acoustic input options). The format of the file is as follows. The first line consists of the string PHONE, and the second line contains an integer giving the number of phone models. The remainder of the file contains the descriptions of each phone model. Within a phone HMM, 0 indexes the ENTRY null state, 1 indexes the EXIT null state, and 2 onwards index the real (emitting) states. The format for a phone model is:
    <id> <number of states> <label>
    -1 -2 <probid 1> <probid 2> ...
    <from_state> <out trans> <to_state> <prob> <to_state> <prob> ...
    <from_state> <out trans> <to_state> <prob> <to_state> <prob> ...
    ...
Where -1 and -2 represent dummy phone numbers for the entry and exit states, and <probid n> represents the element of the acoustic probability vector corresponding to that state.
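To make the layout above concrete, the hypothetical Python helpers below emit a strictly left-to-right phone model with one self-loop per emitting state. Assumptions: a self-loop probability of 0.5, model ids numbered from 0 in file order, the entry state 0 jumping straight to state 2, and no transition line written for the exit state 1. Check these against real Noway model files before relying on them.

```python
from typing import List, Sequence, Tuple


def noway_phone_entry(model_id: int, label: str, probids: Sequence[int],
                      self_loop: float = 0.5) -> List[str]:
    """Lines for one left-to-right phone model in the layout described above."""
    n_emit = len(probids)
    if n_emit < 1 or not 0.0 < self_loop < 1.0:
        raise ValueError("need >= 1 emitting state and 0 < self_loop < 1")
    lines = [f"{model_id} {n_emit + 2} {label}"]
    # -1 and -2 are the dummy ids of the entry (0) and exit (1) null states.
    lines.append(" ".join(["-1", "-2"] + [str(p) for p in probids]))
    # Entry state 0 has one transition, to the first emitting state (2).
    lines.append("0 1 2 1.00000")
    for k in range(n_emit):
        state = 2 + k
        nxt = state + 1 if k + 1 < n_emit else 1  # last emitting state -> exit
        lines.append(f"{state} 2 {state} {self_loop:.5f} {nxt} {1.0 - self_loop:.5f}")
    return lines


def noway_phone_file(models: Sequence[Tuple[str, Sequence[int]]]) -> str:
    """Whole file: the PHONE line, the model count, then each model entry."""
    out = ["PHONE", str(len(models))]
    for model_id, (label, probids) in enumerate(models):
        out.extend(noway_phone_entry(model_id, label, probids))
    return "\n".join(out) + "\n"


# Usage: "aa" with 2 emitting states, both tied to probability element 1.
aa_lines = noway_phone_entry(0, "aa", [1, 1])
```

Transition probabilities are written as plain probabilities (not logs), and each state's outgoing probabilities sum to 1.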
15. …format for ARPA backoff N-gram models
SYNOPSIS
    \data\
    ngram 1=n1
    ngram 2=n2
    ...
    ngram N=nN

    \1-grams:
    p   w   [bow]
    ...

    \2-grams:
    p   w1 w2   [bow]
    ...

    \N-grams:
    p   w1 ... wN
    ...

    \end\
DESCRIPTION
The so-called ARPA (or Doug Paul) format for N-gram backoff models starts with a header, introduced by the keyword \data\, listing the number of N-grams of each length. Following that, N-grams are listed one per line, grouped into sections by length, each section starting with the keyword \N-grams:, where N is the length of the N-grams to follow. Each N-gram line starts with the logarithm (base 10) of the conditional probability p of that N-gram, followed by the words w1...wN making up the N-gram. These are optionally followed by the logarithm (base 10) of the backoff weight for the N-gram. The keyword \end\ concludes the model representation.
Backoff weights are required only for those N-grams that form a prefix of longer N-grams in the model. The highest-order N-grams in particular will not need backoff weights (they would be useless).
Since log(0) (minus infinity) has no portable representation, such values are mapped to a large negative number. However, the designated dummy value (-99 in SRILM) is interpreted as log(0) when read back from file into memory.
The correctness of the N-gram counts n1, n2, ... in the header is not enforced by SRILM software when reading models (although a warning is printed when an inconsistency is encountered). This allows easy textual insertion or deletion of parameters in a model file.
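A minimal Python sketch of reading an ARPA file with the two TODE conventions stated at the start of Appendix H (log probabilities <= -99.0 become log(0); log back-off weights <= -99.0 become 0.0), followed by a standard back-off lookup. The function names are hypothetical, and this is not how TODE stores or caches its language model.

```python
import math
from typing import Dict, Tuple

NGram = Tuple[str, ...]


def read_arpa(text: str) -> Tuple[Dict[NGram, float], Dict[NGram, float]]:
    """Read an ARPA backoff model into log10 probabilities and backoff weights."""
    probs: Dict[NGram, float] = {}
    bows: Dict[NGram, float] = {}
    order = 0  # N of the current \N-grams: section; 0 while still in \data\
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        if order == 0:
            continue  # \data\ header and its "ngram N=count" lines
        fields = line.split()
        if len(fields) not in (order + 1, order + 2):
            raise ValueError(f"malformed {order}-gram line: {line!r}")
        words = tuple(fields[1:order + 1])
        logp = float(fields[0])
        # TODE convention: log probabilities <= -99.0 are treated as log(0).
        probs[words] = -math.inf if logp <= -99.0 else logp
        if len(fields) == order + 2:
            bow = float(fields[-1])
            # TODE convention: log backoff weights <= -99.0 are read as 0.0.
            bows[words] = 0.0 if bow <= -99.0 else bow
    return probs, bows


def logprob(probs: Dict[NGram, float], bows: Dict[NGram, float], ngram: NGram) -> float:
    """log10 P(w | history) with standard back-off; ngram = history + (w,)."""
    if ngram in probs:
        return probs[ngram]
    if len(ngram) == 1:
        return -math.inf  # unknown word; no <unk> handling in this sketch
    # A missing backoff weight means log10(1) = 0.
    return bows.get(ngram[:-1], 0.0) + logprob(probs, bows, ngram[1:])
```

Since all values are base-10 logs, backing off adds the history's backoff weight to the shorter N-gram's probability rather than multiplying them.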
16. …he results of the processing, and data must be operated on before a complete sentence is available. This situation has different requirements from, e.g., the storage of features for MLP training, and consequently the data format is different.
FORMAT
The format consists of a continuous stream of frames from one or more sentences. Each frame starts with a single flag byte, followed by a fixed number of big-endian IEEE single-precision floating-point values. For most frames the flag byte is zero. For the last frame in each sentence the flag byte is 0x80. Note that online_ftrs streams contain no speech label information, unlike the pfile(5) file format.
EXAMPLE
An example of a trivial online_ftrs file with three features in each frame and two sentences might be:
    0x00 1.20 5.40 5.43
    0x00 0.03 5.41 0.76
    0x80 0.04 2231 0.03
    0x00 0.34 0.02 T23
    0x00 3.34 4.56 3423
    0x00 4.34 3.43 2496
    0x80 1.02 1.03 0.01
AUTHOR
David Johnson <davidj@ICSI.Berkeley.EDU>
SEE ALSO
pfile(5), qnforward(1), berpdemo(1)
ICSI   Last change: 1996/01/09 01:54:12

Appendix E LNA File Format
Reproduction of ICSI man page.
ICSI SPEECH SOFTWARE LNA(5)
NAME
lna - compressed format for MLP output probability files
SYNOPSIS
lna
DESCRIPTION
lna is a compression format for speech developed by Tony Robinson, used by y0(1) and noway(1). There are really two lna formats: 8-bit and 16-bit.
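Both the online_ftrs stream (Appendix D) and the 8-bit LNA stream (Appendix E) are sequences of fixed-size frames whose first byte is 0x80 on the last frame of a sentence. The Python sketch below reads either kind of stream and groups frames into sentences. For LNA it returns the raw quantised bytes; lna8_to_prob approximately inverts the 8-bit quantisation formula given later in Appendix E, assuming a scale factor of -24 so that a probability of 1.0 maps to 0. All function names are hypothetical.

```python
import math
import struct
from typing import Iterable, Iterator, List, Tuple

FLAG_LAST = 0x80  # flag/EOS byte value on the last frame of a sentence


def _group_sentences(frames: Iterable[Tuple[int, list]]) -> Iterator[list]:
    """Group (flag, values) frames into sentences terminated by a 0x80 flag."""
    sentence = []
    for flag, values in frames:
        sentence.append(values)
        if flag == FLAG_LAST:
            yield sentence
            sentence = []
    if sentence:
        raise ValueError("stream ended without an end-of-sentence (0x80) frame")


def _check_length(data: bytes, frame_size: int) -> None:
    if len(data) % frame_size:
        raise ValueError("stream length is not a whole number of frames")


def read_online_ftrs(data: bytes, n_features: int) -> Iterator[List[List[float]]]:
    """online_ftrs: 1 flag byte + n big-endian IEEE float32 values per frame."""
    size = 1 + 4 * n_features
    _check_length(data, size)
    frames = ((data[off], list(struct.unpack_from(f">{n_features}f", data, off + 1)))
              for off in range(0, len(data), size))
    return _group_sentences(frames)


def read_lna8(data: bytes, n_outputs: int) -> Iterator[List[List[int]]]:
    """8-bit LNA: 1 EOS byte + n one-byte quantised probabilities per frame."""
    size = 1 + n_outputs
    _check_length(data, size)
    frames = ((data[off], list(data[off + 1:off + size]))
              for off in range(0, len(data), size))
    return _group_sentences(frames)


def lna8_to_prob(intval: int) -> float:
    """Approximately invert intval = floor(-24 * log(p)); assumes a scale of -24."""
    return math.exp(-intval / 24.0)
```

The frame size must be known in advance (number of features or MLP outputs), because neither format stores it in the stream.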
17. …insertion or deletion of parameters in a model file. The proper format can be recovered by passing the model through the command
    ngram -order N -lm input -write-lm output
Note that the format is self-delimiting, allowing multiple models to be stored in one file, or to be surrounded by ancillary information. Some extensions of N-gram models in SRILM store additional parameters after a basic N-gram section in the standard format.
SEE ALSO
ngram(1), ngram-count(1), lm-scripts(1), pfsg-scripts(1)
BUGS
The ARPA format does not allow N-grams that have only a backoff weight associated with them but no conditional probability. This makes the format less general than would otherwise be useful, e.g. to support pruned models, or ones containing a mix of words and classes. The ngram-count(1) tool satisfies this constraint by inserting dummy probabilities where necessary.
For simplicity, an N-gram model containing N-grams up to length N is referred to in the SRILM programs as an N-th order model, although technically it represents a Markov model of order N-1.
AUTHOR
The ARPA backoff format was developed by Doug Paul at MIT Lincoln Labs for research sponsored by the U.S. Department of Defense Advanced Research Project Agency (ARPA).
Man page by Andreas Stolcke <stolcke@speech.sri.com>.
Copyright 1999 SRI International

Appendix I HTK HMM Model Definition File Format
Extracted from The HTK Book (for HTK version 3.2). TODE supports o…
18. …iption file must be in raw format.
Default: undefined

3.2.5 msec_step_size
Required: No
Format: msec_step_size <real>
Summary: Specifies the step size of input frames in milliseconds.
Details: Used only to compute durations when output_ctm is specified.
Default: 10.0 ms

3.3 Acoustic Model Options

3.3.1 am_models_fname
Required: Yes
Format: am_models_fname <string>
Summary: Specifies the file containing the HMM definitions for the phone models.
Details: If HMM/GMM decoding is required, then the models file must be in simple HTK model definition format (see Appendix I). If HMM/ANN decoding is required, then the file must be in Noway model definition format (see Appendix G). All phones mentioned in the dictionary file must have a model defined in this file. Additional phone models can be defined (e.g. a short pause model).
Default: undefined

3.3.2 am_sil_phone
Required: No
Format: am_sil_phone <string>
Summary: Specifies a silence phone.
Details: If defined, there must be a corresponding model defined in the phone models file. Specifying a silence phone has no effect unless a pause phone is also defined.
Default: undefined

3.3.3 am_pause_phone
Required: No
Format: am_pause_phone <string>
Summary: Specifies a pause phone.
Details: If defined, there must be a corresponding model defined in the phone models file. When word HMMs are created by concatenating individual phone models, the pause model is added to the end of each word model.
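The pause-phone rules of am_pause_phone and the deletion penalty of am_phone_del_pen and am_apply_pause_del_pen can be summarised as the Python sketch below. Representing a pronunciation as a list of phone names, and a phone model's initial-state transitions as a dict keyed by destination state, are assumptions; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def word_model_phones(pron: Sequence[str], pause_phone: Optional[str],
                      sil_phone: Optional[str] = None) -> List[str]:
    """Phone sequence for one word HMM under the am_pause_phone rules."""
    phones = list(pron)
    if pause_phone is None:
        return phones  # no pause phone: nothing appended, am_sil_phone has no effect
    last = phones[-1] if phones else None
    if last == pause_phone:
        return phones  # transcription already ends with the pause phone
    if sil_phone is not None and last == sil_phone:
        return phones  # transcription ends with the silence phone
    return phones + [pause_phone]


def penalised_entry_transitions(phone: str, entry_probs: Dict[int, float],
                                phone_del_pen: float = 1.0,
                                pause_phone: Optional[str] = None,
                                apply_pause_del_pen: bool = False) -> Dict[int, float]:
    """Scale the non-log transitions leaving a phone model's initial state."""
    if phone == pause_phone and not apply_pause_del_pen:
        return dict(entry_probs)  # pause model is left unpenalised by default
    return {dest: p * phone_del_pen for dest, p in entry_probs.items()}
```

Because the penalty multiplies every transition out of the initial state, a value below 1.0 makes passing through an extra phone, and hence a longer word model, less likely.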
19. …lexicon.
- Implementation is straightforward and can be readily modified or upgraded to meet the needs of researchers.
- Easily adapted for use in non-speech decoding applications.
- Fully supported, with development ongoing.
This document describes how to use the stand-alone TODE executable for speech recognition tasks.

Chapter 2 Installation
TODE is distributed as part of the Torch machine learning library (http://www.torch.ch), which means that you must download and install Torch first in order to compile and use TODE. The steps for installation are as follows:
1. Download Torch and follow the Torch installation instructions (http://www.torch.ch/matos/install.pdf).
2. The following Torch packages are required to build TODE: decoder, core, datasets, distributions, gradients, speech, examples.
3. You might want to use the FLOATING = DOUBLE option in your Makefile_options_<os> file. TODE will be slower, but the extra floating-point precision may be required depending on your application.
4. The main TODE source file, tode.cc, is located in your Torch directory under examples/decoder. Follow the steps in section 5 of the Torch installation instructions to compile this file.
TODE is now ready for use.

Chapter 3 How to use
The TODE command line is of the form:
    tode <option> [<option> ...]
An option consists of one or two command-line arguments: a keyword (e.g. input_fname) followed by a value (e.g.…
20. …lly 1/4 the size of ASCII RAP3 files (or 1/2 the size of gzipped ASCII files) and load 5-10x faster. Since the weights are calculated on the SPERT boards using 16-bit fixed-point arithmetic, there is usually no accuracy loss in storing them this way.
You shouldn't ever have to access these files directly. Instead, use the QuickNet class QN_MLPWeightFile_MLPW. Little tested at present.
AUTHOR
Dan Ellis <dpwe@ee.columbia.edu>
SEE ALSO
qncopywts(1)
ICSI   Last change: 2001/03/13 19:56:41

Appendix B Priors File Format
Reproduction of ICSI man page.
ICSI SPEECH SOFTWARE PRIORS(5)
NAME
priors - file format for list of prior probabilities
DESCRIPTION
priors is a y0-compatible file format for prior probabilities. These are a by-product of training and are used to compensate for inequities in the amount of training data for each target. The file has the following format:
    <0's prior>
    <1's prior>
    <2's prior>
    ...
    <n's prior>
Where <n's prior> is the prior probability of neural network output number n.
EXAMPLE
Here is a simple example of a priors file:
    0.85
    0.01
    0.04
    0.02
    0.05
    0.01
    0.01
    0.01
In this example, the file contains prior probabilities for eight neural network outputs.
FILES
An example file can be found in drspeech data TIMIT timit6l PHONE uniform f
AUTHOR
Dr. Speech <drspeech@icsi…
21. …monly defined as silence). The presence of the sentence start word in the language model is optional. TODE removes the sentence start word before writing the decoding result to the output file.
Default: undefined

3.4.3 lex_sent_end_word
Required: No
Format: lex_sent_end_word <string>
Summary: Specifies the word that ends every result sentence.
Details: If specified, TODE constrains all output word sequences to end with this word. The sentence end word can be the same as the silence word and the sentence start word (most commonly defined as silence). The presence of the sentence end word in the language model is optional. TODE removes the sentence end word before writing the decoding result to the output file.
Default: undefined

3.4.4 lex_sil_word
Required: No
Format: lex_sil_word <string>
Summary: Specifies the silence word.
Details: This word is treated like any other word during decoding, but all instances in the final output word sequence are removed before the decoding result is written to file. The silence word can be the same as the sentence start word and the sentence end word. The silence word is ignored during language model calculations.
Default: undefined

3.5 Language Model Options

3.5.1 lm_fname
Required: No
Format: lm_fname <string>
Summary: Specifies the file containing the N-gram language model.
Details: The file must be in ARPA format (see…
22. …nly the format shown in Figure 7.3 on the following page. The <GCONST> and <STREAMINFO> keywords are also permitted in the file but are ignored by TODE. Any other variation from the format of Figure 7.3 will cause TODE to return an error.

7.2 Basic HMM Definitions
    ~h "hmm2"
    <BeginHMM>
      <VecSize> 4 <MFCC>
      <NumStates> 4
      <State> 2 <NumMixes> 2
        <Mixture> 1 0.4
          <Mean> 4
            0.3 0.2 0.2 1.0
          <Variance> 4
            1.0 1.0 1.0 1.0
        <Mixture> 2 0.6
          <Mean> 4
            0.1 0.0 0.0 0.8
          <Variance> 4
            1.0 1.0 1.0 1.0
      <State> 3 <NumMixes> 2
        <Mixture> 1 0.7
          <Mean> 4
            0.1 0.2 0.6 1.4
          <Variance> 4
            1.0 1.0 1.0 1.0
        <Mixture> 2 0.3
          <Mean> 4
            2.1 0.0 1.0 1.8
          <Variance> 4
            1.0 1.0 1.0 1.0
Fig. 7.3 Simple Mixture Gaussian HMM

Notice that only the second state has a full-covariance Gaussian component. The first state has a mixture of two diagonal-variance Gaussian components. Again, this illustrates the flexibility of HMM definition in HTK. If required, the structure of every Gaussian can be individually configured.
Another possible way to store covariance information is in the form of the Choleski decomposition $L$ of the inverse covariance matrix, i.e. $\Sigma^{-1} = LL^T$. Again, this is stored externally in upper-triangular form, so $L^T$ is actually stored. It is distinguished from the normal inverse covariance matrix by using the keyword <LLTCovar> in place of <InvCovar>.
23. …of w_a. If this option is used, the application of language model probabilities is delayed, and P(w_a | w) is applied to hypotheses that reach the final state of w_a (w is the predecessor word for the hypothesis). This approximation can result in significant computational savings (fewer LM lookups).
Default: false

3.6.5 dec_verbose
Required: No
Format: dec_verbose
Summary: Specifies that frame-by-frame decoding information is to be output.
Default: false

Appendix A MLPW File Format
Reproduction of ICSI man page.
ICSI SPEECH SOFTWARE MLPW(5)
NAME
mlpw - family of binary-encoded neural net weights file formats used by QuickNet
DESCRIPTION
The mlpw file format is used to store neural net weights in a more compact and more quickly accessed format than the traditional ASCII RAP3 weights(5) format. The same information is stored in the same order, but the values are coded, typically as 32-bit floats or 16-bit fixed-point ints (less often as 8- or 32-bit ints, or 64-bit doubles). Each section (e.g. weights or biases of a particular layer) may be coded in a different format. mlpw files are usually created with qnstrn(1) (or will be, when it is modified to support them) and con…
24. Both are supported by the software, but everybody just uses 8-bit. Basically, each floating-point probability is quantized to an 8- or 16-bit integer by the following formula:
    intval = floor(LNPROB_FLOAT2INT * log(x + VERY_SMALL))
where LNPROB_FLOAT2INT is -24 for 8-bit and -5120 for 16-bit. The int is then pinned to between 0 and 255 (or 65535). VERY_SMALL prevents ugliness if the probability is 0.0.
As for the actual file format, it is a binary stream of frames, where each frame consists of a fixed number of 8- or 16-bit values:
    EOS Val0 Val1 Val2 ... Valn
EOS is 0x80 if the frame is the last frame in a sentence, 0 otherwise. Val0 ... Valn are the quantized integers corresponding to the probabilities.
SEE ALSO
lna2y0new(1), rap2lna(1)
AUTHOR
This man page was written by Jonathan Segal <jsegal@ICSI.Berkeley.EDU> and Eric Fosler <fosler@ICSI.Berkeley.EDU>, and updated by Alfred Hauenstein <alfredh@icsi.berkeley.edu>.
ICSI   Last change: 1996/08/20 18:56:16

Appendix F CTM File Format
ctm(5)
NAME
ctm - Definition of time marked conversation scoring input files
DESCRIPTION
This describes the time marked conversation input files to be used for scoring the output of speech recognizers via the NIST sclite program.
25. The norms file format is:
    vec <number of features>
    <mean of each feature, one per line>
    vec <number of features>
    <1/standard deviation of each feature, one per line>
EXAMPLE
    [Example norms file for 18 features: vec 18 followed by 18 means, then vec 18 followed by 18 reciprocal standard deviations, in exponential notation.]
AUTHOR
David Johnson <davidj@ICSI.Berkeley.EDU>
SEE ALSO
bob(1), qnnorm(1), qntrain(1), qnforward(1), pfile(5)
ICSI   Last change: 1995/10/19 04:35:16

Appendix D Online Features File Format
Reproduction of ICSI man page.
ICSI SPEECH SOFTWARE ONLINE_FTRS(5)
NAME
online_ftrs - format for feature streams for online use
DESCRIPTION
The online_ftrs file format is used when passing speech feature files around during online recognition. In this context, online means real-time, i.e. there is someone waiting for t…
26. …<LLTCovar> in place of <InvCovar>.³ The definition for hmm3 also illustrates another macro type, that is, ~o. This macro is used as an alternative way of specifying global options and, in fact, it is the format used by HTK tools when they write out a HMM definition. It is provided so that global options can be specified ahead of any other HMM parameters. As will be seen later, this is useful when using many types of macro.

As noted earlier, the observation vectors used to represent the speech signal can be divided into two or more statistically independent data streams. This corresponds to the splitting up of the input speech vectors as described in section 5.13. In HMM definitions, the use of multiple data streams must be indicated by specifying the number of streams and the width (i.e. dimension) of each stream as a global option. This is done using the keyword <StreamInfo>, followed by the number of streams and then a sequence of numbers indicating the width of each stream. The sum of these stream widths must equal the original vector size as indicated by the <VecSize> keyword.

³ The Choleski storage format is not used by default in HTK Version 2.

Appendix J: HTK MLF File Format (extracted from The HTK Book, for HTK version 3.2)

TODE supports a restricted MLF format similar to example 2 on the following page. The first line of the file must be #!MLF!#. This is followed by a number of transcription entries. A transcription entr
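The <StreamInfo>/<VecSize> consistency rule above can be checked with a short sketch. It assumes keywords may appear in any letter case and that a value may directly abut the next keyword (as in <VECSIZE> 39<NULLD>); stream_widths is a hypothetical helper, not an HTK function.

```python
import re

# A token is either a <Keyword> or a run of non-space, non-angle-bracket characters.
_TOKEN = re.compile(r"<[^<>]+>|[^\s<>]+")


def stream_widths(global_opts: str) -> list:
    """Return per-stream widths, checking that they sum to <VecSize>.

    Without <StreamInfo>, the whole vector is treated as a single stream.
    """
    tokens = _TOKEN.findall(global_opts)
    upper = [t.upper() for t in tokens]  # assumption: keywords are case-insensitive
    if "<VECSIZE>" not in upper:
        raise ValueError("no <VecSize> keyword")
    vec_size = int(tokens[upper.index("<VECSIZE>") + 1])
    if "<STREAMINFO>" not in upper:
        return [vec_size]
    i = upper.index("<STREAMINFO>")
    n_streams = int(tokens[i + 1])
    widths = [int(t) for t in tokens[i + 2:i + 2 + n_streams]]
    if len(widths) != n_streams:
        raise ValueError("<StreamInfo> lists fewer widths than streams")
    if sum(widths) != vec_size:
        raise ValueError(
            "stream widths sum to %d, but <VecSize> is %d" % (sum(widths), vec_size))
    return widths
```

For example, "~o <VecSize> 39 <MFCC_0_D_A> <StreamInfo> 3 13 13 13" yields three 13-dimensional streams, while declaring widths 13 and 13 against a <VecSize> of 39 is rejected.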
27. …ted by concatenating individual phone models, the pause model is added to the end of each word model. If the phone transcription for a word (as defined in the dictionary file) ends with a pause phone, then an additional pause is not added. If a silence phone is specified and the phone transcription for a word ends with a silence phone, then the pause phone is not added. A pause model with an initial-to-final state transition is valid.
Default: undefined

3.3.4 am_phone_del_pen
Required: No
Format: am_phone_del_pen <real>
Details: Specifies the non-log phone-level deletion penalty. This value is used to scale the non-log transition probabilities for transitions originating from the initial state of each phone model. When phone models are concatenated to form word-level HMMs, this scaling serves as a phone deletion penalty.
Default: 1.0

3.3.5 am_apply_pause_del_pen
Required: No
Format: am_apply_pause_del_pen
Details: Indicates that the phone deletion penalty is to be applied to the model for the pause phone. This option is used only if a pause phone is defined.
Default: false

3.3.6 am_priors_fname
Required: No
Format: am_priors_fname <string>
Details: Specifies the file containing the phone prior probabilities. The phone priors are required for HMM/ANN decoding but are not used for HMM/GMM decoding. The format of the file must be in ICSI priors format se
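The pause-appending rule and the am_phone_del_pen / am_apply_pause_del_pen options above can be sketched as follows. This is an illustration under assumptions, not TODE's code: phone names are plain strings, transition matrices are non-log lists of rows, and row 0 is taken to be the (non-emitting) initial state, as in HTK-style models. All function names are hypothetical.

```python
def word_phones(pron, pause_phone=None, silence_phone=None):
    """Phone sequence for a word model, with the pause phone appended per the rules."""
    phones = list(pron)
    if pause_phone is None:
        return phones  # no pause phone defined: nothing to append
    if phones and phones[-1] == pause_phone:
        return phones  # transcription already ends with a pause
    if silence_phone is not None and phones and phones[-1] == silence_phone:
        return phones  # ends with the silence phone: pause not added
    return phones + [pause_phone]


def apply_phone_del_pen(trans, pen):
    """Return a copy of trans with the (non-log) row of the initial state scaled by pen.

    Assumption: row 0 is the initial state of the phone model.
    """
    scaled = [list(row) for row in trans]
    scaled[0] = [p * pen for p in scaled[0]]
    return scaled


def phone_penalties(phones, pen, pause_phone=None, apply_pause_del_pen=False):
    """Penalty per phone: the pause phone gets pen only if am_apply_pause_del_pen is set."""
    return [pen if (ph != pause_phone or apply_pause_del_pen) else 1.0 for ph in phones]
```

With the default am_phone_del_pen of 1.0 the scaling leaves every model unchanged, which matches the option having no effect unless set.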
28. …this threshold are deactivated and removed from further consideration. A 0 or negative value results in no pruning of interior-state hypotheses.
Default: 0.0

3.6.2 dec_end_prune_window
Required: No
Format: dec_end_prune_window <real>
Details: Specifies the log window used for pruning hypotheses in word-end states. Needs to be a positive log value. At each time step during decoding, a threshold is calculated by subtracting this constant from the score of the best word-end hypothesis. Any word-end state hypotheses that have scores below this threshold are deactivated and removed from further consideration. The pruning occurs before language model probabilities are applied. A 0 or negative value results in no pruning of end-state hypotheses.
Default: 0.0

3.6.3 dec_word_entr_pen
Required: No
Format: dec_word_entr_pen <real>
Details: Specifies the log word insertion penalty used during decoding. The word insertion penalty value (most commonly a negative log value) gets added to word-end hypothesis scores during evaluation of word transitions.
Default: 0.0

3.6.4 dec_delayed_lm
Required: No
Format: dec_delayed_lm
Details: Specifies that the application of language model probabilities is to be delayed. Usually a language model probability P(w2|w1) (assuming a bigram LM) is applied when a hypothesis makes a transition from the final state of w1 to the initial state
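A minimal sketch of the window pruning shared by the interior-state and word-end pruning options, plus the word insertion penalty from dec_word_entr_pen. Scores are log values held in a plain dict keyed by a hypothetical hypothesis id; TODE's actual data structures are not described here, and any LM scale factor is left out.

```python
def prune(scores, window):
    """Keep hypotheses whose log score is >= best - window; window <= 0 disables pruning."""
    if window <= 0.0 or not scores:
        return dict(scores)
    threshold = max(scores.values()) - window
    return {hyp: s for hyp, s in scores.items() if s >= threshold}


def word_transition_score(end_score, word_entr_pen, lm_log_prob=0.0):
    """Log score entering the next word: word-end score + insertion penalty + LM term."""
    return end_score + word_entr_pen + lm_log_prob
```

Because end-state pruning happens before language model probabilities are applied, a caller would run prune on the word-end scores first and only then call word_transition_score for the survivors.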
29. …verted to and from other formats with qncopywts(1). They will be read directly by future versions of qnsfwd(1) and ffwd(1).

The header: The header, as currently defined, consists of five 4-byte integers in big-endian order:
magic: magic number, 0x4D4C5057 ("MLPW")
version: version code, 20010313 (today)
nettype: nettype version, e.g. softmax
nlayers: count of unit layers (3 for MLP3)
nsections: count of sections (4 for MLP3)

Then follow nlayers 4-byte ints specifying the number of units in each layer, starting at the input, followed by the sections. Each section also has a small header consisting of three 4-byte integers:
sectiontype: QN_SectionSelector tag for this section
numvalues: how many weights in this section
datatype: data type flag (bytes/wt; 32 for float)

For fixed-point data formats only, this is followed by a 4-byte int giving the fixed-point exponent for this section. After this come the actual coded weight values. In an MLP3 there are 4 sections: 0, input-to-hidden weights; 1, hidden-to-output weights; 2, hidden-layer bias weights; and 3, output-layer bias weights. Since bias values occupy a slightly different range (they are typically distributed around log(n_units)), they are often stored with a larger exponent and/or more bits per weight. The MLPW file format supports this without difficulty.

ICSI, Last change: 2001/03/13 19:56:41

NOTES/BUGS: Short format MLPWS files are typica
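The fixed part of the MLPW header can be read with Python's struct module, as sketched below. The sketch stops at the layer-size table: it does not decode section headers, because which datatype values denote fixed-point formats (and therefore carry the extra exponent word) is not stated in this excerpt. The function names are hypothetical, and mlp3_section_sizes assumes fully connected layers.

```python
import struct

MLPW_MAGIC = 0x4D4C5057  # the bytes "MLPW" read as a big-endian integer


def parse_mlpw_header(data: bytes) -> dict:
    """Parse the five big-endian header ints and the per-layer unit counts."""
    if len(data) < 20:
        raise ValueError("truncated MLPW header")
    magic, version, nettype, nlayers, nsections = struct.unpack(">5i", data[:20])
    if magic != MLPW_MAGIC:
        raise ValueError("bad magic number 0x%08X" % magic)
    if nlayers < 0:
        raise ValueError("negative layer count")
    end = 20 + 4 * nlayers
    if len(data) < end:
        raise ValueError("truncated layer-size table")
    layer_sizes = list(struct.unpack(">%di" % nlayers, data[20:end]))
    return {
        "version": version,
        "nettype": nettype,
        "nsections": nsections,
        "layer_sizes": layer_sizes,
        "sections_offset": end,  # the first section header starts here
    }


def mlp3_section_sizes(layer_sizes):
    """Expected numvalues for MLP3 sections 0..3 (two weight matrices, two bias vectors)."""
    n_in, n_hid, n_out = layer_sizes
    return [n_in * n_hid, n_hid * n_out, n_hid, n_out]
```

Comparing each section's numvalues against mlp3_section_sizes is a cheap sanity check that the file matches the declared layer sizes.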
30. …y consists of a filename line, followed by the words in the transcription on separate lines, and is ended with a line containing the "." character. The filename must be enclosed in double quotes. The filename can be relative or absolute. The filename should have an extension (e.g. .lab). TODE prunes all path information and the file extension from each filename and attempts to match the result to an input filename. Therefore, wildcards are not permitted after the final "/" in the file name. After pruning of path and extension information, the resulting string should uniquely identify an input file.

6.3 Master Label Files

6.3.4 MLF Examples

1. Suppose a data set consisted of two training data files with corresponding label files:

a.lab contains
000000 590000 sil
600000 2090000 a
2100000 4500000 sil

b.lab contains
000000 990000 sil
1000000 3090000 b
3100000 4200000 sil

Then the above two individual label files could be replaced by a single MLF:

#!MLF!#
"*/a.lab"
000000 590000 sil
600000 2090000 a
2100000 4500000 sil
.
"*/b.lab"
000000 990000 sil
1000000 3090000 b
3100000 4200000 sil
.

2. A digit database contains training tokens one1.wav, one2.wav, one3.wav, two1.wav, two2.wav, two3.wav, etc. Label files are required containing just the name of the model, so that HTK tools such as HERest can be used. If MLFs are not used, individual label files are needed. For example, the individu
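TODE's filename handling described above (strip the path and extension, then match the result against input files) can be sketched as a small MLF reader. It follows the restricted layout above: a #!MLF!# first line, a quoted filename line, one word per line, and a terminating "." line; skipping blank lines is an assumption. The names are hypothetical and this is not TODE's parser.

```python
import posixpath

MLF_HEADER = "#!MLF!#"


def prune_name(path: str) -> str:
    """Drop directory and extension: '/data/b.wav' -> 'b' (assumes '/' separators)."""
    return posixpath.splitext(posixpath.basename(path))[0]


def parse_mlf(text: str) -> dict:
    """Map each pruned filename to its list of transcription words."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != MLF_HEADER:
        raise ValueError("first line must be " + MLF_HEADER)
    entries = {}
    i = 1
    while i < len(lines):
        name_line = lines[i]
        if len(name_line) < 2 or not (name_line.startswith('"') and name_line.endswith('"')):
            raise ValueError("filename must be in double quotes: " + name_line)
        key = prune_name(name_line[1:-1])
        if "*" in key or "?" in key:
            raise ValueError("wildcard after the final '/': " + name_line)
        if key in entries:
            raise ValueError("pruned name is not unique: " + key)
        words = []
        i += 1
        while i < len(lines) and lines[i] != ".":
            words.append(lines[i])
            i += 1
        if i == len(lines):
            raise ValueError("entry for %r has no terminating '.' line" % key)
        entries[key] = words
        i += 1  # step past the '.' terminator
    return entries
```

An input file is then looked up with entries[prune_name(input_path)]. Under this pruning, an HTK-style pattern such as "*/one*.lab" is rejected, since the wildcard survives after the final "/".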
