User manual of the landmark-based speech recognition toolkit

To accompany "Speech recognition based on phonetic features and acoustic landmarks", PhD thesis, University of Maryland, 2004.

Amit Juneja
February 12, 2005

1. Synopsis

System requirements:

A. SVM Light must be installed on the system.
B. Phoneme label files in TIMIT format must be available.
C. Frame-by-frame computed acoustic features in binary format (explained below) or HTK format.
D. Python 2.2.
E. *nix (Unix, Linux, etc.). It may run on Windows, but I never tested it.

1. train_config.py

   Usage: train_config.py <Config File>

   This is the main executable for phonetic feature classification. It can (a) create files for use with MATLAB, SVM Light and LIBSVM by picking up acoustic parameters either on a frame-by-frame basis or on the basis of landmarks; (b) train SVM classifiers (available only for SVM Light; LIBSVM has to be run separately), while optimizing the kernel parameter and the penalty (bound on alphas) with different methods: minimum XiAlpha estimate of error, minimum number of support vectors, or minimum cross-validation error; (c) do SVM classification on test files created by the code in a separate pass; (d) create histograms. SVMs for multiple phonetic features can be trained and tested at the same time. Please read the help in README_config for formatting the config file, because this is the most crucial step.

2. print_landmarks.py

   Usage: print_landmarks.py <Config File>

   This uses the same config file as needed by train_config.py. It will create a landmark label file for each utterance in a list of utterances provided in the config file. The landmarks can be generated in one of two ways: (a) using knowledge-based acoustic measurements; (b) using only the phoneme labels.

3. collate_aps.py

   Usage: collate_aps.py

   Combines two streams of acoustic parameters (for example, one stream of MFCCs and one stream of knowledge-based acoustic measurements) by choosing only a specified set of measurements from each stream. It can also compute and append delta and acceleration coefficients for the selected measurements from both streams. Binary and HTK formats are accepted for both input and output. To create output files in HTK format, ESPS must be installed on the system; in particular, the "btosps" and "featohtk" commands must be available. To customize the command, open the file collate_aps.py and follow the instructions.

4. phn2lab.py

   Usage: phn2lab.py <phn file> <lab file>

   Converts .phn labels to ESPS-format labels that can be displayed in xwaves.

5. batch_phn2lab.py

   Usage: batch_phn2lab.py <phn file list>

   Converts label files in .phn format to ESPS .lab format, given an input list of .phn files. It assumes that the input files have a 3-character extension.

6. findScalingParameters.py

   Usage: findScalingParameters.py <Config File>

   Uses the same config file as train_config.py to compute the scaling parameters for all of the acoustic measurements. This script must be run before train_config.py if scaled parameters are to be used.

File formats

Binary: This is plain binary format. Acoustic parameters are written frame by frame, with each parameter as a "float". For example, if there are 500 frames and 39 parameters per frame, then the 39 parameters of the first frame are written first, followed by the 39 parameters of the second frame, and so on. Note: (1) each parameter is written as a float; (2) as far as this toolkit is concerned, acoustic parameter files in binary format generated on Linux and on Unix are not cross-compatible between these systems, because the two systems use a different byte order.

2. Configuration file parameters

A number of values can be set in the config file that is given as input to the executable train_config.py. These are discussed here. Three examples of a config file, config_broadclass_hie.py, config_mfc_hie.py and context_config.py, are provided along with the scripts. The config variables are set in Python format, which has a very easy and obvious syntax. The code can be used for frame-based and landmark-based training and testing, and many experiments can be carried out by both methods. Landmarks are computed by the system automatically for each phoneme by first converting the phoneme into a broad class label and then finding a set of landmarks for each broad class. The following landmarks are computed:

Vowel (V): vowel onset point (VOP), peak.

Sonorant consonant (SC; nasal or semivowel):
- Postvocalic case: syllabic peak of the previous vowel, SC onset, syllabic dip (the mid-point of the SC segment in this case).
- Prevocalic case: syllabic dip (the mid-point of the SC segment in this case), SC offset, vowel onset (syllabic peak of the following vowel).
- Intervocalic case: syllabic peak of the previous vowel, SC onset, syllabic dip (the mid-point of the SC segment in this case), SC offset, vowel onset (syllabic peak of the following vowel).

Stop (ST): burst, release.

Fricative: start frame, 1/4 frame, middle frame, 3/4 frame, end frame.

Silence: silence start, silence end. The silence landmarks are useful for classification of the stop place features in postvocalic contexts.

The landmarks listed above for each broad class must be noted, because this knowledge is essential for doing landmark-based experiments. In landmark-based experiments you need to specify where the acoustic parameters are to be picked. For example, if acoustic parameters 1, 23 and 27 (this numbering follows the order in which the parameters are stored in the parameter files, starting with 1) are to be picked at the peak of the vowel, then the value of the Parameters variable below for such a class has to be set as [[], [1, 23, 27]], so that nothing is picked at the vowel onset point. In addition, if a number of adjoining frames is to be used at the peak landmark, then the value of Adjoins is set as, say, [[], [-4, -2, 0, 2, 4]], and then the parameters (1, 23, 27) will be picked from the (peak - 4)th frame, the (peak - 2)nd frame, and so on. For a particular classification, the current version of the code has the constraint that if the number of parameters at a landmark for a broad class is non-zero, then the number of parameters and the number of adjoins for that landmark must be the same as for the other non-zero landmarks. For example, if some parameters also have to be picked from the VOP, then the VOP must likewise have three parameters (continuing the example above) computed using adjoins of size five, for example [-4, -1, 0, 1, 4]. Of course, the actual parameters and adjoins may be different.

A single config file can be used for a number of SVM classification experiments. In the config file you specify a list of SVM Light formatted data files, a list of model file names, the indices of the parameters to be extracted for each classification, and so on. The i-th element of each of these lists determines how the i-th experiment is done.

1. Flags and values related to the kinds of tasks and the various inputs (labels and acoustic parameters)

outputDir
  The full path of the directory containing the acoustic parameter files. A misnomer, because this directory is more of an input.

labelsDir
  The full path of the directory containing the label files in TIMIT format.

modelDir
  The output directory where model files and SVM Light formatted data files will be written.

filelist
  Full path of a list of acoustic parameter files.

shuffleFilesFlag
  If this is set to 1, the list of files will be shuffled before use.

apFileExtLen
  An integer giving the length of the extension of each acoustic parameter file. The code strips this many characters and appends the label extension (refLabelExtension) to find the label file in the directory labelsDir.

refLabelExtension
  The extension of the label files, for example ".phn".

SkipDataCreationFlag
  If this flag is set to 1, no SVM formatted data files are created. This is used to only run SVM Light, for example to optimize the value of gamma or C.

SkipModelTrainingFlag
  Setting this to 1 will skip model training. This can be used to (1) only create the SVM Light formatted data files, so as to test with other toolkits such as LIBSVM or MATLAB externally, or (2) create SVM Light formatted data files that can be used as validation files for SVM training in a separate pass.

SkipBinningFlag
  Setting this to 1 will skip the creation of bins for probabilistic modeling of SVM outputs. This is not relevant for this version of the code.

binaryClassificationFlag
  If this flag is set to 1, SVMs will be run on the files in the array SvmInputFilesDevel.

classificationType = 2
  1: non-hierarchical, 2: hierarchical. Please ignore this flag in this version of the toolkit; it is only relevant in the full version.

nBroadClasses
  Please ignore this value in this version of the toolkit; it is only relevant in the full version. Give it any value, but do include it in the config file.

nBroadClassifiers = 4 (not relevant for classification)
  Please ignore this value in this version of the toolkit; it is only relevant in the full version. Give it any value, but do include it in the config file.

nClasses
  The number of SVMs. Not required, but it can ease the writing of certain variables in the config file that are the same across all the SVMs to be trained. For example, in Python, a = ["z"] * 5 will assign ["z", "z", "z", "z", "z"] to a.

selectiveTraining
  The code allows the designated tasks to be carried out on a specified subset of the features instead of all of them. Even if the config file is written for 20 SVMs (features), you can specify which features to analyze. For example, selectiveTraining = [0, 3, 5, 6].

apDataFormat
  0: binary, 1: HTK.

2. Values related to the names of the SVM Light format files and model files to be created

SvmInputFiles
  The names of the SVM Light formatted files to be created. For example, SvmInputFiles = ["LightSonor", "LightStops", "LightSC", "LightSilence"].

SvmInputFilesDevel
  The names of the files used for validation. When optimizing a kernel-related parameter, these files will be used to minimize the error on. For example, SvmInputFilesDevel = ["LightSonorDevel", "LightStopsDevel", "LightSCDevel", "LightSilenceDevel"].

modelFiles
  The names of the models. For example, modelFiles = ["rbf_model_sonor", "rbf_model_stop", "rbf_model_sc", "rbf_model_sil"].

3. Values and flags related to the parameters used in each classification

Parameters
  The list of parameters to be used for each classification. For example, [[1, 2, 15, 16, 19], [4, 5, 17, 18], [8, 13, 14, 15, 16], [9, 4, 5, 6, 7]], where each list is the list of parameters for the corresponding index of model file, SVM data file, etc. These examples are good only for frame-based training. For landmark-based testing, parameters are specified for each landmark, as exemplified in the synopsis above. More examples can be found in the config_mfc_hie.py example file provided with the toolkit.

Doublets
  Not tested in a while, and better not to use. Assign Doublets a list of nClasses empty entries to have the code ignore it.

Adjoins
  The offsets of the adjoining frames, along with the current frame, to be used for classification. For example, [[-4, -3, -2, -1, 0, 1], [-4, -3, -2, -1, 0, 1, 2, 3, 4], [-16, -12, -8, -4, 0, 4, 8, 12, 16, 20, 24], [-3, -2, -1, 0, 1, 2]]. For landmark-based training, adjoins have to be specified for each landmark, as stated in the synopsis above.

numberOfParameters
  The number of parameters per frame in each acoustic parameter file.

stepSize
  The step size of the frames in milliseconds. Required for reading the labels.

classes_1
  The +1 class members (phonemes or broad classes) from which the parameters are to be extracted, for example a list of lists of class labels such as classes_1 = [["V", "SC"], ["N", "ST", "VST"], ...]. See the file labels.py for the mapping used from phonemes to broad classes.

classes_2
  The -1 class members (either phonemes or broad classes, but not both in any one classification) from which the parameters are to be extracted, for example classes_2 = [["V", "SC"], ["N", "ST", "VST"], ...]. See the file labels.py for the mapping used from phonemes to broad classes.

useDurationFlag
  A flag for each classification, for example [0, 0, 0, 0]. A flag can take the value 1 only when the corresponding parameterExtractionStyles flag is set to 7 (landmark-based training).

specificDataFlags
  If broad classes are used in classes_1 and classes_2 for any of the classifications, set it to 0 for that classification; otherwise set it to 1.

parameterExtractionStyles
  0: frame-based training; 1: IGNORE (not tested in a while); 7: landmark-based training and testing.

useDataBound
  Setting this flag to 1 will use an upper bound on the number of samples extracted for each classification. The bound is set by the values maxclass1 and maxclass2, explained below.

placeVoicingSpecifications
  This selects the kind of landmark training for each classifier for which landmark training is chosen. For vowels the options are "generic" (all vowels will be used), "preSConly" (vowels with no following sonorant consonant will be used) and "postSConly" (vowels with no preceding sonorant consonant will be used). For fricatives the options are "generic" (all fricatives), "genericPreVocalic" (fricatives before vowels and sonorant consonants), "genericPostVocalic" (fricatives after vowels or sonorant consonants) and "genericIsolated" (fricatives with no adjoining sonorants). For sonorant consonants the options are "genericInterVocalicSC" (as the name suggests; note that there are five landmarks in this case), "genericPreVocalicSC" (three landmarks) and "genericPostVocalicSC" (three landmarks). For stops the only valid option is "genericPreVocalic". The variable placeVoicingSpecifications will be removed in forthcoming versions of the code, and the framework will then allow the user to specify any context.

init1
  For frame-based training, this is the list of the number of initial frames to be extracted for each classifier. If this value is set to non-zero for any classifier, then only that number of initial frames will be used from classes_1, and middleFlag1 will be ignored. For example, init1 = [0, 1, 0, 0]. Only relevant for frame-based training.

init2
  The same as init1, but for classes_2: if set to non-zero, only that number of initial frames will be used from classes_2, and middleFlag2 will be ignored. For example, init2 = [0, 1, 0, 0]. Only relevant for frame-based training.

delstart1
  Delete an initial number of frames when picking frames for frame-based training from a label in classes_1. For example, delstart1 = [0, 0, 0, 0]. Only relevant for frame-based training. Ignored if the corresponding init1 value is set to non-zero.

delstart2
  Delete an initial number of frames when picking frames for frame-based training from a label in classes_2. For example, delstart2 = [0, 0, 0, 0]. Only relevant for frame-based training. Ignored if the corresponding init2 value is set to non-zero.

delend1
  Similar to delstart1, but for end frames.

delend2
  Similar to delstart2, but for end frames.

contextFlag1
  Specify the left and right context of each of the labels in classes_1. Only the phonemes/broad classes with the specified context will be used. If the i-th element of the list contains "left" or "right" or both, then only those phonemes will be used that have the phonemes or broad classes specified in the context1 dictionary in the designated context. Currently this is only implemented for frame-based training; for landmark-based training, use placeVoicingSpecifications. The example file context_config.py shows how to use context. If phonemes are specified in classes_1 and classes_2, then the context must also be phonemes, and likewise for broad classes.

contextFlag2
  Specify the left and right context of each of the labels in classes_2. Only the phonemes/broad classes with the specified context will be used. If the i-th element of the list contains "left" or "right" or both, then only those phonemes will be used that have the phonemes or broad classes specified in the context2 dictionary in the designated context. Currently this is only implemented for frame-based training; for landmark-based training, use placeVoicingSpecifications. The example file context_config.py shows how to use context. If phonemes are specified in classes_1 and classes_2, then the context must also be phonemes, and likewise for broad classes.

context1
  Specify the context. Relevant only if contextFlag1 is not empty. The element corresponding to the i-th classifier is a dictionary in Python format. For example, an element may be {"left": ["iy", "ow"], "right": ["k", "g"]}. Many examples of using context are in the file context_config.py.

context2
  Specify the context. Relevant only if contextFlag2 is not empty. The element corresponding to the i-th classifier is a dictionary in Python format, as for context1. Many examples of using context are in the file context_config.py.

randomSelectionParameter1
  Instead of picking all frames, pick frames randomly. For example, randomSelectionParameter1 = [0, 0, 0, 0]. This feature has not been tested in a while, so please prefer not to use it. Only relevant for frame-based training.

randomSelectionParameter2
  Instead of picking all frames, pick frames randomly. For example, randomSelectionParameter2 = [0, 0, 0, 0]. This feature has not been tested in a while, so please prefer not to use it. Only relevant for frame-based training.

middleFlag1
  Specify whether only the frames from a middle portion of each label are to be used for training. 1: middle 1/3 of the segment; 2: middle 2/3 of the segment; 3: only the center frame. Example: middleFlag1 = [0, 0, 0, 0]. Only relevant for frame-based training.

middleFlag2
  Specify whether only the frames from a middle portion of each label are to be used for training. 1: middle 1/3 of the segment; 2: middle 2/3 of the segment; 3: only the center frame. Example: middleFlag2 = [0, 0, 0, 0]. Only relevant for frame-based training.

maxclass1
  Maximum number of samples to be extracted for class +1. Example: maxclass1 = [20000, 5000, 20000, 20000]. Only relevant for frame-based training.

maxclass2
  Maximum number of samples to be extracted for class -1. Example: maxclass2 = [20000, 5000, 20000, 20000]. Only relevant for frame-based training.

4. SVM parameter settings

trainingFileStyle = "Light"
  Choice between "Light" and "MATLAB". If MATLAB is chosen, then a binary file is written.

kernelType = [2, 2, 2, 2]
  Same usage as in SVM Light. 10: use known optimal gammas; set the optimumGammaValues below. For example, kernelType = [2, 2, 2, 2].

gammaValues
  The set of values from which the optimal gamma is to be found. For example, gammaValues = [0.05, 0.01, 0.005, 0.001, 0.0005, 0.00001].

optimumGammaValues
  If the optimal gamma value is known for each or some of the classifications, set it here. For example, [0.01, 0.001, 0.001, 0.01] will set 0.01 as the optimal value for classification 0, 0.001 as the optimal value for the classification of index 1, and so on.

cValuesArray = [0.05, 0.5, 1.0, 10]
  The values of C from which the best C is to be chosen. For example, cValuesArray = [0.05, 0.5, 1.0, 10].

flagCheckForDifferentC
  If set to 0, the default C found by SVM Light will be used.

svmMinCriterion
  If set to "numSV", the minimum number of support vectors will be used to find the optimum value of C as well as gamma. "crossValidation" will cause the code to validate across the files in SvmInputFilesDevel. The files in SvmInputFilesDevel need to be created in a separate run of the code by specifying the same names in SvmInputFiles.

BinsFilenames
  The names of the files that will contain the histogram binning information. For example, BinsFilenames = ["BinsSonor30RBF", "BinsStops30RBF", "BinsSC30RBF", "BinsSilence30RBF"]. Binning is not relevant for this version of the code.

probabilityConversionMethod
  Choice of "bins" or "trivial". "trivial" will use a linear mapping from [-1, 1] to [0, 1].

binningBound
  Bins will be constructed between -binningBound and +binningBound.

5. Parameters for scaling

parameterScalingFlag
  If this is set to 1, the parameters will be scaled by their empirical mean and variance, and findScalingParameters.py must be run before train_config.py.

scaleParameterFile
  The full path of the file to be created by findScalingParameters.py and read by train_config.py. For example, modelDir + "/scalesFile".

scalingFactor
  The value at which the standard deviation of the scaled parameters is set.

scalingToBeSkippedFor
  A list of indices of features for which scaling is not to be used. For example, [0, 4, 5].

6. Parameter addition specifications (deprecated; these should be ignored but not deleted)

addParametersFlag = 0
addDirectory = "/dept/isr/labs/nsl/scl/vol05/TIMIT_op/train"
temporalStepSize = 2.5
fileExts = ["aper.bin", "per.bin", "pitch.bin", "soff.bin", "son.bin"]
channels = [1, 1, 1, 1, 1]

7. Ap specifications for landmark detection

useLandmarkApsFlags
  Before landmark-based analysis is done, the code finds the landmarks using the phoneme labels and, optionally, knowledge-based acoustic measurements. Landmarks are defined corresponding to the broad classes: vowel, fricative, sonorant consonant (nasal or semivowel), silence and stop burst. If you want to use knowledge-based measurements along with the phoneme labels for finding the landmarks of any of the broad classes, set the corresponding flags to 1. For example, useLandmarkApsFlags = {"V": 0, "Fr": 0, "ST": 1, "SILENCE": 0, "SC": 1} will cause the code to use measurements for the landmarks of ST and SC, while only the phoneme labels will be used to find the other landmarks. The parameters defined by landmarkAps will be used.

landmarkAps
  The index of the parameter for each of the measurements (onset, offset, totalEnergy, syllabicEnergy, sylEnergyFirstDiff) has to be set here. For example, landmarkAps = {"onset": 17, "offset": 18, "totalEnergy": 18, "syllabicEnergy": 13, "sylEnergyFirstDiff": 32}. Note that the first parameter is 1, not 0. The maximum value of the "onset" parameter will be used to find the stop burst. The maximum value of totalEnergy will be used to find the vowel landmark, and its minimum value will be used to find the dip of an intervocalic sonorant consonant. The maximum value of sylEnergyFirstDiff will be used to find the SC offset (while moving from SC to vowel), and its minimum value will be used to find the SC onset (while moving from vowel to SC).
    