
Speech Intelligibility Prediction Toolbox User Guide

6 Main menu bar
6.1 Menu item Setup
6.1.1 Load
6.1.2 Save
6.1.3 Batch Processing
6.1.4 Hagerman & Olofsson
6.2 Menu item Audiogram
6.3 Menu item Binaural Options
1 Introduction
1.1 About SIP Toolbox
Assessment of speech in all situations
The Speech Intelligibility Prediction Toolbox (SIP Toolbox) was developed in the project group Hearing, Speech and Audio Technology at the Fraunhofer IDMT. It offers a quick and easy prediction of the main factors affecting speech quality in different situations. It can be used to easily compare different models, and so helps the user select the most suitable models for specific applications. The SIP Toolbox, designed in MATLAB®, offers versatile utilities to import, process and represent data. For users with no access to MATLAB, we also offer a stand-alone version of the SIP Toolbox.
1.2 Requirements
The SIP Toolbox software is delivered either as an executable file (.exe) for Microsoft Windows 32-bit operating systems, or as a p-coded function for MATLAB version 2008b or higher on Microsoft Windows 32-bit operating systems. The SIP Toolbox is copy-protected by one or two USB dongles, depending on the features. We recommen
    0.077   0.077   0.146   0.271   right channel
SII 0.155   0.155   0.335   0.440   left channel
    0.155   0.155   0.305   0.440   right channel
STI 0.242   0.242   0.412   0.503   left channel
    0.242   0.242   0.385   0.503   right channel
BATCH MODE SpeechIntel
Results computed from testSI_batch_in.txt
Time stamp: 24 Mar 2010 14:41:45
AI  0.077   0.077   0.171   0.271   left channel
    0.077   0.077   0.146   0.271   right channel
SII 0.155   0.155   0.335   0.440   left channel
    0.155   0.155   0.305   0.440   right channel
STI 0.242   0.242   0.412   0.503   left channel
    0.242   0.242   0.385   0.503   right channel
Figure 20: Result file testSI_batch_in_out.txt of a speech intelligibility batch processing run.
Batch processing speech quality
An example control file for the speech quality batch processing is shown in Figure 21.
[Figure 21 content: control file with the section SIGNAL SELECTION (TEST SIGNAL DIR, REFERENCE SIGNAL DIR, IR DIR), the test signal level (LEVEL 65) and the section MODEL SELECTION (for example PEMOQ with MaxLag 150, ISD).]
Figure 21: Example file testSI_batch_in.txt for speech quality batch processing.
For speech quality batch processing, the only parameters to be defined are the level of the test signal and the objective measures you want to calculate. There are optional parameters for the assessment of speech quali
The transfer function is applied to the signal by convolution when you activate the check box below System on. Room impulse responses are supported as wave files (.wav), as text files (.txt) and as MATLAB files (.mat). The internal sampling frequency of the SIP Toolbox is 44.1 kHz. Signals and transfer functions with a different sampling rate are resampled automatically. Single-channel and dual-channel signals are supported. If you use single-channel signals, the channel is duplicated to generate the second channel.
Figure 2: Zoom of signal selection in the SIP Toolbox (Speech Signal and Noise Signal pop-up menus, Speech System and Noise System selection of room impulse responses with the check box System on, and the choice between Fix speech level and Fix noise level with the level of the fixed signal, e.g. 65 dB SPL).
The calibration of the signals is based on the long-term RMS value. The SIP Toolbox calibrates the digital signals to sound pressure level in dB SPL using a reference pressure of 20 µPa. To process speech and noise signals at a specified signal-to-noise ratio, you have to define which signal (speech or noise) has a fixed level. The level of the other signal is then adjusted according to the desired signal-to-noise ratio (see Figures 2 and 3). Below the signal selection part s
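The calibration and SNR adjustment described above can be sketched in Python as follows. This is a minimal illustration and not the toolbox's implementation. The factor pa_per_unit, which maps one digital unit to pascal, is a hypothetical calibration constant because the guide does not document it. The function names are also made up for this example.

```python
import math

P_REF = 20e-6  # reference sound pressure for dB SPL: 20 µPa


def rms(x):
    """Long-term RMS value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def level_db_spl(x, pa_per_unit=1.0):
    """Level in dB SPL; pa_per_unit (pascal per digital unit) is a hypothetical constant."""
    return 20.0 * math.log10(rms(x) * pa_per_unit / P_REF)


def set_level(x, target_db_spl, pa_per_unit=1.0):
    """Scale x so that its long-term RMS corresponds to target_db_spl."""
    gain = 10.0 ** ((target_db_spl - level_db_spl(x, pa_per_unit)) / 20.0)
    return [v * gain for v in x]


def mix_at_snr(speech, noise, snr_db, fixed="speech", fixed_level=65.0):
    """Keep one signal at fixed_level dB SPL and adjust the other to reach snr_db."""
    if fixed == "speech":
        s = set_level(speech, fixed_level)
        n = set_level(noise, fixed_level - snr_db)
    elif fixed == "noise":
        n = set_level(noise, fixed_level)
        s = set_level(speech, fixed_level + snr_db)
    else:
        raise ValueError("fixed must be 'speech' or 'noise'")
    mixed = [a + b for a, b in zip(s, n)]
    return mixed, s, n
```

For example, mix_at_snr(speech, noise, -5.0, fixed="noise") keeps the noise at 65 dB SPL and sets the speech to 60 dB SPL.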
cmd. Go to the subdirectory Hinstall in the installation folder, type hinstall -i in the command line and press ENTER. The installation will take a few minutes. When the installation has finished, insert the USB dongle. MS Windows will now integrate the copy protection. To install the SIP Toolbox software, continue with the next step.
1.4.2 Install SIP Toolbox
Stand-alone version (without MATLAB)
The installation package is delivered as a zip file with the program and installation files. Unzip the files into any folder. The structure of the unpacked folders is fixed: PLEASE DO NOT MOVE OR RENAME any folder or file. The unpacked zip file contains a folder named Installation. To work with the stand-alone version of the SIP Toolbox, you need to install the MATLAB runtime library. To do so, execute the file MCRInstaller.exe in the Installation folder. You can choose any local directory for the MATLAB runtime library. When the installation process has finished, insert your USB dongle. You can now start the SIP Toolbox by executing the file SIP_Toolbox_demo.exe.
P-coded function for MATLAB 2008b or higher
The installation package is delivered as a zip file including the program files. Unzip the files into any folder. The structure of the unpacked folders is fixed: PLEASE DO NOT MOVE OR RENAME any folder or file. To work with the SIP Toolbox, please start MATLAB, go to the folder
and in [13]. A result of 1 indicates perfect quality with respect to the reference signal (perfect match). A value of 0 indicates no match with the reference.
LLR: Log-Likelihood Ratio [12, p. 40]. Compares two windowed speech signals by LPC analysis using the autocorrelation method. The total correlation is calculated as a weighted sum of the correlations for each window. The closer the LLR results are to 0, the better the quality.
ISD: Itakura-Saito Distance [12, p. 50]. Predicts a quality distance based on a windowed LPC analysis, similar to LLR. The closer the ISD results are to 0, the better the quality.
LAR: Log Area Ratio [12, p. 233]. Gives a quality statement based on a windowed LPC analysis of the signal and on the distance between the linear prediction reflection coefficients. The closer the LAR results are to 0, the better the match with the reference.
WSS: Weighted Spectral Slope Distance [12, p. 56ff]. Splits the test and the reference signal into 45 critical bands and determines the intensity in every band. To predict a quality value, the model determines the weighted distances of the spectral slopes in each frequency band. The closer the values are to 0, the better the match with the reference.
SNR: Mean Segmental Signal-to-Noise Ratio [12, p. 45]. Determines the segmental signal-to-noise ratio by calculating the SNR in overlapping parts of the signal, followed by averagi
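The mean segmental SNR described above can be sketched as follows. This illustration is not the toolbox's code and rests on several assumptions. The frame length and hop size are assumed values. Per-frame values are limited to the range -10 dB to 35 dB, which is a common convention in the speech quality literature. Silent reference frames are skipped.

```python
import math


def segmental_snr(reference, test, frame_len=256, hop=128, lo=-10.0, hi=35.0):
    """Mean segmental SNR in dB between a reference and a test signal.

    Frames overlap by frame_len - hop samples. Per-frame SNRs are limited
    to [lo, hi] dB before averaging (an assumed convention).
    """
    n = min(len(reference), len(test))
    snrs = []
    for start in range(0, n - frame_len + 1, hop):
        ref = reference[start:start + frame_len]
        tst = test[start:start + frame_len]
        sig = sum(r * r for r in ref)
        err = sum((r - t) ** 2 for r, t in zip(ref, tst))
        if sig == 0.0:
            continue  # skip silent reference frames
        snr = hi if err == 0.0 else 10.0 * math.log10(sig / err)
        snrs.append(min(max(snr, lo), hi))
    if not snrs:
        raise ValueError("signal shorter than one frame or entirely silent")
    return sum(snrs) / len(snrs)
```

A test signal equal to 0.9 times the reference has an error energy of 1 % of the signal energy in every frame, so the sketch returns 20 dB.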
model of the effective signal processing in the auditory system. I. Model structure. J. Acoust. Soc. Am. 99(6), 3615-3622.
[11] Hansen, M. and Kollmeier, B. (2000). Objective modelling of speech quality with a psychoacoustically validated auditory model. J. Audio Eng. Soc. 48(5), 395-409.
[12] Quackenbush, S. R., Barnwell, T. P. and Clements, M. A. (1988). Objective Measures of Speech Quality. Prentice Hall Advanced Reference Series, Englewood Cliffs, NJ. ISBN 0-13-629056-6.
[13] Huber, R. (2003). Objective assessment of audio quality using an auditory processing model. PhD thesis, Universität Oldenburg.
[14] Hansen, M. (1998). Assessment and prediction of speech transmission quality with an auditory processing model. PhD thesis, Universität Oldenburg.
[15] DIN 18041:2004-05 (2004). Acoustic quality in small and medium-sized rooms. Deutsches Institut für Normung.
[16] Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands. J. Acoust. Soc. Am. 33, 248.
[17] Moore, B. C. J. and Glasberg, B. R. (1983). Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J. Acoust. Soc. Am. 74, 750-753.
where you unzipped the files, insert the USB dongle and start the file SIP_Toolbox_demo.p in MATLAB.
2 Getting started
The SIP Toolbox is modular software that is easy to operate through a graphical user interface. After you start the software, the main window opens as shown in Figure 1. This main window is organized in two main sections: signal selection and parameter setting on the left, and module and model selection on the right.
Figure 1: Overview of the SIP Toolbox (menu bar with Preferences, Load/Save, Hagerman & Olofsson and Audiogram; signal and parameter selection; module selection with Intelligibility, Loudness, Quality and IR analysis; visualization and auralization of the selected signals; information window for status, warnings and errors).
The menu bar offers quick access to global functi
6.1.3 Batch Processing
Batch processing speech intelligibility
Batch processing in the SIP Toolbox is controlled by a text file. Every module has an example text file. For speech intelligibility it is called testSI_batch_in.txt. You define the parameters for the batch processing in this file. Figure 19 shows the structure of this text file for a speech intelligibility batch processing run.
[Figure 19 content: control file with the sections SIGNAL SELECTION (SPEECH SIGNAL DIR, NOISE SIGNAL DIR, SPEECH RIR DIR, NOISE RIR DIR), PARAMETER SELECTION (FIXSIGNAL NNNN, FIXLEVEL 65, SNR with one value per condition) and MODEL SELECTION (for example SII, BINSII).]
Figure 19: Example file testSI_batch_in.txt for the speech intelligibility batch processing (N: noise, S: speech).
The file structure includes three parts:
SIGNAL SELECTION: Here you enter the paths of the signals (speech and noise) and, optionally, of the impulse responses. You can use absolute or relative paths. To set a relative path, please use the following syntax: HOME
The number of signals in each folder must be either 1 or N. For example, if you use eight different speech signals (01_speech, 02_speech, ..., 08_speech) and eight different noise signals (01_noise, 02_noise, ..., 08_noise), the selected model predictions are calculated for the eight pairs. It is therefore i
[Figure 8 content: tabs Speech Intelligibility, Loudness, Speech Quality and IR analysis; model check boxes AI, SII, BINSII, STI, STITEL and RASTI; button to start the calculation.]
Figure 8: Zoom on the implemented measures within the speech intelligibility module.
SII: Calculation of the intelligibility according to the speech intelligibility index principle (ANSI S3.5-1997 [3]). The SII is an updated version of the articulation index. It splits the speech and noise signals into 21 critical bands (frequency bands according to Zwicker [16]) and weights the signal-to-noise ratio in each band. The weighting function depends on the speech material used. This implementation uses the weighting function for SPIN (speech in noise). As a rough guide, SII values above 0.75 indicate good speech intelligibility and SII values below 0.45 indicate poor intelligibility.
BINSII: Binaural extension of the SII according to Beutelmann & Brand [5]. It uses both ear signals to predict the intelligibility of speech. This model can predict a binaural benefit in conditions with spatially separated speech and noise sources.
STI: Calculation of the intelligibility according to the speech transmission index principle (IEC 60268-16 [4]), using 7 octave bands and a weighted rating of the signal-to-noise ratio in each band. The STI also accounts for time-domain distortions, for example reverberation, by calculating the modulation transfer function. The modulation transmission in each octave band is determined for 14 different m
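To illustrate the band-weighting principle of the SII described above, here is a simplified sketch. It is not the toolbox's implementation. It uses only the core band audibility relation of ANSI S3.5-1997, (SNR + 15)/30 limited to the range 0 to 1. It ignores level distortion, spread of masking and the hearing threshold. It also expects hypothetical band importance values as input, because the SPIN importance function is not reproduced here. The labels returned by rate_sii restate the rough thresholds given in this guide. The word "intermediate" for the middle range is our own.

```python
def band_audibility(snr_db):
    """Band audibility from the band SNR: (SNR + 15) / 30, limited to [0, 1]."""
    return min(max((snr_db + 15.0) / 30.0, 0.0), 1.0)


def simplified_sii(band_snrs_db, band_importance):
    """Importance-weighted mean of band audibilities (simplified SII)."""
    if len(band_snrs_db) != len(band_importance):
        raise ValueError("one importance value per band is required")
    total = sum(band_importance)
    weighted = sum(w * band_audibility(s) for s, w in zip(band_snrs_db, band_importance))
    return weighted / total


def rate_sii(value):
    """Rough interpretation using the thresholds 0.75 and 0.45 given in this guide."""
    if value > 0.75:
        return "good"
    if value < 0.45:
        return "poor"
    return "intermediate"
```

With 21 bands of equal (hypothetical) importance and an SNR of 0 dB in every band, simplified_sii returns 0.5.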
Speech Intelligibility Prediction Toolbox
An easy way for modeling intelligibility and quality of speech
October 8, 2010
User Guide
Fraunhofer IDMT
Fraunhofer Institute for Digital Media Technology IDMT
Project Group Hearing, Speech and Audio Technology, Oldenburg
Telephone: +49 441 2172 433
Email: sip toolbox idmt fraunhofer de
Contents
1 Introduction
1.1 About SIP Toolbox
1.2 Requirements
1.3 Demo version
1.4 Installation
1.4.1 Install PEMO-Q copy protection
1.4.2 Install SIP Toolbox
2 Getting started
3 Signal selection
4 Signal generation to evaluate signal processing algorithms (Hagerman & Olofsson)
4.1 Batch processing using the method of Hagerman & Olofsson
5 Module selection
5.1 Module Speech Intelligibility
5.1.1 Calculation of speech intelligibility
5.2 Module Loudness
5.2.1 Loudness prediction
5.3 Module Speech Quality
5.3.1 Select the reference signal
5.3.2 Assessment of speech quality
5.4 Module Impulse Response Analysis
5.4.1 Perform room impulse response analysis
a statistically expected hearing loss is calculated and displayed in the audiogram. Once the audiogram has been set, the new values are automatically included in all subsequent predictions of models that take the hearing threshold into account, e.g. the SII.
6.3 Menu item Binaural Options
Set Binaural Configuration
Selecting the main menu item Set Binaural Configuration opens a new window, as shown in Figure 24. This window offers a visual way to configure different acoustic situations. The graphical binaural configuration is directly linked to the transfer function selection in the SIP Toolbox main window. It only works with the provided transfer functions (anechoic, cafeteria and office). The binaural configurations can be saved and loaded again into the software.
Figure 24: Graphical binaural configuration for the provided transfer functions anechoic, cafeteria and office within the SIP Toolbox (panel Directional Settings with the system and the direction of speech and noise, e.g. anechoic, frontal).
References
[1] Hagerman, B. and Olofsson, A. (2004). A method to measure the effects of noise reduction algorithms using simultaneous speech and noise. Acta Acustica unite
arity (German: Deutlichkeitsmaß C50 / Klarheitsmaß C80). Clarity is an objective measure used to characterize the transparency of speech (C50) or of musical presentations (C80). It is the ratio of the energy in the first 50 or 80 milliseconds of the room impulse response to the energy that arrives after that time. It is defined by
$$C_{xx} = 10 \log_{10} \left( \frac{\int_{0}^{xx\,\mathrm{ms}} h^{2}(t)\,dt}{\int_{xx\,\mathrm{ms}}^{\infty} h^{2}(t)\,dt} \right) \mathrm{dB} \qquad (6)$$
where xx is the early time limit, normally 50 ms for the clarity of speech and 80 ms for the clarity of musical presentations. Desired values for good clarity are C50 > 0 dB for speech and C80 = 0 dB ± 4 dB for music.
CT: Center time (German: Schwerpunktszeit). The center time is an objective measure used to characterize a room for musical presentations. It is defined by
$$CT = \frac{\int_{0}^{\infty} t\, h^{2}(t)\,dt}{\int_{0}^{\infty} h^{2}(t)\,dt} \qquad (7)$$
An appropriate value for the center time is CT = 120 ms ± 30 ms.
DRR: Direct-to-Reverberant Ratio. This objective measure describes the energy ratio between the direct sound and the reverberation in dB. It is known to be an important acoustic cue for the perception of sound source distance.
5.4.1 Perform room impulse response analysis
As in all other modules of the SIP Toolbox, you start the impulse response analysis by pressing the button Compute (see Figure 18). The results are displayed for each objective measure and condition. Every calculation is set as a new condition, and its results are displayed in the corresponding colu
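Equations (6) and (7) can be evaluated on a sampled impulse response h[k] at sampling rate fs by replacing the integrals with sums. The sketch below is illustrative and not taken from the toolbox. The guide does not specify how the direct sound is separated from the reverberation for the DRR. The window of ±2.5 ms around the strongest peak used in drr_db is therefore an assumption.

```python
import math


def _energy(h, start, stop):
    """Sum of squared samples h[start:stop]."""
    return sum(v * v for v in h[start:stop])


def clarity_db(h, fs, early_ms):
    """C_xx per Eq. (6): early energy (0 .. xx ms) over late energy (xx ms .. end), in dB."""
    k = round(early_ms * 1e-3 * fs)
    return 10.0 * math.log10(_energy(h, 0, k) / _energy(h, k, len(h)))


def center_time_ms(h, fs):
    """Center time per Eq. (7), returned in milliseconds."""
    num = sum((i / fs) * v * v for i, v in enumerate(h))
    return 1e3 * num / _energy(h, 0, len(h))


def drr_db(h, fs, direct_ms=2.5):
    """Direct-to-reverberant ratio in dB; the direct-sound window is an assumption."""
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    w = round(direct_ms * 1e-3 * fs)
    lo, hi = max(peak - w, 0), peak + w + 1
    direct = _energy(h, lo, hi)
    reverb = _energy(h, 0, lo) + _energy(h, hi, len(h))
    return 10.0 * math.log10(direct / reverb)
```

For example, clarity_db(h, fs, 50) gives C50 and clarity_db(h, fs, 80) gives C80.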
berg & Moore 2002: Time-varying loudness (TVL) [8]. The method is appropriate for time-variant sounds. Within the algorithm, the signals are resampled to 32 kHz.
Glasberg & Moore 1997: An older but well-known loudness model by Glasberg & Moore [9] for predicting the loudness of stationary sounds.
All loudness models are calibrated so that a two-channel 1 kHz sinusoidal signal at 40 dB SPL yields a loudness between 0.98 and 1.01 Sone, depending on the loudness model.
[Figure 13 content: tabs Speech Intelligibility, Loudness, Speech Quality and IR analysis; choice of test signal (speech, noise or mixed signal); loudness model selection DIN 45631/A1 2010, DIN 45631/ISO 532B, Glasberg & Moore 2002 (GM02) and Glasberg & Moore 1997 (MG97); prompt to select a signal and start the analysis by pressing Compute.]
Figure 13: Zoom of the SIP Toolbox loudness module. The module currently contains four different loudness models, two of which are standardized.
5.2.1 Loudness prediction
To predict the loudness of the selected signal, press the button Compute (see Figure 13). The result is displayed as the overall loudness for both ears. For a mono signal there is only one result, and the overall loudness for both ears is divided by 2. The standard unit of loudness is the Sone. You can switch the loudness unit to Phon or to Catego
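The relation between the units Sone and Phon can be illustrated with the textbook rule that loudness doubles for every 10 phon increase above 40 phon. Under this rule, 1 sone corresponds to 40 phon, which matches the calibration described above. This is a sketch of that standard relation and is valid only for N ≥ 1 sone. How the toolbox itself converts units, and how it derives the categorical scale, is not documented here.

```python
import math


def sone_to_phon(n_sone):
    """Loudness level in phon for loudness N >= 1 sone: L = 40 + 10 * log2(N)."""
    if n_sone < 1.0:
        raise ValueError("relation only valid for N >= 1 sone (40 phon and above)")
    return 40.0 + 10.0 * math.log2(n_sone)


def phon_to_sone(level_phon):
    """Inverse relation: N = 2 ** ((L - 40) / 10), valid for L >= 40 phon."""
    if level_phon < 40.0:
        raise ValueError("relation only valid at or above 40 phon")
    return 2.0 ** ((level_phon - 40.0) / 10.0)
```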
d with Acustica, Vol. 90, 356-361.
[2] ANSI S3.5-1969 (1969). Methods for the calculation of the articulation index. American National Standards Institute.
[3] ANSI S3.5-1997 (1997). Methods for calculation of the speech intelligibility index. American National Standards Institute.
[4] IEC 60268-16 (2003). Objective rating of speech intelligibility by speech transmission index. International Electrotechnical Commission, 3 rue de Varembé, PO Box 131, CH-1211 Geneva 20, Switzerland.
[5] Beutelmann, R. and Brand, T. (2006). Prediction of speech intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am. 120(1), 331-342.
[6] DIN 45631/A1 (2010). Calculation of loudness level and loudness from the sound spectrum, Zwicker method, Amendment 1: Calculation of the loudness of time-variant sound. Deutsches Institut für Normung.
[7] DIN 45631 / ISO 532B (1991-03). Calculation of loudness level and loudness from the sound spectrum, Zwicker method. Deutsches Institut für Normung.
[8] Moore, B. C. J. and Glasberg, B. R. (2002). A model of loudness applicable to time-varying sounds. Journal of the Audio Engineering Society 50(5), 331-334.
[9] Moore, B. C. J., Glasberg, B. R. and Baer, T. (1997). A model for the prediction of threshold, loudness and partial loudness. Journal of the Audio Engineering Society 45(4), 224-240.
[10] Dau, T., Püschel, D. and Kohlrausch, A. (1996). A quantitative
d a minimum screen resolution of 1280 × 1024 pixels.
1.3 Demo version
A demo version of the SIP Toolbox is available as a p-coded function for MATLAB version 2008b and higher. A stand-alone version of the SIP Toolbox that runs without MATLAB is also available. The demo version contains all modules and functions of the full version, with the following limitations:
• No processing and assessment of user-selected signals
• Saving and loading of results is disabled
• Limited selection of predefined audio signals
• Batch processing is disabled
• The PEMO-Q speech quality model (see 5.3) is disabled
• No hearing impairment can be included in the models
If you have any questions about the demo version of the SIP Toolbox, or would like further information, please contact us.
1.4 Installation
The installation procedure may depend on the features of your SIP Toolbox. If you are using a demo version or a version without the speech quality model PEMO-Q, continue with section 1.4.2. If your version contains PEMO-Q, you first have to install the copy protection of this module as described in section 1.4.1. PLEASE DO NOT insert the copy protection before you have installed the driver.
1.4.1 Install PEMO-Q copy protection
To install the PEMO-Q copy protection driver, unzip the installation files into any folder. Then open a Windows command line via START > RUN >
[Figure 7 content: batch text file with the sections PARAMETER SELECTION (FIXSIGNAL, FIXLEVEL 65, SNR values) and MODEL SELECTION.]
Figure 7: Batch text file HandO_Batch_processing_SpeechIntel_1.txt, created by the batch processing according to Hagerman & Olofsson with presettings for evaluating speech intelligibility.
5 Module selection
The SIP Toolbox contains different modules. You can customize your SIP Toolbox by selecting the modules needed for your application. The following modules are currently available:
• Speech Intelligibility
• Speech Quality
• Loudness
• Impulse Response Analysis
Each module contains several models and objective measures. These models allow a comparative evaluation of the selected signals and communication situations.
5.1 Module Speech Intelligibility
The module Speech Intelligibility offers several objective measures for assessing intelligibility in a communication situation, as shown in Figure 8. The models are described below.
AI: Calculation of the intelligibility according to the articulation index principle (ANSI S3.5-1969 [2]). Speech and noise signals are split into 20 frequency bands (Kryter, JASA 1969). In every band, the signal-to-noise ratio is evaluated and combined with a weighting function. The weighting function is independent of the speech material.
e results, the SIP Toolbox can plot all processed calculations in a separate figure when you press the button Plot. You can also display all parameters of previous calculations, such as signal names or SNR values, by moving the cursor over the text cond above the displayed results (see the right side of Figure 17).
[Figure 17 content: Reference Signal and Quality Scores panels with results for Cond 1 to Cond 5 of the quality models (PEMOQ_1, PEMOQ_2, ISD, LAR, LLR), a Plot button per model, controls to show the results of all conditions and to browse through the results, and a tooltip listing test signal, reference signal, test system and level of the test signal.]
Figure 17: Detailed view of the result presentation of the speech quality module (left side). To display all parameters used during the calculation, move the mouse cursor over the corresponding word cond (right side).
The SIP Toolbox provides example files for testing the speech quality module. The reference signal is called unprocessed.wav and the test signal is named processed.wav. Both signals are located in the main folder, or in the demo version they are already included in the pop-up menu of the signal selection. For more detailed information abou
for loudness models for time-varying sounds.
5.3 Module Speech Quality
The module Speech Quality contains several models and objective measures, as shown in Figure 15. All models require a reference signal. You can select which signal (speech, noise or mixed signal) is evaluated by comparison with the reference signal. All implemented speech quality models are described below.
[Figure 15 content: choice of the test signal (speech, noise or mixed signal) to be evaluated in comparison with the reference signal; model check boxes PEMOQ, WSS, ISD, SNR, LAR and LLR; button to set model parameters; Compute button.]
Figure 15: Zoom on the implemented models within the speech quality module.
PEMOQ_1: Model to predict speech quality based on the Oldenburg perception model [10, 11] using a linear cross-correlation. The model uses the option SpeechQual. You will find further information about PEMO-Q and its parameters in the PEMO-Q manual and in [13]. A result of 1 indicates perfect quality with respect to the reference signal (perfect match). A value of 0 indicates no match with the reference.
PEMOQ_2: Model to predict speech quality based on the Oldenburg perception model [10, 11] using a linear cross-correlation. It differs from PEMOQ_1 in using the option AudioQual. You will find further information about PEMO-Q and its parameters in the PEMO-Q manual
h special parameters to control the batch processing. Figure 6 shows an example text file, testHandO_SigGen_batch_in.txt, that initializes the batch processing. The signal paths for SplusN and SminusN must be written correctly in the text file. You can set an absolute or a relative path. To set a relative path, please use the following syntax:
You also have to name the directories where the extracted speech and noise signals will be stored. All steps are explained in more detail below.
[Figure 6 content: measurement signals for speech plus noise and speech minus noise (SPEECH PLUS NOISE SIGNAL DIR and SPEECH MINUS NOISE SIGNAL DIR, pointing to the processed SplusN and SminusN signals) and two target directories for saving the resulting speech and noise signals (SPEECH TARGET DIR, NOISE TARGET DIR).]
Figure 6: Example text file testHandO_SigGen_batch_in.txt to control the batch processing according to Hagerman & Olofsson.
Step 1: Sort the processed signals into SplusN and SminusN and place the files in the correct folders. Name the files consistently, for example by numbering them.
Step 2: Open or create a text file, for instance testHandO_SigGen_batch_in.txt, and insert the paths of the source and result files. Use the syntax shown in the example text file.
Step 3: To start the batch processing
he opportunity to extract the speech and the noise signals from a large collection of processed signals (batch processing) according to the scheme of Hagerman & Olofsson. For this, you need the processed signals speech plus noise (_SplusN.wav) and speech minus noise (_SminusN.wav). The functionality and operation of this batch processing are described below.
The software package contains four example signals that illustrate batch processing. These example signals are located in the folders Signals processed_SminusN and Signals processed_SplusN. They consist of speech and white noise at different signal-to-noise ratios and have been processed by two noise cancellation algorithms:
folder Signals processed_SminusN:
01_proc_speech whiteNoise_0OdBSNR_SminusN_algo1.wav
02_proc_speech whiteNoise_ 5dBSNR_SminusN_algo2.wav
folder Signals processed_SplusN:
01_proc_speech whiteNoise_0OdBSNR_SplusN_algo1.wav
02_proc_speech whiteNoise_ 5dBSNR_SplusN_algo2.wav
These signals are already sorted into SplusN and SminusN and placed in the respective folders. For the batch processing, all signals must be stored in the correct folder and named in a consistent order, e.g. by numbering. The batch processing that extracts the speech and noise signals from the processed signals can now be started with the menu item Setup > Batch Processing > Hagerman & Olofsson Signal Generation. Before that, you need to create a text file wit
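The separation principle behind the Hagerman & Olofsson scheme can be sketched as follows. The algorithm is fed two inputs: speech plus noise, and speech minus noise (with the noise phase-inverted). Half the sum of the two processed outputs then estimates the processed speech, and half the difference estimates the processed noise. This minimal sketch assumes that both recordings are time-aligned and processed identically. The function names, and the rule of pairing files by their leading number, are hypothetical and not the toolbox's code.

```python
def hagerman_separate(splus_n, sminus_n):
    """Estimate processed speech and noise from two processed recordings.

    splus_n:  output of the algorithm for the input speech + noise
    sminus_n: output of the algorithm for the input speech - noise
    """
    if len(splus_n) != len(sminus_n):
        raise ValueError("both recordings must have the same length")
    speech = [(a + b) / 2.0 for a, b in zip(splus_n, sminus_n)]
    noise = [(a - b) / 2.0 for a, b in zip(splus_n, sminus_n)]
    return speech, noise


def pair_by_number(splus_names, sminus_names):
    """Pair file names via their leading number, e.g. '01_...' (hypothetical rule)."""
    def key(name):
        return name.split("_", 1)[0]
    minus = {key(name): name for name in sminus_names}
    return [(p, minus[key(p)]) for p in sorted(splus_names, key=key)]
```

Consistent numbering of the SplusN and SminusN files is what makes such a pairing step possible.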
ignals and transfer functions are visualized in the time and frequency domains. Figure 3 illustrates this for the mixed signal consisting of speech and noise. The panels let you switch easily between the visualization of the different signals (speech, noise, mixed signal) and the transfer functions. As shown in Figure 3, you can set the signal-to-noise ratio of the mixed signal in dB using the slider. You can also choose which channel of the signal is displayed.
This part of the SIP Toolbox also lets you listen to the signals. For playback, the signals are amplified or attenuated to an RMS value of -25 dB relative to full scale (dB FS) to ensure distortion-free listening. This adjustment does not affect the visualization.
[Figure 3 content: tabs Speech Signal, Noise Signal, SpeechIR, Noise IR and Mixed Signal; amplitude over time and power (dB FS) over frequency (125 Hz to 8 kHz); buttons to play back or stop the selected signal and to show the left or right channel; slider to adjust the signal-to-noise ratio of the mixed signal in dB.]
Figure 3: Visualization of the mixed signal (speech and noise) in the time and frequency domain.
4 Signal generation to evaluate signal processing algorithms (Hagerman & Olofsson)
This section describes how to evaluate the effect of signal processing algorithms, for instance algorithms used in hearing aid
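The playback normalization described above amounts to a single gain factor. The minimal sketch below assumes a target of -25 dB relative to full scale. It also assumes that samples are scaled so that full scale corresponds to an amplitude of 1.0, and that dB FS is measured relative to that amplitude. Other dB FS conventions refer to the RMS of a full-scale sine, which differs by about 3 dB. The function name is hypothetical.

```python
import math


def normalize_rms_dbfs(x, target_dbfs=-25.0):
    """Return a copy of x scaled so that its RMS equals target_dbfs (full scale = 1.0)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    if rms == 0.0:
        return list(x)  # silence cannot be normalized
    gain = 10.0 ** (target_dbfs / 20.0) / rms
    return [v * gain for v in x]
```

Because the gain is applied only to a copy used for playback, the visualized signal stays unchanged.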
mn. Up to 50 result values can be shown before you need to save and/or clear the result display. For better presentation and comparison of the results, the SIP Toolbox can plot all processed calculations in a separate figure when you press the button Plot. You can also display all parameters of previous calculations, such as signal names or SNR values, by moving the cursor over the text cond above the displayed results.
6 Main menu bar
The main menu bar offers easy and quick access to the global functions of the SIP Toolbox. All menu items are described below.
6.1 Menu item Setup
6.1.1 Load
The menu item Load lets you load and display saved results of previous predictions. You can also load custom signal lists (see next section).
6.1.2 Save
The menu item Save lets you save all kinds of data generated with the SIP Toolbox:
Data: Save the results of the different modules and objective measures.
Signals: Save a signal (speech, noise or mixed signal) as a dual-channel wave file according to the displayed time structure.
Custom information: Save a custom signal list. If this list is loaded into the SIP Toolbox again, the custom signals will be listed and integrated in the signal selection.
This menu item also lets you save the information window entries as a text file.
mportant that you name the files consistently, for example by numbering them. If you place only one speech signal in the speech folder, this signal is used with all eight noise signals.
PARAMETER SELECTION: To evaluate speech intelligibility, you have to decide which signal is fixed at a defined level. You also have to define a signal-to-noise ratio for every signal combination. As with the signals, the number of parameter values must equal the number of signals. If you want to use the same parameter value for all signal pairs, you only need to set one value.
MODEL SELECTION: Here you define the objective measures and models you want to work with.
To start the batch processing, select Setup > Batch Processing > Speech Intelligibility from the menu bar. A dialog opens in which you select the batch processing text file. Select the text file (for instance testSI_batch_in.txt) and the assessment of the speech signals in noise starts. A message on the screen informs you when processing is complete. The results are also saved in a text file that has the same name as the start file with the extension _out.txt. Each batch processing run is separated by a date and time stamp. Figure 20 shows an example result file of a speech intelligibility batch processing run.
BATCH MODE SpeechIntel
Results computed from testSI_batch_in.txt
Time stamp: 24 Mar 2010 14:40:51
AI  0.077   0.077   0.171   0.271   left channel
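The rule that every list of signals or parameter values must contain either one entry or N entries can be sketched as a small expansion step. This illustrates the described behaviour and is not the toolbox's parser. The function name and argument layout are hypothetical, and reading the control file itself is not shown.

```python
def expand_batch(speech_files, noise_files, snrs, fixed_signals):
    """Expand batch lists into N conditions.

    Each list must have length 1 or N (N = length of the longest list).
    A single entry is reused for all N conditions.
    """
    lists = [list(speech_files), list(noise_files), list(snrs), list(fixed_signals)]
    n = max(len(lst) for lst in lists)
    for lst in lists:
        if len(lst) not in (1, n):
            raise ValueError(f"a list has {len(lst)} entries; expected 1 or {n}")
    stretched = [lst * n if len(lst) == 1 else lst for lst in lists]
    return list(zip(*stretched))
```

For example, expand_batch(["speech.wav"], ["n1.wav", "n2.wav"], [0], "NS") combines one speech file with two noise files at 0 dB SNR. The noise level is fixed (N) in the first condition and the speech level (S) in the second.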
ng across segments.
5.3.1 Select the reference signal
The implemented speech quality models compare a test signal with a reference signal. Without a reference signal, the quality of the test signal cannot be predicted. If you try to proceed without a reference, you will receive a warning. The reference signal is selected in the same way as the test signal, using a pop-up menu, as illustrated in Figure 16.
[Figure 16 content: panel Reference Signal with the pop-up menu for selecting the reference signal (e.g. unprocessed), a button to listen to the signal, and a visualization of the reference signal (power in dB FS over frequency).]
Figure 16: Selection of the reference signal for the speech quality module. The assessment of speech quality is based on a comparison between test and reference signal.
5.3.2 Assessment of speech quality
As in all other modules of the SIP Toolbox, you start the speech quality assessment by pressing the button Compute (see Figure 15). Make sure you have selected a reference signal. The results are displayed for each selected quality model and condition, as illustrated in Figure 17. Every calculation is set as a new condition, and its results are displayed in the corresponding column. Up to 50 result values can be shown before you need to save and/or clear the display. For a better presentation and to compare th
25. odulation frequencies. In the SIP Toolbox, the modulation transfer function is determined from the selected room impulse response (RIR) (Schroeder 1981). The classification of the STI is shown in Figure 9.

Figure 9: STI classification according to IEC 60268-16 (categories POOR, FAIR, GOOD, EXCELLENT).

STITEL: Simplified version of the STI (IEC 60268-16) using 7 octave bands but only one modulation frequency in each octave band. TEL in STITEL stands for telecommunication systems.

RASTI: Simplified version of the STI (IEC 60268-16) using only two octave bands, with four and five modulation frequencies in the two bands, respectively. RA in RASTI stands for Rapid or Room Acoustics.

5.1.1 Calculation of speech intelligibility
There are two ways to calculate speech intelligibility with the SIP Toolbox: you can assess the communication situation for one adjustable SNR (Figure 10) or for an adjustable range of SNR values (Figure 12).

Prediction of the speech intelligibility for a single SNR value
First, set an SNR value in dB using the slider (Figure 10). To start the calculation with the selected models, press the button Compute.

[Figure 10 screenshot: tabs Single SNR and Range of SNRs; result table with columns Cond 1 to Cond 5 for AI, SII, STI, RASTI and STITEL (left and right channel) and BINSII; buttons to show the results of all conditions in a figure and to browse through the results.]
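To illustrate how a modulation transfer function can be obtained from an RIR, the sketch below evaluates Schroeder's formula, m(F) = |∫h²(t)e^(−j2πFt)dt| / ∫h²(t)dt, at the 14 STI modulation frequencies and maps each value to a transmission index. It is a simplified broadband illustration, not the SIP Toolbox implementation: there is no octave-band filtering, no band weighting and no background noise, and the category limits in `classify` (including a BAD category) are assumed from the classic IEC 60268-16 rating scale.

```python
import numpy as np

# The 14 STI modulation frequencies in Hz (1/3-octave steps, 0.63 to 12.5 Hz)
MOD_FREQS = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15,
                      4.0, 5.0, 6.3, 8.0, 10.0, 12.5])


def mtf_from_rir(h, fs, mod_freqs=MOD_FREQS):
    """Schroeder (1981): m(F) = |sum h^2(t) exp(-j2piFt)| / sum h^2(t)."""
    energy = np.asarray(h, dtype=float) ** 2
    t = np.arange(len(energy)) / fs
    kernel = np.exp(-2j * np.pi * np.outer(mod_freqs, t))
    return np.abs(kernel @ energy) / energy.sum()


def transmission_index(m):
    """Map modulation indices to transmission indices in [0, 1]."""
    m = np.clip(np.asarray(m, dtype=float), 1e-12, 1 - 1e-12)
    snr_apparent = np.clip(10 * np.log10(m / (1 - m)), -15.0, 15.0)
    return (snr_apparent + 15.0) / 30.0


def classify(index):
    """Assumed category limits of the classic IEC 60268-16 rating scale."""
    for upper, label in [(0.30, "BAD"), (0.45, "POOR"),
                         (0.60, "FAIR"), (0.75, "GOOD")]:
        if index < upper:
            return label
    return "EXCELLENT"


# Synthetic exponentially decaying noise RIR with a T60 of about 0.5 s
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
h = rng.standard_normal(fs) * np.exp(-6.91 * t / 0.5)
index = float(transmission_index(mtf_from_rir(h, fs)).mean())
print(f"broadband index {index:.2f}: {classify(index)}")
```

An ideal impulse (no reverberation) gives m(F) = 1 at every modulation frequency and therefore the maximum index of 1; longer reverberation lowers m(F) mainly at the higher modulation frequencies.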
26. ons like preferences and the load/save menu. The menu item Batch Processing allows the automatic processing of more than one signal. Further information about batch processing in the SIP Toolbox is given in section 6.1.3. All menu items are explained in detail in section 6. Practical information about the menu item Hagerman & Olofsson is given in section 4. This menu item allows you to generate special signals which can be used, e.g., to evaluate noise reduction algorithms. It is an implementation of the method described by HAGERMAN & OLOFSSON [1].

The left side of the main window is used to select the signals you want to work with and to set the acoustic parameters. The signals are automatically visualized in the time and frequency domains. On the right side, you can select a module to evaluate the selected signals. The modules are described separately in section 5. The information window at the bottom shows warnings and information about the calculations processed by the SIP Toolbox. This log can be saved using the menu bar.

3 Signal selection
The signal selection (see Figure 2) allows you to load PCM audio files (.wav) into the SIP Toolbox. The pop-up menus Speech Signal and Noise Signal provide a convenient way to select predefined signals or to integrate your own signals. Furthermore, it is possible to add a transfer function in the form of a room impulse response (RIR) to the selected signals.
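Adding an RIR to a signal amounts to convolving the signal with the impulse response. A minimal NumPy sketch, assuming a mono input signal and either a mono RIR or a two-column (left/right) RIR; the function names are illustrative and not part of the toolbox.

```python
import numpy as np


def fft_convolve(x, h):
    """Linear convolution via zero-padded FFTs (length len(x) + len(h) - 1)."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    y = np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)
    return y[:n]


def apply_rir(signal, rir):
    """Filter a mono signal with a mono RIR of shape (M,) or a two-channel
    RIR of shape (M, 2); the output has one column per RIR channel."""
    x = np.asarray(signal, dtype=float)
    h = np.asarray(rir, dtype=float)
    if h.ndim == 1:
        return fft_convolve(x, h)
    return np.column_stack([fft_convolve(x, h[:, ch]) for ch in range(h.shape[1])])
```

FFT-based convolution is chosen here because RIRs are often several thousand samples long, where direct time-domain convolution becomes slow.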
27. pace after the sound source is switched off. T60 is the time required for the energy of a sound to decay by 60 dB after the source has been switched off. The algorithm determines the decay from −5 dB to −35 dB (a 30 dB range) on the energy decay curve and extrapolates the result to a decay of 60 dB. The unit of T60 is seconds. Table 1 lists some desired reverberation times (see also [15]).

Table 1: Desired reverberation times for different purposes. The times T60 depend on the volume and the absorption properties of the room [15].

purpose            T60 (s)
speech             0.8–1.4
symphonic music    1.1–2.6
organ music        ...

DEF: Definition (German: Deutlichkeit) by THIELE (1953)
Generally, our hearing does not perceive reflections with delays of less than about 50 ms as separate acoustical events. Instead, such reflections enhance the apparent loudness of the direct sound; therefore, they are often referred to as useful reflections. The remaining reflections with longer delays are responsible for what is perceived as the reverberation of the room. The relative contribution of useful reflections may be characterized by several parameters derived from the impulse response h(t). One of them is the definition, or Deutlichkeit, given by

$$\mathrm{DEF} = 100\,\% \cdot \frac{\int_{0}^{50\,\mathrm{ms}} h^{2}(t)\,\mathrm{d}t}{\int_{0}^{\infty} h^{2}(t)\,\mathrm{d}t}$$

It can serve as an objective measure for speech intelligibility. For good intelligibility, the definition should be above 50 % (DEF > 50 %).

CLA Cl
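The T60 procedure (backward-integrated energy decay curve, linear fit between −5 and −35 dB, extrapolation to 60 dB) and the DEF ratio can be sketched as below. This is an illustration under assumptions, not the toolbox's code: the RIR is assumed to start at the direct sound (t = 0), and `early_ms` mirrors the adjustable early time of the IR analysis module, with 50 ms as default.

```python
import numpy as np


def energy_decay_curve_db(h):
    """Schroeder backward integration of h^2(t), normalised to 0 dB at t = 0."""
    energy = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(edc / edc[0] + 1e-300)


def t60_from_rir(h, fs, upper_db=-5.0, lower_db=-35.0):
    """Fit a line to the EDC between -5 and -35 dB and extrapolate to -60 dB."""
    edc_db = energy_decay_curve_db(h)
    t = np.arange(len(edc_db)) / fs
    in_range = (edc_db <= upper_db) & (edc_db >= lower_db)
    slope, _ = np.polyfit(t[in_range], edc_db[in_range], 1)  # dB/s, negative
    return -60.0 / slope


def definition(h, fs, early_ms=50.0):
    """DEF in percent: energy within the first early_ms over the total energy."""
    energy = np.asarray(h, dtype=float) ** 2
    n_early = int(round(early_ms * 1e-3 * fs))
    return 100.0 * energy[:n_early].sum() / energy.sum()


fs = 16000
t = np.arange(2 * fs) / fs
h = np.exp(-6.91 * t / 0.8)  # ideal exponential decay with T60 = 0.8 s
print(f"T60 = {t60_from_rir(h, fs):.2f} s, DEF = {definition(h, fs):.1f} %")
```

The fit deliberately starts at −5 dB so that the direct sound does not dominate the slope, and stops at −35 dB to stay above the noise floor of measured RIRs.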
28. rical Units using the menu item Preferences. Apart from the overall loudness, the specific loudness (DIN 45631 and DIN 45631/A1) and the dynamic loudness (DIN 45631/A1 and Glasberg & Moore 2002) are calculated and displayed (see Figure 14).

[Figure 14 screenshot: tabs Overall Loudness, Specific Loudness, Dynamic Loudness; specific loudness in sone/Bark vs. critical band rate in Bark; loudness in sone vs. time in s; channel selection left/right; buttons Export figure and Clear.]
Figure 14: Zoom on the specific and dynamic loudness predictions from models appropriate for time-varying sounds.

Specific loudness
Within the implemented loudness models, the input signal is divided into psychoacoustically motivated frequency bands, such as Bark [16] or the equivalent rectangular bandwidth (ERB) [17]. The loudness is calculated in each band, resulting in the so-called specific loudness. To obtain the overall loudness for a given time sample, the specific loudness is summed across bands. The SIP Toolbox can display the specific loudness function (see Figure 14). Using the slider above the specific loudness figure, you can display different time frames in steps of 1 % of the total signal length.

Dynamic loudness
The loudness-versus-time function is displayed in the panel Dynamic Loudness (see the right side of Figure 14) only
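Summing specific loudness across bands is a Riemann sum over the critical-band rate. The sketch assumes a resolution of 0.1 Bark, as in the Zwicker method, and adds the common sone-to-phon relation L_N = 40 + 10·log2(N), valid for N ≥ 1 sone, as one way to express the result in phon. Function names are illustrative only.

```python
import numpy as np


def overall_loudness(specific_loudness, dz_bark=0.1):
    """Overall loudness in sone: N = sum_i N'(z_i) * dz (N' in sone/Bark)."""
    return float(np.sum(specific_loudness) * dz_bark)


def sone_to_phon(n_sone):
    """Loudness level in phon for N >= 1 sone: L_N = 40 + 10 * log2(N)."""
    n = np.asarray(n_sone, dtype=float)
    if np.any(n < 1.0):
        raise ValueError("this relation only holds for N >= 1 sone")
    return 40.0 + 10.0 * np.log2(n)


# 240 bands of 0.1 Bark each (0 to 24 Bark), 0.5 sone/Bark in every band
n_total = overall_loudness(np.full(240, 0.5))
print(f"{n_total:.1f} sone = {sone_to_phon(n_total):.1f} phon")
```

A doubling of loudness in sone corresponds to an increase of 10 phon; below 1 sone a different relation applies, which this sketch does not cover.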
29. s. It is based on the research results of HAGERMAN & OLOFSSON [1]. First, two mixed signals consisting of speech and noise are generated according to

$$a_{\mathrm{in}}(t) = s(t) + n(t) \qquad (1)$$
$$b_{\mathrm{in}}(t) = s(t) - n(t) \qquad (2)$$

The two signals differ only in the sign of the noise signal. Both signals, a_in(t) and b_in(t), are then processed by an algorithm. The processed outputs are recorded and can be represented in simplified form by

$$a_{\mathrm{out}}(t) = s'(t) + n'(t) \qquad (3)$$
$$b_{\mathrm{out}}(t) = s'(t) - n'(t) \qquad (4)$$

From these two processed signals, a_out(t) and b_out(t), the processed speech signal s'(t) and the processed noise signal n'(t) are extracted separately. It is then possible to compare the processed speech signal s'(t) with the original speech signal s(t) to evaluate the effects of the algorithm. Analogously, the method also works for the noise signals. (Footnote 1: For restrictions and limitations of the method, see [1].)

Step 1
Select the signals and transfer functions you want to work with, using the SIP Toolbox signal selection. If you want to work with the mixed signal, set the desired SNR (Mixed Signal). Now generate the signals a_in(t) and b_in(t) from your selected signals using the main menu bar: Setup > Hagerman & Olofsson > Generate test signals (see Figure 4). You will be asked to confirm that all settings and signals are correct. If you answer Yes, you have to choose a file name and a storage loca
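Because a_in and b_in differ only in the sign of the noise, adding and subtracting the two processed outputs separates the processed speech and noise: s′ = (a_out + b_out)/2 and n′ = (a_out − b_out)/2. The sketch below checks this with a hypothetical linear "algorithm" (`toy_algorithm`), for which the separation is exact; the function names are illustrative, not the toolbox's.

```python
import numpy as np


def make_inputs(s, n):
    """a_in = s + n and b_in = s - n: same speech, noise with opposite sign."""
    s, n = np.asarray(s, dtype=float), np.asarray(n, dtype=float)
    return s + n, s - n


def extract(a_out, b_out):
    """s' = (a_out + b_out) / 2 and n' = (a_out - b_out) / 2."""
    a_out, b_out = np.asarray(a_out, dtype=float), np.asarray(b_out, dtype=float)
    if a_out.shape != b_out.shape:
        raise ValueError("a_out and b_out must have the same shape")
    return 0.5 * (a_out + b_out), 0.5 * (a_out - b_out)


def toy_algorithm(x):
    """Hypothetical linear 'algorithm' (gain plus a short FIR) for the demo."""
    return np.convolve(x, [0.8, 0.1])[: len(x)]


rng = np.random.default_rng(1)
s = np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)
n = 0.3 * rng.standard_normal(8000)
a_in, b_in = make_inputs(s, n)
s_proc, n_proc = extract(toy_algorithm(a_in), toy_algorithm(b_in))
```

For a nonlinear or adaptive algorithm, such as a noise reduction that reacts to the input, the separation holds only approximately; see [1] for the restrictions of the method.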
30. select the item Setup > Batch Processing > Hagerman & Olofsson Signal Generation from the menu bar. You will be asked to select the correct text file. Select the text file, for instance testHandO_SigGen_batch_in.txt, and the extraction of the speech and noise signals will start. You will be informed on screen when the processing is complete. The extracted speech and noise signals are then located in the respective folders, for example:
• signals\HandO_noise
• signals\HandO_speech

These signals are now ready to be evaluated in the SIP Toolbox. For convenience, you can also assess them using batch processing. The signal generation according to Hagerman & Olofsson described above additionally creates a new text file which contains ready-made parameters for evaluating the speech intelligibility with batch processing. Figure 7 shows this text file for the processed example files. During the extraction, the signal-to-noise ratio is determined and written to the text file. Further information about batch processing for the quality and intelligibility assessment of speech is given in section 6.1.3.

BATCH MODE HandO SigGen: signals computed from testHandO_SigGen_batch_in.txt
Time stamp: 31 Mar 2010, 16:43:39
SIGNAL SELECTION
SPEECH_SIGNAL_DIR   signals\HandO_speech
NOISE_SIGNAL_DIR    signals\HandO_noise
SPEECH_RIR_DIR
NOISE_RIR_DIR
Se
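The manual does not state how the SNR is determined during the extraction. One plausible definition, assumed here, is the ratio of the total energies of the extracted speech and noise signals; the function name is hypothetical.

```python
import numpy as np


def snr_db(speech, noise):
    """Broadband SNR in dB from total energies: 10*log10(E_speech / E_noise)."""
    e_speech = np.sum(np.square(np.asarray(speech, dtype=float)))
    e_noise = np.sum(np.square(np.asarray(noise, dtype=float)))
    if e_noise == 0.0:
        return float("inf")  # no noise energy at all
    return float(10.0 * np.log10(e_speech / e_noise))
```

Applied to the extracted signals s′(t) and n′(t), this gives the SNR after processing, which can differ from the SNR chosen when the test signals were generated.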
31. t. With the menu item Reset, the software is restarted with default values. Saved settings are kept.

6.2 Menu item Audiogram

Set Audiogram
After calling the main menu item Set Audiogram, a new window opens, as shown on the left side of Figure 23. This audiogram visualization allows you to manually set any hearing loss at the standard frequencies of a pure-tone audiogram via mouse input. Additionally, it is possible to create an audiogram according to ISO 1999:1990. Another window then opens, as shown on the right side of Figure 23. Using the age, the noise exposure time in years

[Figure 23 screenshot: left, a standard audiogram (frequency 125 Hz to 8 kHz) with the controls Select Ear, Activate manual setting of hearing loss by mouse, and Set back to 0 dB HL (normal hearing); right, the dialog Create Audiogram according to ISO 1999:1990 with inputs for age (y < 80), noise exposure time in years, noise exposure level in dB(A), gender (male or female) and percentile (0.05, 0.10, ..., 0.90, 0.95).]
Figure 23: The SIP Toolbox offers the opportunity to set an individual hearing loss according to the method of ISO 1999:1990. It is also possible to manually set any hearing loss via mouse input.

and the average exposure level
32. t the speech quality models, see the referenced literature.

5.4 Module Impulse Response Analysis
The module IR analysis contains several objective measures for the analysis of room impulse responses (RIRs), as shown in Figure 18. Some of these measures are useful for a first quick evaluation of the speech intelligibility in the room represented by the RIR; others indicate whether a room is suitable for musical presentation. The implemented RIR analysis techniques are described in the following.

[Figure 18 screenshot: module tabs Speech Intelligibility, Loudness, Speech Quality, IR analysis; model selection T60, DRR, DEF, CLA, CT; IR choice Speech System or Noise System; field to set the early time in ms; table IR Analysis Scores for Cond 1 to Cond 5, left and right channel, with Plot buttons; buttons to browse through the conditions and to clear.]
Figure 18: Zoom on the implemented impulse response analysis module.

Within the module, five different analysis functions are currently available. For the models DEF and CLA, you can specify the early time in milliseconds as a parameter.

T60: Reverberation time
Reverberation is the persistence of sound in a particular s
33. testsig_SminusN.wav according to b_out(t)
Two subsequent dialogs let you select the processed file corresponding to speech plus noise and the processed file corresponding to speech minus noise. Finally, the software extracts the speech signal s'(t) from both processed signals, a_out(t) and b_out(t) (see [1]). The noise signal n'(t) is obtained in the same way. To save these files, select a storage location and a file name. Only one file name is necessary; the second file name is generated automatically. By default, the files are named:
• HagOlof_Processed_Signals_speech.wav according to s'(t)
• HagOlof_Processed_Signals_noise.wav according to n'(t)
After saving is complete, these files are automatically inserted into the SIP Toolbox as the speech and noise signals. You can now evaluate the files and compare the results with the assessment of the original files s(t) and n(t). You will find more detailed information on all modules in the following sections.

[Screenshot: Setup menu of the SIP Toolbox with the submenu Hagerman & Olofsson.]
Figure 5: Menu item to process the measured signals according to a_out(t) and b_out(t).

4.1 Batch processing using the method of Hagerman & Olofsson
The SIP Toolbox also provides t
34. [Screenshot: slider to select the SNR in dB; button Clear to clear the results.]
Figure 10: Calculation of the speech intelligibility for an adjustable signal-to-noise ratio (SNR) in dB. It is possible to assess more than five conditions without clearing the results.

For the monaural models, speech intelligibility is calculated and displayed separately for each channel. Every calculation is stored as a new condition, and its results are displayed in the corresponding column. At most 50 result values can be shown before you have to save and/or clear the result display. For a better presentation and an easy comparison of the results, the SIP Toolbox can plot all processed calculations in a separate figure using the button Plot. Furthermore, you can display all parameters of previous calculations, such as the signal name or SNR value, by moving the mouse cursor over the text Cond above the displayed results (see Figure 11).

Prediction of the speech intelligibility for a range of SNR values
To assess the speech intelligibility for a range of SNR values, select the range and the calculation step size in dB (see the bottom of Figure 12). The calculations start after you press the button Compute, and the results are plotted as a function of SNR. Within the result figure, you can mark values using the mouse cursor. You can also export the figure for later use.

5.2 Mod
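A range computation of this kind can be sketched as an SNR grid plus a threshold search. Figure 12 shows an option to mark the SRT at a threshold of 0.33; the sketch finds the lowest SNR at which a model's index reaches such a threshold by linear interpolation between grid points. `toy_index` is a hypothetical logistic stand-in for a real model such as the SII, and the interpolation rule is an assumption, not necessarily what the toolbox does.

```python
import numpy as np


def snr_grid(snr_min, snr_max, step):
    """SNRs from snr_min up to snr_max (inclusive when reachable) in `step` dB."""
    n_steps = int(np.floor((snr_max - snr_min) / step + 1e-9))
    return snr_min + step * np.arange(n_steps + 1)


def srt_from_curve(snrs, scores, threshold):
    """Lowest SNR where the score reaches `threshold`, linearly interpolated.

    Returns None if the threshold is not crossed inside the SNR range.
    """
    snrs = np.asarray(snrs, dtype=float)
    scores = np.asarray(scores, dtype=float)
    reached = np.nonzero(scores >= threshold)[0]
    if len(reached) == 0 or reached[0] == 0:
        return None  # never reached, or already reached at the lowest SNR
    i = reached[0]
    x0, x1 = snrs[i - 1], snrs[i]
    y0, y1 = scores[i - 1], scores[i]
    return float(x0 + (threshold - y0) * (x1 - x0) / (y1 - y0))


def toy_index(snr_db):
    """Hypothetical stand-in for a model such as the SII (logistic in SNR)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(snr_db) + 5.0) / 3.0))


snrs = snr_grid(-20.0, 10.0, 2.0)
srt = srt_from_curve(snrs, toy_index(snrs), threshold=0.33)
print(f"SRT at index 0.33: {srt:.1f} dB SNR")
```

A finer step size makes the interpolated SRT closer to the true crossing point of the model curve, at the cost of more model evaluations.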
35. tion. Only one file name (for example testsig) is necessary; the second file name is generated automatically. The files are saved with a special name extension:
• testsig_SplusN.wav according to a_in(t)
• testsig_SminusN.wav according to b_in(t)
You will be informed about the successful completion in the information window.

[Screenshot: Setup menu with the submenu Hagerman & Olofsson and its items Generate test signals and Process measured signals.]
Figure 4: Menu item to generate signals according to a_in(t) and b_in(t).

Step 2
Both signals, testsig_SplusN.wav and testsig_SminusN.wav, are now available for processing with an algorithm. It is important that both signals are processed and that the two results, a_out(t) and b_out(t), are recorded and saved to separate files. In the next step, these processed signals are loaded into the SIP Toolbox to evaluate the effect of the algorithm.

Step 3
To add the processed signals to the software, use the function Process measured signals in the menu Setup > Hagerman & Olofsson > Process measured signals (shown in Figure 5). In the example, the processed signals are:
• processed_testsig_SplusN.wav according to a_out(t)
• processed_
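Generating the two test signals at a chosen SNR can be sketched as scaling the noise and forming the sum and the difference. This assumes that the SNR is defined by total signal energies; the function names are hypothetical, while the `_SplusN`/`_SminusN` naming follows the manual.

```python
import numpy as np


def scale_noise_to_snr(s, n, snr_db):
    """Scale n so that 10*log10(sum(s**2) / sum(n_scaled**2)) == snr_db."""
    s, n = np.asarray(s, dtype=float), np.asarray(n, dtype=float)
    gain = np.sqrt(np.sum(s ** 2) / (np.sum(n ** 2) * 10.0 ** (snr_db / 10.0)))
    return gain * n


def generate_test_signals(s, n, snr_db, base="testsig"):
    """Return {file name: signal} for a_in = s + n and b_in = s - n."""
    s = np.asarray(s, dtype=float)
    n_scaled = scale_noise_to_snr(s, n, snr_db)
    signals = {f"{base}_SplusN.wav": s + n_scaled,
               f"{base}_SminusN.wav": s - n_scaled}
    peak = max(np.max(np.abs(x)) for x in signals.values())
    if peak > 1.0:
        # values outside [-1, 1] would clip when written as PCM audio
        print(f"warning: peak {peak:.2f} exceeds full scale")
    return signals
```

Checking the peak before writing matters because clipping in only one of the two files would break the symmetry the method relies on.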
36. ty with PEMO-Q, which have to be defined in curly brackets, as shown in Figure 21. For more information about these parameters, see the PEMO-Q manual.

Batch Processing Hagerman & Olofsson
The batch processing according to the method of Hagerman & Olofsson is explained in detail in section 4.

6.1.4 Hagerman & Olofsson
The menu functions for the method of Hagerman & Olofsson are explained in section 4.

6.1.5 Preferences
In the Preferences menu, you can specify general parameters for your use of the SIP Toolbox, which are kept after the program is closed. Here you can select which spectrum analysis the SIP Toolbox uses and set parameters such as FFT size and window type. You can also set parameters used in the module Loudness, such as leading and trailing zeros for the selected signals, or specify the loudness unit.

[Figure 22 screenshot: dialog SIP Toolbox Preferences. Spectrum Calculation Settings: Method (via FFT), FFT Size (4096), Window (Rectwin). Loudness Estimation Settings: options for overall and dynamic loudness (Max, Percentile, Manual Settings), Leading zeros 100 ms, Trailing zeros 100 ms, Ramps 20 ms, Show loudness in Sone, Phon or Categorical Units. Buttons Set Default and Save Settings.]
Figure 22: Preference menu of the SIP Toolbox with options to define parameters used by the signal visualization and the loudness estimation.

6.1.6 Rese
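The spectrum settings (FFT size, window type) can be illustrated by an averaged FFT magnitude spectrum in dBFS. In this sketch, under assumptions not taken from the manual, frames do not overlap, the amplitude is scaled so that a full-scale sine centred on an FFT bin reads 0 dBFS, and only three window types are supported; the function name is hypothetical.

```python
import numpy as np


def spectrum_dbfs(x, fs, nfft=4096, window="rectwin"):
    """Averaged magnitude spectrum in dBFS (0 dBFS = full-scale sine).

    Non-overlapping frames of nfft samples are windowed and transformed,
    and their power is averaged. Amplitudes are scaled by 2 / sum(window),
    which is exact for bin-centred sines but not for the DC and Nyquist bins.
    """
    x = np.asarray(x, dtype=float)
    windows = {"rectwin": np.ones, "hann": np.hanning, "hamming": np.hamming}
    w = windows[window](nfft)
    n_frames = len(x) // nfft
    if n_frames == 0:
        raise ValueError("signal shorter than one FFT frame")
    frames = x[: n_frames * nfft].reshape(n_frames, nfft) * w
    power = np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)
    amplitude = 2.0 * np.sqrt(power) / np.sum(w)
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    return freqs, 20.0 * np.log10(amplitude + 1e-12)


fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(4 * 4096) / fs)  # 1 kHz, full scale
freqs, db = spectrum_dbfs(x, fs)
```

A larger FFT size gives finer frequency resolution but fewer frames to average; a rectangular window resolves bin-centred tones exactly, while Hann or Hamming windows reduce leakage for other frequencies.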
37. ule Loudness
The module Loudness contains several models and objective measures, as shown in Figure 13. These models make it possible to comparatively evaluate the selected signals and communication situations.

[Figure 11 screenshot: tooltip over Cond listing the parameters of a condition, e.g. speech signal, noise signal, speech and noise systems, SNR (0.00 dB) and level of the speech signal (65.0 dB SPL).]
Figure 11: Display of all parameters used during a calculation by moving the mouse cursor over the corresponding word Cond.

[Figure 12 screenshot: tab Range of SNRs with the result plotted as index vs. SNR in dB; options Mark specified index value in figure and Show SRT with threshold 0.33; fields to set the limits (min, max) and steps of the SNR range to be analyzed; buttons Export figure and Clear.]
Figure 12: Calculation of speech intelligibility for a range of signal-to-noise ratios. The range and calculation steps are adjustable.

The implemented loudness models are described in the following.

DIN 45631/A1: Calculation of loudness level and loudness from the sound spectrum, Zwicker method; Amendment 1: Calculation of the loudness of time-variant sound [6].

DIN 45631 / ISO 532B: Calculation of loudness level and loudness from the sound spectrum, Zwicker method. This method is only appropriate for stationary sounds [7].

Glas
