openBliSSART User Manual

1. [Figure 4.3: Feature extraction]

4.7.4 Label Creation

Labels can be created either by pressing the Create label button at the lower left of the user interface or by selecting the appropriate item from the application's menu or context menu. Creating a label automatically inserts the new label into the tree view, selects it, and allows its properties to be edited inside the edit area.

4.7.5 Assignment of Labels to Classification Objects

After a suitable set of labels has been created, these labels have to be assigned to classification objects wherever appropriate. Selecting a classification object shows a list of all available labels inside the edit area. One or more labels can be assigned by checking the corresponding checkboxes and then saving the selection. Figure 4.4 shows the selection of multiple labels for a particular classification object. To determine which of the available labels fit a particular classification object, one can use the application's preview feature to explore the samples visually or to play them back. Depending on the application's preferences, the Preview checkbox is checked automatically. If it is not, either check the box manually or select the corresponding option in the preferences dialog.
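2. $\ldots + \lambda\, c_s(H)$ (3.15), where $c_r(W, H)$ can, for example, be set to squared Euclidean distance or modified KL divergence.

First, openBliSSART supports a straightforward approach introduced by [17]:

$$c_s(H) = \sum_{j=1}^{r} \frac{1}{\sigma_j} \sum_{t=1}^{n} H_{j,t} \qquad (3.16)$$

To prevent the scaling from affecting the value of the cost function, it normalizes the activations of each component $j$, e.g. by their standard deviation estimates $\sigma_j$ [17].

The multiplicative update rules for $H$ minimizing the cost function (3.15) are derived as follows. The gradient of the cost function is written as a subtraction $\nabla c(W, H) = \nabla c^{+}(W, H) - \nabla c^{-}(W, H)$ of element-wise nonnegative terms $\nabla c^{+}(W, H) = \nabla c_r^{+}(W, H) + \lambda \nabla c_s^{+}(H)$ and $\nabla c^{-}(W, H) = \nabla c_r^{-}(W, H) + \lambda \nabla c_s^{-}(H)$. For Euclidean distance we have

$$\nabla c_r^{+}(W, H) = W^T W H \qquad (3.17)$$

and

$$\nabla c_r^{-}(W, H) = W^T V \qquad (3.18)$$

For KL divergence we have

$$\nabla c_r^{+}(W, H) = W^T \mathbf{1} \qquad (3.19)$$

and

$$\nabla c_r^{-}(W, H) = W^T \frac{V}{WH} \qquad (3.20)$$

with element-wise division. For the sparseness term we have

$$[\nabla c_s^{+}(H)]_{j,t} = \frac{1}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} H_{j,i}^2}} \qquad (3.21)$$

and

$$[\nabla c_s^{-}(H)]_{j,t} = H_{j,t}\, \frac{\sqrt{n} \sum_{i=1}^{n} H_{j,i}}{\left( \sum_{k=1}^{n} H_{j,k}^2 \right)^{3/2}} \qquad (3.22)$$

The final multiplicative update rule is

$$H \leftarrow H \otimes \frac{\nabla c^{-}(W, H)}{\nabla c^{+}(W, H)} \qquad (3.23)$$

with element-wise multiplication and division.

As a second approach to sparse NMF, openBliSSART implements an algorithm based on a cost function resembling Euclidean distance with a column-wise normalized $W$ matrix. openBliSSART reformulates the multiplicative update rules for enhanced performance: [equation truncated]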
3. [Figure 4.4: Assignment of labels to classification objects]

It is also possible to select one or more labels for multiple classification objects at once by means of the Select label item in their context menu. In this case, a dialog is shown that allows the selection of one or more labels. The selected labels are assigned to each selected classification object. Existing labels are not removed.

4.7.6 Response Creation

To create an empty response, either press the Create response button located at the lower left of the user interface or select the corresponding item f
4. [3] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proc. of NIPS, Vancouver, Canada, 2001, pp. 556-562.
[4] P. Smaragdis, "Discovering auditory objects through non-negativity constraints," in Proc. of SAPA, Jeju, Korea, 2004.
[5] M. Helen and T. Virtanen, "Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine," in Proc. of EUSIPCO, Antalya, Turkey, 2005.
[6] I. H. Witten and E. Frank, Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann, San Francisco, 2005.
[7] S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, The HTK Book, version 3.0, Cambridge University Press, 2000.
[8] B. Schuller, A. Lehmann, F. Weninger, F. Eyben, and G. Rigoll, "Blind enhancement of the rhythmic and harmonic sections by NMF: Does it help?," in Proc. of the International Conference on Acoustics (NAG/DAGA 2009), Rotterdam, Netherlands, 2009, pp. 361-364, DEGA.
[9] SQLite database engine, http://www.sqlite.org/download.html, February 2009.
[10] M. N. Schmidt and R. K. Olsson, "Single-channel speech separation using sparse non-negative matrix factorization," in Proc. of Interspeech, Pittsburgh, PA, USA, 2006.
[11] K. W. Wilson, B. Raj, P. Smaragdis, and A. Divakaran, "Speech denoising using nonnegative matrix factorization with priors," in P
5. [Figure 2.3: Component playback in the browser]

You can also export the components as audio signals in the WAV format. To this end, select all of the components (click the first, then Shift-click the last) and right-click. A context menu as in Figure 2.4 will appear.
6. [Figure 2.1: Import audio dialog]

Click Add files, then select an audio file from the demo/wav folder of the openBliSSART source distribution. For now, use the parameters as shown in Figure 2.1. A progress window as shown in Figure 2.2 should appear. The separation process can take several seconds, depending on your hardware.

[Figure 2.2: Progress display when importing audio]

Once the separation process has finished, several items should have been generated under the Classification objects node in the browser tree view. Click one of them and it will be synthesized into an audio signal, which you can play back using the buttons in the right part of the window.
7. [Figure 2.9: Activating the context menu for components]

A dialog will appear that allows you to add one or more labels to all selected components at the same time. Select Music and click Ok, then wait for the operation to finish. Upon completion, all selected components should show the label Music instead of Unlabeled in the second column. By the way, you can always refresh the tree view by either pressing F5 or selecting Refresh view from the application's View menu. Repeat the above procedure for the remaining components, but this time assign the label Speech.

2.2.3 Feature Extraction

The next step towards creating a data set is to extract features from the created components. Again, this is very simple: just select Extract features from all data descriptors from the application's Database menu. Another dialog will appear, prompting you for the number of feature extraction tasks to start. Remem
8. V is transferred back to the time domain using a column-wise inverse DFT (IDFT), using the phase matrix from the original signal. Finally, time signals for each source are obtained by adding up the time frames, respecting their overlap. Multiplying the time frames by the square root of the Hann function can reduce the artifacts resulting from the transformation [5].

3.2.4 Source Separation by Supervised NMF

In supervised NMF, W is set to a predefined matrix in which each column contains a spectrum corresponding to one of the sources. For example, in speaker separation these spectra can be computed from phonemes uttered by a certain speaker [10]. W is then kept constant throughout the iteration, whereas H is initialized randomly and updated iteratively. Time signals for each source can be obtained using the procedure mentioned above, setting $J_s$ (Eq. 3.13) to the indices of the columns of W that were initialized with spectra from source s. This paradigm has led to notable results in speech denoising and speaker separation [13].

3.2.5 Sparse NMF

The aforementioned cost functions measure the reconstruction error $c_r$. However, for overcomplete bases (i.e. r > m n), sparse NMF can be valuable, whereby a term is added that increases the value of the cost function for each non-zero entry in H; hence, dense matrices are penalized. The resulting cost function $c(W, H)$ is $c(W, H) = c_r(W, H) + \ldots$
9. $\ldots$

$$W_{i,j} \leftarrow W_{i,j}\, \frac{(V H^T)_{i,j}}{(W (H H^T))_{i,j}}, \quad i = 1, \dots, m;\ j = 1, \dots, r \qquad (3.9)$$

and that $c_d$ is non-increasing under the update rules

$$H_{j,t} \leftarrow H_{j,t}\, \frac{\left( W^T \frac{V}{WH} \right)_{j,t}}{(W^T \mathbf{1})_{j,t}}, \quad j = 1, \dots, r;\ t = 1, \dots, n \qquad (3.10)$$

$$W_{i,j} \leftarrow W_{i,j}\, \frac{\left( \frac{V}{WH} H^T \right)_{i,j}}{(\mathbf{1} H^T)_{i,j}}, \quad i = 1, \dots, m;\ j = 1, \dots, r \qquad (3.11)$$

where $\mathbf{1}$ is an all-unity matrix and the fraction bar indicates elementwise division. The above matrix formulation has been shown to yield better performance than scalar product formulations when using fast implementations of matrix multiplication, as openBliSSART does. The denominators are floored to a very small positive constant such as $10^{-10}$ to avoid divisions by zero. Note that these rules are applied alternatingly, with each W update using the new value of H that was calculated in the previous H update, and vice versa. Note also that the order of calculation indicated by the parentheses in Eq. 3.8 and Eq. 3.9 can have a great effect on performance due to the different matrix dimensions.

3.2.2 Initialization and Termination

For conventional, i.e. unsupervised, NMF, W and H can be initialized with the absolute values of random numbers drawn from a Gaussian distribution with $\mu = 0$ and $\sigma = 1$, or from a uniform distribution on the interval from 0 to 1. openBliSSART uses the following stopping criterion for NMF:

$$\frac{\lVert W^{(q+1)} H^{(q+1)} - W^{(q)} H^{(q)} \rVert}{\lVert W^{(q)} H^{(q)} \rVert} < \varepsilon \qquad (3.12)$$

with $W^{(q)}$ and $H^{(q)}$ denoting the values of W and H at iteration q, respectively, and $\varepsilon$ being a small constant. However, evaluation
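The alternating updates and the random initialization described above can be sketched in a few lines. This is a minimal illustration in plain Python, not openBliSSART's implementation: the function names (`nmf_kl`, `matmul`, `kl_divergence`), the fixed iteration count, and the `EPS` floor of 1e-10 are assumptions made for the example, and it follows the KL-divergence rules in the standard Lee and Seung form.

```python
import math
import random

EPS = 1e-10  # assumed floor for denominators, to avoid division by zero


def matmul(a, b):
    """Plain-Python product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kl_divergence(v, approx):
    """Modified KL divergence: sum of v*log(v/a) - v + a over all entries."""
    total = 0.0
    for v_row, a_row in zip(v, approx):
        for x, a in zip(v_row, a_row):
            a = max(a, EPS)
            total += (x * math.log(x / a) - x + a) if x > 0 else a
    return total


def ratio(v, approx):
    """Element-wise V / (WH) with a floored denominator."""
    return [[x / max(a, EPS) for x, a in zip(v_row, a_row)]
            for v_row, a_row in zip(v, approx)]


def nmf_kl(v, r, iterations=100, seed=0):
    """Factorize the m x n matrix v into w (m x r) and h (r x n)."""
    m, n = len(v), len(v[0])
    rng = random.Random(seed)
    # Initialization: absolute values of Gaussian random numbers (mu=0, sigma=1).
    w = [[abs(rng.gauss(0, 1)) for _ in range(r)] for _ in range(m)]
    h = [[abs(rng.gauss(0, 1)) for _ in range(n)] for _ in range(r)]
    history = []
    for _ in range(iterations):
        # H update: H <- H * (W^T (V / WH)) / (W^T 1), using the current W.
        num = matmul(transpose(w), ratio(v, matmul(w, h)))
        w_col_sums = [sum(w[i][j] for i in range(m)) for j in range(r)]
        h = [[h[j][t] * num[j][t] / max(w_col_sums[j], EPS) for t in range(n)]
             for j in range(r)]
        # W update: W <- W * ((V / WH) H^T) / (1 H^T), using the new H.
        num = matmul(ratio(v, matmul(w, h)), transpose(h))
        h_row_sums = [sum(h[j]) for j in range(r)]
        w = [[w[i][j] * num[i][j] / max(h_row_sums[j], EPS) for j in range(r)]
             for i in range(m)]
        history.append(kl_divergence(v, matmul(w, h)))
    return w, h, history
```

The returned `history` records the divergence after each full iteration, which makes the non-increasing behaviour easy to inspect. A fixed iteration count is used here instead of a relative-change criterion, to keep the sketch short.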
10. • Sample frequency of the input file
• Time at which the process was started

Furthermore, each process can have an arbitrary number of named parameters, where the parameter value can be of any data type.

Data descriptors. A data descriptor contains information (meta-data) about a data object such as a vector or a matrix, which is stored as a file. The data descriptor entity has the following attributes:
• Data descriptor ID
• ID of the process that created the data object
• Type annotation (in our system, one of "Gains vector", "Spectral vector", or "Phase matrix")
• Index (in our NMF case, the component index for gains and spectral vectors; zero for phase matrices)
• Data availability flag

Functions that need the binary data ignore data descriptors whose data availability flag is false. This makes it possible to migrate the database to another computer in a consistent way without copying all the externalized binary data. Besides the data descriptor ID, the triple (process ID, type annotation, index) uniquely identifies a data descriptor.

Classification objects. A classification object consists of several data objects described by data descriptors. For example, in our application we want to classify components generated by an NMF process, which consist of a gains vector and a spectral vector. A classification object has a unique ID and a type annotation (in our case, the only
11. Transformation options given on the command line override the corresponding configuration options (see Section 4.9.3).
• -w<function>, --window-function=<function>: the window function to use in the short-time Fourier transformation. Must be one of hann (Hann function), sqhann (square root of the Hann function), hamming (Hamming function), or rectangle (rectangle function). The default is sqhann.
• -o<overlap>, --overlap=<overlap>: overlap of windows, given as a number between 0 and 1. The default is 0.5.
• -s<size>, --windowSize=<size>: window size in milliseconds. The default is 25.
• -z, --zero-padding: perform zero-padding before the FFT such that the transformation size is a power of 2.

4.1.4 Separation
• -m<method>, --method=<method>: the method to be used for component separation. At the time of writing, this option exists only for extensibility reasons and has no effect.
• -c<number>, --components=<number>: the number of components to separate. The default is 20.
• -T<number>, --spectra=<number>: the number of spectra to compute per component. If the number of spectra is greater than 1, NMD is performed. The default is 1.
• -f<name>, --cost-function=<name>: the cost function for NMF/NMD. The following strings are valid: ed (Euclidean distance), k
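To make the four window names and the overlap and window-size options more concrete, here is a small Python sketch. The function names `window` and `frame_params` are hypothetical, and the windows are written in their common periodic textbook form (denominator equal to the window length); the exact definitions used inside the separation tool may differ.

```python
import math


def window(name, size):
    """Return `size` window samples for one of the four names listed above."""
    if name == "rectangle":
        return [1.0] * size
    if name == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * k / size) for k in range(size)]
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / size) for k in range(size)]
    if name == "hann":
        return hann
    if name == "sqhann":
        return [math.sqrt(x) for x in hann]
    raise ValueError("unknown window function: " + name)


def frame_params(sample_rate, window_ms=25, overlap=0.5):
    """Convert window size (ms) and overlap (fraction) to sample counts."""
    frame_len = round(sample_rate * window_ms / 1000)
    hop = round(frame_len * (1 - overlap))
    return frame_len, hop
```

With the defaults quoted above (25 ms, overlap 0.5) and a 16 kHz signal, `frame_params(16000)` gives 400-sample frames that advance by 200 samples. One reason a square-root Hann window is a natural choice at 50 % overlap is that applying it once before analysis and once after resynthesis multiplies each sample by a Hann weight, and in this periodic form two Hann windows shifted by half their length sum to one.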
12. a support vector machine trained on the response we have defined in the previous steps, and eventually create audio files by summing up all components for each class and transforming them back into the time domain, i.e. re-synthesizing the results into a number of files that depends on the number of distinct classes used by the given response. Thus, the command line for an arbitrary input file file.wav is as follows:

septool -c20 -s60 -l1 -v file.wav

Again, it is assumed that our response has ID 1. The -v (volatile) option has been added here because we do not want to store additional components from the given input file file.wav in the database. The result of this procedure will be two wave files, namely file_Speech.wav and file_Music.wav. Of course, you can replace file.wav by any suitable WAV or MP3 file. Try mixing speech and music together and then separating them using the separation tool as described above.

Congratulations, you have just finished openBliSSART's introductory tutorial! For an in-depth discussion of openBliSSART's features and toolbox, move on to the next sections.

Chapter 3
openBliSSART Internals

3.1 Data Organization

openBliSSART's data storage consists of a SQLite database [9] in the db directory of the installation tree, in conjunction with an archive of binary files in the storage directory. The database stores information about the available objec
13. and/or preview facilities will be provided on the right-hand side of the user interface, the so-called edit area. Furthermore, almost every item provides a context-sensitive menu that shows up when the user presses the right mouse button on an item.

[Figure 4.1: Example subtree expansion]

4.7.1 Typical Workflow

The typical workflow for supervised component classification in blind source separation includes
• the import of audio files (separated into components),
• the extraction of the related features,
• the creation of various labels with arbitrary precision,
• the assignment of one or more labels to selected classification objects,
• the creation of one or more responses, and finally
• the assignment of classification objects to one or more responses.

4.7.2 Import of Audio Files

The accompanying figure shows an example of the Import audio dialog. This dialog can be displayed eithe
14. • blissart.audio.remove_dc (boolean): See the remove-dc option of the separation tool.
• blissart.audio.preemphasis: See the preemphasis option of the separation tool.
• blissart.audio.reduce_mids: See the reduce-mids option of the separation tool.

4.9.3 Transformation

Options for the short-time Fourier transformation can be specified in the configuration file blissart.properties. Some of these can be overridden in the Import audio dialog of the browser, as well as by passing the corresponding command line parameters to the separation tool. In addition, the short-time Fourier spectrograms can be transformed in various ways, as explained below.
• blissart.fft.windowfunction (string): See the window-function option of the separation tool.
• blissart.fft.windowsize (positive integer): See the window size option of the separation tool.
• blissart.fft.overlap (double): See the overlap option of the separation tool.
• blissart.fft.zeropadding (boolean): See the zero-padding option of the separation tool.
• blissart.fft.transformations.powerSpectrum: If set to true, converts the spectrum to the power spectrum (default: square).
• blissart.fft.transformations.powerSpectrum.gamma: The exponent for the power spectrum (default: 2.0).
• blissart.fft.transformations.melFilter: If set to true, applies a Mel filterbank to the spectrogram. The number of Mel bands is controlled by
15. indicates whether the audio preview should be normalized in amplitude. Default: true.
• browser.processCreation.costFunction: the default NMF cost function (0 for KL divergence, 1 for squared Euclidean distance). Default: 0.
• browser.processCreation.maxIterations: the default number of NMF iterations. Default: 100.
• browser.processCreation.numComponents: the default number of NMF components. Default: 20.
• browser.processCreation.numThreads: the default number of NMF separation threads. Default: 1.
• browser.processCreation.overlap: the default overlap to use for Fourier transformation and NMD/NMF processes. Default: 0.5.
• browser.processCreation.windowFunction: the default window function to use for Fourier transformation and NMD/NMF processes (0: square root of the Hann function, 1: Hann function, 2: Hamming function, 3: rectangle function). Default: 0.
• browser.processCreation.windowSizeMS: the default window size in milliseconds to use for Fourier transformation and NMD/NMF processes. Default: 25.

Bibliography

[1] D. D. Lee and H. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788-791, October 1999.
[2] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 2003, pp. 177-180.
16. <name> can be one of W (spectra, first factor), H (gains, second factor), or WH (both factors, not the product). The export format is controlled by the blissart.separation.export.format configuration option (see Section 4.9.1).
• -l<response>, --classify=<response>: performs feature extraction on the separated components, classifies them using training data from the given response, and generates audio files for each class, which are named like <input_file_name>_<class_name>.wav.
• -L<label>, --preset-label=<label>: during classification, assigns the label with the given ID to the components which have been initialized by the -I option, instead of the class label predicted by the classifier.

4.1.6 Usage Examples
• septool file.wav
  Separates file.wav into 20 components using the default NMF settings and saves the components.
• septool -c30 -s60 -l7 test.wav
  Separates test.wav into 30 components using a window size of 60 ms, saves the components, and classifies them using the response with the ID 7. Assuming this response contains classes Class1 and Class2, files named test_Class1.wav and test_Class2.wav are generated.
• septool -v -c30 -s60 -l7 test.wav
  Like the above, except that separated components are not stored.
• septool -n4 file1.wav file2.wav file3.wav file4.wav file5.wav
  Separates the files file1.wav to file5.wav using d
17. of the criterion is costly, as the matrix product WH has to be computed and the previous values of W and H (or the previous value of their product) have to be stored. Thus, to reduce computational cost, it is preferred to perform a fixed number of iterations. Experience shows that 100 to 200 iterations ensure a small reconstruction error, which is not significantly reduced by further iterations.

3.2.3 Supervised Component Classification

In scenarios like speaker separation or drum accompaniment reduction, sources (speakers, drums) often cannot be modelled by a single spectrum. NMF-based approaches in this area thus have to use a number r of components which is larger than the number of sources. Consequently, an assignment of the components to sources has to be made.

For the following discussion, we formally define the $j$-th component of the signal to be the pair $(w_j, h_j)$ of a spectrum $w_j := W_{:,j}$ along with its time-varying gains $h_j := H_{j,:}$ (the subscript $j,:$ denotes the $j$-th matrix row).

openBliSSART uses the following approach to decide which components belong to which source. First, a Support Vector Machine (SVM) classifier is trained from the features in a response variable (according to Section 3.1). After classification, a magnitude spectrogram $V_s$ for each source $s$ can be computed: let

$$J_s = \{\, j : (w_j, h_j) \text{ assigned to } s \,\} \qquad (3.13)$$

be the set of indices of components assigned to source $s$. Then

$$V_s = \sum_{j \in J_s} w_j h_j \qquad (3.14)$$
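Equations 3.13 and 3.14 amount to grouping component indices by their predicted class and summing rank-one products. The following Python sketch illustrates this under the assumption that the classifier output is available as a plain list of labels, one per component; the function names are hypothetical.

```python
def source_spectrogram(w, h, assigned):
    """Sum the rank-one products w_j h_j over the component indices in `assigned`.

    w is an m x r matrix (spectra in columns), h is an r x n matrix (gains in rows).
    """
    m, n = len(w), len(h[0])
    vs = [[0.0] * n for _ in range(m)]
    for j in assigned:
        for i in range(m):
            for t in range(n):
                vs[i][t] += w[i][j] * h[j][t]
    return vs


def spectrograms_by_source(w, h, labels):
    """Build J_s for every label s, then compute V_s for each of them."""
    groups = {}
    for j, label in enumerate(labels):
        groups.setdefault(label, []).append(j)
    return {label: source_spectrogram(w, h, idx) for label, idx in groups.items()}
```

Because every component is assigned to exactly one source in this sketch, the per-source spectrograms add up to the full product WH.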
18. of the tool applies as for the separation tool (see Section 4.1.7).

4.3 Cross-Validation Tool

The cross-validation tool (cvtool) performs stratified cross-validation of a data set given by a response. The following options can be specified on the command line:
• -h, --help: displays information about command line parameters.
• -A, --echo: prints the base name of the application binary and its named command line options in long format, with their parameters if given, before executing.
• -C, --config=<filename>: uses the specified configuration file (properties format) instead of the default one. See Section 4.9 for details.
• -r<id>, --response=<id>: gives a response ID. All classification objects that are assigned a label in this response are validated.
• -f<n>, --fold=<n>: gives the number of folds. If 0 is given, leave-one-out cross-validation is performed. The default value is 10.
• -t<id>, --train=<id>: gives a response ID for a training set, instead of performing n-fold cross-validation.
• -s, --shuffle: shuffles the data set before validation, i.e. randomly reorders the classification objects within the data set. Of course, this does not make sense for leave-one-out cross-validation.
• --fs=<algorithm>: enables automatic feature selection. If algorithm is anova, features are rated by their t-test score (only available for responses with two class
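For readers unfamiliar with the term, "stratified" means that every fold receives roughly the same class proportions as the whole data set. The sketch below shows one simple way to assign fold indices in Python. It is a generic illustration, not the algorithm used by cvtool: the function name and the round-robin strategy are assumptions, and only the meaning of a fold count of 0 (leave-one-out) and of shuffling is taken from the option list above.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, shuffle=False, seed=0):
    """Return one fold index per item, spreading each class evenly over the folds.

    n_folds == 0 is treated as leave-one-out: every item gets its own fold.
    """
    if n_folds == 0:
        return list(range(len(labels)))
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [None] * len(labels)
    next_fold = 0
    for label in sorted(by_class):
        members = by_class[label]
        if shuffle:
            rng.shuffle(members)  # reorder items within the class
        # Deal the items of this class out to the folds in round-robin order,
        # continuing where the previous class stopped to balance fold sizes.
        for idx in members:
            folds[idx] = next_fold % n_folds
            next_fold += 1
    return folds
```

For example, six Speech and four Music items split into two folds end up with three Speech and two Music items in each fold.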
19. possible annotation is "NMF component") and, furthermore, a list of IDs of data descriptors that make up the classification object. Finally, for each classification object a preselection of possible class labels is stored. For example, a drum component could be labelled with "Drum" or, more specifically, with "Snare drum". Classification objects are subject to the following constraints:
• All data descriptors that make up the object must be created by the same process.
• Every type of data descriptor (determined by the type annotation attribute) may occur at most once.

Features. A feature is a named value assigned to a data object, for example a cepstral coefficient of a spectral vector. Thus, the following attributes are required:
• ID of the data descriptor describing the data object
• Feature name (e.g. "MFCC")
• Feature parameter (e.g. the coefficient index in the MFCC case)
• Feature value

Every feature of a data object can be uniquely identified by feature name and parameter.

Labels. A label is a textual class label that can be assigned to classification objects. In our case, we could define the labels "Drum" and "Harmonic", or more specific labels like "Guitar" or "Snare drum".

Responses. A response is an assignment of classification objects to labels. Additionally, every response has a response ID, a name (e.g. "Drum vs. Harmonic"), and a textual descrip
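The two constraints on classification objects and the uniqueness rule for features can be expressed compactly in code. The following Python sketch models the entities with dataclasses; all class, field, and function names are hypothetical and chosen for the example only, and they do not reflect openBliSSART's actual database schema or C++ classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDescriptor:
    descriptor_id: int
    process_id: int
    type_annotation: str  # e.g. "Gains vector", "Spectral vector", "Phase matrix"
    index: int
    available: bool = True


@dataclass(frozen=True)
class Feature:
    descriptor_id: int
    name: str         # e.g. "MFCC"
    parameter: float  # e.g. the coefficient index
    value: float

    def key(self):
        """A feature of a data object is identified by name and parameter."""
        return (self.descriptor_id, self.name, self.parameter)


def check_classification_object(descriptors):
    """Return a list of violated constraints (empty if the object is valid)."""
    problems = []
    if len({d.process_id for d in descriptors}) > 1:
        problems.append("descriptors were created by different processes")
    types = [d.type_annotation for d in descriptors]
    if len(types) != len(set(types)):
        problems.append("a descriptor type occurs more than once")
    return problems
```

An NMF component made of one gains vector and one spectral vector from the same process passes the check, whereas two gains vectors in one object do not.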
20. remember how many audio files of each class (speech and music) you have imported, as this will simplify the next step.

2.2.2 Defining Classes

Having imported the necessary audio files, we will now define the two classes Speech and Music by creating two corresponding labels. Click the Create label button in the lower left corner of the browser window. A new label entry will be inserted under the Labels node of the tree view, with its text defaulting to the current date and time. Use the text field on the right-hand side to change the text to something more meaningful, like Music, then hit the Save button. Repeat this step for the Speech label. The Labels node should now look like Figure 2.7.

[Figure 2.7: Two defined labels]

Next, we assign these labels to the separated components which we have just created. Select a component in the tree view; a preview as well as a list of our labels will then appear on the right-hand side (see Figure 2.8).
21. there are now $P + 1$ updates in each iteration: one for each matrix $W(p)$, $p = 0, \dots, P-1$, and one for $H$. In detail, Eqs. 3.30 to 3.33 give the multiplicative update rules for each $W(p)$ (with $i = 1, \dots, m$ and $j = 1, \dots, r$) and for $H$, both for the Euclidean distance cost function and for $c_d$ (Eq. 3.29).

[Eqs. 3.30 to 3.33: update rules for $W(p)$ and $H$]

Thereby, the division of $V$ by $\Lambda$ is element-wise, and the operator $\overset{\leftarrow p}{(\cdot)}$ shifts the columns of its argument by $p$ spots to the left, introducing zeros in the rightmost $p$ columns. Furthermore, the denominators are floored to a very small positive constant such as $10^{-10}$ to avoid divisions by zero.

Notice that the update rules for $H$ were both obtained by first deriving an $H$ update rule that takes into account only one $W(p)$, then taking the average of these updates for all $p = 0, \dots, P-1$. The value of the approximation $\Lambda$ must be updated after execution of each update rule, but openBliSSART reduces the computational cost of this step by the formulation introduced in [19],

$$\Lambda \leftarrow \Lambda - \bar{W}(p)\, \overset{p \rightarrow}{H} + W(p)\, \overset{p \rightarrow}{H} \qquad (3.34)$$

after the update of each $W(p)$, where $\bar{W}(p)$ denotes the value of $W(p)$ before the update.

NMF can be regarded as a special case of NMD: by setting $P = 1$, the convolutive signal model as
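The column shift operators and the way NMD collapses to NMF for P = 1 can be illustrated with a short Python sketch. It assumes the standard convolutive model from the NMD literature, in which the approximation is the sum over p of W(p) times H shifted p columns to the right; the manual's own model equation is not part of this excerpt, and the function names are hypothetical.

```python
def shift_left(mat, p):
    """Shift columns p places to the left, zero-filling the rightmost p columns."""
    n = len(mat[0])
    return [list(row[p:]) + [0.0] * min(p, n) for row in mat]


def shift_right(mat, p):
    """Shift columns p places to the right, zero-filling the leftmost p columns."""
    n = len(mat[0])
    return [[0.0] * min(p, n) + list(row[:max(n - p, 0)]) for row in mat]


def matmul(a, b):
    """Plain-Python product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmd_approximation(ws, h):
    """Assumed convolutive model: sum over p of W(p) times H shifted right by p."""
    m, n = len(ws[0]), len(h[0])
    approx = [[0.0] * n for _ in range(m)]
    for p, w in enumerate(ws):
        term = matmul(w, shift_right(h, p))
        approx = [[a + b for a, b in zip(a_row, t_row)]
                  for a_row, t_row in zip(approx, term)]
    return approx
```

With a single matrix in `ws` (P = 1) there is no shift at all, so `nmd_approximation([w], h)` equals the ordinary product of `w` and `h`, which is the linear NMF model.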
22. well as the NMD update rules reduce to the linear signal model and NMF update rules, respectively.

Besides NMD, a sliding window NMF variant is supported by openBliSSART. Here, a matrix $V'$ is simply created from $V$ by concatenating $T$ subsequent columns of $V$ into one column of the larger matrix $V'$. Compared to NMD, this method has the advantage that no special update rules are needed; hence, any algorithm for NMF can be used immediately.

3.3 Source Separation by ICA

ICA approaches the problem of blind source separation based on the assumption that observed signals can be regarded as linear combinations of independent sources. Hence, the basic ICA model can be expressed in matrix notation as

$$X = A \cdot S \qquad (3.35)$$

where $X$ denotes the observed signals, $A$ is considered as the mixing matrix, and $S$ contains the signal sources. Since both $A$ and $S$ are unknown, ICA provides a solution by considering the signals as independent random variables and, consequently, the values of the signals at time $t$ as random samples of these variables. ICA makes use of the Central Limit Theorem by assuming that, because $X$ is a linear combination of the sources, $X$ has a more Gaussian distribution than the original random variables in $S$. Conversely, $A$ has to be determined such that it maximizes the non-Gaussianity of the original random variables in $S$ in order to retrieve the independent source signals. The FastICA alg
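The construction of the larger matrix for sliding window NMF can be sketched as follows. The text does not say how the window advances or how the edges are handled, so this Python example assumes a stride of one column and no padding; the function name `stack_columns` is hypothetical.

```python
def stack_columns(v, t):
    """Build a matrix whose column k holds columns k .. k+t-1 of v, stacked vertically.

    v is an m x n matrix given as a list of rows; the result has m*t rows and
    n-t+1 columns (assumed: stride of one column, no padding at the edges).
    """
    m, n = len(v), len(v[0])
    if not 1 <= t <= n:
        raise ValueError("t must be between 1 and the number of columns")
    stacked = []
    for offset in range(t):
        for i in range(m):
            stacked.append([v[i][k + offset] for k in range(n - t + 1)])
    return stacked
```

For a 2 x 3 input and t = 2, the first output column is the first input column followed by the second one. The result is an ordinary nonnegative matrix, so it can be passed to any NMF routine unchanged.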
23. [Figure 2.8: View of a classification object (NMF component)]

Should you want to listen to the
24. ICA on the four given input files mix41.wav to mix44.wav and outputs the results in Weka ARFF format with the names baz1.arff to baz4.arff.
• icatool --prefix=ext --force shorter.mp3 longer{1,2}.mp3
  Performs ICA on the three given input files, one of which has fewer samples than the others. The time signal of the shorter file is expanded by its expected value. Output will be as WAVE audio files with the names ext1.wav to ext3.wav.
• icatool --prefix=reduced --nsources=2 mix5{1..5}.ogg
  Performs ICA on the five given input files mix51.ogg to mix55.ogg and outputs the results as WAVE audio files. Before the actual application of ICA, however, the two principal signals, i.e. the signals with the greatest variance (and thus most information), are selected from all available signals.

4.9 Configuration Files

Audio processing, feature extraction, classification, and browser behavior can be fine-tuned by means of configuration files in the Java properties file format. Basically, files in this format may contain option lines of the form <option name> = <option value>, as well as comment lines starting with #, which are ignored. Boolean values can be notated as 0/false or 1/true, respectively. The configuration files reside in the etc directory of the installation tree.

4.9.1 Global Options
• blissart.global.mfcc.count (positive integer): The number of Mel frequency cepstral coefficients (MFCCs) to compute. Default is 13.
• blissart
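To show what such a configuration file looks like to a program, here is a minimal Python reader for the subset of the properties format described above: option lines with an equals sign, blank lines, and comment lines. It is a simplified sketch, not the parser openBliSSART uses; the full Java properties format also allows other separators, line continuations, and escapes, which are ignored here. The function names are hypothetical.

```python
def parse_properties(text):
    """Parse 'name = value' option lines, skipping blank and comment lines."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an option line in this simplified subset
        options[key.strip()] = value.strip()
    return options


def as_bool(value):
    """Interpret 0/false and 1/true as boolean option values."""
    v = value.strip().lower()
    if v in ("1", "true"):
        return True
    if v in ("0", "false"):
        return False
    raise ValueError("not a boolean option value: " + value)
```

A file containing the line `blissart.global.mfcc.count = 13` is read as a dictionary mapping that option name to the string "13"; converting values to integers or booleans is left to the caller, since the type depends on the option.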
25. openBliSSART applies NMF in the frequency domain by factorizing magnitude spectrogram matrices obtained by short-time Fourier transformation (STFT). For this purpose, the signal is split into overlapping frames of constant size. In speech processing, it is common to use a frame size of 25 ms and an overlap of 60 %, corresponding to a frame shift of 10 ms. Each frame is multiplied by a window function and transformed to the frequency domain using the Discrete Fourier Transformation (DFT), with the transformation size equal to the number of samples in each frame. First, openBliSSART provides the Hamming function for windowing:

$$h(k) = 0.54 - 0.46 \cos\left(\frac{2\pi k}{T}\right) \qquad (3.2)$$

where $T$ is the frame size and $k$ is the sample index within the frame, starting at 0. Other window functions are the Hann(ing) function

$$h(k) = 0.5 - 0.5 \cos\left(\frac{2\pi k}{T}\right) \qquad (3.3)$$

and its square root, which can be used for reducing artifacts resulting from the transformation. Only the magnitudes of the DFT coefficients are retained, and the frame spectra are put in the columns of a matrix. Denoting the number of frames by $n$ and the frame size by $T$, and considering the symmetry of the coefficients, this yields a real $(T/2 + 1) \times n$ matrix.

The crucial idea behind NMF-based blind source separation is to assume a linear signal model. Note that Eq. (3.1) can be written as follows, where the subscripts $:,t$ and $:,j$ denote the $t$-th and $j$-th matrix columns, respectively:

$$V_{:,t} \approx \sum_{j=1}^{r} H_{j,t} W_{:,j}, \qquad 1 \le t \le n \qquad (3.4)$$

Thus if V is the magnitude spectrogram of a sign
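The framing, windowing, and magnitude steps described above can be sketched as follows. This is a minimal illustration in plain Python, not openBliSSART's implementation: the function names are hypothetical, the DFT is computed naively for clarity, and using the frame size in the denominator of the window formula is an assumption of the sketch.

```python
import cmath
import math


def hamming(T):
    # Hamming window; the denominator T is an assumption of this sketch.
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / T) for k in range(T)]


def magnitude_spectrogram(signal, frame_size, overlap):
    """Return a (frame_size // 2 + 1) x n matrix as a list of rows."""
    shift = max(1, int(round(frame_size * (1.0 - overlap))))
    window = hamming(frame_size)
    columns = []
    for start in range(0, len(signal) - frame_size + 1, shift):
        frame = [signal[start + k] * window[k] for k in range(frame_size)]
        # Naive DFT; only the first T/2 + 1 coefficients are kept because
        # the spectrum of a real-valued frame is symmetric.
        column = []
        for f in range(frame_size // 2 + 1):
            coeff = sum(frame[k] * cmath.exp(-2j * math.pi * f * k / frame_size)
                        for k in range(frame_size))
            column.append(abs(coeff))
        columns.append(column)
    # Transpose so that the frame spectra become the matrix columns.
    return [list(row) for row in zip(*columns)]


# At 16 kHz, a 25 ms frame has 400 samples; 60 % overlap gives a
# shift of 160 samples, i.e. 10 ms.
rate = 16000
frame_size = rate * 25 // 1000
signal = [math.sin(2 * math.pi * 440 * t / rate) for t in range(1200)]
V = magnitude_spectrogram(signal, frame_size, overlap=0.6)
print(len(V), len(V[0]))  # 201 rows (T/2 + 1) and 6 frames
```

A real implementation would use an FFT instead of the quadratic-time DFT loop; the loop is kept here only to make the relation between frame size and matrix dimensions explicit.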
26. Supervised and unsupervised NMF feature extraction
• Data export (ARFF [6] and HTK [7] formats)

In many places in this document and the applications, NMF and NMD are used as synonyms. The reason is that, mathematically, NMF is a special case of NMD.

The remainder of this manual is divided into three chapters. Chapter 2 provides a brief introductory tutorial on how to use openBliSSART for typical blind source separation tasks. Chapter 3 explains the data storage architecture and algorithmic concepts of openBliSSART in detail. The manual is concluded by a detailed description of the openBliSSART toolbox, its command line parameters, and configuration options in Chapter 4. For detailed information about how to use the classes in the openBliSSART framework, please consult the HTML or LaTeX documentation in the doc directory of the openBliSSART source distribution, which can be created using the doxygen utility.

Chapter 2 Tutorial

This tutorial provides a brief introduction to the main features of openBliSSART. First, we will describe basic source separation, which results in an audio file for each component. Second, we will move towards supervised component classification using a data set, separating audio files into signals corresponding to classes like music and speech.

2.1 Basic Source Separation

In this section we will explain the basic steps needed for non-negative matrix factorization (NMF) based sour
27. Transactions on Signal Processing, vol. 57, no. 7, pp. 2858-2864, July 2009.
J. F. Gemmeke and T. Virtanen, "Noise robust exemplar-based connected digit recognition," in Proc. of ICASSP, Dallas, TX, USA, March 2010.
A. Hyvärinen, "New approximations of differential entropy for independent component analysis and projection pursuit," in Proc. of NIPS, Denver, Colorado, USA, December 1998, pp. 273-279.
C. Uhle, C. Dittmar, and T. Sporer, "Extraction of drum tracks from polyphonic music using independent subspace analysis," in Proc. of ICA, Nara, Japan, 2003.
28. al with short-time spectra in columns, the factorization represents each short-time spectrum $V_{:,t}$ as a linear combination of spectral basis vectors $W_{:,j}$ with non-negative coefficients $H_{j,t}$ ($1 \le j \le r$). When there is no prior knowledge about the number of spectra that can describe the source signal, the number of components $r$ has to be chosen empirically, depending on the application.

3.2.1 Basic NMF Algorithms

Such a factorization is usually achieved by iterative minimization of a cost function $c(W, H)$:

$$(W, H) = \arg\min_{W, H} c(W, H) \qquad (3.5)$$

In fact, many variants of NMF only differ by their choice of a particular cost function. The core of these functions is a measurement of the reconstruction error between the original matrix and the product of the NMF factors. Thus, a basic cost function is the squared Euclidean distance between $V$ and $WH$:

$$c_e(W, H) = \|V - WH\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} (V - WH)_{i,j}^2 \qquad (3.6)$$

where $\|\cdot\|_F$ denotes the Frobenius norm. Another cost function consists of a modified version of the Kullback-Leibler (KL) divergence:

$$c_d(W, H) = \sum_{i=1}^{m} \sum_{t=1}^{n} \left( V_{i,t} \log \frac{V_{i,t}}{(WH)_{i,t}} - (V - WH)_{i,t} \right) \qquad (3.7)$$

For minimization of either cost function, openBliSSART implements the two algorithms by Lee and Seung [3], which iteratively modify $W$ and $H$ using multiplicative update rules. It can be shown that $c_e$ is non-increasing under the update rules

$$H_{j,t} \leftarrow H_{j,t} \frac{(W^T V)_{j,t}}{(W^T W H)_{j,t}} \qquad (3.8)$$
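The multiplicative update scheme for the squared Euclidean cost can be sketched in a few lines. This is a minimal plain-Python illustration, not openBliSSART's code: the function names are hypothetical, the update for W is the standard Lee and Seung rule (not shown in the excerpt above), and the small constant `eps` guarding against division by zero is an addition of this sketch.

```python
import random


def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf_euclidean(V, r, iterations=100, eps=1e-12, seed=0):
    """Factorize V (m x n) into W (m x r) and H (r x n), all non-negative."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    errors = [squared_error(V, W, H)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(squared_error(V, W, H))
    return W, H, errors


V = [[1, 2, 3, 4, 5],
     [2, 4, 6, 8, 10],
     [1, 1, 1, 1, 1],
     [3, 1, 4, 1, 5]]
W, H, errors = nmf_euclidean(V, r=2, iterations=50)
print(errors[0], errors[-1])  # the reconstruction error decreases
```

Because the factors are only ever multiplied by non-negative ratios, W and H stay non-negative without any explicit projection step, and the recorded error sequence is non-increasing up to rounding.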
29. an be given. This option can be repeated to export multiple objects or ranges.

4.6 Cleanup Tool

Because openBliSSART stores binary data in a filesystem directory that is physically independent of the database, there are some cases where orphaned binary files remain in the storage directory without a data descriptor referencing them. The purpose of the cleanup tool (cleanup) is to purge the storage directory of these files. After execution, it displays the number of files that have been deleted. The -s or --simulate option can be used if no deletions should be performed and only the number of orphaned files should be printed.

4.7 Browser

The main purpose of the browser application is to facilitate the creation of data sets (responses), which can be used for classification of NMF components in blind source separation. It also supports playback of components, displays component features, and allows export of selected data sets to Weka [6] for a more detailed assessment of suitability.

The user interface has been designed with simplicity in mind, i.e., having everything at hand where it might be needed or helpful. Thus, the database entities are displayed in a tree-like view on the left-hand side of the main window. Further information related to any entity can be displayed by simply expanding the corresponding subtree. For an example, refer to Figure 4.1. Also when selecting a database entity edit
30. art.global.mfcc.mfcc0 (boolean): Whether the first MFCC should be computed. Default is true. If this option is set to false and blissart.global.mfcc.count is set to N, MFCCs 1 through N − 1 are computed.
• blissart.global.mfcc.lifter (double): The parameter for MFCC liftering. Liftering with parameter $L$ means that the $i$-th coefficient is multiplied by $1 + \frac{L}{2} \sin\left(\frac{\pi i}{L}\right)$, i.e., if $L = 0$ this procedure has no effect. More information can be found in the HTK book [7].
• blissart.global.mel.filter.high.freq (double): The upper limit frequency of the Mel filter bank. If this is 0 (default), the Nyquist frequency is assumed. If this is larger than the Nyquist frequency, an error is raised.
• blissart.global.mel.filter.low.freq (double): The lower limit frequency of the Mel filter bank. Default: 0.
• blissart.global.mel_bands (positive integer): The number of Mel frequency bands to use for Mel filtering, e.g., in MFCC computation.
• blissart.global.deltaregression.theta (positive integer): The parameter θ for the regression procedure which is used to compute delta and delta-delta MFCCs. More information can be found in the HTK book [7].

4.9.2 Audio Preprocessing

Audio preprocessing options can be specified in the configuration file blissart.properties. These are valid for the browser as well as the separation tool, but can be overridden by passing the corresponding command line parameters to the separation tool.
•
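The liftering step can be illustrated as follows. This is a minimal Python sketch that assumes the HTK form of the liftering factor, 1 + (L/2)·sin(πi/L), and that coefficient indices start at 0; the function name `lifter` and the example value L = 22 are choices made for this illustration, not taken from openBliSSART.

```python
import math


def lifter(coefficients, L):
    """Multiply the i-th coefficient by 1 + (L / 2) * sin(pi * i / L)."""
    if L == 0:
        # With L = 0 liftering has no effect.
        return list(coefficients)
    return [c * (1.0 + (L / 2.0) * math.sin(math.pi * i / L))
            for i, c in enumerate(coefficients)]


mfccs = [1.0] * 13
print(lifter(mfccs, 0))   # unchanged coefficients
print(lifter(mfccs, 22))  # higher-order coefficients are amplified
```

The factor equals 1 for the coefficient with index 0 and grows with the index up to i = L/2, which raises the typically small higher-order cepstral coefficients to a range comparable with the lower-order ones.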
31. as WAV files. Open a command line window, change to the openBliSSART installation directory, and type

septool -v -p <file.wav>

The -v option tells the tool not to write to the database (hence the components will not be visible in the browser), and the -p option causes the components to be exported as WAV files. Change to the directory where your input WAV file resides. There should now be files named file_00.wav, ..., file_19.wav, corresponding to the 20 components. You can use them for the mixing process as described above.

As an exercise, you can repeat the separation and mixing procedure using different parameters. For instance, try the "Squared Euclidean distance" cost function that is available in the Import audio dialog instead of the default "Extended KL divergence". You can also choose other values for window size, overlap, window function, etc. The above septool command can be adjusted to select squared Euclidean distance as the cost function and to use a window size of 40 ms with the following options:

septool --cost-function=ed -s40 -v -p <file.wav>

You can also try different numbers of components in the Import audio dialog of the browser, as well as using the -c<number> option of septool.

Congratulations, you have finished the first part of openBliSSART's tutorial!

2.2 Supervised Component Classification

In this section we will consider supervised component classif
32. ature set given by the configuration file.
• cvtool -r7 -t8
Validates each object in the response with ID 7, using the response with ID 8 as training set.
• cvtool -r7 --fs=anova -m10 -p
Validates the response with ID 7 using stratified 10-fold cross-validation with the 10 features which score best in a t-test, and outputs all objects which have been misclassified, along with the classification probability.

4.4 Export Tool

The export tool (export) exports objects (usually NMF components) in the storage to a file. HTK or Gnuplot output format can be selected. The following options can be specified on the command line:
• -h, --help: displays usage information.
• -a, --all: exports the data from all data descriptors of the given type (-t) in the database.
• -p<list>, --process=<list>: exports data descriptors associated with the given process IDs. Single process IDs or ranges (x..y) can be given and must be separated by commas.
• -f<format>, --format=<format>: selects an output format. Must be one of htk or gnuplot. Default is htk.
• -c, --concat: concatenates data descriptors of the same type, so that only one output file per type is generated. The type of concatenation (column-wise or row-wise) depends on the type of data descriptor. Spectra are considered column vectors, hence concatenated column-wise; conversely, gains are considered row vectors, hence c
33. [Figure 2.4: Exporting components as WAV files]

Select the "Export selected objects as WAV" item and, in the dialog that appears, choose a directory where you want to create the WAV files.

The next step of this tutorial will show how you can mix these components together using the free audio editor Audacity and manually subtract some components. You can skip this part if you do not have and do not want to install Audacity, and move to the "Command Line Separation" section below.

2.1.2 Manual Component Mixing

Start Audacity and select "Import audio" from the "Project" menu. Select all of the WAV files that you exported from the Browser in the previous step. The 20 components should appear as signals below each other in the Audacity window, as shown in Figure 2.5.

[Figure 2.5: Mixing components in Audacity]

First listen to the
34. ber that if you have a multicore system, you might want to set this number to the number of cores for maximum performance, but usually feature extraction is done quite fast anyway.

After the feature extraction has completed, expand one of the classification object nodes and, in turn, also the "Data descriptors" node inside. Three entries will appear: Gains, Phase Matrix, and Spectrum. The phase matrix is only used for conversion of components to wave files; features are extracted from either one of the other two elements. Open the nodes "Gains" or "Spectrum". A list of features with values will be displayed (see Figure 2.10). The numbers inside the parentheses are feature parameters, such as the MFCC index. The meanings of the features are discussed in [8].

[Screenshot: browser tree view of a classification object with its data descriptors. The Gains node lists the features percussiveness, periodicity, pf, and pl; the Spectrum node lists centroid, mfcc (indices 0 to 12), noise-likeness, rolloff, and stddev, each with its value.] Figu
35. ce separation. You will need some music files, preferably short segments (10 s), in WAV format. A good choice is to use the WAV files from the demo/wav directory in the openBliSSART source distribution, for example. Upon completion of this section, you will be able to extract and listen to the components generated by NMF and synthesize WAV files for them.

In the first step, we will use the Browser GUI application, which can be found in the bin directory of the openBliSSART installation tree. Upon starting the browser, you will notice a tree view on the left-hand side, which at the first start contains only four entries (nodes), namely Classification objects, Labels, Processes, and Responses. For the purpose of this section, only the Classification objects will be relevant. The right-hand side of the browser window is used to display and edit the objects you have selected in the tree view.

2.1.1 Separation with the Browser

Probably the easiest way to use NMF is via the Import audio dialog of the Browser, which can be accessed using the corresponding button on the bottom of the left side panel.

[Screenshot: the "Import audio files" dialog with the file list and the parameters: processing steps (Fourier Transform, NMF/NMD), window function (square root of Hann function), window size (60 ms), cost function (extended KL divergence), and overlap.]
36. 4.2 Feature Extraction Tool
4.3 Cross Validation Tool
4.3.1 Usage Examples
4.4 Export Tool
4.4.1 Usage Example
4.5 Audio Export Tool
4.6 Cleanup Tool
4.7 Browser
4.7.1 Typical Workflow
4.7.2 Import of Audio Files
4.7.3 Feature Extraction
4.7.4 Label Creation
4.7.5 Assignment of Labels to Classification Objects
4.7.8 Exporting Selected Objects
4.7.9 Browser Preferences
4.8 ICA Tool
4.9 Configuration Files
4.9.1 Global Options
4.9.2 Audio Preprocessing
4.9.5 Feature Extraction
4.9.6 Classification
4.9.7 Browser

Chapter 1 Overview

openBliSSART is a framework and toolbox for Blind Source Separation for Audio Recognition Tasks. Main features include:
• Component separation using non-negative matrix factorization (NMF) and non-negative matrix deconvolution (NMD) [4]
• Component classification:
  - Feature extraction from components
  - Creation of response variables, assigning audio components to classes
  - Assembly of audio files for different classes, such as in drum beat separation
•
37. efault settings and saves the components, using at most 4 concurrent threads.
• septool -v -T5 -c40 -l7 -I11..30 -P -L3 test.wav
Separates test.wav by means of NMD into 40 components (-c40), each consisting of 5 spectra (-T5). The first 20 components are initialized using the classification objects with IDs 11 to 30, which must in turn be NMD components (-I11..30). Spectra are not updated during the iteration (-P). Classification is done using the response with ID 7 (-l7), where the first 20 components are assigned the label with ID 3 regardless of the classifier's decision (-L3). Nothing is written to the database (-v).
• septool -v -T20 -c10 -ptestcomp test.wav
Separates test.wav by means of NMD into 10 components consisting of 20 spectra each. The components are exported as WAV files with the prefix testcomp.
• septool -v -I1..20 -P -c20 --export-matrices=H test.wav
Separates test.wav into 20 components whose spectra are all predefined in the classification objects with IDs 1 to 20. The gains matrix H is exported to a file.
• septool --cost-function=kls -y0.5 test.wav
Like the first example, but using sparse NMF, setting the sparsity parameter to 0.5.

4.1.7 Multithreading vs. Multiple Processes

It is important to note that, while there is an option to run multiple threads simultaneously from one single instance of the separation tool (in this case only one user process is created by the operating system), starting multiple c
38. es. Otherwise, if the algorithm is correlation, features are rated by their correlation with their class label.
• -m<number>, --max-features=<number>: gives the maximum number of features that automatic feature selection should select. The default value is 10.
• -v, --verbose: enables verbose output (see below).
• -p, --prob: estimates probabilities for SVM classification. If this option is given, verbose output is automatically enabled.
• --dump=<prefix>: for each fold, writes the training and test data to an ARFF data file with the given prefix (default prefix: fold). These files can be used to manually reproduce the cross-validation result with the Weka [6] software.

If a response ID was specified, the tool outputs the number of classification objects that were validated, the recalls for each class, the mean recall, and the overall accuracy. Finally, a confusion matrix for all classes in the response is printed. If verbose output is enabled, a list of misclassified objects is printed in addition, giving their ID, their class label, their predicted class label, and, if the corresponding option is given, their prediction probabilities.

Unless automatic feature selection is enabled, the features to be used for classification are read from the configuration file (see Section 4.9.5).

4.3.1 Usage Examples
• cvtool -r7 -f3
Validates the response with ID 7 using stratified 3-fold cross-validation and the fe
39. generated by NMF or NMD from an audio file. Note that it is perfectly valid for a data descriptor to occur in relation to more than one classification object. For example, each classification object generated by an NMF process contains a reference to the phase matrix of the original signal, so as to be able to re-synthesize wave files from one or more components. The phase matrix, however, is stored only once. Each data descriptor is associated with a separation process with a unique ID. These IDs can, for instance, be found by looking at the process listing in the browser application, and are needed for component feature extraction as well as data export.

Features, Responses and Labels: Data descriptors relate to features, which are used during classification. A response assigns classification objects to labels. Classification is done using features from the data descriptors that make up the classification objects in the response.

The browser (Section 4.7) can be used to conveniently explore the database structure.

3.1.1 Database Entities

A graphical overview of the database entities and their relations (entity-relationship diagram) is given in Figure 3.1.

[Figure 3.1: Entity-relationship diagram of our database scheme]

Processes: A process creates objects by processing an audio file. It has the following attributes:
• Process ID
• Name
• Input file name
•
40. gnments are not automatically stored. Instead, they have to be saved explicitly. Figure 4.5 shows the described features.

[Figure 4.5: Assignment of classification objects to a response. The edit area shows the name ("Drum vs harmonic") and description of the response and the relations between classification objects (NMD components) and the labels "Harmonic" and "Drum".]

4.7.8 Exporting Selected Objects

If a selection of classification objects should be exported as audio files, one can simply select the desired objects and choose "Export selected objects as WAV" from the corresponding objects' context menu. When selecting this item, a directory selection dialog shows up and allows selecting the destination directory for the exported files.

4.7.9 Browser Preferences

[Screenshot: the Preferences dialog with the Preview checkbox and the process creation and feature extraction settings: window function (square root of Hann function), window size (ms), cost function (extended KL divergence), overlap, number of components, number of threads, and maximum iterations.] Figu
41. he phase matrix of the original signal, false otherwise.
• blissart.separation.storage.magnitudematrix: true if the separation tool should store the magnitude matrix of the original signal, false otherwise (default). Usually this option should be disabled.

4.9.5 Feature Extraction

Feature extraction options can be found in the configuration file blissart.properties. Unless stated otherwise, these options are boolean values which include or exclude certain features in the feature set. The available features and the default set by data descriptor type are shown in Table 4.1.

[Table 4.1: Available Audio Features. Columns: data descriptor type, feature, default. Data descriptor types: magnitude matrix, spectrum, gains. Features listed: sampled MFCCs; δ and δδ coefficients; mean and standard deviation of δ and δδ coefficients; mean MFCCs 0-12; standard deviation; spectral centroid; spectral rolloff; noise-likeness; dissonance; flatness; skewness; kurtosis; periodicity; peak length; peak fluctuation; percussiveness.]

Note that for NMD, spectra are actually spectrograms; hence functionals of MFCCs and of the other features are computed: the mean, standard deviation, and sampled values of MFCCs can be computed.

The following options control feature extraction from magnitude matrices: blissar
42. ication. This is basically the procedure you did above, but instead of manually mixing the tracks, a classifier is used that assigns each component automatically. This is exactly what the openBliSSART demonstrator for drum beat separation does; check it out in the demo subdirectory of the openBliSSART distribution if you have not yet done so.

In this tutorial, instead of drum beat separation, we will now use the scenario of speech and music discrimination, assuming that you have recordings available that correspond to each of these classes. In the first step, we will create a data set containing components from speech and music signals. For this purpose we will again use the Browser GUI. Upon completion of this section, you will know what the background of the Labels and Responses is, and where Classification objects got their name.

2.2.1 Importing Audio Files

To start with, we will now import audio files and separate them into components using NMF. Simply click the "Import audio" button in the lower left corner of the browser window, so that the corresponding Import audio dialog appears.

[Screenshot: the "Import audio files" dialog with several WAV files selected and the processing parameters (processing steps, window function) shown.]
43. imum bpm (beats per minute) value to consider for periodicity.
• blissart.features.gains.periodicity.bpm.max (positive integer): The maximum bpm (beats per minute) value to consider for periodicity.
• blissart.features.gains.periodicity.bpm.step (positive integer): The distance between the bpm values to consider for periodicity.

4.9.6 Classification

Classification options control SVM parameters and scaling. They can be specified in the configuration file blissart.properties.

The type of kernel function that is used to build the SVM is given by the blissart.classification.svm.kernel option. Possible values include linear for linear functions, poly for polynomials of higher degree, rbf for radial basis functions, and sigmoid for sigmoid functions. Default is linear. The polynomial degree can be given by the blissart.classification.svm.degree option, which defaults to 3. The precision of the training procedure is controlled by the blissart.classification.svm.epsilon option (default: 1e-3). Bias components (i.e., one component that is always 1) can be added by setting the blissart.classification.addBias option to true.

Scaling is controlled by the blissart.classification.scaling family of options:
• blissart.classification.scaling.method: minmax for linear scaling such that all values of one feature are in a given interval (by default [-1, 1]), musigma for linear scaling such that all val
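The minmax scaling method mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not openBliSSART's implementation: the function name is hypothetical, the default interval [-1, 1] is inferred from the text, and mapping a constant feature to the interval midpoint is a choice made for this sketch.

```python
def minmax_scale(values, lower=-1.0, upper=1.0):
    """Linearly map the values of one feature onto [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to the
        # midpoint of the interval (choice of this sketch).
        return [(lower + upper) / 2.0 for _ in values]
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]


print(minmax_scale([0, 5, 10]))  # [-1.0, 0.0, 1.0]
```

In a classification setting, the minimum and maximum would be determined on the training data and then reused unchanged for the data to be classified, so that both are scaled consistently.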
44. imum number of iteration steps. Default is 100.
• -I<range>, --init=<range>: Pre-initializes the separation using the spectra of several classification objects, specified as a range of classification object IDs. <range> is a string of the form min..max, where min and max are IDs of classification objects. This option can be repeated to specify multiple ranges. If the number of initialization objects is smaller than the number of components, randomized spectra are added.
• -P, --preserve: preserves the initialization, i.e., does not update it during iteration. Nevertheless, if the number of initialization objects is smaller than the number of components, the additional randomized spectra are updated in any case.

4.1.5 Component Processing
• -v, --volatile: runs in volatile mode, i.e., components are thrown away after the tool terminates. This only makes sense when either the classify option or one of the export options is activated. If the volatile option is not specified, components are stored for later use.
• --export-prefix=<prefix>: sets the filename prefix for export of components as WAV files or matrices.
• -p<prefix>, --export-components: exports the separated components as WAV files with the given prefix.
• --export-matrices=<name>: Export the separation matrices <
45. l Kullback-Leibler divergence [3], eds (Euclidean distance with a sparsity constraint), kls (KL divergence with a sparsity constraint), and finally edsn (Euclidean distance with a sparsity constraint, measured using normalized basis vectors as in [16]). Default is kl. Note that NMD (i.e., more than 1 spectrum per component) can only be performed using the ed and kl cost functions.
• -y<number>, --sparsity=<number>: The sparsity parameter for the NMF cost function. Only has an effect if either eds, edsn, or kls is selected as cost function.
• -N, --normalize-matrices: Normalizes NMF/NMD matrices such that the second factor has unity Frobenius norm.
• -g<func>, --generator=<func>: Sets the generator function for initialization of the matrices: gaussian for absolute Gaussian noise, uniform for values uniformly distributed on the interval [0.01, 0.02], or unity for every value equal to 1. Default is gaussian. The unity generator makes the separation process deterministic and can hence be used for debugging purposes.
• -e<number>, --precision=<number>: The desired precision (relative error in terms of Frobenius norm) of the result. If set to zero, the maximum number of iteration steps is performed in any case. Default is 0.
• -i<number>, --max-iter=<number>: The max
46. lissart.features.spectrum.noiselikeness.sigma: The sigma (standard deviation) parameter for the calculation of noise-likeness.
• blissart.features.spectrum.dissonance: Whether to compute spectral dissonance [22]. Be aware that this operation can be time-consuming, as its time complexity is quadratic in the length of the spectra.
• blissart.features.spectrum.flatness: Whether to compute spectral flatness [22].

Furthermore, the blissart.features.spectrum.mfcc, blissart.features.spectrum.mfccD, blissart.features.spectrum.mfccA, and blissart.features.spectrum.stddev_mfcc configuration options are available for spectra, but only make sense for NMD, where each component is described by a spectrogram. The default is to only compute the mean MFCCs.

The following options control feature extraction from gains vectors:
• blissart.features.gains.stddev: Whether to compute standard deviation.
• blissart.features.gains.pl: Whether to compute peak length [5].
• blissart.features.gains.pf: Whether to compute peak fluctuation [5].
• blissart.features.gains.percussiveness: Whether to compute percussiveness [22].
• blissart.features.gains.percussivness.length (double): The length (in seconds) of the percussive impulse to use for computation of percussiveness.
• blissart.features.gains.periodicity: Whether to compute periodicity of gains [5].
• blissart.features.gains.periodicity.bpm.min (positive integer): The min
47. mix of all components. Depending on the type of music, there are probably audible artifacts resulting from the information reduction performed by NMF for separation.

By using the "Mute" and "Solo" buttons, you can mute some of the components or mute all other components, respectively. Try to identify components which represent drum sounds using the Solo button. Normally this is quite easy, as they show a high degree of periodicity. Now mute the identified drum components and listen to the result.

2.1.3 Command Line Separation

An alternative to the browser is the septool (Separation Tool) command line application, which is more flexible and has more separation features than the browser. The separation process that you performed using the Import audio dialog can be realized with septool as follows. Open a command line window, change to the bin directory within the openBliSSART installation directory, and type

septool <file.wav>

The default options correspond to the parameters shown in the figure of the Import audio dialog. After executing this command, open the browser again. There should now be 40 classification objects listed: 20 from the recent septool process and 20 from the previous separation using the browser. Note that if you left the browser open while running the septool, you have to refresh the view using the F5 key.

The septool can also directly save the separated components
48. [Figure 2.11: Editing a response]

2.2.5 Cross Validation

To assess the quality of the response we have just defined, we might perform a stratified 10-fold cross-validation. Currently, this function is not accessible from the browser, but is available through a separate tool, cvtool. Open a shell (or Windows command prompt), change to the bin directory of the openBliSSART installation tree, and type

cvtool -r1

assuming the response has the ID 1, which is the case if it is the first response you created (otherwise, check the number appearing before the respective response's name in the tree view). The cross-validation tool should output something like this:

    Validated 320 samples with 10-fold cross validation
    Confusion matrix (predicted / real):
              Music  Speech
    Music       179       1
    Speech        0     140
    Accuracy: 0.996875
    Recalls:
      Music: 0.994444
      Speech: 1
    Mean recall: 0.997222

2.2.6 Using a Response for Blind Source Separation

Finally, we are now able to separate audio files into their music and speech parts by means of the response that we have created. For this purpose we also use a command line tool, septool. We want the separation tool to perform NMF into 20 components using a window size of 60 ms, then classify the components by
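The accuracy and recall figures in the sample output follow directly from the confusion matrix. The following minimal Python sketch recomputes them; the function name `evaluate` is hypothetical, and reading the matrix rows as the real classes and the columns as the predicted classes is an assumption inferred from the reported recalls.

```python
def evaluate(confusion):
    """confusion[real][predicted] -> (accuracy, recalls, mean recall)."""
    classes = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c][c] for c in classes)
    # Recall of a class: correctly classified objects of that class
    # divided by all objects that really belong to it.
    recalls = {c: confusion[c][c] / sum(confusion[c].values()) for c in classes}
    mean_recall = sum(recalls.values()) / len(classes)
    return correct / total, recalls, mean_recall


confusion = {
    "Music": {"Music": 179, "Speech": 1},
    "Speech": {"Music": 0, "Speech": 140},
}
accuracy, recalls, mean_recall = evaluate(confusion)
print(round(accuracy, 6))          # 0.996875
print(round(recalls["Music"], 6))  # 0.994444
print(round(mean_recall, 6))       # 0.997222
```

The mean recall weights both classes equally, so it differs from the accuracy whenever the classes have different sizes, as is the case here with 180 music and 140 speech components.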
49. onal costs involved in the separation process.

The components of the chosen audio files will appear in the "Classification objects" tree on the left-hand side of the Browser main window.

4.7.3 Feature Extraction

While it is possible to extract the features of an individual classification object (see Figure 4.3) via its context menu, the features of all classification objects can be extracted in one step as well, by selecting the "Database → Extract features from all data descriptors" item in the application's menu. Again, the number of threads can be specified when extracting all features at once, which significantly reduces the processing time on multiprocessor machines.

If you change the configuration options for feature extraction (see Section 4.9.5), you have to restart the browser for the changes to take effect.

[Screenshot: context menu of a classification object in the browser tree view, with the items Import audio, Create response, Create label, Export selected objects as WAV, Delete selected items, Create response from these items, and Select label(s).]
50. oncatenated row-wise. Magnitude and phase matrices are concatenated column-wise.

• -t<type>, --type=<type>: selects the type of data descriptor to export. Available types are Spectrum (spect), Gains (gains), Magnitude Matrix (mmatr) and Phase Matrix (phase).

• --strip-prefix=<path>: by default, the output file name is derived from the full path name of the corresponding input file. This option can be used to strip a certain path prefix in order to create relative file names.

• --target-dir=<path>: sets the target directory for output. Output files are placed in this directory, and relative path names are interpreted with respect to this directory.

• -T, --add-type: adds a string giving the type of data to the file names, e.g. spect.

4.4.1 Usage Example

export -p17 -fgnuplot -tgains -c

Exports the gains vectors created in process 17 and concatenates them. The output (in this case, a gains matrix) is written to a file in Gnuplot format.

4.5 Audio Export Tool

In contrast to the export tool, the audio export tool (exportaudio) exports objects (usually NMF components) in the storage to an audio file. The following options can be specified on the command line:

• -h, --help: displays usage information.

• -o, --object-id (arguments <id1>, <id2>): selects the objects to export. Single IDs or ranges of IDs c
51. oncurrent instances of the separation tool (and hence multiple user processes) can lead to errors, as the integrated SQLite database can only be written to by one user process at a time.

4.2 Feature Extraction Tool

The feature extraction tool (fextool) extracts features from stored components and saves them into the database. It can be controlled via the following command line options:

• -h, --help: only display information about command line parameters.

• -A, --echo: print the base name of the application binary and its named command line options in long format with their parameters (if given) before executing.

• -C, --config=<filename>: use the specified configuration file (properties format) instead of the default one. See section […] for details.

• -a, --all: performs feature extraction for all components whose data is available.

• -p<id>, --process=<id>: performs feature extraction on the components that have been generated by the separation process with the given ID.

• -n<number>, --num-threads=<number>: the number of concurrent threads to use for separation and classification. Should be set to the number of CPUs (cores) present in the computer for maximum performance.

The feature extraction process itself can be influenced by a great variety of configuration options, which are all listed in section 4.9.5. The same note about multithreading and multiple instances
52. [Figure 2.6: Import audio dialog]

Ensure that the parameters on the right-hand side are set exactly as in the figure, and select some audio files (WAV or MP3) containing music, preferably around 10 to 20 seconds long. Then click "Ok" and wait for the process to finish. Depending on the number and length of your audio files, this process may take several minutes, as it is computationally intensive. To increase performance on multicore systems, you can adapt the "Number of threads" setting to reflect the number of available cores before actually starting the process.

Once the process has completed, you can expand the "Classification objects" node in the tree view to examine the entries reflecting the separated components. The second column states that they are still "Unlabeled"; we will take care of that in the next step.

First, however, please repeat the above procedure, this time selecting audio files containing speech. Make sure to
53. openBliSSART User Manual

Felix Weninger (TU München), Alexander Lehmann (TU München), Björn Schuller (TU München)
weninger@tum.de, lehmanna@in.tum.de, schuller@tum.de

Version 1.2, May 2010

Contents

1 Overview
2 Tutorial
[…]
2.1.2 Manual Component Mixing
[…]
2.2.2 Defining Classes
2.2.3 Feature Extraction
2.2.4 Defining a Response
2.2.5 Cross Validation
2.2.6 Using a Response for Blind Source Separation
3 openBliSSART Internals
3.1 Data Organization
3.1.1 Database entities
3.1.2 Storage of binary files
3.2 Source separation by NMF
3.2.1 Basic NMF Algorithms
3.2.2 Initialization and Termination
3.2.3 Supervised Component Classification
3.2.4 Source Separation by Supervised NMF
3.2.5 Sparse NMF
3.2.6 Convolutive NMF
3.3 Source Separation by ICA
4 Toolbox
4.1 Separation Tool
4.1.1 General
4.1.2 Audio Preprocessing
4.1.3 Transformation
[…]
4.1.5 Component Processing
[…]
54. orithm implemented by openBliSSART constitutes a good compromise between the properties of both kurtosis and negentropy. It uses a fast fixed-point algorithm for the following cost function:

$$C(x) = \frac{1}{a} \log \cosh\left(a \, w^T x\right)$$

where $a$ is a real constant within $[1, 2]$ and $w$ is the current weight vector, which maximizes the non-Gaussianity of the projected data and hence is constantly updated throughout the FastICA iterations.

Chapter 4 Toolbox

4.1 Separation Tool

The separation tool (septool) is the central command line application of openBliSSART. It takes one or more audio files and separates them into components by using non-negative matrix factorization. Components can be stored and/or classified using an existing response variable. In the former case, each component is saved to the database as a classification object, and the parameters of the separation process are saved as well. In the case of classification, an audio file is generated for each class.

An arbitrary number (≥ 1) of files to be processed can be given as arguments. WAV, OGG and FLAC formats are supported. Furthermore, the process can be controlled via a variety of parameters, which are listed below.

4.1.1 General

• -h, --help: display information about command line parameters and exit.

• -A, --echo: print the base name of the application binary and its named command line options in long format with thei
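The log-cosh cost function quoted in section 3.3 for the FastICA algorithm can be evaluated with a few lines of Python. This is only an illustrative sketch, not openBliSSART code; the function names are hypothetical, the scalar `u` stands for the projection of a sample onto the weight vector, and the default `a = 1.0` is merely one value from the interval [1, 2] stated in the text.

```python
import math


def logcosh_cost(u, a=1.0):
    """Cost (1/a) * log(cosh(a * u)) for a projected sample u.

    Note: math.cosh overflows for very large |a * u|; a production
    implementation would need a numerically safer formulation.
    """
    return math.log(math.cosh(a * u)) / a


def logcosh_cost_derivative(u, a=1.0):
    """Derivative of the cost with respect to u, which is tanh(a * u)."""
    return math.tanh(a * u)


print(logcosh_cost(0.5, a=1.5))
print(logcosh_cost_derivative(0.5, a=1.5))
```

The bounded derivative tanh is one reason this cost is usually regarded as less sensitive to outliers than kurtosis.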
55. r by pressing the respective button or by selecting the corresponding entry from the application's main menu or, alternatively, the context menu of the tree view.

[Figure 4.2: Example audio import dialog]

While an arbitrary number of input files can be specified on the left-hand side, the right-hand side allows the selection of the intended parameters for the separation process. Currently, only a subset of the parameters of the separation tool (e.g. cost functions) is offered by the browser. The user can choose whether to perform a separation process or to only load the file's spectrogram into the database.

Note that increasing the number of threads is only useful when working with multiple files, because the files are distributed individually among the available worker threads. Also, the number of threads should not exceed the number of available processors, as there are only few disk operations but rather heavy computati
56. r parameters (if given) before executing.

• -C, --config=<filename>: use the specified configuration file (properties format) instead of the default one. See section […] for details.

• -n<number>, --num-threads=<number>: the number of concurrent threads to use for separation and classification. Should be set to the number of CPUs (cores) present in the computer for maximum performance.

• -S, --scripted: run in scripted mode, i.e. assume that the input files contain file names of audio files, separated by newlines. This option can be useful if lots of files should be processed, and to ensure compatibility with systems that limit the number of command line options.

(Footnote: Generally speaking, all audio file formats supported by the SDL_sound library can be read.)

4.1.2 Audio Preprocessing

Transformation options given on the command line override the corresponding configuration options (see section 4.9.2).

• -r<function>, --reduce-mids: subtract the right from the left channel when converting from a stereo to a mono signal.

• -k<k>, --preemphasis=<k>: preemphasizes the signal with a factor of k, such that for all $t > 0$, $\hat{s}_t = s_t - k \, s_{t-1}$, where $s_t$ and $\hat{s}_t$ are the sample values at position $t$ in the original and preemphasized signal, respectively.

• -d, --remove-dc: subtracts the mean (DC component) from each frame before transformation.

4.1.3 Transformation
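The preemphasis filter described for the -k option in section 4.1.2 is a first-order difference. The following Python sketch illustrates the formula only and is not taken from openBliSSART; the function name `preemphasize` is hypothetical, and leaving the first sample unchanged is an assumption, since the manual defines the filter only for t > 0.

```python
def preemphasize(samples, k):
    """Apply s_hat[t] = s[t] - k * s[t-1] for all t > 0.

    Assumption: the sample at t = 0 has no predecessor and is copied as is.
    """
    if not samples:
        return []
    result = [samples[0]]
    for t in range(1, len(samples)):
        result.append(samples[t] - k * samples[t - 1])
    return result


# A constant signal is attenuated, which shows the high-pass character.
print(preemphasize([1.0, 1.0, 1.0], 0.5))
```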
57. re 2.10: Feature subtree of a classification object

2.2.4 Defining a Response

Eventually, we will have to feed the extracted features to a support vector machine (SVM). To this end, we create a response variable from all components we have in the database. Click the "Create response" button in the lower left corner of the browser window. A response entry will be created under the "Responses" node of the tree view. As was the case for labels, the name of the new response defaults to the current date and time. Use the text field on the right-hand side to change it into something more meaningful, for example "Speech vs music" (see figure 2.11).

Then click the "Add CLOs by label" button and select both labels, "Music" and "Speech", in the corresponding dialog. After clicking "Ok", your response should look like the one in figure 2.11.

[Figure 2.11: screenshot of the response edit area. Name: "Speech vs music". Description: "Discrimination of speech and music. 20 NMF components were generated from speech and music signals." The relations list maps NMF components to the labels Music and Speech.]
58. re 4.6: Browser Preferences Dialog

Figure 4.6 shows the preferences dialog of the Browser, which allows the user to choose options for the audio preview, select the default parameters for creating separation processes from the Browser, and set the default number of threads to use for feature extraction.

4.8 ICA Tool

The ICA tool (icatool) performs blind source separation on multiple audio input files by applying independent component analysis to the corresponding time signals. Possible choices for the output of the results are either WAVE audio files or the Weka ARFF format.

The format of the input files may differ, yet all of them must have the same sampling rate and an equal number of samples. Should the latter vary, the corresponding signal can be extended by using the expected value of its time signal.

(Footnote 2: Generally speaking, all audio file formats supported by the SDL_sound library can be read.)

Whenever an input file contains more than one channel, only the first one will be used for computation.

If the number of sources to be separated is smaller than the number of input files, the corresponding number of signals with the greatest variance, and thus the most information, will be selected from all available signals. Since principal component analysis is a preprocessing step for ICA anyway, this causes no particular further computational effort.

Readers should note that this is a stand-alone application that makes no fur
59. roc of ICASSP, Las Vegas, NV, USA, 2008, pp. 4029-4032.

K. W. Wilson, B. Raj, and P. Smaragdis, "Regularized non-negative matrix factorization with temporal dependencies for speech denoising," in Proc. of Interspeech, Brisbane, Australia, 2008.

P. D. O'Grady and B. A. Pearlmutter, "Discovering convolutive speech phones using sparseness and non-negativity constraints," in Proc. of ICA, London, UK, 2007.

P. O. Hoyer, "Non-negative sparse coding," in Proc. of IEEE Workshop on Neural Networks for Signal Processing, Martigny, Switzerland, 2002, pp. 557-565.

P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.

J. Eggert and E. Körner, "Sparse coding and NMF," in Proc. of Neural Networks, Dalian, China, 2004, vol. 4, pp. 2529-2533.

T. Virtanen, "Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria," IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 3, pp. 1066-1074, March 2007.

P. Smaragdis, "Convolutive speech bases and their application to supervised speech separation," IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 1-14, 2007.

W. Wang, A. Cichocki, and J. A. Chambers, "A multiplicative algorithm for convolutive non-negative matrix factorization based on squared Euclidean distance," IEEE
60. rom the main menu or the context menu of the tree view. The newly created response is automatically inserted into the entities tree, while the response's properties (name, description, and assigned classification objects) can be modified inside the edit area.

To create a response that contains a set of classification objects, simply select the desired classification objects in the tree view and click "Create response from these items" in the context menu.

4.7.7 Adding Classification Objects to Responses

Currently, the only way to assign classification objects to an existing response is via the "Add CLO(s) by label" button located inside a response's edit area. Pressing this button opens a dialog that allows the selection of the desired label. All classification objects related to this label are then assigned to the current response.

Since multiple labels can be assigned to a classification object, one might wish to change the label in use. To do so, simply select the corresponding classification object from the list inside the response's edit area and press "Select label". Note that this button is enabled only when more than one label is linked to the selected classification object.

Classification objects can be removed from the assignment list by selecting them and then pressing the "Remove selected" button.

As with all of the browser's edit options, the newly made assi
61. [Eq. (3.24): multiplicative update rule for H; not legible in this copy]

where $\lambda$ is the sparseness weight, and

[Eq. (3.25): multiplicative update rule for W; not legible in this copy]

where $\bar{W}$ is the column-wise normalized matrix $W$ (Euclidean norm).

3.2.6 Convolutive NMF

Convolutive variants of NMF consider spectra that evolve over time. In other words, the acoustic events that build the signal are no longer instantaneous, but rather sequences of observations. In speech processing, these sequences can correspond to phonemes or even whole words [4].

First, openBliSSART supports Non-Negative Matrix Deconvolution, which is based on a convolutive signal model:

$$V \approx \Lambda = \sum_{p=0}^{P-1} W(p) \, \overset{p \rightarrow}{H} \tag{3.26}$$

where $W(p)$, $p = 0, \ldots, P-1$, is a set of $P$ matrices and $\overset{p \rightarrow}{(\cdot)}$ is a matrix operator that shifts the columns of its argument by $p$ spots to the right, filling the leftmost $p$ columns with zeros. Analogously to Eq. […], this equation can be rewritten as

$$V_{:,t} \approx \sum_{j=1}^{r} \sum_{p=0}^{\min\{P-1,\, t-1\}} H_{j,t-p} \, W_{:,j}(p), \quad 1 \le t \le n \tag{3.27}$$

where again $r$ is the number of components and $n$ is the number of columns of $V$ ($n > P$). Note that the inner sum now resembles a convolution.

It is straightforward to extend the cost functions $c_e$ (Euclidean distance, Eq. […]) and $c_d$ (modified KL divergence, Eq. […]) to the convolutive signal model:

$$c_e(V \mid \Lambda) = \lVert V - \Lambda \rVert_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( V_{ij} - \Lambda_{ij} \right)^2 \tag{3.28}$$

$$c_d(V \mid \Lambda) = \sum_{i=1}^{m} \sum_{j=1}^{n} \left( V_{ij} \log \frac{V_{ij}}{\Lambda_{ij}} - (V - \Lambda)_{ij} \right) \tag{3.29}$$

A multiplicative update algorithm can be derived for either cost function [4, 19]. Note that
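The convolutive signal model of section 3.2.6 (a sum over P matrices W(p), each multiplied by a copy of H whose columns are shifted p positions to the right) can be illustrated with a small pure-Python sketch. It is not the openBliSSART implementation; the names `shift_right`, `matmul` and `nmd_approximation` are hypothetical, and matrices are represented as lists of rows for simplicity.

```python
def shift_right(H, p):
    """Shift the columns of H by p positions to the right, zero-filling the left."""
    n = len(H[0])
    return [[0.0] * min(p, n) + row[: max(n - p, 0)] for row in H]


def matmul(A, B):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def nmd_approximation(W_list, H):
    """Compute Lambda = sum over p of W(p) times (H shifted right by p).

    W_list[p] is the m x r matrix W(p); H is the r x n activation matrix.
    """
    m, n = len(W_list[0]), len(H[0])
    Lam = [[0.0] * n for _ in range(m)]
    for p, W in enumerate(W_list):
        term = matmul(W, shift_right(H, p))
        for i in range(m):
            for j in range(n):
                Lam[i][j] += term[i][j]
    return Lam


# One component (r = 1) whose spectrum moves from bin 0 to bin 1 over P = 2 frames.
W_list = [[[1.0], [0.0]],   # W(0)
          [[0.0], [1.0]]]   # W(1)
H = [[1.0, 2.0, 3.0]]
print(nmd_approximation(W_list, H))
```

In this toy example, each activation in H triggers energy in the first frequency bin immediately and in the second bin one frame later, which is the time-evolving spectrum the convolutive model is meant to capture.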
62. selected component, for example to inspect the results of the NMF procedure, make sure that the "Preview" checkbox is enabled. Once the preview is available, you can listen to the component, move around, and zoom in and out within the respective signal data by using the corresponding buttons inside the preview area.

While it is possible to assign labels to each component individually using the checkboxes on the right-hand side, for our scenario it is much more convenient to select all components that were created from music files (remember how many there were), then right-click to open the context menu and use the "Select label(s)" item (see figure 2.9).

[Figure 2.9: screenshot of the entities tree listing NMF components, all marked "Unlabeled"]
63. t.features.magnitudematrix.mfcc: Whether to compute MFCCs. MFCCs are sampled at a given number of equidistant frames, which can be modified by the blissart.features.magnitudematrix.mfcc.frame_count option (default: 5).

• blissart.features.magnitudematrix.mfccD: Whether to compute delta coefficients, using the regression procedure described in the HTK book [7].

• blissart.features.magnitudematrix.mfccA: Whether to compute delta-delta ("Acceleration") coefficients, using the regression procedure described in the HTK book [7].

• blissart.features.magnitudematrix.mean_mfcc: Whether to compute the mean of each MFCC (and possibly its regression coefficients) over the whole signal.

• blissart.features.magnitudematrix.stddevmfcc: Whether to compute the standard deviation of each MFCC (and possibly its regression coefficients) over the whole signal.

The following options control feature extraction from spectra:

• blissart.features.spectrum.mean_mfcc: For NMF, these are simply the MFCCs. For NMD, this option indicates whether to compute the mean of each MFCC (and possibly its regression coefficients) over the whole signal.

• blissart.features.spectrum.stddev: Whether to compute the standard deviation.

• blissart.features.spectrum.centroid: Whether to compute the spectral centroid.

• blissart.features.spectrum.rolloff: Whether to compute spectral rolloff.

• blissart.features.spectrum.noiselikeness: Whether to compute noise-likeness [22]. b
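The delta and acceleration options above refer to the regression procedure of the HTK book. That regression formula computes each delta as a weighted difference of neighbouring frames, d_t = sum over θ of θ(c_{t+θ} - c_{t-θ}) divided by 2 times the sum of θ². The Python sketch below illustrates this formula for a single coefficient track; it is not openBliSSART code, the function name `delta_coefficients` is hypothetical, and both the window size of 2 and the replication of the first and last frame at the boundaries are assumptions that may differ from what openBliSSART does.

```python
def delta_coefficients(coeffs, window=2):
    """Regression-based delta coefficients for one coefficient track.

    Assumptions (not confirmed by the manual): a window of 2 frames on
    each side, and boundary frames replicated at the start and end.
    """
    n = len(coeffs)
    denominator = 2 * sum(theta * theta for theta in range(1, window + 1))
    deltas = []
    for t in range(n):
        accumulator = 0.0
        for theta in range(1, window + 1):
            ahead = coeffs[min(t + theta, n - 1)]   # replicate last frame
            behind = coeffs[max(t - theta, 0)]      # replicate first frame
            accumulator += theta * (ahead - behind)
        deltas.append(accumulator / denominator)
    return deltas


track = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
deltas = delta_coefficients(track)
# Acceleration coefficients are obtained by applying the same regression to the deltas.
accelerations = delta_coefficients(deltas)
print(deltas)
```

For a linearly increasing track, the interior deltas equal the slope, which is a quick sanity check for any implementation of the regression.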
64. the blissart.global.mel_bands global option.

• blissart.fft.transformations.slidingWindow: If set to true, applies a sliding window to the spectrogram, i.e. multiple columns (frames) are concatenated into a single column.

• blissart.fft.transformations.slidingWindow.frameSize: The frame size for the sliding window transformation, i.e. the number of columns to concatenate for each output column. Default is 10.

• blissart.fft.transformations.slidingWindow.frameRate: The frame rate for the sliding window transformation, i.e. the number of columns to skip between subsequent concatenations.

4.9.4 Separation

Options for the separation process can be specified in the configuration file blissart.properties.

• blissart.separation.notificationSteps: The number of iteration steps after which a notification is generated, i.e. the progress bar is updated in the septool and Browser applications. Default is 25. Setting this number to a low value may result in performance loss for small input files, whereas raising it to a high value prevents any progress from being seen over long periods of time.

• blissart.separation.export.format: One of bin, htk or gnu, for openBliSSART binary matrix format, HTK format or Gnuplot format, respectively. This option has an effect on the separation tool with the export matrices option enabled.

• blissart.separation.storage.phasematrix: true (default) if the separation tool should store t
65. ther use of the framework's storage and/or classification components.

General

• --help: display information about command line parameters and exit.

• --as-wave: output the results as WAVE audio files, which is also the default.

• --as-arff: output the results as Weka ARFF files.

• --prefix=<prefix>: the prefix to be used for the output files. The file names are composed as <prefix><nr>.<format>, where <nr> is the number of each separated source and <format> corresponds to the chosen output file format.

Separation

• --nsources=<x>: the number of sources to be separated. Must be greater than one and less than or equal to the number of input files.

• --force: in case of varying lengths of the input signals, extends shorter input signals by their expected values instead of aborting.

• --prec=<x>: the desired precision for the projection of the components. Must be a real value greater than 10^[…] (exponent not legible in this copy). Defaults to 10^[…].

• --max-iter=<x>: the maximum number of iterations per component for FastICA. Applies only if the desired precision has not been achieved before reaching this limit.

4.8.1 Usage Examples

• icatool --prefix=foo mix31.wav mix32.wav mix33.wav
Performs ICA on the given input files and outputs the results as WAVE audio files with the names foo1.wav, foo2.wav and foo3.wav.

• icatool --as-arff --prefix=baz mix4[1-4].wav
Performs I
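The behaviour described for the force option of the ICA tool (extending shorter input signals by their expected value) can be pictured as padding each signal with its own mean. The sketch below is an illustration in Python under that reading, not code from icatool; the function name `extend_to_common_length` is hypothetical, and estimating the expected value by the sample mean is an assumption.

```python
def extend_to_common_length(signals):
    """Pad every signal to the length of the longest one.

    Assumption: the "expected value" of a time signal is estimated by
    its arithmetic mean, and the padding is appended at the end.
    """
    target = max(len(signal) for signal in signals)
    extended = []
    for signal in signals:
        mean = sum(signal) / len(signal)
        extended.append(list(signal) + [mean] * (target - len(signal)))
    return extended


print(extend_to_common_length([[1.0, 3.0], [1.0, 2.0, 3.0, 4.0]]))
```

Padding with the mean leaves the mean of each signal unchanged, so the centering step that typically precedes ICA is not biased by the padding.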
66. tion

3.1.2 Storage of binary files

Binary files corresponding to data objects (i.e. vectors and matrices) are stored in a directory layout such that the file name can be uniquely determined by the attributes of the corresponding data descriptor. All multi-byte values are saved in little-endian order.

Our binary file format for vectors consists of the following elements:

• Orientation header (0 = row vector, 1 = column vector) as 32-bit unsigned int
• Vector dimension (32-bit unsigned int)
• Array of components (64-bit double)

Our binary file format for matrices consists of the following elements:

• Matrix header (2) as 32-bit unsigned int
• Number of rows (32-bit unsigned int)
• Number of columns (32-bit unsigned int)
• Array of matrix entries (64-bit double) in column-major order, i.e. entry $a_{ij}$ of a matrix with $m$ rows is stored at position $j \cdot m + i$

3.2 Source separation by NMF

Non-Negative Matrix Factorization (NMF) is an algorithm originally proposed for image decomposition [1]. As a method of information reduction, its most prominent feature is the usage of non-negativity constraints: unlike other methods such as Principal Components Analysis, it achieves a parts-based representation in which components are combined only additively, never subtractively.

Given a matrix $V \in \mathbb{R}_+^{m \times n}$ and a constant $r \in \mathbb{N}$, non-negative matrix factorization (NMF) computes two matrices $W \in \mathbb{R}_+^{m \times r}$ and $H \in \mathbb{R}_+^{r \times n}$, such that

$$V \approx WH \tag{3.1}$$
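The binary layouts listed in section 3.1.2 are simple enough to read and write with Python's standard `struct` module. The following is a sketch written from the description above, not the openBliSSART I/O code; the function names are hypothetical, and zero-based indices are assumed for the column-major position j·m + i.

```python
import io
import struct


def write_vector(stream, values, column=True):
    """Write orientation header (0 = row, 1 = column), dimension, then doubles."""
    stream.write(struct.pack("<II", 1 if column else 0, len(values)))
    stream.write(struct.pack(f"<{len(values)}d", *values))


def read_vector(stream):
    """Return (is_column_vector, values) read from the stream."""
    orientation, dimension = struct.unpack("<II", stream.read(8))
    if orientation not in (0, 1):
        raise ValueError("not a vector file")
    values = list(struct.unpack(f"<{dimension}d", stream.read(8 * dimension)))
    return orientation == 1, values


def write_matrix(stream, rows):
    """Write matrix header (2), row count, column count, then column-major doubles."""
    m, n = len(rows), len(rows[0])
    stream.write(struct.pack("<III", 2, m, n))
    flat = [rows[i][j] for j in range(n) for i in range(m)]  # column-major order
    stream.write(struct.pack(f"<{m * n}d", *flat))


def read_matrix(stream):
    """Return the matrix as a list of rows."""
    header, m, n = struct.unpack("<III", stream.read(12))
    if header != 2:
        raise ValueError("not a matrix file")
    flat = struct.unpack(f"<{m * n}d", stream.read(8 * m * n))
    # Entry (i, j) is stored at position j * m + i (zero-based, assumed).
    return [[flat[j * m + i] for j in range(n)] for i in range(m)]


buffer = io.BytesIO()
write_matrix(buffer, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
buffer.seek(0)
print(read_matrix(buffer))
```

The "<" prefix in the format strings selects little-endian byte order without padding, matching the statement that all multi-byte values are little-endian.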
67. ts such as components generated by the NMF, their features and class labels, while the object data itself is externalized to binary files.

Generally, when processing audio files (e.g. by FFT and/or NMF), openBliSSART saves information about the separation process, such as the name of the input file, the number of components, the STFT parameters etc., in a respective process entity. Furthermore, the computed objects, such as NMF components, are saved as classification objects. Each classification object consists of one or more data descriptors, which describe data like spectral vectors or phase matrices.

(Footnote 1: The separation process can also be run in a "volatile" mode that does not store anything. This is useful, for example, if the result of an NMF separation should be output as WAV files. See section […] for details.)

Classification Objects

openBliSSART currently creates and handles the following types of classification objects:

• NMD component: generated by applying STFT and NMD (or NMF) to an audio file.
• Spectrogram: generated by applying STFT to an audio file.

Data Descriptors

The following types of data descriptors are used:

• Magnitude matrix: the magnitude spectrogram of an audio file.
• Phase matrix: the phase spectrogram of an audio file.
• Spectrum: a magnitude spectrum generated by NMF or NMD from an audio file (a vector in case of NMF, or a matrix in case of NMD).
• Gains: a gains vector
68. ues of one feature have the given mean μ and standard deviation σ (by default, μ = 0, σ = 1); none for no scaling.

• blissart.classification.scaling.lower: lower bound of the scaling interval, if blissart.classification.scaling.method is set to minmax.

• blissart.classification.scaling.upper: upper bound of the scaling interval, if blissart.classification.scaling.method is set to minmax.

• blissart.classification.scaling.mu: desired mean of the feature values, if blissart.classification.scaling.method is set to musigma.

• blissart.classification.scaling.sigma: desired standard deviation of the feature values, if blissart.classification.scaling.method is set to musigma.

4.9.7 Browser

The browser configuration file, browser.properties, contains options for the audio file preview and the default settings for importing audio files. The options are listed below.

• browser.featureExtraction.numThreads: the default number of threads to use for feature extraction. Default: 1.

• browser.mainwindow.height: stores the height of the browser window. Default: 768.

• browser.mainwindow.isMaximized: stores whether the browser window is maximized. Default: false.

• browser.mainwindow.width: stores the width of the browser window. Default: 1024.

• browser.preview.alwaysEnabled: indicates whether the audio preview should be enabled by default. Default: true.

• browser.preview.normalizeAudio
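The two scaling methods named in the classification options, minmax and musigma, correspond to two common ways of normalizing the values of one feature. The Python sketch below illustrates them; it is not taken from openBliSSART, the function names are hypothetical, and the use of the population standard deviation as well as the handling of constant features are assumptions.

```python
def minmax_scale(values, lower, upper):
    """Map the values of one feature linearly onto the interval [lower, upper]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant feature is mapped to the lower bound.
        return [lower] * len(values)
    scale = (upper - lower) / (highest - lowest)
    return [lower + (value - lowest) * scale for value in values]


def musigma_scale(values, mu=0.0, sigma=1.0):
    """Shift and scale one feature so that it has mean mu and standard deviation sigma."""
    count = len(values)
    mean = sum(values) / count
    # Assumption: population standard deviation (division by count).
    std = (sum((value - mean) ** 2 for value in values) / count) ** 0.5
    if std == 0.0:
        return [mu] * count
    return [mu + sigma * (value - mean) / std for value in values]


feature = [0.0, 5.0, 10.0]
print(minmax_scale(feature, 0.0, 1.0))
print(musigma_scale(feature))
```

Scaling of this kind matters for support vector machines because features with large numeric ranges would otherwise dominate the kernel computation.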
