Home
        Two-Dimensional Fourier Processing of
         Contents
1.        Analysis Mode  Rhythmic    232        Raster Image Width  One eighth note      Frequency Mode  Single Point       Threshold  25    Remove  Above Threshold    e simple120_eighthwidth_single_25below  wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One eighth note      Frequency Mode  Single Point      Threshold  25       Remove  Below Threshold    e simple120_eighthwidth_single_55below  wav    Source file  simple120 wav        Analysis Mode  Rhythmic    Raster Image Width  One eighth note        Frequency Mode  Single Point    Threshold  55       Remove  Below Threshold  e trumpetG3_aud_25above wav      Source file  trumpetG3 wav      Analysis Mode  Timbral      Frequency Mode  Audible      Threshold  25       Remove  Above Threshold    e trumpetG3_aud_63below  wav    233        Source file  trumpetG3 wav    Analysis Mode  Timbral        Frequency Mode  Audible    Threshold  63       Remove  Below Threshold  e trumpetG3_rhyth_20above wav      Source file  trumpetG3 wav      Analysis Mode  Timbral      Frequency Mode  Rhythmic      Threshold  20       Remove  Above Threshold    e trumpetG3_rhyth_20below  wav    Source file  trumpetG3 wav      Analysis Mode  Timbral       Frequency Mode  Rhythmic      Threshold  20        Remove  Below Threshold    e trumpetG3_single_9above wav    Source file  trumpetG3 wav      Analysis Mode  Timbral       Frequency Mode  Single  Threshold  9         Remove  Above Threshold  e trumpetG3_single_20b
2.        Raster Image Width  One quarter note  Row Shift  4        Wrap rows around  e simple120_eighthwidth_shift4wrap wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One eighth note  Row Shift  4        Wrap rows around  e simple120_quartwidth_shiftiremove wav        Source file  simple120 wav    Analysis Mode  Rhythmic        Raster Image Width  One quarter note    Row Shift  1      Remove original rows   e simple120_quartwidth_shift4leave wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One quarter note      Row Shift  4    229        Leave original rows  e simple120_quartwidth_shift4wrap wav        Source file  simple120 wav    Analysis Mode  Rhythmic      Raster Image Width  One quarter note      Row Shift  4      Wrap rows around  e simple120_quartwidth_shift6wrap wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One quarter note      Row Shift  6      Wrap rows around    e trumpetG3_shift20_leave wav    Source file  trumpetG3 wav      Analysis Mode  Timbral    Row Shift  20        Leave original rows   e trumpetG3_shift40_remove wav      Source file  trumpetG3 wav      Analysis Mode  Timbral    Row Shift  40        Remove original rows    Stretch Rhythm  These audio files demonstrate the results of processing by changing the rhythmic frequency    range     230    e simple120_eighthwidth_Opt4 wav    Source file  simple120 wav        Analysis Mode  Rh
3.       below_thresh       Boolean       true       Determines whether the threshold re   moves components below the threshold  value  if true  or above the threshold    value  if false         Table 5 3  2D Magnitude Threshold Parameters Set in create_thresh    129        lt Student Version gt    Threshold    Thresnod 1Yp     Rhyth  Freq   Aud  Freq   Single Freq    Remove    Below   Above Threshoki  47     Bypass    Figure 5 4  2D Magnitude Thresholding GUI    5 3 3 Running the Magnitude Thresholding    The magnitude thresholding process is run in process_thresh  which calculates the new  FT data frame by frame  For each frame  the magnitude and phase components are first  extracted from the FT array using fftshift  abs and angle  Then a switch   case     statement is used to determine the type of thresholding to be applied  given by the ttype  variable  For each thresholding type the maximum magnitude must be calculated using  Matlab   s max function  When thresholding by only rhythmic or audible frequency the sum  of magnitudes of each the row column is required first  The actual magnitude value of  the threshold can then be calculated from this maximum and the value of thresh  The  magnitude value of each component is compared with the threshold value and set to zero  if above below depending on the below_thresh setting  The thresholding of rhythmic fre     quency is shown in listing 5 17 as an example           totals   sum rot90 mag       max_tot   max totals    thresh_v
4.     2 1 2 Fast Fourier Transform   6  lt 6 aa ee ed ie we eee ee ee  2 1 3 Two dimensional Discrete Fourier Transform                Frequency Domain Audio Processing               0  20004   2 2 1 Time Frequency Representation of Audio                  2 2 2 Short Time Fourier Transform       use baw ee ea eee  2 230 Window FIMO IONS  sins s i e oaar paa AO e e oa a a hen  2 2 4 Resynthesis Methods     aa aa aaa ceca ak ES ok EA  220   Phas   Vocoder esaten Genga ia ee Denia a eae Ge uE A  2 2 6 Other Analysis Resynthesis Tools       o 00a aa  2 2 7 Two Dimensional Fourier Analysis of Audio                2 2 8 An Application of Two Dimensional Fourier Analysis of Audio      Tmase PrOCESSINE sretan eae WE Ae ETE eS a ee oS aa  2 3 1 Two Dimensional Fourier Analysis of Images                2 3 2 Frequency Domain Image Processing                     Audio Feature Extraction   ga a  4 tne ovate Gea eee eee Re  Graphical Representations     542 4 eas cesar  aoa Gi arm Kido aok Bhd    200k  Raster SCAM  t aoed e tg de  down Wk lenis ek ee Ge HA Bes HA    ZiOue  OPEGITOL TANIS  e e E ye  Abs ea Ree EO a BOR Mee  ROMs Sd ae os 29    2 5 3  Synthesisers  amp  Sound Editors with Graphical Input           29  2 6 Matlab Programming Environment       ooa a 29  Introduction to the Matlab Tool 31  3 1 Requirements of the Software Tool     ooo a a 0 00 0 00 000   31  3 1 1 Signal Representations       sc  dws a ww 6 EA ee BO 31  312 Analysis Tool 9 hins A bak a ra ine Sica na E a 32  3 1
5.     WUT A  MANION       UUE EELE       WEEE        a  Correct Tempo  120 bpm   Raster Image Width  22050    samples    Whee MOAT AL ate  IL      WE MM Veeck  WL yey DP  ai EEEE E    a MNN    WERE RR EES JTLT      MULE    ii WUE EEI  WILLE E E E        b  Incorrect Tempo  122 bpm   Raster Image Width   21689 samples    Figure 6 3  Determining Correct Tempo In The Raster Image    The mirtempo function correctly calculated the tempo of the audio signal as 120 bpm   One indication that this is the correct value is the tempo based duration in the analysis  settings GUI  The loop is known to contain exactly four bars so when the duration is  displayed as 16 4 beats  the tempo must be set correctly  Figure 6 3a shows the raster    image for this audio signal  loop120 wav  at the correct tempo with a row size set to 1 4    157    beats  a quarter note   The image clearly displays vertically aligned periodicity due to the  repetitive elements of the drum rhythm  When the tempo is adjusted to 122 bpm in figure  6 3b  the drum hits are no longer vertically aligned but skewed and as a result the raster    image gives a much less intelligible representation of the signal     When the tempo is correctly set  the 2D Fourier spectrum representation is clear  figure  6 4a   Some rows of the spectrum display clearly display many points of large magni   tude  These rows correspond to rhythmic frequencies that are prominent within the drum  rhythm  Other rows have much lower average magnitudes  
6.    any signal frame     119       Variable    Data Type    Default Value    Description          type    String       Filter       The name of the process  obtained from  the proc_names array  Used to identify    the process as a filter        bypass    Boolean    false    Determines whether or not the filter  process should be bypassed when the    chain of transformations is run        ftype    String       LP       The filter type     LP        HP        BP    or     BS           rhythmic_mode    Boolean    true    Determines which axis of frequency is  filtered  The rhythmic frequencies are  filtered when true and audible frequen     cies when false        keepDC    Integer    Indicates what DC component  if any   should be preserved  Either no data is  preserved  0   the single point that is  DC in audible and rhythmic frequency   1   all audible frequencies with DC  rhythmic frequency  2  or all rhythmic    frequencies with DC audible frequency     3         cut    Boolean    true    Determines whether filter cuts the stop     band or boosts the pass band        ideal    Boolean    true    Determines whether the filter   s fre   quency response is ideal or Butter     worth           order       Integer          The order of Butterworth filter used  to define the frequency response when    ideal   false        Table 5 1  2D Spectral Filter Parameters Set in create_filter    120       5 2 2 Calculating Filter Properties Based On The 2D Fourier    Spectrum    The function c
7.   26   The same technique has been implemented in the 2D Fourier  spectrum  The audible frequency bin values are rescaled by a linear factor according to the  desired pitch change  and the data is then resampled to get this scaled data at the original    audible frequency values     5 7 1 Pitch Shift Parameters    The pitch shift process only has one parameter  shift  which gives the required pitch shift  in semitones  The value of shift can be any integer  positive or negative  and the process  will rescale the audible frequency data by the linear factor corresponding to this shift  The  initial setting for shift is 0  defined in the function create_pitch_change which initialises    the pitch shift process     137    5 7 2 Adjusting the Pitch Shift Process    The function adjust_pitch_change allows the user to set the value of the shift parameter  using a text edit object in the GUI shown in figure 5 7  The callback for this text edit  object rounds the numerical input and stores the value in shift  and corrects the text edit    string      lt Student Version gt    Pitch Shift    Pitch snitt  semitones   7    Bypass    Figure 5 7  Pitch Shift GUI    5 7 3 Running the Pitch Shift Process    The function proces_pitch_change performs the operations shown in listing 5 20 for each  2D spectrum frame  The linear scaling factor  change variable  for the desired pitch change    is calculated from the shift value using the following equation     change   1 2Sbi t 12  5 10     The dat
8.   E  Woods  Digital Image Processing  Prentice Hall  3rd edition   2008     R  Kirk and A  Hunt  Digital Sound Processing for Music  amp  Multimedia  Focal Press   1999     J  Laroche and M  Dolson  New phase vocoder techniques for pitch shifting  harmo   nizing and other exotic effects  IEEE Workshop on Applications of Signal Processing  to Audio and Acoustics  pages 91 94  1999     O  Lartillot and P  Toiviainen  A matlab toolbox for musical feature extraction from  audio  In Proceedings of the 10th International Conference on Digital Audio Effects   2007     O  Lartillot  P  Toiviainen  and T  Eerola  Mirtool   box  Department of Music  University of Jyvaskyla   http   www jyu fi hum laitokset  musiikki en research coe materials mirtoolbox     R  McAulay and T  Quatieri  Speech analysis synthesis based on a sinusoidal represen   tation  IEEE Transactions on Acoustics  Speech and Signal Processing  34 4  744   754   1086     J  A  Moorer  The use of the phase vocoder in computer music applications  Journal  of the Audio Engineering Society  26 1 2  42   45  1978     C  Penrose  Chapter 2  Spectral representations  Taken from incomplete thesis at    http   silvertone princeton edu  penrose thesis   2008     J  O  Pickles  An Introduction to the Physiology of Hearing  Academic Press  2nd  edition  1988     202    25    26    2    28           31      32      33      34     35    36          37     38      39     W  Pratt  Digital Image Processing  Wiley Interscience  2nd edit
9.   When in rhythmic  or tempo synced analysis mode  the user is only presented with tempo based frame row  size options  however the underlying frame_size_sec variable is used to store the frame  size in terms of seconds whatever mode the analysis is in  If the tempo or the note nu   merator or denominator are changed then the actual frame size is recalculated using the  update _framesize function  There is also a display of the number of frames that will be  produced by the analysis options currently chosen  which is updated whenever the frame    size is changed     4 6 Analysis Implementation    Once the analysis options have been chosen  the analyse_audio function proceeds to obtain  the 2D signal representations  The process of obtaining the raster image varies according  to the analysis mode  and in rhythmic mode  the synchronisation option also determines  between two processes  The implementation of each of the three options is described in    detail in this section     The first step in analyse_audio  before the raster image can be obtained  is to get the  frame size parameter in terms of samples and store it in the analysis structure as  frame_size  The following equation is used to extract the frame size in samples from    the frame_size_secs variable     83       frame_size   round frame_size_secs   F    4 21     Where F  is the sampling rate     4 6 1 Timbral Mode    In timbral mode  the audio signal must first be divided into frames  This is done using the  raster
10.   data is contained within a one dimensional array  The other representations  the raster  images and the Fourier data  are stored within cell arrays since there can be more than one  matrix of data for each  It is worth noting that the 2D frequency domain data is stored  in the complex representation returned by fft2 not the conventional polar representation   this is to allow flexibility in the display of this data  Each cell will contain a 2D array of    data and this can be of any size  which is the main benefit of using cell arrays in this way     On the right hand side of figure 3 4 are four structures containing parameters concerning    the signal and its representations  they are described in table 3 2     41       Structure Name    Description          audio_settings    Contains the sample rate and bit depth parameters  obtained from  the source file with wavread  and also the duration in seconds of  the audio data        analysis_settings    This structure contains the direct parameters of the initial analysis    options menu  section 4 5         analysis    The subsequent parameters from the initial analysis that define  the 2D representations  including frame size and number of frames     pitch related information and image width        spec2D settings          This structure contains the parameters concerning the conversion  between raster image and 2D spectrum  The amount of padding    applied and the maximum magnitude of the Fourier data        Table 3 2  Descr
11.   gt  Se  Audible Frequency Axis y    Figure 5 8  Rhythmic Frequency Range Compression  Expansion    5 8 1 Rhythmic Frequency Stretching Parameters    The function create_spec_stretch is used to initialise this rhythmic frequency stretching  process  As with the pitch shifting process  there is only one adjustable parameter which  corresponds to the linear factor by which the frequency range is adjusted  However in  rhythmic frequency stretching this variable  named amount  is the precise value of range  adjustment  there is no intermediate equation before running the process  The amount  variable is initially set to 0 in create_spec_stretch  it can be set to any positive number    within the range of Matlab   s double precision floating point data type     5 8 2 Adjusting the Rhythmic Frequency Stretching Process    The GUI window shown in figure 5 9 is created using adjust_spec_stretch  It allows the  user to set the value of the rhythmic stretching process    amount parameter using a text  edit object  If a negative value is entered then its absolute value is stored in amount and    the text edit object   s string is corrected accordingly     141     lt Student Version gt    Str      stretch Factor  2 35    Bypass    Figure 5 9  Rhythmic Frequency Range Stretching GUI    5 8 3 Running the Rhythmic Frequency Stretching Process    Rhythmic frequency range compression  expansion is performed by resampling the columns  of the 2D Fourier data array  adjusting the range of frequ
12.   introducing extraneous frequency components     The most rudimentary window to use to extract a short section of a signal is a rectangular  function  however this is likely to introduce discontinuities in the signal and will provide  a distorted analysis  which is represented by high magnitude side lobes in the window   s    frequency response     A smooth bell shaped curve  such as a Gaussian function  will prevent spectral distortion  due to discontinuities at the edges  It   s frequency response has much lower side lobes   however a smoother curve also has a wider main frequency lobe  This means that the    frequency resolution will be effectively reduced by the window   s frequency response  26      There are many different window functions commonly used  all with different spectral  response characteristics  The MATLAB Signal Processing toolbox provides a window  design and analysis tool that provides a detailed analysis of many common window types   The window function must be chosen according to the specific requirements of analysis or    processing     2 2 4 Resynthesis Methods    Resynthesis constructs a time domain signal from the analysis data  which may or may not  have been modified  There are a variety of different resynthesis techniques available  33    a lot depend on a specific type of analysis but some simple examples are oscillator bank    resynthesis and source filter resynthesis     The overlap add resynthesis method is the inverse of STFT and therefore
13.   yin function are obtained using these API routines and then yin is called     Listing 4 9 shows the gateway function of yin c  which determines the pointers and values  of the mxArray variables  so that they can be used in a call to the yin function  Note that  the empty Matlab array yin_vec is passed in as an argument to use in the YIN algorithm  calculations  The YIN algorithm itself is implemented as in  3  in the yin function which    uses the given pointers to store its results in the mxArray variables           the gateway function x      91       pertod_length   yin input_vec  tolerance  yin_vec         void mexFunction  int nlhs  mxArray  plhs          int nrhs  const mxArray  prhs        double xinput_vec  xyin_vec  tolerance   double  period_length   int yin_length        check for proper number of arguments     if  nrhs  3    mexErrMsgTxt     Three inputs required         if  nlhs  1    mexErrMsgTxt    One output required             check to make sure the first input argument is a scalar     if   mxIsDouble prhs 1      mxIsComplex prhs 1       mxGetN prhs  1     mxGetM prhs 1    1        mexErrMsgTxt   Input    tolerance    must be a scalar             get the scalar input tolerance       tolerance   mxGetScalar prhs 1           create a pointer to the input matrices     input_vec   mxGetPr prhs 0       yin_ vec   mxGetPr prhs 2          get the length of the yin vector     yin_length   mxGetM   prhs  2            set the output pointer to the output matrix x   
14.  1    handles cur_process size l         handles  cur_process min_quadrant _size 1l     end   set  handles resize_popup edit_height   String          num2str handles cur_process size 1l         calculate ndivs   handles cur_process ndivs   handles cur_process size 1  2    1          handles data spec2D_settings  height _pad     set  handles resize_popup edit_duration_beats     String            146       num2str handles cur_process ndivs       changed  handles       end       Listing 5 23  Dealing With A Change In Height Setting Using the height changed    Function    The function width_changed is called when the width or tempo values are adjusted  The  callback function for the width text edit object gets the numerical value entered and rounds  it to a positive integer to store in size 2  before calling width_changed  The callback  for the tempo text edit object gets the given numerical value and stores it in the pro   cess    tempo variable before calculating the spectrum quadrant width that most closely  approximates this tempo  This width is obtained by the following equations before calling    width_changed               240 x tempo_div_num   Fs  im_width   round  5 12   tempo   tempo_div_den   width   im_pad  size 2    ceil      5 ae    5 13     The width of the output raster image is calculated first using the tempo division  the  sample rate  and the current tempo value  then the output spectrum quadrant width  is determined from this  The width changed function perfo
15.  10 shows the use of mirtempo from within the    Calculate    button callback in  analysis_settings  The MIRtoolbox requires that the audio data is within a file on  disk  hence path of the source WAVE file is passed in as an argument  The functions of  MIRtoolbox return their results encapsulated within their own class type  so the data is    extracted using the provided mirgetdata function     4 8 Visual Analysis Tools    Several tools have been provided within the software to enable more thorough plot analysis     They are described in this section     93       4 8 1 Plot Zoom and Pan    The zoom and pan tools are provided in the default figure menu and toolbar layout to  allow the user to examine aspects of a plot display more closely  Matlab functions were  written to reproduce this default zoom and pan functionality  allowing these tools to be  incorporated into the 2D Fourier software whilst using a custom menu and toolbar layout   This was vitally important since each of the axes on the main GUI represent a large amount  of data that could not be fully observed at normal resolution  Matlab provides zoom and  pan    mode    objects that can be applied to a particular figure and allow the behaviour of    these tools to be customised using settable properties     The tools can be horizontal  vertical or unconstrained as in a default Matlab figure and  the zoom tool can go in or out  These options are available to the user through the    Plot  Tools    menu  and the callb
16.  2 Hz   Bandwidth  0 125 Hz   Frequency response  2      Order Butterworth    Cut  boost  Boost    e loop120_eighthwidth_HP_1Hz_ideal_cut_DCrhyth wav    Source file  loop120 wav   Raster Image Width  One eighth note  Frequency Mode  Rhythmic   Type  High pass   Cutoff  1 Hz   Frequency response  Ideal   Cut  boost  Cut    DC rhythmic frequency row added as a pass band    e loop120_eighthwidth_HP_1Hz_ideal_cut wav    Source file  loop120 wav  Raster Image Width  One eighth note    Frequency Mode  Rhythmic    218        Type  High pass    Cutoff  1 Hz        Frequency response  Ideal    Cut  boost  Cut  e loop120_eighthwidth_LP_Opt5Hz_but04_cut wav      Source file  loop120 wav      Raster Image Width  One eighth note      Frequency Mode  Rhythmic      Type  Low pass      Cutoff  0 5 Hz      Frequency response  2   4 Order Butterworth      Cut boost  Cut    e loop120_quarterwidth BS_OHz_pt125_ideal_cut wav This filter removes the DC    rhythmic frequency row        Source file  loop120 wav      Raster Image Width  One quarter note      Frequency Mode  Rhythmic      Type  Band stop      Cutoff  0 Hz      Bandwidth  0 125 Hz      Frequency response  Ideal      Cut boost  Cut   e loop120_quartwidth_LP_Opt5Hz_but04_cut wav      Source file  loop120 wav      Raster Image Width  One quarter note        Frequency Mode  Rhythmic    219    Type  Low pass  Cutoff  0 5 Hz  Frequency response  4    Order Butterworth    Cut  boost  Cut    e simple120_eighthwidth_BP_1Hz_pt125bw wav    Source
17.  3 Signal Transformations         a  Wa a See ES ee 33  3 1 4 Additional Tools   ct x Ate aa 8 chet  oe dee el acts de 33  3 2 Software Development Process             0 00 eee ee eee 34  3 2 1 Analysis Phase        a lyese RS Go A Gee AS Ss Ee 34  3 2 2 Processing Phage  ai ireira wi tetera we OA kee ee Be 35  3 3 Graphical User Interface Design       ooa aa a ai ee ee 36  33  Mam Window Sesa d a eenas aar a ar a a ia i ei 36  3 3 2 Additional Windows     ute duc  a a Mes eae 6 HA Bee s Hn 39  3 4 Data Structures and Data Handle    a pate ee a Po Be Dee HA 39  Oa    he data  Struchire cis aa eos ee Gis SRE RS ae SEE ii 41  3 4 2 Callback Functions     oao aa eck SS eA 42  34 3    GUI Miterachion    lt i  6 2 bok Bik ek E Wo Oe ee 43  3 5 Menu and Toolbar Design sim ac eon oe So ae Goes Od 2 Ee PY oe 43  3 5 1 Implementation of Custom Layout                    45  3 6    Data  Input Outp  t ets ae  4 es eth te ote Ape 8 Patek Ce Re 45  3 6 1 Audio Import Export 3 4524 og coed ei brace oN aoe Hoe de ood 45  3 6 2 image and Spectrum Export     fc ke a dew ln ow we GEA Ge BO 47  3 6 3 Program Data Saving and Loading                    48  Bet  Audio Player ace ou i as BOR a hs a ee ER GR Rene  Se be e a 49  Two Dimensional Audio Analysis 52  AT  Analysis Process  suis n adui Mok MR ee RG AR Re le Boe ae dd 52  4 1 1 1D Fourier Spectrum 451556  ia 2 428  ont Onde Benak  ate th Bae aS 53  4 1 2 Automatic Normalisation of The Signal                  53  4 2    Raster Scanning ye do
18.  440 2   midi_note    69  12         Listing 4 8  Calculating Pitch Frequency From Musical Note In note2freq    The pitch frequency is calculated by using the MIDI note number to determine the ratio  with A3  note 69  at 440 Hz  The ratio between consecutive notes in the equal temperament    scale is 21 12  The period size is then determined using the freq2samples calculation     n  F  f  4 25     Where the symbols have the same definition as in equation 4 22  In timbral mode  the  image width is calculated by rounding the period value to the nearest integer  which is    then stored in imwidth     When the    Info     GUI window is closed  if changes have been made to the analysis data   the calc_image function is run to recalculate the raster image representation and then  spec2D_init is called to obtain the new 2D Fourier data  Finally display_data is called    to redisplay the signal and store the settings as GUI data for the main window     4 7 Feature Extraction    The audio signal analysis options incorporate the requirement for both pitch detection and    tempo estimation algorithms to obtain accurate 2D signal representations  This section    90       describes the implementation of a robust fundamental pitch detection algorithm  the Yin  algorithm   7   and the use of an autocorrelation algorithm to estimate tempo  provided by  the MIRtoolbox  19      4 7 1 Pitch Detection Using The Yin Algorithm    The YIN algorithm was implemented to detect the fundamental pitch pe
19.  Data Structures     sb Be py See che ee else an ne ee ga eo ey Shey  5 1 2 2D Spectral Processes Window Design                   5 1 3 2D Spectral Processes Window Functionality            2    5 1 4 Components of a Generic Transformation Process             5 1 5 Running The Processes    aoa boxe ete eee OB  5 1 6 Observing Original and Processed Data                   5 1 7 Recalculating Signal Properties When Analysis Settings Change      2D Fourier Spectrum Filtering   0 644 eck tA wae ea Se Ea  5 2 1 Filter Parameters        6S eck Gh a SCE oe HOSS ARS  5 2 2 Calculating Filter Properties Based On The 2D Fourier Spectrum    5 2 3 Adjusting Filter Parameters               000000008  5 2 4 Running the 2D Spectral Filter                       Magnitude Thresholding with the 2D Fourier Spectrum              5 3 1 Thresholding Parameters     2  dfets tne Wente ee eae GFK ee GA  5 3 2 Adjusting the Thresholding Parameters                   5 3 3 Running the Magnitude Thresholding                   2D Fourier Spectrum Rotation   442  eeck oS SRA RASA  5 4 1 Spectral Rotation Parameters               000000    5 4 2 Adjusting the Rotation       a2  5 30 ks One eee OG ee oN  5 4 3 Running the Spectral Rotation Process                   2D Fourier Spectrum Row Shift   oer oe ee tate Gee eek ok ES eR ES  5 5 1 Row Shift Parameters       4 5 eerie  ores Ge ark K band Rood    5 5 2 Adjusting the Row Shift Process       eon e sw eons Sal ee ae as 135    5 5 3 Running the Row Shift Pr
20.  From  jason physics harvard edu  Subject  Re  Complex to colour conversion  Date  27 February 2008 07 00 03 GMT    206    To  cwp500 york ac uk    The basic idea is to map the phase to hue and magnitude to lightness in HSL color space   Unfortunately Java has a routine only for HSV  so we have to calculate the saturation and    value   http   en wikipedia org wiki HSV_color_space    Annoying Note  Java calls it HSB where Java   s brightness is Wikiepedia   s value even    though Wikipedia says brightness is synonymous with lightness rather than value     It might help to look at the image of the HSV cylinder  Magnitudes less than R are fully  saturated and their value decreases to become blacker  Magnitudes greater than R have  full value  but their saturation decreases to make them whiter  R is adjusted by the slider   The other variables  bmax  bmin  Imax  Imin  never change and keep their simple 1 or 0  values  so I should really optimize them out  To map an infinite range of numbers  0 to    infinity  to a finite interval  the arc tangent function is very useful  or arccot    http    commons wikimedia org wiki Image  Atan_acot_plot svg    As for mapping the phase to hue  getting the angle from the  x y  coordinates is perfect  for the atan2 function  which keeps track of which quadrant you   re in  Taking atan y x   doesn   t work because if both are negative  for example  you get the same result as when    both are positive  For some reason I had to add 2pi or pure real
21.  If no processes are defined  the    View Original Sig   nal    button on the main GUI   s toolbar is selected and the    View Processed Signal    button  is deactivated  When a process is added  the processed result is displayed and the    View    Processed Signal    button is activated and selected     Section 5 1 1 described the use of the original_data and data structures to store the un   processed data and the currently displayed data  When the    View Original Signal    button    is selected  it   s callback function tb_viewOrigCallback is called  It is shown in listing 5 9           function tb_viewOrigCallback  hObject   eventdata    handles   guidata gef      if stremp get handles tb_viewProc      State      off      set  handles tb viewOrig    State       on        else  set  handles tb_viewProc     State       off        end   if handles  processes num_proc  gt  0  set  handles  process_button     Enable         off       handles data   handles  original_data   display_data handles      end    end       Listing 5 9  Display Original Unprocessed Signal Data    This function simply sets the correct state of the toolbar buttons and if transformation    114       processes exist  it disables the    Processing    button and copies original_data to data before    calling the display_data function to plot the unprocessed data     While the unprocessed signal data is being displayed  the representations of the processed  signal do not exist  Therefore when    View Process
22.  The legend plot is updated    accordingly when brightness and contrast parameters are adjusted     The complex unit circle data is stored in a 2D matrix of size 500x500  with any points  outside the circle having 0 value  This matrix is stored on disk in a MEX file and loaded  during the app _OpeningFcn before the main GUI appears  When the legend is plotted using  the plot_legend function  the complex matrix is converted to a RGB colour representation  by the same process as the FT data in calc_spec2D  although starting from the magN    variable     4 9 Resynthesis    Once a comprehensive analysis tool had been developed  the next stage of the investigation  was to perform signal transformations by manipulating the 2D Fourier data  This meant it  had to be possible to obtain the other signal representations from the 2D Fourier domain    representation  The resynth function was written to resynthesise the time domain data     First the maximum magnitude array  max_mag  must be updated for the new Fourier data   using the calc_spec2D function  The raster image representation can easily be obtained  using Matlab   s 2D inverse FFT function ifft2 upon the FT data  The result of ifft2 is  likely to produce a complex result but the imaginary component is due numerical rounding  errors and can simply be removed using Matlab   s real function  16   The width_pad and  height _pad variables in the spec2D_settings structure must be inspected for each frame  and the padded row and or c
23.  Using The Contrast Parameter as an Exponent    4 4 4 Zero Padding    In  16  it is recommended that the image be zero padded so that the dimensions are both  even numbers before performing the Fourier transform  This is to ensure that the same  fftshift function can be used for the forward and inverse processes  with odd dimensions  the forward and inverse processes are different  It is however more important for analysis  puposes that there is a DC component at the centre of the spectrum  which can only be  achieved with two odd dimensions  Therefore the raster images are padded with a single  row and or column of zeros  as necessary to obtain an odd numbered size  before the 2D  FFT is performed in spec2D_init  This requires use of Matlab   s padarray function  as    shown in listing 4 5        handles data spec2D_settings height_pad frame    1   mod size handles  data   image frame  1   2       76       handles data spec2D_settings width_pad frame    1   mod  size  handles   data   image frame    2   2     padded   padarray handles data image frame         handles data spec2D_settings  height_pad  frame   handles data spec2D_settings width_pad frame   0    post            Listing 4 5  Pre Padding The 2D Spectrum With padarray    This use of zero padding means that a reverse process for fftshift had to be written   Due to the specific requirements  the rev_fftshift function didn   t need to work for even  dimension sizes  It simply obtains the matrix size and moves the fou
24.  a spectrum identical to the original     5 5 2 Adjusting the Row Shift Process    The adjust_row_shift function allows the row shift options to be observed and adjusted  by creating the GUI shown in figure 5 6  There is a single push button component that  toggles between the three options described in section 5 5 1  displaying the following one of  the following three strings according to the option selected     Wrap Rows Around        Leave  Orig  Rows    and    Remove Orig  Rows     The values of the wrap and remove variables are    set appropriately for each option in the button   s callback function     The text edit object   s callback simply rounds the numerical input to the nearest integer  and stores it in shift  The text edit object   s string is then updated with the rounded    value      lt Student Version gt    Row Shift       snitt Spectrum Rows  4    Wrap Rows Around     Bypass    Figure 5 6  Rhythmic Frequency Row Shift GUI    5 5 3 Running the Row Shift Process    As with the other process_  functions  process_row_shift performs its operation on each  frame   s 2D spectrum separately  Listing 5 19 shows the code used to calculate the shifted  FT array for each frame of data  Equation 5 9 must be used to calculate the actual number  of rows to shift for each frame  since the height of the spectrum is variable  The function  uses two arrays oldFT and newFT to hold the Fourier data arrays of the input and output  respectively  If the remove variable is set to t
25.  and  attempt to analyse it  This highlighted more features of the application that needed devel   opment  and the usability of the tool was constantly being improved  The data plots were  enhanced  adding more flexibility and information  especially the 2D Fourier spectrum   The data structure underlying the software tool had to be restructured occasionally and    the GUI was developed and adjusted as new functionality was introduced     The focus of development then shifted to the initial analysis of the data to define the  settings for audio rasterisation  Prior to this it was necessary to be able to import raster  images and 2D spectra as image files as part of the investigation  however this function   ality was not required in the later stages  This pre processing stage was only designed to  analyse an audio input  Feature extraction methods were incorporated into the software  to automatically determine the raster image width  dependent upon the chosen analysis  method     Throughout this development  a variety of different audio signals were imported into the  software to test the analysis process and find areas for improvement  More complex func   tions were tested formally before being incorporated into the tool  the details of testing are  given in section 7  This experimentation led to further improvements to the initial signal  analysis and tools  At this point  loading and saving of application data was added  which    allowed the user to retain analysis setting
26.  cur_process   guidata handles  proc_name_popup  figure   handles     uiresume handles proc_name_popup  figure       end       Listing 5 7  The    Adjust    Figure Window   s CloseRequestFcn    112       This function stores the cur_process data in the process array at the index cur_process_nunm   overwriting the previous settings for the transformation process  The handles structure  is then stored as GUI data with the    adjust    figure and the uiresume function is called     allowing the execution of the adjust_  function to continue     Once the user has closed the figure window  the adjust_  function obtains the handles  structure using guidata  then deletes the figure and data concerned with the current  process  as shown in listing 5 6  The adjust_  function then returns the handles variable  with the process parameters stored in the data structure process proc_num  and the  Boolean variables process_changed which states whether any of the variables have been    altered     5 1 5 Running The Processes    As introduced in section 5 1 3  the parameters of a process can be adjusted via either the  cmenu_clicked or button_clicked callback  The relevant adjust_  function is called to  display a figure allowing the user to observe and set the process parameters  and further  program execution is halted until this figure has been closed  section 5 1 4   Only then  will the adjust_  function return the handles data structure to the calling function  Both  cmenu_clicked and bu
27.  easily observed  In a revised implementation  the filter would be implemented with    a variable amplitude increase     5 3 Magnitude Thresholding with the 2D Fourier Spec     trum    Magnitude thresholding allows spectral components to be separated according to their    magnitude by setting a threshold value below above which 2D spectrum points are re     127    moved  This process serves as a useful analysis tool for determining the level of influence    that the strongest  weakest spectral components have in defining the audio signal     5 3 1 Thresholding Parameters    The magnitude thresholding process features a threshold parameter defined as a percentage  of the spectrum   s maximum magnitude value  in order to make it intuitive to operate and  easy to implement  The process can either be applied to individual points within the spec   trum or the rows of rhythmic frequency or columns of audible frequency  using the sum of  their magnitude values  It also provides the option of removing components that are either  above or below the threshold value  The variables that define these parameters are given in  table 5 3  along with their initial default values as defined in the function create_thresh   These variables are stored in the data structure corresponding to a thresholding process    within the process array     5 3 2 Adjusting the Thresholding Parameters    The adjust_thresh function allows the parameters of a thresholding process to be viewed  and adjusted by cre
28.  file  simple120 wav   Raster Image Width  One eighth note  Frequency Mode  Rhythmic   Type  Band pass   Cutoff  1 Hz   Bandwidth  0 125 Hz   Frequency response  Ideal    Cut  boost  Cut    e simple120_eighthwidth_HP_1lpt5Hz_ideal wav    Source file  simple120 wav   Raster Image Width  One eighth note  Frequency Mode  Rhythmic   Type  High pass   Cutoff  1 5 Hz   Frequency response  Ideal    Cut  boost  Cut    e simple120_eighthwidth_LP_1Hz_ideal_cut wav    Source file  simple120 wav  Raster Image Width  One eighth note    Frequency Mode  Rhythmic    220        Type  Low pass    Cutoff  1 Hz        Frequency response  Ideal    Cut  boost  Cut   e trumpetG3_aud_BP_but02_800Hz_bw400_cut  wav      Source file  trumpetG3 wav      Frequency Mode  Audible      Type  Band pass   Cutoff  800 Hz        Bandwidth  400 Hz      Frequency response  2   4 Order Butterworth      Cut boost  Cut  e trumpetG3_BP_but02_8Hz_4bw_cut wav      Source file  trumpetG3 wav      Frequency Mode  Rhythmic      Type  Band pass  Cutoff  8 Hz        Bandwidth  4 Hz    Frequency response  Ideal    Cut  boost  Cut  e trumpetG3_BP_ideal_OHz_cut wav      Source file  trumpetG3 wav        Frequency Mode  Rhythmic    Type  Band pass      Cutoff  0 Hz    221        Bandwidth  0 5923 Hz      Frequency response  Ideal      Cut boost  Cut  e trumpetG3_LP_but02_5Hz_cut wav      Source file  trumpetG3 wav      Frequency Mode  Rhythmic      Type  Low pass      Cutoff  5 Hz      Frequency response  2   4 Order Butterwort
29.  horizontal and  vertical periodicities  the 2D Fourier spectrum displays four clear points  fr  fa        frs fa    fr      fa   and     f       fa   This signal is an amplitude modulated sinusoid where f  is the modulation     rhythmic  frequency and f  is the carrier  audible  frequency of the signal     65    Signal Components Not Aligned With The Raster Image Dimensions    In a more complex audio signal there will be frequency components that don   t fit the raster  image and therefore run at an angle such as the 2D sinusoids shown in figure 2 4  Signal  components such as this will have non stationary phase in both frequency axes leading to    spectral smearing as shown in figure 4 6a     Raster Image    2D Fourier Spectrurn           Frequency  Hz        100 120 140 160 180 200  Pixels     2500  2000  1500  1000  500    500 1000 1500 2000 2500    0  Frequency  Hz      a  Sinusoid With Incorrect Raster Image Width    Raster Image 2D Fourier Spectrum       Frequency  Hz         2500  2000  1500  1000  500 0 500 1000 1500 2000 2500  Frequency  Hz      b  AM Sinusoid With Incorrect Raster Image Width       Figure 4 6  Spectral Smearing In Signal Components Not Aligned To Raster Dimensions    An AM sinusoid with the incorrect raster width demonstrates a similar spectral smearing    with its 4 points skewed to create a line perpendicular to the direction of the wave  figure  4 6b      66    Effects of Rectangular Windowing    The rasterisation process applies a rectangular w
30.  increments rot_ind  times  using Matlab   s rot90 function  However  when a rotation of 90   or 270   is applied   the analysis settings of the signal have to be appropriately altered to reflect the swapped  dimensions of the 2D representation  The code that adjusts the analysis parameters of the    data is shown in listing 5 18          if rot_ind is 1 or 3 then analysis data has to change   if mod handles  processes  process proc_num  rot_ind  2   temp   handles data spec2D_settings  height_pad    handles data spec2D_settings height_pad          132       handles data spec2D_settings width_pad    handles data spec2D_settings width_pad   temp   if stremp handles data analysis_settings analysis_mode      timbral      for frame   1 handles data  analysis  num_frames  handles data analysis imwidth frame    size      handles   data  FT frame    2       handles   data  spec2D_settings  width_pad  frame    end  else    if rhythmic mode and not synced  handles data analysis imwidth   handles data  analysis  num_frames     temp   handles data analysis frame_size     handles data analysis frame_size   handles data  analysis  num_frames    handles data analysis num_frames   temp   handles data  analysis_settings frame _size_secs   handles data     analysis frame_size       handles data audio_settings Fs   handles data  analysis_settings tempo    240x     handles data analysis_settings tempo _div_num         handles data analysis_settings frame_size_secs       handles data  analysis_s
31.  is commonly  used to synthesise the STFT analysis data  The inverse DFT is applied to each frame of  the analysis data to obtain a series of short time signals  These waveforms can then be  added together with the same overlap as in the analysis stage to reproduce the original  time domain signal  The inverse of the STFT equation  2 11  for overlap add resynthesis  is    l 0 1     L   1    N 1  ie  x  lH  n   Aln v2 X  lv  Jei on ae  2 14   No Paes met eee    16    2 2 5 Phase Vocoder    The phase vocoder was invented in 1966  13  and first applied to music applications in  1978  22   it is now a popular tool in music signal processing because it allows frequency  domain processing of audio  It is an analysis resynthesis process that uses the STFT to  obtain a time varying complex Fourier signal representation  However  since the STFTs  analysis is on a fixed grid of frequency points  some calculation is required in order to  find the precise frequencies contained within the signal  The phase vocoder calculates the  actual frequencies of the analysis bins by converting the relative phase changes between  two STFT outputs to actual frequency changes  9   The phase values are accumulated over    successive frames to maintain the correct frequency of each analysis bin     The magnitude spectrum identifies that certain frequency components are present within  a sound but phase information contributes to the structural form of a signal  2   Phase  information describes the tempor
32.  mag       if ret_mag    ret   max_mag     70          else  magN   mag max_mag     add 1 before logarithm to prevent    inf value occuring    magLN   log2 1 magN     contrast        get the phase  phase   angle fftshift  FT        convert the polar representation to RGB colour representation    spec2D   polar2colour  magLN  phase   brightness   mode       ret   spec2D     end       Listing 4 3  Calculating the 2D Fourier Spectrum Display Data    Figure 2 3 demonstrates that magnitude information gives a more intelligible visual dis   play than the phase  allowing features of the raster image to be inferred  16   Hence the  first stage in development of the 2D Fourier spectrum plot was to display the magnitude  spectrum as a grayscale image  The image function is used to display the Fourier spectrum  on axes  as with the raster image  so the data must be in the range  0 1   The magnitude  data must therefore be normalised by dividing by the maximum magnitude value in the  matrix  Also  by scaling the magnitude data logarithmically the visual display gives much    more detail     4 4 1 Colour Representation of Polar Data    The data for the image function has to be a three dimensional  m by n by 3  array of RGB  colour values  for a grayscale display the three colour values should be equal  However  the design of the 2D Fourier spectrum display was extended  based on the techniques in   15   to include phase information by converting from polar data to RGB colour data   The fu
33.  max_aud_frame  depending on the value of the rhythmic_mode variable  This frame index is also used to  obtain the frequency array corresponding to the frame  allowing the image to be created   The image is returned from calc_filt and displayed on the axes within the filter figure  using Matlab   s image function  The rhythmic and audible frequency arrays are obtained    from the rhyth_freq and aud_freq variables and displayed along the axes of the plot     5 2 4 Running the 2D Spectral Filter    The filter process is run using the function process_filter  which creates an array of the  filter   s 2D frequency response and multiplies it  element wise  by the magnitude component  for each frame of 2D Fourier data  The processed 2D Fourier data array  FT  is then  calculated by combining the new magnitude component with the phase component to    obtain a complex representation by the equations     R  FT     FT  cos  Z  FT    5 1    S  FT     FT  sin  Z  FT    5 2   The calc_filt function requires three input arguments     e process_data   The filter process    structure  as stored in the process array  providing    the parameters and settings of the filter     e filt_image   An empty 2D array of the correct size to take the filter   s frequency    response data     124    e freq array   A 1D array of the frequency values at each point along the axis of    filtering     In order to make calc_filt generic for any filter  the filter function is always defined along  the columns of
34.  numbers didn   t work   phi   2 0 x Math PI   Math atan2 im  re     The only other complication with the hue was that going from angle directly to hue spends  too much time on red  green  and blue  and transitions through yellow  cyan  and magenta  too fast  I added some small amount of a sin going at 3x the speed so that the hue spends  more time in the transition regions to make the map smoother  compare the Wikipedia  HSV cylinder to the rainbow button on the 2D FFT Applet     hue   phi   Math sin 3 0 x phi  5    This description has been for the MAG_ PHASE mapping  When you look at only the real   imaginary  magnitude  or phase info  you can simplify the mapping  When looking at just  the magnitude  I go straight to grayscale  When look at just the phase  I calculate the hue    as before  but set the saturation and brightness to full     207    I pasted this email as a big comment on the top of the ComplexColor java file  which you    can rename ComplexColour if you really must         What do you mean by 2D Fourier representation of audio  You must mean something like  taking a second at a time and calculating the 1D FFT for just that second  then taking  the next second  There   s a name for this kind of plot  but I can   t think of it  It   s NOT a  2D FFT  though  Here   s an example where rainbow colors map to magnitude and phase    info is neglected   http    note sonots com  SciSoftware 2Fstft  Jason    Chris Pike wrote    Hi Jason    I   m doing my MEng project a
35.  of    ones into the filt_image array  initially all zeros  rather than calculating one row at a    125       time  The ideal low pass implementation is shown in listing 5 15  Matlab   s find func   tion is used to obtain the index of the last frequency point in freq_array that is lower  than the cutoff frequency  determining the size of the array of ones that is inserted into  freq_image  centred around 0 Hz  The implementation of the high pass filter is similar al     though two arrays of ones are required  placed at the top and bottom of the filter_image           ind_cut   find freq_array  lt  process_data cutoff  1    last       num_steps   ind_cut     centre   filt_image   centre   num steps  ind_cut       ones  2 num _steps l          size filt_image  2          Listing 5 15  Implementation of an Ideal Low Pass Filter    The implementation of the Butterworth filtering is quite different to ideal filtering  The  filter function is defined by an equation that takes in the freq_array and an equally sized  array cut_array with every value set to cutoff  The result is a 1D array of the filter   s  response for each frequency in freq_array  that can then be inserted into every column  of the filt_image array  The function for the Butterworth low pass filter is given by  equation 5 3  the high pass filter by equation 5 4  and the band pass and band stop filters  by equations 5 5 and 5 6 respectively                          1  H   TF wae m  1  HW   TRF    Hasi 3 ROEN  5 5   F
36.  of  Penrose   s method  23   which extends the STFT into two frequency dimensions  applying  a 2D window function  Rasterisation performs this process with a rectangular 2D win   dow and a hop size equal to the size of each frame  By removing the specific hop size  requirement of rasterisation and overlapping the analysis frames  the frequency resolution  of both axes could be increased  A tapered window such as a 2D Hamming window would  have to be used  but this would also reduce the spectral smearing of unsynchronised signal    components     In summary  the 2D Fourier transform offers an exciting new perspective on audio signal  analysis and processing  It suffers from various limitations as do most signal processing  paradigms however it certainly has a lot of potential  Rasterisation provides a useful visual  display of low frequency waveform variations and has greatly helped the understanding of  2D Fourier domain signal properties  However it applies unnecessary restrictions on the    2D Fourier analysis and for further work it is recommended that a variable hop size is used     198    Chapter 9    Future Work    Suggestions for future work     Variable hop size should be implemented to allow frame overlap with the use of  2D signal windowing  since the analysis resolution can be increased and processing    algorithms may be improved     2D Fourier analysis and processing methods need deeper theoretical investigation so  that the techniques initially investigated in 
37.  of audio data  which is given by the following    equation for a full amplitude input signal     SNRapc   20 logy   2    dB   96 33dB where Q   16  7 3     The signal to noise ratio of the analysis synthesis process is therefore well within reasonable    limits to allow processing in the 2D Fourier domain without undesirable information loss     194       The process_SNR function was also used to investigate Matlab   s hsv2rgb and rgb2hsv  functions  revealing that a value compent of 0 in the HSV representation results in a loss  of information  The range of the value component was therefore limited to  0 001 1  in the    polar2colour function  as described in section 4 4 1     195    Chapter 8  Conclusions    A comprehensive software tool has been developed in Matlab to allow 2D Fourier analysis of  audio signals  It uses the rasterisation process to obtain a 2D time domain representation  of the signal which can then be converted to the frequency domain using the 2D Fourier  transform  The investigation has led to an understanding of signal properties in the 2D  frequency domain  Based on this understanding  experiments were performed into audio  transformation processes that manipulate the 2D Fourier domain representation and these    were incorporated into the software     The rasterisation process applies a rectangular window to an audio signal to divide it into  evenly sized rows with no overlap  These rows can then be aligned and the signal can be  displayed in two di
38.  of the 2D Fourier spectrum in the same way   This processing tool allows some interesting creative transformations of an audio signal by  removing certain 2D spectral components but it can also serve as a useful analysis tool to    observe the 2D spectral structure of a signal     Rhythmic frequency thresholding allows the most dominant sub sonic oscillations of the  signal to be extracted or removed  Audible frequency thresholding allows the removal of  either the strongest or weakest audible components according to the threshold settings     which in timbral analysis mode allows separation of the harmonics of the signal     When the process applies a magnitude threshold to the spectrum according to the values  of individual points  the lack of spectral symmetry is not taken into account  If the signal  contains a 2D spectral component with synchronised audible and rhythmic frequency  the  spectrum will contain two pairs of symmetrical points with different magnitude values  If  the threshold magnitude lies between the magnitudes of the two pairs then one will be  removed and the amplitude modulated sine wave will become an angled sine wave with    unsynchronised pitch     Most audio signals have more low frequency energy and so the 2D spectrum magnitude  thresholding tends to separate components according to their audible frequency  unless  in rhythmic frequency mode  The process could be extended by optionally scaling the  threshold value across the audible frequency rang
39.  of the raster  image given in section 4 5 2  This was a design fault that was only realised once the anal   ysis phase of the project was completed  At this point it seemed the original definition of  signal frames was too deeply implemented into the software  To work around the problem   most processing operations require the commands in listing 5 13  A function containing    this code would then use the nframes variable in replacement of num frames           if stremp handles data analysis_settings analysis_mode   rhythmic      nframes   1    else  nframes   handles data  analysis  num_frames      end          Listing 5 13  Correcting the Number of Frames Parameter in Processing Functions    5 2 3 Adjusting Filter Parameters    The parameters of a 2D spectral filter can be adjusted using the adjust_filter function   which creates the GUI shown in figure 5 3  This GUI window allows the user to set all    122    of the filter options as desired and specify the cutoff frequency of the filter and also the  bandwidth if it is required  It also displays the filter   s frequency response as an image  to  give the user a better impression of how the filter will affect the 2D Fourier spectrum of    the signal      lt Student Version gt    Filter  Fiter Type  LPI BPIHPIBS Filter Mode  Rhythmic Freq  Audible Freq  Centre Freq  6025 5 Hz Bandwidth  4000 Hz  Keep DC  None  Cut Boost  Cut    Filter Preview  Architecture  Butterworth    Order  2    Fhythmic Frequency  Hz         o     o 
40.  output size of the 2D spectrum  quadrant according to the current set     tings  Initialised as the input size        orig size    Integer array    The input size of the 2D spectrum    quadrant  which is read only        change    Integer array     0 0     The required change in 2D spectrum    quadrant size        tempo    Double    n a    The tempo of the output signal  ini   tially set to the value defined in the    analysis_settings structure        ndivs          Integer       n a       The duration of the output signal in  terms of tempo divisions  It is ini   tially set to the height of the raster im   age multiplied by the numerator of the    tempo division        Table 5 4  Parameters of Spectrum Resizing Process Set in create_spec_resize    144       The variable ndivs is the rhythmic duration of the signal  given in terms of the note  duration or tempo division used to define the raster image width in analysis  As an  example  for an audio signal that is 4 bars long  with a raster image width of a quarter   note  ndivs would be 16  This variable is initially set to the height of the Fourier data  array  FT  minus the height padding  height_pad   i e  the raster image height  multiplied    by the numerator of the raster width tempo division  tempo_div_nun     The spectrum size is considered in terms of a single quadrant  including the DC row and  column  rather than the total spectrum size  This ensures that the positive and negative  frequencies remain symmetrical 
41.  pitch _map function also uses a msgbox object to inform the user that the pitch detec   tion is in progress via a dialogue window  with a string that updates for each frame  This  is because the pitch detection takes a few seconds to compute and the user needs to be    informed of what is happening  The message dialogue is closed at the end of the function    85    when all calculations are complete  This technique is used several times in the software    tool to inform the user of a current process     Once the pitch_map function is complete the period array is used to define the image  width settings for each frame  by rounding to the nearest integer  The image and FT cell  arrays are initialised to the correct size according to the number of frames  Then the raster  image representation is obtained for each frame  using the calc_image function  When in  timbral mode  this function simply calls rasterise for each frame  with its associated    image width parameter and stores the resulting image in the image cell array     4 6 2 Rhythmic Mode Without Pitch Synchronisation    This is the simplest process for obtaining the raster image  since only one image is required  and there is no pitch detection  The image and FT variables are instantiated as 1 cell arrays  and imwidth is set to equal the frame_size parameter  The calc_image function simply    calls rasterise to convert the audio array directly into image with a width of imwidth     4 6 3  Rhythmic Mode With Pitch Synch
42.  signal processing  this is described in chapter 5  The    Info     button opens a pop up window that displays several signal properties and the    Reanalyse     button opens a pop up window that allows the user to adjust the initial analysis settings  and redisplay the signal  these features are described in chapter 4  The main window also  has a custom menu and toolbar  which provides access to data I O  settings and analysis    tools  the implementation is described in section 3 5     37     0 Ieu     0 issouluBuA    pal x  z S I 0    pua  a7      wnpads az    so     2H  Aouanbasy  0       S0        wnaysads sauno aZ    0x  2H  Aouanbas  z SI t    wunypads apnyuber aL    do07    doas    so       dde    o   zH  Aguenbayy    0l     000    000z       aprub    ooog    000      lt uoIssaA Juapnys gt     EL       IWE PHIS    S     Bulssa00ig    sieXld    aBeul  4aysey     s338  aw   SI    WUOJaARM  opny            S  LULIY    ojul    axd       eprayduuy    Boo Fd     SA2      suondg sjoo  Ojg sHuias iojd aji4    Main Software Tool GUI    Figure 3 2    38    3 3 2 Additional Windows    Additional windows within the software were designed programmatically  since precise  control over the functionality and layout made it easier to interact with the main GUI   Most of these windows only required a collection of simple GUI components such as static  text  text edit  push buttons and drop down menus  The main aim with the layout of  these GUIs was to have well spaced and aligned compo
43.  that needed  consideration  It was vital that audio files could be imported and exported  for the pur   poses of the project investigation and also for creative users  It was also worth considering  that a composer might want to store a signal along with its analysis settings and transfor   mation processes  so that work could be continued over multiple sessions or shared with  other users  One further requirement was the ability to save the software tool   s display of    the different data representations as an image file  which would be beneficial to the creation    33    of this report     3 2 Software Development Process    Due to the investigative nature of this project  the software could not be produced using  formal engineering methods  It was only possible to produce a specification of the general  requirements of the software  since little was known initially about the properties of the 2D  Fourier transform or the possibilities of using it creatively  It was decided that the most  efficient process would be to develop the software gradually  treating it as a prototype  application and constantly extending its capabilities  Within the time limits of the project  it would not be possible to do the investigation and software engineering sequentially  and    still produce a useful product     The development process can however be divided into two distinct phases  both of which  had a loosely predefined structure  The first phase was to develop a complete analysis to
44.  the 2D spectral processing chain has  been applied  This is discussed in more detail in section  5 1 6                 Table 3 3  Description of Toolbar Buttons    44    3 5 1 Implementation of Custom Layout    The custom menu was designed using GUIDE   s graphical menu layout tool  The callback  functions for each menu item were automatically generated as child functions within the  app M file and their functionality was inserted  The toolbar  on the other hand  had to  be designed programmatically  The function create_toolbar initialises the toolbar object  and all of the components on it  It is called at the start of app_OpeningFcn  which also calls    toolbar_callbacks to define the callback functions for each component on the toolbar     3 6 Data Input Output    Matlab provides many functions that handle file I O operations for different data types   including audio and image files  These built in functions have been utilised in the software  to allow import and export of data from and to external files  It is also possible to store  and retrieve program variables from disk using the MAT file format  which enables the    signal data to be easily stored with any required settings parameters     3 6 1 Audio Import Export    The wavread and wavwrite functions allow audio data to be stored in the Microsoft WAVE    wav  sound file format  which forms the basis of importing and exporting audio in the 2D  Fourier software tool  The function import_audio was written to allow au
45.  the filt_image array  which is then rotated by 90   if the rhythmic_mode  variable is false  The dimensions of the FT array are therefore reversed in process_filter     to create the filt_image array for an audible frequency filter     Within calc_filt  a switch   case    statement is used to determine the required filter  type according to the value of the ftype variable  Within each filter type block  there is  the code for both the ideal and Butterworth options  The general concept for the ideal  filtering is to determine whether each frequency within freq_array is within the pass band  or the stop band and that row of filt_image is set to 1 or 0 accordingly  The implementa   tion of the ideal band pass filter function is shown in listing 5 14  the ideal band stop filter  implementation is identical apart from the 1 and 0 being swapped  The centre variable is  the index of the 0 Hz point within the frequency array  this is the same for all ideal filter    implementations           centre   ceil length freq_array   2    for n   1 length freq_array    if n  lt  centre  freq_pt      freq_array n    else  freq_pt   freq_array n    end  if freq_pt  gt    process_data cutoff     process_data bw 2   amp  amp       freq_pt  lt    process_data cutoff   process_data bw 2   filt_image n      1   else  filt_image n      0   end    end       Listing 5 14  Implementation of an Ideal Band Pass Filter    The ideal low pass and high pass frequency responses are created by inserting arrays
46.  the same duration  therefore the signal repeats itself  This is best    demonstrated visually when processing a single instrument note as in figure 6 19     180       Figure 6 19  Trumpet Note Processed Using    Double Rhyth       Decreasing the rhythmic frequency range demonstrates an interesting result in timbral  analysis mode where the rhythmic frequency resolution is larger  The signal duration is  extended whilst maintaining the original pitch however since the raster image dimensions  have not changed the duration of the data is unchanged therefore the signal restarts at the  beginning of the image  Again this is well demonstrated by an instrument note signal  as    shown in figure 6 20     181        a  Range Reduced By Factor of 0 75        b  Range Reduced By Factor of 0 5  c  Range Reduced By Factor of 0 25    Figure 6 20  Reducing the Rhythmic Frequency Range of a Trumpet Note    It is suggested that these issues stem from aliasing of rhythmic frequencies which seems  unavoidable since the signal will always contain frequency components above half the    rhythmic sampling frequency     6 4 7 Resizing The 2D Fourier Spectrum    The resizing process does not suffer from the same limitations of resampling that the pitch  shifting and rhythmic frequency range adjustment processes exhibit  since the analysis  frequencies are being changed by the same factor as the signal frequencies  Interpolation    still produces unwanted low frequency variations when the frequency 
47.  these parameters    need to be updated accordingly     The function calc_resize_properties is used to adjust the parameters of a resize process  after analysis settings are changed  If the analysis mode is changed from rhythmic to  timbral  then the resize process must be removed from the process chain entirely  using the  same operations described in section 5 1 3  If the signal representation is still in rhythmic  mode then the size of the new input 2D Fourier spectrum array must be determined and  stored in orig_size  The output 2D spectrum dimensions  size  can then be determined    using the change array and orig_size according to the equation     size   orig_size   change  5 15     Finally the new values in size are checked against min quadrant_size to ensure that the    spectrum is not smaller than the minimum allowed spectrum size     150       5 9 5 Fixed Spectrum Resize Processes    There are four fixed processes available in the software tool that allow resizing of the spec   trum     Double Dur         Halve Dur         Double Tempo    and    Halve Tempo     The operation of  these processes is defined by the functions process_dur and process_tempo  which both  have the Boolean input argument double to determine whether to perform the respective     Double    or    Halve    operation  These processes were written to investigate the effects of  tempo and rhythmic duration adjustments without the influence of interpolation error   Doubling and halving of the 2D spe
48.  two opposing strategies   scanning and probing  Scanning methods have a predefined path for data sonification often  with a fixed scanning rate  Probing methods allow variation of the speed and location of  data pointer during sonification  which gives a much more flexible sonification  Scanning  methods can easily be inverted in order to visualise a sound and so are the desired technique    for this project  At any point in the image representation there could be multiple values    21    i e  for a colour image  a sonification method can handle these values in any number of    ways     2 5 1 Raster Scanning    Raster scanning is a very common method of producing or recording a display image in  a line by line manner  used in display monitors and also communication and storage of  two dimensional datasets  The scanning path covers the whole image area reading from    left to right and progressing downwards  as shown in figure 2 5     Raster Image       Legend  Scan Line    gt   Return Line       gt              Figure 2 5  Raster Scanning Path    Raster scanning has recently been applied to audio visualisation and image sonification  techniques  39   it allows a simple one to one mapping between an audio sample and an  image pixel  The process is perfectly reversible allowing it to be used as part of an analysis   synthesis system  It provides two dimensions of time as opposed to the STFT  time     frequency  and the 2D DFT  frequency frequency      Image sonification is per
49.  width  Timbral mode  sets the raster image width so that the frequency bins of the audible axis correspond to    harmonics of the fundamental pitch frequency     4 5 2 Signal Frames    The initial analysis divides the audio signal into one or more frames  however the definition  of a frame differs between the two modes  In timbral mode  the signal is divided into into  sub sections called frames before rasterisation  Each frame is subjected to 2D analysis and  has its own raster image and 2D Fourier spectrum  The width of the raster image for each  frame is determined by the calculated pitch of that signal sub section  The reasoning for  this methodology was that an audio signal could contain more than one note  and it might  be necessary to divide the audio into several frames  each with a different width defined    by its pitch  to get an accurate analysis     In rhythmic mode the frame is synonymous with a row of the raster image and there is  only one raster image and one 2D Fourier spectrum  The whole signal is analysed for its  frequency content  describing the rhythmic variation of audible frequency throughout the  audio data  It is assumed that although pitch is likely to vary within an audio signal     tempo will remain constant for any signal imported into the software tool    4 5 3 Synchronisation Option    Within each of these two analysis modes there is a synchronisation option that helps to  define the conversion into two dimensions  Timbral mode offers tempo s
50. 0 5 1 15 2    0 0  Frequency  Hz  x10  Frequency  Hz  x 107     c  Violin Note  G3  Arco   d  Violin Note  G4  Arco     Figure 6 10  Similar 2D Spectral Form of Instrument Notes of Varying Pitch    6 3 2 Limitations of Integer Raster Image Width    Figure 6 10 also demonstrates the limitations of the signal sample rate in accurately syn   chronising the analysis to the signal pitch  The actual pitch of an audio signal will rarely  correspond to an integer period size and therefore the audible frequency analysis cannot  be guaranteed to synchronise the centre frequency of audible frequency analysis bins with    harmonic components of a signal     The spectra in figure 6 10 show skewing according to the angle of the waveform similarities    in the raster image  Figure 6 11 shows the raster image for the violin note G4  at the    168    nearest integer width to its actual pitch period  This representation clearly does not  represent the fundamental period of the signal in each raster image row since the waveform  is not vertically aligned  The corresponding spectrum of this signal  shown in figure 6 10d   demonstrates this skewing effect when compared to the correctly pitch synchronised violin    signal of figure 6 10c     Raster Image    100  200  300  400 f  i 500  soo  700  800  900    1000       Figure 6 11  Raster Image of a Violin Note  G4  Arco     6 3 3 Tempo Synced Timbral Analysis    Tempo synced timbral analysis allows the user to specify a frame size according to a t
51. 7 DB MD o       _        2  1 0  Audible Frequency  Hz  x 107    Bypass    Figure 5 3  2D Spectral Filter GUI    The callback functions of the GUI   s button and text edit objects update the appropriate  filter parameters in the cur_process structure  section 5 1 4  and then call the function  filter_settings_changed  This function checks whether the cutoff frequency is still in  the correct range  between max_cut and 0  If the filter type  ftype  is set to band pass  BP   or band stop  BS  then the bandwidth is also limited to its range  between min freq_step    and max_cut  The text label next to the cutoff value is set to    Cutoff Freq    for a low     123    pass or high pass filter and    Centre Freq    for a band pass or band stop filter  The text  edit objects for cutoff and bandwidth are updated with the current cutoff and bw values   then the image plot of the filter   s frequency response is calculated and displayed  Finally  the process_changed variable is set to true to indicate the filter parameters have been  adjusted  section 5 1 5  before the handles structure is stored in the filter figure   s GUI  data     The filter image is created and displayed using the create_filter_image function  The  filter   s frequency response is calculated using the function calc_filt  which is also used to  run the filter process and so is described in section 5 2 4  The filter image is created at the  size of the 2D spectrum of the frame indicated by either max_rhyth_ frame or
52. Pitch Shifting Using the 2D Fourier Spectrum in process_pitch_change    5 7 4 Fixed Pitch Shifting Processes    As well as the    Pitch Shift    process  there are also two    fixed    pitch shifting processes      Up Octave    and    Down Octave     As section 5 1 4 stated  fixed processes do not require a  create_  or adjust_  function  since they do not offer any adjustable parameters  The  function process_octave defines the operation of these two processes  with the Boolean  input argument up determining the direction of octave shift  These two processes were  written to investigate the quality of the pitch shifting process described above  without  the effects of interpolation error  A pitch increase of an octave can easily be obtained by  inserting a zero between each of the audible frequency points  using only the points up  to half the maximum audible frequency  F  4   A pitch decrease of an octave can easily  be obtained by removing every other audible frequency point and padding the quadrant   shifted Fourier data array to the original size with an equal number of zeros on either side   Listing 5 21 shows the operations of the process_octave function for each 2D spectrum  frame  Each pitch change is performed as an array operation in one line  using the up    variable to determine the appropriate equation        width   size handles data FT frame    2     height   size handles data FT frame  1      139          oldFT   fftshift  handles data FT frame      newFT   z
53. Two Dimensional Fourier Processing of    Rasterised Audio    Chris Pike  cwpo00 york ac uk    June 13  2008    Abstract    A comprehensive software tool has been developed in Matlab to enable two dimensional  Fourier analysis and processing of audio  using the raster scanning process to obtain a  two dimensional signal  An understanding of the two dimensional frequency domain rep   resentation has been acquired  and appropriate techniques for accurate analysis via ras   terisation investigated  Audio transformations in the two dimensional frequency domain  were explored and some interesting and useful results obtained  The rasterisation method  is however limiting when performing two dimensional Fourier analysis and processing of    audio     ACKNOWLEDGMENTS  I would like to thank   e Jez Wells for his constant support  understanding and enthusiasm   e Yongbing Xu for his useful comments after the initial project report     e Jason Gallicchio for his friendly and helpful advice on complex data to colour con     version     e Christopher Penrose for his innovative perspective on audio signal processing     Contents    1 Introduction    1 1   2    Project Aimo ss 4S a coal t Se a antes Se aa E heey a he el Re a    Report Structure     2   ck 25 on ek Ue ang en ee og a oy ee    2 Background Information    2l    2 2    2 3    2 4  2 5    Fourier Techniques  2 0 fetens ed hie Dove  dnis bn wy  hop arte a en EOE A ik eo  2 1 1 One dimensional Discrete Fourier Transform            2
54. a for the pitch shifted FT array is calculated from the original FT array by resam   pling the data rows  Matlab   s interp1 function to perform cubic spline interpolation to  resample the data  The arrays x and xi are calculated to give the input and output indices  of the row signal for the interpolation  The sample point arrays and the original row of  the FT array are input to the interp1 function  which returns the pitch shifted row  The  2D Fourier data must be quadrant shifted before and after the resampling to ensure that  the data is correctly processed The interp1 function is set to insert zeros when the output  sample points are outside the range of the input function  This was chosen as opposed to  extrapolation  since the audible frequency envelope is likely to be too complex to extrap   olate accurately for an upwards pitch shift  bearing in mind that shifting the pitch down    an octave would mean that half of the audible frequency information would have to be    138    calculated by extrapolation             perform pitch shift on FT once fftshifted  width   size handles data FT frame    2     height   size handles data FT frame  1    oldFT   fftshift  handles data FT frame       newFT   zeros width  height      x   linspace    1 1 width    xi   linspace    change      change   width     row   l height   newFT   row    interp1 x rot90 oldFT row     xi    spline     0    handles  data FT frame    rev_fftshift  rot90 rot90 rot90 newFT              Listing 5 20  
55. a ote Bs Ma he ee Ea ee ea Se ait 54  4 2 1 Raster Image    se ae hd ae a Bs ee Sa Oe Bk oe og 54    4 3    4 4    4 5    4 6    4 7    4 8    4 2 2 Implementation of Rasterisation   2   2      che dale ea bal 54    4 2 3 Raster Image Width    pikes asa  ce he gerd Exe beg eR ep gr ek 56  4 2 4 Limitations of Raster Image Width                    56  4 2 5 Displaying The Raster Image                   0 4  58  Two Dimensional Fourier Transform                   00    59  4 3 1 Obtaining the 2D Fourier Spectrum in Matlab              59  4 3 2 Analysis of the 2D Fourier Transform       4 266 404 2 A eee 59   Signal Components With Audible Frequency Only           61   Signal Components With Rhythmic Frequency Only           61   Signal Components With Audible  amp  Rhythmic Frequency         62   Analysis of The Complex DFT                    4  63   Signal Components Not Aligned With The Raster Image Dimensions 66   Effects of Rectangular Windowing                    67  4 3 3 Signal Analysis Using The 2D Fourier Spectrum             69  Two Dimensional Spectrum Display                        70  4 4 1 Colour Representation of Polar Data                   71  4 4 2 Spectrum Display Options vs 0  6s eh Se he a x oe ee 74  44 3 Brightness  amp  Contrast of Display          4 40 dacala da gales 75  AAD  Zero Padding    eters  seed  sr Ue  Behe Gh as Ow Od Qe  Was SG 76  Audio Analysis Options   veg    a dey  oe eg Ge BRE Se od PE 78  ADL  Analysis Mode ses oak os a ge  Ge a Rok 
56. a raster image of audio signals  however during this  investigation it has been established that any periodicity within the signal will provide  a useful raster image  This period could be harmonically related to the fundamental  pitch period or alternatively it could be larger  corresponding to a rhythmic periodicity  in an audio signal  Figure 4 1 shows two raster images  one of a cello playing a C  at  approximately 69 Hz with the width corresponding to the fundamental period and the  other of an electronic drum beat at a tempo of 120 bpm  beats per minute  with the width    corresponding to one crotchet quarter note beat     Correct assignment of the raster image width is important during 2D Fourier analysis  The  2D Fourier transform analyses the sub sonic frequency variation in the audible frequency  bins of each row  see section 2 2 7  If the image dimensions correspond to a periodic element  of the signal then the Fourier data representation will be clearer  These details are fully    explained in section 4 3     4 2 4 Limitations of Raster Image Width    By the current definition each pixel in the raster image corresponds to a single audio  sample  but it is rare for a pitch to correspond to an integer sample period and often a  rhythmic duration such as a single beat will not be represented by an integer number of  samples either  However an image file or data matrix must have integer dimension sizes  in terms of pixel sample units so the periodicity within the sign
57. ach  frame display within the signal  hence a loop is set up to display and capture each frame  before redisplaying the original frame  indicated by the cur_frame variable  Listing 3 4    shows this display  capture and export loop for the raster image     47             for frame   1 nframes  plot_image  handles   data  image  frame        handles data audio_settings nbits       handles  axes_image   handles source type     I   getframe handles axes_image     imwrite I cdata   path handles source file          _    num2str frame       tif         tiff       end  plot_image handles data image cur_frame       handles data audio_settings nbits  handles axes_image         handles source type          handles source image_path   path        Listing 3 4  Exporting Raster Image Display    3 6 3 Program Data Saving and Loading    Matlab provides the built in functions load and save to enable storage of program variables  on disk  This has been utilised in the software tool to allow the user to store the signal  data  in all four representations along with its accompanying analysis parameters  The  software can therefore load a signal and immediately enter a working state where the data    is able to be analysed and processed     The data is stored in a MAT file  however the default suffix   mat  can be changed  any  file can be treated as a MAT file when opened with the load function  A suffix was chosen  to identify files specific to the software tool   tda  two dimensional audio  
58. ack functions within app m set the properties of the zoom_obj  and pan_obj appropriately  The tools can be toggled on and off from both the toolbar and    the    Plot Tools    menu     4 8 2 Data Cursor    The data cursor tool is also provided in the default Matlab figure  however this tool has  been extended to provide a custom data tip according to the particular plot  as shown in  figure 4 13  This allows the user to extract the precise data value represented by a plot    point     The datacursormode object is instantiated in the same way as pan and zoom  by passing  in a figure handle  It   s properties can then be set to define the behaviour of the data cursor  tool on that particular figure  This tool can again be toggled on and off from either the  toolbar or the    Plot Tools    menu  and the option to view the data cursor as a pointer or in  a window is provided  as in the default Matlab figure  The    UpdateFcn    property is set in  the same way as a callback  section 3 4 2  to indicate the function that defines the custom    data cursor     The dcm_updateFcn function is called when the data cursor tool is used on any plot within  the main GUI window  It is provided with the event_obj argument which identifies the  point indicated by the cursor  The handle of the plot axes can be obtained by getting the    94    2D Fourier Spectrum    Audible Freq  978Hz  Rhythmic Freq  OHz  Magnitude  362 6096  Phase  degrees   68 6326    Frequency  Hz   i        0 5 0 0 5  Freque
59. ae we eke CR Be ni 78  4 5 2 Signal Frames    eT alae We Sai cock eS a ee Sea en Raed 79  4 5 3 Synchronisation Option    4 ob ae GR Bae OR ek ee ae 79  4 5 4 Analysis Options GUI 4 ted acini ts   aie eee Oa wae 80  Analysis Implementation A825 So be Sean cn haste aa Se Ye eet 83  46d Timbral Mod    sss  cn disain ga Wee PES eee ee eee 84  4 6 2 Rhythmic Mode Without Pitch Synchronisation             86  4 6 3 Rhythmic Mode With Pitch Synchronisation              86  4 6 4 Readjusting Analysis x  oc  oe Go eid Be as Shs BG 88  Feature STAC IOI  ike tae ey cat ae Ae gs ae ck ae hs tei Sea aa Be ge eck hg a 90  4 7 1 Pitch Detection Using The Yin Algorithm                91  4 7 2 Tempo Estimation With MIRToolbox                  93  Mista Analysis Tools die pisete de  AA te fee  deans be Bee SE A En 93    4 9    4 8 1 Plot Zoom and Pan   oaoa 6 26 bo HG Ww ARS al Oars ese eee 2 ah  4 827 Data CUrsor maeth ra arp  bev  oP a Rohe el gs Ak ee OE Ap ore N  4 8 3 2D Fourier Spectrum Legend  oe a gay 28 ie Shea A ee  Resynthesis      608 one el oe et as nog en Ba oo ck he Oa a dR ad  4 9 1 Derasterisation      Se aidd ws Ja eS ee oe Rl BS ee Bak ee  420023   Timbral Mode    fy WS ht oe eke ke E Sek Ae SE OR ES  4 9 3 Rhythmic Mode Without Pitch Synchronisation              4 9 4 Rhythmic Mode With Pitch Synchronisation                 5 Two Dimensional Fourier Processing of Audio    5 1    5 2    5 3    5 4    5 9    2D Fourier Processing Overview in MATLAB Tool                5 1 1
60. ailed de   scription of the temporal development of a signal whilst yielding no information about the  frequency content  Fourier signal representations present the signal as a sum of infinite  stationary sinusoids  which yield no information about time varying events without con   verting back to the time domain  By using a time frequency analysis framework  in which  components are discrete in both time and frequency  a more intuitive analysis of the signal  characteristics is possible  Non stationary sinusoidal components are more familiar to the    way we understand and listen to sounds     2 2 2 Short Time Fourier Transform    The short time Fourier transform  STFT  is an adaptation of the Fourier transform that  enables a time frequency representation of audio  It operates by dividing the original long   time signal into many short frames  typically in the range of lms to 100ms in duration  27    which are shaped in amplitude using a chosen window function  The DFT can then be    used to analyse the spectrum of each of these frames thereby producing a representation    14    of the variations in the spectrum of the signal over time     The STFT equation  9  can be viewed as the DFT of an input signal x  m  of arbitrary  length M  multiplied by a time shifted window function h  n     m  of length N     y  So  2 mlh n     me FOr Nem yy  0 1     N 1  2 10     The output function X  n v  has N frequency points  indexed by v  for each of the time    windows at a discrete tim
61. al   handles  processes  process  process_num  thresh xmax_tot    for row   1 size mag 1   if handles  processes  process  process_num   below_thresh  if totals row   lt  thresh_val  mag row      zeros 1 size mag 2      end  else    if totals row   gt  thresh_val    130       mag row      zeros 1 size mag 2      end  end    end       Listing 5 17  Thresholding Rhythmic Frequencies    Once the new magnitude component has been determined  it is recombined with the  phase to produce the new complex FT array using equations 5 1 and 5 2  followed by    the rev_fftshift function     5 4 2D Fourier Spectrum Rotation    This process lets the user rotate the 2D Fourier spectrum in 90   increments  When rotations  of 90   or 270   are applied the frequency axes of the 2D spectrum are swapped  The original  rhythmic frequency bins can be redefined as audible frequencies in the range  0 F  2  and  the original audible frequencies would then define a new range of rhythmic frequency    variation in the vertical axis     5 4 1 Spectral Rotation Parameters    There is only one parameter for the rotation process  apart from the type and bypass  variables used by all adjustable processes  This is the rot_ind parameter which describes    the rotation of the spectrum  r  by the equation     r   rot_ind   90    5 8     The default value of rot_ind is 1  which is defined when the rotation process is created in    create_rot_spec     5 4 2 Adjusting the Rotation    The adjust_rot_spec function crea
62. al cannot be represented    precisely     56        a  Cello C  at   69 Hz   Fundamental Period   Width  640 samples    Ve REL WER TERE    if  i WEEE EE  MUUE E      il ULE REITER REESE    il WEE REET  MILLETI    UUTE EELE   TOIU  a WINTER EERE     UUTE EEE  MUUE EE E       al ITU LITRE E REESE     b  Electronic Drum Beat 120 bpm   Quarter note Beat    Width 22050 samples       Figure 4 1  Raster Images Demonstrating Different Periodicities In Audio Signals    The idea of using interpolation to obtain an integer period  adjusting the sample rate  of the data  was considered  However this was deemed an unnecessary extension of the  program requirements  because it would have added a lot of computing overheads  More  importantly  it was decided that at this stage in investigating the properties of 2D audio  analysis  the signal periodicity could be appropriately approximated using only integer  periods since it was more important to investigate the potential of this signal processing    paradigm and understand its properties than to spend a lot of time fine tuning individual    57    processes  It only becomes a large problem at periods of less than 50 samples  where the  jump in pitch for a width change is  gt  20 Hz  The corresponding frequencies are very high   approaching the limits of human hearing and it is much less common for musical notes to  be in this frequency range  It would not be difficult to incorporate sample rate changes  into the software to allow more accura
63. al development of sound and enables the construction of    time limited events using stationary sinusoidal components     17    Time domain Signal       v    Windowing             STFT  Phases  Phase Calculation    g   J    Inverse Inverse  STFT FFT      Overlap add             Time domain Signal    Figure 2 1  Phase Vocoder Functionality  after  8     The ability of the phase vocoder to determine the deviation from centre frequency of each  analysis bin allows effective stretching and compression of time without affecting pitch     The frequency changes are calculated on a different time basis and then the signal is    18    resynthesised using the inverse STFT  A diagram of the processes involved in a phase    vocoder is shown in figure 2 1     It is worth mentioning other sound transformations using the phase vocoder since these    may serve as inspiration for two dimensional transformations     Frequency domain filtering   It is possible to alter the amplitude spectrum of a sound  by multiplying it with the transform data of a filter function  29   Time variant  filtering is possible and the spectral variations of one sound can be used to filter    another  this is a form of cross synthesis     Pitch shifting   It is commonly performed by time stretching then resampling the  signal to get an output of the same length as the input  However Dolson suggested a  method of formant preservation using the original magnitude spectrum to shape the  shifted one  9   There are more 
64. alc_filter_properties is used to determine a series of filter properties  that help to define the range of values that cutoff and bw can take when being adjusted  by the user  Table 5 2 shows the variables obtained by this function  which are stored in    the process structure        Variable Description          rhyth freq 2D array of rhythmic frequencies represented by the 2D Fourier    spectrum of each signal frame        aud freq 2D array of audible frequencies represented by the 2D Fourier spec     trum of each signal frame        max_rhyth_frame   The frame number of the 2D spectrum that represents the highest    maximum rhythmic frequency        max_aud_frame The frame number of the 2D spectrum that has the maximum num     ber of points in the audible dimension        max_cut The maximum allowed value of cutoff  the filter   s cutoff frequency   in the currently selected frequency axis and also the maximum fil   ter bandwidth  bw  This variable is determined by the maximum  frequency in the appropriate axis represented by the 2D spectrum    of any signal frame        min_freq_step The minimum frequency interval between adjacent bins on the cur   rently selected frequency axis of the 2D spectrum of any signal    frame  This is the minimum value of the bw variable                 Table 5 2  Filter Data Obtained From The 2D Fourier Spectrum    The calc_filter_properties function determines the frequency arrays corresponding to  the centre of each frequency bin on both axe
65. alexMath sin 3 phi       float  2 Math PI       Correct for the    banding    in HSB color where RGB are wider than the  others  int rgb   Color HSBtoRGB hue  sat  brightness      return rgb        Listing A 1  Gallicchio   s ComplexColor Class  http   www brainflux org  java tar bz2     A 2    Two Dimensional Fourier Processing of Audio    This section reproduces the e mail correspondence with Christopher Penrose about 2D  Fourier processing of audio  Chapter 2 of Penrose   s incomplete thesis  23   is available on his  personal website  It was this work that first suggested the use of the 2D Fourier transform  to adjust sub sonic frequency content of audio  Penrose discussed the developments of his    work on 2D Fourier processing in the e mail shown below   Hi Chris     I have written a few different sound processors that use 2d Fourier transforms  They  were developed for the UNIX shell  Unfortunately  most of my 2d processors were written  using Apple   s veclib so they aren   t very portable  But your screenshots on your website  make it look like you are using OSX yourself  Some of these processes would have been  documented in my thesis  but I still haven   t finished that       I am working on a starting  a music software company at the moment and I am getting its first application ready for    release  And there is a long line behind that application already formed     bad news for    211       my thesis        I really believe that 2D Fourier transforms have intere
66. along with their  default values that are applied when the filter process is created  using the create_filter  function  These variables will be stored in the structure within the process array that    corresponds to the filter process     Aside from these options  the parameters that define the filter process are the cutoff fre   quency and if the filter is a band pass or band stop  the bandwidth of the filter  The  cutoff frequency actually corresponds to the centre frequency of a band pass or band stop  filter  The properties of the 2D Fourier spectrum have to be analysed to calculate the  range of values that the cutoff centre frequency and bandwidth of the filter can hold  This  is done using the function calc_filter_properties  which is called when the filter is  created in create_filter and also whenever the properties of the 2D Fourier spectrum  are changed  see section 5 1 7  The Boolean parameter new is given as an argument in  calc _ filter properties to identify between the two situations  If new is true  then the  function has been called from create filter and the default cutoff and bandwidth of the  filter need to be set  The cutoff centre frequency  cutoff  is set to the maximum rhyth     mic frequency represented by the 2D Fourier spectrum of any signal frame  the filter is in    118    rhythmic frequency mode by default   The bandwidth  bw  is set to two times the mini   mum frequency interval between adjacent bins on the rhythmic axis of the 2D spectrum of 
67. am  nae   OER  5 6   Lar    Where n is the filter order  order   Fo is the cutoff centre frequency  cutoff   F is the  frequency array  freq_array   W is the filter bandwidth  bw   H is the filter   s frequency    response and u is the index within the arrays     Once the frequency response has been obtained  the filter   s keepDC variable is inspected    126       and if required  the appropriate DC points are set to 1 in filt_image  as shown in listing  5 16  This functionality was incorporated to mainly for analysis purposes  to investigate the    effects of retaining the DC rhythmic frequency row during  for example  high pass filtering           switch process_data keepDC  case 1  mid_row   ceil size filt_image  1   2      mid_col   ceil size filt_image  2   2      filt_image mid_row  mid_col    1   case 2  mid_row   ceil size filt_image  1   2    filt_image mid_row      ones 1 size filt_image  2     case 3    mid_col   ceil size filt_image  2   2    filt_image   mid_col    ones size filt_image  1   1      end          Listing 5 16  Retaining the DC Component When Filtering    If cut is set to false  the filter frequency response is adjusted when returned from calc_filt  to process filter  to give the pass band frequencies a 20 dB boost whilst leaving stop band  frequencies unaltered  This is achieved by the following equation    filt_image    filt_image   1     10  5 7     Such a significant amplitude increase is applied so that the effects of boosting frequency  may be
68. an be used to convert from a monophonic audio signal to a     grayscale image          array   the 1D data representation    width   the required width of the 2D output        image   the 2D data representation    hop   width    calculate the required image size  height   ceil length array   hop      create an empty matrix  image   zeros height   width      rasterise data  i     add in this line for when image has only 1 row  if length  array   gt hop  for arrayindex 0 hop length  array     width   1  image i      array  arrayindex 1 arrayindex width     i   i l   end  arrayindex   arrayindex hop   rem   length array     arrayindex   image i l rem    array  arrayindex 1 length array     else  image 1 1 length array     array     end          Listing 4 1  Rasterisation Process    The last row of the image must be assigned separately since it is likely that it will not be  entirely filled by the remainder of the 1D array  If this is the case  the unassigned samples  will all be zero valued and appear grey  since the image data has the same value range as    the input data  this is equivalent to adding silence to the end of an audio signal     59       4 2 3 Raster Image Width    A raster image is most useful when the width of the image corresponds to a periodicity  within the audio signal  The variation of this periodicity over time can then be observed  much more easily than using an audio waveform display  In  39  the fundamental period  of an audio signal was used to obtain 
69. aring  Fig   ure 6 21 demonstrates the operation of this process by showing the result of applying it to  loop120 wav  The resulting signals are barely audible and it is difficult to distinguish the  expected change in rhythmic frequency characteristics as a result  The operation could  be improved by combining downward pitch shifting with logarithmic rescaling of audible    frequencies to get a more audible result     183    2D Fourier Spectrum    Frequency  Hz   5    0 2    0 4    0 6    0 8        2  1 5 al  0 5 0 0 5 1 1 5 2    Frequency  Hz  x10     Figure 6 21  2D Fourier Spectrum of loop120 wav After Quadrant Shifting    184    Chapter 7    Testing the Matlab Tool    Since this was an investigative project  the software development process was not clearly  divided into phases of design  implementation and testing  but this does not mean that the  design and testing was non existent  Chapter 3 introduced the basic features of the software  design and this chapter describes the testing processes used during the development of the    2D Fourier software tool     The software tool has developed into a reasonably large application over the course of this  investigation with many different functions and huge range of data paths and processing  options  Whilst programming a constant effort was made to produce a robust and efficient  design  however more importance was placed on achieving the aims of the project than    exhaustively testing the software tool     Over the cours
70. as been applied  to the signal        processes The chain of 2D Fourier processing operations with all parameters  that define them        plot_settings All variables that define the display options for the data plots in  the main GUI        player_settings   Variables used to control the operation of the audio player        frame_settings   Data that allows individual frames of the signal to be displayed             separately    source File path  name and extension of the data source and file paths for  data I O    opt Any other program options  Only the    auto normalise    option is       currently in this structure        Table 3 1  Description of Sub level Data Structures    40       When a temporary GUI window is created  such as the analysis settings window  section  4 5   the object handles for the figure and all of its components are stored within a sub     structure of handles which is deleted when the figure is closed     3 4 1 The data Structure       audio_settings   lt 1x1 struct gt         audio   lt double array gt     image data   lt cell array gt   lt 1 x 1 struct gt     FT   lt cell array gt                     analysis   lt 1x1 struct gt         analysis_settings   lt 1x1 struct gt                spec2D_settings   lt 1x1 struct gt          Figure 3 4  The data Structure    This structure contains the signal data and all related parameters  On the left hand side of  figure 3 4 are the various representations of signal data that need to be stored  The audio
71. ating a GUI window as shown in figure 5 4  This window allows the  user to set the process options using push buttons defined as button groups in code  The  threshold value is set using a slider object with the value range  0 1  and the text next to  the slider displays this value as a percentage  rounded to the nearest integer   When any  of the options are adjusted  or the threshold slider moved  the object callback functions set  the appropriate variables in the cur_process structure and the process_changed variable    is set to true to indicate that the settings have been adjusted     128       Variable    Data Type    Default Value    Description          type    String    proc_names 3          Threshold       The name of the process  obtained from  the proc_names array  Used to identify    the process as magnitude thresholding        bypass    Boolean    false    Determines whether or not the thresh   olding process should be bypassed  when the chain of transformations is    run        ttype    String       rhythmic_freq       The type of frequency component that  is processed  Either single spectrum  points with defined rhythmic and au   dible frequency     single point      spec   trum rows with defined rhythmic fre   quency only     rhythmic_freq     or spec   trum columns with defined audible fre     quency only     audible_freq            thresh    Double    The magnitude threshold value given as  a proportion of the maximum magni     tude with the range  0 1      
72. av    223        Process  Down Octave  e trumpetG3_halverhyth wav       Process  Halve Rhythm  e trumpetG3_specshift wav       Process  Spectrum Quadrant Shift  e trumpetG3_upoctave wav       Process  Up Octave    Pitch Shifting    These audio files demonstrate the results of the pitch shifting process   e loop120_quartwidth_up12 wav      Source file  loop120 wav      Analysis Mode  Rhythmic      Raster Image Width  One quarter note      Pitch Shift   12 semitones  e simple120_eighthwidth_downd wav        Source file  simple120 wav    Analysis Mode  Rhythmic      Raster Image Width  One eighth note    Pitch Shift   5 semitones    e simple120_eighthwidth_up9_psync wav    Source file  simple120 wav      Analysis Mode  Pitch Synchronous Rhythmic        Raster Image Width  One eighth note    Pitch Shift   9 semitones  e simple120_eighthwidth_up9 wav        Source file  simple120 wav    224        Analysis Mode  Rhythmic      Raster Image Width  One eighth note      Pitch Shift   9 semitones  e simple120_quartwidth_up9 wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One quarter note      Pitch Shift   9 semitones  e trumpetG3_down7 wav      Source file  trumpetG3 wav      Analysis Mode  Timbral      Pitch Shift   7 semitones  e trumpetG3_upd5 wav      Source file  trumpetG3 wav      Analysis Mode  Timbral      Pitch Shift   5 semitones  e trumpetG3_up36 wav      Source file  trumpetG3 wav      Analysis Mode  Timbral      Pitch Shift   36 s
73. ave to be manually created     3 3 1 Main Window    The main window for the 2D Fourier tool was designed using the GUIDE tool  because it  allowed easy and immediate GUI development  It also provides a visual design interface  so that design and development can be combined  The layout of the main window focuses  around the graphical displays of the four data representations  section 3 1 1   They have  been positioned to separate both the time and frequency domains  and the one and two  dimensional representations as shown in figure 3 1  GUIDE saves the GUI design as a  FIG file and the M file of automatically generated code  the main window is defined by  app fig and app m     36    Time Domain Frequency Domain          Audio Waveform Display    One Dimensional Fourier Spectrum    One  Dimensional             Raster Image Display    Two Dimensional Fourier Spectrum    Two  Dimensional                                                          1       Figure 3 1  Plot Layout for User Interface    The GUI for the main window is displayed in figure 3 2  Below the four data plots there are  several controls  On the right hand side there are two panels containing the audio player   section 3 7  and the 2D spectrum display controls  section 4 4   On the left of the audio  player are the frame selection buttons that allow the user to select which frame of data is  displayed in the plots  The    Processing    button opens the processing pop up window that  provides access to 2D Fourier
74. be solved using 2D implementations of the techniques  used to improve spectrum resolution in one dimension  12   The centre frequency of the  points on both frequency axes of the 2D spectrum is defined by the width of the raster  image so an informed choice of raster image width has to be made based on signal analysis    in order to obtain a clear 2D spectrum     4 4    Two Dimensional Spectrum Display    The 2D Fourier spectrum plot needed to represent the complex Fourier data matrix clearly   allowing the user to take in a lot of information at once  Fourier data is more easily under   stood in the polar representation which is standard in 1D Fourier analysis  so the magnitude  and phase components are extracted  The function calc_spec2D  shown in listing 4 3 was  written to calculate the spectrum display data  from the complex Fourier data stored in  the FT cell array in the data structure  The Fourier transform data is first shifted using the  built in fftshift function to place the DC component in the centre of the matrix  Then  the angle function returns the phase component and the abs function then returns the  magnitudes  The polar representation is converted to an RGB colour matrix as described    in section 4 4 1           function ret   calc_spec2D  FT  brightness   contrast  mode  ret_mag     PLOT_SPEC2D calculates a 2D spectrum given the Fourier transform data       get the magnitude component  centred   mag   abs fftshift  FT        normalise   max_mag   max max
75. ble or rhythmic  frequencies of the 2D spectrum components  since the period length rarely corresponds  to an integer number of samples  Resampling should have been performed before the 2D    Fourier transform to ensure synchronised and accurate analysis     The 2D Fourier domain processing experiments performed in this project have shown mixed  results  Filtering and magnitude thresholding of the 2D spectrum data provides unique  and musically useful results when applied to the rhythmic frequency dimension  altering    the sub sonic structure of the signal without affecting its audible frequency range  In    197    the audible frequency dimension  these processes operate in a similar manner to their 1D    equivalent     Time pitch modifications of the signal that involve resampling of the spectrum data have  shown poor results  This is due partly to the analysis process which was not designed to  ensure these processes were possible  The resolution of the analysis frequencies is generally  too low to allow accurate resampling of spectrum data  It is believed that there may  still be potential for independent rescaling of rhythmic and audible frequencies to produce  useful results  especially when the rhythmic oscillations of signal harmonics are adjusted  after pitch change  It appears that there are also issues with the aliasing of rhythmic    frequencies  which may be unavoidable     The use of rasterisation in 2D Fourier analysis is essentially a restricted implementation
76. c2D_init as described in section 4 3 1   If there are 2D spectral transformations applied to the signal  they are then be processed   this is discussed in section 5 1  It then calls the loaded function to set up the GUI and  display the data  at which point the importing of the audio signal is complete     It is possible that the user might want to adjust the analysis options after observing the  signal display  The main GUI presents the    Reanalyse    button  which simply calls the  analyse_audio function again to bring up the analysis settings GUI and allow the user  to make adjustments to the settings  It proceeds through the new analysis process and  redisplays the signal  The analysis settings GUI also contains a    Cancel    button which  aborts reanalysis and returns to the original settings  or if it is the initial analysis the    software aborts the audio signal import     After the signal analysis  each signal frame   s detected pitch can be adjusted from within  the    Info    GUI window  shown in figure 4 12  This GUI displays signal information such  as duration  data range  sampling rate and bit depth  as well as allowing observation and  editing of frame specific pitch analysis information in timbral or pitch synced rhythmic  mode  The reason for this functionality is to correct any errors by the pitch detection  algorithm and also allow experimentation with raster widths other than the fundamental    period value     88    e000  lt Student Version gt    Signa
77. cates that initial signal analysis is required to obtain a valid raster image  by setting the appropriate image width  It was assumed that this would affect the 2D  Fourier transform data and so the software tool also needed to allow definition of the  initial settings for analysis  The user cannot be expected to know the signal characteristics  such as pitch and tempo so feature extraction algorithms were also required within the    software as part of this initial analysis     3 1 3 Signal Transformations    The project includes an investigation of a variety of audio transformations obtained by  manipulating the 2D Fourier domain data  The software had to incorporate these processes    and allow the user to apply them to the audio signal     Initially some key requirements of the transformation capabilities were decided  It had to  be possible to adjust parameters of the processes once applied  so the processing could not  destructive  Chaining of multiple signal transformations was required to provide a useful  and flexible creative tool  Finally  the ability to switch between the unprocessed and  processed signals was an important requirement both for the theoretical analysis of 2D  Fourier transformations and as a reference during creative or experimental transformation    of audio     3 1 4 Additional Tools    It was important to consider any further tools that may be required by users  either com   posers or audio analysts  Data input and output is an important feature
78. ce none    of the analysis frequencies correspond precisely to rhythmic variations in the signal     6 2 2 Limitations of Integer Raster Image Width    After experimenting with audio signals in rhythmic analysis mode it is apparent that the  data sample rate limits rhythmic synchronisation in the 2D Fourier analysis  The raster    image width must be an integer since no resampling functionality has been provided     A very limited set of tempo values give integer image widths  for example a quarter note  duration at 121 bpm corresponds to a period of 21867 768 samples  to 3 decimal places    Even tempos that do allow integer image widths  reach their limit as the rhythmic duration  of the image width is decreased  At 120 bpm  integer widths cannot be specified at a    duration one sixteenth note or less     Within the software tool  when the required raster image width is not an integer  the    163    nearest integer value is used  The analysis is therefore no longer perfectly rhythmically   synchronised and the centre frequencies of the rhythmic axis points no longer match sub   sonic sinusoidal variations of the signal  Hence the clarity of the 2D Fourier spectrum    analysis is reduced    This is issue is demonstrated in figure 6 8  Figure 6 8a shows the 2D spectrum of the  audio signal loop120 wav for a row width of an eighth note  which can be represented by  an integer  The spectrum in figure 6 8b represents the same signal but with a required  row width of a sixteenth 
79. ch timer  The use of tic  toc is shown  in listing 7 4  The function code to be measured is inserted at the point     oany statements        and the variable t returns the elapsed time in seconds           tic  Zany statements     toc    S COC       Listing 7 4  Use of tic  toc to Measure Code Performance    The two implementations of the YIN algorithm  yinpitch  M file  and yin  MEX file   were both run multiple times for a series of different audio signals  This is because the  function performs a different number of operations depending how quickly the autocorrela   tion process identifies a value below the tolerance  Both functions used an input tolerance  of 0 15  the recommended default in  3   which gave a satisfactory compromise between  execution time and accuracy  Both functions were run 10 times for each audio signal and    the results of this testing are shown in table 7 1     191                                           Duration l Execution Time  secs    Speed  File Pitch  Hz         seconds  M File MEX File Increase      sinel6 wav 0 9977 219 9556 0 1306 0 0194 673 2  squarel6 wav 0 9977 219 9501 0 1301 0 0196 663 7  am_bipolar wav 2 219 9611 0 2588 0 0387 668 7  pianoC1 wav 2 8669 65 7846 1 2378 0 1824 678 6  clarinetC6 wav 2 0142 2018 3066 0 0309 0 0059 520 5                Table 7 1  Performance Comparison For Implementations of YIN Algorithm    The set of test signals was chosen to cover the common range of duration and pitch expected  within the software t
80. colour  Another web demonstration provides a Java applet to demonstrate the relationship  between an image and its Fourier spectrum  15   allowing the user to modify the image or  its spectrum and observe the effects on the other representation  This demonstration also  shows magnitude and phase information in a signal display  this time using brightness to    represent magnitude and chromacity for the phase     23        a  Unshfited Magnitude  b  Unshifted Phase        c  Shifted Magnitude  d  Shifted Phase    Figure 2 3  Shifting of the 2D Fourier Spectrum  after  31     Each point of the frequency domain representation X  u v  corresponds to a sinusoid  how   ever since an image is based in the spatial domain rather than time  the understanding of  this component is different  The point  0 0  corresponds to the DC value  which is average  intensity value in the image  All other points refer to two dimensional sinusoids  which  have both frequency and direction  31   This is demonstrated in figure 2 4  which shows  an image with only a DC value and two images containing a sinusoidal waves of different  frequencies and directions  Note that the frequency representation in this figure has not  been shifted  so the point  0 0  is in the top right corner and high frequencies are located    near the centre of the spectrum whilst low frequencies are at the corners     24    Time Domain Frequency Domain          column column  0 N 2 N 1 0 N 2 N 1  0 0     M 2    M 2  M 1 M 1  col
81. ctrum dimensions can be performed simply by insert   ing zero values between each point and removing every other point  Listing 5 27 shows the  array operations that define the new Fourier data for the    Double Dur     and    Halve Dur        processes within the process_dur function           if double  newFT   zeros  height 2     1   width    row   l height    newFT  2 row    1       oldFT row       else  newFT   zeros ceil  height  2    width     row   1  ceil height  2     newFT row      oldFT  2 row     1        end  handles data FT 1    rev_fftshift  newFT          Listing 5 27  Doubling Halving Rhythmic Duration in process_dur_resize    The    Double Tempo    and    Halve Tempo    processes are performed using the same opera   tions on the width  rather than the height  However    Double Tempo    requires halving of  the spectrum width and    Halve Tempo    requires the width to be doubled  as shown in  listing 5 28           if double  newFT   zeros  height   ceil  width 2      col   1  ceil width 2    newFT   col    oldFT     2 col     1       else    151             newFT   zeros  height   width 2     1     col   1 width   newFT    2 col     1     oldFT   col    end  handles data FT 1    rev_fftshift  newFT          Listing 5 28  Doubling Halving Tempo in process_tempo_resize    After the new Fourier data array has been calculated the signal analysis parameters must    be updated appropriately as in the process_spec_resize function     5 10 Inversion of the 2D Four
82. d  all MIDI velocities were set to the same value to remove any subtle rhythmic variation    from the signal           Figure 6 5  A Simple Drum Loop Programmed in a MIDI Sequencer    This audio signal was then imported into the software tool and analysed at the known  tempo with a row size of 1 4 beat  The resulting 2D representations of simple120 wav are    shown in figure 6 5     160    Raster Image    Pixels               Pixels     a  Raster Image    2D Fourier Spectrum    Frequency  Hz        a    Frequency  Hz  x10     b  2D Fourier Spectrum    Figure 6 6  2D Analysis of simple120 wav With 1 4 Beat Row Size    The 2D spectrum display clearly shows that most rhythmic frequency energy contained  in the rows with rhythmic frequency centred at 0 Hz and 1 Hz  The 0 Hz row represents  the audible frequency components that are the same in every row of the raster image   The 1 Hz row corresponds to a rhythmic variation with a period of one half note  Notice  how the spectral components with 0 5 Hz and 0 25 Hz rhythmic frequency  1 bar and  2 bar periods at 120 bpm  have little energy in this spectrum  but the spectrum of the    161    audio signal loop120 wav contains a lot of energy in this row  figure 6 4a   This is because  simple120 wav is exactly repeated every half note whereas loop120 wav has varies subtly    over the four bars     6 2 1 Changing Rhythmic Frequency Range    When row size is altered according to rhythmic duration  it changes the rhythmic frequen   cies t
83. d as        V   max R G  B   4 13   pa V  min  G  B   4 14     In the HSL system  lightness brightness  L  and saturation  Sz  are defined as    min R  G  B    max R G  B                4 1  i z  4 15   max R  G  B      min R  G  B       fL lt 1 2  1 max R G  B    min R G B     7 gt  G  4 16     max R  G  B  aa min R  G  B  if L  gt  1 2     2     max R  G  B      min R  G  B   Therefore to convert HSL to HSV  the V and Sy values need to be obtained from L and    Sr  as follows     V   L 1 0   Sz  if L  lt  1 2    4 17     L S    L S  ifL gt 1 2   Sy    2   V    L   V  4 18     When Sy   1 the equations in the function polar2colour are valid  as shown in listing  4 4  The function finally calls Matlab   s built in hsv2rgb function  to convert the matrix  from HSV colour mode to RGB  Each of the HSV components must be in the range  0 1      73    however the lightness array is rescaled within polar2colour to prevent it from reaching  0  During testing it was found that a colour with a value component of 0 gives an RGB    colour of  0 0 0   no matter what the hue is  hence losing the phase information             calculate lightness from mag  lightness   atan mag brightness       2 pi      adjust range to prevent 0 value  lightness    lightness 0 001   1 001     calculate saturation and value in range  0 1   for m 1 size  lightness  1   for n 1 size  lightness  2   if  lightness  m n   lt  0 5     saturation  hsv  m n 2   1     value  hsv m n 3    2 lightness  m n    else light
84. d the processed result  after i iterations  The RMS amplitude of the noise and then the signal to noise ratio are  calculated  The signal to noise ratio for each iteration of the process is stored in the array    SNR  which is returned by process_SNR along with the processed signal           function  SNR  processed    process SNR  original  iterations   rms_orig   sqrt  sum sum sum  original  xconj original     size original  1       processed   original   for i   1 iterations    HERE THE PROCESS IS PERFORMED  processed   proc_function processed         get noise component  noise   processed   original    rms_noise   sqrt  sum sum sum noise  conj  noise      size noise  1       calculate signal to noise ratio  SNR i    20 logl10 rms_orig rms_noise     end    end       Listing 7 6  Obtaining the Signal To Noise Ratio of a Process    The SNR of the analysis synthesis process was tested by replacing proc_function with  analysis_synthesis  The audio signal loop120mono wav was used to test the SNR of the  process  The raster image width was set to 22050  which is a quarter note duration at the  signal   s tempo  This signal was chosen since it requires image padding in both dimensions    and provides rich spectral content in both frequency dimensions     The signal to noise ratio after the first iteration was 285 9285  and after the hundredth  iteration it had only fallen to 279 1797  This can be compared to the SNR of quantisation  error for 16 bit analogue to digital conversion
85. data then original data is copied    103    directly into data  If the processed data is to be displayed then data is obtained by running    original_data through the chain of 2D Fourier transformation processes     The details of the processes are stored in the processes structure  which is shown in  figure 5 1  It contains the integer variable num_proc  which indicates how many processes  are currently in the chain  and a 1D array of struct objects of the size of num_proc  This  array  called process  stores the details of each process in the chain in the order they  should be implemented  Each structure in process has the type variable which contains  the name of the process it represents  and the bypass variable which allows a process to  be removed from the processing chain temporarily  without losing its settings  The rest of    the variables used in each structure are particular to the specific transformation process     processes  process_changed  lt 1x1 boolean gt   AN                     process   lt 1 x num_proc struct gt    type  lt string gt    bypass  lt 1x1 boolean gt    All other process specific variables     Figure 5 1  2D Spectral Processes GUI Window    The names of all of the processes in the software tool are stored in the cell array proc names  within handles because they are used at several points within the software and it was much  easier during implementation to add or change process names when they were all defined  in one place  Each process name 
86. decided to show all four quadrants of the 2D Fourier spectrum in the display  as is  the convention in image processing literature  16  31   since it allows the user to observe    the relationship with the raster image more intuitively     The most important aspect of this investigation was to determine what information can be  taken from the 2D Fourier spectrum of an audio signal  To understand the representation  of an arbitrary signal in this 2D spectrum  simple signals must first be analysed so that the  basic 2D Fourier component can be defined  The concept of a 2D sinusoid  with a direction  parameter  was introduced in section 2 3 1  Any single point on the 2D Fourier spectrum  grid represents a sinusoidal signal travelling in a certain direction within the raster image   However in  23   Penrose describes the 2D spectrum in terms of audible frequencies that    vary at a lower rhythmic frequency     As stated in section 2 1 3  the 2D discrete Fourier transform is periodic  an M x N spectrum  of an image has a period of M samples in one dimension and N samples in the other  Each  point has a matching point with opposite phase due to aliasing of a sampled signal  except   0 0    7  2 N 2    0 N 2  and  M 2 0  which cannot be represented by any other point  on the spectrum  In a spectrum with two odd dimensions  the only point that does not  have an equivalent is  0 0  since the other co ordinates do not exist  This is the DC content  of the signal  so if an audio signal 
87. dio data to be  imported and it is called when the user selects    Import Audio    from the    File    menu within  the main GUI window  Listing 3 2 shows an abbreviated version of the import_audio  function  The function brings up a file chooser GUI that will return the file name and path  of the audio file to be imported  Matlab provides the uigetfile function to do this and  also a uiputfile for saving to a file  The structure in import_audio  to open a file browser    and then proceed if a file is chosen  is used for all data I O operations in the software tool     The wavread function is then used to get the audio data and also the sample rate and  bit depth properties  Also the function checks if the audio data is stereophonic and if so  converts it to mono  This is because the 2D Fourier tool currently performs all analysis    and processing only on monophonic signal data  Performing the software investigation for    45    stereophonic data would have been much more difficult  it would be simpler to redevelop    it now the investigation has been carried out           function import_audio hObject  handles    LOAD_AUDIO reads the desired  WAV file and stores it in the data    structure      use dialog to choose the audio file    handles source  file  handles source audio_path   FilterIndex         uigetfile      wav       Select a WAV file               Users chris Documents MEng Project  Matlab Media Audio          if not handles  source  file  0     set the source ind
88. dio to produce an acceptable output signal  The author found that a hybrid technique  of 1D  amp  2D filtering produced the most effective results  strongly suggesting that there are    still advantages to be gained from 2D Fourier processing of audio     2 3 Image Processing    A digital image is a two dimensional representation of a sample array  often this is a  spatially sampled version of a continuous physical image  Image processing is a well   established and highly developed research field  16  25  in which two dimensional transform  techniques are applied in many areas  Therefore when exploring the use of two dimensional  transform techniques for audio  it is useful to develop an understanding of their use in image  processing and how this relates to audio signals  However  whereas digital audio signals  are sampled in time  images are sampled spatially with no timing information  we must    bear this in mind when making associations between the two signal types     2 3 1    Two Dimensional Fourier Analysis of Images    To obtain the Fourier magnitude and phase information of an image  the equations given  in section 2 1 3 should be used  When the spectrum of an image is to be displayed  it is  common to multiply the image z m  n  by     1         which centres the spectrum making  it easier to comprehend  In the frequency domain the spectrum is periodic so shifting by  M 2 pixels up down and N 2 pixels left right will centre the point  0 0   During image  resynthesi
89. e   MAGPHASE     One of the above constants     public ComplexColor           public ComplexColor  float R     this R   R     public ComplexColor float R  float bmin  float bmax   float Imin  float Imax  int type       set_constants R  bmin  bmax  Imin  Imax  type      public void set_constants float R  float bmin  float bmax     float Imin  float Imax  int type     this  R R   this   bmin bmin    this   bmax bmax      this  lmin Ilmin      209          this   lmax lmax      this  type type     public void setR float R     this R   R     public float getR        return R     public void setDrawingType int type     this type   type     public int complex_to_color float re  float im       if  type  REAL     im   0   if  type JMAG   re   QO     float hue  sat  brightness   float r   if  type    PHASE   H  SR   else    r re   re im   im     For now  r  2  if  type    MAG2    type    MAG2 PHASE   r    float  Math sqrt r       float li   Math min 1 0f   Math max 0 0f    float  Math atan r R     2 0f  float  Math  PI    bmax   bmin    bmin   JEE   Imax lmin   min    if  type MAG     int dark    int  1i1 256f    return 0xFF000000  dark lt  lt 16   dark lt  lt 8   dark        if  li lt 0 5f     sat   1 0f           210          brightness   2fxli      else    sat   2 0f 2 0f 1li   brightness   1 0f   i  J   brightness     float Math  atan r R  x 2f   float  Math  PI    float phi    float  Math PI  float  Math  atan2 im  re     double scale    double  4 0 20 0   hue    float   phit tsc
90. e Fourier Transform    The discrete Fourier transform  DFT  is a more relevant tool in digital signal processing  since it allows analysis of frequency characteristics for a discrete time signal  such as a    sampled audio waveform     The spectrum of a discrete time signal is periodic about the sampling frequency  17  and  a signal that is periodic in the time domain has a discrete spectrum  since it is composed  of only harmonically related sinusoids  placed at integer multiples of the fundamental  frequency  The discrete Fourier transform views the signal under analysis as a complete  cycle of an infinitely periodic waveform  hence the signal is discrete in both time and  frequency  The transform process is invertible using the inverse discrete Fourier transform   IDFT   which recreates the time based signal from its frequency points  The formulae for    the DFT and its inverse  of a signal of length N samples  are     N 1   XS amne Te v 0 1     N 1  2 1   n 0  S 2rj   x n    gt  X olew  n 0 1      N   1  2 2   v 0    The DFT is a discrete time representation of the exponential Fourier series  It produces  the same number of frequency samples as there are time samples  this is N  the size of the    DFT  The actual frequency that each point in X  v  represents is given by the formula     f  S h  2 3     Where f  is the sampling frequency  The DFT only analyses the signal at these discrete    frequencies and it is likely that the actual frequency components of the signal wil
91. e index n     It is common to overlap the time windows  since this enables a greater time resolution  It  also ensures equally weighted data when windows overlap at the half power point of the  window function  Serra  30  gave an alternative formulation of the STFT indicating the    time advance of each window  commonly known as the hop size     N 1  l 0 1 L 1  X  l  v    h  n  z  LH    n  e7107 ee 2 11  mae   v 0 1     N   1 l    The number of spectral frames is given by   M  L 7  2 12     Here M is the length of the input signal  A  n  is the window function that selects a block  of data from the input signal    is the frame index and H is the hop size in samples  The  frequency corresponding to each point v is given by equation  2 3   for each point n the  corresponding time index is     t    1 x  H fs   2 13     Equation  2 11  describes the operation of the STFT  however it is quite often used for  real time signal processing  where the input signal length M is unbounded  In a real time    implementation the STFT would be performed on a frame by frame basis     2 2 3 Window Functions    The window function is used to extract a small section from a long time signal  so that a  short FFT can be performed  The multiplication of the two functions  audio signal and    window  in the time domain corresponds to a convolution of the frequency spectra of these    15    functions  due to the theory of convolution  Therefore the window function will distort the    analysed spectrum
92. e of the project  the roles of developer and user were carried out side by side   The software tool was constantly used to analyse and process audio  which not only allowed  an understanding of the underlying signal processing to be developed but also revealed  errors or unexpected processing results  The algorithms could then be examined using  Matlab   s debugging tools and corrections made or further testing pursued  Particularly  complex or vulnerable areas of the program were tested with more formal methods  and    several examples are provided in this chapter     185    7 1 Black Box Testing    Many of the functions within the software tool perform specific manipulations of data  arrays  either to convert between data representations or to transform the 2D spectrum  data  Where appropriate  these functions were supplied with test data that would provide  an expected output result  The actual result could then be compared with the expected    result to confirm that the process performs the correct operation     7 1 1 Testing rasterise and derasterise    The rasterisation process and its inverse are fundamental components of the project  see  section 4 2  The rasterise function was written to convert from a 1D array to a 2D array  using rasterisation  This process has a one to one sample mapping  so every item in the  1D array must be present in the 2D array and the 1D data stream should be readable in    the manner shown in figure 2 5     The following 1D 25 element arra
93. e passed in as an argument to the play function   The sample indices for the current frame are stored in the frame_settings structure  The  audio player   s    Play    push button has a callback function that calls the play_audio shown  in listing 3 6           function play_audio handles        Plays the audio data using the audioplayer object  The    play_frame       setting determines between playing a subsection of audio or the     whole array   if handles  player_settings play_frame  play  handles   player    handles  frame_settings frame_start handles   frame_settings frame finish       else  play  handles  player      end       Listing 3 6  Playing Audio    The loop variable corresponds to the state of the    Loop    toggle button on the audio player   s  user interface  If loop is true then the audio should resume playing from the beginning  unless the    Stop    button has been pressed  The audioplayer object   s StopFcn property  can be set to define a function that is called each time the audioplayer stops playback   whether it has played to the end of the audio or been stopped using stop  In order to  differentiate between these two situations  the stopped variable is used  Pressing the    Play       button sets stopped to false and it only becomes true when the    Stop    button is pressed     50          Listing 3 7 shows how the StopFcn property is set to call the function stopFcn and how    this function controls looping        set  handles  player      StopFc
94. e potential of techniques  based on the initial  understanding developed to that point  This is the first part of an iterative process of  analysis and refinement  that would continue to further the understanding of 2D Fourier    processing and lead to the development of more useful and robust signal transformations     A set of signal transformations have been developed and integrated into the 2D Fourier  software tool  This chapter will describe the extension of the software to allow 2D Fourier  processing of audio  and discuss each the implementation of each transformation individ   ually  Chapter 6 will discuss the results of audio signal processing using these transforms    and analyse their operation     102    5 1 2D Fourier Processing Overview in MATLAB Tool    The preliminary investigation of 2D Fourier domain signal processing was performed using  Matlab   s debugging tools  The Fourier data was altered in the command line and then  the signal was resynthesised using the functions developed in section 4 9  However it was  soon apparent that the architecture of the software had to be extended to allow efficient  investigation  Section 3 1 3 details the software requirements that were determined to allow    signal transformation     Users can apply a chain of various transformations to the signal data  using a GUI that  mimics the style of DAW plug ins  It was decided to limit the maximum number of  transformation processes to 5 to prevent excessive use of memory and 
95. e rows and columns of padding  or not at all  to maintain the synchronous  analysis  The drawback of using significant zero padding in a 2D signal representation is  that it greatly increases memory requirements  which are already considerable larger than    for a 1D representation     6 4 2D Fourier Processing    The investigation into audio transformation using the 2D Fourier transform has provided  some novel and interesting results  but there are also many limitations of the analysis  data when it comes to processing  This section will describe the general features of the  2D Fourier spectrum  as obtained by the analysis techniques of this investigation  that  affect the transformation of audio data  It will then give an evaluation of each of the  transformation processes within the software tool  A set of audio examples are provided    on the accompanying CD to this report  appendix B      6 4 1 Analysis Limitations    Section 4 3 2 gave an analysis of the 2D Fourier transform  It stated that the 2D Fourier  spectrum in not symmetrically valued in both axes since the process involves a Fourier  transform of complex data  However  when properly synchronised in both axes a single    amplitude modulated sinusoidal component is represented by four symmetrically placed    170    points with the same rhythmic and audible frequency but different signs  Equation 4 3   shows the linearity of the DFT  which implies that linear processes can be applied to the 2D  Fourier spectrum p
96. e the figure is loaded     A description of each of the sub level structures is given in table 3 1  however the details of  their contents and functional role will be covered in later sections  The top level structure  handles also contains a cell array named proc_names  which holds strings of the names of    all processes described in chapter 5     39       handles   lt 1x1 struct gt  source    All UI object handles that are children of main figure   lt 1x1 struct gt   JNJ unitcircle  lt 750 x 750 double gt  be E  zoom_obj  lt graphics zoom gt   pan_obj  lt graphics pan gt  opt   gt  dcm_obj  lt graphics datacursormanager gt  4  lt 1x1 struct gt   legend_dcm_obj  lt graphics datacursormanager gt  C     7  player  lt audioplayer gt  ae  load_cancel  lt 1x1 boolean gt  proc_names              plot_settings   lt 1x1 struct gt   player_settings   lt 1x1 struct gt   frame_settings   lt 1x1 struct gt                                       reanalyse  lt 1x1 boolean gt   lt 1x19 cell gt   loaded  lt 1x1 boolean gt   data original_data processes   lt 1x1 struct gt   lt 1x1 struct gt   lt 1x1 struct gt   Figure 3 3  Top Level Data Structure  Structure Name   Description  data Contains the four representations of the signal data along with any    parameters that define the data and its representations  If any 2D  Fourier processing has been applied then the signal data represents    the output of the processing chain        original data A copy of data before any 2D Fourier processing h
97. e to match the logarithmic pattern of    human hearing     175    6 4 4 Rotating The 2D Fourier Spectrum    Rotation of the 2D Fourier spectrum produces the same results as rotating the raster image  would  A rotation of 90   or 270   allows the rhythmic frequency energy of the signal to be  converted to audible frequency energy and vice versa  A rotation of 180   simply reverses  the signal  although when the vertical axis is zero padded the zero row is not removed since    it is now at the top of the image     In rhythmic mode the result of 90   or 270   is composed of a few widely spaced audible  sinusoids with intricate sub sonic variations  In timbral mode the results are generally more  useful because there are many more audible sinusoids  since the two spectrum dimensions  are typically closer in size  The process provides somewhat interesting sounds but its    musical application is limited     Musical signals generally have much more low frequency energy than high frequency  To  improve the results of 90   and 270   rotations  the frequencies of the audible axis could  be scaled to reflect this  since there is currently an unnatural amount of high frequency    energy     6 4 5 Pitch Shifting    The results of audio transformation using this process have been quite poor     In timbral analysis mode the resolution of audible frequency analysis is too low to perform  an interpolation process  Each point in the audible frequency axis is harmonically related  to the funda
98. e wu wens a GM BG A    T 2 Data Input Testing  s s oh kek ah ek  evga be ia So BE oe ES  T3 Code Opiimisaniony  lt 2 2 0 4 a lee Rds Or dk Da SA ae ia   7 3 1 YIN Pitch Algorithm Optimisation                     7 4 SNR of Analysis Synthesis Process     1    20    eee ee    Conclusions    Future Work    Correspondence With External Academics    A 1 Complex Colour Representation    4   cea ae 2a a ee    A 1 1 Email Correspondence               0 00000 ee eee  A 1 2  ComplexColor Java  Class      lt 4 6    445 24 4G Oo ba ee    A 2 Two Dimensional Fourier Processing of Audio                      Accompanying Compact Disc    185  186  186  187  188  189  191  192    196    199    206  206  206  208  211    214    Chapter 1  Introduction    Transform techniques provide a means of analysing and processing signals in a domain most  appropriate and efficient for the required operation  There are many different transform  techniques used for audio processing  27   however these techniques all have limitations  and no method is suitable for all types of signals  There is continuous research effort into  expanding  adapting and improving the available transform techniques for audio applica     tions     There has been recent work in the rasterisation of audio and sonification of images using  raster scanning  39   which provides a simple mapping between one dimensional  1D  and  two dimensional  2D  representations  A 2D audio representation allows the application of  transform techniq
99. e120_eighthwidth_shift280_leave wav      Shift  280      Option  Leave original rows  e simple120_eighthwidth_shift280_remove wav      Shift  280      Option  Remove original rows  e trumpetG3_shift5_remove wav      Shift  5      Option  Remove original rows  e trumpetG3_shift7_remove wav      Shift  7        Option  Leave original rows    216    Filtering    These audio files have been processed using the 2D spectral filter     e loop120_aud_1D_LP2kHz_ideal  wav  This example has been processed using ideal 1D Fourier low pass filtering at 2kHz    to serve as a comparison to the 2D equivalent   e loop120_aud_1D_LP2kHz_ideal  wav        Source file  loop120 wav    Raster Image Width  One quarter note      Frequency Mode  Audible       Type  Low pass       Cutoff  2 kHz    Frequency response  Ideal    Cut  boost  Cut  e loop120_aud_quartwidth_LP_750Hz_but02_cut  wav      Source file  loop120 wav      Raster Image Width  One quarter note      Frequency Mode  Audible      Type  Low pass  Cutoff  750 Hz        Frequency response  2     Order Butterworth    Cut  boost  Cut    e loop120_aud_quartwidth_LP_750Hz_ideal_cut wav    Source file  loop120 wav      Raster Image Width  One quarter note      Frequency Mode  Audible        Type  Low pass    217    Cutoff  750 Hz  Frequency response  Ideal    Cut  boost  Cut    e loop120_eighthwidth_BP_2Hz_pt125_but02_boost wav    Source file  loop120 wav   Raster Image Width  One eighth note  Frequency Mode  Rhythmic   Type  Band pass   Cutoff 
100. eXT computers  Fortunate for me  MacOS X has embraced the  same application development API that I used in the early 90s on a NeXT  I will even be  able to easily port to the iPhone and iPod touch     And thank you for reminding me about this  It helped me rescue a program from obscurity    which I wrote a few years ago using 2D DFTs    Best    Christopher   On May 21  2008  at 5 27 AM  Chris Pike wrote     Hello  I   m glad to have found you  I   ve only read the two chapters of your thesis that are on  your web page but my MEng project has been based on the idea of two dimensional Fourier    analysis that you suggest in Chapter 2  I   m finishing it up now but I   ve been working in    212    MATLAB to create a tool that allows a user to load audio and view its 2D Fourier spectrum   as well as apply several transformations to the spectrum data and resynthesize the audio   I   m using raster scanning  http    ccrma stanford edu  woony works raster   to obtain a    2D representation with no frame overlap  since this was my project brief     Anyway  I was just keen to know how far you took this idea of manipulating rhythmic  frequency and whether you think it has much potential  From the work I   ve done the  transformations seem quite unusual and interesting but to the extent I   ve developed it   perhaps a bit limited  It seems like there   s also scope for this rhythmic audible frequency  grid to be used for in depth audio analysis  I   d really like to hear what you have t
101. ectrum  The shift parameter can take any integer    value however the amount of rows actually shifted is given by the equation     row_shift   mod shift  ceil M 2    5 9     Where M is the height of the 2D spectrum  ceil is the Matlab function that rounds  the input towards infinity and mod is the Matlab function that returns the modulus after  division  This operation is performed in the process_row_shift function  It means that  a shift value of any integer  including negatives  can be converted to a value in the range   0 ceil M 2       The row shift process also has two Boolean option parameters  wrap and remove  which  together define the three different modes of operation  When wrap is true  any rhythmic  frequency rows that are shifted beyond the end of the spectrum are wrapped back around  starting from the 0 Hz row  meaning that every row in the spectrum contains new data  If  wrap is false  then remove determines what data is stored in the low frequency rows that  are not overwritten  this will match the value in shift after equation 5 9   If remove is  true then these rows will be empty  containing only zeros  otherwise the original row data    is retained     With these options defined  it can now be shown that a row shift greater than ceil M 2   is never required  When the rows are wrapped around  the shifting is periodic about  ceil M 2  and if the rows are not wrapped round  then a shift of more than ceil M 2     134    would lead to either an empty spectrum or
102. ectrum in the order that will result in the least number of calculations     using the instructions shown in listing 5 24           if handles  processes  process process_num  change 2   lt  0    handles   do_audible_dim handles   process_num      end   if handles  processes  process process_num  change 1l       0  handles   do_rhythmic_dim handles   process_num      end    if handles  processes  process process_num  change 2   gt  0  handles   do_audible_dim handles   process_num       end       Listing 5 24  Determining The Order Of Dimension Resizing in process_spec_resize    The width of the spectrum is generally larger than the height  so if the spectrum width is to  be decreased then this is done first using do_audible_dim  This means that fewer columns  need to be resized in the do_rhythmic_dim function  If the width is to be increased then  it is done after the height  The functions do_rhythmic_dim and do_audible_dim perform  similar operations  though for different frequency axes  Only the resizing of the rhythmic    frequency axis is discussed here to prevent repetition     The do_rhythmic_dim function is displayed in listing 5 25  It uses Matlab   s interp1 func   tion to resize the complex Fourier spectrum data using cubic spline interpolation  The  width and original height of the spectrum are obtained from the FT data array dimensions   and the new height is calculated from the process variable size 1   The new Fourier  data array can then be initialised with ze
103. ed Data    is clicked  the run_processes  function is used to calculate the processed signal again before it can be displayed  This is  shown in listing 5 10 within the function tb_viewProcCallback  which is the callback of  the toolbar button           function tb_viewProcCallback  hObject   eventdata   handles   guidata gef    if stremp get handles tb_viewOrig     State        off      set  handles  tb_viewProc      State       on       else  set  handles  tb_viewOrig   State     off       end  set  handles  process_button   Enable        on        handles  original_data   handles  data   handles   run processes  handles   handles  processes  num_proc     display_data handles       end       Listing 5 10  Display Processed Signal Data    Note how the data structure is copied into original_data before the processed data is  recalculated  This is because the    Info     GUI window allows the user to adjust the pitch  analysis information  section 4 6 4   which would change the original unprocessed data  representation  The pitch analysis adjustment in the    Info    window is not made available  to the user when the processed signal data is being viewed  since the analysis parameters    can be changed by the transformation processes     5 1 7 Recalculating Signal Properties When Analysis Settings  Change    The analyse_audio function has already been discussed in depth  but it has not been    stated that the run_processes function is called before the loaded function  This al
104. elow wav        Source file  trumpetG3 wav    234        Analysis Mode  Timbral       Frequency Mode  Single       Threshold  20        Remove  Below Threshold  e trumpetG3_single_60below wav       Source file  trumpetG3 wav       Analysis Mode  Timbral       Frequency Mode  Single  Threshold  60         Remove  Below Threshold    235    
105. em in the process array that the menu re   lates to  If an item in the bottom context menu  which corresponds to the    Empty    button   is clicked  the process chosen will simply be added at the end of the chain in the process  array  If this process has adjustable parameters  then it will display a window requesting  the values of these parameters before it is added  Once added  the relevant button   s string  must be changed to match the process name  the GUI window resized and a new item must  be created in the button  opt_button and cmenus arrays  The buttons are then displayed  at the bottom of the window  with the process button deactivated and displaying the string     Empty     The function add_button creates and displays the new button and menu objects   after resizing the figure and moving the other GUI objects to the correct position  It is  shown in listing 5 3 as a demonstration of the adjusting of the GUI layout that is common    in the design of this window     107             function handles   add_button handles   if handles  proc_popup num_buttons  lt  5  cur_pos   get handles  proc_popup  figure      Position         set  handles  proc_popup  figure       Position  cur_pos 2    10 cur_pos 3  cur_pos 4  20         cur_pos  1     cur_pos   get handles proc_popup  title      Position       set  handles  proc_popup  title      Position      cur_pos 1   cur_ pos 2  20 cur_pos 3  cur_pos 4      for num   1 handles proc_popup num_buttons  cur_pos   get handles p
106. emitones    Resize  These audio files demonstrate the results of the resizing the 2D Fourier spectrum in rhyth   mic analysis mode  Both signals had an original tempo of 120 bpm and a duration of 32    eighth note beats   e loop120_eighthwidth_dur12 wav        Source file  loop120 wav    225        Duration  12 eighth note beats       Tempo  120 bpm       Quadrant Dimensions   7 11026   e loop120_eighthwidth_dur24 wav       Source file  loop120 wav        Duration  24 eighth note beats    Tempo  120 bpm      Quadrant Dimensions   13 11026   e loop120_eighthwidth_tempo90 wav      Source file  loop120 wav      Duration  16 eighth note beats        Tempo  90 bpm    Quadrant Dimensions   9 14701     e loop120_eighthwidth_tempo150 wav    Source file  loop120 wav        Duration  16 eighth note beats    Tempo  150 bpm      Quadrant Dimensions   9 8821   e simple120_eighthwidth_dur12 wav      Source file  simple120 wav        Duration  12 eighth note beats    Tempo  120 bpm      Quadrant Dimensions   7 11026   e simple120_eighthwidth_dur24 wav        Source file  simple120 wav    226        Duration  24 eighth note beats       Tempo  120 bpm       Quadrant Dimensions   13 11026   e simple120_eighthwidth_tempo90 wav       Source file  simple120 wav        Duration  16 eighth note beats    Tempo  90 bpm      Quadrant Dimensions   9 14701    e simple120_eighthwidth_tempo150 wav      Source file  simple120 wav      Duration  16 eighth note beats        Tempo  150 bpm    Quadrant Dimensi
107. empo   synchronised duration  So for example  if a quarter note duration was specified  the analysis  would result in a raster image and 2D Fourier spectrum for each quarter note of the signal  at the analysis tempo  This method of analysis was designed to allow each note of a melodic    signal to be analysed separately according to its pitch     The 2D spectra of each frame will only be correctly pitch synchronised by timbral analysis  if note changes happen at the frame boundaries  since each frame has a single pitch value   The frame size should therefore be specified according to the minimum note duration in the  audio signal  The rhythmic timing of the audio signal has to be precise to allow accurate    tempo synced timbral analysis     169    6 3 4 Zero Padding    The use of zero padding in the software tool   s signal analysis was not well designed  In  Fourier analysis zero padding is used to reduce the frequency interval between spectrum    points  improving the resolution of frequency analysis     The analysis process of this project used a single row and or column of zero padding to  ensure that the spectrum had odd dimensions  see section 4 4 4   The issue is that this  resizing of the spectrum changes the centre frequency of the analysis bins  invalidating  any synchronous analysis which would originally have improved the clarity of the spectral    representation     Zero padding should have been used to greatly improve the spectral resolution  applying  many mor
108. empo 2   dooce ug Bley A ee AK ee BA 156   6 2 Rhythmic Mode Analysis        5  4nd aoe eek ee ee ee ee 160  6 2 1 Changing Rhythmic Frequency Range                   162  6 2 2 Limitations of Integer Raster Image Width             2   163  6 2 3 Pitch Synchronised Rhythmic Analysis                  165   6 3 Timbral Mode Analysis   Ye 1a  0 ioe  dele Fs Ae se Od Be Eas eG 166  6 3 1 Instrument Timbres     aaa ar athe Ge a as Bg dR A Ee ore Ge 166  6 3 2 Limitations of Integer Raster Image Width               168  6 3 3 Tempo Synced Timbral Analysis                 00  169    6 3 4    Zero PACOIIC a e raa eee A Hel bo Be a Ook ee Bes a ae aa    6 4 2D Fourier Processing pikes ae eg a ee ee ee ee poe ES    6 4 1  6 4 2  6 4 3  6 4 4  6 4 5  6 4 6  6 4 7  6 4 8  6 4 9    Analysis Limitations     he Goh 8 ore aia SPIRO  Filtering The 2D Fourier Spectrum                 2    Thresholding The 2D Fourier Spectrum                   Rotating The 2D Fourier Spectrum                     Ie AGE ShiRtInE 20 Ga  Sade ne et ca saeco oo Mas HA ate ds sae set eee  Rhythmic Frequency Range Adjustment                  Resizing The 2D Fourier Spectrum                  4   Evaluation of Resampling Techniques                    Shifting Quadrants The 2D Fourier Spectrum                 Testing the Matlab Tool  T  Black Bog Testinin etch vie ts ar pate 4 av Dw ir gia Gee Ar an deine es Cobo    7 1 1  7 1 2    Testing rasterise and derasterise                      Testine rev fftshift  se s di adel
109. encies that the original data points  represent  This process is performed in the process_spec_stretch function  processing  each data frame separately  The operations performed on the 2D Fourier data each frame  are shown in listing 5 22  The amount variable is used to calculate xi  the array of output    sample indices for the interp1 function           width   size handles data FT frame    2      height   size handles  data FT frame  1     oldFT   fftshift  handles data FT frame        newFT   zeros height   width      x   linspace    1 1 height      xi   linspace    1 handles  processes  process proc_num  amount      1 handles  processes   process  proc_num   amount  height       col   1 width    newFT   col    interp1 x oldFT   col  xi    spline     0     handles  data FT frame    rev_fftshift  newFT          Listing 5 22  Rhythmic Frequency Range Compression Expansion in    process_spec_stretch    5 8 4 Fixed Rhythmic Frequency Stretching Processes    As with pitch shifting  there are two fixed processes that perform rhythmic frequency range    adjustments without using interpolation  These processes are referred to in the software    142       tool as    Halve Rhythm    and    Double Rhythm    respectively and are defined in the function  process_fixed_rhythm which uses the Boolean argument up to determine whether the    rhythmic frequency range should be stretched or compressed by a factor of 2     The operations performed on the 2D Fourier data of each frame are the 
110. end  of create_proc_panel  It can be seen that the context menus  process buttons and option  buttons are all stored in arrays with the length of num_buttons  Therefore it is clear which    item in the process array each object corresponds to           for num   1 handles proc_popup num_buttons  for ptype   1 size handles proc_popup cmenus  num   proc  2   set  handles  proc_popup cmenus num  proc ptype      Callback            cmenu_clicked  num  ptype     Interruptible       off       end  set  handles  proc_popup  buttons  num       Callback        button_clicked       num      Interruptible       off       set  handles  proc_popup  opt_buttons  num      Callback            opt_button_clicked  num     Interruptible        off       end    set  handles  proc_popup  figure      CloseRequestFcn        proc_popup _close            Listing 5 1  Setting Object Callbacks In The 2D Spectral Processes Window    The same callback function is used for all the process button objects in the buttons array   the index of the clicked object is simply passed as the argument num into the function  The    same technique is used for both opt_buttons which contains the option buttons objects     The cmenus array contains a structure for each process in the process array  containing the  handles for all objects in the menu  Each uzmenu object corresponding to a transformation  process has its handle stored in the proc array  within each cmenus structure  The index  of these handles corresponds 
111. er image width is defined as the  smallest rhythmic duration in the signal  for example an eighth note in simple120 wav   the pitch can be effectively altered without changing the tempo or rhythm of the signal   However if the raster image rows contain more than one note event then the rhythm of  the signal is also changed since the pitch adjustment is essentially changing the pitch and  duration of each raster image row  The rectangular windowing of rasterisation also causes    problems here since discontinuities are likely to occur at the joins of raster image rows     In rhythmic analysis mode the interpolation also retains a component at the fundamental    178    frequency of analysis  causing an oscillation across the raster image rows  This effect is well  demonstrated by the pitch shifting of loop120 wav in figure 6 17  and calls into question    the appropriateness of resampling by interpolation     TU TT TIT  n MHNA    M  NTL  wA en TI Wt    1 ai    HUES mnt  i Wn         f       Figure 6 17  Raster Image of loop120 wav After Pitch Shifting    Even when the pitch shifting process produces acceptable results there are much more  effective and efficient techniques available in 1D Fourier processing  It is still believed that  there might be potential in effective pitch changing of signals using 2D Fourier processing   especially in timbral scale analysis where the rhythmic oscillations of harmonics could be  adjusted independently from the audible frequencies to mainta
112. eros height   width      start   ceil width 4     col   1  ceil width 2      if up   newFT    2  col    1     oldFT   col start    else   newFT   col start    oldFT     2 col    1     end    handles  data FT frame    rev_fftshift  newFT          Listing 5 21  Octave Pitch Shifting Without Interpolation in process_octave    5 8 Rhythmic Frequency Range Compression  Expan     sion    Rhythmic frequency compression and expansion processing is performed by resampling  the columns of the 2D Fourier domain data to alter the range of rhythmic frequency   Its operation is similar to the pitch shifting process in section 5 7  but for the rhythmic  frequency axis rather than audible frequency  Figure 5 8 attempts to visually describe the  process  demonstrating how the range of the original rhythmic frequencies can be stretched  beyond the scope of the 2D Fourier signal representation or compressed to less than the  original range  This process is referred to within the software tool using the short name     Stretch Rhythm    and will be commonly referred to as the rhythmic frequency stretching    process in this section     140                i ater cetans as oink niga ee ee ry PEE  2    x r    eho See E E   gt    S   5      P  A 7 y 7  iow   Original Rhythmic Compressed Rhythmic Expanded Rhythmic Signal Representation     2D Fourier Spectrum Frequency Range Frequency Range Frequency Range Limits  2    E          5 ETEN r EEEE ES    A    cc    ER DEE Sch hose ais teeta Gas Dag ek eens
113. erted at the correct  position in the audio array  even though the frames of data are all of different length  The  frames will overlap  since they are all longer than frame_size  which will cause signal  discontinuities  This issue could be addressed by the use of windowing and setting the  amount of overlap to the window functions half power point  as with the STFT  section  2 2 2   However  it was decided that for the purposes of analysis and signal transformation  in the software tool  the pitch synchronisation technique did not provided much additional  benefit  The resampling of each row individually means that the 2D spectrum display axes  do not give the correct frequency scale for any point  The design of signal transformations  would also have to be much more complex to properly compensate for the resampling of  row data  As a result the pitch synchronous analysis and synthesis were not developed any    further     101       Chapter 5    Two Dimensional Fourier Processing  of Audio    The second major phase of the project was to investigate methods of audio signal transfor   mation by processing 2D Fourier data using the knowledge gained from the prior analysis  investigation  This processing has two uses  to provide novel musical transformations for  creative sound design and to further the understanding of 2D Fourier analysis  The pro   cesses used to obtain signal transformations should be viewed as prototypes  They were  designed with the aim of investigating th
114. ettings  tempo_div_den     end    end       Listing 5 18  Correcting Analysis Settings After Rotation in process_rot_spec    The width_pad and height pad variables need to be swapped so that the correct amount of  zero padding can be removed from each dimension when the raster image is obtained from  the spectrum  Any analysis settings corresponding to the dimensions of the raster image  then need to be recalculated  In timbral analysis mode  the width of the raster images   imwidth array  is obtained using the FT array sizes and the new values of width_pad   In rhythmic mode  without pitch synchronisation only   the imwidth and frame_size  variables  which are both the same  need to be swapped with num_frames  since these    parameters correspond to the dimensions of the raster image     133       5 5 2D Fourier Spectrum Row Shift    This process allows the rows of rhythmic frequency content to be shifted within the 2D  Fourier spectrum  setting them to a new rhythmic frequency value  This process is a  useful tool to aid the understanding of rhythmic frequency content and provides interesting    perceptual effects when transforming audio     5 5 1 Row Shift Parameters    The row shift process uses the integer parameter  shift  to define the number of rows by  which the spectrum data is shifted  The rows are shifted away from 0 Hz for a positive  value of shift  so positive and negative frequency rows are shifted in opposite directions   to maintain the symmetry of the 2D sp
115. f the analysis and processing in this project  There is a large amount of data in  the 2D Fourier representation  since each point in the 2D matrix has both magnitude and    phase values  the display needed to clearly portray this data     The 1D Fourier spectrum has also been included in the software tool as it is the more  familiar representation of the frequency content of an audio signal and it would help users  to grasp the concept of the 2D Fourier spectrum by comparison  It also aids general audio  analysis and the understanding of the relationship between 1D  amp  2D Fourier representa   tions  The 1D Fourier representation is obtained from the audio data using the FFT and  is then decomposed into magnitude and phase components  Both of these needed to be    obtained and represented within the software tool     3 1 2 Analysis Tools    In order to allow proper investigation the software needed to provide several tools that  would allow accurate analysis and aid inspection of the various representations of the  signal  All plots needed proper headings  axis labels and scales to describe the data on  display  Provision of plot tools such as zooming and panning  as well as a data pointer that    displays the precise value at a specific point in the plot  was also important for thorough    32    analysis  Another required feature was a display of time domain signal statistics such as    range  mean and length  which would help characterise the signal     Section 2 5 1 indi
116. for both frequency axes  The input spectrum size can  easily be calculated from the FT data array but since it is required frequently in the adjust  function  it is stored in the process data structure  The values of the min_quadrant_size  array were chosen to yield a minimum signal duration of just under 100 ms  since this is    nearing the threshold of human auditory perception  27      5 9 2 Adjusting Resize Process Parameters    The GUI window for the resize process  shown in figure 5 10  allows the user to specify  the required quadrant size of the output spectrum  Alternatively the user can enter the  required tempo or rhythmic duration and the nearest quadrant size to this value will be    set      lt Student Version gt    Resize Spec     Resize the 2D spectrum   Heignt  mytnmic treg   q7 Duration     pg  4 beats  6 6667 secs   Width  audibie treg     7351     Tempo      4gg     VPM    Minimum size   2 11   Onginal size   9 11026      Bypass    Figure 5 10  Resize Spectrum GUI    The function adjust_spec_resize creates the resize spectrum GUI and defines the callback    functionality of the GUI objects  allowing the properties of the resize spectrum process to be    145    set  The    Minimum size    values displayed on the GUI are given by the min_ quadrant size  variable and the    Original size    values are given by orig_size     Height    and    Width    display    the required quadrant dimensions of the output spectrum  as stored in size     The signal duration i
117. formed using the raster scanning path  this process will be referred  to as derasterisation  Sonified images produce a pitched time developing sound as a result  of the similarity between adjacent rows of pixels  Derasterised image textures produce    similar audio textures and various visual filters were identifiable in listening tests  39      Sound visualisation uses raster scanning to convert from a sound into an image  a process    28    which will be referred to as rasterisation  When the width of this raster image is correctly  set to correspond to the sound   s fundamental pitch period  the result shows temporal    variations in pitch and timbre     2 5 2 Spectrograms    The spectrogram is arguably the most popular method of sound visualisation  It produces  a display from STFT analysis data  with time on one axis and frequency on the other   The intensity of the spectrogram at a certain point corresponds to the amplitude of a  particular frequency at a certain time  The inverse spectrogram method is a popular  scanning technique for image sonification  and if the parameters of the inverse STFT are    the same as the analysis parameters  a sound can be recreated from its spectrogram     2 5 3 Synthesisers  amp  Sound Editors with Graphical Input    It is possibly to perform sound transformations by directly modifying the spectrogram   There are two notable software applications  Metasynth and Audiosculpt that allow sound  design and modification using spectrogram edit
118. given in section 1 1  These aims yield the software requirements that ensure  that a useful and useable tool is produced     3 1 1 Signal Representations    One of the main aspects of this project was to investigate different representations of    audio data and find out what each offers  There were three signal representations required    31    explicitly in the project aims  the audio signal and the raster image in the time domain   and the 2D Fourier spectrum in the frequency domain  It was decided that all signal  representations should be displayed in the software tool  to help gain an understanding of    the 2D Fourier domain properties by comparison     The time domain audio signal is the basis from which all other representations are obtained   The signal can be displayed graphically to allow waveform inspection but it was most  important to have an audio player in the software tool that would allow playback over a  loudspeaker  Although the software tool displays signal representations visually  the input    and output of this signal processing is sound     The raster image is the 2D time domain representation of the audio signal obtained by  rasterisation  This image is an intermediate form between the audio waveform and the 2D  Fourier spectrum  but it also provides insight into signal variations over time that are hard    to visualise in a 1D waveform display     The 2D Fourier spectrum is the key signal representation in the software tool since it is  the focus o
119. gure object and so could have its own GUI data  which enabled a local copy of  the handles structure to be stored  In this way the program data could be edited without  losing the original values and then the main figure   s GUI data could be overwritten at a  point decided by the program or the user  This method of data handling made the software    tool much more flexible     3 5 Menu and Toolbar Design    Matlab GUI figures allow the use of menus and a toolbar  which is useful when designing  a software application  There are default menus and a toolbar that provide a lot of useful  functionality such as access to plot tools but they also allow unnecessary and often un   desirable operations such as insertion of GUI objects and saving of the figure file  which  would overwrite the GUIDE figure layout  It was therefore necessary to develop custom  menus and a custom toolbar that incorporated the required functionality from the defaults    plus any additional functionality specific to the software tool     The menus and toolbar can be seen in the main GUI display in figure 3 2  There are four    menus     e The    File    menu  which contains the application specific data I O operations load     save  import and export  section 3 6      e The    Plot Settings    menu  which allows the user to adjust the display settings for  both the Fourier spectrum plots     43    e The    Plot Tools    menu  which provides the use of the zoom  pan and data cursor tools     The functionali
120. h      Cut boost  Cut  e trumpetG3_LP_but02_30Hz_cut wav      Source file  trumpetG3 wav      Frequency Mode  Rhythmic      Type  Low pass  Cutoff  30 Hz        Frequency response  2   4 Order Butterworth    Cut  boost  Cut    e trumpetG3_LP_ideal_30Hz_cut wav    Source file  trumpetG3 wav      Frequency Mode  Rhythmic      Type  Low pass   Cutoff  30 Hz        Frequency response  Ideal    Cut  boost  Cut  Fixed    These audio files demonstrate the results of the fixed transformation processes for simple120 wav    222    and trumpetG3 wav  in rhythmic and timbral analysis mode respectively  An eighth note    row duration was used for simple120  wav     loop120_aud_1D_LP2kHz_ideal  wav  This example has been processed using ideal 1D Fourier low pass filtering at 2kHz    to serve as a comparison to the 2D equivalent   simple120_eightwidth_doubledur wav       Process  Double Duration  simple120_eightwidth_doublerhyth wav       Process  Double Rhythm  simple120_eightwidth_doubletempo wav       Process  Double Tempo  simple120_eightwidth_downoctave wav       Process  Down Octave  simple120_eightwidth_halvedur wav       Process  Halve Duration  simple120_eightwidth_halverhyh wav       Process  Halve Rhythm  simple120_eightwidth_halvetempo wav       Process  Halve Tempo  simple120_eightwidth_specshift wav       Process  Spectrum Quadrant Shift  simple120_eightwidth_upoctave wav       Process  Up Octave  trumpetG3_doublerhyth wav       Process  Double Rhythm    trumpetG3_downoctave w
121. h  69 0141 Hz   Raster Image Width   639 samples    Figure 6 1  Correcting Pitch Analysis For A Cello Playing C     Correcting the pitch analysis settings to obtain vertically aligned signal periodicities will  ensure a more concise 2D Fourier spectrum since the signal components will be aligned  with the centre frequencies of analysis bins  preventing the spectral energy from spreading  across several bins  This is demonstrated by the spectrum of the cello sound when the    pitch analysis is corrected  as shown in figure 6 2     155    2D Fourier Spectrurn    Frequency  Hz         1  0 8  0 6  0 4  0 2 0 0 2 0 4 0 6 0 8 1  Frequency  Hz  x10      a  Calculated Pitch  68 6916 Hz   Raster Image Width   642 samples    2D Fourier Spectrum    Frequency  Hz         1  0 8  0 6 04  0 2 0 0 2 0 4 0 6 0 8 1  Frequency  Hz  x 197     b  Corrected Pitch  69 0141 Hz   Raster Image Width   639 samples    Figure 6 2  Improving 2D Spectrum By Correcting Pitch Analysis    6 1 2 Determining Correct Tempo    The scale of raster image dimensions is quite different in rhythmic analysis mode  where  the width is defined according to a tempo based duration  However the same principal  applies as with timbral analysis  When the tempo of the audio signal is correctly defined     the signal waveform will demonstrate vertically aligned patterns in the raster image  This    156    is demonstrated with an electronic drum beat in figure 6 3     MUNII J    MITET  WEEE EERE   IELE SIL EERE  ati  AVL Aneel
122. has a constant non zero value   0 0  will be the only    point in the 2D Fourier spectrum with a non zero magnitude value and no point will have    60    non zero phase     Signal Components With Audible Frequency Only    A sinusoidal signal with an integer period size can have a raster image that contains  exactly one period in each row  When this is analysed using the 2D Fourier transform   the resulting spectrum contains only two points both having zero rhythmic frequency since  there is no vertical variation in the raster image  and with the symmetrical positive and  negative audible frequency of the sinewave  as shown in figure 4 2  The 2D spectrum has    been zoomed in in this figure  to more clearly display the two points     Raster Image 2D Fourier Spectrurn  T       Frequency  Hz         2500  2000  1500  1000  500 0  Pixels Frequency  Hz     500 1000 1500 2000 2500    Figure 4 2  2D Analysis of a Sinusoid With Correct Raster Width    Signal Components With Rhythmic Frequency Only    When there is no variation across each raster image row but a clear periodicity in the  columns  the opposite occurs  The 2D Fourier spectrum demonstrates two points with zero  audible frequency and symmetrical positive and negative rhythmic frequency  as shown in  figure 4 3     61    Raster Image 2D Fourier Spectrum        Frequency  Hz         2500  2000  1500  1000  500 0 500 1000 1500 2000 2500  Pixels Frequency  Hz     Figure 4 3  2D Analysis of a Rhythmic Sinusoid With Correct Ras
123. hat are represented by the points of the 2D spectrum   s vertical axis  The maximum  frequency point on the rhythmic axis corresponds to a rhythmic periodicity that is double  that of the row size  In other terms  the maximum frequency of the rhythmic axis is half    the rhythmic sampling frequency   Rhythmic sampling frequency is given by the equation   frs   fs N  6 1   The interval between rhythmic frequency points is given by   Af    f  MN  6 2     Where M and N are the width of the 2D spectrum     When the duration of the raster image width is halved  for example from a quarter note  to an eighth note  the rhythmic sampling frequency is therefore doubled  To obtain 2D  spectral components with higher rhythmic frequency  the row width must be reduced  as    demonstrated in figure 6 7 for simple120 wav with a row width of an eighth note     162    2D Fourier Spectrum    Frequency  Hz   2 S        in  gt  in   in    in        2  1 5  1  0 5 0 0 5 1 1 5 2  Frequency  Hz  x10     Figure 6 7  2D Analysis of simple120 wav With 1 8 Beat Row Size    This spectrum has a rhythmic frequency ranging up to 2 Hz  which is a quarter note rhythm  at 120 bpm  There are many 2D spectral signal components of 2 Hz  since the kick and  snare drum hits occur alternately at this frequency  The rhythmic frequency envelope in  the range of 0 Hz to 1 Hz is the same as in figure 6 6 though with half as many points   The signal   s rhythmic frequencies are less well defined between 1 Hz and 2 Hz sin
124. he functions  Better results can be obtained using smooth    filter functions such as Gaussian curves     The Fourier transform of audio data converts from a confusing time domain waveform into  a more intuitive frequency spectrum representation  however in images the information is  much more straightforward in the spatial domain and the Fourier representation is not  very descriptive of the image  As a result  frequency domain filter design is not very useful   31   The basic component of an image is the edge and edges are made of a wide range  of frequency components  Filter design is more effective in the spatial domain where it is    thought of as smoothing and edge enhancement rather than high  amp  low pass filtering     The most common use of the 2D Fourier transform for images is FFT convolution  A spatial  image filter operates by convolving a neighbourhood kernel with the image  operating on  each pixel one by one  For a large kernel this convolution operation becomes quite complex  and it is more efficient to take the FFT of the image and the kernel  then multiply their  spectra in the frequency domain  As previously stated  the DFT views the input function    a periodic signal  In order to prevent aliasing distortion from corrupting the convolution     26    both the image and the kernel must first be zero padded  16   The 2D DFT is also used  for image restoration noise reduction  data compression  motion estimation  boundary    description and texture analysis 
125. he image as the fundamental period size in  samples  rounded to the nearest integer  Rhythmic mode uses the tempo and a note  length  such as one quarter note  to determine the width of the image in samples  again    rounded to the nearest integer     Each of these representations offers a different scale of rhythmic frequency  In timbral mode  the upper rhythmic frequency boundary is in the order of tens and sometimes hundreds  of Hertz  extending into the lower end of the audible frequency spectrum  Frequencies in  this range describe the slow oscillations over a signal that are commonly related to the  conceptual timbre of a sound  describing the evolving amplitude and audible frequency  spectrum envelopes  In rhythmic mode the upper rhythmic frequency boundary is much  lower  generally less than 10 Hz  This frequency range corresponds to conventional rhyth   mic frequencies  for example at a tempo of 120 bpm  eighth notes occur at a frequency of  4 Hz     The choice of raster image width affects the resulting 2D Fourier spectrum  as described    78    in section 4 3 3  Each analysis mode attempts to represent a different frequency axis  accurately  setting the raster width accordingly  In rhythmic mode  the raster image width  is set so that the frequency bins of the rhythmic axis are likely to correspond to prominent  rhythmic components within the signal  The maximum frequency on the rhythmic axis is  half the frequency of the beat duration that defines the raster image
126. he output sample points along the same signal  Cubic spline interpolation was  chosen because it gives a lower interpolation error than polynomial interpolation for the  same order of polynomial  The frame variable indicates the current image row being cal     culated  This code is contained within a for loop that increments frame for all row indices             set range indices for audio array  start   1    frame   1  handles data analysis frame_size     length of frame has to be rounded up  len   ceil handles data analysis num_periods frame  xhandles data    analysis  period frame       finish   start   len     1   Y   zeros len l      prevent going over the edge of the audio vector  if finish  gt  length handles  data  audio   finish   length handles  data  audio     len   finish     start   1   end    extract audio frame  Y 1l len    handles data audio start   finish      frame sample index values  x   linspace 1 length Y   length Y         work out sample indices for new resampled frame  xi   linspace 1  length Y   handles data  analysis  imwidth       interpolate the frame data to get resampled frame and add as a    row in the image  Yi   interpl x Y xi    spline       handles  data image 1  frame  1 length Yi     Yi           Listing 4 7  Calculating The Raster Image In Pitch Synchronised Mode    87       4 6 4 Readjusting Analysis    Once the raster image data has been obtained  the analyse_audio function proceeds to  determine the 2D Fourier transform data using spe
127. heir precise rhythmic frequency     both positive and negative     A signal component that is exactly periodic in both dimensions of the raster image is a  sinusoid harmonically related to fao modulated by a sinusoid harmonically related to f o   It will be displayed as four points on the 2D spectrum  corresponding precisely to the    positive and negative rhythmic and audible frequencies     Any signal component that cannot fit a whole number of periods into a raster image row   will be displayed at an angle and the points of its frequency spectrum will be skewed  so they are perpendicular to the direction of the sinusoid  This occurs with sinusoids  with an amplitude modulation as well as those with no modulation  The energy of these  components will also be spread across adjacent frequency bins  since its frequency does not    directly match the centre frequency of any point on the spectrum     69    The 2D frequency spectrum represents sinusoidal components with sinusoidal amplitude  modulation  When the frequency of either the carrier or the modulator exactly matches  the centre frequency of one of the 2D spectrum points for its respective axis  the component  will be precisely modelled by a single bin in that axis  If the frequency of either carrier or  modulator doesn   t match the centre frequency of a 2D spectrum point  then its energy will  be spread across several bins along the respective axis  This energy smearing is the same  as in 1D Fourier analysis and might 
128. i algorithm  21   detect peaks in the STFT analysis spectrum and assign them to tracks that record ampli   tude and frequency variations  Each peak is assigned to its nearest track from the previous  frame and the number of tracks can vary over time  thereby allowing the system to mon   itor short time components of the signal  The tracks are used to drive a bank of many    sinusoidal oscillators at the resynthesis stage     This method of analysis resynthesis can more readily handle input signals with time   varying characteristics and inharmonic frequency content than the phase vocoder  Also  the TPV allows more robust transformations of the analysis data than the regular phase  vocoder  A similar set of transformations can be performed with the TPV such as spectrum    filtering  cross synthesis and pitch time transformation  27      The Spectral Modelling Synthesis algorithm  SMS   30  provides an extension to the TPV  that analyses the signal in terms of deterministic and stochastic components  The peak  tracking method is carried out and then the residual signal is analysed using the STFT     The SMS method is more effective at handling noisy signals     2 2 7    Two Dimensional Fourier Analysis of Audio    Two dimensional Fourier analysis has very rarely been applied to audio signals but it  could potentially lead to new signal representations and creative sound transformations   In the initial chapters of his thesis  23   Penrose outlined a two dimensional audio analys
129. icator    handles source type      audio         handles data audio  handles data audio_settings Fs handles data   audio_settings nbits    wavread  handles source audio_path    handles source  file          if the audio data is stereo then convert to mono   if  size  handles   data  audio  2    2    handles data audio   mean  handles  data audio  2       end    end    end       Listing 3 2  Importing Audio Using wavread    The rest of the function  which was omitted  calculates some properties of the audio data  and then sets the variables required to allow processing operations to be applied  before    calling the analyse_audio function to bring up the analysis settings menu     When the    Export Audio    option is chosen from the    File    menu  the export_audio function  is called  shown in listing 3 3  This is much simpler than the import operation  It brings  up a file browser to select the save file and then  if a file is named  it calls a utility function  that was written to remove the suffix from a file name  Then it uses the wavwrite function  to save the audio data in a  wav file using the chosen file name and path  The GUI data    is finally updated to retain the new source structure settings     46             function export_ audio hObject   handles    SAVE AUDIO will save the audio data as a WAVE file    handles source  file   handles  source audio_path  FilterIndex           uiputfile      wav       Save audio as a WAV file             handles source  audio_
130. ier Spectrum    This is a simple fixed process that uses the fftshift function to invert the frequency  content of the 2D Fourier spectrum so that components near the centre of the spectrum   with low rhythmic and audible frequency  are shifted towards the corners of the spectrum   and components at the corners of the spectrum  with high rhythmic and audible frequency   are shifted towards the centre of the spectrum  It was thought that the transformation    might provide an interesting perceptual effect     152       Chapter 6    Evaluation Two Dimensional Audio    Analysis and Processing in Matlab    Now that the 2D Fourier analysis and processing capabilities of the software tool have been  fully desribed  it is important to evaluate these techniques and identify the conditions in  which they are most appropriately used  A variety of audio signals have been analysed  in the software tool  investigating their properties using 2D representations in both time  and frequency domains and revealing the capabilities and limitations of the developed  techniques  The 2D Fourier domain transformation processes have been explored using  different audio signals  Their operation can now be analysed and appropriate applications    can be identified to achieve interesting transformations     6 1 Effects of Raster Image Width on 2D Fourier Anal   ysis    The importance of applying the correct raster image width was explained in section 4 3 2   If the centre frequency points of the 2D Four
131. ier analysis bins are correctly aligned with  frequency components of the signal data then these components can be defined by a single  point in the spectrum yielding a clearer data representation  Depending on the analysis  mode  the raster image dimensions are either determined by the pitch or tempo of the audio    signal  This section demonstrates the benefits of correctly defining the relevant analysis    153    parameter     6 1 1 Determining Correct Pitch    When the raster image width is defined according to the fundamental frequency period   signals often exhibit patterns that depict slow pitch variations  The analysed pitch is  correct when these patterns align vertically within the raster image  This is demonstrated    for a cello playing a C  in figure 6 1     The pitch was calculated at 68 6916 Hz giving an image width of 642 pixels  however figure  6 la shows that this doesn   t properly represent the pitch of the signal  since the periodic  signal pattern is displayed at an angle  The image width was adjusted to 639 pixels using  the    Info    GUI window  this corresponds to a pitch of 69 0141 Hz  Figure 6 1b shows how  this corrected pitch setting aligns the periodic signal vertically in the raster image  apart    from at the start and end of the signal where the pitch increases and decreases respectively     154        a  Calculated Pitch  68 6916 Hz   Raster Image Width   642 samples    Raster Image       100 200 300 400 500 600  Pixels     b  Corrected Pitc
132. image    image image_data       58            label the plot   title    Raster Image        set  get gca     XLabel        String        Pixels       set  get gca     YLabel         String        Pixels         colormap  gray          Listing 4 2  Plotting The Raster Image    4 3 Two Dimensional Fourier Transform    The 2D Fourier transform can be used to provide a new perspective on frequency domain  audio analysis  by simultaneously giving the audible frequency content and the sub sonic  rhythmic variation in audible frequency  as described in section 2 2 7  It is the core of this  software investigation and the aim was to understand its properties for audio analysis  The  2D FFT was used in the software tool to obtain the 2D Fourier spectrum of audio data     which could then be displayed to the user     4 3 1 Obtaining the 2D Fourier Spectrum in Matlab    As described in section 2 1 2  Matlab provides the function fft2 to perform the 2D dis   crete Fourier transform  which returns the Fourier transform data matrix  The function  spec2D_init was written to obtain the complex Fourier data from each raster image  The  Matlab functions abs and angle can be used to obtain magnitude and phase components  from the complex matrix but this is not done immediately  the Fourier transform data is    stored in the FT cell array within the data structure     4 3 2 Analysis of the 2D Fourier Transform    In order to understand the 2D Fourier domain representation of a signal  it is impo
133. in an acceptable timbre  see    section 6 3 1   However the resolution of analysis must be first be improved     6 4 6 Rhythmic Frequency Range Adjustment    This process exhibits the same issues as pitch detection though based in the other frequency  axis  When the rhythmic frequency range is altered by resampling  especially in rhythmic  analysis mode  the frequency interval is too large and the tempo of the signal remains  unchanged whilst the magnitudes of the original rhythmic frequencies are altered  This  has potential for creative use  as does the timbral mode pitch shifting  however it is not    achieving the desired results     179    Low frequency rhythmic variations are introduced when the rhythmic range is increased  and since the DC rhythmic frequency row typically contains large magnitudes  this causes  a signal drop out in magnitude across the duration of the signal as shown in figure 6 18    where the rhythmic range of loop120 wav was doubled     THOT Hi    1    MUE      i NER     ai    AMAN Hil       Figure 6 18  Raster Image of loop120 wav After Doubling Rhythmic Frequency Range    The same effects are observed in timbral analysis mode where the DC rhythmic frequency  row is also generally large in magnitude  The fixed process    Double Rhyth    that inserted  zero valued rows between each original row achieved the expected result without the er   rors caused by interpolation  This result is essentially increasing the tempo of the signal  whilst maintaining
134. indow function to the audio signal to ex   tract each row of the raster image  When signal components are correctly pitch synchronised  the audible frequency analysis will display these components by a single point  Likewise in  the vertical axis of a raster image  when a sub sonic signal component has a precise period  synchronisation with the image dimensions it can be represented by a single point in the  rhythmic axis  However when there is no synchronisation the effects of the rectangular    window distort the 2D Fourier domain signal representation     Each non synchronised 2D spectral component is represented by its actual audible and  rhythmic frequency convolved with the frequency response of the rectangular window in  both axes  If a component is synchronised in one axis then the other axis will still show    distortion as a result of this convolution     67        a  Rectangular Window        b  Hamming Window    Figure 4 7  Window Function Time and Frequency Representations    The Matlab Signal Processing toolbox provides a window analysis tool which was used to  observe the frequency response of a rectangular window  as shown in figure 4 7a  Figure  4 7b shows a Hamming window in the time and frequency domains to serve as a comparison   Both window functions have a frequency domain form that resembles a sinc function  with  a main frequency lobe with many lower magnitude side lobes  The rectangular window  has a narrower main lobe but at the expense of much highe
135. indow to the audio signal to obtain  each image row  Therefore if any form of transformation has been performed  signal dis   continuities occur in the audio signal at points where the raster image rows join  since the    transformation process will not change each row in the same way     6 3 Timbral Mode Analysis    In timbral analysis mode  the 2D Fourier spectrum provides a display of the sub sonic  variation of harmonic audible frequency content of a signal  The audible frequency axis  of the 2D spectrum are set to precisely match the harmonic frequencies of the signal   based on the detection of the fundamental frequency  The rhythmic frequency points do  not necessarily correspond to the precise rhythmic variations within the signal and so 2D  spectral components are smeared in the rhythmic axis  The rhythmic frequency interval  is defined in equation 6 2 and therefore it is intrinsically related to the pitch of the signal     rather than any rhythmic analysis component     6 3 1 Instrument Timbres    Two dimensional timbral analysis of instrument notes depicts the internal oscillations of  the signal harmonics  Notes of any pitch from the same instrument will produce a similar  form in the 2D Fourier spectrum  This is demonstrated in figure 6 9 which shows the 2D  Fourier spectra of 3 seconds of four piano note recordings  ranging from C1 to A4  This  figure shows a common pattern in all four spectra  however as the pitch increases  the  harmonic content is stretched f
136. ine execu   tion time  The execution time of the code is increased by the Profiler but the ratio between  execution times for each code line is accurate  The information obtained can be used to    identify where code can be modified to achieve performance improvements     The Profiler was used regularly throughout the software development  One example was    to investigate the performance of the rasterise function  This function originally used    189       nested for loops to perform the rasterisation process by single element operations as shown    in listing 7 3  and its execution speed was too slow           function image   rasterise array  width    calculate the required image size  height   ceil length array    width        create an empty matrix  image   zeros height   width     for i 1l height  for j 1 width  arrayindex     i   1  width  4j   if arrayindex  lt   length  array   image i j    array arrayindex    end  end    end       Listing 7 3  Original Rasterisation Process    The rasterise function was run 10 times using a sinusoidal test signal  at a frequency of  220 5 Hz with a 1 second duration at a sampling rate of 44 1 kHz  The raster width was    set to 200  the fundamental period of the sine wave     The average total execution time was 1 205 seconds  Each code line within the two for  loops was called for every sample of the input signal  44000 times  which led to a signifi     cantly long processing time     The design of the function was reassessed to pr
137. ing 5 4 shows how a process is removed from the process    array using the remove_process function           function handles   remove _process  handles  num   handles  processes  process  num  handles  processes num_proc   1         handles  processes  process  num 1 handles  processes  num_proc     handles  processes  num proc   handles  processes num_proc     l   handles  processes  process   handles  processes process 1l      handles  processes  num_proc     if handles  processes num_proc    0  set  handles tb_viewProc     Enable        off       set  handles  tb_viewProc     State       off       set  handles  tb_viewOrig     State       on       end    end       Listing 5 4  Removing a Transformation Process Using remove_process    If the current process  which is checked ticked on the menu  is selected  that process     settings window will be displayed for an adjustable process allowing the parameters to  be altered  for a fixed process no action will occur  This functionality is identical to the  button_clicked callback  which is called when a process button from the buttons array    is clicked     If a new process is selected  this will replace the currently existing process  This is the same  as creating a new process  but the array index is set so that the old data is overwritten in    the various arrays and no new button is added to the GUI     109       5 1 4 Components of a Generic Transformation Process    Section 5 1 2 introduced the fact that some transforma
138. ing and inverse spectrogram sonification   There is a large range of sound transformations possible with these techniques  such as  directly adding and removing components to change the sound timbre and time pitch  scaling of data  The application of image processing effects leads to some interesting  changes in the sound also  A list of spectrogram transformations available in Metasynth is    given in  27      2 6 Matlab Programming Environment    All investigation and software development has been carried out using Matlab  It integrates  computing  visualisation and programming facilities into a single environment and there   fore it is very well suited to an investigative programming project such as this  Software  development in Matlab is often much quicker than a lot of other languages  and its use  allowed this project to progress much further than would otherwise have been possible  Its    basic data element is an array that does not require dimension specification unlike scalar    29    languages such as C  This allows much more flexibility  It   s functionality focuses on matrix  computation and so is well suited to signal processing applications  The software package  provides a comprehensive set of pre programmed Matlab functions  M files   all of which  are well documented  These functions span many categories including mathematics  data    analysis  graphics  signal processing  file I O  and GUI development     Matlab is an interpreted programming language  37   
139. ins a    Bypass    button  which allows the user to temporarily bypass the processing of that particular transformation  when the processes are run  This buttons callback function enables or disables the editable  components of the figure and inverts the value of the bypass variable  This callback is  utilised in the adjust_  function to set enable disable the GUI components appropriately  according to the initial state of bypass  which requires the variable to be inverted before    bypass_Callback is called     The adjust_  function then stores the handles structure as its figure   s GUI data  and  calls a local function to set the callbacks of the GUI objects  as well as the figure   s  CloseRequestFcn which determines the actions performed on closing the figure window   Finally it makes the figure visible to the user and then calls Matlab   s uiwait function with  the    adjust    figure   s handle as an argument  This function blocks program execution in the    code sequence until the uiresume function is called with the same figure handle     Use of uiwait and uiresume means that the parameters of the process can be adjusted by  the user  whilst the adjust_  function is frozen  When the user closes the figure window   the function defined in its CloseRequestFcn property is called  which is set to my_closefcn    shown in listing 5 7           function my_closefcn  hObject  eventdata   handles   guidata gef     handles  processes  process  handles cur_process_num   handles
140. io with two temporal axes   The horizontal axis has the time granularity of an individual sample period  that is 1  fs  seconds  and the vertical axis has a lower time granularity dictated by the width of the  image  which can be defined as w f  seconds  where w is the image width in pixels  The  amplitude of the audio signal at each sample point is displayed as a grayscale value creating    an image display of the time domain signal     4 2 2 Implementation of Rasterisation    The rasterisation process has been implemented in Matlab in rasterisation  a low level  function that has been abstracted from its application  It has two input arguments  the  one dimensional data array to be rasterised  array  and the integer width for the resulting    image  width      The function was designed with the option of row overlap in mind  which would abandon  one to one mapping in favour of separating vertical time granularity from row width  This  feature was not required in the final implementation hence the hop variable is set to equal  width at the start of the function  To ensure fast operation it was decided that the width  variable would always be integer valued and any variable passed in as width by the soft     ware tool would always be an integer  The rasterisation function is shown in listing 4 1        function image   rasterise array  width    RASTERISE converts between data from a 1D representation to a 2D  representation using raster scanning          54    This function c
141. ion  1991     C  Roads  Pitch  amp  rhythm recognition  In The Computer Music Tutorial  chapter 12   MIT Press  1995     C  Roads  Microsound  MIT Press  2001     E  A  Robinson  A historical perspective of spectrum estimation  In Proceedings of  the IEEE  volume 70  pages 885 907  September 1982     M  H  Serra  Introducing the phase vocoder  In C  Roads  S  T  Pope  A  Piccialli   and G  D  Poli  editors  Musical Signal Processing  Swets  amp  Zeitlinger  1997     X  Serra and J  O  Smith  Spectral modelling synthesis  A sound analysis synthe   sis system based on a deterministic plus stochastic decomposition  Computer Music  Journal  14 4  12 24  1990     S  W  Smith  The Scientist and Engineers    Guide to Digital Signal Processing  Cali   fornia Technical Pub   1st edition  1997     I  Y  Soon and S  N  Koh  Speech enhancement using 2 d fourier transform   EEE  Transactions on Speech and Audio Processing  11 6  717   724  2003     T  Tolonen  V  Valimaki  and M  Karjalainen  Evaluation of modern sound synthesis  methods  Technical Report 48  Helsinki Institute of Technology  March 1998     H  Valbret  E  Moulines  and J  P  Tubach  Voice transformation using psola tech   nique  In IEEE International Conference on Acoustics  Speech and Signal Processing   volume 1  pages 145 148  San Francisco  CA  USA  1992     WSC  Css level 3 color module specification  http   w3 org TR css3 color       J  J  Wells  Real Time Spectral Modelling of Audio for Creative Sound Transformatio
142. ions this requires specification of the initial analysis requirements using  the GUI described in section 4 5  This is done using the analysis_settings function     which stores the chosen parameters in the analysis_settings data structure     The function analyse_audio was written to carry out the initial signal analysis once the    settings have been determined  It initialises the cell array variables image and FT at the    52    required size as well as the necessary variables within analysis structure  which vary  according to the chosen settings  The calc_image function is called to convert the audio  to one or more images by rasterisation  according to the analysis settings and then the  spec2D_init function obtains the complex Fourier data from the image  Once both 2D  representations have been obtained  the loaded function is called  as with TDA files  to  adjust the settings of the GUI and its components to allow the user access to analysis  and processing functionality  The loaded function ends by calling display_data which  will plot the data representations on the GUI  The details of plotting the raster image  and the 2D Fourier spectrum will be covered in sections 4 2 5 and 4 4 respectively  The  display_data function ends by storing the handles structure as GUI data for the main    figure window  to retain the new signal data and settings     4 1 1 1D Fourier Spectrum    The 1D Fourier data is calculated as it is plotted since it is a quick straightforward proce
143. iption of Sub level Data Structures    3 4 2 Callback Functions    The data flow within the software is event driven  based on the interaction between the    user and the GUI components  Matlab GUI components allow the use of callback func     tions to define their interactive behaviours  Listing 3 1 shows the definition of a generic    object callback and the corresponding function  It also demonstrates the use of GUI data    to pass variables into the callback function without explicitly naming them as arguments     When GUIDE is used the generated callback functions receive the handles structure as    an argument automatically  however all GUI windows but the main one were developed    programmatically and so their callbacks use the guidata function to obtain the handles    structure           guidata component_handle  arg3      set component_handle     Callback       function_name  argl  arg2           function function_name source_handle  eventdata  argl  arg2        arg3   guidata source_handle      end    42             Listing 3 1  Defining a GUI Object Callback and Using GUI Data    3 4 3 GUI Interaction    Most of the peripheral GUI windows were concerned with input  adjustment of parameters  for analysis and processing algorithms  Often it was not just a case of obtaining a few  numbers and returning them  many program variables were used in calculations to update  several display objects or set several variables upon data entry  Each of these GUIs was a  Matlab fi
144. is    framework     This framework uses a 2D Fourier transform to analyse the audio  however the samples must  first be arranged into a 2D array  In order to do this Penrose uses the STFT  as in equation   2 11   giving an array of complex valued time frequency points X l  v   For a frequency  v  the L components of X l  v  can be considered as a complex valued time domain signal   The signal X   v  can be reinterpreted as N time domain signals  each representing the  changes in amplitude of a particular frequency v  over time  Taking the N point DFT of    these time domain signals provides a pure frequency domain representation  The process    20    can be expressed by substituting equation  2 11  into the following equation     Y  u v    XC ALYX E  ve FOr  l              2 15     ll       This formulation of the 2D DFT differs from the one in equation  2 4  because the time  frames of the STFT can overlap  Therefore the number of output points in Y  u v   which  is  L x N   is not necessarily the same as the number of input points  M  The overlap of  frames prevents the frequency resolution of both dimensions from being implicitly linked   giving more flexibility in the analysis  The inverse 2D Fourier transform process can be    achieved by substituting equation  2 14  into the following equation           i  Al SY  u  vjet    u    1    XL     2 16     ll      fond  I            pi    The STFT framework presents a two dimensional signal representation where one dimen   sio
145. is accessed by calling proc_names with the array index of    that process    name     5 1 2 2D Spectral Processes Window Design    Once a signal has been loaded and displayed in the main GUI window  the    2D Spectral    Processes    window is created using the function create proc_panel  This window  shown    104    in figure 5 2  allows the user to add and remove signal transformation processes to the pro   cessing chain in a similar style to DAW plug ins  It is displayed by clicking the    Processing     button in the main GUI     When the    2D Spectral Processes    window is created create_proc_panel first determines  the number of buttons required in the window  This corresponds to the number of trans   formation processes in the chain  plus an extra    Empty    button underneath to allow a new  process to be added  If the audio signal has just been imported into the software then  there will only be an    Empty    button  as in figure 5 2a  however a signal can also be loaded  as a TDA file with the process chain data  such as in figure 5 2b  The GUI figure   s height  and position change according to the number of buttons displayed  so they are calculated    in create_proc_panel before the figure object and the required buttons are created      lt Student Version gt    Processes    2D Spectral Processes   Filter lv    lt Student Version gt    Processes Threshold fv    Stretch Rhythm  V   2D Spectral Processes re   Resize Spec   v    z        a  No Transformation Proce
146. is is because the 2D Fourier transform is  implemented by two sequential DFT processes on the same audio data  the second one is  operating on the complex result of the first  The complex DFT does not produce sym   metrically valued results but it is thought that the relationship between the positive and    negative frequency points is linear     When the analysis of a signal component is not synchronised with its audible or rhythmic  frequency  the signal is displayed at an angle in the raster image and the corresponding 2D  spectrum points are skewed according to this angle  The spectral energy of this component  is also smeared between adjacent points because it   s audible and rhythmic frequencies do  not precisely match the analysis frequencies  This reduces the clarity of the 2D Fourier    spectrum representation     Concise 2D spectral analysis  with less spectral smearing of frequency components  is  achieved when the raster image width is set according to analysis of either pitch or tempo   When the raster image width is synchronised according to the pitch of the signal  the audi   ble frequency content is well defined since frequency bins correspond to signal harmonics   When the raster image width is synchronised according to the tempo of the signal  the  rhythmic frequency variation is well defined since frequency bins correspond to prominent    rhythmic variations at the signal   s tempo     The sample rate is a limiting factor when attempting to synchronise audi
147. is used and  the functions load_tda and save_tda have been written to allow these files to be used   These functions are very simple  a file browser dialogue is presented as with the other I O    functions introduced in this section and then if a file is chosen  the data is stored retrieved     There are three data structures stored in each TDA file  data  original data and processes   which are described in table 3 1  All the data representations and analysis settings are com   pletely contained within data but the other two structures are required in order to retain  the 2D Fourier processing chain and its settings  this is covered in chapter 5  In order to  save the structures they have to be renamed so that they are not within a higher level    structure  therefore when loading the variables have to be reinserted into the handles    48    structure  Once a TDA file has been loaded  the loaded function is called which sets up    the GUI and internal variables accordingly     3 7 Audio Player    The audio player allows the audio data to be played through the host computers audio out   put device at the sample rate stored in the audio settings  which is obtained from the source  audio file  The software tool uses Matlab   s audioplayer object to control audio playback   This object is created by passing the audio data and settings into the audioplayer func   tion  as shown in listing 3 5  Once instantiated the audioplayer object provides methods    and properties that allow 
148. ise function  which can divide the audio array into a 2D matrix where each row  contains a frame  Before each frame can be converted to a raster image  its pitch must be  determined to define the image width  The pitch_map function  shown in listing 4 6 was  written to take in the array of data frames and analyse the pitch of each one to produce    the required variables          calculate fundamental period values for each frame  for frame   1 handles data  analysis  num_frames    display message box informing of pitch calc  msg       Calculating pitch for frame      num2str frame            num2str   handles data analysis num _frames              hMsg   msgbox msg    Pitch Calculation         replace         calculate pitch    yin_vec   zeros floor handles data  analysis  frame_size  0 5   1    period   yin pitch_ map_frames frame     0 15 yin_vec    if period    warning      The pitch for frame      num2str frame       is  undefined              A default period of 500 samples will be used        hWarnDlg   warndlg warning    Pitch detection error       uiwait  hWarnDlg     period   500    end   handles data  analysis  period frame    period    handles  data  analysis  pitch frame    samples2freq period       handles data audio_settings  Fs        handles data  analysis note frame      handles data  analysis  octave   frame              freq2note handles data analysis  pitch frame         84    end  delete  hMsg       end       Listing 4 6  Calculating Pitch For Frames Usi
149. ivisions  from 1  a semibreve  up to 128  a semihemidemisemiquaver  which includes  triplet divisions  The numerator simply allows a range from 1 up to the first integer that    makes it longer than the signal duration     81    000  lt Student Version gt    Audio Analysis Settings          a  Rhythmic Mode    000  lt Student Version gt    Audio Analysis Settings     b  Analysis Mode  Tempo Sync Off     000  lt Student Version gt    Audio Analysis Settings     c  Analysis Mode  Tempo Sync On     Figure 4 11  Audio Analysis Options GUI    82    The GUI not only displays the analysis options but also supplementary information to help  the user decide on appropriate settings  The signal duration is displayed in seconds and  if in rhythmic or tempo synced timbral mode it is also displayed as the exact number of  the current note denomination  For example  at 120 bpm a signal duration of 8 seconds  can also be displayed as 16 4 beats and at 110 bpm it would be 14 6667 4 beats  This can  help the user determine the correct tempo  for example if they know the loop is exactly    four bars long then the correct tempo setting would lead to a duration of 4 1 beats     When the note division is changed  the numerator option is updated to get the near   est value to the previous frame size using the update_ndivs function  The function  update duration is then called to update the duration in terms of the current tempo  division  this function is also called when the tempo value is changed
150. ken from The University of Iowa Musical    Instrument Samples  It has the dynamic mf and the note Cl     Piano  mf F2 wav  A short extract of a piano recording taken from The University of Iowa Musical    Instrument Samples  It has the dynamic mf and the note F2     simple120 wav  A 4 bar drum rhythm programmed in a MIDI sequencer using Native Instruments    Battery 3   s    Basic Kit    drum samples     sine1l6 wav    A sine wave at 220 5 Hz  synthesised in Matlab     trumpetG3 wav  A short extract of a B  trumpet recording taken from The University of Iowa Musical    Instrument Samples  It has the dynamic mf and the note G3     TrumpetG4 wav  A short extract of a B  trumpet recording taken from The University of Iowa Musical    Instrument Samples  It has the dynamic mf and the note G4     Violin arco mf sulG G3 wav    A short extract of a violin recording taken from The University of Iowa Musical    215    Instrument Samples  It has the dynamic mf  and the note G3 played arco on the G    string     e Violin arco mf sulG G4 wav  A short extract of a violin recording taken from The University of Iowa Musical  Instrument Samples  It has the dynamic mf  and the note G4 played arco on the G    string     Column Shift    These audio files have been processed using the column shift transformation   e simple120_eighthwidth_shift50_leave wav      Shift  50      Option  Leave original rows  e simple120_eighthwidth_shift50_wrap wav      Shift  50      Option  Wrap rows around  e simpl
151. l Information       Figure 4 12  Info GUI Window Allowing Pitch Analysis Adjustment    The    Edit Frame    button toggles between a static text display of the frame specific param   eters and the editable objects shown in the figure  In timbral mode the analysis is adjusted  in terms of image width  however in rhythmic mode there is no image width parameter   since this is fixed in the analysis settings GUI  Instead the underlying period parameter  is used to adjust the signal analysis  which is calculated from the pitch frequency  Pitch  frequency  note and image width are all linked  so when one is changed  the others must be  recalculated  The utility functions note2freq and freq2samples were written to perform  the inverse operation to the freq2note and samples2freq functions already introduced   When a new note value is selected by the user  the pitch frequency is calculated using    note2freq  shown in listing 4 8           function freq   note2freq  note   octave   ZNOTE2FREQ determines the frequency corresponding to a given note and  octave       freq   note2freq  note  octave     switch lower  note     7 7    case    c     num   QO     case    c      num   1     89          case      d      num   2   case    d      num   3   case    e      num   4   case Gf      num   5   case    f      num   6   case   g      num   7   case    g      num   8   case    a      num   9   case    a      num   10   case b      num   11     end    midi_note   num   12  octave 2    freq  
152. l fall    11    between these analysis bins  When a frequency component is not located directly in the  centre of an analysis channel its energy will be spread across adjacent bins  27   This is  a major source of uncertainty in the DFT analysis  the frequency spectrum may indicate  components that are not actually present in the signal  and particularly when more than  one component is between analysis channels     The range of frequencies covered is from 0 Hz to       f  Hz  27   However the values in the    frequency spectrum are reflected about x  1 due to the periodicity of the spectrum  so the  DFT signal X  v  contains redundant information for values of v above this point  Therefore  the longer the duration of x  n   the greater the resolution of the frequency analysis will be   36      2 1 2 Fast Fourier Transform    The DFT was not efficient enough to be widely used in digital signal processing applications  because it requires     N    complex multiply  amp  add operations  The fast Fourier transform   FFT   introduced in the 1960s  5   reduced the complexity of the operation to     Nlog  N   multiplications by exploiting the redundancy in the spectrum  This improved efficiency  has allowed the use of the Fourier transform in many signal processing operations and    real time Fourier processing is now commonplace     There are many different implementations of the FFT available  however MATLAB has  built in functions  which employ the FFTW  the fastest Fourier tran
153. lated analysis  settings must be adjusted after the new FT array has been obtained  The period and  num_periods arrays are also interpolated to the correct size  allowing the transformed    data to be resynthesised and converted back to a 1D audio representation     Once both frequency axes have been resized in this way  the signal analysis parameters    need to be updated to reflect the new spectrum size  as shown in listing 5 26        handles data analysis frame_size   size handles data FT 1  2    handles   data  spec2D_settings  width_pad      handles data analysis  imwidth   handles data analysis frame size     149       handles data analysis num_frames   size handles data FT 1  1    handles   data spec2D settings height_pad    handles data analysis_settings frame_size_secs   handles data analysis   frame_size handles data audio_settings Fs   handles data analysis_settings tempo    240 xhandles data   analysis_settings tempo_div_num       handles data analysis_settings  frame_size_secs handles  data   analysis_settings tempo_div_den     handles data audio_settings dur   handles data analysis  num_frames     handles data analysis_settings frame_size_secs        Listing 5 26  Adjusting Analysis Parameters After Resizing The Spectrum in    process_spec_resize    5 9 4 Recalculating Resize Process Properties    Some parameters of the resize spectrum process are related to the size of the unprocessed 2D  Fourier data array  When the signal analysis is changed  section 4 6 4  
154. long processing  times  It is assumed that a user would rarely require more than 5 transformation processes    at once     Many of the available processes have adjustable parameters that can be set by the user  when the process is applied and also readjusted at any time  This processing is not in  real time so the transformations are destructive  losing the original signal information  A  copy of the original unprocessed data has to be stored to allow this process parameter  adjustment  giving the user the impression that processing is non destructive  The pro   cessed data is recalculated from this original data each time a new transformation process  is added  removed or adjusted  The user can also switch the display between the original    and unprocessed signals to analyse the effects of the signal transformations     5 1 1 Data Structures    After an audio signal is imported into the software tool and analysed  as described in  section 4 6  the data structure  containing the signal representations and analysis data  is  copied into the original_data structure  This is done at the end of analyse_audio just  before the loaded function is called  The original_data structure maintains a copy of  the unprocessed data  and is only overwritten when the analysis settings are changed by    calling analyse_audio again  section 4 6 4      The data structure stores the current data representation  as displayed in the main GUI     If the user chooses to view the original unprocessed 
155. lows    115       the data to be reanalysed even when transformation processes are applied     Two of the transformation processes  filtering  section 5 2  and resizing  section 5 9  of  the spectrum data  use the signal analysis settings when calculating their process pa   rameters  Therefore when the analysis settings of the signal data are altered  these pro   cesses need a means to recalculate their parameters appropriately  The two functions  calc filter_properties and calc_resize properties were written to adjust the nec   essary parameters of the filter and resize signal transformations when analysis settings are  changed  The details of each of these functions will be covered in the respective process    sections  however their integration into the software tool is discussed here     Before run_processes is called  a loop is set up so that parameters are recalculated for  any filter or resize processes in the process array  Listing 5 11 shows an excerpt from the    end of the analyse_audio function to clarify the sequence of instructions             calculate images   handles   calc_image  handles       create the spectrum images   handles   spec2D_init  handles       store data as original also  handles  original_data   handles  data       get processed data      must recalculate the process properties according to the new data  if    necessary  for n   1 handles  processes  num_proc  switch handles  processes  process n  type  case handles proc_names 2    filter  ha
156. mensions as an image  When the row width is correctly set accord   ing to a signal periodicity the raster image can display slow variations of the waveform    characteristics in the vertical axis     The raster image and the 2D Fourier spectrum are displayed visually in the software  tool along with the 1D time and frequency domain representations  A novel method of  displaying the complete 2D spectrum data has been implemented by converting from a  polar representation to a colour value  where brightness corresponds to magnitude and hue  corresponds to phase  This process incorporates brightness and contrast parameters to  allow the range and scaling of the data to be adjusted  allowing a flexible investigation of    the signal  Matlab   s plot tools are utilised to provide a detailed analysis environment     The fundamental component of the 2D Fourier spectrum can be considered as a sine wave    196    with a stationary audible frequency and a stationary sub sonic frequency of rhythmic vari   ation  which is an amplitude modulated sine wave  When the analysis frequency points of  the 2D spectrum are precisely synchronised with the audible and rhythmic frequency of this  component  it is represented by four distinct points symmetrically placed in each quadrant  of the spectrum  These points do not have identical magnitude values and correspond   ing opposite phase as with the two points that represent a single sinusoidal component  in the 1D Fourier domain representation  Th
157. mental frequency of the analysis which is synchronised with the pitch of the  unprocessed signal  When the data has been interpolated in this axis  the resulting signal  has the same pitch but the harmonic content of signal has been spread  This is demon   strated in figure 6 16  which shows the 1D Fourier magnitude spectrum for a trumpet note  and the spectrum of the same signal after pitch shifting  The magnitude envelope is inter   polated over a different range of values but these components are still harmonically related    to the same fundamental pitch     176        a  Original Waveform        b  Pitch Shifted Up 5 Semitones        c  Pitch Shifted Down 7 Semitones    Figure 6 15  Results of the Pitch Shift Process on a Trumpet Note in 1D Magnitude    Spectrum    The interpolation also maintains a component at the original fundamental frequency caus   ing an oscillation across the raster image width  The raster images of the pitch shifted  signals shown in figures 6 15b and 6 15c are given in figure 6 16a  along with that of the    unprocessed signal  to demonstrate the effects of this process on the time domain waveform     177        a  Original Waveform        b  Pitch Shifted Up 5 Semitones  c  Pitch Shifted Down 7 Semitones    Figure 6 16  Results of the Pitch Shift Process on a Trumpet Note in 1D Magnitude    Spectrum    In rhythmic analysis mode the resolution of the audible frequency axis is much higher and  the pitch of the signal can be changed  When the rast
158. methods  rather than the data representation    itself     172    In the rhythmic frequency dimension  spectral components can be filtered according to  their sub sonic variation without changing the audible frequency range  which produces    unique and musically useful results     For rhythmic analysis mode  the filtering changes the variations between rows of the raster  image  When properly tempo synchronised this alters the variation between adjacent beats  of the signal  To demonstrate the process the signal loop120 wav was rhythmically filtered  using a low pass ideal filter with a cut off of 0 Hz  so that only the DC rhythmic frequency  row was present  The result was an audio signal with the same audible frequency range  but no variation between quarter notes  the raster image width duration  The raster image    of the resulting signal is shown in figure 6 13           Figure 6 13  Raster Image of loop120 wav Rhythmic Frequency LP Filtered With a Cutoff  of 0 Hz    Many variations of the same rhythmic audio signal can be produced using rhythmic fre   quency filtering  Changing the raster image width changes the beat duration on which the  filtering is based and therefore alters the output of the filtering process  even when the    same filter settings are used     In timbral mode rhythmic frequencies cover a much larger range  describing the internal  sub sonic oscillations between audible harmonics  The low frequency range corresponds to    the slowly varying compone
159. n        stopFcn   handles         function stopFcn hObject  eventdata   handles        stopFcn is called when the audioplayer stops     restart playback if loop is true and player stopped naturally  if    handles  player_settings stopped  amp  amp  handles  player_settings loop  play_audio  handles    end    end          Listing 3 7  Player Looping Using StopFcn    51    Chapter 4  Two Dimensional Audio Analysis    A large proportion of the project was devoted to investigating 2D audio analysis techniques  and developing the software capabilities to employ these techniques appropriately and effi   ciently  This chapter describes the details of 2D audio analysis as performed in this project   both from theoretical and implementation perspectives  It has already been established in  section 3 1 1 that two 2D signal representations are required  one in the time domain and  one in the frequency domain  The properties of these signal representations will be fully  explained  However it is first necessary to explain the analysis options presented to the  user upon importing audio into the software tool  since this defines the structure of the  signal representations  The chapter will also detail the implementation of analysis tools    within the software     4 1 Analysis Process    Once the audio data has been imported into the software tool  a series of processes must be  carried out to obtain and display the various data representations  With regards to the 2D  audio representat
160. n   PhD thesis  University of York  2006     J  J  Wells  Writing mex files for matlab  MA MSc Diploma Music Technology Audio    Processing Techniques and Environments Laboratory Script  University of York  2008     W  S  Yeo and J  Berger  A framework for designing image sonification methods  In  Proceedings of 11th International Conference on Auditory Display  Limerick  Ireland   July 2005     W  S  Yeo and J  Berger  Application of raster scanning method to image sonifi     203    cation  sound visualization  sound analysis and synthesis  In Proceedings of the 9th  International Conference on Digital Audio Effects  Montreal  Canada  2006     204    Appendices    205    Appendix A    Correspondence With External    Academics    All correspondence with academics other than the two project supervisors is documented    here     A 1 Complex Colour Representation    The process of converting between a polar data representation to an RGB colour value was  based on the work by Jason Gallicchio  a graduate physics student at Harvard  His personal  website  15  contains a Java applet demonstrating the properties of the 2D Fourier trans   form of images which used displayed Fourier data using a colour representation  Gallicchio    was contacted regarding this technique and his reply is reproduced in this section     A 1 1 Email Correspondence  Here is the e mail correspondence with Jason Gallicchio about his work in  15  on conversion  from complex data to a colour representation    
161. n denotes time at intervals of H  f  seconds and the other dimension denotes frequency  components at intervals of f  N Hz  The 2D Audio Analysis framework  on the other  hand  presents two dimensions of frequency information  One dimension is the same as  in the STFT representation  frequency components at intervals of f  N Hz and the other    gives a lower frequency analysis at intervals of f  LH Hz     The intention behind this 2D DFT process in  23  was to achieve an analysis of the subsonic  variations of audible frequencies in spectral form  hence Penrose refers to the two frequency  dimensions as the audible and rhythmic dimensions  To clarify the concept  the value of Y  at point  u v  corresponds to the amplitude and phase of a sinusoidal signal at an audible    frequency of v x f  N  that varies in amplitude at the rhythmic frequency u x f  LH     2 2 8 An Application of Two Dimensional Fourier Analysis of    Audio    Speech enhancement using the 2D Fourier transform was implemented in  32   Noise reduc   tion filters common in image processing  such as the 2D Weiner filter were applied to noisy  speech signals  However although this technique produced effective results for speech free    segments of the audio  during speech the noise was still quite high and musical in timbre    21    since its energy was correlated to the speech spectrum  Although the author did not spec   ulate  this may be to do with the different requirements of noise reduction in image and  au
162. nction polar2colour was written to perform this data conversion  It takes in the  magnitude and phase data arrays of size  m by n  and returns a single  m by n by 3  RGB  colour array  This function was developed after correspondence with the author of  15  and  with the benefit of the Java source code for that Java applet  both of which are available  in appendix A     71       White          Green  120    Yellow  60       Cyan  180         Blue  240    Magenta  300                   Black     a  HSL Colour Space    Green  120    Yellow  60            Cyan  180           Blue  240    Magenta  300                    Black     b  HSV Colour Space    Figure 4 8  Comparison of Colour Space Representations  after  1     72    The initial conversion is based on the hue  saturation and lightness  HSL  colour model   shown in figure 4 8a  This system is often considered to be a far more intuitive representa   tion of colour  35   The phase component is mapped to hue and the magnitude to lightness   with a full saturation value  These mappings are demonstrated in the following equations     where FT is the complex Fourier data     H  Alir  4 11   L   arctan  FT      2 7   4 12     Matlab can deal with colour in the HSV colour space  figure 4 8b  but not HSL  so a  conversion must be performed  The differences between HSV and HSL are given in  11   and are repeated here  Hue  H  is defined exactly the same in both systems  In the HSV    system  value  V  and saturation  Sy  are define
163. ncy  Hz  x 104    Figure 4 13  Custom Data Cursor    event_obj   s    Target    item and inspecting this item   s    Parent    property  The    Position     property of event_obj can be used to identify the indices of the cursor pointer within the    data  this is stored in the local pos variable     For the audio waveform plot  the time and amplitude values can be displayed directly from  the pos array  The 1D Fourier spectrum is almost as simple  the magnitude and frequency  data can be displayed directly from pos but it was decided that phase should be displayed  in degrees rather than the 7 radians scale used on the display  because many people are    more familiar with the degree representation     The raster image plot provides four values in the data cursor  the  x y  co ordinates of  the cursor within the image  and also the time and amplitude settings  demonstrating the  relationship with the audio waveform  Time is calculated using the sample rate and image    width parameters  in the following equation     time_index     pos 2      1  x imwidth current_frame    pos 1   Fs  4 26     Where Fs is the data sample rate  The data structure prefixes have been removed from    95    the variable names for clarity  The amplitude is obtained from the raster image data in    the image cell array  using the pos co ordinates     The data cursor for the 2D Fourier spectrum is slightly more complex  The pos vector  contains the frequency values of the cursor point  so the indices 
164. ndles   calc_filter_properties  handles  n  false    case handles  proc_names 15    resize  handles   calc_resize_properties  handles  n     end  end  handles   run_processes handles   handles  processes  num_proc       complete loading  loaded  hObject   handles       116          Listing 5 11  Recalculating Process Parameters and Running Processes at the End of    analyse audio    The resize process  and also the rotation process  section 5 4   can alter the signal analysis  parameters during their processing  Therefore any time either of these processes is run   any subsequent resize or filter processes must again recalculate their parameters according  to the new analysis settings  Within the switch   case    statement of run_processes   the resize and rotate cases both contain similar code to analyse_audio allowing the pa   rameters of filter and resize processes to be recalculated  though only subsequent processes    need to be considered  as shown in listing 5 12           if n lt handles  processes  num_proc  for m   n l handles  processes  num_proc  switch handles  processes  process  m  type  case handles  proc_names  2   handles   calc_filter_properties  handles  m  false    case handles  proc_names  15   handles   calc_resize_properties  handles  m     end  end    end          Listing 5 12  Recalculating Process Parameters in run_processes    The index n identifies the current process in the list i e  a resize or rotate process  and  the incrementing index m allow
165. nds to the sample points of the source data  in this case the upsampled data  and xi  gives the required sample points for the interpolated output data  which is at the audio  sample rate in this case  To create arrays x and xi  the size of the frame at the original  audio sample rate must first be calculated using the period size and the number of periods  contained in the frame  The frame can then be resampled and stored in the tempo_frames    array             rhythmic mode  pitch   synced     resample each row by interpolation  height   size handles data image 1  1    width   size handles data image 1  2       tempo frames   zeros  height  handles data analysis  frame size  2       100          for frame   1 height  Y   handles  data image 1  frame        num_samples   ceil handles data  analysis  period frame        handles  data  analysis  num_periods frame       x   linspace 1 num_samples  width     xi   linspace 1 num_samples  num_samples     tempo_frames  frame  1 num_samples  interp1 x      handles data image 1  frame     xi    spline       end  handles data audio   derasterise tempo_frames         handles data  analysis  frame_size          Listing 4 13  Calculating audio From The Raster Image In Pitch Synchronised Mode    Once all of the frames have been resampled  the tempo_frames array is derasterised to  produce the audio signal  This is when the hop argument of derasterise is utilised  By  setting it to frame_size  the rows of the tempo_frames array can be ins
166. nents that were logically grouped    according to the information they portrayed     3 4 Data Structures and Data Handling    The software tool handles a large amount of data  the data structures and the manner  in which the software handled this data were important aspects of the software design   Matlab provides a mechanism for associating data with a GUI figure  which can be written  to or read from at any time using the guidata function  Each graphics object in Matlab  has a unique identifier called a handle  which can be used to adjust obtain the object   s  properties and is used to associate data with a GUI figure  whether using the figure   s  handle or that of a child object  Only one variable can stored in GUI data at a time  so to    store more information a structure is used     Matlab   s GUIDE tool  section 3 3  uses the GUI data to create and maintain the handles  structure  which contains the handles of all components in the GUI  This structure has  been extended in my software to contain all program data organised into relevant substruc   tures  as shown in figure 3 3  The handles structure also contains the audioplayer and  various plot tool objects  a matrix representing a unit circle  see section 4 8 3  and some  additional Boolean variables used to define program behaviour  The M file app  which was  automatically generated by GUIDE  contains the function app_OpeningFcn which allows  this program data to be initialised with the default settings just befor
167. ness  gt 0 5    saturation  hsv m n 2    2   2 xlightness  m n      value  hsv m n 3    1   end  end  end    hue with no banding correction  hsv     1     phase   pi    2 pi         Listing 4 4  HSL to HSV Conversion in the polar2colour Function    4 4 2 Spectrum Display Options    The function polar2colour has a mode input argument  which determines whether the  RGB image returned displays the magnitude data  the phase data or both combined  In  combined mode  the default  the data is calculated as in listing 4 4  For a magnitude only  display  all of the RGB values are set to the normalised magnitude value  producing a  grayscale image  To display only the phase information  saturation and value are both set    to 1 and the hue shows the phase information  These options are presented to the user    74       through the 2D spectrum mode menu  a sub menu of    Plot Settings    and the selection is    stored as a string in the spec2D_mode variable within the plot_settings structure     4 4 3 Brightness  amp  Contrast of Display    In order to allow a more flexible display  additional brightness and contrast parameters  have been introduced into the algorithm and sliders are provided on the user interface to  allow their adjustment  The brightness parameter adjusts the overall brightness of the  display by scaling the magnitude values and the contrast parameter is used as an exponent    of the magnitude data to adjust the display scale     The brightness slider   s value has a 
168. ng 5 19  Shifting Rows Of The 2D Spectrum in process_row_shift       The data is shifted using sub arrays of the oldFT array to increase the process operation    speed  Matlab processes array mathematics much faster than program loops  since it is    an interpreted programming language  section 2 6   Note how  when the wrap variable is    true  the new 0 Hz row  at the index given by middle  is defined in two halves  The neg     ative audible frequencies are obtained from the row that originally had negative rhythmic    frequency and the positive audible frequencies are obtained from the row that originally    136    had positive rhythmic frequency  This ensures that the symmetry of the 0 Hz rhythmic    frequency row is maintained  see section 4 3 2      5 6 2D Fourier Spectrum Column Shift    The column shift process allows the columns of audible frequency content to be shifted  within the 2D Fourier spectrum  giving them a new audible frequency value  This process  can be used to adjust the audible frequency content of an audio signal whilst maintaining    the same rhythmic structure     The parameters and implementation of the column shift process are identical to those of  the row shift process  section 5 5  apart from the frequency axis on which the process    operates and therefore these details are omitted     5 7 Pitch Shifting with the 2D Fourier Spectrum    Conventional 1D frequency domain pitch shifting is obtained by scaling the frequency  values of each component
169. ng pitch_map    The yin function  described in section 4 7 1  is used to determine the fundamental period  of each frame and the results are stored in the period array which is in the analysis  structure  The yin function can return a period of 0 when it fails to calculate the pitch   in which case the frame is given a default period of 500 and the user is warned  The pitch  can then be set manually  after the initial analysis is complete  This is discussed in section  4 6 4     The pitch information is made available to the user in terms frequency and musical note   so these are calculated in pitch_map too  Pitch frequency is calculated from the period    data using the simple samples2freq function  which performs the following operation          F       4 22     4 22     Where f is the pitch frequency in Hertz  n is the number of samples in the period and F  is  the sampling rate  This frequency value is stored in the pitch array within the analysis  structure  Musical note value and octave number are determined from the pitch frequency  using another utility function  freq2note  This function calculates the nearest MIDI note    number  4  to the frequency  by this equation     midi_note   round 69   12 x log   f  440     4 23     Matlab   s round function rounds the input argument to the nearest integer  The note name  and octave can easily be obtained from the MIDI note number and are returned to be    stored in note and octave arrays within the analysis structure     The
170. note  This rhythmic duration has a period of 5512 5  f  8  which  has to be rounded to 5513  The spectrum shape is therefore skewed and smeared since  the rhythmic frequency bins no longer precisely match the rhythmic frequencies of the 2D    spectrum components     164    2D Fourier Spectrum    Frequency  Hz   o         2  1 5  1  0 5 0 0 5 1 1 5 2    Frequency  Hz  x 107     a  Eighth Note Width Duration    2D Fourier Spectrum    Frequency  Hz    gt      2  1 5  i  0 5 0 0 5 1 1 5 2    Frequency  Hz  x10      b  2D Fourier Spectrum    Figure 6 8  Limitations of Raster Image Width When Reducing Row Duration    6 2 3 Pitch Synchronised Rhythmic Analysis    The pitch synchronised rhythmic analysis mode has some fundamental problems  The  idea was to resample each row to ensure that each row contained a whole number of    periods of the signal  Firstly  the signal cannot be guaranteed to start at a zero crossing    165    at the beginning of each row and therefore a whole number of periods may still result in a  discontinuity between the beginning and end of the row  More importantly  resampling the  content of each the raster image row separately removes makes the 2D Fourier spectrum  display incorrect  It is not know what frequency a component represents in the 2D Fourier  representation because it   s sample rate is not known  Frequency domain PSOLA processing    uses a variable frame size not a variable sample rate     Rasterisation essentially applies a rectangular frame w
171. ntaining the original pitch  This  would essentially alter the tempo of the signal  It was predicted that adjusting the height  of the spectrum would change the duration of the signal whilst maintaining the same tempo  because the same rhythmic frequency components would be oscillating over a larger num   ber of raster image rows  It adjusts the number of beats bars of the signal leaving the  tempo and rhythm unchanged  The resize process therefore attempts to allow indepen   dent adjustment of the tempo and the duration in terms of rhythmic beats  the rhythmic    duration      143    5 9 1 Resize Process Parameters    The resize process has several parameters that need to be initialised in the create_spec_resize    function  when the process is initialised  Most of them are required to display to the user     in the resize process    GUI window rather than to run the process  The process variables  are described in table 5 4        Variable    Data Type    Default Value    Description          type    String       Resize Spec        The name of the process  obtained from  the proc_names array  Used to identify    the process as a resize operation        bypass    Boolean    false    Determines whether or not the filter  process should be bypassed when the    chain of transformations is run        min_quadrant_size    Integer array     3 51     This is a read only parameter that de   fines the minimum allow size of a 2D    spectrum quadrant        size    Integer array    The
172. nts that define the stable timbre of a sustained note  whilst    173    high rhythmic frequencies correspond to the note onset  The resulting audio waveforms  after high pass and low pass rhythmic frequency filtering of a trumpet note are displayed    in figure 6 14             a ee     a  Original Waveform           b  2     Order Butterworth 15 Hz Low Pass Filter        c  2    Order Butterworth 45 Hz High Pass Filter    Figure 6 14  Rhythmic Frequency Filtering of a B   Trumpet Note    Low pass filtering can therefore be used to reduce the attack of a note  below a certain  point it begins to effect the oscillations defining the timbre of the note  Band pass filtering    enables interesting amplitude modulations to be observed within the signal  which could    174    be useful for analysis of the sub sonic timbral variations     Using a Butterworth frequency response has little noticable effect in the rhythmic frequency  axis since the frequency resolution is generally too low  In the audible frequency axis  however a Butterworth frequency response provides a more familiar and comfortable effect  with less noticable resonance  Boosting of pass band frequencies can be used to provide  more subtle rhythmic variations to a signal  It   s effects are less noticable in the audible    frequency axis     6 4 3 Thresholding The 2D Fourier Spectrum    Magnitude thresholding of the audible frequency columns or rhythmic frequency rows is a  linear process that processes each quadrant
173. o say  on the matter since I   ve been working for nearly 5 months on this and not really had any    reference point to other research  I   ve occasionally felt a bit out of my depth     I   ve got a web page at http   www users york ac uk  cwp500   There   s not much infor   mation on there and the sound files are from my earliest investigations but there is a ZIP    file containing my MATLAB code if you   re interested   Thanks  Chris   On 21 May 2008  at 08 19  Christopher Penrose wrote   Hi Chris    Greg Smith forwarded your inquiry about my thesis to me  I have to look into the broken    forwarding on my music princeton edu address  Fire away any questions you may have   Best     Christopher    213    Appendix B  Accompanying Compact Disc    The accompanying compact disc provides an electronic copy of this report in PDF format   as well as a the Matlab code for the 2D Fourier software tool and a folder of audio examples    that demonstrate the results of 2D Fourier domain processing     The software tool can be run by setting Matlab   s workspace directory to the    Source    folder  containing the M file code and then entering app in the command line  Please note that  in order to use the tempo calculation function  the MIRtoolbox must be downloaded and  installed  20   The software was development on an Apple computer so whilst the M file  functions are system independent  the YIN algorithm must be recompiled as a MEX file  from yin c on the system on which it will be 
174. ocess                   0    135  5 6 2D Fourier Spectrum Column Shift  aps a elele 2 See eee A es 137  5 7 Pitch Shifting with the 2D Fourier Spectrum                   137  5 7 1 Pitch Shift Parameters   is kg 0S ee OS le ee ee ee ee 137  5 7 2 Adjusting the Pitch Shift Process     yn eae Soe PG a ew 138  5 7 3 Running the Pitch Shift Process       a ake ates wide bea 138  5 7 4 Fixed Pitch Shifting Processes        e284 aaa ee ee a ee 139  5 8 Rhythmic Frequency Range Compression Expansion              140  5 8 1 Rhythmic Frequency Stretching Parameters               141  5 8 2 Adjusting the Rhythmic Frequency Stretching Process         141  5 8 3 Running the Rhythmic Frequency Stretching Process          142  5 8 4 Fixed Rhythmic Frequency Stretching Processes             142  5 9 Resizing the 2D Fourier Spectrum   e s  aS oo Se eR ee 143  5 9 1 Resize Process Parameters 3 54 2k Ge ee oe ERR Oe BSS 144  5 9 2 Adjusting Resize Process Parameters                0 6 4 145  5 9 3 Running the Resize Spectrum Process              0 4  148  5 9 4 Recalculating Resize Process Properties                  150  5 9 5 Fixed Spectrum Resize Processes             000000 a 151  5 10 Inversion of the 2D Fourier Spectrum     ape e Ga ya Ea eos OS 152    Evaluation Two Dimensional Audio Analysis and Processing in Matlab 153    6 1 Effects of Raster Image Width on 2D Fourier Analysis             153  6 1 1 Determining Correct Pitch    lt 0    aoaaa ES ak ESA 154  6 1 2 Determining Correct T
175. oduce the implementation given in listing  4 1 using array operations instead of code loops  The new rasterise function was again    run in the Profiler tool 10 times using the same sinusoidal test signal     This time the average total execution time was 0 008 seconds  a significant reduction  The  maximum code line hit count in this implementation was only 219  corresponding to the    height of the image     The testing in Profiler proved that data operations are much faster when performed using  arrays than loops in Matlab  Array mathematics are inherent in the design of the Matlab  environment whereas looped processes are slow because Matlab is an interpreted program     ming language  section 2 6   After this testing  array operations were used in favour of    190       code loops wherever possible  However conditional statements that compare a variable to  a fixed value will not work with array variables since Matlab cannot support concurrent    program paths  This therefore limits the use of array operations in certain instances     7 3 1 YIN Pitch Algorithm Optimisation    Section 4 7 1 described the implementation of the YIN pitch algorithm in a MEX file to  obtain faster execution speeds than the original M file implementation  The performance  of MEX file code cannot be analysed using the Profiler tool since it has to be pre compiled  from the C language source file  Matlab provides the tic  toc function which enables  simple performance measurement using a stopwat
176. of the data matrix must  be obtained using Matlab   s find function  The magnitude and phase components are dis   played in the data cursor  which have to be obtained from the RGB display data using  the colour2polar function which performs the inverse operation to polar2colour  The  brightness and contrast functions alter the display  therefore they are incorporated into  the calculations  An excerpt from dcm_updateFcn is given in listing 4 11  it shows the    calculation of the data cursor display for the 2D spectrum           xdata   get target      XData        ydata   get target     YData       find  xdata gt pos 1   1     first         1    y   find ydata gt pos 2   1     first      1     xX    I      when only one row in spectrum  if isempty y   ye ok  end  data   get target      CData       value   data y x       magLN  phase  colour2polar value       handles  plot_settings specBrightness       handles  plot_settings  spec2D_mode     magN    2   nthroot  magLN  handles  plot_settings specContrast      1    mag   magNxhandles data spec2D_settings  max_mag        handles  frame_settings current_frame       txt        Audible Freq         num2str pos 1      Hz              Rhythmic Freq         num2str pos 2      Hz               Magnitude       num2str mag            Phase  degrees        num2str phase 180 pi            Listing 4 11  Custom Data Cursor For 2D Fourier Spectrum    The output of the dcm_updateFcn is the txt variable  which is displayed in the data cursor  
177. ol   without which the properties of the 2D Fourier transform could not be properly investigated  or understood  This then allowed a more informed investigation into 2D Fourier audio  processing in the second phase  developing algorithms based on an understanding of the    underlying theory and with some preconception of the results     The details of implementation are omitted here  instead focusing on the path followed to  reach the final software tool  The later parts of this section and the following sections 4    and 5 provide a more in depth guide to software implementation     3 2 1 Analysis Phase    The analysis phase of software development was the longest  since the software tool had to  be designed and restructured as the understanding of the 2D Fourier audio representation  improved  Initially it was important to become fully aquainted with the Matlab environ   ment and its capabilities  and then to produce the core functions required to load and    display the data  so that from the earliest possible stage a useable tool was in operation     The first step was to convert between the four required signal representations  section  3 1 1   The initial GUI and data structure designs were then produced  and a Matlab    34    figure was created to display the data plots  A simple audio player and import export    operations for the audio and images were also added at this point     Once this initial version of the analysis tool was running  it was possible to load data
178. oley and J  W  Tukey  An algorithm for the machine calculation of complex  fourier series  Mathematics of Computation  19 90  297 301  1965      6  K  Cowtan  Picture book of fourier transforms     http   www ysbl york ac uk  cowtan fourier fourier html      7  A  de Cheveign   and H  Kawahara  Yin  a fundamental frequency estimator for speech  and music  Journal of the Acoustical Society of America  111 4  1917 1930  2002      8  A  De G  tzen  N  Bernardini  and D  Arfib  Traditional     implementations of the  phase vocoder  In Proceedings of the 8rd International Conference on Digital Audio  Effects  2000      9  M  Dolson  The phase vocoder  A tutorial  Computer Music Journal  10 4  14 27   1986      10  T  Erbe  SoundHack User   s Manual  School of Music  CalArts      11  K  Fishkin  A fast hsl to rgb transform  In A  S  Glassner  editor  Graphics Gems     201     12      13      14      15    ke     16      17      18      19      20      21      22      23      24     pages 448 449  Academic Press  1998    K  Fitz and L  Haken  On the use of time frequency reassignment in additive sound  modeling  Journal of the Audio Engineering Society  50 11  879 893  2001    J  L  Flanagan and R  M  Golden  The phase vocoder  The Bell System Technical  Journal  45 8  1493 1509  1966    D  Gabor  Acoustical quanta and the theory of hearing  Nature  159 591 594  1947    J  Gallicchio  2d fft java applet  http   www brainflux org java classes FFT2DApplet html     R  C  Gonzalez and R
179. olumn should be removed if present to achieve the correct    raster image representation     The method by which the data is converted from raster image to audio depends upon the  analysis process followed  as described in section 4 6  The parameters in the analysis_settings    structure are inspected to determine the correct synthesis process     98    4 9 1 Derasterisation    All resynthesis methods in the software tool use the derasterise function to obtain a  1D array from a 2D representation  The function incorporates the hop parameter to al   low image rows to overlap within the 1D array representation  This option is required  in pitch synced rhythmic mode  however to perform the standard derasterisation process   as described in  39   the hop parameter should be set to the width of the image array   This maintains a one to one sample mapping between input and output  The code for    derasterise is given in listing 4 12           function array   derasterise  image  hop    Z DERASTERISE converts between data from a 2D representation to a 1D  representation using raster scanning         This function can be used to convert from a grayscale image to a     monophonic audio signal        image   the 2D data representation     array   the 1D data representation     calculate the required output array size  imsize   size  image     arraysize   imsize 2     imsize 1    1 x hop      create an empty array  array   zeros arraysize  1     derasterise data  for i   l imsize 1   ar
180. on mismatch error when the array operations were run  This meant that the code    was corrected before the test was performed  but it highlighted the need to verify that the    187    correct process was being performed        function out   rev_fftshift  in   y size   size in 1    x_size   size in 2    y cent   ceil y_size 2    x_cent   ceil x_size 2    out   in      top left from bottom right   out  1  y_cent  1  x_cent   in y_cent y_size  x_cent x_size       bottom right from top left   out   y_cent 1   y_size   x_cent 1  x_size   in  1  y_cent    1   1  x_cent    1        top right from bottom left   out  1  y_cent   x_cent 1  x_size   in y_cent y_size  1  x_cent    l        bottom left from top right       out   y cent 1   y_size  1 x_cent   in  1   y_cent    1   x_cent x_size            Listing 7 1  Reverse Fourier Spectrum Quandrant Shifting Function  rev_fftshift     The intermediate array was input to rev_fftshift and the required result was the original    test array     Output result for rev_fftshift on intermediate array     1 2 3 4 5  7 8 9 10  11 12 13 14 15  16 17 18 19 20  21 22 23 24 25    D    7 2 Data Input Testing    Text edit objects are used many times within the software tool to allow the user to enter  numerical parameters  This is a vulnerable point within the software since the user   s actual    input cannot be controlled     The function isnumber  shown in listing 7 2  was written to check whether a text edit string    188    entered by the user co
181. ons   9 8821     Rotate    These audio files demonstrate the results of the rotating the 2D Fourier spectrum     e simple120_eighthwidth_90 wav    Source file  simple120 wav        Analysis Mode  Rhythmic    Raster Image Width  One eighth note      Rotation  90      e simple120_eighthwidth_180 wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One eighth note      Rotation  180       e simple120_eighthwidth_270 wav    227        Source file  simple120 wav    Analysis Mode  Rhythmic        Raster Image Width  One eighth note    Rotation  270      e trumpetG3_90 wav      Source file  trumpetG3 wav      Analysis Mode  Timbral      Rotation  90      e trumpetG3_180 wav      Source file  trumpetG3 wav      Analysis Mode  Timbral      Rotation  180       Row Shift    These audio files demonstrate the results of processing using the row shift transformation     e am_bipolar_shift2remove wav  This provides an excellent demonstration of rhythmic frequency  By shifting the  rows of this amplitude modulated sinusoid upwards  the modulating frequency is    increased     Source file  am_bipolar wav        Analysis Mode  Timbral    Row Shift  2      Remove original rows   e loop120_eighthwidth_shift4wrap wav      Source file  loop120 wav      Analysis Mode  Rhythmic        Raster Image Width  One eighth note    228        Row Shift  4      Wrap rows around    e loop120_quartwidth_shift4wrap wav    Source file  loop120 wav       Analysis Mode  Rhythmic
182. ool  using both simple test signals and recordings of real instrument  notes  The mean speed increase for all tests was 640 94  and this was quite consistent  across the input signals with a standard deviation of 67 5526  It is thought that the  algorithm found pitch of the file clarinetC6 wayv very quickly  after few loop iterations     explaining the lower performance increase     7 4 SNR of Analysis Synthesis Process    It was necessary to investigate the signal to noise ratio of the 2D Fourier analysis resynthesis  process  as used in the software tool  to confirm that the perceptual quality of signal  transformations would not be impaired by information loss in the analysis and synthesis    operations     Listing 7 5 shows the function analysis_synthesis which uses the software tools low level  analysis and synthesis functions  These functions are abstracted from the program data  structure and settings  The function takes an audio array and the desired raster image  width as input arguments and converts from the audio signal to the RGB colour represen   tation of the 2D Fourier spectrum  and then back to the audio array  The colour display  representation is included in the process since it the data cursor display is obtained from    this array and its accuracy needed to be tested        function audio   analysis_synthesis  audio   imwidth     192       image   rasterise  audio  imwidth      height_pad   mod size image 1   2      width_pad   mod  size  image 2   2      
183. ovide an insight into any signal features that can be characterised us   ing the 2D Fourier transform and allow musically interesting manipulations of audio using    the 2D Fourier domain data     1 2 Report Structure    The report will start with the relevant background information required as a precursor  to the project work in section 2  This includes a survey of existing research that covers  many aspects of the project  and similar ideas that have served as inspiration for the work   Section 3 will introduce the software tool developed during the project  It will cover the  overall requirements and broader software engineering aspects of the tool development   The software implementation of 2D Fourier analysis and processing is saved for subsequent  sections 4 and 5  These sections will also include the theoretical aspects of the work  carried out  After the software tool has been fully described  an evaluation of the results  of the analysis and processing of audio using the 2D Fourier transform will be given in  chapter 6  The testing process is described in section 7 and the project conclusions and    recommendations for future work are given in sections 8 and 9 respectively     Chapter 2  Background Information    This section details the background information required for this project including a survey    of relevant research literature  This review has served several purposes     e Understanding the context of the project by examining techniques currently used fo
184. padded   padarray  image   height_pad width_pad  0     post       FT   fft2  padded      max_mag calc_spec2D  FT 1 1     magphase      true     spec2D calc_spec2D  FT 1 1     magphase      false      magLN  phase   colour2polar spec2D  1      magphase          magN   2   magLN     1    mag   magN max_mag     FT   rev_fftshift  complex mag  cos phase  mag    sin  phase       image   real ifft2 FT      actual_height   size image 1      height_pad    actual_width   size image 2      width_pad    image   image 1l actual_height  1  actual_width      audio   derasterise  image  imwidth       end       Listing 7 5  The Software Tool   s Underlying Analysis Resynthesis Process    The function process_SNR was written to determine the signal to noise ratio of a reversible  signal processing operation  The signal to noise ratio  in decibels  is given by the following    equation        A igna   SNR  dB    20log     n    7 1     Where Agignat and Ayoise are the root mean squared amplitudes     The RMS amplitude of a signal x of length n is given by        n    Trms   Yel  x x  4   7 2     Where x  i  is the complex conjugate of x i      The process_SNR function is shown in listing 7 6  It calculates the RMS amplitude of the  input signal  then iterates the required signal processing operation the required number of    times  given by the iterations argument   For each iteration of the operation  the noise    193       component is obtained as the difference between the original signal an
185. path   handles source  file      wav        if not handles source  file  0   handles source  file   remove_suffix handles source  file      wavwrite handles data audio  handles data audio_settings Fs    handles source audio_path  handles source  file      wav         guidata hObject  handles      end       Listing 3 3  Exporting Audio Using wavwrite    3 6 2 Image and Spectrum Export    As stated in section 3 2 1  the initial signal analysis steered the software towards using  only audio source material so there is no import capability for the raster image or 2D  spectrum data  It is however possible to export both displays as an image using the imwrite  function  This export functionality is intended only to store the display of the data at screen  resolution  it is not an accurate signal representation  The 2D data representations can  sometimes have a height of only a few rows of data  which would not produce a displayable  image  The software tool provides other means to store the data precisely for continued    use in the software tool     Matlab provides the function getframe to return a snapshot of a current figure or set of  axes  The resulting pixel map can then be saved as an image  The function save_2D was  written to allow the export of the raster image and 2D spectrum  It has a conditional  argument to determine which display to save but it follows the same general structure as  the export_audio function  The main difference is that an image has to be saved for e
186. plhs  0    mxCreateDoubleMatrix  1 1  mxREAL         create aC pointer to a copy of the output value     period_length   mxGetPr plhs  0           call the C subroutine       yin period_length  input_vec   tolerance  yin_vec   yin_length          Listing 4 9  The Gateway mexFunction In yin c    92       The MEX file was compiled using the command mex yin c in the Matlab command line  and the resulting  mexmaci file could then be used as if it were a normal Matlab function   The MEX implementation yielded approximately a 700  speed increase when compared    to the M file implementation     4 7 2 Tempo Estimation With MIRToolbox    The MIRToolbox  20  was introduced in section 2 4  it provides a large variety of signal  processing algorithms that extract information from audio signals  It was used in the  2D Fourier software tool to provide tempo estimation of audio signals with the mirtempo  function  There are several tempo detection methods offered in the MIRtoolbox but the  autocorrelation option was the most reliable in initial experiments  where other methods  either caused errors or were quite inaccurate  The algorithm operates by obtaining an  onset detection curve of the audio data and then computing the autocorrelation function  of this data             calculate tempo    miranswer   mirtempo  handles source audio_path  handles source file      wav         Autocor       answer   mirgetdata miranswer           Listing 4 10  Determining Tempo Using miraudio    Listing 4
187. programmatic control of its functionality           handles  player   audioplayer  handles  data audio  handles data     audio_settings Fs  handles data audio_settings nbits         Listing 3 5  Creation of the Audioplayer Object    The functionality of the software tool   s audio player is relatively simple  It is controlled  using the push button interface shown in figure 3 5  The    Play    and    Stop    buttons initiate  and terminate the audio playback using the audioplayer object   s play and stop functions   There are two additional playback options  improving the value of the audio player as an  analysis tool  The    Loop    button switches continuous looping of the audio on or off  The  other button toggles between    All and    Frame     and it determines whether the whole audio  signal is played or just the current frame of audio  for more detail on frames see chapter  4             n   lt                 All   Loop                  Figure 3 5  Audio Player User Interface    49       player_settings   lt 1x1 struct gt     play_frame  lt boolean gt     loop  lt boolean gt   stopped  lt boolean gt        Figure 3 6  Data Structure for Audio Player Settings    The audio player   s settings are stored in the player_settings data structure  shown in  figure 3 6  The variable play_frame is set to true when only the current frame should be  played  Playing a subsection of the audio signal is quite simple with the audioplayer ob   ject  The start and end sample indices ar
188. pup  figure      CloseRequestFcn       my closefcn       Interruptible         off       set  handles  proc name popup  bypass button      Callback        bypass_Callback    Tests     Interruptible       off         any other object callbacks required          The current process    structure and its index in the process array are stored within the  handles structure so that they can be easily accessed in callback functions  The Boolean  process _changed variable is used to indicate whether any parameters are changed by the    user  so that the software tool can determine whether the processed data needs to be    Listing 5 6  Adjusting Parameters of a Generic Process    recalculated  section 5 1 5      The design originally incorporated a cancel option into each process    parameter adjustment    111       figure  so the user could decide to abandon the changes and retain their old settings  This  option was deemed unnecessary and removed from the software tool in the later stages of  development  Due to this original design  a copy of the current process    data structure is  stored within the handles structure as cur_process to be edited locally and the process     index within the process array is also stored as cur_process_num  allowing the process    data to be reinserted into process when adjusting is complete     The bulk of the adjust_  function code will create the GUI figure and its objects  which  are specific to the particular process  However  every process conta
189. purposes among others  16  25      2 4 Audio Feature Extraction    In order to obtain a useful image of an audio signal using raster scanning  section 2 5 1    the width should correspond to the period of the signal  39   This identifies a requirement  for pitch detection techniques in this project  also for a longer rhythmic signal it may be  beneficial to calculate the tempo to achieve a useful image display  A MATLAB toolbox for  musical feature extraction from audio is described in  18   these tools are available for free  use in the research community  20   Pitch and rhythm recognition techniques are described  in  26      2 5 Graphical Representations    Image sonification and sound visualisation     Another large research area relevant to this project is sound visualisation and image soni   fication methods  These techniques perform conversions between one dimensional audio  data and two dimensional images  addressing the fundamental differences between the rep   resentations i e  temporal and spatial  There are a huge variety of ways to map audio  features to image features  the choice of method depends entirely upon what information  needs to be portrayed  These techniques are applied to tools such as basic analysis displays     audio visualisation plug ins and graphical composition image sonification software     A framework for the design of image sonification methods is defined in  38   which defines  mapping geometries in terms of a pointer path pair  It presents
190. quency domain is equivalent to summing them in  the time domain  This is demonstrated in figure 4 5 which shows the 2D Fourier analysis  of the sinusoid of figure 4 2 summed with the amplitude modulated sinusoid of figure 4 4    in the time domain     Raster Image 2D Fourier Spectrurn    Frequency  Hz         2500  2000  1500  1000  500 0 500 1000 1500 2000 2500  Pixels Frequency  Hz     Figure 4 5  Demonstrating The Linearity of the 2D DFT    By viewing a complex signal as two separate signals  the real and imaginary components   the complex DFT might be analysed and the 2D spectrum understood  If two real in   put signals were used  the Fourier transform would produce a complex output for each  of the signals  demonstrating aliasing and hence having symmetry about zero frequency   The Fourier domain data could then be summed to get the spectrum of the two signals    combined     The complex input signal  f x   can be viewed as the sum of two real signals  with one    multiplied by 7     r z   R  f x    f x   r 0    jile  where a x   1 f 0     4 4   jaya    The symbols R and I represent the real and imaginary components of a signal respectively                                The Fourier transform of a complex input can then be broken into the sum of two complex    signals  also with one multiplied by j     64    r x    ji x   amp  R x    iI  x    F x   4 5     When the two resulting complex signals are also broken into two real components  the    structure of the overall Fou
191. r    audio and image processing purposes   e Developing an understanding of the techniques that will be used in the project     e Understanding the similarities and differences in Fourier processing of images and    audio     2 1 Fourier Techniques    In the late 19th century  Joseph Fourier proposed a theorem that any periodic signal can  be decomposed into a series of harmonically related sinusoidal functions with specified    amplitude  frequency and phase offset  28   this is known as the Fourier series     The Fourier transform uses this concept for decomposition of a continuous time signal  into sinusoidal components  any signal of infinite extent and bandwidth can be perfectly  reconstructed using an infinite series of sinusoidal waves representing all frequencies from  0 Hz to oo Hz  These sine waves form the Fourier transform spectrum  which represents  the frequency content of the signal and is symmetric about 0 Hz for a real input signal     Using the polar signal representation  the Fourier transform spectrum can be divided into    10    magnitude  amplitude  and phase components  describing their values for the continuous    range of frequencies     Fourier analysis is extremely important in many areas of signal processing  In audio pro   cessing it provides intuitive representation of the signal content  since stationary sinusoidal  basis functions cause the minimum region of excitation on the basilar membrane in the ear     24      2 1 1 One dimensional Discret
192. r magnitude side lobes  These  side lobes cause the spectral representation of a non synchronised sinusoidal component to    leak energy into adjacent frequency bands  reducing the clarity of its representation     68    4 3 3 Signal Analysis Using The 2D Fourier Spectrum    The investigation of section 4 3 2 allows a summary of features that can be observed in  the 2D Fourier spectrum  The fundamental component of the 2D Fourier spectrum is a  sinusoid modulated in amplitude by a lower frequency sinusoid  apart from any point that    is 0 Hz in either frequency dimension  which is just a sinusoid     Any sinusoidal components which appear identical in every row of the raster image must  be harmonically related to the period of the raster image width and have no amplitude  modulation  These components will be displayed on the 2D spectrum with a rhythmic  frequency of 0 Hz and their precise audible frequency  both positive and negative   which  is a multiple of the frequency    fu   F  N  4 9     Where F  is the sample rate  N is the width of the raster image plus padding and fo is the    fundamental audible frequency of the analysis     Any components which appear identical in every column but vary in each row are har   monically related to the fundamental rhythmic frequency of the analysis  which is given  by    fro   fao M  4 10     Where M is the raster image height plus padding  These components will be displayed on  the 2D spectrum with an audible frequency of 0 Hz and t
193. r quadrants as arrays  into a new matrix of the same size  in their correct positions  Figure 4 10 demonstrates    the process of shifting data quadrants for a matrix with odd dimensions                 fftshift revfftshift                            Figure 4 10  The Fourier Data Shift Functionality For Odd Dimensions    TT       4 5 Audio Analysis Options    It is rare for a single analysis method to provide useful information about all types of audio  signal  audio characteristics are hugely varied  An important part of this investigation  was to determine suitable methods for analysing audio signals in two dimensions  and to  understand what each method provides in terms of both signal analysis and subsequently    possible signal transformation     The 2D Fourier software tool provides the user with a GUI window when they import  a new audio file or choose to reanalyse a current signal  this window is defined by the  analysis_settings function  It presents a series of options that determine the way in  which the audio signal is represented in two dimensions  these options were defined based  upon an understanding of the underlying signal processing and the different types of audio    signal the user may want to interact with     4 5 1 Analysis Mode    There are two main modes of 2D signal analysis  t2mbral and rhythmic  They are classified  by the factor that determines their raster image width  Timbral mode uses pitch detection   section 4 7 1  to determine the width of t
194. range is extended    182    in either dimension  although they are of lower magnitude when the dimensions of the    spectrum are adjusted with the data     The resizing of the spectrum width adjusts the tempo of the signal but also the pitch   since the audible frequencies of the signal are rescaled with the spectrum frequency points   Tempo adjustment without pitch change was attempted by the rhythmic frequency range  adjustment  Resizing the height achieves a change in signal duration whilst maintaining the  same tempo although it is limited by the same problems of aliasing as rhythmic frequency    range adjustment     6 4 8 Evaluation of Resampling Techniques    In retrospect the methods used to resample the spectrum were rudimentary  The poor  results can partly be blamed on the frequency resolution of the 2D Fourier spectrum after  raster scanning but the techniques and understanding of the processing could have been  better informed  These initial prototype processes have still served a purpose however since  it is now clearer what kind of signal transformations could potentially be performed and    how the analysis process needs to be adapted to allow it     6 4 9 Shifting Quadrants The 2D Fourier Spectrum    The quadrant shifting process does not produce very useful results  This is due to the dis   proportionate amount of low frequency energy in most musical audio signals which is shifted  to the upper audible frequency range  at the upper frequency limit of human he
195. rangeof  0 0 995   and this is used in the following  equation to obtain the specBrightness variable which has a range of  0 0314 1 633e 016   and is inversely proportional to the slider value  After some experimentation it was found    that this range with a exponential style scale gave the most intuitive control settings     specBrightness      tan   slider_value     1    7 2   4 19     This specBrightness variable is stored in the plot_settings structure  as is specContrast   It is used within the polar2colour function as a divisor to the magnitude component  nor     malised and log scaled  to scale it     The contrast slider   s value has a range of   1 1  and this is converted to an exponentially    scaled range of  0 2 2  using the following equation     specContrast   2 x 10 lider value   1  2  4 20     specContrast is used in the calc_spec2D function  listing 4 3  as an exponent of the  logarithmic normalised magnitude component  which has a range of  0 1   The plot in  figure 4 9 attempts to visualise this display range adjustment  The normalised magnitude  is equivalent to variable magN in the calc_spec2D function and the contrasted logarithmic  normalised magnitude is equivalent to magLN  The given range of the specContrast variable    was chosen to allow the most useful scale of contrast adjustment     75    Adjusting Display Range With The Contrast Parameter                            Contrast Value     Normalised Magnitude    Figure 4 9  Display Range Adjustment
196. ransform is conjugate  symmetric and the magnitude spectrum is symmetric  This is shown by the equations   Flu  v    F       u     v   2 6    F  u  o       F   u   o     2 7     Also the relationship of resolution in sampling intervals between the two representations  is similar to the 1D DFT     1    Where Au and Av are the two frequency domain sampling intervals  Am and An are the  spatial time domain sampling intervals  Therefore the larger the 2D array is  the greater    the resolution of the frequency points is or the smaller the frequency sample intervals    13    are  The precise meaning of the Fourier spectrum obtained from the 2D DFT is difficult  to explain without the context of a signal  The understanding of the 2D spectrum is  considered in different terms for images and audio  these are described in sections 2 2 7    and 2 3 1 respectively     2 2 Frequency Domain Audio Processing    This section provides a review of current Fourier audio processing techniques  their ap   plications and their drawbacks  The existing research into 2D Fourier processing is then    reviewed     2 2 1    Time Frequency Representation of Audio    In 1946  Gabor  14  wrote a highly influential paper that founded the ideas of a time   frequency representation of audio and paved the way for many of the transform techniques  now commonly applied to audio  Gabor proposed that time and frequency representations  were just two extremes of a spectrum  Time based representations provide a det
197. rayindex    i    1   xhop   array  arrayindex l arrayindex imsize 2      array  arrayindex l arrayindex imsize 2      image i 1l imsize 2          end       Listing 4 12  Derasterisation Process    99       4 9 2 Timbral Mode    When the signal analysis is in timbral mode  the conversion from raster image representa   tion to the audio array is simple  Each frame is converted to a 1D array by derasterising  its image with the correct image width value and these arrays are stored in a 2D array   equivalent to the pitch map frames matrix in analyse_audio  The complete audio array  is calculated by derasterising this 2D array using the frame_size parameter to define the    hop  which is equal to the width of the 2D array    4 9 3 Rhythmic Mode Without Pitch Synchronisation    For rhythmic mode without pitch synchronisation there is only one raster image and it is  simply derasterised with no row overlap to obtain the audio array  i e  hop is set to the    image width  imwidth      4 9 4 Rhythmic Mode With Pitch Synchronisation    When pitch synchronisation is used  each row of the raster image has to be resampled to  the original audio sampling rate before the audio array is created  Listing 4 13 shows an  excerpt from resynth that resamples each row of the raster image by cubic spline interpo   lation and stores it in an intermediate array tempo_frames  The interpolation is performed  using two arrays of sample indices  as in calc_image  the forward process  Array x corre   spo
198. rier domain data can be observed  shown in equation 4 8                                                  R x    R  x    jI      where 7   oe l  4 6   I x    Ri x    jL x  where     y  i  4 7   F x    R   x      L x    j  L x    Ri z    4 8     Therefore the DFT of complex time domain data results in complex frequency domain    data  where     e The real component of the frequency domain data is composed of the real component  of the DFT of the real time domain data minus the magnitude component of the DFT    of the imaginary time domain data     e The imaginary component of the frequency domain data is composed of the imaginary   component of the DFT of the real time domain data plus the real component of the    DFT of the imaginary time domain data     This process would need further analysis to investigate the numerical relationship between  positive and negative frequencies of the complex DFT  but it demonstrates how the 2D  spectrum can have four symmetrically placed points due to the periodicity of the separate  real and imaginary components of the complex signal that is output by the first stage of the  2D FFT and input into the second stage  The periodicity of the real DFT in the first stage  of the 2D FFT means that the four symmetrical points can be divided into two pairs  Each  pair of points has an audible and rhythmic frequency of opposite sign  identical magnitude    and opposite phase     It has been established that when the raster image size accurately depicts
199. riod of audio data  since it is known to produce accurate results  The algorithm is described in  7  and an  implementation in the C language can be found at  3   The M file yinpitch was initially  written to perform the YIN algorithm  however this process was much too slow to use in  the software tool  taking several minutes for 15 seconds of audio data  Instead a MEX file  implementation was used  this is a subroutine produced from C source code  It behaves  exactly like a built in Matlab function being pre compiled rather than interpreted at run  time  Unlike a built in Matlab function or an M file  which are platform independent   a MEX file is platform specific and its extension reflects the platform on which it was    compiled  in this case  mexmaci     The function had to be written in C first and then compiled to the MEX file in the Matlab  environment  Microsoft Visual Studio was used to program and debug the file yin c using  the techniques described in  37   In order to create a MEX file using C  a gateway function  has to be written which determines how Matlab interfaces with the C code  It receives the  number of input and output arguments specified when the function was called in Matlab   and an array of pointers to this data  These pointers are declared as type mxArray which is  the C representation of a Matlab array  Several API routines are provided to allow access  to the input data  these functions start with the prefix mx   The variables required for the  
200. rms a similar sequence of  operations to height changed  It checks that the current width setting is above the  min_quadrant_size 2  value and if not corrects it  then the width display is updated in  the text edit object  The actual tempo is then recalculated according to the width setting    in size 2   using the equation   240 x tempo_div_num   Fs        5 14     tempo       F   2 x size 2      1     width_pad  x tempo_div_den    Where 2 x size 2      1 defines the width of the whole 2D spectrum  The tempo value can  then be displayed in the correct text edit object on the GUI     When any parameter is changed on the GUI  the changed function is called  since it is  called at the end of both width_change andheight_changed  This function calculates the  new value of the process    dur variable and displays it on the GUI  It also calculates the  value of the change array as size     orig_size  Finally the process_changed variable  in the processes structure is set to true and the handles structure is set as the resize    process figure   s GUI data     147    5 9 3 Running the Resize Spectrum Process    The function process_spec_resize is used to resize the spectrum according to the resize  process parameters  as stored in the process array  The function contains three compo   nents  resizing in the rhythmic dimension  resizing in the audible dimension and updating  the analysis settings according to the new spectrum size  The function resizes the two di   mensions of the sp
201. roc_popup  buttons  num   Position       set  handles  proc_popup  buttons num       Position      f     cur pos  1  cur_pos 2  20 cur_pos 3  cur_pos 4      cur_pos   get handles proc_popup opt_buttons  num       Position       set  handles  proc_popup opt_buttons  num      Position           cur_pos l  cur_pos 2  20 cur_pos 3  cur_pos  4       end  handles  proc_popup num_buttons   handles  proc_popup num_buttons   1   handles   create_context_menu handles       handles  proc_popup num_buttons     handles  proc_popup  buttons  handles  proc_popup num_buttons     uicontrol handles  proc_popup figure           Style         pushbutton             UIContextMenu     handles proc_popup cmenus      handles   proc popup  num buttons  h       Enable       off           String       Empty             Position      100  20  90 15      E  handles   proc popup opt buttons  handles  proc popup  num buttons     uicontrol handles  proc popup  figure           Style           pushbutton          FS DHA OS A see res      Position      190 20 10 15      J   proc popup callbacks  handles     guidata gef  handles     end    end    108          Listing 5 3  Adding a Button To The Processes GUI    When the context menu corresponds to an existing process then there are three options  If     Empty    is chosen  the process is removed from the process array and the corresponding  process and option buttons are deleted  The GUI window is resized and any buttons below  those deleted are moved up  List
202. ronisation    When pitch synchronisation is used in rhythmic mode  the conversion to a raster image  ceases to be a one to one sample mapping  The process in analyse_audio is initially the  same as for timbral mode  The signal is broken into frames  which are then analysed to  determine their pitch  However the next stage is to calculate the data required for each  frame and interpolate to fit it into the raster image  The image and FT variables are  defined as 1 cell arrays  since in rhythmic mode there is only one raster image and one  2D spectrum  The raster image is first set to double the frame size to ensure the data is  always upsampled preventing a loss of information and the image array is initialised with    zeros  Then the calc_image function is called to fill the image array with data     In calc_image  each row of the image data is calculated one by one  The required number of  fundamental periods of data is obtained by finding the number of periods in the frame_size  value and rounding up to the next integer  as in the following equation     num_periods frame    ceil frame_size period frame     4 24     86    The frame data must then be located within the audio array and extracted to be in   terpolated to the width of the raster image  as shown in calc_image excerpt in listing  4 7  The Matlab function interp1 is used to resample the frame data to fill a row of  the image  It performs cubic spline interpolation on the data given the original sample  points and t
203. ros  and the original data quadrant shifted using    fftshift     The arrays x and xi represent the sample point indices of the original and interpolated    Fourier data columns respectively  The xi array has the same range as x but a different    148       length  meaning that interp1 stretches compresses each data column to fit the new size        function handles   do_rhythmic_dim handles   process_num     width   size handles data FT 1  2     old_height   size handles data FT 1  1     new_height   2 handles  processes  process process_num  size 1     1    newFT   zeros new_height   width      oldFT   fftshift  handles data FT 1      x   linspace 1 old_height   old_height      xi   linspace 1 old_height  new_height       col   1  width    y   oldFT    col     newFT   col    interp1l x y xi    spline        handles data FT 1    rev_fftshift  newFT      if handles data analysis_settings sync  old_im_height   old_height   handles data spec2D settings height_pad   new_im_height   new_height   handles data spec2D_settings height_pad   x   linspace 1 old_im_height   old_im_height     xi   linspace 1 old_im_height  new_im_height     handles data analysis period     interp1 x handles data  analysis  period  xi    spline       handles data analysis num_periods    interp1 x handles data analysis num_periods xi    spline       end    end             Listing 5 25  Resizing The Rhythmic Frequency Dimension    If the signal data has been analysed in pitch synchronous mode  then pitch re
204. rovided that each quadrant is treated in the same way  An extension  of the analysis of the complex DFT  equation 4 8  could reveal the precise relationship  between the positive and negative frequency data  hence allowing non linear processing to    be applied in the future     Many of the transformation processes developed use resampling of the data to perform  time pitch modifications  After experimentation with these processes in the software tool   it is apparent that the frequency resolution of analysis can often be too low to allow    accurate interpolation     In timbral mode  the audible frequency bins are spaced at harmonic intervals of the signal  when correctly pitch synchronised  At this resolution the data will be synthesised with  harmonic components at the analysis frequencies even if the data has been rescaled  There  need to be many points between the harmonic frequencies to allow effective scaling of the  data  However the resolution in the rhythmic frequency axis is high enough to allow scaling  of frequencies by resampling  In rhythmic mode it is the rhythmic frequency interval that    is at a low resolution and so resampling in this dimension is inaccurate     Also whilst zero padding does not affect the signal representation after the basic analysis   synthesis process  it affects the results when certain transformations are applied to the  signal  Any resampling of data that reduces the period length of audible or rhythmic  frequency components will in
205. rtant to  analyse the process by which this representation was obtained  The 2D Fourier transform  can be divided into two separate stages  First the 1D DFT is computed for each column   resulting in a complex valued 2D array separately describing the frequency content of    each column of original data  Secondly the 1D DFT is computed for each row of this    59       intermediate complex array  yielding the resulting 2D Fourier data array  The result is  identical if the rows are computed in the first stage and the columns second  and in terms  of our analysis this is the most logical order  The intermediate data can then be viewed as  a series of STFT frames with no overlap and a rectangular window  Each column of the  data corresponds to the frequency component given by equation 2 3 where N is the width    of the 2D array and v is the column index     The second stage of the 2D Fourier transform requires a DFT of complex data  which  has slightly different properties to the real DFT  31   The periodicity of an N point real  DFT means that the frequency spectrum is reflected about x   1  section 2 1 1   and half  the data is redundant  Whereas the complex DFT has no redundancy and all N points  are required to fully represent the signal  In other terms  both the positive and negative  frequencies are required with the complex DFT  However of the four quadrants of the 2D  Fourier data  two will still be redundant due to the periodicity of the DFT in the first stage   It was 
206. rue  then newFT is initialised as a copy of  oldFT  otherwise it is initially filled with zeros  The two halves of the spectrum  with    positive and negative rhythmic frequency are calculated separately        width   size  handles  data FT frame    2       135       height   size handles data FT frame  1    oldFT   fftshift  handles  data FT frame      if handles  processes  process  proc_num   remove  newFT   zeros height   width    else  newFT   oldFT   end    shift is always  lt  floor height  2       also quadrant dims  middle   ceil height  2    width_middle   ceil width 2       row_shift   mod  handles  processes  process proc_num  shift   middle         top half of spec  end_top_n   middle   row shift    newFT 1   end_top_n      oldFT  row_shift 1l    middle      if handles  processes  process proc_num  wrap  amp  amp  row_shift      0  newFT   middle   row _shift 1     middle   1        oldFT 1  row shift    1        newFT  middle  1   width_ middle   1     oldFT row  shift  1   width_middle    1     end    bottom half of spec  end_bottom_n   middle row shift    newFT end_bottom_n  height       oldFT  middle    height   row  shift        if handles  processes  process proc_num  wrap  amp  amp  row_shift      0  newFT  middle 1l     middle row_shift    1         oldFT   height   row _shift 2    height       newFT  middle   width_middle  width    oldFT  height   row shift 1        width_middle  width     end  handles data FT frame    rev_fftshift  newFT          Listi
207. run  Matlab provides detailed documentation  on this process  The GUI display may also be slightly different on a Windows operating    system and hence will not precisely match the images in the report     The audio examples are grouped in folders according to the transformation process that  was applied  The files are listed here  by each folder within    Audio Examples     along with    the process parameters that were used     Unprocessed  These audio files are the original unprocessed signals that have been used in these examples   as well as signals displayed in the report as analysis examples  They should be used as a    reference to demonstrate the effects of the 2D Fourier domain transformations     e am_bipolar wav    This is an amplitude modulated sine wave synthesised in Matlab with a carrier fre     214    quency of 220 5 Hz and a modulation frequency of 1 Hz     loop120 wav  An electronic drum beat with complex rhythmic content at a tempo of 120 bpm and    with a duration of 4 bars  Taken from the PowerFX sample pack  http    www powerfx com      Piano mf A3 wav   A short extract of a piano recording taken from The University of Iowa Musical In   strument Samples  http    theremin music uiowa edu MIS html   It has the dynamic  mf and the note A3     Piano mf A4 wav  A short extract of a piano recording taken from The University of Iowa Musical    Instrument Samples  It has the dynamic mf and the note A4     Piano mf Ci wav  A short extract of a piano recording ta
208. s a period corresponding exactly to the image height  a well defined  2D Fourier representation is obtained  where the effects of rectangular windowing are  avoided in both dimensions  The 2D Fourier spectrum of this signal has four points  as  shown in figure 4 4  It appears that there are two lines of symmetry within the spectrum     however we know that only half of the spectrum contains redundant information     Raster Image 2D Fourier Spectrurn  T o T T       Frequency  Hz         2500  2000  1500  1000  500 0 500 1000 1500 2000 2500  Pixels Frequency  Hz     Figure 4 4  2D Analysis of an AM Sinusoid    On closer inspection the data only follows the expected symmetry of the 2D Fourier trans   form  as given by equations 2 6 and 2 7  with two pairs of points matching in magnitude  and opposite in phase in diagonally opposing quadrants  However the position of these  points is symmetrical and the values are similar  with a magnitude difference of 0 19  and    a phase difference of 1 76  from the largest value to the smallest     Analysis of The Complex DFT    The complex DFT needs to be better understood to explain the 2D spectrum of this signal   since it appears that there is some form of aliasing occuring in the second stage of the 2D    Fourier transform even though the data is not symmetrical     The linearity of the DFT  16  is given by the following equation     afi x y    bfe a  y   amp  aF   u  v    bFz u  v   4 3     63    Therefore summing two signals in the fre
209. s by the inverse Fourier transform  the same shift must be reapplied to the data  to regain the original form  The range of values in a magnitude spectrum is often very large   so to achieve a more intelligible display a logarithmic scale of pixel values is used  Figure  2 2 shows a test image and its shifted  log scaled Fourier magnitude spectrum  produced  in MATLAB  to illustrate the display representations  The axes of the displays correspond  to the discrete indices from equations  2 4   amp   2 5      22       fa    Figure 2 2  Image Display and 2D Fourier Magnitude Spectrum in Matlab  after  16     Figure 2 3 shows the non shifted and shifted displays of magnitude and phase information   Magnitude information becomes clearer when shifted but as you can see  the phase display  is quite unintelligible and as with audio spectrum displays is often not shown  This does  not however mean that phase information is not important in describing an image  As  with audio  section 2 2 5   the phase information contains the structural information of  the image  The importance of phase in the coherence of an image is demonstrated in   6   this website provides some simple images with their Fourier transform data to help  develop an understanding of the relationship between an image and its Fourier data  The  images in  6  demonstrate a method of displaying the magnitude and phase components  together in a single display  where magnitude is the saturation and phase is the hue of the  
210. s displayed in seconds  given by the dur variable in the audio_settings  structure  and also in terms of rhythmic duration  The numerator of the rhythmic dura   tion is given by the ndivs variable and the denominator is set to the value stored in  tempo_div_den variable within the analysis_settings structure  The tempo value dis   played is the tempo of the signal produced from the output spectrum  as given by the    tempo variable     The function height_changed is called when either the height value or the rhythmic value  is changed  The callback for the height value   s text edit object rounds the numerical input  to a positive integer and stores it in size 1  before calling height changed  The rhythmic  duration value   s callback function stores the absolute value of the numerical input in ndivs  and then calculates the quadrant height that represents the rhythmic duration closest to    this value  using the following equation before calling height_changed         5 11     di height_pad  sont wont  Mom tained     The height_changed function is displayed in listing 5 23  It first ensures that the height  settings is not below the minimum value given by min_quadrant_size 1  and then sets  the string of the height value   s text edit to the correct height setting  The actual rhythmic  duration given by the new height setting is then calculated and displayed           function height changed  handles    if handles cur_process size 1  lt handles cur_process min_quadrant_size 
211. s of the 2D spectrum and stores them in  rhyth_freq and aud_freq respectively  When the signal analysis is in timbral mode this  needs to be done for every frame  since each has its own 2D spectrum  resulting in 2D    arrays  As these frequency arrays are calculated  the max_rhyth frame and max_aud frame    121    parameters are updated to contain the index of the signal frame containing the required    rhythmic and audible frequency arrays respectively     The value of the filter   s cutoff variable can range from OHz to the maximum frequency  represented in the 2D spectrum of any of the signal frames  in the frequency axis indicated    by rhythmic_mode  This maximum value is held in the max_cut variable     The minimum value of the filter   s bw variable is min_freq_step  which is the minimum in     terval between frequency bins of the signal   s 2D spectra  on the axis indicated by rhythmic_mode     If the calc_filter_properties function has been called as a result of altered analysis  settings then the filter   s cutoff and bw variables are checked and if out of range according  to the new settings  they are set to the appropriate range limit  This prevents an error    from occuring when the filter process is run     It is worth noting that in the processing stage  signals in rhythmic analysis mode are con   sidered by the software to have only one frame  since there is only one 2D spectrum to  represent the signal  This is as opposed to the definition of a frame as a row
212. s subsequent processes in the process array to be ex   amined  since only these will be affected by the change in analysis parameters  When  a resize or rotate transformation is removed from the process array  the parameters of  subsequent filter or resize processes again need recalculating  since the analysis settings  are no longer being changed  This occurs in the cmenu_clicked callback function within    proc_popup_callbacks m     117    5 2 2D Fourier Spectrum Filtering    The concept of filtering in the 2D frequency domain was one of the most immediately  obvious and exciting transformation processes  It not only offers filtering of the audible  frequency range as with conventional time and frequency domain filters  but also in the  rhythmic frequency range  allowing the sub sonic rhythmic variations in the signal to be    manipulated     5 2 1 Filter Parameters    The 2D spectral filter that was developed in the software tool provides many different  options  Either frequency dimension can be filtered using one of four common filter types   low pass  high pass  band pass and band stop  to either cut stop band frequencies or  boost pass band frequencies  The filter can have an ideal    brick wall    frequency response  or the frequency response of a Butterworth architecture of any order chosen by the user   Additionally DC content of the spectrum can be retained even if it is in the stop band   The variables that correspond to the filter options are given in table 5 1  
213. s with a signal     The last part of this phase was to redesign the initial analysis capabilities of the software  tool  adding more flexibility and presenting the user with a pop up window of options  as    described in section 4 5     3 2 2 Processing Phase    This development phase was less incremental than the analysis phase  Once the initial  structure for applying processes was in place  it was a simple process to make a new  transformation available  The data structure had to be extended to store the processing  information and retain the original signal data  The GUI and functionality for adding and    removing processes had to be designed  and a generic process structure planned  Then it    35    was simply a case of creating processes one by one  testing them as they were built and  occasionally going back to improve or correct the implementation  The loading and saving    of application data also had to be extended to include processing parameters     3 3 Graphical User Interface Design    Matlab offers two different methods for designing GUIs  the GUIDE  Graphical User In   terface Development Environment  tool or programmatic development  GUIDE allows  interactive design of a GUI and automatically generates an M file containing initialisation  code and the framework for the GUI callback functions  The other option is to specify the  layout and behaviours entirely programmatically  each component is defined by a single    function call and the callback functions h
214. same as in listing  5 21 except that the adjustment is made in the vertical dimension  on the columns  instead    of the horizontal dimension  on the rows     5 9 Resizing the 2D Fourier Spectrum    This process adjusts the dimensions of the 2D Fourier spectrum to transform the audio  signal  When one of the spectrum dimensions is resized  the centre frequencies of each bin  on that axis is changed  The data is interpolated to the new array size so that the signal    components are still at the correct frequency     When there is more than one frame of 2D spectrum data  resizing cannot be performed  since it would result in different frame sizes for each frame and in the current software tool  design this is not possible  Therefore resizing the spectrum is limited to rhythmic analysis  mode  where it is guaranteed that there will only be 2D spectrum  It is recommended  that in future work  chapter 9  this operation is used to transform signals that have been    analysed in terms of their pitch  since it seems there is a lot of potential in this technique     The spectrum resizing process  and the fixed resize operations  section 5 9 5  are defined  last in the proc_names array  so that if they are not present in the context menu  no out    of bounds errors are caused by indexing the context menu   s proc array for other processes     In rhythmic mode  it was thought that adjusting the width of the spectrum would change  the duration of each row of raster image data whilst mai
215. sform in the West   algorithm  23  24   This is a very popular and efficient implementation of the DFT  which  has the advantages of using portable C instructions that aren   t hardware specific  It  computes the DFT in operations for any input length N  unlike some implementations    which are only efficient for a restricted set of sizes     12    2 1 3 Two dimensional Discrete Fourier Transform    The DFT of a two dimensional array of M x N samples can be easily constructed by    extending the one dimensional DFT formulae  16      M 1N 1  X  u v    x  m n  eo 22   ar     e 2 4   gt  soreo o  M 1N 1  x  m n           X  u v eit   Sit  A 2 5  weas S Se a    u 0 v 0 ae    The equations for the DFT and its inverse have the same relationship in two dimensions  as in one i e  the inverse transform is the complex conjugate of the forward transform   divided by the number of points in the transform  The variables u and v are the frequency  variables and the variables m and n are the time or spatial variables  depending on the    type of signal represented     The 2D DFT is obtained by performing the N point DFT of each of the M rows of the  array  so obtaining an intermediate M x N array of the results  The M point DFT of each  of the N columns of this array is then taken to give the final 2D DFT array  The process  can also be done in the opposite order  columns then rows  and the same result will be    obtained   As with the 1D DFT  if the input x  m n  is real then the Fourier t
216. since they do not represent    periodic rhythmic components in the signal     158    2D Fourier Spectrum        Frequency  Hz    gt      0 5 0 0 5 H 2  Frequency  Hz  x 107     a  Correct Tempo  120 bpm   Raster Image Width  22050    samples    2D Fourier Spectrum    Frequency  Hz    gt     Se  1 5 al  0 5 0 5 1 1 5 2    0  Frequency  Hz  x 194     b  Incorrect Tempo  122 bpm   Raster Image Width   21689 samples    Figure 6 4  Effect Of Correct Tempo Analysis On 2D Spectrum    The 2D Fourier spectrum of the drum loop is much less intelligible when the tempo is  not correctly set  The energy of 2D spectral components cannot be precisely represented  by frequency points in either dimension and so their magnitude energy is spread between    adjacent points  This spreading of energy is shown by the spectrum display in figure 6 4b     159    6 2 Rhythmic Mode Analysis    Rhythmic analysis mode provides a clear display of the sub sonic signal variations in both  the time and frequency 2D representations  For audio signals with simple and repeating    rhythms  the rhythmic frequencies of 2D Fourier spectrum components are well defined     A simple drum kit rhythm was programmed using a MIDI sequencer  as displayed in figure  6 5  This MIDI pattern was used to create the audio file simple120 wav at a tempo of  120 bpm and for a duration of 4 bars with Native Instruments Battery 3   s basic drum kit  samples  The rhythm repeats every 2 crotchet beats i e  a rhythmic duration of 1 2 an
217. sophisticated methods available that better preserve  the sound timbre  18      Timbre manipulation   The timbre of the sound can also be manipulated indepen   dently of pitch and time  The techniques in  18  enable effective shifting and copying    of spectral peaks to create effects such as harmonising and chorusing     Stable and transient extraction   Using the frame to frame changes in frequency     stable and transient frequencies can be extracted from the audio signal  10      Dynamic range manipulation   Conventional dynamics processes such as compression   gating and expansion can be performed on each individual frequency band over time     10      Cross synthesis   This involves extracting characteristics of one signal and applying    them to another signal  it includes a variety of different processes  27      Sound representation using the STFT and the phase vocoder has well known limitations     Its popularity is due to the fact that it enables musically interesting transformations of    audio  but the time frequency uncertainty principle is deeply embedded within its imple     mentation  As a result  transforms using the phase vocoder often yield signals that are    blurred smeared in time and frequency  27   Also due to the overlapping of frames  a single    time frequency element cannot be modified alone without causing signal discontinuities     19    2 2 6 Other Analysis Resynthesis Tools    Tracking phase vocoder  TPV  algorithms such as the McAulay Quatier
218. ss  and the data doesn   t need to be stored for processing purposes  The 1D Fourier spectrum  can either be for the whole audio array or the current frame of data represented by the  displayed 2D images  this option is available from the    Plot Settings    menu  Other options  have been made available to provide a useful analysis plot  The user can select to view    either the magnitude or phase spectrum and set either axis to linear or logarithmic scaling     4 1 2 Automatic Normalisation of The Signal    The normalise function is called at the start of display_data  This function was written  to scale the time domain data representations so that their maximum absolute value is one   This provides an appropriate display level and audible playback level in case a signal is too  low high in amplitude  It is an optional process  determined by the auto_norm variable in  the opt structure  and it can be adjusted from the    Options    menu  This option is most  useful when 2D Fourier transformations have been applied to the signal that might severely    affect its amplitude     53    4 2 Raster Scanning    Raster scanning was introduced in section 2 5 1 as a simple method of converting between  1D and 2D data representations with a one to one sample mapping  The process of raster   isation serves as a gateway to 2D audio analysis  converting from the audio signal to one    or more raster images     4 2 1 Raster Image    The raster image is a 2D time domain representation of aud
219. sses Applied  b  Chain of Four Different Processes    Figure 5 2  2D Spectral Processes GUI Window    The process buttons are labelled with their process name  as given in proc_names  and  displayed in a column in the order that the processes are stored in the process array   The    Empty    button is placed at the bottom of the column and is disabled  Each process  button has a small    options    button next to it  which opens a context menu when clicked   The context menu reveals the list of available transformation processes  which are divided  into two groups  adjustable and fixed  according to whether the process has adjustable    parameters or not     The object handles for these GUI components are stored in the proc_popup structure   which is added to handles and stored as GUI data in the    2D Spectral Processes    figure  object  Throughout the software tool  when GUI data is updated  it is stored in both the    main GUI and the processes GUI to ensure that the transformation process functions can    105    access the signal data  The processes figure is closed when the data is reanalysed  hence  a new figure is created  or the software tool is closed  When the user chooses to close the    figure window  it is only hidden temporarily     5 1 3 2D Spectral Processes Window Functionality    The function proc_popup_callbacks sets and contains the callback functions for objects  within the    2D Spectral Processes    window  as shown in listing 5 1  it is called at the 
220. sting potential  particularly when fre   quency independent operations are performed  I have a few different adaptive     rhythmic     filtering processes which have turned out to be compelling sound permutation processes   Focused    rhythmic    filtering  resynthesizing a 2D DFT with a few narrow bands signifi   cantly amplified  is even somewhat interesting  albeit obvious  as a processing technique   Since I have only done this in the simplistic manner that I have described  the time granu   larity I have achieved is limited  But it may be possible to obtain similar effects with finer    granularity than I achieved     Personally  I am a strong advocate for studio sound processing techniques but unfortunately  the marketability of processors that utilize 2D FFTs is limited by their intrinsic non real   time architecture  They wouldn   t play well in even the most liberal audio plug in apis  I  am glad that I can use my 2d processes easily in a UNIX shell however     And MATLAB  is a decent home for them as well    I will try to yoho a copy of MATLAB so I can hear what you have been up to      My company   s first commercial app is a graphical sound synthesis and scoring system  inspired by the Upic system  It is in the late stages of development  particularly since I  had a working prototype in 1992 and have been honing its dsp engine for nearly 16 years     It will be named  Optasia  it is derived from my freeware application Hyperupic which  I originally developed for N
221. subsequently to analyse the signal  The code and calculations in this    function are all fairly trivial and have not been included here     To allow mode selection  two push buttons were programmed to act as a button group  where only one can be selected  The synchronisation option is presented as a push button  programmed to toggle on and off  In timbral mode with the tempo synchronisation off   figure 4 11b   the frame size can be entered in seconds using a text edit object  This was  chosen because the frame size is continuous within the range of the signal duration  The  text edit input is checked to ensure it is numeric using the isnumber function  which was    written to perform this check for all text edit input within the software tool     In either rhythmic mode  figure 4 11a  or tempo synced timbral mode  figure 4 11c   the  user can enter the tempo value of the signal using the text edit object or click the    Calculate     button which uses the process described in section 4 7 2 to estimate the tempo of the signal   The row frame size is defined in terms of a note duration chosen by the user  The software  tool uses the American system of note names  where the note is defined in terms of fractions    of a whole note or semibreve  since this is much easier to represent in software     For the note duration setting  push buttons are provided to move the numerator and    80    denominator values up and down  The denominator moves through an array of standard  note d
222. t the moment and it involves 2D Fourier representation of  audio  Your applet has been really helpful whilst trying to get my head around the 2D  Fourier transform  I was wondering if you could help me by explaining the algorithm  for converting from a complex number to an RGB colour  I   ve looked at your code in  ComplexColour java but I   m not entirely sure why it works or what the variables represent   I appreciate that you might not have time to help  if you have any references you could  give that would also be great help  Thanks    Chris Pike    A 1 2 ComplexColor Java Class    Listing complexcolour shows the Jason Gallicchio   s ComplexColor Java class  which is  referred to in the above email and was used as the basis for the polar2colour and    colour2polar functions written for the software tool           package brainflux graphics     import java awt Color     JRE  x Title  FFT2D    x Description     208            Copyright  Copyright  c  2001  x Company  BrainFlusz  x  author Jason Gallicchio    x  version 1 0    public class ComplexColor      private float R 1 0f  bmin 0 0f  bmax 1 0f  lmin 0 0f  lmax 1 0f      Drawing options  These are the same constants defined in brainfluz   math  Complez    public static final int MAGPHASE    public static final int MAG    public static final int PHASE    public static final int REAL S  public static final int IMAG    public static final int MAG2    public static final int MAG2 PHASE      ao oF wn rr OO    private int typ
223. te 2D analysis  now that the investigation has been    carried out and the software developed     4 2 5 Displaying The Raster Image    The raster image data is displayed using the image function  provided by Matlab to display  image data on a set of plot axes  This function can either accept true RGB colour data or  indexed data in the range  0 1  that uses a colour map to obtain the colour value  Since the  raster image is grayscale  it has to be scaled to the indexed data range and then the Matlab  colormap function can be used to select the gray map  Listing 4 2 shows the plot image    function used to display the image data on the axes     The property DefaultImageCreateFcn is set to    axis normal    which ensures that the  image is displayed at the axes dimensions rather than the image dimensions  The image  can often be much larger in one dimension than the other  so that if it was displayed with  square pixels it would be very difficult to display within a GUI  It was decided that this  display method is more useful to the user  The plot_image function also labels the plot    and its axes           function plot_image image_data  bits   plot_axes   source    PLOTIMAGE plots a gray   scale image on the given axes at the correct bit   depth       convert image_data from range     1 1  to  0 1   image_data    image _data 1   2      select ares   axes plot_axes        ensure pixels are displayed square   set  0     DefaultImageCreateFcn       axis normal          display 
224. ter Height    These two initial examples have both had a frequency component in one of the two axes  and zero frequency in the other  resulting in two points of equal magnitude and opposite    phase  on opposite sides of the  0 0  centre point in the respective frequency axis     Signal Components With Audible  amp  Rhythmic Frequency    A signal with a defined frequency in both axes is an amplitude modulated sinusoid  Am   plitude modulation is conventionally considered in terms of a carrier frequency and a mod   ulation frequency  In the terms used in this project  carrier frequency is synonymous with    audible frequency whilst modulation frequency is synonymous with rhythmic frequency     1D Fourier analysis defines amplitude modulation using two sinusoids with stationary am   plitude characteristics  f   and f    The mean of these two frequencies gives the carrier  frequency whilst their difference gives modulation frequency  In 2D Fourier analysis ampli   tude modulation is given by a single sinusoidal component which has an audible frequency   fa  and a rhythmic frequency  fp  The relationship between 1D and 2D Fourier analysis of    an amplitude modulated sinusoid is shown by the following equations      h   41   hsh   4 2     The AM sine wave can be considered as the two sinusoids f   and f which have opposing    angles within the raster image     62    If the signal   s carrier frequency has a period of exactly the raster image width and the  modulation frequency ha
225. tes a GUI window  as shown in figure 5 5  that allows    the amount of rotation to be set  It simply presents a drop down menu with the three    131    options     90 degrees        180 degrees    and    270 degrees     The index of the menu items is  equal to the value of rot_ind required for the displayed rotation  therefore the selected    item index can be stored in rot_ind within the menu   s callback function    lt Student Versio      Rotate  90 degrees        Bypass    Figure 5 5  2D Magnitude Thresholding GUI    When the signal has been analysed in pitch synchronised rhythmic mode  the spectrum  can only be rotated by 180    This is because the conversion from raster image to audio  is a complex process in this mode  based on the calculated pitch of each raster image  row  section 4 9 4   If the frequency axes were swapped due to rotation then the current  resynthesis process would not work  It was decided that it wasn   t appropriate to develop  the resynthesis process to compensate for rotation adjustments  since pitch synchronous  rhythmic analysis had already been deemed of little use in the project  The drop down  menu therefore only presents one option     180 degrees    and the value of rot_spec is forced    to 2 when the rotation process is performed     5 4 3 Running the Spectral Rotation Process    The process_rot_spec function performs the rotation process upon each frame of 2D spec   tral data in the FT  The process very simply rotates the FT array in 90  
226. the program is compiled to machine  code as it is run and each line of code is reinterpreted each time it is read  This makes  computation a lot slower than compiling languages such as C and Java  however the soft   ware tool for this project does not need to operate in real time at audio rate  The run time  compiler allows quick code development and prototyping  since there are no compilation  issues  Matlab also provides debugging tools and a command line which is useful for testing    new functions and algorithms     30    Chapter 3    Introduction to the Matlab Tool    Most of the project work was performed within the Matlab programming environment   investigating 2D Fourier techniques and incorporating them into a GUI based software  tool  The experimentation and software implementation were carried out simultaneously  throughout the project so that the GUI tool could provide a means of easy interaction with  and observation of signals as new techniques were investigated  By the end of the project   the aim was to sufficiently develop the software tool to make 2D Fourier audio processing    techniques accessible to both engineers and composers     This chapter introduces the software tool which forms the basis of the project  It will    describe the overall design and functionality  as well as the development process used     3 1 Requirements of the Software Tool    The software tool needed to encompass all of the functionality required to meet the aims  of the project  
227. this project can be improved based on a    clearer understanding     Timbral analysis   An instrument  instrument family has similar 2D spectral envelope  for all notes  the 2D Fourier transform can be extremely useful for defining instrument    timbres     Component extraction   the extra rhythmic frequency information may possibly fa     cilitate source separation methods     Transformations   There are surely many more 2D spectral audio transformations  possible  For example  the manipulation of phase information was not investigated   Truly unique techniques may be obtainable by performing operations that alter both  rhythmic and audible frequencies  such as obtaining the inverse matrix  although this    has very specific requirements      The software tool could be developed as a stand alone or offline VST application  in     dependent of MATLAB  to make the techniques more broadly accessible to engineers    199    and composers     200    Bibliography     1  M  K  Agoston  Computer Graphics and Geometric Modeling   Implementation and  Algorithms  Springer Verlag  2005      2  S  Bernsee  Time stretching and pitch shifting of audio signals     http   www dspdimension com admin time pitch overview     3  P  Brossier  Aubio   a library for audio labelling  http   aubio org       4  D  Byrd  Midi note number to equal temperament semitone to hertz conversion ta   ble  Indiana University Bloomington School of Music  http   www indiana edu  emu   sic hertz htm      5  J  W  Co
228. tions have been designed to perform  a fixed process upon the 2D Fourier data  whilst others have adjustable parameters  All  transformation processes have a    run    function that has the prefix process_  followed by  the process name  a short version that doesn   t necessarily match the name in proc_names      This function performs the processing on the FT data     Adjustable processes also have    create    and    adjust    functions with the prefixes create_   and adjust_   The    create    function initialises the required parameter variables in the  process    structure within the process array and these variables are given initial default  values  It is called when the user first creates the process by choosing it from a context  menu within the processing window  Listing 5 5 shows the    create    function for a generic    process  The type variable is set to the appropriate string in the proc_names array           function handles   create_process  handles   proc_num   handles  processes  process proc_num  type        handles  proc_names correct_index     handles  processes  process proc_num  bypass   false   handles  processes  process proc_num  process_specific_varl   default1   handles  processes  process proc_num  process_specific_var2   default2   handles   adjust_process  handles   proc_num          end       Listing 5 5  Creating a Generic Process    At the end of every    create    function  the process       adjust    function is called to allow the  user 
229. to set the initial settings as they desire  The    adjust    function contains the code to  create and display a GUI window that presents the current parameter settings of the GUI   as stored in process proc_num   and allows the user to adjust them  An    adjust    function    for a generic process is shown in listing 5 6        function handles   adjust_process  handles   proc_num   handles cur_process   handles  processes  process  process_num     handles cur_process_num   process_num     handles  processes  process_changed   false     110          end       create figure and components  storing the object handles in handles   proc_name_popup    setting display to present current parameter settings           then   handles cur_process  bypass      handles  cur_process bypass    guidata handles  proc_name_popup  figure   handles      bypass_Callback  handles  figurel              handles cur_process  bypass      handles  cur_process bypass     guidata handles  proc_name_popup  figure   handles     proc_name_callbacks  handles     set  handles proc_name_popup  figure      Visible        on       uiwait  handles  proc_name_popup  figure     handles   guidata handles  proc_name_popup  figure      delete  handles  proc_name_popup  figure      handles   rmfield  handles      cur_process_num        handles   rmfield  handles      proc_name_popup        handles   rmfield  handles         cur_process         function proc_name_callbacks  handles     end    set  handles  proc_name_po
230. to the index of each process name in the proc_names  The  function create_context_menu is called from create_proc_popup  for every menu required     The num argument is passed in  so the menu   s data can be stored in the correct entry in    106       the cmenus array     The callbacks for each item in the context menus are defined in proc_popup_callbacks  also  Listing 5 1 shows how the callback function cmenu_clicked is used for every menu  item on every menu  and this callback receives the arguments num and ptype to distinguish    which menu was used and what process was selected     When the user clicks an option button  the opt_button_clicked callback displays the  corresponding context menu  as shown in listing 5 2  The layout of each context menu is  identical  The top level offers the options    Empty        Adjustable    or    Fixed        Adjustable     and    Fixed    bring up sub menus offering a list of transformation processes  which have    either adjustable or fixed parameters           function opt_button_clicked  hObject  eventdata   button_num   handles   guidata gef     button_pos   get handles proc_popup opt_buttons button_num       Position       set  handles  proc_popup cmenus button_num  h     Position           button_pos 1  button_pos 2      Visible       on         end       Listing 5 2  Displaying The Context Menu    The cmenu_clicked function is called when any item on a context menu is clicked  Its  actions depend on the option clicked and the it
231. troduce the added zero valued samples into the resulting  signal  since the Fourier analysis considers them to be part of the signal   s period  see  section 2 1 1      6 4 2 Filtering The 2D Fourier Spectrum    Two dimensional spectral filtering has proven the most useful and exciting sound transfor   mation method for creative purposes  The process is linear and operates in the same way  on all four quadrants of the spectrum  therefore there are no significant analysis limitations    for this process     Filtering of audible frequencies produces very similar results to 1D frequency domain fil     tering  This is demonstrated in figure 6 12 where the 2D Fourier spectra are compared for    171    loop120 wav filtered using a 2 kHz ideal filter in both 1D and 2D frequency domains     Frequency  Hz      0 8     0 6     0 4     0 2    2D Fourier Spectrum    0 2    0 4    0 6    0 8        2  1 5 zl  0 5 0 0 5 1 1 5 2    Frequency  Hz  x 107     a  1D Frequency Domain Filtering loop120 wav Ideal 2kHz Low  Pass Filter    Frequency  Hz      0 8     0 6     0 4     0 2    2D Fourier Spectrum    0 2    0 4    0 6    0 6          1 5 a   0 5 0 0 5 1 1 5 2    Frequency  Hz  x10      b  2D Frequency Domain Filtering loop120 wav Ideal 2kHz Low  Pass Filter    Figure 6 12  Comparison of Audible Frequency Filtering in 1D and 2D Frequency Domains    On closer inspection the signals are not identical  but this is thought to be due to a different    in audible frequency resolution of the two 
232. tton  clicked inspect the value of the process_changed variable    to determine whether the process parameters have been adjusted as shown in listing 5 8           if handles  processes  process_changed  handles   rmfield  handles      process_changed         processes_changed  handles       end          Listing 5 8  Determining If Process Parameters Have Been Adjusted    If a process has been created or adjusted the processes_changed function is called  which  calls run_processes to obtain the new processed signal data and then display_data to    display the new signal representation on the main GUI     The run_processes function copies the unprocessed data from original_data to data   then loops through each process in the process array calling the correct process  function    to run the transformation process  altering the 2D Fourier domain data in FT  The process    113    type is identified using a switch   case    statement to compare the value of the type    variable with each string in the proc_names array     Once all processes have been run  the resynth function is called to generate the raster  image and audio data representations from the 2D Fourier domain data  as described in    section 4 9     5 1 6 Observing Original and Processed Data    The main GUI   s toolbar  section 3 5  contains the    View Original Signal    and    View Pro   cessed Signal    buttons  allowing the user to switch the display between the unprocessed  and processed signal representations 
233. ty of this menu was taken from the useful aspects of the default menus     e The    Options    menu  which allows the user to adjust the options stored in the opt    data structure     The toolbar allows quick access to some of the most commonly required menu options as  well as some additional functions  Table 3 3 describes the function of each push button on    the toolbar  dividers are used to group related buttons        Name Tooltip Description          Load Data Allows a signal to be loaded with its analysis settings  as in    the    File    menu        Save Data Allows a signal to be saved with its analysis settings  as in    the    File    menu           Export Display This button lets the user save the main GUI display to an  image file   Zoom In Toggles the Zoom tool on and off with the direction set to       in     as in the    Plot Settings    menu              Zoom Out Toggles the Zoom tool on and off with the direction set to     out     as in the    Plot Settings    menu    Pan Toggles the Pan tool on and off  as in the    Plot Settings    menu    Data Cursor Toggles the Data Cursor tool on and off  as in the    Plot Set     tings    menu        2D Spectrum Legend   Displays the 2D Fourier spectrum legend  as the    Legend    but   ton in the main GUI figure does        View Original Signal   These buttons are grouped to toggle between displaying the       View Processed Signal   signal data as loaded and analysed from the source file and  the signal data after
234. ues in two dimensions  such as those used regularly in image processing   The use of 2D transform techniques for audio has seldom been explored  23  and may    produce interesting and useful results     1 1 Project Aims    This project is an investigation into the application of 2D transform techniques to audio  signal processing  particularly focusing on the use of the 2D Fourier transform  It aims to  discover what information can be gained about an audio signal by analysing it in the 2D  Fourier transform domain and investigate what methods of transform data processing can    be used to provide musically interesting sound modifications     In the initial stages of the investigation  the following key aims can be identified     i  Using raster scanning  obtain a 2D representation of an audio signal that can be    visualised as a meaningful image   ii  Apply 2D Fourier transform techniques to audio data     iii  Develop clear understanding of the information given by a 2D Fourier transform of    audio     iv  Identify musically interesting transformations of audio using 2D Fourier transform  data     v  Make these processing techniques accessible to composers and sound designers     In order to achieve these aims a large element of software development will be required dur   ing the investigation  The main objective of the project will be to produce a software tool  that enables analysis and transformation of audio using data in the 2D Fourier transform  domain  It should pr
235. uld be converted to numerical data  It attempts to convert the input  string to a numerical value using Matlab   s str2double function  If the input string is not  numeric then the result will have the value NaN  the IEEE arithmetic representation for  Not a Number  Matlab   s isnan function determines if a numeric variable has the value  NaN and it is used check the ouput of the str2double function  The Boolean output of    isnan is inverted to obtain the output of isnumber           function is   isnumber  num   is      isnan str2double  num       if    is     gt     hMsg   errordlg   num  uiwait  hMsg       is not numeric          end    end       Listing 7 2  Checking Text Edit Input Is Numerical    If the text edit input is not numeric then the user is informed using an error dialogue  window  Each text edit object in the software tool calls isnumber in its callback function   If the output value is true  then the function can proceed to use this numerical input as  required and if it is false  the value of the underlying variable that is represented by the    text edit object should be redisplayed     7 3 Code Optimisation    The Matlab environment provides the Profiler tool to facilitate code optimisation  It is  a GUI that allows Matlab code to be analysed using the profile function and displays  the results  The Profiler tracks the execution time of code and presents information about  number of calls  parent functions  child functions  code line hit count and code l
236. umn column   0  N 2 N 1   0         M 2  M 1  column   0  N 2 N 1    M 2 M 2    row       row  oO    M 1    Figure 2 4  A Demonstration of Two Dimensional Sinusoids in the Spatial and Frequency    Domains  31     25    Within more intricate real life images  low frequency components correspond to the slowly  changing intensity within an image such as the shadow of a wall and high frequency energy  mostly corresponds to object edges  which cause rapid changes in intensity  As with tran   sient components of audio signals  analysing an edge in terms of sinusoidal components  requires many sinusoids of frequencies all along the spectrum  It therefore makes sense  that an image with strong edges will produce prominent lines in the magnitude spectrum  that run perpendicular to the edge angle  High frequency components are also present due    to noise in the image     2 3 2 Frequency Domain Image Processing    The 2D Fourier transform has various uses in image processing  Frequency domain filtering  is performed by multiplying a filter magnitude function with the magnitude spectrum of the  image  16   image filters have zero phase  Any filter response can be used but commonly a  low frequency reduction results in image sharpening  a high frequency reduction results in  image smoothing and band pass filtering allows image feature extraction  Ideal brick wall  filters cause undesirable ringing effects  which are more easily understood by thinking in  terms of spatial convolution of t
237. urther across the spectrum image  The prominent rhythmic  frequency of each harmonic follows an arc that is reflected in the diagonal of the spectrum    for all of the four spectra  The 2D spectrum display   s contrast was set at  50  to give a    166    clearer display of the spectral form     2D Fourier Spectrurn 2D Fourier Spectrurn    Frequency  Hz   Frequency  Hz                2  1 5  1  0 5 0 0 5 1 1 5 2 2  1 5  1  0 5 0 0 5 1 1 5 2    Frequency  Hz  x 107 Frequency  Hz  x 107     a  Note  C1  b  Note  F2    2D Fourier Spectrurn 2D Fourier Spectrurn     100    40    Frequency  Hz   Frequency  Hz     100     2  4 5 ai  0 5 0 0 5 1 1 5 2  2 AS af  0 5 0 0 5 1 1 5 2    Frequency  Hz  x 107 Frequency  H2  x 107     c  Note  A3  d  Note  A4    Figure 6 9  Similar 2D Spectral Form of Piano Notes of Varying Pitch    To demonstrate that other musical instruments demonstrate their own pattern of prominent  rhythmic frequency of harmonic content  figure 6 10 shows the 2D Fourier spectra of two    different notes from a B  trumpet and a violin     167    2D Fourier Spectrurn 2D Fourier Spectrurn               150     100    Frequency  Hz   Frequency  Hz     100    150        2  1 5  1  0 5 0 5 1 1 5 2    0 2 AS  i  Frequency  Hz  x10     0 5 0 5 1 15 2    0  Frequency  Hz  x10     a  B  Trumpet Note  G3  b  B  Trumpet Note  G4    2D Fourier Spectrurn 2D Fourier Spectrurn     150     100    Frequency  Hz   Frequency  Hz     100    150    2  1 5 af  0 5 0 5 1 1 5 2    41 5 el  0 5 
238. window  The output is varied according to the display mode for the 2D spectrum  however     for demonstration purposes  only the output for the combined magnitude and phase display    96       is shown in this listing     4 8 3 2D Fourier Spectrum Legend    A custom legend tool has been developed for the 2D Fourier spectrum display  which  appears in a stand alone GUI window  It was decided that although the existing tools  ensured a flexible display  the normalisation and scaling of data abstracted too much from  the actual values  This legend  in figure 4 14  shows a complex unit circle representing  the logarithmic  normalised scaling of data and it has it   s own data cursor object that  displays the magnitude and phase value represented by a particular colour  It aims to put  the magnitude values of the 2D spectrum in context of the data range and demonstrate    the phase angle in terms of the colour wheel      lt Student Version gt    2D Spectrum Legend    x 107 2D Spectrum Legend   Complex Circle    Magnitude  5692 0955  Phase  degrees   112 4127    Imaginary       Real x10  Figure 4 14  2D Fourier Spectrum Legend Window  The legend figure uses a seperate datacursormode object  however the same    UpdateF cn       is used  The legend data cursor shares the code for the 2D spectrum display in listing    97    4 11 however it   s pointer doesn   t have any frequency values  It also displays the selected  colour in the small plot to the right hand side of the unit circle 
239. y was input to the rasterise function to test that it    performed the conversion corrrectly without losing data     eee ere    The raster width was set to 5  so the expected output was     1 2 3 4 5  7 8 9 10  11 12 13 14 15  16 17 18 19 20  21 22 23 24 25    D    The output from rasterise was identical to this expected result  This 2D matrix was then  used to test the derasterise function with a hop of 5  The 1D output of derasterise    was identical to the original 1D array  showing that both functions operate correctly     186    7 1 2 Testing rev fftshift    The rev_fftshift function was tested to ensure that it correctly reassigned 2D spec   trum quadrants to the original order in the FT array  performing the inverse operation to    fftshift for an array of odd dimensions     The test array with odd dimensions     1 2 3 4 5  6 7 8 9 10  11 12 13 14 15  16 17 18 19 20  21 22 23 24 25    Intermediate result after fftshift     19 20 16 17 18  24 25 21 22 23  4 5 1  9 10 6  14 15 11 12 13    Output result for fftshift on intermediate array     7T 8 9 10 6  12 13 14 15 11  17 18 19 20 16  22 23 24 25 21  2 3 4 5 1    This demonstrates the requirement for the rev_fftshift function to correctly reassign  the spectrum quadrants  The code for rev_fftshift is given in listing 7 1  each quadrant  is shifted from the input array to the output using an array operation  Initially mistakes  were made in the code when defining the quadrant dimensions which caused an assignment  dimensi
240. ynchronisation   which sets the frame size according to a note length duration rather than the default  duration in seconds  This may make it easier for individual notes within an audio signal    to be separated     In rhythmic mode  pitch synchronisation is available  with the aim of improving the res     79    olution of the audible frequency analysis  Each row frame of the audio data is analysed  to determine its pitch  and then the frame length is extended until the frame size is an  integer number of periods of the fundamental pitch  hence removing effects of the rectan   gular window  section 4 3 3   This extended frame is then resampled to match the original  frame size and analysis data for the frame is stored to ensure the process can be reversed     This process is based on the pitch synchronous overlap add technique  34      4 5 4 Analysis Options GUI    When the user imports audio data into the software tool they are presented with a GUI  that allows the analysis settings to be defined for that signal  This GUI changes appearance  depending on the options chosen  The required parameters vary according to the mode    and synchronisation settings  as demonstrated in figure 4 11     The function analysis_settings is called from analyse_audio to produce the analysis  settings GUI and its M file contains all of the object callbacks that define its functional   ity  All analysis options are stored in the analysis_settings structure within the data  structure to be used 
241. ythmic    Raster Image Width  One eighth note    Stretch factor  0 4    simple120_eighthwidth_1lpt5 wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One eighth note        Stretch factor  1 5    trumpetG3_0pt4 wav      Source file  trumpetG3 wav      Analysis Mode  Timbral        Stretch factor  0 4    trumpetG3_1pt5 wav      Source file  trumpetG3 wav      Analysis Mode  Timbral      Stretch factor  1 5    Thresh    These audio files demonstrate the results of magnitude thresholding   e simple120_eighthwidth_aud_10above wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One eighth note      Frequency Mode  Audible  Threshold  10     231        Remove  Above Threshold  e simple120_eighthwidth_aud_80below wav        Source file  simple120 wav    Analysis Mode  Rhythmic       Raster Image Width  One eighth note      Frequency Mode  Audible   Threshold  80         Remove  Below Threshold  e simple120_eighthwidth_rhyth_25above wav      Source file  simple120 wav      Analysis Mode  Rhythmic      Raster Image Width  One eighth note      Frequency Mode  Rhythmic      Threshold  25   Remove  Above Threshold    e simple120_eighthwidth_rhyth_25below wav      Source file  simple120 wav        Analysis Mode  Rhythmic    Raster Image Width  One eighth note      Frequency Mode  Audible      Threshold  25       Remove  Below Threshold  e simple120_eighthwidth_single_25above  wav      Source file  simple120 wav 
    
Download Pdf Manuals
 
 
    
Related Search
    
Related Contents
SERIES  2001年6月版 全部  ÿþM icrosoft W ord - 2 0 1 3 0 6 0 1  Axis P1311 Network Camera (No power supplies, No lens)  Cables Direct 2m HDMI, M - M  ECO Sports Voyage Rolling Backpack  PINSafe v3.3 User Manual    Copyright © All rights reserved. 
   Failed to retrieve file