        User Manual
         Contents
DefineAmplicons.m. If the file in which the results are about to be saved already exists, CGH Plotter writes the results after the existing text.

O. Plot button
CGH Plotter plots only the data that are listed in the data listbox and uses the properties that have been specified. CGH Plotter also shows a message box that gives the genomic indices of the amplicon boundaries in the form "name of the sample: indices of the boundaries". The Amplicon Boundaries message box is modal and disappears permanently after the OK button is pushed.

P. Main Page button
The "Main Page" button takes one back to the main page.

A capture of a typical plotting figure is provided in Figure 14, which illustrates the ratios from chromosome 20 across five samples. It is also possible to explore only one of the samples by illustrating it separately, as shown in Figure 15. The amplicon and deletion boundaries of the samples are listed in Figure 18, while Figure 19 illustrates the created ASCII file that reveals the properties of each amplicon and deletion.

[Figure 14 plot: chromosome 20; samples SKBR3, MCF7, BT474, BT20 and MDA361]

Figure 14: Ratios from five samples (chromosome 20) illustrated in one figure. Amplicon boundaries are drawn in the same color as the corresponding sample. Combined amplicon boundaries are colored black. Cumulative base pairs are on the x axis.

[Figure 15 plot: BT474, chromosome 20]
[Figure 22 screenshot, continued: rows of the resulting text file opened in Microsoft Excel]

Figure 22: Example of the resulting txt file.

D. CGH Data, Filtered Data, DP Data and other checkboxes
The user can choose any combination of the active checkboxes in this section. If a checkbox is selected, the corresponding data are printed into the text file. If none of the checkboxes is selected, only the basepair and chromosome information is printed. If one or several of the checkboxes are inactive, the loaded data file does not contain that type of data.

If the Write interpolation info checkbox is selected, 0 or 1 will be printed after every filter
[Figure 19 screenshot: Microsoft Excel sheet for chromosome 20 with one column per sample (SKBR3, MCF7, BT474, BT20, MDA361). For each numbered amplicon or deletion the rows give Type (Amplicon or Deletion), Number of clones, Start, End, Start Basepair, End Basepair, Height and MaxMin.]

Figure 19: The ti
analyzes data; stores results

CGH Plotter main page

BP Convert: adds new basepairs; interpolates NaNs
Plot Data: loads analyzed data; plots data; saves results of analysis in ASCII file
Write TXT: loads analyzed data; saves data to text file according to the user's selections

Figure 3: Main tasks of CGH Plotter blocks.

3.1.2 Create Data Struct

On the page "Create Data Struct" one can create a data struct that consists of the CGH data and the essential indices. It is assumed throughout CGH Plotter that the data contain the fields given in this section. All the data have to be either

1. in Matlab (.mat) format or
2. in tab delimited text (.txt) format.

Examples of the files (both formats) are in the data_structs folder of CGH Plotter.

[Figure 4 screenshot: "Make data struct" window. Its instruction reads: select data and chromosome indices for the struct; cumulative base pairs and names for the samples are optional fields of the struct. Controls: A Data, B Chromosome indices, C Cumulative base pairs, D Names for the samples, E Save as, F Create struct, G Main Page.]

Figure 4: Create Data Struct window.

A. Data button (obligatory)
This button enables loading of the data file. The data file is assumed to be an m x n matrix, where m is the number of genes and n is the number of samples. Furthermore, it is assumed that the genes are arranged according to their genomic order from the p telomere of chromosome 1 to the q telomere of th
enables loading of the data struct made in the phase "Create Data Struct".

[Figure 7 screenshot: Filtering window with A Load, B Save as, C Option (filter parameters: a moving median filter selected, filter window length in bps set to 200000), D Load optional region info, and F Main Page.]

Figure 7: BP Filter window.

B. Save As button
Before starting the analysis, the user has to specify a name for the file in which the analyzed data will be saved. The Save As button opens a Save As dialog where the name and the location of the result file may be selected. It is recommended that result files are stored in the folder ampli_data.

C. Filter options
The "Filter type" selection lets the user decide what kind of filter is used during the filtering process. For now the only options for the filter type are the BP Median filter and the BP Mean filter.

In the "Filter window length (in bps)" field the user can specify the window length of the chosen filter in basepair units. The default is 500000.

D. Optional region info
Optional region info can be loaded by pressing the Load button in this section. With this info the user can ignore gaps such as centromere and telomere regions. Data values outside these regions have no effect on the filtering result.

[Figure 8 screenshot: Microsoft Excel sheet with two columns, Startpoints and Endpoints, one region per row.]
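A minimal Python sketch of filtering with a window that is constant in basepair units, offered only as an illustration and not as CGH Plotter's own Matlab code. It assumes that the window is centered on each clone (half to the left and half to the right, as Section 4.4 describes), that NaN values inside a window are skipped, and that the input covers a single region. The names bp_filter, values, basepairs and window_bp are hypothetical.

```python
import math
from statistics import mean, median


def bp_filter(values, basepairs, window_bp, kind="median"):
    """Filter values with a window of constant length in basepairs.

    values and basepairs are parallel lists for one region; basepairs holds
    the cumulative basepair position of each clone.
    Returns (filtered values, number of non-NaN clones used per window).
    """
    agg = median if kind == "median" else mean
    half = window_bp / 2  # assumption: window is centered on the clone
    filtered, counts = [], []
    for center in basepairs:
        # Collect non-NaN values whose position lies within half a window.
        window = [v for v, bp in zip(values, basepairs)
                  if abs(bp - center) <= half and not math.isnan(v)]
        filtered.append(agg(window) if window else math.nan)
        counts.append(len(window))
    return filtered, counts
```

The second return value corresponds loosely to the "true window length" in number of clones: where clones are sparse, a fixed basepair window contains fewer of them. The quadratic scan keeps the sketch short; a real implementation would exploit the sorted positions.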
filtering utilizing a moving window, the filtered data are w - 1 values shorter than the original data. In order to keep the data sets the same size as the originals, CGH Plotter inserts w - 1 NaNs at the end of each chromosome. The chromosomes are filtered individually because otherwise, for example, values at the end of chromosome one would affect the values of chromosome two [1]. The filtered data are saved, so it is possible to plot the filtered data in the phase "Plot Data".

[Figure 23 flow diagram: The user inputs (data, chromosome indices, basepairs, names) go to Create Struct. Find Amplicons takes the window size and the constant for computing the number of changes; it filters the data (median or average filter), clusters the data to three clusters with k-means, defines the number of changes with the change function and computes the DP levels, producing filtered data, clustered data and the number of changes. BP filtering applies a median or average filter with a constant window size in basepair units, filters only outside known gaps and takes the window size and the known gaps as input. BP Convert converts the data to new basepairs and interpolates missing values, only outside known gaps. The analyzed and interpolated data go to Plot Data and Write TXT.]

Figure 23: Overall view of CGH Plotter. The user inputs CGH data, chromosome indices, basepairs and names of the samples in the "Create Struct" phase. CGH Plotter c
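The moving-window filtering of Section 4.1, including the w - 1 NaNs appended to each chromosome, can be sketched in Python as follows. This is an illustration under stated assumptions rather than CGH Plotter's Matlab code: NaN values inside a window are skipped (the manual does not say how they are treated), chromosome boundaries are given as 0-based start indices followed by the total length, and the function names are hypothetical.

```python
import math
from statistics import mean, median


def moving_filter(values, w, kind="median"):
    """Filter one chromosome with a moving window of w data points.

    Output i is the median/mean of values[i:i + w]; the result is padded
    with NaNs at the end so it has the same length as the input.
    """
    agg = median if kind == "median" else mean
    out = []
    for start in range(len(values) - w + 1):
        window = [v for v in values[start:start + w] if not math.isnan(v)]
        out.append(agg(window) if window else math.nan)
    # w - 1 NaNs (or all NaNs if the chromosome is shorter than w).
    out.extend([math.nan] * (len(values) - len(out)))
    return out


def filter_by_chromosome(values, bounds, w, kind="median"):
    """Filter each chromosome separately.

    bounds: 0-based start index of every chromosome plus len(values) last.
    """
    out = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        out.extend(moving_filter(values[a:b], w, kind))
    return out
```

Filtering each slice on its own is what keeps values at the end of one chromosome from influencing the start of the next.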
given for each of these phases.

3.1 User Interface Pages

3.1.1 Main Page

CGH Plotter is started with the command "CGH_Plotter", provided that the current directory in Matlab is CGH Plotter. The main page opens and CGH Plotter is available for use, as illustrated in Figure 2.

[Figure 2 screenshot: main page with the buttons Create Data Struct, Find Amplicons, BP Filter, BP Convert, Plot Data, Write TXT and Exit.]

Figure 2: Main page of CGH Plotter.

The main page contains seven buttons: "Create Data Struct", "Find Amplicons", "BP Filter", "BP Convert", "Plot Data", "Write TXT" and "Exit". First, the data struct should be constructed on the page "Create Data Struct", if this has not been done already. After the data struct is created and stored, the analysis is executed on the page "Find Amplicons" or on the page "BP Filter". The page "BP Convert" allows the user to add new basepairs to the filtered result and to interpolate data values for them. On the page "Plot Data" the analyzed data may be plotted and the results of the analysis saved in an ASCII file. Finally, on the page "Write TXT" the user can write all data, if needed, into an ASCII file. The button "Exit" ends the session and returns the user to the Matlab workspace. The idea of the blocks in CGH Plotter is illustrated in Figure 3.

[Figure 3 diagram]
Create Struct: creates data struct; saves data struct
Find Amplicons: loads data struct; analyzes data; stores results
BP Filter: loads data struct;
storing the analyzed data and it can be located arbitrarily. The folder ampli_data is initially empty.

The diagram of folders in CGH Plotter is illustrated in Figure 1.

[Figure 1 diagram: CGH Plotter with the folders gui, ampli_math, data_structs and ampli_data.]

Figure 1: Folders of CGH Plotter. The folders gui and ampli_math are subfolders of CGH Plotter; data_structs and ampli_data can be located arbitrarily.

3 Instructions

Basically CGH Plotter functions as follows. First, CGH Plotter filters the data using a median or mean filter with the window size that has been input. Secondly, the filtered data are clustered using the k-means clustering algorithm. The purpose of the k-means clustering is to find the maximum number of amplicons and deletions in each chromosome. This number is required by the last phase, dynamic programming, which actually estimates the amplicons and deletions. CGH Plotter saves the result file, which consists of the original data, the filtered data, the probable amplicons and deletions, the indices of the changes of amplicons and deletions in the CGH data, the names of the samples, the cumulative basepairs and the genomic indices.

To be more precise, CGH Plotter consists of five phases:

1. CGH Plotter creates a data struct of separate data files that the user has specified.
2. CGH Plotter reads the data struct.
3. CGH Plotter analyzes the data struct.
4. CGH Plotter stores the analyzed data.
5. CGH Plotter plots the data.

In this section a more detailed explanation is
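As a rough illustration of the clustering step, the sketch below runs a plain one-dimensional k-means on filtered ratios and then counts how often the cluster label changes along the chromosome. It is written in Python and is not CGH Plotter's implementation: the use of three clusters follows the overview diagram, the deterministic initialization is an assumption, the input is assumed to contain no NaNs, and counting label changes is only one plausible stand-in for the manual's own rule (Section 4.2), which additionally involves a user-set constant. All function names are hypothetical.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Cluster scalar values into k groups; returns (centers, labels)."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    data = sorted(values)
    # Assumption: start from centers spread evenly over the sorted data.
    centers = [data[(len(data) - 1) * i // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members)
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def count_label_changes(labels):
    """Number of positions where the cluster label differs from the previous one."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)
```

With the centers ordered from low to high, the three clusters can be read as deletion, baseline and amplicon levels, and the number of label changes gives an upper estimate of the change points handed to the dynamic programming phase.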
window size should also be quite large (> 5).

D. Constant for computing the number of changes
One may specify the constant that is used when the number of changes is computed. The default constant is six. The procedure for computing the number of changes, along with some guidelines, is given in Section 4.2.

E. Save As button
Before starting the analysis, one has to specify a name for the file in which the analyzed data will be saved. The Save As button opens a Save As dialog where the name and the location of the result file may be selected. It is recommended that result files are stored in the folder ampli_data.

F. Start button
After all required information has been provided, the analysis may be started by selecting the Start button. Analysis of the data takes a few minutes. For example, the analysis of CGH ratios of 11994 genes from 14 samples with an Intel Pentium III 2.4 GHz took approximately 5 minutes. When CGH Plotter is ready, a message box appears notifying that the data set has been successfully analyzed and that the results of the analysis are saved.

G. Main Page button
By pushing the "Main Page" button one can return to the main page.

3.1.4 BP Filter

In the "BP Filter" phase it is possible to filter data with filters that have a constant window length in basepairs. It is also possible to leave certain areas of the data out of the filtering by specifying only the regions that are to be filtered. These regions are given in a separate input file.

A. Load button
This button
[Figure 15 plot: BT474, chromosome 20]

Figure 15: Chromosome 20 of the sample "BT474". CGH Plotter has now plotted each of the selected data sets in a separate figure using the genomic index. The CGH data are drawn as a blue line and the amplicon boundaries as a red line. NaN values of the original data are marked with crosses. Underneath the data is a bar where the amplicons and deletions of the data are marked with red and green bars.

[Figure 16 plot: HCC1428, chromosome 20]

Figure 16: Chromosome 20 of sample HCC1428 plotted against cumulative basepairs. The CGH data are drawn in blue and the filtered data as a green line.

[Figure 17 plot: SKBR3, all chromosomes (1 to 22, X and Y along the x axis)]

Figure 17: All chromosomes of sample SKBR3. The CGH data are drawn in blue and the amplicon boundaries as a red line. CGH Plotter plots dividing lines between the chromosomes. The bar below the data indicates the amplicons and deletions.

[Figure 18 screenshot: Amplicon Boundaries message box listing, for each sample (SKBR3, MCF7, BT474, BT20, MDA361), the genomic indices of its amplicon boundaries, for example "MDA361: 10713 10722 10813".]

Figure 18: Amplicon Boundaries message box.

[Figure 19 screenshot: Microsoft Excel window titled "Boundaries"]
[Figure 8 screenshot, continued: start points and end points of the regions in a Microsoft Excel sheet]

Figure 8: Example of the region info file.

The region info file must be either in Matlab (.mat) or in tab delimited text (.txt) format. If the .mat format is used, the file must contain a variable that is an N x 2 matrix containing the start point and the end point (in cumulative basepair units) of each of the N separate regions. Element (i,1) of the matrix is taken as the start point of the i-th region and element (i,2) as its end point. The name of the variable in the .mat file must equal the file name. If the .txt format is used, the file must contain a matrix similar to the one described above in tab delimited text format. The structure of the region info file is illustrated in Figure 8.

If the user does not load any special region info, the chromosome limits included in the data struct are used as the start points and end points of the regions. This means that each chromosome is filtered separately, but other known gaps in the data are not ignored.

[Figure 9 plot: Matlab figure titled "Used window lengths (in number of clones in each window)"]

Figure 9: Exampl
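To make the tab delimited layout of the region info file concrete, the following Python sketch parses such text into (start, end) pairs and checks whether a cumulative basepair position falls inside any region. It is an illustration only: the function names are hypothetical, treating both end points as inclusive is an assumption, and the sample numbers in the usage note are made up for the example.

```python
def parse_regions(text):
    """Parse tab delimited region info: one 'start<TAB>end' pair per line."""
    regions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        start, end = (float(token) for token in line.split("\t"))
        if end < start:
            raise ValueError(f"end point before start point: {line!r}")
        regions.append((start, end))
    return regions


def in_any_region(bp, regions):
    """True if the cumulative basepair position lies inside some region.

    Assumption: region end points are inclusive.
    """
    return any(start <= bp <= end for start, end in regions)
```

For instance, parsing the two lines "50000<TAB>217200" and "267200<TAB>307562" yields two regions; position 100000 lies inside the first one, while 250000 falls into the gap between them and would be left out of the filtering.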
CGH Plotter

User Manual

Contents

1 Introduction
2 Installation
  2.1 Installation Instructions
3 Instructions
  3.1 User Interface Pages
    3.1.1 Main Page
    3.1.2 Create Data Struct
    3.1.3 Find Amplicons
    3.1.4 BP Filter
    3.1.5 BP Convert
    3.1.6 Plot Data
    3.1.7 Write TXT
4 Methods
  4.1 Filtering
  4.2 k-means Clustering
  4.3 Dynamic Programming
  4.4 Filtering according to basepair units
5 Summary

1 Introduction

Copy number changes, such as deletions and amplifications, are common aberrations in cancer and are known to involve genes that play a crucial role in the development and progression of the malignant disease [5]. Copy number changes usually span large regions of the genome and therefore influence multiple genes at the same time. Comparative genomic hybridization (CGH) on DNA microarrays allows simultaneous monitoring of the copy numbers of thousands of genes throughout the genome [6], [7].

CGH Plotter is versatile software that allows the user to plot CGH copy number data as a function of the position of the genes along the human genome, and to rapidly determine the exact locations of copy number changes, such as amplicons and deletions.

In this user manual we explain in detail:

1. How to install CGH Plotter.
2. How t
[Figure 13 screenshot: Plot Data window. Legible controls include Remove, the Plot options (Index to Gene, Data, Amplicon boundaries, Combined amplicon boundaries, Median of the chromosome, Filtered data instead of value), Cumulative base pair instead of genomic index, Baseline method, the Plot results options (Combine amplicons and deletions, superimpose all data to one figure, each plot to own figure) and Save boundaries.]

Figure 13: Plot Data window.

C. Data type
The CGH data can be plotted either as log transformed values or as ratios. If the data are plotted as log transformed, CGH Plotter adds 1 to the natural logarithm of each value in order to move the baseline to around a ratio of one. In every case the amplicon and deletion boundaries and the filtered data are shown as ratios.

D. Samples
One may choose which CGH data sample to plot. If the last option, "All", is selected, CGH Plotter adds the selected chromosome of each sample to the data listbox.

E. Chromosome
One needs to select either the chromosome to be illustrated or the option "All", in which case the ratios of the sample are plotted genome wide.

F. Add button
After the above mentioned attributes are selected, the Add button takes the details of the data to the listbox on the right. Data must always be exported to the data listbox, because CGH Plotter handles only the data in the listbox.

G. Data listbox
In the data listbox one can see the part or parts of the da
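The log transformed display described under "C. Data type" amounts to a one-line calculation, shown here as a small Python illustration (the function name is hypothetical and this is not CGH Plotter's own code). A ratio of one has a natural logarithm of zero, so adding 1 puts the baseline back at one.

```python
import math


def log_display_value(ratio):
    """Natural logarithm of a CGH ratio, shifted by 1 so the baseline stays near one."""
    return math.log(ratio) + 1.0
```

Under this transform a ratio of 1 maps to exactly 1, ratios above one map to values above one, and ratios below one map to values below one, so amplifications and deletions remain on opposite sides of the baseline.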
[Figure 21 screenshot: window with Save as, C Chromosome, D the checkboxes Write CGH Data, Write Filtered Data, Write DP Data, Write Interpolation info and Write BP Filter window size (in number of clones), E Write and F Main Page.]

Figure 21: Write TXT window.

A. Choose data button
By pressing the Choose data button the user can select the analyzed data or a data structure that will be converted to a text file. If the selected file is not in the correct format, the other options in the window are not enabled. If the file is in the correct format, its name appears on the right side of the Choose data button.

B. Save as button
By pressing the Save as button the user can select the file name for the output text file. If the file name is correctly selected, it appears on the right side of the Save as button.

C. Sample and Chromosome options
The user can choose one or all samples and chromosomes to be printed in the text file.

[Figure 22 screenshot: resulting text file opened in Microsoft Excel, with columns for chromosome, genomic index, basepair and window size, followed by a CGH and a Filt column for each sample.]
[Figure 11 screenshot: Convert window with Save as, New basepairs, Load regions, Chromo limits, a field for interpolating gaps inside regions that are shorter than a given number of BPs, G Start and H Main Page.]

Figure 11: BP Convert window.

A. Load button
By pressing the Load button the user can load a data file that contains filtered data. (The data must have been filtered in either the Find Amplicons or the BP Filter phase.)

B. Save as button
By pressing the Save as button the user can select the file name for the output file. If the file name is correctly selected, it appears on the right side of the Save as button.

C. New basepairs button
By pressing the New basepairs button the user can select new basepair info for the loaded data file (see A). The file containing the new basepairs must be in either Matlab (.mat) or tab delimited text (.txt) format. If the .mat format is used, the file must contain one variable (a column vector) that lists the new basepairs in cumulative basepair units and in ascending order.

[Figure 12 screenshot: Microsoft Excel sheet with a single column of cumulative basepair values in ascending order.]

Figure 12: Example of the file containing basepair info.

The name of the variable must equal the name of the file. For example, basepairtest.mat must contain a variable called basepairtest. If .t
box with the text "Ready" pops up.

G. Main page button
The Main page button returns one to the main page.

The data struct can also be created manually. However, the struct must have the following fields:

- data_struct.data: CGH data, size m x n.
- data_struct.chromo: Indices to chromosomes, size 25 x 1.
- data_struct.basepair: Cumulative base pairs, size m x 1.
- data_struct.samples: Names of the samples, size n x 1.

3.1.3 Find Amplicons

The phase "Find Amplicons" involves several components. The aim of this phase is first to find the amplicons or deletions and then to create a result file for plotting.

A. Load data button
This button enables loading of the data struct made in the phase "Create Data Struct".

B. Selected data text box
When the data have been selected, the name of the data file can be seen in the text box next to the "Load data" button.

C. Filter parameters

- It is possible to specify the type of the filter; the possible options are "Move median" and "Move average". By default CGH Plotter uses the "Move median" filter.

[Figure 6 screenshot: Find Amplicons window]

Figure 6: Find Amplicons window.

- The window size for filtering the data may also be defined. The default window size is five. The window size depends on the amount of noise in the data. When the amount of noise in the data is small, a small window size (e.g. 1-3) is enough. However, if the data are very noisy
e Y chromosome. This order of the genes is referred to as the genomic index. Missing values have to be replaced with NaNs (Not a Number). Finally, the data should not be transformed, e.g. with a log transform, before they are loaded into CGH Plotter. After a data matrix has been selected, its name appears in the text box next to the Data button.

B. Chromosome indices button (obligatory)
As it is essential to know where each chromosome begins, the starting points of the chromosomes need to be specified as indices to the data matrix. Chromosome indices is a 24 x 1 matrix. The first 22 indices are the starting points of chromosomes 1-22, the 23rd is the starting point of chromosome X and the 24th that of chromosome Y. The chromosome indices can also be in .txt or in .mat format. An example of a chromosome indices matrix in .mat format is shown below.

Chromosome indices = 1, 1338, 2121, 2829, 3292, 3901, 4548, 5115, 5480, 5924, 6408, 7047, [illegible], 7941, 8393, [illegible], 9193, 9812, 9994, 10695, 11047, 11198, 11529, 11980

CGH Plotter adds the last index of chromosome Y to the chromosome indices matrix. Therefore the chromosome indices matrix is a 25 x 1 matrix during the analysis.

C. Base pairs button (optional)
It is illustrative to plot the CGH ratios as a function of their actual location along the genome in base pairs. Therefore we have included the possibility to define cumulative base pairs for the data. The Base pairs file can also be in .mat or .txt for
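How the chromosome indices carve the data matrix into chromosomes can be sketched as below. This Python fragment is an illustration, not CGH Plotter code: the function names are hypothetical, indices are 1-based as in Matlab, and it assumes that each chromosome ends on the row just before the next one starts, with the appended 25th entry being the last row of chromosome Y.

```python
CHROMOSOME_NAMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def extend_indices(starts, n_genes):
    """Append the last index of chromosome Y to the 24 start indices."""
    if len(starts) != 24:
        raise ValueError("expected 24 chromosome start indices")
    return list(starts) + [n_genes]


def chromosome_rows(indices, name):
    """1-based inclusive (first, last) row range of one chromosome.

    indices is the 25-element list produced by extend_indices.
    """
    k = CHROMOSOME_NAMES.index(name)
    first = indices[k]
    # The 25th entry is itself the last row of Y, so it is not decremented.
    last = indices[k + 1] if k == 23 else indices[k + 1] - 1
    return first, last
```

With the manual's example, where chromosome 20 starts at index 10695 and chromosome 21 at index 11047, chromosome 20 would occupy rows 10695 to 11046 under these assumptions.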
e of the plot after filtering process.

E. Start button
The user can begin filtering by pressing the Start button. When the filtering process begins or ends, the user is notified by a message box. After the filtering is complete, the true window lengths used for each gene in the data are shown in a single plot (see Figure 9). Information on the true window lengths can be useful when adjusting the window length in basepairs. The filtered data can be plotted on the "Plot Data" page. Such a plot is illustrated in Figure 10.

[Figure 10 plot: all chromosomes (1 to 22, X and Y along the x axis)]

Figure 10: Example of the plotted BP filtered result.

F. Main Page button
The Main Page button takes the user back to the main page.

3.1.5 BP Convert

In the "BP Convert" phase it is possible to manipulate the basepairs in a filtered data file. The basepair info in the file can be replaced with a new basepair structure given as a separate input. If the new basepairs contain positions that are not present in the original data file, the CGH and filtered data values for those positions will be NaNs. These (and all other) NaNs can be given an interpolated data value if an interpolation window is specified.

The interpolation process can be adjusted by giving, as a separate input, those regions (in cumulative basepair units) where interpolation is needed. A maximum length for the interpolation window can also be specified.

[Screenshot: Convert window]
ed data value in the output file. 1 indicates that the data value is interpolated, whereas 0 indicates that the data value is not interpolated.

If the Write BP filter window size checkbox is selected, the true window sizes that were used in filtering are printed for each clone index.

E. Write button
Writing to the text file begins when the Write button is pressed. When the writing process begins or ends, the user is notified by a message box. The resulting text file is illustrated in Figure 22.

F. Main Page button
The Main Page button takes the user back to the main page.

4 Methods

In this section we describe the methods used in CGH Plotter in greater detail. The overall view of CGH Plotter is given in Figure 23.

4.1 Filtering

Before the k-means clustering is applied, the CGH ratios in each chromosome are filtered with a moving median or moving average filter. The user may input the type of the filter (i.e. mean or median) and the size of its window. Suggested window sizes are between three and nine.

The filtering proceeds as follows. First CGH Plotter computes the median or average of the first w values, where w is the size of the window. For example, if w is five, the first value in the filtered data is the median or average of the first five CGH data points. Then CGH Plotter again takes w values, beginning from the second data point, and computes the median or average, depending on the user's choice. The filtering stops when the last data point is reached. Therefore, in standard
er, one should note that if the data are very noisy, the user should try a smaller constant in order not to detect noise instead of amplicons and deletions. There are surely many other ways to determine the number of changes; a user who prefers another one can modify the way the number of changes is determined in the file Compute_kmean.m.

4.3 Dynamic Programming

In this section dynamic programming is briefly explained. A more detailed presentation of dynamic programming can be found, for instance, in [4].

In CGH Plotter it is assumed that copy number ratios can be approximated with a constant and an error term. As a consequence, CGH data can be understood as a signal having constant levels. In essence, there exist three kinds of constant levels: baseline, amplicon and deletion levels, and these are to be identified by the dynamic programming algorithm. It is assumed that the number of changes of constant levels, c, is known. We use k-means for this purpose, as explained in the previous section.

Assume that the CGH signal

$$
s(n) =
\begin{cases}
A_1, & n = 1, 2, \ldots, n_1 \\
A_2, & n = n_1 + 1, n_1 + 2, \ldots, n_2 \\
\;\vdots \\
A_{c+1}, & n = n_c + 1, n_c + 2, \ldots, N
\end{cases}
$$

is corrupted by noise, and let $x(n)$ denote the observed data. Dynamic programming identifies the constant levels $A = [A_1\ A_2\ A_3\ \ldots\ A_{c+1}]$ and the change points $n = [n_0\ n_1\ n_2\ n_3\ \ldots\ n_c\ n_{c+1}]$, where $n_0 = 1$ and $n_{c+1} = N$, by minimizing the function

$$
J(A, n) = \sum_{i=1}^{c+1} \ \sum_{n \,\in\, \text{segment } i} \left( x(n) - A_i \right)^2 .
$$

The idea of dynamic programming is to find the shortest path from the value $x(1)$ to the value $x(N)$. Dynamic programming utilizes the Markov property, which ensures that the distance between two points on the path does not depend on which path was used to arrive at the first of them. Therefore, dynamic programming is capable of finding the minimum of $J(A, n)$ without checking every possible combination of $n_1, n_2, \ldots, n_c$.

In practice, the procedure for identifying the constant levels proceeds as follows. First, the constant levels are estimated: $A_i$ is the mean of the interval $[n_{i-1}+1, n_i]$, and

$$
\Delta_i[n_{i-1}+1, n_i] = \sum_{n = n_{i-1}+1}^{n_i} \left( x(n) - A_i \right)^2 .
$$

Second, the function $J(A, n)$ is minimized over $n$ using dynamic programming:

$$
\begin{aligned}
I_k(L) &= \min_{n_1, \ldots, n_{k-1}} \sum_{i=1}^{k} \Delta_i[n_{i-1}+1, n_i] \\
&= \min_{n_{k-1}} \left( \min_{n_1, \ldots, n_{k-2}} \sum_{i=1}^{k-1} \Delta_i[n_{i-1}+1, n_i] + \Delta_k[n_{k-1}+1, n_k] \right) \\
&= \min_{n_{k-1}} \left( I_{k-1}(n_{k-1}) + \Delta_k[n_{k-1}+1, L] \right),
\end{aligned}
$$

where $n_k = L$. This shows that the minimum error for the interval $[1, L]$ can be computed by adding the error of the last segment to the minimum error of the previous segments and minimizing over the position of the last change point.

CGH Plotter stores the constant levels $A$ and the indices of the change points of these levels.

4.4 Filtering according to basepair units

Filtering according to basepair units is almost the same thing as "normal" filtering according to clone indices. The main difference is that the BP filter window size is constant in basepair units, while the normal filtering window size is constant in the number of clones. The filter window is chosen so that half of it is taken from the left side and the other half from the right side of the clone.

In real data there are always locations where adjacent genes sho
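The recursion of Section 4.3 can be written out as a short Python sketch. It is a generic least-squares segmentation by dynamic programming under stated assumptions, not CGH Plotter's Matlab implementation: the input contains no NaNs, the number of changes c is given, segment i covers the 1-based points n_{i-1}+1 to n_i with n_0 taken as 0, and the name segment is hypothetical.

```python
def segment(x, c):
    """Split x into c + 1 constant-level segments with least squared error.

    Returns (levels, change_points, total_error); change_points holds the
    1-based index of the last point of every segment except the final one.
    """
    n, k_max = len(x), c + 1
    if not 1 <= k_max <= n:
        raise ValueError("need 0 <= c < len(x)")
    # Prefix sums give the squared error of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(a, b):
        """Squared error of x[a:b] around its own mean (the Delta term)."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # best[k][L] is I_k(L): least error of the first L points in k segments.
    best = [[inf] * (n + 1) for _ in range(k_max + 1)]
    prev = [[0] * (n + 1) for _ in range(k_max + 1)]
    best[0][0] = 0.0
    for k in range(1, k_max + 1):
        for end in range(k, n + 1):
            for start in range(k - 1, end):
                cand = best[k - 1][start] + cost(start, end)
                if cand < best[k][end]:
                    best[k][end], prev[k][end] = cand, start
    # Walk back through the stored split positions.
    bounds = [n]
    for k in range(k_max, 0, -1):
        bounds.append(prev[k][bounds[-1]])
    bounds.reverse()
    levels = [(s1[b] - s1[a]) / (b - a) for a, b in zip(bounds, bounds[1:])]
    return levels, bounds[1:-1], best[k_max][n]
```

The triple loop makes the cost grow with the square of the chromosome length times the number of segments, which is acceptable for a sketch; the table best is what spares the algorithm from testing every combination of change points.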
kanen, M., Chen, Y., Bittner, M., Kallioniemi, A. (2001). Comprehensive copy number and gene expression profiling of the 17q23 amplicon in human breast cancer. Proceedings of the National Academy of Sciences USA, Vol. 98, pp. 5711-5716.

[7] Pollack, J., Perou, C., Alizadeh, A., Eisen, M., Pergamenschikov, A., Williams, C., Jeffrey, S., Botstein, D., Brown, P. (1999). Genome-wide analysis of DNA copy number changes using cDNA microarrays. Nature Genetics, Vol. 23, pp. 41-46.
23. mat. The base pairs file is an m x 1 vector, where m is the number of genes. If base pairs are not specified, CGH Plotter uses only the order of the genes along the genome, i.e. the genomic indices.

D. "Names of the samples" button (optional)
The names of the samples can be specified. If the names are given in .mat format, they should be given as an n x 1 string vector, where n is the number of samples. Names cannot include space characters or special characters that Matlab interprets as mathematical symbols. For example, if the number of samples is three, the cell array can be created and saved in Matlab as follows:

>> Names = {'BT474'; 'MCF7'; 'ZR7530'};
>> save Names Names

If names for the samples are not defined, CGH Plotter refers to the first sample as "sample1", the second sample as "sample2", etc.

If the names are defined in a .txt file, they must be given in one row, each in its own column, as shown in Figure 5.

Figure 5: Names of the samples in a .txt file.

E. "Save as" button
One must give a name for the data struct and select the folder where it will be saved. The folder data_structs is meant for this purpose, but it is not obligatory to save data structs there.

F. "Create" button
CGH Plotter creates a data struct. When the struct is created a message
24. n be stored in a tab-delimited text file, in which the results can easily be examined.

The freely available CGH Plotter is easy to operate. It is also easy to modify and to extend with new functions. The CGH Plotter toolbox is under continuous development, and in the future it will include new analysis and illustration functions.

CGH Plotter has been shown to be capable of rapid high-throughput analysis of CGH data. Moreover, the results obtained from CGH Plotter are consistent with chromosomal CGH, and thereby the results given by CGH Plotter are verified by biological knowledge.

References

[1] Astola, J., Kuosmanen, P. (1997). Fundamentals of Nonlinear Digital Filtering. CRC Press LLC, Florida.

[2] Duda, R.O., Hart, P.E., Stork, D.G. (2001). Pattern Classification. John Wiley & Sons, Inc., New York, 2nd edition.

[3] Hyman, E., Kauraniemi, P., Hautaniemi, S., Wolf, M., Mousses, S., Rozenblum, E., Ringnér, M., Sauter, G., Monni, O., Elkahloun, A., Kallioniemi, O.P. and Kallioniemi, A. (2002). Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Research, Vol. 62, pp. 6240-6245.

[4] Kay, S.M. (1998). Fundamentals of Statistical Signal Processing, Volume II, Detection Theory. Prentice Hall, New Jersey.

[5] Gray, J.W., Collins, C. (2000). Genome changes and gene expression in human solid tumors. Carcinogenesis, Vol. 21, pp. 443-452.

[6] Monni, O., Barlund, M., Mousses, S., Kononen, J., Sauter, G., Heis
25. o use CGH Plotter
3. How to store and analyze the results
4. What are the assumptions behind the analysis

We also provide several examples on the use of CGH Plotter.

2 Installation

CGH Plotter requires Matlab 6.1 or higher in order to operate. Accordingly, all data must be in Matlab (.mat) format or in tab-delimited text (.txt) format.

2.1 Installation Instructions

Archive "CGH Plotter.zip" consists of five folders: CGH Plotter, gui, ampli_math, data_structs and ampli_data.

• Main folder CGH Plotter contains the following folders and files: "gui", "ampli_math", "CGH_Plotter.m" and "CGH_Plotter.fig".

• Folder gui (Graphical User Interface) includes functions and corresponding figures: "create_struct.m", "create_struct.fig", "amplikoni.m", "amplikoni.fig", "bp_filter.m", "bp_filter.fig", "bp_convert.m", "bp_convert.fig", "plot_data.m", "plot_data.fig", "write_txt.m", "write_txt.fig", "end_all.m", "end_all.fig".

• Folder ampli_math includes all mathematical functions used in CGH Plotter: "bp_median.m", "combined.m", "compute_kmean.m", "cumulative.m", "define_amplicons.m", "dynamic_prog.m", "filter_data.m", "find_regions.m", "handle_NaNs.m", "kmean.m", "transform_data.m", "writeresults.m".

• Folder data_structs can be located anywhere. It is meant for storing data structs of CGH data and is initially empty.

• Folder ampli_data is intended for
26. or. Samples are illustrated with points, filtered data and amplicon boundaries with lines. Combined amplicon boundaries are seen as a thick black line (Figure 14).

If one selects "each plot to own figure", CGH Plotter will illustrate every sample individually (Figures 15 and 17). CGH Plotter plots CGH data with a blue line and amplicon boundaries with red. If "Filtered data" is selected, CGH Plotter will plot the filtered data of the sample with a green line, and if "Combined amplicon boundaries" is selected, CGH Plotter will plot the combined boundaries with a black line.

K. Index to a gene
One may select whether to show cumulative base pairs on the x axis instead of genomic indices.

L. Baseline
One may select whether CGH Plotter uses the median of each chromosome as the baseline of the chromosome. By default the baseline is the value 1.

M. One may select to define adjoining amplicons (or deletions) as one amplicon (deletion) in the resulting boundary file.

N. "Save Boundaries" button
This button allows one to specify a name for the boundary file and select the folder where it will be saved. CGH Plotter creates a tab-separated ASCII file as illustrated in Figure 19. If the name is not specified, the results are not saved. By default CGH Plotter will save the amplicons with height over 1.2 and deletions with height smaller than 0.95. If needed, it is straightforward to change these limits in the beginning of the function
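The default limits quoted above (amplicons higher than 1.2, deletions lower than 0.95) and the option of treating adjoining amplicons or deletions as one can be illustrated with a few lines of Matlab. This is a hedged sketch, not the toolbox code: the vector levels and the names ampLimit, delLimit, types and merged are hypothetical, and the merging rule is only one plausible reading of option M.

```matlab
% Sketch only: label constant levels with the default limits from the manual.
levels   = [1.0 1.5 1.3 0.8 1.1 0.95];   % hypothetical segment heights
ampLimit = 1.2;                          % amplicon if height is over this
delLimit = 0.95;                         % deletion if height is smaller than this

types = zeros(size(levels));             % 0 = neither
types(levels > ampLimit) = 1;            % 1 = amplicon
types(levels < delLimit) = -1;           % -1 = deletion

% Treat adjoining segments of the same type as one entry.
starts = [true, diff(types) ~= 0];       % first segment of every run
merged = types(starts);
disp(types)
disp(merged)
```

Here the two adjoining amplified segments (1.5 and 1.3) collapse into a single amplicon entry, while the segment at exactly 0.95 is not counted as a deletion because the comparison is strict.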
27. ramming. One may choose the properties of the created data set to be illustrated. It is possible to plot the CGH data as ratios or log-transformed ratios, and to plot amplicon boundaries from an individual sample or combined amplicon boundaries from a group of samples.

One may plot the CGH data, filtered data, or amplicon boundaries either from one chromosome or across all chromosomes. It is also possible to plot results from several samples at the same time. Thus one may choose whether the results are illustrated in one figure or in multiple figures. By default CGH Plotter uses genomic indices to plot the data, but one may also select to use cumulative basepairs.

A. "Choose data" button
By pushing the "Choose data" button one can select the data to be illustrated. The result file has to be constructed in the "Find Amplicons" phase and consists of seven fields:

• "data": CGH data.
• "datafilt": Filtered data.
• "dp": Amplicon boundaries computed with dynamic programming.
• "tu": Indices to the changes of amplicon boundaries.
• "chromo": Indices to chromosome starts.
• "basepair": Cumulative base pairs.
• "samples": Names of the samples.

Only one data set can be illustrated at a time, but it is possible to observe several properties of the data simultaneously.

B. "Selected data" text box
The name of the selected data file is seen in the text box "Selected data".
28. reates a struct that is used in the phase "Find Amplicons". Further, the user defines the type of the filter and the size of the window, which are used in the filtering phase. CGH Plotter clusters the filtered data into three clusters with the k-means clustering algorithm. The clustered data are delivered to the function that computes the maximum number of change points. The number of changes is needed when the dynamic programming algorithm computes the amplicons and deletions. In the "Plot Data" and "Write TXT" phases the user may plot the results of the analysis and save the results in an ASCII file.

4.2 k-means Clustering

The k-means clustering algorithm is used for finding the number of amplicons and deletions for each chromosome. The idea behind k-means clustering is to cluster the data into k clusters (k is assumed known). Here the number of clusters is three, denoting amplified genes, deleted genes and baseline genes.

In the k-means clustering, the means μ1, μ2, μ3 are first initialized to be the 5th biggest, the median and the 5th smallest values, respectively. The actual k-means clustering proceeds as follows. First, a ratio from the sample is drawn and the nearest mean μ_winner is found using Euclidean distance. Second, μ_winner is updated by moving it closer to the ratio. This procedure is repeated until all m ratios are used. Pseudo code for the training phase [2]:

begin initialize μ1, μ2, μ3
do classify m ratio
29. s to nearest μ_i
update μ_winner
until the last m
end
return μ1, μ2, μ3

After the training phase every ratio is classified to the nearest cluster. The clusters are presented as -1, 0 and 1, denoting deleted, baseline and amplified genes. The number of changes is determined as follows. CGH Plotter computes x_max, which denotes the mean value of 2% of the highest values in the cluster "amplified":

$$x_{\max} = \operatorname{mean}\bigl(\max\nolimits_{2\%}(\text{cluster} = 1)\bigr)$$

In a similar fashion x_min denotes the mean value of 2% of the smallest values in the cluster "deleted":

$$x_{\min} = \operatorname{mean}\bigl(\min\nolimits_{2\%}(\text{cluster} = -1)\bigr)$$

We have chosen 2% of the highest and smallest values since the data we used were not very noisy. However, this parameter can be changed in the function "compute_kmean.m".

The distance between x_max and x_min is computed and multiplied by the constant that the user has determined. The number of changes, c, is the result of the multiplication rounded downwards:

$$c = \lfloor \text{constant} \cdot (x_{\max} - x_{\min}) \rfloor$$

The default constant is six. This number was determined empirically by adjusting it so that known amplicons are found from chromosome 17. The result was then validated by comparing the results to other chromosomes containing known amplicons and by chromosomal CGH (illustrated in Figure 20). In other data sets there may be a need to change this number. If there are known amplicons, we suggest assessing the number of changes in a similar way. Howev
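The k-means training, the classification into -1, 0 and 1, and the number of changes c described above can be put together in a short Matlab sketch. It is an illustration under stated assumptions and not the code in kmean.m or compute_kmean.m: the example ratios are made up, the update step size rate is an assumption (the manual only says the winner is moved closer to the ratio), and at least one value is always taken when 2% of a cluster is less than one element.

```matlab
% Sketch only: three-cluster k-means pass and the number of changes c.
ratios = [0.5 0.55 0.6 0.95 1.0 1.0 1.05 1.0 ...
          0.98 1.02 1.9 2.0 2.1 2.2 1.0 1.01];   % hypothetical ratios
m        = numel(ratios);
rate     = 0.1;      % assumed step size of the update
constant = 6;        % default constant from the manual
frac     = 0.02;     % 2% of the extreme values

% Initialize the means: 5th biggest, median, 5th smallest value.
sorted = sort(ratios, 'descend');
mu = [sorted(5), median(ratios), sorted(end-4)];

% Training: move the nearest mean towards each ratio in turn.
for i = 1:m
    [~, w] = min(abs(ratios(i) - mu));
    mu(w) = mu(w) + rate * (ratios(i) - mu(w));
end

% Classification: 1 = amplified, 0 = baseline, -1 = deleted.
codes  = [1 0 -1];
labels = zeros(1, m);
for i = 1:m
    [~, w] = min(abs(ratios(i) - mu));
    labels(i) = codes(w);
end

% Mean of the top 2% of "amplified" and the bottom 2% of "deleted".
amp  = sort(ratios(labels == 1), 'descend');
del  = sort(ratios(labels == -1), 'ascend');
xmax = mean(amp(1:max(1, round(frac * numel(amp)))));
xmin = mean(del(1:max(1, round(frac * numel(del)))));

c = floor(constant * (xmax - xmin));   % number of changes, rounded downwards
disp(c)
```

The sketch assumes that both the amplified and the deleted cluster are non-empty; a chromosome without any amplified or deleted ratios would need a separate guard before the means are taken.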
30. ta that CGH Plotter is about to plot. Parts of the data are written in the form "Data name, Data type, Sample, Chromosome". It is possible to select several parts of the data, but the number of genes must be the same for every part.

H. "Remove" button
The "Remove" button removes the selected data from the data list box. First one has to select the data to be removed.

I. Plot
One can select the properties to be plotted.

• If "Data" is selected, CGH Plotter plots the original CGH data.

• If "Amplicon boundaries" is selected, CGH Plotter will plot the amplicon/deletion boundaries that are computed by the dynamic programming algorithm.

• If "Combined amplicon boundaries" is selected, CGH Plotter will plot combined amplicon boundaries from the selected samples.

• The method for computing the combined amplicon boundaries can be selected. Possible choices are average, median, maximum, and minimum. By default CGH Plotter uses average.

• If "Filtered Data" is selected, CGH Plotter will plot the filtered data that are computed by the filtering algorithm. The window size and the type of the filter were determined in the phase "Find Amplicons".

J. Show results
One can select how CGH Plotter presents the data.

If "superimpose all data to one figure" is selected, CGH Plotter will plot all selected data to the same figure. Each sample, filtered data of the sample and amplicon boundaries of the sample have the same col
31. tle of the first column tells which chromosome is in question. Names of the samples are titles of the other columns. The file presents the type, start, end and height of the amplicon or deletion. It also gives the maximum ratio value of the amplicon and the minimum ratio value of the deletion.

[Figure 20 image: "Copy Number Alterations in the BT474 Breast Cancer Cell Line Genome". Panels: CGH Plotter original data, CGH Plotter amplicon/deletion boundaries, chromosomal CGH. x axis: cumulative genomic base pair location.]

Figure 20: Chromosomal CGH and output of CGH Plotter for breast cancer cell line "BT474". CGH Plotter original data is shown on top, amplicon/deletion boundaries in the middle and chromosomal CGH data on the bottom. CGH Plotter can clearly identify amplicons and deletions detected by chromosomal CGH and, as expected due to the higher resolution of array CGH, also reveals additional aberrations.

In order to compare the performance of CGH Plotter we have illustrated both the chromosomal CGH and the output of CGH Plotter in Figure 20.

3.1.7 Write TXT

In the "Write TXT" phase the user can write analyzed data (or simply the contents of a data struct) to a text file for further analysis. Data will be printed in tab-delimited text format.
32. uld not interact with each other during the filtering process. Such locations are, for example, chromosome borders and other known gaps. There might also be other reasons to limit interaction between genes during filtering. Therefore, it is possible to give information on the regions that should be treated separately during the filtering process. For every given region, the filter window slides over the whole region. At the beginning and at the end of the region the window size is only half of the given window size (since one half of the window is outside of the region). Also, if the given filter window size is very big, the result of the filtering process is constant for each region. If no special information on regions is given to the BP filter, CGH Plotter considers each chromosome as a separate region.

5 Summary

CGH Plotter is a Matlab toolbox for CGH data analysis. The main purpose of CGH Plotter is to identify and visualize the amplicon and deletion regions of CGH data. With a graphical user interface CGH Plotter is straightforward to use. The user has many possibilities to illustrate the CGH data. For example, the data can be illustrated as ratios or log-transformed ratios and plotted against basepairs (if available). CGH Plotter enables the user to visualize each sample individually or all samples in parallel. It is also possible to plot the data of one chromosome or the data of the sample genome-wide. The results ca
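The basepair filtering described in section 4.4 above, where the window is constant in basepair units and is cut off at the borders of a region, can be sketched as follows. This is a minimal illustration and not the code in bp_median.m: the positions bp, the ratios, the window size win and the choice of a median filter are assumptions made for the example, and all clones are taken to belong to one region.

```matlab
% Sketch only: median filter with a window defined in base pairs.
bp     = [1000 2000 2500 9000 9500 10000 20000];   % hypothetical clone positions
ratios = [1.0  1.2  5.0  1.1  0.9  1.0   2.0];     % hypothetical ratios
win    = 4000;                                     % window size in base pairs

filtered = zeros(size(ratios));
for i = 1:numel(bp)
    % Clones within half a window to the left and to the right of clone i.
    % Near the borders of the region one half of the window is simply empty.
    inWin = abs(bp - bp(i)) <= win / 2;
    filtered(i) = median(ratios(inWin));
end
disp(filtered)
```

Unlike a filter that is constant in the number of clones, the number of values inside the window varies here: the isolated clone at position 20000 is left unchanged, while the outlier 5.0 is smoothed by its two close neighbours.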
33. xt format is used, the file must contain a column vector (similar to the one described above) in tab-delimited text format. The structure of the region info file is illustrated in Figure 12.

D. "Regions" button
By pressing the "Regions" button the user can select region info that controls the interpolation process. Interpolation will be executed only in these given regions. The format of this region info file is described in the previous section.

E. "Chromo limits" button
By pressing the "Chromo limits" button the user can select general chromosome limits (in basepair units) that will be used during basepair conversion. The file containing the region info must be in Matlab .mat format and it must contain a variable called chromo or a variable whose name is equal to the filename. The variable must be a 25 x 1 vector containing start points for every chromosome and an end point for the last chromosome (in the 25th element).

F. Interpolate gaps shorter than
In this field the user can specify the longest gap (in basepair units) that will be interpolated. If there are fewer than two clones in that area, interpolation will not be done. The default value for this field is 150000 BPs.

G. "Start" button
By pressing the "Start" button the user can start the conversion process. The status of the process will be shown in a message box.

H. "Main page" button
The "Main Page" button takes the user back to the main page.

3.1.6 Plot Data

In the "Plot Data" phase it is possible to compare the data and results from dynamic prog
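A chromosome limits file of the form described under "Chromo limits" above (a 25 x 1 vector named chromo with 24 start points and the end point of the last chromosome) could be prepared along these lines. The chromosome lengths are made-up placeholders, and starting the first chromosome at zero is an assumption; the manual does not state the convention.

```matlab
% Sketch only: build and save a 25 x 1 chromosome limits vector.
% The lengths below are placeholders, not real chromosome sizes.
lengths = 1e6 * (24:-1:1)';          % hypothetical lengths of 24 chromosomes

% Start point of every chromosome, followed by the end of the last one.
chromo = [0; cumsum(lengths)];

% The variable must be called chromo (or match the file name).
save('chromo.mat', 'chromo');
disp(size(chromo))
```

Naming both the variable and the file chromo satisfies either of the two naming rules given in the manual.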
    