
TIGR MultiExperiment Viewer (MeV)

1. [Screenshot: MeV analysis results for COA (correspondence analysis), dated Dec 23 2004. It shows a 3D view of the projections on the COA axes (components 1, 2, 3), the 3D and 2D views for genes, experiments, or both, the inertia values, and the right-click menu for storing or saving gene and sample clusters and for launching a new session with the selected genes or samples.]
2. [Screenshot: CGH clone values table listing Clone Name, Chromosome, Start, and Stop for each clone, followed by one value column per sample (BxPC 3, DanG, HPAC, Hs766T, HUP T4, PA 8902, Panc1, and others, each versus a reference).]
3. [Screenshot: FOM analysis output, dated Apr 9 2004, showing the graph of mean adjusted FOM values (± SD) versus number of clusters, with a Show Iteration Values option.]
Figure 11.12.1: FOM vs. No. of Clusters graph for the KMC algorithm (1-20 clusters, 20 iterations)
[Screenshot: FOM initialization dialog. Sample Selection: Gene Cluster FOM or Sample Cluster FOM. Iteration Selection: Number of FOM Iterations (1). Algorithm: K-Means/K-Medians or CAST, with Calculate means or Calculate medians. Maximum number of clusters (enter an integer > 0): 20. Maximum number of iterations (enter an integer > 0): 50. Note: "K-Means/K-Medians will be run using a starting K…"]
4. Sample_growth_protocol_ch1: optional
Sample_molecule_ch1: required; VALUE: total RNA | polyA RNA | cytoplasmic RNA | nuclear RNA | genomic DNA | protein | other
Sample_extract_protocol_ch1: optional
Sample_label_ch1: required
Sample_label_protocol_ch1: optional
Sample_source_name_ch2: required
Sample_organism_ch2: required
Sample_characteristics_ch2: required
Sample_biomaterial_provider_ch2: optional
Sample_treatment_protocol_ch2: optional
Sample_growth_protocol_ch2: optional
Sample_molecule_ch2: required; VALUE: total RNA | polyA RNA | cytoplasmic RNA | nuclear RNA | genomic DNA | protein | other
Sample_extract_protocol_ch2: optional
Sample_label_ch2: required
Sample_label_protocol_ch2: optional
Sample_hyb_protocol: optional
Sample_scan_protocol: optional
Sample_description: required
Sample_data_processing: required
Sample_platform_id: required (provide the accession of an existing GEO Platform for the array used, e.g. GPL1001)
ID_REF: required (this column should correspond to the ID column of the reference platform)
VALUE: required (normalized log2 ratios are typically provided in this column)
HEADER_3: optional
HEADER_4: optional
HEADER_N: optional (any number of user-defined columns can be included, and it is recommended that data tables be as comprehensive as possible, excep…
5. [Screenshot: Expression File Loader for GEO SOFT two-channel files, showing the file browser, the list of available files, the expression table preview with its reordering instructions ("If your file is in different order…", "Then click the upper leftmost expression value. Click the Load button to finish."), and the optional platform-file selector with Load/Unload options.]
6. Figure 11.9.1: QT CLUST Expression Graphs
[Screenshot: QT Cluster initialization dialog. Sample Selection: Cluster Genes or Cluster Samples. Parameters: Maximum Cluster Diameter (0.5), Minimum Cluster Population (5), Use Absolute R. Hierarchical Clustering: Construct Hierarchical Trees.]
Figure 11.9.2: QTC Initialization Dialog
Parameters
Sample Selection: The sample selection option indicates whether to cluster genes or samples.
Maximum Cluster Diameter: Cluster diameter is related to the overall variability between the member elements within a cluster. Maximum Cluster Diameter is a constraint on that variability, such that every cluster formed must have a diameter smaller than the entered maximum. Increasing the maximum diameter tends to produce larger, more variable clusters, and decreasing it tends to produce smaller, less variable clusters.
Minimum Cluster Population: The minimum number of elements required to form a cluster. For instance, a Minimum Cluster Population of 10 ensures that every cluster formed has at least 10 members.
Use Absolute R: Using this option groups expression patterns that are either positively or negatively correlated.
Hierarchical Clustering: This check box selects whether to perform hierarchical clustering on the elements in each cluster created. Defaul…
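The way Maximum Cluster Diameter, Minimum Cluster Population, and Use Absolute R interact can be illustrated with a minimal sketch of quality-threshold (QT) clustering. It assumes that a cluster's diameter is the largest pairwise distance between its members, with distance defined as 1 - r (or 1 - |r| when Use Absolute R is on); MeV's exact diameter definition and distance metric may differ. The function names are hypothetical and this is not MeV's implementation.

```python
import math
from typing import List, Sequence, Tuple


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0


def distance(u, v, use_abs_r: bool = False) -> float:
    """Assumed distance: 1 - r, or 1 - |r| when Use Absolute R is on."""
    r = pearson(u, v)
    return 1.0 - (abs(r) if use_abs_r else r)


def _grown_diameter(data, candidate, diameter, j, use_abs_r) -> float:
    """Diameter the candidate cluster would have if element j were added."""
    return max(diameter, max(distance(data[j], data[k], use_abs_r) for k in candidate))


def qt_cluster(data, max_diameter: float, min_population: int,
               use_abs_r: bool = False) -> Tuple[List[List[int]], List[int]]:
    """Return (clusters, unassigned) as lists of row indices."""
    remaining = list(range(len(data)))
    clusters: List[List[int]] = []
    while len(remaining) >= min_population:
        best: List[int] = []
        # Grow one candidate cluster from every remaining element as a seed.
        for seed in remaining:
            candidate, diameter = [seed], 0.0
            others = [i for i in remaining if i != seed]
            while others:
                j = min(others, key=lambda i: _grown_diameter(data, candidate, diameter, i, use_abs_r))
                new_diameter = _grown_diameter(data, candidate, diameter, j, use_abs_r)
                if new_diameter >= max_diameter:  # diameter must stay below the maximum
                    break
                candidate.append(j)
                others.remove(j)
                diameter = new_diameter
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_population:
            break  # no remaining candidate is large enough to form a cluster
        clusters.append(sorted(best))
        remaining = [i for i in remaining if i not in best]
    return clusters, remaining
```

With profiles such as [1, 2, 3, 4] and [4, 3, 2, 1], the sketch puts them in separate clusters by default but in the same cluster when use_abs_r=True, which mirrors the description of Use Absolute R.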
7. [Screenshot: Expression File Loader for SOFT-format Affymetrix files, showing the file browser, the list of available files, the expression table preview, and the platform-file selector with Load/Unload options. The instructions under the table read: "The first column should be ChipID. The second column should be normalized signal intensity. The third column should be detection call. If your file is in different order, you can reorder them as required. Then click the upper leftmost expression value. Click the Load button to finish."]
8. [Screenshot: MeV main window with the navigation tree showing the Main View, Cluster Manager, Analysis Results, Script Manager, and History nodes.]
Figure 4.13.1: Main View in MeV (borders drawn on image)
4.14 Result Navigation Tree
The left side of the main interface is a navigation tree. At any time, clicking on the Main View node returns to the main display. Output from any MeV module is added as a new subtree under the Analysis node. In general, clicking on a node in the tree navigates to the associated result or data view, which is then displayed in the right panel. The initial tree also includes a Cluster Manager node to manage clusters stored from analysis results. The Script Manager manages activities related to analysis script creation, loading, modification, and execution. The Cluster Manager and the Script Manager are covered in detail in sections 7.2 and 10, respectively.
4.15 The History Node and Log
The History node contains a log of most activities. For each log entry, a date and time is recorded. Major events such as file loading, analysis loading, script loading, algorithm execution, and cluster storage are logged to the History log. If the analysis is stored to a file, the history is retained and restored, so that new events can be logged. The history log can be stored to a text file by right-clicking in the viewer and selecting the Save History to File menu option.
9. 15.11 SAM group or class loading file format
15.12 TTEST group or class loading file format
15.13 ANOVA group or class loading file format
16 Appendix: Preferences Files
17 Appendix: Distance Metrics
18 Appendix: MeV Script DTD
19 Appendix: R Package Installation
Installing under OS X (Precompiled Binary Version)
Updating under Windows
Installing under OS [...]
Installing under Windows
Running under OS X
Running under Windows
1 General Information
1.1 Obtaining MeV: http://mev.tm4.org
1.2 Maintainer
1.3 Contact Information: TIGR MeV Team, mev@tigr.org
1.4 Platform/System Requirements: Java Runtime Environment (JRE) 1.4 or later; Java3D v1.3 or later required for PC…
10. [Screenshot: Additional Fields Selection and Preferences file panels, with Browse Preferences, Cancel, and Load buttons.]
Figure 4.2.1: The Expression File Loader (.tav Files)
4.3 Loading Tab Delimited, Multiple Sample (.txt) Files (TDMS Format)
Select the Tab Delimited, Multiple Sample Files (TDMS) option from the drop-down menu to load TDMS format files. Navigate to the folder containing the file and select the desired file. The file is displayed in a tabular format in the file loader preview table. Click the cell in the table that contains the upper-leftmost expression value in the file. The header labels for the annotation fields are then displayed at the bottom of the dialog. Check that the correct fields are listed before clicking Load.
[Screenshot: Expression File Loader with "Tab Delimited, Multiple Sample Files (TDMS)" selected, showing the file browser, the available files (e.g. Affy_sample1.txt, BreastvsTumorData.txt), and the selected TDMS file path.]
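Clicking the upper-leftmost expression value tells the loader where the header rows end and where the annotation columns stop. The sketch below shows that split on a table already read into memory. It assumes the first header row holds the annotation field names followed by the sample names, that any further header rows carry extra sample annotation, and that empty expression cells are missing values. `split_tdms` is a hypothetical helper, not MeV's loader code.

```python
from typing import Dict, List, Optional


def split_tdms(rows: List[List[str]], first_row: int, first_col: int):
    """Split a TDMS-style table at the cell (first_row, first_col) holding the
    upper-leftmost expression value."""
    header = rows[0]
    annotation_fields = header[:first_col]   # gene annotation field names
    sample_names = header[first_col:]        # default sample names
    # Header rows between the first row and the data carry extra sample annotation.
    sample_annotation = [r[first_col:] for r in rows[1:first_row]]
    annotations: List[Dict[str, str]] = []
    matrix: List[List[Optional[float]]] = []
    for r in rows[first_row:]:
        annotations.append(dict(zip(annotation_fields, r[:first_col])))
        # Empty cells become None (treated here as missing values).
        matrix.append([float(v) if v.strip() else None for v in r[first_col:]])
    return annotation_fields, sample_names, sample_annotation, annotations, matrix
```

For a file with two annotation columns and one extra header row, the upper-leftmost expression value sits at row 2, column 2 (zero-based), so the call is split_tdms(rows, 2, 2).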
11. [End of Appendix 18, MeV Script DTD. Recoverable element and attribute declarations: mev (version CDATA); primary_data (id ID, path CDATA, data_type ENUM); file_list; file (name CDATA, file_type ENUM); analysis; alg_set (alg_set_id ID, input_data_ref IDREF); algorithm (alg_id ID, alg_name CDATA, alg_type CDATA, input_data_ref IDREF); plist (key CDATA, value CDATA); mlist; matrix (name, matrix_type CDATA, xdim CDATA, …); data_node (name CDATA, data_node_id ID); element (row CDATA, col CDATA, value CDATA).]
19 Appendix: R Package Installation
Please make sure you are using the latest versions of Rama and Bridge (1.3.0 and 1.3.1 at the time of this writing).
1. Background and Introduction
2. Installing/Updating R
3. Installing Rama/Bridge
4. Updating Rama/Bridge
5. Installing Rserve
6. Running Rserve
1. Background and Introduction
The Bioconductor project (www.bioconductor.org) is an open source software project that provides a wide range of statistical tools, primarily based on the R programming environment and language (www.r-project.org). Taking advantage of R's powerful statistical and graphical capabilities, developers have created and contributed numerous Bioconductor packages to meet a variety of data analysis needs. However…
12. [Screenshot: SAM dialog fields "Or enter delta value here" (0.88954455) and Use Fold Change.] (Figure 11.15.2)
SAM Output
If SAM has been used at least once during a run of MeV, the input parameters and SAM graph of the last run can be recalled by default, which bypasses the need to run SAM again for that set of parameters. In addition to the standard viewers and information tabs, SAM outputs a SAM graph viewer and a Delta table viewer (Fig. 11.15.3), which contains output information for a range of delta values. This information can be saved as a tab-delimited text file by right-clicking on the table. Clusters saved from the other viewers store gene-specific SAM statistics in addition to the annotation and expression measurements stored in clusters from most other modules.
[Screenshot: Delta Table viewer in the SAM analysis results, with the columns Delta, # med. false, # 90th %ile false, # sig. genes, FDR (%) Median, and FDR (%) 90th %ile.]
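The FDR columns of the Delta table can be illustrated with a small sketch. It assumes the usual SAM definition: for each delta, the estimated number of falsely significant genes (the median, or the 90th percentile, of the counts over permutations) divided by the number of genes called significant at that delta, expressed as a percentage. The per-permutation counts are supplied as input, the 90th percentile uses linear interpolation, and `delta_row` is a hypothetical helper rather than part of MeV.

```python
import statistics
from typing import Dict, Sequence


def delta_row(delta: float, sig_genes: int, false_counts: Sequence[int]) -> Dict[str, float]:
    """Summarise one delta value from per-permutation false-significant counts."""
    med_false = statistics.median(false_counts)
    # 90th percentile, interpolating linearly between observed counts.
    p90_false = statistics.quantiles(false_counts, n=10, method="inclusive")[8]

    def fdr(false: float) -> float:
        # Percentage of called genes expected to be false; 0 if nothing is called.
        return 100.0 * false / sig_genes if sig_genes else 0.0

    return {
        "delta": delta,
        "med_false": med_false,
        "p90_false": p90_false,
        "sig_genes": sig_genes,
        "fdr_median": fdr(med_false),
        "fdr_90th": fdr(p90_false),
    }
```

Repeating delta_row over a series of delta values yields one row per delta, which is the shape of the Delta table.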
13. [Screenshot: file loader panel with an Annotation Fields list.]
Figure 4.7.1: Agilent File Loader
Loading GEO Simple Omnibus Format in Text (SOFT) Affymetrix Format Files
After the Expression File Loader dialog is launched, a SOFT Affymetrix format file can be loaded by selecting the GEO SOFT Affymetrix file loader option from the list of available file formats. On the left-hand side of the dialog is a file browser; use it to locate the files to be loaded. The desired file can be selected from the list at the top of the right-hand side of the dialog, and it is then displayed in a tabular format in the file loader preview table. Because the SOFT format is relatively flexible, columns may appear in any order after the first (ID) column, so users must follow the instructions in red under the table and reorder the table by dragging each column with the mouse to the required position. Then click the cell in the table that contains the upper-leftmost expression value in the file. The platform file can be selected from the list at the bottom of the right-hand side of the dialog. Loading the platform file is optional and is off by default. Finally, click the Load button to load the file.
[Screenshot: Expression File Loader with "GEO SOFT Affymetrix Format Files" selected.]
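The loader's on-screen instructions ask for the ID column first, then the normalized signal intensity, then the detection call. If you would rather fix the column order before loading than drag columns in the dialog, a sketch like the following does it with Python's csv module. The column names ID_REF, VALUE, and ABS_CALL are assumptions for illustration; substitute the headers used in your own file.

```python
import csv
import io
from typing import Sequence


def reorder_columns(tsv_text: str, order: Sequence[str]) -> str:
    """Return tab-delimited text with the `order` columns first, other columns after."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = list(reader.fieldnames or [])
    missing = [c for c in order if c not in header]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    new_header = list(order) + [c for c in header if c not in order]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=new_header, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)  # each row dict is written in the new column order
    return out.getvalue()
```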
14. [Screenshot: KNN classification (KNNC) validation results, dated Jul 28 2004, showing leave-one-out cross-validation (LOOCV) statistics for each class: the original number of training-set elements in the class, the number of training-set elements correctly assigned to the class by LOOCV, and the number of training-set elements falsely assigned to the class by LOOCV. The result tree lists Classes 1-5, Not in training set, Expression Images, Centroid Graphs, Expression Graphs, Table Views, Validation Information, and General Information.]
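The three LOOCV statistics listed for each class can be reproduced with a small sketch: hold out each training element, classify it by a majority vote of its k nearest remaining neighbours, and tally the outcome. The sketch assumes Euclidean distance, breaks vote ties in favour of the label whose nearest member is closest, and counts an element of another class predicted as class C as "falsely assigned to C". MeV's KNN classification may use a different distance metric or tie rule, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, labels, x, k):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted(zip(train, labels), key=lambda p: math.dist(x, p[0]))
    votes = Counter(lab for _, lab in ranked[:k])
    # most_common keeps first-seen order on ties, so the nearer label wins.
    return votes.most_common(1)[0][0]


def loocv_stats(data, labels, k=3):
    """Per class: original count, correctly assigned, falsely assigned."""
    stats = {c: {"original": 0, "correct": 0, "false": 0} for c in dict.fromkeys(labels)}
    for i, (x, true_label) in enumerate(zip(data, labels)):
        # Leave element i out of the training set and classify it.
        predicted = knn_predict(data[:i] + data[i + 1:], labels[:i] + labels[i + 1:], x, k)
        stats[true_label]["original"] += 1
        if predicted == true_label:
            stats[true_label]["correct"] += 1
        else:
            stats[predicted]["false"] += 1
    return stats
```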
15. Figure 4.9.1: GEO SOFT two-channel file loader
4.10 Loading Bioconductor MAS5 Format Files
A Bioconductor MAS5 format file can be loaded by selecting the Bioconductor (MAS5) file loader option from the list of available file formats. On the left-hand side of the dialog is a file browser; use it to locate the files to be loaded. The desired file can be selected from the list at the top of the right-hand side of the dialog, and it is then displayed in a tabular format in the file loader preview table. Click the cell in the table that contains the upper-leftmost expression value in the file. The call file generated by Bioconductor can be selected from the list at the bottom of the right-hand side of the dialog. Then click the Load button to load the files.
[Screenshot: Expression File Loader with "Bioconductor (using MAS5) Files" selected, showing the available MAS5 data files and the call file list.]
16. [Screenshot: Percentage Cutoff Filter dialog with Enable Percentage Cutoff Filter, Percentage Cutoff (0.0), Reset, and Cancel.]
Figure 5.4.3: Percentage Cutoff Filter Dialog
Variance Filter
The variance filter allows the removal of genes with low variation of expression over the loaded samples. It is mainly used to remove flat genes whose expression does not vary much over the conditions of the experiment. The Enable Variance Filter check box turns the filter on and off, and the filter offers three possible criteria for specifying which genes to keep. Be sure to check the History node log to see the number of genes retained after using the filter. Note that the variance filter is applied after other filters, such as the Percentage Cutoff Filter, are imposed. This convention ensures that the genes checked for variance also contain some minimum level of good (non-missing) data.
The Percentage of Highest SD Genes option ranks the genes by standard deviation (SD) and keeps a given percentage of this ranked list. For example, with 1000 genes and the percentage set to 20, the result is a final list of the 200 most variable genes.
The Number of Desired High SD Genes option also ranks the genes by SD and then selects the specified number of highest-SD genes from this ordered list.
The SD Cutoff Value option uses an actual SD value such that all g…
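The three variance-filter criteria can be sketched as follows. Several details are assumptions, not taken from MeV: the SD is the sample standard deviation over non-missing values (None), a percentage that does not give a whole number of genes is rounded down, and the SD cutoff keeps genes whose SD is at or above the cutoff (the description of that option is cut off above). `variance_filter` is a hypothetical name.

```python
import statistics
from typing import List, Optional, Sequence


def gene_sd(values: Sequence[Optional[float]]) -> float:
    """Sample SD over non-missing values; 0.0 if fewer than two remain."""
    present = [v for v in values if v is not None]
    return statistics.stdev(present) if len(present) >= 2 else 0.0


def variance_filter(matrix, *, percent=None, top_n=None, sd_cutoff=None) -> List[int]:
    """Return the sorted indices of genes kept by exactly one of the three criteria."""
    if sum(c is not None for c in (percent, top_n, sd_cutoff)) != 1:
        raise ValueError("choose exactly one criterion")
    sds = [gene_sd(row) for row in matrix]
    if sd_cutoff is not None:
        return [i for i, sd in enumerate(sds) if sd >= sd_cutoff]
    # Rank genes from highest to lowest SD.
    ranked = sorted(range(len(sds)), key=lambda i: sds[i], reverse=True)
    keep = top_n if top_n is not None else int(len(sds) * percent / 100)
    return sorted(ranked[:keep])
```

With 1000 genes and percent=20, keep is int(1000 * 20 / 100) = 200, matching the example in the text.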
17. [Screenshot: t-test initialization dialog.]
Construct Hierarchical Trees for: Significant genes only | All clusters
Group Assignments: each sample is assigned to Group A, Group B, or Neither group. Note: Group A and Group B MUST each contain more than one sample. Buttons: Save grouping, Load grouping, Reset.
Variance assumption (for between-subjects t-test only): Welch approximation (unequal group variances) | Assume equal group variances
P-Value Parameters: p-values based on t-distribution | p-values based on permutation (Randomly group samples [10] times, or Use all permutations)
Overall alpha (critical p-value): 0.01
p-value / false discovery corrections: just alpha (no correction) | standard Bonferroni correction | adjusted Bonferroni correction | Step-down Westfall and Young methods (for permutations only): minP … | False discovery control (permutations only): with confidence of [1 - alpha], EITHER the number of false significant genes should not exceed [1], OR the proportion of f…
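Several of the dialog's choices can be illustrated with a minimal sketch: the Welch and equal-variance (pooled) t statistics, a two-sided p-value from all permutations of the group labels, and the standard Bonferroni threshold. This is not MeV's implementation, the function names are hypothetical, and p-values from the t-distribution are omitted because they need a t-distribution CDF that the Python standard library does not provide.

```python
import itertools
import math
import statistics
from typing import Sequence


def _t_from(diff: float, se: float) -> float:
    """Guard against zero standard error (both groups constant)."""
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic without assuming equal group variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return _t_from(statistics.mean(a) - statistics.mean(b), se)


def pooled_t(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic assuming equal group variances (pooled variance)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return _t_from(statistics.mean(a) - statistics.mean(b), math.sqrt(sp2 * (1 / na + 1 / nb)))


def all_permutations_p(a, b, stat=welch_t) -> float:
    """Two-sided p-value over every reassignment of samples to groups A and B."""
    pooled = list(a) + list(b)
    observed = abs(stat(a, b))
    hits = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(stat(ga, gb)) >= observed - 1e-12:  # small tolerance for float ties
            hits += 1
    return hits / total


def bonferroni_threshold(alpha: float, n_genes: int) -> float:
    """Standard Bonferroni: each gene is tested at alpha / number of genes."""
    return alpha / n_genes
```

With groups [5, 6, 7] and [1, 2, 3], only the observed split and its mirror image reach the observed |t| among the 20 possible splits, giving p = 0.1.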
18. [Screenshot: navigation tree (analysis dated Jul 21 2004) with the History node expanded to show the History Log.]
Figure 4.15.1: History Log
5 Adjusting the Data
5.1 Adjustment/Filter Overview
Prior to starting an analysis, certain data adjustments can be made using the items in the Adjust Data menu. These include normalization for genes (rows), log transformations, and various filters.
[Screenshot: Adjust Data menu with the items Gene/Row Adjustments, Sample/Column Adjustments, Log Transformations, Data Filters, and Adjust Intensities of 0.]
Figure 5.1.1: Adjust Data Menu
Adjustments may not affect the main display or the values displayed when elements are clicked in the matrix displays, but they do influence the calculation of the expression matrix, which is the foundation of all analyses. Adjustments are also reflected when the entire matrix or individual clusters are saved as text files, although the original data files are not overwritten. Furthermore, with the exception of three options (Set Lower Cutoffs, Set Percentage Cutoffs, and Adjust Intensities of Zero), all the changes made to an expression matrix are irreversible.
19. [Screenshot: Terrain map view, with Gene Clusters in the Cluster Manager and GDM, CAST, and Terrain results in the Analysis Results tree (Apr 5 2004).]
Figure 11.25.5: Terrain Viewer Without Surface Rendering
11.26 EASE: Expression Analysis Systematic Explorer (Hosack et al. 2003)
The implementation of EASE within the MeV framework gives the researcher an initial biological interpretation of gene clusters, based on the indices provided in the input data set and on information linking those indices to biological themes. These themes are generally GO terms, KEGG pathways, or any other descriptive term related to biological role or biochemical pathway information. The result of the analysis is a group of biological themes that are represented in the cluster. A statistic reports the probability that the prevalence of a particular theme within the cluster is due to chance alone, given the prevalence of that theme in the population of genes under study (all genes loaded in…
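The statistic described above can be sketched as a one-sided Fisher exact test on a 2x2 table (cluster vs. rest of the population, theme vs. no theme), plus the conservative EASE adjustment of Hosack et al. (2003) as commonly described: one gene is removed from the cluster's theme hits before the test. The sketch assumes the population (all loaded genes) includes the cluster's own genes; MeV's EASE module may compute the score differently, and the function names are hypothetical.

```python
from math import comb


def fisher_upper(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p: P(cluster theme hits >= a) for table [[a, b], [c, d]].

    a: cluster genes with the theme     b: cluster genes without it
    c: other genes with the theme       d: other genes without it
    """
    n_total = a + b + c + d
    theme = a + c      # genes carrying the theme
    listed = a + b     # genes in the cluster
    upper = min(theme, listed)
    hits = sum(comb(theme, x) * comb(n_total - theme, listed - x)
               for x in range(max(a, 0), upper + 1))
    return hits / comb(n_total, listed)


def ease_score(list_hits: int, list_size: int, pop_hits: int, pop_size: int) -> float:
    """Fisher p after removing one theme hit from the cluster (other cells unchanged)."""
    a = list_hits
    b = list_size - list_hits
    c = pop_hits - list_hits          # population includes the cluster's genes
    d = pop_size - list_size - c
    if a == 0:
        return 1.0
    return fisher_upper(a - 1, b, c, d)
```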
20. [Screenshot: Diversity Ranking Cluster Selection dialog with Reset and Cancel buttons.]
Figure 12.5: Diversity Ranking Cluster Selection Dialog
[Screenshot: Cluster Selection Information viewer for a script result. Parameters: Number of Desired Clusters: 3; Minimum Cluster Size (population): 10; Diversity Measurement: Intra-gene Based Diversity (mean gene-to-gene dist.). Note: "Clusters are sorted by diversity. Selected clusters are in bold type." Listed clusters (diversity, population): 1.1851681, 51; 1.2095171, 51; 1.2231982, 51; 2.0247245, 153.]
Figure 12.6: Cluster Selection Information Viewer
The output is a list that ranks the clusters by diversity, with each cluster's population listed. Clusters that pass the size criterion and are least diverse are selected and are indicated in the list by bold type. The output nodes from cluster selection on the Script Tr…
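The selection rule above (rank by intra-cluster diversity, then keep the least diverse clusters that meet the minimum size) can be sketched as follows. Diversity is taken here as the mean pairwise gene-to-gene Euclidean distance; MeV computes distances with its own selected metric, so the metric is an assumption, and the function names are hypothetical.

```python
import itertools
import math
from statistics import mean


def intra_cluster_diversity(vectors) -> float:
    """Mean pairwise (gene-to-gene) Euclidean distance within one cluster."""
    pairs = list(itertools.combinations(vectors, 2))
    return mean(math.dist(u, v) for u, v in pairs) if pairs else 0.0


def select_clusters(clusters, n_desired: int, min_size: int):
    """Rank clusters by diversity and pick the least diverse ones of sufficient size.

    Returns (ranking, selected): ranking is a list of
    (cluster_index, diversity, population) tuples sorted by diversity.
    """
    ranking = sorted(
        ((i, intra_cluster_diversity(c), len(c)) for i, c in enumerate(clusters)),
        key=lambda t: t[1],
    )
    selected = [i for i, _, size in ranking if size >= min_size][:n_desired]
    return ranking, selected
```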
21. The Cluster Node field identifies the specific cluster node, under the Algorithm Node, from which the cluster was stored. The Cluster Label is an optional user-defined name for the cluster. The Remarks field can hold details about the process used to create the cluster or about specific features of interest in the cluster. The Size field shows the number of elements in the cluster. The Color field displays the user-defined color for the cluster, and the Show Color check box allows you to show or suppress the displayed color. This option can be useful when visualizing cluster intersections in viewers, because showing only one cluster color at a time can simplify interpretation.
[Screenshot: Cluster Manager gene-cluster table with the columns Source, Algorithm Node, Cluster Node, Cluster Label, Remarks, Size, Color, and Show Color, listing clusters 1-5 from a KMC analysis (Mar 8 2005), with a right-click menu that includes Union, Intersection, Modify Attributes, and Open/Launch.]
22. [Screenshot: Single Array Viewer display options: Blue > Red, Bar Display, Expression Ratio, Sample Text, Normalization Text.]
Figure 7.2.4: Sample Details displayed in a Single Array Viewer
7.3 Expression Graphs
This viewer displays graphs of the expression levels of each gene across the experimental conditions.
[Figures 7.3.1 and 7.3.2]
By default, the line color is gray, or is set according to cluster membership. The lines can instead use a gradient coloring option, in which each line is colored according to expression level; selecting Use Color Gradient on Graphs under Color Scheme in the Display menu toggles this feature on and off. The mean expression levels of the genes in the cluster are shown as a centroid graph overlaid on top of the individual expression graphs.
By default, the range of the Y axis is the same for every expression graph produced from an analysis, regardless of which subset of expression graphs is displayed. The X axis (y = 0) is centered on the Y axis, and the Y range is set to the distance from zero of the element in the entire data set that lies furthest from zero. Use the Change Y Axis option in the right-click menu to set the Y range to the maximum distance from zero of only the elements within the current expression graph. This lets the expression graph expand, increasing the resolution of expression changes within a particular cluster.
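The two Y-axis ranges described above can be sketched in a few lines: by default the symmetric range comes from the largest absolute value in the whole data set, and with the Change Y Axis behaviour it comes from the rows in the current graph only. Treating None as a missing value and the helper names are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def half_range(rows: Sequence[Row]) -> float:
    """Largest distance from zero among non-missing values (0.0 if none)."""
    return max((abs(v) for row in rows for v in row if v is not None), default=0.0)


def y_axis_limits(all_rows: Sequence[Row], graph_rows: Sequence[Row],
                  fit_to_graph: bool = False) -> Tuple[float, float]:
    """Symmetric Y limits around y = 0.

    Default: scaled to the whole data set, so every expression graph shares one range.
    fit_to_graph=True: scaled to this graph's rows only (like Change Y Axis).
    """
    r = half_range(graph_rows if fit_to_graph else all_rows)
    return (-r, r)
```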
23. The CGH Browser displays a plot representation of one or more CGH experiments. Right-clicking on any flanking region (Figure 2.4) or probe in the CGH Position Graph Viewer, or on any probe in the CGH Circle Viewer, and selecting Show Browser launches the CGH Browser with the values corresponding to the selected data region highlighted on both the chart and the table (Figure 2.6).
[Screenshot: CGH Browser (Experiment, Chromosome, View, and Clone Values menus) showing clone names with chromosome start and stop positions for the selected region.]
CGH Browser with the data region selected
The Experiment menu of the CGH Browser can be used to toggle the display between any single loaded experiment and all experiments (Figure 2.7). The Chromosome menu of the CGH Browser can be used to toggle the display between one chromosome and all chromosomes (Figure 2.6).
[Screenshot: CGH Browser Experiment, Chromosome, View, and Clone Values menus.]
24. [Screenshot: Circle Viewer view of sample BxPC 2, with a right-click menu offering Show Genes in Region, Show Browser, Display Data Values, Launch Ensembl, Launch Golden Path, and Launch NCBI Viewer.]
Clone Values: Log Values
The CGH Analyzer currently allows one method of determining the value for each probe, i.e. the log2 ratio. By default, all displays use a red-green ratio gradient, in which each element is red, green, black, or gray. Black elements have a log ratio of 0, green elements have a log ratio less than 0, and red elements have a log ratio greater than 0; the further the ratio is from 0, the brighter the element. Gray elements are missing or were flagged as bad by the spot quality filtration criteria, and are not used in any analyses. By default, the lower and upper bounds of this display are -1 and 1, so probes with log ratios less than or equal to -1 are shown with the maximum green intensity and those greater than or equal to 1 with the maximum red intensity. The scale can be changed to display a wider intensity range by using the Set Ratio Scale item in the Display menu, and the colors can be changed by selecting Set Color Scheme from the Display menu. Notice how the color bar at the top of the display updates when these values are changed, indicating the current color scheme and ratio scale.
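The red/green/black/gray scheme described above can be sketched as a mapping from a log2 ratio to an RGB triple. The sketch assumes a linear ramp from black at 0 to full green at the lower bound and full red at the upper bound, with values beyond the bounds saturating; the gray shade, the exact ramp, and the helper name are assumptions, not MeV's rendering code.

```python
import math
from typing import Optional, Tuple

RGB = Tuple[int, int, int]
GRAY: RGB = (128, 128, 128)  # assumed shade for missing or flagged values


def ratio_color(log_ratio: Optional[float], lower: float = -1.0, upper: float = 1.0) -> RGB:
    """Map a log2 ratio to red (> 0), green (< 0), black (0), or gray (missing)."""
    if log_ratio is None or math.isnan(log_ratio):
        return GRAY
    if log_ratio >= 0:
        frac = min(log_ratio / upper, 1.0)   # 0 -> black, >= upper -> full red
        return (round(255 * frac), 0, 0)
    frac = min(log_ratio / lower, 1.0)       # lower is negative, so frac is positive
    return (0, round(255 * frac), 0)
```

Widening the scale, as Set Ratio Scale allows, corresponds to passing larger bounds, e.g. ratio_color(x, -2.0, 2.0).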
25. Sample_description: required
Sample_data_processing: required
Sample_platform_id: required (provide the accession of an existing GEO Platform for the array used, e.g. GPL96)
ID_REF: required (this column should correspond to the ID column of the reference platform)
VALUE: required (typically supplied as normalized signal intensities)
HEADER_3: optional
HEADER_4: optional
HEADER_N: optional (any number of user-defined columns can be included, and it is recommended that data tables be as comprehensive as possible, excepting annotations that are provided in the platform entry)
Sample_table_begin
ID_REF  VALUE  HEADER_3  HEADER_4  ...  HEADER_N
[insert data table here; the VALUE and HEADER columns may appear in any order after the ID_REF column]
Sample_table_end
A template for the platform file:
PLATFORM: required
Platform_title: required
Platform_technology: required; VALUE: spotted DNA/cDNA | …oligonucleotide | spotted oligonucleotide | antibody | …
Platform_distribution: required; VALUE: non-commercial | commercial | custom-commercial
Platform_organism: required
Platform_manufacturer: required
Platform_manufacture_protocol: required
Platform_catalog_number: optional
Platform_support: optional
Platform_coating: optional
Platform_description: optional
Platform_web_link: optional
Platform_contributor: optional
…
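A sample block laid out as in the templates above (attribute lines, then a tab-delimited table between the table-begin and table-end markers) can be read with a short sketch that also reports missing required attributes. It assumes attribute lines are written as `!Name = value` (the usual GEO SOFT convention; the leading `!` or `^` is optional here) and matches the table markers case-insensitively. `parse_soft_sample` and the `required` list passed to it are illustrative, not an official GEO or MeV API.

```python
from typing import Dict, List, Sequence


def parse_soft_sample(text: str, required: Sequence[str]):
    """Return (attributes, table_rows, missing_required) for one SOFT sample block."""
    attrs: Dict[str, List[str]] = {}
    table: List[List[str]] = []
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        marker = line.lstrip("!^").lower()
        if marker == "sample_table_begin":
            in_table = True
        elif marker == "sample_table_end":
            in_table = False
        elif in_table:
            table.append(raw.split("\t"))    # keep empty cells between tabs
        elif "=" in line:
            name, value = line.lstrip("!^").split("=", 1)
            attrs.setdefault(name.strip(), []).append(value.strip())
    missing = [f for f in required if f not in attrs]
    return attrs, table, missing
```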
26. The expression matrix can be saved as a tab-delimited text file by selecting the Save Matrix item from the File menu. Enter a name for the file in the save file dialog that is displayed. The saved matrix reflects any data adjustments currently imposed on the data set, such as percentage cutoffs or low-intensity cutoffs.
[Screenshot: expression_matrix.txt opened in Notepad; the header row contains UniqueID, Name, and one column per sample (Ex2 to Ex6 visible), followed by one row of values per gene.]
Figure 10.2.1: Expression matrix saved as text file
10.3 Saving Viewer Images
To save a TIFF file of the currently displayed image in the main view, select Save…
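The layout of the saved matrix (UniqueID, Name, then one column per sample, one row per gene) can be sketched with Python's csv module. Writing missing values as empty cells is an assumption, and `matrix_to_tsv` is a hypothetical helper that only illustrates the file layout; it is not MeV's writer.

```python
import csv
import io
from typing import Optional, Sequence


def matrix_to_tsv(sample_names: Sequence[str], ids: Sequence[str], names: Sequence[str],
                  matrix: Sequence[Sequence[Optional[float]]]) -> str:
    """Tab-delimited text: header row, then UniqueID, Name, and values per gene."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["UniqueID", "Name", *sample_names])
    for uid, name, row in zip(ids, names, matrix):
        # Missing values (None) are written as empty cells.
        writer.writerow([uid, name, *("" if v is None else v for v in row)])
    return out.getvalue()
```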
27. …where x and y are expression vectors of size n.
Euclidean Distance
Euclidean distance is perhaps the most familiar distance metric, since it reflects the distance between two objects in space. The definition extends to as many dimensions as are present in the expression vectors being compared. Distances can range from 0 to positive infinity.
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Manhattan Distance
Manhattan distance, or city-block distance, is the sum of the absolute differences of each element pair (dimension). In two dimensions, it is like going up one block and over three blocks to reach a destination in a city where one cannot traverse a block diagonally. Distances can range from 0 to positive infinity.
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
Pearson Correlation
The Pearson correlation and related metrics are very commonly used to evaluate trends of expression over a set of conditions. This metric groups trends or patterns irrespective of their overall level of expression: two genes with different levels of expression but parallel expression patterns are considered closely related. Values can vary from -1 to 1. Correlations near 1 indicate a strong positive correlation between the two vectors, meaning that when one increases, the other increases. Values closer to -1 indicate a negative correlation: when one vector has a relative increase in expression, the other vector has decreasing…
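A direct transcription of the three metrics can help when checking values by hand. The Manhattan example in the checks is the "up one and over three blocks" case from the text, and the Pearson example shows two profiles at different expression levels that are still perfectly correlated. The function names are illustrative.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Square root of the summed squared differences; range [0, inf)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute element-wise differences; range [0, inf)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; range [-1, 1].

    Undefined (raises ZeroDivisionError) if either vector is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den
```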
28. s name and description attributes if required, since they may be important for cluster identification within the external repository. Select the Submit Gene List External Repository menu option to start the submission process. The initial dialog provides a set of panels indexed by repository name. At the time of the version 3.0 release only one repository was available for submission, but more may be added in future releases. Each repository page gives a general description of the repository as well as guidelines and requirements that should be met prior to cluster submission. Please adhere to the requirements of the specific repository. Once a repository is selected, hit the Submit button to be led through the submission process unique to that repository.

[Figure: Cluster Archive Selection Dialog (Introduction and LOLA tabs). The Introduction page explains that a repository is selected from the tabbed panes, that hitting Submit leads the user through the submission process, and that first-time submissions may require registering at the repository web site and using that user name and password during cluster submission.]
29. TDMS files encapsulate the expression data from multiple samples in a single tab-delimited file. Each sample represented in the file has a single dedicated column that contains the expression data for that sample. Each row below the header rows represents information relating to a particular spot on the slide. The following sections describe the format of the file in detail. The image following the description contains color-coded sections that relate to each of the distinct areas of the file.

Header Rows (yellow, light blue, and cyan; top 4 rows in the example)
TDMS files must contain one or more header rows, and these header rows must be at the top of the file. The first header row primarily contains the default sample name for each sample in the file. It also contains descriptive gene annotation field names (yellow) over the columns dedicated to gene annotation. Additional header rows may be present; each additional header row contains additional sample annotation.

Gene Annotation Columns (green: annotation; yellow: annotation field names)
The TDMS format permits any number of
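To make the header-row layout concrete, here is a minimal reader sketch. It assumes the caller already knows how many annotation columns and header rows the file has, and it assumes each extra header row carries its annotation label in the first column; neither detail is specified in the text above, so treat `parse_tdms` as hypothetical.

```python
import csv
import io


def parse_tdms(handle, n_annotation_cols, n_header_rows):
    """Minimal TDMS reader; the caller supplies both layout counts (assumption)."""
    rows = list(csv.reader(handle, delimiter="\t"))
    first = rows[0]
    gene_fields = first[:n_annotation_cols]                      # gene annotation field names
    samples = [{"name": n} for n in first[n_annotation_cols:]]   # default sample names
    # Each extra header row holds one sample annotation; its label is assumed
    # to sit in the first column.
    for extra in rows[1:n_header_rows]:
        label = extra[0]
        for sample, value in zip(samples, extra[n_annotation_cols:]):
            sample[label] = value
    genes = []
    for fields in rows[n_header_rows:]:
        if not fields:
            continue
        annotation = dict(zip(gene_fields, fields[:n_annotation_cols]))
        values = [float(v) for v in fields[n_annotation_cols:]]
        genes.append((annotation, values))
    return samples, genes


# Illustrative two-sample file with one extra sample-annotation row.
example = (
    "UID\tGeneName\tS1\tS2\n"
    "Tissue\t\tliver\theart\n"
    "g1\tgeneA\t0.5\t-1.2\n"
)
samples, genes = parse_tdms(io.StringIO(example), n_annotation_cols=2, n_header_rows=2)
```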
30. [Figure: Analysis Results showing a significant-genes table with its right-click menu (Store selected rows as cluster, Delete cluster composed of selected rows, Launch new session with entire cluster, Launch new session with selected rows, Save cluster, Save all clusters, Search, Select all rows, Clear all selections); other result nodes include Expression Images, Centroid Graphs, Expression Graphs, Non-significant Genes, Cluster Information, Volcano Plot, and General Information]
31. [Figure: Affymetrix GCOS Pivot Data File opened in a spreadsheet, showing signal values with detection calls (P)]

GEO SOFT Affymetrix File Format
The GEO Simple Omnibus Format in Text (SOFT) is a flexible tab-delimited file format. The format is described in detail at http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTsubmissionexamples.

A template for a single-channel Sample file:
SAMPLE (required)
Sample title (required)
Sample source name (required)
Sample organism (required)
Sample characteristics (required)
Sample biomaterial provider (optional)
Sample treatment protocol (optional)
Sample growth protocol (optional)
Sample molecule (required; value: total RNA, genomic DNA, polyA RNA, cytoplasmic RNA, nuclear RNA, protein, or other)
Sample extract protocol (optional)
Sample label (required)
Sample label protocol (optional)
Sample hyb protocol (optional)
Sample scan protocol (optional)
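In a SOFT file, lines beginning with `^` open an entity (such as SAMPLE), lines beginning with `!` carry attributes, lines beginning with `#` describe data columns, and the data table sits between `!sample_table_begin` and `!sample_table_end`. The sketch below splits a single sample record along those lines. It is a simplified reader for inspecting files, not MeV's loader, and the example record is illustrative.

```python
def parse_soft_sample(text):
    """Split one SOFT sample record into entity, attributes, column notes, and table."""
    entity = None
    attributes = {}   # attribute name -> list of values (some attributes repeat)
    columns = {}      # '#' column descriptions
    table = []        # header row followed by data rows
    in_table = False
    for line in text.splitlines():
        if not line.strip():
            continue
        lowered = line.lower()
        if lowered.startswith("!sample_table_begin"):
            in_table = True
        elif lowered.startswith("!sample_table_end"):
            in_table = False
        elif in_table:
            table.append(line.split("\t"))
        else:
            key, _, value = line[1:].partition("=")
            key, value = key.strip(), value.strip()
            if line.startswith("^"):
                entity = (key, value)
            elif line.startswith("!"):
                attributes.setdefault(key, []).append(value)
            elif line.startswith("#"):
                columns[key] = value
    return entity, attributes, columns, table


record = (
    "^SAMPLE = example_sample\n"
    "!Sample_title = liver replicate 1\n"
    "#ID_REF = probe identifier\n"
    "#VALUE = signal\n"
    "!sample_table_begin\n"
    "ID_REF\tVALUE\n"
    "probe_1\t1508.4\n"
    "!sample_table_end\n"
)
entity, attributes, columns, table = parse_soft_sample(record)
```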
32. 11.25 Expression Terrain Maps (Kim et al., 2001)
Terrain maps provide a three-dimensional overview of the major clusters inherent in the data. Terrain maps can represent gene or experiment groupings, depending on the mode selected in the input dialog. The input elements are first mapped onto a two-dimensional grid, in which the placement of each element is influenced by a user-selected number of nearest neighbors. Once the two-dimensional layout is finished, the third dimension is determined by the density of points over discrete areas of the 2D grid, and this value is projected as a surface in the third dimension. Higher peaks indicate large numbers of very similar elements, while lower peaks are composed of fewer elements that tend to be less similar in expression.

It is very important to note that, like many algorithms, TRN uses a default distance metric that greatly affects the outcome of the terrain created. The default metric for TRN is Pearson Squared, which groups elements based on correlation and tends to place strongly correlated and strongly anti-correlated elements in similar groups. Explicitly selecting Euclidean distance can sometimes reveal more obvious groupings. See section 13, the distance metric appendix, for more details on the metrics available in MeV.

[Figure: Terrain Initialization dialog (Genes or Samples; Neighbors: 20)]
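Why Pearson Squared pulls anti-correlated profiles together can be seen numerically. The sketch assumes the Pearson Squared distance has the form 1 - r² and compares it with 1 - r; MeV's exact formula is not given in this section, so treat both functions as illustrations.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pearson_distance(x, y):
    """1 - r: anti-correlated profiles end up far apart (distance near 2)."""
    return 1.0 - pearson(x, y)


def pearson_squared_distance(x, y):
    """1 - r**2 (assumed form): anti-correlated profiles end up close (near 0)."""
    r = pearson(x, y)
    return 1.0 - r * r


rising = [1.0, 2.0, 3.0, 4.0]
falling = [4.0, 3.0, 2.0, 1.0]
print(pearson_distance(rising, falling), pearson_squared_distance(rising, falling))
```

Under this assumed form, a gene that goes up and a gene that goes down across the same conditions would share a terrain peak, which is why switching to Euclidean distance can separate them.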
33. [Figure: Analysis Results showing a significant-genes table with its right-click menu (Store entire cluster, Store selected rows as cluster, Delete cluster composed of selected rows, Launch new session with entire cluster, Launch new session with selected rows, Save cluster, Save all clusters, Search, Select all rows, Clear all selections, Link to URL)]
34. [Figure 11.18.4: SVM Classification Editor (columns include In Class, Out of Class, Neutral, Unique ID, and Spot Name)]

The second dialog also defines parameters used for creating the kernel matrix. The following is an overview of the training parameters.

SVM Output
The final result of an SVM run depends on the process run. Training results in a set of weights that can be viewed along with the parameters for kernel construction; from this viewer, the training results can be saved as an SVM file. Classification results in a viewer that indicates each element's discriminant value and final classification. The SVM Classification Information viewer describes how many elements were initially selected as positive examples, how many elements were later recruited into the positive and negative classifications, and other overview statistics.

[Figure: SVM Classification Information viewer in the Multiple Array Viewer (SVM Mode: Training and Classification)]
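As a rough picture of what a "discriminant value" is, a trained kernel classifier typically scores an element x as f(x) = Σ weight_i · label_i · K(example_i, x) over the training examples, and the sign of f(x) gives the classification. This is the generic kernel-classifier form, not a transcription of MeV's code; the weights, labels, linear kernel, and the rule that a score of exactly 0 is out of class are all assumptions for illustration.

```python
def linear_kernel(a, b):
    """Dot product; a stand-in for whichever kernel was used in training."""
    return sum(p * q for p, q in zip(a, b))


def discriminant(x, examples, labels, weights, kernel=linear_kernel):
    """Sum of weight * label * K(example, x) over the training examples."""
    return sum(w * y * kernel(e, x) for e, y, w in zip(examples, labels, weights))


def classify(x, examples, labels, weights, kernel=linear_kernel):
    """Positive discriminant -> in class (+1); otherwise out of class (-1)."""
    return 1 if discriminant(x, examples, labels, weights, kernel) > 0 else -1


# Hypothetical training result: one positive and one negative example.
examples = [[1.0, 0.0], [0.0, 1.0]]
labels = [1, -1]
weights = [1.0, 1.0]
score = discriminant([2.0, 0.5], examples, labels, weights)   # 2.0 - 0.5
```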
35. [Figure 11.23.1: COA Initialization Dialog]

The displays in this module are very similar to the PCA displays, except that 2D and 3D plots are shown for genes and experiments separately as well as together on the same plot. Menus for creating new plots, selecting data points, and customizing displays are available from the corresponding nodes on the navigation tree, just as in PCA. Genes that lie close to one another on the plot tend to have similar profiles, regardless of their absolute values. The same is true for samples. If some genes and samples lie close to one another on the plot, these genes are likely to have high expression in the nearby samples relative to samples that are far away on the plot. Conversely, if a set of genes lies on the opposite side of the plot from a set of samples relative to the origin, the expression of those genes is likely to be depressed in those samples relative to samples positioned close to the genes. The farther the points are from the origin, the stronger the association between genes and samples. Correspondence analysis works by decomposing a matrix of chi-squared values derived from the rows and columns of the expression matrix. The first two or three axes are the most informative in showing associations among genes and experiments. The amount of information explained by a given axis is quantified by its inertia, which may be thought of as the pr
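The "matrix of chi-squared values" can be made concrete. In correspondence analysis the table is converted to proportions, and each cell becomes a standardized residual (p_ij - r_i c_j) / sqrt(r_i c_j), where r and c are the row and column masses. The sum of squared residuals is the total inertia (chi-squared divided by the grand total), and the inertia of each axis is a squared singular value of this matrix. The sketch below stops before the singular value decomposition, which would need a linear algebra library such as NumPy, and it assumes all values are non-negative, as correspondence analysis requires.

```python
import math


def standardized_residuals(table):
    """Chi-squared standardized residuals (p_ij - r_i c_j) / sqrt(r_i c_j)."""
    total = sum(sum(row) for row in table)
    p = [[v / total for v in row] for row in table]
    r = [sum(row) for row in p]               # row masses
    c = [sum(col) for col in zip(*p)]         # column masses
    return [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
             for j in range(len(c))] for i in range(len(r))]


def total_inertia(table):
    """Sum of squared residuals, equal to chi-squared / grand total."""
    return sum(v * v for row in standardized_residuals(table) for v in row)


print(total_inertia([[10, 0], [0, 10]]))   # a perfectly associated 2x2 table
```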
36. [Figure 9.2.2: List Import Result Dialog (IDs Found: 9 of 10; IDs Not Found: 1 of 10; Select All Matching Results; Store Cluster)]

9.3 Append Sample Annotation
This feature allows the import of additional sample annotation. Often this would be more descriptive sample names that distinguish the samples based on a study factor, condition, or some measured variable. The sample annotation file should be a tab-delimited text file containing one header row of annotation labels (field names). The file may contain multiple columns of annotation, with each column's header entry indicating the nature of the annotation. The annotation for each sample is organized in rows corresponding to the order of the loaded samples. If annotation is missing for a sample, the entry in that sample's row may be left blank. Please see the manual appendix on file formats for more information and a small example.

9.4 Append Gene Annotation
The Append Gene Annotation feature is used to append additional gene annotation from an MeV-style annotation (.ann or .dat) file. The annotation file format is described in detail in the File Format Appendix. The main parameter selection dialog permits the selection of two key fields used to map the annotation from the input file to the prope
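The sample annotation layout described in 9.3 (one header row of field names, then one row per loaded sample in load order, blanks allowed) can be sketched as follows. Raising an error when the row count does not match the number of loaded samples is an assumption of this sketch, not documented MeV behavior, and the sample names are illustrative.

```python
import csv
import io


def read_sample_annotation(handle, loaded_samples):
    """Attach annotation rows, in file order, to the already loaded samples."""
    reader = csv.reader(handle, delimiter="\t")
    fields = next(reader)                      # the single header row of labels
    rows = list(reader)
    if len(rows) != len(loaded_samples):       # assumed strictness
        raise ValueError("expected one annotation row per loaded sample")
    annotated = {}
    for sample, row in zip(loaded_samples, rows):
        row = row + [""] * (len(fields) - len(row))   # missing entries stay blank
        annotated[sample] = dict(zip(fields, row))
    return annotated


text = "Tissue\tTreatment\nliver\tcontrol\nheart\t\n"
annotation = read_sample_annotation(io.StringIO(text), ["Ex1", "Ex2"])
```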
37. [Figure: CGH File Loader dialog, showing the file browser, the available .txt files (human and mouse cytoband and RefGenesMapped files, plus example CGH data files), the selected CGH file (Pancreas Oligo CGH Short), the CGH data characteristics (Human or Mouse; Log2 Ratio or Just Ratio), and an expression table preview with CloneID, Chromosome, Start, End, and one ratio column per sample]
38. The Error Log initially lists the errors and indicates a line number for each error. The Edit Script button launches an XML viewer in which the script can be modified and saved to address the errors. Once the dialog is closed, the script should be reloaded using the File menu to begin a fresh round of script loading and validation.

[Figure: Script Error Log dialog reporting the fatal error "The element type "analysis" must be terminated by the matching end tag" next to the offending script lines. The dialog notes that the following Fatal Error occurred during parsing and validation; that Fatal Errors indicate fundamental problems in script construction, such as unpaired tags; that loading will be terminated so that the reported errors can be corrected; and that MeV does not have to be closed while corrections are made to the input script.]
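An unpaired tag like the one in the figure can also be located outside MeV before loading. The sketch below uses Python's standard XML parser to report the line and column of the first well-formedness error. It does not validate against MeV's script DTD, and the script fragment is a hypothetical stand-in built from element names visible in the figure.

```python
import xml.etree.ElementTree as ET


def check_script(text):
    """Return None if the script is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical fragment: <analysis> is never closed.
broken = "<mev>\n  <analysis>\n    <plist>\n    </plist>\n</mev>\n"
problem = check_script(broken)   # reports the mismatched end tag on line 5
```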
39. [Figure 8.1.2: Cluster Table with option menu open (menu items include Union, XOR, Hide Columns, Delete Selected, Delete, Save Cluster, Import Gene List, and Submit Gene List External Repository)]

The spreadsheet allows single row selection, or multiple row selection by holding down the Control key while left-clicking. A right-click with one or more rows selected displays a menu that contains several options, detailed below.

The Modify Attributes option allows the user to modify the cluster label, remarks, or cluster color by displaying the input form with the current settings.

The Open/Launch menu has two options. Open cluster viewer pulls up the source cluster viewer. The second option, Launch MeV session, opens a new Multiple Array Viewer containing only the data from the selected cluster, or the union of the members of several clusters if several clusters are selected.

The Cluster Operations menu allows three possible operations to be performed if two or more clusters are selected. The Union operation combines the members of the selected clusters and stores the resulting cluster on the list; elements represented in more than one of the input clusters are represented only once in the output cluster. The Intersection operation takes the elements from two or more clusters and produces a cluster containing all elements which are common to a
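Union and Intersection map directly onto set operations, with each gene appearing once in the result. The XOR item appears in the menu, but its behavior is not described here; the sketch assumes it keeps genes found in exactly one of the selected clusters. Cluster contents are illustrative.

```python
def union(*clusters):
    """Members of any selected cluster; duplicates appear only once."""
    return set().union(*clusters)


def intersection(*clusters):
    """Members common to all selected clusters."""
    return set(clusters[0]).intersection(*clusters[1:])


def xor(*clusters):
    """Assumed meaning: members found in exactly one of the selected clusters."""
    counts = {}
    for cluster in clusters:
        for gene in set(cluster):
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n == 1}


a = {"g1", "g2", "g3"}
b = {"g2", "g3", "g4"}
c = {"g3", "g5"}
```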
40. chooseCRANmirror()
4. In the small dialog that appears, select a repository.

[Figure: R 2.1.0 console (RGui) with the Packages menu open (Select repositories, Install package(s), Update packages, Install package(s) from local zip files) after the chooseCRANmirror and setRepositories commands]

6. In the small dialog that appears, select rama or bridge.

[Figure: Packages list (including OLIN, OLINgui, ontoTools, pairseqsim, ROC, RSNPper, and Ruuid)]

4 Updating Rama/Bridge
Updating under OS X: Command Line (RECOMMENDED)
1. Download the source package rama_1.3.0.tar.gz from http://www.bioconductor.org/packages/bioc/1.8/html/rama.html, or bridge_1.3.1.tar.gz from http://www.bioconductor.org/packages/bioc/1.8/html/bridge.html.
2. Open a Terminal window (found at Applications > Utilities > Terminal) and navigate to the downloaded file. For instance, if you down
41. [Figure: R console start-up message]

2. In the small dialog that appears, select a location.

[Figure: CRAN mirror selection list]

3. Choose a repository: click Packages > Select Repositories.

[Figure: RGui R Console with the Packages menu open]
42. [Figure 4.1.1: The Expression File Loader (MeV Files), listing available and selected .mev files (cortex, heart, kidney, liver) and TIGR MeV annotation files (.ann, .dat), with a Remove Annotation Quotes option]

4.2 Loading TIGR ArrayViewer (.tav) Files
To load .tav-formatted files, use the drop-down menu to select the TIGR ArrayViewer (.tav) option. This loader is very similar to the .mev loader. Use the file browser to select the .tav files to load. Instead of selecting annotation files to load alongside the data files, however, you must select a preferences file. This preferences file contains information that MeV uses to determine what type of .tav file is being loaded. See Appendix section 15.13 for more details on preferences files.

[Figure: Expression File Loader, TIGR ArrayViewer Files (.tav), with cortex.tav, heart.tav, kidney.tav, liver.tav, and Single array.tav available]
43. number of clusters 1, with K being incremented by 1 in each subsequent iteration, up to the maximum number of clusters specified above.

[Figure 11.12.2: FOM Initialization Dialog (K-Means/K-Medians): FOM sample selection (Gene Cluster or Experiment Cluster), number of FOM iterations (1), calculate means or medians, maximum number of clusters (20), and maximum number of iterations (50)]

CAST
CAST Parameters
Threshold Interval: For FOM, an interval is used to perform a series of CAST runs in which the affinity threshold is incremented from 0.0 by the interval indicated. The default of 0.1 is often a good value, since it provides 11 CAST results from 0.0 to 1.0 in increments of 0.1. The threshold parameter is a value ranging from 0.0 to 1.0, used as a cluster affinity threshold. Each expression element has an affinity for the current cluster being created, based on its relationship to the elements currently in the cluster. If that affinity is greater than the supplied threshold, the gene is permitted to become a member of the cluster.
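Two pieces of this description can be checked in a few lines: the series of thresholds produced by a given interval, and the membership test against one threshold. The sketch assumes affinity is the average similarity of an element to the current cluster members (as in the original CAST formulation) and that similarities lie in [0, 1]; it is not MeV's full CAST procedure, which also adds and removes members iteratively. The similarity matrix is illustrative.

```python
def fom_thresholds(interval):
    """Affinity thresholds from 0.0 to 1.0 in steps of `interval`."""
    steps = int(round(1.0 / interval))
    return [round(i * interval, 10) for i in range(steps + 1)]


def affinity(element, cluster, similarity):
    """Assumed: mean similarity of `element` to the current cluster members."""
    return sum(similarity[element][m] for m in cluster) / len(cluster)


def may_join(element, cluster, similarity, threshold):
    """An element may join when its affinity exceeds the threshold."""
    return affinity(element, cluster, similarity) > threshold


similarity = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
print(fom_thresholds(0.1))   # the 11 CAST runs mentioned above
```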
44. replicated on the common data set. Scripting is also useful when running several long analysis steps that would normally require monitoring in MeV's interactive mode. Each algorithm and its parameters are preselected in the script, so the next algorithm kicks off as soon as the previous run finishes. Despite the advantages of scripting, there may be times when a result needs careful evaluation before the next algorithms are chosen. In this setting, scripting might be used as a first-pass analysis, and the multiple results of the script run can lead to the selection of different algorithms or new parameter selections.

The Script
The MeV script is an XML-based text document containing information about which algorithms to run, the order of the algorithms, and the source data for each algorithm. Script creation is accomplished through a graphical representation of the script, which eliminates the need for the user to understand the complex structure of the script. The Document Type Definition (DTD) can be found in Section 14, Appendix: MeV's Script DTD, for those interested in the details of script structure.

Creating a New Script
Creating a script is a simple process that can be initiated by selecting New Script from the File menu in the Multiple Array Viewer. Data must be loaded before this menu option is enabled, since many algorithms require data-specific information; e.g., group assignments for TTEST or SAM depend on the number and order o
45. [Figure 12.9: Script XML Viewer, with highlighted algorithm and script line selected]

Editing Parameter Values within the XML Viewer
Limited editing capabilities are available in the Script XML Viewer. If the selected script line is a parameter line, selecting the Edit menu option displays an input dialog that permits altering the value of the parameter. Use caution when changing script parameters so that downstream algorithms that depend on the results remain valid. The parameter input dialog only permits alteration of the parameter's value. A button on the dialog labeled View Valid Parameters produces a table of possible parameters for the algorithm being modified. The list contains parameter names, parameter types, optional constraints (value limits), and whether the parameter is required in all cases or depends on other parameter selections.

[Figure: Value Editor dialog for the script line setting the number of clusters parameter to 10, with a View Valid Parameters button]
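The editing rule above (only a parameter's value changes, checked against a table of valid parameters and their types) can be sketched with Python's standard XML library. The parameter keys, their types, and the `VALID_PARAMETERS` table are hypothetical stand-ins for what View Valid Parameters would list; they are not MeV's actual parameter names.

```python
import xml.etree.ElementTree as ET

# Hypothetical "valid parameters" table: key -> (type, required).
VALID_PARAMETERS = {
    "number-of-clusters": (int, True),
    "number-of-iterations": (int, True),
    "distance-absolute": (bool, False),
}


def edit_parameter(plist, key, new_value, valid=VALID_PARAMETERS):
    """Change only the value attribute of <param key=...> after a type check."""
    if key not in valid:
        raise KeyError(f"{key!r} is not a valid parameter for this algorithm")
    expected_type, _required = valid[key]
    if expected_type is bool:
        if new_value not in ("true", "false"):
            raise ValueError(f"{key!r} expects true or false")
    else:
        expected_type(new_value)              # raises ValueError if it does not parse
    for param in plist.iter("param"):
        if param.get("key") == key:
            param.set("value", new_value)     # the key itself is left untouched
            return param
    raise KeyError(f"{key!r} is not present in this plist")


plist = ET.fromstring(
    '<plist><param key="number-of-clusters" value="10"/>'
    '<param key="number-of-iterations" value="50"/></plist>'
)
```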
46. [Figure: One-way ANOVA results, with the result tree (Expression Images, Centroid Graphs, Expression Graphs, Cluster Information, Significant Genes, Non-significant Genes, General Information, History) next to a matrix of numeric values]
47. 9 Utilities Menu 58
10 …
10.1 … 63
10.2 Saving the Expression Matrix 63
10.3 Saving Viewer Images 64
10.4 Saving Cluster Data 64
11 … Description, Conventions and General Pointers 65
11.1 HCL: Hierarchical Clustering 67
11.2 … 72
11.3 ST: Support Trees 78
11.4 SOTA: Self-Organizing Tree Algorithm 81
11.5 RN: Relevance Networks 86
11.6 K-Means/K-Medians Clustering 89
11.7 KMS: K-Means/K-Medians Support 91
11.8 CAST: Clustering Affinity Search Technique 93
11.9 QTC: QT CLUST 95
11.10 SOM: Self-Organizing Maps 97
11.11 … 100
11.12 FOM: Figures of Merit 102
11.13 PT
48. [Figure 12.7: Centroid Entropy/Variance Ranking Cluster Selection Dialog (desired number of clusters: 3; minimum cluster size in genes; rank clusters on centroid variance or on centroid entropy)]

[Figure 12.8: Script Tree Viewer with Constructed Script]

Script Tree Viewer Options
Right-click menus displayed from the script tree viewers vary depending on whether the selected node is a data node or an algorithm node. The menu displayed from a data node provides the Add Algorithm Node, Execute Script, and Save Script options. The menu displayed from an algorithm node provides the Delete Algorithm, View XML Section, Execute Script, and Save Scrip
49. [Figure: Graph Customization Dialog]

Rendering Options
The final two menu options in the Graph Viewer are an option to display a locus reference line, which provides gene identifier and location information as you mouse over the graph, and an option to reset the x range, zooming out to show all data in one window.

2. Table of Unmapped Spots
In the event that there are control spots on the slide, or spots for which the locus and coordinate information is not known, a table is created to collect and identify these spots. This table lists all annotation for unmapped spots and can be reviewed, searched, or saved to confirm that the listed spots are indeed unmapped and cannot contribute to the LEM. The table has the properties and options of MeV's cluster table viewers, described in section 7.5 of the manual.

3. LEM Mapping Summary Viewer
The mapping summary is placed on MeV's result navigation tree below the unmapped spot table (if present) and the LEM viewer(s). This viewer lists the number of spots entering LEM, the number of spots that map to loci, and the fraction of spots that are mapped, and it lists all of the parameters selected when producing the LEM. If the LEM doesn't appear to render loci appropriately, check the parameter listing to confirm that the proper coordinate information was supplied and that other parameters, such as the indication of multiple chromosomes or plasmids, were set correctly. In addit
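The counts reported by the mapping summary follow from a simple split of the input spots. This sketch assumes a spot maps to a locus when both its start and end coordinates are known; the spot records and field names are hypothetical.

```python
def mapping_summary(spots):
    """Split spots into mapped / unmapped and report the summary counts."""
    mapped, unmapped = [], []
    for spot in spots:
        # Assumed rule: a spot maps when both coordinates are known.
        if spot.get("start") is not None and spot.get("end") is not None:
            mapped.append(spot)
        else:
            unmapped.append(spot)             # e.g. control spots
    total = len(spots)
    return {
        "spots entering LEM": total,
        "spots mapped to loci": len(mapped),
        "fraction mapped": len(mapped) / total if total else 0.0,
        "unmapped": unmapped,                 # would populate the unmapped-spot table
    }


spots = [
    {"id": "a", "start": 100, "end": 220},
    {"id": "b", "start": 300, "end": 420},
    {"id": "c", "start": None, "end": None},  # control spot without coordinates
    {"id": "d", "start": 500, "end": 620},
]
summary = mapping_summary(spots)
```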
50. [Figure 5.4.5: Set Detection Filter Dialog (for each group, the gene must be called Present in a chosen number out of the total in that group)]

Fold Filter (for Affymetrix Data Only)
Select Use Fold Filter to remove genes that do not pass one of three specified criteria based on fold change. Fold change is calculated as the mean signal in Group A divided by the mean signal in Group B. Select Set Fold Filter and define the two groups to be filtered. Enter the threshold by which the expression of genes in one group should exceed that of the other group. Select the appropriate greater-than or less-than symbol to define which group should be more highly expressed than the other. Select both to keep genes in which either group's expression level exceeds the other's by the defined threshold.

Adjust Intensities of Zero
This option is turned on by default. This means that if either, but not both, of the Cy3 or Cy5 intensities for an element is recorded as zero, that intensity value is reset to 1. In this case the expression ratio is calculated as Cy5/1 or 1/Cy3, depending on which value is zero, and the element is included in subsequent analyses. Sometimes the user may desire this; however, the user should be aware that the expression ratios for such elements are spurious. You might want to turn this
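The Adjust Intensities of Zero rule can be stated as a small function. Returning `None` when both intensities are zero is an assumption of this sketch (the text only says the adjustment applies when one, but not both, is zero); the intensity values are illustrative.

```python
def adjusted_ratio(cy3, cy5):
    """Ratio Cy5/Cy3 with 'Adjust Intensities of Zero' turned on."""
    if cy3 == 0 and cy5 == 0:
        return None          # assumed: no ratio when both intensities are zero
    if cy3 == 0:
        cy3 = 1              # ratio becomes Cy5/1
    if cy5 == 0:
        cy5 = 1              # ratio becomes 1/Cy3
    return cy5 / cy3


print(adjusted_ratio(0, 200), adjusted_ratio(50, 0))
```

Note how a Cy3 intensity of 0 turns a Cy5 reading of 200 into a ratio of 200, which illustrates why such ratios are described above as spurious.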
51. $$K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2w^2}\right)$$

Training Parameters
Diagonal Factor: a constant added to the main diagonal of the kernel matrix. Adding this factor to the main diagonal of the kernel is required to force the matrix to be positive definite. The definition of a positive definite matrix is best reviewed in books devoted to linear algebra, but this state is achieved by selecting a constant of sufficient magnitude. The positive definite state of the kernel matrix is required for the SVM algorithm to yield meaningful results. Testing values starting at 1.0 and increasing may be required to find an appropriate value. If the value is too low, all elements will be partitioned into the negative class. For a range of values of this factor, a stable set of elements may be classified as positive. At very high values there is a tendency to force all positive examples into the positive class, regardless of their similarity of expression.

Threshold: used as a stopping criterion for the weight optimization phase of training. Optimizing the weights produced during training is an iterative process that converges on an optimal set of weights to separate the positive and negative examples. This threshold dictates how stable the weights must be before the optimization process is terminated. Selecting a very low threshold could cause the optimization process to take an extremely long time and yet yield results similar to those where a higher threshold
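The effect of the diagonal factor can be seen on a tiny kernel matrix. Two identical expression vectors under a linear kernel give a singular matrix, which is not positive definite; adding 1.0 to the diagonal fixes that. Positive definiteness is tested here by attempting a Cholesky factorization, a standard check chosen for this sketch rather than anything MeV is documented to do, and only the lower triangle of a symmetric matrix is read.

```python
import math


def is_positive_definite(matrix, eps=1e-12):
    """Attempt a Cholesky factorization; it succeeds only for positive definite input."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= eps:
                    return False
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


def add_diagonal_factor(kernel, factor):
    """Return a copy of the kernel matrix with `factor` added to the main diagonal."""
    return [[v + factor if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(kernel)]


kernel = [[1.0, 1.0], [1.0, 1.0]]   # linear kernel of two identical unit vectors
adjusted = add_diagonal_factor(kernel, 1.0)
```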
52. … Shrunken Centroids 196
… Creating a New Script 205
The Script Tree Viewer: Script Construction 206
… 207
Script XML Viewer 214
Loading a Script 216
… 216
13 Comparative Genomic Hybridization Viewer 219
14 Working with the Single Array Viewer 235
15 Appendix: File Format Descriptions 238
15.1 … 238
15.2 Tab-Delimited Multiple Sample Files (TDMS Files) 239
15.3 … 240
15.4 MEV Files 241
15.5 Annotation Files 244
15.6 Bioconductor MAS5 Files 245
15.7 Affymetrix GCOS Pivot Data File 246
15.8 GEO SOFT Affymetrix File Format 247
15.9 GEO SOFT Two-Channel File Format 248
15.10 dChip or DFCI Core File Format
53. [Figure 11.3.2: Support Trees Initialization Dialog]

Parameters
The Support Tree algorithm permits the resampling options to be set separately for the gene tree and the sample tree.

Tree Construction Options
The Draw Gene Tree and Draw Sample Tree options allow you to construct a gene tree, an experiment tree, or both.

Resampling Options
You can elect to resample genes, samples, or neither, using either a bootstrapping or a jackknifing method.
Bootstrapping: The matrix is reconstructed such that each expression vector has the original number of values, but the values are a random selection, with replacement, of the original values. Values in the original expression vector may occur more than once, since the selection uses replacement.
Jackknifing: Jackknifing takes each expression vector and randomly selects one element to omit. This method produces expression vectors that have one fewer element, and it is often used to minimize the effect of single outlier values.

Iterations
This indicates how many times the expression matrix should be reconstructed and clustered.

Linkage Method
This parameter indicates the convention used for determining cluster-to-cluster distances when constructing the hierarchical tree.
Single Linkage: The distances are measured between each member of one cluster and each member of the other cluster. The minimum of these distances
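The two resampling methods can be sketched for a single expression vector. In a support-tree run this would be repeated for every vector and for the chosen number of iterations, with each reconstructed matrix clustered in turn. The random generator is passed in so runs are reproducible; the vector values are illustrative.

```python
import random


def bootstrap(vector, rng):
    """Same length; values drawn from the original with replacement."""
    return [rng.choice(vector) for _ in vector]


def jackknife(vector, rng):
    """One randomly chosen element omitted, so the result is one shorter."""
    drop = rng.randrange(len(vector))
    return vector[:drop] + vector[drop + 1:]


rng = random.Random(0)
profile = [0.5, -1.2, 2.0, 0.1]
boot = bootstrap(profile, rng)
jack = jackknife(profile, rng)
```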
54. Figure 11.18.3: SVM Training Parameter Initialization Dialog.
Classification Input: The SVM training process requires the supplied expression data plus an initial presumptive classification, which indicates which elements are initially presumed to be related. Two options are provided for selecting members of the initial classification. Use SVM Classification Editor: This option launches an editor that provides a flexible tool for finding elements and marking them as positive members of the initial classification. The classification can be saved as an SVC file so that these initial settings can be recovered later. Use Classification File: This option loads an initial classification from an existing SVC file. Kernel Matrix Construction: You can construct either a polynomial or a radial kernel matrix. Polynomial Kernel Function Parameters: The polynomial option is the default, and three parameters define the kernel construction: Constant, an additive constant c; Coefficient, a multiplicative constant w; and Power, a power factor p. Polynomial Kernel Function: $K(i,j) = \left(w \cdot \mathrm{Dist}(i,j) + c\right)^{p}$. Radial Basis Function Parameters: The Radial Basis checkbox selects this type of kernel-generating function. Width Factor: the radial width factor w (see the formula below). Radial Basis Kernel Function
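A sketch of the polynomial kernel matrix, assuming (as the SVM distance-metric description elsewhere in this manual states) that Dist(i, j) is the dot product of unit-normalized expression vectors. The default values chosen here for w, c and p are hypothetical, not MeV's defaults.

```python
import math

def normalize(v):
    # Scale a vector to unit length; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def polynomial_kernel_matrix(vectors, w=1.0, c=1.0, p=2):
    # K[i][j] = (w * dot(u_i, u_j) + c) ** p on unit-normalized vectors.
    unit = [normalize(v) for v in vectors]
    return [[(w * sum(a * b for a, b in zip(u, v)) + c) ** p for v in unit]
            for u in unit]

K = polynomial_kernel_matrix([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
```

Because each vector is normalized first, the diagonal is always (w + c) ** p, here 4.0.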
55. Thresholds: To switch to discrete copy number determination based on clone ratio thresholds, select the Set Threshold item from the CloneValues menu. With this determination, each clone is assigned a copy number category and a corresponding color based on the criteria shown in Table 2.1.
Table 2.1: Discrete copy number determination based on log2 ratio thresholds
Copy Number | Color | Log2 Ratio | Other
2 Copy Deletion | Pink | < Deletion Threshold 2 Copy |
1 Copy Deletion | Red | < Deletion Threshold | Not 2 Copy Deletion
2 Copy or greater Amplification | Yellow | > Amplification Threshold 2 Copy |
1 Copy Amplification | Green | > Amplification Threshold | Not 2 Copy Amplification
No Copy Change | Blue | Not Deleted or Amplified |
Bad Clone | Grey | |
The CGH Position Graph: The Main View node of the navigation tree should contain a subtree called Chromosome Views (Figure 2.2). Expand this subtree to display a list of all chromosomes. Clicking any of these chromosomes displays the CGH Position Graph Viewer for that chromosome. The CGH Position Graph Viewer displays data values for a single chromosome across multiple experiments. The left side of this view displays the cytogenetic bands of the selected chromosome, and positional coordinates in Mb are annotated to the left of the cytobands. Probes are represented as horizontal bars beginning and ending at positions corresponding to the genomic coordinates of t
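The rules in Table 2.1 can be expressed as a single classification function. This is a sketch under stated assumptions: the threshold values are hypothetical, the thresholds are assumed to be ordered as in the docstring, and treating a missing or NaN ratio as a bad clone is our assumption, since the table does not say how bad clones are flagged.

```python
import math

def copy_number_call(log2_ratio, del_thr, del2_thr, amp_thr, amp2_thr):
    """Return (category, color) following Table 2.1.

    Assumes del2_thr <= del_thr < amp_thr <= amp2_thr. A missing (None/NaN)
    ratio is treated as a bad clone (an assumption for this sketch).
    """
    if log2_ratio is None or math.isnan(log2_ratio):
        return ("Bad Clone", "Grey")
    if log2_ratio < del2_thr:
        return ("2 Copy Deletion", "Pink")
    if log2_ratio < del_thr:                 # deleted, but not a 2 copy deletion
        return ("1 Copy Deletion", "Red")
    if log2_ratio > amp2_thr:
        return ("2 Copy or greater Amplification", "Yellow")
    if log2_ratio > amp_thr:                 # amplified, but not 2 copy
        return ("1 Copy Amplification", "Green")
    return ("No Copy Change", "Blue")

# Hypothetical threshold settings for illustration only.
THRESHOLDS = dict(del_thr=-0.5, del2_thr=-1.0, amp_thr=0.5, amp2_thr=1.0)
print(copy_number_call(-0.7, **THRESHOLDS))
```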
56. ... and if it belongs to group 2, users define it as 2, and so on. If it belongs to neither group, users define it as 0.
16 Appendix: Preferences Files
Preferences files store information about a data input file's format. The number of file format variations, including the Stanford file format and various flavors of TAV file format, makes it necessary to provide MeV with an instruction set for reading those files. Preferences files are human-readable, tab-delimited text and contain the information MeV needs to understand the data layout in a microarray data file. Three sample preferences files are included with the MeV installation, serving as templates for TAV files (section 15.1), Stanford files (section 15.1.2), and cluster files. Use a text editor such as Notepad to customize one of these files for a particular file type. The names of preferences files should always end with the word Preferences and have no period (no file extension). Most lines in a preferences file are preceded by a double slash (//), indicating that the following text is a comment and will be ignored by MeV. These comments describe the parameters listed below them. Lines containing a parameter have no double slash and consist of a parameter label followed by a tab and the parameter value. Only the parameter value should be altered when customizing a preferences file. It is also extremely important that the label and value are separated by one tab charac
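To make the layout rules concrete, here is a minimal reader for the format described above: `//` comment lines are skipped, and every other non-blank line must be a label and a value separated by exactly one tab. The parameter names in the sample text are hypothetical; real MeV preferences files define their own labels.

```python
def parse_preferences(text):
    """Return {label: value} from preferences-file text."""
    params = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue  # blank line or comment: ignored, as MeV ignores comments
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(
                f"line {lineno}: label and value must be separated by exactly one tab")
        label, value = parts
        params[label] = value
    return params

# Hypothetical labels for illustration only.
SAMPLE = "// number of header lines\nHeaderLines\t1\n// column holding IDs\nIDColumn\t2\n"
print(parse_preferences(SAMPLE))
```

Rejecting a doubled tab, rather than silently collapsing it, mirrors the manual's warning that the separator must be a single tab character.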
57. ... and their expression profile graphs. Selecting an element in the list displays the expression pattern for that gene. The Gene Cluster Template tab provides a list and view of templates, which are the mean values of stored (colored) clusters. The Sample Template and Sample Cluster Template tabs provide the same functionality but use experiment templates. The Saved Template tab provides an interface for loading gene and experiment templates; templates loaded from files populate a list from which a template may be selected. A button in each of these areas selects the displayed template for matching. Threshold Parameters: Use Absolute R: selects expression patterns that are either positively or negatively correlated with the template. Use Threshold R: the threshold for determining a match is the R value between the expression vector and the template. Use Threshold p Value: the threshold for determining a match is the p value on R between the expression vector and the template. Threshold Input Value: either a supplied value for R or a p value, ranging from 0.0 to 1.0. R values closer to 1.0 are more stringent; p values closer to 0 are more stringent. Save Template: launches a file browser so that the current template can be saved to a tab-delimited text file. These files can be loaded from the Saved Templat
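A sketch of template matching with an R threshold, assuming R is the ordinary Pearson correlation between each gene's profile and the template. The p-value option is not shown, and the gene names and values are made up.

```python
import math

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [x - mu for x in u]
    dv = [y - mv for y in v]
    den = math.sqrt(sum(x * x for x in du) * sum(y * y for y in dv))
    return sum(x * y for x, y in zip(du, dv)) / den if den else 0.0

def match_template(genes, template, r_threshold, use_absolute=False):
    # Keep genes whose R (or |R| with "Use Absolute R") meets the threshold.
    hits = []
    for name, profile in genes.items():
        r = pearson(profile, template)
        score = abs(r) if use_absolute else r
        if score >= r_threshold:
            hits.append(name)
    return hits

TEMPLATE = [1.0, 2.0, 3.0, 4.0]
GENES = {"up": [2.0, 4.0, 6.0, 8.0], "down": [4.0, 3.0, 2.0, 1.0], "mixed": [1.0, 3.0, 2.0, 4.0]}
print(match_template(GENES, TEMPLATE, 0.9, use_absolute=True))
```

With the absolute option enabled, the perfectly anticorrelated "down" profile also matches.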
58. ... appearance without dismissing the dialog.
Figure: TEASE tree (HCL mode) with distance thresholds applied.
Adjusting Color Gradient: Right-clicking the TEASE Tree View reveals a pop-up menu. Near the bottom of the menu is an option labeled Change Score Boundary; left-clicking it opens a configuration window. The window contains two editable text fields, similar to the Assign Color Gradient panel in the initialization window. Enter the appropriate values and click OK to apply the changes, or Cancel to close the window without making any changes. The changes appear in the Tree View once you close the window.
11.3 ST (Support Trees): This option shows the hierarchical trees obtained using the previous module, but it also shows the statistical support for the nodes of the trees, based on resampling the data. The user can select between two resampling methods: bootstrapping (resampling with replacement) and jackknifing (in this implementation, resampling by leaving out one observation). Resampling can be conducted on genes and/or experiments for a user-specified number of iterations. The branches of the resulting tree are color-coded to denote the percentage of times a given node was supported over
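The node-support percentage can be sketched as follows. It assumes each node is represented by the set of leaf labels beneath it (its clade) and that the resampled trees have already been built. This representation is our assumption and need not match how MeV stores its trees.

```python
def node_support(reference_clades, resampled_trees):
    """Percent of resampled trees that contain each reference clade.

    A clade is the frozenset of leaf labels under a node; each resampled
    tree is given as a set of such clades.
    """
    n = len(resampled_trees)
    return {clade: 100.0 * sum(clade in tree for tree in resampled_trees) / n
            for clade in reference_clades}

AB, CD, ABC = frozenset("AB"), frozenset("CD"), frozenset("ABC")
resampled = [{AB, CD}, {AB, ABC}, {frozenset("AC"), frozenset("BD")}, {AB, CD}]
support = node_support([AB, CD, ABC], resampled)
print(support)
```

These percentages are what the color coding of the branches would encode.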
59. ... can be selected by entering an element height and width.
7 Viewer Descriptions
7.1 Overview: Viewers are the graphical displays MeV uses to present the results of module calculations. The viewers appear as a subtree under the module's result tree within the main navigation tree. The viewers listed here are those used by more than one MeV module; custom viewers used by only one module are described in that module's section.
7.2 Expression Images: This viewer is used in the main window of the Multiple Experiment Viewer as well as in most of the modules (Figure 7.2.1). It consists of colored rectangles representing genes in a matrix: each column represents all the genes from a single experiment, and each row represents the expression of a gene across all experiments. The default color scheme used to represent expression level is red-green (red for overexpression, green for underexpression) and can be adjusted in the Color Scheme dialog in the Display menu.
Figure 7.2.1: Multiple Array Viewer showing KMC expression images for gene clusters 1 to 10.
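The red-green expression image can be sketched as a per-value color mapping over a genes x experiments matrix. The black midpoint at 0 and the symmetric default limits of -3 and 3 are assumptions for illustration. MeV's actual gradient and limits are configurable in the Display menu.

```python
def red_green(value, lower=-3.0, upper=3.0):
    """Map an expression value to an (r, g, b) tuple (lower < 0 < upper assumed).

    Values at or above `upper` are full red, at or below `lower` full green,
    0 is black, and values in between are scaled linearly.
    """
    if value >= 0:
        frac = min(value / upper, 1.0)
        return (round(255 * frac), 0, 0)
    frac = min(value / lower, 1.0)   # value / lower is positive for value < 0
    return (0, round(255 * frac), 0)

matrix = [[2.5, -0.4, 0.0],          # rows = genes, columns = experiments
          [-3.5, 1.0, 4.2]]
image = [[red_green(v) for v in row] for row in matrix]
print(image)
```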
60. ... color gradient and a midpoint when using a double gradient. The labels next to the input boxes display the minimum, maximum, and median expression values as a rough guide when setting color scale limits. When the Update Limits button is pressed, the Color Saturation Statistics panel below and the gradient preview panel are updated. The limits are also conveyed to MeV, and the current viewer is updated to reflect them. This lets you adjust and update the limits while viewing the effect of the new limits on the expression image. The Color Saturation Statistics panel displays the number of spots that fall beyond the limits of the color scale; these elements are drawn in the expression image using the saturated endpoint colors. The panel also reports this as the percentage of saturated elements relative to the total number of elements. These numbers let you adjust the limits so that expression values are spread across the gradient with a limited number of saturated (off-scale) elements. The Reset button of this dialog resets the limits in the dialog to the original limits present when the dialog was opened, and MeV rolls its limits back to those original limits. Element Appearance: The final two options in the Display menu alter the element size and draw or omit borders around each element. Element size can be selected from among four preset options, or a customized size
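The saturation statistics reduce to counting values outside the color-scale limits. This sketch assumes "beyond the limits" means strictly below the lower limit or strictly above the upper one.

```python
def saturation_stats(values, lower, upper):
    # Count elements drawn with the saturated endpoint colors.
    below = sum(1 for v in values if v < lower)
    above = sum(1 for v in values if v > upper)
    total = len(values)
    percent = 100.0 * (below + above) / total if total else 0.0
    return below, above, percent

print(saturation_stats([-4.0, -2.0, 0.0, 1.0, 5.0], -3.0, 3.0))
```

Widening the limits lowers the saturated percentage but compresses the remaining values into a narrower part of the gradient.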
61. ... control. The algorithms are described in Korn et al. (2001, 2004). Hierarchical Clustering: This check box selects whether to perform hierarchical clustering on the elements in each cluster created. P value corrections reduce the probability that a non-significant gene will be erroneously picked as significant. This can be a serious issue when many tests are done, which is usually the case in microarray analyses, since there are as many tests as there are genes in the analysis. The standard Bonferroni correction is very stringent and may exclude many genes that are truly significant, whereas the adjusted Bonferroni correction is less conservative and more likely to include significant genes while still controlling the error rate. The step-down Westfall-Young maxT correction is also less conservative than the standard Bonferroni correction while retaining statistical power. False discovery control is a useful option because p value corrections can be too stringent for microarray analysis. Sample output from this module is shown below.
[Screenshot: result tree with Expression Images, Centroid Graphs, Significant Genes, Non-significant Genes, and General Information nodes]
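To show why a step-down correction is less conservative than standard Bonferroni, here is a sketch of Bonferroni next to Holm's step-down procedure. Holm is used as a representative step-down Bonferroni adjustment; we are not claiming it is exactly MeV's "adjusted Bonferroni", and the p values are made up.

```python
def bonferroni(pvalues, alpha=0.05):
    # Every test is compared against alpha / m.
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def holm_step_down(pvalues, alpha=0.05):
    # The k-th smallest p value (k = 0, 1, ...) is compared against
    # alpha / (m - k); testing stops at the first failure.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    significant = [False] * m
    for k, i in enumerate(order):
        if pvalues[i] > alpha / (m - k):
            break
        significant[i] = True
    return significant

P = [0.001, 0.013, 0.04, 0.2]
print(bonferroni(P), holm_step_down(P))
```

Here the gene with p = 0.013 fails Bonferroni (threshold 0.0125) but passes the step-down test (threshold 0.05 / 3).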
62. ... cutoff have a probability of having members that are paired by chance at or below the supplied p value. Hierarchical Clustering: This check box selects whether to perform hierarchical clustering on the elements in each cluster created. Default Distance Metric: Euclidean. The result views created by the SOTA algorithm include the basic viewers, two SOTA-specific viewers, and enhancements to the expression image viewers that show more cluster information. One of the SOTA-specific viewers is the SOTA dendrogram (shown below), which displays the generated tree with the expression image of each resulting cluster's centroid gene. The text to the right of the centroid expression image includes the cluster ID number, the cluster population (number of genes in the cluster), and the cluster diversity (mean gene-to-centroid distance). Clusters can be colored and saved from this viewer, and left-clicking a cluster centroid jumps to the expression image for that cluster.
[Screenshot: SOTA result tree with SOTA Dendrogram, Expression Images, Centroid Graphs, and Expression Graphs nodes]
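The population and diversity figures shown next to each centroid can be sketched as follows. This assumes Euclidean distance (the stated default) and takes the centroid as the mean profile of the cluster's genes. SOTA's own node vector may differ from a simple mean.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_summary(profiles):
    # Returns (centroid, population, diversity); diversity is the mean
    # gene-to-centroid distance.
    n = len(profiles)
    centroid = [sum(column) / n for column in zip(*profiles)]
    diversity = sum(euclidean(p, centroid) for p in profiles) / n
    return centroid, n, diversity

centroid, population, diversity = cluster_summary([[0, 0], [2, 0], [0, 2], [2, 2]])
print(centroid, population, diversity)
```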
63. ... element groupings. The thickness of the links is also adjustable, from 1 to 10 on a relative scale. Drift (Auto-navigation) Control: Flying through the terrain to a selected point on the terrain or to a data element is performed by holding down the Ctrl key and left-clicking the desired destination; the point of view then navigates automatically to the destination. If the selected destination is a point on the terrain, such as a peak or a plateau, the point of view orients so that the view is orthogonal to the plane on the terrain that contains the destination point. This is useful for quickly getting a top-down view by clicking flat areas between peaks. If the selected destination is a data point, the final point of view faces the element and lies in the plane containing the data elements: the route moves toward the element and attempts to orient in a position from which the label associated with the element is easier to read. Some further navigation may be required to avoid elements that obstruct the element of interest. The Drift menu option permits entry of a relative distance parameter that can range from 0 up to, but not including, 1.0. This represents the final distance between the point of view and the destination when drifting. Small values make the drift auto-navigation end close to the destination, which is often preferred when visiting a specific data element. Cluster Selection and Related Operations: The f
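One way to read the Drift parameter is as the fraction of the starting distance that remains when the flight ends. That interpretation is an assumption (the text only says the value is a relative final distance), and the function below is hypothetical.

```python
def drift_end_position(start, destination, drift):
    # Assumed meaning: stop at `drift` times the original start-to-destination
    # distance, on the straight line back toward the start.
    if not 0.0 <= drift < 1.0:
        raise ValueError("drift must satisfy 0 <= drift < 1")
    return tuple(d + drift * (s - d) for s, d in zip(start, destination))

print(drift_end_position((10.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.25))
```

Under this reading, a drift of 0 lands exactly on the destination, consistent with small values ending close to it.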
64. ... expression.
$$\rho(u,v) = \frac{\operatorname{cov}(u,v)}{\sigma_u\,\sigma_v}, \qquad \operatorname{cov}(u,v) = \frac{1}{m-1}\sum_{i=1}^{m}(u_i-\bar{u})(v_i-\bar{v}), \qquad \sigma_u = \sqrt{\operatorname{cov}(u,u)}$$
Pearson Uncentered: This is a variation of the standard Pearson correlation in which deviations are measured from zero rather than from the mean intensity of the vector. The difference between this metric and the Pearson correlation depends partly on how far the mean expression is from zero. Pearson Squared: This variant of the Pearson correlation computes Pearson uncentered and then squares the result. Using this metric may cause patterns that are strongly positively and strongly negatively correlated to cluster together. Values for this metric range from 0 to 1.0. Cosine Correlation: This metric produces values ranging from -1 to 1, with values toward 1 indicating a strong positive relationship and values toward -1 indicating a strong negative relationship.
$$\cos(u,v) = \frac{\sum_{i=1}^{m} u_i v_i}{\sqrt{\sum_{i=1}^{m} u_i^{2}}\,\sqrt{\sum_{i=1}^{m} v_i^{2}}}$$
Covariance: Covariance can produce values that are unbounded, because the values are not scaled by factors representing the variance within u and v. Covariance values for vectors with a strong positive relationship should be large and positive. If the two vectors have a strong negative relationship, the covariance will be negative and large in magnitude. If the two vectors have little relationship, then the terms of the summation
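The metrics above can be written directly from the formulas. With deviations measured from zero, the uncentered Pearson correlation reduces algebraically to the cosine correlation, so the sketch reuses one function for both. Pearson squared follows the text's description (uncentered, then squared).

```python
import math

def _mean(x):
    return sum(x) / len(x)

def covariance(u, v):
    # cov(u, v) = 1/(m - 1) * sum((u_i - mean_u) * (v_i - mean_v))
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def pearson(u, v):
    return covariance(u, v) / math.sqrt(covariance(u, u) * covariance(v, v))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def pearson_uncentered(u, v):
    # Deviations from zero instead of from the mean: same value as cosine.
    return cosine(u, v)

def pearson_squared(u, v):
    # As described in the text: uncentered Pearson, then squared.
    return pearson_uncentered(u, v) ** 2
```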
65. ... gene across all loaded Affymetrix files is used as the denominator for that spot. If Reference is selected, a reference Affymetrix file, chosen in the file selector at the bottom of the dialog, is used: the intensity value of each record in the reference file is used as the denominator of the ratio calculation for the corresponding spot in each of the loaded data files.
[Screenshot: Expression File Loader with dChip/DFCI Core Format selected, a file browser, and Affymetrix Data Options (Absolute, Mean Intensity, Reference)]
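A sketch of the two ratio options. It assumes plain intensity ratios with no log transform applied, and it returns NaN where the denominator is zero. Both choices are assumptions for illustration.

```python
def mean_intensity_ratios(matrix):
    # matrix[f][g]: intensity of spot g in loaded file f.
    # Denominator: mean intensity of that spot across all loaded files.
    n_files = len(matrix)
    means = [sum(column) / n_files for column in zip(*matrix)]
    return [[x / m if m else float("nan") for x, m in zip(row, means)]
            for row in matrix]

def reference_ratios(matrix, reference):
    # reference[g]: intensity of spot g in the selected reference file.
    return [[x / r if r else float("nan") for x, r in zip(row, reference)]
            for row in matrix]

data = [[2.0, 4.0], [6.0, 4.0]]
print(mean_intensity_ratios(data))
print(reference_ratios(data, [2.0, 8.0]))
```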
66. Figure 11.21.1: LEM viewer with fixed-length locus arrows.
Requirements for LEM Construction:
• Locus Identifier: an annotation key that maps spots to loci.
• Coordinate Information: 5' and 3' coordinates that correspond to the set of loci.
• Chromosome Information: an annotation key indicating the chromosome on which each locus is located. This information is not needed for organisms that have only one chromosome.
The locus identifier should be an annotation field loaded with the annotation during the expression file load in MeV. If the input data is in .mev format, the locus identifier should be a field in the corresponding MeV annotation (.ann) file. For oth
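The three requirements map naturally onto a small record per locus. This is a sketch only: the annotation key names (`ORF`, `START5`, `END3`) are hypothetical. So are inferring arrow direction from whether the 3' coordinate is larger than the 5' coordinate, and the placeholder chromosome used when no chromosome key is supplied.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    locus_id: str
    five_prime: int
    three_prime: int
    chromosome: str

    @property
    def forward(self):
        # Arrow direction: transcription runs from the 5' to the 3' coordinate.
        return self.three_prime >= self.five_prime

def build_loci(rows, id_key, five_key, three_key, chrom_key=None):
    loci = {}
    for row in rows:
        # Single-chromosome organisms need no chromosome key.
        chrom = row[chrom_key] if chrom_key else "1"
        loci[row[id_key]] = Locus(row[id_key], int(row[five_key]),
                                  int(row[three_key]), chrom)
    # Order by chromosome (compared as strings here), then by start position.
    return sorted(loci.values(),
                  key=lambda l: (l.chromosome, min(l.five_prime, l.three_prime)))

rows = [{"ORF": "SP0002", "START5": "1500", "END3": "1200"},
        {"ORF": "SP0001", "START5": "100", "END3": "900"}]
loci = build_loci(rows, "ORF", "START5", "END3")
print(loci)
```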
67. ... intensity lower cutoff. To enable this option, check the Enable Lower Cutoff Filter checkbox just below the Set Lower Cutoffs menu option; uncheck it to disable the option. All subsequent analyses will include only those genes whose intensity is above the specified threshold. This option is disabled by default.
Figure 5.4.2: Lower Cutoff Filter Dialog for One-Color Microarray.
Percentage Cutoffs: Select Use Percentage Cutoffs to ignore genes that do not have enough valid (non-zero) expression values across all samples. This does not delete any data; it only excludes the genes from analysis. The option is sometimes useful for speeding up module calculations, since many zeros often slow them down. To determine which genes will be excluded, select Set Percentage Cutoffs and enter a percentage value. To enable this option, check the Use Percentage Cutoffs checkbox just below the Set Percentage Cutoffs menu option; uncheck it to disable the option. Genes with less than the specified percentage of non-zero values will be ignored. A value of 0.0 means that all genes are used in the analysis; to require that every one of a gene's expression values be valid for the gene to be included, set the value to 100. This option is disabled by default.
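The percentage cutoff can be sketched as below. The sketch treats missing, NaN, or zero values as invalid and keeps a gene when its percentage of valid values is at least the cutoff. The function names and example matrix are illustrative.

```python
import math

def percent_valid(values):
    # A value counts as valid when it is present, not NaN, and non-zero.
    valid = sum(1 for v in values
                if v is not None and not math.isnan(v) and v != 0)
    return 100.0 * valid / len(values)

def genes_kept(matrix, cutoff_percent):
    # Genes below the cutoff are excluded from analysis, not deleted.
    return [i for i, row in enumerate(matrix)
            if percent_valid(row) >= cutoff_percent]

matrix = [[1.2, 0.8, 2.1, 0.5],            # 100% valid
          [0.0, 0.8, 2.1, 0.5],            # 75% valid
          [0.0, 0.0, float("nan"), 0.5]]   # 25% valid
print(genes_kept(matrix, 50))
```

A cutoff of 0.0 keeps every gene, and 100 keeps only genes with no invalid values, matching the two extremes described above.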
68. ... is surveyed for appropriate clustering results to apply to the GDM. Applying a clustering result reorders the genes so that the rows and columns are grouped by cluster membership, with cluster boundaries represented by a white border. Figure 11.22.4 shows a distance matrix with a five-cluster K-means clustering result imposed. The elements within a cluster are similar, as evidenced by the very dark squares on the main diagonal. Note that the element size was reduced in order to view the entire matrix, and that every third gene was displayed. Change Annotation: selects an annotation type to be displayed in the headers. Change Annotation Width: expands or contracts the header, so that it can be viewed without excessive scrolling or, if contracted, so that more of the matrix is visible.
Figure 11.22.4: GDM matrix with a KMC clustering result imposed (k = 5).
Additional GDM Features: Clicking a spot displays an information record describing several attributes of the element. The anno
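Imposing a clustering result on the GDM amounts to permuting rows and columns by cluster label and noting where each new cluster begins. The sketch below is illustrative. It assumes one integer label per element and keeps the original order within a cluster.

```python
def reorder_by_cluster(dist, labels):
    # Order elements by cluster label (stable within a cluster).
    order = sorted(range(len(labels)), key=lambda i: (labels[i], i))
    reordered = [[dist[i][j] for j in order] for i in order]
    # Positions where a new cluster starts: where a white border would be drawn.
    boundaries = [k for k in range(1, len(order))
                  if labels[order[k]] != labels[order[k - 1]]]
    return order, reordered, boundaries

dist = [[0, 5, 1], [5, 0, 6], [1, 6, 0]]
order, reordered, boundaries = reorder_by_cluster(dist, [1, 0, 1])
print(order, reordered, boundaries)
```

Elements 0 and 2 share a cluster and end up adjacent, so their small mutual distance sits in a block on the diagonal.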
69. ... locus graph in a different color, as indicated by the key. The locus selection graph updates to show the graphs for the loci selected in the table section of the window. Shift-clicking or Ctrl-clicking in the table selects multiple rows and overlays the expression graphs of all selected loci.
Figure 11.21.9: Locus Selection List (4 loci selected) with visible locus graph and key.
Storing Selected Loci as Clusters: MeV's cluster manager contains many cluster set utilities and cluster operations, such as unions and intersections. Loci of interest can be stored as a cluster in the manager by selecting the Store Cluster option from the LEM right-click menu. As usual, the user will be prompted to assign an optional label and description to the cluster, and for a color to associate with the elements of the cluster so that they can be tracked during analysis. When storing a set of loci to a cluster, the elements of the cluster are actually the spots on
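Because a stored locus cluster actually contains spots, storing loci means collecting every spot whose locus identifier is among the selected loci. This sketch assumes one locus identifier per spot. The identifiers in the example are only sample values.

```python
from collections import defaultdict

def spots_for_loci(spot_locus_ids, selected_loci):
    # spot_locus_ids[i] is the locus identifier annotated on spot i.
    by_locus = defaultdict(list)
    for spot_index, locus in enumerate(spot_locus_ids):
        by_locus[locus].append(spot_index)
    return sorted(i for locus in set(selected_loci)
                  for i in by_locus.get(locus, []))

spots = spots_for_loci(["SP0042", "SP0043", "SP0042", "SP0200"],
                       ["SP0042", "SP0200"])
print(spots)
```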
70. ... managing microarray experimental conditions and data, converting scanned slide images into numerical data, normalizing the data, and finally analyzing the normalized data. These tools are all OSI-certified open source (see section 12) and are freely available through the TM4 website, http://www.tm4.org. The Microarray Data Manager (MADAM) is a data management tool used to upload, download, and display a wide range of microarray data stored in a MySQL database management system. As an interface to MySQL, MADAM allows scientists and researchers to electronically record, capture, and administer annotated gene expression and experiment data to be shared with, and ultimately used by, others within the scientific community. TIGR Spotfinder is image processing software for analyzing the image files generated in microarray expression studies; it uses a fast and reproducible algorithm to identify the spots on the array and quantify expression levels. The Microarray Data Analysis System (MIDAS) lets the user normalize, trim, and analyze raw experimental data using statistical methods, and creates output for MeV. The MultiExperiment Viewer (MeV) is an application for viewing processed microarray slide representations and identifying genes and expression patterns of interest. Slides can be viewed one at a time in detail or in groups for comparison p
71. ... none at all. One particularly interesting feature of this algorithm is that it associates genes whose expression levels change by a similar magnitude across experiments but in the opposite direction: a gene with a given expression pattern across a series of experiments will be clustered with other genes whose expression pattern is its exact opposite.
Figure 11.11.1: Gene Shaving initialization dialog.
Parameters: Sample Selection: indicates whether to cluster genes or samples. Number of Clusters: an integer value indicating the number of clusters to produce. Note that in the GSH algorithm the clusters are not necessarily disjoint sets; some elements may be represented in more than one cluster, while other elements may not be represented at all. Number of Permuted Matrices: an integer value indicating the number of permuted matrices used to generate an average measure of cluster variance, which is used to compute the gap statistic. Number of Permutations/Matrix: an integer value representing the number of alterations made to each permuted matrix to generate the gap statistic. Hierarchical Clustering: This check bo
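A simplified gap computation for one candidate cluster is sketched below. It is an assumption, not MeV's code. R² is taken as the between-sample variance of the cluster mean divided by the total (between plus mean within-column) variance. The gap is the observed R² minus the average R² over matrices whose rows were independently shuffled. The separate "permutations per matrix" count is not modeled.

```python
import random
import statistics

def r_squared(cluster):
    # cluster: list of gene rows, each a list of sample values.
    columns = list(zip(*cluster))
    v_within = statistics.fmean(statistics.pvariance(col) for col in columns)
    v_between = statistics.pvariance([statistics.fmean(col) for col in columns])
    total = v_within + v_between
    return v_between / total if total else 0.0

def gap_statistic(cluster, n_permuted_matrices=20, seed=0):
    rng = random.Random(seed)
    permuted_scores = []
    for _ in range(n_permuted_matrices):
        permuted = []
        for row in cluster:
            shuffled = list(row)
            rng.shuffle(shuffled)          # permute each gene's values independently
            permuted.append(shuffled)
        permuted_scores.append(r_squared(permuted))
    return r_squared(cluster) - statistics.fmean(permuted_scores)

coherent = [[1, 2, 3, 4, 5]] * 4           # four identical (perfectly coherent) genes
print(r_squared(coherent), gap_statistic(coherent))
```

A large positive gap suggests the cluster is more coherent than chance alone would produce.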
72. ... of loci can be visualized. The expression values from all spots on the slide that map to a particular locus are averaged to produce a single value for that locus. The map is highly customizable in appearance and has fine and coarse navigational controls to assist in stepping through the genome or chromosome being viewed. LEM features also include options to view locus detail, to view locus-related spot annotation, and to link to web resources for additional information about a selected locus. General Appearance: The LEM organizes loci by chromosomal location; loci are represented as arrows that indicate the direction of transcription. Initially the LEM is configured with fixed locus arrow lengths and fixed-length open areas between sampled loci. The header indicates the sample related to each column of loci on the map. The arrow colors represent the mean expression for the locus in the measured sample; the initial color is taken from a gradient within the range displayed in the header. The annotation for each locus includes the 5' location, the 3' location, and the selected locus identifier. The last column is reserved for a user-selected annotation field that can be chosen from the Display menu of MeV (see manual section 6.2, Selecting Gene Annotation).
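The per-locus value is an average over the spots mapped to that locus. Skipping missing or NaN spot values in the sketch below is our assumption, since the text does not say how MeV handles them.

```python
import math
from collections import defaultdict

def locus_means(spot_values, spot_locus_ids):
    # Average all spot values that map to the same locus identifier.
    sums, counts = defaultdict(float), defaultdict(int)
    for value, locus in zip(spot_values, spot_locus_ids):
        if value is None or math.isnan(value):
            continue  # assumption: missing values do not count toward the mean
        sums[locus] += value
        counts[locus] += 1
    return {locus: sums[locus] / counts[locus] for locus in counts}

means = locus_means([1.0, 3.0, 2.0, float("nan")], ["A", "A", "B", "B"])
print(means)
```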
73. ... on the tree. This representation of the tree persists unless the dialog is dismissed by clicking Cancel. The distance threshold can be entered into a text field or adjusted with a slider over the maximum range of inter-node distances. The number of terminal nodes (clusters) at the current distance threshold is displayed in the upper right quadrant of the dialog.
Figure: HCL Tree Configuration Dialog.
The Create Cluster Viewers option creates viewers based on the distance threshold. It collects the groups of elements falling below terminal nodes in the tree at the current distance threshold. These clusters of elements are represented as nodes in the result navigation tree under the TEASE result node; the results are added once the TEASE Tree Configuration dialog has been dismissed. The minimum and maximum pixel heights impose limits on the minimum and maximum displayed inter-node distances, which alters the appearance of the tree. The Apply Dimensions button applies the entered tree dimensions to the TEASE tree, which allows one to fine-tune the tree's
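Cutting a tree at a distance threshold can be sketched with a union-find over the merge history. This assumes the tree is given as a list of merges in agglomeration order with non-decreasing heights. The merge-list encoding is our assumption, not MeV's internal format.

```python
def clusters_at_threshold(n_leaves, merges, threshold):
    """merges: (node_a, node_b, height) in agglomeration order, where nodes
    below n_leaves are leaves and node n_leaves + k is the k-th merge."""
    parent = list(range(n_leaves + len(merges)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for k, (a, b, height) in enumerate(merges):
        if height <= threshold:            # only joins below the threshold apply
            new = n_leaves + k
            parent[find(a)] = new
            parent[find(b)] = new
    groups = {}
    for leaf in range(n_leaves):
        groups.setdefault(find(leaf), []).append(leaf)
    return list(groups.values())

merges = [(0, 1, 0.2), (2, 3, 0.5), (4, 5, 1.0)]
print(clusters_at_threshold(4, merges, 0.6))
```

The number of groups returned corresponds to the terminal-node count reported in the dialog.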
74. ... permit customization of the color representation for that sample's graph. In practice the overlay mode is most practical for viewing a few samples at a time, and it is often useful to zoom in on particular regions of interest.
Figure 11.21.11: Overlay View.
Graph Customization Options: Below the viewer mode entry in the right-click menu there is a Customize Graph option, which allows the y-axis range to be set and provides other graph rendering options. Two options are provided for y scaling. The initial default setting retrieves the range from the color scale range in MeV's Display menu; a custom range option with a tick interval provides a greater level of control. An optional x-axis line can be drawn at a specified y value, which can be a good reference line when dealing with log ratio data. Formatting options include selectable line style and color.
[Dialog: Y Range and X Axis Parameters, with Y Range Options (display menu range, or custom minimum, maximum, and tick interval) and X Axis Display Options (show x-axis line, crossing value, line style, line color)]
Figure 11.21.12: Graph Customization.
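For the custom y range, tick positions follow from the minimum, maximum, and tick interval. This helper is hypothetical. It counts ticks with a small tolerance so that floating-point division does not drop the last tick, and it rounds away representation noise.

```python
def y_ticks(minimum, maximum, interval):
    if interval <= 0 or maximum < minimum:
        raise ValueError("need interval > 0 and maximum >= minimum")
    count = int((maximum - minimum) / interval + 1e-9)  # tolerance for float error
    return [round(minimum + k * interval, 10) for k in range(count + 1)]

print(y_ticks(-3.0, 3.0, 1.0))
```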
75. ... the permuted data d values. The user can change the value of the tuning parameter delta using either the slider bar or the text input field below the plot. Delta is a vertical distance, in graph units, from the solid line of slope 1 (i.e., where observed = expected). The two dotted lines mark the region within delta units of the observed = expected line. Genes whose plot values are represented by black dots are considered non-significant; those colored red are positive significant, and the green ones are negative significant. The user can also choose to apply a fold change criterion for the two-class paired and unpaired designs. In this case, in addition to satisfying the delta criterion, a gene must also satisfy the following condition to be considered significant: for a given fold change F (F > 1), Mean(unlogged group B values) / Mean(unlogged group A values) > F for positive significant genes, or < 1/F for negative significant genes.
[Plot: SAM plot of observed (y axis) versus expected (x axis) d values, with the delta slider below]
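The sketch below is a simplified per-gene version of the band rule and fold change criterion described above. The published SAM procedure derives cutoffs from the first gene that leaves the band, so its calls can differ from this simplification. The input values are assumed to be log2 data, which we "unlog" with 2 ** x.

```python
def sam_call(observed_d, expected_d, delta):
    # Simplified band rule: outside the observed = expected +/- delta band.
    diff = observed_d - expected_d
    if diff > delta:
        return "positive"
    if diff < -delta:
        return "negative"
    return "non-significant"

def fold_change_ok(call, group_a_log2, group_b_log2, fold):
    # Ratio of unlogged means, group B over group A (log2 input assumed).
    mean_a = sum(2 ** x for x in group_a_log2) / len(group_a_log2)
    mean_b = sum(2 ** x for x in group_b_log2) / len(group_b_log2)
    ratio = mean_b / mean_a
    if call == "positive":
        return ratio > fold
    if call == "negative":
        return ratio < 1 / fold
    return False

print(sam_call(3.0, 1.0, 1.5), fold_change_ok("positive", [0, 0], [2, 2], 2.0))
```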
76. ... this label type will be displayed for the loaded experiments. Once a Label Key is entered, double-clicking the remaining cells lets you edit them and add appropriate labels for each sample. Reopening the editor lets you alter any label or label key visible in the table, except for the default primary label and key, which are shown in gray. Blank entries are allowed. Merging Sample Labels: Occasionally it is convenient to merge several attributes or labels to produce a more informative label for each sample. To merge labels, select two or more rows using Ctrl-left-click, then select the Merge Selected Rows menu item. A small list is presented that can be used to order the selected labels before the actual merge. Once attribute ordering is done, a new row is inserted in the table to display the merged labels and the merged label key.
Figure 6.1.2: Attribute Sorter for Merging Sample Attributes.
6.2 Selecting Gene Annotation: The Gene Row Label menu option in the Display menu selects a gene annotation field from among the loaded annotation types. Expression image viewers and other viewers that display gene annotation will be adjusted to display the selected annotation type.
6.3 Color Scheme Selection: The color scheme selection options pertain to the color gradients that are
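Merging label rows in a chosen order can be sketched as a join over each sample's attributes. The underscore separator is hypothetical (MeV may join differently). The keys Treatment and Diet appear in the figure's dialog, but the values are made up.

```python
def merge_labels(samples, ordered_keys, separator="_"):
    # samples: one dict per sample, mapping label key -> label (blanks allowed).
    merged_key = separator.join(ordered_keys)
    merged_values = [separator.join(sample.get(k, "") for k in ordered_keys)
                     for sample in samples]
    return merged_key, merged_values

samples = [{"Treatment": "ctrl", "Diet": "A"}, {"Treatment": "drug", "Diet": "B"}]
print(merge_labels(samples, ["Treatment", "Diet"]))
```

The order of `ordered_keys` plays the role of the attribute sorter: reversing it yields "Diet_Treatment" labels instead.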
77. ... topology. Dimension Y: This positive integer value determines the Y dimension of the resulting topology. Note that Dimension X times Dimension Y gives the number of clusters that will be produced. Iterations: This positive integer value indicates the total number of times the data set will be presented to the network (map graph); each expression element will be presented this number of times to train the nodes. Alpha: This value scales the alteration of SOM vectors when a new expression vector is associated with a node. Radius: When using the bubble neighborhood, this float value defines the extent of the neighborhood. If an SOM vector is within this distance of the winning node (the cluster to which an element has been assigned), that node is considered to be in the neighborhood and its SOM vector is adapted. Initialization: Random Genes or Random Samples indicates that the initial SOM vectors will be selected at random from actual elements in the data. Random Vector indicates that the initial SOM vectors will be constructed as random vectors generated to reflect the magnitude of the data set; these initial vectors are not actual expression vectors in the data set. Neighborhood: The neighborhood options indicate the conventions (formulas) used to update (adapt) an SOM vector once an expression vector has been added to a node's neighborhood. Bubble: This option uses the provide
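A single training step with a bubble neighborhood is sketched below. We assume the radius is measured as Euclidean distance between node positions on the map grid, and that the winner is the node whose vector is closest in Euclidean distance. Nodes inside the radius move a fraction alpha of the way toward the input.

```python
import math

def som_bubble_step(nodes, grid, x, alpha, radius):
    """nodes[k]: SOM vector of node k; grid[k]: its (col, row) map position."""
    winner = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], x))
    for k, vec in enumerate(nodes):
        if math.dist(grid[k], grid[winner]) <= radius:   # inside the bubble
            nodes[k] = [w + alpha * (xi - w) for w, xi in zip(vec, x)]
    return winner

nodes = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]
grid = [(0, 0), (1, 0), (2, 0)]
winner = som_bubble_step(nodes, grid, [1.0, 1.0], alpha=0.5, radius=1.0)
print(winner, nodes)
```

Repeating this step for every element, Iterations times, with alpha and radius usually shrinking, gives the training loop.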
78. user will be required to start the Rserve server (<runningRserve>) each and every time he or she wants to do analysis using MeV-R.
2 Installing/Updating R
Installing under OS X: Build from Source
If you need the latest version and a precompiled binary hasn't yet been created:
1. Download the latest version (toward the bottom of the screen; R-2.3.0.tar.gz at the time of this writing) from http://cran.r-project.org/src/base/R-2/
2. Be sure that the downloaded file is not in a folder whose name contains a space, such as "My Downloads"; the build will not work. Open a terminal window and cd to the folder containing the file.
3. Unpack it by typing tar zxvf R-2.3.0.tar.gz (or whatever the filename is).
4. cd into the unpacked directory (R-2.3.0 in this case). This directory should contain the make and configure files.
5. If you're using OS X 10.4 (Tiger), issue the command sudo gcc_select 3.3 to force the use of gcc 3.3.
6. Issue the command ./configure
7. Issue the command make
8. Issue the command sudo chmod -R g+w /Library/Frameworks/R.framework to change file permissions.
9. Issue the command sudo make install
Installing under OS X: Precompiled Binary Version
This is the easiest way, but there is often a lag between when the latest version is available and when a precompiled binary is available.
1. Download R: get the MacOS X precompiled Binary Distribution from http://cran.fhcrc.org
2. Install R by extracting the downloa
79. value was used, which terminated the process earlier.
Constraints: This check box applies limits to the weights produced during training.
Positive Constraint: The upper limit on produced weights.
Negative Constraint: The lower limit on produced weights.
Distance Metric: Dot product using normalized expression vectors, so that the norm of each vector is 1. This metric is fixed for this algorithm and does not follow the selection in the Distance menu.
The SVC file format is a tab-delimited text file with the following columns for each element:
1. Index: a sequential integer index
2. Classification: an integer value indicating class membership (1 = in the initial classification, 0 = neutral, -1 = out of the initial classification)
3. Optional annotation columns
The SVM Classification Editor (Fig. 11.18.4) lets you assign membership to the initial presumptive classification by searching the supplied annotation or by loading SVC files. The editor allows the user to sort the list based on classification or annotation fields. The constructed initial classification can be stored in SVC format and later reloaded. This allows alterations that can produce several initial classifications for a given study. Once created, SVC files can be used to supply the initial classification, thereby skipping the editor step. If the editor is used, a button or menu selection launches the algorithm based on the current classification selection.
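The following Python reader is a sketch of the SVC layout described above. It returns one record per element. It assumes tab-separated text. It treats any line whose first field is not an integer as a header, because the manual does not say whether SVC files carry one. `parse_svc`, `SvcRow` and `class_counts` are hypothetical names.

```python
import csv
import io
from typing import Dict, List, NamedTuple


class SvcRow(NamedTuple):
    index: int
    classification: int  # 1 = in, 0 = neutral, -1 = out of the initial classification
    annotation: List[str]


def parse_svc(text: str) -> List[SvcRow]:
    """Parse tab-delimited SVC text: Index, Classification, optional annotation."""
    rows: List[SvcRow] = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        try:
            index = int(fields[0])
        except ValueError:
            continue  # assumed header line; the manual does not say whether one exists
        classification = int(fields[1])
        if classification not in (1, 0, -1):
            raise ValueError(f"element {index}: classification must be 1, 0 or -1")
        rows.append(SvcRow(index, classification, fields[2:]))
    return rows


def class_counts(rows: List[SvcRow]) -> Dict[int, int]:
    """Number of elements in, neutral to, and out of the initial classification."""
    counts = {1: 0, 0: 0, -1: 0}
    for row in rows:
        counts[row.classification] += 1
    return counts


svc_text = "Index\tClassification\tName\n0\t1\tgeneA\n1\t-1\tgeneB\n2\t0\tgeneC\n"
rows = parse_svc(svc_text)
print(class_counts(rows))  # {1: 1, 0: 1, -1: 1}
```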
80. which reports the sizes of each cluster.
11.12 Figures of Merit (Yeung et al., 2001)
The Figure of Merit (FOM) is, in concept, a measure of fit of the expression patterns for the clusters produced by a particular algorithm. MeV's FOM implementation provides FOM results for the K-Means/K-Medians (KMC) and CAST clustering algorithms. Each algorithm is initialized by selecting either the K-Means/K-Medians tab or the CAST tab.
A figure of merit is an estimate of the predictive power of a clustering algorithm. It is computed in three steps. Each sample is removed from the data set in turn. Genes are clustered using the remaining data. The fit of the withheld sample to the clustering pattern obtained from the other samples is then calculated. The lower the adjusted FOM value, the higher the predictive power of the algorithm.
The Maximum number of clusters input field under the K-Means/K-Medians tab in the initialization box determines how many times FOM values are calculated for the k-means/k-medians algorithm. Each time, the number of clusters computed is increased by one, starting with one cluster in the first iteration. The Interval input field under the CAST tab allows the user to specify the increase in threshold affinity in successive iterations of CAST. If the Take Average box is checked, in case there is more than one clustering outcome for a given number of cluster
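The leave-one-sample-out procedure can be sketched in Python as follows. For each withheld sample, the sketch takes the root mean squared deviation of that sample's values from their cluster means. For the adjusted value it divides by sqrt((n - k)/n), and it sums the results over all samples. This formula is our reading of Yeung et al. (2001), not code taken from MeV. `split_by_mean` is a toy stand-in for KMC or CAST, used only to keep the example self-contained.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = genes, columns = samples
ClusterFn = Callable[[List[List[float]], int], List[int]]


def figure_of_merit(data: Matrix, k: int, cluster_fn: ClusterFn,
                    adjusted: bool = True) -> float:
    """Aggregate FOM over all withheld samples, after Yeung et al. (2001)."""
    n = len(data)       # number of genes
    m = len(data[0])    # number of samples
    if adjusted and n <= k:
        raise ValueError("adjusted FOM needs more genes than clusters")
    total = 0.0
    for e in range(m):
        # Cluster the genes using every sample except the withheld sample e.
        remaining = [[v for j, v in enumerate(row) if j != e] for row in data]
        labels = cluster_fn(remaining, k)
        # Squared deviation of the withheld values from their cluster means.
        sq = 0.0
        for c in set(labels):
            members = [data[g][e] for g in range(n) if labels[g] == c]
            centre = mean(members)
            sq += sum((x - centre) ** 2 for x in members)
        fom_e = math.sqrt(sq / n)
        if adjusted:
            fom_e /= math.sqrt((n - k) / n)  # correction for the number of clusters
        total += fom_e
    return total


def split_by_mean(rows: List[List[float]], k: int) -> List[int]:
    """Stand-in clusterer (not KMC or CAST): rank genes by mean, cut into k groups."""
    order = sorted(range(len(rows)), key=lambda g: mean(rows[g]))
    labels = [0] * len(rows)
    for rank, g in enumerate(order):
        labels[g] = rank * k // len(rows)
    return labels


data = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [5.0, 5.0, 5.0]]
for k in (1, 2):
    print(k, figure_of_merit(data, k, split_by_mean))
```

With two clearly separated groups of genes, FOM drops to zero at k = 2. This is the lower-is-better behavior described above.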
81. which the nodes above the selected node (up to the root node) and below it are all selected. The Select Ancestors option forces path selection to extend from the selected node up to the root node of the tree. The Select Successors option forces path selection to extend from the selected node down the tree.
Create Subset Viewer: Extracting subsets of nodes from the main hierarchy is perhaps the most important feature of this viewer. After selecting a tree path, you can use this option to build a new GO Hierarchy Viewer from the nodes in the selected path. The extracted trees can be rendered in a new window or docked in MeV's main viewer panel. If you dock the new tree in MeV's viewer panel, a node is placed in the result tree so that the subtree can be saved with the analysis.
Open Viewer: The Open Viewer option provides a shortcut to jump from a node of interest to a viewer containing all of the cluster's genes that are related to the selected theme node.
11.27 FOM (Figure of Merit) dialog: Sample Selection (Gene Cluster / Experiment Cluster); FOM Iteration Selection (Number of FOM Iterations: 1); K-Means/K-Medians and CAST tabs with Calculate means / Calculate medians, Maximum number of clusters (enter an integer > 0): 20, and Maximum number of iterations (enter an integer > 0): 50. K-Means/K-Medians will be run using a starting K number of cluster
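Path selection can be pictured as graph traversal. The sketch below is a hypothetical illustration, not MeV code. It stores the hierarchy as a child-to-parents map, since GO terms may have several parents. It then collects ancestors, successors, or both with a breadth-first walk.

```python
from collections import deque
from typing import Dict, List, Set

Adjacency = Dict[str, List[str]]  # node -> list of linked nodes


def _reachable(start: str, edges: Adjacency) -> Set[str]:
    """Breadth-first set of nodes reachable from `start`, including `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def _children(parents: Adjacency) -> Adjacency:
    """Invert the child -> parents map into parent -> children."""
    children: Adjacency = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    return children


def select_ancestors(node: str, parents: Adjacency) -> Set[str]:
    return _reachable(node, parents)


def select_successors(node: str, parents: Adjacency) -> Set[str]:
    return _reachable(node, _children(parents))


def select_path(node: str, parents: Adjacency) -> Set[str]:
    """Nodes above (up to the root) and below the selected node."""
    return select_ancestors(node, parents) | select_successors(node, parents)


parents = {"A": ["root"], "B": ["A"], "C": ["B"], "D": ["A"]}
print(sorted(select_path("B", parents)))  # ['A', 'B', 'C', 'root']
```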
82. [Screenshot: TTEST gene statistics table with its right-click menu (e.g. Sort table in original gene order, Link to URL)]
Figure 11.14.3: TTEST Gene Statistics Table Viewer
Another non-standard TTEST viewer is the volcano plot viewer. This plot shows, for each gene, the difference between the means of groups A and B plotted against the negative log10 p-value associated with its t-value. A volcano plot gives an intuitive visual sense of how the mean differences between groups relate to the statistical significance of those differences across the data set as a whole. Right-clicking on this plot brings up options for toggling the reference lines on and off, selecting genes from the plot using slider bars, storing these selected genes as clusters, and projecting cluster colors from previous analyses onto the volcano plot.
[Screenshot: TIGR Multiple Array Viewer main window (menus File, Adjust Data, Normalization, Distance, Analysis, Display, Sort, Help) with result nodes Expression Graphs, Significant Genes, Non-significant Genes, All Genes]
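Each point of the volcano plot pairs a mean difference with a transformed p-value. The following minimal Python sketch assumes that the x-axis is mean(A) minus mean(B). It also assumes the per-gene p-values already come from the t-test. The threshold-based `select_genes` helper is hypothetical and only approximates the slider-bar selection.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def volcano_point(values_a: Sequence[float], values_b: Sequence[float],
                  p_value: float) -> Tuple[float, float]:
    """(mean(A) - mean(B), -log10 p) for one gene; p comes from the t-test."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return mean(values_a) - mean(values_b), -math.log10(p_value)


def select_genes(points: Sequence[Tuple[float, float]], min_abs_diff: float,
                 min_neg_log_p: float) -> List[int]:
    """Indices of genes beyond both thresholds (hypothetical slider analogue)."""
    return [i for i, (diff, nlp) in enumerate(points)
            if abs(diff) >= min_abs_diff and nlp >= min_neg_log_p]


points = [volcano_point([2.0, 4.0], [1.0, 1.0], 0.01),   # roughly (2.0, 2.0)
          volcano_point([1.0, 1.2], [1.1, 0.9], 0.8)]    # roughly (0.1, 0.097)
print(select_genes(points, min_abs_diff=1.0, min_neg_log_p=1.3))  # [0]
```

Genes in the upper corners of the plot combine a large mean difference with a small p-value. These are the genes a two-threshold selection like this would pick out.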
83. 0 release coordination, graphics, documentation, CGH data analysis package, additional data loaders, other miscellaneous enhancements; Agilent data loader; USC; RAMA, Bridge, TEASE.
Contributors: Raktim Sinha; Alexander Sturn; Zlatko Trajanoski; Mark Snuffin; Aleksey Rezantsev; Dennis Popov; Alex Ryltsov; Edward Kostukovich; Igor Borisovsky; Stu Golub; Zaigang Liu; Jane Ruan; Minhas Siddiqui; Wei Liang; Jerry Li; Vasily Sharov; Joe White; Mathangi Thiagarajan; Tracey Currier; Eleanor Howe; Patrick Cahan; Tim McCaffrey.
Affiliations: Dana-Farber Cancer Institute; Institute of Biomedical Engineering, Graz University of Technology; DataNaut Inc.; Syntek Systems Corporation Inc.; The Institute for Genomic Research; The George Washington University.
Contributions: added aCGH-type data loaders; aCGH visualizing modules and aCGH analysis algorithms functionality to MeV, based on a published open-source tool and article (Adam A. Margolin et al., Bioinformatics 2005); initial system architecture and viewer design, initial module development (HCL, KMC, SOM, PCA, SVM); system architecture, module architecture, parallel processing system, primary module development (RN, TRN); primary module development (CAST, FOM, GSH, GDM); normalization algorithm development; TM4 website maintenance; additional development, software testing, documentation, support; Affymetrix data loader and filters; Affymetrix loader rework, gene su
84. [Screenshot: dChip File Loader with a list of available .txt files and Add, Add All, Remove and Remove All buttons]
Figure 4.12.1: dChip File Loader
4.13 Initial View of the Loaded Data
4.13.1 Main Expression Image
For each set of expression values loaded, a column is added to the main display. This display is an Expression Image viewer in which each column represents a single sample and each row represents a gene. The names of the samples are displayed vertically above each column, and any annotation field of interest from the input files can be displayed to the right of each row. MeV expects each loaded sample to have the same number of elements in the same order, so that each gene spot is aligned with the same element in every other loaded sample. For example, under this rule all input files have the data for gene x in row y. Clicking on a spot displays a dialog with detailed information about that spot. For more detail regarding the Expression Image viewers, see section 7.2.
[Screenshot: TIGR Multiple Array Viewer main window with the analysis toolbar (HCL, ST, SOTA, RN, KMC, KMS, CAST, QTC, GSH, SOM, FOM, PTM, TTEST, ...)]
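The alignment rule (gene x sits in row y of every loaded file) can be checked when the matrix is assembled. The Python sketch below is illustrative only. It assumes each sample arrives as an ordered list of (gene identifier, value) pairs. MeV's loaders are not documented to work this way internally.

```python
from typing import Dict, List, Sequence, Tuple


def build_expression_matrix(
        samples: Dict[str, Sequence[Tuple[str, float]]]
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Combine per-sample (gene_id, value) lists into a genes x samples matrix.

    Enforces the rule that every sample lists the same genes in the same order.
    """
    names = list(samples)
    if not names:
        raise ValueError("no samples loaded")
    reference = [gene for gene, _ in samples[names[0]]]
    for name in names[1:]:
        genes = [gene for gene, _ in samples[name]]
        if genes != reference:
            raise ValueError(f"sample {name!r} is not aligned with {names[0]!r}")
    # Row y holds gene reference[y]; column j holds sample names[j].
    matrix = [[samples[name][y][1] for name in names] for y in range(len(reference))]
    return names, reference, matrix


names, genes, matrix = build_expression_matrix({
    "sample1": [("gene1", 0.5), ("gene2", -1.2)],
    "sample2": [("gene1", 0.7), ("gene2", -0.9)],
})
print(matrix)  # [[0.5, 0.7], [-1.2, -0.9]]
```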
85. [Example dChip file listing AFFX control probe sets (e.g. AFFX-LysX-5_at, AFFX-PheX-M_at) with their values]
dChip File Format
15.11 SAM Group or Class Loading File Format
1. Two-class unpaired: The loading file is a tab-delimited file. If a sample belongs to group A, define it as 1; if it belongs to group B, define it as 2; if it belongs to neither group, define it as 3.
2. Two-class paired: The loading file is a tab-delimited file. Samples are numbered starting from 0, and each line defines one pair. For example, the first and second samples form one pair, the third and fourth samples form another pair, and so on.
3. Multi-class: The loading file is a tab-delimited file. Fo
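The following Python sketch shows how the two SAM grouping layouts might be read. The manual does not show the exact columns, so the sketch makes two assumptions. First, each unpaired line holds a sample name followed by its group code; a line with a single code is also accepted, and its line position is used as the name. Second, each paired line holds two tab-separated 0-based sample indices. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

GROUP_CODES = {1: "A", 2: "B", 3: "neither"}


def parse_two_class_unpaired(text: str) -> Dict[str, List[str]]:
    """Assumed layout: one tab-delimited line per sample, group code in the last field."""
    groups: Dict[str, List[str]] = {"A": [], "B": [], "neither": []}
    lines = [line for line in text.splitlines() if line.strip()]
    for position, line in enumerate(lines):
        fields = line.split("\t")
        name = fields[0] if len(fields) > 1 else str(position)
        code = int(fields[-1])
        if code not in GROUP_CODES:
            raise ValueError(f"unknown group code {code} in line {line!r}")
        groups[GROUP_CODES[code]].append(name)
    return groups


def parse_two_class_paired(text: str) -> List[Tuple[int, int]]:
    """One pair of 0-based sample indices per line; no sample may appear twice."""
    pairs: List[Tuple[int, int]] = []
    used = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        first, second = (int(f) for f in line.split("\t")[:2])
        if first == second or first in used or second in used:
            raise ValueError(f"sample reused in pair {first}, {second}")
        used.update((first, second))
        pairs.append((first, second))
    return pairs


print(parse_two_class_paired("0\t1\n2\t3\n"))  # [(0, 1), (2, 3)]
```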
86. [RAMA results: table of numeric results]
TIGR MultiExperiment Viewer: RAMA Results, Table of results
Data Transformations
Log2 Transform: This is fairly self-evident: it takes the log2 transform of every element in the matrix. This adjustment should not usually be necessary. When .tav or other two-color intensity files are loaded into MeV, the program automatically computes the log2 ratio of the two intensities and uses it in the expression matrix. TDMS files also often contain pre-calculated log2 ratios.
Normalize Genes/Rows: This will transform values
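The log2 arithmetic is shown below as a small Python sketch. The channel order (Cy5 over Cy3) and the choice to return NaN for non-positive intensities are assumptions made for illustration. Applying Log2 Transform to values that are already log2 ratios would transform them twice.

```python
import math
from typing import List, Sequence


def log2_ratio(cy3: float, cy5: float) -> float:
    """log2(Cy5 / Cy3) for one spot; NaN if either intensity is not positive.

    Which channel is the numerator is an assumption made for this sketch.
    """
    if cy3 <= 0 or cy5 <= 0:
        return math.nan
    return math.log2(cy5 / cy3)


def log2_transform(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Element-wise log2, as in Log2 Transform; non-positive values become NaN."""
    return [[math.log2(v) if v > 0 else math.nan for v in row] for row in matrix]


print(log2_ratio(100.0, 400.0))  # 2.0
```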
87. [EASE results table: over-represented categories, including the KEGG pathway Energy Metabolism (Homo sapiens) and GO Biological Process terms such as purine ribonucleoside triphosphate biosynthesis, coenzyme biosynthesis, nucleoside triphosphate biosynthesis, ATP biosynthesis, inorganic cation transport and group transfer coenzyme metabolism, each with hit counts and a score]
88. [SAM Delta Table: numeric table of delta values and associated statistics]
Figure 11.15.3: SAM Delta Table Viewer
11.16 ANOVA: Analysis of Variance (Zar, 1999, pp. 178-182)
ANOVA is an extension of the t-test to more than two experimental conditions. It picks out genes that have significant differences in means across three or more groups of samples. Currently only one-way (single-factor) analysis of variance is implemented. The user is first asked to enter the number of groups. A sample grouping panel, similar to the t-test panel but with the appropriate number of groups, is then created. Samples can be assigned to any group or excluded from the analysis. F-statistics are calculated for each gene, and a gene is considered significant if the p-value associated with its F-statistic is smaller than the user-specified alpha or critica
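For a single gene, the one-way F statistic compares between-group and within-group variability. The Python sketch below computes F and its degrees of freedom from the textbook sums of squares. Converting F to a p-value, via the F distribution with k - 1 and N - k degrees of freedom, is left out to keep the example dependency-free. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, between-groups df, within-groups df) for one gene."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = mean(v for g in groups for v in g)
    # Between-groups sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```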
89. [Screenshot: R package installer window listing packages (e.g. rama), with Install Location options At System Level (R framework), At User Level and In Other Location (Will Be Asked Upon Installation), and Install Selected and Update All buttons]
Installing under OS X using the command line (RECOMMENDED)
1. Download the source package rama_1.3.0.tar.gz from http://www.bioconductor.org/packages/bioc/1.8/html/rama.html or bridge_1.3.1.tar.gz from http://www.bioconductor.org/packages/bioc/1.8/html/bridge.html
2. Open a Terminal window (found at Applications > Utilities > Terminal) and navigate to the downloaded file. For instance, if you downloaded the file to the desktop, type cd /Users/yourusername/Desktop and press Return; the prompt should now show the Desktop directory (e.g. MyComputer:~/Desktop username$). To install the package, type R CMD INSTALL rama_1.3.0.tar.gz (or R CMD INSTALL bridge_1.3.1.tar.gz).
Installing under Windows using the supplied Windows R GUI interface
1. (Optional) Choose a mirror: click Packages > Set CRAN mirror.
[Screenshot: RGui R Console showing the Packages menu (Set CRAN mirror, Select repositories, Install package(s), Update packages, Install package(s) from local zip files) over the R version 2.1.0 startup message]
91. [Example TDMS file: gene identifiers, names, GWEIGHT, GenBank accessions and expression ratios]
Figure 15.2.1: Tab Delimited Multiple Sample File (TDMS) Format
[Screenshot: Expression File Loader, "Load expression files of type Tab Delimited Multiple Sample Files (TDMS)", showing a directory tree, the available .txt files, the selected TDMS file and an Expression Table preview]
92. represents multiple clusters in which each cluster contains vectors that are similar. There is no clear ordering of results; generally, to act on this output, a selection algorithm should be used to select a cluster.
partition: output is a multi-cluster output where the clusters are ordered and cluster members have a particular shared quality, e.g. significant genes from a statistical algorithm, or elements partitioned by classification algorithms. -->
<!ELEMENT data-node EMPTY>
<!ATTLIST data-node data-node-id CDATA #REQUIRED name CDATA #REQUIRED>
<!ELEMENT primary-data (file-list)>
<!ATTLIST primary-data id CDATA #REQUIRED data-type (mev|tav|stanford|gpr|affy-abs|affy-ref|affy-mean) #IMPLIED>
<!-- want an enumeration of data types: mev, tav, stanford, affy, gpr -->
<!ELEMENT file-list (file+)>
<!ELEMENT file EMPTY>
<!ATTLIST file file_path CDATA #REQUIRED file_type (data|annot|preference) #REQUIRED>
18.2 DTD UML Schema
UML SCHEMA REPRESENTATION OF MEV SCRIPTING XML
The dashed lines represent possible references from data IDs to input data (IDREF). ? marks 0 or 1 element, + marks 1 or more elements and * marks 0 or more elements; unmarked links represent exactly one element.
[Schema diagram, beginning with the tm4 element and its version (CDATA) attribute]
93. [Screenshot: a .tav file with only the required columns]
Row  Column  MetaRow  MetaCol  SubRow  SubCol  Cy3 Int  Cy5 Int
1    1       1        1        1       1       1784877  1777587
1    2       1        1        1       2       47205    296114
1    3       1        1        1       3       443327   235098
1    4       1        1        1       4       0        0
1    5       1        1        1       5       99362    78752
1    6       1        1        1       6       128894   53126
1    7       1        1        1       7       103781   52196
1    8       1        1        1       8       194146   107295
1    9       1        1        1       9       275681   12977
1    10      1        1        1       10      102280   65244
Figure 15.1.1: A file containing only the required fields
[Screenshot: sample1.tav opened in Microsoft Excel, with columns Row, Column, MetaRow, MetaCol, SubRow, SubCol, Cy3 Int, Cy5 Int, Flag 1, Flag 2 and Ratio]
Figure 15.1.2: A TAV file with several extra fields
15.2 Tab Delimited Multiple Sample Files (TDMS files)
[Example TDMS file with plate, well, clone ID, GenBank accession and expression-ratio columns]
94. [Screenshot: TDMS loader with an Expression Table preview, the instructions "Click the upper leftmost expression value" and "Click the Load button to finish", and an Annotation Fields list]
Figure 15.2.2: Loading Tab Delimited Multiple Sample Files (TDMS Format)
15.3 GenePix Files
Microarray data in the GenePix file format can easily be converted to .tav file format using TIGR's GenePix Converter application. This program is freely available from the TIGR website as part of the MADAM package. The latest version of the converter can convert multiple files in batch mode; see the help files for the GenePix Converter for more details. Files converted with the Keep all Genepix data box checked will be extremely large and may cause MeV to run slowly. Leaving this box unchecked causes only the data required by MeV to be written to the output .tav file.
15.4 MEV Files
A MultiExperimentViewer (.mev) file is a tab-delimited
95. [Screenshot: GDM matrix with gene labels along both axes and the right-click menu open]
Figure 11.22.2: GDM showing borders and popup menu
Menu Options
Like most viewers in MeV, the GDM has a right-click menu that provides options for extracting information from the viewer and manipulating its appearance. Each of the menu options is described below.
Color Scheme: Two default color schemes are available, as well as the option to select a custom color scheme.
Element Size: Five preset element sizes are offered, as well as the option to select a custom size. Changing the size is a good way either to get a detailed look at specific genes or to take a broad survey of the matrix.
Draw Borders: The Draw Borders option places borders between elements. This can help with visual alignment when viewing the distance of one gene to several other genes on a single row of the matrix. The border color can be selected from the menu to contrast with the colors used to represent distance.
Set Color Scale: The GDM menu provides the option of selecting the limits of the displayed color scale. By altering the lower and upper limits
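One way to picture Set Color Scale is as a linear mapping of each distance onto a two-color gradient between the chosen limits, with values outside the limits saturating at the end colors. This Python sketch assumes exactly that. MeV's actual color schemes may use more gradient stops, and `scale_color` is a hypothetical name.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def scale_color(value: float, lower: float, upper: float,
                low_color: RGB, high_color: RGB) -> RGB:
    """Linearly map `value` onto a two-color gradient between `lower` and `upper`.

    Values outside [lower, upper] saturate at the end colors, so narrowing the
    limits spreads the gradient over a smaller range of distances.
    """
    if upper <= lower:
        raise ValueError("upper limit must exceed lower limit")
    t = (value - lower) / (upper - lower)
    t = min(1.0, max(0.0, t))
    return tuple(round(a + t * (b - a)) for a, b in zip(low_color, high_color))


print(scale_color(0.5, 0.0, 1.0, (0, 0, 0), (200, 100, 0)))  # (100, 50, 0)
```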
96. [One-way ANOVA F-Ratio Information viewer: per-gene table of numeric values]
Figure 11.16.2: One-way ANOVA F-Ratio Information viewer
11.17 TFA: Two-factor ANOVA (Keppel and Zedeck, 1989, pp. 183-196, 536-541; Manly, 1997, pp. 125-131; Zar, 1999, pp. 248-250)
Two-factor ANOVA can be used to find genes that vary significantly across the levels of two independent variables (factors), as well as genes affected by their interaction. The first initialization dialog prompts for the names and numbers of levels of the factors. An initialization dialog very similar to that for t-tests and one-way ANOVA is then displayed. The only difference is that the top panel of this d
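For a balanced design, with the same number of replicates in every cell, the three F-ratios of a two-factor ANOVA follow from the standard sums of squares. The Python sketch below assumes a balanced layout and omits p-values. It is a textbook illustration, not MeV's TFA code, which may also handle unbalanced designs.

```python
from statistics import mean
from typing import Dict, Sequence

Cells = Sequence[Sequence[Sequence[float]]]  # cells[i][j] = replicates at level i of A, j of B


def two_factor_anova(cells: Cells) -> Dict[str, float]:
    """F-ratios for factor A, factor B and the A x B interaction (balanced design only)."""
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if any(len(row) != b for row in cells) or any(len(c) != n for row in cells for c in row):
        raise ValueError("this sketch assumes a balanced design")
    if a < 2 or b < 2 or n < 2:
        raise ValueError("need >= 2 levels per factor and >= 2 replicates per cell")
    grand = mean(v for row in cells for c in row for v in c)
    mean_a = [mean(v for c in row for v in c) for row in cells]
    mean_b = [mean(v for row in cells for v in row[j]) for j in range(b)]
    mean_ab = [[mean(c) for c in row] for row in cells]
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    # Interaction: how far each cell mean departs from the purely additive model.
    ss_ab = n * sum((mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_within = sum((v - mean_ab[i][j]) ** 2
                    for i in range(a) for j in range(b) for v in cells[i][j])
    ms_within = ss_within / (a * b * (n - 1))
    return {
        "F_A": ss_a / (a - 1) / ms_within,
        "F_B": ss_b / (b - 1) / ms_within,
        "F_AB": ss_ab / ((a - 1) * (b - 1)) / ms_within,
    }


cells = [[[1, 3], [3, 5]],
         [[5, 7], [7, 9]]]
print(two_factor_anova(cells))  # {'F_A': 16.0, 'F_B': 4.0, 'F_AB': 0.0}
```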
97. [CGH browser: Log Ratios for chromosome 2 of all experiments, a chart above a table of probe IDs, chromosome, start and stop positions and per-experiment log ratios]
Clicking anywhere on the chart highlights the data point closest to the selection, as well as the corresponding row in the table. Selecting any number of rows in the table highlights the corresponding region in the chart. The View menu can be used to change annotations and display styles in the browser.
CGH Analysis
The CGH Analysis menu contains a number of algorithms for searching for data regions that are consistently altered throughout the experiments. These algorithms can be performed on probes (genes) and on data regions (minimal common regions of alteration).
Algorithms on Probes
The items CloneAmplifications, CloneDeletions, CloneAmplifications2Copy and CloneDeletions2Copy are used to search for probes that are commonly altered throughout the experiments. Click the CloneDeletions item. A subtree is added to the Analysis node of the navigation tree on the left side of the screen. Expanding this tree and selecting the Results node sets the main view to display a table showing the number and percentage of experimen
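The results table for CloneDeletions reports a count and a percentage of experiments for each probe. A minimal Python sketch of that tally follows. It assumes per-experiment copy-number calls have already been made and are coded as -1 (deletion), 0 and +1 (amplification). It does not model how MeV derives those calls or how the 2Copy variants differ. The function name and probe IDs are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-experiment calls: -1 = deletion, 0 = no change, +1 = amplification.
PREDICATES: Dict[str, Callable[[int], bool]] = {
    "deletion": lambda call: call < 0,
    "amplification": lambda call: call > 0,
}


def clone_alteration_table(calls: Dict[str, Sequence[int]],
                           kind: str = "deletion") -> List[Tuple[str, int, float]]:
    """Rows of (probe, experiments altered, percentage), most altered first."""
    if kind not in PREDICATES:
        raise ValueError("kind must be 'deletion' or 'amplification'")
    altered = PREDICATES[kind]
    table = []
    for probe, per_experiment in calls.items():
        count = sum(1 for c in per_experiment if altered(c))
        table.append((probe, count, 100.0 * count / len(per_experiment)))
    table.sort(key=lambda row: row[1], reverse=True)
    return table


calls = {"probe1": [-1, -1, 0, 0], "probe2": [0, 1, 0, -1]}
print(clone_alteration_table(calls))  # [('probe1', 2, 50.0), ('probe2', 1, 25.0)]
```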
98. [Table: Chromosome 1 deleted regions (start and end positions of each region)]
Select and annotate these five regions and display the CGH Position Graph viewer for chromosome 1 (Figure 3.4). The annotated data regions are represented by light blue rectangles on the right side of the display. This technique can be used to significantly reduce the size of the data regions determined for further investigation. Right-click on any of the blue rectangles and select Show Genes in Region to check whether there are any consistently deleted genes of interest (Figure 3.5). These are displayed in a tabular format.
[Screenshot: Multiple Array Viewer window (menus Adjust Data, Metrics, Analysis, Display)]
99. [Screenshot: gene cluster table with expression values and the right-click option Sort table in original gene order]
Figure 7.5.1: Gene cluster table view
Columns can be dragged horizontally across the table to change their relative ordering. Successive clicks on the header of any column sort the rows in ascending or descending order of the entries in that column. Sorting the Stored Color column brings together elements that have been stored with the same cluster color. Ctrl-clicking on any column header sorts the table in the original order of elements in that cluster; this option is also available from the right-click menu described below.
Right-clicking on the table view brings up a menu containing the options available from the right-click menus of other types of viewers, with a few important modifications and additions. As in the other types of viewers, users can store the entire cluster or launch it as a new MeV session. In addition, users can select a subset of rows in the table for these operations. Even individual elements can be stored and tracked one at a time. Contiguous row selections can be made by dragging the mouse over these rows or by selecting the first row
100. [Screenshot: CGH file loader with a preview table of marker IDs, positions and log ratios, the notes "First 4 columns MUST be MarkerID, Marker Chromosome, Marker Start, Marker End, appearing exactly in the same order", "Click the upper leftmost expression value" and "Click the Load button to finish", and Annotation Fields (CloneID, Ch, Start, End)]
The protocol for loading data from files is as follows:
1. Launch MeV (see section 1).
2. Click File > Load Data to launch the file-loading dialog.
3. Click Select > CGH to invoke the CGH loader.
4. Using the directory tree on the left, locate the directory that contains the data files. To load a sample data set included with the MeV distribution, navigate to the MeV installation directory, expand the Data directory and select the CGH sample data .txt file. Use the default settings for Species and Log status.
5. At the bottom of the window, click Load.
CGH Analyzer Viewers
The CGH Circle Vi
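The column rule shown in the loader (MarkerID, Marker Chromosome, Marker Start, Marker End, then data) can be validated before loading. This Python sketch is illustrative. It assumes a tab-delimited file with a single header line and one log-ratio column per sample. `parse_cgh_text` is a hypothetical helper.

```python
from typing import List, Tuple

Marker = Tuple[str, str, int, int, List[float]]


def parse_cgh_text(text: str) -> Tuple[List[str], List[Marker]]:
    """Parse tab-delimited CGH data whose first four columns are the marker fields.

    Column order follows the loader's rule: MarkerID, Marker Chromosome,
    Marker Start, Marker End, then one log-ratio column per sample.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    if len(header) < 5:
        raise ValueError("expected 4 marker columns plus at least one sample column")
    samples = header[4:]
    markers: List[Marker] = []
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"wrong number of columns in line {line!r}")
        marker_id, chromosome = fields[0], fields[1]
        start, end = int(fields[2]), int(fields[3])
        if start > end:
            raise ValueError(f"{marker_id}: start {start} is after end {end}")
        markers.append((marker_id, chromosome, start, end,
                        [float(v) for v in fields[4:]]))
    return samples, markers


text = ("MarkerID\tChromosome\tStart\tEnd\tS1\tS2\n"
        "m1\t1\t1232482\t1232602\t0.09\t-0.18\n")
samples, markers = parse_cgh_text(text)
```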
101. Image from the File menu. To print the image, selec
102. A, TRN, DAM and COA functions.
Referencing MeV
Users of this program should cite: Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J. TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003 Feb;34(2):374-8. http://www.tigr.org/software/tm4/menu/TM4_Biotechniques_2003.pdf
A note on non-Windows operating systems
Most of our MeV development and testing was performed on Windows operating systems. Although MeV will run under other operating systems, it may show incompatibilities or bugs there. Please report any such issues to mev@tigr.org. Mac OS X users can simulate a right-click by using Control-click.
For more help
A link to the online copy of this manual can be found under Help > MeV Manual in the main MultiExperiment Viewer toolbar. For help beyond the scope of this manual, or to submit a bug report, see the MeV website at http://mev.tm4.org
2 TM4 Software Overview
MultiExperiment Viewer is one member of a suite of microarray data management and analysis applications originally developed at The Institute for Genomic Research (TIGR). The suite, known as TM4, contains four programs: MADAM, Spotfinder, MIDAS and MeV. Together they provide functions for
103. [TEASE Tree View: result tree with Data Source Selection, Tree, Gene Node Height Plot, General Information, Script Manager and History nodes] Basic Navigation: A large dataset is likely to have more than a handful of clusters that fall within the size range, but only the clusters that are more red are worth attention. It is therefore important to assign appropriate color gradient boundaries to save time; adjusting the color gradient is covered in a later section. To view information about a cluster, simply position the cursor over the root of the cluster to reveal a pop-up window. When done, move the cursor away and the window will disappear. Adjusting Tree Configuration and Viewing Clusters: A right-click in the Tree Viewer produces a menu that includes an option to alter the displayed tree, Tree Properties. This option permits the user to change the tree's appearance and to reduce the complexity of the tree by imposing a distance threshold. Elements on nodes whose distances are below this threshold can be considered one entity, or cluster; consequently, the lower-level detail of the tree is ignored. As the value is adjusted, the corresponding TEASE tree shows nodes below this threshold in light gray, and a translucent wedge from each such node to all enclosed elements will be drawn
104. Correlation filter: The correlation filter removes those genes in the set to be classified that are not significantly correlated with at least one member of the training set. The significance of correlation is determined by a p-value, which is calculated by a permutation test in which each gene is permuted a user-specified number of times. KNN classification parameters: Here the user specifies the expected number of classes, which is also the number of classes present in the training set. The number of neighbors is the number of genes from the training set that are chosen as neighbors of a given gene; Euclidean distance is used to determine the neighborhood. Suppose we want to classify a gene g. Gene g is assigned to the class that is most frequently represented among its k nearest neighbors from the training set, where k is specified by the user. In case of a tie, gene g remains unassigned. Create/import training set: If the user chooses to import a previously created training set (for instructions on saving a training set, see below), a file chooser from which the training file can be chosen is displayed on hitting the Next button. If an appropriate file is chosen, the KNN classification editor (shown below in Fig. xx) is displayed with the class assignments from the file. If the option to create a new training set from data is chosen, on hitting the Next button the classification editor is directly displayed
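The voting rule above can be sketched in a few lines of Python. This is a minimal illustration, not MeV's implementation. It assumes each training gene is given as a hypothetical (vector, class label) pair, uses Euclidean distance as the text describes, and returns None on a tie. The correlation filter is not included.

```python
import math
from collections import Counter


def knn_classify(gene, training, k):
    """Return the majority class among the k nearest training genes, or None on a tie.

    gene: list of expression values; training: list of (vector, label) pairs.
    """
    # Sort training genes by Euclidean distance to `gene` and keep the k closest.
    neighbors = sorted(training, key=lambda item: math.dist(gene, item[0]))[:k]
    votes = Counter(label for _, label in neighbors).most_common()
    # A tie for the top vote count leaves the gene unassigned.
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return None
    return votes[0][0]
```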
105. Created by the MeV Team (see About > Credits). Part of the TM4 Software Suite, www.tm4.org. MeV: MultiExperiment Viewer, Version 4.0, July 06 2006. Table of Contents: 1 General…; 2 TM4 Software Overview; 3 Starting MultiExperiment Viewer; 4 Loading Expression Data; 4.1 Loading MeV (.mev) Format; 4.2 Loading ArrayViewer (.tav) Files; 4.3 Loading Tab Delimited, Multiple Sample (.txt) Files (TDMS); 4.4 Loading Affymetrix Data (.txt, .TXT) Files; 4.6 Loading GenePix (.gpr) Data Files; Loading Agilent FileSend; 4.13 Initial View of the Loaded Data (Main Expression…); 4.14 Result Navigation Tree; 4.15 The History Node and…; 5 Adjusting the Data; 5.1 Adjustment Filter Overview; 5.2 Replicate Analysis; 5.3 D…
106. Down. The display order changer. Element Size: The width and length of the probes can be changed through the Element Length and Element Width items in the Display menu. By default, the width and length are calculated to fit the entire display on the screen. It is often useful to increase the length to examine a particular region, because probes that lie close to each other can be difficult to distinguish. Flanking Regions: CGH arrays are used to determine a copy number profile throughout the genome. In expression arrays the values of importance usually belong to genes that are covered by probes, but in CGH arrays the regions that lie between probes are often as important as those that are covered. Therefore, unless a CGH array has complete genomic coverage, it is important to interpolate copy number changes in the regions not covered by probes. Flanking regions also allow experiments generated using different arrays to be analyzed together. Flanking regions are used to approximate a complete genome copy number profile for each sample, and they rely on assigning a discrete copy number determination to each probe. A region between two probes is considered altered if either of the probes that flank that region is altered: if a data region is flanked by one or more deleted probes, the region is considered deleted, and if it is flanked by one or more amplified probes, the region is considered amplified
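The flanking-region rule can be illustrated with a short Python sketch. It assumes each probe already carries a discrete call, written here with the hypothetical labels "deleted", "amplified" and "normal". How MeV treats a region flanked by one deleted and one amplified probe is not stated in this section, so the sketch marks that case "conflict" as an assumption.

```python
def flanking_regions(probes):
    """Classify each gap between consecutive probes from the calls of its two flanking probes.

    probes: list of (position, call) with call in {"deleted", "amplified", "normal"}.
    Returns a list of (start, stop, call) tuples.
    """
    probes = sorted(probes)
    regions = []
    for (start, left), (stop, right) in zip(probes, probes[1:]):
        flanks = {left, right}
        if "deleted" in flanks and "amplified" in flanks:
            call = "conflict"  # assumption: mixed flanks are not covered by the manual text
        elif "deleted" in flanks:
            call = "deleted"
        elif "amplified" in flanks:
            call = "amplified"
        else:
            call = "normal"
        regions.append((start, stop, call))
    return regions
```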
107. LEG Viewer: Linear Expression Graphs (LEGs) are also produced during LEM execution. These graphs depict gene expression as a graph in which features are segregated by chromosome or plasmid and then ordered by chromosomal location. The LEG Viewer node is appended to the result tree just below the node for the LEM viewer. Viewer Options: A right-click context menu provides several options for customizing the view. The following sections describe these basic features. Viewer Modes: Tiled View: The default view for the LEG Viewer, Tiled View, produces a set of graphs, one for each sample, stacked vertically. A click-drag of the mouse zooms in on a section of the graph, and the header bar updates to indicate the zoom level and the current location on the graph. The default color scheme produces 35 unique marker/line color combinations. [11.21.10 Tiled View] Overlay View: The overlay view displays a single graph on which one or more samples can be plotted. The click-drag zoom option is still enabled in this mode. A Ctrl-click in the table permits multiple selections in the table. Clicking on the marker or line colors in the table will
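As a rough illustration of the LEG layout (features grouped by chromosome or plasmid, then ordered by location), the Python sketch below assumes each feature is a hypothetical (locus_id, replicon, start) tuple. It is not MeV's code.

```python
from collections import defaultdict


def leg_layout(features):
    """Group features by replicon (chromosome or plasmid) and order each group by start position.

    features: iterable of (locus_id, replicon, start) tuples.
    Returns {replicon: [locus_id, ...]} with loci in increasing start order.
    """
    by_replicon = defaultdict(list)
    for locus_id, replicon, start in features:
        by_replicon[replicon].append((start, locus_id))
    return {rep: [locus for _, locus in sorted(items)] for rep, items in by_replicon.items()}
```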
108. Folds = 5: 2 experiments will be removed as pseudo-test experiments during each cross-validation fold. After 5 folds, all 10 experiments will have been used once and only once in the pseudo-test set. A higher Folds value is recommended for smaller class sizes. CV Runs is the number of times to repeat cross-validation. Reducing this parameter reduces computation time in the training phase, at the expense of a less accurate average number of classification errors and genes selected in the cross-validation step. Bins is the number of different values to use for Delta. Max Delta is the maximum Delta value to use; Deltas range from 0 to Max Delta, incrementing by Max Delta / Bins. The user may consider reducing this parameter to get a more precise estimate of the optimal shrinkage threshold Δ if the optimal estimated Δ is significantly smaller than this value. On the other hand, if the number of classification errors from cross-validation is unsatisfactory, the user may consider trying a larger Max Delta. Corr Low is the lowest correlation coefficient threshold to use. The default is 0.5, which should be sufficient for most cases. Corr High is the highest correlation coefficient threshold to use. The default is 1.0, which is the maximum possible correlation. Corr Step is the increment used when going from Corr High to Corr Low. [USC (Uncorrelated Shrunken Centroid) dialog: Enter all Class Labels]
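The arithmetic behind these parameters can be sketched as follows. This is an illustration under assumptions, not MeV's USC code. `delta_grid` produces exactly `bins` values starting at 0, so whether Max Delta itself is tested is an assumption. `fold_sizes` spreads any remainder over the first folds, and `corr_grid` includes both Corr High and Corr Low.

```python
def fold_sizes(n_samples, folds):
    """Number of pseudo-test experiments held out in each cross-validation fold."""
    base, extra = divmod(n_samples, folds)
    return [base + 1 if i < extra else base for i in range(folds)]


def delta_grid(max_delta, bins):
    """Delta values 0, step, 2*step, ... with step = max_delta / bins (bins values in total)."""
    step = max_delta / bins
    return [i * step for i in range(bins)]


def corr_grid(corr_high, corr_low, corr_step):
    """Correlation thresholds from corr_high down to corr_low, inclusive."""
    count = round((corr_high - corr_low) / corr_step) + 1
    # Rounding hides floating-point drift such as 0.7000000000000001.
    return [round(corr_high - i * corr_step, 10) for i in range(count)]
```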
109. Gene Locator: This option moves the LEM to a selected locus. The controls for this are under a tab in the controller labeled Gene Locator. Provide a locus identifier, and the LEM jumps to that location with the indicated locus at the top of the viewer. This feature is useful when one has specific loci of interest. Base Locator: This option moves the LEM to a specified base location. The controls are found under the Base Locator tab in the navigation control window. Entering a base location moves the main view to the locus or loci that cover the entered base location. Locus Information Panel: Locus information panels can be launched by a left mouse click on a locus arrow. The mean expression of a locus for each of the loaded samples is displayed in an expression graph. If several spots on the slide correspond to the locus, each spot is plotted as a point. The lower half of the panel has a table that contains expression information and another table that contains full annotation information for all spots associated with the locus. [11.21.4 Locus Information Panel: expression graph of locus SP0015, log2(Cy5/Cy3) by sample number] The locus information window contains four buttons at the bottom. The Previous and Next buttons update the information in the panel t
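The Base Locator lookup amounts to an interval test. Below is a minimal Python sketch. It assumes loci are hypothetical (locus_id, start, end) tuples and that start and end may be given in either order for reverse-strand genes. The coordinates in any call are illustrative only.

```python
def loci_covering(loci, base):
    """Return the IDs of loci whose span contains `base`.

    loci: iterable of (locus_id, start, end); start may exceed end for
    reverse-strand features, so the span is normalized first.
    """
    return [locus_id for locus_id, start, end in loci
            if min(start, end) <= base <= max(start, end)]
```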
110. [11.21.8 Bin Color and Limit Selection Dialog] Locus Selection List: The LEM viewer allows the user to select loci of interest. These loci can be written to file in various formats and can be stored as an MeV cluster in the cluster manager repository. A Shift-left-click on a locus adds the locus to the selection list; Shift-left-click the locus again to deselect it and remove it from the list. Selected loci have a red marker to the left of the locus arrow and have their annotation shaded in red. The locus selection list can be viewed by selecting the Locus Selection List option from the right-click menu in the LEM. The selection list has buttons to clear the selected loci or to clear all loci from the list. The File menu on the selection list provides options to save all loci or only the selected loci to tab-delimited text files that can be loaded into MeV or opened in a spreadsheet. These options are similar to options from the main LEM right-click menu and will be discussed in detail. The Select menu has options to make targeted selections of loci, either by selecting a locus or by specifying a base location range. The Graph Options menu controls locus expression graph rendering: one option hides or shows the locus graph, while the other displays the graph either as a simple monochrome graph or to render each
111. [6.3.1 Color Scheme Selection Dialog: Gradient Preview] 6.4 Setting Color Scale Limits: Expression images convey expression levels by converting the numeric expression value (log2(A/B) or absolute expression value) to a color taken from the color gradient. By setting numeric limits, or endpoints, on the displayed gradient (seen in the preview panel above and at the top of expression image viewers), expression values within the limits can be displayed as a color selected from the gradient image. The Set Color Scale Limits option in the Display menu provides a dialog to set the limits of the color gradient. [6.4.1 Color Scale Limits Dialog: Gradient Style (Double Gradient or Single Gradient); Color Range Selection (Lower Limit, defaulting to the min data value; Midpoint Value, defaulting to the median data value; Upper Limit, defaulting to the max data value; Update Limits); Color Saturation Statistics (Elements Off Color Scale, Elements Below Lower Limit, Elements Above Upper Limit); Gradient and Limits Preview] Like the color scheme dialog, the color scale limits dialog also provides the choice of a double or single gradient as appropriate. The color range panel provides input boxes to set lower and upper limits for the
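A double-gradient mapping of the kind described above can be sketched as linear interpolation between three colors. Values at or beyond a limit are saturated at the end colors. The interpolation scheme and the blue/black/yellow RGB defaults are assumptions chosen for illustration, not MeV's actual color code.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def value_to_color(value, lower, mid, upper,
                   low_color=(0, 0, 255), mid_color=(0, 0, 0), high_color=(255, 255, 0)):
    """Map an expression value onto a double gradient defined by lower < mid < upper."""
    if value <= lower:
        return low_color  # saturated: below the lower limit
    if value >= upper:
        return high_color  # saturated: above the upper limit
    if value <= mid:
        return lerp_color(low_color, mid_color, (value - lower) / (mid - lower))
    return lerp_color(mid_color, high_color, (value - mid) / (upper - mid))
```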
112. M Matching; 11.14 TTEST; 11.15 SAM (Significance Analysis of Microarrays); 11.16 ANOVA (Analysis of Variance); 11.17 Two-factor ANOVA; 11.18 SVM (Support Vector Machines); 11.19 KNNC (K-Nearest Neighbor Classification); 11.20 Discriminant Analysis Module; 11.21 LEM (Linear Expression Maps); 11.22 GDM (Gene Distance Matrix); 11.23 COA (Correspondence Analysis); 11.24 Principal Components Analysis; 11.25 Expression Terrain Maps; 11.26 EASE (Expression Analysis Systematic Explorer); 11.27 FOM (Figure of Merit); 11.28 BRIDGE (Bayesian Robust Inference for Differential Gene Expression); 11.29 USC (Uncorrelated
113. [GDM Initialization Dialog] Matrix Viewer Basics: The matrix viewer has annotation headers that identify the gene associated with each column or row. Each square element within the matrix is rendered as a color that represents the distance between the two genes associated with the element. The main diagonal is simply rendered as white for identification. [GDM matrix viewer (Euclidean distance) with its right-click menu: Color Scheme, Element Size, Draw Borders, Select Border Color, Toggle Sort on Proximity, Save Neighbors, Sort, Impose Cluster Result, Change Annotation, Change Annotation Width]
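The matrix rendered by the viewer is a symmetric table of pairwise distances with zeros on the main diagonal. Below is a minimal Python sketch using Euclidean distance. It assumes complete data with no missing values and is not MeV's implementation.

```python
import math


def gene_distance_matrix(expression):
    """Pairwise Euclidean distances between gene expression vectors.

    expression: list of equal-length vectors, one per gene.
    Returns an n x n symmetric matrix with 0.0 on the diagonal.
    """
    n = len(expression)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(expression[i], expression[j])
            matrix[i][j] = matrix[j][i] = d  # distance is symmetric
    return matrix
```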
114. Mode Selection: HCL Only: perform hierarchical clustering (HCL) with no biological theme exploration. Cluster Analysis: EASE analysis on clusters whose size falls within the minimum and maximum specified in the HCL Clustering panel (Cluster size specification); the score of each category is calculated and the categories are ranked by score. Annotation Survey: EASE analysis on clusters that fall within the minimum and maximum size parameters specified by the user; the categories present in each cluster are ranked by hit count. No score is calculated in this mode. [TEASE parameter setting window: File Updates and Configuration (Select EASE File System, Update EASE File System); Mode Selection (HCL Only, Cluster Analysis, Annotation Survey); HCL Clustering (Tree Selection: Gene Tree or Sample Tree; Distance Metric Selection, default Euclidean Distance, with a Use Absolute Distance option; Linkage Method Selection: average, complete or single linkage clustering; Cluster size specification)] HCL Clustering: Linkage Method: This parameter is used to indicate the convention
115. …INSTALL Rserve_0.3-16.tar.gz Installing under Windows: 1. Download Rserve from http://stats.math.uni-augsburg.de/Rserve/dist/rserve-win.html. 2. Copy the downloaded file to the directory where R.dll is located (by default, C:\Program Files\R\rw1080\bin). Note: The Windows version of Rserve is NOT RECOMMENDED. It suffers from namespace issues; namely, parallel connections are not supported. 6 Running Rserve: Running under OS X: 1. Open a terminal instance and type R CMD Rserve. Running under Windows: 1. Double-click Rserve.exe. 20 License: Copyright 2006 The TM4 Development Group. All rights reserved. This software is OSI Certified Open Source Software. OSI Certified is a certification mark of the Open Source Initiative. Please view the license (Artistic License, PDF) in the root MeV directory. 21 Contributors: Contributors: Nirmal Bhagabati, John Braisted, Vasily Sharov, Wei Liang, Chun Hua Wan, Alexander I. Saeed, Eleanor Howe, Raktim Sinha, John Quackenbush, Weidong Wang, Vu Chu, Annie Liu, Roger Bumgarner. Affiliations: The Institute for Genomic Research, Pathogen Functional Genomics Resource Center; Dana-Farber Cancer Institute; University of Washington. Contributions: system design, module development and implementation, project coordination, documentation, optimization, usability assurance, user support, and Linear Expression Map (LEM) module development; analysis saving rewrite (v4
116. On the left-hand side of the dialog is a file browser; use it to locate the files to be loaded. The desired format file can be selected from the list in the middle of the dialog, and the file is displayed in tabular format in the file loader preview table. Choose the data options by clicking the appropriate radio button. Then click the cell in the table that contains the upper-leftmost expression value in the file, and click the Load button to load the file. [Expression File Loader: Affymetrix GCOS (using MAS5) files; Affymetrix Data Options: Only Intensity, Intensity With Detection, Intensity With Detection and P-value]
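Clicking the upper-leftmost expression value tells the loader where the header rows and annotation columns end. Below is a hypothetical Python sketch of that split, not MeV's loader. It assumes the file has already been read into a list of rows of strings and that every cell from the clicked position onward is numeric.

```python
def split_at_first_value(table, row, col):
    """Split a table (list of rows of strings) at the clicked first expression cell.

    Rows above `row` are header rows; columns left of `col` are annotation columns.
    Returns (header_rows, annotation_rows, expression_matrix).
    """
    headers = [r[col:] for r in table[:row]]
    annotation = [r[:col] for r in table[row:]]
    values = [[float(x) for x in r[col:]] for r in table[row:]]
    return headers, annotation, values
```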
117. [11.8.1 CAST Initialization Dialog: Sample Selection; Threshold Parameter (Threshold 0.8); Hierarchical Clustering (Construct Hierarchical Trees)] Parameters: Sample Selection: The sample selection option indicates whether to cluster genes or samples. Threshold: The threshold parameter is a value from 0.0 to 1.0 used as a cluster affinity threshold. Each expression element has an affinity for the cluster currently being created, based on its relationship to the elements already in the cluster. If that affinity is greater than the supplied threshold, the gene is permitted to be a member of the cluster. Note that thresholds near 1.0 are more stringent and tend to produce many clusters with rather low variability; conversely, a lower threshold produces fewer, more variable clusters. A balance between these extremes should be found by trial and error. Note that in the algorithm, expression elements are repeatedly tested for their affinity to the cluster being formed, so elements can be added or removed based on the current cluster membership. Hierarchical Clustering: This check box selects whether to perform hierarchical clustering on the elements in each cluster created. Default Distance Metric: Euclidean.
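The add/remove cycle described above can be sketched in Python. This sketch follows the general CAST idea rather than MeV's source. Affinity is taken here as the mean similarity of an element to the other current members, which is an assumption, and the similarity matrix is assumed to hold values in [0, 1]. A step cap guards against elements being added and removed indefinitely.

```python
def affinity(sim, x, cluster):
    """Mean similarity of element x to the other members of `cluster` (assumed definition)."""
    others = [y for y in cluster if y != x]
    if not others:
        return 1.0  # a lone element is treated as fully affine to its own cluster
    return sum(sim[x][y] for y in others) / len(others)


def cast(sim, threshold, max_steps=None):
    """Group elements 0..n-1 using a similarity matrix with values in [0, 1]."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0.0 and 1.0")
    n = len(sim)
    max_steps = max_steps or 10 * n + 10
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        cluster = set()
        for _ in range(max_steps):
            outside = sorted(unassigned - cluster)
            if outside:
                # Add step: the outside element with the highest affinity joins if it clears the threshold.
                best = max(outside, key=lambda u: affinity(sim, u, cluster))
                if affinity(sim, best, cluster) >= threshold:
                    cluster.add(best)
                    continue
            # Remove step: the member with the lowest affinity leaves if it falls below the threshold.
            worst = min(sorted(cluster), key=lambda v: affinity(sim, v, cluster))
            if affinity(sim, worst, cluster) < threshold:
                cluster.remove(worst)
                continue
            break  # stable: nothing to add or remove
        clusters.append(sorted(cluster))
        unassigned -= cluster
    return clusters
```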
118. The initialization dialog shown below allows the user to denote the dye labeling scheme used in the experiment. [RAMA Initialization Dialog: for each slide (HIV_01.mev to HIV_04.mev), mark the Control Sample's dye color; Advanced Parameters; Rserve Connection (localhost:6311), with a reminder to start Rserve now and to click the info button for help or enter a new server and click Add] Rserve Connection: RAMA is a package written in the R programming language and requires a connection to a computer running Rserve to function. See Section 19 for details on installing R and Rserve. By default, RAMA looks on the local machine for an Rserve server. However, since Rserve is a TCP/IP server, it could in principle be running anywhere: the user need only enter an IP address and port number separated by a colon (:) in the text field labeled Enter a new location. Clicking Add adds the new location to the pull-down menu; it is saved to the user's config file and is available for later use. Rama Results: Sample output from this module is shown below (Figure 11.26.2). Rama estimates intensities based on the input data. The new set of intensities replaces the loaded dataset and is available for further analysis. A table of the results is also made av
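Before starting RAMA, it can help to confirm that an Rserve instance answers at the address:port entered in the dialog. The sketch below is not part of MeV, and its function names are hypothetical. It parses an entry such as localhost:6311 and checks for the 4-byte "Rsrv" prefix of the identification string that Rserve sends when a client connects.

```python
import socket


def parse_server(entry):
    """Split an 'address:port' entry such as 'localhost:6311' into (host, port)."""
    host, sep, port = entry.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected address:port, got {entry!r}")
    return host, int(port)


def rserve_reachable(entry, timeout=2.0):
    """Return True if a server at `entry` greets new connections with Rserve's 'Rsrv' prefix."""
    host, port = parse_server(entry)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            greeting = b""
            while len(greeting) < 4:  # recv may return fewer bytes than requested
                chunk = sock.recv(4 - len(greeting))
                if not chunk:
                    break
                greeting += chunk
    except OSError:  # refused, unreachable or timed out
        return False
    return greeting == b"Rsrv"
```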
119. [CGH Analysis chromosome views (Chromosome 1 to 22, X and Y): Chromosome 1 data regions with six or more amplifications] Algorithms on Genes: The items GeneAmplifications and GeneDeletions are used to search for genes that are commonly altered between experiments. Select the GeneDeletions item, then select the Results node in the newly created Gene Deletions subtree. This view displays the number of deletions for every gene stored in UCSC's Golden Path database. [Gene Deletions results view, listing Start, Stop and Alterations for each gene]
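The count shown in the Gene Deletions view can be approximated as follows. The sketch assumes that a gene counts as deleted in an experiment when any deleted region of that experiment overlaps the gene's coordinates. The data layout is hypothetical, and this is not MeV's algorithm.

```python
def count_gene_deletions(genes, deleted_regions):
    """Count, for each gene, the experiments with at least one deleted region overlapping it.

    genes: {gene_name: (chrom, start, stop)}
    deleted_regions: {experiment: [(chrom, start, stop), ...]}
    """
    counts = {}
    for name, (chrom, g_start, g_stop) in genes.items():
        counts[name] = sum(
            # Two intervals overlap when each starts before the other ends.
            any(c == chrom and start <= g_stop and g_start <= stop
                for c, start, stop in regions)
            for regions in deleted_regions.values()
        )
    return counts
```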
120. …analysis)> <!ATTLIST mev version CDATA #REQUIRED> <!ELEMENT analysis (alg_set)> <!ELEMENT alg_set (algorithm)> <!ATTLIST alg_set set_id CDATA #REQUIRED input_data_ref CDATA #REQUIRED> <!ELEMENT algorithm (plist, mlist, output_data)> <!ATTLIST algorithm alg_id CDATA #REQUIRED input_data_ref CDATA #REQUIRED alg_name CDATA #REQUIRED alg_type (cluster | cluster-genes | cluster-experiments | data-visualization | data-adjustment | cluster-selection | data-normalization) #REQUIRED> <!ELEMENT plist (param)> <!ELEMENT param EMPTY> <!ATTLIST param key CDATA #REQUIRED value CDATA #REQUIRED> <!ELEMENT mlist (matrix)> <!ELEMENT matrix (element)> <!ATTLIST matrix name CDATA #REQUIRED type (int-array | string-array | FloatMatrix) #REQUIRED row_dim CDATA #REQUIRED col_dim CDATA #REQUIRED> <!ELEMENT element EMPTY> <!ATTLIST element row CDATA #REQUIRED col CDATA #REQUIRED value CDATA #REQUIRED> <!ELEMENT output_data (data_node+)> <!ATTLIST output_data output_class (single-output | multi-cluster-output | multi-gene-cluster-output | multi-experiment-cluster-output | partition-output) #REQUIRED> single-output indicates that the result is one set, usually the result of normalization, filtering or transformation; multi-cluster-output is produced by many clustering algorithms and
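A saved analysis that follows this structure can be inspected with Python's standard xml.etree.ElementTree. The sketch lists each algorithm's alg_id, alg_name and param key/value pairs. Any sample document passed to it, including attribute values such as a "distance" parameter, is hypothetical.

```python
import xml.etree.ElementTree as ET


def list_algorithms(xml_text):
    """Return (alg_id, alg_name, {param key: value}) for every <algorithm> element."""
    root = ET.fromstring(xml_text)
    found = []
    for alg in root.iter("algorithm"):
        # <param key=... value=.../> elements live inside the algorithm's <plist>.
        params = {p.get("key"): p.get("value") for p in alg.iter("param")}
        found.append((alg.get("alg_id"), alg.get("alg_name"), params))
    return found
```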
121. …a set loaded in MeV. The output is a list of biological themes represented in the cluster, with a statistic reporting the probability that a particular theme is over-represented in the cluster relative to its representation in the entire data set. The resulting table is initially sorted by this statistic. Slide Annotation Survey: The survey mode simply produces a list of biological themes that are represented on the slide. The initial ordering of the output table is based on the prevalence of each theme in the data set (hit count). This mode can be used to cluster genes based on biological themes; the clusters can then be stored and marked (colored) for tracking during cluster analysis. Population and Cluster Selection Page: This panel provides options to specify a gene population list and a gene cluster list. The top portion is devoted to selecting a population list and offers two options. The default is to use a population file, which is simply a file containing all gene indices from which the cluster was selected. Often this includes all slide annotation, or a large subset of the slide with bad data and control spots removed. The file format is a simple list of indices, one entry per line. The other option for defining the population is to use all of the genes loaded into MeV. It is often necessary to use the file to define the gene population, because the current viewer may not contain all gen
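The over-representation statistic can be illustrated with a one-sided Fisher exact (hypergeometric) test. This is an assumption-labeled illustration only. EASE's published score applies a further, more conservative adjustment to this probability, which is not reproduced here.

```python
from math import comb


def fisher_upper_tail(hits, list_size, category_size, population_size):
    """P(X >= hits) for X ~ Hypergeometric(population_size, category_size, list_size).

    hits: cluster genes in the category; list_size: cluster size;
    category_size: population genes in the category; population_size: all genes.
    """
    top = min(list_size, category_size)
    # Count the ways to draw at least `hits` category genes into a cluster of this size.
    numerator = sum(
        comb(category_size, x) * comb(population_size - category_size, list_size - x)
        for x in range(max(hits, 0), top + 1)
    )
    return numerator / comb(population_size, list_size)
```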
122. …above tend to vary between positive and negative, and the covariance tends toward zero: $\mathrm{cov}(u,v) = \frac{1}{m-1}\sum_{i=1}^{m}(u_i - \bar{u})(v_i - \bar{v})$. Average Dot Product: The average dot product can produce unbounded values. This metric has been used to compute similarity between expression vectors that have been normalized so that all elements range in value from 0 to 1 and each vector has a norm of 1: $d(u,v) = \frac{1}{m}\sum_{i=1}^{m} u_i v_i$. Spearman's Rank Correlation: As the name implies, Spearman's rank correlation ranks the expression values within each vector by increasing expression level, so each vector is transformed to reflect the ordering of its expression levels. If two elements have exactly the same expression, both are assigned the same level, falling 0.5 levels above the next lower level. For example, an expression vector containing the five values (0.3, 1.2, 2.2, 1.1, 1.2) would have a ranking of (1.0, 2.5, 3.0, 2.0, 2.5). These ranking vectors are then used to compute the correlation via $r_{\mathrm{Spearman}}(u,v) = 1 - \frac{6d}{n(n^2-1)}$, where $d = \sum_{i=1}^{n}(x_i - y_i)^2$ and $x$ and $y$ are the ranking vectors corresponding to $u$ and $v$. Spearman rank correlation makes no assumptions about the distribution of the data, and the magnitude of expression becomes unimportant because only the ranking of expression levels is used to determine the correlation. Kendall's Tau: Kendall's Tau is a measure of correlation based on the tendency of the tw
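The rank-based formula above can be sketched in Python. This sketch uses the common average-rank convention for ties, so the example vector ranks as 1.0, 3.5, 5.0, 2.0, 3.5. That differs from the tie rule described in this section. The 6d formula is exact only when there are no ties.

```python
def average_ranks(values):
    """1-based ranks in increasing order; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(u, v):
    """1 - 6*d / (n*(n^2 - 1)), where d is the sum of squared rank differences."""
    n = len(u)
    x, y = average_ranks(u), average_ranks(v)
    d = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d / (n * (n ** 2 - 1))
```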
123. ailable [RAMA results view: result tree (Analysis Results, Data Source, Rama, Parameters Used, Save Results, Script Manager, History) and a results table with columns including qLo, qUp, Gamma1 and Shift]
124. ails [8.1.5 Repository Selection Dialog] Note that some repositories may require user accounts with passwords to be established before submission can be enabled. If a password is required, a dialog is provided to enter a user name and password for the repository. Login information can also be entered in the repository configuration file so that MeV remembers the user name and password. To enable this, edit the archive submission config XML file in the config directory: open the file in a text editor, move to the entry for the specific repository, and change the user tag from <user> to <user user_name="your_name" password="password" email="your_email">. A sample of this tag is included in the XML file as a guide. Note that some repositories might not require an email; in that case, the email attribute can be removed from the tag. [Cluster Archive Selection Dialog] Repository Name: LOLA. Repository Web Site: www.lola.gwu.edu. Description: List Of Lists Annotated (LOLA) is a web-driven database allowing researchers to identify and correlate significant subsets of genes derived from microarray expression profiling. It is maintained by the George Washington University Genomics Core Facility (http://www.gwumc.edu/microarray). Unlike other databases, LOLA serves as a common platform for analy
125. [12.13 Script Output Nodes on the Result Tree (single SAM run)] If an algorithm fails to produce a data node that is the source data node for another algorithm set, then that algorithm set, which would receive a null input, is aborted, and an empty node with a text label indicating empty source data is displayed.
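The abort rule can be modelled as a single pass over the script in order, tracking which data nodes exist. The tuple layout and the `run` callback below are hypothetical. This sketches the behavior described above; it is not MeV's script engine.

```python
def run_script(steps, primary_id, run):
    """Execute script steps in order, aborting any step whose input node does not exist.

    steps: (input_ref, output_ref, algorithm) tuples in script order.
    run(algorithm, data) returns the output data node, or None if the algorithm fails.
    Returns {output_ref: status}.
    """
    available = {primary_id: "primary data"}
    status = {}
    for input_ref, output_ref, algorithm in steps:
        if input_ref not in available:
            status[output_ref] = "aborted: empty source data"
            continue
        result = run(algorithm, available[input_ref])
        if result is None:
            status[output_ref] = "failed"  # downstream steps using this node will abort
        else:
            available[output_ref] = result
            status[output_ref] = "ok"
    return status
```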
126. [11.25.1 Terrain Map Initialization Dialog] Data Element Identification: Data elements can be identified by four main methods. First, the right-click menu allows one to turn on labels; how useful the labels are depends on how close the elements are to one another. Second, moving the mouse cursor over an element reveals the annotation label currently selected in the Label submenu of the main Display menu. Third, clicking on an element opens an element information table listing the annotation for the selected element. Fourth, one can select one or more elements in an area and either write them to a file or open a new MeV session with the selected elements. Details about element selection are provided below. [11.25.2 Terrain Map with Navigation Control]
127. [11.14.1 (a) and (b) TTEST Initialization Dialog Box: false discovery control options (false significant genes should not exceed…; fast approximation, possibly conservative; complete computation, possibly slow; calculate adjusted p-values for false discovery control); Hierarchical Clustering (Construct Hierarchical Trees for significant genes or for all clusters)] Parameters: One-class, Between-Subjects and Paired Panels: In the one-class design, samples can be included or excluded by checking or unchecking the checkboxes next to each sample name. The user can also specify the test mean. In the between-subjects panel, the buttons place each sample into group A, group B, or neither group. If an experiment is placed in neither group, it is ignored for the purposes of the analysis. Note that groups A and B must each have at least two members after assignment. The paired panel allows the specification of pairs of experiments. Save Grouping Setting: This button saves the grouping or setting to a file, which is particularly useful when there are many experiments. Load Grouping Setting: This button selects and loads a saved grouping or setting. Reset: The reset button returns all settings to their original values. P-Value Parameters: This set of controls indicates the method by which p-values are determine
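The statistics behind the three designs can be sketched with Python's statistics module. The unequal-variance (Welch) form is used for the between-subjects case as an illustration only. P-values, which need the t distribution, are omitted. A zero sample variance raises ZeroDivisionError.

```python
import math
from statistics import mean, variance


def one_class_t(values, test_mean=0.0):
    """One-class t statistic against a user-specified test mean."""
    return (mean(values) - test_mean) / math.sqrt(variance(values) / len(values))


def welch_t(group_a, group_b):
    """Between-subjects t statistic (unequal variances) and Welch-Satterthwaite degrees of freedom."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("groups A and B each need at least two members")
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


def paired_t(group_a, group_b):
    """Paired design: a one-class test (test mean 0) on the per-pair differences."""
    return one_class_t([a - b for a, b in zip(group_a, group_b)])
```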
128. [4.8.1 GEO SOFT File Loader] 4.9 Loading GEO Simple Omnibus Format in Text (SOFT) Two-Channel Format Files: A SOFT two-channel format file can be loaded by selecting the GEO SOFT two-channel file loader option from the list of available file formats. On the left-hand side of the dialog is a file browser; use it to locate the files to be loaded. The desired format file can be selected from the list at the top of the right-hand side of the dialog, and the file is displayed in tabular format in the file loader preview table. Because the SOFT format is relatively flexible, columns may appear in any order after the first (ID) column, so users must follow the instructions in red below the table and reorder the table by dragging columns with the mouse to the required positions. Then click the cell in the table that contains the upper-leftmost expression value in the file. The platform format file can be selected from the list at the bottom of the right-hand side of the dialog. Loading the platform file is optional; by default it is not loaded. Finally, click the Load button to load the file. [Expression File Loader: GEO SOFT Two Channel Format Files]
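Below is a hypothetical Python sketch of the same reordering done outside MeV. It extracts the data table of a GEO SOFT sample record, delimited by the !sample_table_begin and !sample_table_end lines, and keeps the ID column first, followed by the named columns in the requested order. The caller chooses the column names; none are required by MeV as far as this section states. A missing marker raises StopIteration.

```python
def read_soft_table(text, begin="!sample_table_begin", end="!sample_table_end"):
    """Return (header, rows) from the tab-delimited data table of a SOFT sample record."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip().lower() == begin)
    stop = next(i for i, line in enumerate(lines) if line.strip().lower() == end)
    rows = [line.split("\t") for line in lines[start + 1:stop]]
    return rows[0], rows[1:]


def reorder_columns(header, rows, wanted):
    """Keep the first (ID) column, then the columns named in `wanted`, in that order."""
    idx = [0] + [header.index(name) for name in wanted]
    return [header[i] for i in idx], [[row[i] for i in idx] for row in rows]
```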
129. and down buttons tilt the terrain toward and away from the point of view (POV). The left and right buttons rotate the terrain about the central point while it stays constant relative to the horizon. The corner buttons permit a tight rolling motion of the terrain relative to the POV. [Figure 11.25.4: Terrain Map with Menu and Selection Area Visible (Euclidean distance); the right-click menu shows options such as Fill Polygon, Element Shape, Store Cluster, Launch New Session, Deselect and Show Elements] Altitude Slider: This moves the data points up and down relative to the terrain. Sometimes it is preferable to place the points higher than the terrain to view their distribution. The Terrain Viewer Menu: The terrain viewer has a right-click menu providing several options for altering aspects of data presentation and for extracti
130. ange the tree's appearance and to reduce the complexity of the tree by imposing a distance threshold. Elements on nodes whose distances fall below this threshold can be considered one entity, or cluster; consequently, the lower-level detail of the tree is ignored. As the value is adjusted, the HCL tree's nodes below the threshold appear light gray, and a translucent wedge is drawn from each such node to all of its enclosed elements. This representation persists unless the dialog is dismissed with Cancel. The distance threshold can be entered in a text field or adjusted with a slider over the full range of inter-node distances. The number of terminal nodes (clusters) at the current distance threshold is displayed in the upper-right quadrant of the dialog. [Figure 11.1.4: HCL Tree Configuration Dialog, with Distance Threshold Adjustment (distance threshold, number of terminal nodes, min/max distance range, Create Cluster Viewers) and Tree Dimension Parameters (minimum pixel height 5, maximum pixel height 10, Apply Dimensions)] The Create Cluster Viewers option creates viewers based on the distance threshold. It collects the groups of elements falling below terminal nodes in the tree at the current distance threshold. The clusters of elements are represented as n
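Cutting a tree at a distance threshold, as described above, can be sketched as follows. This is an illustration under stated assumptions rather than MeV's code. The tree representation is hypothetical: a leaf is a string, and an internal node is a tuple `(height, left, right)` whose height is the inter-node distance. Every node below the threshold is collapsed into one cluster.

```python
# Hypothetical tree representation: a leaf is a string; an internal node is
# (height, left, right), where height is the inter-node distance.
def leaves(node):
    """All elements enclosed by a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def cut_tree(node, threshold):
    """Clusters formed by collapsing every node whose height is below threshold."""
    if isinstance(node, str) or node[0] < threshold:
        return [leaves(node)]
    _, left, right = node
    return cut_tree(left, threshold) + cut_tree(right, threshold)


tree = (3.0, (1.0, "a", "b"), (2.5, "c", (0.5, "d", "e")))
print(cut_tree(tree, 2.0))  # [['a', 'b'], ['c'], ['d', 'e']]
```

The length of the returned list corresponds to the "number of terminal nodes" the dialog reports (three in this example).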
131. [Figure 14.10.1: Flat file output from Single Array Viewer] 15 Appendix: File Format Descriptions. 15.1 .tav Files. The original TIGR ArrayViewer file type was an eight-column, tab-delimited text format developed at TIGR for storing the intensity values of the spots on a single slide. It is written out by the program TIGR Spotfinder and contains one row for each spot. The first six columns of the file contain positional data for the spots and are followed by two columns of intensity data. These eight columns are required by MeV for display and analysis of experimental data. Optional columns can contain flags, annotation, GenBank numbers, etc. The variability of tav files caused by these optional columns is what makes Preferences files necessary (see section 15.13). Optional columns can be used to sort the spots in the Main View by choosing the appropriate column from the Sort menu. A flag is simply a letter code corresponding to a description of the spot. A: 0 non-saturated pixels in the spot; B: 0-50 non-saturated pixels in the spot; C: 50 or more non-saturated pixels in the spot; X: spot is rejected due to spot shape and intensity relative to background; Y: background is higher than spot intensity; Z: spot not detected by Spotfinder. [Screenshot: sample1.tav opened in Microsoft Excel]
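A minimal Python sketch of reading one .tav row under the layout described above: six positional columns, then two intensity columns, then any optional columns. The source does not name the positional columns, so this sketch keeps them as an unnamed list of integers. Integer positions are an assumption, and `parse_tav_line` is a hypothetical helper. The flag table copies the letter codes listed above.

```python
# Flag codes as listed in the text above.
FLAG_DESCRIPTIONS = {
    "A": "0 non-saturated pixels in the spot",
    "B": "0-50 non-saturated pixels in the spot",
    "C": "50 or more non-saturated pixels in the spot",
    "X": "spot is rejected due to spot shape and intensity relative to background",
    "Y": "background is higher than spot intensity",
    "Z": "spot not detected by Spotfinder",
}


def parse_tav_line(line):
    """Split a tav row into (positions, (intensity_a, intensity_b), optional).

    Assumptions: positional values are integers; optional columns (flags,
    annotation, GenBank numbers, ...) are returned unparsed.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a tav row needs the eight required columns")
    positions = [int(v) for v in fields[:6]]
    intensities = (float(fields[6]), float(fields[7]))
    return positions, intensities, fields[8:]


pos, (ia, ib), extra = parse_tav_line("1\t1\t1\t1\t1\t2\t1500.5\t1320\tC\n")
print(FLAG_DESCRIPTIONS[extra[0]])  # 50 or more non-saturated pixels in the spot
```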
132. annotation columns, which must occupy the left-most columns of data in the file. Each annotation column contains annotation for the spot represented by that row of the file, and each has a label in the top header row indicating the annotation type. Expression Data (uncolored section with numeric data): The expression data has one column for each sample represented in the file, and the expression data column for a particular sample lies beneath that sample's label in the header. See the TDMS file loader figure below for an example file displayed within the preview window. [Figure: example TDMS file in the loader preview, with annotation columns on the left, sample labels (Sample 2 to Sample 5) and sample annotation rows (e.g. placebo; F/M) across the top, and numeric expression values below]
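The layout rule above (annotation columns on the left, one expression column per sample under its header label) can be sketched in Python. This is a simplified illustration. It assumes a single header row, although the example file also carries extra sample-annotation rows, and it treats empty cells as missing values. `split_tdms` is a hypothetical helper.

```python
def split_tdms(rows, n_annotation):
    """Split TDMS-style rows into (sample_labels, annotations, matrix).

    Assumptions: rows are lists of strings, the first row is the only header
    row, and the first n_annotation columns hold annotation.
    """
    header = rows[0]
    annotation_labels = header[:n_annotation]
    sample_labels = header[n_annotation:]
    annotations, matrix = [], []
    for row in rows[1:]:
        annotations.append(dict(zip(annotation_labels, row[:n_annotation])))
        # Empty cells are treated as missing values (an assumption of this sketch).
        matrix.append([float(v) if v else None for v in row[n_annotation:]])
    return sample_labels, annotations, matrix
```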
133. The script ends by using write.exprs to write the MAS5 expression values (mas5data) to affy_mas5.txt and the MAS5 detection calls to affy_call.txt. Affymetrix GCOS Pivot Data File: An Affymetrix GCOS pivot file is a tab-delimited text file that contains the Affymetrix GeneChip ID and data from several experiments. For each experiment it contains one column of intensity (signal), one column of detection call, and one column of detection p-value. The first line is the header line. [Screenshot: pivotData_simple opened in Microsoft Excel, showing Signal, Detection and Detection p-value columns for each experiment, with AFFX control probe sets in the first rows]
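A short Python sketch of grouping a GCOS pivot row into per-experiment (signal, detection call, p-value) triples, following the description above. It assumes the first column is the probe ID and that the remaining columns come strictly in that three-column order. The function name and header strings in the usage are hypothetical.

```python
def parse_gcos_pivot(lines):
    """Return (signal column headers, {probe_id: [(signal, call, p_value), ...]}).

    Assumption: column 0 is the GeneChip probe ID, followed by repeating
    (signal, detection call, detection p-value) triples, one per experiment.
    """
    header = lines[0].rstrip("\n").split("\t")
    signal_headers = header[1::3]
    records = {}
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        probe, values = fields[0], fields[1:]
        records[probe] = [
            (float(values[i]), values[i + 1], float(values[i + 2]))
            for i in range(0, len(values), 3)
        ]
    return signal_headers, records
```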
134. assignments are balanced, some cells have missing values for the gene. F-tests that use the F-distribution (as opposed to permutations) are quite fast. However, missing values in the expression matrix will greatly slow down permutation tests, because a gene with missing values has to be permuted individually. In the permutations, values are randomly reassigned to cells, making sure that the missing values remain in their original cells. Because each such gene has to be permuted on a case-by-case basis, the total number of permutations will be (number of genes with missing values) × (number of permutations). On the other hand, for genes with complete data, the columns of the expression matrix are permuted, and all of the permuted F-values for those genes are computed in one go in a given permutation. The computation time for permutation tests is therefore orders of magnitude less for a complete matrix than for one with significant numbers of missing values. The ideal data set for permutation tests is one with balanced factor assignments and no missing values. Designs with just one sample in each factor A × B combination cell are also handled; however, in this case only the A and B main effects are tested, and interaction is not tested. Unbalanced designs where one or more cells have only one sample or no samples are not tested for a
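The two permutation strategies contrasted above can be sketched in Python. This is an illustrative sketch, not MeV's implementation. For a gene with missing values (represented here as `None`), only the observed values are shuffled among the observed positions, so each missing value stays in its original cell. For complete data, one column permutation is applied to every row at once. The cost helper simply restates the product given in the text; for example, 120 genes with missing values and 1,000 permutations need 120,000 single-gene permutations.

```python
import random


def permute_with_missing(values, rng):
    """Shuffle observed values among observed positions; None stays in place."""
    observed_idx = [i for i, v in enumerate(values) if v is not None]
    shuffled = [values[i] for i in observed_idx]
    rng.shuffle(shuffled)
    out = list(values)
    for i, v in zip(observed_idx, shuffled):
        out[i] = v
    return out


def permute_complete_matrix(matrix, rng):
    """Apply one shared column permutation to every gene (row) at once."""
    order = list(range(len(matrix[0])))
    rng.shuffle(order)
    return [[row[j] for j in order] for row in matrix]


def permutation_cost(n_genes_with_missing, n_permutations):
    """Single-gene permutations needed for the genes with missing values."""
    return n_genes_with_missing * n_permutations
```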
135. ata Transformations
5.4 Data Filters: Data Quality and Variance-Based Filters
5.5 Data Source Selection
6 Display Options
6.2 Selecting Gene ...
6.3 Color Scheme
6.4 Setting Color Scale Limits
6.5 Element Appearance
7 Viewer Descriptions
7.2 Expression Images
Expression ...
... Graphs
Common Viewer ...
8 Working with Clusters
8.1 Storing Clusters and Using the Cluster Manager
136. bmission capability for the LOLA. Contributors: Todd Peterson, Luke Somers, Jim Johnson, Ernest Retzel, Glynn Dennis, Douglas Hosack, Richard Lempicki, Wei Gao, Eric Albert, Sally Gaddis, Stephen C. Harris. Institutions: National Center for Genome Resources; Fox Chase Cancer Center; Center for Computational Genomics and Bioinformatics, University of Minnesota; National Institute of Allergy and Infectious Disease, NIH; Laboratory of Immunopathogenesis and Bioinformatics; Independent Development; University of Texas MD Anderson Cancer Center; FDA National Center for Toxicological Research. Contributions: cluster repository; GeneX-Lite to MeV connectivity; QTC algorithm optimization and bug fix; Java WebStart configuration scripts and consultation to support using ANT technologies for MeV development; EASEOpenSource Java package to support EASE development and helpful consultation during EASE integration and development; implementation of the BrowserLauncher Java utility to launch the default web browser from within MeV; report and basic testing of Java3D support for MeV 3D viewers on the Mac platform; t-test bug identification and suggestion for improvement of the adjusted Bonferroni correction. References: These references and links to their PubMed records can be found in the main MeV toolbar under About > Papers. Ben-Dor, A., R. Shamir and Z. Yakhini. 1999. Clustering gene expression patterns. Journal of Computational B
137. cal p-value. P-Value/False-Discovery Corrections: The p-values for each gene can be adjusted to correct for the large number of observations (genes). Without adjustment there is an increased chance of calling a gene significant when it has no real change. Alternatively, a false discovery threshold can be set such that the number or proportion of false positives in the significant gene list does not exceed a specified level with a certain confidence. Just Alpha (no correction): With this option, alpha is not altered. Standard Bonferroni Correction: The user-specified alpha is divided by the number of genes to give the critical p-value. This is much more stringent than using an uncorrected alpha. Adjusted Bonferroni Correction: The t-values for all the genes are ranked in descending order. For the gene with the highest t-value, the critical p-value becomes alpha/n, where n is the total number of genes; for the gene with the second-highest t-value, it becomes alpha/(n-1), and so on. The stringency of this correction falls between no correction and the standard Bonferroni correction. Step-down Westfall and Young MaxT Correction (Dudoit et al., 2003): The genes are ranked in descending order of their absolute t-values, and the adjusted p-values are computed by an algorithm described in Dudoit et al. (2003). False discovery
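The standard and adjusted Bonferroni critical values described above can be sketched in Python. This minimal sketch follows the text literally: genes are ranked by t-value in descending order, the top gene gets alpha/n, the next gets alpha/(n-1), and so on. The function names are illustrative, not MeV's.

```python
def standard_bonferroni(alpha, n_genes):
    """Standard Bonferroni: one critical p-value, alpha / n, for every gene."""
    return alpha / n_genes


def adjusted_bonferroni_critical(t_values, alpha):
    """Adjusted Bonferroni as described in the text: rank t-values in descending
    order; the top gene gets alpha/n, the next alpha/(n-1), and so on."""
    n = len(t_values)
    order = sorted(range(n), key=lambda i: t_values[i], reverse=True)
    critical = [0.0] * n
    for rank, gene in enumerate(order):
        critical[gene] = alpha / (n - rank)
    return critical


def significant(p_values, critical):
    """A gene is significant when its p-value does not exceed its critical value."""
    return [p <= c for p, c in zip(p_values, critical)]
```

With t-values `[2.0, 5.0, 3.0]` and alpha = 0.05, the gene with t = 5.0 gets 0.05/3, the gene with t = 3.0 gets 0.05/2, and the gene with t = 2.0 keeps the full 0.05. Each of these thresholds is looser than the standard Bonferroni value of 0.05/3 except the top gene's, which matches it.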
138. ch information for clusters that are too big or too small. The default size is from 10. Population Selection: Available only in Cluster Analysis mode; please refer to the EASE documentation for more information. Assign Color Gradient: Specifies the upper and lower scores used to assign the color gradient. Increasing the upper and lower bounds shifts the gradient toward red, whereas decreasing them shifts it toward blue. The gradient can also be adjusted in the graphic view window after the analysis is complete and the correct tree code is selected. The default setting is 0.1 for the upper bound and 0.00001 for the lower bound. Annotation Parameter: Selection of the annotation key, the annotation conversion file, and the gene ontology, gene annotation and gene linking files; refer to the EASE documentation for more information. Statistical Parameter: Selection of the reported statistics, multiplicity correction and time parameters; refer to the EASE documentation for more information. Navigating the hierarchical tree. [Screenshot: MeV main window with the GDM and EASE toolbar buttons and the Cluster Manager / Analysis Results navigation tree]
139. cluster, save the cluster, and set several tree options. Clusters set and named in this display can propagate to other displays. Saving a cluster displays a dialog in which a tab-delimited text file containing the data for the highlighted cluster can be named. The algorithm also produces a Node Height Graph, which displays the number of terminal nodes in the tree for a given inter-node distance threshold. [Figure 11.1.1: Hierarchical tree with clusters selected (HCL tree, average linkage, Euclidean distance)] [Figure 11.1.2: HCL Initialization Dialog, with Average, Complete and Single linkage clustering options and Cluster genes / Cluster samples checkboxes] Parameters. Linkage Method: This parameter indicates the convention used for determining cluster-to-cluster distances when constructing the hierarchical tree. Single Linkage: The distances are measured between each member of one
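The Node Height Graph mentioned above reports how many terminal nodes remain for each distance threshold. The Python sketch below computes that count from a binary tree's merge heights. It assumes the heights never decrease toward the root, which holds for single, complete and average linkage. Each node at or above the threshold then splits off one more cluster from the single root cluster. The function names are hypothetical.

```python
def terminal_node_count(node_heights, threshold):
    """Number of terminal nodes (clusters) left when the tree is cut at threshold.

    node_heights: the n-1 merge heights of a binary tree over n elements.
    Assumes heights never decrease toward the root, so each node at or above
    the threshold adds one cluster to the single root cluster.
    """
    return 1 + sum(1 for h in node_heights if h >= threshold)


def node_height_graph(node_heights, thresholds):
    """(threshold, terminal-node count) pairs, e.g. for plotting."""
    return [(t, terminal_node_count(node_heights, t)) for t in thresholds]
```

For a five-element tree with merge heights 3.0, 2.5, 1.0 and 0.5, a threshold of 2.0 leaves three terminal nodes.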
140. cluster and each member of the other cluster. The minimum of these distances is considered the cluster-to-cluster distance. Average Linkage: The average distance of each member of one cluster to each member of the other cluster is used as a measure of cluster-to-cluster distance. Note that MeV actually computes this option as a weighted average of the distances of cluster members. Example: consider the distance from node d to cluster {a, b, c}, where c joined the sub-cluster {a, b}. Unweighted average linkage: $D(d,\{a,b,c\}) = \frac{d_{da} + d_{db} + d_{dc}}{3}$. Weighted average linkage: $D(d,\{a,b,c\}) = \frac{1}{2}\cdot\frac{d_{da} + d_{db}}{2} + \frac{1}{2}\,d_{dc} = \frac{d_{da}}{4} + \frac{d_{db}}{4} + \frac{d_{dc}}{2}$. Nodes are weighted unequally: nodes deeper in the sub-tree contribute less to the overall computed distance. Complete Linkage: The distances are measured between each member of one cluster and each member of the other cluster. The maximum of these distances is considered the cluster-to-cluster distance. Cluster Genes / Cluster Samples Options: These checkboxes indicate whether to cluster genes, samples, or both. Default Distance Metric: Euclidean. [Figure 11.1.3: Node Height Graph (node heights plotted against distance)] Adjusting the Tree Configuration and Viewing Clusters: A right-click in the Tree Viewer produces a menu that includes an option to alter the displayed tree. Tree Properties: This option permits the user to ch
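The linkage rules above, including the weighted-average example, can be checked with a small Python sketch. It assumes a precomputed distance matrix indexed by element. The weighted version takes a hypothetical nested-pair description of how the target cluster was merged and gives each branch of a merge half the weight. This reproduces the example's 1/4, 1/4, 1/2 weights, but MeV's exact weighting scheme may differ in detail.

```python
def single_linkage(dist, a, b):
    """Minimum distance between members of clusters a and b."""
    return min(dist[i][j] for i in a for j in b)


def complete_linkage(dist, a, b):
    """Maximum distance between members of clusters a and b."""
    return max(dist[i][j] for i in a for j in b)


def unweighted_average_linkage(dist, a, b):
    """Plain mean over all member pairs."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def weighted_average_linkage(dist, x, subtree):
    """Distance from element x to a subtree given as an index or (left, right) pair.

    Each branch of a merge contributes half, so members deeper in the
    sub-tree count less (WPGMA-style weighting; an assumption of this sketch).
    """
    if isinstance(subtree, int):
        return dist[x][subtree]
    left, right = subtree
    return 0.5 * weighted_average_linkage(dist, x, left) + 0.5 * weighted_average_linkage(dist, x, right)
```

With elements a, b, c, d at indices 0 to 3, and d_da = 4, d_db = 8, d_dc = 2, the unweighted average is 14/3. The weighted average over ((a, b), c) is 4/4 + 8/4 + 2/2 = 4.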
141. column; SR: Sub row; SC: Sub column; FlagA: TIGR Spotfinder flag value in channel A; FlagB: TIGR Spotfinder flag value in channel B; SAA: Actual spot area in pixels in channel A; SAB: Actual spot area in pixels in channel B; SFA: Saturation factor in channel A; SFB: Saturation factor in channel B; QC: Cumulative quality control score; QCA: Quality control score in channel A; QCB: Quality control score in channel B; BkgA: Background value in channel A; BkgB: Background value in channel B; SDA: Standard deviation for spot pixels in channel A; SDB: Standard deviation for spot pixels in channel B; SDBkgA: Standard deviation of the background value in channel A; SDBkgB: Standard deviation of the background value in channel B; MedA: Median intensity value in channel A; MedB: Median intensity value in channel B; AID: Alternative ID. The first seven fields (UID, IA, IB, R, C, MR and MC) are required, as specified above. This flexible format lets users track slide-specific data of interest, such as background, spot size and alternate intensities, without requiring them of all users or adopting a limited vocabulary of field names. The header row identifies the required and additional data columns. UID must be the left-most column in the mev file; other columns do not need to be present in a fixed order. For mev files generated at TIGR, the UIDs may be formed from the database name and the spot id (e.g. cage 20238). For any given microarray database the id field in th
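The header rules stated above can be sketched as a small Python check: UID must be the left-most column, the seven required fields must be present, and the other columns may appear in any order. This is an illustrative sketch. `index_mev_header` is a hypothetical helper, and this sketch does not enforce a fixed order for the required fields beyond UID.

```python
REQUIRED = ["UID", "IA", "IB", "R", "C", "MR", "MC"]


def index_mev_header(header_line):
    """Map column names to positions after checking the mev header rules."""
    names = header_line.rstrip("\n").split("\t")
    if names[0] != "UID":
        raise ValueError("UID must be the left-most column")
    missing = [name for name in REQUIRED if name not in names]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    return {name: i for i, name in enumerate(names)}
```

A reader can then look up optional fields such as `BkgA` by name, regardless of where they sit in the file.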
142. [Figure: FOM Graph following 25 iterations of KMC with K = 0 to 15 (mean adjusted FOM plotted against number of clusters; Euclidean distance)] 11.28 BRIDGE: Bayesian Robust Inference for Differential Gene Expression (Gottardo et al., 2005). BRIDGE tests for differentially expressed genes in microarray data and can be used with both cDNA microarrays and Affymetrix chips. The package fits a robust Bayesian hierarchical model for testing for differential expression. Outliers are modeled explicitly using a t-distribution. The model includes an exchangeable prior for the variances, which allows different variances for the genes but still shrinks extreme empirical variances. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. Bridge Initialization Dialog: Data Type: 2-Color Data or Intensity Data. For each slide (HIV_01.mev to HIV_04.mev in the example), mark the Control Sample's dye color (Cy3 or Cy5). Advanced Parame
143. d for each gene and allows the input of the critical p-value. P-values can be computed either from the theoretical t-distribution or from permutations of the data for each gene between the two groups. P-values based on the t-distribution: With this option, a gene's p-value is taken directly from the theoretical t-distribution based on the gene's calculated t-value. P-values based on permutation: With this option, a gene's p-value is determined from a distribution built by permuting the data for that gene. For the one-class t-test, in each round of permutation some of the values in the expression vector are picked at random and replaced by $x - 2(x - \mu_0) = 2\mu_0 - x$, where $x$ is the original value and $\mu_0$ the hypothesized mean. The randomized vectors thus have some of their elements randomly flipped about the hypothesized mean. For the between-subjects t-test, the permutations allow each value in the expression vector, in group A or group B, to be randomly placed into either group, with the size of each group conserved. A t-value is computed after each permutation, and the resulting distribution is used to generate a p-value for each gene from its own t-value. If permutations are used, two buttons let you either permute the values the indicated number of times or permute them a number of times equal to the maximum number of permutations possible. Critical p-value: This text field allows you to enter the alpha or criti
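Both permutation schemes described above can be sketched in Python. Several details are assumptions of this sketch rather than facts from the manual: each value is flipped independently with probability 1/2, the two-sample statistic is Welch's t, and the p-value is the fraction of permuted |t| values at least as large as the observed |t|. MeV's exact counting rule may differ.

```python
import math
import random


def one_class_t(values, mu0):
    """One-sample t statistic for H0: mean == mu0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    se = math.sqrt(var / n)
    if se == 0:  # guard: flipping can produce a constant vector
        return 0.0 if mean == mu0 else math.copysign(math.inf, mean - mu0)
    return (mean - mu0) / se


def two_sample_t(a, b):
    """Welch (unequal-variance) t statistic; the Welch form is an assumption."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0:
        return 0.0 if ma == mb else math.copysign(math.inf, ma - mb)
    return (ma - mb) / se


def one_class_perm_p(values, mu0, n_perm, rng):
    """Randomly flip values about mu0 (x -> 2*mu0 - x) and compare |t|."""
    observed = abs(one_class_t(values, mu0))
    hits = 0
    for _ in range(n_perm):
        flipped = [2 * mu0 - v if rng.random() < 0.5 else v for v in values]
        if abs(one_class_t(flipped, mu0)) >= observed:
            hits += 1
    return hits / n_perm


def between_subjects_perm_p(a, b, n_perm, rng):
    """Reshuffle values between the groups, keeping both group sizes fixed."""
    observed = abs(two_sample_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(two_sample_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return hits / n_perm
```

Passing a seeded `random.Random` instance makes the permutation p-values reproducible across runs.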
144. d radius (see above) to determine which surrounding SOM nodes are in the neighborhood and are therefore candidates for adaptation. When this option is selected, the Alpha parameter for scaling the adaptation is used directly as provided by the user. Gaussian: This option forces all SOM vectors in the network to be adapted regardless of their proximity to the winning node. In this case the Alpha parameter is scaled based on the distance between the SOM vector to be adapted and the winning node's SOM vector. Topology: Indicates whether the topology should be rectangular or hexagonal. If rectangular topology is selected, the node-to-node distance is the Euclidean distance within the two-dimensional x-y grid. If hexagonal topology is selected, an appropriate formula is used to determine the distance given the coordinates of the two nodes. Hierarchical Clustering: This checkbox selects whether to perform hierarchical clustering on the elements in each cluster created. Default Distance Metric: Euclidean. 11.11 GSH: Gene Shaving (Hastie et al., 2000). The clusters created by this method differ from the results of other clustering algorithms in several ways. Clusters are constructed to show large variation across the set of samples and small variation between the expression levels of the individual genes. Each cluster is independent of the others, and clusters may overlap (each gene may belong to several clusters) or
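A minimal Python sketch of the two neighborhood options described above. With Bubble, nodes within the radius adapt with the user's alpha unchanged and all other nodes do not adapt. With Gaussian, every node adapts, with alpha scaled down by distance. The Gaussian kernel `exp(-d^2 / (2 r^2))` is a common choice assumed here, not a formula confirmed by the manual. The functions accept whichever distance the caller supplies; `grid_distance` covers the rectangular-topology case, in which the distance is Euclidean on the x-y grid.

```python
import math


def grid_distance(node_a, node_b):
    """Euclidean distance between (x, y) node positions on a rectangular grid."""
    return math.dist(node_a, node_b)


def bubble_rate(alpha, distance, radius):
    """Bubble: alpha is used directly inside the radius; no adaptation outside it."""
    return alpha if distance <= radius else 0.0


def gaussian_rate(alpha, distance, radius):
    """Gaussian: every node adapts; alpha shrinks with distance (assumed kernel)."""
    return alpha * math.exp(-(distance ** 2) / (2 * radius ** 2))


def adapt(vector, target, rate):
    """Move a SOM vector toward the input vector by the given rate."""
    return [v + rate * (t - v) for v, t in zip(vector, target)]
```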
145. deded .dmg file. A new volume containing the installer will be mounted. Run the installer by double-clicking the R 2.1.1 .mpkg package. You can place the application anywhere you like, but the R framework is probably installed into /Library/Frameworks/R.framework/Versions/x.y.z/Resources. Installing under Windows: 1. Download R: get the Windows precompiled binary distribution from http://cran.fhcrc.org. 2. Install by double-clicking the downloaded installer. You can install anywhere, but remember where. 3. Installing RAMA/BRIDGE. Installing under OS X using the supplied OS X GUI: 1. Click Packages & Data in the menu bar and select Package Installer. 2. In the dialog box that appears, choose a repository: click the pull-down menu (probably displaying CRAN binaries) and change it to BioConductor binaries. 3. Click Get List. 4. Click rama or bridge to highlight it. 5. Click Install Selected. [Screenshot: R Package Installer with the repository options (CRAN binaries, CRAN sources, BioConductor binaries, BioConductor sources, Other Repository) and the list of available packages]
146. delimited format similar to the tav format. The first lines of the report contain the name of the original input file and the normalization method used. [Screenshot: bsp30025a001_report.txt in Notepad, beginning "Report for SlideFile" and "Normalization: Iterative Log", followed by rows with Column, MetaRow, MetaColumn, SubRow, SubColumn, Cy5, Cy3, Plate, Well, Feat_name, Locus and Common Name columns]
147. dication of data source change, a node is placed on the result navigation tree to indicate the source of the data and the number of genes and samples in the selected data set. Subsequent analysis runs will use only this data subset until a new data source is selected. [Figure 5.5.1: Data Source Selection Node (the source node is marked with a green border); in the example, the data source is Analysis Results > KMC genes 1 > Expression Images > Cluster 3, with 51 genes and 10 samples] 6 Display Options. The graphical display of data and analysis results is one of MeV's strongest assets. This section of the manual describes options for selecting the gene and sample annotation to display, adjusting expression image color schemes, and adjusting expression image element size. 6.1 Sample Annotation: Changing Sample Labels. Samples in MeV can be labeled in various ways to indicate natural groupings based on experimental d
148. display can be rotated and shifted by left-dragging or right-dragging, respectively. Right-clicking on the 3D view node displays a popup menu that allows the user to change the 3D view's display options and to create a selection area that defines a cluster. The points are projections of the elements being classified into 3D space, using the first three expression components generated during MPLS determination of representative components. [Figure 11.20.4: 3D Component Projection View] 11.21 LEM: Linear Expression Maps. The Linear Expression Map module produces a viewer that displays locus-level expression information organized by chromosomal location. The LEM aligns the sampled loci from multiple samples in a single viewer so that expression patterns for loci and groups
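Projecting elements onto component axes, as in the 3D view above, reduces to taking dot products. This Python sketch assumes the component vectors have already been computed (the MPLS step is not reproduced) and that the points are expressed in the same coordinates as the components. Only the first three components are used, matching the three displayed axes. `project` is a hypothetical helper.

```python
def project(points, components):
    """3D coordinates of each point: dot products with the first three components.

    Assumption: components are precomputed vectors of the same length as
    each point; how they are derived (MPLS in MeV) is outside this sketch.
    """
    return [
        [sum(p * c for p, c in zip(point, comp)) for comp in components[:3]]
        for point in points
    ]
```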
149. [Screenshot: Expression File Loader preview of a tab-delimited expression table with YORF, NAME, GWEIGHT and GenBank annotation columns and an EWEIGHT row, next to the file browser; the instruction under the table reads "Click the upper leftmost expression value"]
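The "click the upper leftmost expression value" step tells the loader where the numbers start. The Python sketch below shows how a single (row, column) position could split a loaded table. Rows above it are treated as header rows, columns to its left as annotation, and the rest as expression values. This is an assumption about how such a position can be used, not MeV's loader code. The table in the test is a made-up miniature of the screenshot's layout.

```python
def split_at_upper_left(table, row, col):
    """Split a table (list of lists of strings) at the upper-leftmost value.

    Rows above `row` are header rows (e.g. sample labels, EWEIGHT); columns
    left of `col` are annotation (e.g. YORF, NAME, GWEIGHT). Empty cells in
    the expression block are treated as missing (an assumption).
    """
    header_rows = table[:row]
    annotation = [r[:col] for r in table[row:]]
    values = [[float(v) if v else None for v in r[col:]] for r in table[row:]]
    return header_rows, annotation, values
```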
150. e correlation coefficient between genes by comparing the expression pattern of each gene to that of every other gene. The ability of each gene to predict the expression of each other gene is measured as a correlation coefficient. Genes are represented as nodes in a network, and an edge is drawn between two genes if their correlation coefficient falls between the minimum and maximum thresholds specified in the initialization dialog. The experiment subtree created by this module contains information about the predicted networks. Under the Network tab is a graph of all of the subnets generated (fig 10.5.1). A subnet is a group of genes in which each gene is connected to at least one other gene. The Relevance Subnets tab contains network diagrams for each of the individual subnets, and the Expression Images folder contains expression views for the genes in each of them. [Screenshot: RelNet result subtree with Expression Images, Relevance Subnets (Subnet 1 to Subnet 7) and Network nodes]
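The network construction described above can be sketched in Python. The sketch computes Pearson correlations for all gene pairs and keeps an edge when r lies between the minimum and maximum thresholds. It then reports connected groups of two or more genes as subnets, using a small union-find. Assumptions: Pearson correlation is the coefficient used, and no profile is constant (a constant profile would divide by zero). `relevance_subnets` is a hypothetical helper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relevance_subnets(profiles, r_min, r_max):
    """Groups of gene indices connected by edges with r_min <= r <= r_max."""
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if r_min <= pearson(profiles[i], profiles[j]) <= r_max:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    # Every gene in a subnet must link to at least one other gene: drop singletons.
    return [g for g in groups.values() if len(g) > 1]
```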
151. e known members of the class using the selected algorithm and parameters. Note that by default these selections are saved to a file when exiting the editor. If the settings were set and saved previously, you can load and apply them from the editor's File menu. This applies the saved settings, but you can still alter class memberships before proceeding. [Figure 11.20.3: DAM Classification Editor, with options to save or not save the classification to a file] DAM Result Viewers: When the DAM module has run to completion, a sub-tree labeled DAM is created and placed under the Analysis tab in the navigation tree. The tabs within the DAM sub-tree contain the results of the module's calculations: Expression Images, Centroid Graphs, Expression Graphs, Gene Component 3D view, Cluster Information and General Information. All of these viewers have been described previously except the Gene Component 3D view, which is a three-dimensional view of the 3 most significant gene components obtained from Dimensional Reduction. The
152. [Figure: EASE Result Table (example row: GO Biological Process, GO:0046034, ATP metabolism, 8.896E-11)] Save EASE Table: Stores the result table in a tab-delimited file. Open Web Page: Opens the default web browser using the URL associated with the theme file (e.g. KEGG pathway.txt) and the available accession or index. If the accession numbers are not available in the Tags directory, or no URL file has been entered in the URL Data directory, this feature is disabled. GO Hierarchy Viewer: The GO Hierarchy Viewer is a hierarchical representation of the GO terms resulting from an EASE cluster analysis. Each GO term found in the cluster under analysis is represented by a node in the hierarchy. As one descends a path in the hierarchy, the terms represented by its nodes tend to become narrower in scope, identifying a more specific biological theme. The color of each node represents the theme's p-value relative to user-defined thresholds. [Figure: EASE GO Hierarchy Viewer with Selected Path (legend thresholds p < 0.05 and p > 0.05; selected node "integral to membrane", p = 2.501E-05)] Non-verbose nodes: The header represents the currently selected node in verbose rendering or the hierarchy's ro
153. e of the following formats for the header row, depending on the origin of the mev file. The non-required columns (i.e. anything after the 7th column) may be rearranged, and their names are subject to change at this time.
1. Database-created mev file. Header (tab-delimited): UID, IA, IB, R, C, MR, MC, SR, SC, FlagA, FlagB, SAA, SAB, SFA, SFB, QC, QCA, QCB, BkgA, BkgB. UID: Unique identifier for this spot; IA: Intensity value in channel A; IB: Intensity value in channel B; R: Row (slide row); C: Column (slide column); MR: Meta row (block row); MC: Meta column (block column); SR: Sub row; SC: Sub column; FlagA: TIGR Spotfinder flag value in channel A; FlagB: TIGR Spotfinder flag value in channel B; SAA: Actual spot area in pixels in channel A; SAB: Actual spot area in pixels in channel B; SFA: Saturation factor in channel A; SFB: Saturation factor in channel B; QC: Cumulative quality control score; QCA: Quality control score in channel A; QCB: Quality control score in channel B; BkgA: Background value in channel A; BkgB: Background value in channel B.
2. Spotfinder-created mev file. Header (tab-delimited): UID, IA, IB, R, C, MR, MC, SR, SC, FlagA, FlagB, SAA, SAB, SFA, SFB, QC, QCA, QCB, BkgA, BkgB, SDA, SDB, SDBkgA, SDBkgB, MedA, MedB, AID. UID: Unique identifier for this spot; IA: Intensity value in channel A; IB: Intensity value in channel B; R: Row (slide row); C: Column (slide column); MR: Meta row (block row); MC: Meta column, block
154. e processing seamless. There are no naming conventions for annotation files at this time. If such a standard is introduced in the future, it will be detailed here. Bioconductor MAS5 Files: A Bioconductor MAS5 expression file is a tab-delimited text file that contains Affymetrix GeneChip probe set IDs and several columns of expression data. The header line contains the names of all CEL files used in the Bioconductor calculation, for example:

                  Sample_A.CEL       Sample_B.CEL       Sample_C.CEL
    1053_at       435.013780957768   488.838904739281   435.013780957768
    117_at        44.1563783495161   88.9787051028434   44.1563783495161
    121_at        1222.87243433892   698.900718145166   1222.87243433892
    1255_g_at     58.672587649119    47.0203292843508   58.672587649119
    1294_at       336.502704708535   335.169154361150   336.502704708535
    1316_at       163.254451578192   92.4927417614044   163.254451578192
    1320_at       114.309125038560   105.223093019586   114.309125038560

A Bioconductor MAS5 call file is a tab-delimited text file that contains Affymetrix GeneChip probe set IDs and several columns of present/absent detection calls for the corresponding expression file. [Example call file: header Sample_A.CEL, Sample_B.CEL, Sample_C.CEL; one row per probe set (1053_at, 117_at, 121_at, 1255_g_at, 1294_at, 1316_at, 1320_at), with a P (present) or A (absent) call for each sample.] Users can use the following scripts to generate the above files with Bioconductor: library(affy), data <- ReadAffy(), mas5data <- write exp m
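The expression and call files described above can be combined before analysis. Below is a minimal sketch of a hypothetical Python helper that is not part of MeV or Bioconductor. It parses both tab-delimited files from strings and keeps only the values whose detection call is P. It assumes the header lists only the CEL file names, and it treats A (or any other call) as missing; that masking rule is an illustrative assumption.

```python
import csv
import io


def read_mas5_table(text):
    """Parse a tab-delimited MAS5 expression or call file.

    Assumes the header lists only CEL file names, so each data row has one
    more field (the probe set ID) than there are sample names.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    n_samples = len(body[0]) - 1
    samples = header[-n_samples:]  # also tolerates a leading blank header cell
    table = {row[0]: row[1:] for row in body}
    return samples, table


def mask_absent(expression, calls):
    """Return expression values as floats, with None wherever the call is not 'P'."""
    masked = {}
    for probe, values in expression.items():
        flags = calls.get(probe, ["A"] * len(values))  # no call row: treat as absent
        masked[probe] = [float(v) if f.strip() == "P" else None
                         for v, f in zip(values, flags)]
    return masked


expr_text = ("Sample_A.CEL\tSample_B.CEL\n"
             "1053_at\t435.01\t488.84\n"
             "117_at\t44.16\t88.98\n")
call_text = ("Sample_A.CEL\tSample_B.CEL\n"
             "1053_at\tP\tA\n"
             "117_at\tA\tP\n")

samples, expression = read_mas5_table(expr_text)
_, calls = read_mas5_table(call_text)
print(samples)                         # ['Sample_A.CEL', 'Sample_B.CEL']
print(mask_absent(expression, calls))  # {'1053_at': [435.01, None], '117_at': [None, 88.98]}
```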
155. e spot table will be unique. The combination of database and spot ID will therefore uniquely identify any spot on any array created at TIGR. This is not enough information to distinguish between spots in the same location on two slides of the same slide_type, because that would typically require an analysis_id. Annotation data is based on slide_type, so this distinction is not necessary: all slides of a given type use the same annotation file. The AID column will usually contain an incremental sequence of numbers starting at 1. These numbers can be used to return the file to its original sorted order and can serve as a unique row identifier if necessary. Applications that generate files of expression data (commonly in .tav format) by retrieving records from the database access the spot table. TIGR Spotfinder, Midas, and Madam are all capable of generating UIDs of the form described above, in addition to the typical coordinate and intensity data. mev files are required to end with the extension .mev. At this time, there are no further naming conventions for mev files. 15.5 Annotation Files (.ann): An annotation file is a tab-delimited text file containing annotation data for a specific slide_type. An mev file can be associated with an annotation file only if both files are based on the same slide_type. The keys to this association are the unique IDs in both files. Rows of mev and annotat
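The UID is the key shared by mev and annotation files, so associating them amounts to a join on that column. Below is a minimal Python sketch that assumes both files are already available as strings. The UIDs and gene names are made up for illustration and do not follow the TIGR UID format. The function names are hypothetical.

```python
import csv
import io


def read_tab_file(text):
    """Read a tab-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_uid(mev_rows, ann_rows):
    """Attach annotation fields to each mev row that shares its UID.

    mev rows without a matching annotation row are kept unannotated.
    Where both files have a field (e.g. R or C), the mev value wins.
    """
    annotation = {row["UID"]: row for row in ann_rows}
    joined = []
    for row in mev_rows:
        merged = dict(annotation.get(row["UID"], {}))
        merged.update(row)
        joined.append(merged)
    return joined


mev_text = "UID\tIA\tIB\tR\tC\nspot1\t1200\t800\t1\t1\nspot2\t300\t900\t1\t2\n"
ann_text = "UID\tR\tC\tGeneN\nspot1\t1\t1\tgeneX\n"
rows = join_on_uid(read_tab_file(mev_text), read_tab_file(ann_text))
print(rows[0]["GeneN"], rows[1].get("GeneN"))  # geneX None
```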
156. e tab interface. Hierarchical Clustering: This checkbox selects whether to perform hierarchical clustering on the elements in the resulting matched and unmatched element sets. Default distance metric: Pearson (fixed; it will not correspond to the Distance menu). Template matching is particularly useful when the researcher is searching for a specific expression pattern. Applying this method with the input parameters in the previous figure gives the following output. The first panel on the right corresponds to genes that matched the template, and the second panel to genes that did not match. [Figure 11.13.2: PTM results, Expression Graphs view, with Matched Genes and Unmatched Genes nodes in the navigation tree.] 11.14 TTEST (T-tests) (Dudoit et al. 2000; Pan 2002; Welch 1947; Zar 1999; Korn et al. 2001, 2004). Three t-test designs are implemented: one-sample, paired, and between-subjects. In the one-sample design, the user specifies a mean. Each gene whose mean log2 expression ratio over all included samples is significantly different from the user-specified
157. mean is assigned to one cluster, while genes whose means are not significantly different from the user-specified mean are assigned to another cluster. To exclude a sample from the analysis, uncheck the box next to that sample's name in the left pane of the one-sample screen. In the between-subjects design, samples can be assigned to one of two groups. Genes that have significantly different mean log2 expression ratios between the two groups are assigned to one cluster, and genes that are not significantly different are assigned to another cluster. The user may exclude some samples from the analysis by selecting the Neither group option for those samples in the initialization dialog (see screenshot below). For the between-subjects t-test, one can either use the Welch t-test for small samples with unequal variances in the two groups (Welch 1947) or assume equal variances. In the paired design, samples are assigned to two groups, and each member of group A is also paired one-to-one with a corresponding member of group B. An example is gene expression measured on each subject before (Group A) and after (Group B) drug treatment. T-values are calculated for each gene, and p-values are computed either from the theoretical t-distribution or from permutations of the data for each gene. Whether a gene's mean ex
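The three designs reduce to a few standard statistics. Below is a minimal Python sketch of the one-sample, paired, and Welch between-subjects t statistics, plus a permutation p-value. It is a generic illustration, not MeV's implementation. The permutation scheme (random relabeling of the pooled samples, two-sided, with a +1 correction) and the function names are assumptions. Theoretical-distribution p-values would additionally need a t CDF, for example from SciPy.

```python
import math
import random
import statistics


def one_sample_t(values, mu):
    """t statistic for H0: mean(values) == mu."""
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(len(values)))


def paired_t(group_a, group_b):
    """Paired design: a one-sample t-test of the B - A differences against 0."""
    return one_sample_t([b - a for a, b in zip(group_a, group_b)], 0)


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def permutation_p(a, b, n_perm=1000, seed=0):
    """Two-sided p-value from random relabelings of the pooled samples."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b)[0])
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t_perm, _df = welch_t(pooled[: len(a)], pooled[len(a):])
        if abs(t_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


print(welch_t([1, 2, 3], [4, 5, 6]))  # t is about -3.67, df is about 4
```

Equal-variance (Student) t-tests differ only in pooling the two variances.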
158. ed and transforms them to log base 2, i.e., it assumes that the input data are in the form log10(x) and it outputs log2(x). Log2 to Log10: This assumes that the current data are log2-transformed and transforms them to log base 10, i.e., it assumes that the input data are in the form log2(x) and it outputs log10(x). Unlog2 transformation: This assumes that the current data are log2-transformed and removes the log2 transformation. 5.4 Data Filters: Data Quality and Variance-Based Filters. Lower Cutoffs: Select Use Lower Cutoffs to exclude from analysis any genes whose expression values are lower than specified values. There are two options in this menu, one for two-color arrays and one for single-color arrays. For two-color arrays, select Low Intensity Cutoff Filter to set the cutoffs for the corresponding Cy3 and Cy5 columns. To enable this option, check the Enable Lower Cutoff Filter checkbox just below the Set Lower Cutoffs menu option; uncheck it to disable the option. All subsequent analyses will include only those genes for which all Cy3 and Cy5 values are above the specified thresholds. This option is disabled by default. [Figure 5.4.1: Lower Cutoff Filter dialog for two-color arrays, with an Enable Lower Cutoff Filter checkbox and Cy3 and Cy5 Lower Cutoff fields.] For single-color arrays, select Low Intensity Cutoff Filter to set
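The base conversions rely on the identity log2(x) = log10(x) / log10(2). Below is a short Python sketch of these transformations and of the two-color lower cutoff rule described above. The function names are illustrative, not MeV's. The strict "greater than" comparison follows the wording "above the specified thresholds".

```python
import math

LOG10_OF_2 = math.log10(2)


def log10_to_log2(value):
    """log2(x) = log10(x) / log10(2)."""
    return value / LOG10_OF_2


def log2_to_log10(value):
    """log10(x) = log2(x) * log10(2)."""
    return value * LOG10_OF_2


def unlog2(value):
    """Undo a log2 transformation."""
    return 2 ** value


def passes_lower_cutoffs(cy3_values, cy5_values, cy3_cutoff, cy5_cutoff):
    """Keep a gene only if every Cy3 and every Cy5 value is above its cutoff."""
    return (all(v > cy3_cutoff for v in cy3_values)
            and all(v > cy5_cutoff for v in cy5_values))


print(log10_to_log2(math.log10(8)))                     # approximately 3.0
print(passes_lower_cutoffs([150, 90], [200, 300], 100, 100))  # False
```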
159. ed with all genes set to neutral. Hierarchical Clustering: This checkbox selects whether to perform hierarchical clustering on the elements in each cluster created. Default distance metric: Euclidean for finding nearest neighbors and Pearson for the correlation test (fixed; it will not correspond to the Distance menu). [Figure 11.19.2: KNN Classification Editor, with the options Save classification to file and Do not save classification to file.] Genes can be assigned to any of the specified classes by checking the box in the appropriate column for that gene. Genes designated as Neutral will be classified in subsequent steps, whereas those a
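The classification step for Neutral genes can be illustrated with a plain k-nearest-neighbor vote. Below is a minimal Python sketch that uses Euclidean distance and a majority vote among the k closest labeled genes. The value of k, the tie-breaking rule, and the function names are assumptions, not MeV's documented behavior.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(labeled, neutral, k=3):
    """Assign each neutral vector the majority class of its k nearest labeled vectors.

    labeled: list of (vector, class_label) pairs; neutral: list of vectors.
    A tied vote goes to the class seen first among the nearest neighbors.
    """
    assignments = []
    for vector in neutral:
        nearest = sorted(labeled, key=lambda item: euclidean(vector, item[0]))[:k]
        votes = Counter(label for _, label in nearest)
        assignments.append(votes.most_common(1)[0][0])
    return assignments


training = [([0.0, 0.0], "A"), ([0.0, 1.0], "A"), ([5.0, 5.0], "B"), ([6.0, 5.0], "B")]
print(knn_classify(training, [[0.2, 0.3], [5.5, 5.1]], k=3))  # ['A', 'B']
```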
160. ed within a cluster. If diversity is used as the cell division criterion, all resulting clusters will fall below this level of diversity (mean gene-to-cluster-centroid distance). The exception is when Max Cycles is reached, at which point some clusters may still exceed this parameter. Min Epoch Error Improvement: This value is used as a threshold for signaling the start of a new cycle and a cell division. The tree diversity is monitored during a training epoch. When the diversity fails to improve by more than this value, training is considered to have stabilized and a new cycle begins. Run Maximum Number of Cycles (unrestricted growth): The algorithm will run until Max Cycles is reached or until the entire input set is fully partitioned, such that each cluster holds one gene or several identical gene vectors. Centroid Migration and Neighborhood Parameters. Migration Weights: These values are used to scale the movement of cluster centroids (characteristic gene expression patterns) toward a gene vector that has been associated with a neighborhood. When a gene is associated with a cluster, the centroid adapts to become more like the newly associated gene vector. The parent and sister cell migration weights should be smaller than the weight for the winning cell (the cell to which the gene vector is associated). Neighborhood Level: This value determines which cells are candidates to accept new expression elements. When elements are considered for redistributi
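The migration weights scale how far a centroid moves toward a newly associated gene vector. Below is a minimal Python sketch of one such update, c <- c + w * (x - c), applied to the winning, parent, and sister cells. The cell names and the weight values in the usage example are hypothetical. The check that the winner's weight is largest mirrors the guidance above.

```python
def migrate(centroid, gene, weight):
    """Move a centroid a fraction `weight` of the way toward a gene vector."""
    return [c + weight * (g - c) for c, g in zip(centroid, gene)]


def update_cells(cells, gene, winner, parent, sister, weights):
    """Apply migration to the winning cell and, if present, its parent and sister.

    cells: dict of cell name -> centroid; weights: dict with keys
    'winner', 'parent', 'sister'. Pass None for a missing parent or sister.
    """
    if not weights["winner"] > max(weights["parent"], weights["sister"]):
        raise ValueError("parent and sister weights should be smaller than the winner weight")
    updated = dict(cells)
    for name, role in ((winner, "winner"), (parent, "parent"), (sister, "sister")):
        if name is not None:
            updated[name] = migrate(cells[name], gene, weights[role])
    return updated


cells = {"w": [0.0, 0.0], "p": [0.0, 0.0], "s": [0.0, 0.0]}
weights = {"winner": 0.1, "parent": 0.05, "sister": 0.01}  # illustrative values only
print(update_cells(cells, [10.0, 10.0], "w", "p", "s", weights))
```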
161. ee can be used as input data to new algorithms. The other option for cluster selection is Centroid Entropy/Variance Ranking Cluster Selection. This method assigns either a variance or an entropy value to the cluster's centroid (mean expression pattern). The previous method selects tightly constructed clusters. This method instead focuses on finding clusters with variable centroids, i.e., clusters that show a lot of variability, on average, over the expression measurements. The clusters are ranked by decreasing centroid entropy or variance, and the clusters with the highest centroid variability are selected. The selected clusters must also pass a minimum cluster population. A Cluster Selection Information Viewer is created to describe the selection process; this viewer is similar to the viewer pictured for the Diversity Ranking Cluster Selection algorithm. Two options are available to describe centroid behavior, variance and entropy. (1) Centroid Variance: $V_c = \frac{1}{m-1}\sum_{i=1}^{m}(x_i - \bar{x})^2$, where $V_c$ is the centroid variance, $\bar{x}$ is the centroid mean, and $x_i$ is the $i$th of the $m$ centroid values. (2) Centroid Entropy: Entropy in this case describes the dispersion of expression values within the expression limits of the centroid. Centroid values are binned into 10 bins that evenly divide the centroid's expression value range, and $p(x)$ is the fraction of centroid points falling in bin $x$: $H = -\sum_{x} p(x)\log p(x)$
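The two centroid scores and the ranking rule can be sketched directly. Below is a minimal Python sketch under stated assumptions: variance uses the m - 1 denominator, entropy uses the natural logarithm, and a centroid with a flat range scores 0. None of these choices is confirmed by the text above. `rank_clusters` is a hypothetical helper that applies the minimum-population requirement before ranking.

```python
import math
import statistics


def centroid_variance(centroid):
    """Sample variance of the centroid values (m - 1 denominator assumed)."""
    return statistics.variance(centroid)


def centroid_entropy(centroid, n_bins=10):
    """Entropy of centroid values binned into n_bins equal-width bins.

    Uses the natural logarithm; the log base is an assumption.
    """
    low, high = min(centroid), max(centroid)
    if high == low:
        return 0.0  # all values fall into a single bin
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in centroid:
        counts[min(int((value - low) / width), n_bins - 1)] += 1
    m = len(centroid)
    return -sum((c / m) * math.log(c / m) for c in counts if c)


def rank_clusters(clusters, score=centroid_variance, min_population=1):
    """Rank clusters by decreasing centroid score, skipping small clusters.

    clusters: dict of cluster name -> list of gene expression vectors.
    """
    scored = []
    for name, genes in clusters.items():
        if len(genes) < min_population:
            continue
        centroid = [statistics.mean(column) for column in zip(*genes)]
        scored.append((score(centroid), name))
    return [name for _, name in sorted(scored, reverse=True)]


clusters = {"flat": [[1, 1, 1], [1, 1, 1]], "variable": [[0, 5, 0], [0, 5, 0]]}
print(rank_clusters(clusters))                          # ['variable', 'flat']
print(rank_clusters(clusters, score=centroid_entropy))  # ['variable', 'flat']
```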
162. eeman and Co., NY. Kim, S. K., J. Lund, M. Kiraly, K. Duke, M. Jiang, J. M. Stuart, A. Eizinger, B. N. Wylie, and G. S. Davidson. 2001. A Gene Expression Map for Caenorhabditis elegans. Science 293:2087-2092. Kohonen, T. 1982. Self-organized formation of topologically correct feature maps. Biological Cybernetics 43:59-69. Korn, E. L., J. F. Troendle, L. M. McShane, and R. Simon. 2001. Controlling the number of false discoveries: application to high-dimensional genomic data. Technical Report 003, Biometric Research Branch, National Cancer Institute. http://linus.nci.nih.gov/brb/TechReport.htm. Korn, E. L., J. F. Troendle, L. M. McShane, and R. Simon. 2004. Controlling the number of false discoveries: application to high-dimensional genomic data. Journal of Statistical Planning and Inference 124:379-398. Manly, B. F. J. 1997. Randomization, Bootstrap and Monte Carlo Methods in Biology, 2nd ed. Chapman and Hall/CRC, FL. Margolin, A. A., J. Greshock, T. L. Naylor, Y. Mosse, J. M. Maris, G. Bignell, A. I. Saeed, J. Quackenbush, and B. L. Weber. 2005. CGHAnalyzer: a stand-alone software package for cancer genome analysis using array-based DNA copy number data. Bioinformatics 21(15):3308-3311. Nguyen, D. V., and D. M. Rocke. 2002. Multi-class cancer classification via partial least squares with gene expression profiles. Bioinformatics 18(9):1216-1226. Pan, W. 2002. A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray exper
163. elements in a cluster. For instance, if 10 KMC runs were performed and the percentage was 80, a pair of expression elements found together at least 8 times would pass the criterion for inclusion in a cluster. Number of Clusters (K): This positive integer indicates the number of clusters to be created during each KMC run. For K-Means Support, the final number may turn out to be slightly smaller or larger than this value, depending on the nature of the input data and the appropriate selection of K (the number of clusters to create). FOM can be used to estimate an appropriate value for K. Number of Iterations: This positive integer is the maximum number of times that all the elements in the data set will be tested for cluster fit. On each iteration, each element is associated with the cluster that has the closest mean or median. A KMC run terminates either when no elements require migration (reassignment to new clusters) or when the maximum number of iterations has been reached. Hierarchical Clustering: This checkbox selects whether to perform hierarchical clustering on the elements in each cluster created. Default distance metric: Euclidean. KMS (K-Means/K-Medians Support) initialization dialog: Sample Selection (Cluster Genes or Cluster Samples); Means or Medians (Calculate means or Calculate medians); Parameters for K-Means/K-Medians repetitions: Number of k-means/k-medians runs; Threshold o
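The support criterion counts how often each pair of elements lands in the same cluster across runs. Below is a minimal Python sketch that takes the cluster labels from several runs and returns the pairs that meet the percentage threshold. It then groups linked elements into consensus clusters. Grouping by connected components is an assumption made for illustration; the text above does not specify MeV's exact grouping rule. Elements left alone are reported as unassigned.

```python
from itertools import combinations


def co_clustered_pairs(runs, threshold_pct):
    """Return element pairs placed in the same cluster in >= threshold_pct% of runs.

    runs: one list per KMC run, giving the cluster label of each element.
    """
    n_runs, n_elements = len(runs), len(runs[0])
    pairs = set()
    for i, j in combinations(range(n_elements), 2):
        together = sum(1 for labels in runs if labels[i] == labels[j])
        if together * 100 >= threshold_pct * n_runs:  # avoids float rounding
            pairs.add((i, j))
    return pairs


def consensus_clusters(n_elements, pairs):
    """Group elements linked by qualifying pairs (connected components, assumed)."""
    parent = list(range(n_elements))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for x in range(n_elements):
        groups.setdefault(find(x), []).append(x)
    clusters = [g for g in groups.values() if len(g) > 1]
    unassigned = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, unassigned


runs = [[0, 0, 1, 1], [0, 0, 1, 2], [0, 0, 0, 1]]
pairs = co_clustered_pairs(runs, 60)
print(pairs)                         # {(0, 1)}
print(consensus_clusters(4, pairs))  # ([[0, 1]], [2, 3])
```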
164. enes having an SD greater than this value are selected. [Figure 5.4.4: Variance Filter dialog, with an Enable Variance Filter checkbox and Filter Settings for Percentage of Highest SD Genes (1-100), Number of Desired High SD Genes, and SD Cutoff Value.] Detection Filter for Affymetrix Data (with Detection Calls Only): Select Use Detection Filter to ignore genes that are not called present in enough samples. Select Set Detection Filter to divide the samples into two groups. Enter the number of times that a gene must be called present in group A, and do the same for group B. Select AND to require each gene to pass both criteria, or OR to require it to pass only one of them to be used in further analysis. [Figure: Detection Filter dialog, showing the number of present calls required in each group, the number of genes that will be used in subsequent analyses, and a Group A or Group B assignment for each sample file.]
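Both filters described above reduce to simple set selections. Below is a minimal Python sketch of the three variance-filter modes and of the two-group detection filter with AND/OR logic. The function names are hypothetical. Rounding the percentage to the nearest whole gene is an assumption; MeV may round differently.

```python
import statistics


def sd_filter(genes, mode, value):
    """Select high-variance genes.

    genes: dict of gene -> expression values.
    mode 'percent': keep the top `value` percent of genes by SD;
    mode 'count': keep the `value` genes with the highest SD;
    mode 'cutoff': keep genes whose SD is greater than `value`.
    """
    sds = {gene: statistics.stdev(values) for gene, values in genes.items()}
    ranked = sorted(sds, key=sds.get, reverse=True)
    if mode == "percent":
        return set(ranked[: round(len(ranked) * value / 100)])
    if mode == "count":
        return set(ranked[: int(value)])
    if mode == "cutoff":
        return {gene for gene in ranked if sds[gene] > value}
    raise ValueError(f"unknown mode: {mode}")


def detection_filter(calls, group_a, group_b, need_a, need_b, combine="AND"):
    """Keep genes called present ('P') often enough in group A and/or group B.

    calls: dict of gene -> dict of sample -> detection call.
    """
    kept = set()
    for gene, by_sample in calls.items():
        ok_a = sum(by_sample[s] == "P" for s in group_a) >= need_a
        ok_b = sum(by_sample[s] == "P" for s in group_b) >= need_b
        if (ok_a and ok_b) if combine == "AND" else (ok_a or ok_b):
            kept.add(gene)
    return kept


genes = {"g1": [0, 0, 0], "g2": [0, 1, 2], "g3": [0, 5, 10]}
print(sd_filter(genes, "count", 1))  # {'g3'}
```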
165. [Figure 11.20.1: DAM Initialization Dialog, including a Validation Selection panel with an Enable Validation checkbox.] Parameter Selection: The Classification Selection panel provides two options for DAM: classify genes or classify experiments. The Data Screening panel lets users apply a filter that selects elements from the loaded set: genes, or samples if classifying genes. The selected elements should be near-optimal for partitioning the elements to classify, based on an Analysis of Variance (ANOVA) on permutations of the known training class members. The list and number of selected elements will be reported after classification. The alpha value can be adjusted to apply a more or less stringent criterion, which changes the number of elements selected to enter the classification algorithm. The Classification Algorithm Selection panel selects either PDA or QDA as the primary classification algorithm. The DAM Classification Parameters panel has a field for the number of expected classes; the data will be partitioned into this many classes. The number of components indicates the number of representative expression vectors that should be generated from the data using MPLS. These components can roughly be described as representing major features of the data, or as correlating with variance found in the data. These components, once determined, are actua
166. er data description: common name or other details about the experiment. An example of the leading comments: version V3.0; format version V4.0; date 04/20/2004; analyst jwhite; created by Database script gi version 3.0; slide type IASCAGI; output row count 32448; description Standard annotation file. The header row consists of the field names for each subsequent row in this file. Only the UID field is required; it must be the first field present, and it must be named UID. Any number of additional fields may be included. Annotation files created at TIGR will always contain the following columns: UID (unique identifier for this line of annotation), R (row, slide row), and C (column, slide column). The remaining fields may vary, and a standard set has yet to be determined; such a list will be published at a future date. R and C have been included to allow manual alignment of the mev and corresponding annotation files in case the mev files were not generated in the traditional manner (i.e., using Madam, etc.). Some varieties of annotation files follow; the format may vary depending on the purpose of the file. Tab-delimited header rows: UID R C FeatN GBNum TCNum ComN; UID R C GeneN RxnN PathwayN; UID R C FeatN End5 End3 ChrNum. Of course, it would be possible to combine the fields of these files or add fields that have not been mentioned here. The goal is to keep the annotation flexible and th
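The only hard rules stated for annotation files are that the file is tab-delimited and that the first header field is named UID. Below is a minimal Python sketch of a reader that enforces the UID rule and indexes rows by UID. The text does not say how leading comment lines are marked. The `#` prefix used here is a hypothetical convention, so adjust it to the files you actually have.

```python
import csv
import io


def read_annotation(text, comment_prefix="#"):
    """Parse an annotation file into a dict keyed by UID.

    comment_prefix is a hypothetical convention for the leading comment
    lines; the header row must start with a field named UID.
    """
    lines = [line for line in text.splitlines()
             if line.strip() and not line.startswith(comment_prefix)]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    if not reader.fieldnames or reader.fieldnames[0] != "UID":
        raise ValueError("the first header field must be named UID")
    return {row["UID"]: row for row in reader}


sample = ("# description: Standard annotation file\n"
          "UID\tR\tC\tGeneN\n"
          "u1\t1\t1\tgeneA\n"
          "u2\t1\t2\tgeneB\n")
annotation = read_annotation(sample)
print(annotation["u2"]["GeneN"])  # geneB
```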
167. [Figure: Gene amplifications on the dataset. The GeneAmplifications results node lists pairs of genomic start and stop coordinates. The General Information node shows Amplification Threshold 0.8, Deletion Threshold 0.8, Amplification 2-Copy Threshold 1.0, and Deletion 2-Copy Threshold 1.0.] Loading a Gene List: Selecting the LoadGeneList item in the CGH Analysis menu will calculate the number of amplifications and deletions for ever
168. er formats, this annotation should be found in the data file, where each locus ID is on the row corresponding to the associated spot information (annotation and expression data). Please see section 4, Loading Expression Data, for format-specific file loading instructions. The Coordinate Information and Chromosome Information can likewise be imported with the annotation during the expression file load. Alternatively, a chromosomal coordinate file can be used to supply the coordinate and chromosomal information. This optional chromosome coordinate file has a simple tab-delimited format, described below. Optional Chromosomal Coordinate File: If chromosomal information is not supplied during the initial MeV file load, an auxiliary coordinate file can be supplied for LEM construction. The file is tab-delimited text with one row for each locus. The columns have the following order: <Locus ID> <Chr ID> <5' end> <3' end>. The file should not have a header. Spots in the loaded data set will be ignored if they map to loci whose location information is incomplete or missing. The LEM Initialization Dialog: The initialization dialog collects information that indicates where the critical information for LEM construction resides and provides information about the nature of the data. Linear Expression Map initialization dialog: Locus Identifier Selec
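The headerless coordinate file is straightforward to read. Below is a minimal Python sketch that parses it, skips loci with incomplete or non-numeric location fields, and orders the remaining loci along each chromosome. Ordering by the smaller of the two end coordinates, to allow for loci whose 5' end lies to the right of the 3' end, is an assumption. The function names and locus IDs are illustrative.

```python
from collections import defaultdict


def read_coordinates(text):
    """Parse a headerless, tab-delimited chromosomal coordinate file.

    Columns: locus ID, chromosome ID, 5' end, 3' end. Rows with missing or
    non-integer fields are skipped, since loci with incomplete location
    information are ignored.
    """
    coords = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) < 4 or not all(fields[:4]):
            continue
        locus, chrom, five_end, three_end = fields[:4]
        try:
            coords[locus] = (chrom, int(five_end), int(three_end))
        except ValueError:
            continue
    return coords


def loci_by_chromosome(coords):
    """Order loci along each chromosome by their leftmost coordinate (assumed)."""
    grouped = defaultdict(list)
    for locus, (chrom, five_end, three_end) in coords.items():
        grouped[chrom].append((min(five_end, three_end), locus))
    return {chrom: [locus for _, locus in sorted(items)] for chrom, items in grouped.items()}


text = "L1\tchr1\t500\t900\nL2\tchr1\t100\t50\nL3\tchr2\t10\t20\nL4\tchr1\t\t30\n"
print(loci_by_chromosome(read_coordinates(text)))  # {'chr1': ['L2', 'L1'], 'chr2': ['L3']}
```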
169. er is created using the targeted metablock as the new dataset. In this way, the user can focus on a particular defined area of the slide. The elements of the array can be sorted by location (row and column), ratio, or any of the additional fields specified in the preferences file. These sort options are available in the Sort menu; sorting by location is the default. Differentially expressed genes can be identified by checking the Expression Ratio checkbox on the panel on the left side of the window. The slider below the checkbox controls the expression ratio used to determine differential expression. When the checkbox is checked, only genes with one intensity value greater than the other by a factor of at least the expression ratio will be displayed; other genes will be blacked out. For example, a ratio of 2.0 will exclude genes whose two intensities do not differ by at least a factor of two. The array representation can be saved as an image file or sent to a printer. To save it, select Save Image from the File menu and choose a name and graphic format in the dialog that appears. To print the image, select Print Image from the File menu and set up the printer dialog. To write a flat file as output, select the Generate Report item from the File menu. A save file dialog will prompt for a name for the report file. This report will contain the data for each spot that is currently visible in a tab
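The Expression Ratio display rule compares the larger channel intensity with the smaller one. Below is a minimal Python sketch of that test and of selecting the spots that stay visible. Hiding spots with no positive signal in either channel is an assumption. The spot IDs and intensities are made up.

```python
def passes_ratio(ia, ib, ratio):
    """True if one channel intensity is at least `ratio` times the other."""
    low, high = sorted((ia, ib))
    if high <= 0:
        return False  # assumption: spots with no positive signal are hidden
    return high >= ratio * low


def visible_spots(spots, ratio):
    """spots: dict of spot id -> (IA, IB); return the ids that stay visible."""
    return [spot for spot, (ia, ib) in spots.items() if passes_ratio(ia, ib, ratio)]


spots = {"s1": (100, 250), "s2": (100, 150), "s3": (300, 100)}
print(visible_spots(spots, 2.0))  # ['s1', 's3']
```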
170. ere used during this classification. [Figure: USC heatmap.] 11.29.7 Classify From File: Having saved the results of a classification, you may want to test experiments without the time-intensive cross-validation step. It is important to use different sets of experimental samples in the training and test phases. Keep in mind that you can only test experiments of exactly the same chip type as the training experiments. If you would like to experiment with different values for Delta and Rho, you can easily change them in the Training Result File. [Figure 11.29.8: Training Result File, beginning with Delta 0.8 and Rho 0.8 lines followed by rows of AFFX control probe sets with numeric values.] 12 Scripting: The scripting capabilities within MeV permit multiple algorithms to be executed without user oversight or intervention once processing begins. The execution steps are dictated by a user-defined script that describes the parameters to use for the selected algorithms. Scripting in MeV allows one to document the algorithms run and the parameters selected during data analysis. The script document can be shared with collaborators so that analysis steps can be
171. erroni Step-Down Correction: This modified Bonferroni correction ranks the results by the statistic in ascending order. Each value is multiplied by (n - rank), where n is the number of results. In the case of a tie, where two results have the same probability, the rank is kept constant until the next element with a higher probability value occurs. The rank is then adjusted for the number of tied elements over which it was held constant. Sidak Method: This correction uses the formula $v' = 1 - (1 - v)^{n - k}$, where $v'$ is the corrected value, $v$ is the original value, and $k$ is the rank of the result in terms of the original statistic value. Ties in rank are handled as described for the step-down Bonferroni correction above. Resampling Probability Analysis: The resampling option performs a number of resampling iterations. In each iteration, a random gene list of the initial input size is selected from the population without replacement and run through the analysis. The result for each biological theme is the probability of the original significance level (EASE score or Fisher Exact) occurring by chance alone. Trim Parameters: The trim parameters can be applied to filter the analysis results based either on the number of hits within the cluster or on the percentage of genes in the cluster that are represented by an annotation term. Sometimes a term can be found significant but does not represent a large segment of t
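The step-down corrections can be sketched by following the wording above literally. Below is a minimal Python sketch under these assumptions: ranks start at 0, so the smallest p-value is multiplied by n; tied values share a rank, and the next larger value takes its position in the sorted list; Bonferroni results are capped at 1. The sketch does not enforce monotonicity across ranks, because the text does not mention it. The Sidak exponent n - k mirrors the formula given above and should be checked against MeV's source before relying on it.

```python
def step_down_ranks(pvalues):
    """Rank p-values in ascending order, starting at 0.

    Tied values share a rank; the next larger value takes its position
    in the sorted list (so the rank jumps past the tied elements).
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    ranks = [0] * len(pvalues)
    rank = 0
    for pos, i in enumerate(order):
        if pos > 0 and pvalues[i] > pvalues[order[pos - 1]]:
            rank = pos
        ranks[i] = rank
    return ranks


def bonferroni_step_down(pvalues):
    """Multiply each p-value by (n - rank); capping at 1.0 is an added assumption."""
    n = len(pvalues)
    return [min(1.0, p * (n - r)) for p, r in zip(pvalues, step_down_ranks(pvalues))]


def sidak_step_down(pvalues):
    """Corrected value 1 - (1 - p) ** (n - rank)."""
    n = len(pvalues)
    return [1 - (1 - p) ** (n - r) for p, r in zip(pvalues, step_down_ranks(pvalues))]


p = [0.01, 0.04, 0.03, 0.04]
print(step_down_ranks(p))  # [0, 2, 1, 2]
```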
172. es considered to be part of the population. This is the case when the viewer was launched as a new viewer on a data subset, or when the viewer was initially loaded with a previously saved cluster. The Population and Cluster Selection panel also displays gene clusters currently stored in MeV's cluster repository. If no clusters have been saved, a blank browser page will be displayed on this page and the Cluster Analysis mode option will be disabled. Selecting a row in the cluster table will display the cluster in the expression graph area of the browser; EASE cluster analysis will operate on the selected cluster. [Figure: EASE Annotation Analysis dialog, Population and Cluster Selection page. It contains File Updates and Configuration (Select EASE File System, Update EASE File System), Mode Selection (Cluster Analysis or Annotation Survey), Population Selection (Population from File or Population from Current Viewer), and a table of stored clusters. Tabs are provided for Population and Cluster Selection, Annotation Parameters, and Statistical Parameters.] Annotation Parameters Page. MeV Annotation Key: This area contains a drop-down list of available annotation type
173. es the stopping criterion for the algorithm as it searches through the data set, finding smaller and smaller clusters with each iteration. Checking the Use Absolute checkbox will include genes with similar as well as opposing trends in a cluster. For example, if the selected distance metric is Pearson Correlation, both positively and negatively correlated genes will be considered for inclusion in a cluster. If the Use Absolute box is unchecked, only genes with similar trends will be considered for inclusion in the same cluster. As with the previous two methods, it is possible to construct hierarchical trees from the clusters. The last displayed group of genes consists of genes that remain unassigned to any cluster. The subnodes in the left panel are similar to the ones previously described for k-means and SOM. [Figure: QTC results in the MultipleExperimentViewer navigation tree, with nodes for Expression Images, Centroid Graphs, Expression Graphs, Clusters 1 to 7, Unassigned Genes, All Clusters, Cluster Information, General Information, and History.]
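The effect of Use Absolute can be seen on a pair of perfectly anti-correlated profiles. Below is a minimal Python sketch that converts Pearson correlation r into a distance. It uses 1 - r by default, or 1 - |r| when opposing trends should count as similar. The 1 - r transformation is one common convention and is assumed here, not taken from MeV's source.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two expression vectors."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den


def pearson_distance(a, b, use_absolute=False):
    """1 - r, or 1 - |r| when opposing trends should count as similar."""
    r = pearson(a, b)
    return 1 - abs(r) if use_absolute else 1 - r


up, down = [1, 2, 3], [3, 2, 1]
print(pearson_distance(up, down))                     # 2.0
print(pearson_distance(up, down, use_absolute=True))  # 0.0
```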
174. esign. By default, samples are labeled with the file name or, for TDMS format files, with the names in the header row. Sample labels can be selected from the Select Sample Label option of the Sample Column Labels menu in the Display menu. Expression images and expression graphs will then label samples by the selected annotation field. TDMS format files can contain additional sample annotation, as described in the file format appendix. Two other options for providing additional sample annotation are to use the Append Sample Annotation option of the Utilities menu or to use the Sample Label Editor, described below. Editing Sample Labels and Sample Reordering: The Sample Label Editor is launched from the Edit Labels/Reorder Samples menu option. If sample names are added or merged, or if samples are reordered, the changes can be captured by selecting the Save Matrix option from the File menu. This saves the loaded data into a TDMS file and preserves the added sample annotation and any changed sample order. [Figure 6.1.1: Sample Label Editor.] The primary function of the Sample Label Editor is to permit the modification of labels (attributes) associated with the loaded samples. Note that the first row in the table contains the default sample name. The Default Name can be neither edited nor deleted. The second major function is to enable the order of
175. ession. Save Cluster: The Save Cluster option saves the currently viewed cluster to a file. The expression values and annotation for the current cluster are saved in a format that can be reloaded as a Tab Delimited Multiple Sample (TDMS) format file. For several statistical methods, this output includes statistics such as F-values or t-values and p-values, depending on the particular statistical algorithm applied. Save All Clusters: This saves all clusters from a clustering result as described above, but with a cluster index appended to each file name to indicate the cluster ID. Delete Cluster: This option applies when the Store Cluster option has been used to store the cluster in the repository; it removes the cluster from the repository. The cluster table viewer in the Cluster Manager also provides a way to remove single or multiple clusters. 8 Working with Clusters: The analysis modules available in MeV subdivide genes or samples into clusters by unsupervised techniques, statistical methods, classification algorithms, or biological relationships. These partitioned sets of elements are then individually displayed in one of the standard cluster viewers. 8.1 Storing Clusters and Using the Cluster Manager: Clusters of interest can be stored in a repository from the basic cluster viewers. To do so, right-click in the viewer to open a menu and select the Store Cluster option. Once a cluster is stored, the Cluster Manage
176. ether to cluster genes or experiments. Process Selection: The SVM algorithm performs two main processes, training and classification. One can elect to perform training only, classification only, or both phases of the SVM classification technique. The Training Only option produces a set of numerical weights that can be stored as an SVM file and used for classification at a later time. The Classification Only option takes a file of weights generated from training as input and produces a binary classification of the elements. The Training and Classification option uses the input set as a training set to produce weights, which are immediately applied to perform the classification. The One-out Iterative Validation option iteratively performs SVM training and classification runs. On each iteration, one element is moved to the neutral classification, so it does not influence the SVM training or the classification of elements. As a result, the final classification is not biased by the element's initial classification. Hierarchical Clustering: This checkbox selects whether to perform hierarchical clustering on the elements in each cluster created. [Figure: SVM Initialization dialog, with Classification Input (Use SVM Classification Editor or Use Classification File) and Training Parameters (Constant, Radial Coefficient, Width factor, Power, Diag factor, Pos constraint).]
177. ewer. Once experiments have been loaded, the Main View node of the navigation tree should contain a subtree called Experiment Views. Expand this subtree to display a list of all the samples that have been loaded. Clicking on any of these samples will display the CGH Circle Viewer for that sample (Figure 2.1). The CGH Circle Viewer is a circular representation of the entire genome of a sample. This view makes it easy to identify large-scale abnormalities and the overall aneuploidy of a sample. The display consists of 24 concentric circles, each representing a chromosome, with chromosome 1 represented by the outermost circle and chromosome Y by the innermost circle. Each circle is composed of a series of colored dots, each representing a probe. The probes are arranged by their linear position around the genome. The p arm of each chromosome begins at 180 degrees from the center of the display, and subsequent probes are arranged clockwise by their position on the chromosome. Click on any clone in the circle viewer to display its clone name and chromosome. Right-clicking on any region will display a menu to browse RefSeq genes in the region, launch the CGH browser on a sample, or map out to public domain sites such as NCBI. [Figure 2.1: CGH Circle Viewer.]
178. f genome-wide expression patterns. Proceedings of the National Academy of Sciences USA 95: 14863–14868.
Fellenberg, K., et al. 2001. Correspondence analysis applied to microarray data. Proceedings of the National Academy of Sciences USA 98(19): 10781–10786.
Gottardo, Raftery, Yeung, and Bumgarner. Quality control and robust estimation for cDNA microarrays with replicates. Accepted for publication in Journal of the American Statistical Association.
Gottardo, R., A. E. Raftery, K. Y. Yeung, and R. E. Bumgarner. 2006. Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 62(1): 10–18.
Hastie, T., R. Tibshirani, M. B. Eisen, A. Alizadeh, R. Levy, L. Staudt, W. C. Chan, D. Botstein, and P. Brown. 2000. Gene shaving as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology 1: RESEARCH0003.
Herrero, J., A. Valencia, and J. Dopazo. 2001. A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics 17(2): 126–136.
Heyer, L. J., S. Kruglyak, and S. Yooseph. 1999. Exploring expression data: identification and analysis of coexpressed genes. Genome Research 9: 1106–1115.
Hosack, D. A., G. Dennis Jr., B. T. Sherman, H. C. Lane, and R. A. Lempicki. 2003. Identifying biological themes within lists of genes with EASE. Genome Biology 4: R70–R70.8.
Keppel, G., and S. Zedeck. 1989. Data Analysis for Research Designs. W. H. Fr
179. Figure 11.7.1: K-Medians Support initialization dialog box (...f occurrence in same cluster; Parameters for each K-Means/K-Medians run: Number of clusters, Maximum number of iterations; Hierarchical Clustering: Construct Hierarchical Trees).
The number of consensus clusters generated may be larger than the input number of clusters per run. This is because some genes may cluster together frequently, yet form a subset of different clusters in different runs. Hence a set of genes that appeared as a single cluster in any given run may be split into two or more consensus clusters over several runs. Some genes may remain unassigned because they did not cluster with any other genes in enough runs to exceed the threshold percentage.
11.8 CAST: Cluster Affinity Search Technique (Ben-Dor et al. 1999)
The user is prompted for a threshold affinity value between 0 and 1, which may be thought of as the reciprocal of the distance metric between two genes, scaled between 0 and 1, that has to be exceeded by all genes within a cluster. The algorithm works by both adding genes to and removing genes from a cluster, each time adjusting the affinities of the genes to the current cluster, and continues this process until no further changes can be made to the current cluster.
Figure: CAST (Cluster Affinity Search Technique) initialization dialog (Sample Selection: Cluster Genes or Cluster Samples).
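The add/remove loop of CAST can be sketched as follows. This is our reading of Ben-Dor et al. (1999), not MeV's implementation. It assumes a precomputed gene-to-gene similarity matrix `sim` with values in [0, 1] and 1.0 on the diagonal. A gene's affinity is the sum of its similarities to the current cluster members, including itself for members. An unassigned gene is added when its affinity is at least `t` times the cluster size. A member is removed when its affinity drops below that value. The name `cast_clusters` and the `max_steps` guard are illustrative.

```python
def cast_clusters(sim, t, max_steps=10_000):
    """Partition elements 0..n-1 into CAST clusters using affinity threshold t."""
    unassigned = set(range(len(sim)))
    clusters = []
    while unassigned:
        cluster = set()
        # Affinity of every still-unassigned element to the open cluster.
        affinity = {i: 0.0 for i in unassigned}
        for _ in range(max_steps):
            outside = [u for u in sorted(unassigned) if u not in cluster]
            # ADD step: the outside element with the highest affinity.
            if outside:
                u = max(outside, key=lambda i: affinity[i])
                if not cluster or affinity[u] >= t * len(cluster):
                    cluster.add(u)
                    for i in affinity:
                        affinity[i] += sim[i][u]
                    continue
            # REMOVE step: the member with the lowest affinity.
            if cluster:
                v = min(cluster, key=lambda i: affinity[i])
                if affinity[v] < t * len(cluster):
                    cluster.remove(v)
                    for i in affinity:
                        affinity[i] -= sim[i][v]
                    continue
            break  # no add or remove is possible: the cluster is stable
        clusters.append(sorted(cluster))
        unassigned -= cluster
    return clusters

sim = [  # hypothetical similarities for four genes
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.85, 0.0],
    [0.8, 0.85, 1.0, 0.1],
    [0.1, 0.0, 0.1, 1.0],
]
print(cast_clusters(sim, 0.5))
```

Once a cluster is closed, its genes are removed from consideration and a new cluster is opened from the remaining genes, so every gene ends up in exactly one cluster.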
180. f the loaded experiments. Once the New Script menu option has been selected, an initial dialog form will appear, allowing one to enter a script name and description.
Figure: New Script Attribute Dialog (Script Attributes: Creation Date, Script Name, Script Description).
The script name, description, and creation date will be stored in the script as comments. Once the initialization dialog is dismissed, the script manager node will be populated with a script table and two viewers associated with the new script. The viewer that opens automatically is called the Script Tree Viewer. This viewer is a graphical representation of the script, and it is from this viewer that the user constructs the script. The other script viewer is the Script XML Viewer, which displays the actual text of the script during script creation.
The Script Tree Viewer: Script Construction
The Script Tree Viewer is the main viewer used to construct the script. Its graphical nature lets the user focus on script creation without undue concern for complex script syntax. The Script Tree Viewer represents the script as a set of connected nodes. Each node is either a data node or an algorithm node. Data n
181. for information on setting the limits when using the gradient color mode.
Three Bin Mode: In the three-bin color mode, colors are assigned from one of three expression bins. The user sets a high and a low limit on expression. Values exceeding these limits are given the color that corresponds to the appropriate bin, and the arrow is left uncolored if the expression value falls within the set cutoffs.
Five Bin Mode: The five-bin mode provides four cutoff levels. The LEM still colors the arrows with discrete colors, but the two extra bins allow for intermediate high and intermediate low expression bins.
Setting Limits and Colors for the Bin Modes: The cutoff values and color selection options for the discrete color bin modes are contained in a dialog that can be launched via a right-click menu option labeled Bin Colors and Limits. The dialog has colored buttons that can be clicked to display a palette for color selection. Five cutoffs for the bins can be altered via text fields. In the three-bin mode, only the outermost limits and colors are used. The values of the bins should increase from left to right. This convention is enforced to maintain valid limits when switching between the 3-bin and 5-bin modes. Note that the Preview button will apply the current settings to the LEM. To revert to the original settings, use Reset and then Preview or Apply.
Figure: LEM Bin Range and Color Selection dialog (Color and Range Settings).
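The bin rules can be sketched as follows. The sketch assumes that a value exactly equal to a cutoff stays in the inner bin, since the text says values must exceed the limits. It also reads the five-bin mode as low, intermediate low, an uncolored middle, intermediate high, and high. `None` stands for "left uncolored". The function names and bin labels are hypothetical.

```python
def three_bin(value, low, high):
    """Return 'low', 'high', or None (uncolored) for the three-bin color mode."""
    if low >= high:
        raise ValueError("limits must increase from left to right")
    if value > high:
        return "high"
    if value < low:
        return "low"
    return None

def five_bin(value, cutoffs):
    """Assign a bin using four increasing cutoffs (c1 < c2 < c3 < c4)."""
    c1, c2, c3, c4 = cutoffs
    if not c1 < c2 < c3 < c4:
        raise ValueError("cutoffs must increase from left to right")
    if value < c1:
        return "low"
    if value < c2:
        return "intermediate low"
    if value > c4:
        return "high"
    if value > c3:
        return "intermediate high"
    return None  # between c2 and c3: left uncolored

print([five_bin(v, (-2, -1, 1, 2)) for v in (-3, -1.5, 0, 1.5, 2.5)])
```

Rejecting cutoffs that do not increase mirrors the left-to-right convention the dialog enforces.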
182. for the current MeV session. Different types of adjustments may be applied on top of one another in any sequence, and the same type of adjustment may be applied repeatedly to the matrix, although this may not make sense from the point of view of analysis. Because of these features, it might not be a good idea to apply data transformations halfway through an analysis (except perhaps for the Set Lower Cutoffs, Set Percentage Cutoffs, and Adjust Intensities of Zero options), because the post-transformation analyses and displays might not be entirely consistent with the pre-transformation analyses. A good approach is to apply any required adjustments to the data set first. Then save the entire adjusted matrix as a Tab Delimited Multiple Sample (TDMS) formatted text file using the Save Matrix option under the File menu. Finally, load this new file in a new MeV session during which no further data adjustments will be made. This ensures consistency throughout the MeV session. Adjustment options are described below.
Replicate Analysis
RAMA (Robust Analysis of MicroArrays): Robust estimation of cDNA microarray intensities with replicates. The package uses a Bayesian hierarchical model for the robust estimation. Outliers are modeled explicitly using a t-distribution, and the model also addresses classical issues such as design effects, normalization, transformation, and nonconstant variance.
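RAMA's full Bayesian hierarchical model is beyond a short example, but a sketch can show why modeling outliers with a t-distribution makes estimation robust. The snippet below is not RAMA. It is the standard EM (iteratively reweighted) estimate of the location of a Student t-distribution with fixed degrees of freedom `nu` (assumed to be 4). It is applied to a hypothetical set of replicate log intensities. Observations far from the current estimate receive small weights, so one outlying replicate barely moves the result, whereas it pulls the ordinary mean.

```python
import statistics

def t_location(values, nu=4.0, iterations=200):
    """EM estimate of the location of a Student t-distribution with nu degrees of freedom."""
    mu = statistics.median(values)
    scale2 = statistics.pvariance(values) or 1.0
    for _ in range(iterations):
        # E-step: expected precision weight of each observation given mu and the scale.
        weights = [(nu + 1.0) / (nu + (x - mu) ** 2 / scale2) for x in values]
        # M-step: weighted mean, then weighted scale (floored to avoid division by zero).
        mu = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        scale2 = max(sum(w * (x - mu) ** 2 for w, x in zip(weights, values)) / len(values), 1e-12)
    return mu

replicates = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]   # hypothetical; the last value is an outlier
print(statistics.mean(replicates), t_location(replicates))
```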
183. format text files can be loaded by selecting the Agilent file loader option from the list of available file formats. Use the file system navigation tree on the left to move to the directories containing the files to load. Files appearing in the Available file list can be added to the Selected file list using the Add or Add All buttons. The upper file selection area is for the selection of Agilent Oligo Feature Extraction text files. The lower file selection area is for the text version of the pattern file that corresponds to your slide.
Figure: Expression File Loader for Agilent files (upper area: Agilent Oligo Feature Extraction Files (*.txt); lower area: Agilent Pattern Files (*.txt)).
184. ft and then clicking the Select highlighted gene button; (2) doing the same thing with one of the cluster means, assuming that clusters have already been set by some other method; (3) entering values between 0 and 1 in the text input fields above the slider bars corresponding to each experiment; or (4) adjusting the slider bars to the desired values. Matches can be made by considering either the signed or the unsigned values of the correlation coefficient, using the checkbox labeled Match to Absolute R. The threshold criterion for matching can be either the magnitude of the correlation coefficient or the significance (p-value) of the correlation coefficient.
Figure 11.13.1: Pavlidis Template Matching (PTM) initialization dialog (template selection tabs; template sliders for each experiment; Threshold Parameters: Use Absolute R, Use Threshold R, Use Threshold p Value; Construct Hierarchical Trees for matched genes only).
Parameters
Template Selection Tabs: The five tabbed panels at the top of the dialog display candidate templates from various sources. The Gene Template tab provides a list of genes in the data set
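Template matching by Pearson correlation can be sketched as follows. The sketch assumes each gene profile and the template are equal-length lists of expression values. The names `pearson` and `match_template` and the example profiles are hypothetical. Only the R-threshold criterion is shown. The p-value criterion would instead test r, commonly via the statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, which is not implemented here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a flat profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def match_template(genes, template, r_threshold, use_absolute_r=False):
    """Return names of genes whose correlation with the template meets the threshold."""
    matched = []
    for name, profile in genes.items():
        r = pearson(profile, template)
        score = abs(r) if use_absolute_r else r
        if score >= r_threshold:
            matched.append(name)
    return matched

genes = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4]}
template = [0.1, 0.2, 0.3, 0.4]
print(match_template(genes, template, 0.9))                       # signed R
print(match_template(genes, template, 0.9, use_absolute_r=True))  # unsigned R
```

Matching on absolute R also picks up genes that are anti-correlated with the template, such as "b" here.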
185. gorithm Panel
The cluster selection algorithms are specific to scripting in MeV. Automatic cluster selection allows the user to provide criteria for evaluating cluster results when the clusters have no intrinsic identity (unlike, for example, a cluster of significant genes). One scenario is the result of K-Means Clustering (KMC) with K = 10. In this case, 10 clusters will be produced, and the cluster selection algorithms could be used to extract clusters based on supplied criteria.
More on Cluster Selection Algorithms
Two main options are available for cluster selection. Diversity Ranking Cluster Selection computes the diversity of each input cluster and then ranks the clusters from least variable to most variable. The selected clusters satisfy a minimum population size while being as little variable as possible. In Diversity Ranking Cluster Selection, cluster diversity can be determined in two ways:
1. Centroid-Based Diversity, the mean gene-to-centroid distance:
   $D_{c} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{dist}(g_i, c)$
   where $c$ is the cluster centroid and $g_i$ is the $i$th of the $n$ expression vectors in the cluster.
2. Intra-gene-Based Diversity, the mean of all gene-to-gene distances in the cluster:
   $D_{g} = \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \mathrm{dist}(g_i, g_j)$
Figure: Diversity Ranking Cluster Selection dialog (Selection Parameters: Desired Number of Clusters, Minimum Cluster Size (genes); Rank Clusters on Centroid-Based Diversity or on Intra-gene-Based Diversity).
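Diversity Ranking Cluster Selection can be sketched as follows. The sketch assumes Euclidean distance for dist, whereas MeV may use the currently selected distance metric. It also assumes clusters are supplied as a dict mapping a cluster name to a list of expression vectors. The function names are hypothetical.

```python
import math
from itertools import combinations

def centroid_diversity(vectors):
    """Mean distance from each gene vector to the cluster centroid (D_c)."""
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    return sum(math.dist(g, centroid) for g in vectors) / len(vectors)

def intra_gene_diversity(vectors):
    """Mean distance over all distinct gene pairs in the cluster (D_g)."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def select_clusters(clusters, desired, min_size, diversity=centroid_diversity):
    """Keep clusters with at least min_size genes, rank by diversity, return the least variable."""
    eligible = [(name, diversity(vectors)) for name, vectors in clusters.items()
                if len(vectors) >= min_size]
    eligible.sort(key=lambda item: item[1])
    return [name for name, _ in eligible[:desired]]

clusters = {
    "tight": [[0, 0], [0, 1], [1, 0]],
    "loose": [[0, 0], [5, 5], [10, 0]],
    "tiny": [[0, 0]],
}
print(select_clusters(clusters, desired=1, min_size=2))
```

Dividing by the number of distinct pairs, n(n − 1)/2, is what makes `intra_gene_diversity` match the 2/(n(n − 1)) factor in D_g.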
186. he cluster of interest. These options can be applied to ensure that a minimum number of genes in the cluster fall under a particular annotation class. This feature should be used with caution so that biological themes represented by very few genes are not excluded.
Figure: EASE Annotation Analysis dialog, Statistical Parameters page (Reported Statistic: Fisher Exact Probability or EASE Score; Multiplicity Corrections: Bonferroni Correction, Sidak Method, Bonferroni Step Down Correction, Resampling Probability Analysis with Number of Permutations; Trim Parameters: Select Minimum Hit Number, Select Minimum Hit Percentage).
Results of EASE Analysis
The primary result is reported in a table in which entries are ordered based on the reported statistic. The table can be sorted on any column. Right-clicking in the table launches a menu offering several operations:
Store Selection as Cluster: Stores the genes associated with a biological theme as a cluster in the cluster manager.
Open Viewer: Opens one of
187. he clone.
Figure: CGH Position Graph view of chromosome 2.
Probes in this display are colored in the same way as described for the Circle Viewer. Clone value color schemes and ratio scales can be adjusted as described elsewhere in this manual.
Changing Experiment Order
The order in which experiments appear in the display can be changed by using the Display Order item in the Display menu (Figure 2.3). Samples can be moved up and down using the buttons at the bottom of this dialog. Selecting OK displays the experiments in the new order.
Figure 2.3: Display Order dialog.
188. he loaded data files.
Figure 4.4.1: The Expression File Loader, Affymetrix files (Affymetrix Data Options: Absolute, Mean Intensity, Median Intensity, Reference; reference file selector).
4.5 Automatically Setting Color Scale Limits
When data are loaded, MeV automatically generates expression images. Expression images convey expression levels by converting the numeric expression value (log2(A/B) or an absolute expression value) into a color taken from a color gradient. Affymetrix data and two-color data have different data ranges, so MeV automatically sets the color scale limits according to the type of data set, using the following rules. For Affymetrix data, the low end is set to 0, the midpoint is set to the median, and the high end is set to the value below which 80% of the data fall. For two-color array data, the low end is set to -3, the midpoint to 0, and the high end to 3. Users can change color scale limi
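The default-limit rules above can be sketched as follows. The sketch assumes a nearest-rank 80th percentile: the smallest data value with at least 80% of the data at or below it. MeV's exact percentile method is not stated here. The function names are hypothetical.

```python
import statistics

def affymetrix_limits(values):
    """(low, midpoint, high) color-scale limits for absolute Affymetrix intensities."""
    ordered = sorted(values)
    # Nearest-rank 80th percentile, with integer arithmetic: rank = ceil(0.8 * n).
    rank = (4 * len(ordered) + 4) // 5
    return 0.0, statistics.median(ordered), ordered[rank - 1]

def two_color_limits():
    """(low, midpoint, high) color-scale limits for log2(A/B) ratios."""
    return -3.0, 0.0, 3.0

print(affymetrix_limits(range(1, 11)))
print(two_color_limits())
```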
189. iExperiment Viewer format (.mev), the TIGR ArrayViewer format (.tav), the TDMS (Tab Delimited Multiple Sample) format, the Affymetrix file format, and the GenePix file format (.gpr). See section 15 for details on the different file formats. In addition to being formatted correctly, the input data should already be normalized, because normalized input yields more statistically valid output. MIDAS, a member of the TM4 software suite, is one program that can perform this normalization. The maximum number of samples that can be loaded into a Multiple Array Viewer at one time depends on the available RAM in the computer running MeV and on the number of expression values in the samples.
4.1 Loading MeV (.mev) Format Files
Select Load File from the File menu to launch the file loading dialog. At the top of this dialog, use the drop-down menu to select the type of expression files to load. On the left-hand side of the dialog is a file browser; use it to locate the files to be loaded. The default file type to load is the .mev file. This file type is an update of the older .tav file format; details about this format can be found in Appendix 15.4. In the section of the loader labeled MeV Expression Files (.mev), the contents of the folder selected in the file browser will be displayed in the box labeled Available. Select the .mev files to load and click the Add button to add them to the list
190. ialog contains two sub-panels instead of one for group assignments.
Figure: Two-factor ANOVA initialization dialog (Group Assignments for Factor A and Factor B; p-value parameters; alpha corrections).
191. Figure: BRIDGE results, Expression Graph of Significant Genes (Significant Genes and Non-significant Genes clusters in the navigation tree).
11.29 USC: Uncorrelated Shrunken Centroids (Yeung et al. 2003)
Predicting the diagnostic category of a tissue sample from its expression profile, and selecting relevant genes for class prediction, have important applications in cancer research. We developed the uncorrelated shrunken centroid (USC) algorithm, an integrated classification and feature selection algorithm applicable to microarray data with any number of classes. The USC algorithm is motivated by the shrunken centroid (SC) algorithm (Tibshirani et al. 2002), with the following key modification: USC exploits the interdependence of genes by removing highly correlated genes. We showed that removing highly correlated genes typically improves classification accuracy and results in a smaller set of genes. As with most classification and feature selection algorithms, the USC algorithm proceeds in two phases, the training phase and the test phase. A training set is a microarray dataset consisting of samples for which the classes are known. A test set is a microarray dataset consisting of samples for which the classes are assumed to be unknown to the algorithm, and the goal is to predict which classes these samples belong to. The first step in cla
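USC's key modification, removing highly correlated genes, can be sketched as a greedy filter. This is a simplified illustration, not the published USC procedure. It assumes genes are visited in decreasing order of a precomputed relevance score, for example the magnitude of a gene's shrunken-centroid statistic. A gene is kept only if its absolute Pearson correlation with every gene already kept is at most `rho`. Names and data are hypothetical.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def remove_correlated(profiles, relevance, rho):
    """Greedily keep genes (most relevant first) whose |r| with every kept gene is <= rho."""
    kept = []
    for gene in sorted(profiles, key=lambda g: relevance[g], reverse=True):
        if all(abs(pearson(profiles[gene], profiles[other])) <= rho for other in kept):
            kept.append(gene)
    return kept

profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}   # across training samples
relevance = {"g1": 3.0, "g2": 2.0, "g3": 1.0}
print(remove_correlated(profiles, relevance, rho=0.9))
```

Here g2 is dropped because it is perfectly correlated with the more relevant g1, which illustrates how the filter yields a smaller gene set.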
192. ick the Load button to finish.
Figure 4.3.1: Loading Tab Delimited Multiple Sample (TDMS) format files.
4.4 Loading Affymetrix Data (.txt, .TXT) Files
Selecting Affymetrix Data Files (.txt) from the drop-down menu allows Affymetrix files to be loaded. Select the Affymetrix files to be loaded using the file browser, as you would when loading .mev files. These files contain a single intensity value per spot instead of the usual two that MeV requires. The values loaded from these files are used as the Cy5 value, that is, the numerator in the calculation of the ratio of intensities. Therefore, several options are available for simulating a second intensity value (the denominator); select one of the radio button options to choose a method. If Absolute is selected, the denominator is given a value of 1 for all ratio calculations. If Mean Intensity is selected, the average of all intensity values for that gene across all loaded Affymetrix files is used as the denominator for that spot. Similarly, if Median Intensity is selected, the median of all intensity values for that spot is used. If Reference is selected, a reference Affymetrix file chosen in the file selector at the bottom of the dialog is used: the intensity value of each record in that file is used as the denominator of the ratio calculation for the corresponding spot in each of t
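The four denominator options can be sketched as follows. The sketch assumes intensities are held as one list of per-spot values per loaded file, with all files listing spots in the same order. It also assumes the reference file is supplied as a list in that same order. The function and mode names are hypothetical. Each ratio is the loaded (Cy5) value divided by the simulated denominator.

```python
import statistics

def affymetrix_ratios(samples, mode="absolute", reference=None):
    """Return per-sample ratio lists using a simulated denominator for each spot."""
    n_spots = len(samples[0])
    per_spot = [[sample[i] for sample in samples] for i in range(n_spots)]
    if mode == "absolute":
        denominators = [1.0] * n_spots
    elif mode == "mean":
        denominators = [statistics.mean(values) for values in per_spot]
    elif mode == "median":
        denominators = [statistics.median(values) for values in per_spot]
    elif mode == "reference":
        if reference is None or len(reference) != n_spots:
            raise ValueError("reference must give one intensity per spot")
        denominators = list(reference)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [[value / d for value, d in zip(sample, denominators)] for sample in samples]

samples = [[1, 10], [2, 20], [9, 30]]   # three hypothetical files, two spots each
print(affymetrix_ratios(samples, "median"))
```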
193. ifferential response to exposure to a perturbation between groups of test subjects. A valuable feature of SAM is that it estimates the False Discovery Rate (FDR), which is the proportion of genes likely to have been identified as significant by chance. Furthermore, SAM is a highly interactive algorithm. It allows users to visually inspect the distribution of the test statistic and then set significance thresholds through the tuning parameter delta. The ability to alter the input parameters dynamically, based on immediate visual feedback and even before the analysis is complete, should make the data mining process more sensitive. Currently, SAM is implemented for the following designs:
1. Two-class unpaired: samples fall in one of two groups, and the subjects are different between the two groups (analogous to a between-subjects t-test). The initialization dialog box is similar to the t-test dialog (Fig. 11.15.1). The user inputs the group memberships of the samples in the top panel. In the two-class design, genes are considered positive significant if their mean expression in group B is significantly higher than in group A. They are considered negative significant if the mean of group A significantly exceeds that of group B.
2. Two-class paired: samples are not only assigned to two groups, but there is also a one-to-one pairing between a member of group A and a correspo
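The two-class unpaired statistic and a permutation-based FDR estimate can be sketched as follows. The relative difference d = (mean_B − mean_A)/(s + s0) follows Tusher et al. (2001), so a positive d corresponds to "positive significant" above. Here s0 is the small positive constant added to the denominator. SAM chooses s0 from the data, but this sketch fixes an assumed value. The FDR estimate is also simplified: it uses a single cutoff on |d| instead of SAM's delta rule on the ordered statistics, and the mean rather than the median count of permuted false positives. Function names and data are hypothetical.

```python
import math
import random

def sam_d(group_a, group_b, s0=0.1):
    """Relative difference d for one gene; positive when group B is higher."""
    n1, n2 = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n1, sum(group_b) / n2
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((y - mean_b) ** 2 for y in group_b)
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (mean_b - mean_a) / (s + s0)

def d_scores(data, labels, s0):
    scores = []
    for row in data:
        a = [v for v, lab in zip(row, labels) if lab == "A"]
        b = [v for v, lab in zip(row, labels) if lab == "B"]
        scores.append(sam_d(a, b, s0))
    return scores

def permutation_fdr(data, labels, cutoff, s0=0.1, n_perm=200, seed=0):
    """Estimated FDR among genes with |d| >= cutoff (simplified; see lead-in)."""
    called = sum(abs(d) >= cutoff for d in d_scores(data, labels, s0))
    if called == 0:
        return 0.0
    rng = random.Random(seed)
    false_calls = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break the link between samples and groups
        false_calls += sum(abs(d) >= cutoff for d in d_scores(data, shuffled, s0))
    return min(1.0, (false_calls / n_perm) / called)

data = [[1, 2, 3, 4, 5, 6], [3, 1, 2, 2, 3, 1]]   # two hypothetical genes
labels = ["A", "A", "A", "B", "B", "B"]
print(d_scores(data, labels, 0.1), permutation_fdr(data, labels, cutoff=2.0))
```

Shuffling the group labels keeps the group sizes fixed, so the permuted scores show how many genes would pass the cutoff when no real group difference exists.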
194. iment name, which represents a labeled slide, and analysis ID, which represents a particular image analysis session, a unique dataset is specified.
Figure 14.1.1: Single Array Viewer.
Once a slide has been loaded, a representation of the slide is displayed in the window. Each colored rectangular bar (an element) corresponds to a spot on the array and occupies the same position as that spot on the slide. The display can be changed using the same menus as in the Multiple Array Viewer. One display unique to the Single Array Viewer is the false color display. It shows the two channels in separate areas, where elements are colored on a scale in which low intensities are dark and blue while high intensities are bright and red. Clicking on a spot displays a dialog with detailed information about that spot, including the row and column of the spot, its intensity values, and the extra fields specified in the preference file. Other elements of this dialog include a compressed version of the actual spot image, where available, and a graph showing the expression levels of
195. iments. Bioinformatics 18: 546–554.
Pavlidis, P., and W. S. Noble. 2001. Analysis of strain and regional variation in gene expression in mouse brain. Genome Biology 2: research0042.1–0042.15.
Raychaudhuri, S., J. M. Stuart, and R. B. Altman. 2000. Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symposium on Biocomputing 2000, Honolulu, Hawaii, 452–463. Available at http://smi-web.stanford.edu/pubs/SMI_Abstracts/SMI-1999-0804.html
Soukas, A., P. Cohen, N. D. Socci, and J. M. Friedman. 2000. Leptin-specific patterns of gene expression in white adipose tissue. Genes and Development 14: 963–980.
Tamayo, P., D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. S. Lander, and T. R. Golub. 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences USA 96: 2907–2912.
Theilhaber, J., T. Connolly, S. Roman-Roman, S. Bushnell, A. Jackson, K. Call, T. Garcia, and R. Baron. 2002. Finding genes in the C2C12 osteogenic pathway by k-nearest-neighbor classification of expression data. Genome Research 12: 165–176.
Tusher, V. G., R. Tibshirani, and G. Chu. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA 98: 5116–5121.
Welch, B. L. 1947. The generalization of Student's problem whe
196. in later sections. Most of these features correspond to menu options, except for Locus Information Panels, which are launched with a left mouse click on a locus arrow.
LEM Navigation: provides multiple options to systematically navigate over the map.
Locus Information Panels: provide detailed locus expression and annotation information, a link to web resources, and an option to navigate at the locus level.
Customize Viewer: sets locus arrow scaling options and viewer layout options.
Color Scale Options: sets the mode and constraints for coloring locus arrows to reflect expression.
Locus Selection Manager: supports the creation and visualization of lists of selected loci, provides options for list output, and offers methods for targeted loci selection.
Store Cluster: stores spots related to selected loci in MeV's cluster manager repository.
Save Selected Loci (Locus Detail): saves locus-level information to a file.
Save Selected Loci (Spot Detail): saves spot information for selected loci.
Save All Loci: saves expression and annotation information for all loci to a file.
LEM Navigation
The LEM navigation controller is launched via the right-click menu option titled LEM Navigation. The Navigation Control provides several options for moving within the LEM. The upper section of the control is a reduced representation of the LEM in which each locus is one pixel high. A blue rectangle indicates the area that is currently visible in MeV's
197. ing only the centroid graph. Error bars represent the standard deviation of expression within the cluster.
Figure 7.4.1: Centroid Graph (KMC gene clusters 1 to 10 shown in the navigation tree).
7.5 Table Views
These views show element annotation and, where appropriate, auxiliary information such as element-specific statistics for the elements in a cluster.
Figure: Table view of a cluster (example columns: Group A mean and std, Group B mean and std, absolute t value, degrees of freedom).
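The values a centroid graph plots can be sketched as follows. The sketch assumes a cluster is given as a list of gene expression vectors, one value per experiment. For each experiment it computes the mean over the cluster's genes and, for the error bar, the standard deviation. The text does not say whether MeV uses the sample (n − 1) or population (n) standard deviation, so the sketch assumes the sample version. The function name is hypothetical.

```python
import statistics

def centroid_with_error_bars(cluster):
    """Per-experiment mean and sample standard deviation over the genes in a cluster."""
    columns = list(zip(*cluster))  # one tuple of gene values per experiment
    means = [statistics.mean(col) for col in columns]
    errors = [statistics.stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, errors

means, errors = centroid_with_error_bars([[1.0, 2.0], [3.0, 4.0]])
print(means, errors)
```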
198. Figure 11.29.1: Overview of USC. In the training phase (Train & Classify), labels are assigned to the training set and the optimal parameters Delta and Rho are chosen. In the test phase (Classify from File), the classifier is applied to a test set with unknown labels.
Initial Dialog Box
The initial dialog box lets you choose between two modes of operation: Train & Classify or Classify from File. The Train & Classify option should be used for the training phase, or if both the training and test sets are uploaded as one microarray dataset. The Classify from File option corresponds to the test phase of the algorithm and assumes that a classifier has been built previously. When training and classifying, the user must enter all the unique class labels of the known training experiments. By default there is space for two class labels; if more are needed, use the spinner for the number of classes. Entering class labels is disabled if you are classifying from file. You may also adjust the default parameters at this point. By default the parameters are disabled; clicking the Advanced checkbox enables adjustment of the parameters.
Advanced Parameters
Folds: the number of times to divide the training set into pseudo-training and pseudo-test sets during a cross-validation run. For example, if there are 10 total training experiments to be cross-validated and
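A Folds setting can be sketched as partitioning the training experiments into pseudo-training and pseudo-test sets. The sketch assumes the usual k-fold scheme, in which each experiment lands in exactly one pseudo-test set, and shuffles experiments with a fixed seed. How MeV itself assigns experiments to folds, and whether it repeats the split, is not specified in this excerpt. The function name is hypothetical.

```python
import random

def k_fold_splits(n_experiments, folds, seed=0):
    """Yield (pseudo_training, pseudo_test) index lists for k-fold cross-validation."""
    if not 2 <= folds <= n_experiments:
        raise ValueError("folds must be between 2 and the number of experiments")
    indices = list(range(n_experiments))
    random.Random(seed).shuffle(indices)
    for k in range(folds):
        test = sorted(indices[k::folds])  # every folds-th shuffled index
        train = sorted(set(indices) - set(test))
        yield train, test

for train, test in k_fold_splits(10, 5):
    print(len(train), len(test))   # each split: 8 pseudo-training, 2 pseudo-test
```

Each candidate pair of Delta and Rho values would then be scored by training on every pseudo-training set and classifying the matching pseudo-test set.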
199. iology 6: 281–297.
Brown, M. P., W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares Jr., and D. Haussler. 2000. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences USA 97: 262–267.
Butte, A. J., P. Tamayo, D. Slonim, T. R. Golub, and I. S. Kohane. 2000. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proceedings of the National Academy of Sciences USA 97: 12182–12186.
Chu, G., B. Narasimhan, R. Tibshirani, and V. Tusher. 2002. SAM: Significance Analysis of Microarrays, Users Guide and Technical Document. http://www-stat.stanford.edu/~tibs/SAM
Culhane, A. C., et al. 2002. Between-group analysis of microarray data. Bioinformatics 18: 1600–1608.
Dopazo, J., and J. M. Carazo. 1997. Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J. Mol. Evol. 44: 226–233.
Dudoit, S., J. P. Shaffer, and J. C. Boldrick. 2003. Multiple hypothesis testing in microarray experiments. Statistical Science 18: 71–103.
Dudoit, S., Y. H. Yang, M. J. Callow, and T. Speed. 2000. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical Report, 2000, Statistics Dept., Univ. of California, Berkeley.
Eisen, M. B., P. T. Spellman, P. O. Brown, and D. Botstein. 1998. Cluster analysis and display o
200. Figure 11.5.1: RN (Relevance Networks) initialization dialog (Sample Selection: Cluster Genes or Cluster Samples; Parameters: Use Permutation Test, Min Threshold 0.97, Max Threshold 1.0, Use Filter).
Parameters
Sample Selection: Indicates whether to cluster genes or samples.
Use Permutation Test: Indicates that the minimum threshold R value should be selected based on a distribution of element-to-element R values derived after permuting the expression vectors.
Min Threshold: This value, ranging from 0 to 1.0, is the smallest R allowed between two elements for them to be linked in a subnet.
Max Threshold: This value, ranging from 0 to 1.0, is the greatest R allowed between two elements for them to be linked in a subnet.
Use Filter: This option allows the user to filter out elements with little dynamic change, thus removing flat or uninteresting elements. A measure of entropy is used to rank the elements. The percentage value entered (1 to 100) indicates what percentage of the elements to retain for the construction of the network. A value of 25 will retain the 25% of elements having the greatest entropy.
Distance Metric: Pearson squared. Other acceptable metrics: none. Changing the metric in the Distance menu will not affect the calculations done by this module. The module calculates th
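Building relevance-network subnets with the Pearson-squared metric can be sketched as follows. Two elements are linked when their R² lies between Min Threshold and Max Threshold, and each subnet is a connected group of linked elements. The sketch omits the permutation test and the entropy filter. Treating subnets as connected components, and all names below, are assumptions for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))  # clamp rounding error

def relevance_subnets(profiles, min_threshold, max_threshold):
    """Link pairs whose Pearson R^2 lies in [min_threshold, max_threshold]; return linked groups."""
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    linked = set()
    for a, b in combinations(profiles, 2):
        r2 = pearson(profiles[a], profiles[b]) ** 2
        if min_threshold <= r2 <= max_threshold:
            parent[find(a)] = find(b)
            linked.update((a, b))
    groups = {}
    for name in linked:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

profiles = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2], "d": [8, 2, 6, 4]}
print(relevance_subnets(profiles, 0.97, 1.0))
```

Squaring R means strongly anti-correlated elements are linked as readily as positively correlated ones.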
201. ion Parameters Page
Statistical Parameters Page
Several sections on this page are used to specify the reported statistic, optional multiplicity corrections, and optional result-trimming parameters.
Reported Statistic
Fisher Exact Probability: The Fisher Exact Probability is the probability that a biological theme is over-represented in the cluster of interest relative to the representation of that theme in the total gene population. For example, suppose that one has a gene list of 50 genes from a population of 10,000 genes. Suppose further that 10 of the 50 genes are related to pathway A, while only 13 genes in the total population are associated with pathway A. This scenario would yield a low probability that the observed number of hits (occurrences of pathway A) within the small sample could be due to chance alone. This statistic is based on the hypergeometric distribution and has an advantage over chi-square in that it is appropriate for finite populations. The reference cited for EASE describes this statistic at length.
EASE Score: The EASE Score is essentially a jackknifed Fisher Exact Probability, obtained by calculating the Fisher Exact Probability after one occurrence (list hit) for a term has been removed.
Multiplicity Corrections
Bonferroni Correction: This correction simply multiplies the statistic by the number of results generated. It is the most stringent of the three options. Bonf
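These statistics can be sketched with exact integer arithmetic, assuming a one-sided (over-representation) hypergeometric tail. Let N genes be in the population, K of them annotated to the theme, n in the list, and x be list hits. The probability is then the chance of drawing x or more hits. For the EASE Score, the sketch assumes that removing one list hit lowers only the hit count, with list and population sizes unchanged; EASE's own bookkeeping may differ. Capping the Bonferroni-adjusted value at 1 is also an assumed convention. Function names are hypothetical.

```python
from math import comb

def fisher_exact_upper(population, population_hits, list_size, list_hits):
    """P(X >= list_hits) for X ~ Hypergeometric(population, population_hits, list_size)."""
    upper = min(population_hits, list_size)
    numerator = sum(
        comb(population_hits, k) * comb(population - population_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return numerator / comb(population, list_size)

def ease_score(population, population_hits, list_size, list_hits):
    """Jackknifed Fisher probability: one list hit removed before testing (assumed form)."""
    return fisher_exact_upper(population, population_hits, list_size, max(list_hits - 1, 0))

def bonferroni(p_values):
    """Multiply each value by the number of results, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Example from the text: 10 of 50 list genes vs. 13 of 10,000 population genes on pathway A.
print(fisher_exact_upper(10_000, 13, 50, 10), ease_score(10_000, 13, 50, 10))
```

Because the EASE Score always tests one fewer hit, it is larger than the Fisher probability. The penalty matters most for themes supported by only a few genes.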
202. ion about the category that is included in the annotation file. If the default mode, Cluster Analysis, is selected, the last column is the score the category receives, that is, the probability that the observed over-representation of the category arose by chance. If Annotation Survey mode is selected, the last two columns are the hit count and the cluster size, respectively. [Figure: TIGR MultiExperiment Viewer, TEASE hierarchical tree with color-coded visualization dots.]
203. ion files can be associated with each other if the unique IDs are identical. A single header row is required to precede the annotation data in order to identify the columns below. Each remaining row of the file stores annotation data for a particular spot (feature) on the array. Annotation files may contain any number of non-computational comment lines. These lines, which start with the comment character, will be treated identically to comment lines in .mev files and should precede the header row. Annotation files created at TIGR will use UIDs that match the format used in the .mev files (most likely database_name and spot_id). The structure of each annotation file is detailed below. The header row consists of headers that identify each column of data. Each subsequent row of the file stores data for a particular spot (feature) on the array. The annotation files created at TIGR will typically contain at least one comment at the top of the file with the following information: version (version number based on revisions of annotation data); format_version (the version of the .mev file format document); date (date of file creation or update); analyst (owner, or the person responsible for creating the file); created by (software tool used to create the document); gi version (version of the Gene Indices or database that produced this annotation data); slide_type (type from the slide_type table that this array is based on); output row count (number of rows of annotation, e.g., non-head
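A reader for the layout described above might look like the sketch below. It skips comment lines, uses the single header row to name the columns, and keys each row by its unique ID. The comment prefix "#", tab delimiting and a column headed "UID" are assumptions of this sketch and are not confirmed by this excerpt. read_annotation is a hypothetical name.

```python
import csv


def read_annotation(text, comment_prefix="#", uid_column="UID"):
    """Parse annotation text into (comment_lines, {uid: row_dict}).

    Assumptions: comment lines start with `comment_prefix`, fields are
    tab-delimited, and the identifier column is headed `uid_column`.
    """
    comments, data_lines = [], []
    for line in text.splitlines():
        if line.startswith(comment_prefix):
            comments.append(line[len(comment_prefix):].strip())
        elif line.strip():
            data_lines.append(line)
    rows = {}
    # The first non-comment line is the single header row naming the columns.
    for row in csv.DictReader(data_lines, delimiter="\t"):
        uid = row[uid_column]
        if uid in rows:
            raise ValueError(f"duplicate UID: {uid}")
        rows[uid] = row
    return comments, rows
```

Keying rows by UID mirrors how annotation is associated with .mev data through identical unique IDs.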
204. ion option indicates whether to cluster genes or samples. Means/Medians option: The Means or Medians option indicates whether each cluster's centroid vector should be calculated as a mean or a median of the member expression patterns. Number of Clusters: This positive integer value indicates the number of clusters to be created. Note that FOM can be used to estimate an appropriate value. Number of Iterations: This positive integer value is the maximum number of times that all the elements in the data set will be tested for cluster fit. On each iteration, each element is associated with the cluster with the closest mean or median. Note that the algorithm will terminate either when no elements require migration (reassignment to new clusters) or when the maximum number of iterations has been reached. Hierarchical Clustering: This check box selects whether to perform hierarchical clustering on the elements in each cluster created. Default Distance Metric: Euclidean. [Figure: KMC results in the Multiple Array Viewer navigation tree (Analysis, KMC genes: Expression Images, Centroid Graphs, Expression Graphs for Clusters 1-6 and All Clusters, Cluster Information).]
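The iteration and termination rule described above can be sketched as a plain k-means/k-medians loop. Elements are reassigned to the nearest centroid until none migrate or the iteration limit is reached. Seeding with the first k vectors is an assumption chosen to keep the example deterministic, and MeV's own initialization may differ. The function names are hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmc(vectors, k, max_iterations=50, use_medians=False):
    """K-means / k-medians sketch returning (assignment, centroids).

    Initial centroids are the first k vectors (an assumption).
    """
    centroids = [list(v) for v in vectors[:k]]
    assignment = [None] * len(vectors)
    center = statistics.median if use_medians else statistics.fmean
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # no element migrated: converged
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [center(col) for col in zip(*members)]
    return assignment, centroids
```

Switching use_medians changes only how each centroid is summarized, which corresponds to the Means/Medians option.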
205. ion to global mapping data, the number of mapped loci and the number of spots that correspond to each chromosome or plasmid are listed. 11.22 GDM: Gene Distance Matrix. Most of the clustering methods found in MeV form clusters by algorithms that group genes based on similarity of expression pattern. The distance (inverse of similarity) between two genes is calculated using a distance metric (see the Distance menu and manual section 13, the appendix on metrics). The GDM gives an intuitive and comprehensive view of the distance or similarity between any two genes loaded into MeV by creating a colored matrix representing all gene-to-gene distances. The GDM module is useful for taking a distance survey as well as for discovering which genes are similar in expression pattern to a particular gene of interest. Like most of the MeV modules, the GDM module can also be used with experiments as input. GDM Initialization: The GDM module can be started by using the GDM button or by selecting the GDM menu item from the Analysis menu. When creating a gene matrix, it is possible to display a subset of the full data set. The creation of an n × n matrix is expensive from a computer memory standpoint, and by using the Display Interval option it is possible to make a smaller matrix and conserve memory. [Figure 11.22.1: Gene Distance Matrix Initialization dialog (Genes/Samples; Display Interval 1).]
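The memory trade-off can be seen in a small sketch of a distance matrix built from every n-th gene. Interpreting Display Interval as "take every n-th element" is an assumption, and the names below are hypothetical. A full matrix holds n² entries, while an interval of k leaves roughly (n/k)².

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(vectors, display_interval=1, metric=euclidean):
    """Return (indices, matrix) for every `display_interval`-th vector.

    Treating Display Interval as a stride over the genes is an assumption.
    """
    indices = list(range(0, len(vectors), display_interval))
    matrix = [[metric(vectors[i], vectors[j]) for j in indices] for i in indices]
    return indices, matrix
```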
206. [Figure 7.3.1: Expression Graph of one cluster with gradient color selected.] [Figure 7.3.2: Expression Graphs of all clusters.] 7.4 Centroid Graphs: This viewer is very similar to the Expression Graph Viewer, except that line graphs for individual genes are omitted, leav
207. is considered the cluster-to-cluster distance. Average Linkage: The average distance of each member of one cluster to each member of the other cluster is used as a measure of cluster-to-cluster distance. Complete Linkage: The distances are measured between each member of one cluster and each member of the other cluster. The maximum of these distances is considered the cluster-to-cluster distance. Support Tree Legend: A legend to relate support tree output colors to support is displayed by selecting the Support Tree Legend menu item in the main Help menu. Default Distance Metric: Euclidean. 11.4 SOTA: Self-Organizing Tree Algorithm (Dopazo et al., 1997; Herrero et al., 2001). The initialization form (Figure 11.4.1) is divided into four main areas. The SOTA algorithm constructs a binary tree (dendrogram) in which the terminal nodes are the resulting clusters. [Figure 11.4.1: SOTA initialization form. Sample Selection: Cluster Genes/Cluster Samples. Growth Termination Criteria: Max Cycles 10; Max Cell Diversity 0.01; Max epochs/cycle 1000; Min Epoch Error Improvement 0.0001; Run Maximum Number of Cycles (unrestricted growth). Centroid Migration and Neighborhood Parameters: Winning Cell Migration Weight 0.01; Parent Cell Migration Weight 0.005; Sister Cell Migration Weight 0.001; Neighborhood Level 5. Cell Division Criteria: Use Cell Diversity (mean gene-centroid distance); Use Cell Variability.]
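The three linkage rules differ only in how the pairwise member distances are summarized. The sketch below uses Euclidean distance, the default metric named above. Single linkage is taken to be the minimum pairwise distance, its standard definition, because the start of its description is cut off in this excerpt. The function names are hypothetical.

```python
import math
from itertools import product


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(cluster_a, cluster_b, linkage="average", metric=euclidean):
    """Cluster-to-cluster distance under single, average or complete linkage."""
    pairwise = [metric(a, b) for a, b in product(cluster_a, cluster_b)]
    if linkage == "single":
        return min(pairwise)  # closest pair of members
    if linkage == "complete":
        return max(pairwise)  # farthest pair of members
    if linkage == "average":
        return sum(pairwise) / len(pairwise)  # mean over all member pairs
    raise ValueError(f"unknown linkage: {linkage}")
```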
208. [Figure 11.4.2: SOTA dendrogram (tool tip for Cluster 3: Population 51, Diversity 0.8773184).] The SOTA diversity viewer shows the change in the summation of gene-to-associated-centroid distances for all genes in the tree. This is a measure of overall tree diversity, and it can reveal how much diversity improvement is achieved with each cycle (new cluster addition). [Figure 11.4.3: SOTA diversity viewer (SOTA Tree Diversity History plotted against Cycle Number).] 11.5 RN: Relevance Networks (Butte et al., 2000). A relevance network is a group of genes whose expression profiles are highly predictive of one another. Each pair of genes related by a correlation coefficient larger than a minimum threshold and smaller than a maximum threshold (assigned in the initialization dialog box) is connected by a line. Groups of genes connected to one another are referred to as networks.
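The construction just described can be sketched in two steps. First, link every pair whose squared Pearson correlation (the module's fixed metric) falls between the two thresholds. Then collect each connected group as one network. Inclusive bounds, treating flat profiles as uncorrelated and leaving unlinked elements out of every network are assumptions of this sketch. The names are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # flat profile: treated as uncorrelated (assumption)
    return cov / math.sqrt(va * vb)


def relevance_networks(profiles, min_threshold, max_threshold=1.0):
    """Link pairs whose squared Pearson r lies within the thresholds and
    return each connected group (network) as a sorted list of ids."""
    ids = list(profiles)
    neighbors = {i: set() for i in ids}
    for a, b in combinations(ids, 2):
        r2 = pearson(profiles[a], profiles[b]) ** 2
        if min_threshold <= r2 <= max_threshold:  # inclusive bounds assumed
            neighbors[a].add(b)
            neighbors[b].add(a)
    networks, seen = [], set()
    for start in ids:
        if start in seen or not neighbors[start]:
            continue  # already placed, or an unlinked element
        component, stack = set(), [start]
        while stack:  # depth-first walk over links
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        networks.append(sorted(component))
    return networks
```

Because the metric is squared, strongly negative correlations are linked as well as positive ones. The network viewer distinguishes the two by link color.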
209. k number (see below for details). This parameter is mandatory. Headers: Sets the number of non-element row and column headers in the input file, separated by a colon. For example, 2:4 would indicate that the first two rows and first four columns of every input file are considered headers. Unique ID: Sets the number of the column that represents the unique identifiers for each element. This column should not contain duplicate values, and every element should have a value in this column. Spot Name: Sets the number of the column that represents the name of each element. These are usually descriptive, human-readable strings, and they do not need to be unique. Additional Fields: Indicates the names of additional data fields (after row, column, cy3, cy5) to be stored and displayed. The field names should be separated by a colon. If you have additional fields you want MeV to process, each line of your input from the flat file (or row returned from the database) must have a number of additional columns equal to the number of additional fields. Each field name must be unique. Algorithm Factory: The name of the Algorithm Factory implementation class. 17 Appendix: Distance Metrics. MeV provides eleven distance metrics from the Distance menu on the menu bar. While Euclidean Distance and Pearson Correlation are by far the most utilized metrics, this appendix summarizes all available metrics. Note that in the following equations
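The appendix's own equations are not part of this excerpt, so the sketch below uses the textbook definitions of the two most common metrics. This excerpt does not show how MeV turns a correlation into a distance or how it handles missing values, so neither is modelled here.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Covariance of a and b divided by the product of their spreads."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

Euclidean distance is sensitive to the absolute level of expression. Pearson correlation compares only the shape of two profiles, so [1, 2, 3] and [2, 4, 6] count as perfectly correlated.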
210. l p-value. Currently, p-values are computed only from the F distribution. [Figure 11.16.1: One-way ANOVA initialization dialog box (Number of groups: 3; Group 1/Group 2/Group 3/Not in groups selectors for each experiment, Ex1 to Ex10; Save settings, Load settings and Reset buttons; P-value parameters: alpha (critical p-value) 0.01; Hierarchical Clustering: Construct Hierarchical Trees). The dialog notes that each group MUST contain more than one experiment.] Parameters. Group Assignments: Group Selection Controls: This set of buttons permits each experiment to be placed into any group or into no group. If a sample is placed in no group, it will be ignored for the purposes of the analysis. Note that each group must have at least two members following the assignment. Save Grouping: The Save Grouping button allows you to save the grouping to a file. This is particularly useful
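For a single gene, the F statistic behind the reported p-value can be sketched as below. Samples placed in no group are simply not passed in, and the two-member minimum from the dialog is enforced. This is the textbook one-way ANOVA rather than MeV's code, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for one gene's grouped values.

    Samples assigned to no group are left out of `groups`.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    if any(len(g) < 2 for g in groups):
        raise ValueError("each group must contain at least two members")
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

If SciPy is available, the matching p-value could be computed as scipy.stats.f.sf(F, df_between, df_within) and compared with the alpha entered in the dialog.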
211. l resource. The <URL> entry is the URL for the resource; look in the file for examples. It is important to note that the URL has a section labeled FIELD1. This is a placeholder to indicate where the gene identifier should go when constructing the URL. The UniGene URL is the only exception, requiring two variable fields called FIELD1 and FIELD2, which are parsed out of the UniGene identifier. The UniGene Annotation Label is required in the annotation URLs (.txt) file to correctly parse UniGene IDs; therefore, please do not modify this line. 7.6 Common Viewer Activities: A right click within most result viewers will launch a menu that is specific to the currently displayed viewer. Some of the viewer-specific options, such as searching in Table Viewers, have been discussed; however, there are several common viewer options that are shared among the common viewer types. Store Cluster: The Store Cluster option will save the currently viewed cluster to the cluster manager. This feature will be described in detail in section 8. The main use of this feature is to assign a color to the elements in the cluster so that the location of the marked elements in future results can be assessed. Launch New Session: The Launch New Session option takes the elements in the current viewer and opens a new Multiple Array Viewer with just those elements represented. This is useful for quickly extracting elements of interest and further characterizing their expr
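The FIELD1/FIELD2 placeholder convention amounts to simple string substitution, as sketched below. Splitting a UniGene identifier such as "Hs.12345" at its first "." into FIELD1 and FIELD2 is an assumption about how the two fields are parsed. The example templates in the test are hypothetical and are not entries from MeV's file.

```python
def build_url(template, identifier, unigene=False):
    """Substitute an identifier into a URL template's placeholder(s).

    For ordinary resources FIELD1 is replaced by the whole identifier.
    For UniGene, splitting "Hs.12345" into FIELD1="Hs" and FIELD2="12345"
    at the first "." is an assumption of this sketch.
    """
    if not unigene:
        return template.replace("FIELD1", identifier)
    field1, sep, field2 = identifier.partition(".")
    if not sep:
        raise ValueError(f"not a UniGene-style identifier: {identifier}")
    return template.replace("FIELD1", field1).replace("FIELD2", field2)
```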
212. le system directs EASE-related file choosers to that area, but file selections may be made outside of the selected base file system if appropriate. Note that the selected directory should be the directory that contains the EASE Data directory; in MeV's default data directory, this would be the ease directory. Updating/Downloading the EASE File System: This feature downloads EASE annotation file systems for a selected species and array. A selection dialog allows species selection from a variety of plant and animal species. A list of many commercially available arrays for the selected species is also presented for selection. After species and array selection, a dialog is presented to select a directory as the destination for the EASE file system. Zip files are then downloaded and automatically extracted into the destination directory. The new base directory will be labeled with ease_ and the selected array name. This new file system can be selected as the default system. [Figure: EASE File Update Array Selection Dialog (Animal Arrays; Species: Human; Arrays for Human).] Analysis Mode Selection: The EASE implementation in MeV provides two major modes of operation: Cluster Analysis and Slide Annotation Survey. Cluster Analysis: This mode performs annotation analysis on a selected subset (sample list or cluster) of the full dat
213. [Figure 12.10: Script Parameter Value Editor.] [Figure 12.11: Script Algorithm Parameter Table (Valid Script Parameters for KMC). The table notes that parameters not listed as "Always required" usually depend on the value of other entered parameters to determine whether they are required.] Loading a Script: Saved scripts are loaded by selecting Load Script from the File menu of the Multiple Array Viewer. A file selection dialog will automatically launch to prompt for the selection of a script file. Scripts should be stored in the script directory of MeV's data directory to ensure proper validation. Several levels of script validation occur during script loading. The validation catches Fatal Errors (usually malformed XML), Validation Errors (a script that does not match the Document Type Definition, DTD), Parser Warnings, and Algorithm Parameter Errors (missing required parameters, parameter type mismatches, or out-of-bounds parameters). If multiple validation errors exist, all will be reported. All validation errors are reported in a Script Error Log dialog
214. ll be excluded, select GenePix Flags Filter and enter a percentage value. To enable this option, check the Enable GenePix Flags Filter checkbox just below the Set GenePix Flags Filter menu option; uncheck it to disable this option. A value of 0.0 indicates that all genes will be used in the analysis. To require that every one of a gene's expression values be valid for the gene to be included, set the value to 100. This option is disabled by default. [Figure 5.4.7: GenePix Flags Filter Dialog (Above Percentage: samples with negative flags / total samples for each gene, set to 0.0).] 5.5 Data Source Selection: An important aspect of data analysis or data mining is what might be called analysis branching. This involves the initial selection of a gene or sample set based on a preliminary screen or analysis technique, such as a statistical method. This subset of elements is then subjected to more detailed analysis to find constituent features, such as prevalent expression patterns. A right click on a result viewer node in the result navigation tree will present a menu option, when applicable, to set the contained data as the primary data set for subsequent analysis. The selected data source node in the navigation tree will be highlighted by a green rectangle, which indicates that this is the primary data source for downstream analysis. As an in
215. ll clusters. The XOR (exclusive OR) operation produces a cluster containing elements that are members of one cluster or another, but not members of more than one cluster. Options also exist to delete selected clusters or all clusters in the list, as well as to save a selected cluster to a specified file. Deleting clusters can be performed by selecting one or more clusters in the cluster table, or by selecting the delete public cluster option from the menu in the viewer that contains the cluster. You can also save cluster data to a tab-delimited text file (Save Cluster Data). Selecting this option from the right-click menu will cause a file chooser to appear. Select a file name and location to save the row/column data, log-ratio expression values and, optionally, the Cy3 and Cy5 values for each gene in the cluster. Selecting Save All Clusters will allow you to save the genes in all clusters in a similar way. This option is available from the cluster table as well as in the viewer. One additional option is to delete all gene clusters or all sample clusters. These global operations, which affect all colored clusters, can be selected from the Utilities menu in the Multiple Array Viewer (Delete All Gene Clusters or Delete All Sample Clusters) or from the cluster tables. The Import Gene/Experiment List option allows one to create a cluster based on supplied identifiers. Identifiers belonging to the cluster are pasted into the text area. The drop-down list indica
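The XOR rule above ("members of one cluster or another, but not members of more than one cluster") can be expressed with Python sets. With three or more clusters, chaining the ^ operator is not equivalent to this rule. Chained ^ keeps elements found in an odd number of clusters, whereas the rule keeps only elements found in exactly one. The sketch therefore counts memberships. xor_clusters is a hypothetical name.

```python
from collections import Counter


def xor_clusters(*clusters):
    """Elements that belong to exactly one of the given clusters."""
    counts = Counter(element for cluster in clusters for element in set(cluster))
    return {element for element, n in counts.items() if n == 1}
```

For example, with clusters {1, 2, 3}, {3, 4} and {3, 5}, element 3 appears three times. Chained ^ keeps it, but xor_clusters drops it.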
216. lly used for partitioning the data rather than the actual input expression vectors. Usually about 3 components are adequate to describe (cover) most of the variance in the data. This step in the algorithm is described as the dimensional reduction step. The Validation Algorithm Selection panel allows the user to select whether validation should be performed and which assessment algorithm (A0, A1 or A2) is to be used in validation. The algorithms are described in the cited DAM reference and briefly in the information help page that is launched from the Information button in the lower-left corner. When the Information button at the bottom-left corner of the DAM Initialization Dialog is pressed, an Information Dialog pops up. This dialog contains a brief description of the terminology in DAM and a references page describing the methods and parameters used in DAM. [Figure: DAM Parameter Information dialog (DAM: Discriminant Analysis Module Initialization Dialog Parameter Information).] General DAM Terminology: The primary function of the DAM is to serve as a method for multi-class classification. It incorporates the dimensional reduction method Multivariate Partial Least Squares (MPLS) and two classification analysis methods, Polychotomous Discrimination (PD) and Quadratic Discriminant Analysis (QDA). Either PD or QDA is performed after the starting data has been reduced by MPLS. Classification Selection: The classification
217. loaded 34 samples to be rearranged. Two menus are available within the editor frame for performing editor actions. The two menus contain the same menu options: one is in the main menu bar, and the other is available using a right mouse click with the mouse cursor over a table cell. Note that the import of additional sample annotation can be done using the Append Sample Annotation option from the Utilities menu. Sample Reordering: The order of the columns in the table can be altered by clicking on the column header and dragging the column to the desired location. This action alone does not force the loaded data to be reordered. The table order can be imposed on the loaded data if the Enable Sample Reordering check box menu item is selected. This ordering is imposed as soon as the dialog is dismissed. Note: If analyses have been run on the loaded data, the reordering option is disabled. This is done as a precaution against misrepresentation of prior results that may have relied on the original ordering. Typically, the reordering of samples should be done just after data loading, to place associated samples into an order that is reasonable based on study design. Adding Sample Labels: The Add New Sample Label menu item is used to create a new row in the table. Note that the first cell in the new row is light yellow and is used to indicate a Label Key. This key will be placed into the Sample Label menu in the Display menu so that when selected
218. loaded the file to the desktop, type cd /Users/yourusername/Desktop and hit return. The prompt should now read MyComputer Desktop username. To install the package, type R CMD INSTALL rama_1.3.0.tar.gz or R CMD INSTALL bridge_1.3.1.tar.gz. Updating under OS X (Drag & Drop): 1. Download the source package rama_1.3.0.tar.gz from http://www.bioconductor.org/packages/bioc/1.8/html/rama.html or bridge_1.3.1.tar.gz from http://www.bioconductor.org/packages/bioc/1.8/html/bridge.html. 2. Unpack the gzipped file. 3. Replace the old rama directory with your new one in /Library/Frameworks/R.framework/Versions/Current/Resources/library. Note: You may have to chmod your file permissions, depending on your installation. Updating under Windows: 1. Download the source package rama_1.3.0.zip from http://www.bioconductor.org/packages/bioc/1.8/html/rama.html or bridge_1.3.1.zip from http://www.bioconductor.org/packages/bioc/1.8/html/bridge.html. 2. Unzip the zipped file. 3. Replace the old rama or bridge directory with your new one. The location depends on where you installed R; by default it would be C:\Program Files\R\R-x.y.z\library. 5 Installing Rserve. Installing under OS X: 1. Download Rserve from http://stats.math.uni-augsburg.de/Rserve/down.shtml. You will want to get the current version (Rserve_0.3-17.tar.gz at time of writing). 2. Open a terminal window and navigate to the directory containing the downloaded Rserve file. Type R CMD I
219. lysis to file can be accomplished by selecting Save Analysis As in the Multiple Array Viewer's File menu. The analysis file will contain the loaded data, the current analysis results and any clusters that are stored in the cluster manager. If the analysis has been saved previously, the Save Analysis option will be enabled. If this option is taken, the analysis will be saved to the analysis file that was opened or to the last saved analysis file. To reopen an analysis, select the Open Analysis menu option in the File menu. This will restore the data and all algorithm results. One word of caution on analysis saving: this feature has been dramatically rewritten since MeV v3.1 to improve reliability and portability. Because of these changes, MeV 4.0 cannot open saved analysis files created by MeV 3.1 or MeV 4.0b. In order to make saving analyses seamless and efficient, we have had to sacrifice backwards compatibility with previous versions of the analysis saving features of MeV. Future versions of MeV after v4.0 will be able to open analysis (.anl) files created by MeV v4.0 and later. Note that for sharing analysis techniques and results with collaborators, one option is to use MeV's scripting ability to create an algorithm execution script. Others analyzing the same data can then share techniques and view interesting results produced by specific analysis routines. See Scripting for details. 10.2 Saving the Expression Matrix
220. main matrix figure, you may note that a gene in the column header is selected, as denoted by the rectangle around the gene identifier. Moving the mouse cursor over the header enacts item selection. If you click on the label in the header, that element is moved to the left (or to the top, if in the row header), and the remaining elements are ordered by proximity to the selected element. Typically, you should see the first row and column appear as a gradient, since the neighbors are ordered by proximity. The Sort menu options can be used to impose other orderings. The Toggle Sort on Proximity menu item turns this capability on or off. Save k Neighbors: The Save k Neighbors menu option is used in conjunction with proximity sort. Once the matrix is sorted by proximity, the Save k Neighbors option saves any number of the selected gene's (or sample's) nearest neighbors, as displayed in the viewer. Sort: The Sort menu provides methods to sort the genes or samples according to the order specified in the input file (default) or by a selected annotation key. This provides a useful method to be used with proximity sort. First, one can order by annotation, which enables easy selection of the gene of interest. Once it is found, clicking on the gene's label will shift it to the corner position, with its nearest neighbors in order of proximity. Impose Cluster Result: The Impose Cluster Result option is among the most useful features of the GDM. When selected, the current MeV session
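Proximity sort and Save k Neighbors can be sketched on a precomputed distance matrix, as shown below. The selected element moves to the corner and the others follow in order of increasing distance, which is why the first row becomes a gradient. The function names are hypothetical.

```python
def proximity_order(matrix, selected):
    """Indices sorted by distance to `selected`, with `selected` first."""
    others = [i for i in range(len(matrix)) if i != selected]
    others.sort(key=lambda i: matrix[selected][i])
    return [selected] + others


def reorder(matrix, order):
    """Permute rows and columns of a square matrix into `order`."""
    return [[matrix[i][j] for j in order] for i in order]


def save_k_neighbors(matrix, selected, k):
    """The k nearest neighbors of `selected`, in proximity order."""
    return proximity_order(matrix, selected)[1 : k + 1]
```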
221. main window. [Figure 11.21.3: LEM Navigation Control with active tool tip (locus id and coordinate info); controls include Update Overlay Window and the Gene Locator, Base Locator and Window Stepping tabs (Navigate by Window Steps: Start, Prev, Next).] Navigation Modes. Click to Location: In this option, the main view jumps to a location in the LEM that corresponds to the location of a left mouse click in the Navigation Controller's LEM representation. The blue rectangle outlining the main viewer's range is updated to reflect the current viewable location in the main viewer. Aside from supporting clicks on obvious expression features, the reduced-size navigation screen also displays a tool tip to indicate the locus id and location information for the element under the mouse tip. This feature allows one to mouse over the controller in search of a particular locus or base location. The reduced-size view can be a quick means to take a survey of the genome or chromosome in the LEM. Window Stepping: This option allows you to step through the LEM systematically by advancing one view screen at a time. During window stepping, the blue rectangle will update to indicate the LEM location that is visible in the main viewer. The buttons to control this are located at the bottom of the controller under a tab labeled Window Stepping. Shortcuts to the start and end of the LEM are additional options to quickly jump to those end points.
222. [Figure 11.8.2: CAST Centroid View.] 11.9 QTC: QT_CLUST (modified from Heyer et al., 1999). The dialog will prompt the user for the cluster diameter and the minimum cluster size. The cluster diameter is the largest distance allowed between two genes in a cluster, expressed as a fraction between 0 and 1. A diameter of 1 corresponds to the largest possible distance between two genes. For Pearson Correlation, Pearson Uncentered, Pearson Correlation Squared, Kendall's Tau and Cosine Correlation, the maximum possible distance is 1, and therefore the user-input diameter is the actual maximum allowed distance between two genes in a cluster. For the other distance metrics, which do not have a fixed upper bound, the maximum distance between two genes in the current dataset is set to 1 and the diameter is calculated accordingly. To reduce bias resulting from outliers, the distances used for computing clusters are jackknifed, i.e., each experiment is left out in turn while computing the distance between two genes, and the maximum of the distances is taken. The minimum cluster size specifi
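The jackknifed distance and the diameter scaling described above can be sketched as follows. Each experiment is left out in turn, and the largest of the resulting distances is kept. The user's 0-1 diameter is scaled by the dataset's maximum distance when the metric has no fixed upper bound. Euclidean distance stands in for an unbounded metric. The QT clustering loop itself is not shown, and the names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jackknife_distance(a, b, metric=euclidean):
    """Maximum of the distances obtained by leaving out each experiment in turn.

    `a` and `b` are equal-length Python lists of expression values.
    """
    return max(
        metric(a[:i] + a[i + 1 :], b[:i] + b[i + 1 :]) for i in range(len(a))
    )


def absolute_diameter(user_diameter, max_distance_in_data=None):
    """Turn the 0-1 user diameter into a distance threshold.

    Bounded metrics (maximum possible distance 1) use the value as is;
    otherwise it is scaled by the largest distance in the current dataset.
    """
    if max_distance_in_data is None:
        return user_diameter
    return user_diameter * max_distance_in_data
```

Taking the maximum over left-out experiments means that a pair of genes must be close even when any single experiment is ignored. This keeps one outlying experiment from making two genes look closer than they are.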
223. [Figure 11.5.2: Network View.] Several options can be launched from the network viewer, using a right-click context menu, to enhance the view or to better characterize the results. Note that links colored in red represent elements that are positively correlated, while links colored in blue represent elements that are negatively correlated. Options from the menu to enhance the view include the ability to zoom in and out on the subnets, alter the color of the background, alter the element shape, and alter the thickness of the links for better visibility. Other options reveal the nature of the subnets. You can select to alter the threshold to make it more stringent. Using this option (under Links Threshold), the viewer is instructed to show only links for elements correlated with R greater than the new threshold. Selection of elements in the network can be done by using the select option in the right-click menu. Two options are offered for selecting elements: one by providing a gene identifier label (Element ID option), and the other by specifying all elements with a specified minimum number of links (Feature Degree option). As in other viewers, selected elements may be assigned a color (Set Selection option). 11.6 KMC: K-Means/K-Medians Clustering (Soukas et al., 2000). Selecting this analysis will display a dialog that allows the user to specify whether to use means
224. n name or other details about the experiment. An example of the leading comments: version V1.0; format_version V4.0; date 10/06/2004; analyst aisaeed; analysis id 10579; slide type IASCAGI; input row count 32448; output row count 32448; created by TIGR Spotfinder 2.2.3; TIFF files processed gpc30025a 532 nm.tif, gpc30025a 635 nm.tif; description: Tumor type comparison. This is the 4th experiment in a series of 20 to identify tissue-specific genes. The header row consists of the field names for each subsequent row in this file, with the exception of comment lines. A minimum of seven columns must be present, and these must use a set of specifically named headers; any number of additional columns may be included. The seven required column headers are: UID (unique identifier for this spot), IA (intensity value in channel A), IB (intensity value in channel B), R (row: slide row), C (column: slide column), MR (meta row: block row) and MC (meta column: block column). As of version 4.0 of this file format, the IA and IB columns can be substituted with MedA and MedB. The new requirement is that at least one integrated intensity (IA, IB, etc.) or one median (MedA, MedB, etc.) value be reported for each channel in the microarray. For example, a two-channel microarray .mev file would require either IA and IB or MedA and MedB. MedA: median intensity in channel A. MedB: median intensity in channel B. files may use on
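The header requirements above can be checked mechanically. The sketch below assumes a two-channel array and a header row that has already been split into field names, for example with line.split("\t"). check_mev_header is a hypothetical helper, not part of MeV.

```python
def check_mev_header(header_fields):
    """Return a list of problems with a two-channel .mev header (empty if valid).

    UID, R, C, MR and MC must be present, and each channel needs an
    integrated intensity (IA/IB) or a median (MedA/MedB) column.
    """
    fields = set(header_fields)
    problems = [
        f"missing column {name}"
        for name in ("UID", "R", "C", "MR", "MC")
        if name not in fields
    ]
    for channel in ("A", "B"):
        if f"I{channel}" not in fields and f"Med{channel}" not in fields:
            problems.append(f"channel {channel} needs I{channel} or Med{channel}")
    return problems
```

Additional columns beyond the required set are ignored, matching the statement that any number of extra columns may be included.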
225. n several different population variances are involved. Biometrika 34: 28-35. Yeung, K. Y., D. R. Haynor and W. L. Ruzzo (2001). Validating clustering for gene expression data. Bioinformatics 17: 309-318. Yeung, K. Y. and R. E. Bumgarner (2003). Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biol 4(12): R83. Yeung, K. Y. and R. E. Bumgarner (2005). Correction: Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biol 6(13): 405. Zar, J. H. (1999). Biostatistical Analysis, 4th ed. Prentice Hall, NJ.
226. n the Chromosome Views subtree of the navigation tree on the left side of the screen to see the selected probes annotated. You may have to change the element length to view all annotated probes. [Figure: Multiple Array Viewer with the Chromosome Views subtree expanded.] CGH Position graph of chromosome 1, with probes annotated that are deleted in 4 or more samples. Right-click an annotation to query the genes contained in the region, or to link to the CGH Browser with the selected annotation highlighted. If the CGH Browser corresponding to an annotation is displayed, it shows the log average inverted clone values for all experiments for the chromosome corresponding to the annotation. To clear annotations, use the Clear Annotations item in the Display menu. Saving Results: You can save the results of any CGH Analysis algorithm as a tab-delimited text file. To do this, select
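The annotation in the chromosome 1 example keeps the probes that are deleted in at least four samples. The sketch below shows that filtering step. It assumes the per-sample calls are already available as strings ("deleted", "amplified", "normal"). Those call labels and the probe names are hypothetical, and this is not how MeV derives its calls from clone values.

```python
def probes_deleted_in(calls, min_samples=4):
    """Return probes called 'deleted' in at least min_samples samples."""
    return [probe for probe, per_sample in calls.items()
            if sum(call == "deleted" for call in per_sample) >= min_samples]


# Hypothetical per-sample calls for two probes across six samples
calls = {
    "probe_1": ["deleted"] * 5 + ["normal"],
    "probe_2": ["deleted", "deleted", "amplified", "normal", "normal", "normal"],
}
print(probes_deleted_in(calls))  # ['probe_1']
```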
227. [Figure callout: Indicates the number of training-set elements in a given class after variance filtering, if applied.] 10.18.3 KNN Classification cross-validation statistics viewer. 11.20 DAM: Discriminant Analysis Module (Danh V. Nguyen and David M. Rocke, 2002). DAM classifies genes or experiments into more than two groups, or classes. It combines a gene dimension-reduction method, Multivariate Partial Least Squares (MPLS), with two classification methods, Polychotomous Discriminant Analysis (PDA) and Quadratic Discriminant Analysis (QDA). After MPLS, DAM runs either PDA or QDA, depending on the user's selection. DAM generates a three-dimensional plot of the most significant gene components and displays it in the DAM results viewer. The results viewer also contains Expression Images, Expression Graphs, Centroid Graphs, and Cluster Information, both for the experiments used as classifiers and for the experiments to be classified. You can launch DAM by selecting the DAM button on the Multiple Array Viewer toolbar, or by using the DAM option in the Analysis menu. [Figure: DAM Initialization dialog with Classification Selection, Data Screening, Classification Algorithm Selection, and DAM Classification Parameters panels.]
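QDA differs from linear discriminant analysis because each class keeps its own variance (or covariance). The sketch below shows the idea in one dimension, for example on a single MPLS component score. It is a simplified illustration, not DAM's multivariate PDA/QDA code. The class names and scores are hypothetical.

```python
import math
from statistics import fmean, variance


def fit_qda_1d(scores_by_class):
    """Estimate a mean, a class-specific variance and a prior per class."""
    total = sum(len(s) for s in scores_by_class.values())
    return {label: (fmean(s), variance(s), len(s) / total)
            for label, s in scores_by_class.items()}


def qda_score(x, mean, var, prior):
    """Gaussian log-density plus log prior (constant terms dropped)."""
    return -0.5 * math.log(var) - (x - mean) ** 2 / (2 * var) + math.log(prior)


def classify_qda_1d(model, x):
    """Assign x to the class with the highest quadratic discriminant score."""
    return max(model, key=lambda label: qda_score(x, *model[label]))


# Hypothetical training scores for two classes
model = fit_qda_1d({"A": [0.0, 0.2, -0.2], "B": [5.0, 5.5, 4.5]})
print(classify_qda_1d(model, 0.1), classify_qda_1d(model, 4.8))  # A B
```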
228. nding member of group B. For example, a study might measure gene expression in a group of subjects before (Group A) and after (Group B) drug treatment on each subject. 3. Multi-class, where the user specifies the number of groups (> 2) into which samples fall. A gene is considered significant if its expression differs significantly across some combination of the groups. 4. Censored survival, where each sample has a time and a state (censored or dead). A censored sample is one whose subject was alive when the data were collected and for whom no further data are available. 5. One-class, in which the user specifies a value against which the mean expression of each gene is tested. A gene is considered significant if its mean log2 expression ratio over all included samples differs significantly from the user-specified mean. To exclude a sample from the analysis, uncheck the box next to its name in the left pane of the one-class screen. SAM permutes the data for each gene and computes a test statistic d for both the original and the permuted data. In the two-class unpaired design, d is analogous to the t statistic in a t-test: it captures the difference among the mean expression levels of the experimental conditions, scaled by a measure of variance in the data. Missing values in the input data matrix are imputed by one of two methods: 1. Ro
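The two-class unpaired d statistic can be sketched using the form from Tusher et al. (2001): the difference of means divided by a pooled standard error plus a small constant s0. This is a sketch under stated assumptions. Here s0 is a fixed illustrative value, whereas SAM chooses it from the data. The permutation step reassigns one gene's pooled values to the two groups, whereas SAM permutes sample labels across all genes together.

```python
import math
import random
from statistics import fmean


def sam_d(group_a, group_b, s0=0.1):
    """Two-class unpaired SAM-style statistic d = (mean_B - mean_A) / (s + s0)."""
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = fmean(group_a), fmean(group_b)
    ss = sum((x - m_a) ** 2 for x in group_a) + sum((x - m_b) ** 2 for x in group_b)
    s = math.sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))  # pooled std. error
    return (m_b - m_a) / (s + s0)  # s0: fixed illustrative fudge factor


def permuted_d(group_a, group_b, n_perm=100, s0=0.1, seed=0):
    """Recompute d after randomly reassigning the pooled values to the two groups."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    out = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        out.append(sam_d(pooled[:len(group_a)], pooled[len(group_a):], s0))
    return out


observed = sam_d([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], s0=0.0)
print(round(observed, 3))  # 3.674
```

Comparing the observed d values against the permuted ones is what lets SAM estimate how many "significant" genes would be expected by chance.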
229. ng information such as gene lists. Each option is described below. Control Panel: This option shows or hides the navigation control panel. Grid: The Grid menu item sets the terrain's resolution, or smoothness. Reducing the grid value lowers the computational expense and can sometimes improve viewer response. Lowering the grid value renders the terrain as a series of planes. Fill Polygon: This option controls whether the viewer renders the terrain's surface or only the outline of adjacent planes. Like the Grid option, deselecting it can speed up rendering during navigation. Element Shape: Three element shapes are supported: point, cube, and sphere. Labels: You can display or hide labels. When labels are displayed, a billboard option renders them so that they stay readable from the current point of view. If you want to identify a cluster of elements, the best way to verify it is to select the elements and then either store them as a cluster or launch a new viewer containing them. Both options are described below. Links: Elements that pass the threshold criteria can display a link between their points. The Links menu option displays or hides the links between elements that pass the threshold criteria. Within the Links menu, you can also set the current threshold to visualize the strength of the associations within
230. considered amplified. If one deleted clone and one amplified clone flank a region, the region is considered both deleted and amplified, which gives maximum flexibility to algorithms that use flanking regions. To toggle flanking regions, use the Flanking Regions checkbox item in the Display menu. Right-click any flanking region (Figure 2.4) to query the genes contained in the region, query the intensity ratios of the probes that make up the region, or link to the CGH Browser. [Figure 2.4: flanking-region context menu with the items Show Genes in Region, Show Browser, Display Data Values, Launch Ensembl, and a Launch NCBI item.]
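The flanking-region rule above can be written as a small decision function. Only the mixed case (one deleted flank and one amplified flank gives a region that is both) is stated in this text. Two cases are assumptions, because the start of the rule is cut off: matching flanks give that same status, and all other combinations give no status. The call strings are hypothetical.

```python
def flanking_status(left_call, right_call):
    """Status set for a region between two clones with the given calls."""
    calls = {left_call, right_call}
    if calls == {"deleted", "amplified"}:
        return {"deleted", "amplified"}   # stated rule: counted as both
    if len(calls) == 1 and left_call in ("deleted", "amplified"):
        return {left_call}                # assumed: same-status flanks
    return set()                          # assumed: otherwise not altered


print(flanking_status("deleted", "amplified"))  # both statuses
print(flanking_status("amplified", "amplified"))
```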
231. [Figure: continuation of the LEM customization dialog, with intergenic length settings, the Locus Replicate Rendering option (Show Locus Replicates), and Preview, Reset, Cancel, and Apply buttons.] Locus Replicate Rendering: This option displays an arrow for each of the spots related to the locus. Because the resulting structure is complex, arrow lengths and intergenic lengths are fixed when this option is selected. 11.21.5 LEM Customization Dialog. The main choice is between two arrow modes. In fixed-length mode, the user sets the arrow length in pixels. In scaled mode, the arrow length reflects the locus sequence length, and a scaling factor controls resolution. Choosing fewer bases per pixel elongates the viewer and lets you distinguish finer differences in length. The minimum arrow length ensures that even very small sequences get at least a small arrow, so that their annotation can be displayed. The maximum scaled arrow length keeps long sequences to a reasonable length. With scaled arrow lengths, loci with overlapping coordinates are offset to the right so that their arrows do not directly overlap.
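The scaled-arrow rule (length divided by bases per pixel, clamped between a minimum and a maximum, with overlapping loci pushed right) can be sketched as follows. This is a hypothetical layout routine, not LEM's rendering code. It assumes that coordinates are inclusive base positions and that an overlapping arrow is shifted to start at the previous arrow's end.

```python
def arrow_length(start, end, bases_per_pixel, min_px, max_px):
    """Scaled arrow length in pixels, clamped to [min_px, max_px]."""
    length_bp = abs(end - start) + 1
    return max(min_px, min(max_px, round(length_bp / bases_per_pixel)))


def place_arrows(loci, bases_per_pixel, min_px, max_px):
    """Return (name, x, width); an arrow that would overlap the previous one is pushed right."""
    placed, cursor = [], 0
    for name, start, end in sorted(loci, key=lambda locus: locus[1]):
        x = max(round(start / bases_per_pixel), cursor)  # offset to the right
        width = arrow_length(start, end, bases_per_pixel, min_px, max_px)
        placed.append((name, x, width))
        cursor = x + width
    return placed


print(place_arrows([("a", 0, 299), ("b", 100, 399)], 10, 7, 60))
# [('a', 0, 30), ('b', 30, 30)]
```

With 10 bases per pixel, a 20 bp locus would map to 2 pixels, so the minimum length (7 here) takes over and keeps its annotation visible.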
232. ny effects. 11.18 SVM: Support Vector Machines (Brown et al., 2000). SVMs have been used in various fields of study, and Brown et al. described their use for gene expression analysis in detail. SVM is a supervised learning classification technique. The algorithm uses supplied information about existing relationships between members of a subset of the elements to be classified. This supplied information is an initial, presumed relationship among a set of elements. Combined with the expression pattern data, it leads to a binary classification of each element, which is considered either in or out of the initial presumptive classification. The algorithm proceeds through two main phases. The first phase, training, takes the presumptive classification (supplied knowledge) and the expression data as inputs and produces a set of weights for use in the next phase. The second phase, classification, uses the weights created during training and the expression data to assign a discriminant, or score, to each element. Based on this score, each element is placed into or out of the class (Fig. 11.18.1). 11.18.1 SVM Process Overview: initial classification, SVM training, SVM classification, elements in or out of the classification. SVM Dialog Overview: The initial dialog (Fig. 11.18.2) defines the basic SVM mode. One can select to classify genes or experiment
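The two phases (training produces weights, and classification turns weights into a discriminant score) can be illustrated with a linear SVM trained by Pegasos-style subgradient descent. This is a deliberately swapped-in, simplified technique, not MeV's SVM code. It uses a linear kernel only, and a constant feature serves as the bias. The data points are hypothetical.

```python
import random


def _augment(x):
    return list(x) + [1.0]  # constant feature acts as the bias term


def train_linear_svm(vectors, labels, lam=0.01, epochs=100, seed=0):
    """Training phase: Pegasos-style subgradient descent on the hinge loss.

    labels are +1 (in the presumptive class) or -1 (out of it).
    """
    rng = random.Random(seed)
    data = [(_augment(x), y) for x, y in zip(vectors, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]      # shrink (regularisation)
            if margin < 1:                               # hinge loss active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def discriminant(w, x):
    """Classification phase: a score > 0 places the element in the class."""
    return sum(wi * xi for wi, xi in zip(w, _augment(x)))


genes = [[2, 2], [3, 3], [2.5, 3], [-2, -2], [-3, -3], [-2.5, -2]]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(genes, labels)
print([1 if discriminant(w, g) > 0 else -1 for g in genes])  # [1, 1, 1, -1, -1, -1]
```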
233. o vectors to vary in the same direction from one element to the next. For gene expression vectors, the metric compares each expression observation to the previous measurement to decide whether it is a relative increase or decrease. When both expression vectors change in the same direction (both increasing or both decreasing), the metric is incremented. When they change in opposite directions, it is decremented. Finally, the measure is scaled by the number of observations. This metric ignores the magnitude of the expression levels and looks only at inflections in the expression patterns. 18 Appendix: MeV Script DTD. 18.1 DTD: <?xml encoding="UTF-8"?> <!ELEMENT TM4ML (midas, dbi_controller, mev)> <!ATTLIST TM4ML version CDATA #REQUIRED> <!-- midas and dbiController place holders --> <!ELEMENT midas EMPTY> <!ELEMENT dbi_controller EMPTY> <!ELEMENT mev (primary_dat
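The direction-agreement metric described above can be sketched directly: +1 for each step where both vectors move the same way, and -1 where they move in opposite directions. Two details are assumptions. First, steps where either vector is unchanged contribute 0. Second, "scaled by the number of observations" is read as dividing by the number of consecutive comparisons, so the result falls between -1 and 1.

```python
def direction_agreement(x, y):
    """Average +1/-1 agreement in step direction between two profiles."""
    score = 0
    steps = 0
    for i in range(1, len(x)):
        dx, dy = x[i] - x[i - 1], y[i] - y[i - 1]
        steps += 1
        if dx * dy > 0:      # both rise or both fall
            score += 1
        elif dx * dy < 0:    # opposite directions
            score -= 1
        # assumption: a flat step in either vector contributes 0
    return score / steps if steps else 0.0


print(direction_agreement([1, 2, 3, 2], [10, 20, 30, 5]))  # 1.0
print(direction_agreement([1, 2, 3], [1, 100, 101]))       # 1.0, magnitude ignored
```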
234. to correspond to the next or previous locus relative to the currently displayed locus. This feature lets you advance one locus at a time to compare expression patterns between adjacent loci, stepping through a section by viewing its loci in order. The Select button pushes the locus onto the LEM's locus selection list. The locus selection features are detailed in a later section. The Gene Page button launches a web resource relevant to the displayed locus. If MeV cannot identify the appropriate resource from the locus ID field name, it presents a list of resources from which you can choose the proper one. Customizing Viewer Appearance. Scaling the Viewer: The LEM can scale loci by sequence length, so that the arrows corresponding to loci reflect the length of the sequence. User-defined constraints keep the length within reasonable bounds. To reach the scaling options, select the Customize Viewer item in the right-click context menu. A single dialog box defines the view. [Figure: Customize LEM Viewer dialog with Locus Arrow Dimensions (fixed or scaled arrow length, scaling factor, minimum and maximum scaled arrow length) and Intergenic or Unsampled Region Dimensions panels.]
235. nodes, shown in light green, are sources of data for attached algorithms. Any number of algorithms can be attached to a data node. Algorithm nodes, shown in light yellow, represent processes that transform the data or act on the data to produce results. 12.2 Script Tree Viewer with Initial Primary Data Node and Pop-up Menu. Adding an Algorithm: To select a data node as source data, left-click it. A selected node has a blue highlighted border. Right-click to reveal a menu containing the Add Algorithm Node option, which presents a dialog for selecting the algorithm and parameters to append to the data node. 12.3 Script Algorithm Selection Dialog (shown with Algorithm Category: Analysis Algorithm and Algorithm: SAM, Significance Analysis for Microarrays). Analysis Algorithm Panel: The algorithms fall into
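The script tree alternates between data nodes and algorithm nodes: any number of algorithms hang off a data node, and each algorithm produces new data. The sketch below models that alternation. The class and function names, the single output per algorithm, and the parameter keys are all hypothetical simplifications, not MeV's script classes.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    node_id: int
    name: str
    algorithms: list = field(default_factory=list)  # any number may attach


@dataclass
class AlgorithmNode:
    name: str
    params: dict
    output: DataNode  # simplification: one output data node per algorithm


def add_algorithm_node(source, name, params, next_id):
    """Attach an algorithm to a data node; its result becomes a new data node."""
    output = DataNode(next_id, f"{name} output")
    source.algorithms.append(AlgorithmNode(name, params, output))
    return output


root = DataNode(1, "primary data")
cut = add_algorithm_node(root, "Percentage Cutoff", {"percent": 50}, next_id=2)
add_algorithm_node(cut, "SAM", {}, next_id=3)
add_algorithm_node(cut, "KMC", {"clusters": 10}, next_id=4)
print([alg.name for alg in cut.algorithms])  # ['SAM', 'KMC']
```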
236. nodes in the result navigation tree under the HCL result node. The results are added once the HCL Tree Configuration dialog has been dismissed. The minimum and maximum pixel distances limit the displayed inter-node distance, which changes the appearance of the tree. The Apply Dimensions button applies the entered tree dimensions to the HCL tree, so you can fine-tune the tree's appearance without dismissing the dialog. [Figure: HCL tree with distance thresholds applied.] 11.1.5 HCL Tree with distance thresholds applied. 11.2 TEASE: Tree EASE. Selecting this analysis displays a configuration window with three main modes and four tabbed panels, each holding essential configuration for Hierarchical Clustering (HCL) or EASE analysis. When the analysis is complete, select the tree node under Analysis Results to view the hierarchical tree. Color dots on the tree signify whether each cluster contains over-represented biological categories (red: most significant; blue: least significant). To view the EASE analysis information for a cluster, position the mouse over the color dot at the root of the cluster to display a pop-up window. The first column in the window is the name of the annotation file that contains the category. The second to the third-to-last columns are informat
237. [Figure: USC initialization dialog with class labels (Breast, Colon, Leukemia, Renal), Analysis Mode (Train & Classify, or Classify from File), and Advanced Parameters.] 11.29.2 USC Initialization Dialog. For Train & Classify, the USC algorithm needs to know the classes of the experiments in the training set. Use the pull-down menus to assign a label to each loaded experiment, and label any test experiments as Unknown (Test). You are not required to test any experiments at this point. You may simply train on an entire training set and save the classifier as a file for later use on any test experiments you choose. 11.29.3 USC Assign Label Dialog (each loaded experiment is assigned one of the labels Breast, Colon, Leukemia, Renal, or Unknown (Test)). Click OK and
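The labelling step above separates experiments that train the classifier from those left as Unknown (Test). The sketch below shows that split. The exact label string and the experiment names are assumptions made for illustration, and this is not USC's internal code.

```python
UNKNOWN = "Unknown (Test)"  # label text assumed from the dialog


def split_by_label(assignments):
    """Separate labelled training experiments from unlabelled test ones."""
    training = {exp: lab for exp, lab in assignments.items() if lab != UNKNOWN}
    test = [exp for exp, lab in assignments.items() if lab == UNKNOWN]
    return training, test


# Hypothetical experiment names
assignments = {"BREAST1": "Breast", "COLON1": "Colon", "Sample1": UNKNOWN}
training, test = split_by_label(assignments)
print(sorted(set(training.values())), test)  # ['Breast', 'Colon'] ['Sample1']
```

If no experiment is labelled Unknown (Test), the test list is empty, which corresponds to training on the whole set and saving the classifier for later.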
238. of files to be loaded. Similarly, select the file(s) containing the appropriate annotation information in the section labeled MeV Annotation Files (.ann, .dat). The Load Integrated Intensities and Load Median Intensities options determine whether MeV loads integrated intensities (the default) or median intensities. Take care to select the intensity measurements (integrated or median) that were previously normalized in MIDAS, so that the loaded data will have been normalized and possibly trimmed. The Load Auxiliary Spot Information option loads spot background, spot pixel count, and other spot-specific items. Loading this information does not affect the analysis results, but it lets you view these data when you click an expression element. By default, MeV does not load this spot information. It consumes significant system memory and can severely limit the number of .mev files that can be loaded. The Use Annotation Contained in MeV File option overrides the annotation file and loads the annotation contained within the .mev file instead. This is a specialized case. For ease of annotation updates, use a separate annotation file even if the .mev file contains annotation. This option is off by default. The Remove Annotation Quotes option removes quotation marks from annotation fields where the annotation entries start and end with quotation ma
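The Remove Annotation Quotes behavior (strip the quotes only when an entry both starts and ends with a quotation mark) can be sketched as below. The function name is hypothetical. Only double quotes are handled here, which is an assumption.

```python
def remove_annotation_quotes(value):
    """Strip surrounding double quotes only when the entry starts and ends with one."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value  # unbalanced or unquoted entries are left untouched


print(remove_annotation_quotes('"H05768"'))  # H05768
print(remove_annotation_quotes('"H24'))      # "H24
```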
239. of the interval and then SHIFT-clicking on the last row of the interval (in Windows). To select non-contiguous rows, CTRL-click each desired row. You can delete a cluster stored from this viewer, whether it covers all the rows in the table or a subset of them. Select the rows corresponding to the stored cluster and choose the Delete option on the right-click menu. For successful deletion, the stored cluster should have been created within the current algorithm run to which the table view belongs, and the row selection should exactly match the rows in that cluster. The Search function on the right-click menu opens a search dialog, as shown below. 7.5.2 Table search dialog. You may search any combination of columns for specific terms. The other options in this dialog are self-explanatory. If you are connected to the Internet, the Link to URL option launches a web browser to open a page related to the selected gene. First select a gene of interest (a row), and then click the annotation key to use as the gene identifier. MeV will attempt to construct a URL for the selected annotation key by checking a URL configurati
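The table search matches a term against any chosen combination of columns, with an optional Match Case setting. Below is a minimal sketch of that behavior. Substring matching is an assumption, and the rows are hypothetical.

```python
def search_rows(rows, columns, term, match_case=False):
    """Indices of rows where any chosen column contains term."""
    needle = term if match_case else term.lower()
    hits = []
    for index, row in enumerate(rows):
        for column in columns:
            value = str(row.get(column, ""))
            haystack = value if match_case else value.lower()
            if needle in haystack:  # assumption: substring match
                hits.append(index)
                break
    return hits


# Hypothetical table rows
rows = [
    {"YORF": "YAL001C", "NAME": "TFC3"},
    {"YORF": "YBR160W", "NAME": "CDC28"},
    {"YORF": "YAL002W", "NAME": "VPS8"},
]
print(search_rows(rows, ["YORF"], "yal"))  # [0, 2]
```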
240. of the scale, gene distances map to a different location on the color scale. The Color Scale Dialog figure illustrates this: the lower limit is at 0.0 and the upper limit is set to 0.4. Any element with a distance greater than 0.4 appears saturated at the bright red end of the gradient. The effective range is therefore 0.0 to 0.4, which accentuates distances in this low-end range. By imposing different limits, you can get better color differentiation within a given range. Whenever you set the lower or upper limit to a value other than 0.0 or 1.0, respectively, some gene pairs will always have distances that fall outside the effective range. The dialog displays the percentage of saturated elements in the matrix as a guide to how many elements fall outside the effective range. When altering the color scale, it is often useful to press the Preview button to view the effects on the actual matrix. Reset returns the values to those in effect when the dialog was launched. Cancel also restores those original limits and then dismisses the dialog. The current limits always appear in the labels on the header color gradient. [Figure: Color Scale Dialog with Lower Limit, Upper Limit, Saturation, and Effective Range fields and a Preview button.] 11.22.3 Color Scale Dialog. Toggle Sort on Proximity: In the
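The lower and upper limits act as a clamp. A distance inside the limits maps linearly onto the gradient, and anything outside saturates at one end. The sketch below illustrates that mapping and the saturation percentage. It assumes a linear mapping, and the function names are hypothetical.

```python
def scale_position(distance, lower=0.0, upper=1.0):
    """Position on the gradient: 0.0 = low end, 1.0 = saturated bright-red end."""
    if distance <= lower:
        return 0.0
    if distance >= upper:
        return 1.0
    return (distance - lower) / (upper - lower)  # assumed linear mapping


def saturation_percent(distances, lower, upper):
    """Percentage of elements falling outside the effective range."""
    outside = sum(1 for d in distances if d < lower or d > upper)
    return 100.0 * outside / len(distances)


print(scale_position(0.2, 0.0, 0.4))                          # 0.5
print(saturation_percent([0.1, 0.3, 0.5, 0.9], 0.0, 0.4))     # 50.0
```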
241. [Figure: Expression Images viewer with the result tree showing Centroid Graphs, Expression Graphs, Cluster Information, General Information, and History nodes.] 7.2.1 Expression Images Viewer. Click any of the rectangles in this view to open a window reporting information for that spot (7.2.2). This window contains a great deal of information, including expression values and row/column coordinates. Click the Gene Graph button to open a window with a graph of this gene's expression level across all samples (7.2.3). To view a Single Array Viewer displaying the entire experiment of which this spot is a part, click the Sample Detail button in the Spot Information window (7.2.4). The Set Gene Color button is not enabled in this version of MeV. [Figure: Spot Information window with Gene Graph, Sample Detail, Set Gene Color, and Close buttons.] 7.2.2 Spot Information Window. [Figure: gene graph plotting log ratio against sample name.] 7.2.3 Gene Graph Window. [Figure: Single Array Viewer window.]
242. 9.1.1 Search Initialization Dialog. The search result appears in a new window split into two sections. The upper section lists the genes or samples that match the search criteria. The lower section provides shortcut links to the cluster viewers that contain the identified samples or genes. The list of genes or samples found may include some entries that are not of interest, and you can deselect them using the checkboxes. Click the Update Shortcuts button to produce a new search result window with just the selected entries and their viewer shortcuts. This lets you prune unwanted elements out of the search result. The Store Cluster button stores the selected items as a cluster and assigns a user-selected color. This is one way to help identify the elements found by the search within the cluster viewers. [Figure: Search Result window listing the genes found.]
243. following four options are enabled only if a group of elements has been selected in the selection area. To select points, hold down Shift and drag the mouse in the viewer. Once a selection exists, the following options are enabled. Store Cluster: Stores the elements in the selection box in the Cluster Manager. The clusters in the repository are viewable in the cluster table. If your selections in TRN are incomplete, you can store several clusters and then join them (Union operation in the cluster table) to capture all of the desired elements. Launch New Session: Launches a new Multiple Experiment Viewer containing the selected elements. This is perhaps the fastest and most convenient way to view the selected elements. An alternative is the Show Selection option described below. Deselect: Dismisses the selection area. Show Selection: Shows the selected elements in a small window. You can select multiple elements in this window with Shift-click or Ctrl-click, copy the list with Ctrl-C, and paste it into other applications.
244. on file. If MeV cannot definitively decide what the selected annotation type is, it presents a dialog in which you select an annotation field name (column header name) and an external resource to open. 7.5.3 Annotation URL Association Dialog. The config directory contains a file, annotation_URLs.txt, with entries that support this feature in MeV. You can add new external resources to this file as needed and modify the existing fields, except for the UniGene field (see below). Each row represents a different annotation label/resource pair, with the following tab-delimited fields (\t indicates a tab): Annotation Label\t<URL>\tResource Name. The Annotation Label is the name of the annotation field loaded into MeV, as it appears in the header of the annotation file or TDMS file. For example, if some files call GenBank "GB" and others call it "GenBank", you can make two entries in this annotation URL file, one for each key. If an input annotation file identified the GenBank numbers as "gb", the Annotation URL Association dialog would launch so that you could manually indicate the desired externa
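The annotation URL file maps a label to a URL template and a resource name, one tab-delimited row per pair. The sketch below reads such rows and builds a link. This text does not show how the accession is marked inside the URL, so the "{id}" placeholder and the example.org URLs are hypothetical. Exact, case-sensitive label matching is assumed, which is why "gb" finds no entry in the example (the point where MeV would show its association dialog).

```python
def load_url_table(lines):
    """Parse 'label<TAB>url<TAB>resource' rows into a dict keyed by label."""
    table = {}
    for line in lines:
        if not line.strip():
            continue
        label, url, resource = line.rstrip("\n").split("\t", 2)
        table[label] = (url, resource)
    return table


def build_link(table, label, accession, placeholder="{id}"):
    """Return a URL for the accession, or None when the label is unknown."""
    if label not in table:          # assumed exact, case-sensitive match
        return None                 # MeV would ask via its association dialog
    url, _resource = table[label]
    return url.replace(placeholder, accession)  # placeholder is hypothetical


rows = [
    "GB\thttps://example.org/nuccore/{id}\tGenBank\n",
    "GenBank\thttps://example.org/nuccore/{id}\tGenBank\n",
]
table = load_url_table(rows)
print(build_link(table, "GB", "H05768"))
print(build_link(table, "gb", "H05768"))  # None
```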
245. on to new node: during a cell division, candidate cells are found by moving up the tree toward the root by this number of levels. All cells (terminal nodes) within the subtree below that node are targets that may accept expression vectors, and each vector moves into the cell to which it is most similar. Cell Division Criteria Parameters (Use Cell Diversity or Use Cell Variability; pValue). Diversity: Cell diversity is the mean distance between the expression profiles of the cell's members and the cell's centroid vector. When choosing which cell to divide, SOTA splits the cell with the greatest diversity, provided its diversity exceeds Max Cell Diversity (see above). Variability: Cell variability is the maximum element-to-element distance within a cell. The cell with the largest internal gene-to-gene distance is divided next. In this case the stopping criterion changes: growth continues until the most variable cell falls below a variability criterion generated from the provided pValue (see below). pValue: This value applies when variability is the cell division criterion. To build a distribution of all element-to-element distances, SOTA resamples the data set, giving each expression vector a randomized ordering of its elements. The resulting distribution represents random gene-to-gene distances. The supplied pValue is applied to this distribution to generate a variability cutoff. Clusters falling below this variability
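The three quantities above can be sketched as follows: diversity (mean distance to the centroid), variability (largest pairwise distance), and a variability cutoff taken from pair distances after each vector's elements are shuffled. Several choices here are assumptions: Euclidean distance, a single shuffle, and the lower-tail pValue quantile of all pairwise distances. None of this is SOTA's exact resampling code.

```python
import math
import random
from itertools import combinations


def centroid(members):
    return [sum(column) / len(members) for column in zip(*members)]


def cell_diversity(members):
    """Mean distance from each member profile to the cell centroid."""
    c = centroid(members)
    return sum(math.dist(m, c) for m in members) / len(members)


def cell_variability(members):
    """Largest member-to-member distance inside the cell."""
    return max((math.dist(a, b) for a, b in combinations(members, 2)), default=0.0)


def variability_cutoff(data, p_value, seed=0):
    """Distance at the p_value quantile of shuffled-vector pair distances."""
    rng = random.Random(seed)
    shuffled = [rng.sample(vector, len(vector)) for vector in data]  # randomize element order
    distances = sorted(math.dist(a, b) for a, b in combinations(shuffled, 2))
    return distances[int(p_value * (len(distances) - 1))]  # assumed lower-tail quantile


print(cell_diversity([[0, 0], [2, 0]]))            # 1.0
print(cell_variability([[0, 0], [3, 4], [0, 1]]))  # 5.0
```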
246. proportion of the total chi-squared value of the matrix explained by that axis. The inertia values appear under the corresponding node beneath the main COA analysis node. [Figure: COA result tree with 3D and 2D projection views, inertia values, and a 3D display of the projections.] 11.23.3 COA 3D display.
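The inertia values follow the standard correspondence-analysis identities. These are stated here under the assumption that COA treats the (nonnegative) data matrix as a contingency table with entries n_ij and grand total n. The total inertia, chi-squared divided by n, equals the sum of the principal inertias λ_k. The proportion explained by axis k is λ_k divided by that sum:

```latex
\chi^2 = \sum_{i,j} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}, \qquad
e_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}, \qquad
\frac{\chi^2}{n} = \sum_{k} \lambda_k, \qquad
\tau_k = \frac{\lambda_k}{\sum_{m} \lambda_m} = \frac{n\,\lambda_k}{\chi^2}
```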
247. option off if you want to eliminate from the analysis all elements that have at least one zero intensity value. If you deselect this option and either intensity is zero, the log ratio computed for analysis is set to a Not-a-Number flag (NaN). Such an element is not used in any analysis and appears as a gray element in the expression image. With either option, any element whose two intensities are both zero has its log ratio set to the NaN flag, since no log ratio can be computed. Bioconductor detection call noise filter: Select this filter to exclude genes whose percentage of absent calls across all samples is above the level you define in the dialog. The filter does not delete any data. It only excludes the genes from analysis, and you can check how many genes were filtered in the history log. [Figure: Set Detection Filter dialog with the Enable Percentage Cutoff Filter option and a Percentage Cutoff field.] 5.4.6 Set Detection Filter Dialog. GenePix Flags Filter: This filter removes genes whose percentage of samples with negative flags, across all samples, is above the level you define in the dialog. The filter does not delete any data. It only excludes the genes from analysis. This option is sometimes useful in speeding up module calculation, since many zeros often slow the modules down. To determine which genes wi
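Two of the rules above are sketched below. First, a log ratio that becomes NaN whenever either intensity is zero. Second, an absent-call filter that excludes a gene when its fraction of absent calls exceeds the cutoff. Both are minimal illustrations under stated assumptions: the ratio direction log2(IB/IA) and the "A" call code are assumptions, and the cutoff is treated as a fraction between 0 and 1.

```python
import math


def log_ratio(ia, ib):
    """log2(IB / IA), or NaN when either intensity is zero."""
    if ia == 0 or ib == 0:
        return math.nan  # excluded from analysis, drawn grey
    return math.log2(ib / ia)  # assumption: ratio direction IB / IA


def passes_detection_filter(calls, cutoff):
    """False when the fraction of absent ('A') calls exceeds cutoff."""
    absent_fraction = sum(call == "A" for call in calls) / len(calls)
    return absent_fraction <= cutoff


print(log_ratio(2, 8))                                 # 2.0
print(passes_detection_filter(["P", "A", "P", "P"], 0.5))  # True
```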
248. or medians, as well as the number of clusters and the number of iterations to run. When the computations are complete, select the KMC node under Analysis Results to view the results. Beneath KMC are several sub-nodes, further divided by the clusters created from the KMC input parameters. Hierarchical Trees shows a tree for each cluster, if you selected the option to draw hierarchical trees for clusters. Expression Images are similar to the main display. Cluster Information summarizes each cluster's size and composition. Centroid Graphs show the centroids for each cluster and experiment, individually or all at once. Expression Graphs are similar to centroid graphs, but display each gene's expression levels alongside the centroids. Right-click within an expression image to display a pop-up menu. From it you can propagate the cluster to other displays (Set Public Cluster) and save or delete the cluster. This method of clustering is useful when you have an a priori hypothesis about the number of clusters into which the genes should subdivide. [Figure: KMC dialog with Sample Selection (Cluster Genes / Cluster Samples), Parameters (Calculate K-Means / Calculate K-Medians, Number of clusters: 10, Maximum iterations: 50), and Hierarchical Clustering (Construct Hierarchical Trees).] 11.6.1 Initialization Dialog. Parameters: Sample Selection: The sample select
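The means/medians choice, the number of clusters, and the iteration cap map onto a standard Lloyd-style loop. The sketch below makes several assumptions: Euclidean distance, and deterministic seeding from the first k vectors (MeV's own initialization is not described here). It also stops early once assignments stop changing. The data are hypothetical.

```python
import math
from statistics import fmean, median


def k_cluster(vectors, k, use_medians=False, max_iterations=50):
    """Lloyd-style K-means (or K-medians) returning (assignment, centroids)."""
    centre_of = median if use_medians else fmean
    centroids = [list(v) for v in vectors[:k]]  # assumption: seed with first k vectors
    assignment = [None] * len(vectors)
    for _ in range(max_iterations):
        new_assignment = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                          for v in vectors]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # per-coordinate mean or median of the members
                centroids[c] = [centre_of(column) for column in zip(*members)]
    return assignment, centroids


data = [[0, 0], [0, 1], [10, 10], [10, 11], [0, 0.5], [11, 10]]
print(k_cluster(data, 2)[0])                    # [0, 0, 1, 1, 0, 1]
print(k_cluster(data, 2, use_medians=True)[1])  # [[0, 0.5], [10, 10]]
```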
249. [Figure: Script XML Viewer displaying a script (Study 12231 Script, described as "Percent cutoff SAM EASE for bio themes") in which a Percentage Cutoff data-adjustment algorithm set feeds a SAM cluster-genes algorithm set.]
250. not in the case where no node is selected. The header also provides the key that relates node color to the two user-defined thresholds. The viewer offers several options for customizing the view and extracting information, available from a right-click menu. Node Style: This option controls how the GO tree nodes are rendered. Minimal rendering draws each node as a circle whose color shows how the p-value relates to the defined thresholds. Verbose rendering gives more information about the identity of the GO term: the GO ID, GO term name, p-value, and the number of genes in the cluster and in the population that relate to this GO term. The header of the Tree Viewer figure above shows an example of verbose rendering. Connector Style: Two connector styles are available, curved and straight. In some circumstances the straight rendering gives a path that is easier to trace. Set Thresholds: This option sets two user-defined p-value thresholds. The initial defaults are 0.05 for the upper threshold and 0.01 for the lower threshold. Showing p-value significance with discrete colors gives a quick way to focus on significant results. Changes to the thresholds appear immediately in the header and in the tree's node colors. Selection Polarity: This option modifies tree path selection. The default is Bipolar Selection in
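The two thresholds split p-values into three discrete bands, which the viewer then colors. The sketch below shows that banding. The band names and the inclusive comparisons at each boundary are assumptions. The viewer's actual colors are not reproduced.

```python
def significance_band(p, upper=0.05, lower=0.01):
    """Place a p-value into one of three bands defined by two thresholds."""
    if p <= lower:        # assumption: boundaries are inclusive
        return "below lower threshold"
    if p <= upper:
        return "between thresholds"
    return "above upper threshold"


for p in (0.001, 0.03, 0.2):
    print(p, significance_band(p))
```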
251. other. The URL-related files are also optional, but having them allows the resulting tables to link to web sites that describe the findings, such as GO terms or KEGG pathways. See the table below for a summary of the directory structure.
EASE Directory Descriptions
EASE: Root of the EASE file structure.
Data: Encompasses the EASE data files.
Convert: Contains files linking indices (e.g., GenBank -> LocusLink ID). These files are optional and are not needed if the annotation indices for each gene are the same as the keys in the Classification File below.
Class: Contains files linking indices to biological themes (e.g., LocusLink ID -> GO Biological Process). MINIMAL REQUIREMENT.
Implies: Contains optional files relating themes to other themes (e.g., hydrogen ion transport implies cation transport).
URL Data: Contains optional files describing a URL and indicating where the tag (accession) is placed. The contents of these files act as models when constructing links to resources using annotation accession numbers.
Tags: Contains optional files linking a biological theme or pathway to accession numbers.
The files behind EASE are often used to link one annotation key to another annotation value. For that purpose, most files are arranged in rows containing key-value pairs separated by a tab delimiter. The figure below (9.18.3) shows a scenario that demonstrates how a primary index in MeV can be mapped through a secondary index
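The key-value chaining described above can be sketched as follows: a MeV key is mapped through an optional Convert file to a secondary index, then through a Class file to themes, and finally expanded with an Implies mapping. This is a minimal sketch that assumes one tab-separated key/value pair per row. The identifiers `GB0001` and `LL100` are hypothetical, and this is not EASE's own code.

```python
import csv
import io

def read_key_value(text):
    """Parse tab-delimited key/value rows; one key may map to several values."""
    mapping = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) >= 2 and row[0]:
            mapping.setdefault(row[0], []).append(row[1])
    return mapping

def themes_for_gene(key, classify, convert=None, implies=None):
    """Map a MeV key (optionally via Convert) to themes, adding implied themes."""
    secondary = convert.get(key, []) if convert is not None else [key]
    themes = []
    for k in secondary:
        for theme in classify.get(k, []):
            if theme not in themes:
                themes.append(theme)
    # Expand with themes implied by the ones found (one level only).
    for theme in list(themes):
        for implied in (implies or {}).get(theme, []):
            if implied not in themes:
                themes.append(implied)
    return themes

convert = read_key_value("GB0001\tLL100\n")                    # Convert file
classify = read_key_value("LL100\thydrogen ion transport\n")  # Class file
implies = read_key_value("hydrogen ion transport\tcation transport\n")
```

If the MeV key is already the Class-file key, pass `convert=None`, which mirrors the note that Convert files are optional.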
252. [Figure 11.17.1: TFA initialization dialog]
The p-value parameters and alpha corrections (not yet implemented) are similar in function to the corresponding features in the t-tests and one-way ANOVA. The cluster views in the output are similar to those of most other modules. Table viewers display the annotation, F-values, and p-values of genes. Two or three p-values are generated for each gene: one each for the effects of the two factors, plus an interaction p-value if relevant (see below). A significant gene cluster is generated for each significant effect. F-values and p-values are saved when clusters are saved as text files from the right-click menu on any viewer.
A few points should be kept in mind while running this analysis. Optimally, there should be equal numbers of samples in each cell, i.e., for each factor A / factor B combination. If sample sizes in cells are unbalanced, the F-tests are biased, with the degree of bias depending on the amount of imbalance. In such a case, the F-tests might be evaluated at a more stringent critical p-value than the one originally intended. Unbalanced designs as described above can occur in two ways: (1) by initially specifying the factor assignments in such a way that they are unbalanced, or (2) through missing values for a gene, so that even if the original
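Because missing values can unbalance an otherwise balanced design gene by gene, it can help to check cell sizes per gene before trusting the F-tests. The following is a minimal sketch that assumes factor labels are given per sample and missing values are encoded as `None`. The function names are hypothetical.

```python
from collections import Counter

def cell_counts(factor_a, factor_b, values):
    """Count non-missing samples in each (factor A, factor B) cell for one gene."""
    counts = Counter()
    for a, b, v in zip(factor_a, factor_b, values):
        if v is not None:          # None marks a missing expression value
            counts[(a, b)] += 1
    return counts

def is_balanced(factor_a, factor_b, values):
    """True if every cell defined by the design has the same number of values."""
    cells = set(zip(factor_a, factor_b))
    counts = cell_counts(factor_a, factor_b, values)
    return len({counts.get(cell, 0) for cell in cells}) == 1
```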
253. [Figure 11.21.7: LEM with Locus Spot Replicates Shown, blue-yellow MeV color scheme]
Color Scale Options
The arrows in the LEM are given a color that indicates the level of expression. This color is taken from a scale based on the spot's expression value. Altering the color scale limits can help resolve (or ignore, if appropriate) levels of expression that are close together. The LEM provides three modes for assigning colors to expression values. These color assignment modes can be selected via the right-click menu, in a submenu labeled Color Scale Options.
Gradient Color Mode
The gradient option is commonly used, and its value limits can be set using MeV's Display menu. The Display menu also contains options to apply a different gradient color scheme. The gradient limits and color scheme options are described in the manual sections related to the Display menu. These controls are placed in MeV's main menu because they apply to all expression views in MeV. Please see manual sections 6.4 Setting Color Scale Limits and 6.3 Color Scheme Selection.
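A gradient color mode can be understood as linear interpolation between colors anchored at the lower limit, the midpoint, and the upper limit, with values outside the limits clamped. This is a minimal sketch. The limits (-3, 0, 3) and the blue/black/yellow anchor colors are assumptions chosen for illustration, not MeV's defaults.

```python
def lerp(c0, c1, t):
    """Linearly interpolate between two RGB tuples, 0 <= t <= 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def gradient_color(value, lower=-3.0, mid=0.0, upper=3.0,
                   low_color=(0, 0, 255), mid_color=(0, 0, 0),
                   high_color=(255, 255, 0)):
    """Map an expression value to RGB; values beyond the limits are clamped."""
    if value <= mid:
        t = 0.0 if value <= lower else (value - lower) / (mid - lower)
        return lerp(low_color, mid_color, t)
    t = 1.0 if value >= upper else (value - mid) / (upper - mid)
    return lerp(mid_color, high_color, t)
```

Narrowing the limits spreads nearby values across more of the color range, which is why altering them helps resolve close expression levels.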
254. per author; use firstname,lastname or firstname,middleinitial,lastname.
!Platform_pubmed_id: optional.
ID: required; a unique ID should be provided for each element of the array.
HEADER 2: required; elements should be identified using one or more columns in addition to the ID.
HEADER 3 ... HEADER N: optional; provide as many headers as needed to fully describe the elements of the array.
!Platform_table_begin
ID HEADER 2 HEADER 3 ... HEADER N
(insert data table here; columns may appear in any order after the ID column)
!Platform_table_end
15.9 GEO SOFT two-channel file format
The GEO Simple Omnibus Format in Text (SOFT) is a flexible tab-delimited file format. This section covers its use for two-channel data. The file format is described in detail at http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTsubmissionexamples
A template for a two-channel file:
^SAMPLE: required
!Sample_title: required
!Sample_source_name_ch1: required
!Sample_organism_ch1: required
!Sample_characteristics_ch1: required
!Sample_biomaterial_provider_ch1: optional
!Sample_treatment_protocol_ch1: optional
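The platform template above places the data table between `!Platform_table_begin` and `!Platform_table_end`, with ID as the first column. The following is a minimal sketch for reading that block back. It is not MeV's loader and assumes tab-separated fields with a single header row.

```python
def parse_platform_table(soft_text):
    """Return (header, rows) from the block between the platform table markers."""
    header, rows, inside = None, [], False
    for line in soft_text.splitlines():
        marker = line.strip().lower()
        if marker == "!platform_table_begin":
            inside = True
            continue
        if marker == "!platform_table_end":
            break
        if inside and line.strip():
            fields = line.split("\t")
            if header is None:
                header = fields
                if header[0] != "ID":
                    raise ValueError("the first column must be ID")
            else:
                rows.append(dict(zip(header, fields)))
    return header, rows

example = ("^PLATFORM = demo\n!Platform_title = demo\n!Platform_table_begin\n"
           "ID\tSPOT_ID\n1\tA\n2\tB\n!Platform_table_end\n")
```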
255. [Figure 11.10.2: SOM Initialization Dialog (Sample Selection: Cluster Genes / Cluster Samples; parameters Dimension X, Dimension Y, Iterations, Alpha, Radius, Initialization (Random Genes), Neighborhood (Gaussian), Topology (Hexagonal); Construct Hierarchical Trees)]
Basic Terminology
Node: An SOM structure with which expression elements are associated to form clusters. Each node contains an SOM Vector.
SOM Vector: A vector of size n that represents its node's location in the n-dimensional expression space. Distances from this vector to expression vectors in the input data are used to determine with which node an expression vector should be associated.
Training (Adaptation): The process of repositioning the SOM nodes by altering their SOM Vectors. Adaptation results from an expression element being associated with a node. The new position is determined by the distance between the expression element and the SOM Vector, the Alpha value, and the neighborhood convention (see below).
Topology: A two-dimensional topology used to define how node-to-node distances are calculated.
Note that a cluster is a collection of expression elements associated with a node.
Parameters
Sample Selection: The sample selection option indicates whether to cluster genes or samples.
Dimension X: This positive integer value determines the X dimension of the resulting
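The adaptation step defined above can be illustrated with the generic Kohonen update. The winning node is the one whose SOM Vector is nearest to the expression vector. Every node then moves toward the expression vector by Alpha times a Gaussian neighborhood weight based on grid distance from the winner. This is a minimal sketch, not MeV's implementation. Grid positions are supplied by the caller, so a hexagonal topology would simply use hexagonal coordinates.

```python
import math

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def som_step(node_vectors, grid_positions, x, alpha, radius):
    """One generic SOM training step with a Gaussian neighborhood.

    Returns the winning node index and the updated SOM Vectors.
    """
    winner = min(range(len(node_vectors)),
                 key=lambda i: sq_dist(node_vectors[i], x))
    w_pos = grid_positions[winner]
    updated = []
    for vec, pos in zip(node_vectors, grid_positions):
        h = math.exp(-sq_dist(pos, w_pos) / (2 * radius ** 2))  # neighborhood weight
        updated.append([v + alpha * h * (xi - v) for v, xi in zip(vec, x)])
    return winner, updated
```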
256. plifications on the dataset.
Deleting a Node
Nodes in any tree can be deleted. Right-click on the node and select Delete.
Searching for a Gene
Selecting the Find Gene item from the CGH Analysis menu displays a dialog prompting for the name of a gene. Enter the name of a gene of interest and click OK. A dialog will appear showing how many times that gene is deleted and amplified in the dataset. Selecting Annotate Selected from the Annotations menu of this dialog displays the CGH Position graph corresponding to this gene, with the gene annotated.
Higher Level Analysis
Refer to other sections of the MeV manual for a description of the analysis capabilities of the MultiExperiment Viewer.
14 Working with the Single Array Viewer
The Single Array Viewer displays one slide at a time. Open a Single Array Viewer by choosing New Single Array Viewer from the File menu in the MeV main toolbar. Once a Single Array Viewer window is open, use its File menu (different from the main menu bar's File menu) to load a slide. Open Experiment From File loads a flat file in .tav format, and Open Experiment From DB loads array data from a relational database using several stored procedures. If the user selects the former option, an open-file dialog prompts the user to select a flat file to load. The latter option displays a list of experiment names and then analysis IDs from the database. By selecting an exper
257. pression level is significantly different between the two groups is determined either by directly comparing the gene's p-value with the user-specified critical p-value (alpha), or by adjusting the p-values using a correction for multiple testing (see screenshot below).
[Figure: TTEST initialization dialog. Options include: the mean value to be tested against (default 0); Save settings, Load settings, and Reset; the variance assumption for the between-subjects t-test only (Welch approximation with unequal group variances, or assume equal group variances); p-value parameters (p-values based on the t-distribution, or based on permutations, either randomly grouping samples a user-specified number of times or using all permutations); overall alpha (critical p-value), 0.01; p-value / false discovery corrections (just alpha with no correction; standard Bonferroni correction; adjusted Bonferroni correction; step-down Westfall and Young methods for permutations only, minP; false discovery control for permutations only, with confidence of 1 - alpha, EITHER the number of false significant genes should not exceed 10 OR the proportion of false significant genes should not exceed 0.05, computed by a fast approximation that may be conservative or by a complete computation that may be slow; calculate adjusted p-values for false discovery control); Hierarchical Clustering.]
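Two of the dialog's choices can be made concrete: the Welch approximation (unequal group variances) and the standard Bonferroni correction. The following is a minimal sketch using only the standard library. Converting t and the degrees of freedom to a p-value requires a t-distribution CDF, which is left out here. It is not MeV's code.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a) / na
    vb = statistics.variance(group_b) / nb
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def bonferroni(p_values):
    """Standard Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def significant(p_values, alpha=0.01):
    """Flag genes whose Bonferroni-adjusted p-value does not exceed alpha."""
    return [p <= alpha for p in bonferroni(p_values)]
```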
258. [Figure: result navigation tree showing Primary Data and result nodes for Positive Significant Genes, Negative Significant Genes, All Significant Genes, Non-significant Genes, and Single Ordered Output]
13 Comparative Genomic Hybridization Viewer
Loading Experiments
Currently the CGH Analyzer can load experiments in only one generic format. We will provide loaders for other formats in the future. Currently the CGH Viewer supports data from only two species: human and mouse.
Loading from Files
The CGH Analyzer allows data to be loaded from one format only. The format includes 4 mandatory columns followed by sample columns. The mandatory columns are:
1. Probe (Marker)
2. Probe Chromosome
3. Probe Genomic Start (in bp)
4. Probe Genomic End (in bp)
The mandatory columns are followed by sample observations, where each observation for each probe is the log2 ratio or the simple intensity ratio of the two dye channels. If the observations are not log2-transformed, the module transforms them.
[Figure: Expression File Loader with the CGH file type selected]
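The CGH layout described above (four mandatory probe columns followed by one column per sample) can be read as sketched below. This is a minimal sketch that assumes a tab-delimited file with a single header row. The column names in the example are hypothetical, and the log2 step mirrors the module's behavior when observations are plain ratios.

```python
import math

MANDATORY = 4  # probe, chromosome, genomic start (bp), genomic end (bp)

def load_cgh(lines, already_log2=True):
    """Return (sample names, probe records) from CGH-format lines."""
    header = lines[0].rstrip("\n").split("\t")
    samples = header[MANDATORY:]
    probes = []
    for line in lines[1:]:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        ratios = [float(v) for v in f[MANDATORY:]]
        if not already_log2:
            ratios = [math.log2(r) for r in ratios]  # simple ratio -> log2 ratio
        probes.append({"probe": f[0], "chromosome": f[1],
                       "start": int(f[2]), "end": int(f[3]), "ratios": ratios})
    return samples, probes

lines = ["Probe\tChromosome\tStart\tEnd\tS1\tS2\n",
         "p1\t9\t19045999\t19046119\t2\t0.5\n"]
```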
259. [Figure 11.14.4: TTEST volcano plot. The result tree lists Cluster Information, Gene Statistics, Volcano Plot, General Information, Expression Images, Centroid Graphs, and Expression Graphs (Significant Genes, Non-significant Genes, All Genes). The plot's right-click menu offers Project previously stored cluster colors, Use gene selection sliders, Store selected genes as cluster, Launch new session with selected genes, and Save selected genes as cluster. X-axis: Mean GroupB - Mean GroupA.]
11.15 SAM (Significance Analysis of Microarrays; Tusher et al. 2001, implemented as in Chu et al. 2002)
SAM can be used to pick out significant genes based on differential expression between sets of samples. It is useful when there is an a priori hypothesis that some genes will have significantly different mean expression levels between different sets of samples. For example, one could look at differential gene expression between tissue types or d
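A volcano plot places each gene by its effect size and its significance. The figure's x-axis is Mean GroupB - Mean GroupA. Using -log10(p) on the y-axis is the usual convention and is assumed here, not taken from the figure. The following is a minimal sketch of the coordinates.

```python
import math
import statistics

def volcano_points(group_a, group_b, p_values):
    """(x, y) per gene: x = mean(B) - mean(A), y = -log10(p)."""
    points = []
    for a, b, p in zip(group_a, group_b, p_values):
        x = statistics.mean(b) - statistics.mean(a)
        y = -math.log10(p)
        points.append((x, y))
    return points
```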
260. r a sample, if it belongs to group 1, users define it as 1; if it belongs to group 2, users define it as 2; and so on. If it belongs to neither group, users define it as 0.
4. Censored survival
The loading file format is a tab-delimited file with one sample per line. The first column is 1 (selected) or 0 (unselected), the second column is the time, and the third column indicates censoring (1 = censored).
5. One Class
The loading file format is a tab-delimited file. Users define 1 (selected) or 0 (unselected) for each sample.
TTEST group or class loading file format
1. One Class
The loading file format is a tab-delimited file. Users define 1 (selected) or 0 (unselected) for each sample.
2. Between Subjects
The loading file format is a tab-delimited file. For a sample, if it belongs to group A, users define it as 1; if it belongs to group B, users define it as 2; and if it belongs to neither group, users define it as 3.
3. Paired
The loading file format is a tab-delimited file. Samples are numbered starting from 0, and users define one pair per line. For example, the first and second samples form one pair, the third and fourth samples form another pair, and so on.
ANOVA group or class loading file format
The loading file format is a text file. For a sample, if it belongs to group 1, users define it as 1
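The paired file format (0-based sample indices, one pair per tab-separated line) can be validated as it is read. This is a minimal sketch. Rejecting a sample that appears in more than one pair is an assumption about what a sensible paired design requires, not a documented MeV rule.

```python
def parse_pairs(text):
    """Read 'i<TAB>j' lines of 0-based sample indices into a list of pairs."""
    pairs, seen = [], set()
    for line in text.splitlines():
        if not line.strip():
            continue
        first, second = (int(tok) for tok in line.split("\t")[:2])
        if first == second or first in seen or second in seen:
            raise ValueError(f"sample reused in pair line: {line!r}")
        seen.update((first, second))
        pairs.append((first, second))
    return pairs
```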
261. r gene already loaded in MeV. One key should be specified for the currently loaded data. This key should be a gene identifier of some sort and should correspond to the annotation field selected as the file's primary key. The values of these annotation keys are used to map (correlate) gene annotation in the file to the loaded genes. The lowest section of the parameter dialog specifies the annotation fields to import from the file. Import status will be reported, and an entry logging the import will be added to the History node's history log.
[Figure 9.4.1: Gene Annotation Import Dialog (Append Gene Annotation; Annotation Key Selection: Gene Identifier from current data (UID) and Corresponding Gene Identifier from Input File; Note: the gene identifiers from the loaded data and the input annotation file should correspond to the same annotation type, since these identifiers are used to map annotation in the file to the proper genes loaded in MeV; Select Annotation Fields to Append: UID, Putative Role, Cluster, ComName)]
10 Creating Output
While creating output files is not the main purpose of MeV, several options exist for saving files from the Multiple Array Viewer for later use. For details regarding file output with the Single Array Viewer, please see section 14.10.
10.1 Saving the Analysis
Saving the state of an ongoing ana
262. r node on the result navigation tree will contain a list of stored clusters. Gene clusters and sample clusters are maintained in separate spreadsheets, which are viewable from the Cluster Manager node. When a cluster is stored to the repository, an input dialog is presented that allows three user-defined fields to be associated with the cluster. Two optional text fields capture a cluster name and a description of the algorithm or of interesting features of the cluster. The third user input is a color used to identify genes or experiments that are members of the cluster. These colors can be tracked while performing analyses so that clustering consensus can be established.
[Figure 8.1.1: Cluster Attributes Dialog (Store Cluster Attributes; Analysis Node: genes 2; Cluster Node; Cluster Label: "KMC up reg"; Remarks: "KMC with k=10, exp increases over experiments"; optional fields)]
The cluster tables contain the following columns:
Serial Number: a unique number that is sequentially assigned to easily identify a particular cluster.
Source: describes whether the cluster source was an algorithm, a cluster operation, or some other means of selecting a group of elements.
Algorithm Node: identifies the algorithm used if the source was an algorithm, and includes the navigation tree result index in parentheses
263. [Figure 11.6.2: K-Means / K-Medians Clustering Expression Graphs]
11.7 KMS (K-Means / K-Medians Support)
This module allows the user to run the K-Means or K-Medians algorithm multiple times using the same parameters in each run. Owing to the random initialization of K-Means and K-Medians, the clusters produced may vary substantially between runs, depending on the data set and the input parameters. The KMS module allows the user to generate clusters of genes that frequently group together in the same clusters ("consensus clusters") across multiple runs. The output consists of consensus clusters in which all the member genes clustered together in at least x% of the K-Means/Medians runs, where x is the threshold percentage input by the user (see screenshot below).
Parameters
Sample Selection: The sample selection option indicates whether to cluster genes or samples.
Means/Medians option: The Means or Medians option indicates whether each cluster's centroid vector should be calculated as a mean or a median of the member expression patterns.
Number of k-means/k-medians runs: This integer value indicates how many times KMC should be run.
Threshold of occurrence in same cluster: This parameter indicates the minimum percentage of times that two elements should cluster together in order to consider the two
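The consensus criterion (every pair of members clustered together in at least x% of runs) can be sketched from the per-run cluster labels. This is a minimal sketch, not the KMS implementation. The greedy grouping shown is one simple way to satisfy the all-pairs condition, and its result can depend on gene order.

```python
from itertools import combinations

def cooccurrence(runs):
    """Fraction of runs in which each pair of genes shares a cluster label."""
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / len(runs) for c in row] for row in counts]

def consensus_clusters(runs, threshold_pct):
    """Greedily group genes that co-cluster with every member >= threshold."""
    frac = cooccurrence(runs)
    t = threshold_pct / 100
    clusters = []
    for g in range(len(runs[0])):
        for cluster in clusters:
            if all(frac[g][m] >= t for m in cluster):
                cluster.append(g)
                break
        else:
            clusters.append([g])  # no compatible cluster: start a new one
    return clusters

runs = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0]]  # labels per run
```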
264. ral Information tab always contains a summary of the parameters used in the analysis. Each algorithm run presents a dialog or form for entering parameters specific to the algorithm being performed. An information button on each dialog (lower-left corner) can be used to retrieve a reference page describing the required parameters.
Customizing the module menubar
The Multiple Array Viewer window's File menu contains the option Customize Toolbar. Choosing this option opens a dialog in which users select the modules to be displayed in the Multiple Array Viewer. After data has been loaded into MeV, this function is disabled for safety reasons. All modules are still listed under the Analysis menu, grouped by submenus.
6.11.1 HCL (Hierarchical clustering; Eisen et al. 1998)
Selecting this analysis displays a dialog that offers three different linkages and options to cluster genes, samples, or both. Once the computations are complete, select the Tree node under Analysis to view the hierarchical tree. The display is similar to the main display, but similar genes and experiments are connected by a series of branches. Labels are displayed on the right side. Clicking under a branch intersection (node) selects that node and the subtree below it. Once a node is selected, right-clicking in the same area displays a popup menu that allows the user to set the highlighted area as a cluster, name the
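The linkage choice determines how the distance between two clusters is computed during agglomeration. The following is a minimal sketch of agglomerative clustering on a precomputed distance matrix with single, complete, or average linkage, the three common linkages. It is a brute-force illustration suitable only for small inputs, not MeV's HCL code.

```python
from itertools import combinations

def agglomerate(dist, linkage="average"):
    """Return merge steps (cluster_a, cluster_b, height).

    Leaves are numbered 0..n-1; each merge creates the next id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(dist))}

    def cluster_distance(a, b):
        pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
        if linkage == "single":
            return min(pairs)
        if linkage == "complete":
            return max(pairs)
        if linkage == "average":
            return sum(pairs) / len(pairs)
        raise ValueError(f"unknown linkage: {linkage}")

    merges, next_id = [], len(dist)
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: cluster_distance(*pair))
        height = cluster_distance(a, b)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges

dist = [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
```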
265. [Figure 4.10.1: Bioconductor Format file loader (file browser with the instructions "Click the upper-leftmost expression value" and "Click the Load button to finish")]
4.11 Affymetrix GCOS Format files
Affymetrix GCOS (GeneChip Operating Software) output pivot data files can be loaded by selecting the Affymetrix GCOS file loader option from the list of available file formats. This file loader can load three similar file formats, chosen with the data option radio buttons:
Only Intensity: contains signal intensity for every experiment.
Intensity with Detection: contains detection for every experiment.
Intensity with Detection and p-value: contains detection and p-value for every experiment.
266. rks. This option is provided to counteract the behavior of some popular spreadsheet programs, which automatically enclose in quotation marks any cell containing text with a delimiter such as a comma. If MeV stalls while loading an annotation file, try loading it with this option selected.
[Figure: Expression File Loader for TIGR MeV Expression Files (.mev), with the options Load Integrated Spot Intensities, Load Median Spot Intensities, Load Auxiliary Spot Information, and Use Annotation Contained in MeV File; available annotation fields: Clone_ID, GB, TC, Putative Role, Cluster, ComName]
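The quotation-mark problem described above can be handled by stripping one pair of enclosing quotes from each tab-separated field. This is a minimal sketch. Treating a doubled quote ("") inside a quoted field as a literal quote follows common spreadsheet export behavior and is an assumption, not a statement about MeV's loader.

```python
def strip_enclosing_quotes(field):
    """Remove one pair of enclosing double quotes and unescape doubled quotes."""
    if len(field) >= 2 and field[0] == field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field

def read_annotation_line(line):
    """Split a tab-delimited annotation line and clean each field."""
    return [strip_enclosing_quotes(f) for f in line.rstrip("\r\n").split("\t")]
```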
267. s
Terrain Navigation
The main visible controls below the terrain view relate to navigation through the terrain. The navigation panel consists of nine buttons that alter the point of view (pov) relative to the terrain. There are three major navigation modes for moving through the terrain viewer. Each mode is selected by clicking the center square of the navigation control.
[Figure 11.25.3: Terrain Map Navigational Controls]
Various Navigation Modes
Linear Axis Mode: This mode is indicated by the four-straight-arrows icon in the center of the navigation control panel. The arrow buttons move the point of view relative to the terrain in predictable straight-line movements. The up and down arrows move the pov higher or lower relative to the terrain. The upper left and right arrows move the pov directly toward the terrain, while the lower right and left arrows move the pov directly away from the terrain. The left and right arrows move the pov left and right relative to the terrain.
Attitude Mode: This mode controls the pitch angle and roll of the pov relative to the terrain. The corner arrow buttons make the terrain roll about the center point of the viewer. The left and right arrows force a banking motion, and the up and down arrows control the pitch (angle of view).
Rotational Mode: This mode forces rotation of the terrain about a central point in the terrain. The up
268. s 1, with K being incremented by 1 in each subsequent iteration up to the maximum number of clusters specified above.
[Figure: FOM Initialization Dialog]
FOM Iteration Selection: This field specifies the number of FOM iterations to run during the analysis. When K-Means mode is selected and the number of FOM iterations is greater than one, the mean FOM values are reported with their standard deviation on the output graph. This is useful for K-Means, where the initialization step involves an initial random partitioning. Each K-Means run is potentially unique, although each should be similar. When using this option, the result can be based on several runs.
[Figure: FOM output in the TIGR Multiple Array Viewer, showing the graph of FOM value vs. number of clusters (Method: KMC) and the table of Mean Adjusted FOM values (SD) vs. Number of Clusters:
Clusters  Mean adjusted FOM  SD
1         18.819             0.0
2         7.736              1.023
3         4.22               1.624
4         2.668              1.176
5         2.067              0.983
6         1.412              0.963
7         1.12               0.825
8         1.105              0.966
9         0.991              1.013
10        0.825              0.823
11        0.863              0.934]
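The mean and standard deviation reported for multiple FOM iterations can be computed per number of clusters as sketched below. This is a minimal sketch. Using the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form MeV reports.

```python
import statistics

def summarize_fom(fom_runs):
    """Map k -> (mean FOM, SD) from k -> list of FOM values, one per iteration."""
    summary = {}
    for k, values in sorted(fom_runs.items()):
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[k] = (statistics.mean(values), sd)
    return summary
```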
269. [Figure 11.23.3: COA 2D display]
11.24 PCA (Principal Components Analysis; Raychaudhuri et al. 2000)
PCA is used to attribute the overall variability in the data to a reduced set of variables termed principal components. Each principal component accounts for a certain fraction of the overall variability of the data, such that each successive component accounts for less of the variability than the previous one. This ranks the components in order of decreasing contribution to data variability. The first three principal components are used to map each element into a three-dimensional viewer.
Once the calculations are complete, select the PCA node under Analysis to view the PCA results. Under the node called Projections on PC Axes are the default plots of components 1, 2, and 3. Right-clicking on this node allows other components to be chosen for plotting. These new plots appear as new nodes under this node.
The 3D view is one of the primary PCA displays. The display can be rotated by left-dragging and shifted by right-dragging. Right-clicking on the 3D view node displays a popup menu that allows the user to change the 3D view's display options and to create a selection area (essentially a cube) to define a cluster. The 2D views display plots of any two components at a time. Dragging the mouse over
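The ranking of components by decreasing share of variability can be seen directly from a singular value decomposition of the centered data. The following is a minimal sketch that assumes NumPy is available, rows are genes, and columns are experiments, with each column centered. MeV's exact centering and scaling choices are not stated here and may differ.

```python
import numpy as np

def pca(data, n_components=3):
    """Project rows onto the leading principal components.

    Returns (scores, variance fractions) for the first n_components.
    """
    X = np.asarray(data, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    frac = var / var.sum()                  # singular values come sorted descending
    scores = Xc @ Vt[:n_components].T       # coordinates on PC1, PC2, ...
    return scores, frac[:n_components]

data = [[2.0, 0.0, 1.0], [0.0, 1.0, 3.0], [4.0, 2.0, 0.0], [1.0, 5.0, 2.0]]
```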
270. s, the average FOM for that number of clusters will be used to draw the FOM vs. number of clusters curve. If the Take Average box is unchecked, each FOM value will be represented in case of such a tie, and curves will be drawn through each value.
In the figure below, the value of the adjusted FOM for a K-Means run decreases steeply until the number of clusters reaches 4, after which it levels out. This suggests that for this data set K-Means performs optimally with 4 clusters and that any additional clusters will not add to the predictive value of the algorithm. FOM is therefore useful for determining the best input parameters for a clustering algorithm.
FOM Input Parameters
FOM Iteration Selection: This field specifies the number of FOM iterations to run during the analysis. When K-Means mode is selected and the number of FOM iterations is greater than one, the mean FOM values are reported with their standard deviation on the output graph. This is useful for K-Means, where the initialization step involves an initial random partitioning. Each K-Means run is potentially unique, although each should be similar. When using this option, the result can be based on several runs. A right-click in the graph provides the option to show or hide the individual lines representing FOM iterations.
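The following sketch shows one way to compute an adjusted FOM, based on our reading of the figure-of-merit approach (Yeung et al. 2001). For each condition e, cluster the genes using all other conditions. Then take the root-mean-square deviation of condition e's values from their cluster means, sum over all conditions, and divide by sqrt((n - k) / n). This is an assumption-laden illustration, not MeV's code. The clustering function is supplied by the caller, and the one in the test is a trivial stand-in.

```python
import math

def adjusted_fom(data, cluster_fn, k):
    """Aggregate FOM over left-out conditions, adjusted by sqrt((n - k) / n).

    data: rows are genes, columns are conditions.
    cluster_fn(rows, k) must return one cluster label per row.
    """
    n = len(data)
    if k >= n:
        raise ValueError("k must be smaller than the number of genes")
    total = 0.0
    for e in range(len(data[0])):
        reduced = [[v for j, v in enumerate(row) if j != e] for row in data]
        labels = cluster_fn(reduced, k)          # cluster without condition e
        groups = {}
        for row, label in zip(data, labels):
            groups.setdefault(label, []).append(row[e])
        ss = 0.0
        for members in groups.values():
            mu = sum(members) / len(members)
            ss += sum((x - mu) ** 2 for x in members)
        total += math.sqrt(ss / n)               # FOM for left-out condition e
    return total / math.sqrt((n - k) / n)
```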
271. s and can select to perform one or both phases of the algorithm. The Train and Classify option runs both phases of the algorithm: starting with a presumptive classification and expression data, the result is a final classification of each element. The Train only option produces a list of weights, which can be stored as an SVM file along with the training parameters so that they can be applied to data to classify at a later time. The Classify only option prompts the user for an SVM file of weights and parameters and results in a final classification. The user also has the option to produce hierarchical trees on the two resulting sets of elements.
The second dialog (Fig. 11.18.3) is used in either the Train and Classify mode or the Train Only mode. Its upper portion indicates whether the initial presumptive classification will be defined using the SVM Classification Editor or supplied as an SVC file.
[Figure 11.18.2: SVM Process Selection Dialog (Classification Selection: Classify Genes / Classify Samples; SVM Process Selection: Train SVM then Classify / Train SVM, skip classify / Classify using existing SVM file; One-out Iterative Validation; Construct Hierarchical Trees)]
Process Initialization Parameter Information
Sample Selection: The sample selection option indicates wh
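The split between training and classification can be illustrated with a linear SVM whose learned weights are stored and later reloaded. This is a minimal sketch. It trains by full-batch subgradient descent on the regularized hinge loss, which is a swapped-in technique rather than MeV's SVM training procedure. JSON stands in for the SVM file format, which is not reproduced here.

```python
import json

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss; y in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this example
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return {"weights": w, "bias": b, "lam": lam, "lr": lr, "epochs": epochs}

def classify(model, rows):
    """Assign +1 or -1 by the sign of the linear decision function."""
    w, b = model["weights"], model["bias"]
    return [1 if sum(wj * xj for wj, xj in zip(w, r)) + b >= 0 else -1 for r in rows]

def save_model(model):   # "Train only": persist weights and parameters
    return json.dumps(model)

def load_model(text):    # "Classify only": reuse previously stored weights
    return json.loads(text)
```

Storing the training parameters alongside the weights mirrors the manual's description of an SVM file, which keeps both so that later classification is reproducible.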
272. s dialog, one can click the Preview button to view the alterations to the LEM. If Cancel is clicked, the dialog box is dismissed and the original settings are restored to the LEM. The Reset button does not dismiss the dialog but returns the values to those of the LEM at the time the dialog was launched.
The last option on the Customization dialog expands the viewer to show an arrow for each spot related to a locus. If this is selected, the arrow that represents the mean expression for a locus is displayed as usual, and in addition, arrows are displayed for each spot related to the locus. This shows the expression of the spots that contribute to the locus mean. The additional arrows for locus spots are displayed to the right of the locus mean arrow and are shortened slightly to help differentiate them. The background of alternate samples is shaded lightly to help distinguish sample boundaries. When this option is selected, the arrow lengths and open areas are fixed to simplify the view, so that spot-level arrows are not confused with short locus arrows.
273. s which can be used to identify genes. Generally it's best to use an index or accession uniquely identifying the spotted material.
Annotation Conversion File: This optional file provides the mapping from your annotation key (above) to the index used to map to biological themes (GO terms, KEGG pathways, etc.). If your annotation key type is the one used in the linking files below, this conversion mapping is not needed. These files, if needed, are typically stored in the Convert directory.
Gene Annotation/Gene Ontology Linking Files: This section allows one to specify one or more annotation files. These files contain gene indices paired with biological themes such as GO terms. These files typically reside in the Class directory.
[Figure: EASE Annotation Analysis dialog (File Updates and Configuration: Select EASE File System, Update EASE File System; Mode Selection: Cluster Analysis / Annotation Survey; Annotation Parameters tab: MeV Annotation Key = Unique ID, Annotation Key = GenBank, use annotation converter with the conversion file Sample GB to LocusLink.txt; Gene Annotation/Gene Ontology Linking Files: GO Biological Process.txt, GO Cellular Component.txt, GO Molecular Function.txt, KEGG pathway.txt; Statistical Parameters tab)]
274. selection option indicates whether to classify genes or experiments.
Assessment Algorithm Selection: The assessment algorithms (A0, A1 & A2) contain essentially the same fundamental stages, i.e., Gene Selection, Dimension Reduction (PLS), and Classification/Prediction based on Leave-One-Out Cross-Validation (LOOCV), but they are ordered in a different sequence.
A0 algorithm:
1. Select Genes: select a set S of p genes, giving an expression matrix X of size N × p.
2. Dimension Reduction: fit PLS to obtain the PLS gene components matrix T of size N × (number of PLS components).
3. Classification/Prediction: classification is based on
   FOR i = 1 to N DO
     Leave out sample (row) i of T.
     Fit the classifier to the remaining N - 1 samples.
     Use the fitted classifier to predict left-out sample i.
   END
NOTE: For a given expression matrix X, steps 1 (gene selection) and 2 (dimension reduction) are fixed with respect to the left-out sample. Thus the effect of gene selection and dimension reduction on the classification cannot be assessed.
[Figure 11.20.2: DAM Parameters Information Page]
The DAM Classification Editor
When the Next button is clicked in the initialization dialog, a DAM Classification Editor screen appears. This screen allows the user to identify which samples are known examples of a class and to which class they should be assigned. Samples that are left as neutral are assumed to be of unknown classification and will be partitioned based on th
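Step 3 of the A0 algorithm is a plain leave-one-out loop over the rows of T. This is a minimal sketch that assumes T has already been computed, since the PLS reduction is not shown. A nearest-centroid classifier stands in for DAM's classifier, which is a substitution made only for illustration.

```python
import math

def nearest_centroid_fit(rows, labels):
    """Per-class mean vector (centroid)."""
    sums, counts = {}, {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def nearest_centroid_predict(centroids, row):
    return min(centroids, key=lambda lab: math.dist(centroids[lab], row))

def loocv_predictions(T, labels):
    """FOR i = 1..N: leave out row i, fit on the other N - 1 rows, predict row i."""
    preds = []
    for i in range(len(T)):
        train = [r for j, r in enumerate(T) if j != i]
        train_labels = [lab for j, lab in enumerate(labels) if j != i]
        model = nearest_centroid_fit(train, train_labels)
        preds.append(nearest_centroid_predict(model, T[i]))
    return preds
```

Because T is computed once from all N samples before the loop, gene selection and dimension reduction never see a left-out sample excluded, which is exactly the limitation the NOTE describes.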
275. [Figure 4.11.1: Affymetrix GCOS Format file loader (file browser with the instructions "Click the upper-leftmost expression value" and "Click the Load button to finish")]
4.12 Loading dChip Output files
Selecting dChip/DFCI core Format Files from the drop-down menu allows the loading of dChip or DFCI core output files. Select the data files to be loaded with the file browser, as you would when loading .mev files. These files contain a single intensity value per spot instead of the usual two that MeV requires. The values loaded from these files are used as the Cy5 value, that is, the numerator in the calculation of the ratio of intensities. There are therefore several options for simulating a second intensity value (the denominator); select a method from the radio button options. If Absolute is selected, the denominator is given a value of 1 for all ratio calculations. If Mean Intensity is selected, the average of all intensity values for that
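The two denominator options can be sketched as follows, with each loaded intensity serving as the numerator (Cy5). The text above is cut off before it says which intensities Mean Intensity averages, so this minimal sketch lets the caller pass them as `reference`. Defaulting to the same list is an assumption.

```python
def simulated_ratios(intensities, mode="absolute", reference=None):
    """Ratios for single-intensity data; each loaded value is the numerator.

    mode="absolute": denominator is 1 for every ratio.
    mode="mean": denominator is the mean of `reference` (defaults to
    `intensities`; which values MeV averages is an assumption here).
    """
    if mode == "absolute":
        denominator = 1.0
    elif mode == "mean":
        values = intensities if reference is None else reference
        denominator = sum(values) / len(values)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [x / denominator for x in intensities]
```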
276. ...classification is to build a classifier using the given training set, and the second step is to use the classifier to predict the classes of the test set. In the training phase, the USC algorithm performs cross-validation over a range of parameters: the shrinkage threshold Δ (Delta) and the correlation threshold ρ (Rho). Cross-validation is a well-established technique used to optimize the parameters or features chosen in a classifier. In m-fold cross-validation, the training set is randomly divided into m disjoint subsets of roughly equal size. Each of these m subsets of experiments is left out in turn for evaluation, and the other m-1 subsets are used as inputs to the classification algorithm. Because the USC algorithm is run many times on different subsets of the training set, the cross-validation step in the training phase is quite computationally intensive. The end result of the training phase is a table giving, for each combination of Delta (Δ) and Rho (ρ), the average number of classification errors in cross-validation and the average number of genes selected. Depending on the dataset being analyzed, there might be a trade-off between the average number of errors and the number of genes selected. The user will be asked to select one set of parameters, Delta (Δ) and Rho (ρ), to be used in the test phase, in which microarray data consisting of experimental samples with unknown classes will be classified.
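The m-fold procedure over a (Δ, ρ) grid can be sketched in Python. `count_errors` is a hypothetical caller-supplied function that trains on one index set and returns the number of misclassified samples in the other. The real USC training step is not reproduced. Averaging over the m folds is an assumption, and this sketch tracks only errors, not the number of genes selected.

```python
import random


def make_folds(n_samples, m, seed=0):
    """Randomly split sample indices into m disjoint folds of roughly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::m] for k in range(m)]


def cv_error_table(n_samples, m, deltas, rhos, count_errors, seed=0):
    """Average cross-validation errors for every (delta, rho) pair.

    count_errors(train_idx, test_idx, delta, rho) is hypothetical: it should
    train on train_idx and return how many samples in test_idx are misclassified.
    """
    folds = make_folds(n_samples, m, seed)
    table = {}
    for delta in deltas:
        for rho in rhos:
            errors = 0
            for k, test_idx in enumerate(folds):
                # every fold except fold k is used for training
                train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
                errors += count_errors(train_idx, test_idx, delta, rho)
            table[(delta, rho)] = errors / m
    return table
```

Each of the len(deltas) x len(rhos) cells needs m training runs. This makes the computational cost described above easy to see.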
277. ...assigned to classes in this step will be treated as the training set. The Edit and Tools menus at the top of the classification editor allow searching, sorting and selection of data. The classification scheme can be saved in a text file if it needs to be loaded in a future KNNC run, as explained above. On clicking the Next button in the classification editor, the calculations proceed, and the output has the usual viewers found in most other MeV runs. The main difference is that each type of viewer has several sub-viewers: Used Classifiers, Unused Classifiers (those possibly weeded out by variance filtering), Classified, Used Classifiers and Classified, Unclassified, and All. This lets users visually compare training vectors to trained vectors. The Validate option in KNNC provides dialogs very much like the above and is used to perform leave-one-out cross-validation on the training set. The output viewers are also similar to the ones obtained in classification, with an additional viewer that provides cross-validation statistics.
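The Validate option's leave-one-out procedure can be illustrated with a small k-nearest-neighbor sketch. Euclidean distance and simple majority voting are assumptions. This is not presented as KNNC's exact distance or tie-breaking rule.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training vectors closest to x (Euclidean, assumed)."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def knn_loocv_accuracy(X, y, k):
    """Leave each training vector out in turn and classify it from the rest."""
    correct = 0
    for i in range(len(X)):
        prediction = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        correct += prediction == y[i]
    return correct / len(X)
```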
278. [Figure 11.21.6: LEM with Offset Overlapping Loci, Showing Highlighted Locus and Annotation] When in the scaled locus mode, the arrows have an anchor point to aid in locus selection. If you mouse over the anchor point, the locus will be highlighted and the anchor point will change from white to red. A Shift + left mouse click will select the locus. Another special feature of this mode is that locus annotations that overlap because the loci overlap are rendered with the highlighted locus's annotation on top. The unsampled areas of the chromosome have a separate set of scaling controls. The base-to-pixel conversion factor is used if these areas are scaled. If these areas are scaled, a maximum length can be selected so that large areas without sampled loci are not drawn extremely long. This prevents long stretches of the display with no expression values. Note that during interaction with thi
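The scaling rule for unsampled areas can be sketched as a single function. A gap is converted from bases to pixels with the base-to-pixel factor and then optionally capped at the selected maximum length. The parameter names are hypothetical.

```python
def gap_pixel_length(gap_bases, bases_per_pixel, max_pixels=None):
    """Drawn width of an unsampled region: bases / bases-per-pixel, optionally capped."""
    if bases_per_pixel <= 0:
        raise ValueError("bases_per_pixel must be positive")
    pixels = gap_bases / bases_per_pixel
    if max_pixels is not None:
        # cap long gaps so they do not dominate the display
        pixels = min(pixels, max_pixels)
    return pixels
```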
279. 11.12.3 FOM Initialization Dialog (K-Means/K-Medians). KMC Parameters: Means/Medians Option: The Means or Medians option indicates whether each cluster's centroid vector should be calculated as the mean or as the median of the member expression patterns. Maximum Number of Clusters: This positive integer value indicates the maximum number of clusters to be created. For instance, if the entered value is 10, then KMC is run 10 times to produce 1, 2, 3, ..., 10 clusters. An FOM value is returned for each run. Maximum Number of Iterations: This positive integer value is the maximum number of times that all the elements in the data set will be tested for cluster fit within a single KMC run. On each iteration, each element is associated with the cluster with the closest mean or median. Note that a KMC run will terminate either when no elements require migration (reassignment to new clusters) or when the maximum number of iterations has been reached. 11.13 PTM (Template Matching) (Pavlidis and Noble, 2001): The user can specify a template expression profile for a gene (a series of relative expression ratios between 0 and 1), and the data set will be searched for matches to the template based on the Pearson correlation between the template and the genes in the data set. The template profile can be specified in one of several ways: 1) by selecting one of the genes in the data set as a template from the list on the upper le
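Template matching as described for PTM reduces to computing the Pearson correlation between the template and each gene, then keeping the genes above a cutoff. The sketch below uses a plain correlation threshold as an assumed selection rule. It also treats flat profiles (undefined correlation) as non-matches, which is another assumption.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # flat profile: correlation undefined, treated as no match
    return cov / (norm_a * norm_b)


def match_template(template, genes, threshold):
    """Return {gene: r} for genes whose correlation with the template is >= threshold."""
    matches = {}
    for name, profile in genes.items():
        r = pearson(template, profile)
        if r >= threshold:
            matches[name] = r
    return matches
```

Because Pearson correlation ignores scale and offset, a template of ratios between 0 and 1 still matches a gene measured on any scale, as long as the shape agrees.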
280. [Figure 9.1.2: Search Result Window, with Select All, Clear, Store Cluster and Update Shortcuts buttons, and shortcut lists for Expression Viewers (e.g. CAST genes, Clusters 1 to 3) and Table Viewers] 9.2 Import Gene or Sample List: The Import Gene/Sample List feature allows you to create a cluster from a supplied list of identifiers. Paste the identifiers belonging to the cluster into the text area. The drop-down list indicates the type of annotation being loaded. After the search for matches, the List Import Result dialog is displayed. This intermediate dialog reports the number of input identifiers that were matched and the number of matches found (this can exceed the number of identifiers when genes are duplicated on the array). The dialog also displays a table that contains the matching elements. Rows in the table can be selected to remove unwanted entries before clicking the Store Cluster button to store the items as a cluster. The bottom section of the dialog also reports which indices were found and which were not found in the loaded data set. [Figure 9.2.1: List Import Dialog (Gene ID Type: GenBank; Paste List (Ctrl-V); Reset, Cancel, OK)] [Figure: List Import Result dialog (Genes Matched: 9 of 10 input IDs were matched; List length: 9; columns Selected, File Index, Color, YORF, NAME, GWEIGHT, ...)]
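The counts reported by the List Import Result dialog can be mirrored in a short sketch. It shows why the number of matching rows can exceed the number of matched identifiers: one identifier can hit several duplicated spots. The data layout (one identifier per loaded row) is an assumption.

```python
def import_gene_list(input_ids, row_ids):
    """Match pasted identifiers against the identifier of every loaded row.

    Returns (matching row indices, identifiers found, identifiers not found).
    A duplicated gene on the array yields several matching rows for one identifier.
    """
    wanted = set(input_ids)
    matching_rows = [row for row, ident in enumerate(row_ids) if ident in wanted]
    present = set(row_ids)
    found = [ident for ident in input_ids if ident in present]
    not_found = [ident for ident in input_ids if ident not in present]
    return matching_rows, found, not_found
```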
281. ...Distance Metric: Pearson. This module works very poorly with the average dot product and covariance distance metrics. 11.10 SOM (Self-Organizing Maps) (Tamayo et al., 1999; Kohonen, 1982): Selecting this analysis will display a dialog that allows the user to set up the size, topology and behavior of the SOM. Once the computations are complete, select the SOM node under Analysis to view the SOM results. The subnodes under this node are very similar in form and function to those found beneath the KMC node. [Figure: K-Means/K-Medians Support Initialization Dialog Box] [Figure 11.10.1: SOM Expression Image. The SOM (genes) result node contains Clusters 1 to 9 for map nodes (1,1) to (3,3), plus Centroid Graphs, Expression Graphs, SOM Visualization, Cluster Information and General Information.]
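To show what "size, topology and behavior" control, here is a minimal Kohonen-style SOM in Python. The choices it makes are assumptions and may differ from MeV's dialog options. It uses a rectangular grid, a "bubble" neighborhood (every node within the current radius moves), a linearly decaying learning rate and radius, and initialization from randomly chosen data vectors.

```python
import random
from math import dist


def train_som(data, rows, cols, iterations=200, alpha0=0.5, seed=0):
    """Train a rows x cols rectangular SOM on data (a list of equal-length vectors)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize each node's weight vector from a randomly chosen data vector.
    weights = [list(v) for v in rng.sample(data, len(grid))]
    radius0 = max(rows, cols) / 2
    for t in range(iterations):
        decay = 1 - t / iterations              # linear decay of rate and radius
        alpha, radius = alpha0 * decay, radius0 * decay
        x = rng.choice(data)
        bmu = min(range(len(grid)), key=lambda k: dist(weights[k], x))  # best-matching unit
        for k, node in enumerate(grid):
            if dist(node, grid[bmu]) <= radius:  # bubble neighborhood on the grid
                weights[k] = [w + alpha * (xi - w) for w, xi in zip(weights[k], x)]
    return weights


def assign_to_nodes(data, weights):
    """Cluster membership: index of the closest node for each vector."""
    return [min(range(len(weights)), key=lambda k: dist(weights[k], x)) for x in data]
```

Each map node becomes one cluster, which is why a 3 x 3 map yields the nine clusters listed in the result tree above.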
282. ...Print Image instead. 10.4 Saving Cluster Data: You can also save cluster data to a tab-delimited text file. Selecting this option from the right-click menu will open a file chooser. Select a file name and a location to save the row/column data, the log ratio expression values and, optionally, the Cy3 and Cy5 values for each gene in the cluster (or each sample, if the cluster was produced by clustering samples). Selecting Save All Clusters saves the elements of all clusters in a similar way. 11 Modules: Description, Conventions and General Pointers. Each of these modules can be launched from the Analysis menu or from a button on the Multiple Array Viewer toolbar. In all clustering modules, the selected algorithm can be run to cluster either genes or samples. In the titles of the analysis module descriptions that follow, the acronym used to label the module's button is followed by the name of that module. Below this is a line containing the reference(s) used in implementing the algorithm. Each module, when run, creates a subtree labeled with its acronym and with a label indicating whether the result was created by clustering genes or samples. This subtree is placed under the Analysis tab in the result navigation tree. The tabs within this subtree contain the results of the module's calculations. These tabs vary greatly depending on the module that creates them, but the Gene
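The content of a saved cluster file can be sketched with Python's `csv` module. The output contains row/column coordinates, the log ratios and, optionally, interleaved Cy3/Cy5 values. The column headers and the per-gene dictionary layout are hypothetical and do not reproduce MeV's exact file header.

```python
import csv
import io


def write_cluster(out, genes, sample_names, include_intensities=False):
    """Write one cluster as tab-delimited text: row/column, log ratios, optional Cy3/Cy5."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = ["Row", "Column"] + [f"{name} log ratio" for name in sample_names]
    if include_intensities:
        header += [f"{name} {channel}" for name in sample_names for channel in ("Cy3", "Cy5")]
    writer.writerow(header)
    for gene in genes:
        row = [gene["row"], gene["column"], *gene["log_ratios"]]
        if include_intensities:
            # interleave per-sample intensities: Cy3, Cy5, Cy3, Cy5, ...
            row += [value for pair in zip(gene["cy3"], gene["cy5"]) for value in pair]
        writer.writerow(row)


def cluster_to_string(genes, sample_names, include_intensities=False):
    """Convenience wrapper that returns the tab-delimited text as a string."""
    buffer = io.StringIO()
    write_cluster(buffer, genes, sample_names, include_intensities)
    return buffer.getvalue()
```

For a real file, open it with `open(path, "w", newline="")`, as the `csv` documentation recommends, and pass the file object as `out`.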
283. [Figure 11.4.1: SOTA Initialization Dialog Box] Parameters and Basic Terminology. SOTA Terminology and Concepts: Topology: The resulting tree is a binary tree in which each terminal node represents a cluster. Node: A structure that contains a Centroid Vector and a number of associated expression profiles (members). Cell: A Node that is the terminal Node in a branch of the tree, also known as a leaf node. The members of the cell are considered members of an expression cluster. Centroid Vector: A vector that is representative of the membership of a node. Members: Expression elements associated with a Node. Growth Termination Criteria Parameters: Max Cycles: This integer value represents the maximum number of iterations allowed. The resulting number of clusters produced by SOTA is Max Cycles + 1, unless other criteria are satisfied before the indicated maximum number of cycles is reached. Max Epochs/Cycle: This integer value indicates the maximum number of training epochs allowed per cycle. Max Cell Diversity: This value represents a maximum variability allow
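The SOTA terminology maps onto a small data structure. In the sketch below, a Node holds a centroid and its members, Cells are the leaves, and growth stops at Max Cycles or earlier. The earlier stop rule (every cell's diversity at or below Max Cell Diversity) and the diversity measure (mean Euclidean distance of members to the centroid) are assumptions, because the manual's definition is cut off. No SOTA training epochs are implemented here.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Node:
    """A SOTA node: a centroid vector plus its member expression profiles."""
    centroid: list
    members: list = field(default_factory=list)
    children: list = field(default_factory=list)


def cells(node):
    """Cells are the terminal (leaf) nodes; each cell is one expression cluster."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in cells(child)]


def diversity(cell):
    """Mean Euclidean distance of the members to the cell centroid (assumed measure)."""
    if not cell.members:
        return 0.0
    return sum(dist(m, cell.centroid) for m in cell.members) / len(cell.members)


def should_stop(cycle, max_cycles, root, max_cell_diversity):
    """Stop at Max Cycles, or earlier once every cell is homogeneous enough (assumed rule)."""
    if cycle >= max_cycles:
        return True
    return all(diversity(c) <= max_cell_diversity for c in cells(root))
```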
284. ...options. The Delete Algorithm option deletes the algorithm, the associated output data nodes, and any downstream algorithms that rely on the output of the deleted algorithm. The View XML Section option is a shortcut to the Script XML Viewer. When selected, this option opens the XML viewer and highlights the algorithm section associated with the node selected in the Script Tree Viewer. One note on saving a script: scripts should be saved to the script directory inside MeV's Data directory. Saving the script there ensures that, when the script is loaded, the files supporting script validation are found and used to validate the script's integrity. Script XML Viewer: The Script XML Viewer is a text rendering of the script as it would appear when saved to an output file. The main purpose of the viewer is to show the script during script creation and to let you review the parameter values selected for particular algorithms. When the XML viewer is opened via the Script Tree Viewer, the selected algorithm is highlighted in light green in the XML viewer, as shown in the Script XML Viewer figure. Script lines are selected by clicking on the row number displayed on the left side of the viewer. If the selected line corresponds to a parameter key-value pair, the Edit menu option is enabled so that the value can be altered.
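The Delete Algorithm behavior is a cascading delete in the script's dependency graph. Removing one algorithm also removes everything that consumes its output, directly or indirectly. The sketch below models the script as a hypothetical dictionary of algorithm ids, not MeV's script XML.

```python
def delete_algorithm(dependencies, target):
    """Remove target and every algorithm that (directly or indirectly) consumes its output.

    dependencies maps an algorithm id to the ids whose output it uses as input.
    Returns (remaining dependencies, set of removed ids).
    """
    removed = {target}
    changed = True
    while changed:  # repeat until no further downstream algorithm is found
        changed = False
        for algorithm, inputs in dependencies.items():
            if algorithm not in removed and any(source in removed for source in inputs):
                removed.add(algorithm)
                changed = True
    remaining = {a: list(i) for a, i in dependencies.items() if a not in removed}
    return remaining, removed
```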
285. ...t score of that assignment. The parameters are also displayed, as well as the list of genes that were used for this classification. You can save that gene list if desired. [Figure 11.29.5: USC Parameters Dialog] A number of heat map visualizations are also available. Clicking on Loaded Experiments, Genes Used displays all the experiments that were loaded and the genes that were used during this classification. [Figure 11.29.6: USC Heatmap] There is also a heat map visualization for each of the classes in the analysis, again with the genes that w
286. ...t should contain examples of each class. If the Classify option is chosen in the initial dialog, an input dialog box is displayed for parameter input. [Figure 11.19.1: KNN Classification Initialization Dialog. Options: Classify genes or samples (Classify genes / Classify samples); Variance filter (Use variance filter, if unchecked use all genes; Use only the following number of highest-variance genes); Correlation filter (Use correlation filter; Cutoff p-value for correlation; Number of permutations for correlation test); KNN classification parameters (Number of classes; Number of neighbors); Create/import training set (Create new training set from data; Use previously created training set from file); Hierarchical Clustering (Construct Hierarchical Trees); buttons Reset, Cancel, Next.] KNN Classification Parameter Information: Classify genes or samples: This is self-explanatory. Although the following description refers to genes, the same steps apply to experiments if Classify samples is chosen. Variance filter: This is the first of two noise-reduction filters that can optionally be applied before classification. The variance filter keeps only those genes in the entire data set (including the classifier set) that have the highest variance across all samples. The number of genes to be retained is specified by the user
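The variance filter can be sketched in a few lines: rank every gene by its variance across all samples and keep the top n. Population variance is used here. Sample variance would produce the same ranking.

```python
from statistics import pvariance


def variance_filter(expression, n_keep):
    """Keep the n_keep genes with the highest variance across all samples.

    expression maps a gene id to its values in every sample, including the
    classifier (training) samples.
    """
    ranked = sorted(expression, key=lambda gene: pvariance(expression[gene]), reverse=True)
    return ranked[:n_keep]
```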
287. ...annotation identifying each element, the raw and scaled (0-1) distances, and parameters related to the distance calculation, such as the distance metric, are reported. From this report page, a graph can be displayed using the Expression Graph button. The graph overlays the expression graphs of the two genes. [Figure 11.22.5: GDM Element Information Record (Gene Distance Spot Information for AA488715 and AA448711, with Distance Information)] [Figure 11.22.6: GDM Element Information Expression Graph (Sample vs. Log Ratio; red line: row gene, blue line: column gene)] 11.23 COA (Correspondence Analysis) (Fellenberg et al., 2001; Culhane et al., 2002): Correspondence analysis is an exploratory method to study associations between variables. Like principal components analysis, it displays a low-dimensional projection of the data. However, in this case both genes and samples can be projected onto the same space, revealing associations between them. Correspondence analysis requires an expression matrix with no missing values, so any missing values have to be imputed first. We use the k-nearest neighbors algorithm to impute missing values. The only user input in the initialization dialog is the desired number of neighbors for imputation. [Dialog: Number of neighbors for KNN imputation: 10]
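KNN imputation can be sketched as follows: each missing value is replaced by the average of that column over the k most similar genes. The distance is the root-mean-square difference over columns observed in both genes, and only genes that have values in all of the target's missing columns are eligible. Both choices and the unweighted average are assumptions, not necessarily what MeV's COA module does.

```python
from math import sqrt


def knn_impute(matrix, k):
    """Fill missing values (None) with the mean of the k nearest genes (rows)."""
    result = [row[:] for row in matrix]  # leave the input untouched
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        candidates = []
        for g, other in enumerate(matrix):
            # a neighbor must have values wherever this gene is missing
            if g == i or any(other[j] is None for j in missing):
                continue
            shared = [j for j, v in enumerate(row) if v is not None and other[j] is not None]
            if shared:
                d = sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
                candidates.append((d, g))
        neighbors = [g for _, g in sorted(candidates)[:k]]
        if neighbors:
            for j in missing:
                result[i][j] = sum(matrix[g][j] for g in neighbors) / len(neighbors)
    return result
```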
288. ... Below is a list of Preferences file parameters. Input Preference: Sets the method by which data is entered into MultiExperimentViewer. Database: connects to a database and loads slides from there. File: loads slides from flat files by default, but can also connect to a database. Only File: loads slides from flat files only, with no database connection. Database Server Name: Sets the path to the database. This is not necessary if Input Preference is set to Only File. For Sybase, use the following string: jdbc:sybase:Tds:<yourhost>:<yourport>. For Oracle, use the following string: jdbc:oracle:thin:@<yourhost>:<yourport>. Database Names: Sets the list of databases to choose arrays from. All stored procedures must be accessible from all of the available databases on this list. The database names should be separated by a colon. This is not necessary if Input Preference is set to Only File. Element Info: Sets the number of row and column pairs and the number of intensities per element, separated by a colon. For example, 3:2 would indicate that each element has three pairs of row and column values (such as row/column, meta_row/meta_column and sub_row/sub_column) and two intensities (such as Cy3 and Cy5). The input file must have the row and column pairs listed at the beginning of the line, followed by the intensities and then by any additional fields such as common name or Genban
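The Element Info rule fixes the field order of each input line. First come the row/column pairs, then the intensities, then any extra fields. The sketch below applies a setting such as 3:2 to one line. The tab delimiter is an assumption.

```python
def parse_element_info(spec):
    """'3:2' -> (3 row/column pairs, 2 intensities)."""
    pairs, intensities = (int(part) for part in spec.split(":"))
    return pairs, intensities


def split_element_line(line, spec, delimiter="\t"):
    """Split one input line into coordinate pairs, intensities and extra fields."""
    n_pairs, n_intensities = parse_element_info(spec)
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < 2 * n_pairs + n_intensities:
        raise ValueError("line has fewer fields than Element Info requires")
    coords = [(int(fields[2 * k]), int(fields[2 * k + 1])) for k in range(n_pairs)]
    start = 2 * n_pairs
    intensities = [float(v) for v in fields[start:start + n_intensities]]
    extras = fields[start + n_intensities:]  # e.g. common name, GenBank accession
    return coords, intensities, extras
```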
289. [Figure: BRIDGE Initialization Dialog. Fields: Rserve Connection (localhost:6311), Enter new location, Add; Burn In Period; Iterations; PostP Threshold. Callouts: "Now would be a good time to start Rserve" and "Click the info button (lower left) for help."] Rserve Connection: BRIDGE is a package written in the R programming language and requires access to a computer running Rserve to function. See Section 19 for details on installing R and Rserve. By default, BRIDGE will look for an Rserve server on the local machine. However, since Rserve is a TCP/IP server, it could in principle be running anywhere. The user need only enter an IP address and port number, separated by a colon (:), in the Enter new location text field. Clicking Add adds the new location to the pull-down menu. The location is saved to the user's config file and is available for later use. Sample output from this module is shown below. [Figure: BRIDGE results in the Multiple Array Viewer. The BRIDGE result node contains Expression Images, Expression Graphs and Table Views, each split into Significant Genes and Non-significant Genes.]
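A location string such as `localhost:6311` (6311 is Rserve's default port) can be validated, and the server probed, with the standard library. This is only an illustrative sketch. `rserve_reachable` checks that something accepts TCP connections on that port but does not confirm that it is Rserve, and IPv6 literals are not handled.

```python
import socket


def parse_rserve_location(text):
    """Split 'host:port' (as typed into the Enter new location field) into its parts."""
    host, sep, port = text.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {text!r}")
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port out of range: {port_number}")
    return host, port_number


def rserve_reachable(host, port, timeout=2.0):
    """True if something accepts TCP connections at host:port (not necessarily Rserve)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```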
290. ...indicates the type of annotation being loaded. An intermediate dialog (Import Result Dialog) will appear to display the results of the import and to allow you to select a subset of the identified elements before saving them as a cluster. After the identified elements have been reviewed, a cluster attributes dialog is presented so that a cluster name, description and color can be defined for the new cluster. [Figure 8.1.3: List Import Dialog (Gene ID Type: YORF; Paste List (Ctrl-V); Reset, Cancel, OK)] [Figure 8.1.4: Import Result Dialog (Genes Matched: 9 of 10 input IDs were matched; List length: 9; IDs Found: 9 of 10; IDs Not Found: 1 of 10; columns Selected, File Index, Color, YORF, NAME, GWEIGHT, GenBank; Select All Matching Results; Cancel, Store Cluster)] The Submit Gene List (External Repository) option permits gene lists to be submitted to external database repositories. The data, including the gene identifier types, possibly the organism under study, and other factors, can determine whether the current data is suitable for submission to a repository. To submit a cluster, first select the cluster to submit from the cluster table by left-clicking on the appropriate row. Modify the cluster
291. text file that contains coordinate and expression data for a single microarray experiment. A single header row is required to precede the expression data in order to identify the columns below. Apart from optional comment lines, each remaining row of the file stores data for a particular spot (feature) on the array. MeV and other TM4 software tools treat comment lines as non-computational. A comment line must start with the pound symbol (#) and can be included anywhere in the file. If the pound symbol is the first character on a line, the entire line, up to the newline character (\n), will be ignored by the software tool. .mev files will typically contain at least one comment at the top of the file with the following information. This information is optional, and the format and fields contained within these comments are subject to change. version: version number based on revisions of the expression data; format_version: the version of the .mev file format document; date: date of file creation or update; analyst: owner of the file, or the person responsible for creating it; analysis_id: id from the analysis table that corresponds to this set of expression values; slide_type: slide type that this array is based on; input_row_count: number of rows of expression (i.e., non-header) data in the input files; output_row_count: number of rows of expression (i.e., non-header) data in this file; created_by: software tool used to create the file; description: Commo
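The comment and header rules above can be sketched as a small reader. Lines beginning with `#` are skipped wherever they occur, the first remaining line is the header, and every later line becomes one spot record. The tab delimiter is assumed, and the column names in the example are illustrative only.

```python
def read_mev(lines):
    """Return (header fields, list of row dicts) from the lines of a .mev-style file."""
    header = None
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comment lines (anywhere in the file) and blank lines are ignored
        fields = line.rstrip("\n").split("\t")
        if header is None:
            header = fields  # the single header row precedes the data
        else:
            rows.append(dict(zip(header, fields)))
    return header, rows
```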
292. the 2D view will create a selection area that can be used to define a cluster. Cluster options and other features are available by right-clicking on the 2D view node in the navigation tree in the left pane. The PC plots, PC information and Eigenvalues nodes detail the calculations behind the construction of the display. Often some meaning, such as overall expression level, expression trends or some other aspect of the data set, can be found to correlate with the principal components. Using the PC plots, and noting where clusters of elements showing various trends (labeled by other algorithms) fall in the 3D viewer, can help to assign a tentative meaning to each component. Note that interpretation of the components is not exact and is somewhat subjective. [Figure: PCA result tree (PCA genes: Projections on PC axes with Components 1, 2, 3, 3D views and 2D views; PC Plots; PC Information; Eigenvalue Plot; Eigenvalues; General Information), with a COA result node (Projections, Inertia values) also visible.]
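The eigenvalues and projections behind the PCA views can be illustrated for the first component with power iteration on the covariance matrix. This is a swapped-in technique chosen for brevity, not necessarily how MeV computes its components. The covariance divisor n and the all-ones starting vector are further assumptions.

```python
from math import sqrt


def first_principal_component(data, iterations=100):
    """Return (unit eigenvector, eigenvalue, projection scores) for the first PC."""
    n, d = len(data), len(data[0])
    means = [sum(column) / n for column in zip(*data)]
    centered = [[v - m for v, m in zip(row, means)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    v = [1 / sqrt(d)] * d  # starting vector (assumed not orthogonal to the first PC)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
    return v, eigenvalue, scores
```

The scores are the coordinates that would be plotted along the first PC axis, and the eigenvalue is the variance captured by that axis.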
293. the Save item from the File menu of the algorithm results viewer. Algorithms on Data Regions: The RegionAmplifications and RegionDeletions items are used to search for common regions of amplification and deletion. It is often important to identify minimal regions of alteration that are common to a number of experiments. Select the RegionDeletions item, then select the Results node in the newly created RegionDeletions subtree. Notice that there is one region on chromosome ... that is deleted in all of the samples, and four regions that are deleted in six out of seven samples. [Figure: CGH Analysis results in the Multiple Array Viewer. The result tree contains GeneAmplifications and RegionAmplifications nodes with Results and General Information (amplification, deletion and 2-copy thresholds), and the results table lists the start and stop coordinates of each region.]
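Finding regions altered in a given number of samples is a sweep over interval endpoints. The sketch below reports the stretches where at least `min_samples` samples share a deletion (or amplification). It assumes half-open intervals [start, stop) and no overlapping intervals within one sample. It is an illustration, not MeV's CGH algorithm.

```python
def common_regions(samples, min_samples):
    """Regions covered by at least min_samples samples.

    samples: one list of (start, stop) intervals per sample, half-open [start, stop).
    """
    events = []
    for intervals in samples:
        for start, stop in intervals:
            events.append((start, 1))
            events.append((stop, -1))
    # starts (+1) before ends (-1) at the same position, so touching intervals merge
    events.sort(key=lambda event: (event[0], -event[1]))
    regions, count, open_at = [], 0, None
    for position, delta in events:
        before, count = count, count + delta
        if before < min_samples <= count:
            open_at = position                       # coverage just reached the threshold
        elif count < min_samples <= before:
            if position > open_at:                   # skip zero-length regions
                regions.append((open_at, position))
            open_at = None
    return regions
```

Calling it with `min_samples` equal to the number of samples gives regions altered in every sample. Calling it with one less gives regions like the "six out of seven" case.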
294. the gene across multiple experiments. [Figure 14.3.1: Spot Information dialog (Row 22, Column 20, Cy3 126569, Cy5 111472, Plate, Well, Feat_name ORF00174, Locus ORF00174, Common Name: PTS system component; buttons Gene Graph, Experiment Detail, Set Gene Color, Close)] Several graphs can be created using the View Graph item in the Views menu. The choices include scatter plots of the two intensities, ratio histograms, and a log ratio vs. log product graph. These graphs are displayed in a separate window and state the dataset and normalization scheme that produced them. [Figure 14.4.1: Graph view from the Single Array Viewer (Log Product vs. Log Ratio)] A sub-array can be created to view just those genes that are still displayed after applying the expression ratio. Select the View Sub Array item from the Views menu to create a new Single Array Viewer window with the selected elements. These elements will be rearranged to eliminate gaps in the display. The View Region item in the Views menu displays a dialog for entering the coordinates of a metablock. A new Single Array View
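The log ratio vs. log product graph and the ratio histogram plot simple transforms of each spot's two intensities. The sketch uses Cy5/Cy3 as the ratio, consistent with Cy5 being the numerator elsewhere in this manual. The logarithm bases (log10 for the product, log2 for the ratio), the bin width and the skipping of non-positive intensities are assumptions.

```python
from collections import Counter
from math import floor, log2, log10


def ratio_product_points(spots):
    """(log10(Cy3*Cy5), log2(Cy5/Cy3)) per (cy3, cy5) pair; non-positive spots skipped."""
    points = []
    for cy3, cy5 in spots:
        if cy3 <= 0 or cy5 <= 0:
            continue  # logarithms undefined for these spots
        points.append((log10(cy3 * cy5), log2(cy5 / cy3)))
    return points


def ratio_histogram(spots, bin_width=0.5):
    """Count spots per log2-ratio bin; key b covers [b*bin_width, (b+1)*bin_width)."""
    return Counter(floor(log2(cy5 / cy3) / bin_width)
                   for cy3, cy5 in spots if cy3 > 0 and cy5 > 0)
```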
295. the resampling trials. The legend for the color code corresponding to a given level of support can be found under the Help menu. The two most useful options for support trees are likely to be bootstrapping genes to build experiment trees and bootstrapping experiments to build gene trees. [Figure 11.3.1: Support trees for Hierarchical Clustering, with the Support Tree Legend displayed on the right (support levels 100%, 90-100%, 80-90%, 70-80%, 60-70%, 50-60%); average linkage, Euclidean distance] [Figure: ST (Support Trees) dialog. Gene Tree: Draw Gene Tree; Resampling Options: Bootstrap Samples, Jackknife Samples, No resampling; Iterations: 100. Sample Tree: Draw Sample Tree; Resampling Options: Bootstrap Genes, Jackknife Genes, No resampling; Iterations: 100. Linkage Method: Average Linkage, Complete Linkage]
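The resampling behind support trees can be sketched as follows. To build a gene tree, the experiments (columns) are bootstrapped (drawn with replacement) or jackknifed (one left out at a time), a tree is built for each replicate, and a clade's support is the percentage of replicate trees that contain it. Tree construction itself is omitted, and representing a replicate's clades as sets of frozensets is an assumption of this sketch.

```python
import random


def bootstrap_columns(matrix, rng):
    """One bootstrap replicate: draw n columns (experiments) with replacement."""
    n = len(matrix[0])
    columns = [rng.randrange(n) for _ in range(n)]
    return [[row[c] for c in columns] for row in matrix]  # same columns for every gene


def jackknife_columns(matrix):
    """All jackknife replicates: each leaves out exactly one column."""
    n = len(matrix[0])
    return [[[v for c, v in enumerate(row) if c != left_out] for row in matrix]
            for left_out in range(n)]


def support_percent(clade, replicate_clades):
    """Percentage of replicate trees whose clade set contains the given clade."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)
```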
296. the slide that map to the set of loci. If multiple spots are associated with a locus in the selection list, each of these related spots becomes an element of the new cluster. Please see section 8 of the manual (Working with Clusters) to learn more about the options within the cluster manager. File Output Options: These options output selected loci or all loci to a tab-delimited text file. In all cases the files are in the TDMS (tab-delimited, multiple sample) format and ready for import into MeV as independent data sets. The TDMS format is described in the appendix on file formats. Save Selected Loci (Locus Detail): This option saves locus-level information to a TDMS file, which contains the locus identifier, the chromosome (if more than one exists) and locus coordinate information. The expression value for each locus is the mean expression value of all spots related to the locus. Save Selected Loci (Spot Detail): This option saves spot values for spots related to the selected loci. All annotation for the spots is output to the TDMS format file. Save All Loci: This option provides the output described under Save Selected Loci (Locus Detail) above, except that it includes all identified loci in the viewer. Additional LEM Output Viewers: The LEM module produces three additional viewers: 1) Linear Expression Graph Viewer, 2) Table of Unmapped Spots, and 3) Locus Mapping Summary. 1) Linear Expression Graph Viewer: L
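The locus-level values written by Save Selected Loci (Locus Detail) are per-sample means over the spots mapped to each locus. A minimal group-by sketch, assuming each spot carries its locus id and one value per sample with no missing values:

```python
from collections import defaultdict
from statistics import mean


def locus_means(spots):
    """spots: iterable of (locus_id, [value per sample]) -> {locus_id: [mean per sample]}."""
    grouped = defaultdict(list)
    for locus, values in spots:
        grouped[locus].append(values)
    # zip(*rows) turns the spot rows of one locus into per-sample columns
    return {locus: [mean(column) for column in zip(*rows)] for locus, rows in grouped.items()}
```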
297. the use of these packages requires a basic understanding of the R programming language. Our goal is to provide point-and-click access to these statistically powerful Bioconductor packages for the biomedical community through the MeV environment. We have successfully integrated two Bioconductor packages, Rama and Bridge, into the MeV environment. RAMA (Robust Analysis of MicroArrays) (1) uses a Bayesian hierarchical model for the robust estimation of cDNA microarray intensities. BRIDGE (Bayesian Robust Inference for Differential Gene Expression) (2) tests for differentially expressed genes in both one- and two-color microarray data. BRIDGE uses a Bayesian model similar to that of RAMA, but the two are independent Bioconductor packages. To use the integrated MeV-R environment, every computer that runs MeV-R needs the R environment installed. Furthermore, Rama and Bridge are separate R packages that must be downloaded and installed from Bioconductor. Since MeV is an application written in Java, every computer running MeV-R also needs the Java Runtime Environment installed. On their own, Java and R are independent environments, each unaware of the other. To integrate MeV and R, we use Rserve, written at the University of Augsburg Institute for Mathematics, as the link between Java and R. Rserve is a TCP/IP server, and it must also be installed on every computer running MeV-R. Having installed all the necessary components, the
298. three main categories, represented on three tabbed panels in the Algorithm Selection Dialog. Analysis Algorithms include gene and experiment clustering algorithms, classification algorithms, statistical algorithms and data visualizations. The analysis algorithms are all described in the Modules section of the manual (section 9). The Adjustment algorithms are those found in the Adjustment menu of the Multiple Array Viewer interface, and include the Affymetrix-based filters if Affymetrix data is currently loaded. The Adjustment algorithms either filter the data based on some criteria or perform a mathematical transformation of the data. [Figure 12.4: Script Algorithm Initialization Dialog, Algorithm Selection, Adjustment Algorithms tab. Tabs: Analysis Algorithms, Adjustment Algorithms, Cluster Selection Algorithms. Gene/Row Filters: Percentage Cutoff, Lower Cutoffs. Gene/Row-Based Adjustments: Normalize Genes/Rows, Divide Genes/Rows by RMS, Divide Genes/Rows by SD, Mean Center Genes/Rows, Median Center Genes/Rows, Digital Genes/Rows. Sample/Column-Based Adjustments: the corresponding options for samples/columns.]
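The row- and column-based adjustments listed in the dialog are simple per-vector transforms. The sketch below uses common textbook definitions. Sample standard deviation is used for SD and normalization, and RMS is taken as sqrt(mean(x²)). These are assumptions that may differ in detail from MeV's own formulas. Digital Genes/Rows is not covered.

```python
from math import sqrt
from statistics import mean, median, stdev


def mean_center(values):
    m = mean(values)
    return [v - m for v in values]


def median_center(values):
    m = median(values)
    return [v - m for v in values]


def divide_by_sd(values):
    s = stdev(values)  # sample standard deviation (assumed)
    return [v / s for v in values]


def divide_by_rms(values):
    rms = sqrt(sum(v * v for v in values) / len(values))  # RMS definition assumed
    return [v / rms for v in values]


def normalize(values):
    """Mean 0, standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def adjust_columns(matrix, adjust):
    """Apply a row-style adjustment to every column (sample) instead of every row (gene)."""
    adjusted = [adjust(list(column)) for column in zip(*matrix)]
    return [list(row) for row in zip(*adjusted)]
```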
299. three possible viewers containing the genes within the biological theme. These viewers are also accessible from a node in the result tree that follows the table node in the result navigation tree. [Figure: EASE results table with columns Term, List Hits, List Size, Pop Hits, Pop Size and Fisher's Exact, listing terms such as KEGG pathway hsa00190 Oxidative phosphorylation (Homo sapiens), KEGG pathway hsa00193 ATP synthesis (Homo sapiens), GO Molecular Function GO:0015078 hydrogen ion transporter activity, GO Biological Process GO:0006818 hydrogen transport, GO Biological Process GO:0015992 proton transport, GO Molecular Function GO:0008324 cation transporter activity, GO Molecular Function GO:0015077 monovalent inorganic cation transporter activity, GO Molecular Function GO:0015075 ion transporter activity, and GO Biological Process GO:0006812 cation transport]
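The Fisher's Exact column can be read as a one-sided test of whether a term is over-represented in the gene list, given the four counts in the table. This sketch computes the plain hypergeometric upper tail P(X ≥ List Hits). It is not claimed to reproduce EASE's own scoring exactly.

```python
from math import comb


def fisher_upper_tail(list_hits, list_size, pop_hits, pop_size):
    """One-sided Fisher exact p-value: P(X >= list_hits) for a hypergeometric X.

    X counts term-annotated genes in a random list of list_size genes drawn from a
    population of pop_size genes, pop_hits of which carry the term.
    """
    total = comb(pop_size, list_size)
    upper = min(list_size, pop_hits)
    tail = sum(comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
               for k in range(list_hits, upper + 1))
    return tail / total
```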
300. ...annotations that are provided on the platform entry. !Sample_table_begin; ID_REF, VALUE, HEADER 3, HEADER 4, ..., HEADER N; [insert data table here; columns may appear in any order after the ID_REF column]; !Sample_table_end. dChip or DFCI core file format: Each dChip output data file contains the data for one experiment. It is a tab-delimited file that records a lot of information about the experiment. Currently only chip ID, intensity, detection and detection p-value are read into MeV for the analysis. [Figure: example dChip/DFCI core output opened in a spreadsheet, with columns such as Analysis Name, Probe Set Name, Signal, Detection, Detection p-value, Signal Log Ratio, Change and Change p-value, and rows for AFFX control probe sets such as AFFX-BioB-5_at]
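Reading the sample data table from a SOFT-style file only requires the begin/end markers and the ID_REF header rule described above. The sketch matches the markers case-insensitively and assumes tab-delimited rows. Everything outside the table (such as `^SAMPLE` and `!Sample_...` attribute lines) is ignored.

```python
def read_soft_sample_table(lines):
    """Return (header, rows) for the table between the sample_table begin/end markers."""
    header, rows, inside = None, [], False
    for line in lines:
        text = line.rstrip("\n")
        marker = text.lower()
        if marker.startswith("!sample_table_begin"):
            inside = True
            continue
        if marker.startswith("!sample_table_end"):
            break
        if not inside or not text:
            continue
        fields = text.split("\t")
        if header is None:
            if fields[0] != "ID_REF":
                raise ValueError("the first table column must be ID_REF")
            header = fields  # other columns may appear in any order
        else:
            rows.append(dict(zip(header, fields)))
    return header, rows
```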
301. [Figure 11.21.2: LEM Initialization Dialog, with Select Locus Identifier Field (locus), Coordinate Data Selections (Use Coordinate File; press the i button in the lower left corner for file format information), Multiple Chromosomes or Plasmids, Select Chromosome ID Field (locus name), Select Start Coordinate (5' End Field: 5' end) and Select End Coordinate (3' End Field: 3' end).]
Locus Identifier Field: This option selects the annotation field that maps spots to loci.
Use Coordinate File: This option indicates whether coordinate information will be loaded via an auxiliary coordinate file, described above.
Multiple Chromosomes Option: This check box indicates whether there are multiple chromosomes. If selected, MeV will expect chromosome identification information from the coordinate file or from a loaded annotation field in MeV. If the presence of multiple chromosomes or plasmids is indicated, one map will be produced for each chromosome or plasmid.
The 5' and 3' annotation fields indicate which annotation fields in MeV identify the coordinate information if a coordinate file is not used.
Basic LEM Features and Options: Several features to help navigate the LEM, customize its appearance and extract information from it are available via a right-click context menu. The following is a list of general features that are described in more detail
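When multiple chromosomes or plasmids are indicated, one map is produced per chromosome, with loci placed by their 5' and 3' coordinates. A minimal sketch of that grouping, assuming each locus is a hypothetical `(locus_id, chromosome_id, five_prime, three_prime)` tuple and that a locus is ordered by its lower coordinate (so reverse-strand loci with 5' > 3' still sort correctly); this is not MeV's internal data model.

```python
from collections import defaultdict


def build_locus_maps(loci):
    """Group loci by chromosome and order each map by genomic start.

    loci: iterable of (locus_id, chromosome_id, five_prime, three_prime).
    Returns {chromosome_id: [locus_id, ...]} ordered along the chromosome.
    """
    maps = defaultdict(list)
    for locus_id, chromosome, five_prime, three_prime in loci:
        # Reverse-strand loci have 5' > 3'; use the lower coordinate as the start.
        start, end = min(five_prime, three_prime), max(five_prime, three_prime)
        maps[chromosome].append((start, end, locus_id))
    return {chrom: [locus for _, _, locus in sorted(entries)] for chrom, entries in maps.items()}
```

With a single chromosome, every locus simply lands under one key, matching the one-map case.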
302. [Figure 11.14.2: TTEST Results, Expression Graphs; significant genes are in the graph on the left.]
The TTEST module also outputs table viewers with gene-specific statistics, as shown below. These tables can be saved as tab-delimited text files by right-clicking on them. All the columns in the tables can be sorted in ascending or descending order. Successive clicks on a column header re-order the rows in ascending or descending order of the values in the selected column. Holding down the CTRL key while clicking anywhere on the header will restore the original ordering.
[Figure: TTEST table viewer with gene-specific statistics, including GenBank, GroupA mean, GroupA std, GroupB mean and GroupB std columns.]
303. to MeV. This implementation is based on the over-representation analysis feature of the EASE application, available at http://david.niaid.nih.gov/david/ease.htm. Two classes have been utilized from the EASE open-source package, with modifications to enable some of the options described below. A full description of the theory behind EASE, along with test studies, can be found in the EASE reference (Hosack et al., 2003). Behind the MeV version of EASE there is a file structure that contains the files required for annotation conversion and for linking indices to biological themes. This file structure mimics much of the file structure behind the stand-alone version of EASE.
[Figure: EASE Directory Structure: the MeV base directory containing the Class, Convert, Implies, URL Data and Tags directories.]
The following table lists the directories used in the EASE implementation within MeV. The primary directories (Convert, Class, Implies, URL Data and Tags) each contain files used by EASE. The minimal requirement is a file in the Class directory that maps gene indices to themes. The optional annotation conversion files are located in the Convert directory and serve to convert annotation types. This is only required if the annotation used within MeV to identify genes differs from that contained in the Class directory file linking annotation to biological themes. The Implies directory contains optional files describing the hierarchy of biological theme terms, where one theme description might imply an
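The results table reports, for each theme, the list hits, list size, population hits, population size and a Fisher's exact probability. A minimal sketch of a one-sided Fisher's exact test for over-representation, computed as the hypergeometric upper tail P(X >= list hits); the function name is illustrative, and EASE's own penalized "EASE score" variant is not reproduced here.

```python
from math import comb


def fisher_exact_upper(list_hits, list_size, pop_hits, pop_size):
    """One-sided Fisher's exact p-value for over-representation of a theme.

    Probability of observing at least `list_hits` theme members when drawing
    `list_size` genes from a population of `pop_size` genes, `pop_hits` of which
    belong to the theme (hypergeometric upper tail).
    """
    total = comb(pop_size, list_size)
    max_hits = min(list_size, pop_hits)
    tail = sum(
        comb(pop_hits, x) * comb(pop_size - pop_hits, list_size - x)
        for x in range(list_hits, max_hits + 1)
    )
    return tail / total  # exact integer arithmetic until this final division
```

Python's arbitrary-precision integers keep the binomial sums exact even for populations of tens of thousands of genes.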
304. to a biological theme, in this case a GO term.
[Figure: Example of files linking annotation indices to biological themes. An optional conversion file maps GenBank accessions to LocusLink IDs, and a class file maps LocusLink IDs to GO Biological Process terms such as protein amino acid dephosphorylation, cell adhesion and ubiquitin cycle; the input genes come from a MeV cluster or a gene list file.]
EASE Input Parameters
File Updates and Configuration Options
Selecting the EASE File System: This button enables the selection of a local directory to be used as the source of annotation files for EASE analysis. Multiple file systems can be present to support a variety of array types. Selection of a fi
305. tomization Dialog (Y-Axis Range, X-Axis and Display Options panels). The second panel of the Customization Dialog includes two major graph rendering options. The Offset Lines from Midpoint mode draws points for expression values and connects them to a neutral line, or baseline value. The cutoff values allow one to set limits that affect the color scheme of the offset lines: all points above the upper cutoff have red lines to the baseline, while points below the lower cutoff have green connecting lines. Points that fall within the cutoffs have black connecting lines. The other option is to render the graph using lines to connect points. When viewing the graph in overlay mode, this is the default setting, so that the various samples can be easily identified. The last rendering option is to use a discrete value overlay. This option uses the supplied cutoff values and overlays the graph with a square-wave pattern, where points exceeding the limits force the line through the appropriate limit. This overlay, shown in the preview mode in the dialog figure, can help to visually identify regions of contiguous (at least in ordering) genes that show similar behavior.
[Figure 11.21.13: Customize Graph dialog, Graph Rendering Options panel: Offset Lines from Midpoint (Lower Cutoff, Neutral Point, Upper Cutoff), Discrete Value Overlay and Connect Points.]
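The cutoff rules above can be summarized in a short sketch. The color rule follows the text directly; for the discrete value overlay, the sketch assumes one plausible reading of the "square wave": values beyond a cutoff are drawn at that cutoff and all others at the neutral point. The default cutoffs of -1 and 1 are illustrative, not MeV's defaults.

```python
def offset_line_color(value, lower_cutoff=-1.0, upper_cutoff=1.0):
    """Color of the line joining a point to the baseline in Offset Lines mode."""
    if value > upper_cutoff:
        return "red"    # above the upper cutoff
    if value < lower_cutoff:
        return "green"  # below the lower cutoff
    return "black"      # within the cutoffs


def discrete_overlay(values, lower_cutoff=-1.0, upper_cutoff=1.0, neutral=0.0):
    """Square-wave overlay: points beyond a limit are forced through that limit.

    Assumption: points within the cutoffs are drawn at the neutral point.
    """
    overlay = []
    for v in values:
        if v > upper_cutoff:
            overlay.append(upper_cutoff)
        elif v < lower_cutoff:
            overlay.append(lower_cutoff)
        else:
            overlay.append(neutral)
    return overlay
```

Consecutive equal overlay values form the flat runs that make contiguous genes with similar behavior stand out.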
306. ts after loading as instructed in the following section.
4.6 Loading GenePix .gpr Data Files
GenePix .gpr files can be loaded by selecting the GenePix file loader option from the list of available file formats. Use the file system navigation tree on the left to move to the directories containing the files to load. Files appearing in the Available file list can be added to the Selected file list using the Add or Add All buttons.
[Figure 4.6.1: The Expression File Loader (GenePix Files), showing the file system navigation tree and the Available and Selected file lists.]
4.7 Loading Agilent Files
Agilent
307. ts in which each clone is deleted.
Annotating Regions
[Figure 3.1: Clone deletions display with chromosome 4 deletions selected to be annotated. The table lists Name, Chrom, Start, Stop and Alterations for each clone, with the Annotations and Links menus above it.]
Highlight all of the probes on chromosome 1 with 4 or more alterations and select the Annotate Selected item in the Annotations menu (Figure 3.1). This will set the selected data regions to be annotated in the corresponding CGH Position Graph (Figure 3.2). Click on the Chromosome 1 item o
308. [Figure 11.24.1: PCA 3D View; the right-click menu offers Show selection area, Show spheres, Show text and White background.]
[Figure 11.24.2: PCA 2D view; the right-click menu offers Store cluster, Launch new session, Save cluster, Larger point size and Show tick marks and labels.]
309. [Figure 12.12: Script Error Log with the XML editor opened; the script closes with the </algorithm>, </alg_set>, </analysis>, </mev> and </TM4ML> tags.]
Running a Script
Script execution can be initiated from the Script Table Viewer, the Script Tree Viewer or the Script XML Viewer by using the right-click menu option Execute Script. The script is logically split into units called Algorithm Sets, which represent one or more algorithms sharing a common input data source. The output results are grouped into algorithm sets, and since each algorithm set has a unique input data set that can be a subset of the loaded data, an input data node is used to display the input data for the algorithm set. The figure displaying script analysis output clearly shows the expected output from a script containing one algorithm (SAM). The Input Data node shows the number of experiments and the number of genes, as well as three cluster viewers. The Script Tree node in the output for an algorithm set helps to orient the researcher as to which part of the script falls within the enclosing algorithm set. The algorithm set input data, the attached algorithms and the result data nodes are highlighted, while other script nodes are semi-transparent.
310. urposes. A variety of normalization algorithms and clustering analyses allow the user flexibility in creating meaningful views of the expression data.
3 Starting MultiExperiment Viewer
3.1 If using Windows, run the TMEV batch file (TMEV.bat) to start the program. This batch file invokes the Java interpreter and stores input parameters. Similarly, if using Linux or Unix, run the tmev.sh file. Macintosh users should double-click on the application file named TMEV MacOSX 4.0.
3.2 A main menu bar will appear with four menus: File, Display, Window and References. Use the File menu in the main menu bar to open a new Single or Multiple Array Viewer, load a new preferences file or log in to a database. MeV will continue to run while this menu bar is present. To exit the entire application, select Quit from the File menu.
3.3 Expression data can be viewed from within either a Single or a Multiple Array Viewer. However, the former can open only one set of expression data at a time, while the Multiple Array Viewer can display many samples together. The real power of MeV is in the program's analysis modules, found only in the Multiple Array Viewer; that is where the clustering and visualization of data take place. Therefore, the remainder of this manual will focus on the Multiple Array Viewer. Please see section 13 for details regarding the Single Array Viewer.
4 Loading Expression Data
MeV can interpret files of several types, including the Mult
311. used for determining cluster-to-cluster distances when constructing the hierarchical tree.
Single Linkage: The distances are measured between each member of one cluster and each member of the other cluster. The minimum of these distances is considered the cluster-to-cluster distance.
Average Linkage: The average distance of each member of one cluster to each member of the other cluster is used as the measure of cluster-to-cluster distance. Note that this option in MeV is actually determined by a weighted average of the distances of cluster members. Example: consider the distance from node d to the cluster {a, b, c}, where a and b were merged before c joined them.
Unweighted average linkage: $d_{d,abc} = \frac{d_{da} + d_{db} + d_{dc}}{3} = \frac{d_{da}}{3} + \frac{d_{db}}{3} + \frac{d_{dc}}{3}$
Weighted average linkage: $d_{d,abc} = \frac{1}{2}\left(\frac{d_{da} + d_{db}}{2}\right) + \frac{d_{dc}}{2} = \frac{d_{da}}{4} + \frac{d_{db}}{4} + \frac{d_{dc}}{2}$
Nodes are weighted unequally: nodes deeper in the subtree contribute less to the overall computed distance.
Complete Linkage: The distances are measured between each member of one cluster and each member of the other cluster. The maximum of these distances is considered the cluster-to-cluster distance.
Cluster Genes / Cluster Samples Options: These checkboxes are used to indicate whether to cluster genes, samples or both.
Default Distance Metric: Euclidean
Cluster Size Specification: State the size of the clusters you want to analyze. You won't get too mu
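The four linkage rules can be compared directly on the worked example. In this sketch the weighted variant walks the merge tree and halves the weight at each level, which reproduces the 1/4, 1/4, 1/2 weights above; representing a cluster's merge history as nested tuples is an assumption made for illustration, not MeV's internal structure.

```python
from statistics import mean


def single_linkage(dist, cluster_a, cluster_b):
    """Minimum pairwise distance between members of the two clusters."""
    return min(dist(a, b) for a in cluster_a for b in cluster_b)


def complete_linkage(dist, cluster_a, cluster_b):
    """Maximum pairwise distance between members of the two clusters."""
    return max(dist(a, b) for a in cluster_a for b in cluster_b)


def unweighted_average_linkage(dist, cluster_a, cluster_b):
    """Plain mean of all pairwise distances (every member weighted equally)."""
    return mean(dist(a, b) for a in cluster_a for b in cluster_b)


def weighted_average_linkage(dist, node, tree):
    """Distance from `node` to a cluster given as a merge tree of nested tuples.

    Each merge averages its two sub-clusters equally, so members deeper in
    the tree contribute less to the final distance.
    """
    if isinstance(tree, tuple):
        left, right = tree
        return 0.5 * (weighted_average_linkage(dist, node, left)
                      + weighted_average_linkage(dist, node, right))
    return dist(node, tree)
```

With $d_{da} = 4$, $d_{db} = 8$ and $d_{dc} = 12$, the unweighted average is 8 while the weighted average over the tree `(("a", "b"), "c")` is 4/4 + 8/4 + 12/2 = 9.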
312. used to correlate color to an expression value for an element in expression viewers. Color provides a means to easily view patterns of expression. Three preset options can be selected from the Color Scheme submenu of the Display menu. The default is a double gradient (green-black-red), which displays under-expression relative to the reference as green, over-expression as red, and spots with little differential expression as black. A blue-black-yellow color scheme is available as an alternative, and the rainbow scheme is a third selectable option. Custom color schemes can be created by selecting the Custom Color Scheme option from the Color Scheme menu. The form provides several options:
Gradient Style: Allows the selection of a double gradient or a single gradient. In some cases a single gradient is preferred, if values are not compared to a reference.
Gradient Selection: This panel selects the endpoint color that is being set and allows the center color of a two-gradient scheme to be black or white.
Color Selector: This area presents controls for selecting endpoint colors.
Gradient Preview: This area displays the current settings.
[Figure: Color Scheme Selection dialog, with Gradient Style (Double Gradient, Single Gradient), Gradient Selection (Select Low End Color, Select High End Color, Use Black as Neutral Color) and a gradient preview.]
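A double gradient interpolates from the neutral color toward the low-end color for values below the reference and toward the high-end color for values above it. A minimal sketch, assuming log-ratio data centered on 0 and illustrative saturation limits of -3 and 3 (MeV's actual limits are set elsewhere in the Display menu and may differ):

```python
def double_gradient(value, lower_limit=-3.0, upper_limit=3.0,
                    low_color=(0, 255, 0), neutral_color=(0, 0, 0),
                    high_color=(255, 0, 0)):
    """Map an expression value to an RGB tuple on a two-sided gradient.

    Assumptions: the neutral point is 0, lower_limit < 0 < upper_limit, and
    values beyond a limit saturate at that end's color.
    """
    if value >= 0:
        t, end = min(value / upper_limit, 1.0), high_color
    else:
        t, end = min(value / lower_limit, 1.0), low_color
    # Linear interpolation per channel between the neutral and endpoint colors.
    return tuple(round(n + t * (e - n)) for n, e in zip(neutral_color, end))
```

Passing `low_color=(0, 0, 255)` and `high_color=(255, 255, 0)` gives a blue-black-yellow scheme, and `neutral_color=(255, 255, 255)` gives a white center.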
313. using the mean and the standard deviation of the row of the matrix to which the value belongs, according to the following formula: $\text{Value} = \frac{\text{Value} - \text{Mean}(\text{Row})}{\text{Standard deviation}(\text{Row})}$
Divide Genes/Rows by RMS: This will divide each value by the root mean square of the current row, where $\text{RMS} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n-1}}$ and $x_i$ is the i-th element in a row of n elements.
Divide Genes/Rows by SD: This will divide each value by the standard deviation of the row it belongs to.
Mean Center Genes/Rows: This will replace each value by (value - Mean(row that value belongs to)).
Median Center Genes/Rows: This will replace each value by (value - Median(row that value belongs to)).
Digital Genes/Rows: This will divide the interval between the minimum and the maximum values in a row into a number of equal-sized bins. Each value is then replaced by an integer of zero or greater denoting which bin it belongs to; e.g., the minimum value is assigned to bin zero (the lowest bin), the maximum value is assigned to the highest bin, and the remaining values fall in the intermediate bins.
Sample/Column Adjustments: These function in the same way as their corresponding options on genes/rows, except that the current column values rather than the current row values are used in the computation.
Log10 to Log2: This assumes that the current data are log10-transform
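The row adjustments above can be sketched on a single row of values. This is an illustration, not MeV's code: the sample standard deviation (n - 1 denominator) and the default of 10 bins for the digital transform are assumptions, and the RMS uses the n - 1 denominator given in the formula.

```python
import math
import statistics


def normalize_row(row):
    """(value - mean) / standard deviation, using the sample SD (assumption)."""
    m, sd = statistics.mean(row), statistics.stdev(row)
    return [(v - m) / sd for v in row]


def divide_by_rms(row):
    """Divide by sqrt(sum(x_i^2) / (n - 1)), as in the RMS formula."""
    rms = math.sqrt(sum(v * v for v in row) / (len(row) - 1))
    return [v / rms for v in row]


def divide_by_sd(row):
    sd = statistics.stdev(row)
    return [v / sd for v in row]


def mean_center(row):
    m = statistics.mean(row)
    return [v - m for v in row]


def median_center(row):
    med = statistics.median(row)
    return [v - med for v in row]


def digital(row, n_bins=10):
    """Replace values by equal-width bin indices 0..n_bins-1 (n_bins is hypothetical)."""
    lo, hi = min(row), max(row)
    if hi == lo:
        return [0] * len(row)
    width = (hi - lo) / n_bins
    # The maximum would land in bin n_bins, so clamp it into the highest bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in row]


def log10_to_log2(row):
    """log2(x) = log10(x) * log2(10)."""
    return [v * math.log2(10) for v in row]
```

Column adjustments apply the same functions to each column instead of each row.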
314. [Figure 11.18.5: Classification Information Viewer (SVM Classification Result). Total Number of Genes: 459.
Positive genes: initially selected as positive examples, 18; classified as positive (Total Positives), 54; retained in positive class (True Positives), 18; recruited into positive class from negatives (False Negatives), 36.
Negative genes: initially selected as negative examples, 441; classified as negative (Total Negatives), 405; retained in negative class (True Negatives), 405; recruited into negative class from positives (False Positives), 0.]
Expression image viewers reveal which elements have been recruited into each of the final classification partitions by coloring the annotation red. Other result viewers are essentially the same as those described in the K-Means clustering section.
11.19 KNNC: K-Nearest Neighbor Classification (Theilhaber et al., 2002)
KNN Classification is a supervised classification scheme. A subset of the entire data set, called the training set, for which the user specifies class assignments, is used as input to classify the remaining members of the data set. The user specifies the number of expected classes and the training se
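The core idea of KNN classification is that each unlabeled element takes the majority class among its k nearest training elements. The following is a generic majority-vote sketch with Euclidean distance; it is not MeV's KNNC implementation, which may add validation and other refinements, and the function name is illustrative.

```python
import math
from collections import Counter


def knn_classify(training, labels, query, k=3):
    """Assign `query` the most common label among its k nearest training vectors.

    training: list of expression vectors with user-specified class labels.
    Distance is Euclidean; ties in the vote go to the label reached first.
    """
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(training, labels)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]
```

Applying `knn_classify` to every element outside the training set yields the final class partitions.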
315. w average, replacing missing expression measurements with the mean expression of a row (gene) across all columns (experiments), or 2) K-nearest neighbors, where the K most similar genes (using Euclidean distance) to the gene with a missing value are used to impute the missing value.
[Figure 11.15.1: SAM Initialization Dialog (Two-class unpaired). Each experiment is assigned to Group A, Group B or Neither group (note: Group A and Group B MUST each contain more than one experiment), with Save grouping, Load grouping and Reset buttons; Number of permutations (100); S0 parameters (select S0 using the 5th percentile, or enter an S0 percentile from 0 to 100); Calculate q-values (No, quick / Yes, slow); Imputation Engine (K-nearest neighbors imputer with Number of neighbors 10, or Row average imputer); Save Imputed Matrix; Hierarchical Clustering (Construct Hierarchical Trees).]
SAM generates an interactive plot (Fig. 11.15.2) of the observed vs. expected (based on
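Both imputation engines can be sketched briefly, with missing values represented as `None`. The KNN version is a simplified, unweighted illustration: it treats as neighbors only genes that have a value in the missing column and in every column observed for the target gene, measures Euclidean distance over those observed columns, and averages the neighbors' values. MeV's imputer may handle partially missing neighbors differently.

```python
import math


def row_average_impute(matrix):
    """Replace each None with the mean of the observed values in its row (gene)."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        row_mean = sum(observed) / len(observed)  # assumes at least one observed value
        imputed.append([row_mean if v is None else v for v in row])
    return imputed


def knn_impute(matrix, k=10):
    """Fill each None with the mean of that column over the k most similar genes."""
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        observed = [j for j, v in enumerate(row) if v is not None]
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for g, other in enumerate(matrix):
                # Neighbors need a value in column j and in every column observed for gene i.
                if g == i or other[j] is None or any(other[c] is None for c in observed):
                    continue
                d = math.sqrt(sum((row[c] - other[c]) ** 2 for c in observed))
                candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

The KNN approach uses genes with similar profiles, so it usually preserves more structure than the row average, at a higher computational cost.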
316. wait a short eternity for Cross Validation to run. When Cross Validation is finally finished, you'll see the Choose Your Parameters dialog box.
Choose Your Parameters Dialog Box
During Cross Validation, the USC algorithm has compiled a list of results. During each fold of cross validation, each experiment in the pseudo-test set has been tested against the remaining experiments of the pseudo-training set. Here you'll be asked to choose between accuracy and the number of genes to use during testing. When the Save Training Results checkbox is checked (the default), you'll be prompted to save the training file. If you have any test experiments, they will be tested now using your chosen Delta (Δ) and Rho (ρ) values.
[Figure 11.29.4: USC Parameters Dialog, listing the number of mistakes, the average number of genes, Delta and Rho for each parameter combination, with a Save Training Results checkbox.]
USC Summary Viewer
The results of the USC algorithm are returned to the main Analysis Tree in the left pane of the Multiple Array Viewer window. Clicking on Summary will display the following view. Any test experiments that were loaded are listed along with their class assignment and the Discriminan
317. wer
[Figure: CGH Position Graph view of chromosome 20 with flanking regions.]
Launch Golden Path
The Separated Viewer
A common way to display CGH data is to draw all deletions on one side of the screen and all amplifications on the other side. The separated view of the CGH Position Graph displays the cytogenetic bands and chromosome positions in the center of the panel, the flanking regions corresponding to deletions on the left of the screen, and the flanking regions corresponding to amplifications on the right side of the screen. To display this view, select Display Type > Separated from the Display menu (Figure 2.5). This display often looks better if the element width item is set to a smaller value.
[Figure: View of chromosome 2.]
The CGH Browser
318. when there are many experiments.
Load Grouping: This button allows you to select and load a saved grouping.
Reset: The reset button returns all of the group selection control buttons to Group 1, the initial state.
P-Value Parameters: This is used to input the critical p-value. P-values are computed from the theoretical F-distribution.
Hierarchical Clustering: This check box selects whether to perform hierarchical clustering on the elements in each cluster created.
In addition to the standard viewers, this module outputs gene-specific statistics under the F-Ratio Information tab, as shown below. These tables can be saved as tab-delimited text files by right-clicking on them. All the columns in the tables can be sorted in ascending or descending order. Clicking on a column header re-orders the rows in ascending order of the values in the selected column. Holding down the SHIFT key while clicking on the column header will re-order the rows in descending order of that column. Holding down the CTRL key while clicking anywhere on the header will restore the original ordering.
[Figure: F-Ratio Information table with columns Unique ID, Spot Name, Spot Weight, Group1 mean, Group1 std, Group2 mean, Group2 std, Group3 mean and Group3 std.]
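For each gene, the F-ratio compares variability between the groups with variability within them; the p-value is then read from the F-distribution with (k - 1, n - k) degrees of freedom. A minimal one-way ANOVA sketch for one gene's values split by group (illustrative, not MeV's code):

```python
from statistics import mean


def one_way_f_ratio(groups):
    """F = (SS_between / (k - 1)) / (SS_within / (n - k)) for one gene.

    groups: list of lists, one list of expression values per group.
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

If SciPy is available, `scipy.stats.f.sf(F, k - 1, n - k)` gives the corresponding upper-tail p-value from the theoretical F-distribution.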
319. x selects whether to perform hierarchical clustering on the elements in each cluster created.
Default Distance Metric: Euclidean
The first principal component of the expression matrix is calculated, and those genes whose variance is in the bottom 10% are removed. These two steps are repeated using the remaining genes until only one gene remains. This results in a series of nested clusters. One cluster is chosen from this series using the gap statistic (see below for details). The expression matrix is then orthogonalized, another series of nested clusters is generated, and one cluster is chosen from it. The process is repeated until the number of chosen clusters reaches the number specified in the Number of clusters parameter.
The method used to select one cluster out of a nested series is maximization of the Gap Statistic. Randomized clusters are created from the existing expression matrix. The ratio of the expression variance of a given gene between experiments to the variance of each gene about the cluster average is calculated. The cluster whose ratio is furthest from the average ratio of the randomized matrices is chosen.
This module is computationally intensive, so it may be several minutes before results are displayed. The experiment subtree created by the module contains expression images, centroid graphs and expression graphs of each of the predicted clusters and of the genes not assigned to clusters. It also contains a Cluster Information tab
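The shaving loop that produces the nested series can be sketched as follows. This is one plausible reading, not MeV's implementation: each gene is scored by the magnitude of its centered profile's projection on the first principal direction (found by power iteration), and the lowest-scoring 10% (at least one gene) are removed each round. Gap-statistic selection and the orthogonalization step are not shown.

```python
import math


def leading_direction(rows, iterations=100):
    """First principal direction of the (already centered) rows via power iteration."""
    m = len(rows[0])
    # Non-constant start vector: centered rows are orthogonal to the all-ones vector.
    v = [float(j + 1) for j in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        proj = [sum(r[j] * v[j] for j in range(m)) for r in rows]         # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(m)]  # X^T X v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def shave_series(matrix, fraction=0.1):
    """Return the nested gene-index clusters produced by repeated shaving."""
    genes = list(range(len(matrix)))
    series = [list(genes)]
    while len(genes) > 1:
        rows = []
        for g in genes:
            mu = sum(matrix[g]) / len(matrix[g])
            rows.append([x - mu for x in matrix[g]])  # center each gene
        v = leading_direction(rows)
        score = {g: abs(sum(x * vj for x, vj in zip(r, v))) for g, r in zip(genes, rows)}
        n_drop = max(1, int(fraction * len(genes)))
        genes = sorted(sorted(genes, key=lambda g: score[g])[n_drop:])
        series.append(list(genes))
    return series
```

Each cluster in the returned series is a subset of the previous one, ending with a single gene.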
320. y gene in a customized gene list. Select this button and load the file named sample_genelist.txt, included with the MeV distribution. This list is a text file containing a large number of genes that have been identified as being associated with cancer. Notice that the new Gene Alterations subtree now contains two subtrees, corresponding to the number of times the genes in the list are amplified and deleted.
[Figure: Multiple Array Viewer with the GeneAlterations result subtree, containing GeneAmplifications and GeneDeletions nodes, and a table of altered genes with their start positions and alterations.]
321. zing the biological significance of large and heterogeneous lists of genes while simultaneously archiving and integrating the data via links to published literature. In short, LOLA allows researchers to measure the similarity of their gene list to those in the published literature.
[Figure 8.1.6: Sample Repository Information Page. Additional Submission/Login Info: establish a user name and password at www.lola.gwu.edu; Species Accepted: Human; Data Type: Affymetrix; Submission ID Types: Affy_ID or LocusLink ID; Reset, Cancel and Submit buttons.]
9 Utilities Menu
9.1 Search Utility
The Search feature permits the user to search the data for genes or samples matching a search term, given search criteria. Once the search is complete, the elements are returned in a table. Navigation shortcuts provide a means to open cluster viewers that contain the elements found in the search. The search initialization dialog allows the option of finding genes or samples. The search criteria include a search term, a selection to make the search case-sensitive, and a selection that requires the search term to be an exact match or allows it to be simply a contiguous portion of a larger annotation term.
[Figure: Global Search dialog, with Search Mode (Gene Search, Sample Search), Search Term, Case Sensitive and Exact Match options, and a Searchable Fields list (Select All Fields, Reset Selection, Clone ID, GB, ...).]
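The case-sensitivity and exact-match options combine as shown in this sketch: an exact match requires the whole annotation value to equal the term, while the default matches any contiguous substring. Records as dicts keyed by annotation field name, and the function names, are assumptions for illustration.

```python
def matches(term, annotation, case_sensitive=False, exact_match=False):
    """True if `term` matches `annotation` under the selected criteria."""
    if not case_sensitive:
        term, annotation = term.casefold(), annotation.casefold()
    # Exact: whole value must equal the term; otherwise a contiguous portion suffices.
    return annotation == term if exact_match else term in annotation


def search(records, term, fields, case_sensitive=False, exact_match=False):
    """Return indices of records where any selected searchable field matches."""
    return [
        i for i, record in enumerate(records)
        if any(matches(term, str(record.get(field, "")), case_sensitive, exact_match)
               for field in fields)
    ]
```

Restricting `fields` mirrors choosing a subset of Searchable Fields in the dialog.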
