
Analysis of public microarray datasets

1. [Screenshot: TranscriptomeBrowser heatmap for one transcriptome signature (13 probes, 15 samples; genes shown include Tomm5, Ywhah, Pla2g5, Trappc4, Atic and March2). The annotation panel offers KEGG_PATHWAY, KEGG_REACTION, PANTHER_TERM_BP, PANTHER_TERM_MF and WIKIPATHWAY, with Search and Sort controls. Enriched keywords listed with their q-values: mitochondrial part, mitochondrion, mitochondrial membrane, mitochondrial envelope, mitochondrial inner membrane, organelle inner membrane, organelle envelope, envelope, cytoplasmic part, organelle membrane. Buttons: Save, Save annotation, Export heatmap data.]
2. [Figure: box plots of the value distribution in all samples.]

This plot represents the distribution of the data in all samples. Since the data are supposed to be normalized, you expect comparable boxes for all samples. When the box plots show large divergence, the data in the Series Matrix file were probably not yet normalized. Unfortunately, you cannot perform normalization in GEO2R, and if the boxes are very different it is not possible to compare the samples.

Search for the top 250 differentially expressed transcripts

Since the box plots show that the data have been normalized, we can now proceed with finding the DE genes between the two groups; the top 250 is a good proxy for downstream analysis. The Options tab controls the log transformation and the multiple-testing correction applied to the data. The default options are shown below and are the best choice for most datasets.

[Screenshot: GEO2R Options tab (tabs: GEO2R, Value distribution, Options, Profile graph, R script). Apply adjustment to the P-values: Benjamini & Hochberg (False discovery rate), Benjamini & Yekutieli, Bonferroni, Hochberg, Holm, Hommel, None. Apply log transformation to the data: Auto-detect, Yes, No. Category of Platform annotation to display on results: Submitter supplied, NCBI generated.]

If you edit Options after performing an analysis you mu
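The P-value adjustment methods listed in the Options tab have the same names as the methods of the base R function `p.adjust()`. The sketch below is not GEO2R code: it applies each method to a small vector of made-up p-values, only to show how strongly the choice of correction changes the number of probes called significant.

```r
# Made-up p-values standing in for per-probe test results (illustrative only).
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
       0.0278, 0.0298, 0.0344, 0.0459, 0.3240)

# Correction methods of stats::p.adjust(), in the order of the Options tab:
# Benjamini & Hochberg, Benjamini & Yekutieli, Bonferroni, Hochberg, Holm, Hommel, None.
methods <- c("BH", "BY", "bonferroni", "hochberg", "holm", "hommel", "none")

# One column of adjusted p-values per method.
adjusted <- sapply(methods, function(m) p.adjust(p, method = m))
rownames(adjusted) <- format(p, scientific = FALSE)
print(round(adjusted, 4))

# Number of probes below 0.05 under each correction.
n_sig <- colSums(adjusted < 0.05)
print(n_sig)
```

Benjamini & Hochberg controls the false discovery rate and is less conservative than Bonferroni, which is one reason it is a common default for microarray data.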
3. [Figure: PCA plot showing the distribution of the samples across principal components (PC1 on the horizontal axis), followed by a hierarchical clustering dendrogram of the 11 CEL files GSM160089 to GSM160096 and GSM160098 to GSM160100, computed with as.dist(1 - cor(exprs(eset), method = "pearson")) and hclust "complete".]

Define design

In this part we define the contrasts and start the differential analysis. Note that complex designs are possible here by defining metagroups next to the sample groups (e.g. mouse background, stimulant). This is not needed for the very simple design used here, which compares two tissues. Please refer to the software documentation for more detailed examples.

[Screenshot: RobiNA design step listing the CEL files under /Users/splaisan/Projects/BITS_TUTORIALS/BITS_tutorials/work/Analysis_of_public_microarray_datasets, with an "Add selected" button.] In this step you can arrange your data files in groups, e.g. representing ... choose which groups are to be com
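The clustering caption above uses a correlation-based distance between samples followed by complete-linkage clustering. A minimal sketch of that calculation in base R is given here on a simulated matrix; the sample names, the matrix `expr` and the size of the simulated tissue effect are all assumptions made for illustration, and a real analysis would take the matrix from `exprs(eset)`.

```r
set.seed(1)

# Simulated log2 expression matrix: 200 probes x 6 samples (hypothetical names).
n_probes <- 200
samples  <- c(paste0("heart_", 1:3), paste0("diaphragm_", 1:3))
baseline <- rnorm(n_probes, mean = 8, sd = 2)   # probe-specific expression level
expr <- baseline + matrix(rnorm(n_probes * 6, sd = 0.3), nrow = n_probes,
                          dimnames = list(NULL, samples))

# Simulated tissue effect: the first 50 probes are higher in the second group.
expr[1:50, 4:6] <- expr[1:50, 4:6] + 3

# Distance between samples: 1 - Pearson correlation of their profiles.
d <- as.dist(1 - cor(expr, method = "pearson"))

# Complete-linkage hierarchical clustering, then a cut into two groups.
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 2)
print(groups)

# plot(hc) draws the dendrogram in an interactive session.
```

Samples that cluster with the wrong group in such a dendrogram are candidates for closer inspection before the differential analysis.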
4. found that some computers or operating systems are refractory to RobiNA: even with 6 GB RAM, a machine may have issues running RobiNA with as few as 10 CEL files. RobiNA requires a recent version of the Java JDK, which you can obtain from [1] http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html. The RobiNA developers do not actively support the software right now, and you will need to try things by yourself if you have issues.

The CDF annotation file

RobiNA needs a CDF file to work. CDF files are complex text tables that identify the probes and genes associated with the chip spots; such files are sometimes difficult to find. RobiNA was developed by a plant research group, has plants as its main focus and does not include non-plant annotations. Those who want to analyze other organisms should obtain the microarray annotation file corresponding to the chip used before starting RobiNA. Affymetrix CDF files should be available from the company website, but it is often difficult to locate the CDF among the long lists of available Affymetrix resources (http://www.affymetrix.com/estore, free registration required). The easiest way to obtain the correct Affymetrix CDF file is probably to use the Affymetrix Expression Console. Note that the Affymetrix Expression Console software is only available for Windows, and downloading using it requires a free user regist
5. [Screenshot: TranscriptomeBrowser heatmap annotation panel, with Alpk2, Atic and March2 among the genes shown, the enriched terms cytoplasmic part and organelle membrane, and the buttons Save, Save annotation, Export heatmap data.]

Other buttons and tabs allow inspecting the details of any particular view. Genes can be sorted alphabetically or by clustering order, and a given gene can be found using the search box.

Conclusion

And there is even more for Geeks

The data mining package RTools4TB can perform calls to the TranscriptomeBrowser web service and implements the DBF-MCL algorithm. The R package RTools4TB (http://www.bioconductor.org/packages/2.5/bioc/html/RTools4TB.html) is now part of the Bioconductor project.

References

1. http://tagc.univ-mrs.fr/tbrowser
Cyrille Lepoivre, Aurélie Bergon, Fabrice Lopez, Narayanan B Perumal, Catherine Nguyen, Jean Imbert, Denis Puthier. TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks. BMC Bioinformatics 2012, 13:19. PubMed 22292669.
Fabrice Lopez, Julien Textoris, Aurélie Bergon, Gilles Didier, Elisabeth Remy, Samuel Granjeaud, Jean Imbert, Catherine Nguyen, Denis Puthier. TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE 2008, 3(12):e4001. PubMed 19104654.
2. http w
6. [Screenshot: Transcriptome Analysis Console result table, continued; visible genes include Camk2b, Mybpc2 and Ampd1.]

The count of UR and DR genes is reported in the summary page.

[Screenshot: TAC summary page for GSE6943_CAT RMA.tac (RAE230A Analysis Result), with tabs Summary, Table, Scatter Plot, Volcano Plot, Chromosome Summary, Hierarchical Clustering. Heart vs Diaphragm; Analysis Type: Gene Level Differential Expression Analysis; Array Type: RAE230A; Genome Version: rn5; Annotation File: RAE230A.na34.annot.csv.] 1 Total number
7. 26 August 2014 at 14:09. This page has been accessed 150 times. Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

PubMA Exercise 6

From BioWareWIKI

Full analysis workflow using CLC Main Workbench

[Figure: workflow overview: build experiment, compute groupwise DE, limits filtering, functional enrichment.]

The following content was taken directly from the current CLC documentation (Feb 17, 2014) and applies only to people with access to the CLC Main Workbench.

Contents
1 CLC Tutorial material
1.1 Loading microarray data into the Workbench
1.2 Loading your own Affymetrix microarray data into the Workbench
1.3 Main results and specific settings
1.4 Enrichment analysis in CLC Main
1.4.1 Hypergeometric test
1.4.2 GSEA
2 Published results
3 Download exercise files

CLC Tutorial material

Users from VIB labs, or people having a license for the CLC Main Workbench, can proceed with this exercise during the afternoon open session or later from home. The final CLC project folder can be downloaded from the BITS server (Heart vs Diaphragm.zip) and imported in the user's CLC project manager. The workflow is not further discussed here and we invit
8. [Screenshot: GEO2R top-table rows with columns ID, adj.P.Val, P.Value, t, B, logFC, Gene.symbol, Gene.title; visible genes include Myoz1 (myozenin 1), Plcd4 (phospholipase C), Tesc (tescalcin), Tnni3 (troponin I type 3), Lrrc4b, H19, cysteine-rich protein 2, Car3 (carbonic anhydrase 3), Ehd4 and Nebl (nebulette). Below the table: tabs Options, Profile graph, R script; the note "Log transformation has been applied to the data. You can change this in the Options tab."; buttons Save all results, Select columns.]

The limma analysis results in a list of the 250 transcripts with the lowest p-values, ranked by increasing p-value. The results table contains the following columns:

adj.P.Val: p-value after correction for multiple testing. This column is the statistic you should use for interpreting the results. Genes with the smallest adjusted p-values are the most reliable. Selecting all transcripts with adjusted p-values < 0.05 is equivalent to setting the False Discovery Rate (FDR) to 0.05, meaning that about 5% of the selected DE genes are expected to be false positives. As you can see GEO2R alway
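The selection rule described above (adjusted p-value below 0.05) can be sketched in a few lines of base R. The table below is a made-up stand-in for a limma result: the probe names and values are hypothetical, and only the column names follow the GEO2R output. The adjusted column is recomputed with the Benjamini & Hochberg method to show where it comes from.

```r
# Hypothetical excerpt of a result table; the values are made up for illustration.
tT <- data.frame(
  ID      = c("probe_a", "probe_b", "probe_c", "probe_d", "probe_e"),
  P.Value = c(1e-6, 4e-4, 0.012, 0.03, 0.40),
  logFC   = c(5.2, -3.1, 1.4, -0.8, 0.1),
  stringsAsFactors = FALSE
)

# Benjamini & Hochberg adjustment of the raw p-values.
tT$adj.P.Val <- p.adjust(tT$P.Value, method = "BH")

# Keep the transcripts with an adjusted p-value below 0.05, most significant first.
de <- tT[tT$adj.P.Val < 0.05, ]
de <- de[order(de$adj.P.Val), ]
print(de)

# With the FDR set to 0.05, roughly this many of the selected rows
# are expected to be false positives.
expected_false_positives <- 0.05 * nrow(de)
print(expected_false_positives)
```

Note that the adjustment depends on how many tests were performed, so it should be computed on the full table and not on a pre-filtered subset.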
9. Analysis of public microarray datasets

Hands-on Analysis of public microarray datasets

From BioWareWIKI

Date: October 17, 2014, from 9h30 to 17h00

Hands-On Series: Analysis of public microarray datasets

Contents
1 Introduction
1.1 Summary
1.2 Required skills
1.3 Morning Session
1.4 Afternoon Session
1.5 More Info
2 Exercises
3 Additional resources
3.1 Additional tutorials
3.2 Web services and resources
3.3 Meta-Analysis Resources
3.4 Commercial resources licensed by VIB
3.5 Do you still need MORE
10. [Screenshot: RobiNA QC result list, Step 3 of 4 (buttons Previous, Next). Entries: pairwise scatter plots between the CEL files GSM160096, GSM160098, GSM160099 and GSM160100 in /home/splaisan/Desktop/Robi...ults/GSE6943_CEL, "PCA Plot of 11 Affymetrix data files" and "HCLUST Plot of 11 Affymetrix data files".]

Evaluating QC results

One plot of each kind is reproduced below. RobiNA generates the classical QC plots, which show how good the data are and how well the samples separate according to the defined groups.

Details for each QC plot type

Boxplot of 11 input files GSM160089 CEL gt LFC1 0 17 1
11. third party tools

PubMA Exercise 1: Search GEO to find public datasets related to one's project
PubMA Exercise 2: Compute differential analysis using GEO2R within the NCBI web portal and follow up in RStudio
PubMA Exercise 2b: Optional follow-up analysis demo in RStudio
PubMA Exercise 3: Clustering using the GEO DataSet browser (only for data with an attached GDS ID)
PubMA Exercise 4: Full RobiNA analysis as a standalone desktop alternative to GEO2R and Bioconductor
PubMA Exercise 5: Web tools for functional enrichment of the obtained lists, to identify key biological functions
Analyze_GEO_data_with_the_Affymetrix_software: Optional exercise using the Affymetrix Transcriptome Analysis Console (Windows only, free program)
Normalize CEL files with RMAExpress (Windows and MacOSX, free program)
PubMA Exercise 6: Optional exercise using the fully integrated CLC Main workbench (for VIB users and CLC license owners)
PubMA_Exercise 7: Optional IPA analysis of the GEO2R DE table (for VIB users and IPA license owners)

Additional resources

Additional tutorials

Find Transcriptome Signatures with TranscriptomeBrowser: Alternative option to get enrichment from the GEO data, and much more
Analyze your own microarray data in R/Bioconductor: See how to analyze your own microarray data in R/Bioconductor

Web services and resources

Only a few of the following resources will be used during this training.

GEO2R: www.ncbi.nlm.nih.gov/geo/geo2r
12. [Figure: scatter plot of the Heart and Diaphragm signals, axis ticks from 4 to 13.]

Volcano plots are very popular: they show how statistically confident the differences are and how many genes deviate from the steady state.

[Figure: volcano plot with significance (p-values from 0.1 down to 1E-20) on the vertical axis and fold change (from 512-fold down to 512-fold up) on the horizontal axis.]

The interactive nature of the plot allows identifying outliers or significantly DE genes using the mouse.

[Screenshot: TAC window (Analysis_1.tac, RAE230A Analysis Result) with tabs Scatter Plot, Volcano Plot, Chromosome Summary, Hierarchical Clustering; comparison Heart vs Diaphragm; controls Search, Prev, Next, Show/Hide Columns, Export, Up in Heart vs Diaphragm, Down in Heart vs Diaphragm, Show Filtered Only, Clear Current Filter(s), Reset to Default.] Customize An
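A volcano plot places an effect size on the horizontal axis and a significance measure on the vertical axis. The sketch below builds the two coordinates from simulated values in base R; the data, the thresholds (|log2 fold change| of at least 1, p below 0.05) and the colours are assumptions for illustration and are not the settings used by the Affymetrix software.

```r
set.seed(42)

# Simulated results for 500 probes (illustrative only).
n     <- 500
logFC <- rnorm(n, sd = 1.5)            # log2 fold changes
p     <- runif(n)^(1 + abs(logFC))     # toy p-values: larger effects get smaller p

# Volcano coordinates: effect size against -log10(p).
volcano <- data.frame(logFC = logFC, neg_log10_p = -log10(p))

# Classify each probe with the assumed thresholds.
volcano$status <- "not significant"
volcano$status[p < 0.05 & logFC >=  1] <- "up"
volcano$status[p < 0.05 & logFC <= -1] <- "down"
print(table(volcano$status))

# Draw the plot only in an interactive session.
if (interactive()) {
  cols <- c("not significant" = "grey60", up = "red", down = "blue")
  plot(volcano$logFC, volcano$neg_log10_p, pch = 20, col = cols[volcano$status],
       xlab = "log2 fold change", ylab = "-log10(p-value)")
}
```

Points in the upper left and upper right corners combine a large change with a small p-value, which is why these regions are usually the first to be inspected.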
13. [Screenshot: TAC result table, Heart vs Diaphragm, continued. Each row gives a probe set ID (e.g. 1367962_at), a UniGene ID, an average signal, the fold change, the ANOVA p-value, the FDR p-value, the gene symbol and a description; visible genes include Actn3 (actinin), Mybpc1 (myosin), Pvalb (parvalbumin) and Tnnc2 (troponin). The Export dialog offers "All information" and, under "Gene Symbols to Export", "1st gene symbol only" or all gene symbols.]
14. [GEO2R result table, columns ID, adj.P.Val, P.Value, t, B, logFC, Gene.symbol, Gene.title. Rows shown: Tnni1 (troponin I type 1, skeletal, slow); 1373697_at Mybpc2 (myosin binding protein C, fast type); 1398306_at Ampd1 (adenosine monophosphate deaminase 1); 1374672_at Tnni3k (TNNI3 interacting kinase); 1367962_at Actn3 (actinin alpha 3); 1367964_at Tnni2 (troponin I type 2, skeletal, fast); 1376227_at Myoz1 (myozenin 1); 1387065_at Plcd4 (phospholipase C, delta 4); 1384202_at Tesc (tescalcin); 1386931_at Tnni3 (troponin I type 3, cardiac); 1384178_at Lrrc4b (leucine rich repeat containing 4B); H19 (H19, imprinted maternally expressed transcript, non-protein coding); 1367604_at (cysteine-rich protein 2); 1367896_at Car3 (carbonic anhydrase 3).]
15. [GEO2R result table, continued. Rows shown: Ehd4 (EH-domain containing 4); 1389532_at Nebl (nebulette); Trdn (triadin); 1370157_at Pln (phospholamban); 1386873_at Tnni1 (troponin I type 1, skeletal, slow).]

If you wish to upload this table to Ingenuity Pathway Analysis (IPA), consider opening it first in Microsoft Excel and saving it back as a .xls file. This removes the double quotes around the fields and allows better recognition of your data by IPA.

Saving the R script for further use in RStudio

This is the last step of this tutorial and the first step of the follow-up page PubMA Exercise 2b, where we will use the R script to perform the GEO2R analysis on our own computer and prepare for more advanced microarray analyses.

[Screenshot: GEO2R tabs: GEO2R, Value distribution, Options, Profile graph, R script.]

    # Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
    # R scripts generated Tue Aug 12 05:30:54 EDT 2014
    # Differential expression analysis with limma
    library(Biobase)
    library(GEOquery)
    library(limma)
    # load series and platform data from GEO
    gset <- getGEO("GSE6943", GSEMatrix=TRUE)
    if (length(gset) > 1) idx l
16. [Screenshot: IPA dataset preview, continued (rows for 1371339_at, 1373697_at, 1398306_at, 1374672_at, 1367962_at). Annotated dataset geo2r_DE table: Mapped IDs 13169, Unmapped IDs 2754, All IDs 15923.]

Start core analysis and set filter

[Screenshot: IPA Set Cutoffs dialog. Expression Value Type: False Discovery Rate (q-value), cutoff 0.01, range 0.0 to 1.0; Log Ratio, cutoff 2, range -11.5 to 11.0; Focus On: Both Up/Downregulated; 368 analysis-ready molecules across observations. When IDs map to the same gene, protein or other molecule: Apply cutoffs before consolidating IDs: Yes (recommended).] ADVANCED Set Cutoffs Resolve duplicates using Exp Value Lo
17. [Screenshot: result table rows giving the probe set ID, UniGene ID, average signals, standard deviations, fold change, p-value, FDR p-value, gene symbol, description and chromosomal location. Genes shown: Pln (phospholamban, chr20), Lrrc10 (leucine-rich repeat-containing protein 10, chr7), Kcnip2 (Kv channel-interacting protein 2, chr1), Tesc (tescalcin, chr12), Fhl2 (four and a half LIM domains 2, chr9), Tnni3k (TNNI3 interacting kinase, chr2), Spink8 (serine peptidase inhibitor, Kazal type 8, chr8), Nppa (natriuretic peptide, chr5), Dsp (desmoplakin, chr17).]
18. Hands-on Analysis of public microarray datasets

Search public GEO datasets, identify specific transcriptome signatures and perform functional enrichment on the found sets.

Contents
1 Introduction
2 Walk-through example
2.1 Load the GSE6943 dataset
2.2 Find Transcriptome Signatures
2.3 Show HeatMaps for each TS
3 Conclusion
3.1 And there is even more for Geeks

Introduction

TranscriptomeBrowser (TBrowser [1]) hosts a large database of transcriptional signatures (TS) extracted from the GEO public microarray repository [2]. The TS have been produced using a new algorithm called Density Based Filtering And Markov Clustering (DBF-MCL). The current database contains about 30,000 TS derived from 4,000 microarray datasets (222 million expression values). Each TS was tested for functional enrichment using annotation obtained from numerous ontologies or annotation databases (Gene Ontology, KEGG, BioCarta, Swiss-Prot, BBID, SMART, NIH Genetic Association DB, COG/KOG, TargetScan, PicTar, TFBS conserved, MSigDB, GeneSigDB). TBrowser comes with a sophisticated search engine so that users can perform combined queries using boolean operators.

VERSION 3.0: TranscriptomeBrowser hosts a large database of transcriptional signatures (TS, n = 40,000) extracted from Gene Expression Omnibus (4,000 experiments) using the DBF-MCL algorithm. TBrowser comes with a sophisticated search engine so that users can search for the biological contexts
19. Introduction

Summary

This basic training will give you an overview of what GEO has to offer. Several experiments will be analyzed using simple tools to obtain differential gene lists. An introduction to downstream tools dedicated to functional enrichment will close the session.

Required skills

This training is meant for biologists with little or no data of their own who need to identify genes of interest associated with a given biological problem. Participants do not need any prior knowledge of programming.

Morning Session

Find relevant data on GEO
Analyze using the NCBI GEO2R utility
Continue the GEO2R analysis in RStudio (intro)
Find clusters using the NCBI GEO DataSets browser
Analyze the same data using RobiNA
Perform functional enrichment on the DE gene lists using public tools

Afternoon Session

Users search GEO datasets and analyze them with the tools discovered during the morning session
Users with access to CLC Main can follow the CLC tutorial (VIB only)
Users with access to IPA can follow the IPA tutorial (VIB only)

More Info

More on the VIB website [2]
Related VIB training sessions [3]
Related BITS Website pages [4]

Exercises

[Figure: workflow overview: find datasets in the Gene Expression Omnibus, GEO2R analysis, DataSet browser, follow-up, functional enrichment.]

You will find in this section the exercises performed during the session
20. Run Analysis is clicked to compute the differential expression between the two groups.

[Screenshot: Affymetrix Transcriptome Analysis Console 2.0, Gene Level Differential Expression Analysis (New Analysis, Open Existing Result, Preferences; Import Data, Remove Selected, Parse File Names, Show Grouped Files; columns Name, Array Type, File Type, File Path). Condition Heart: GSM160089.rma to GSM160094.rma. Condition Diaphragm: GSM160095.rma, GSM160096.rma, GSM160098.rma, GSM160099.rma, GSM160100.rma. Click to Create New Condition. Analysis File: \\psf\Home\Documents\TAC\AnalysisResults\Analysis_2.tac; buttons Browse, Run Analysis.]

Other expression analyses can be performed when the probe type is compatible with transcript-level analysis (discerning between alternative transcripts). This is not demonstrated here, and we only provide the example of gene-level analysis.

The summary of a standard DE analysis is shown with the counts of up-regulated (UR) and down-regulated (DR) genes under standard filtering values (more than two-fold difference between the groups and adjusted p-value < 0.05).

[Screenshot: TAC summary for GSE6943_CAT RMA.tac (RAE230A Analysis Result), tabs Summary, Table, Scatter Plot, Volcano Plot, Chromosome Summary, Hierarchical Clustering; Up in Heart vs Diaphragm, Down in Heart vs Diaphragm.] Heart vs Diaphragm Analysis Type Gene Level Differential Expression Analysis Array Ty
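The filtering rule quoted above (more than two-fold difference and adjusted p-value below 0.05) can be reproduced outside the Affymetrix software. The sketch below uses a made-up table of mean log2 signals; the gene names, the values and the signed fold-change convention (negative values for genes lower in heart) are assumptions for illustration.

```r
# Hypothetical group means (log2 signal) and adjusted p-values for six genes.
res <- data.frame(
  gene      = c("g1", "g2", "g3", "g4", "g5", "g6"),
  heart     = c(13.1, 6.2, 9.0, 7.5, 10.4, 8.0),
  diaphragm = c(6.7, 11.3, 8.6, 9.1, 10.2, 6.5),
  adj.p     = c(1e-9, 1e-8, 0.20, 0.001, 0.70, 0.08),
  stringsAsFactors = FALSE
)

# Difference of log2 means, then a signed linear fold change (heart vs diaphragm).
res$log2FC <- res$heart - res$diaphragm
res$fold   <- sign(res$log2FC) * 2^abs(res$log2FC)

# Standard filter: more than two-fold change and adjusted p-value < 0.05.
passes <- abs(res$fold) > 2 & res$adj.p < 0.05

n_up   <- sum(passes & res$fold > 0)   # up-regulated (UR) in heart
n_down <- sum(passes & res$fold < 0)   # down-regulated (DR) in heart
print(res[passes, c("gene", "fold", "adj.p")])
print(c(UR = n_up, DR = n_down))
```

In this toy table, g6 changes almost three-fold but is excluded by its adjusted p-value, which illustrates why both criteria are applied together.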
21. [Screenshots: Enrichr results for the gene list GSE6943_DE Robina (238 genes), shown in the Table, Grid and Network views ("Click the bars to sort. Now sorted by combined score"). Categories: Transcription, Pathways, Ontologies, Disease/Drugs, Cell Types, Misc. KEGG: terms shown include complement and coagulation cascades, glycolysis and gluconeogenesis, fructose and mannose metabolism, leukocyte transendothelial migration and several signaling pathways. GO Biological Process: GO:0006937, GO:0055008, GO:0007517, GO:0008016, GO:0006816, GO:0006942, GO:0015674, GO:0030048. Disease/Drugs: Up-regulated CMAP, Down-regulated CMAP, GeneSigDB, OMIM Disease.]
22. To use this cdf file, execute the following R commands:

    biocLite("ath1121501cdf")
    # load Bioconductor libraries
    library(ath1121501cdf)
    library(affy)
    # specify the path on your computer where the directory that contains the CEL files is located
    celpath <- "C:/Users/Janick/My Documents/R/win-library/2.14/affydata/celfiles"
    # import CEL files containing raw probe-level data
    data <- ReadAffy(celfile.path=celpath)

BrainArray provides a list of custom annotation packages: http://brainarray.mbni.med.umich.edu/bioc/bin/windows/contrib/3.0

To use these cdf files, download the zip file from the website and install it from the local zip file.

[Screenshot: R GUI (64-bit), Packages menu: Load package, Set CRAN mirror, Select repositories, Install package(s), Update packages, Install package(s) from local zip files.]

Then execute the following code:

    # load Bioconductor libraries
    library(affy)
    # specify the path on your computer where the directory that contains the CEL files is located
    celpath <- "D:/R-2.15.2/library/affydata/celfiles"
    # import CEL files containing raw probe-level data
    data <- ReadAffy(celfile.path=celpath)
    # indicate that you want to use the custom cdf
    # if you don't specify the cdfname, Bioconductor will use the default Affymetrix cdf
    data@cdfName <- "ATH1121501_At_TAIRT"

You can f
23. Claudia Mimoso, Ding-Dar Lee, Jiri Zavadil, Marjana Tomic-Canic, Miroslav Blumenberg. Analysis and meta-analysis of transcriptional profiling in human epidermis. Methods Mol Biol 2014, 1195:61-97. PubMed 24297317.
Małgorzata Janas-Kozik, Urszula Mazurek, Irena Krupka-Matuszczyk, Małgorzata Stachowicz, Joanna Głogowska-Ligus, Tadeusz Wilczok. The transcript expression profile of the leptin receptor-coding gene assayed with the oligonucleotide microarray technique: could this be an anorexia nervosa marker? Cell Mol Biol Lett 2006, 11(1):62-9. PubMed 16847749.
Yoseph Barash, Elinor Dehan, Meir Krupsky, Wilbur Franklin, Marc Geraci, Nir Friedman, Naftali Kaminski. Comparative analysis of algorithms for signal quantitation from oligonucleotide microarrays. Bioinformatics 2004, 20(6):839-46. PubMed 14751998.
2. http://rmaexpress.bmbolstad.com

Category: Howto

This page was last modified on 20 October 2014 at 10:08. This page has been accessed 20 times.

Find Transcriptome Signatures with TranscriptomeBrowser

From BioWareWIKI
24. las=2, col=fl)
    legend("topleft", labels, fill=palette(), bty="n")

Improved code

The next script was adapted to keep local files and to save the results to a file instead of sending them to a new browser window. Edits are shown where original lines are commented out with '#'.

destdir=base saves the downloads to the local folder instead of the temp directory
number=nrow(fit2) generates the full table instead of the top 250

We now show the edited script, where lines starting with '#' are modified in the next line(s).

    # added configuration
    base <- paste("PUBMA2014", "ex2b-files", sep="/")
    setwd(base)
    # Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
    # R scripts generated Tue Aug 12 05:30:54 EDT 2014
    # Differential expression analysis with limma
    library(Biobase)
    library(GEOquery)
    library(limma)
    # load series and platform data from GEO
    # gset <- getGEO("GSE6943", GSEMatrix=TRUE)
    gset <- getGEO("GSE6943", GSEMatrix=TRUE, destdir=base)
    if (length(gset) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
    gset <- gset[[idx]]
    # make proper column names to match toptable
    fvarLabels(gset) <- make.names(fvarLabels(gset))
    # group names for all samples
    sml <- c("G0","G0","G0","G0","G0","G0","G1","G1","G1","G1","G1","G1")
    # log2 transform
    ex <- exprs(gset)
    qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T
25. LogC <- (qx[5] > 100) ||
        (qx[6]-qx[1] > 50 && qx[2] > 0) ||
        (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
    if (LogC) { ex[which(ex <= 0)] <- NaN
      exprs(gset) <- log2(ex) }
    # set up the data and proceed with analysis
    fl <- as.factor(sml)
    gset$description <- fl
    design <- model.matrix(~ description + 0, gset)
    colnames(design) <- levels(fl)
    fit <- lmFit(gset, design)
    cont.matrix <- makeContrasts(G1-G0, levels=design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    fit2 <- eBayes(fit2, 0.01)
    # tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)
    tT <- topTable(fit2, adjust="fdr", sort.by="B", number=nrow(fit2))
    # load NCBI platform annotation
    gpl <- annotation(gset)
    # platf <- getGEO(gpl, AnnotGPL=TRUE)
    platf <- getGEO(gpl, AnnotGPL=TRUE, destdir=base)
    ncbifd <- data.frame(attr(dataTable(platf), "table"))
    # replace original platform annotation
    tT <- tT[setdiff(colnames(tT), setdiff(fvarLabels(gset), "ID"))]
    tT <- merge(tT, ncbifd, by="ID")
    tT <- tT[order(tT$P.Value), ]  # restore correct order
    # tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
    tT.final <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
    # write.table(tT.final, file=stdout(), row.names=F, sep="\t")
    write.table(tT.fi
26. //analysis.ingenuity.com/pa/j_spring_cas_security_check&originalUrl=https://analysis.ingenuity.com/pa?utm_source=Ingenuity&utm_medium=

Please keep in mind that IPA is only meant for human, mouse and rat data.

IPA Tutorial material

The starting material for the IPA upload can be downloaded from this link (http://data.bits.vib.be/pub/trainingen/PUBMA2014/ex7-files/geo2r_DE-table.xls) or from the bottom of this page.

IPA analysis

Upload data in IPA

[Screenshot: IPA Dataset Upload dialog for the DE table .xls file. 1. Select File Format: Flexible Format. 2. Contains Column Header: Yes. 3. Select Identifier Type: Affymetrix (specify the identifier type found in the dataset). 4. Array platform used for experiments: Rat Expression Set 230 A (select the relevant array platform as a reference set for data analysis). 5. Use the dropdown menus to specify the columns that contain identifiers and observations; for observations, select the appropriate expression value type. Tabs: Raw Data (15924), Dataset Summary (13169). Column assignment: ID, then False Discovery Rate and Log Ratio for Observation 1; the other columns are ignored. Preview rows: 1388876_at, 1374248_at, 1374622_at, 1370033_at.]
27. [Screenshot: TAC results table for Heart vs Diaphragm; visible rows include probe sets 1386873_at, 1369375_a_at, 1398251_a_at, 1374659_at, 1373697_at, 1398306_at, 1370359_at, 1398655_at, 1381575_at and 1374049_at, each with fold change, ANOVA p-value, FDR p-value and a truncated gene description.] [Screenshot: signal detail for Condition Heart, File GSM160093, ID 1388044_at, Pfkfb2.] Additional columns can be added to the table if the user needs them (Show/Hide Columns): Transcript Cluster ID, Transcript ID (Array Design), Heart Bi-weight Avg Signal (log2), Diaphragm Bi-weight Avg Signal (log2), Heart Standard Deviation, Diaphragm Standard Deviation, Fold Change (linear) (Heart vs Diaphragm), ANOVA p-value (Heart vs Diaphragm), FDR p-value (Heart vs Diaphragm), Gene Symbol, Description.
28. (table continued; the first row and the last column are cut off in the source)
…48475646698994  9.4716451160078  9.59635427766681  9.65872553515732  9.63503499785589
1367453_at  10.1553644909771  10.2499020063438  10.1861859768652  10.0965629262007  10.159434523851  10.2076070314483  10.2269807954293  10.195253118062
1367454_at  8.73905780465407  9.08252980064849  9.07161018095027  8.68762900735943  8.81499836330383  8.83806850836661  8.88654295440762  9.05310137910709
1367455_at  10.4763615418993  10.4471239467834  10.5710504733759  10.3264804870339  10.4745349426514  10.5552841089126  10.5959387062089  10.5139881779964
1367456_at  10.9984593902943  10.8441477983143  10.673570257498  10.6700249981427  10.6750331606405  10.7577588683534  10.1986166073672  10.3974448991471
1367457_at  8.78904682538038  8.72229122394877  8.67050724938203  8.9448960477874  8.72229122394877  8.66089835690261  8.76340465790531  8.76513510798621
1367458_at  7.53098410601309  7.35718220143744  7.71709223686958  7.65558449687549  7.45098048336888  7.52823368962808  7.7461437721208  7.40572408979464
1367459_at  11.1105760949228  11.2159762048115  10.9101244239086  11.149432063699  10.8953374468102  10.970737308597  11.2355512291063  11.2441035772632
1367460_at  9.96660388228079  9.84511446842093  9.94477628710098  9.91078389672863  9.84447947912025  9.94182586703467  9.87080073408934  9.70116063385167
Adding annotations to the RobiNA data
Because RobiNA does not support non-plant organisms, it saves the data in a quite anonymous way, with only probe IDs. The code below is borrowed from th
29. [Screenshot: annotation column selection with Check/Uncheck All: Gene Ontology Biological Process, Gene Ontology Cellular Component, Gene Ontology Molecular Function, Pathway, InterPro, Trans Membrane, Annotation Description, Annotation Transcript Cluster.] A plot of differential expression per chromosome may highlight local regulatory biases (hot-spot loci). [Figure: Chromosome Summary plot.] Heatmaps can be generated that show genes with similar patterns of variation across samples. [Screenshot: TAC window for Analysis_1 tac (RAE230A) with tabs Analysis Result, Summary, Scatter Plot, Volcano Plot, Chromosome Summary and Hierarchical Clustering; comparison Heart vs Diaphragm.]
30. [Column list, continued: Chromosome, Genomic Position, Strand, Comment.] After download to txt files, results can easily be converted and filtered in the Excel spreadsheet editor. [Screenshot: Excel view of the exported TAC table. Visible rows include Myh6 (myosin heavy chain 6, cardiac muscle, alpha; chr15), Ankrd1 (ankyrin repeat domain 1; chr1), Tnnt2 (troponin T type 2, cardiac; chr13), Pln (phospholamban; chr20), Fhl2 (four and a half LIM domains 2; chr9), Nppb (natriuretic peptide B; chr5), Mybpc3 (myosin binding protein C, cardiac; chr3) and Tnni3 (troponin I type 3, cardiac; chr1).]
31. [Screenshot: TAC Volcano Plot view for Heart vs Diaphragm with its results table (columns: ID, Bi-weight Avg Signal (log2), Fold Change (linear), ANOVA p-value, FDR p-value, Gene Symbol, Description). Buttons: Clear Selection, Show Filtered Only, Clear Current Filter(s), Reset to Default, Customize Annotations, View Interaction Network. Genes visible in the table include Car3, Sln, Mylpf, Myh1, Myh4, Myh2, Tnni2, Cacna1s, Mybpc2, Aqp4 and Myl1.]
32. [Screenshot: RobiNA annotation and settings dialog. Available mappings include Oryza sativa spp japonica, Solanum lycopersicum (tomato), Saccharum officinarum (sugarcane), Solanum tuberosum (potato), Triticum aestivum, Vitis vinifera and Zea mays (maize). Settings: P value correction, Multiple testing strategy nestedF, Write out normalized raw data, Preview R script, Log fold change min 1, p-value cutoff 0.05. Buttons: Download more mappings, Import new, Skip, Annotate, Create Metagroup, Delete Metagroup, Previous, Next.]
We choose Skip, as we do not have annotation files for rat. The differential expression gets computed. When done, we select Exit from the front window.
[Screenshot: RobiNA, the transcriptomics data preprocessor, Version 1.2.4_build656. Design your experiment: you can arrange the groups by dragging them around. Define which groups shall be compared by holding down the CONTROL key and then click-dragging from the first group to the second group.]
[Screenshot: Finished successfully. Results were written to Users splaisan Desktop test. Click Modify if you want to modify the design and re-run the analysis. Be sure to specify a different name for the output folde
33. PUBMA2014. This page was last modified on 16 October 2014 at 09:49. This page has been accessed 181 times. Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.
PubMA Exercise 5 (From BioWareWIKI)
Functional enrichment of the obtained lists to identify key biological functions
Contents
1 Why we are not yet done
2 Functional enrichment analysis of the RobiNA DE data
2.1 Preparing probe lists for enrichment testing
2.2 DAVID, father of enrichment tools
2.3 Current web-based enrichment tools
2.3.1 Enrichr
2.3.2 WebGestalt
3 Conclusion
4 Download exercise files
Why we are not yet done
Once upon a time, scientists worked for a lifetime on one or a few genes. They read everything that was published about their favorite proteins and did not need a computer to help them follow up and understand published data. That time belongs to the past: modern biologists must cope with a publication rate far higher than their reading speed and, happy or not, have to rely on computers for some of the tasks they used to do manually. Analysis of microarray (MA) data, like any other high-throughput technology, generates thousands of lines of results, out of which hundreds are statistically significant. It is therefore very unlikely that the
34. The RMA conversion is run with user-selected options, which activates new menu items: Read Unprocessed files, Add new CEL files, Write RME format, Compute RMA measure, Write Results to file (log scale), Write Results to file (natural scale), Export expression values.
[Screenshot: RMAExpress Output Log file. Welcome to RMAExpress, written by B. M. Bolstad <bmb@bmbolstad.com>, Version 1.1.0, http://rmaexpress.bmbolstad.com. Select CDF file: RAE230A.CDF (in work TUTORIALS Analysis of public microarray datasets ref). Select CEL files: GSM160089.CEL, GSM160090.CEL, GSM160091.CEL, GSM160092.CEL, GSM160093.CEL, GSM160094.CEL, GSM160095.CEL, GSM160096.CEL, GSM160098.CEL, GSM160099.CEL, GSM160100.CEL. Reading in data. Opening CDF and CEL files. Done reading in datafiles. Done writing binary output.]
[Screenshot: Choose Preprocessing Steps dialog, with normalization Quantile or None, summarization method Median Polish or PLM, and a Store Residuals option. Log-transformed results are saved for use with other programs.]
Plot normalized d
35. [Screenshot, continued: Excel view of the exported TAC table. Visible rows up in heart include Penk (proenkephalin), Kv channel interacting protein 2, Ndrg4 (NDRG family member 4), Fbxl22 (F-box and leucine-rich repeat protein 22), Csrp3 (cysteine and glycine-rich protein 3, cardiac LIM protein) and Rab3b (RAB3B, member RAS oncogene family). Visible rows up in diaphragm include Tnnc2 (troponin C type 2, fast), Myh1/Myh2/Myh8 (myosin heavy chain, skeletal muscle) and Tnnt3.]
36. biologist evaluates each line and identifies the proteins or gene products that are significantly altered in expression and may be responsible for the biology under investigation. The approach of recognizing genes in the list and selecting them for validation may seem appropriate, but it is unlikely to lead to any discoveries. As the main requirement for publication is novelty, this method is of little use. A better way to analyze and prioritize targets from a screen is to consider the biological functions and pathways that are enriched in differentially expressed genes. This can be done after adding ontology annotations to the data and using these added columns to identify functions, diseases, pathways, or any ontology terms that are enriched in the DE set as compared to the full set of genes measured by the platform. Again, this apparently straightforward statistical testing can be quite lengthy if you consider the hundreds of available ontologies and the hundreds to thousands of genes to annotate. The hypergeometric test and Gene Set Enrichment Analysis (GSEA) are the two most widely used statistical approaches to identify enrichment based on gene lists. A number of standalone and Web tools implement these methods, and it falls beyond the scope of this training to list them all or to argue for one or another. We instead present a few alternative tools that will accept the data obtained in the former exercises and process it to find enriched ontolog
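To make the hypergeometric approach mentioned above concrete, here is a minimal sketch in plain Python. It computes the probability of seeing at least a given number of annotated genes in a DE list drawn from the platform. The function name and the numbers in the example call are hypothetical, and real tools additionally correct for testing many terms.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, hits):
    """One-sided p-value P(X >= hits) for a hypergeometric draw.

    universe  -- number of genes measured on the platform (background)
    annotated -- number of background genes carrying the term
    selected  -- number of genes in the DE list
    hits      -- number of DE genes carrying the term
    """
    upper = min(annotated, selected)
    total = comb(universe, selected)
    # Sum the probabilities of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 1000 background genes, 50 carry the term,
# 100 genes are DE and 15 of those carry the term.
p_value = hypergeom_enrichment_p(universe=1000, annotated=50, selected=100, hits=15)
print(f"enrichment p-value: {p_value:.3g}")
```

With 5 hits expected by chance in this hypothetical setting, 15 observed hits give a small p-value; the same call with hits=0 returns 1.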
37. bits vib be pub trainingen AffyECTAC2014 GSE6943 EC mas5 qc PDF and PLIER method http data bits vib be pub trainingen AffyECTAC2014 GSE6943 EC plier qc PDF on our server. Users are welcome to evaluate each QC plot by themselves, using the data available on the server as input (see link at the bottom of this page).
The Affymetrix Transcriptome Analysis Console (TAC)
Importing EC data and defining Groups
[Screenshot: Affymetrix Transcriptome Analysis Console 2.0, New Analysis, Gene Level Differential Expression Analysis. The Open File panel lists CHP files; each condition is created with Click to Create New Condition and filled with Add File(s) Here. Each condition must have at least one file. The analysis file is saved under Documents TAC AnalysisResults Analysis_2 tac. Button: Run Analysis.]
Each group is in turn defined by moving CHP files to the appropriate group window. This is done for the Heart and for the Diaphragm samples.
[Screenshot: Affymetrix Transcriptome Analysis Console 2.0 with grouped files (Remove Selected, Parse File Names, Show Grouped Files; columns Array Type, File Type, File Path), starting with GSM160095 rma chp in C Users splaisan Documents Affy_data GSE6943_Affy analysis GSE6943_RMA.]
38. data files you want to include in your analysis by pressing the Add button. The Info button will provide some details for each selected data file. If you are working with a custom Affymetrix chip platform that is not yet supported by the Bioconductor project, you need to supply the appropriate CDF file upon data import.
[Screenshot: RobiNA import dialog. Import CDF file: RAE230A.CDF. Imported files (in Volumes trainingen PUBMA2014 ex4 files GSE6943_CEL): GSM160089.CEL, GSM160090.CEL, GSM160091.CEL, GSM160092.CEL, GSM160093.CEL, GSM160094.CEL, GSM160095.CEL, GSM160096.CEL, GSM160098.CEL, GSM160099.CEL, GSM160100.CEL. Button: Remove.]
Performing QC on each CEL file
When QC finishes, a number of single plots can be reviewed by clicking on each section. These figures report issues or demonstrate good quality of the data, and are also saved to disk for later inclusion in your reports.
Overview of all QC Plots Colla
39. [End of a gene list from the previous slide: ... domains 3; calcium channel, voltage-dependent, L type, alpha 1C subunit; four and a half LIM domains 2.]
Current web-based enrichment tools
We chose to show you only two recent tools in this section. Many others are available, and you are welcome to try them and share your experience with us.
Enrichr
Enrichr [3] and its associated tool Lists2Networks use a respectable number of sources to compute enrichment (source list: http amp pharm mssm edu Enrichr index html stats). To learn more about Enrichr, please refer to their FAQ page (http amp pharm mssm edu Enrichr help). Enrichr uses a list of Entrez gene symbols as input. Each symbol in the input must be on its own line. You can upload the list either by selecting the text file that contains it or by simply pasting the list into the text box. We will use the gene symbols extracted from the RobiNA file, cleaned to remove duplicates, and with double-ID lines (a probe set mapping to two distinct genes) expanded. This input file is available on the BITS server (link: http data bits vib be pub trainingen PUBMA2014 ex5 files RobiNA DE genes_LFC2 FDR0 001 txt) and its content can be used on the Enrichr submission page (http amp pharm mssm edu Enrichr index html). We present below the top part of 5 randomly selected output bar charts; please explore Enrichr to get much more than these. The charts in the online version are interactive and provide much more information than these pictures.
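The clean-up described above (one symbol per line, duplicates removed, double-ID lines expanded) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the " /// " separator for probe sets mapping to several genes is an assumption based on the usual Affymetrix annotation convention, not something stated in the exercise.

```python
def clean_gene_list(lines, separator="///"):
    """Return unique gene symbols, one per entry, in first-seen order.

    `separator` (assumed to be the Affymetrix '///' convention) splits
    entries where one probe set maps to several genes.
    """
    seen = set()
    genes = []
    for line in lines:
        for symbol in line.split(separator):
            symbol = symbol.strip()
            # Skip empty entries and symbols that were already collected.
            if symbol and symbol not in seen:
                seen.add(symbol)
                genes.append(symbol)
    return genes


# Small made-up input mimicking a gene symbol column.
raw_symbols = ["Myh6", "Ankrd1", "Myh1 /// Myh2", "Ankrd1", "", "Myh2"]
cleaned = clean_gene_list(raw_symbols)
print("\n".join(cleaned))
```

The printed output has one symbol per line, which is the layout the Enrichr text box expects.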
40. each TS we can visualize the actual expression data as a heatmap by using the corresponding plugin. [Screenshot: enrichment table with columns Keyword, Nb probes, Nb samples, Nb genes and Q value (BH). Keywords shown: contractile fiber, myofibril, contractile fiber part, muscle protein, MF00261 ACTIN BINDI..., MF00071 TRANSLATIO..., protein binding, ...00250 MUSCLE DEV..., ...173 MUSCLE CON..., BP00286 CELL STRUC..., skeletal muscle develo..., striated muscle develo...; the Q values shown range from 3.71998E-5 to 0.00607071. Summary: 4 signatures, 1 platform, 1 experiment.] [Screenshot: Plugins, Heatmap, Settings. First heatmap cut at 20 genes; annotation PANTHER_TERM_BP, Probes 13, Samples 15, keyword BP00005 GLYCOLYSIS; buttons Save heatmap, Save annotation, Export heatmap data. Sample GSM160089, Title Diaphragm 1.]
41. in several publications (e.g. [1]) and is available for Windows and for Mac OS X from its developer site [2]. A manual is also available here: http://rmaexpress.bmbolstad.com/RMAExpress_UsersGuide.pdf. The Affymetrix tools produce the same kind of data with more QC plots and are preferred if you want to have a close look at your data.
RMAExpress run with the GSE6943 data
In order to perform this exercise, please first install the software from the following link: http://rmaexpress.bmbolstad.com [2]. You will need the rat RAE230A CDF file, accessible from the link at the bottom of this page.
Convert CEL data
The user locates the CEL folder and the CDF file (obviously, all CEL files should come from the same chip type).
[Screenshot: RMAExpress Output Log file after Read Unprocessed files. Welcome to RMAExpress, written by B. M. Bolstad <bmb@bmbolstad.com>, Version 1.1.0, http://rmaexpress.bmbolstad.com. Select CDF file: RAE230A.CDF (in work TUTORIALS Analysis of public microarray datasets ref). Select CEL files: file chooser (Please Select your CEL files) listing GSM160089.CEL, GSM160090.CEL, GSM160091.CEL, GSM160092.CEL, GSM160093.CEL, GSM160094.CEL, GSM160095.CEL, GSM160096.CEL, GSM160098.CEL, GSM160099.CEL, GSM160100.CEL.]
42. process
Raw universal gene list size: 15923
Used universal gene list size (requiring annotation and one feature per gene only): 8603
Raw subset gene list size: 142
Used subset gene list size (requiring annotation): 73
Expression values used when filtering to one feature per gene: Transformed expression values
Applied filter to reduce features to one per gene: true
Filter applied to reduce features to one per gene: used feature with highest IQR
The next figure details the Top 30 hypergeometric results for the contrast Heart vs Diaphragm.
[Figure: table of enriched Gene Ontology terms with counts and p-values. Legible terms, in order: glycogen metabolic process; carbohydrate metabolic process; cellular calcium ion homeostasis; cardiac muscle contraction; skeletal muscle tissue development; positive regulation of sequestering of triglyceride; negative regulation of ATPase activity; skeletal muscle cell differentiation; cardiac myofibril assembly; cardiac muscle tissue morphogenesis; positive regulation of potassium ion transport; heart contraction; regulation of release of sequestered calcium]
43. [Screenshot: RobiNA quality check results list.] Quality check results: click in the list to open a full-size view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the Exclude box. The list contains: RNA Plot of 11 Affymetrix data files; HIST Plot of 11 Affymetrix data files; and scatter plots of GSM160089.CEL versus each of the other CEL files in the GSE6943_CEL folder (GSM160090.CEL, GSM160091.CEL, GSM160092.CEL, GSM160093.CEL, GSM160094.CEL, GSM160095.CEL, and so on).
44. [Screenshot: Windows Explorer view of the library folder (Documents Affy data AffyEC lib), with columns Name, Date modified, Type and Size. Files shown: AFFX_README NetAffx CSV Files (2014-09-02 08:45, Text Document, 5 KB); ATH1 121501 cdf (2014-10-10 03:35, 39,496 KB); ATH1 121501 mas5_configuration (2006-08-11 10:56, 6 KB); ATH1 121501 psi (2014-10-10 03:35, PSI File, 440 KB); ATH1 121501 splaisan_default report_controls (2014-10-10 03:35, 8 KB); ATH1 121501 splaisan_default report thresholds (2014-10-10 03:35, 7 KB).]
You can change this folder via Edit in the top menu: click Set library path.
[Screenshot: Expression Console menu bar (File, Edit, Report, Graph, Analysis, Tools, Export, Window). The Edit menu contains Change User Profile, Set library path, Set Internet Settings, 3' Expression Report Controls, Report Thresholds, Probe Level Summarizations, Report Options, Create Annotation Merge File, and Transform Signals (log or linear).]
Step by Step RobiNA analysis workflow
You can use RobiNA for:
quality assessment of your data
normalization of your microarray data
detection of differentially expressed genes
preparation of the data for an import into MapMan and/or Excel
generation of informative plots on your experiment
Pre-requisites are the CEL files containing each hybridization data, obtained from the GEO page as a tar arc
45. [Screenshot, continued: last rows of the Excel table, ending with probe set 1369706_at, Cacng1 (calcium channel, voltage-dependent, gamma subunit 1), chr10.]
Conclusion
The combination of the Affymetrix Expression Console (EC) and Transcriptome Analysis Console (TAC) allows Windows PC users without any knowledge of R to perform a standard analysis of Affymetrix microarray data and obtain differential expression tables that can be used for downstream biological interpretation. Note that other, more specific options and alternative analysis workflows are available with the same tools, and that this tutorial is only an introduction with a selection of basic methods. The main added value of these tools over R is the full range of QC plots they generate, which are classically produced by expert bioinformaticians, as well as the very rapid processing of public Affymetrix CEL data, within minutes. We therefore recommend exploring the EC and TAC tools and combining them with IPA and other downstream tools that allow biological evaluation of public microarray data.
YouTube videos from the Affymetrix training team
Please follow the video webcasts below to get familiar with the Affymetrix Expression Console and Transcriptome Analysis Console. A series of YouTube videos can be found on the Affymetrix web site.
Download exercise files
Download exercise files here.
References
1. http www affymetrix com estore browse level seven
46. [WebGestalt pathway table, continued]
(name cut off) | genes: …01509 29155 114495 117558 | C=80; O=5; E=1.76; R=2.83; rawP=0.0315; adjP=0.1496
G13 Signaling Pathway | 3 genes: 366624 56781 83708 | C=30; O=3; E=0.66; R=4.54; rawP=0.0277; adjP=0.1496
Glycolysis and Gluconeogenesis | 3 genes: 25438 114508 25058 | C=38; O=3; E=0.84; R=3.58; rawP=0.0507; adjP=0.1991
SIDS Susceptibility Pathways | 4 genes: 29253 25665 60449 689560 | C=64; O=4; E=1.41; R=2.83; rawP=0.0524; adjP=0.1991
Again, many more tables can be generated in WebGestalt, and you should choose the type of enrichment that fits your experimental needs. Data can be saved back to disk for further use.
Conclusion
A more complete analysis can be performed by those few who can program in R/Bioconductor. As this session is aimed at biologists, this option is not discussed further. Users can also consider commercial packages like Ingenuity Pathway Analysis, which provide much more detailed and richer information than free tools can offer.
Download exercise files
Download exercise files here.
References
1. http://david.abcc.ncifcrf.gov
2. http://www.nature.com/nprot/journal/v4/n1/abs/nprot.2008.211.html Da Wei Huang, Brad T Sherman, Richard A Lempicki. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009, 4(1):44-57. PubMed 19131956.
3. http://amp.pharm.mssm.edu/Enrichr Edward Y Chen, Christopher M Tan, Yan Kou, Qiaonan Duan, Zichen Wang, Gabriela Vaz Meirelles, Neil R Clark, Avi Ma'ayan. Enri
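The WebGestalt rows above can be sanity-checked with a little arithmetic. The sketch below assumes the usual WebGestalt meaning of the columns (C: reference genes in the category, O: observed genes from the list in the category, E: expected number, R: ratio of enrichment, i.e. O divided by E); that reading is an assumption, as the exercise does not spell it out. The values are copied from the table and the helper name is hypothetical.

```python
# (pathway, C, O, E, R) as reported in the WebGestalt table above.
ROWS = [
    ("G13 Signaling Pathway", 30, 3, 0.66, 4.54),
    ("Glycolysis and Gluconeogenesis", 38, 3, 0.84, 3.58),
    ("SIDS Susceptibility Pathways", 64, 4, 1.41, 2.83),
]


def enrichment_ratio(observed, expected):
    """Ratio of enrichment: observed count divided by expected count."""
    return observed / expected


# Recompute R from O and E; small differences come from E being rounded.
recomputed = {name: enrichment_ratio(o, e) for name, _c, o, e, _r in ROWS}
for name, _c, _o, _e, reported in ROWS:
    print(f"{name}: reported R={reported}, recomputed R={recomputed[name]:.2f}")
```

A ratio above 1 means the pathway holds more DE genes than expected by chance; whether that excess is significant is what rawP and adjP report, and here none of the adjusted values falls below 0.05.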
47. [Screenshot: Expression Console after the RMA run. Side panel: QC Array Metrics, QC Signal Distribution, QC Array Comparisons, Add Intensity Files, Run Analysis, Add Summarization Files, Remove, Refresh Attributes, Check All, Uncheck, Check Group, Export Results, Utilities. Several CHP files are flagged Outside Bounds. Log (02/09/2014 09:33:37): opening GSM160095, GSM160096, GSM160098, GSM160099 and GSM160100 rma chp files from C Users splaisan Documents Affy_data GSE6943_Affy analysis GSE6943_CEL; Done opening files. Library path: C Users splaisan Documents Affy_data AffyEC_lib. User Profile: splaisan_default.]
As seen above, several samples are reported outside bounds by the RMA workflow. This means that some control probe sets did not meet the quality requirements. We looked it up and saw that the sample prep control probe sets targeting the B. subtilis genes dap, thr, phe, and lys were not behaving as expected. Dap RNA is added in h
48. [Screenshot: Excel filter dialog on the adj.P.Val column (options: Ascending, Descending, By color, Select All, Clear Filter), with the criterion Less Than or Equal.] Filter on the adj.P.Val column with a maximum of 0.001. [Screenshot: the filtered spreadsheet rows, with cell 1371247_at selected.]
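For readers who prefer a script to the spreadsheet filter, the same selection can be sketched in Python. The adjusted P value cutoff of 0.001 comes from the step above; the additional absolute log fold change cutoff of 2 is inferred from the LFC2 part of the exported file names, and the column names (logFC, adj.P.Val) and the four data rows are hypothetical.

```python
import csv
import io

# Hypothetical tab-separated excerpt of a DE table; the values are made up.
SAMPLE_TABLE = (
    "ID\tlogFC\tadj.P.Val\n"
    "probe_a\t-4.11\t2.0e-17\n"
    "probe_b\t2.15\t5.0e-15\n"
    "probe_c\t1.20\t1.0e-06\n"
    "probe_d\t3.00\t0.02\n"
)


def filter_de(rows, max_adj_p=0.001, min_abs_logfc=2.0):
    """Keep rows passing both the adjusted p-value and the |logFC| cutoffs."""
    kept = []
    for row in rows:
        significant = float(row["adj.P.Val"]) <= max_adj_p
        large_change = abs(float(row["logFC"])) >= min_abs_logfc
        if significant and large_change:
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_TABLE), delimiter="\t"))
kept_ids = [row["ID"] for row in filter_de(rows)]
print(kept_ids)
```

Using the absolute value of logFC keeps both up- and down-regulated probes, which matches a two-sided selection in the spreadsheet.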
49. [Figures: RobiNA quality-control plots for the 11 CEL files (GSM160089.CEL to GSM160096.CEL and GSM160098.CEL to GSM160100.CEL). Per-sample intensity plots (one for each sample); RNA degradation plot (probe number vs intensity); density plot of log intensity; pairwise scatter plots, e.g. GSM160090 vs GSM160089 (one for each pairwise sample comparison); Screeplot; Principal Components Plot; Cluster Dendrogram.]
50. [Screenshot, continued: filtered Excel sheet GSE6943 DE Robina; 301 of 15923 records found.]
Export probes and gene lists to text files:
301 DE probes (http data bits vib be pub trainingen PPUBMA2014 ex5 files RobiNA DE probes LFC2 FDRO 001 txt)
238 DE genes, unique (http data bits vib be pub trainingen PPUBMA2014 ex5 files RobiNA DE genes LFC2 FDRO 001 txt)
all probes in the rat array (http data bits vib be pub trainingen PUBMA2014 ex5 files RobiN A all probes txt)
DAVID, father of enrichment tools
The canonical web tool is DAVID. DAVID has been around since 2003 and is still very popular, although its interface is quite outdated. A recent Nature Protocols paper will help you start using DAVID. In order to use DAVID, we need to split our data into two groups:
genes/probe IDs considered differentially expressed (TEST)
the remaining genes/probes present on the platform (BACKGROUND); note that the DAVID Background tab allows selecting the actual MA chip (Rat Genome RAE230A Array)
This division is very important to obtain good results and ensures that only functio
51. Although a former HowTo page (Analyze_GEO_data_with_GEO2R) is already present on the BITS Wiki, we repeat the analysis here with the same dataset used in the CLC Main Workbench, to be able to compare the results of free and commercial solutions. This work was published by van Lunteren E, Spiegler S, Moyer M [2]. Full details about this dataset can be found on the GEO page: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943
The GEO2R interface
The initial window shows several tabs that will be reviewed in the remainder of this tutorial.
[Screenshot: NCBI Gene Expression Omnibus, GEO2R page for GSE6943. Use GEO2R to compare two or more groups of Samples in order to identify genes that are differentially expressed across experimental conditions. Results are presented as a table of genes ordered by significance. GEO accession: GSE6943, Normal Heart vs Normal Diaphragm. Samples: Selected 0 out of 12 samples.]
Accession | Title | Source name
GSM160089 | Diaphragm 1 | Diaphragm
GSM160090 | Diaphragm 2 | Diaphragm
GSM160091 | Diaphragm 3 | Diaphragm
GSM160092 | Diaphragm 4 | Diaphragm
GSM160093 | Diaphragm 5 | Diaphragm
GSM160094 | Diaphragm 6 | Diaphragm
GSM160095 | Heart 1 | Heart left vent
GSM160096 | Heart 2 | Heart left vent
GSM160097 | Heart 3 | Heart left vent
GSM160098 | Heart 4 | Heart left vent
GSM160099 | Heart 5 | Heart le
52. [Screenshot: Transcriptome Analysis Console, RAE230A analysis, comparison Heart vs Diaphragm. Tabs: Analysis Result Summary, Scatter Plot, Volcano Plot, Chromosome Summary, Hierarchical Clustering. Gene rows: 2002; selected rows: 81. Up in Heart vs Diaphragm: 0; Down in Heart vs Diaphragm: 171. Visible table rows follow in extraction order; the two leading values appear to be the group average signals (log2), followed by the linear fold change, the ANOVA p-value, the FDR p-value, the gene symbol and a truncated description. Signs and some cells were lost in extraction.]
3.52  4.56  (lost)  0.029922  0.088971
4.87  5.90  (lost)  0.023069  0.073187
5.68  6.72  (lost)  0.034185  0.098780  Nup37  nucleoporin 37
6.51  7.55  2.06  0.002813  0.014364  Rgs12  regulator of G protein sig
7.72  8.76  2.06  0.000015  0.000266  Tex264  testis expressed 264
6.89  7.93  2.06  0.000589  0.004211  Lpcat1  lysophosphatidylcholine
9.65  10.69  2.06  0.000001  0.000042
7.51  8.55  2.06  0.000588  0.004211  Jmjd8  jumonji domain containi
6.94  7.98  2.06  0.000008  0.000171  Pced1b  PC esterase domain cont
6.32  7.37  2.06  0.000566  0.004095
6.97  8.01  2.06  9.07E-08  0.000006  Timm22  translocase of inner mito
10.43  11.47  2.06  0.000267  0.002293  Lama2  laminin alpha 2
7.31  8.35  2.06  0.002312  0.012309  RGD1563888  similar to DNA segment
6.21  7.25  2.06  0.048207  0.127986  Epha4  Eph receptor A4
53. [Continuation of the head of the RobiNA full table; columns: ID, logFC, AveExpr, t, P.Value, adj.P.Val, B. Signs were lost in extraction.]
(row start on another slide) ...42897453  100.528072562373  5.87151716284703e-21  2.01180346646277e-17  37.3935367235781
1370971_at  8.88483556476391  8.44782914943697  99.9889507276205  6.31728778013807e-21  2.01180346646277e-17  37.3386444895643
1367896_at  8.17319442370653  8.32878048386055  97.7761545656647  8.56617301567765e-21  2.27331954881059e-17  37.1083132290366
1374248_at  7.92028193041167  9.06490481778715  95.9358621573261  1.10936083087999e-20  2.52347893001459e-17  36.9103955684498
1368093_at  8.61265104186292  8.23846334693981  81.9875192287634  9.40341381711377e-20  1.87163197762378e-16  35.195893580546
1388139_at  8.60278366808529  8.70065810953839  80.0047250421511  1.31181945592154e-19  2.32090013295985e-16  34.9170369337463
Head of mean_rma_normalized_expression_values txt:
Identifier  Heart  Diaphragm
1367452_at  9.5293566110383  9.5889443412077
1367453_at  10.1758428259477  10.1726921462891
1367454_at  8.87231561088045  8.86593395245015
1367455_at  10.4751392501094  10.3325900097593
1367456_at  10.7698324122072  10.286304116399
1367457_at  8.75165515455833  8.73159768841635
1367458_at  7.54000953569876  7.71781966096578
1367459_at  11.0420305904582  11.1519497142209
1367460_at  9.90893064678104  9.5235636384544
Head of raw_rma_normalized_expression_values txt:
Identifier  GSM160089.CEL  GSM160090.CEL  GSM160091.CEL  GSM160092.CEL  GSM160093.CEL  GSM160094.CEL  GSM160095.CEL  GSM160096.CEL  ...
1367452_at  9.64295883656058  9.65178912787566  9.32863584112902  9
54. [Head of the full GDS data table. Column headers:] ...60095, GSM160096, GSM160097, GSM160098, GSM160099, GSM160100, Gene title, Gene symbol, Gene ID, UniGene title, UniGene symbol, UniGene ID, Nucleotide Title, GI, GenBank Accession, Platform_CLONEID, Platform_ORF, Platform_SPOTID, Chromosome location, Chromosome annotation, GO:Function, GO:Process, GO:Component, GO:Function ID, GO:Process ID, GO:Component ID.
Tissue row: diaphragm (6 samples), heart (6 samples).
1367452_at: 2532.9, 2518.6, 2384.6, 2304, 2360, 2482.8, 3166, 2938.9, 2953.3, 2558.8, 3043.3, 2711.5. Gene title: SMT3 suppressor of mif two 3 homolog 2 (S. cerevisiae). Gene symbol: Sumo2. Gene ID: 690244. Nucleotide title: Rattus norvegicus SMT3 suppressor of mif two 3 homolog 2 (S. cerevisiae) (Sumo2), mRNA. GI: 210147495. GenBank accession: NM_133594. Chromosome location: 10q32.3. Chromosome annotation: Chromosome 10, NC_005109.3 (104184388..104195497). GO:Function: SUMO ligase activity; protein binding; ubiquitin protein ligase binding; ubiquitin protein ligase binding. GO:Process: cellular protein localization; negative regulation of transcription, DNA-dependent; positive regulation of proteasomal ubiquitin-dependent protein catabolic process; positive regulation of transcription from RNA polymerase II promoter; protein sumoylation; protein sumoylation; protein sumoylation. GO:Component: PML body; PML body; nucleus. GO:Function ID: GO:0019789, GO:0005515, GO:0031625, GO:0031625. GO:Process ID: GO:0034613, GO:0045892, GO:0032436, GO:0045944, GO:0016925, GO:0016925, GO:0016925. GO:Component ID: GO:0016605, GO:0016605, GO:0005634.
1367456_at: 6090.8, 5352.2, 5614.9, 5249.6, 5834.6, 5915.9, 3995.3, 4356.9, 46
55. [Residue of the head(data) preview: P.Value, adj.P.Val (1.230329e-18, 9.901121e-18, 1.205968e-17, 2.011803e-17, 2.011803e-17, 2.273320e-17) and B (40.30280, 38.51476, 38.08932, 37.39354, 37.33864, 37.10831) columns, followed by gene titles and symbols such as troponin T type 3 (skeletal, fast) Tnnt3; troponin T type 1 (skeletal, slow); myosin light chain, phosphorylatable, fast skeletal muscle Mylpf; troponin C type 2 (fast) Tnnc2; myosin heavy chain 2, skeletal muscle, adult Myh2; myosin heavy chain 1, skeletal muscle, adult Myh1; carbonic anhydrase 3 Car3.]
# save to new file
outfile <- file.path(robina.folder, "GSE6943_DE_Robina.txt")
write.table(data, file=outfile, row.names=FALSE, sep="\t", quote=FALSE)
colnames(data)
 [1] ID, logFC, AveExpr, t
 [5] P.Value, adj.P.Val, B, Gene.title
 [9] Gene.symbol, Gene.ID, UniGene.title, UniGene.symbol
[13] UniGene.ID, Nucleotide.Title, GI, GenBank.Accession
[17] Platform_CLONEID, Platform_ORF, Platform_SPOTID, Chromosome.location
[21] Chromosome.annotation, GO.Function, GO.Process, GO.Component
[25] GO.Function.ID, GO.Process.ID, GO.Component.ID
Conclusion. RobiNA is a wrapper for R code developed and standardized by the authors to run reproducibly and to behave as expected. Although simple in layout, RobiNA appears as a quite performant alternativ
56. [Continuation of the GDS data table.] (1367456_at, continued) ...75.1, 4570.4, 4994.8, 4231.6. Gene title: ubiquitin-conjugating enzyme E2D 3. Gene symbol: Ube2d3. Gene ID: 81920. Nucleotide title: Rattus norvegicus ubiquitin-conjugating enzyme E2D 3 (Ube2d3), mRNA. GI: 13676842. GenBank accession: NM_031237. Chromosome annotation: Chromosome 2, NC_005101.3 (259146589..259174396). GO:Function: ATP binding; acid-amino acid ligase activity; ubiquitin-protein ligase activity (listed three times). GO:Process: ...dependent protein catabolic process; proteasomal ubiquitin-dependent protein catabolic process; protein K11-linked ubiquitination (twice); protein K48-linked ubiquitination (twice); protein monoubiquitination; protein polyubiquitination (twice); protein ubiquitination; ubiquitin-dependent protein catabolic process. GO:Component: endosome membrane; plasma membrane. GO:Function ID: GO:0005524, GO:0016881, GO:0004842, GO:0004842, GO:0004842. GO:Process ID: GO:0006281, GO:0006915, GO:0043161, GO:0043161, GO:0070979, GO:0070979, GO:0070936, GO:0070936, GO:0006513, GO:0000209, GO:0000209, GO:0016567, GO:0006511. GO:Component ID: GO:0010008, GO:0005886.
1367459_at: 7665.8, 7415.9, 7075.9, 7349.4, 6406.7, 6664.6, 10400.5, 9729.2, 9679.2, 9996.8, 9783.7, 8333. Gene title: ADP-ribosylation factor 1. Gene symbol: Arf1. Gene ID: 64310. Chromosome location: 10q22. Chromosome annotation: Chromosome 10, NC_005109.3 (45319018..45334501, complement). GO:Function: GTP binding. GO:Process: protein transport; small GTPase mediated signal transduction; vesicle-mediated transport. GO:Component: [garbled in extraction]. Profile pathways. Search pathways enr
57. [Screenshot residue: sample selector listing GSM160089 to GSM160100.] The comparison is based on the selected test method. The choice of the tail(s) allows finding genes that are less expressed, more expressed, or simply differentially expressed. Step 1: Select test and significance level (Two-tailed t-test A vs B; significance level 0.010). Step 2: Select which Samples to put in Group A and Group B. Step 3: Query Group A vs B. The result is a very long list of DE genes, with a barplot view on the right confirming the expression difference, but not really useful as is. [Screenshot: results 1 to 20 of 4409, 20 per page, sorted by default order. 1. Sumo2, Heart left ventricle and diaphragm comparison. Annotation: SMT3 suppressor of mif two 3 homolog 2 (S. cerevisiae). Organism: Rattus norvegicus. Reporter: GPL341, 1367452_at (ID_REF), GDS3224, 690244 (Gene ID), NM_133594. DataSet type: Expression profiling by array, count, 12 samples. ID: 51748101. Links: GEO DataSets, Gene, UniGene, Homologene neighbors. 2. Ube2d3, Heart left ventricle and diaphragm comparison. Annotation: Ube2d3, ubiquitin-conjugating enzyme E2D 3. Organism: Rattus norvegicus. Reporter: GPL341, 1367456_at (ID_REF), GDS3224, 81920 (Gene ID), NM_031237. DataSet type: Exp
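To make the test behind Step 1 concrete, here is a minimal R sketch of a two-tailed and a one-tailed t-test on a single probe. It uses the twelve Sumo2 (1367452_at) values shown in the GDS3224 table of this tutorial; the use of `t.test` with its default Welch correction is an assumption, since the GEO page does not say which t-test variant it applies.

```r
# Sumo2 (1367452_at) values from GDS3224: six diaphragm and six heart samples
diaphragm <- c(2532.9, 2518.6, 2384.6, 2304, 2360, 2482.8)
heart     <- c(3166, 2938.9, 2953.3, 2558.8, 3043.3, 2711.5)

alpha <- 0.01  # significance level chosen in Step 1

# Two-tailed: is heart different from diaphragm, in either direction?
two.tailed <- t.test(heart, diaphragm, alternative = "two.sided")

# One-tailed: is heart higher than diaphragm?
one.tailed <- t.test(heart, diaphragm, alternative = "greater")

is.de <- two.tailed$p.value < alpha

cat("two-tailed p:", signif(two.tailed$p.value, 3), "\n")
cat("one-tailed p:", signif(one.tailed$p.value, 3), "\n")
cat("differentially expressed at alpha = 0.01:", is.de, "\n")
```

Running one such test per probe without any multiple testing correction helps explain why the web query returns such a long list (4409 hits).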
58. [Table residue: end of the GDS3224 value table; last row 1367460_at, Gdi2.] Walking through the tools. Several built-in tools are ready to serve you on demand. We provide below a rapid overview of the results obtained with each tool. Find genes: allows a user interested in a particular gene to get its expression values across samples. This simple feature is of limited interest for most applications. [Screenshot: Data Analysis Tools menu: Find genes (find gene name or symbol); Compare 2 sets of samples (find genes that are up or down for this condition(s) and tissue); Cluster heatmaps; Experiment design and value distribution.] Compare two sets of samples: this allows fine-tuning the system by selecting two groups of samples that are compared using t-test statistics. [Screenshot: Step 1: Select test and significance level (significance level 0.100; available tests: Two-tailed t-test A vs B, One-tailed t-test A > B, One-tailed t-test A < B, Value means difference, Rank means difference). Step 2: Select which Samples to put in Group A and Group B. Step 3: Query Group A vs B.] Two-tailed comparison: the groups are defined using the mouse. [Screenshot: "Click on accessions to select samples individually; click on colored blocks and then on blinking arrows to select groups of samples." Buttons: Ok, Reset, Cancel. GSM16008
59. [References, continued.]
http www vib be en training research training courses Pages Analysis of public microarray data sets aspx
http www vib be en training research training courses Pages Introduction to A ffymetrix microarray analysis aspx
http www vib be en training research training courses Pages Analysis of public microarray data using Genevestigator aspx
https www bits vib be index php training 177 microarray bioconductor
https www bits vib be index php training 125 genevestigator
http genepattern org
http www broadinstitute org cancer software GENE E
http www broadinstitute org gsea
http tagc univ mrs fr tbrowser
Cyrille Lepoivre, Aurélie Bergon, Fabrice Lopez, Narayanan B Perumal, Catherine Nguyen, Jean Imbert, Denis Puthier. TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks. BMC Bioinformatics 2012, 13:19. PubMed 22292669.
Fabrice Lopez, Julien Textoris, Aurélie Bergon, Gilles Didier, Elisabeth Remy, Samuel Granjeaud, Jean Imbert, Catherine Nguyen, Denis Puthier. TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE 2008, 3(12):e4001. PubMed 19104654.
9. https insilicodb com InsilicoDB. Jonatan Taminau S
60. [Screenshot: Affymetrix Expression Console log, 02/09/2014 09:32:29, opening GSM160095.CEL, GSM160096.CEL, GSM160098.CEL, GSM160099.CEL and GSM160100.CEL from C Users splaisan Documents Affy_data GSE6943_Affy analysis GSE6943_CEL, then "Done opening files". Library path: C Users splaisan Documents Affy_data AffyEC_lib. User profile: splaisan_default. Configuration steps: 1. Specify user profile; 2. Select library path; 3. Download library files; 4. Download annotation files; 5. Specify report controls.] The choice of the right normalization method is not detailed here; please refer to the BITS microarray training session and material for more information about this topic: Introduction to Affymetrix Microarray Analysis (https www
61. [Screenshot: Transcriptome Analysis Console 2.0 (Affymetrix): New Analysis, Open Existing Result, Preferences. Gene Level Differential Expression Analysis: the rma.chp files found in C Users splaisan Documents Affy_data GSE6943_Affy analysis GSE6943_RMA are imported and distributed over two conditions, Heart and Diaphragm ("Add File(s) Here", "Click to Create New Condition"; each condition must have at least one file). Analysis file: C Users splaisan Documents Affy_data GSE6943_Affy analysis GSE6943_RMA GSE6943_CAT RMA tac. Clicking Run Analysis shows the status "Computing gene level Differential expression".]
62. The pathways are now very specific to heart biology, as expected. [Screenshot: FLink (Frequency-weighted Links), links from geoprofiles records to biosystems records, weighted by frequency. Page 1 of 56, displaying BioSystems records 1 to 10 of 551.]
BSID  Source  Name  Type  Organism
1010675  (none shown)  Metabolism  organism-specific biosystem  Rattus norvegicus
1010015  (none shown)  Signal Transduction  organism-specific biosystem  Rattus norvegicus
909656  (none shown)  Adrenergic signaling in cardiomyocytes  conserved biosystem
908278  (none shown)  Adrenergic signaling in cardiomyocytes  organism-specific biosystem  Rattus norvegicus
198436  WikiPathways  Striated Muscle Contraction  organism-specific biosystem  Rattus norvegicus
117251  KEGG  Arrhythmogenic right ventricular cardiomyopathy  organism-specific biosystem  Rattus norvegicus
115128  KEGG  Arrhythmogenic right ventricular cardiomyopathy  conserved biosystem
83442  (none shown)  Calcium signaling pathway  organism-specific biosystem  Rattus norvegicus
458  (none shown)  Calcium signaling pathway  conserved biosystem
1009963  (none shown)  Transmembrane transport of small molecules  organism-specific biosystem  Rattus norvegicus
Cluster heatmaps. Two options are presented in this part: the hierarchical clustering and the K-means approaches. Hiera
63. The Broad Institute (http www broadinstitute org) has developed a number of tools, including GenePattern and Gene-E [6], without forgetting the famous GSEA platform [7].
DAVID: http david abcc ncifcrf gov (BITS Wiki: http wiki bits vib be index php Exercises on Gene Ontology)
Enrichr: http amp pharm mssm edu Enrichr
WebGestalt: http bioinfo vanderbilt edu webgestalt
TranscriptomeBrowser [8] works as a standalone program on your computer and looks very impressive when in good hands. Please follow the webcast: http tagc univ mrs fr tbrowser index php option com_content amp task view amp id 35 amp Itemid 28
Please feel free to discover these other tools, which include more plant-dedicated resources than the list above:
PlantGSEA: http structuralbiology cau edu cn PlantGSEA analysis php
MapMan: http mapman gabipd org web guest mapman download
ToppGene: http toppgene cchmc org prioritization jsp
gProfiler: http biit cs ut ee gprofiler
PLAZA (developed at VIB): http bioinformatics psb ugent be plaza
BioMart: http www biomart org biomart martview
QuickGO: http www ebi ac uk ego
TAIR bulk tools: http www arabidopsis org tools bulk index jsp
BAR: http bar utoronto ca welcome htm
BioCyc: http biocyc org gene search shtml
BioCyc Ath pathways: http biocyc org ARA class tree object Pathways
Meta-Analysis Resources. InsilicoDB [9] offers similar services by linking the data to the Broad data mining tools GenePattern and Gene-E. Please ref
64. [References, continued.] 10. http omictools com. Vincent J Henry, Anita E Bandrowski, Anne-Sophie Pepin, Bruno J Gonzalez, Arnaud Desfeux. OMICtools: an informative directory for multi-omic data analysis. Database (Oxford) 2014, 2014. PubMed 25024350.
Retrieved from http stelap local BioWareWIKT index php title Hands on Analysis of public microarray datasets amp oldid 11837. Categories: Training, HandsOn, PUBMA2014. This page was last modified on 21 October 2014 at 10:39. This page has been accessed 289 times. Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.
PubMA Exercise 1. From BioWareWIKI. Search GEO to find public datasets related to one's project (Gene Expression Omnibus).
Contents: 1 Introduction; 2 Find GEO datasets relevant for your biological question; 3 Get information about GSE6943, used in this session; 4 Download exercise files.
Introduction. The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications are available to help users query and download the studie
65. [Screenshot: RobiNA, the transcriptomics data preprocessor, version 1.2.4_build656, reading mapping files.] RobiNA was developed for plants and can use annotation files from the website to add gene descriptions to the differential expression table. For non-plant data, users will need to add these annotations using external tools. [Screenshot: RobiNA, version 1.2.4_build656, "Design your experiment": "You can arrange the groups by dragging them around. Define which groups shall be compared by holding down the CONTROL key and then click-dragging from the first group to the second group. Right-click and choose delete to delete connections. To combine several groups into one metagroup, select all groups you want to combine by left-clicking and drawing a selection rectangle around them, and click Create Metagroup." "Annotate the results": "Using the functional annotation data provided by the MapMan project you can annotate your raw result files. Please double-click the species you are working with and select a matching mapping file." Installed mappings: Arabidopsis thaliana (thale cress), Glycine max (soybean), unknown species, Medicago truncatula (barrel medic), Oryza sativa (rice), Populus trichocarpa. Expert settings: Normalisation: rma.]
66. [Screenshot residue: Diaphragm condition (5 files): GSM160095.rma.chp, GSM160096.rma.chp, GSM160098.rma.chp, GSM160099.rma.chp, GSM160100.rma.chp.] Adjusting differential expression limits. The filtering values can be adapted by the user to restrict or extend the DE gene list, and new plots are then generated. Adjusting the differential expression limit: [screenshot: filter dialog on the column Fold Change (linear) (Heart vs Diaphragm), next to the columns ANOVA p-value (Heart vs Diaphragm) and FDR p-value (Heart vs Diaphragm); filters can be combined with Or or And]. Adjusting the limit for the adjusted p-value: [screenshot: filter dialog on the column ANOVA p-value (Heart vs Diaphragm), threshold 0.05, Or / And; visible rows include Myh1 (1.86E-11), Tnnt3 (0.00E00, 0.00E00) and Actn3 (2.09E-13, 2.28E-10)]. Plots based on the filtered differential expression table. Additional graphs can be obtained to view the data from different angles. The scatter plot highlights potential differences between up-regulated (UR) and down-regulated (DR) genes between the groups. The graphs are interactive: the user can query the full data to find which probesets or genes are UR or DR by selecting an area around points with the mouse. [Scatter plot residue.]
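The effect of these filters, and of the And / Or switch, can be mimicked in R. The sketch below is an illustration only: the toy values, the cut-offs (linear fold change of 2, FDR of 0.05) and the signed linear fold change convention (negative values for genes lower in Heart) are assumptions and not an export from TAC.

```r
# Toy table: average log2 signal per group and an FDR p-value (made-up values)
tac <- data.frame(
  symbol    = c("geneA", "geneB", "geneC"),
  heart     = c(6.51, 9.00, 7.00),
  diaphragm = c(7.55, 6.00, 7.20),
  fdr       = c(0.014, 1e-06, 0.010),
  stringsAsFactors = FALSE
)

# Signed linear fold change computed from the log2 difference (assumed convention)
d <- tac$heart - tac$diaphragm
tac$fc.linear <- ifelse(d >= 0, 2^d, -(2^(-d)))

fc.cut  <- 2     # hypothetical fold change limit
fdr.cut <- 0.05  # hypothetical FDR limit

pass.fc  <- abs(tac$fc.linear) >= fc.cut
pass.fdr <- tac$fdr < fdr.cut

and.list <- tac$symbol[pass.fc & pass.fdr]  # both filters must pass
or.list  <- tac$symbol[pass.fc | pass.fdr]  # either filter is enough

print(tac)
print(and.list)
print(or.list)
```

The And combination gives the stricter list; Or keeps genes such as geneC that are significant but change by less than two-fold.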
67. [Table residue from the IPA results: p-value ranges 1.84E-12 to 2.37E-03, 3.55E-11 to 3.96E-03, 1.42E-10 to 4.62E-03, 1.42E-10 to 4.62E-03 and 6.36E-09 to 2.91E-03, with molecule counts 108, 135, 149, 172 and 61; section heading: Physiological System Development and Function.] Much more information and many more tools are available in IPA to continue the analysis and identify markers and targets for validation. If you are a VIB scientist with 2 to 5 interested VIB colleagues, you may ask for a free custom training provided by BITS inside your lab. Please email us to discuss the possibilities. Download exercise files: download exercise files here. References. Category: PUBMA2014. This page was last modified on 21 October 2014 at 10:37.
68. [End of the GDS3224 data file header; one line per sample:]
#GSM160093 = Value for GSM160093: Diaphragm 5; src: Diaphragm
#GSM160094 = Value for GSM160094: Diaphragm 6; src: Diaphragm
#GSM160095 = Value for GSM160095: Heart 1; src: Heart left vent
#GSM160096 = Value for GSM160096: Heart 2; src: Heart left vent
#GSM160097 = Value for GSM160097: Heart 3; src: Heart left vent
#GSM160098 = Value for GSM160098: Heart 4; src: Heart left vent
#GSM160099 = Value for GSM160099: Heart 5; src: Heart left vent
#GSM160100 = Value for GSM160100: Heart 6; src: Heart left vent
Head of the GDS3224 data file, aligned with column -t (columns GSM160089 to GSM160095 shown; the columns GSM160096 to GSM160100 were wrapped and are only partly recoverable from the extraction):
ID_REF  IDENTIFIER  GSM160089  GSM160090  GSM160091  GSM160092  GSM160093  GSM160094  GSM160095
1367452_at  Sumo2  2532.9  2518.6  2384.6  2304  2360  2482.8  3166
1367453_at  Cdc37  3464.2  3197.4  3487.1  3133.2  3432.5  3486.9  3860.2
1367454_at  Copb2  1620.8  1870.5  1538.6  1334  1502.9  1520.3  1849.8
1367455_at  Vcp  5512.5  4103.9  5746.5  4393.6  5870.7  5851.2  5408.8
1367456_at  Ube2d3  6090.8  5352.2  5614.9  5249.6  5834.6  5915.9  3995.3
1367457_at  Becn1  1093.9  1134.3  736.4  1219  774.9  712.2  892.1
1367458_at  Lypla2  347.8  223.9  261.4  338.8  249.6  363.7  422.2
1367459_at  Arf1  7665.8  7415.9  7075.9  7349.4  6406.7  6664.6  10400.5
69. [...]ata. A number of QC plots are produced to inspect the normalized data. [Figures: boxplots of log2 PM by array for raw data; density plots of log2 PM by array.] Convert data with the Convertor tool. The companion tool RMA convertor allows direct batch conversion without QC plots. [Screenshot: RMAExpress Data Convertor, tabs CEL/CDF conversion and PGF/CLF conversion. Fields: CEL file directory, CDF file, Restrict file, Output directory (all under work TUTORIALS Analysis_of_puk...); Arrays in buffer: 30; Probes in buffer: 25000; Force temporary file location: tmp. Buttons: About, Preferences, Convert.] Final result. The resulting data for the GSE experiment is shown below (top 5 lines by first 5 columns, for readability):
Probesets  GSM160089.CEL  GSM160090.CEL  GSM160091.CEL  GSM160092.CEL
1367453_at  10.155364  10.249902  10.186186  10.096563
1367452_at  9.642959  9.651789  9.328636  9.484756
1367454_at  8.739058  9.082530  9.071610  8.687629
1367455_at  10.476362  10.447124  10.571050  10.326480
Download exercise files: download exercise files here. References.
70. [...] heatmaps derived from manual curation of some of the GEO datasets selected by the NCBI team. This processing is otherwise laborious and requires R skills that not every scientist has. More info about this toolset can be found on the NCBI help page (http www ncbi nlm nih gov geo info datasets html). The web interface allows finding co-expressed genes in bi-clusters, which have been shown to often belong to common signaling pathways or to result from transcriptional co-regulation by common key regulators (TFs or pathways). The GEO Dataset browser. The remainder of this page shows key views obtained by navigating the GEO Dataset browser web interface. Start the tool. Start by locating the GDS link in the dataset information page found on the GEO BioProject page (http www ncbi nlm nih gov bioproject PRJNA98125), or search for the GDS ID if you know it. [Screenshot: BioProject record "Normal Heart vs Normal Diaphragm" (Norway rat). Accession: PRJNA98125; ID: 98125. Comparison of gene expression of heart (left vent) and diaphragm of normal Sprague-Dawley rats (young adult). Keywords: cell type comparison. Overall design: 6 diaphragm samples, 6 heart samples. Project data type: Transcriptome or Gene expression. Attributes: scope: multiisolate; material: transcriptome; capture: whole; method type: array. 1650 additional projects are related by organism. Other accessions: GEO: GSE6943. Relevance: Model Organism. Project Data: Resource Name, N
71. [Screenshot: RobiNA, step 3 of 4, quality check results. The result list contains one scatter plot of GSM160089.CEL against each of the other CEL files found in home splaisan Desktop Robi... ults GSE6943_CEL (GSM160090.CEL to GSM160095.CEL are visible).] Quality check results. Click in the list to open a full-size view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the Exclude box.
72. bits vib be index php bits search results searchword Intro 20A ffymetrix amp searchphrase all). The normalization method is selected from a drop-down menu (Available Analyses). The process takes some time, leads to a summary page, and saves new files to disk with the extension chp. These files contain the normalized data, one file for each imported CEL file. The chp files are ready for import into the TAC tool. [Screenshot: Expression Console (Affymetrix), Study window. Probe cell intensity data: GSM160089.CEL to GSM160096.CEL and GSM160098.CEL to GSM160100.CEL. Study menu: Open Existing Study, Add Intensity Files, Add Summarization Files, Save Study, Close Study. RMA Group 1 (threshold test, log2): GSM160089.rma.chp to GSM160094.rma.chp and GSM160098.rma.chp are Within Bounds; GSM160095, GSM160096 and GSM160099 are Outside Bounds; GSM16010
73. [References, continued.]
[...] Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 2013, 14:128. PubMed 23586463.
Alexander Lachmann, Avi Ma'ayan. Lists2Networks: integrated analysis of gene/protein lists. BMC Bioinformatics 2010, 11:87. PubMed 20152038.
5. http bioinfo vanderbilt edu webgestalt
Stefan Kirov, Ruiru Ji, Jing Wang, Bing Zhang. Functional annotation of differentially regulated gene set using WebGestalt: a gene set predictive of response to ipilimumab in tumor biopsies. Methods Mol Biol 2014, 1101:31-42. PubMed 24233776.
Jing Wang, Dexter Duncan, Zhiao Shi, Bing Zhang. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res 2013, 41(Web Server issue):W77-83. PubMed 23703215.
Bing Zhang, Stefan Kirov, Jay Snoddy. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 2005, 33(Web Server issue):W741-8. PubMed 15980575.
http bioinfo vanderbilt edu webgestalt WebGestalt manual 2013 04 12 pdf
74. [Figure residue: diagrams of upregulated and downregulated genes for Heart vs Diaphragm; the numbers 14876, 15480 and 15319 are shown.] In addition to the plots, several text tables are saved in detailed_results; they are sampled below.
Analysis summary (Robin affymetrix data analysis summary, RobiNA results, 8/18/2014 16:35:28):
Input files: [not shown]
Normalization settings for quality control: normalization method: rma; P-value correction method: BH; analysis strategy: Limma
Normalization settings for main analysis: normalization method: [not shown]; P-value correction method: BH; multiple testing strategy: nestedF; P-value cut-off value for significant differential expression: 0.05; genes that showed a log2 fold change smaller than two ignored: yes
The analysis produced the following warnings: [none shown]
Head of full_table_Heart Diaphragm txt (signs were lost in extraction):
ID  logFC  AveExpr  t  P.Value  adj.P.Val  B
1371247_at  8.8690486752098  9.2066061992338  138.180384786338  7.72673860255015e-23  1.23032858768406e-18  40.3028038935592
1370412_at  8.00700839973324  8.66636177255196  112.671243745562  1.24362501567467e-21  9.90112056229389e-18  38.5147635203019
1387787_at  8.51384348690296  10.0136513602682  107.791218026326  2.27212516655303e-21  1.2059683009008e-17  38.0893208171574
1372195_at  9.44110458586408  9.10616
75. Results of the hypergeometric test for Pathways. [Screenshot: "Hyper-geometric test for association of annotation categories to a sublist of a larger gene list", Aug 13 14:49:36 CEST 2014. Version: CLC Main Workbench 7.0.3. User: splaisan. Parameters: gene identifier column used in tests: Gene symbol; annotation column used in tests: GO molecular function; raw universal gene list size: 15923; used universal gene list size (requiring annotation and one feature per gene only): 8969; raw subset gene list size: 142; used subset gene list size (requiring annotation): 74; expression values used when filtering to one feature per gene: transformed expression values; applied filter to reduce features to one per gene: true; filter applied to reduce features to one per gene: used feature with highest IQR.] GSEA. Unlike the hypergeometric test, the GSEA method does not require partitioning the data: it takes the full table and considers the relative ranking of gene list members in relation to the individual gene expression levels in the data. Settings for the GSEA test for GO BP: [screenshot: "Gene set enrichment analysis", Wed Aug 13 12:42:16 CEST 2014. Version: CLC Main Workbench 7.0.3. User: splaisan. Parameters: gene identifier column used in tests: Gene symbol; annotation column used in tests: GO biological process; the features were ranked on t-statistic; group comparison: Heart / Diaphragm; Raw universal gene li
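For readers who want to see what the hypergeometric test computes, the following R sketch reuses the two list sizes reported above (8969 annotated genes in the universe, 74 annotated genes in the subset). The category size and the overlap are hypothetical numbers chosen for illustration; CLC may also handle details such as duplicate symbols differently.

```r
N <- 8969  # used universal gene list size (from the CLC report above)
n <- 74    # used subset gene list size (from the CLC report above)
K <- 120   # hypothetical: genes in the universe annotated with one GO category
k <- 8     # hypothetical: genes in the subset annotated with that category

# Expected overlap if the subset were a random draw from the universe
expected <- n * K / N

# P(X >= k): probability of an overlap at least this large by chance
# phyper(q, m, n, k) counts successes among k draws from m successes and n failures
p.enrich <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

cat("expected overlap:", round(expected, 2), "\n")
cat("enrichment p-value:", signif(p.enrich, 3), "\n")
```

The `k - 1` in the call matters: with `lower.tail = FALSE`, `phyper` returns P(X > q), so the observed overlap itself is only included when q is set to one less than k.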
76. [WebGestalt KEGG pathway results, continued. Each row gives the pathway name, the number of genes from the user list, their Entrez Gene IDs and the statistics C, O, E, R, rawP and adjP; rows were interleaved in the extraction and are regrouped here by matching the ID counts.]
Cardiac muscle contraction, 12 genes: 682930 24837 29658 29275 29248 84396 29556 25399 117557 24239 689560 116600. C=70; O=12; E=1.54; R=7.78; rawP=3.39e-08; adjP=6.10e-07
Tight junction, 11 genes: 83807 171009 24584 289759 691644 85420 29556 360543 81755 287408 307505. C=101; O=11; E=2.23; R=4.94; rawP=1.30e-05; adjP=0.0002
Vascular smooth muscle contraction, 10 genes: 682930 81636 85420 362039 58965 24239 29354 64532 24173 117558. [statistics not recoverable]
[name garbled] signaling pathway, 13 genes: 682930 81636 116601 296369 64561 24239 689560 29353 24245 24173 117558 64672 114207. C=162; O=13; E=3.57; R=3.64; rawP=5.87e-05; adjP=0.0006
GnRH signaling pathway, 9 genes: 682930 81636 60352 362039 24239 29354 64532 114495 24245. C=62; O=9; E=1.81; R=4.98; rawP=7.60e-05; adjP=0.0007
Pancreatic secretion, 9 genes: 54242 81779 81636 29354 689560 64532 116601 84396 362039. C=85; O=9; E=1.87; R=4.80; rawP=0.0001; adjP=0.0008
Insulin signaling pathway, 8 genes: 25058 64561 50671 114508 689995 114203 29353 25739. C=113; O=8; E=2.49; R=3.21; rawP=0.0035; adjP=0.0252
Wiki pathways. [Screenshot: WEB-based GEne SeT AnaLysis Toolkit (WebGestalt), "Translating gene lists into biological insights". User data and parameters: user data: textAreaUpload.txt; organism: rnorvegicus; Id type: affy_rae230a; Ref set: affy_rae230a; significance level: Top10; statistics test: Hypergeometric; MTC: BH; minimum: 2.] This table lists the enriched WikiPathways, the number of Entrez IDs in your user data set for the pathway, the corresponding Entrez IDs and the
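The statistics in the rows above are linked by simple arithmetic: in WebGestalt, C is the number of reference genes in the category, O the observed number of user genes in the category, E the expected number, and R the enrichment ratio O/E. The R sketch below recomputes R for five of the KEGG rows as a sanity check. Small differences are expected because E is rounded to two decimals in the table; the rows with missing or doubtful values are left out.

```r
# Values copied from the KEGG table above (five rows with complete statistics)
kegg <- data.frame(
  pathway = c("Cardiac muscle contraction", "Tight junction",
              "(garbled) signaling pathway", "Pancreatic secretion",
              "Insulin signaling pathway"),
  C = c(70, 101, 162, 85, 113),       # reference genes in the category
  O = c(12, 11, 13, 9, 8),            # observed user genes in the category
  E = c(1.54, 2.23, 3.57, 1.87, 2.49),  # expected user genes
  R = c(7.78, 4.94, 3.64, 4.80, 3.21),  # reported enrichment ratio
  stringsAsFactors = FALSE
)

# Enrichment ratio recomputed from observed and expected counts
kegg$R.recomputed <- kegg$O / kegg$E

# E / C estimates the fraction (user list size / reference set size)
kegg$list.fraction <- kegg$E / kegg$C

print(kegg[, c("pathway", "R", "R.recomputed", "list.fraction")], digits = 3)
```

The near-constant `list.fraction` column shows that E is simply C scaled by the share of the reference set taken up by the user list.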
77. e GEO accession number (GSExxx).

Loading your own Affymetrix microarray data into the Workbench

The Workbench assumes that expression values are given at the gene (probe set) level; probe-level analysis of Affymetrix arrays and import of Affymetrix CEL and CDF files are therefore not supported. However, you can import your own Affymetrix data in two ways:

- as CHP files generated by the Affymetrix Expression Console, containing normalized Affymetrix data. See the section on how to convert CEL files to CHP files using the Expression Console (http wiki bits vib be index php Analyze GEO data with the Affymetrix softwareZConverting data to format required for) for a detailed discussion on how to do this. Use RMA for the normalisation.
- as txt files exported from R, containing normalized Affymetrix data.

How to do the normalization in R

To create these txt files, open R (http://www.r-project.org) or RStudio (http://www.rstudio.com) as administrator and install the following packages: Matrix, lattice, fdrtool, rpart.

[Screenshot: R Console (R version 3.1.1), Packages menu with the "Install package(s)" item]

Then run the following code:

Install all required Bioconductor packages iso
78. e GEO2R script and adds gene symbols and additional annotations to the RobINA table R code to annotate the RobiNA data Collapse Add annotations to the RobiNA full table d the code below is borrowed to the GEO2R code and adapted library GEOquery make sure that the surrent directory is set to folder enclosing the RobiNA results folder base getwd load RobiNA full table in a data frame robina folder lt paste base robina data lt paste robina folder full table Heart Diaphragm txt wobina full lt read delim robina data as is RobiNA results load NCBI platform annotation gpl lt GPL341 platf lt getGEO gpl AnnotGPL TRUE destdir TRUE base mcbifd lt data frame attr dataTable platf table v replace original platform annotation data lt merge robina full ncbifd by ID data lt data order dataSP Value restore correct order preview first 10 columns head data c 1 10 gt head data c 1 10 ID logFC AveExpr t 13796 1371247_at 8 869049 9 206606 138 18038 12961 1370412 at 8 007008 8 666362 112 67124 111906 1387787 at 8 513843 10 013651 107 79122 i4 744 1372195 at 9 441105 9 106164 100 52807 13520 1370971 at 8 884836 8 447829 99 98895 445 1367896 at 8 173194 8 328780 97 77615 8 13796 12961 111906 14744 3520 myosin heavy chain 445 i Gene ID 13796 24838 7 1 2 5 6 P Value 726739e 23 243625e 21 272125e 21 8
79. DataSet Record GDS3224

Title: Heart left ventricle and diaphragm comparison
Summary: Analysis of normal heart left ventricle and diaphragm of young adult Sprague Dawley males. Concurrent rhythmic contractions of the diaphragm and heart are needed to sustain life. Results provide insight into transcriptional strategies for ensuring long term energy supplies in these two muscles.
Organism: Rattus norvegicus
Platform: GPL341: [RAE230A] Affymetrix Rat Expression 230A Array
Citation: van Lunteren E, Spiegler S, Moyer M. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008;161(1):41-53. PMID: 18207466
Reference Series: GSE6943

The record page also gives access to Expression Profiles, Data Analysis Tools (including Cluster Analysis) and Sample Subsets, and to a Download menu with the following items: DataSet full SOFT file, DataSet SOFT file, Series family SOFT file, Series family MINiML file, Annotation SOFT file.

Getting the full table of expression values can be important to extract the values of multiple genes, or for any other aim you could have in mind. You can get the full dataset from the Download item on the right of the window.

The resulting text file contains a 49-line header that may pose problems in Excel; split the file in two using some CLI magic.
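The header split mentioned in this section can also be done in R rather than on the command line. The sketch below is one possible way to do it, assuming a header of 49 lines as stated in the text; the function name split_header and the small demonstration file are made up for the example and do not correspond to the real GDS3224 download.

```r
# Split a text file into its header lines and its data table (base R only).
split_header <- function(infile, header_out, table_out, n_header = 49) {
  lines <- readLines(infile)
  writeLines(head(lines, n_header), header_out)   # the first n_header lines
  writeLines(tail(lines, -n_header), table_out)   # everything after the header
  invisible(max(0, length(lines) - n_header))     # number of table lines written
}

# Demonstration on a small stand-in file (NOT the real GDS3224 file)
demo_in <- tempfile(fileext = ".txt")
writeLines(c(paste("header line", 1:49),
             "ID_REF\tGSM_A\tGSM_B",
             "probe_1\t8.1\t8.3"),
           demo_in)

header_file <- tempfile(fileext = ".txt")
table_file  <- tempfile(fileext = ".txt")
split_header(demo_in, header_file, table_file)

table_lines <- readLines(table_file)
print(table_lines)
```

The table part can then be opened directly in a spreadsheet, while the header file keeps the descriptive lines for reference.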
80. e file outfile

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] GEOquery_2.31.1 Biobase_2.25.0 BiocGenerics_0.11.4 limma_3.21.12

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.3 tools_3.1.1 XML_3.98-1.1

# Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
# R scripts generated Mon Oct 13 08:43:55 EDT 2014

################################################################
# Differential expression analysis with limma
library(GEOquery)
library(affy)

# load CEL files from GEO
getGEOSuppFiles("GSE6943")
untar("GSE6943/GSE6943_RAW.tar", exdir = "data")
cels <- list.files("data", pattern = "gz")
sapply(file.path("data", cels), gunzip)

# path to the folder in which R saved the CEL files
# one of the files (GSM160097) has been corrupted: you have to remove it from the folder
celpath <- "C:/Users/Janick/Documents/data"
fns <- list.celfiles(path = celpath, full.names = TRUE)
cat("Reading files:\n", paste(fns, collapse = "\n"), "\n")
celfiles <- ReadAffy(celfile.path = celpath)

download exercise files
Download exercise file
81. e in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008;161(1):41-53. PubMed: 18207466
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943

Category: PUBMA2014
This page was last modified on 20 October 2014 at 16:21.
Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

Analyze GEO data with the Affymetrix software
From BioWareWIKI

Analyzing a selected GEO dataset using the Affymetrix Expression Console (EC) and Transcriptome Analysis Console (TAC)

Contents
1 Introduction
2 The Affymetrix Expression Console (EC)
2.1 Converting CEL data to CHP format required for
2.2 Performing QC on the data and generating summarizing plots
3 The Affymetrix Transcriptome Analysis Console (TAC)
3.1 Importing EC data and defining Groups
3.2 Computing gene level Differential expression
3.3 Adjusting Differential expression limits
3.4 Plots based on the filtered differential expressio
82. e interested people to follow the four PDF files also present on the server:

- Expression analysis part I pdf (http data bits vib be pub trainingen PPUBMA2014 ex6 files Expression analysis part I pdf)
- Expression analysis part II pdf (http data bits vib be pub trainingen lPPUBMA2014 ex6 files Expression analysis II pdf)
- Expression analysis part III pdf (http data bits vib be pub trainingen PPUBMA2014 ex6 files Expression analysis III pdf)
- Expression analysis part IV pdf (http data bits vib be pub trainingen PPUBMA2014 ex6 files Expression analysis part IV pdf)

A recompiled version of this tutorial is part of a former BITS CLC training, accessible here (http data bits vib be pub trainingen CLCMain TutorialMicroarrays pdf).

Loading microarray data into the Workbench

The Workbench supports analysis of one-color expression arrays. These may be imported from GEO (http://www.ncbi.nlm.nih.gov). The Workbench supports the following formats:

- GEO SOFT sample files (http://www.ncbi.nlm.nih.gov/geo/info/soft2.html): simple line-based, plain text files that contain all the data and the descriptive information of a microarray experiment (example SOFT file: http://www.ncbi.nlm.nih.gov/geo/info/soft_ex_platform.txt).
- GEO series file: txt files containing the definitions of a group of related samples. They contain tables describing extracted data, summary conclusions and analyses. Each Series file is assigned a uniqu
83. e top more top notch analysis methods reported in the literature. A computer with sufficient resources is needed to perform the QC steps in a reasonable time, but current high-end laptops and desktops should do the job. For those who cannot afford the more expensive CLC Main Workbench and do not wish to learn R and Bioconductor, RobiNA seems a good choice for MA analysis; it can also do standard RNASeq analysis, as will be described in a separate hands-on.

The complete file is accessible here (http data bits vib be pub trainingen PUBMA2014 ex4 files RobiNA results detailed_results GSE6943_DE Robina txt); use right-click and "download linked file as", or navigate from the link at the bottom of the page.

download exercise files
Download exercise files here

References
1. Marc Lohse, Adriano Nunes-Nesi, Peter Krüger, Axel Nagel, Jan Hannemann, Federico M Giorgi, Liam Childs, Sonia Osorio, Dirk Walther, Joachim Selbig, Nese Sreenivasulu, Mark Stitt, Alisdair R Fernie, Björn Usadel. Robin: an intuitive wizard application for R-based expression microarray quality assessment and analysis. Plant Physiol 2010;153(2):642-51. PubMed: 20388663
2. http://www.affymetrix.com/estore
84. e treated group: it will be colored pink. The order is important for calculating log fold changes later in the analysis. If you reverse the order, genes that are upregulated according to the publication that supports the data will be downregulated in your results, and vice versa. The list of samples in each group can be reviewed by clicking on List in the group definition popup window.

[Screenshot: Define groups popup listing the Sample Groups. Diaphragm: GSM160089 to GSM160094. Heart: GSM160095 to GSM160100]

Visualize the distribution of log transformed expression values

Before proceeding with the DE analysis, it is very important to first check that the value distributions are homogeneous across samples, using the Value distribution tab.

The tab is described in GEO2R as follows: "Calculate the distribution of value data for the Samples you have selected. Distributions may be viewed graphically as a box plot or exported as a number summary table. The plot is useful for determining if value data are median-centered across Samples, and thus suitable for cross-comparison."

[Box plot: GSE6943 / GPL341, selected samples; one box per sample, colored by group (Diaphragm, Heart)]
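The idea of "median-centered" samples can be checked numerically as well as visually. The sketch below is a minimal illustration in base R: the small matrix, the sample names and the 0.5 log2 tolerance are all hypothetical and are not part of GEO2R.

```r
# HYPOTHETICAL log2 expression matrix: probe sets in rows, samples in columns
ex <- cbind(
  GSM_A = c(6.1, 7.0, 8.2, 9.5, 10.3),
  GSM_B = c(6.0, 7.1, 8.1, 9.4, 10.5),
  GSM_C = c(8.1, 9.0, 10.2, 11.5, 12.3)   # shifted by about 2 log2 units
)

# Median of each sample and its distance to the typical median
med   <- apply(ex, 2, median)
shift <- med - median(med)

# Flag samples whose median is further away than an arbitrary tolerance
tol     <- 0.5
flagged <- names(med)[abs(shift) > tol]

print(round(shift, 2))
print(flagged)
```

Samples flagged this way correspond to boxes that sit clearly higher or lower than the others in the plot, which suggests the data were not normalized together.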
85. eased Diaphragm A B D E F 1 Description 22222 TiSie Test Lower taisi Upper tai v 2 55114 ioxationreduction process essct 473 1 285285603 101000 3 B6005 regulation of ventricular cardiac muscle cell action potential 113 28 4480485 1 d B6 reguiation of heart contraction 25 1214227626 09999 1 00004 _5 55010 Ventricular cardiac muscle tissue morphogenesis ER 771189372308 0 599 00004 6 6099 triarboxylicaeidcycle U U U III a 124 _ 17 8151236 0 9994 0 0006 7 86004 7a LISSE P 88988 09 8 86091 regulation of heart rate by cardiac conduction su 15 18 0252284 0 3993 00007 9 51291 protein heterooligomerization 16 8587409 i 09992 00008 10 2026 __ regulation ofthe force of heart contraction 21 _ 17 9066909 0 9992 0 0008 Published results The original publication ends with functional enrichment results identifying key differences between heart and diaphragm tissues in rats We link here to the results published by van Lunteren E Spiegler 5 Moyer M Full details about this dataset can be found on the http www ncbi nlm nih gov geo query acc cgi acc GSE6943 GEO page download exercise files Download exercise files here Expand References 1 Erik van Lunteren Sarah Spiegler Michelle Moyer Contrast between cardiac left ventricle and diaphragm muscl
86. either only a command line interface or solely very basic user interaction. Finally, there are tools such as RMAExpress, which offer a rich user interface but only a very limited set of options. RobiNA tries to bridge this gap by providing a flexible, user-friendly graphical interface to unleash the power of R/BioConductor for the individual biologist. RobiNA comes as a convenient all-in-one installation package that automatically installs the application itself plus all required external tools (i.e. the R and BioConductor frameworks and bowtie).

Although RobiNA can handle several data types, including two-color microarray and even RNASeq data, we provide here a simple tutorial to perform QC and differential analysis of Affymetrix microarray data, comparable to what can be achieved using the CLC workbench. Please see the RobiNA quick guide (http mapman gabipd org c document library get file uuid 2a09272e9 e474 402e a554 b03d6ec9efd6 amp groupId 10207) and the Robin and RobiNA user's manual (http mapman gabipd org c document_library get_file uuid 60912d03 660e 4281 9834 22f2789424d2 amp groupId 10207), which contain a step-by-step walk-through and detailed information.

RobiNA can be downloaded from http://mapman.gabipd.org/web/guest/robin

Required before starting RobiNA

RobiNA: working great, or NOT

RobiNA should work under Windows and Unix, as it uses Java, which is a universal language. However
87. ene neighbors

[Screenshot: GEO Profiles result page for GDS3224, "Heart left ventricle and diaphragm comparison" (221 result pages), with the side-panel tools "Profile data: Download profile data", "Profile pathways: Find pathways" and "Find related data"]

Download profile data

Download the value data (red bars) for each profile on this page. Download files are tab-delimited and suitable for opening in a spreadsheet application such as Excel. Retrievals that incorporate multiple DataSets are organized by DataSet blocks. Experimental factor and annotation information is included. A download file includes the profiles shown on the current page; under Display Settings, set Items per page to 500 to get the maximum number of profiles. To download values for a complete DataSet, please use the "DataSet full SOFT file" link available on the DataSet record. Note: cross-DataSet normalizations are not performed; direct comparisons of values between different DataSets are not appropriate.

ID_REF GSM160089 GSM160090 GSM160091 GSM160092 GSM160093 GSM160094 GSM1
88. er 2014 at 13:58.
Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

PubMA Exercise 2b
From BioWareWIKI

Follow-up analysis demo in RStudio

Contents
1 Introduction
2 installing on non BITS laptop
3 Extend the GEO2R analysis in R with RStudio
3.1 adapt the GEO2R script in RStudio
3.1.1 Original GEO2R code
3.1.2 Improved code
4 download exercise files

Introduction

The former exercise produced a full table of differential expression between Heart and Diaphragm samples that can be used directly with functional enrichment tools or IPA. The R script used by GEO2R is quite standard and basic, and only provides box plots of the signal distribution. This is a minimum when doing MA data analysis, and users often appreciate having some more QC done on the data to control for biases or inconsistencies associated with artifacts or with variability in the experiment. Expert analysts make use of the R/Bioconductor toolbox to further analyze their data. Some examples of standard R functions are shown below to show you the power of this programming language.

The following exercise is provided as an appetizer and is far from exhaustive; you will find many more methods and tools by expl
89. er to the InsilicoDB tutorial pages (https://insilicodb.com/category/tutorials) for more info.

Commercial resources licensed by VIB

- Genevestigator: not covered during this training, but warmly recommended for all users who do not have their own MA data but need to find biomarkers.
- CLC Main workbench (http data bits vib be pub trainingen CLCMain TutorialMicroarrays pdf): used in the optional PubMA_Exercise 6.
- Ingenuity Pathway Analysis (IPA): strongly advised for more advanced users. You can use it on any computer with Java installed, after asking for a personal account at bits@vib.be and logging in here (https apps ingenuity com ingsso login service https 3A 2F 2Fanalysis ingenuity com 2F pa 2Fj spring cas security check amp originalUrl https 2F 2Fanalysis ingenuity com 2F pa 3Futm_source 3D Ingenuity 26utm_medium 3D Website 26utm_campaign 3DIPA LoginPage). Please keep in mind that IPA is only meant for human, mouse and rat data.

Do you still need MORE? Find more tools with OMICtools (http://omictools.com).

References
1. Tanya Barrett, Ron Edgar. Mining microarray data at NCBI's Gene Expression Omnibus (GEO). Methods Mol Biol 2006;338:175-90. PubMed: 16888359
2. Ron Edgar, Michael Domrachev, Alex E Lash. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002;30(1):207-10. PubMed: 11752295
90. es allows plotting their profiles and exporting the results to file.

[Screenshot: GDS3224 / Heart left ventricle and diaphragm comparison / Rattus norvegicus; cluster heat map of the samples GSM160089 to GSM160100, with the links "Download displayed data", "Show heat map region" and "View profiles in Entrez"]

Experiment design and value distribution

This QC tool draws a box plot for each sample and allows evaluating the global quality of the dataset. A good dataset has a constant median line and similar distributions.

[Box plot: GDS3224, Heart left ventricle and diaphragm comparison, Rattus norvegicus; one box per sample (Diaphragm and Heart replicates) showing the 95, 90, 75, median, 25, 10 and 5 percentiles]

download exercise files
Download exercise files here

References
1. http://www.ncbi.nlm.nih.gov/sites/GDSbrowser
2. http://www.ncbi.nlm.nih.gov/geo/info/datasets.html

Category: PUBMA2014
This page was last modified
91. platf <- getGEO(gpl, AnnotGPL=TRUE)
ncbifd <- data.frame(attr(dataTable(platf), "table"))

# replace original platform annotation
tT <- tT[setdiff(colnames(tT), setdiff(fvarLabels(gset), "ID"))]
tT <- merge(tT, ncbifd, by="ID")
tT <- tT[order(tT$P.Value), ]  # restore correct order

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
write.table(tT, file=stdout(), row.names=F, sep="\t")

################################################################
# Boxplot for selected GEO samples
library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO("GSE6943", GSEMatrix=TRUE)
if (length(gset) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# group names for all samples in a series
sml <- c("G0","G0","G0","G0","G0","G0","G1","G1","G1","G1","G1","G1")

# order samples by group
ex <- exprs(gset)[ , order(sml)]
sml <- sml[order(sml)]
fl <- as.factor(sml)
labels <- c("Diaphragm","Heart")

# set parameters and draw the plot
palette(c("#dfeaf4","#f4dfdf","#AABBCC"))
dev.new(width=4+dim(gset)[[2]]/5, height=6)
par(mar=c(2+round(max(nchar(sampleNames(gset)))/2),4,2,1))
title <- paste("GSE6943", "/", annotation(gset), " selected samples", sep="")
boxplot(ex, boxwex=0.6, notch=T, main=title, outline=FALSE
92. ft vent GSM160100 Heart 6 Heart left vent

GEO2R sample definition

The first step in the GEO2R analysis is performed by clicking on Define groups to set up sample groups based on the available samples and to label them. These groups will be used to define contrasts and compute pairwise differential expression analyses. Two groups are created, named diaphragm and heart; the samples are then labeled using the mouse.

[Screenshot: Samples panel with the "Define groups" popup ("Enter a group name"); 12 out of 12 samples selected; groups Diaphragm (6 samples) and Heart (6 samples)]

Group      Accession  Title        Source name
Diaphragm  GSM160089  Diaphragm 1  Diaphragm
Diaphragm  GSM160090  Diaphragm 2  Diaphragm
Diaphragm  GSM160091  Diaphragm 3  Diaphragm
Diaphragm  GSM160092  Diaphragm 4  Diaphragm
Diaphragm  GSM160093  Diaphragm 5  Diaphragm
Diaphragm  GSM160094  Diaphragm 6  Diaphragm
Heart      GSM160095  Heart 1      Heart left vent
Heart      GSM160096  Heart 2      Heart left vent
Heart      GSM160097  Heart 3      Heart left vent
Heart      GSM160098  Heart 4      Heart left vent
Heart      GSM160099  Heart 5      Heart left vent
Heart      GSM160100  Heart 6      Heart left vent

The order in which you assign the groups is important. First define the control group (it will be colored in blue), then define th
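The effect of the group order on the sign of the log fold change can be illustrated with a few numbers. The sketch below uses hypothetical log2 values for a single probe set and plain group means; it assumes, as in the GEO2R script of this training, that the contrast is computed as second group minus first group (G1 - G0). The moderated statistics computed by limma are not reproduced here.

```r
# HYPOTHETICAL log2 expression values for a single probe set
diaphragm <- c(5.1, 5.3, 4.9)   # group defined first (control, G0)
heart     <- c(8.0, 8.2, 7.9)   # group defined second (G1)

# Contrast G1 - G0: heart relative to diaphragm
logfc_heart_vs_diaphragm <- mean(heart) - mean(diaphragm)

# Same data with the groups defined in the reverse order
logfc_diaphragm_vs_heart <- mean(diaphragm) - mean(heart)

cat("heart vs diaphragm:", round(logfc_heart_vs_diaphragm, 2), "\n")
cat("diaphragm vs heart:", round(logfc_diaphragm_vs_heart, 2), "\n")
```

The two values have the same magnitude and opposite signs, which is why a gene reported as upregulated in the publication appears downregulated when the groups are defined in the reverse order.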
93. g Ratio Consolidate IDs using the expression value median Review the obtained results 93 full summary report can downloaded from the server see link at bottom of this page geo2r DE table LR2FDR0 0 m Summary Canonical Pathways Upstream Analysis Diseases amp Functions Regulator Effects Networks Lists Molecules 35 Download Summary PDF p value Ratio Calcium Signaling Protein Kinase A Signaling Hepatic Fibrosis Hepatic Stellate Cell Activation Thrombin Signaling Role of NFAT in Cardiac Hypertrophy Upstream Regulator 3 66E 08 2 95E 05 1 04E 04 1 35E 04 1 88E 04 p value of overlap 21 130 0 162 26 272 0 096 16 137 0 117 16 140 0 114 16 144 0 111 Predicted Activ DMD MEF2C GATA4 LIPE MYOD1 E Top Diseases and Functions Diseases and Disorders Name 2 38E 19 2 37E 12 9 17 11 7 17E 10 2 87E 09 p value Inhibited B Molecules Skeletal and Muscular Disorders Cardiovascular Disease Organismal Injury and Abnormalities Developmental Disorder Neurological Disease Molecular and Cellular Functions Name 9 62E 13 4 62E 03 7 30E 11 4 62E 03 7 30E 11 4 62E 03 7 10E 09 4 27E 03 1 59E 07 4 62E 03 p value 131 122 247 94 141 B Molecules Cell Morphology Molecular Transport Cellular Development Cellular Growth and Proliferation Cell Sign
94. generated Tue Aug 12 05:30:54 EDT 2014

################################################################
# Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)

# load series and platform data from GEO
gset <- getGEO("GSE6943", GSEMatrix=TRUE)
if (length(gset) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
sml <- c("G0","G0","G0","G0","G0","G0","G1","G1","G1","G1","G1","G1")

# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
LogC <- (qx[5] > 100) ||
        (qx[6]-qx[1] > 50 && qx[2] > 0) ||
        (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) {
  ex[which(ex <= 0)] <- NaN
  exprs(gset) <- log2(ex)
}

# set up the data and proceed with analysis
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)

# load NCBI platform annotation
gpl <- annotation(gset)
plat
95. gicus Nb genes 662 Experiment GSE6943 Nb probes 885 Platform CPL341 Nb samples 12 Annotation Table Keyword Q value BH GOTERM_MF_ALL translation factor activity 0 00559898 PANTHER TERM BP BPOOOOS GLYCOLYSIS 0 00424492 89 Info 4 signatures 1 platforms 1 experiments Platform Experiment V 9482 923 Organism Nb genes Experiment GSE6943 Nb probes 1060 9 Platform GPL 341 Nb samples 94BB8DCA2 Load data Send to plugins Create group Plugins Annotation Table PANTHER_FAMILY PANTHER_TERM_BP PANTHER TERM MF PRODOM PUBMED ID PFAM NAME GOTERM CC ALL GOTERM CC ALL GOTERM CC ALL PROSITE MAME SP KEYWORDS SP PIR KEYWORDS Keyword PTHR10574 LAMININ 120 CELL ADHESI MF00261 ACTIN BINDI PD0000D686 5H3 12477932 PFOOO amp 5H3 1 mitochondrion mitochondrial part cytoplasmic part P550002 P550002 mitochondrion sh3 domain Q value BH 0 00382715 5 7 242E 4 0 00147612 0 00483506 5 73342 4 0 00554995 8 01806E 6 8 96956E 5 1 59748E 4 0 00249775 6 05554E 4 0 00479139 Heatmap third TS 622 clearly associated with heart and mitochondrial functions Results Info IO 1 experiments Load data Send to plugins Create group Back Plugins Heatmap x Settings last TS 684 apparently specific to the cardiac
96. hare Alike unless otherwise noted.

PubMA Exercise 2
From BioWareWIKI

Compute differential analysis using GEO2R within the NCBI web portal

Contents
1 Analyze public GEO data on the NCBI portal
1.1 GEO2R step by step walk-through for GSE6943
1.1.1 The GEO2R interface
1.1.2 GEO2R sample definition
1.1.3 Visualize the distribution of log transformed expression values
1.1.4 Search for the top 250 differentially expressed transcripts
1.1.5 Saving the Rscript for further use in RStudio
2 download exercise files

Analyze public GEO data on the NCBI portal

The GEO portal links to several web tools allowing data analysis without the need to install anything on your computer. Although these tools cannot compete with sophisticated R/Bioconductor methods, they remain very attractive: they do not require prior knowledge of MA data analysis and are very fast, leading users to tabular results and pictures that can be fed to other tools or used as is in scientific reports.

We proceed here with GEO2R, which allows finding differentially expressed genes by comparing sample groups within one GEO submission.

Full instructions: https://www.ncbi.nlm.nih.gov/geo/info/geo2r.html
Tutorial video [2]

GEO2R step by step walk-through for GSE694
97. hive and decompressed to individual CEL files, and a CDF file listing all probes present on the chip used in the experiment. Today's CDF file can be obtained from the Affymetrix site [2] after registering (free) and searching for the library file corresponding to the platform reported in the GEO pages, in our case Rat Expression Set 230 (aka 230A): http www affymetrix com Auth support downloads library files rae230 libraryfile zip

The decompressed archive contains the required CDF file under CD_RAE230_rev04/Full/RAE230A/LibFiles/RAE230A.CDF, which can be copied into the RobiNA project area.

Start RobiNA and create a new project for the results.

[Screenshot: RobiNA, The transcriptomics data preprocessor, Version 1.2.4 build656, showing the Release Notes]

Release Notes 1.2.4 build656

Welcome to RobiNA and thank you for using it to evaluate and analyse your microarray and RNA-Seq data. Before taking off, please take the time to read the release notes carefully.

- New workflow for RNA-Seq based transcript profiling: check out the new workflow for RNA-Seq based analysis of differential gene expression. To date it supports import of Illumina/Solexa type raw sequence data in FASTQ format, SAM/BAM prealigned reads and precomputed counts tables.
- Treatment of replicates: the current version of RobiNA assumes that all replicates that you enter are true biological replicates. Technical replication is not yet taken int
98. iaphragm Source Diaphragm RGD1311260 3 1 Dolk NULL Ralbp1 Als2cr2 NULL 1 3 Pfkfbi Tst Dtdi Khi13 The full heatmap picture can be saved to file using buttons present at the bottom of the window full heatmap 1 Expand Similarly text files can be saved with the data used to plot the heatmaps http data bits vib be hidden jhslbjcgnchjdgksqngcvgqdlsjcnv TBrowser2014 09d_TS1 heatmap data txt and a text table reporting enriched terms http data bits vib be hidden jhslbjcgnchjdgksqngcvgqdlsjcnv TBrowser2014 09e TS1 terms txt second heatmap Signature 948791256 ct c c cc c c c c Annotation PANTHER_FAMILY Sach Sort PANTHER TERM BP Probes 13 0 PANTHER_TERM_MF Samples 15 0 Tm NULL Pidi Rfc3 Aldh2 LOC690871 Hk1 Arpcia m 1306353 BP00120 CELL ADHESION MEDIATED SIGNALING NULL 12477932 Snn Lcat Gprii Lamb2 Cidea NULL Mapia Carhsp1 Q value E a MF00261 ACTIN BINDING CYTOSKELETAL PROTEIN Ili Prkaria 1 19511 Save heatmap Save annotation Export heatmap datal third heatmap 91 Siqnature 9487 FERQBI ect ccc c Annotation m PATHWAY
99. ic Tests on Annotations

[Screenshot: Hypergeometric Tests on Annotations, step 1 "Select two nested experiments"; selected elements (2): "Heart vs Diaphragm" and "Heart vs Diaphragm subexperiment" (n = 142)]

A window allows selecting the annotation type to be used in the test and the action to take with duplicate probes (here merged by gene symbol, keeping the feature with the highest IQR).

[Screenshot: Hypergeometric Tests on Annotations, step 2 "Set parameters for hyper geometric tests on annotations". Annotation to test: Pathway (annotated features: 1299). Reduce feature set: remove duplicates using gene identifier Gene symbol (annotated features: 12853); keep feature with Highest IQR or Highest value. Values to analyze: Original, Transformed or Normalized expression values]

Hyper geometric test for association of annotation categories to a sublist of a larger gene list
Wed Aug 13 12:38:46 CEST 2014
Version: CLC Main Workbench 7.0.3
User: splaisan
Parameters:
Gene identifier column used in tests: Gene symbol
Annotation column used in tests: GO biological
100. iched in the obtained subset.

Profile pathways: see frequency weighted list of pathways for these profiles

This button links to the NCBI BioSystems database. Use it to display the list of pathways in which these gene expression profiles participate. The pathways are ranked by the number of profiles to which they are linked. A maximum of 100,000 profiles are considered. This tool can be particularly useful for helping to characterize lists of profiles that have been determined to be differentially expressed across experimental variables.

FLink: Frequency weighted Links
Links from geoprofiles records to biosystems records, weighted by frequency

Frequency  BSID     Name                    Type                         Organism
381        1010675  Metabolism              organism-specific biosystem  Rattus norvegicus
306        1010015  Signal Transduction     organism-specific biosystem  Rattus norvegicus
226        1010824  Immune System           organism-specific biosystem  Rattus norvegicus
218        1010484  Disease                 organism-specific biosystem  Rattus norvegicus
175        1010388  Gene Expression         organism-specific biosystem  Rattus norvegicus
144        1009706  Metabolism of proteins  organism-specific biosystem  Rattus norvegicus
130        1010682  Metabolism of lipids and li
101. igher concentrations than thr RNA, so the signal of dap should be higher than that of thr, and this was not the case for the samples that were flagged as out of bounds. The other control probes behaved as they should. So it might be that, in some samples, the reverse transcription of the high-abundance transcripts was not completely efficient because of saturation.

As part of the standard Affymetrix microarray processing, control molecules are added to the mRNA at different concentrations prior to producing the cDNA. Other molecules (cDNA) are added later in the sample preparation to control for hybridization on the chip. The out-of-bounds errors reported above result from the discrepancy between the known spiked-in quantities and the readout after scanning the chip. The highest concentration of control does not produce a final value higher than a lower concentration of control, which raises an alarm and shows the 4 samples with a colored background. Full details about the identity of the faulty probes and the obtained values can be found in the table at the bottom of the full report (PDF) linked in the next paragraph.

Performing QC on the data and generating summarizing plots

A number of QC plots can be generated using the right tools. The full QC report can then be saved as a PDF file, and is available both for the RMA method (http data bits vib be pub trainingen AffyECTAC2014 GSE6943_EC rma qc PDF) and the MAS5 method (http data
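The rule behind the out-of-bounds flag on the spike-in controls described in this section (the control spiked at the higher concentration, dap, should give a higher signal than thr) can be expressed in a few lines of R. The sketch below is only an illustration: the sample names and signal values are hypothetical, and the actual thresholds applied by the Expression Console are not reproduced.

```r
# HYPOTHETICAL log2 signals of two spike-in controls in three samples
polyA <- rbind(
  thr = c(S1 = 10.2, S2 = 10.4, S3 = 11.6),
  dap = c(S1 = 11.5, S2 = 11.7, S3 = 11.1)
)

# dap is spiked at a higher concentration than thr, so dap > thr is expected.
# Samples where this order is not respected are reported.
out_of_bounds <- colnames(polyA)[polyA["dap", ] <= polyA["thr", ]]

print(out_of_bounds)
```

In this made-up example only the third sample breaks the expected order, which mirrors the situation of the samples shown with a colored background in the report.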
102. in which several genes were concomitantly regulated. Several examples are provided below and in the article published in PLoS ONE [1]. A video tutorial for TranscriptomeBrowser is available here: https www youtube com watch bJMEPeSgHI, and a second one for the InteractomeBrowser plugin here: https://www.youtube.com/watch?v=SxOBmCP1G1A. The full manual can be read here: http://tagc.univ-mrs.fr/tbrowser/index2.php?option=com_content&task=view&id=19&pop=1&page=0&Itemid=23
After installation (see the online documentation) and startup, the main TBrowser interface awaits user input.
[Screenshot: TBrowser main interface. Search by Signature, Platform, Experiment, Gene symbol, Organism, Entrez ID, Probe ID, HomoloGene ID or Annotation; Q-value max (BH); result tables for Platform, Experiment and Signature; Heatmap plugin with Save heatmap, Save annotation and Export heatmap data]
A walk-through example
As shown above, different search angles can be used to populate the interface. Instead of searching for genes, we choose to use here the GSE6943 dataset used in other BITS training t
103. ind the name of the custom cdf by going to the package folder: DATA (D:) > R-3.1.1 > library > ath1121501attairtcdf
[Screenshot: package folder listing, all entries dated 10/10/2014 13:00. File folders include data, help and html; files: CITATION (2 KB), DESCRIPTION (1 KB), INDEX (1 KB), NAMESPACE (1 KB)]
Open the DESCRIPTION file and look in the Description line:
  Package: ath1121501attairtcdf
  Title: ath1121501attairtcdf
  Version: 18.0.0
  Created: Wed Jan 29 12:33:39 2014
  Author: Manhong Dai
  Description: A package containing an environment representing the customcdf VEU RE file
  Maintainer: Manhong <daimh@umich.edu>
  License: LGPL
  biocViews: MBNICustomCDF, AnnotationData, AffymetrixChip, ath1121501, ath1121501attairt, Arabidopsis thaliana
Then proceed by executing the following R code:
  # normalisation using rma algorithm
  data.rma <- rma(data)
  # The output is an exprSet object with a data matrix containing
  # normalized log-intensities on probe set level in the exprs slot
  # writing probe set level data to a file called data.txt
  write.exprs(data.rma, file = "data.txt")
The resulting text file can be imported into the Wor
104. ined from the RobiNA results is available on the BITS server (link: http data bits vib be pub trainingen PPUBMA2014 ex5 files RobiNA DE probes LFC2 FDR0 001 txt) and its content can be used on the WebGestalt submission page (http://bioinfo.vanderbilt.edu/webgestalt). The ID type of the enriched list can be identified by selecting it in the list.
[Screenshot: WebGestalt (WEB-based GEne SeT AnaLysis Toolkit), "Translating gene lists into biological insights". Select the organism of interest: rnorvegicus. Select gene ID type: rnorvegicus affy rae230a. Upload gene list (Choose File), or paste the IDs: 1367707_at 1370355_at 1376371_at 1368000_at 1373410_at 1388116_at 1375230_at 1371315_at 1368966_at 1371293]
In the next window, the Reference Set for Enrichment Analysis should be selected from the list.
Select Reference Set for Enrichment Analysis (drop-down entries):
  rnorvegicus genome
  rnorvegicus entrezgene protein coding
  rnorvegicus affy rae230a
  rnorvegicus affy rae230b
  rnorvegicus raex 1 0 st v1
  rnorvegicus affy ragene 1 0 st v1
  rnorvegicus rat230 2
  rnorvegicus rg u34a
  rnorvegicus rg u34b
  rnorvegicus affy rg u34c
  rnorvegicus rn u34
  rnorvegicus affy rt u34
  rnorvegicus agilent G4131A
  rnorvegicus agilent G4131F
  rnorvegicus agilent wh
105. ion into cytosol by sarcoplasmic reticulum
[Table: hypergeometric test results for GO biological process categories. Legible category names: nitrogen compound metabolic process; cardiac muscle tissue development; glycogen biosynthetic process; regulation of synaptic activity; glutamine biosynthetic; negative regulation of calcium ion transmembrane transporter activity; adrenergic receptor signaling pathway involved in heart process; phosphocreatine metabolic process; pyridoxal phosphate biosynthetic process; regulation of germinal center formation; regulation of lateral pseudopodium assembly; L-cysteine metabolic process; catecholamine catabolic process. The p-value 0.008493 is shared by most of the listed rows.]
Another page presents details about the parameters used in the different tests.
Results of the hypergeometric test (GO BP): Hyper geometric test for associatio
106. kbench
Main results and specific settings
Only key steps are reproduced here to provide the information needed to interpret the figures. All other steps and parameters will be explained in the tutorial PDF files linked above.
After computing group-wise differential expression, a filtering step is applied to the full table to retain only DE genes with at least a 2-fold change in expression, an adjusted p-value of at most 5x10^-3, and expression data (present calls) in at least 4 of the 6 replicates.
[Screenshot: filter dialog, 142 of 15,923 rows retained. Match all: t-test (Heart vs Diaphragm), abs > 2; t-test (Heart vs Diaphragm) < 0.0005; Heart present count >= 4; Diaphragm present count]
The classical volcano plot is produced, with the 142 DE genes selected during filtering shown in red. This subset will be used as the test set against all other genes in the data table in the hypergeometric enrichment analyses detailed below.
[Figure: volcano plot (t-test); x-axis: difference of group means; y-axis: p-values]
Due to the logarithmic nature of the transformed data, the Difference column should be used instead of Fold Change to represent the differential expression.
Enrichment analysis
CLC hypergeometric test
The following figure shows data samples used in the hypergeometric test. [Screenshot: Hypergeometr
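The three filter criteria described above can be sketched in Python as follows. This is an illustration only, not the CLC Main Workbench implementation: the `GeneRow` field names and the example rows are hypothetical, the thresholds are plain parameters, and the sketch assumes that the present-call criterion is applied to each tissue separately. Because the data are log2-transformed, a 2-fold change corresponds to an absolute Difference of 1.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneRow:
    # Field names are hypothetical; they mirror the columns named in the text.
    symbol: str
    difference: float       # difference of group means (log2 scale)
    adj_p: float            # adjusted p-value of the t-test
    present_heart: int      # present calls among the heart replicates
    present_diaphragm: int  # present calls among the diaphragm replicates


def passes_filter(row, min_fold=2.0, max_adj_p=5e-3, min_present=4):
    """Return True when a row meets all three criteria."""
    # On log2 data a linear fold change of F is a difference of log2(F).
    min_difference = math.log2(min_fold)
    return (
        abs(row.difference) >= min_difference
        and row.adj_p <= max_adj_p
        and row.present_heart >= min_present
        and row.present_diaphragm >= min_present
    )


# Made-up rows that each exercise one criterion.
rows = [
    GeneRow("geneA", 2.5, 1e-6, 6, 5),   # passes all criteria
    GeneRow("geneB", 0.4, 1e-6, 6, 6),   # change smaller than 2-fold
    GeneRow("geneC", -1.5, 0.02, 6, 6),  # adjusted p-value too high
    GeneRow("geneD", -1.5, 1e-4, 3, 6),  # too few present calls in heart
]
selected = [r.symbol for r in rows if passes_filter(r)]
print(selected)
```

Working on the Difference column directly avoids converting back and forth between log2 differences and linear fold changes.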
107. les
  Organism: Rattus norvegicus
  Type: Expression profiling by array, count, 2 tissue sets
  Platform: GPL341; Series: GSE6943; 12 Samples
  Download data: GEO (CEL)
  DataSet Accession: GDS3224; ID: 3224
The highlighted link (http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS3224) opens in a new tab.
[Screenshot: GEO DataSet Browser, search for GDS3224[ACCN]. Data Analysis Tools: Find genes; Compare 2 sets of samples; Cluster heatmaps; Experiment design and value distribution; Download the full data table]
  Title: Heart left ventricle and diaphragm comparison
  Summary: Analysis of normal heart left ventricle and diaphragm of young adult Sprague Dawley males. Concurrent rhythmic contractions of the diaphragm and heart are needed to sustain life. Results provide insight into transcriptional strategies for ensuring long-term energy supplies in these two muscles.
  Organism: Rattus norvegicus
  Platform: GPL341: [RAE230A] Affymetrix Rat Expression 230A Array
  Reference Series: GSE6943; Sample count: 12; Value type: count; Series published: 2008/01/24
  PMID: 18207466 (Neurobiol 2008; 161(1):41-53)
  Citation: van Lunteren E, Spiegler S, Moyer M. Contrast between cardiac left ventricl
108. muscle
4 signatures: ALL, 94B2EC923, 94B7912BC, 94BB8DCA2
Experiment: GSE6943
  Annotation table   Keyword                  Q value (BH)
  KEGG PATHWAY       RNO00190 OXIDATIVE       3.73533E-4
  TERM BP            00019 LIPID FATTY        0.00143452
  GOTERM_BP_ALL      heart process            0.00279106
  GOTERM_BP_ALL      heart contraction        0.00280895
  PANTHER TERM MF    MF00123 OXIDOREDU        0.00739837
  WIKIPATHWAY        Rn Electron Transport    3.18418E-4
  PUBMED ID          12477932                 1.52352E-5
  KEGG REACTION      R02164 UBIQUINONE        0.00884734
  GOTERM CC ALL      mitochondrial part       5.00014E-11
  GOTERM CC ALL      mitochondrion            8.71289E-10
  GOTERM CC ALL      mitochondrial membrane   2.72681E-9
  SP KEYWORDS        mitochondrion            6 8 26E 6
Nb genes: 622; Nb probes: 772; Nb samples: 12
[Screenshot: results for the 4 signatures with Load data, Send to plugins and Create group; Heatmap plugin settings, "Show HeatMaps for each TS", 1 platform; Organism: Rattus norvegicus; Experiment: GSE6943; annotation tables listed: GOTERM CC ALL, SP KEYWORDS, PANTHER TERM MF, PANTHER TERM BP, GOTERM BP ALL] For
109. n of annotation categories to a sublist of a larger gene list
  Wed Aug 13 14:49:36 CEST 2014
  Version: CLC Main Workbench 7.0.3
  User: splaisan
  Parameters:
    Gene identifier column used in tests: Gene symbol
    Annotation column used in tests: GO molecular function
    Raw universal gene list size: 15923
    Used universal gene list size (requiring annotation and one feature per gene only): 8969
    Raw subset gene list size: 142
    Used subset gene list size (requiring annotation): 74
    Expression values used when filtering to one feature per gene: Transformed expression values
    Applied filter to reduce features to one per gene: true
    Filter applied to reduce features to one per gene: used feature with highest IQR
Results of the hypergeometric test for GO MF
  Hyper geometric test for association of annotation categories to a sublist of a larger gene list
  Wed Aug 13 14:49:36 CEST 2014
  Version: CLC Main Workbench 7.0.3
  User: splaisan
  Parameters:
    Gene identifier column used in tests: Gene symbol
    Annotation column used in tests: GO molecular function
    Raw universal gene list size: 15923
    Used universal gene list size (requiring annotation and one feature per gene only): 8969
    Raw subset gene list size: 142
    Used subset gene list size (requiring annotation): 74
    Expression values used when filtering to one feature per gene: Transformed expression values
    Applied filter to reduce features to one per gene: true
    Filter applied to reduce features to one per gen
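The parameter list above mentions a filter that keeps one feature per gene by choosing the feature with the highest interquartile range (IQR). A minimal Python sketch of such a filter is given below. It is not the CLC implementation: the function names, probe identifiers and expression values are hypothetical, and the quartile definition (the "inclusive" method of the standard library) is an assumption.

```python
import statistics


def iqr(values):
    """Interquartile range using the inclusive quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def one_feature_per_gene(features):
    """Keep, for each gene symbol, the feature with the highest IQR.

    features: list of (feature_id, gene_symbol, expression_values) tuples.
    """
    best = {}
    for feature_id, symbol, values in features:
        spread = iqr(values)
        if symbol not in best or spread > best[symbol][1]:
            best[symbol] = (feature_id, spread)
    return {symbol: feature_id for symbol, (feature_id, _) in best.items()}


# Made-up features: two probes for GeneA, one probe for GeneB.
features = [
    ("probe_1", "GeneA", [5.0, 5.1, 5.2, 5.1]),
    ("probe_2", "GeneA", [4.0, 6.0, 8.0, 10.0]),
    ("probe_3", "GeneB", [7.0, 7.5, 8.0, 8.5]),
]
print(one_feature_per_gene(features))
```

A reduction of this kind, combined with the requirement that features carry an annotation, is what shrinks the raw universe of 15923 features to the 8969 genes actually used in the test.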
110. n table
  3.5 Exporting results
  4 Conclusion
  5 Youtube videos from the Affymetrix training team
  download exercise files
Introduction
The data used in this how-to tutorial is the same as that used for the BITS hands-on training Hands-on_Analysis_of_public_microarray_datasets. The Affymetrix online training page dedicated to MA and transcriptome analysis can be browsed here: http www affymetrix com estore browse level_seven_software_products_only jsp productld 131414 amp categoryld 35623 amp productName A ffymetrix 2526 2523174 253B Expression Console 2526 2523153 253B Software 1_1
This main page contains links to download the necessary software as well as links to other Affymetrix resources necessary to perform a full expression analysis. Also refer to the Affymetrix Transcriptome Analysis Console (TAC) Software and Expression Console Software tutorial pages: http://www.affymetrix.com/support/learning/training_tutorials/tac_ec/index.affx#1_2
You will need to set up a free NetAffx (http://www.affymetrix.com/analysis/index.affx) account to download software and access data pages.
Data workflow
  Affymetrix Expression Console (EC) Software: perform exon-level normalization and signal summarization; perform gene-level normalization and signal summarization.
  Affymetrix Transcriptome Analysis Console (TAC) Software: select analysis: 1. Gene level, 2. Exon level, 3. Alternati
111. nal file GSE6943 DE txt row names F sep t quote FALSE
  ################################################################
  # Boxplot for selected GEO samples
  library(Biobase)
  library(GEOquery)
  # load series and platform data from GEO
  # gset <- getGEO("GSE6943", GSEMatrix=TRUE)
  gset <- getGEO("GSE6943", GSEMatrix=TRUE, destdir=base)
  if (length(gset) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
  gset <- gset[[idx]]
  # group names for all samples in a series
  sml <- c("G0","G0","G0","G0","G0","G0","G1","G1","G1","G1","G1","G1")
  # order samples by group
  ex <- exprs(gset)[, order(sml)]
  sml <- sml[order(sml)]
  fl <- as.factor(sml)
  labels <- c("Diaphragm", "Heart")
  # set parameters and draw the plot
  # save to file
  filename <- paste(base, "GSE6943_boxplot.pdf", sep="/")
  pdf(file=filename, bg="white")
  palette(c("#dfeaf4", "#f4dfdf", "#AABBCC"))
  # dev.new(width=4+dim(gset)[[2]]/5, height=6) is not needed when plotting to a PDF device
  par(mar=c(2+round(max(nchar(sampleNames(gset)))/2), 4, 2, 1))
  title <- paste("GSE6943", "/", annotation(gset), " sample signal distribution", sep="")
  boxplot(ex, boxwex=0.6, notch=T, main=title, outline=FALSE, las=2, col=fl)
  legend("topleft", labels, fill=palette(), bty="n")
  dev.off()
  # save R workspace for reuse
  outfile <- paste(base, "Workspace.RData", sep="/")
  save.imag
112. nk mean difference 4x & either
The former test found too many hits (n=4409) to be specific for given pathways. If we instead try the rank means difference test with a 4-fold difference cutoff, we get only 152 genes, which are probably more specific for the biology behind the sample groups.
Search pathways enriched in the obtained subset
[Screenshot: Data Analysis Tools > Compare 2 sets of samples. Step 1: select test and significance level (Rank means difference, A vs B, 4+ fold, either). Step 2: select which Samples to put in Group A and Group B. Step 3: Query Group A vs B]
[Screenshot: GEO Profiles results, 20 per page, sorted by default order; results 1 to 20 of 152, page 1 of 8]
  1. Nppa / Heart left ventricle and diaphragm comparison. Annotation: Nppa, natriuretic peptide A. Organism: Rattus norvegicus. Reporter: GPL341, 1367564_at (ID_REF), GDS3224, 24602 (Gene ID), 012612. DataSet type: Expression profiling by array, count, 12 samples. ID: 51748213. Links: Profile pathways, GEO DataSets, Gene, UniGene, Profile neighbors, Chromosome neighbors, Sequence neighbors, Homologene neighbors.
  2. Nppb / Heart left ventricle and diaphragm comparison. Annotation: N
113. notations
[Screenshot: TAC results table with a volcano plot overlay and "View Interaction Network" buttons. Column headers: Fold Change (linear) (Heart vs Diaphragm); ANOVA p-value (Heart vs Diaphragm); FDR p-value (Heart vs Diaphragm); Bi-weight Avg Signal for each group; Gene Symbol; Description]
Rows as displayed (two bi-weight average signals; p-values; gene symbol; truncated description):
  10.71; 6.47E-07; Cox17; COX17 cytochrome c oxi
  9.18, 10.21; 0.000093; Tsc22d4; TSC22 domain family me
  5.70, 6.73; 0.051391; Plekhg2; pleckstrin homology do
  8.76, 9.79; 0.000179
  7.75, 8.78; 0.001014
  9.23, 10.26; 0.000014; Ap3b1; adaptor related protein c
  2.70, 3.73; 0.046805; PR domain containing 12
  9.39, 10.42; 0.000700; cysteine tyrosine rich 1
  5.81, 6.84; 0.017993, 0.060586; 680770; similar to dachshund b
  8.80, 9.83; 0.000003, 0.000068; Nde1; nudE nuclear distribution
  4.91, 5.95; 0.034755, 0.100022; Serpina1; serpin peptidase inhibitor
  7.70, 8.73; 0.000563, 0.004081; Apbb1; amyloid beta A4 precurs
  8.88, 9.91; 0.000001, 0.000042; Lipa; lipase A lysosomal acid
  6.41, 7.44; 0.000250, 0.002172; potassium voltage gated
  10.88, 11.92; 6.00E-08, 0.000005; Eci2; enoyl CoA delta isomeras
  5.52, 6.55; 0.048794, 0.129042; protein tyrosine phosphat
  9.63, 10.66; 0.002494, 0.013072; Clic4; chloride intracellular cha
  5.69, 6.73; 0.041055, 0.113600; Alkbh2; alkB alkylation repair ho
  8.85, 9.88; 0.000085, 0.000963; Parvb; parvin beta
114. ns represented by genes assessed in the experiment will be considered, although this is normally true with modern platforms where almost all known genes are present.
When the full Rat genome is taken as background, we obtain relatively high-confidence predictions.
[Screenshot: DAVID Bioinformatics Resources 6.7 (National Institute of Allergy and Infectious Diseases, NIAID, NIH), Functional Annotation Clustering. Current Gene List: List_1; Current Background: Rattus norvegicus; 246 DAVID IDs; Classification Stringency: Highest; 58 clusters]
  Annotation Cluster 1, Enrichment Score: 7.06
    Category          Term                                                   Count  P_Value  Benjamini
    GOTERM_BP_FAT     myofibril assembly                                     8      1.1E-8   1.3E-6
    GOTERM_BP_FAT     actomyosin structure organization                      8      1.2E-7   1.0E-5
    GOTERM_BP_FAT     cellular component assembly involved in morphogenesis  8      5.2E-7   4.4E-5
  Annotation Cluster 2, Enrichment Score: 3.56
    INTERPRO          Zinc finger, LIM-type                                  7      1.9E-4   2.1E-2
    SP_PIR_KEYWORDS   LIM domain                                             7      2.6E-4   8.1E-3
    SMART             LIM                                                    7      4.1E-4   3.6E-2
  Annotation Cluster 3, Enrichment Score: 3.24
    GOTERM_BP_FAT     cellular glucan metabolic process                      6      2.0E-4   7.1E-3
    GOTERM_BP_FAT     glucan metabolic process                               6      2.0E-4   7.1E-3
    GOTERM_BP_FAT     glycogen metabolic process                             6      2.0E-4   7.1E-3
    GOTERM_BP_FAT     energy reserve metab
115. o account. When entering technical replicates, the significance of differential expression will be overestimated, leading to an artificially increased number of genes that are called significantly differentially expressed.
  - Unconnected designs in two-color experiments: comparing results from two sets of two-color microarrays that are not connected requires analysing the color channels separately. This is not yet supported in RobiNA.
[Screenshot: Welcome to RobiNA. Start new project / Open existing project. The first step of the workflow will be to choose a project directory in which all files related to the analysis will be stored. Please make sure that the chosen directory is on a volume (hard drive, USB stick, etc.) with enough free space. Fields: Project folder; Free space on target volume. Buttons: Cancel, Continue]
Importing CDF & CEL files
While preparing this training, we discovered that GEO had a damaged file (GSM160097) for one of the samples. We will therefore do this training with only 5 replicates in the Diaphragm group, while the CLC analysis was done with the full data.
You are now ready to import the CEL files and the matching CDF annotation database.
[Screenshot: Welcome to Robin] The first step of the data analysis is the import of microarray data into Robin. Please choose the raw
116. of genes: 15866
  2. 1043 genes are differentially expressed
Algorithm Options
  1. One-Way Between-Subject ANOVA (Unpaired)
Default Filter Criteria
  1. Fold Change (linear) < -2 or Fold Change (linear) > 2
  2. ANOVA p-value (Condition pair) < 0.05
Conditions
  Heart (6): 1. GSM160089.rma.chp, 2. GSM160090.rma.chp, 3. GSM160091.rma.chp, 4. GSM160092.rma.chp, 5. GSM160093.rma.chp, 6. GSM160094.rma.chp
  Diaphragm (5): 1. GSM160095.rma.chp, 2. GSM160096.rma.chp, 3. GSM160098.rma.chp, 4. GSM160099.rma.chp, 5. GSM160100.rma.chp
Additional annotations can be added using the dedicated menu.
[Screenshot: Customize Annotations dialog. Annotation File: RAE230A.na34.annot.csv. Annotations Assignment: Top Assignment / All Assignments. Select Annotation Column(s) to Add: GeneChip Array; Species Scientific Name; Annotation Date; Sequence Type; Sequence Source; Transcript ID (Array Design) (already added); Target Description; Representative Public ID; Archival UniGene Cluster; UniGene ID; Genome Version; Alignments; Gene Title; Gene Symbol (already added); Chromosomal Location; Unigene Cluster Type; Ensembl; Entrez Gene; SwissProt; EC; OMIM; RefSeq Protein ID; RefSeq Transcript ID; FlyBase; AGI; WormBase; MGI Name; RGD Name; SGD accession number; Gene Ontology
117. olegenome 4x44k v1; rnorvegicus agilent wholegenome 4x44k v3; rnorvegicus codelink
Two sets of enrichments are available.
  User Data: textAreaUpload.txt
  Total number of User IDs: 301. 255 user IDs can unambiguously map to 226 unique Entrez Gene IDs. 46 user IDs were mapped to multiple Entrez Gene IDs or could not be mapped to any Entrez Gene ID. The Enrichment Analysis and GO Slim Classification will be based upon the 226 unique Entrez Gene IDs.
  Select Reference Set for Enrichment Analysis: rnorvegicus rae230a
  Enrichment Analysis: GO Analysis; KEGG Analysis; Wikipathways Analysis; Pathway Commons Analysis; Transcription Factor Target Analysis; MicroRNA Target Analysis; Protein Interaction Network Module Analysis; Cytogenetic Band Analysis; Disease Association Analysis; Drug Association Analysis; Phenotype Analysis; PheWAS Analysis
  GO Slim Classification: Biological Process; Molecular Function; Cellular Component
The results are shown in tables with annotations and scores, as well as the list of genes responsible for the enrichment.
KEGG pathways
WEB-based GEne SeT AnaLysis Toolkit: Translating gene lists into biological insights WebGe
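The mapping summary above (301 user IDs, 255 mapped unambiguously to 226 unique genes, 46 rejected) follows from a simple bookkeeping rule, sketched below in Python. This is not WebGestalt code: the function name, the lookup table and all identifiers are hypothetical, and the rule that an ID is kept only when it maps to exactly one gene is taken from the wording of the summary.

```python
def summarize_mapping(user_ids, id_to_genes):
    """Split user IDs into unambiguously mapped and rejected IDs.

    id_to_genes: dict mapping a probe set ID to the set of gene IDs it hits
                 (a hypothetical lookup table, not WebGestalt's own data).
    """
    unambiguous = {}
    rejected = []
    for uid in user_ids:
        genes = id_to_genes.get(uid, set())
        if len(genes) == 1:
            unambiguous[uid] = next(iter(genes))
        else:
            rejected.append(uid)  # no gene, or more than one gene
    unique_genes = set(unambiguous.values())
    return unambiguous, rejected, unique_genes


# Made-up lookup table: two probes share gene 101, one probe hits two genes,
# and probe_e is absent from the table.
id_to_genes = {
    "probe_a": {"101"},
    "probe_b": {"101"},
    "probe_c": {"202", "303"},
    "probe_d": {"404"},
}
user_ids = ["probe_a", "probe_b", "probe_c", "probe_d", "probe_e"]
unambiguous, rejected, unique_genes = summarize_mapping(user_ids, id_to_genes)
print(len(user_ids), len(unambiguous), len(unique_genes), len(rejected))
```

The number of unique genes can be smaller than the number of unambiguous IDs because several probe sets may target the same gene, which is why 255 IDs collapse to 226 Entrez Gene IDs in the report.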
118. olic process; Count 6; P_Value 3.7E-4; Benjamini 1.3E-2
    GOTERM_BP_FAT     cellular polysaccharide metabolic process              6      7.1E-4   2.0E-2
    GOTERM_BP_FAT     polysaccharide metabolic process                       6      1.7E-2   2.3E-1
By contrast, when the background is set to what the RAE230A array really covers, a lower confidence is obtained. This is not a major issue here, but when reporting p-values you should always be careful to use the correct background in order not to overestimate your findings.
[Screenshot: DAVID Bioinformatics Resources 6.7 (National Institute of Allergy and Infectious Diseases, NIAID, NIH), Functional Annotation Clustering. Current Gene List: List_1; Current Background: Rat Genome RAE230A Array; 246 DAVID IDs; Classification Stringency: Highest; 58 clusters]
  Annotation Cluster 1, Enrichment Score: 5.91
    Category          Term                                                   Count  P_Value  Benjamini
    GOTERM_BP_FAT     myofibril assembly                                     8      1.6E-7   1.9E-5
    GOTERM_BP_FAT     actomyosin structure organization                      8      1.5E-6   1.6E-4
    GOTERM_BP_FAT     cellular component assembly involved in morphogenesis  8      7.9E-6   6.4E-4
  Annotation Cluster 2, Enrichment Score: 3.65
    INTERPRO          Zinc finger, LIM-type                                  7      1.6E-4   2.4E-2
    SP_PIR_KEYWORDS   LIM domain                                             7      2.1E-4   8.8E-3
    SMART             LIM                                                    7      3.3E-4   2.9E-2
  Annotation Cluster 3, Enrichment Score: 2.69
    GOTERM_BP_FAT     cellular glucan metabolic proce
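The effect of the background on the reported p-values can be illustrated with a plain hypergeometric upper-tail test. The Python sketch below is not DAVID's own statistic (DAVID reports a modified Fisher exact p-value, the EASE score). The list size of 246 and the count of 8 echo the screenshots, but the background sizes and the numbers of annotated background genes are invented for the illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the annotation."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


list_size = 246   # genes in the submitted list
list_hits = 8     # list genes carrying the annotation

# Hypothetical backgrounds: a whole genome versus the genes present on the array.
p_genome = hypergeom_upper_tail(list_hits, list_size, K=30, N=20000)
p_array = hypergeom_upper_tail(list_hits, list_size, K=28, N=12000)

print(f"genome background: {p_genome:.3g}")
print(f"array background:  {p_array:.3g}")
```

With the smaller, array-specific background the annotated genes make up a larger share of the universe, so observing 8 of them in the list is less surprising and the p-value is larger, which is the behaviour described in the paragraph above.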
119. on 26 August 2014 at 13:39. This page has been accessed 160 times. Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.
PubMA Exercise 4
From BioWareWIKI
RobiNA analysis
Contents
  1 Introduction
  2 Required before starting RobiNA
    2.1 RobiNA working great, NOT
    2.2 The CDF annotation file
  3 Step by Step RobiNA analysis workflow
    3.1 start RobiNA and create new project for results
    3.2 Importing CDF & CEL files
    3.3 Performing QC on each CEL file
      3.3.1 Evaluating QC results
    3.4 Define design
    3.5 Computing differential expression
      3.5.1 Reviewing DE results
  4 Adding annotations to the RobiNA data
  5 Conclusion
  download exercise files
Introduction
Several commercial programs, such as GeneSpring, featuring a rich and simple user interface to statistically analyze high-throughput omics data are available. However, these are usually very expensive and might even require an annual subscription. On the other hand, there are free open-source tools such as BioConductor, which offer great statistical options and support for free, but this power often comes at the price of usability, since these tools often feature
120. oring the CRAN documentation (http://cran.r-project.org).
  # installing on a non BITS laptop (required for non BITS laptops)
  # If running a unix OS computer please use yum to install
  #   libxml2-devel libcurl-devel (required to build dependencies)
  # required for markdown
  #   texlive-collection-latexrecommended.noarch texlive-latex.noarch (also required for markdown)
  # install minimal bioconductor
  source("http://bioconductor.org/biocLite.R")
  biocLite()
  # Add required packages for today's session (will add RCurl, XML dependencies)
  biocLite(c("Biobase", "GEOquery", "limma", "affy"))
  # some other packages are not required here but are required for RobiNA in other exercises
  # these required packages were identified by parsing all R scripts present in the windows build of Robin
  biocLite(c("affy", "affyPLM", "RankProd", "limma", "gcrma", "statmod", "marray", "plier", "edgeR", "DESeq", "EDASeq", "makecdfenv"))
Extend the GEO2R analysis in R with RStudio
RStudio is the current best graphical environment to program in the R language and offers many facilities to learn and develop your skills. The program is available free of charge from http://www.rstudio.com [21].
adapt the GEO2R script in RStudio
Original GEO2R code
The starting script is first reproduced here.
initial GEO2R code
  # Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
  # R scripts
121. otein C fast type; chr1:101572108-1015759
Rows as displayed (probe set; transcript ID; numeric columns; gene symbol; description; chromosomal location):
  1374391_at; Rn.16457.1; 12.42, 5.09, 0.24, 1.49, 160.99, 0.000002, 0.00004; Sln; sarcolipin; chr8:57016720-57017242
  1386873_at; Rn.4035.1; 12.5, 5.48, 0.15, 0.32, 129.87, 4.26E-12, 2.04E-09; Tnni1; troponin type 1 (skeletal, slow); chr13:57676633-57684216
  1371339_at; Rn.11675.2; 12.59, 5.64, 0.24, 0.09, 123.72, 4.13E-13, 3.86E-10; Tnni1; troponin type 1 (skeletal, slow); chr13:57683074-57685257
  1398306_at; Rn.9794.1; 11.19, 4.46, 0.19, 0.07, 106.44, 7.16E-14, 1.14E-10; Ampd1; adenosine monophosphate deaminase 1; chr2:224999610-2250204
  1376227_at; Rn.41395.1; 12.91, 6.25, 0.13, 0.58, 101.15, 6.68E-10, 8.82E-08; Myoz1; myozenin 1; chr15:8185059-8185524
  1376968_at; Rn.26659.1; 10.54, 4.26, 0.48, 0.17, 77.47, 5.60E-10, 7.80E-08; Mybpc2; myosin binding protein C, fast type; chr1:101591233-1015937
  1370900_at; Rn.1072.1; 9.47, 3.2, 0.94, 0.05, 76.98, 1.46E-07, 0.000006; Myh4; myosin, heavy chain 4, skeletal muscle; chr10:53552135-53553903
  1386977_at; Rn.1647.1; 11.08, 4.96, 0.2, 0.27, 69.53, 9.73E-12, 3.77E-09; Car3; carbonic anhydrase 3; chr2:107900426-1079092
  1381575_at; Rn.15517.1; 9.51, 3.48, 0.32, 0.23, 65.69, 5.90E-11, 1.28E-08; Neb; nebulin; chr3:42756010-42756391
  1371298_at; Rn.3968.1; 12.6, 6.61, 0.23, 0.3, 63.38, 2.54E-11, 6.95E-09; H19; H19, imprinted maternally expressed transcript (non-protein coding); chr1:222639223-2226401
  1373873_at; Rn.14050.1; 11.44, 5.64, 0.13, 0.26, 55.6, 3.17E-12, 1.68E-09; chr17:34950334-34951353
  1390355_at; Rn.38647.1; 11.42, 5.73, 0.1, 0.3, 51.53, 8
122. pared
[Screenshot: RobiNA Configure groups dialog. Input files located under /Users/splaisan/Projects/BITS_TUTORIALS/BITS_tutorials/work/Analysis_of_public_microarray_datasets. Buttons: Add selected, Add group, Delete group, Previous, Next]
Design your experiment
You can arrange the groups by dragging them around. Define which groups shall be compared by holding down the CONTROL key, then click-dragging from the first group to the second group. Right-click and choose delete to delete connections. To combine several groups into one metagroup, select all groups you want to combine by left-clicking and drawing a selection rectangle around them, then click Create Metagroup.
Expert settings (Show expert settings): Normalisation; P-value correction: BH; Multiple testing strategy: nestedF; Write out normalized raw data; Preview R script; Log fold change min: 1; p-value cutoff. Buttons: Reset design, Create Metagroup, Delete Metagroup, Previous, Next.
Step 4 of 4
Computing differential expression
RobiNA The transcriptomics dat
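The expert settings above select BH (Benjamini-Hochberg) as the p-value correction. A minimal Python sketch of that adjustment is shown below, assuming the standard step-up definition; it is independent of RobiNA and limma, and the input p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print([round(p, 4) for p in adjusted])  # [0.02, 0.04, 0.04, 0.02]
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values grow. A "Log fold change min" of 1 on log2 data corresponds to a 2-fold change.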
123. pe: RAE230A
  Genome version: rn5
  Annotation File: RAE230A.na34.annot.csv
  Summary: 1. Total number of genes: 15866. 2. 1043 genes are differentially expressed. 3. Heart vs Diaphragm: a. 605 genes are up-regulated; b. 438 genes are down-regulated.
  Algorithm Options: 1. One-Way Between-Subject ANOVA (Unpaired)
  Filter Criteria: 1. Fold Change (linear) < -2 or Fold Change (linear) > 2. 2. ANOVA p-value (Condition pair) < 0.05
  Conditions:
    Heart (6): 1. GSM160089.rma.chp, 2. GSM160090.rma.chp, 3. GSM160091.rma.chp, 4. GSM160092.rma.chp, 5. GSM160093.rma.chp, 6. GSM160094.rma.chp
    Diaphragm (5): 1. GSM160095.rma.chp, 2. GSM160096.rma.chp, 3. GSM160098.rma.chp, 4. GSM160099.rma.chp, 5. GSM160100.rma.chp
Summary: GSE6943 CAT RMA, Heart vs Diaphragm
  Analysis Type: Gene Level Differential Expression Analysis
  Array Type: RAE230A
  Genome Version: rn5
  Annotation File: RAE230A.na34.annot.csv
  Summary: 1. Total number of genes: 15866. 2. 1043 genes are differentially expressed. 3. Heart vs Diaphragm: a. 605 genes are up-regulated; b. 438 genes are down-regulated.
  Algorithm Options: 1. One-Way Between-Subject ANOVA (Unpaired)
  Default Filter Criteria: 1. Fold Change (linear) < -2 or Fold Change (linear) > 2. 2. ANOVA p-value (Condition pair) < 0.05
  Conditions:
    Heart (6): 1. GSM160089.rma.chp, 2. GSM160090.rma.chp, 3. GSM160091.rma.chp, 4. GSM160092.rma.chp, 5. GSM160093.rma.chp, 6. GSM160094.rma.chp
    Diaphr
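The default filter criteria above split the differentially expressed genes into up-regulated and down-regulated sets using a signed linear fold change. The Python sketch below shows one way such a classification can be computed from log2-scale group means. It is not TAC code: the function names, the gene names and the values are hypothetical, and the convention that a negative fold change is minus the inverse ratio is an assumption.

```python
from collections import Counter


def signed_linear_fold_change(mean_a, mean_b):
    """Signed linear fold change of A vs B from log2-scale group means."""
    diff = mean_a - mean_b
    return 2 ** diff if diff >= 0 else -(2 ** -diff)


def classify(mean_a, mean_b, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Label a gene as 'up', 'down' or 'unchanged' for the A vs B contrast."""
    if p_value >= p_cutoff:
        return "unchanged"
    fc = signed_linear_fold_change(mean_a, mean_b)
    if fc > fc_cutoff:
        return "up"
    if fc < -fc_cutoff:
        return "down"
    return "unchanged"


# Made-up genes: (name, heart mean, diaphragm mean, ANOVA p-value).
genes = [
    ("geneA", 12.0, 9.0, 0.001),   # 8-fold higher in heart
    ("geneB", 8.0, 10.5, 0.01),    # about 5.7-fold lower in heart
    ("geneC", 9.0, 8.5, 0.0001),   # significant but below 2-fold
    ("geneD", 11.0, 7.0, 0.2),     # large change, not significant
]
counts = Counter(classify(a, b, p) for _, a, b, p in genes)
print(counts["up"], counts["down"], counts["unchanged"])
```

In the report above, the same bookkeeping yields 605 up-regulated and 438 down-regulated genes, which together make up the 1043 differentially expressed genes.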
124. poproteins
  Frequency  BSID     Name                    Type                         Organism
  121        1010825  Adaptive Immune System  organism-specific biosystem  Rattus norvegicus
  118        1010853  Innate Immune System    organism-specific biosystem  Rattus norvegicus
  111        1010881  Hemostasis              organism-specific biosystem  Rattus norvegicus
Displaying BioSystems Records 1 - 10 of 2215 (10 per page)
Description
This table shows links from 3450 gene records to 2215 biosystems records. The link used was gene_biosystems, which is described as: BioSystems that contain the specified gene(s). The association between the biosystems and genes was made using the method described in BioSystems data processing. A one-to-one mapping of this link is available (more about one-to-one mappings). Additional details about the FLink output display are provided in the help document.
Column legends (selected table column descriptions)
  Frequency: The number of gene records from your input list that are linked to the BioSystems record.
  Max Frequency: The total number of gene records that are linked to the BioSystem record. This represents the maximum value that can appear in the frequency column and is used to calculate the score (percent coverage).
  BSID: BioSystem record identifier.
  Source: Depositor of the BioSystem.
  Name: Name of the BioSystem.
  Type: The taxonomic span of the BioSystem.
  Organism: The organism containing the BioSystem.
  Ra
125. ppb, natriuretic peptide B. Organism: Rattus norvegicus. Reporter: GPL341, 1367616_at (ID_REF), GDS3224, 25105 (Gene ID), 031545. DataSet type: Expression profiling by array, count, 12 samples. ID: 51748265.
  3. Ptgds / Heart left ventricle and diaphragm comparison. Annotation: Ptgds, prostaglandin D2 synthase (brain). Organism: Rattus norvegicus. Reporter: GPL341, 1367851_at (ID_REF), GDS3224, 25526 (Gene ID), 04488. DataSet type: Expression profiling by array, count, 12 samples. ID: 51748500.
  4. Car3 / Heart left ventricle and diaphragm comparison. Annotation: Car3, carbonic anhydrase 3. Organism: Rattus norvegicus. Reporter: GPL341, 1367896_at (ID_REF), GDS3224, 54232 (Gene ID), NM_019292. DataSet type: Expression profiling by array, count, 12 samples. ID: 51748545.
Each record links to: GEO DataSets, Gene, UniGene, Profile neighbors, Chromosome neighbors, Sequence neighbors, Homologene neighbors.
126. pse
Run quality check tools
Choose the quality checks you want to include in your analysis and click Next to continue. If you don't want any quality checking, you can skip this step.
  Box plot and MA plots
  PLM: fitting probe-level models to the data
  RNA digestion: uses ordered probes in probeset to detect possible RNA degradation
  Histogram: shows density plots of the signal intensity of each chip
  Scatterplot: the data will be normalised before plotting the log2 fold expression values of all possible combinations of two chips against each other
  PCA and hierarchical clustering (plot components: PC1 & 2)
  NUSE and RLE: normalized unscaled standard errors and relative logarithmic expression
Expert settings: the chosen default values will be suitable in most cases. They should only be changed by an expert. Normalisation: rma; p-value correction: BH; Analysis strategy: Linear models (package limma).
Step 2 of 4
Quality check results
Click in the list to open a fullsize view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the Excl
127. r trends in both sample groups, Groups 1 and 3 being highly contrasted between Diaphragm and Heart, while 2 and 4 are more subtle DE groups.
[Figure: GDS3224 / Heart left ventricle and diaphragm comparison / Rattus norvegicus. Clustering: Euclidean, K-means. Full image: 10425 x 12 spots; cluster sizes shown include 4158, 279 and 5761 features]
Clicking on the first group gives more details for genes higher in the Diaphragm.
[Figure: GDS3224 / Heart left ventricle and diaphragm comparison / Rattus norvegicus. Clustering: Euclidean, K-means. Full image: 727 x 12 spots. Legend: expression level (High, Low, Absent), tissue. Columns: GSM samples. Searchable gene list; legible entries include RGD1309821, Smarcd3, LOC294154, Fbln2, Myh10, Mnat1, Kpna3, BF388415, Dcun1d2, Eif4b, Hspa1a, Dnajc2, Mknk2, Hmgn3, BE113371]
Similarly for genes higher in the Heart.
[Figure: GDS3224 / Heart left ventricle and diaphragm comparison / Rattus norvegicus. Clustering: Euclidean, K-means. Full image: 278 x 12 spots. Legend: expression level (High, Low, Absent), tissue]
Finally zooming a small number of gen
128. r unless you want to overwrite the results of the previous analysis run. Clicking Exit will close Robin. If you want to view the results in MapMan, please start MapMan now; Robin will try to automatically transfer the analysed data set to MapMan.
     Reviewing DE results
     RobiNA saves a number of files to disk in its folder structure. Text tables can be found in the detailed results folder and are partly reproduced below. A short summary lists the parameters applied to perform the analysis, while two tables report the full results and the top 100 results after sorting the lines on adj.P.Val. Three PNG pictures report the up-regulated gene count, the DE gene count and the down-regulated gene count, respectively.
     Final plots summarize key facets of the analysis, as done in classical MA analysis. The left plot shows in red the probesets that are differentially expressed with the selected statistics; the right plot confirms that the samples segregate as defined in the grouping. The simple Venn diagrams at the bottom report the counts of DE probesets (total, up, down).
     [Figure: principal component analysis (PC1: 81.44) and MA plot of the contrast Heart-Diaphragm, with significantly regulate
129. rameters Gene identifier column used in tests Gene symbol Annotation column used in tests Pathway The features were ranked on t statistic group comparison Heart Diaphragm Raw universal gene list size 15923 Used universal gene list size requiring annotation and one feature per gene only 994 Applied filter to reduce features to one per gene true Filtered features to one per gene by using feature with highest IQR Expression values used when filtering features Transformed expression values Minimum category size required 10 Number categories passing mimimum size requirement 45 Humber permutations for p value calculation 10000 Top 10 GSEA results for BP increased in Heart 1 ee imi oT gt Lower taisti Upper tai _2 7519 _ iskeletal muscle tissue development 18 42 1281338605 00 1 3 309 skeletal muscle contraction sa 143862 NM 0 ilis 4 6936 musdecontrartion 3 43126281 0000100 5 5977 tglycagen metabolic process aa as 00062 69988 6 6094 glucomengenesis 23 _ 18815161 0 0002 0 9998 7 6937 regulation of musclecontraction 000000000009 21287718 00002 0 8998 5 5978 iglycogen biosynthetic process 44 21 712337 0 0003 0 9997 9 s412 _ aaa aaa aaa aaa L 205 i 714 686305 0 0004 i 0 9996 Eug O15 translational 222 2 46 2 18199774 0 0004 09996 1 10 GSEA results for BP incr
130. ration
     After installing and starting the Affymetrix Expression Console program, use the File menu to access the library download manager.
     [Screenshot: Expression Console File menu, including the Download Annotation Files entry]
     Log into the Affymetrix support area using your credentials (NetAffx account: your Affymetrix.com email address and password), then select the library files to download.
     [Screenshot: NetAffx library file list, including RAE230A, RAE230B, RaEx and RaGene arrays]
     In the example above we highlight the rat array that was already downloaded and was used to generate the demo data: the CLC Bio Workbench measurements of gene expression in tissues from cardiac left ventricle and diaphragm muscle of rats (van Lunteren et al., 2008).
     The Affymetrix Expression Console will download the CDF file to the folder that you have specified in the library path, typically the folder where your CEL files are stored. The file shown here as example is for Arabidopsis
131. rchical clustering
     Hierarchical clustering and heatmaps are classical representations of differential expression that provide a high-level view of the data. Both are done for you by the browser, which returns images as well as the corresponding tables for further exploration, without a single line of code.
     [Screenshot: Data Analysis Tools menu: Find genes; Compare 2 sets of samples; Cluster heatmaps (Hierarchical: Distance = Pearson Correlation, Linkage = Complete; Partitional: K-means / K-medians; By location on chromosome); Experiment design and value distribution]
     A number of interactive links allow changing the correlation method, changing the colors, and/or selecting heatmap regions to plot other graphs or export data to file.
     [Figure: GDS3224, Heart left ventricle and diaphragm comparison (Rattus norvegicus); clustering: Correlation, Complete Linkage; full image 10425 x 12 spots]
     KMean clustering
     KMean biclusters are unsupervised re
132. ression profiling by array count 12 samples ID 51748105 GEO DataSets Gene UniGene Homologene neighbors O Heart left ventricle and diaphragm comparison 3 Annotation Arfi ADP ribosylation factor 1 Organism Rattus norvegicus Reporter GPL341 1367459 at ID REF 5053224 64310 Gene ID DataSet type Expression profiling by array count 12 samples ID 51748108 GEO DataSets Gene UniGene 1 Gdi2 Heart left ventricle and diaphragm comparison 4 Annotation Gdi2 GDP dissociation inhibitor 2 Organism Rattus norvegicus Reporter GPL341 1367460 at ID REF GDS3224 29662 Gene ID BM387347 DataSet type Expression profiling by array count 12 samples ID 51748109 GEO DataSets Gene UniGene Chromosome neighbors Luckily this is not all of it and right menus allow doing more with the found items NCBI Resources v How Sequence neighbors Sequence neighbors Homologene neighbors il li splaisan My NCBI Sign Out GEOProfles corone J O Advanced Display Settings Summary 20 per page Sorted by Default order Results 1 to 20 of 4409 1 04221 Next Last gt gt 1 Sumo2 Heart left ventricle and diaphragm comparison 1 Annotation Sumo2 SMT3 suppressor of mif two 3 homolog 2 S cerevisiae Organism Rattus norvegicus Reporter GPL341 1367452 at ID REF GDS3224 690244 Gene ID NM 133594 DataSet type Expression profiling by arra
133. ry
     1. Total number of genes: 15866
     2. 2002 genes are differentially expressed
     3. Heart vs Diaphragm: (a) 915 genes are up-regulated; (b) 1087 genes are down-regulated
     Algorithm options: One-Way Between-Subject ANOVA (Unpaired)
     Default filter criteria:
     1. Fold Change (linear) < -2 or Fold Change (linear) > 2
     2. ANOVA p-value (Condition pair) < 0.05
     Conditions:
     - Heart (6): GSM160089, GSM160090, GSM160091, GSM160092, GSM160093, GSM160094 (mas5 CHP files)
     - Diaphragm (5): GSM160095, GSM160096, GSM160098, GSM160099, GSM160100 (mas5 CHP files)
     [Figure: signal plot for probeset 1388044_at (Pfkfb2); condition Heart, file GSM160093]
     The tabular results can finally be exported to local file(s) for further use (e.g. in IPA).
     [Screenshot: Affymetrix TAC, Analysis_1.tac (RAE230A), comparison Heart vs Diaphragm; tabs: Analysis Result, Summary, Table, Scatter Plot, Volcano Plot, Chromosome Summary; table columns include Transcript ID, Bi-weight Avg Signal (log2) per condition, Fold Change (linear), ANOVA p-value and FDR p-value]
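The default filter listed in this summary (linear fold change below -2 or above 2, together with an ANOVA p-value below 0.05) can be re-applied to an exported table outside the Affymetrix software. The lines below are only a minimal sketch: the three-column layout and the column names `probe`, `fold_change` and `anova_p` are assumptions made for illustration (a real export has more columns and different headers), and the five rows are made-up values.

```shell
#!/usr/bin/env bash
# Mock stand-in for an exported results table (tab-separated, made-up values).
# Column names are hypothetical, chosen only for this sketch.
results=$(mktemp)
{
  printf 'probe\tfold_change\tanova_p\n'
  printf 'p1\t12.5\t1e-06\n'
  printf 'p2\t-3.1\t0.01\n'
  printf 'p3\t1.4\t0.001\n'
  printf 'p4\t-8.0\t0.2\n'
  printf 'p5\t2.6\t0.04\n'
} > "$results"

# Keep rows with |fold change| > 2 and p < 0.05, then count them by direction.
summary=$(awk -F'\t' '
  NR == 1 { next }   # skip the header line
  ($3 + 0) < 0.05 && (($2 + 0) > 2 || ($2 + 0) < -2) {
    if (($2 + 0) > 0) up++; else down++
  }
  END { printf "up=%d down=%d", up, down }
' "$results")

echo "$summary"
rm -f "$results"
```

Adding `+ 0` forces awk to compare the fields as numbers, which matters for values written in scientific notation.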
134. s] AND rat[Organism]
     Get information about GSE6943 used in this session
     We will stick to the simple NCBI GEO search here and look for one particular experiment, GSE6943, used by CLC to build their tutorial. This work was published by van Lunteren E, Spiegler S and Moyer M. The sequencing was performed on tumor cultures from 4 patients at 2 time points over 3 conditions (DPN, OHT and control). One control sample was omitted by the paper authors due to low quality. Full details about this dataset can be found at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943 and are reproduced in the next figure. A lot of important information can be found on this page, including the chip used for the experiment, the number of replicates, as well as metadata about the experimental setup.
     [Figure: GEO record for Series GSE6943, with links to Platforms (1), Samples (12), Relations (BioProject) and "Query DataSets for GSE6943"]
     - Status: Public on Jan 24, 2008
     - Title: Normal Heart vs Normal Diaphragm
     - Organism: Rattus norvegicus
     - Experiment type: Expression profiling by array
     - Summary: Comparison of gene expression of heart (left vent) and diaphragm of normal Sprague-Dawley rats (young adult). Keywords: Cell type comparison
     - Overall design: 6 diaphragm samples, 6 hear
135. s and gene expression patterns stored in GEO. For more information about various aspects of GEO, please see our documentation (http://www.ncbi.nlm.nih.gov/geo/info/) listings and publications (PMC 3013736, 2686538, 2270403, 1669752, 1619900, 1619899, 539976, 99122).
     If you are new to GEO, first have a look at their handout (http://www.ncbi.nlm.nih.gov/geo/info/GEOHandoutFinal.pdf).
     Find GEO datasets relevant for your biological question
     You may use the NCBI search page in a very basic way, by entering your gene of interest to look for related knock-out experiments, or by searching for a compound or disease name that is relevant to your research. Note that this will sometimes find too many datasets, or miss the true one you dream of.
     Other, better ways of finding whether GEO data exists
     The current NCBI search page and its advanced counterpart also allow restricting your queries in smart ways, to find the best-suited data in the repository. A related How-To page can be found at "Find GEO datasets". Please read this page and discover the advantages of adding filters to your queries. For a good resource on how to build top-notch queries in the GEO advanced search page (http://www.ncbi.nlm.nih.gov/gds/advanced), look at the NCBI tutorial, which has examples of good syntax to recycle and copy.
     As an example, search experiments in rat with more than 100 samples with 100:500[Number of Sample
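The same fielded query can be prepared on the command line. This is only a hedged sketch: the tutorial itself works in the web search page, and sending the query to the NCBI E-utilities `esearch` endpoint with `db=gds` is an assumption added here for illustration. The small `urlencode` helper is hypothetical and only escapes the few special characters that occur in this particular query.

```shell
#!/usr/bin/env bash
# Query from the text: rat experiments with 100 to 500 samples.
term='100:500[Number of Samples] AND rat[Organism]'

# Minimal percent-encoding for the characters used in this query only
# (not a general-purpose URL encoder).
urlencode() {
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' \
                         -e 's/\[/%5B/g' -e 's/]/%5D/g' -e 's/:/%3A/g'
}

base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'
url="${base}?db=gds&term=$(urlencode "$term")&retmax=20"
echo "$url"

# To actually run the search (needs network access):
# curl -s "$url"
```

Pasting the unencoded `term` string into the GEO DataSets search box should give the equivalent result in the browser.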
136. s here.
     References
     1. http://cran.r-project.org
     2. http://www.rstudio.com
     Retrieved from http://stelap.local/BioWareWIKI/index.php?title=PubMA_Exercise_2b&oldid=11792. Category: PUBMA2014. This page was last modified on 20 October 2014 at 09:00. Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.
     PubMA Exercise 3
     From BioWare WIKI
     Clustering using the GEO Dataset browser (only for data with an attached GDS ID)
     Contents
     1 Introduction
     2 GEO Dataset browser
       2.1 Start the tool
       2.2 Select GEO Dataset Browser data
       2.3 Download the full data table
     3 Walking through the tools
       3.1 Find Genes
       3.2 Compare two sets of samples
         3.2.1 Two tails comparison
           3.2.1.1 Profile data
           3.2.1.2 Profile pathways
         3.2.2 Rank mean difference (4+ & either)
           3.2.2.1 Search pathways enriched in the obtained subset
       3.3 Cluster heatmaps
         3.3.1 Hierarchical clustering
         3.3.2 KMean clustering
       3.4 Experiment design and value distribution
     4 download exercise files
     Introduction
     The GEO Dataset Browser takes care of providing users clustered data and he
137. s shows the 250 genes with the lowest p-values, regardless of the significance of their p-values. Sometimes 250 is not enough and you still miss DE genes (as is the case in this example); sometimes 250 is way too many and only a small fraction of these 250 genes is really DE. So always check the adjusted p-values to decide how many of these 250 genes you are going to use for further analysis.
     - P.Value: raw p-value (before multiple testing correction)
     - t: t-statistic of the shrunken t-test
     - B: B-statistic, or log-odds that the gene is differentially expressed
     - logFC: log2 fold change between the two experimental conditions
     This table contains links through which detailed expression information can be retrieved for interesting genes (not further detailed here). Clicking on "Save all results" will open a new window with the full table, which can be saved to disk as a tab-separated text file using the browser's File > Save option.
     [Screenshot: full results table served by www.ncbi.nlm.nih.gov/geo/geo2r; columns: ID, adj.P.Val, P.Value, t, B, logFC, Gene symbol, Gene title; first rows: 1388876_at; 1374248_at (Mybpc1, myosin binding protein C, slow type); 1374622_at; 1370033_at (Myl1, myosin light chain 1); 1371339_at]
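Checking how many of the listed genes pass an adjusted p-value cutoff takes one line of awk once the table is saved. The sketch below assumes a plain tab-separated file whose second column is `adj.P.Val`, as in the column order shown on the page; a saved file may wrap values in double quotes, in which case these need to be stripped first. The four rows are made-up stand-ins and the 0.05 cutoff is only an example.

```shell
#!/usr/bin/env bash
# Mock stand-in for a saved GEO2R table (tab-separated; values are made up).
table=$(mktemp)
{
  printf 'ID\tadj.P.Val\tP.Value\tt\tB\tlogFC\n'
  printf 'probe_1\t7.67e-12\t7.81e-16\t59.1\t25.3\t7.14\n'
  printf 'probe_2\t3.20e-03\t1.10e-05\t8.2\t4.1\t1.90\n'
  printf 'probe_3\t4.90e-02\t9.00e-04\t4.4\t0.3\t-0.80\n'
  printf 'probe_4\t2.10e-01\t1.50e-02\t2.9\t-2.0\t0.45\n'
} > "$table"

cutoff=0.05

# Count data rows whose adjusted p-value (column 2) is below the cutoff.
n_sig=$(awk -F'\t' -v c="$cutoff" \
  'NR > 1 && ($2 + 0) < c { n++ } END { print n + 0 }' "$table")

# Count all data rows (total lines minus the header).
n_all=$(awk 'END { print NR - 1 }' "$table")

echo "$n_sig of $n_all listed probes have adj.P.Val < $cutoff"
rm -f "$table"
```

If the count equals the number of rows in the table, the list was probably cut off too early and more genes may be DE than the 250 shown.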
138. [RobiNA screenshot: quality check results list, with pairwise scatter plots of the CEL files in the RobiNA results folder GSE6943_CEL]
     Quality check results: click in the list to open a full-size view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the Exclude box.
139. sease/Drugs, Cell Types, Misc
     [Screenshot: Enrichr results for the list "GSE6943 DE Robina 238" (238 genes); gene-set libraries: ChEA, TRANSFAC and JASPAR PWMs, Genome Browser PWMs, Histone Modifications ChIP-seq, microRNA; views: Table, Grid, Network]
     Click the bars to sort (now sorted by combined score).
     An example of network view for the enriched TRANSFAC & JASPAR TF motifs:
     [Figure: TRANSFAC and JASPAR PWMs network for "GSE6943 DE Robina 238"; nodes include CEBPA, ZNF148, RELB and SRF (human)]
     WebGestalt
     This second tool largely overlaps in data sources with Enrichr, although its tabular reporting format makes it a little less attractive to my eyes. WebGestalt accepts many ID types and allows selecting the exact background based on the array, which is a plus compared to Enrichr and puts it on par with DAVID in that respect.
     WebGestalt is the WEB-based GEne SeT AnaLysis Toolkit. It is designed for functional genomic, proteomic and large-scale genetic studies from which large numbers of gene lists (e.g. differentially expressed gene sets, co-expressed gene sets, etc.) are continuously generated. WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense of gene lists. Please read the full manual (http://bioinfo.vanderbilt.edu/webgestalt/WebGestalt_manual_2013_04_12.pdf) for a good introduction to this tool.
     The probe list obta
140. software products only jsp productIdz131414 amp categoryId235623 amp productName Affymetrix 2526 2523174 253B Expression Console926252696252315390253B Software 1 1
     2. http://www.affymetrix.com/support/learning/training_tutorials/tac_ec/index.affx
     Retrieved from http://stelap.local/BioWareWIKI/index.php?title=Analyze_GEO_data_with_the_Affymetrix_software&oldid=11773. Category: Howto. This page was last modified on 19 October 2014 at 20:32. Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.
     Normalize CEL files with RMAExpress
     From BioWare WIKI
     Convert CEL files to normalized text tables (RMAExpress)
     Contents
     1 Introduction
     2 Run with the GSE6943 data
       2.1 Convert CEL data
       2.2 Plot normalized data
       2.3 Convert data with the Convertor tool
       2.4 Final result
     3 download exercise files
     Introduction
     Many programs like CLC require normalized microarray data as input and do not support the CEL format. RMAExpress and its companion convertor tool rapidly produce normalized and log-transformed data from a collection of CEL files and the corresponding CDF annotation database, without the need to install R and Bioconductor packages. RMAExpress was cited
141. ss RT 6 1 0E 3 4 1E 2 GOTERM BP FAT glucan metabolic process RT E 1 0E 3 4 1 2 m GOTERM_BP_FAT glycogen metabolic process RT 6 1 0E 3 4 1E 2 m GOTERM_BP_FAT energy reserve metabolic process RT 6 2 1 3 7 1E 2 GOTERM BP FAT cellular polysaccharide metabolic process RT 6 2 3E 3 7 9 2 GOTERM BP FAT polysaccharide metabolic process RT m 6 1 5E 2 2 9 1 A typical DAVID plot for the top cluster 54 myosin light polypeptide 1 myosin heavy polypeptide 2 skeletal muscle adult myosin heavy polypeptide 1 skeletal muscle adult leucine rich repeat containing 10 myosin light chain phosphoryatable fast skeletal muscle myosin heavy chain 4 skeletal muscle myosin heavy chain 6 cardiac muscle alpha synemin intermediate filament protein troponin type 1 skeletal slow troponin T type 1 skeletal slow troponin C type 2 fast troponin type 3 cardiac myosin binding protein C cardiac troponin T type 3 skeletal fast troponin 2 skeletal fast troponin T type 2 cardiac myosin light chain 7 regulatory ankyrin repeat domain 1 cardiac muscle calsequestrin 1 fast twitch skeletal muscle ATPase Ca transporting cardiac muscle fast twitch 1 ankyrin repeat domain 23 actin alpha cardiac muscle 1 smooth muscle alpha actin titin ryanodine receptor 2 cardiac nebulin junctophilin 1 keratin 19 PDZ and LIM domain 3 cysteine and glycine rich protein 3 four and a half LIM
142. st click Recalculate on the GEO2R tab to apply the edits.
     When satisfied, go to the GEO2R tab and click the Top250 button to run a limma analysis for identifying DE genes.
     When more than two groups are defined, GEO2R selects pairwise contrasts in a triangular or circular way, depending on the number of groups. These contrasts are labelled with arbitrary names (G0, G1, ..., Gn) and do not always reflect the user's expectation. Unfortunately there is little that can be done in GEO2R to control this choice, BUT more can be done when post-processing the code in RStudio, as will be shown in the dedicated tutorial.
     [Screenshot: GEO2R tab with the top table; columns: ID, adj.P.Val, P.Value, t, B, logFC, Gene symbol, Gene title; listed probesets: 1388876_at; 1374248_at (Mybpc1); 1374622_at; 1370033_at (myosin light chain); 1371338_at (Tnni1); 1373697_at (Mybpc2); 1398306_at (Ampd1); 1374672_at (Tnni3k); 1367962_at (Actn3); 1367964_at (Tnni2)]
143. st size: 15923
     - Used universal gene list size (requiring annotation and one feature per gene only): 8603
     - Applied filter to reduce features to one per gene: true
     - Filtered features to one per gene by using the feature with highest IQR
     - Expression values used when filtering features: Transformed expression values
     - Minimum category size required: 10
     - Number of categories passing minimum size requirement: 1451
     - Number of permutations for p-value calculation: 10000
     Settings for the GSEA test for GO MF
     Gene set enrichment analysis, Wed Aug 13 14:51:03 CEST 2014. Version: CLC Main Workbench 7.0.3. User: splaisan.
     Parameters:
     - Gene identifier column used in tests: Gene symbol
     - Annotation column used in tests: GO molecular function
     - The features were ranked on: t-statistic, group comparison Heart / Diaphragm
     - Raw universal gene list size: 15923
     - Used universal gene list size (requiring annotation and one feature per gene only): 8969
     - Applied filter to reduce features to one per gene: true
     - Filtered features to one per gene by using the feature with highest IQR
     - Expression values used when filtering features: Transformed expression values
     - Minimum category size required: 10
     - Number of categories passing minimum size requirement: 503
     - Number of permutations for p-value calculation: 10000
     Settings for the GSEA test for Pathways
     Gene set enrichment analysis, Wed Aug 13 15:10:25 CEST 2014. Version: CLC Main Workbench 7.0.3. User: splaisan.
     Pa
144. stalt User data and parameters User data textAreaUpload txt Organism rnorvegicus Id Type affy rae230a Ref Set affy rae230a Significance Level Top10 Statistics Test Hypergeometric MTC BH Minimum 2 This table lists the enriched KEGG pathways number of Entrez IDs in your user data set for the pathway the corresponding Entrez IDs and the statistics for the enriched pathway The statistic column lists C the number of reference genes in the category O the number of genes in the gene set and also in the category E the expected number in the category R ratio of enrichment rawP p value from hypergeometric test adjP p value adjusted by the multiple test adjustment Finally the pathway name is linked to where the user ids are highlighted the number of user gene ids is linked to a table with information about the user ids and the Entrez IDs are linked to Entrez Gene PathwayName Gene EntrezGene Statistics I 682930 24837 29658 29275 81636 29248 295929 29556 25399 117557 24239 C 70 0 15 E 1 54 R 9 72 rawP 2 19e Dosen aay 64532 689560 64672 116600 il adjP 1 58e 09 Arrhythmogenic right ventricular 682930 29658 171009 83501 306871 287925 25399 24239 689560 24392 307505 C 58 O 12 E 1 28 R 9 38 rawP 3 59e cardiomyopathy ARVC 116600 09 adjP 1 29e 07 f f 682930 24837 29658 29275 29248 295929 29556 25399 117557 24239 689560 C 66 O 12 E 1 46 R 8 25 rawP 1 70e Hypertrophic cardiomyopathy HCM 12 116600 08 adjP 4 08
145. statistics for the enriched pathway. The statistic column lists: C, the number of reference genes in the category; O, the number of genes in the gene set and also in the category; E, the expected number in the category; R, the ratio of enrichment; rawP, the p-value from the hypergeometric test; adjP, the p-value adjusted by the multiple test adjustment. Finally, the gene set is linked to the WikiPathways graph, which is generated dynamically from bioinfo.vanderbilt.edu wg gsat; the number of user gene ids is linked to a table with information about the user ids, and the Entrez IDs are linked to Entrez Gene.
     Columns: PathwayName; #Gene; EntrezGene; Statistics
     - Striated Muscle Contraction; 16; 311029 171009 29388 24838 29275 295929 29556 296369 362867 117557 56781 24837 29389 29248 292879 171409; C=35, O=16, E=0.77, R=20.74, rawP=5.13e-18, adjP=1.95e-16
     - Calcium Regulation in the Cardiac Cell; 12; 85420 60449 64532 24392 64672 686019 682930 81636 24239 689560 24245 24173; C=132, O=12, E=2.91, R=4.12, rawP=3.34e
     - [...] Relaxation and Contraction; 10; 29275 85420 60449 64532 24392 117558 81636 58965 689560 24245; C=134, O=10, E=2.95, R=3.38, rawP=0.0007, adjP=0.0089
     - Glycogen Metabolism; 4; 64561 29353 24645 25739; C=32, O=4, E=0.71, R=5.67, rawP=0.0051, adjP=0.0485
     - [...] regulated genes with circadian; 4; 24253 305234 498642 25714; C=43, O=4, E=0.95, R=4.22, rawP=0.0145, adjP=0.1102
     - Glucuronidation; 2; 24645 25058; C=10, O=2, E=0.22, R=9.07, rawP=0.0194, adjP=0.1229
     - Integrin mediated cell adhesion; 5; 60352 3
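The columns of the statistics field are tied together by simple arithmetic. The relations below are a sketch under the usual over-representation setup, not a quote from the WebGestalt documentation: the symbols N (number of genes in the reference set) and n (number of user genes found in the reference set) are introduced here for illustration and do not appear in the table.

```latex
E = \frac{n \, C}{N}, \qquad
R = \frac{O}{E}, \qquad
\mathrm{rawP} = \sum_{k=O}^{\min(C,\,n)}
  \frac{\binom{C}{k}\,\binom{N-C}{\,n-k\,}}{\binom{N}{n}}

% Check with the Striated Muscle Contraction row: C = 35, O = 16, E = 0.77
% R \approx 16 / 0.77 \approx 20.8, in line with the reported R = 20.74
% (E is displayed rounded to two decimals).
```

Because E scales with the category size C, a small category such as Glucuronidation (C=10) can reach a high ratio R with only two observed genes, which is why R should always be read together with adjP.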
146. sults showing groups of genes and samples that behave in a very similar way. This method offers the advantage of not being biased by prior knowledge and of being directed only by the data itself. The trade-off is that running KMean over and over again will always return results, and not necessarily the same results. This seems counterintuitive for the average biologist, who is trained towards reproducibility, but it is a reality for the analyst: you will not get exactly the same results as the ones below, BUT you will find mostly the same genes in similar clusters.
     KMean clusters can identify core processes represented in sample groups by a small set of genes whose co-regulation is very clear. The user decides arbitrarily on the number of clusters to be found: too few will lead to mixed profiles, while too many will lead to apparent redundancy.
     [Screenshot: GDS Browser > GDS Analysis. K-means / K-medians clustering divides genes into k partitions; the best solution in 3 trials is reported. Clustering options: K method = Mean; Clusters (k, 2-15) = 4. GDS3224, Heart left ventricle and diaphragm comparison (Rattus norvegicus)]
     From the obtained solution it is clear that the four clusters are grouping genes having clea
147. t) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
     gset <- gset[[idx]]

     # make proper column names to match toptable
     fvarLabels(gset) <- make.names(fvarLabels(gset))

     # group names for all samples
     sml <- c(...)  # per-sample group labels; the values are not legible in this copy

     # log2 transform
     ex <- exprs(gset)
     qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm = TRUE))
     LogC <- (qx[5] > 100) ||
       (qx[6] - qx[1] > 50 && qx[2] > 0) ||
       (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
     if (LogC) {
       ex[which(ex <= 0)] <- NaN
       exprs(gset) <- log2(ex)
     }

     download exercise files
     Download exercise files here.
     References
     1. https://www.ncbi.nlm.nih.gov/geo/info/geo2r.html
     2. Erik van Lunteren, Sarah Spiegler, Michelle Moyer. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008;161(1):41-53. PubMed 18207466
     3. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943
     Retrieved from http://stelap.local/BioWareWIKI/index.php?title=PubMA_Exercise_2&oldid=11740. Category: PUBMA2014. This page was last modified on 15 Octob
148. t Filter s Reset to Default Customize Annotations View Interaction Network Diaphragm Bi weight Avg Signal log2 1367962_at Rn 17592 1 3 e actinin 374248_at Rn 91 2 3 76 13 41E myosin parvalbt Transcript ID Array Fold Change ANOVA p FDR p value linear Heart value Heart vs Heart vs vs Diaphragm Diaphragm Diaphragm Gene Symbol Transcript Cluster ID troponir carboni sarcolip myosin myosin myosin troponir gt This might take long time to perform hierarchical clustering Please Wait SERIE Click to Perform Hierarchical Clustering Analysis AES PoE See roponir aquapol ATPase myosin Rn 4012 Rn 38647 synapto BMS te troponir calpain calcium MP r myosin adenosi amylase myoger nebulin 374049_at Rn 24381 1 9 48 2 93 9 2 36E 07 0 0000 OC100 smooth Gene Rows 2002 Selected Rows 1 Selected 3 eoo Windows 8 1 9 5 L En 4 XE affymetrix Analysis_1 tac RAE230A Analysis Result Summary Table Scatter Plot Volcano Plot romosome Summary Heart vs Diaphragm Analysis Type Gene Level Differential Expression Analysis Array Type RAE230A Genome Version rn5 Annotation File RAE230A na34 annot csv Summa
149. t samples
     - Contributor(s): van Lunteren E, Spiegler S, Moyer M
     - Citation(s): van Lunteren E, Spiegler S, Moyer M. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008 Mar 20;161(1):41-53. PMID: 18207466
     - Submission date: Feb 02, 2007. Last update date: Jun 21, 2012
     - Contact name: Erik van Lunteren; E-mail: exv4 cwru edu; Phone: 216-791-3800
     - Organization name: Cleveland VA Medical Center; Department: Pulmonary 111J(W); Street address: 10701 East Boulevard; City: Cleveland; State/province: OH; ZIP/Postal code: 44106; Country: USA
     - Platforms (1): GPL341, [RAE230A] Affymetrix Rat Expression 230A Array
     - Samples (12): GSM160089 Diaphragm 1; GSM160090 Diaphragm 2; GSM160091 Diaphragm 3; ...
     - Download family: SOFT formatted family file(s); MINiML formatted family file(s); Series Matrix File(s)
     - Supplementary file: GSE6943_RAW.tar, 16.8 Mb, file type TAR (of CEL). Raw data provided as supplementary file; processed data included within Sample table
     Note the two red boxes that highlight links to two additional resources at GEO. The first link directs the user to the PRJNA98125 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA98125) BioProject page related to this submission, which includes extra links to the corresponding DataSet. Note that only those submissions pre-processed by the GEO team are linked to a DataSet (read http://www.ncbi.nlm.nih.gov/geo/info/datasets.html). DataSets form the basis of GEO's advanced da
150. t ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008 Mar
     [Screenshot: GDS3224 record (2008/01/24) with Cluster Analysis and download links: DataSet full SOFT file, DataSet SOFT file, Series family SOFT file, Series family MINiML file, Annotation SOFT file. The DataSet file contains DataSet information, experiment variable subsets and expression value measurements, in plain text, tab-delimited format]

     # download the file from the "DataSet SOFT file" link in the main window
     wget ftp://ftp.ncbi.nlm.nih.gov/geo/datasets/GDS3nnn/GDS3224/soft/GDS3224.soft.gz

     # decompress and replace carriage returns by valid Unix line ends
     gunzip -c GDS3224.soft.gz | tr "\r" "\n" > GDS3224_soft.txt

     # find the table start
     grep -n "dataset_table_begin" GDS3224_soft.txt
     49:!dataset_table_begin

     # we need only the number "49"
     header_end=$(grep -n "dataset_table_begin" GDS3224_soft.txt | awk 'BEGIN{FS=":"}{print $1}')

     # show the result
     echo $header_end
     49

     # split the file in two
     head -${header_end} GDS3224_soft.txt > GDS3224_data_header.txt
     cat GDS3224_soft.txt | sed -e "1,${header_end}d" > GDS3224_data.txt

     # inspect results
     grep "GSM" GDS3224_data_header.txt
     #GSM160089 = Value for GSM160089: Diaphragm 1; src: Diaphragm
     #GSM160090 = Value for GSM160090: Diaphragm 2; src: Diaphragm
     #GSM160091 = Value for GSM160091: Diaphragm 3; src: Diaphragm
     #GSM160092 = Value for GSM160092: Diaphragm 4; src: Di
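The header/table split can be tried without downloading anything. The sketch below runs the same idea on a tiny mock file that only imitates the layout of a GDS SOFT file; the file content, the temporary file names and the extra step that drops the closing `!dataset_table_end` marker are assumptions added for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
soft="$workdir/mock_soft.txt"

# Tiny stand-in for a GDS SOFT file (made-up content, same layout).
{
  printf '^DATASET = GDS0000\n'
  printf '!dataset_title = mock dataset\n'
  printf '#GSM1 = Value for GSM1: Diaphragm 1; src: Diaphragm\n'
  printf '#GSM2 = Value for GSM2: Heart 1; src: Heart\n'
  printf '!dataset_table_begin\n'
  printf 'ID_REF\tIDENTIFIER\tGSM1\tGSM2\n'
  printf '1367452_at\tSumo2\t10.1\t10.3\n'
  printf '1367453_at\tCdc37\t9.2\t9.0\n'
  printf '!dataset_table_end\n'
} > "$soft"

# Line number of the marker that starts the data table.
header_end=$(grep -n '^!dataset_table_begin' "$soft" | cut -d: -f1)

# Header: everything up to and including the marker line.
head -n "$header_end" "$soft" > "$workdir/header.txt"

# Data: everything after the marker, without the closing marker line.
sed -e "1,${header_end}d" -e '/^!dataset_table_end/d' "$soft" > "$workdir/data.txt"

# Sample descriptions live in the header; the table keeps its column names.
grep '^#GSM' "$workdir/header.txt"
n_rows=$(wc -l < "$workdir/data.txt" | tr -d ' ')
echo "data lines (including column names): $n_rows"
```

Anchoring the pattern with `^!` avoids matching the marker text if it ever appears inside a description line.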
151. ta display and analysis tools, including gene expression profile charts and clusters (detailed in PubMA_Exercise 3). The second link, "Analyze with GEO2R" (http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE6943), opens a new window with the GEO2R submission form, further detailed in the next PubMA_Exercise 2.
     download exercise files
     Download exercise files here.
     References
     - http://www.ncbi.nlm.nih.gov/geo/info/faq.html#What
     - http://www.ncbi.nlm.nih.gov/geo/info/GEOHandoutFinal.pdf
     - http://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html
     - http://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html#fields
     - Erik van Lunteren, Sarah Spiegler, Michelle Moyer. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008;161(1):41-53. PubMed 18207466
     - http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943
     - http://www.bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html
     Retrieved from http://stelap.local/BioWareWIKI/index.php?title=PubMA_Exercise_1&oldid=11745. Category: PUBMA2014. This page was last modified on 16 October 2014 at 08:49. Content is available under Creative Commons Attribution Non-Commercial S
152. tijn Meganck, Cosmin Lazar, David Steenhoff, Alain Coletta, Colin Molter, Robin Duque, Virginie de Schaetzen, David Y Weiss Solís, Hugues Bersini, Ann Nowé. Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages. BMC Bioinformatics 2012;13:335. PubMed 23259851
     - Alain Coletta, Colin Molter, Robin Duque, David Steenhoff, Jonatan Taminau, Virginie de Schaetzen, Stijn Meganck, Cosmin Lazar, David Venet, Vincent Detours, Ann Nowé, Hugues Bersini, David Y Weiss Solís. InSilico DB genomic datasets hub: an efficient starting point for analyzing genome-wide studies in GenePattern, Integrative Genomics Viewer, and R/Bioconductor. Genome Biol 2012;13(11):R104. PubMed 23158523
     - Cosmin Lazar, Stijn Meganck, Jonatan Taminau, David Steenhoff, Alain Coletta, Colin Molter, David Y Weiss Solís, Robin Duque, Hugues Bersini, Ann Nowé. Batch effect removal methods for microarray gene expression data integration: a survey. Brief Bioinformatics 2013;14(4):469-90. PubMed 22851511
     - Jonatan Taminau, David Steenhoff, Alain Coletta, Stijn Meganck, Cosmin Lazar, Virginie de Schaetzen, Robin Duque, Colin Molter, Hugues Bersini, Ann Nowé, David Y Weiss Solís. inSilicoDb: an R/Bioconductor package for accessing human Affymetrix expert-curated datasets from GEO. Bioinformatics 2011;27(22):3204-5. PubMed 21937664
153. troponin T type 3 (skeletal, fast) | chr1:222573326-2225784
9 | 1367962_at | Rn.17592.1 | 13.23 | 4.6 | 0.18 | 0.26 | 395.93 | 2.09E-13 | 2.28E-10 | Actn3 | actinin alpha 3 | chr1:227051529-2270674
10 | 1370033_at | Rn.40120.1 | 13.61 | 5.01 | 0.11 | 0.33 | 387.67 | 4.58E-13 | 4.03E-10 | Myl1 | myosin light chain 1 | chr9:73364156-73369886
11 | 1388139_at | Rn.10092.1 | 12.6 | 4.03 | 0.19 | 0.2 | 379.82 | 9.19E-14 | 1.33E-10 | Myh2 | myosin heavy chain 2 (skeletal muscle, adult) | chr10:53487290-53491320
16 | 1387787_at | Rn.6534.1 | 13.94 | 5.38 | 0.09 | 0.16 | 377.88 | 2.22E-15 | 1.17E-11 | Mylpf | myosin light chain, phosphorylatable, fast skeletal muscle | chr1:205649549-2056523
17 | 1368108_at | Rn.10833.1 | 13.69 | 5.32 | 0.14 | 0.35 | 330.79 | 1.20E-12 | 8.63E-10 | Atp2a1 | ATPase, Ca++ transporting, cardiac muscle, fast twitch 1 | chr1:204836392-204854
18 | 1367896_at | Rn.1647.1 | 12.04 | 3.83 | 0.17 | 0.09 | 296.23 | 6.88E-15 | 1.86E-11 | Car3 | carbonic anhydrase 3 | chr2
19 | 1370412_at | Rn.13846.1 | 12.26 | 4.3 | 0.1 | 0.12 | 249.05 | 7.77E-16 | 6.17E-12 | Tnnt1 | troponin T type 1 (skeletal, slow) | chr1:75652128-75660589
21 | 1367964_at | Rn.9924.1 | 13.93 | 5.98 | 0.11 | 0.47 | 247.62 | 1.42E-11 | 4.75E-09 | Tnni2 | troponin I type 2 (skeletal, fast) | chr1:222505097-2225070
22 | 1374248_at | Rn.9153.1 | 12.67 | 4.76 | 0.15 | 0.12 | 240.41 | 7.77E-15 | 1.86E-11 | Mybpc1 | myosin binding protein C, slow type | chr7:29192152-29199672
26 | 1370214_at | Rn.2005.1 | 12.27 | 4.66 | 0.22 | 0.14 | 194.36 | 1.97E-13 | 2.28E-10 | Pvalb | parvalbumin | chr7:119420291-1194352
30 | 1373697_at | Rn.27586.1 | 12.41 | 4.99 | 0.15 | 0.38 | 170.81 | 8.29E-12 | 3.46E-09 | Mybpc2 | myosin binding pr
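The numeric columns of the rows in this excerpt carry no headers. One possible reading, offered here only as an assumption, is that the first two values of each row are log2-scale mean signals for the two tissues and the fifth value is the corresponding linear fold change. The minimal Python sketch below tests that reading on three rows, with decimal points restored; the function names are illustrative and not part of any tool used in the exercise.

```python
def linear_fold_change(log2_mean_a, log2_mean_b):
    """Linear fold change implied by two log2-scale mean signals."""
    return 2.0 ** (log2_mean_a - log2_mean_b)


# (probe set, first value, second value, fifth value) read from three rows
# of the excerpt. Interpreting them as log2 means and fold change is an
# assumption, because the column headers are not part of the excerpt.
ROWS = [
    ("1370033_at", 13.61, 5.01, 387.67),
    ("1388139_at", 12.60, 4.03, 379.82),
    ("1387787_at", 13.94, 5.38, 377.88),
]


def relative_gap(row):
    """Relative difference between the recomputed and the listed fold change."""
    _, mean_a, mean_b, listed = row
    return abs(linear_fold_change(mean_a, mean_b) - listed) / listed


if __name__ == "__main__":
    for row in ROWS:
        print(row[0], round(linear_fold_change(row[1], row[2]), 2), row[3])
```

For these rows the recomputed values stay within one percent of the listed ones, which is what rounding the means to two decimals would allow, so the reading is at least consistent with the data.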
154. ude box. [Screenshot list, RobiNA quality check: BOXPLOT, plot of 11 Affymetrix data files; MAPLOT, one plot per CEL file of the GSE6943 set (GSM160089.CEL to GSM160096.CEL shown), several with an Exclude checkbox. Navigation: Previous, Next, Step 3 of 4.] Quality check results: click in the list to open a full-size view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the Exclude box. [Screenshot list, continued: RNA plot of 11 Affymetrix data files; HIST plot of 11 Affymetrix data files; scatter plot of GSM160089.CEL vs GSM160090.CEL, followed by further pairwise scatter plots.]
155. [Screenshot list, RobiNA quality check, continued: pairwise scatter plots of CEL files (a pair ending in GSM160098.CEL; GSM160096.CEL vs GSM160099.CEL; GSM160096.CEL vs GSM160100.CEL; GSM160098.CEL vs GSM160099.CEL; GSM160098.CEL vs GSM160100.CEL; GSM160099.CEL vs GSM160100.CEL); PCA plot of 11 Affymetrix data files; HCLUST plot of 11 Affymetrix data files. Navigation: Previous, Next, Step 3 of 4.] Quality check results: click in the list to open a full-size view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the Exclude box. [Screenshot: RobiNA window showing the scatter plot of GSM160095.CEL vs GSM160100.CEL.]
156. umber of Links. PUBLICATIONS: PubMed, 1. OTHER DATASETS: GEO DataSets, 2. GEO Data Details (Parameter: Value): Data volume, Spots: 191076; Data volume, Processed Mbytes: 4; Data volume, Supplementary Mbytes: 17. Select GEO Dataset Browser data: the above link (http://www.ncbi.nlm.nih.gov/bioproject?Db=gds&DbFrom=bioproject&Cmd=Link&LinkName=bioproject_gds&LinkReadableName=GEO%20DataSets&ordinalpos=1&IdsFromResult=98125) brings you to the GDS page shown next. The first reference links to GEO2R, while the second is annotated with a heatmap and links to the GEO Dataset Browser. [Screenshot: GEO DataSets results, sorted by default order, Results: 2.] 1. Normal Heart vs Normal Diaphragm. (Submitter supplied) Comparison of gene expression of heart (left vent) and diaphragm of normal Sprague-Dawley rats (young adult). Keywords: Cell type comparison. Organism: Rattus norvegicus. Type: Expression profiling by array. Dataset: GDS3224. Platform: GPL341. 12 Samples. Download data: GEO (CEL). Series Accession: GSE69. Links: PubMed, Similar studies, Analyze with GEO2R. 2. Analysis of normal heart left ventricle and diaphragm of young adult Sprague Dawley males. Concurrent rhythmic contractions of the diaphragm and heart are needed to sustain life. Results provide insight into transcriptional strategies for ensuring long term energy supplies in these two musc
157. urce("http://bioconductor.org/biocLite.R") biocLite() biocLite("affy") You also need Bioconductor packages for the annotation of the spots on the arrays. In the example below I use the annotation packages for the Arabidopsis ATH1 array. Of course, you need the annotation packages that correspond to the arrays that you used in your experiment. There are two possibilities for obtaining these packages. Bioconductor has a list of annotation packages (http://www.bioconductor.org/packages/release/data/annotation/) generated by Affymetrix. In this list, find the name of the cdf file that corresponds to the array that you have used (we used ATH1 arrays, so the cdf file is called ath1121501cdf). [Screenshot: Bioconductor annotation package list with columns Package, Maintainer and Title. Legible entries include agcdf; ath1121501.db, Affymetrix Arabidopsis ATH1 Genome Array annotation data (chip ath1121501); ath1121501cdf; and ath1121501probe, Probe sequence data for microarrays of type ath1121501.]
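The package names visible in this excerpt (ath1121501cdf, ath1121501probe, ath1121501.db) all share the chip name as a prefix. As a minimal Python sketch of that naming pattern, the hypothetical helper below builds the three names from a chip name. It assumes the pattern holds for the chip in question, which should always be confirmed against the Bioconductor annotation list, since not every platform follows it.

```python
def annotation_package_names(chip):
    """Guess annotation package names for a chip (naming pattern assumed).

    `chip` is the lower-case chip name as it appears in the Bioconductor
    annotation list, for example "ath1121501" for the ATH1 array.
    """
    chip = chip.strip().lower()
    return {
        "cdf": chip + "cdf",      # chip definition file package
        "probe": chip + "probe",  # probe sequence data package
        "db": chip + ".db",       # chip annotation data package
    }


if __name__ == "__main__":
    for kind, name in annotation_package_names("ath1121501").items():
        print(kind, name)
```

The names returned for "ath1121501" match the ones shown in the excerpt; for any other chip they are only candidates to look up in the list.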
158. utorial s. Please refer to the Hands on Analysis of public microarray datasets page for more information about this dataset. Load the GSE6943 dataset: change focus to search for Experiment and type GSE6943 in the text field, followed by a click on SEARCH. [Screenshot: search form with the options Gene symbol, Entrez ID, Probe ID, HomoloGene ID, Annotation, Platform, Experiment and Signature, and a Q value max field.] Four enriched TS (Transcriptome Signatures) are found by the program. Several options are available here (please explore them); we choose to show the results to get details about the four hits. [Screenshot: Results: 4 signatures, 1 platform (GPL341), 1 experiment (GSE6943), with the buttons Load, Send to plugins and Create group.] Find Transcriptome Signatures: selecting each of the four hits displays the detailed statistical results in the top right window. First TS: 662 genes, with Glycolysis as main aspect. Second TS: 778 genes related to mitochondrial functions. [Screenshot: Plugins, Heatmap, Settings; Info: Platform, Experiment, Organism Rattus norve
159. ve splicing. [Workflow figure: 1. Import CHP/ARR; 2. Create conditions; 3. Run analysis; then gene results, exon results and splicing results visualization; link out to public databases such as Ensembl and NCBI to determine transcript function.] As summarized in the above picture, we will now perform a full analysis starting from a set of CEL files obtained from the GEO repository. The method is divided into two steps, detailed below: the first step converts CEL data to a format better suited for differential expression analysis, using the Expression Console; the second step computes differential expression based on user-defined sample groups, using the Transcriptome Analysis Console. Results presented here correspond to the blue highlights in the above workflow. The Affymetrix Expression Console (EC): the EC software allows step-by-step processing of the data by sequentially clicking each tool on the right-hand side of the window. Other Configuration tools are not detailed here. Converting CEL data to CHP format (required for TAC): using the Study tools, the CEL files downloaded from GEO are loaded in the software, then normalized using a chosen method out of RMA, MAS5 and PLIER. We use RMA as this is the standard method. [Screenshot: Expression Console, Probe Cell Intensity Data:] GSM160083.CEL GSM160090.CEL GSM160091.CEL GSM160092.CEL GSM160093.CEL GSM160094.C
160. ww.ncbi.nlm.nih.gov/gds 3. http://tagc.univ-mrs.fr/tbrowser/index2.php?option=com_content&task=view&id=19&pop=1&page=0&Itemid=23 Category: Howto. This page was last modified on 8 September 2014 at 08:28. Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted. PubMA_Exercise 7 (from BioWareWIKI): IPA analysis of the GEO2R DE table. INGENUITY PATHWAY ANALYSIS. The following content was obtained with the DE table generated by GEO2R. Very similar results are expected with the results from RobiNA or from the Affy console. Contents: 1 Introduction; 2 IPA Tutorial material; 3 IPA analysis; 3.1 Upload data in IPA; 3.2 Start core analysis and set filter; 3.3 Review the obtained core results; 4 download exercise files. Introduction: Ingenuity Pathway Analysis (IPA) is strongly advised for more advanced users. You can use IPA on any computer with Java installed, after asking for a personal account (mailto:bits@vib.be) and logging in here: https://apps.ingenuity.com/ingsso/login?service=https%3A
161. y count: 12 samples. ID: 51748101. Links: GEO DataSets, Gene, UniGene, Profile neighbors, Chromosome neighbors, Sequence neighbors. 2. Ube2d3 - Heart left ventricle and diaphragm comparison. Annotation: Ube2d3, ubiquitin-conjugating enzyme E2D 3. Organism: Rattus norvegicus. Reporter: GPL341, 1367456_at (ID_REF), GDS3224, 81920 (Gene ID), 031237. DataSet type: Expression profiling by array, count, 12 samples. ID: 51748105. Links: GEO DataSets, Gene, UniGene, Profile neighbors, Chromosome neighbors, Sequence neighbors. 3. Arf1 - Heart left ventricle and diaphragm comparison. Annotation: Arf1, ADP-ribosylation factor 1. Organism: Rattus norvegicus. Reporter: GPL341, 1367459_at (ID_REF), GDS3224, 64310 (Gene ID). DataSet type: Expression profiling by array, count, 12 samples. ID: 51748108. Links: GEO DataSets, Gene, UniGene, Profile neighbors, Chromosome neighbors, Homologene neighbors. 4. Gdi2 - Heart left ventricle and diaphragm comparison. Annotation: Gdi2, GDP dissociation inhibitor 2. Organism: Rattus norvegicus. Reporter: GPL341, 1367460_at (ID_REF), GDS3224, 29662 (Gene ID), BM387347. DataSet type: Expression profiling by array, count, 12 samples. ID: 51748109. Links: GEO DataSets, Gene, UniGene, Profile neighbors, Chromosome neighbors, Homologene neighbors. Filters and related data are detailed and left to your curiosity. Profile data: this lets you download differentially expressed data selected by the tool. Homologene neighbors: Homolog
162. y terms. Functional enrichment analysis of the RobiNA DE data: in order to continue with the most complete and easy workflow, we use here the RobiNA table obtained by de novo analysis of 11 of the 12 samples (one CEL file being damaged on the GEO repository). Preparing probe lists for enrichment testing: web tools require probe or gene lists to compute enrichment; they do not take into account the degree of DE or the confidence in that DE, both of which are left to the user to filter. We can produce these two lists using Excel (R would be better) in a few easy steps: import the table in Excel, taking care to protect gene symbols against interpretation (Excel may otherwise convert symbols such as March2 to dates); add a column with the absolute value of logFC; filter on abs(logFC) with a minimal cutoff of 2 (four-fold DE). [Screenshot: Excel table with the columns abs(logFC), AveExpr, t and P.Value, and the filter dialog of the abs(logFC) column set to Greater Than or Equal.]
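The filtering steps described in this excerpt can also be scripted. The following is a minimal Python sketch, not the procedure used to produce the exercise files: it assumes a tab-separated DE table whose identifier and fold-change columns are named ID and logFC (both names are assumptions), and the demo rows are invented for illustration. Because logFC is on a log2 scale, a cutoff of 2 corresponds to 2^2 = 4-fold DE.

```python
import csv
import io


def select_de_ids(table_text, cutoff=2.0, id_col="ID", fc_col="logFC"):
    """Return the IDs of rows whose absolute log2 fold change is >= cutoff."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    selected = []
    for row in reader:
        # The csv module keeps every field as text, so a gene symbol such as
        # March2 is never reinterpreted as a date.
        if abs(float(row[fc_col])) >= cutoff:
            selected.append(row[id_col])
    return selected


# Hypothetical demo rows, not taken from the RobiNA table.
DEMO_TABLE = (
    "ID\tSymbol\tlogFC\n"
    "probe_a\tMarch2\t-2.5\n"
    "probe_b\tGeneB\t0.4\n"
    "probe_c\tGeneC\t3.1\n"
)

if __name__ == "__main__":
    print(select_de_ids(DEMO_TABLE))
```

Taking the absolute value keeps both up- and down-regulated probes in a single list; splitting the result by the sign of logFC would give separate lists if a tool expects them.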
