Analysis of public microarray datasets
         Contents
1. [Screenshot: TranscriptomeBrowser heatmap view with the annotation panel (annotation sources include KEGG_PATHWAY, KEGG_REACTION, PANTHER_TERM_BP, PANTHER_TERM_MF, WIKIPATHWAY). Enriched keywords listed with their q-values: mitochondrial part, mitochondrion, mitochondrial membrane, mitochondrial envelope, mitochondrial inner membrane, organelle inner membrane, organelle envelope, envelope, cytoplasmic part, organelle membrane. Legible gene symbols include Tomm5, Ywhah, Pla2g5, Trappc4, Atic, March2. Buttons: Search, Sort, Save annotation, Export heatmap data.]

fourth heatmap

[Screenshot: the same heatmap view with the annotation panel and keyword q-value list]
2. [Figure: box plots of the value distribution for each sample]

This plot represents the distribution of the data in all samples. Since the data is supposed to be normalized, you expect comparable boxes for all samples. When the box plots show large divergence, this may indicate that the data in the Series Matrix file was not yet normalized. Unfortunately, you cannot perform normalization in GEO2R. If the boxes are very different, it is not possible to compare the samples.

Search for the top 250 differentially expressed transcripts

Since the boxplots show that the data has been normalized, we can now proceed with finding DE genes between the two groups (the top 250 being a good proxy for downstream analysis).

Options can be set in the Options tab to handle log transformation and multiple testing correction to be applied to the data. The default Options are shown below and are the best choice for most data sets.

[Screenshot: GEO2R tabs (GEO2R, Value distribution, Options, Profile graph, R script) with the Options tab open]

- Apply adjustment to the P-values: Benjamini & Hochberg (False discovery rate), Benjamini & Yekutieli, Bonferroni, Hochberg, Holm, Hommel, None
- Apply log transformation to the data: Auto-detect, Yes, No
- Category of Platform annotation to display: Submitter supplied, NCBI generated

If you edit Options after performing an analysis, you mu
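The box plot check described in this excerpt can also be done numerically by comparing per-sample quantiles. The following is a minimal sketch on simulated values, not on the GSE6943 data; the matrix, the sample names and the artificial distortion of one sample are assumptions made for illustration only.

```r
# Simulated log2 expression values: 500 probes x 4 samples (hypothetical data)
set.seed(7)
expr <- matrix(rnorm(500 * 4, mean = 8, sd = 1.5), ncol = 4,
               dimnames = list(NULL, paste0("sample", 1:4)))

# Distort one sample to mimic data that was not normalized with the others
expr[, 4] <- expr[, 4] * 1.5 + 2

# Five-number summary per sample: the values a box plot is built from
five_num <- apply(expr, 2, quantile, probs = c(0, 0.25, 0.5, 0.75, 1))
medians <- five_num["50%", ]

# Spread of the medians: near 0 for comparable samples, large otherwise
spread <- max(medians) - min(medians)
print(round(five_num, 2))

# Draw the box plots only in an interactive session
if (interactive()) boxplot(expr, las = 2, main = "Value distribution")
```

Comparable medians and interquartile ranges across samples correspond to the aligned boxes expected for normalized data; a sample such as the distorted one above stands out immediately.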
3. [Figure: PCA plot showing the distribution of the samples across principal components (x-axis: PC1)]

[Figure: HCLUST plot of GSM160092.CEL, GSM160093.CEL, GSM160094.CEL, GSM160091.CEL, GSM160089.CEL, GSM160090.CEL, GSM160095.CEL, GSM160096.CEL, GSM160100.CEL, GSM160098.CEL, GSM160099.CEL; distance: as.dist(1 - cor(exprs(eset), method = "pearson")); hclust method "complete"]

Define design

In this part, we define contrasts and start the differential analysis. Please note that complex designs are possible here by defining 'metagroups' next to the sample groups (e.g. mouse background, stimulant). This is not the aim of this very simple design comparing two tissues. Please refer to the software documentation for more detailed examples.

[Screenshot: RobiNA group definition dialog listing the CEL files found under /Users/splaisan/Projects/BITS_TUTORIALS/BITS_tutorials_work/Analysis_of_public_microarray_datasets/. Legible dialog text: "In this step you can arrange your data files in groups, e.g. representing ... choose which groups are to be com"]
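The HCLUST plot legend in this excerpt names the distance and linkage used: 1 minus the Pearson correlation between samples, followed by complete-linkage clustering. A minimal base R sketch of that combination, together with a PCA of the samples, is given below. It runs on a simulated matrix (an assumption; with real data the matrix would come from exprs(eset)), and the group names and effect sizes are invented.

```r
set.seed(1)
n_probes <- 200

# Hypothetical log2 expression: a per-probe baseline shared by all samples
baseline <- rnorm(n_probes, mean = 8, sd = 2)
expr <- baseline + matrix(rnorm(n_probes * 6, sd = 0.3), nrow = n_probes)
colnames(expr) <- c(paste0("heart", 1:3), paste0("diaphragm", 1:3))

# Shift the first 50 probes in the second group to create a tissue difference
expr[1:50, 4:6] <- expr[1:50, 4:6] + 2

# Sample-to-sample distance: 1 - Pearson correlation, then complete linkage
d <- as.dist(1 - cor(expr, method = "pearson"))
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 2)

# PCA on samples (samples as rows); PC1 should carry the tissue difference
pca <- prcomp(t(expr))
pc1 <- pca$x[, 1]

print(groups)
if (interactive()) {
  plot(hc)
  plot(pca$x[, 1:2], pch = 19)
}
```

In a well-behaved dataset, samples from the same group fall into the same cluster and on the same side of PC1, which is what the RobiNA QC plots let you check visually.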
4. ... found that some computers / operating systems are refractory to RobiNA ... even with 6GB RAM may have issues running RobiNA with as few as 10 CEL files. RobiNA requires a recent version of the Java JDK; you can obtain the JDK from [1] (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). The RobiNA developers do not actively support the software right now, so you will need to try things by yourself if you have issues.

The CDF annotation file

RobiNA needs a CDF file to work. CDF files are complex text tables that allow the identification of the probes and genes associated with the chip spots; such files are sometimes difficult to find. The RobiNA software was developed by a plant group, has plants as its main focus and does not include non-plant annotations. For those who want to analyze other living organisms, the microarray annotation file corresponding to the chip used should be obtained before starting RobiNA. The company website should be the place to find Affymetrix CDF files, but it is often difficult to locate the CDF among the long lists of available Affymetrix resources (http://www.affymetrix.com/estore/, free registration required).

The easiest way to obtain the correct Affymetrix CDF file is probably by using the Affymetrix Expression Console.

The Affymetrix Expression Console software is only available for Windows and downloading using it requires a free user regist
5. [Screenshot: lower part of the heatmap annotation panel (keywords include cytoplasmic part and organelle membrane; legible gene symbols include Alpk2, Atic, March2) with the Save annotation and Export heatmap data buttons]

Other buttons and tabs allow inspecting details of any particular view. Genes can be sorted alphabetically or by clustering order, and a given gene can be found using the search box.

Conclusion

And there is even more for Geeks

The data mining package 'RTools4TB' can perform calls to the TranscriptomeBrowser web service and implements the DBF-MCL algorithm. The R package RTools4TB (http://www.bioconductor.org/packages/2.5/bioc/html/RTools4TB.html) is now part of the Bioconductor project.

References

1. http://tagc.univ-mrs.fr/tbrowser/
   - Cyrille Lepoivre, Aurélie Bergon, Fabrice Lopez, Narayanan B Perumal, Catherine Nguyen, Jean Imbert, Denis Puthier. TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks. BMC Bioinformatics, 2012, 13:19. PubMed 22292669.
   - Fabrice Lopez, Julien Textoris, Aurélie Bergon, Gilles Didier, Elisabeth Remy, Samuel Granjeaud, Jean Imbert, Catherine Nguyen, Denis Puthier. TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE, 2008, 3(12):e4001. PubMed 19104654.
2. http://w
6. [Screenshot: Transcriptome Analysis Console results table (Gene Rows: 2002, Selected Rows: 102) listing probe set IDs, UniGene IDs, expression values, fold changes, p-values and gene descriptions; legible gene symbols include Camk2b and Mybpc2]

The count of UR (up-regulated) and DR (down-regulated) genes is reported in the summary page.

[Screenshot: Affymetrix Transcriptome Analysis Console running under Windows 8.1, file GSE6943_CAT_RMA.tac, RAE230A Analysis Result. Tabs: Summary, Table, Scatter Plot, Volcano Plot, Chromosome Summary, Hierarchical Clustering]

Heart vs. Diaphragm
Analysis Type: Gene Level Differential Expression Analysis
Array Type: RAE230A
Genome Version: rn5
Annotation File: RAE230A.na34.annot.csv
1. Total number 
7. ... 26 August 2014, at 14:09.
Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

PubMA Exercise 6

From BioWareWIKI

Full analysis workflow using CLC Main Workbench

[Figure: workflow diagram: build experiment, compute groupwise DE, limits / filtering, functional enrichment]

The following content was directly taken from the current CLC documentation (Feb 17, 2014) and applies only to people with access to the CLC Main Workbench.

Contents

1 CLC Tutorial material
  1.1 Loading microarray data into the Workbench
  1.2 Loading your own Affymetrix microarray data into the Workbench
  1.3 Main results and specific settings
  1.4 Enrichment analysis in CLC Main
    1.4.1 hypergeometric tTest
    1.4.2 GSEA
2 Published results
3 download exercise files

CLC Tutorial material

Users from VIB labs or people having a license for the CLC Main Workbench can proceed with this exercise during the afternoon open session or later from home. The final CLC project folder can be downloaded from the BITS server (Heart_vs_Diaphragm.zip) and imported in the user's CLC project manager. The workflow is not further discussed here and we invit
8. [Screenshot: GEO2R top-250 results table (columns: ID, adj.P.Val, P.Value, t, B, logFC, Gene.symbol, Gene.title). Visible rows include 1376227_at (Myoz1, myozenin 1), 1387065_at (Plcd4, phospholipase C), 1384202_at (Tesc, tescalcin), 1386931_at (Tnni3, troponin I type 3), 1384178_at (leucine rich repeat), H19 (imprinted), 1367604_at (cysteine rich protein 2), 1367896_at (carbonic anhydrase), Ehd4 (EH domain containing), 1389532_at (Nebl, nebulette). Tabs: Options, Profile graph, R script. Buttons: Save all results, Select columns]

Log transformation has been applied to the data. You can change this in the Options tab.

The limma analysis results in a list of the 250 transcripts with the lowest p-values, ranked by increasing p-value.

The results table contains the following columns:

- adj.P.Val: p-value after correction for multiple testing.
  This column is the statistic you should use for interpreting the results. Genes with the smallest adjusted p-values will be the most reliable. Selecting all transcripts with adjusted p-values < 0.05 is equivalent to setting the False Discovery Rate (FDR) to 0.05, accepting that 5% of the selected DE genes are false positives.

As you can see GEO2R alway
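To make the relation between P.Value and adj.P.Val in this excerpt concrete, here is a small worked sketch using base R's p.adjust. The five raw p-values and the probe names are invented for illustration; they are not taken from the GSE6943 results.

```r
# Hypothetical raw p-values for five probes (made-up numbers)
p_raw <- c(probe1 = 0.001, probe2 = 0.012, probe3 = 0.030,
           probe4 = 0.040, probe5 = 0.600)

# Benjamini & Hochberg adjustment (controls the false discovery rate)
p_bh <- p.adjust(p_raw, method = "BH")

# Bonferroni adjustment for comparison (more conservative)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(cbind(raw = p_raw, BH = p_bh, Bonferroni = p_bonf))

# Probes kept at FDR 0.05, and the share of them expected to be false positives
selected <- names(p_bh)[p_bh < 0.05]
expected_false <- 0.05 * length(selected)
```

With these numbers the BH-adjusted values are 0.005, 0.03, 0.05, 0.05 and 0.6, so only the first two probes pass a strict "< 0.05" cut-off, whereas Bonferroni would keep only the first.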
9. Analysis of public microarray datasets

Learn to

[Rotated cover text, largely illegible in this extraction: a Creative Commons Attribution ShareAlike license notice and the banner "BIOINFORMATICS TRAINING & SERVICE FACILITY"]

Hands-on Analysis of public microarray datasets

From BioWareWIKI

Date: October 17, 2014, from 9h30 to 17h00

Hands-On Series

Analysis of public microarray datasets

Contents

1 Introduction
  1.1 Summary
  1.2 Required skills
  1.3 Morning Session
  1.4 Afternoon Session
  1.5 More Info
2 Exercises
3 Additional resources
  3.1 Additional tutorials
  3.2 Web services and resources
  3.3 Meta Analysis Resources
  3.4 Commercial resources licensed by VIB
  3.5 Do you still need MORE?
10. [Screenshot: RobiNA list of generated QC plots, Step 3 of 4, with Previous and Next buttons. Paths are abbreviated by the interface as /home/splaisan/Desktop/Robi...ults/GSE6943_CEL/]

- Scatter plot of file GSM160096.CEL vs. GSM160098.CEL
- Scatter plot of file GSM160096.CEL vs. GSM160099.CEL
- Scatter plot of file GSM160096.CEL vs. GSM160100.CEL
- Scatter plot of file GSM160098.CEL vs. GSM160099.CEL
- Scatter plot of file GSM160098.CEL vs. GSM160100.CEL
- Scatter plot of file GSM160099.CEL vs. GSM160100.CEL
- PCA Plot of 11 Affymetrix data files
- HCLUST Plot of 11 Affymetrix data files

Evaluating QC results

One plot of each kind is reproduced below. RobiNA generates classical QC plots showing how 'good' the data is and how well it divides according to the defined groups.

Details for each QC Plot type

Boxplot of 11 input files

[Figure: box plot starting with GSM160089.CEL]
11. third party tools

- PubMA Exercise 1: Search GEO to find public datasets related to one's project
- PubMA Exercise 2: Compute differential analysis using GEO2R within the NCBI web portal (and follow up in RStudio)
- PubMA Exercise 2b: Optional follow-up analysis demo in RStudio
- PubMA Exercise 3: Clustering using the GEO DataSet browser (only for data with attached GDS ID)
- PubMA Exercise 4: Full RobiNA analysis as a standalone desktop alternative to GEO2R and Bioconductor
- PubMA Exercise 5: Web tools for functional enrichment of the obtained lists to identify key biological functions
- Analyze_GEO_data_with_the_Affymetrix_software: Optional exercise using the Affymetrix Transcriptome Analysis Console (Windows only, free program)
- Normalize CEL files with RMAExpress (Windows and MacOSX, free program)
- PubMA Exercise 6: Optional exercise using the fully integrated CLC Main Workbench (for VIB users and CLC license owners)
- PubMA_Exercise 7: Optional IPA analysis of the GEO2R DE table (for VIB users and IPA license owners)

Additional resources

Additional tutorials

- Find Transcriptome Signatures with TranscriptomeBrowser: alternative option to get enrichment from the GEO data (and much more)
- Analyze your own microarray data in R/Bioconductor: see how to analyze your own microarray data in R/Bioconductor

Web services and resources

Only a few of the following resources will be used during this training.

- GEO2R (www.ncbi.nlm.nih.gov/geo/geo2r/)
12. [Figure: scatter plot of Heart vs. Diaphragm expression values]

Volcano plots are very popular and show how confident the data is and how many genes show deviation from the steady state.

[Figure: volcano plot; x-axis: fold change from -512 to 512; y-axis: significance]

The interactive nature of the plot allows identifying outliers or significantly DE genes using the mouse.

[Screenshot: Affymetrix Transcriptome Analysis Console running under Windows 8.1, file Analysis_1.tac, RAE230A Analysis Result. Tabs: Scatter Plot, Volcano Plot, Chromosome Summary, Hierarchical Clustering. Comparison: Heart vs. Diaphragm. Controls: Search, Prev, Next, Show/Hide Columns, Export, Up in Heart vs Diaphragm, Down in Heart vs Diaphragm, Show Filtered Only, Clear Current Filter(s), Reset to Default, Customize An]
13. [Screenshot: Transcriptome Analysis Console results table for Heart vs. Diaphragm, listing probe set IDs (for example 1367962_at, 1370214_at, 1372195_at), UniGene IDs, fold changes and p-values; legible gene symbols include Actn3 (actinin), Pvalb and Tnnc2. An export dialog offers "All information" and "Gene Symbols to Export: 1st gene symbol only / all gene symbols"]
14. [Saved GEO2R table, continued. Columns: ID | adj.P.Val | P.Value | t | B | logFC | Gene.symbol | Gene.title. Signs and some digits were lost in extraction and are restored here from context.]

...e-14 | -4.56e+01 | 23.20335 | -8.25 | Tnni1 | troponin I type 1 (skeletal, slow)
1373697_at | 7.19e-11 | 2.71e-14 | -4.36e+01 | 22.71378 | -7.13 | Mybpc2 | myosin binding protein C, fast type
1398306_at | 9.89e-11 | 4.35e-14 | -4.18e+01 | 22.33291 | -7.06 | Ampd1 | adenosine monophosphate deaminase 1
1374672_at | 1.14e-10 | 5.70e-14 | 4.08e+01 | 22.11085 | 8.82 | Tnni3k | TNNI3 interacting kinase
1367962_at | 1.44e-10 | 8.13e-14 | -3.96e+01 | 21.81731 | -1.15e+01 | Actn3 | actinin alpha 3
1367964_at | 1.44e-10 | 9.53e-14 | -3.91e+01 | 21.6842 | -8.57 | Tnni2 | troponin I type 2 (skeletal, fast)
1376227_at | 1.44e-10 | 1.0[?]e-13 | -3.87e+01 | 21.59864 | -5.48 | Myoz1 | myozenin 1
1387065_at | 1.44e-10 | 1.09e-13 | -3.86e+01 | 21.57382 | -5.85 | Plcd4 | phospholipase C, delta 4
1384202_at | 2.03e-10 | 1.67e-13 | 3.72e+01 | 21.20727 | 6.68 | Tesc | tescalcin
1386931_at | 2.03e-10 | 1.78e-13 | 3.70e+01 | 21.15193 | 6.71 | Tnni3 | troponin I type 3 (cardiac)
1384178_at | 2.56e-10 | 2.42e-13 | 3.61e+01 | 20.88785 | 4.38 | Lrrc4b | leucine rich repeat containing 4B
1371298_at | 2.56e-10 | 2.57e-13 | -3.59e+01 | 20.83498 | -5.34 | H19 | H19, imprinted maternally expressed transcript (non-protein coding)
1367604_at | 3.12e-10 | 3.33e-13 | 3.51e+01 | 20.6094 | 3.56 | Crip2 | cysteine rich protein 2
1367896_at | 3.70e-10 | 4.18e-13 | -3.44e+01 | 20.40929 | -8.49 | Car3 | carbonic anhydrase 3
1375739_at |
15. [Saved GEO2R table, continued. Columns: ID | adj.P.Val | P.Value | t | B | logFC | Gene.symbol | Gene.title]

... | 3.71e-10 | 4.43e-13 | 3.42e+01 | 20.35831 | 3.46 | Ehd4 | EH domain containing 4
1389532_at | 4.67e-10 | 5.98e-13 | 3.34e+01 | 20.09179 | 5.07 | Nebl | nebulette
1370198 [suffix illegible] | 4.67e-10 | 6.44e-13 | -3.31e+01 | 20.02504 | -5.15 | Trdn | triadin
1370157_at | 4.67e-10 | 6.45e-13 | 3.31e+01 | 20.02293 | 6.31 | Pln | phospholamban
1386873_at | 5.30e-10 | 7.66e-13 | -3.26e+01 | 19.86935 | -7.37 | Tnni1 | troponin I type 1 (skeletal, slow)

- If you wish to upload this table to Ingenuity Pathway Analysis (IPA), consider opening it first in Microsoft Excel and saving it back as a '.xls' file. This removes the double quotes around fields and allows better recognition of your data by IPA.

Saving the R script for further use in RStudio

This is the last step of this tutorial and the first step of the follow-up page PubMA Exercise 2b, where we will produce an R script to perform the GEO2R analysis on our own computer and prepare for more advanced microarray analyses.

[Screenshot: GEO2R tabs (GEO2R, Value distribution, Options, Profile graph, R script) with the R script tab open]

    # Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
    # R scripts generated Tue Aug 12 05:30:54 EDT 2014

    ################################################################
    # Differential expression analysis with limma
    library(Biobase)
    library(GEOquery)
    library(limma)

    # load series and platform data from GEO
    gset <- getGEO("GSE6943", GSEMatrix = TRUE)
    if (length(gset) > 1) idx <-
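Regarding the IPA upload tip in this excerpt (removing the double quotes around fields), the same clean-up can be done in R instead of Excel. The following is a minimal sketch, assuming a tab-delimited export with quoted fields; the file contents, probe names and values below are invented for illustration and are not the real GEO2R output.

```r
# Write a small example file with quoted, tab-separated fields (hypothetical data)
quoted <- tempfile(fileext = ".txt")
writeLines(c('"ID"\t"adj.P.Val"\t"logFC"\t"Gene.symbol"',
             '"probe_1"\t"1.0e-05"\t"-2.5"\t"GeneA"',
             '"probe_2"\t"3.0e-02"\t"1.2"\t"GeneB"'),
           quoted)

# read.delim strips the quotes while parsing the fields
tab <- read.delim(quoted, stringsAsFactors = FALSE)

# Write the table back without quotes and without row names
clean <- tempfile(fileext = ".txt")
write.table(tab, clean, sep = "\t", quote = FALSE, row.names = FALSE)

clean_lines <- readLines(clean)
print(clean_lines)
```

The resulting file is plain tab-delimited text without quote characters, which serves the same purpose as the Excel round trip described above.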
16. [Screenshot: IPA dataset upload preview of the GEO2R DE table, showing probe set IDs with p-values and log ratios (for example 1373697_at, 1398306_at, 1374672_at, 1367962_at)]

Annotated Dataset: geo2r_DE_table
Preview Dataset: geo2r_DE_table
Mapped IDs: 13169
Unmapped IDs: 2754
All IDs: 15923

[Screenshot: annotated dataset rows, mostly RIKEN cDNA entries, with their cellular location column]

Start core analysis and set filter

Set Cutoffs

Expression Value Type | Cutoff | Range | Focus On
False Discovery Rate (q-value) | 0.01 | 0.0 to 1.0 |
Log Ratio | 2 | -11.5 to 11.0 | Both Up/Downregulated

368 analysis-ready molecules across observations

When IDs map to the same gene, protein, or other molecule:
Apply cutoffs before consolidating IDs: Yes (recommended)

ADVANCED

Set Cutoffs
Resolve duplicates using Exp Value Lo
17. [Screenshot: Transcriptome Analysis Console results table with probe set IDs, UniGene IDs, group means, fold changes, p-values, gene symbols, descriptions and chromosome positions. Legible entries include 1370157_at (Pln, phospholamban, chr20), 1389727_at (Lrrc10, leucine rich repeat containing protein 10, chr7), 1373987_at (Kcnip2, Kv channel interacting protein 2, chr1), 1384202_at (Tesc, tescalcin, chr12), 1371951_at (Fhl2, four and a half LIM domains 2, chr9), 1374672_at (Tnni3k, TNNI3 interacting kinase, chr2), Spink8 (serine peptidase inhibitor, Kazal type 8, chr8), 1367564_at (Nppa, natriuretic peptide, chr5), 1388506_at (Dsp, desmoplakin, chr17)]
18. Hands-on Analysis of public microarray datasets

Search public GEO datasets, identify specific transcriptome signatures, and perform functional enrichment on the found sets.

Contents

1 Introduction
2 Walk through example
  2.1 Load the GSE6943 dataset
  2.2 Find Transcriptome Signatures
  2.3 Show HeatMaps for each TS
3 Conclusion
  3.1 And there is even more for Geeks

Introduction

TranscriptomeBrowser (TBrowser) [1] hosts a large database of transcriptional signatures (TS) extracted from the GEO public microarray repository [2]. TS have been produced using a new algorithm called "Density Based Filtering And Markov Clustering" (DBF-MCL). The current database contains about 30,000 TS derived from 4,000 microarray datasets (222 million expression values). Each TS was tested for functional enrichment using annotation obtained from numerous ontologies or annotation databases (Gene Ontology, KEGG, BioCarta, Swiss-Prot, BBID, SMART, NIH Genetic Association DB, COG/KOG, TargetScan, PicTar, TFBS (conserved), MSigDB, GeneSigDB). TBrowser comes with a sophisticated search engine, so that users can perform combined queries using boolean operators.

VERSION 3.0: TranscriptomeBrowser hosts a large database of transcriptional signatures (TS, n=40,000) extracted from Gene Expression Omnibus (4,000 experiments) using the DBF-MCL algorithm. TBrowser comes with a sophisticated search engine so that users can search for the biological contexts 
19. Introduction

Summary

This basic training will give you an overview of what GEO has to offer. Several experiments will be analyzed using simple tools to obtain differential gene lists. An introduction to downstream tools dedicated to functional enrichment will close the session.

Required skills

This training is meant for biologists with little or no data of their own who need to identify genes of interest associated with a given biological problem. The participants do not need any prior knowledge of programming.

Morning Session

- Find relevant data on GEO
- Analyze using the NCBI GEO2R utility
- Continue the GEO2R analysis in RStudio (intro)
- Find clusters using the NCBI GEO DataSets browser
- Analyze the same data using RobiNA
- Perform functional enrichment on the DE gene lists using public tools

Afternoon Session

- Users search GEO datasets and analyze them with tools discovered during the morning session
- Users with access to CLC Main can follow the CLC tutorial (VIB only)
- Users with access to IPA can follow the IPA tutorial (VIB only)

More Info

- More on the VIB website [2]
- Related VIB training sessions [3]
- Related BITS Website pages [4]

Exercises

[Figure: workflow diagram: Gene Expression Omnibus, find datasets, GEO2R analysis, DataSet browser, follow-up, functional enrichment]

You will find in this section exercises performed during the ... session 
20. ... 'Run Analysis' is clicked to compute differential expression between the two groups.

[Screenshot: Affymetrix Transcriptome Analysis Console 2.0, New Analysis tab, Gene Level Differential Expression Analysis. Controls: Import Data, Remove Selected, Parse File Names, Show Grouped Files, Click to Create New Condition, Browse, Run Analysis. Analysis File: \\psf\Home\Documents\TAC\AnalysisResults\Analysis_2.tac]

Heart: GSM160089.rma, GSM160090.rma, GSM160091.rma, GSM160092.rma, GSM160093.rma, GSM160094.rma
Diaphragm: GSM160095.rma, GSM160096.rma, GSM160098.rma, GSM160099.rma, GSM160100.rma

Other expression analyses can be performed when the probe type is compatible with transcript-level analysis (discerning between alternative transcripts). However, this is not demonstrated here and we only provide the example of gene-level analysis.

The summary of a standard DE analysis is shown with counts for UR (up-regulated) and DR (down-regulated) genes under standard filtering values (more than two-fold difference between the groups and adjusted p-value < 0.05).

[Screenshot: Transcriptome Analysis Console, file GSE6943_CAT_RMA.tac, RAE230A Analysis Result, Summary tab, with the counts "Up in Heart vs Diaphragm" and "Down in Heart vs Diaphragm"]

Heart vs. Diaphragm
Analysis Type: Gene Level Differential Expression Analysis
Array Ty
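The standard filter mentioned in this excerpt (more than two-fold difference and adjusted p-value < 0.05) is easy to reproduce on any DE table. The sketch below is an illustration on an invented six-row table, assuming the fold change is stored as a log2 value in a column named logFC, as in the limma output; the Affymetrix software reports linear fold changes, so the cut-off there is 2 rather than 1.

```r
# Hypothetical DE table (made-up values, limma-style column names)
de <- data.frame(
  probe     = paste0("probe_", 1:6),
  logFC     = c(3.2, -2.1, 0.4, 1.5, -0.8, -4.0),
  adj.P.Val = c(1e-6, 0.002, 0.01, 0.20, 0.03, 1e-4)
)

fc_cutoff <- 1     # log2(2): more than two-fold difference
p_cutoff  <- 0.05  # adjusted p-value threshold

# A gene passes only if it satisfies both criteria
significant <- abs(de$logFC) > fc_cutoff & de$adj.P.Val < p_cutoff

n_up   <- sum(significant & de$logFC > 0)  # UR genes
n_down <- sum(significant & de$logFC < 0)  # DR genes

cat("UR:", n_up, " DR:", n_down, "\n")
```

Here probe_1 is counted as up-regulated, probe_2 and probe_6 as down-regulated, and the remaining three rows fail one of the two criteria.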
21. [Screenshot: Enrichr results for the list GSE6943_DE_Robina (238 genes), KEGG view (Table, Grid, Network). "Click the bars to sort. Now sorted by combined score." Legible terms include COMPLEMENT AND COAGULATION CASCADES, GLYCOLYSIS AND GLUCONEOGENESIS, FRUCTOSE AND MANNOSE METABOLISM, LEUKOCYTE TRANSENDOTHELIAL MIGRATION]

[Screenshot: Enrichr results for GSE6943_DE_Robina (238 genes), GO Biological Process view, sorted by combined score. Legible terms include GO:0055008, GO:0007517, GO:0008016 (contraction), GO:0006816, GO:0006942 (striated muscle contraction), GO:0015674 (di-, tri-valent inorganic cation transport), GO:0030048 (actin filament based movement)]

[Screenshot: Enrichr results for GSE6943_DE_Robina (238 genes), Disease/Drugs categories (Up-regulated CMAP, Down-regulated CMAP, GeneSigDB, OMIM Disease), sorted by combined score. Legible terms include alzheimer disease]
22. To use this cdf file, execute the following R commands:

    biocLite("ath1121501cdf")
    # load Bioconductor libraries
    library(ath1121501cdf)
    library(affy)
    # specify path on your computer where the directory that contains the CEL files is located
    celpath <- "C:/Users/Janick/My Documents/R/win-library/2.14/affydata/celfiles"
    # import CEL files containing raw probe-level data
    data <- ReadAffy(celfile.path = celpath)

BrainArray provides a list of custom annotation packages (http://brainarray.mbni.med.umich.edu/bioc/bin/windows/contrib/3.0/). To use these cdf files, download the zip file from the website and install it from the local zip file.

[Screenshot: R GUI (64-bit, version 3.1.1), Packages menu: Load package, Set CRAN mirror, Select repositories, Install package(s), Update packages, Install package(s) from local zip files]

Then execute the following code:

    # load Bioconductor libraries
    library(affy)
    # specify path on your computer where the directory that contains the CEL files is located
    celpath <- "D:/R-2.15.2/library/affydata/celfiles/Apum"
    # import CEL files containing raw probe-level data
    data <- ReadAffy(celfile.path = celpath)
    # indicate you want to use the custom cdf
    # If you don't specify the cdfname, Bioconductor will use the default Affymetrix cdf
    data@cdfName <- "ATH1121501_At_TAIRT"

You can f
23.   Claudia Mimoso, Ding-Dar Lee, Jiri Zavadil, Marjana Tomic-Canic, Miroslav Blumenberg. Analysis and meta-analysis of transcriptional profiling in human epidermis. Methods Mol. Biol. 2014; 1195: 61-97. PubMed 24297317.

Małgorzata Janas-Kozik, Urszula Mazurek, Irena Krupka-Matuszczyk, Małgorzata Stachowicz, Joanna Głogowska-Ligus, Tadeusz Wilczok. The transcript expression profile of the leptin receptor-coding gene assayed with the oligonucleotide microarray technique: could this be an anorexia nervosa marker? Cell. Mol. Biol. Lett. 2006; 11(1): 62-9. PubMed 16847749.

Yoseph Barash, Elinor Dehan, Meir Krupsky, Wilbur Franklin, Marc Geraci, Nir Friedman, Naftali Kaminski. Comparative analysis of algorithms for signal quantitation from oligonucleotide microarrays. Bioinformatics 2004; 20(6): 839-46. PubMed 14751998.

2. http://rmaexpress.bmbolstad.com/

Category: Howto

This page was last modified on 20 October 2014, at 10:08.

Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

Find Transcriptome Signatures with TranscriptomeBrowser

From BioWareWIKI
24.   las=2, col=fl)
    legend("topleft", labels, fill=palette(), bty="n")

Improved code

The next script was adapted to keep local files and to save results to a file instead of sending them to a new browser window. Original lines that were changed are commented out with '#' and followed by their replacement:

  - destdir=base saves the downloads to the local folder instead of to the temp directory
  - number=nrow(fit2) generates the full table instead of the top 250

We now show the edited script; each commented-out code line is replaced by the line that follows it.

    # added configuration
    base <- paste("[...]/PUBMA2014/ex2b-files", sep="")
    setwd(base)

    # Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
    # R scripts generated Tue Aug 12 05:30:54 EDT 2014

    ################################################################
    # Differential expression analysis with limma
    library(Biobase)
    library(GEOquery)
    library(limma)

    # load series and platform data from GEO
    # gset <- getGEO("GSE6943", GSEMatrix=TRUE)
    gset <- getGEO("GSE6943", GSEMatrix=TRUE, destdir=base)
    if (length(gset) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
    gset <- gset[[idx]]

    # make proper column names to match toptable
    fvarLabels(gset) <- make.names(fvarLabels(gset))

    # group names for all samples
    sml <- c("G0","G0","G0","G0","G0","G0","G1","G1","G1","G1","G1","G1")

    # log2 transform
    ex <- exprs(gset)
    qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
25.   LogC <- (qx[5] > 100) ||
              (qx[6]-qx[1] > 50 && qx[2] > 0) ||
              (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
    if (LogC) { ex[which(ex <= 0)] <- NaN
      exprs(gset) <- log2(ex) }

    # set up the data and proceed with analysis
    fl <- as.factor(sml)
    gset$description <- fl
    design <- model.matrix(~ description + 0, gset)
    colnames(design) <- levels(fl)
    fit <- lmFit(gset, design)
    cont.matrix <- makeContrasts(G1-G0, levels=design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    fit2 <- eBayes(fit2, 0.01)
    # tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)
    tT <- topTable(fit2, adjust="fdr", sort.by="B", number=nrow(fit2))

    # load NCBI platform annotation
    gpl <- annotation(gset)
    # platf <- getGEO(gpl, AnnotGPL=TRUE)
    platf <- getGEO(gpl, AnnotGPL=TRUE, destdir=base)
    ncbifd <- data.frame(attr(dataTable(platf), "table"))

    # replace original platform annotation
    tT <- tT[setdiff(colnames(tT), setdiff(fvarLabels(gset), "ID"))]
    tT <- merge(tT, ncbifd, by="ID")
    tT <- tT[order(tT$P.Value), ]  # restore correct order

    # tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
    tT.final <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
    # write.table(tT.final, file=stdout(), row.names=F, sep="\t")
    write.table(tT.fi
26.  2F   2Fanalysis ingenuity com   2Fpa4c2Fj spring cas security check amp originalUrl https 403A   2F   2Fanalysis ingenuity  com   2Fpa  3Futm_source  3DIngenuity   26utm medium   3     Please keep in mind that IPA is only meant for human mouse rat data     IPA Tutorial material    The starting material for the IPA upload        be downloaded from this link  http   data bits vib be pub trainingen gPUBMA2014 ex7 files geo2r DE table xls   bottom of this page     IPA analysis    Upload data in IPA    Dataset Upload         2   DE table xls                   1  Select File Format  Flexible Format v  More Info   2  Contains Column Header       Yes O       3  Select Identifier Type  Affymetrix    Specify the identifier type found in the dataset    4  Array platform used for experiments    Rat Expression Set 230 A Y   Select relevant array platform as a reference set for data analysis              5  Use the dropdown menus to specify the columns that contain identifiers and observations  For observations  select the appropriate expression value type     Raw Data  15924    Dataset Summary  13169                                                                       More Info  ID v   Observation 1  gt    Ignore        Ignore        Ignore v   Observation 1       Ignore        Ignore      False Discov    v Log Ratio X   1  D                   logFC    2  1388876 at  7 67E 12 25 288   7 14    3  1374248 at  7 67E 12 7   11 5    4  1374622_at  7 67E 12   4 69    5  1370033 at  3 47   11 
27.  4 65 189 31 1 26E 09 3 38E 07 synapto  1386873 at Rn 4035 1 6 69 176 57 3 75E 10 1 61E 07 Tnni troponir  1369375 a at     Rn 9726 6   3 27 176 35 3 49E 08 0 000003 calpain  1398251 a at      Rn 9743 1   2 63 156 32 6 74E 08 0 000005 calcium  1374659_at Rn 7511 1   1 95 143 61 7 08E 09 0 000001 cAMP r  1373697 at Rn 27586 1   7 00 139 99 2 27E 12 7 20E 09 myosin  1398306_at Rn 9794 1   6 07 129 20 2 81E 11 3 00E 08 adenosi  1370359_at Rn 67070 2   1 97 107 49 0 000086  0 000973 amylase  1398655_at Rn 9493 1   2 62 95 46 2 66E 07  0 000013 myoger  1381575_at Rn 15517 1   5 36 93 45 1 99E 07  0 000011 nebulin  1374049_at Rn 24381 1 i 2 93 93 39 2 36E 07  0 000012 10  10035     smooth                   Gene  Symbol    Transcript Cluster  ID                               Gene level Information only                                                                                                          ea ee 4   Condition Heart  File GSM160093  ID 1388044_at  Pfkfb2   Signal 1 20          Additional columns can be added to the table if the user needs them     Show Hide Columns    evo  ff     Transcript Cluster ID                   Transcript ID Array Design    Heart Bi weight Avg Signal  1092   Diaphragm Bi weight Avg Signal  lo     Heart Standard Deviation             3 B9 Ed  D  D  D             8 pa p    Diaphragm Standard Deviation   Fold Change  linear   Heart vs  Diap     ANOVA p value  Heart vs  Diaphrag     FDR p value  Heart vs  Diaphragm     Gene Symbol    Description   
28.   [Normalized expression matrix, one probe set per row; rows are truncated on the right]

    [...]48475646698994  9.4716451160078  9.59635427766681  9.65872553515732  9.63503499785589  ...
    1367453_at  10.1553644909771  10.2499020063438  10.1861859768652  10.0965629262007  10.159434523851  10.2076070314483  10.2269807954293  10.195253118062  ...
    1367454_at  8.73905780465407  9.08252980064849  9.07161018095027  8.68762900735943  8.81499836330383  8.83806850836661  8.88654295440762  9.05310137910709  ...
    1367455_at  10.4763615418993  10.4471239467834  10.5710504733759  10.3264804870339  10.4745349426514  10.5552841089126  10.5959387062089  10.5139881779964  ...
    1367456_at  10.9984593902943  10.8441477983143  10.673570257498  10.6700249981427  10.6750331606405  10.7577588683534  10.1986166073672  10.3974448991471  ...
    1367457_at  8.78904682538038  8.72229122394877  8.67050724938203  8.9448960477874  8.72229122394877  8.66089835690261  8.76340465790531  8.76513510798621  ...
    1367458_at  7.53098410601309  7.35718220143744  7.71709223686958  7.65558449687549  7.45098048336888  7.52823368962808  7.7461437721208  7.40572408979464  ...
    1367459_at  11.1105760949228  11.2159762048115  10.9101244239086  11.149432063699  10.8953374468102  10.970737308597  11.2355512291063  11.2441035772632  ...
    1367460_at  9.96660388228079  9.84511446842093  9.94477628710098  9.91078389672863  9.84447947912025  9.94182586703467  9.87080073408934  9.70116063385167  ...

Adding annotations to the RobiNA data

Because RobiNA does not support non-plant organisms, it saves the data in a rather anonymous way, with only probe IDs. The code below is borrowed from th
29.   [Column chooser, continued: ... Biological Process; Gene Ontology Cellular Component; Gene Ontology Molecular Function; Pathway; InterPro; Trans Membrane; Annotation Description; Annotation Transcript Cluster; Transcript Assignments; Check/Uncheck All]

A plot of differential expression per chromosome may highlight local regulatory biases (hot-spot loci).

[Figure: Chromosome Summary plot, differential expression along chromosomes 1 to 20 and X.]

Heatmaps can be generated that show genes with a similar pattern of variation across samples.

[Screenshot: TAC window "Analysis_1.tac, RAE230A Analysis Result" with the tabs Summary, Scatter Plot, Volcano Plot, Chromosome Summary and Hierarchical Clustering; comparison: Heart vs. Diaphragm; toolbar with Search, Prev, Next, Show/Hide Columns, Show Filtered Only, Clear Curren]
30.  Chromosome    Genomic Position  Strand    Comment           After download to  txt  files  results can easily be converted and filtered in the Excel spreadsheet editor                1368093 at Rn 54399 1 4 31 12 98 0 23 0 12  407 47 6 94E 14 1 14E 10 Myh6 myosin  heavy chain 6  cardiac muscle  alpha chr15 37492581 37516282  5  1367665 at Rn 3789 1 4 9 13 14 0 29 0 28  301 97 5 01E 12 2 21E 09 Ankrdi ankyrin repeat domain 1 chri 262038143 2620467  78   1367664 at      3789 1 4 52 12 48 0 43 0 19  249 07 3 07E 11 7 60E 09 Ankrd1 ankyrin repeat domain 1 chr1 262038143 2620467  9  1367592 at      9965 1 5 03 12 41 0 65 0 24  166 63 1 89   09 2 07   07 Tnnt2 troponin T type 2  cardiac  chr13 57716336 57729185  10  1388876 at    Rn 1192 1 5 68 12 79 0 24 0 18  138 33 1 52   12 9 54   10 Pin phospholamban chr20 36399680 36400626   11 1387049 at      54399 1 4 63 11 5 0 17 0 35  116 45 1 10   11 4 03   09 Myh6 myosin  heavy chain 6  cardiac muscle  alpha chr15 37492581 37516282  16  1369313 at      3849 1 6 12 73 0 62 0 1  105 96 1 97E 09 2 09   07 Fhi2 four and a half LIM domains 2 chr9 49591626 49620648   17  1367616 at      3835 1 4 10 62 0 14 0 26  98 18 1 44E 12 9 53E 10 Nppb natriuretic peptide B chr5 168454272 1684556  18 1388597 at Rn 28286 1 6 01 12 63 0 21 0 19  98 15 1 26E 12 8 71E 10 Mybpc3 myosin binding protein C  cardiac chr3 86666789 86667480  19 13869314 Rn 64141 1 7 38 13 9 0 63 0 14  92 07 3 64E 09 3 38E 07 Tnni3 troponin   type 3  cardiac  chri 75665118 75668803
31.   [Screenshot: TAC volcano plot and result table for the comparison Heart vs. Diaphragm. Toolbar buttons: Clear Selection, Show Filtered Only, Clear Current Filter(s), Reset to Default, Customize Annotations, View Interaction Network. The plot's p-value axis runs from about 1E-10 to 1E-14. Table columns: ID, Diaphragm Bi-weight Avg Signal (log2), Fold Change (linear) (Heart vs. Diaphragm), ANOVA p-value (Heart vs. Diaphragm), FDR p-value (Heart vs. Diaphragm), Gene Symbol, Description. The rows are not recoverable from the extraction; legible descriptions include actinin, carbonic anhydrase, sarcolipin, myosin, troponin, amylase, calcium channel and ATPase genes.]
32.   [Screenshot: RobiNA settings. Species mapping list (end): Oryza sativa spp. japonica; Solanum lycopersicum (tomato); Saccharum officinarum (sugarcane); Solanum tuberosum (potato); Triticum aestivum; Vitis vinifera; Zea mays (maize). Options: P-value correction; Multiple testing strategy: nestedF; Write out normalized raw data; Preview R script; Log fold change min: 1; p-value cutoff: 0.05. Buttons: Download more mappings, Import new, Skip, Annotate, Create Metagroup, Delete Metagroup, Previous, Next. Status: Finished analysis.]

We choose 'Skip' as we do not have annotation files for rat. The differential expression then gets computed. When done, we select 'Exit' from the front window.

[Screenshot: RobiNA - The transcriptomics data preprocessor, Version 1.2.4_build656, "Design your experiment". On-screen help: You can arrange the groups by dragging them around. Define which groups shall be compared by holding down the CONTROL key and then click-dragging from the first group to the second group.

Final dialog: Finished successfully. Results were written to /Users/splaisan/Desktop/test. Click 'Modify' if you want to modify the design and re-run the analysis. Be sure to specify a different name for the output folde
33.   PUBMA2014

This page was last modified on 16 October 2014, at 09:49.

PubMA Exercise 5

From BioWareWIKI

Functional enrichment of the obtained lists to identify key biological functions

Contents

1 Why we are not yet done
2 Functional enrichment Analysis of the RobiNA DE data
  2.1 Preparing probe lists for enrichment testing
  2.2 DAVID, father of enrichment tools
  2.3 Current web based Enrichment tools
    2.3.1 Enrichr
    2.3.2 Webgestalt
3 Conclusion
4 download exercise files

Why we are not yet done

Once upon a time, scientists worked for a lifetime on one or a few genes; they read everything that was published about their favorite proteins and did not need a computer to help them follow and understand published data.

That time is past. Modern biologists need to cope with a publication rate far higher than their reading speed and, happy or not, they have to rely on computers for some of the tasks they used to do manually.

Analysis of microarray (MA) data, like that of any other high-throughput technology, generates thousands of lines of results, of which hundreds are statistically significant. It is therefore very unlikely that the
34.   The RMA conversion is run with user-selected options, which activates new menu items.

[Screenshot: RMAExpress menu with the entries Read Unprocessed files; Add new CEL files; Write RME format; Compute RMA measure; Write Results to file (log scale); Write Results to file (natural scale); Export expression values; Output Log file.]

RMAExpress log window:

    Welcome to RMAExpress
    Written by B. M. Bolstad <bmb@bmbolstad.com>
    Version: 1.1.0
    http://rmaexpress.bmbolstad.com
    Select CDF file
    CDF file : RAE230A.CDF in /work/TUTORIALS/Analysis of public microarray datasets/ref
    Select CEL files
    CEL files : GSM160089.CEL
    CEL files : GSM160090.CEL
    CEL files : GSM160091.CEL
    CEL files : GSM160092.CEL
    CEL files : GSM160093.CEL
    CEL files : GSM160094.CEL
    CEL files : GSM160095.CEL
    CEL files : GSM160096.CEL
    CEL files : GSM160098.CEL
    CEL files : GSM160099.CEL
    CEL files : GSM160100.CEL
    Reading in data
    Opening CDF and CEL files
    Done Reading in datafiles
    Done writing binary output

[Dialog "Choose Preprocessing Steps": background adjustment Yes / No; normalization Quantile / None; summarization method Median Polish / PLM; optional Store Residuals.]

The log-transformed results are saved to [...] for use with other programs.

[Screenshot: the same RMAExpress menu, now with Compute RMA measure and Write Results to file (log scale) available.]

Plot normalized d
35.  at     Rn 10015 1 6 65 11 58 0 22 0 35  30 56 3 73E 10 5 79E 08 Penk proenkephalin       5 21834402 21839358  51 1370773         Rn 54469 1 4 33 9 12 0 14 0 24  27 53 1 44   11 4 75   09           2  10  100911951      channel interacting protein 2  Kv channel interacting protein 2 lik chr1 270338004 2703602  52  1372539 at      3291 1 4 81 9 55 0 12 0 29  26 69 3 45   11 8 16   09 chr14 17884583 17886375  53  1370229 at      81250 1 6 45 1111 0 27 0 04  25 23 3 04E 11 7 60E 09 Ndrg4 NDRG family member 4       19 9751516 9762475  56   1389532 at      7963 1 5 17 9 62 0 11 0 18  21 89 2 05   12 1 16   09 chri7 85869857 85871678  _57   1371566_at Rn 15764 1 4 04 8 44 0 09 0 23  21 02 1 12E 11 4 03E 09 Fbxl22 F box and leucine rich repeat protein 22 chr8 71869413 71872379  59  1398243      Rn 11345 1 7 91 12 29 1 24 0 1  20 1 0 000044 0 000457 Csrp3 cysteine and glycine rich protein 3  cardiac LIM protein  chri 105206458 1052171  60 1370061 at Rn 3788 1 5 64 9 92 0 13 0 34  19 45 4 51   10 6 50E 08 Rab3b RAB3B  member RAS oncogene family chr5 132356507 1324088       79          1  2  1372195 at      43529 1 13 38 3 97 0 14 0 19 680 92 8 22E 15 1 86E 11 Tnnc2 troponin C type 2  fast  chr3 167429205 1674302       Lio   1370971_at Rn 40497 1 12 48 3 55 0 17 0 13 490 32 7 11   15 1 86   11 Myhi        2  Myh8 myosin  heavy chain 1  skeletal muscle  adult  myosin  heavy chain 2  chr10 53514514 53517766  _ 8  1371247 at Rn 22504 1 13 27 4 42 0 08 0 1 461 19 0 00   00 0 00   00 Tnnt3 
36.   biologist can evaluate each line and identify the proteins (gene products) that are significantly altered in expression and may be responsible for the biology under investigation.

Recognizing familiar genes in the list and selecting them for validation may seem appropriate, but it is unlikely to lead to any discovery. Since publication mainly requires novelty, this method is of little use.

A better way to analyze and prioritize targets from a screen is to consider the biological functions and pathways that are enriched in differentially expressed genes. This can be done after adding ontology annotations to the data and using these added columns to identify 'functions', 'diseases', 'pathways', or any other ontology terms that are enriched in the DE set as compared to the full set of genes measured by the platform. Again, this apparently straightforward statistical testing can be quite lengthy if you consider the hundreds of available ontologies and the hundreds to thousands of genes to annotate.

The hypergeometric test and Gene Set Enrichment Analysis (GSEA) are the two most widely used statistical approaches to identify enrichment based on gene lists. A number of standalone and Web tools implement these methods, and it falls beyond the scope of this training to list them all or to argue for one or another. We instead present a few alternative tools that will accept the data obtained in the former exercises and process it to find enriched ontolog
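
As a minimal sketch of the hypergeometric approach described above, the base R snippet below tests whether a few ontology terms are over-represented in a DE list. The universe size (8603) and DE list size (73) are used only as illustrative figures, the term names and counts are invented, and `enrich_p` is a hypothetical helper rather than a function from any package.

```r
# Over-representation test with the hypergeometric distribution (base R only).
# N: genes in the universe (all annotated genes measured by the platform)
# n: genes in the DE list
# K: genes in the universe annotated with the term
# k: genes in the DE list annotated with the term
enrich_p <- function(k, n, K, N) {
  # P(X >= k) when drawing n genes without replacement from N,
  # of which K carry the term
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

N <- 8603  # illustrative universe size
n <- 73    # illustrative DE list size

# Invented term counts, for illustration only
terms <- data.frame(
  term = c("term_A", "term_B", "term_C"),
  K    = c(27, 300, 64),  # term size in the universe
  k    = c(7, 3, 5),      # term hits in the DE list
  stringsAsFactors = FALSE
)

terms$expected <- n * terms$K / N                      # hits expected by chance
terms$rawP     <- enrich_p(terms$k, n, terms$K, N)     # phyper is vectorized
terms$adjP     <- p.adjust(terms$rawP, method = "BH")  # Benjamini-Hochberg FDR

print(terms[order(terms$rawP), ])
```

The same p-value can be obtained from a one-sided Fisher exact test on the corresponding 2x2 table; the web tools presented in this exercise perform this kind of calculation for many ontologies at once and apply a multiple testing correction.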
37.  bits vib be pub trainingen AffyECTAC2014 GSE6943 EC   mas5 qc PDF   and PLIER method  http   data bits vib be pub trainingen AffyECTAC2014 GSE6943 EC plier qc PDF  on our server  Users are welcome to evaluate each QC plot by  themselves using the data available on the server as input  see link at the bottom of this page     The Affymetrix Transcriptome Analysis Console  TAC     Importing EC data and defining Groups    71    affymetri x Transcriptome Analysis Console 2 0    New Analysis   Open Existing Result   Preferences  4m Gene Level Differential Expression Analysis     Open File       Current Directory hisan Documents  Affy_data GSE6943_Affy analysis  GSE6943_RM  Up One Level         s        analysis     Filename       Files of type   CHP Files      _             Z          Add File s  Here Add File s  Here    Click to Create New Condition             Each condition must have at least one file   Analysis File    psf Home Documents TAC AnalysisResults Analysis_2 tac Run Analysis       Each group is in turn defined by moving CHP files to the appropriate group window  This is done for  Heart  and for  Diaphragm  samples    affymetrix Transcriptome Analysis Console 2 0    New Analysis   Open Existing Result   Preferences    4m Gene Level Differential Expression Analysis    Remove Selected     Parse File Names       Show Grouped Files    Array Type File Type File Path  GSM160095 rma C  Users splaisan Documents Affy_data GSE6943_Affy analysis GSE6943_RMA GSM160095 rma chp  GS
38.   data files you want to include in your analysis by pressing the 'Add' button. The 'Info' button will provide some details for each selected data file.

If you are working with a custom Affy chip platform that is not yet supported by the Bioconductor project, you need to supply the appropriate CDF file upon data import.

Import CDF file: [...]/ref/RAE230A.CDF

Imported files (all in /Volumes/trainingen/PUBMA2014/ex4-files/GSE6943_CEL/):

    GSM160089.CEL
    GSM160090.CEL
    GSM160091.CEL
    GSM160092.CEL
    GSM160093.CEL
    GSM160094.CEL
    GSM160095.CEL
    GSM160096.CEL
    GSM160098.CEL
    GSM160099.CEL
    GSM160100.CEL

Performing QC on each CEL file

When QC finishes, a number of individual plots can be reviewed by clicking on each section. These figures report issues or demonstrate the good quality of the data and are also saved to disk for later inclusion in your reports.

Overview of all QC Plots Colla
39.   [...] domains 3; calcium channel, voltage-dependent, L type, alpha 1C subunit; four and a half LIM domains 2

Current web based Enrichment tools

We chose to show you only two recent tools in this section. Many others are available, and you are welcome to try them and share your experience with us.

Enrichr

Enrichr [3] and its associated tool Lists2Networks use a respectable number of sources to compute enrichment (source list: http://amp.pharm.mssm.edu/Enrichr/index.html#stats). To learn more about Enrichr, please refer to their FAQ page (http://amp.pharm.mssm.edu/Enrichr/#help).

Enrichr uses a list of Entrez gene symbols as input. Each symbol in the input must be on its own line. You can upload the list either by selecting the text file that contains it or by simply pasting the list into the text box. We will use the gene symbols extracted from the RobiNA file, cleaned to remove duplicates, and where double-ID lines (a probeset mapping to two distinct genes) were expanded. This input file is available on the BITS server (link: http://data.bits.vib.be/pub/trainingen/PUBMA2014/ex5-files/RobiNA-DE-genes_LFC2-FDR0.001.txt) and its content can be used on the Enrichr submission page (http://amp.pharm.mssm.edu/Enrichr/index.html).

We present below the top part of 5 randomly selected output bar charts (please explore Enrichr to get much more than only these. The online version charts are interactive and provide much more info than these pictures).
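
The preparation of such a gene list can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy table and its values are invented, the column names (`logFC`, `adj.P.Val`, `Gene.symbol`) follow the limma-style tables used elsewhere in this tutorial, the thresholds (absolute log fold change of at least 2, FDR of at most 0.001) are inferred from the name of the input file, and " /// " is assumed to separate the symbols of a probeset that maps to several genes. `clean_symbols` is a hypothetical helper.

```r
# Toy DE table (all values invented for illustration)
de <- data.frame(
  ID          = c("p1", "p2", "p3", "p4", "p5", "p6"),
  logFC       = c(8.7, -9.4, 2.5, 0.4, -3.1, 5.0),
  adj.P.Val   = c(1e-10, 2e-11, 5e-4, 0.2, 3e-3, 1e-6),
  Gene.symbol = c("Myh6", "Myh1 /// Myh2", "Ankrd1", "Actb", "Tnnt2", "Ankrd1"),
  stringsAsFactors = FALSE
)

# Expand multi-gene entries, drop empty values and remove duplicates
clean_symbols <- function(x, sep = " /// ") {
  x <- x[!is.na(x) & nzchar(x)]                      # drop missing symbols
  parts <- trimws(unlist(strsplit(x, sep, fixed = TRUE)))
  unique(parts[nzchar(parts)])                       # keep first occurrence only
}

# Keep strongly and significantly changed probesets (assumed thresholds)
keep  <- abs(de$logFC) >= 2 & de$adj.P.Val <= 0.001
genes <- clean_symbols(de$Gene.symbol[keep])

# One symbol per line, as expected by the submission form
out_file <- file.path(tempdir(), "DE_gene_symbols.txt")
writeLines(genes, out_file)
print(genes)
```

The resulting text file can be uploaded, or its content pasted into the text box of the submission page.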
40.   each TS, we can visualize the actual expression data as a heatmap by using the corresponding plugin.

[Screenshot: TranscriptomeBrowser annotation table for the selected signature (4 signatures, 1 platform, 1 experiment). Keywords are truncated in the screenshot.]

    Keyword                      Q value (BH)
    contractile fiber            3.71998E-5
    myofibril                    1.49866E-4
    contractile fiber part       5.58187E-4
    muscle protein               0.00129896
    MF00261: ACTIN BINDI...      0.00407486
    MF00071: TRANSLATIO...       0.00607071
    protein binding              6.04472E-4
    [...]00250: MUSCLE DEV...    0.00104774
    [...]173: MUSCLE CON...      0.00238776
    BP00286: CELL STRUC...       0.00377257
    skeletal muscle develo...    2.588565E-4
    striated muscle develo...    0.0019021

[Screenshot: Heatmap plugin, first heatmap (cut at 20 genes), with 13 probes and 15 samples, annotation source PANTHER_TERM_BP, keyword BP00005: GLYCOLYSIS; legible gene labels include RGD1311260 and Sepp1. Buttons: Save heatmap, Save annotation, Export heatmap data. The sample tooltip reads: GSM160089, Title: Diaphragm 1, Description: D
41.   in several publications (e.g. [1]) and is available for Windows and for Mac OS X from its developer site [2]. A manual is also available here: http://rmaexpress.bmbolstad.com/RMAExpress_UsersGuide.pdf

The Affymetrix tools produce the same kind of data with more QC plots and are preferred if you want to have a close look at your data.

RMAExpress run with the GSE6943 data

In order to perform this exercise, please first install the software from the following link: http://rmaexpress.bmbolstad.com/ [2]. You will need the rat RAE230A CDF file, accessible from the link at the bottom of this page.

Convert CEL data

The user locates the CEL folder and the CDF file (obviously, all CEL files should come from the same chip type).

[Screenshot: RMAExpress menu entries Read Unprocessed files and Output Log file.]

RMAExpress log window:

    Welcome to RMAExpress
    Written by B. M. Bolstad <bmb@bmbolstad.com>
    Version: 1.1.0
    http://rmaexpress.bmbolstad.com
    Select CDF file
    CDF file : RAE230A.CDF in /work/TUTORIALS/Analysis of public microarray datasets/ref
    Select CEL files

[Screenshot: file dialog "Please Select your CEL files" listing GSM160089.CEL, GSM160090.CEL, GSM160091.CEL, GSM160092.CEL, GSM160093.CEL, GSM160094.CEL, GSM160095.CEL, GSM160096.CEL, GSM160098.CEL, GSM160099.CEL and GSM160100.CEL.]
42.   process

    Raw universal gene list size: 15923
    Used universal gene list size (requiring annotation and one feature per gene only): 8603
    Raw subset gene list size: 142
    Used subset gene list size (requiring annotation): 73
    Expression values used when filtering to one feature per gene: Transformed expression values
    Applied filter to reduce features to one per gene: true
    Filter applied to reduce features to one per gene: used feature with highest IQR

The next figure details the top 30 hypergeometric results for the contrast Heart vs. Diaphragm.

[Table: top hypergeometric results. The numeric columns are not recoverable from the extraction. Legible terms, in order: glycogen metabolic process; carbohydrate metabolic process; cellular calcium ion homeostasis; cardiac muscle contraction; skeletal muscle tissue development; positive regulation of sequestering of triglyceride; negative regulation of ATPase activity; skeletal muscle cell differentiation; cardiac myofibril assembly; cardiac muscle tissue morphogenesis; positive regulation of potassium ion transport; heart contraction; regulation of release of sequestered calcium
43.   [Screenshot residue: list of scatter plots of GSM160089.CEL against each of the other CEL files in the results folder (GSM160091.CEL to GSM160095.CEL are legible).]

Quality check results

Click in the list to open a fullsize view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the 'Exclude' box.

Available results:

  - RNA Plot of 11 Affymetrix data files
  - HIST Plot of 11 Affymetrix data files
  - Scatter plot of file GSM160089.CEL vs. GSM160090.CEL
  - Scatter plot of file GSM160089.CEL vs. GSM160091.CEL
  - Sc
44.   [Screenshot: Windows Explorer showing the library folder stephane.plaisance > Documents > Affy_data > AffyEC_lib, which contains among others ATH1-121501.cdf, ATH1-121501.mas5_configuration, ATH1-121501.psi, ATH1-121501.splaisan_default.report_controls and ATH1-121501.splaisan_default.report_thresholds.]

You can change this folder via Edit in the top menu. Click Set library path.

[Screenshot: Expression Console menu bar (File, Edit, Report, Graph, Analysis, Tools, Export, Window). The Edit menu contains Change User Profile, Set library path, Set Internet Settings, Expression Report Controls, Report Thresholds, Probe Level Summarizations Report Options, Create Annotation Merge File and Transform Signals (log <-> linear).]

Step by Step RobiNA analysis workflow

You can use RobiNA for:

  - quality assessment of your data
  - normalization of your microarray data
  - detection of differentially expressed genes
  - preparation of the data for an import into MapMan and/or Excel
  - generation of informative plots on your experiment

Pre-requisites are the CEL files containing each hybridization data, obtained from the GEO page as a .tar arc
45. [Result table rows, truncated at the start]
[...]00E-12; 3.43E-09; chr1:89135621-89137093
1369706_at; Rn.24079.1; 9.83; 4.27; 0.16; 0.29; 47.28; 1.69E-11; 5.15E-09; Cacng1; calcium channel, voltage-dependent, gamma subunit 1; chr10:95655312-95668315

Conclusion

The combination of the Affymetrix Expression Console (EC) and Transcription Analysis Console (TAC) allows Windows PC users without any knowledge of R to perform a standard analysis of Affymetrix microarray data and to obtain differential expression tables that can be used for downstream biological interpretation. Note that more specific options and alternative analysis workflows are available with the same tools; this tutorial is only an introduction with a selection of basic methods.

The main added value of these tools over R is the full range of QC plots they generate, plots that are classically produced by expert bioinformaticians, as well as the very rapid processing of public Affymetrix CEL data (within minutes). We therefore recommend exploring the EC and TAC tools and combining them with IPA and other downstream tools that allow biological evaluation of public microarray data.

Youtube videos from the Affymetrix training team

Please follow the video webcasts below to get familiar with the Affymetrix Expression Console and Transcription Analysis Console. A series of YouTube videos can be found on the Affymetrix web site.

download exercise files

Download exercise files here

References

1  1 http   www affymetrix com estore browse level seven 
46. [WebGestalt pathway table, truncated at the start; each row gives the pathway name, the number of genes, their Entrez Gene IDs and the enrichment statistics]
[...]01509 29155 114495 117558; C=80; O=5; E=1.76; R=2.83; rawP=0.0315; adjP=0.1496
G13 Signaling Pathway; 3 genes: 366624 56781 83708; C=30; O=3; E=0.66; R=4.54; rawP=0.0277; adjP=0.1496
Glycolysis and Gluconeogenesis; 3 genes: 25438 114508 25058; C=38; O=3; E=0.84; R=3.58; rawP=0.0507; adjP=0.1991
SIDS Susceptibility Pathways; 4 genes: 29253 25665 60449 689560; C=64; O=4; E=1.41; R=2.83; rawP=0.0524; adjP=0.1991

Again, many more tables can be generated in WebGestalt and you should choose the type of enrichment that fits your experimental needs. Data can be saved back to disk for further use.

Conclusion

A more complete analysis can be performed by those few who can program in R/Bioconductor. As this session is aimed at biologists, this option is not discussed further. Users can also consider commercial packages like Ingenuity Pathway Analysis, which provide much more detailed and richer information than free tools can offer.

download exercise files

Download exercise files here

References

1. http://david.abcc.ncifcrf.gov
2. http://www.nature.com/nprot/journal/v4/n1/abs/nprot.2008.211.html
   Da Wei Huang, Brad T Sherman, Richard A Lempicki. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009; 4(1): 44-57. PubMed 19131956.
3. http://amp.pharm.mssm.edu/Enrichr
   Edward Y Chen, Christopher M Tan, Yan Kou, Qiaonan Duan, Zichen Wang, Gabriela Vaz Meirelles, Neil R Clark, Avi Ma'ayan. Enri
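The statistics in the WebGestalt rows above can be related to each other with a small calculation. The sketch below assumes the usual reading of the fields (C: reference genes in the category, O: genes of the submitted list found in the category, E: expected count, R = O/E) and a hypergeometric test for rawP. The list size and reference size are hypothetical: they were chosen only so that their ratio equals 0.022, the E/C proportion seen in the rows, and they are not taken from the tutorial.

```python
from math import comb


def enrichment(category_size, observed, list_size, reference_size):
    """Return (expected, ratio, raw_p) for one category.

    category_size  -- C: reference genes annotated to the category
    observed       -- O: genes of the submitted list found in the category
    list_size      -- genes in the submitted list (hypothetical below)
    reference_size -- genes in the reference set (hypothetical below)
    """
    expected = category_size * list_size / reference_size
    ratio = observed / expected
    # Upper tail of the hypergeometric distribution: P(X >= observed)
    upper = min(category_size, list_size)
    favourable = sum(
        comb(category_size, k)
        * comb(reference_size - category_size, list_size - k)
        for k in range(observed, upper + 1)
    )
    raw_p = favourable / comb(reference_size, list_size)
    return expected, ratio, raw_p


# Hypothetical sizes (220 / 10000 = 0.022); C and O come from the first row.
expected, ratio, raw_p = enrichment(
    category_size=80, observed=5, list_size=220, reference_size=10000
)
print(f"E={expected:.2f} R={ratio:.2f} rawP={raw_p:.4f}")
```

Because the true list and reference sizes are unknown here, the printed rawP is only of the same order as the value in the table; the adjusted p-value (adjP) additionally depends on how many categories were tested.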
47. [Screenshot: Expression Console study window after the RMA workflow, with the QC entries Array Metrics, Signal Distribution and Array Comparisons. Several .chp files are flagged "Outside Bounds". The log lists the .chp files of GSM160095, GSM160096, GSM160098, GSM160099 and GSM160100 being opened; library path AffyEC_lib, user profile splaisan_default.]

As seen above, several samples are reported "outside bounds" by the RMA workflow. This means that some control probe sets did not meet the quality requirements. We looked it up and saw that the sample prep control probe sets (targeting the B. subtilis genes dap, thr, phe and lys) were not behaving as expected. Dap RNA is added in h
48. [Screenshot: Excel number filter on the adj.P.Val column (sorted ascending, smallest values around 1.2E-17), using the condition "Less Than or Equal".]

Filter on the "adj.P.Val" column with a maximum of 0.001.

[Screenshot: the filtered differential expression table in Excel; the selected cell contains probe 1371247_at.]
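The same filter can be applied outside Excel. The following is a minimal sketch, not the tutorial's own procedure: it assumes a tab-delimited table with limma-style column names (ID, logFC, adj.P.Val), and the three rows are made up for illustration. The additional |logFC| cutoff of 2 is an assumption suggested by the "LFC2" part of the exported file names.

```python
import csv
import io

# Three made-up rows in a limma-style layout; values are illustrative only.
SAMPLE = (
    "ID\tlogFC\tadj.P.Val\n"
    "probe_A\t-2.15\t1.2e-17\n"
    "probe_B\t2.40\t0.0200\n"
    "probe_C\t0.80\t3.0e-05\n"
)


def filter_de(handle, max_adj_p=0.001, min_abs_lfc=2.0):
    """Keep rows with adj.P.Val <= max_adj_p and |logFC| >= min_abs_lfc."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row
        for row in reader
        if float(row["adj.P.Val"]) <= max_adj_p
        and abs(float(row["logFC"])) >= min_abs_lfc
    ]


kept = filter_de(io.StringIO(SAMPLE))
print([row["ID"] for row in kept])
```

With a real export, `io.StringIO(SAMPLE)` would be replaced by an opened file handle; setting `min_abs_lfc=0` reproduces the p-value-only filter described above.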
49. [Quality control plots; only titles, axis names and legends are recoverable: per-sample plots of log intensity (one for each sample); RNA degradation plot (mean intensity, shifted and scaled, against probe number) and density plot for the 11 CEL files GSM160089 to GSM160096 and GSM160098 to GSM160100; scatter plots of log intensity (one for each pairwise sample comparison); Screeplot; Principal Components Plot; Cluster Dendrogram.]
50. [Screenshot: end of the filtered table "GSE6943 DE Robina" in Excel; 301 of 15923 records found.]

export probes and gene lists to text files

- 301 DE probes: http   data bits vib be pub trainingen PPUBMA2014 ex5 files RobiNA DE probes LFC2 FDRO 001  txt
- 238 DE genes (unique): http   data bits vib be pub trainingen PPUBMA2014 ex5 files RobiNA DE genes LFC2 FDRO 001  txt
- all probes in the rat array: http   data bits vib be pub trainingen PUBMA2014 ex5 files RobiN A all probes txt

DAVID, father of enrichment tools

The canonical web tool is DAVID.

DAVID has been around since 1997 and is still very popular, although its interface is quite outdated. A recent Nature Protocols paper will help you start using DAVID.

In order to use DAVID, we need to split our data in two groups:
- genes (probe IDs) considered differentially expressed (TEST)
- the remaining genes (probes) present on the platform (BACKGROUND); note that the DAVID "Background" tab allows selecting the actual microarray chip (Rat Genome RAE230A Array)

This division is very important to obtain good results and ensures that only functio
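The TEST/BACKGROUND division and the collapse from probes to unique genes can be sketched as follows. This is an illustration under assumptions: the probe IDs are placeholders, the two helper functions are hypothetical names, and in practice the lists would be read from the exported text files. An empty symbol stands for a probe without gene annotation.

```python
def split_test_background(all_probes, de_probes):
    """Return (test, background): DE probes and the remaining platform probes."""
    de_set = set(de_probes)
    test = [p for p in all_probes if p in de_set]
    background = [p for p in all_probes if p not in de_set]
    return test, background


def unique_genes(symbols):
    """Collapse probe-level gene symbols to a unique, order-preserving list."""
    seen = set()
    genes = []
    for symbol in symbols:
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


# Placeholder IDs standing in for the platform probes and the DE probes.
platform = ["p1", "p2", "p3", "p4", "p5"]
test, background = split_test_background(platform, ["p2", "p4"])
print(test, background)

# Several probes can map to the same gene, hence fewer genes than probes.
print(unique_genes(["Myh1", "Myh1", "", "Tnnt3"]))
```

This is why the exported gene list (238 unique genes) is shorter than the probe list (301 probes).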
51. 3

Although the former HowTo page Analyze_GEO_data_with_GEO2R is already present on the BITS Wiki, we repeat the analysis here with the same dataset used in the CLC Main Workbench, to be able to compare the results of free and commercial solutions. This work was published by van Lunteren E, Spiegler S, Moyer M [2]. Full details about this dataset can be found on the GEO page http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943.

The GEO2R interface

The initial window shows several tabs that will be reviewed in the remainder of this tutorial.

[Screenshot: GEO2R start page for GSE6943. "Use GEO2R to compare two or more groups of Samples in order to identify genes that are differentially expressed across experimental conditions. Results are presented as a table of genes ordered by significance."]

GEO accession: GSE6943; Set: Normal Heart vs Normal Diaphragm; Samples: selected 0 out of 12 samples

Accession   Title         Source name
GSM160089   Diaphragm 1   Diaphragm
GSM160090   Diaphragm 2   Diaphragm
GSM160091   Diaphragm 3   Diaphragm
GSM160092   Diaphragm 4   Diaphragm
GSM160093   Diaphragm 5   Diaphragm
GSM160094   Diaphragm 6   Diaphragm
GSM160095   Heart 1       Heart, left vent.
GSM160096   Heart 2       Heart, left vent.
GSM160097   Heart 3       Heart, left vent.
GSM160098   Heart 4       Heart, left vent.
GSM160099   Heart 5       Heart, le
52. [Screenshot: TAC result table, truncated at the start. Recovered rows: two signal columns, fold change, p-value, FDR p-value, gene symbol and description. Rows without a symbol had none in the extraction, and the sign of the fold change was lost.]

3.52   4.56   [...]  0.029922  0.088971
4.87   5.90   [...]  0.023069  0.073187
5.68   6.72   [...]  0.034185  0.098780  Nup37       nucleoporin 37
6.51   7.55   2.06   0.002813  0.014364  Rgs12       regulator of G protein sig[...]
7.72   8.76   2.06   0.000015  0.000266  Tex264      testis expressed 264
6.89   7.93   2.06   0.000589  0.004211  Lpcat1      lysophosphatidylcholine[...]
9.65   10.69  2.06   0.000001  0.000042
7.51   8.55   2.06   0.000588  0.004211  Jmjd8       jumonji domain containi[...]
6.94   7.98   2.06   0.000008  0.000171  Pced1b      PC esterase domain cont[...]
6.32   7.37   2.06   0.000566  0.004095
6.97   8.01   2.06   9.07E-08  0.000006  Timm22      translocase of inner mito[...]
10.43  11.47  2.06   0.000267  0.002293  Lama2       laminin, alpha 2
7.31   8.35   2.06   0.002312  0.012309  RGD1563888  similar to DNA segment[...]
6.21   7.25   2.06   0.048207  0.127986  Epha4       Eph receptor A4

[Screenshot: volcano plot (fold change against significance). Gene Rows: 2002; Selected Rows: 81.]

[Screenshot: TAC window "Analysis 1.tac" (RAE230A, Analysis Result) with the tabs Summary, Scatter Plot, Volcano Plot, Chromosome Summary and Hierarchical Clustering. Comparison: Heart vs Diaphragm.] Upin Heart vs Diaphragm    0   Down in Heart vs Diaphragm 171
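The relation between the two signal columns and the linear fold change in the rows above can be checked by hand. The sketch below assumes that the signals are log2 averages and that the fold change is reported as a signed linear value (negative when the first condition is lower); this convention is an assumption, not something stated in the tutorial, and the function name is hypothetical.

```python
def signed_linear_fold_change(log2_a, log2_b):
    """Signed linear fold change of condition A versus B from log2 averages.

    Positive values mean A is higher; negative values mean A is lower by
    that factor (assumed convention).
    """
    diff = log2_a - log2_b
    return 2.0 ** diff if diff >= 0 else -(2.0 ** -diff)


# Signals of the Rgs12 row: 6.51 and 7.55 (log2), fold change magnitude 2.06.
fc = signed_linear_fold_change(6.51, 7.55)
print(round(fc, 2))  # -2.06
```

A log2 difference of about 1.04 therefore corresponds to a roughly two-fold change, which is why all rows in this part of the table share the same fold change magnitude.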
53. [Detailed results table, truncated at the start; columns as extracted: ID, logFC, AveExpr, t, P.Value, adj.P.Val, B]

[...]42897453  100.528072562373  5.87151716284703e-21  2.01180346646277e-17  37.3935367235781
1370971_at  8.88483556476391  8.44782914943697  99.9889507276205  6.31728778013807e-21  2.01180346646277e-17  37.3386444895643
1367896_at  8.17319442370653  8.32878048386055  97.7761545656647  8.56617301567765e-21  2.27331954881059e-17  37.1083132290366
1374248_at  7.92028193041167  9.06490481778715  95.9358621573261  1.10936083087999e-20  2.52347893001459e-17  36.9103955684498
1368093_at  8.61265104186292  8.23846334693981  81.9875192287634  9.40341381711377e-20  1.87163197762378e-16  35.195893580546
1388139_at  8.60278366808529  8.70065810953839  80.0047250421511  1.31181945592154e-19  2.32090013295985e-16  34.9170369337463

Head of "mean_rma_normalized_expression_values.txt"

Identifier  Heart             Diaphragm
1367452_at  9.5293566110383   9.5889443412077
1367453_at  10.1758428259477  10.1726921462891
1367454_at  8.87231561088045  8.86593395245015
1367455_at  10.4751392501094  10.3325900097593
1367456_at  10.7698324122072  10.286304116399
1367457_at  8.75165515455833  8.73159768841635
1367458_at  7.54000953569876  7.71781966096578
1367459_at  11.0420305904582  11.1519497142209
1367460_at  9.90893064678104  9.5235636384544

Head of "raw_rma_normalized_expression_values.txt"

Identifier  GSM160089.CEL  GSM160090.CEL  GSM160091.CEL  GSM160092.CEL  GSM160093.CEL  GSM160094.CEL  GSM160095.CEL  GSM160096.CEL  Gs
1367452_at  9.64295883656058  9.65178912787566  9.32863584112902  9
54. [Header of the GDS3224 full table, truncated at the start] 60095, GSM160096, GSM160097, GSM160098, GSM160099, GSM160100, Gene title, Gene symbol, Gene ID, UniGene title, UniGene symbol, UniGene ID, Nucleotide Title, GI, GenBank Accession, Platform_CLONEID, Platform_ORF, Platform_SPOTID, Chromosome location, Chromosome annotation, GO:Function, GO:Process, GO:Component, GO:Function ID, GO:Process ID, GO:Component ID

tissue: diaphragm (6 samples), heart (6 samples)

1367452_at
  Values: 2532.9 2518.6 2384.6 2304 2360 2482.8 3166 2938.9 2953.3 2558.8 3043.3 2711.5
  Gene title: SMT3 suppressor of mif two 3 homolog 2 (S. cerevisiae)
  Gene symbol: Sumo2; Gene ID: 690244
  Nucleotide Title: Rattus norvegicus SMT3 suppressor of mif two 3 homolog 2 (S. cerevisiae) (Sumo2), mRNA; GI: 210147495; GenBank Accession: NM_133594
  Chromosome location: 10q32.3; Chromosome annotation: Chromosome 10, NC_005109.3 (104184388..104195497)
  GO:Function: SUMO ligase activity; protein binding; ubiquitin protein ligase binding; ubiquitin protein ligase binding
  GO:Process: cellular protein localization; negative regulation of transcription, DNA-dependent; positive regulation of proteasomal ubiquitin-dependent protein catabolic process; positive regulation of transcription from RNA polymerase II promoter; protein sumoylation; protein sumoylation; protein sumoylation
  GO:Component: PML body; PML body; nucleus
  GO:Function ID: GO:0019789; GO:0005515; GO:0031625; GO:0031625
  GO:Process ID: GO:0034613; GO:0045892; GO:0032436; GO:0045944; GO:0016925; GO:0016925; GO:0016925
  GO:Component ID: GO:0016605; GO:0016605; GO:0005634

1367456_at
  Values: 6090.8 5352.2 5614.9 5249.6 5834.6 5915.9 3995.3 4356.9 46
55. [R console output, truncated at the start: P.Value, adj.P.Val and B columns of the first six rows of the detailed results; only the B column and the gene annotation are fully legible.]

B         Gene title                                                                                       Gene symbol
40.30280  troponin T type 3 (skeletal, fast)                                                               Tnnt3
38.51476  troponin T type 1 (skeletal, slow)                                                               Tnnt1
38.08932  myosin light chain, phosphorylatable, fast skeletal muscle                                       Mylpf
37.39354  troponin C type 2 (fast)                                                                         Tnnc2
37.33864  myosin, heavy chain 2, skeletal muscle, adult /// myosin, heavy chain 1, skeletal muscle, adult  Myh2 /// Myh1
37.10831  carbonic anhydrase 3                                                                             Car3

Gene ID column (as extracted, partly garbled): 12961 171409 111906 24584 4744 296369 13520 691644 287408 445 54232

    # save to new file
    outfile <- file.path(robina.folder, "GSE6943_DE_Robina.txt")
    write.table(data, file=outfile, row.names=FALSE, sep="\t", quote=FALSE)

    colnames(data)
     [1] "ID"                    "logFC"            "AveExpr"         "t"
     [5] "P.Value"               "adj.P.Val"        "B"               "Gene.title"
     [9] "Gene.symbol"           "Gene.ID"          "UniGene.title"   "UniGene.symbol"
    [13] "UniGene.ID"            "Nucleotide.Title" "GI"              "GenBank.Accession"
    [17] "Platform_CLONEID"      "Platform_ORF"     "Platform_SPOTID" "Chromosome.location"
    [21] "Chromosome.annotation" "GO.Function"      "GO.Process"      "GO.Component"
    [25] "GO.Function.ID"        "GO.Process.ID"    "GO.Component.ID"

Conclusion

RobiNA is a wrapper for R code, developed and standardized by its authors to run reproducibly and behave as expected. Although simple in layout, RobiNA appears as a quite performant alternativ
56. [Continuation of the GDS3224 full table, truncated at the start]

1367456_at (continued)
  Values: [...]75.1 4570.4 4994.8 4231.6
  Gene title: ubiquitin-conjugating enzyme E2D 3
  Gene symbol: Ube2d3; Gene ID: 81920
  Nucleotide Title: Rattus norvegicus ubiquitin-conjugating enzyme E2D 3 (Ube2d3), mRNA; GI: 13676842; GenBank Accession: NM_031237
  Chromosome annotation: Chromosome 2, NC_005101.3 (259146589..259174396)
  GO:Function: ATP binding; acid-amino acid ligase activity; ubiquitin-protein ligase activity; ubiquitin-protein ligase activity; ubiquitin-protein ligase activity
  GO:Process: [...] dependent protein catabolic process; proteasomal ubiquitin-dependent protein catabolic process; protein K11-linked ubiquitination; protein K11-linked ubiquitination; protein K48-linked ubiquitination; protein K48-linked ubiquitination; protein monoubiquitination; protein polyubiquitination; protein polyubiquitination; protein ubiquitination; ubiquitin-dependent protein catabolic process
  GO:Component: endosome membrane; plasma membrane
  GO:Function ID: GO:0005524; GO:0016881; GO:0004842; GO:0004842; GO:0004842
  GO:Process ID: GO:0006281; GO:0006915; GO:0043161; GO:0043161; GO:0070979; GO:0070979; GO:0070936; GO:0070936; GO:0006513; GO:0000209; GO:0000209; GO:0016567; GO:0006511
  GO:Component ID: GO:0010008; GO:0005886

1367459_at
  Values: 7665.8 7415.9 7075.9 7349.4 6406.7 6664.6 10400.5 9729.2 9679.2 9996.8 9783.7 8333
  Gene title: ADP-ribosylation factor 1
  Gene symbol: Arf1; Gene ID: 64310
  Chromosome location: 10q22; Chromosome annotation: Chromosome 10, NC_005109.3 (45319018..45334501)
  GO terms (partly illegible): GTP binding; protein transport; small GTPase mediated signal transduction; vesicle-mediated transport; Golgi apparatus; perinuclear region; cytoplasm

Profile pathways

Search pathways enr
57. [Sample selection panel, truncated at the start: GSM160090 to GSM160100]

The comparison is based on the selected test method. The choice of the tail(s) allows finding genes that are less expressed, more expressed, or simply differentially expressed.

Step 1: Select test and significance level (here: two-tailed t-test, A vs B; significance level 0.010)
Step 2: Select which Samples to put in Group A and Group B
Step 3: Query Group A vs. B

The result is a very long list of DE genes, with a barplot view on the right that confirms the expression difference, but it is not really useful as is.

[Screenshot: GEO Profiles results 1 to 20 of 4409 (Summary, 20 per page, sorted by default order).]

1. Sumo2 - Heart left ventricle and diaphragm comparison
   Annotation: SMT3 suppressor of mif two 3 homolog 2 (S. cerevisiae); Organism: Rattus norvegicus
   Reporter: GPL341, 1367452_at (ID_REF), GDS3224, 690244 (Gene ID), NM_133594
   DataSet type: Expression profiling by array, count, 12 samples; ID: 51748101
2. Ube2d3 - Heart left ventricle and diaphragm comparison
   Annotation: Ube2d3, ubiquitin-conjugating enzyme E2D 3; Organism: Rattus norvegicus
   Reporter: GPL341, 1367456_at (ID_REF), GDS3224, 81920 (Gene ID), NM_031237
   DataSet type: Exp
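To make the two-tailed comparison concrete, here is a minimal sketch for a single probe. It assumes an equal-variance Student t-test; the exact variant used by the GEO tool is not stated in the tutorial. The values are those listed for probe 1367452_at (Sumo2) in the GDS3224 table, and the critical value comes from a standard t table.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t_value = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_value, df


# Probe 1367452_at (Sumo2): GSM160089-94 are diaphragm, GSM160095-100 heart.
diaphragm = [2532.9, 2518.6, 2384.6, 2304, 2360, 2482.8]
heart = [3166, 2938.9, 2953.3, 2558.8, 3043.3, 2711.5]

t_value, df = student_t(heart, diaphragm)

# Two-tailed critical value for alpha = 0.01 and 10 degrees of freedom.
T_CRIT_ALPHA_001_DF_10 = 3.169
significant = abs(t_value) > T_CRIT_ALPHA_001_DF_10
print(f"t={t_value:.2f} df={df} significant={significant}")
```

A two-tailed test compares the absolute value of t with the critical value, so it flags differences in either direction; a one-tailed test (A > B or A < B) would look at the sign of t as well.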
58. [End of the data table, truncated at the start]
[...]9.2 9996.8 9783[...]
1367460_at  Gdi2  3155.7 2946.9 3589.7 3487.4 3131.5 3338 3198 2991.3 2781 2754.1 2406[...]

Walking through the tools

Several built-in tools are ready to serve you on demand. We provide below a rapid overview of the results obtained with each tool.

Find Genes

This allows a user interested in a particular gene to get its expression values across samples. This simple feature is of limited interest for most applications.

[Screenshot: Data Analysis Tools menu with the entries Find genes (find gene name or symbol), Compare 2 sets of samples (find genes that are up/down for this condition(s) & tissue), Cluster heatmaps, and Experiment design and value distribution.]

Compare two sets of samples

This allows fine-tuning the comparison by selecting two groups of samples that are compared using t-test statistics.

[Screenshot: Compare 2 sets of samples. Step 1: Select test and significance level (significance level 0.100; available tests: one-tailed t-test (A > B), one-tailed t-test (A < B), value means difference, rank means difference). Step 2: Select which Samples to put in Group A and Group B. Step 3: Query Group A vs. B.]

Two tails comparison

The groups are defined using the mouse.

[Screenshot: sample selection dialog. "Click on accessions to select samples individually; click on colored blocks and then on blinking arrows to select groups of samples." Buttons: Ok, Reset, Cancel.] GSM16008
59. [References, truncated at the start]

5. http   www vib be en training research training courses Pages Analysis of public microarray data sets  aspx
   http   www vib be en training research training courses Pages Introduction to A ffymetrix microarray analysis aspx
   http   www vib be en training research training courses Pages Analysis of public microarray   data using Genevestigator aspx
   https   www  bits vib be index php training 177 microarray bioconductor
   https   www bits  vib be index  php training 125 genevestigator
   http://genepattern.org
   http://www.broadinstitute.org/cancer/software/GENE-E
   http://www.broadinstitute.org/gsea
   http://tagc.univ-mrs.fr/tbrowser

   Cyrille Lepoivre, Aurélie Bergon, Fabrice Lopez, Narayanan B Perumal, Catherine Nguyen, Jean Imbert, Denis Puthier. TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks. BMC Bioinformatics. 2012; 13:19. PubMed 22292669.

   Fabrice Lopez, Julien Textoris, Aurélie Bergon, Gilles Didier, Elisabeth Remy, Samuel Granjeaud, Jean Imbert, Catherine Nguyen, Denis Puthier. TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE. 2008; 3(12): e4001. PubMed 19104654.

9. https://insilicodb.com InsilicoDB
   Jonatan Taminau, S
60. [Screenshot: Expression Console after importing the GSE6943 CEL files (GSM160095.CEL, GSM160096.CEL, GSM160098.CEL, GSM160099.CEL and GSM160100.CEL are visible in the log); library path AffyEC_lib, user profile splaisan_default. The Toolbox "Configuration" panel lists: 1. Specify user profile; 2. Select library path; 3. Download library files; 4. Download annotation files; 5. Specify report controls.]

The choice of the right normalization method is not detailed here; please refer to the BITS microarray training session and material for more information about this topic: Introduction to Affymetrix Microarray Analysis  https   www 
61. [Screenshot: Transcriptome Analysis Console 2.0, Gene Level Differential Expression Analysis. The imported .rma.chp files of the GSE6943_RMA folder are assigned to two conditions as shown in the window: "Heart" (GSM160089.rma to GSM160094.rma) and "Diaphragm" (GSM160095.rma, GSM160096.rma, GSM160098.rma, GSM160099.rma, GSM160100.rma). Further conditions can be added with "Click to Create New Condition". Each condition must have at least one file. The analysis file (.tac) is saved in the GSE6943_RMA folder; the analysis is started with "Run Analysis".]

Computing  gene level  Differential expression 
62. The pathways are now very specific to heart biology, as expected.

[Screenshot: FLink (Frequency weighted Links), links from geoprofiles records to biosystems records weighted by frequency. Displaying BioSystems records 1-10 of 551 (page 1 of 56).]

BSID     | Biosystem                                            | Type                        | Organism
1010675  | Metabolism                                           | organism-specific biosystem | Rattus norvegicus
1010015  | Signal Transduction                                  | organism-specific biosystem | Rattus norvegicus
909656   | Adrenergic signaling in cardiomyocytes               | conserved biosystem         |
908278   | Adrenergic signaling in cardiomyocytes               | organism-specific biosystem | Rattus norvegicus
198436   | Striated Muscle Contraction (WikiPathways)           | organism-specific biosystem | Rattus norvegicus
117251   | Arrhythmogenic right ventricular cardiomyopathy (KEGG) | organism-specific biosystem | Rattus norvegicus
115128   | Arrhythmogenic right ventricular cardiomyopathy (KEGG) | conserved biosystem         |
83442    | Calcium signaling pathway                            | organism-specific biosystem | Rattus norvegicus
458      | Calcium signaling pathway                            | conserved biosystem         |
1009963  | Transmembrane transport of small molecules           | organism-specific biosystem | Rattus norvegicus

Cluster heatmaps

Two options are presented in this part: the hierarchical clustering and the k-means approaches.

Hiera
63. The Broad Institute (http://www.broadinstitute.org) has developed a number of tools, including GenePattern and Gene-E [6], without forgetting the famous GSEA platform [7].

- DAVID (http://david.abcc.ncifcrf.gov)
- BITS WIKI (http://wiki.bits.vib.be/index.php/Exercises_on_Gene_Ontology)
- Enrichr (http://amp.pharm.mssm.edu/Enrichr)
- WebGestalt (http://bioinfo.vanderbilt.edu/webgestalt)

TranscriptomeBrowser [8] works as a standalone application on your computer and looks very impressive when in good hands. Please follow the webcast (http://tagc.univ-mrs.fr/tbrowser/index.php?option=com_content&task=view&id=35&Itemid=28).

Please feel free to discover these other tools, which include more plant-dedicated resources than the ones above:

- PlantGSEA (http://structuralbiology.cau.edu.cn/PlantGSEA/analysis.php)
- MapMan (http://mapman.gabipd.org/web/guest/mapman-download)
- ToppGene (http://toppgene.cchmc.org/prioritization.jsp)
- gProfiler (http://biit.cs.ut.ee/gprofiler/)
- PLAZA (http://bioinformatics.psb.ugent.be/plaza/), developed at VIB
- BioMart (http://www.biomart.org/biomart/martview)
- QuickGO (http://www.ebi.ac.uk/QuickGO)
- TAIR (http://www.arabidopsis.org/tools/bulk/index.jsp)
- BAR (http://bar.utoronto.ca/welcome.htm)
- BioCyc (http://biocyc.org/gene-search.shtml)
- BioCyc Ath > pathways (http://biocyc.org/ARA/class-tree?object=Pathways)

Meta Analysis Resources

InsilicoDB [9] offers similar services by linking the data to the Broad data mining tools GenePattern & Gene-E. Please ref
64. [References, truncated at the start]

10. http://omictools.com
    Vincent J Henry, Anita E Bandrowski, Anne-Sophie Pepin, Bruno J Gonzalez, Arnaud Desfeux. OMICtools: an informative directory for multi-omic data analysis. Database (Oxford). 2014; 2014. PubMed 25024350.

Retrieved from "http://stelap.local/BioWareWIKT/index.php?title=Hands-on_Analysis_of_public_microarray_datasets&oldid=11837"
Categories: Training | HandsOn | PUBMA2014
This page was last modified on 21 October 2014, at 10:39.
This page has been accessed 289 times.
Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

PubMA Exercise 1

From BioWareWIKI

Search GEO to find public datasets related to one's project

Contents
1 Introduction
2 Find GEO datasets relevant for your Biological question
3 Get information about GSE6943 used in this session
4 download exercise files

Introduction

The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications are available to help users query and download the studie
65. [Screenshot, truncated at the start: "[...]a preprocessor, Version 1.2.4_build656", reading mapping files.]

RobiNA was developed for plants and can use annotation files from the website to add gene descriptions to the differential expression table. For non-plant data, users will need to add these annotations using external tools.

[Screenshot: RobiNA, the transcriptomics data preprocessor, Version 1.2.4_build656.]

Design your experiment

You can arrange the groups by dragging them around. Define which groups shall be compared by holding down the CONTROL key and then click-dragging from the first group to the second group. Right-click and choose delete to delete connections. To combine several groups into one "metagroup", select all groups you want to combine (by left-clicking and drawing a selection rectangle around them) and click "Create Metagroup".

Annotate the results

Using the functional annotation data provided by the MapMan project, you can annotate your raw result files. Please double-click the species you are working with and select a matching mapping file.

Installed Mappings:
- Arabidopsis thaliana (thale cress)
- Arabidopsis thaliana
- Glycine max (soybean)
- unknown species
- Medicago truncatula
- Medicago truncatula (barrel medic)
- Oryza sativa (rice)
- Populus trichocarpa

Show expert settings

Expert settings: Normalisation: rma  il  gt   21
66. [Condition list, truncated at the start] [...]agm (5): GSM160095.rma.chp, GSM160096.rma.chp, GSM160098.rma.chp, GSM160099.rma.chp, GSM160100.rma.chp

Adjusting Differential expression limits

The filtering values can be adapted by the user to restrict or enlarge the DE gene list, after which new plots are generated.

Adjusting the differential expression limit

[Screenshot: filter dialog on the column "Fold Change (linear) (Heart vs. Diaphragm)", next to the columns "ANOVA p-value (Heart vs. Diaphragm)" and "FDR p-value (Heart vs. Diaphragm)"; conditions can be combined with Or / And.]

Adjusting the limit for the adjusted p value

[Screenshot: filter dialog on the column "ANOVA p-value (Heart vs. Diaphragm)" with Or / And options; the table shows FDR p-values and gene symbols such as Myh1, Tnnt3 (troponin) and Actn3 (actinin).]

Plots based on the filtered differential expression table

Additional graphs can be obtained to view the data from different angles. The scatter plot highlights potential differences between up-regulated (UR) and down-regulated (DR) genes between the groups. The graphs are interactive: the user can query the full data to find which probesets or genes are UR or DR by using the mouse to select an area around points.

[Screenshot: scatter plot.]
67. aling

p-value range            # Molecules
1.84E-12 - 2.37E-03      108
3.55E-11 - 3.96E-03      135
1.42E-10 - 4.62E-03      149
1.42E-10 - 4.62E-03      172
6.36E-09 - 2.91E-03      61

Physiological System Development and Function

Much more information and tools are available in IPA to continue the analysis and identify markers and targets for validation.

If you are a VIB scientist with 2-5 interested VIB colleagues, you may ask for a free custom 1        training provided by BITS inside your lab. Please email us to discuss the possibilities.

download exercise files
Download exercise files here.

This page was last modified on 21 October 2014, at 10:37.
Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.
68. aphragm  1   5  160093   Value for GSM160093  Diaphragm 5  src  Diaphragm   V GSM160094   Value for GSM160094  Diaphragm 6  src  Diaphragm  VGSM160095   Value for GSM160095  Heart 1  src  Heart  left vent   VGSM160096   Value for GSM160096  Heart 2  src  Heart  left vent   VGSM160097   Value for GSM160097  Heart 3  src  Heart  left vent   VGSM160098   Value for GSM160098  Heart 4  src  Heart  left vent   VGSM160099   Value for GSM160099  Heart 5  src  Heart  left vent    65  160100   Value for GSM160100  Heart 6  src  Heart  left vent   head GDS3224 data txt   column  t        REF IDENTIFIER GSM160089 GSM160090 GSM160091 GSM160092  11367452 at Sumo2 2532 9 2518 6 2384 6 2304  11367453 at Cdc37 3464 2 3197 4 3487 1 3133 2  11367454 at Copb2 1620 8 1870 5 1538 6 1334  11367455 at Vcp 5512 5 4103 9 5746 5 4393 6  11367456 at Ube2d3 6090 8 5352 2 5614 9 5249 6    GSM160093 GSM160094 GSM160095  2360 2482 8 3166  3432 5 3486 9 3860 2  1502 9 1520 3 1849 8  5870 7 5851 2 5408 8  5834 6 5915 9 3995 3    27    GSM160096  2938   3429     1852    4682   4356     9  4    6    GSM160097  2953   3381   1858   4734   4675     3        1 QI N     Collapse   GSM160098   5  16  2558 8 3043   4131 3 4364   1483 3 1766   4403 7 3940    4570 4 4994     11367457 at Becnl 1093 9 1134 3 736 4 1219 774 9 712 2 892 1 998 9 782 3 710 8 923 2    11367458 at Lypla2 347 8 223 9 261 4 338 8 249 6 363 7 422 2 409 9 273 3 492 2 458  1367459_at Arfl 7665 8 7415 9 7075 9 7349 4 6406 7 6664 6 10400 5 9729 2 967
69. ata    A number of QC plots are produced to Inspect the normalized data          83    log2 PM by array for raw data    716008450  001090955    6200911459                26009 WSS                 5     60091959          04590  6000 1459   9600911499   6600911959          Density plots of 1092 PM by array       Convert data with the Convertor tool  84    The companion tool RMA convertor allows direct batch conversion without QC plots                 RMAExpress Data Convertor  CEL CDF conversion PGF CLF conversion    mm mid                  CEL File Directory  work TUTORIALS Analysis_of_puk   Browse   eee oe                     5 57 2 s    I7 ry  CDF file  work TUTORIALS Analysis of put   Browse   Arrays in Buffer     J  30             Restrict File     Browse   Probes in Buffer     ITT 25000    Force                 Temporary File Location     tmp       Choose Dir                 Output directory  work TUTORIALS Analysis_of_puk   Browse          About     Preferences     Convert     Final result  The resulting data for the GSE experiment is shown below  top 5 lines by first 5 columns for readability     Probesets GSM160089 CEL  GSM160090 CEL  GSM160091 CEL GSM160092 CEL    11367453 at 10 155364 10 249902 10 186186 10 096563 i    11367452 at 9 642959 9 651789 9 328636 9 484756  11367454 at 8 739058 9 082530 9 071610 8 687629  11367455 at 10 476362 10 447124 10 571050 10 326480    download exercise files    Download exercise files here   Expand                     References   1
70. atmaps derived from manual curation of some of the GEO datasets selected by the NCBI team. This processing is otherwise laborious and requires R skills that not every scientist can build. More info about this toolset can be found on the NCBI help page (http://www.ncbi.nlm.nih.gov/geo/info/datasets.html).

The web interface allows finding co-expressed genes in bi-clusters, which have been shown to often belong to common signaling pathways or to result from transcriptional co-regulation by common key regulators (TFs or pathways).

The GEO Dataset browser

The remainder of this page shows key views obtained by navigating the GEO Dataset browser Web interface.

Start the tool

Start by locating the GDS link in the dataset information page found on the GEO BioProject page (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA98125), or search for the GDS ID if you know it.

Normal Heart vs Normal Diaphragm (Norway rat)
Accession: PRJNA98125, ID: 98125
Comparison of gene expression of heart (left vent) and diaphragm of normal Sprague Dawley rats, young adult. Keywords: Cell type comparison. Overall design: 6 diaphragm samples, 6 heart samples.
Project Data Type: Transcriptome or Gene expression
Attributes: Scope: Multiisolate; Material: Transcriptome; Capture: Whole; Method type: Array
1650 additional projects are related by organism
Other Accessions: GEO: GSE6943
Relevance: Model Organism
Project Data: Resource Name N
71. atter plot

[Screenshots: RobiNA quality check, Step 3 of 4. The result list contains one scatter plot per pair of chips, comparing GSM160089.CEL with each of GSM160090.CEL, GSM160091.CEL, GSM160092.CEL, GSM160093.CEL, GSM160094.CEL and GSM160095.CEL from the GSE6943_CEL folder]

Quality check results
Click in the list to open a full-size view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the "Exclude" box.
72. bits vib be index php bits search results searchword Intro   20A ffymetrix  amp searchphrase all    The normalization method is selected  from a pop down menu     Available Analyses       The process takes some time and leads to a summary page and saves new files to the disk with extension   chp  containing the normalized data  one for each imported CEL file  The   chp  files are  ready for import in the TAC tool    70                                                                         eoo Windows 8 1        E P b             E Expression Console    Affymetrix Study    B x        File Edit Report Graph Analysis Tools Export Window Help              35                         B  X      Updates Available  File Threshold Test S      Probe Cell Intensity Data   L GSM160089CEL      GSM160090 CEL ae   GSM160091 CEL  2  Open Existing Study  GSM160092 CEL  3  Add Intensity Files  GSM160093 CEL  4  Add Summarization Files  GSM160094 CEL  5  Save Study   GSM160095 CEL   Close Study   GSM160096 CEL       GSM160098 CEL   GSM160099 CEL   GSM160100 CEL   RMA   Group 1   GSM160089 ma chp Within Bounds 1092   GSM160090 ma chp Within Bounds      092   GSM160091 ma chp Within Bounds      092   GSM160092 ma chp Within Bounds 1092   GSM160093 ma chp Within Bounds 1092   GSM160094 ma chp Within Bounds 1092    65  160095             Outside Bounds  092    65  160096             Outside Bounds  092   GSM160098 ma chp Within Bounds 1092    65  160099             Outside Bounds log2      GSM16010
73. chr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics, 2013, 14:128. PubMed 23586463.

Alexander Lachmann, Avi Ma'ayan. Lists2Networks: integrated analysis of gene/protein lists. BMC Bioinformatics, 2010, 11:87. PubMed 20152038.

http://bioinfo.vanderbilt.edu/webgestalt/

Stefan Kirov, Ruiru Ji, Jing Wang, Bing Zhang. Functional annotation of differentially regulated gene set using WebGestalt: a gene set predictive of response to ipilimumab in tumor biopsies. Methods Mol. Biol., 2014, 1101:31-42. PubMed 24233776.

Jing Wang, Dexter Duncan, Zhiao Shi, Bing Zhang. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res., 2013, 41 (Web Server issue): W77-83. PubMed 23703215.

Bing Zhang, Stefan Kirov, Jay Snoddy. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res., 2005, 33 (Web Server issue): W741-8. PubMed 15980575.

http   bioinfo vanderbilt edu webgestalt WebGestalt manual 2013 04 12 pdf

This page was last modified
74. d genes

[Venn diagrams: Upregulated genes, Heart - Diaphragm; Downregulated genes, Heart - Diaphragm; values shown: 14876, 15480, 15319]

In addition to the plots, several text tables are saved in "detailed_results" and sampled below.

Analysis summary

Robin affymetrix data analysis summary
RobiNA results, 8/18/2014 16:35:28

Input files:

Normalization settings for quality control
  normalization method: rma
  P-value correction method: BH
  analysis strategy: Limma

Normalization settings for main analysis
  normalization method:
  P-value correction method: BH
  Multiple testing strategy: nestedF
  P-value cut-off value for significant differential expression: 0.05
  Genes that showed a log2 fold change smaller than two ignored: yes

The analysis produced the following warnings:

Head full_table_Heart   Diaphragm txt

ID          logFC             AveExpr           t                 P.Value               adj.P.Val             B
1371247_at  8.8690486752098   9.2066061992338   138.180384786338  7.72673860255015e-23  1.23032858768406e-18  40.3028038935592
1370412_at  8.00700839973324  8.66636177255196  112.671243745562  1.24362501567467e-21  9.90112056229389e-18  38.5147635203019
1387787_at  8.51384348690296  10.0136513602682  107.791218026326  2.27212516655303e-21  1.2059683009008e-17   38.0893208171574
1372195_at  9.44110458586408  9.10616
75. e   used feature with highest

Results of the hypergeometric test for Pathways

Hyper geometric test for association of annotation categories to a sublist of a larger gene list
Aug 13 14:49:36 CEST 2014
Version: CLC Main Workbench 7.0.3
User: splaisan

Parameters:
  Gene identifier column used in tests: Gene symbol
  Annotation column used in tests: GO molecular function
  Raw universal gene list size: 15923
  Used universal gene list size (requiring annotation and one feature per gene only): 8969
  Raw subset gene list size: 142
  Used subset gene list size (requiring annotation): 74
  Expression values used when filtering to one feature per gene: Transformed expression values
  Applied filter to reduce features to one per gene: true
  Filter applied to reduce features to one per gene: used feature with highest IQR

GSEA

Unlike the hypergeometric test, the GSEA method does not require partitioning the data: it takes the full table and considers the relative ranking of gene list members in relation to the individual gene expression levels in the data.

Settings for the GSEA test for GO BP

Gene set enrichment analysis
Wed Aug 13 12:42:16 CEST 2014
Version: CLC Main Workbench 7.0.3
User: splaisan

Parameters:
  Gene identifier column used in tests: Gene symbol
  Annotation column used in tests: GO biological process
  The features were ranked on: t-statistic
  group comparison: Heart   Diaphragm
  Raw universal gene li
76. e 07    I 682930 24837 29658 29275 29248 84396 29556 25399 117557 24239 689560 C 70 O 12 E 1 54 R 7 78 rawP 3 39e   Cardiac muscle contraction 12 116600 08 adjP 6 10e 07  Tight junction 11 83807 171009 24584 289759 691644 85420 29556 360543 81755 287408 307505 C 101 O 11 E 2 23 R 4 94 rawP 1 30e   05 adjP 0 0002  Vascular smooth muscle contraction 10 682930 81636 85420 362039 58965 24239 29354 64532 24173 117558 Vobis      a aa qi              682930 81636 116601 296369 64561 24239 689560 29353 24245 24173 117558 C 162 0 13 E 3 57 R 3 64 rawP 5 87e              signaling pathway 5 64672 114207 05 adjP 0 0006  GnRH signaling pathway 9 682930 81636 60352 362039 24239 29354 64532 114495 24245 C 62 0 9 E 1 81 R 4 98 rawP 7 60e   05 adjP 0 0007  Pancreatic secretion 9 54242 81779 81636 29354 689560 64532 116601 84396 362039 C 85 0 9 E 1 87 R 4 80 rawP 0 0001 adjP 0 0008  Insulin signaling pathway 8 25058 64561 50671 114508 689995 114203 29353 25739 C 113 0 8 E 2 49 R 3 21 rawP 0 0035 adjP 0 0252     Wiki pathways     59            WEB based GEne SeT AnaLysis Toolkit    Translating gene lists into biological insights       WebGestalt    User data and parameters  User data  textAreaUpload txt  Organism  rnorvegicus  Id Type  affy_rae230a  Ref Set  affy_rae230a  Significance Level  Top10  Statistics Test  Hypergeometric  MTC  BH  Minimum  2    This table lists the enriched Wikipathways  number of Entrez IDs in your user data set for the pathway  the corresponding Entrez IDs  and the 
77. e GEO accession number (GSExxx).

Loading your own Affymetrix microarray data into the Workbench

The Workbench assumes that expression values are given at the gene (probe set) level; probe-level analysis of Affymetrix arrays and import of Affymetrix CEL and CDF files are therefore not supported. However, you can import your own Affymetrix data in two ways:

- as .CHP files generated by Affymetrix Expression Console, containing normalized Affymetrix data. See the section on how to convert .CEL files to .CHP files using the Expression Console (wiki page "Analyze GEO data with the Affymetrix software", section "Converting CEL data to CHP format") for a detailed discussion on how to do this. Use RMA for the normalisation.
- as .txt files exported from R, containing normalized Affymetrix data.

How to do the normalization in R

To create these .txt files, open R (http://www.r-project.org) or RStudio (http://www.rstudio.com) as administrator and install the following packages:

- Matrix
- lattice
- fdrtool
- rpart

[Screenshot: R Console (R version 3.1.1), Packages menu with the entries Load package, Set CRAN mirror, Select repositories, Install package(s), Update packages, Install package(s) from local zip files]

Then run the following code:

# Install all required Bioconductor packages
iso
78. e GEO2R script and adds gene symbols and  additional annotations to the RobINA table   R code to annotate the RobiNA data  Collapse       Add annotations to the RobiNA full table  d the code below is borrowed to the GEO2R code and adapted  library  GEOquery       make sure that the surrent directory is set to folder enclosing the  RobiNA results  folder  base     getwd        load RobiNA full table in a data frame  robina folder  lt   paste base   robina data  lt   paste robina folder   full table Heart   Diaphragm txt                wobina  full  lt   read delim robina data  as is       RobiNA results        load NCBI platform annotation    gpl  lt       GPL341   platf  lt   getGEO gpl  AnnotGPL TRUE  destdir      TRUE     base     mcbifd  lt   data frame attr dataTable platf    table       v replace original platform annotation  data  lt   merge robina full  ncbifd  by  ID    data  lt   data order dataSP Value       restore correct order      preview first 10 columns  head data   c 1 10       gt  head data   c 1 10        ID logFC AveExpr t  13796 1371247_at 8 869049 9 206606 138 18038  12961 1370412 at 8 007008 8 666362 112 67124  111906 1387787 at 8 513843 10 013651 107 79122  i4 744 1372195 at 9 441105 9 106164 100 52807  13520 1370971 at 8 884836 8 447829 99 98895  445 1367896 at 8 173194 8 328780 97 77615 8   13796   12961   111906   14744   3520 myosin  heavy chain   445   i Gene  ID   13796 24838    7   1   2   5   6     P Value  726739e 23  243625e 21  272125e 21  8
79. [Screenshot: DataSet Record GDS3224 with the tabs Expression Profiles, Data Analysis Tools and Sample Subsets, the Cluster Analysis tool, and the Download menu listing: DataSet full SOFT file, DataSet SOFT file, Series family SOFT file, Series family MINiML file, Annotation SOFT file]

Getting the full table of expression values can be important to extract the values of multiple genes, or for any other aim you could have in mind. You can get the full dataset from the Download item on the right of the window.

DataSet Record GDS3224
Title: Heart left ventricle and diaphragm comparison
Summary: Analysis of normal heart left ventricle and diaphragm of young adult Sprague Dawley males. Concurrent, rhythmic contractions of the diaphragm and heart are needed to sustain life. Results provide insight into transcriptional strategies for ensuring long-term energy supplies in these two muscles.
Organism: Rattus norvegicus
Platform: GPL341: [RAE230A] Affymetrix Rat Expression 230A Array
Citation: van Lunteren E, Spiegler S, Moyer M. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008; 161(1):41-53. PMID: 18207466
Reference Series: GSE6943
Value type: count
Sample count:
Series published:

The resulting text file contains a 49-line header that may pose problems in Excel; split the file in two using some CLI magic.
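The "CLI magic" for separating the 49-line header from the data table can be as small as one `head` and one `tail` call. The sketch below is only an illustration: the file names (`dataset_full.txt`, `dataset_header.txt`, `dataset_data.txt`) are hypothetical, and a stand-in input file is generated first so that the example runs on its own. With a real download, only the `head` and `tail` lines are needed.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
infile="$workdir/dataset_full.txt"   # hypothetical name for the downloaded file

# Build a stand-in file: 49 header lines followed by a small tab-delimited table.
i=1
while [ "$i" -le 49 ]; do
  printf '!header line %d\n' "$i"
  i=$((i + 1))
done > "$infile"
printf 'ID_REF\tIDENTIFIER\tsample_1\n' >> "$infile"
printf 'probe_1\tgene_1\t10.0\n' >> "$infile"
printf 'probe_2\tgene_2\t20.0\n' >> "$infile"

# Lines 1-49: the descriptive header.
head -n 49 "$infile" > "$workdir/dataset_header.txt"
# Line 50 onwards: column names plus expression values, ready for a spreadsheet.
tail -n +50 "$infile" > "$workdir/dataset_data.txt"

cat "$workdir/dataset_data.txt"
```

It is worth confirming the header length on each file (for example with `head -n 55`) before splitting, since the count of 49 applies to this particular download.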
80. e file   outfile

sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
[1] GEOquery_2.31.1  Biobase_2.25.0  BiocGenerics_0.11.4  limma_3.21.12

loaded via a namespace (and not attached):
[1] RCurl_1.95-4.3  tools_3.1.1  XML_3.98-1.1

# Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
# R scripts generated Mon Oct 13 08:43:55 EDT 2014

################################################################
# Differential expression analysis with limma

library(GEOquery)
library(affy)

# load CEL files from GEO
getGEOSuppFiles("GSE6943")
untar("GSE6943/GSE6943_RAW.tar", exdir = "data")
cels <- list.files("data/", pattern = "[gz]")
sapply(paste("data", cels, sep = "/"), gunzip)
cels

# path to the folder in which R saved the CEL files
# one of the files (GSM160097) has been corrupted, you have to remove it from the folder
celpath <- "C:/Users/Janick/Documents/data/"
fns <- list.celfiles(path = celpath, full.names = TRUE)
cat("Reading files:\n", paste(fns, collapse = "\n"), "\n")
celfiles <- ReadAffy(celfile.path = celpath)

download exercise files

Download exercise file
81. e in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol, 2008, 161(1):41-53. PubMed 18207466.

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943

This page was last modified on 20 October 2014, at 16:21.

Analyze GEO data with the Affymetrix software

From BioWareWIKI

Analyzing a selected GEO dataset using the Affymetrix Expression Console (EC) and Transcriptome Analysis Console (TAC)

Contents
1 Introduction
2 The Affymetrix Expression Console (EC)
  2.1 Converting CEL data to CHP format required for
  2.2 Performing QC on the data and generating summarizing plots
3 The Affymetrix Transcriptome Analysis Console (TAC)
  3.1 Importing EC data and defining Groups
  3.2 Computing (gene level) Differential expression
  3.3 Adjusting Differential expression limits
  3.4 Plots based on the filtered differential expressio
82. e interested people to follow the four PDF files also present on the server        Expression_analysis_part_I pdf  http   data bits vib be pub trainingen PPUBMA2014 ex6 files Expression analysis  part I pdf    m Expression analysis part II pdf  http   data bits vib be pub trainingen lPPUBMA2014 ex6 files Expression analysis           II pdf      Expression analysis part  III pdf  http   data bits vib be pub trainingen PPUBMA2014 ex6 files Expression analysis          III pdf   m Expression analysis          IV  pdf  http   data bits vib be pub trainingen PPUBMA2014 ex6 files Expression  analysis part IV  pdf     A recompiled version of this tutorial is part of a former BITS CLC training accessible Here  http   data bits vib be pub trainingen CLCMain TutorialMicroarrays pdf      Loading microarray data into the Workbench    The Workbench supports analysis of one color expression arrays  These may be imported from GEO  http   www  ncbi nIm nih gov    The Workbench supports the following formats     s  GEO SOFT sample files  http   www  ncbi nlm nih gov geo info soft2 html   simple line based  plain text files that contain all the data and the descriptive information of a  microarray experiment  example SOFT file  http   www  ncbi nlm nih gov geo info soft ex platform txt        GEO series file  txt files containing the definitions of a group of related samples  They contain tables describing extracted data  summary conclusions and analyses  Each  Series file is assigned a uniqu
83. e top  more top notch analysis methods reported in the literature. A computer with sufficient resources is desirable to perform the QC steps in a reasonable time, but current powerful laptops and desktops should do the job.

For those who cannot afford the more expensive CLC Main workbench and do not wish to learn R and Bioconductor, RobiNA seems a good choice for MA analysis (it can also do standard RNASeq analysis, as will be described in a separate hands-on).

The complete file is accessible here (http   data bits vib be pub trainingen PUBMA2014 ex4 files RobiNA results detailed_results GSE6943_DE Robina txt); use right-click "download linked file as", or navigate from the link at the bottom of the page.

download exercise files

Download exercise files here.

References

1. Marc Lohse, Adriano Nunes-Nesi, Peter Krüger, Axel Nagel, Jan Hannemann, Federico M Giorgi, Liam Childs, Sonia Osorio, Dirk Walther, Joachim Selbig, Nese Sreenivasulu, Mark Stitt, Alisdair R Fernie, Björn Usadel. Robin: an intuitive wizard application for R-based expression microarray quality assessment and analysis. Plant Physiol., 2010, 153(2):642-51. PubMed 20388663.
2. http://www.affymetrix.com/estore/
84. e treated group: it will be colored pink. The order is important for calculating log fold changes later in the analysis. If you reverse the order, genes that are upregulated according to the publication that supports the data will be downregulated in your results, and vice versa.

The list of samples in each group can be reviewed by clicking on List in the group definition popup window.

Sample Groups
Diaphragm: GSM160089, GSM160090, GSM160091, GSM160092, GSM160093, GSM160094
Heart: GSM160095, GSM160096, GSM160097, GSM160098, GSM160099, GSM160100

Visualize the distribution of log transformed expression values

Before proceeding with DE analysis, it is very important to first check that the value distributions are homogeneous across samples, using the "Value distribution" tab.

[GEO2R tabs: GEO2R, Value distribution, Options, Profile graph, R script]

Calculate the distribution of value data for the Samples you have selected. Distributions may be viewed graphically as a box plot or exported as a number summary table. The plot is useful for determining if value data are median-centered across Samples, and thus suitable for cross-comparison.

[Box plot: GSE6943 / GPL341, selected samples, colored by group (Diaphragm, Heart); value axis from 0 to 1200]
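The remark above about group order can be checked with simple arithmetic: swapping the two groups only flips the sign of the log fold change. The sketch below uses made-up mean expression values for a single gene and takes the log2 of the ratio of group means. This is a simplification for illustration only; limma, which GEO2R relies on, fits a linear model on log-transformed values rather than dividing raw means.

```shell
#!/bin/sh
set -eu

# Hypothetical mean expression (linear scale) of one gene in each group
heart_mean=800
diaphragm_mean=200

log2fc() {
  # log2 of ($1 / $2): positive when the first group is the higher one
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", log(a / b) / log(2) }'
}

heart_vs_diaphragm=$(log2fc "$heart_mean" "$diaphragm_mean")
diaphragm_vs_heart=$(log2fc "$diaphragm_mean" "$heart_mean")

echo "Heart vs Diaphragm: $heart_vs_diaphragm"
echo "Diaphragm vs Heart: $diaphragm_vs_heart"
```

The same gene appears as fourfold up in one direction and fourfold down in the other, so the group order should be matched to the comparison reported in the publication before interpreting signs.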
85. eased      Diaphragm       A   B        D E   F   1                    Description  22222  TiSie     Test            Lower taisi  Upper tai v      2  55114  ioxationreduction process    essct  473  1 285285603  101000     3  B6005    regulation of ventricular cardiac muscle cell action potential 113     28 4480485   1 d           B6    reguiation of heart contraction    25 1214227626   09999 1 00004       _5  55010  Ventricular cardiac muscle tissue morphogenesis ER 771189372308   0 599   00004      6  6099  triarboxylicaeidcycle     U U U III  a 124 _ 17 8151236  0 9994   0 0006     7  86004           7a LISSE P 88988 09   8 86091 regulation of heart rate by cardiac conduction         su  15   18 0252284   0 3993   00007    9  51291      protein heterooligomerization        16 8587409 i 09992   00008    10  2026 __ regulation ofthe force of heart contraction        21 _   17 9066909   0 9992   0 0008          Published results    The original publication ends with functional enrichment results identifying key differences between  heart  and  diaphragm  tissues in rats  We link here to the results published by     van Lunteren E  Spiegler 5  Moyer M       Full details about this dataset can be found on the http   www ncbi nlm nih gov geo query acc cgi acc  GSE6943 GEO page     download exercise files    Download exercise files here   Expand     References     1       Erik van Lunteren  Sarah Spiegler  Michelle Moyer  Contrast between cardiac left ventricle and diaphragm muscl
86. either only a command line interface or solely very basic user interaction.

Finally, there are tools such as RMA Express which offer a rich user interface but only a very limited set of options.

RobiNA tries to bridge this gap by providing a flexible, user-friendly graphical interface to unleash the power of R/BioConductor for the individual biologist. RobiNA comes as a convenient all-in-one installation package that automatically installs the application itself plus all required external tools, i.e. the R and BioConductor frameworks and bowtie.

Although RobiNA can handle several data types, including two-color microarray and even RNASeq data, we provide here a simple tutorial to perform QC and differential analysis of Affymetrix microarray data, comparable to what can be achieved using the CLC Main workbench.

Please see the RobiNA quick guide (http   mapman gabipd org c document library get file uuid 2a09272e9 e474 402e a554 b03d6ec9efd6 amp groupId 10207) and the Robin and RobiNA user's manual (http   mapman gabipd org c document_library get_file  uuid 60912d03 660e 4281  9834 22f2789424d2 amp  groupId 10207), which contain step-by-step walk-throughs and detailed information. RobiNA can be downloaded from http://mapman.gabipd.org/web/guest/robin

Required before starting RobiNA

RobiNA working great ... or NOT

RobiNA should work under Windows                 Unix as it uses Java which is    universal language. However
87. ene neiqhbors    29       Send to     Help    Filters               Filters    Profile data            Download profile data      1    Profile pathways           7     Find pathways    gt  2    Find related data  Database    Select                Recent activity    Turn Off Clear     Heart left ventricle and diaphragm  comparison GDSBrowser  Q   GDS3224 ACCN   AND GDSffilter   1   GDSBrowser       GEO DataSets for BioProject  Select  98125   2  GEO DataSets     Normal Heart vs Normal Diaphragm  BioProject    Boletus calopus  taxonomy  See more          Profile data      e    Page of 221 Next   Last gt  gt            peewee      mon    Download profile data      Download the value data  red bars  for each profile on this page    Download files are tab delimited and suitable for opening in a spreadsheet  application such as Excel     Retrievals that incorporate multiple DataSets are organized by DataSet blocks     Experimental factor and          annotation information is included     A download file includes profiles shown on the current           under  Display Settings      set    Items per page    to 500 to get the maximum number of profiles     To download values for a complete DataSet  please use the    DataSet full SOFT file     link available on the DataSet record    Note  Cross DataSet normalizations are not performed  direct comparisons of values  between different DataSets are not appropriate     ID REF GSM160089 GSM160090 GSM160091 GSM160092 GSM160093 GSM160094 GSM1
88. er 2014, at 13:58.

PubMA Exercise 2b

From BioWareWIKI

Follow-up analysis demo in RStudio

Contents
1 Introduction
2 installing on    non BITS laptop
3 Extend the GEO2R analysis in R with RStudio
  3.1 adapt the GEO2R script in RStudio
    3.1.1 Original GEO2R code
    3.1.2 Improved code
4 download exercise files

Introduction

The former exercise produced a full table of differential expression between Heart and Diaphragm samples that can be used directly with functional enrichment tools or IPA. The R script used by GEO2R is quite standard and basic, and only provides box plots of the signal distribution. This is a minimum when doing MA data analysis, and users often appreciate having more QC done on the data to control for biases or inconsistencies associated with artifacts or with variability in the experiment.

Expert analysts make use of the R/Bioconductor toolbox to further analyze their data. Some examples of standard R functions are shown below to demonstrate the power of this programming language. The following exercise is provided as an appetizer and is far from exhaustive; you will find many more methods and tools by expl
89. er to the InsilicoDB tutorial pages (https://insilicodb.com/category/tutorials/) for more info.

Commercial resources licensed by VIB
- Genevestigator: not covered during this training but warmly recommended for all users who do not have their own MA data but need to find biomarkers.
- CLC Main workbench (http   data bits vib be pub trainingen CLCMain TutorialMicroarrays pdf): used in the optional PubMA_Exercise 6.
- Ingenuity Pathway Analysis (IPA): strongly advised for more advanced users. You can use it on any computer with Java installed after asking for a personal account from bits@vib.be and logging in here (https   apps ingenuity com ingsso login   service https  3A   2F   2Fanalysis ingenuity com   2F pa   2Fj spring cas security check amp originalUrl https          2F  2Fanalysis ingenuity com   2F pa   3Futm_source   3D Ingenuity   26utm_medium  3D Website   26utm_campaign   3DIPA LoginPage). Please keep in mind that IPA is only meant for human, mouse and rat data.

Do you still need MORE?
Find more tools with OMICtools (http://omictools.com).

References
1. Tanya Barrett, Ron Edgar. Mining microarray data at NCBI's Gene Expression Omnibus (GEO). Methods Mol. Biol. 2006; 338:175-90. PubMed 16888359
2. Ron Edgar, Michael Domrachev, Alex E Lash. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30(1):207-10. PubMed 11752295
90. es allows plotting their profiles and exporting the results to file.

[Figure: cluster heatmap of GDS3224, Heart left ventricle and diaphragm comparison (Rattus norvegicus), samples GSM160089 to GSM160100, with the options 'Download displayed data', 'Show heat map region' and 'View profiles in Entrez']

Experiment design and value distribution

This QC tool draws a box plot for each sample and allows evaluating the global quality of the dataset. A good dataset has a constant median line and similar distributions across samples.

[Figure: value distribution box plots for GDS3224 (Heart left ventricle and diaphragm comparison, Rattus norvegicus), showing the 95, 90, 75, median, 25, 10 and 5 percentiles per sample]

download exercise files

Download exercise files here.

References
1. http://www.ncbi.nlm.nih.gov/sites/GDSbrowser
2. http://www.ncbi.nlm.nih.gov/geo/info/datasets.html

Retrieved from http://stelap.local/BioWareWIKI/index.php?title=PubMA_Exercise_3&oldid=10946
Category: PUBMA2014
This page was last modified 
91. 
    platf <- getGEO(gpl, AnnotGPL=TRUE)
    ncbifd <- data.frame(attr(dataTable(platf), "table"))

    # replace original platform annotation
    tT <- tT[setdiff(colnames(tT), setdiff(fvarLabels(gset), "ID"))]
    tT <- merge(tT, ncbifd, by="ID")
    tT <- tT[order(tT$P.Value), ]  # restore correct order

    tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
    write.table(tT, file=stdout(), row.names=F, sep="\t")

    ################################################################
    #   Boxplot for selected GEO samples
    library(Biobase)
    library(GEOquery)

    # load series and platform data from GEO
    gset <- getGEO("GSE6943", GSEMatrix=TRUE)
    if (length(gset) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
    gset <- gset[[idx]]

    # group names for all samples in a series
    sml <- c("G0","G0","G0","G0","G0","G0","G1","G1","G1","G1","G1","G1")

    # order samples by group
    ex <- exprs(gset)[ , order(sml)]
    sml <- sml[order(sml)]
    fl <- as.factor(sml)
    labels <- c("Diaphragm","Heart")

    # set parameters and draw the plot
    palette(c("#dfeaf4","#f4dfdf","#AABBCC"))
    dev.new(width=4+dim(gset)[[2]]/5, height=6)
    par(mar=c(2+round(max(nchar(sampleNames(gset)))/2),4,2,1))
    title <- paste("GSE6943", '/', annotation(gset), " selected samples", sep='')
    boxplot(ex, boxwex=0.6, notch=T, main=title, outline=FALSE
92. ft vent.
GSM160100  Heart 6  Heart left vent.

GEO2R sample definition

The first step in the GEO2R analysis is performed by clicking on 'Define groups' to set up sample groups based on the available samples and to label them. These groups will be used to define contrasts and compute pairwise differential expression analyses. Two groups are created with the names 'diaphragm' and 'heart', then the samples are labeled using the mouse.

[Screenshot: 'Define groups' dialog, 'Enter a group name', with 12 out of 12 samples selected: Diaphragm (6 samples), Heart (6 samples)]

Group      Accession  Title        Source name
Diaphragm  GSM160089  Diaphragm 1  Diaphragm
Diaphragm  GSM160090  Diaphragm 2  Diaphragm
Diaphragm  GSM160091  Diaphragm 3  Diaphragm
Diaphragm  GSM160092  Diaphragm 4  Diaphragm
Diaphragm  GSM160093  Diaphragm 5  Diaphragm
Diaphragm  GSM160094  Diaphragm 6  Diaphragm
Heart      GSM160095  Heart 1      Heart left vent.
Heart      GSM160096  Heart 2      Heart left vent.
Heart      GSM160097  Heart 3      Heart left vent.
Heart      GSM160098  Heart 4      Heart left vent.
Heart      GSM160099  Heart 5      Heart left vent.
Heart      GSM160100  Heart 6      Heart left vent.

Characteristics: Young adult SD rat.

The order in which you assign the groups is important. First define the control group (it will be colored in blue), then define th
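The effect of the group order can be illustrated with a minimal base R sketch. It assumes, as in the script that GEO2R generates for this dataset, that the first group defined (Diaphragm) is coded G0 and the second (Heart) G1; the contrast vector is written by hand here for illustration, whereas the real script builds it with limma.

```r
# Group codes as assigned by GEO2R: first group defined = G0, second = G1
# (assumption: Diaphragm was defined first, Heart second)
sml <- c(rep("G0", 6), rep("G1", 6))
fl  <- factor(sml)

# One indicator column per group, no intercept
design <- model.matrix(~ 0 + fl)
colnames(design) <- levels(fl)

# Hand-written equivalent of the "G1 - G0" contrast: a positive logFC
# then means higher expression in the second group (Heart)
contrast <- c(G0 = -1, G1 = 1)

print(design)
print(contrast)
```

Swapping the order in which the groups are defined swaps G0 and G1 and therefore flips the sign of every reported logFC.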
93. g Ratio
Consolidate IDs using the expression value: median

Review the obtained results

A full summary report can be downloaded from the server (see link at bottom of this page).

[Screenshot: IPA Summary tab for the 'geo2r DE table' analysis, with the tabs Summary, Canonical Pathways, Upstream Analysis, Diseases & Functions, Regulator Effects, Networks, Lists, Molecules and a 'Download Summary (PDF)' button]

Top Canonical Pathways
Name                                                   p-value   Ratio
Calcium Signaling                                      3.66E-08  21/130 (0.162)
Protein Kinase A Signaling                             2.95E-05  26/272 (0.096)
Hepatic Fibrosis / Hepatic Stellate Cell Activation    1.04E-04  16/137 (0.117)
Thrombin Signaling                                     1.35E-04  16/140 (0.114)
Role of NFAT in Cardiac Hypertrophy                    1.88E-04  16/144 (0.111)

Top Upstream Regulators (the 'Predicted Activation' column shows one 'Inhibited' call)
Upstream Regulator  p-value of overlap
DMD                 2.38E-19
MEF2C               2.37E-12
GATA4               9.17E-11
LIPE                7.17E-10
MYOD1               2.87E-09

Top Diseases and Bio Functions

Diseases and Disorders
Name                                  p-value              # Molecules
Skeletal and Muscular Disorders       9.62E-13 - 4.62E-03  131
Cardiovascular Disease                7.30E-11 - 4.62E-03  122
Organismal Injury and Abnormalities   7.30E-11 - 4.62E-03  247
Developmental Disorder                7.10E-09 - 4.27E-03  94
Neurological Disease                  1.59E-07 - 4.62E-03  141

Molecular and Cellular Functions
Name (p-value and # Molecules columns follow)
Cell Morphology
Molecular Transport
Cellular Development
Cellular Growth and Proliferation
Cell Sign
94. generated Tue Aug 12 05:30:54 EDT 2014

    ################################################################
    #   Differential expression analysis with limma
    library(Biobase)
    library(GEOquery)
    library(limma)

    # load series and platform data from GEO
    gset <- getGEO("GSE6943", GSEMatrix=TRUE)
    if (length(gset) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
    gset <- gset[[idx]]

    # make proper column names to match toptable
    fvarLabels(gset) <- make.names(fvarLabels(gset))

    # group names for all samples
    sml <- c("G0","G0","G0","G0","G0","G0","G1","G1","G1","G1","G1","G1")

    # log2 transform
    ex <- exprs(gset)
    qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
    LogC <- (qx[5] > 100) ||
            (qx[6]-qx[1] > 50 && qx[2] > 0) ||
            (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
    if (LogC) { ex[which(ex <= 0)] <- NaN
      exprs(gset) <- log2(ex) }

    # set up the data and proceed with analysis
    fl <- as.factor(sml)
    gset$description <- fl
    design <- model.matrix(~ description + 0, gset)
    colnames(design) <- levels(fl)
    fit <- lmFit(gset, design)
    cont.matrix <- makeContrasts(G1-G0, levels=design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    fit2 <- eBayes(fit2, 0.01)
    tT <- topTable(fit2, adjust="fdr", sort.by="B", number=250)

    # load NCBI platform annotation
    gpl <- annotation(gset)
    plat
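The 'log2 transform' step of the script decides from the quantiles of the expression matrix whether the values still need to be log-transformed. The sketch below isolates that heuristic so it can be tried on simulated values without downloading anything; the function name needs_log2 and the simulated matrices are illustrative assumptions and are not part of the GEO2R script.

```r
# Hypothetical wrapper around the quantile heuristic used in the
# GEO2R script; only the function name and the test data are new.
needs_log2 <- function(ex) {
  qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1.0), na.rm = TRUE))
  (qx[5] > 100) ||
    (qx[6] - qx[1] > 50 && qx[2] > 0) ||
    (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
}

set.seed(1)
# Simulated linear-scale intensities: 100 probes x 12 samples
raw    <- matrix(2^rnorm(1200, mean = 8, sd = 2), ncol = 12)
# The same values after log2 transformation
logged <- log2(raw)

raw_flag    <- needs_log2(raw)     # linear-scale data should be flagged
logged_flag <- needs_log2(logged)  # log-scale data should not be flagged

# Apply the transformation only when the heuristic asks for it
ex <- raw
if (raw_flag) {
  ex[which(ex <= 0)] <- NaN
  ex <- log2(ex)
}
```

The heuristic relies on the fact that log2-scale microarray values rarely exceed about 16, so a 99th percentile above 100 is a strong sign that the data are still on the linear scale.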
95. gicus
Experiment: GSE6943    Platform: GPL341
Nb. genes: 662    Nb. probes: 885    Nb. samples: 12

Annotation
Table            Keyword                      Q value (BH)
GOTERM_MF_ALL    translation factor activity  0.00559898
PANTHER_TERM_BP  GLYCOLYSIS                   0.00424492

[Screenshot: TBrowser 'Info' tab, 4 signatures, 1 platform, 1 experiment; Experiment GSE6943, Platform GPL341, Nb. probes 1060; buttons 'Load data', 'Send to plugins', 'Create group']

Annotation
Table            Keyword                   Q value (BH)
PANTHER_FAMILY   PTHR10574 LAMININ         0.00382715
PANTHER_TERM_BP  BP00120 CELL ADHESI...    ~5.7E-4
PANTHER_TERM_MF  MF00261 ACTIN BINDI...    0.00147612
PRODOM           SH3                       0.00483506
PUBMED_ID        12477932                  5.73342E-4
PFAM_NAME        SH3_1                     0.00554995
GOTERM_CC_ALL    mitochondrion             8.01806E-6
GOTERM_CC_ALL    mitochondrial part        8.96956E-5
GOTERM_CC_ALL    cytoplasmic part          1.59748E-4
PROSITE_NAME     PS50002                   0.00249775
SP_PIR_KEYWORDS  mitochondrion             6.05554E-4
SP_PIR_KEYWORDS  sh3 domain                0.00479139

third TS (622): clearly associated with heart and mitochondrial functions

[Screenshot: TBrowser 'Results' and 'Info' tabs with the Heatmap plugin settings]

last TS (684): apparently specific to the cardiac 
96. PubMA Exercise 2
From BioWareWIKI

Compute differential analysis using GEO2R within the NCBI web portal (Gene Expression Omnibus)

Contents
1 Analyze public GEO data on the NCBI portal
1.1 GEO2R step by step walk through for GSE6943
1.1.1 The GEO2R interface
1.1.2 GEO2R sample definition
1.1.3 Visualize the distribution of log transformed expression values
1.1.4 Search for the top 250 differentially expressed transcripts
1.1.5 Saving the Rscript for further use in RStudio
2 download exercise files

Analyze public GEO data on the NCBI portal

The GEO portal links to several web tools that allow data analysis without installing anything on your computer. Although these tools cannot compete with sophisticated R/Bioconductor methods, they remain very attractive: they do not require prior knowledge of MA data analysis and they are very fast, leading users to tabular results and pictures that can be fed to other tools or used as is in scientific reports. We proceed here with GEO2R, which finds differentially expressed genes by comparing sample groups within one GEO submission. Full instructions (https://www.ncbi.nlm.nih.gov/geo/info/geo2r.html). Tutorial video [2].

GEO2R step by step walk through for GSE694
97. hive and decompressed to individual CEL files, and a CDF file listing all probes present on the chip used in the experiment. The current CDF file can be obtained from the Affymetrix site [2] after registering (free) and searching for the library file corresponding to the platform reported in the GEO pages (in our case, Rat Expression Set 230, aka RAE230A).

http://www.affymetrix.com/Auth/support/downloads/library_files/rae230_libraryfile.zip

The decompressed archive contains the required CDF file under CD RAE230 rev04 Full RAE230A LibFiles RAE230A.CDF, which can be copied to the RobiNA project area.

start RobiNA and create a new project for results

[Screenshot: RobiNA, the transcriptomics data preprocessor, Version 1.2.4 build656, release notes]

Release notes 1.2.4 build656: Welcome to RobiNA and thank you for using it to evaluate and analyse your microarray and RNA-Seq data. Before taking off please take the time to read the release notes carefully.
- New workflow for RNA-Seq based transcript profiling: Check out the new workflow for RNA-Seq based analysis of differential gene expression. To date it supports import of Illumina/Solexa type raw sequence data in FASTQ format, SAM/BAM prealigned reads and precomputed counts tables.
- Treatment of replicates: The current version of RobiNA assumes that all replicates that you enter are true biological replicates. Technical replication is not yet taken int
98. [Figure: heatmap of the first signature, samples grouped by Source (Diaphragm, Heart), with gene symbols as row labels]

The full heatmap picture can be saved to file using the buttons present at the bottom of the window (full heatmap 1). Similarly, text files can be saved with the data used to plot the heatmaps (http   data bits vib be hidden jhslbjcgnchjdgksqngcvgqdlsjcnv TBrowser2014 09d_TS1 heatmap   data txt) and a text table reporting enriched terms (http   data bits vib be hidden jhslbjcgnchjdgksqngcvgqdlsjcnv TBrowser2014 09e TS1 terms txt).

second heatmap

[Figure: heatmap of the second signature with the Annotation panel showing, among others, 'BP00120 CELL ADHESION MEDIATED SIGNALING', 'MF00261 ACTIN BINDING CYTOSKELETAL PROTEIN' and PubMed ID 12477932; buttons 'Save heatmap', 'Save annotation', 'Export heatmap data']

third heatmap

[Figure: heatmap of the third signature with its Annotation panel]
99. ic Tests on Annotations

[Screenshot: 'Hypergeometric Tests on Annotations' dialog, step 'Select two nested experiments'; selected elements (2): 'Heart vs. Diaphragm' and 'Heart vs. Diaphragm subexperiment' (n=142)]

A window allows selecting the annotation type to be used in the test and the action to take with duplicate probes (here, merged by gene symbol, keeping the probe with the highest IQR).

[Screenshot: 'Hypergeometric Tests on Annotations' dialog, step 'Set parameters']
Annotation to test: Pathway (annotated features: 1299)
Reduce feature set: Remove duplicates, using gene identifier: Gene symbol (annotated features: 12853)
Keep feature with: Highest IQR / Highest value
Values to analyze: Original expression values / Transformed expression values / Normalized expression values

Hyper geometric test for association of annotation categories to a sublist of a larger gene list
Wed Aug 13 12:38:46 CEST 2014
Version: CLC Main Workbench 7.0.3
User: splaisan
Parameters:
Gene identifier column used in tests: Gene symbol
Annotation column used in tests: GO biological
100. iched in the obtained subset

Profile pathways

See the frequency-weighted list of pathways for these profiles. This button links to the NCBI BioSystems database. Use it to display the list of pathways in which these gene expression profiles participate. The pathways are ranked by the number of profiles to which they are linked. A maximum of 100,000 profiles are considered. This tool can be particularly useful for helping to characterize lists of profiles that have been determined to be differentially expressed across experimental variables.

[Screenshot: FLink (Frequency weighted Links), links from geoprofiles records to biosystems records weighted by frequency]

Frequency  BSID     Name                    Type                         Organism
381        1010675  Metabolism              organism-specific biosystem  Rattus norvegicus
306        1010015  Signal Transduction     organism-specific biosystem  Rattus norvegicus
226        1010824  Immune System           organism-specific biosystem  Rattus norvegicus
218        1010484  Disease                 organism-specific biosystem  Rattus norvegicus
175        1010388  Gene Expression         organism-specific biosystem  Rattus norvegicus
144        1009706  Metabolism of proteins  organism-specific biosystem  Rattus norvegicus
130        1010682  Metabolism of lipids and li
101. igher concentrations than thr RNA, so the signal of dap should be higher than that of thr, and this was not the case for the samples that were flagged 'outside bounds'. The other control probes behaved as they should. It may be that in some samples the reverse transcription of the high-abundance transcripts was not completely efficient because of saturation.

Note: As part of the standard Affymetrix microarray processing, control molecules are added to the mRNA at different concentrations prior to producing the cDNA. Other molecules (cDNA) are added later in the sample preparation to control for hybridization on the chip. The out-of-bound errors reported above result from the discrepancy between the known spiked-in quantities and the readout after scanning the chip. The highest concentration of control does not produce a final value higher than a lower concentration of control, which raises an alarm and shows the 4 samples with a colored background. Full details about the identity of the faulty probes and the obtained values can be found in the bottom table of the full report linked in the next paragraph (PDF).

Performing QC on the data and generating summarizing plots

A number of QC plots can be generated using the right tools. The full QC report can then be saved as a PDF file and is available both for the RMA method (http   data bits vib be pub trainingen AffyECTAC2014 GSE6943_EC rma qc PDF), MAS5 method (http   data
102. in which several genes were concomitantly regulated. Several examples are provided below and in the article published in PLoS ONE [1]. A video tutorial for TranscriptomeBrowser is available here (https   www youtube com watch       bJMEPeSgHI) and a second one for the InteractomeBrowser plugin here (https   www youtube com watch  v SxOBmCP1G1A). The full manual can be read here (http   tagc univ     mrs fr tbrowser index2 php option com  content amp task view amp id 19 amp pop 1 amp page 0 amp Itemid 23).

After installation (see online documentation) and startup, the main TBrowser interface awaits user input.

[Screenshot: empty TBrowser main window. The Search panel offers the search keys Signature, Platform, Experiment, Gene symbol, Organism, Entrez ID, Probe ID, HomoloGene ID and Annotation, a 'Q value max' field (BH) and a SEARCH button; the Plugins panel shows the Heatmap plugin with 'Save heatmap', 'Save annotation' and 'Export heatmap data' buttons]

A Walk through example

As shown above, different search angles can be used to populate the interface. Instead of searching for genes, we choose to use here the GSE6943 dataset used in other BITS training t
103. ind the name of the custom cdf by going to the package folder.

[Screenshot: file explorer showing the folder DATA (D:) > R-3.1.1 > library > ath1121501attairtcdf, which contains several subfolders (among them data, help and html) and the files CITATION, DESCRIPTION, INDEX and NAMESPACE]

Open the DESCRIPTION file and look in the Description line.

    Package: ath1121501attairtcdf
    Title: ath1121501attairtcdf
    Version: 18.0.0
    Created: Wed Jan 29 12:33:39 2014
    Author: Manhong Dai
    Description: A package containing an environment representing the customcdf [...] file
    Maintainer: Manhong Dai <daimh@umich.edu>
    License: LGPL
    biocViews: MBNICustomCDF, AnnotationData, AffymetrixChip, ath1121501, ath1121501attairt, Arabidopsis thaliana

Then proceed by executing the following R code.

    # normalisation using the rma algorithm
    data.rma <- rma(data)
    # The output is an exprSet object with a data matrix containing
    # normalized log intensities (on probe set level) in the exprs slot

    # writing probe set level data to a file called data.txt
    write.exprs(data.rma, file="data.txt")

The resulting text file can be imported into the Wor
104. ined from the RobiNA results is available on the BITS server (link: http   data bits vib be pub trainingen PPUBMA2014 ex5 files RobiNA DE probes LFC2   FDR0 001 txt) and its content can be used on the WebGestalt submission page (http://bioinfo.vanderbilt.edu/webgestalt/).

The ID type of the enriched list can be identified by selecting it in the list.

[Screenshot: WebGestalt (WEB-based GEne SeT AnaLysis Toolkit, 'Translating gene lists into biological insights') submission form]
Select the organism of interest: rnorvegicus
Select gene ID type: rnorvegicus__affy_rae230a
Upload gene list: choose a file, OR paste the IDs, for example:
1367707_at 1370355_at 1376371_at 1368000_at 1373410_at 1388116_at 1375230_at 1371315_at 1368966_at

In the next window, the 'Reference Set for Enrichment Analysis' should be selected from the list.

[Screenshot: 'Select Reference Set for Enrichment Analysis' drop-down menu. The visible options include rnorvegicus genome, rnorvegicus entrezgene protein-coding, rnorvegicus__affy_rae230a, rnorvegicus__affy_rae230b and further Affymetrix (rat230_2, rg_u34a, rg_u34b, rg_u34c, rn_u34, rt_u34) and Agilent (G4131A, G4131F) rat platforms]
105. [Table: GO biological process categories reported by the hypergeometric test, with their counts and p-values. Legible category names include: ... ion into cytosol by sarcoplasmic reticulum; nitrogen compound metabolic process; cardiac muscle tissue development; glycogen biosynthetic process; regulation of synaptic activity; glutamine biosynthetic process; negative regulation of calcium ion transmembrane transporter activity; adrenergic receptor signaling pathway involved in heart process; phosphocreatine metabolic process; pyridoxal phosphate biosynthetic process; regulation of germinal center formation; regulation of lateral pseudopodium assembly; L-cysteine metabolic process; catecholamine catabolic process]

Another page presents details about the parameters used in the different tests.

results of the hypergeometric test for GO BP

Hyper geometric test for associatio
106. kbench.

Main results and specific settings

Only key steps are reproduced here to provide the information needed to interpret the figures. All other steps and parameters are explained in the tutorial PDF files linked above.

After computing group-wise differential expression, a filtering step is applied to the full table to retain only DE genes with at least a 2-fold change in expression, with an adjusted p-value of at most 5x10^-3 and with expression data (present calls) in at least 4 of the 6 replicates.

[Screenshot: filter dialog, 142 of 15,923 rows retained, 'Match all': t-test Heart vs ... abs ... 2; t-test Heart ... <= 0.0005; Heart Present >= 4; Diaphragm Present >= 4]

The classical volcano plot is produced, showing in red the 142 DE genes selected during filtering. This subset will be used as the 'test' set against all other genes in the data table in the hypergeometric enrichment analyses detailed below.

[Figure: Volcano Plot (t-test); x axis: Difference of group means; y axis: -log10(p-values)]

Note: Because the data are on a logarithmic scale (transformed), the 'Difference' column should be used instead of 'Fold Change' to represent the differential expression.

Enrichment analysis in CLC using the hypergeometric test

The following figure shows data samples used in the hypergeometric test.
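The filtering step described above can be sketched in base R on a simulated table. The column names (difference, adj_p, present_heart, present_diaphragm), the simulated values and the choice to require present calls in each group separately are assumptions made for illustration; they do not reproduce the CLC table. Since the values are log2-transformed, a 2-fold change corresponds to an absolute difference of group means of at least 1.

```r
set.seed(42)
n <- 1000

# Simulated result table; all column names are hypothetical
tab <- data.frame(
  probe             = sprintf("probe_%04d", seq_len(n)),
  difference        = rnorm(n, mean = 0, sd = 1.5),  # log2(Heart) - log2(Diaphragm)
  adj_p             = runif(n)^3,                    # adjusted p-values
  present_heart     = sample(0:6, n, replace = TRUE),
  present_diaphragm = sample(0:6, n, replace = TRUE)
)

# Thresholds taken from the text above
min_log2_diff <- 1      # 2-fold change on the log2 scale
max_adj_p     <- 5e-3   # adjusted p-value cutoff
min_present   <- 4      # present calls out of 6 replicates

keep <- abs(tab$difference) >= min_log2_diff &
  tab$adj_p <= max_adj_p &
  tab$present_heart >= min_present &
  tab$present_diaphragm >= min_present

de <- tab[keep, ]
cat(nrow(de), "of", nrow(tab), "rows retained\n")
```

Keeping the thresholds in named variables makes it easy to rerun the filter with a stricter or looser cutoff and to compare the sizes of the resulting test sets.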
107. les
Organism: Rattus norvegicus
Type: Expression profiling by array, count, 2 tissue sets
Platform: GPL341    Series: GSE6943    12 Samples
Download data: GEO (CEL)
DataSet Accession: GDS3224

The highlighted link (http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS3224) opens in a new tab.

[Screenshot: DataSet Browser record for GDS3224]
Title: Heart left ventricle and diaphragm comparison
Summary: Analysis of normal heart left ventricle and diaphragm of young adult Sprague Dawley males. Concurrent rhythmic contractions of the diaphragm and heart are needed to sustain life. Results provide insight into transcriptional strategies for ensuring long term energy supplies in these two muscles.
Organism: Rattus norvegicus
Platform: GPL341: RAE230A, Affymetrix Rat Expression 230A Array
Reference Series: GSE6943    Sample count: 12    Value type: count    Series published: 2008/01/24
Data Analysis Tools: Find genes; Compare 2 sets of samples; Cluster heatmaps; Experiment design and value distribution; Download the full data table
Citation: van Lunteren E, Spiegler S, Moyer M. Contrast between cardiac left ventricl [...] Neurobiol 2008 [...] 20;161(1):41-53. PMID: 18207466
108. muscle

[Screenshot: TBrowser 'Info' tab, 4 signatures; Experiment GSE6943; Nb. genes 622, Nb. probes 772, Nb. samples 12]

Annotation
Table            Keyword                       Q value (BH)
KEGG_PATHWAY     RNO00190 OXIDATIVE...         3.73533E-4
PANTHER_TERM_BP  BP00019 LIPID, FATTY...       ~0.0014
GOTERM_BP_ALL    heart process                 0.00279106
GOTERM_BP_ALL    heart contraction             0.00280895
PANTHER_TERM_MF  MF00123 OXIDOREDU...          0.00739837
WIKIPATHWAY      Rn Electron Transport...      3.18418E-4
PUBMED_ID        12477932                      1.52352E-5
KEGG_REACTION    R02164 UBIQUINONE...          0.00884734
GOTERM_CC_ALL    mitochondrial part            5.00014E-11
GOTERM_CC_ALL    mitochondrion                 8.71289E-10
GOTERM_CC_ALL    mitochondrial membrane        2.72681E-9
SP_PIR_KEYWORDS  mitochondrion                 ~6.8E-6

Show HeatMaps for each TS

[Screenshot: TBrowser 'Results' and 'Info' tabs, 4 signatures, 1 platform; Organism Rattus norvegicus, Experiment GSE6943; Heatmap plugin with its Settings panel and the annotation table of the selected signature]

For
109. results of the hypergeometric test for GO MF

Hyper geometric test for association of annotation categories to a sublist of a larger gene list
Wed Aug 13 14:49:36 CEST 2014
Version: CLC Main Workbench 7.0.3
User: splaisan
Parameters:
Gene identifier column used in tests: Gene symbol
Annotation column used in tests: GO molecular function
Raw universal gene list size: 15923
Used universal gene list size (requiring annotation and one feature per gene only): 8969
Raw subset gene list size: 142
Used subset gene list size (requiring annotation): 74
Expression values used when filtering to one feature per gene: Transformed expression values
Applied filter to reduce features to one per gene: true
Filter applied to reduce features to one per gene: used feature with highest IQR
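To make the meaning of these list sizes concrete, here is a minimal base R sketch of a hypergeometric enrichment p-value for a single annotation category. The universe size (8969) and subset size (74) come from the report above; the category counts (120 annotated genes in the universe, 6 of them in the subset) are invented for illustration, and the calculation uses R's phyper rather than the CLC implementation, whose exact details are not shown here.

```r
# Sizes taken from the CLC report above
universe_size <- 8969   # annotated genes, one feature per gene
subset_size   <- 74     # annotated genes in the DE subset

# Hypothetical counts for one GO category (illustration only)
category_in_universe <- 120
category_in_subset   <- 6

# P(X >= category_in_subset) when drawing subset_size genes at random
p_enrich <- phyper(category_in_subset - 1,
                   category_in_universe,
                   universe_size - category_in_universe,
                   subset_size,
                   lower.tail = FALSE)

# The same one-sided test written as Fisher's exact test on a 2x2 table
tab2x2 <- matrix(c(category_in_subset,
                   category_in_universe - category_in_subset,
                   subset_size - category_in_subset,
                   universe_size - category_in_universe -
                     (subset_size - category_in_subset)),
                 nrow = 2)
p_fisher <- fisher.test(tab2x2, alternative = "greater")$p.value

print(c(phyper = p_enrich, fisher = p_fisher))
```

With about one category member expected by chance in a subset of 74 genes (74 x 120 / 8969), observing six is unlikely under random sampling, which is what a small p-value expresses. When many categories are tested, the p-values still need a multiple-testing correction.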
110. n table
3.5 Exporting results
4 Conclusion
5 Youtube videos from the Affymetrix training team
6 download exercise files

Introduction

The data used in this how-to tutorial is the same as that used for the BITS hands-on training Hands-on_Analysis_of_public_microarray_datasets.

The Affymetrix online training page dedicated to MA and transcriptome analysis can be browsed here (http   www affymetrix com estore browse level_seven_software_products_only jsp     productld 131414 amp categoryld 35623 amp productName A ffymetrix   2526   2523174   253B Expression Console   2526   2523153   253B Software 1_1). This main page contains links to download the necessary software as well as links to other Affymetrix resources necessary to perform a full expression analysis. Also refer to the Affymetrix Transcriptome Analysis Console (TAC) Software and Expression Console Software tutorial pages (http   www affymetrix com support learning training_tutorials tac_ec index affx 1_2).

Note: You will need to set up a free NetAffx (http://www.affymetrix.com/analysis/index.affx) account to download software and access data pages.

Data workflow

[Figure: data workflow. Affymetrix Expression Console (EC) Software: perform exon-level normalization and signal summarization; perform gene-level normalization and signal summarization. Affymetrix Transcriptome Analysis Console (TAC) Software: select analysis: 1. Gene level, 2. Exon level, 3. Alternati
111. nal, file="GSE6943.DE.txt", row.names=F, sep="\t", quote=FALSE)

################################################################
#   Boxplot for selected GEO samples
library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO("GSE6943", GSEMatrix=TRUE)
gset <- getGEO("GSE6943", GSEMatrix=TRUE, destdir=base)
if (length(gset) > 1) idx <- grep("GPL341", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# group names for all samples in a series
sml <- c("G0","G0","G0","G0","G0","G0","G1","G1","G1","G1","G1","G1")

# order samples by group
ex <- exprs(gset)[ , order(sml)]
sml <- sml[order(sml)]
fl <- as.factor(sml)
labels <- c("Diaphragm","Heart")

# set parameters and draw the plot
## save to file
filename <- paste(base, "GSE6943.boxplot.pdf", sep="/")
pdf(file = filename, bg = "white")
palette(c("#dfeaf4","#f4dfdf", "#AABBCC"))
dev.new(width=4+dim(gset)[[2]]/5, height=6)
par(mar=c(2+round(max(nchar(sampleNames(gset)))/2),4,2,1))
title <- paste("GSE6943", "/", annotation(gset), " sample signal distribution", sep="")
boxplot(ex, boxwex=0.6, notch=T, main=title, outline=FALSE, las=2, col=fl)
legend("topleft", labels, fill=palette(), bty="n")
dev.off()

# save R workspace for reuse
outfile <- paste(base, "Workspace.RData", sep="/")
save.imag
112. nk mean difference (4x & either)

The former test found too many hits (n=4409) to be specific for given pathways. If we try instead the rank means difference test with a 4-fold difference cutoff, we get only 152 genes that are probably more specific for the biology behind the sample groups.

Search pathways enriched in the obtained subset

[Screenshot: GEO "Data Analysis Tools" panel (Find genes, Compare 2 sets of samples, Cluster heatmaps, Experiment design and value distribution). Step 1: Select test and significance level (Rank means difference, A vs B, 4+ fold, either). Step 2: Select which Samples to put in Group A and Group B. Step 3: Query Group A vs. B.]

[Screenshot: GEO Profiles results, 1 to 20 of 152, with the "Find pathways" and "Download profile data" links]

1. Nppa - Heart left ventricle and diaphragm comparison
Annotation: Nppa, natriuretic peptide A
Organism: Rattus norvegicus
Reporter: GPL341, 1367564_at (ID_REF), GDS3224, 24602 (Gene ID), NM_012612
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748213

2. Nppb - Heart left ventricle and diaphragm comparison
Annotation: N
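113. notations | View Interaction Network

[Screenshot: TAC results table partly overlaid by the volcano plot axis (ticks 100 / 1E-10 to 140 / 1E-14). Columns: Heart Bi-weight Avg Signal (log2); Diaphragm Bi-weight Avg Signal (log2); Fold Change (linear) (Heart vs. Diaphragm); ANOVA p-value (Heart vs. Diaphragm); FDR p-value (Heart vs. Diaphragm); Gene Symbol; Description. The fold change column is not legible.]

Rows with a single legible p-value (Heart avg | Diaphragm avg | p-value | Gene Symbol | Description):
(not legible) | 10.71 | 6.47E-07 | Cox17 | COX17 cytochrome c oxi
9.18 | 10.21 | 0.000093 | Tsc22d4 | TSC22 domain family, me
5.70 | 6.73 | 0.051391 | Plekhg2 | pleckstrin homology do
8.76 | 9.79 | 0.000179 | (not legible) | (not legible)
7.75 | 8.78 | 0.001014 | (not legible) | (not legible)
9.23 | 10.26 | 0.000014 | Ap3b1 | adaptor related protein c
2.70 | 3.73 | 0.046805 | (symbol not legible) | PR domain containing 12
9.39 | 10.42 | 0.000700 | (symbol not legible) | cysteine tyrosine rich 1

Rows with both p-values legible (Heart avg | Diaphragm avg | ANOVA p-value | FDR p-value | Gene Symbol | Description):
5.81 | 6.84 | 0.017993 | 0.060586 | LOC680770 | similar to dachshund b
8.80 | 9.83 | 0.000003 | 0.000068 | Nde1 | nudE nuclear distribution
4.91 | 5.95 | 0.034755 | 0.100022 | Serpina1 | serpin peptidase inhibitor
7.70 | 8.73 | 0.000563 | 0.004081 | Apbb1 | amyloid beta (A4) precurs
8.88 | 9.91 | 0.000001 | 0.000042 | Lipa | lipase A, lysosomal acid
6.41 | 7.44 | 0.000250 | 0.002172 | (symbol not legible) | potassium voltage gated
10.88 | 11.92 | 6.00E-08 | 0.000005 | Eci2 | enoyl CoA delta isomeras
5.52 | 6.55 | 0.048794 | 0.129042 | (symbol not legible) | protein tyrosine phosphat
9.63 | 10.66 | 0.002494 | 0.013072 | Clic4 | chloride intracellular cha
5.69 | 6.73 | 0.041055 | 0.113600 | Alkbh2 | alkB, alkylation repair ho
8.85 | 9.88 | 0.000085 | 0.000963 | Parvb | parvin, beta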
114. ns represented by genes assessed in the experiment will be considered (although this is normally true with modern platforms, where almost all known genes are present).

When the full Rat genome is taken as background, we obtain relatively high confidence predictions.

[Screenshot: DAVID Bioinformatics Resources 6.7, National Institute of Allergy and Infectious Diseases (NIAID), NIH - Functional Annotation Clustering]
Current Gene List: List_1
Current Background: Rattus norvegicus
246 DAVID IDs
Options: Classification Stringency: Highest
58 Cluster(s)

Annotation Cluster 1, Enrichment Score: 7.06 (Category | Term | Count | P_Value | Benjamini)
GOTERM_BP_FAT | myofibril assembly | 8 | 1.1E-8 | 1.3E-6
GOTERM_BP_FAT | actomyosin structure organization | 8 | 1.2E-7 | 1.0E-5
GOTERM_BP_FAT | cellular component assembly involved in morphogenesis | 8 | 5.2E-7 | 4.4E-5

Annotation Cluster 2, Enrichment Score: 3.56
INTERPRO | Zinc finger, LIM-type | 7 | 1.9E-4 | 2.1E-2
SP_PIR_KEYWORDS | LIM domain | 7 | 2.6E-4 | 8.1E-3
SMART | LIM | 7 | 4.1E-4 | 3.6E-2

Annotation Cluster 3, Enrichment Score: 3.24
GOTERM_BP_FAT | cellular glucan metabolic process | 6 | 2.0E-4 | 7.1E-3
GOTERM_BP_FAT | glucan metabolic process | 6 | 2.0E-4 | 7.1E-3
GOTERM_BP_FAT | glycogen metabolic process | 6 | 2.0E-4 | 7.1E-3
GOTERM_BP_FAT | energy reserve metab
115. o account. When entering technical replicates, the significance of differential expression will be overestimated, leading to an artificially increased number of genes that are significantly called differentially expressed.
- Unconnected designs in two color experiments: Comparing results from two sets of two color microarrays that are not connected requires analysing the color channels separately. This is not yet supported in RobiNA.

[Screenshot: "Welcome to RobiNA" start window with the buttons "Start new project" and "Open existing project". Dialog text: The first step of the workflow will be to choose a project directory in which all files related to the analysis will be stored. Please make sure that the chosen folder will be on a volume (hard drive, USB stick etc.) with enough free space. Fields: Project folder; Free space on target volume. Buttons: Cancel, Continue.]

Importing CDF & CEL files

While preparing this training, we discovered that GEO had a damaged file (GSM160097) for one of the samples. We will therefore do this training with only 5 replicates in the Diaphragm group, while the CLC analysis was done with the full data.

You are now ready to import the CEL files and the matching CDF annotation database.

[Screenshot: "Welcome to Robin" import dialog. Dialog text: The first step of the data analysis is the import of microarray data into Robin. Please choose the raw
116. of genes: 15866
2) 1043 genes are differentially expressed

Algorithm Options:
1) One-Way Between-Subject ANOVA (Unpaired)

Default Filter Criteria:
1) Fold Change (linear) < -2 or Fold Change (linear) > 2
2) ANOVA p-value (Condition pair) < 0.05

Conditions:
Heart (6): 1) GSM160089.rma.chp 2) GSM160090.rma.chp 3) GSM160091.rma.chp 4) GSM160092.rma.chp 5) GSM160093.rma.chp 6) GSM160094.rma.chp
Diaphragm (5): 1) GSM160095.rma.chp 2) GSM160096.rma.chp 3) GSM160098.rma.chp 4) GSM160099.rma.chp 5) GSM160100.rma.chp

Additional annotations can be added using the dedicated menu.

[Screenshot: "Customize Annotations" dialog]
Annotation File: RAE230A.na34.annot.csv
Annotations Assignment: Top Assignment / All Assignments (selected)
Select Annotation Column(s) to Add:
- GeneChip Array
- Species Scientific Name
- Annotation Date
- Sequence Type
- Sequence Source
- Transcript ID (Array Design) (Already Added)
- Target Description
- Representative Public ID
- Archival UniGene Cluster
- UniGene ID
- Genome Version
- Alignments
- Gene Title
- Gene Symbol (Already Added)
- Chromosomal Location
- Unigene Cluster Type
- Ensembl
- Entrez Gene
- SwissProt
- EC
- OMIM
- RefSeq Protein ID
- RefSeq Transcript ID
- FlyBase
- AGI
- WormBase
- MGI Name
- RGD Name
- SGD accession number
- Gene Ontology
117. olegenome 4x44k v1, rnorvegicus agilent wholegenome 4x44k v3, rnorvegicus codelink

Two sets of enrichments are available.

[Screenshot: WebGestalt upload summary]
User Data: textAreaUpload.txt
Total number of User IDs: 301
255 user IDs can unambiguously map to 226 unique Entrez Gene IDs.
46 user IDs were mapped to multiple Entrez Gene IDs or could not be mapped to any Entrez Gene ID.
The Enrichment Analysis and GO Slim Classification will be based upon the 226 unique Entrez Gene IDs.

Select Reference Set for Enrichment Analysis: rnorvegicus, rae230a

Enrichment Analysis menu:
- GO Analysis
- KEGG Analysis
- Wikipathways Analysis
- Pathway Commons Analysis
- Transcription Factor Target Analysis
- MicroRNA Target Analysis
- Protein Interaction Network Module Analysis
- Cytogenetic Band Analysis
- Disease Association Analysis
- Drug Association Analysis
- Phenotype Analysis
- PheWAS Analysis

GO Slim Classification menu:
- Biological Process
- Molecular Function
- Cellular Component

The results are shown in tables with annotations and scores as well as the list of genes responsible for the enrichment.

KEGG pathways

[Screenshot: WEB-based GEne SeT AnaLysis Toolkit, "Translating gene lists into biological insights"] WebGe
118. olic process | 6 | 3.7E-4 | 1.3E-2
GOTERM_BP_FAT | cellular polysaccharide metabolic process | 6 | 7.1E-4 | 2.0E-2
GOTERM_BP_FAT | polysaccharide metabolic process | 6 | 1.7E-2 | 2.3E-1

By contrast, when the true background is set to what the RAE230A array really covers, a lower confidence is obtained. This is not a major issue here, but when reporting p-values you should always be careful to use the correct background in order not to overestimate your findings.

[Screenshot: DAVID Bioinformatics Resources 6.7, National Institute of Allergy and Infectious Diseases (NIAID), NIH - Functional Annotation Clustering]
Current Gene List: List_1
Current Background: Rat Genome RAE230A Array
246 DAVID IDs
Options: Classification Stringency: Highest
58 Cluster(s)

Annotation Cluster 1, Enrichment Score: 5.91 (Category | Term | Count | P_Value | Benjamini)
GOTERM_BP_FAT | myofibril assembly | 8 | 1.6E-7 | 1.9E-5
GOTERM_BP_FAT | actomyosin structure organization | 8 | 1.5E-6 | 1.6E-4
GOTERM_BP_FAT | cellular component assembly involved in morphogenesis | 8 | 7.9E-6 | 6.4E-4

Annotation Cluster 2, Enrichment Score: 3.65
INTERPRO | Zinc finger, LIM-type | 7 | 1.6E-4 | 2.4E-2
SP_PIR_KEYWORDS | LIM domain | 7 | 2.1E-4 | 8.8E-3
SMART | LIM | 7 | 3.3E-4 | 2.9E-2

Annotation Cluster 3, Enrichment Score: 2.69
GOTERM_BP_FAT | cellular glucan metabolic proce
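The effect of the background choice can be illustrated with a small sketch. It computes a plain upper-tail hypergeometric p-value for the same gene list and the same overlap against two different backgrounds. The list size (246) and the count (8) are taken from the screenshots; the background sizes and the numbers of annotated genes in each background are hypothetical. DAVID itself reports a modified Fisher exact p-value (EASE score), so this is not a reproduction of its numbers.

```python
from math import comb


def enrichment_p(universe, annotated, subset, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap)."""
    total = comb(universe, subset)
    hits = sum(
        comb(annotated, k) * comb(universe - annotated, subset - k)
        for k in range(overlap, min(annotated, subset) + 1)
    )
    return hits / total


list_size, overlap = 246, 8  # from the DAVID screenshots

# Hypothetical backgrounds: (number of genes, genes annotated with the term)
whole_genome = (25000, 40)
array_only = (12000, 35)

p_genome = enrichment_p(whole_genome[0], whole_genome[1], list_size, overlap)
p_array = enrichment_p(array_only[0], array_only[1], list_size, overlap)
print(f"whole genome background: p = {p_genome:.3g}")
print(f"array background:        p = {p_array:.3g}")
```

With these assumed numbers the whole-genome background gives the smaller p-value, which is the kind of overestimation the paragraph above warns about.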
119. on 26 August 2014, at 13:39.
Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

PubMA Exercise 4
From BioWareWIKI

RobiNA analysis

Contents
1 Introduction
2 Required before starting RobiNA
2.1 RobiNA working great ... NOT
2.2 The CDF annotation file
3 Step by Step RobiNA analysis workflow
3.1 start RobiNA and create a new project for results
3.2 Importing CDF & CEL files
3.3 Performing QC on each CEL file
3.3.1 Evaluating QC results
3.4 Define design
3.5 Computing differential expression
3.5.1 Reviewing DE results
4 Adding annotations to the RobiNA data
5 Conclusion
download exercise files

Introduction

Several commercial programs, such as GeneSpring, that feature a rich and simple user interface to statistically analyze high-throughput 'omics' data are available. However, these are usually very expensive and might even require an annual subscription. On the other hand, there are free, open source tools such as BioConductor which offer great statistical options and support for free, but this power often comes at the price of usability, since these tools often feature
120. oring the CRAN documentation (http://cran.r-project.org/).

installing on a non-BITS laptop

# required for non-BITS laptops
# If running a unix OS computer please use yum to install:
# libxml2-devel libcurl-devel (required to build dependencies)
# ... (required for markdown)
# texlive-collection-latexrecommended.noarch texlive-latex.noarch (also required for markdown)

# install minimal bioconductor
# (on current R versions, Bioconductor packages are installed with BiocManager::install() instead of biocLite())
source("http://bioconductor.org/biocLite.R")
biocLite()

# Add required packages for today's session (will add RCurl, XML as dependencies)
biocLite(c("Biobase", "GEOquery", "limma", "affy"))

# some other packages are not required here but are required for RobiNA in other exercises
# these required packages were identified by parsing all R scripts present in the windows build of Robin
biocLite(c("affy", "affyPLM", "RankProd", "limma", "gcrma", "statmod", "marray", "plier", "edgeR", "DESeq", "EDASeq", "makecdfenv"))

Extend the GEO2R analysis in R with RStudio

RStudio is the current best graphical environment to program in the R language and offers many facilities to learn and develop your R skills. The program is available free of charge from http://www.rstudio.com/ [21].

adapt the GEO2R script in RStudio

Original GEO2R code

The starting script is first reproduced here.

initial GEO2R code

# Version info: R 2.14.1, Biobase 2.15.3, GEOquery 2.23.2, limma 3.10.1
# R scripts
121. otein C, fast-type | chr1:101572108-1015759
31 | 1374391_at | Rn.16457.1 | 12.42 | 5.09 | 0.24 | 1.49 | 160.99 | 0.000002 | 0.00004 | Sln | sarcolipin | chr8:57016720-57017242
33 | 1386873_at | Rn.4035.1 | 12.5 | 5.48 | 0.15 | 0.32 | 129.87 | 4.26E-12 | 2.04E-09 | Tnni1 | troponin I type 1 (skeletal, slow) | chr13:57676633-57684216
35 | 1371339_at | Rn.11675.2 | 12.59 | 5.64 | 0.24 | 0.09 | 123.72 | 4.13E-13 | 3.86E-10 | Tnni1 | troponin I type 1 (skeletal, slow) | chr13:57683074-57685257
38 | 1398306_at | Rn.9794.1 | 11.19 | 4.46 | 0.19 | 0.07 | 106.44 | 7.16E-14 | 1.14E-10 | Ampd1 | adenosine monophosphate deaminase 1 | chr2:224999610-2250204
44 | 1376227_at | Rn.41395.1 | 12.91 | 6.25 | 0.13 | 0.58 | 101.15 | 6.68E-10 | 8.82E-08 | Myoz1 | myozenin 1 | chr15:8185059-8185524
47 | 1376968_at | Rn.26659.1 | 10.54 | 4.26 | 0.48 | 0.17 | 77.47 | 5.60E-10 | 7.80E-08 | Mybpc2 | myosin binding protein C, fast-type | chr1:101591233-1015937
48 | 1370900_at | Rn.1072.1 | 9.47 | 3.2 | 0.94 | 0.05 | 76.98 | 1.46E-07 | 0.000006 | Myh4 | myosin, heavy chain 4, skeletal muscle | chr10:53552135-53553903
50 | 1386977_at | Rn.1647.1 | 11.08 | 4.96 | 0.2 | 0.27 | 69.53 | 9.73E-12 | 3.77E-09 | Car3 | carbonic anhydrase 3 | chr2:107900426-1079092
51 | 1381575_at | Rn.15517.1 | 9.51 | 3.48 | 0.32 | 0.23 | 65.69 | 5.90E-11 | 1.28E-08 | Neb | nebulin | chr3:42756010-42756391
52 | 1371298_at | Rn.3968.1 | 12.6 | 6.61 | 0.23 | 0.3 | 63.38 | 2.54E-11 | 6.95E-09 | H19 | H19, imprinted maternally expressed transcript (non-protein coding) | chr1:222639223-2226401
53 | 1373873_at | Rn.14050.1 | 11.44 | 5.64 | 0.13 | 0.26 | 55.6 | 3.17E-12 | 1.68E-09 | | | chr17:34950334-34951353
56 | 1390355_at | Rn.38647.1 | 11.42 | 5.73 | 0.1 | 0.3 | 51.53 | 8
122. pared

[Screenshot: RobiNA "Configure groups" dialog. Input files from /Users/splaisan/Projects/BITS TUTORIALS/BITS_tutorials/work/Analysis_of_public_microarray_datasets are assigned to named groups. Buttons: Add selected, Add group, Delete group, Previous, Next.]

[Screenshot: RobiNA "Design your experiment" dialog, with the Diaphragm group visible]
Design your experiment
- You can arrange the groups by dragging them around.
- Define which groups shall be compared by holding down the CONTROL key and then click-dragging from the first group to the second group.
- Right click and choose delete to delete connections.
- To combine several groups into one 'metagroup', select all groups you want to combine (by left clicking and drawing a selection rectangle around them) and click 'Create Metagroup'.

Expert settings (shown with "Show expert settings"):
- Normalisation
- P-value correction: BH
- Multiple testing strategy: nestedF
- Write out normalized raw data
- Preview R script
- Log fold change min: 1
- p-value cutoff

Buttons: Reset design, Create Metagroup, Delete Metagroup, Previous, Next.

Step 4 of 4: running

Computing differential expression

RobiNA   The transcriptomics dat
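The expert settings above combine a Benjamini-Hochberg (BH) p-value correction with a minimum log fold change of 1. The following sketch shows how such a selection works in principle. The gene names, log fold changes and raw p-values are hypothetical, and the 0.05 cutoff is an assumption, since the cutoff value is not legible in the screenshot. RobiNA itself delegates this step to limma in R.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical results: gene -> (log2 fold change, raw p-value)
results = {
    "gene_a": (2.3, 0.0004),
    "gene_b": (0.4, 0.0010),
    "gene_c": (-1.6, 0.0300),
    "gene_d": (1.2, 0.4000),
}

names = list(results)
adjusted = benjamini_hochberg([results[n][1] for n in names])

# Keep genes with |log2 fold change| >= 1 and adjusted p <= 0.05 (assumed cutoff).
called = [n for n, q in zip(names, adjusted)
          if abs(results[n][0]) >= 1 and q <= 0.05]
print(called)
```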
123. Summary: GSE6943 CAT RMA
Heart vs. Diaphragm
Analysis Type: Gene Level Differential Expression Analysis
Array Type: RAE230A
Genome Version: rn5
Annotation File: RAE230A.na34.annot.csv

Summary:
1) Total number of genes: 15866
2) 1043 genes are differentially expressed
3) Heart vs. Diaphragm
   a) 605 genes are up-regulated
   b) 438 genes are down-regulated

Algorithm Options:
1) One-Way Between-Subject ANOVA (Unpaired)

Default Filter Criteria:
1) Fold Change (linear) < -2 or Fold Change (linear) > 2
2) ANOVA p-value (Condition pair) < 0.05

Conditions:
Heart (6): 1) GSM160089.rma.chp 2) GSM160090.rma.chp 3) GSM160091.rma.chp 4) GSM160092.rma.chp 5) GSM160093.rma.chp 6) GSM160094.rma.chp
Diaphragm (5): 1) GSM160095.rma.chp 2) GSM160096.rma.chp 3) GSM160098.rma.chp 4) GSM160099.rma.chp 5) GSM160100.rma.chp
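The default filter criteria listed in the summary can be sketched as follows. The example assumes that the linear fold change is derived from the log2 group averages and is signed so that down-regulation is reported as a negative value (which would explain why the filter reads "< -2 or > 2"); this convention is an assumption, not something the summary states. The probeset names and values are hypothetical.

```python
def signed_fold_change(log2_a, log2_b):
    """Linear fold change of A vs B from log2 averages; negative when
    A is lower than B (assumed convention)."""
    diff = log2_a - log2_b
    return 2.0 ** diff if diff >= 0 else -(2.0 ** -diff)


def passes_default_filter(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Fold Change < -2 or > 2, and ANOVA p-value < 0.05."""
    return (fold_change < -fc_cutoff or fold_change > fc_cutoff) and p_value < p_cutoff


# Hypothetical probesets: (name, Heart log2 avg, Diaphragm log2 avg, ANOVA p-value)
probesets = [
    ("probe_A", 8.80, 9.83, 0.000003),   # about 2-fold lower in Heart, significant
    ("probe_B", 7.10, 7.40, 0.000100),   # significant but small change
    ("probe_C", 11.00, 6.00, 0.200000),  # large change but not significant
]

selected = [name for name, heart, diaphragm, p in probesets
            if passes_default_filter(signed_fold_change(heart, diaphragm), p)]
print(selected)
```

Only probesets that satisfy both criteria are kept, which is why the differentially expressed count (1043) is much smaller than the total (15866).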
124. poproteins

[Screenshot: NCBI FLink table of BioSystems records, 10 per page, displaying BioSystems Records 1 - 10 of 2215. Legible rows:]
121 | 1010825 | Adaptive Immune System | organism-specific biosystem | Rattus norvegicus
118 | 1010853 | Innate Immune System | organism-specific biosystem | Rattus norvegicus
111 | 1010881 | Hemostasis | organism-specific biosystem | Rattus norvegicus

Description

This table shows links from 3450 gene records to 2215 biosystems records. The link used was gene_biosystems, which is described as 'BioSystems that contain the specified gene(s)'. The association between the biosystems and genes was made using the method described in BioSystems data processing. A one-to-one mapping for this link is available (more about one-to-one mappings).

Additional details about the FLink output display are provided in the help document.

Column legends

Selected table column descriptions:
- Frequency: The number of gene records from your input list that are linked to the BioSystems record.
- Max Frequency: The total number of gene records that are linked to the BioSystem record. This represents the maximum value that can appear in the frequency column and is used to calculate the score (percent coverage).
- BSID: BioSystem record identifier.
- Source: Depositor of the BioSystem.
- Name: Name of the BioSystem.
- Type: The taxonomic span of the BioSystem.
- Organism: The organism containing the BioSystem.
- Ra
125. ppb, natriuretic peptide B
Organism: Rattus norvegicus
Reporter: GPL341, 1367616_at (ID_REF), GDS3224, 25105 (Gene ID), NM_031545
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748265

3. Ptgds - Heart left ventricle and diaphragm comparison
Annotation: Ptgds, prostaglandin D2 synthase (brain)
Organism: Rattus norvegicus
Reporter: GPL341, 1367851_at (ID_REF), GDS3224, 25526 (Gene ID), 04488
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748500

4. Car3 - Heart left ventricle and diaphragm comparison
Annotation: Car3, carbonic anhydrase 3
Organism: Rattus norvegicus
Reporter: GPL341, 1367896_at (ID_REF), GDS3224, 54232 (Gene ID), NM_019292
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748545
126. pse

[Screenshot: RobiNA "Run quality check tools" dialog, Step 2 of 4]
Choose the quality checks you want to include in your analysis and click Next to continue. If you don't want any quality checking you can skip this step.

Available checks (each with an "Include" checkbox and a "more" link; "Select all" ticks every box):
- Box plot and MA plots
- PLM: Fitting probe level models to the data
- RNA degradation: to detect possible RNA degradation/digestion. Uses ordered probes in probeset.
- Histogram: Shows density plots of the signal intensity of each chip
- Scatterplot: The data will be normalised before plotting the log2 fold expression values of all possible combinations of two chips against each other
- PCA and hierarchical clustering
- NUSE and RLE: Normalized unscaled standard errors and relative logarithmic expression

Edit expert options: The chosen default values will be suitable in most cases. They should only be changed by an expert.
Expert settings:
- Normalisation: rma
- p-value correction: BH
- Plot components: PC1&2
- Analysis strategy: Linear models (package limma)

Quality check results

Click in the list to open a fullsize view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the 'Excl
127. r trends in both sample groups. Groups #1 and #3 are highly contrasted between Diaphragm and Heart, while #2 and #4 are more subtle DE groups.

[Figure: GDS3224 / Heart left ventricle and diaphragm comparison / Rattus norvegicus. Clustering: Euclidean K-means. Full image: 10425 x 12 spots, split into four clusters.]

Clicking on the first group to get more details for genes higher in the Diaphragm

[Figure: GDS3224 / Heart left ventricle and diaphragm comparison / Rattus norvegicus. Clustering: Euclidean K-means. Full image: 727 x 12 spots. Expression level legend: High, Low, Absent. Samples are grouped by tissue, with a searchable gene list on the right.]

Similarly, for genes higher in the Heart

[Figure: GDS3224 / Heart left ventricle and diaphragm comparison / Rattus norvegicus. Clustering: Euclidean K-means. Full image: 278 x 12 spots. Expression level legend: High, Low, Absent.]

Finally, zooming in on a small number of gen
128. r unless you want to overwrite the results of the previous analysis run. Clicking 'Exit' will close Robin. If you want to view the results in MapMan please start MapMan now. Robin will try to automatically transfer the analysed data set to MapMan.

[Screenshot: final RobiNA dialog with the buttons View Data in MapMan, Exit, Modify and Restart]

Reviewing DE results

RobiNA saves a number of files to the disk in its folder structure. Text tables can be found in the 'detailed results' folder and are partly reproduced below. A short summary lists the parameters applied to perform the analysis, while two tables report the full results and the top 100 results after sorting lines by increasing adj.P.Val (most significant first). Three PNG pictures (below) report the Up-regulated gene count, DE gene count and Down-regulated gene count respectively.

Final plots summarize key facets of the analysis as done in classical MA analysis. The left plot shows in red the probesets that are differentially expressed with the selected statistics; the right plot confirms that the samples segregate as defined in the grouping. The bottom simple Venn diagrams report counts of DE probesets (Total, Up, Down).

[Figure: "MA Plot of contrast Heart - Diaphragm" (left) and "Principal component analysis" (right; x axis PC1, 81.44%; sample groups Diaphragm and Heart). Below: Significantly regulate
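For readers unfamiliar with MA plots, the sketch below shows how the two coordinates of one probeset can be computed for a Heart versus Diaphragm contrast: M is the log2 fold change between the group means and A is the average log2 signal. This is one common definition, used here as an assumption; the A values produced by limma are averaged over all arrays and can differ slightly when group sizes are unequal. The input values are hypothetical.

```python
def ma_values(log2_group1, log2_group2):
    """Return (M, A) for one probeset from per-sample log2 signals.

    M = mean(group1) - mean(group2)        (log2 fold change)
    A = (mean(group1) + mean(group2)) / 2  (average log2 signal)
    """
    mean1 = sum(log2_group1) / len(log2_group1)
    mean2 = sum(log2_group2) / len(log2_group2)
    return mean1 - mean2, (mean1 + mean2) / 2


# Hypothetical log2 signals for one probeset
heart = [10.0, 12.0]
diaphragm = [8.0, 8.0]
m, a = ma_values(heart, diaphragm)
print(f"M = {m}, A = {a}")
```

Probesets far from M = 0 are the candidates highlighted in red on the plot, provided they also pass the statistical cutoff.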
129. rameters:
Gene identifier column used in tests: Gene symbol
Annotation column used in tests: Pathway
The features were ranked on: t-statistic (group comparison: Heart - Diaphragm)
Raw universal gene list size: 15923
Used universal gene list size (requiring annotation and one feature per gene only): 994
Applied filter to reduce features to one per gene: true
Filtered features to one per gene by using: feature with highest IQR
Expression values used when filtering features: Transformed expression values
Minimum category size required: 10
Number categories passing minimum size requirement: 45
Number permutations for p-value calculation: 10000

Top 10 GSEA results for BP increased in Heart

[Screenshot: GSEA result table with the columns Category, Description, Size, Test statistic, Lower tail, Upper tail. Most numeric cells are not legible. Legible rows:]
7519 | skeletal muscle tissue development
6936 | muscle contraction
5977 | glycogen metabolic process
6094 | gluconeogenesis | Lower tail 0.0002 | Upper tail 0.9998
6937 | regulation of muscle contraction
5978 | glycogen biosynthetic process | Lower tail 0.0003 | Upper tail 0.9997

10 GSEA results for BP incr
130. ration

After installing and starting the Affymetrix Expression Console program, use the File menu to access the 'library' download manager.

[Screenshot: Expression Console File menu (Save Study, ZIP Study, UnZIP Study, Open Sample/Array Attribute File, Properties, Download Annotation Files, Utilities, Page Setup, Exit)]

Log into the Affy support area using your credentials.

[Screenshot: "NetAffx Account Information" dialog (Enter your Affymetrix.com email address and password), followed by "NetAffx Library Files" (Select the library files to download) with a list of arrays including RAE230A, RAE230B, RaEx-1_0-st-v1, RaGene-1_0-st-v1 and others]

In the example above we highlight the rat array that was already downloaded and was used to generate the demo data in the CLC Bio Main Workbench (measurements of gene expression in tissues from cardiac left ventricle and diaphragm muscle of rats; Lunteren et al., 2008).

The Affymetrix Expression Console will download the CDF file to the folder that you have specified in the library path (typically, the folder where your CEL files are stored). The file shown here as an example is for Arabidopsis.
131. rchical clustering

Hierarchical clustering and heatmaps are classical representations of differential expression that nicely provide a high-level view of the data. Both are done for you by the browser, which returns images as well as the corresponding tables for further exploration, without one line of code.

[Screenshot: GEO "Data Analysis Tools" panel (Find genes, Compare 2 sets of samples, Cluster heatmaps, Experiment design and value distribution). Cluster heatmaps options: Hierarchical (Distance: Pearson Correlation; Linkage: Complete), Partitional (K-means / K-medians), By location on chromosome. Button: Display.]

A number of interactive links allow changing the correlation method, changing colors, and/or selecting heatmap regions to plot other graphs or export data to file.

[Figure: NCBI > GEO > GDS Browser > GDS Analysis. GDS3224 / Heart left ventricle and diaphragm comparison / Rattus norvegicus. Clustering: Correlation, Complete Linkage. Full image: 10425 x 12 spots. Expression level legend: High, Low, Absent. Samples are grouped by tissue, with the correlation tree and gene list alongside. Links: Selected profiles, Plot values.]

KMean clustering

KMean 'biclusters' are un-supervized re
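Although GEO computes the clustering for you, it can help to see what the two options mean. The sketch below, using hypothetical expression profiles, computes a Pearson correlation distance (taken here as 1 - r, which is an assumption about the exact formula GEO uses) and the complete-linkage distance between two clusters, which is the largest pairwise distance between their members.

```python
from math import sqrt


def pearson_distance(x, y):
    """Distance 1 - r between two expression profiles of equal length."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def complete_linkage(cluster_a, cluster_b):
    """Largest pairwise distance between members of two clusters."""
    return max(pearson_distance(p, q) for p in cluster_a for q in cluster_b)


# Hypothetical profiles across four samples
gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [2.0, 4.0, 6.0, 8.5]
gene_3 = [4.0, 3.0, 2.0, 1.0]

print(pearson_distance(gene_1, gene_2))  # close to 0: similar profiles
print(pearson_distance(gene_1, gene_3))  # close to 2: opposite profiles
print(complete_linkage([gene_1, gene_2], [gene_3]))
```

Because the distance is based on correlation, two genes with the same pattern but different absolute levels end up close together, which suits heatmaps of relative expression.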
132. ression profiling by array, count, 12 samples
ID: 51748105

3. Arf1 - Heart left ventricle and diaphragm comparison
Annotation: Arf1, ADP-ribosylation factor 1
Organism: Rattus norvegicus
Reporter: GPL341, 1367459_at (ID_REF), GDS3224, 64310 (Gene ID)
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748108

4. Gdi2 - Heart left ventricle and diaphragm comparison
Annotation: Gdi2, GDP dissociation inhibitor 2
Organism: Rattus norvegicus
Reporter: GPL341, 1367460_at (ID_REF), GDS3224, 29662 (Gene ID), BM387347
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748109

Luckily, this is not all of it, and the menus on the right allow doing more with the found items.

[Screenshot: GEO Profiles results page. Display Settings: Summary, 20 per page, Sorted by Default order. Results: 1 to 20 of 4409.]

1. Sumo2 - Heart left ventricle and diaphragm comparison
Annotation: Sumo2, SMT3 suppressor of mif two 3 homolog 2 (S. cerevisiae)
Organism: Rattus norvegicus
Reporter: GPL341, 1367452_at (ID_REF), GDS3224, 690244 (Gene ID), NM_133594
DataSet type: Expression profiling by arra
133. ry
1. Total number of genes: 15866
2. 2002 genes are differentially expressed
3. Heart vs. Diaphragm
   a. 915 genes are up-regulated
   b. 1087 genes are down-regulated

Algorithm Options
1. One-Way Between-Subject ANOVA (Unpaired)

Default Filter Criteria
1. Fold Change (linear) < -2 or Fold Change (linear) > 2
2. ANOVA p-value (Condition pair) < 0.05

Conditions
Heart (6)
1. GSM160089.mas5.CHP
2. GSM160090.mas5.CHP
3. GSM160091.mas5.CHP
4. GSM160092.mas5.CHP
5. GSM160093.mas5.CHP
6. GSM160094.mas5.CHP

Diaphragm (5)
1. GSM160095.mas5.CHP
2. GSM160096.mas5.CHP
3. GSM160098.mas5.CHP
4. GSM160099.mas5.CHP
5. GSM160100.mas5.CHP

Screenshot: scatter plot tooltip (Condition: Heart; File: GSM160093; 1388044_at; Pfkfb2).

results

The tabular results can finally be exported to local file(s) for further use (IPA, ...).

Screenshot: affymetrix Analysis_1.tac, RAE230A Analysis Result, Table tab (other tabs: Summary, Scatter Plot, Volcano Plot, Chromosome Summary). Comparison: Heart vs. Diaphragm. Visible columns include Transcript ID, Heart and Diaphragm Bi-weight Avg Signal, Fold Change (linear) (Heart vs. Diaphragm), ANOVA p-value (Heart vs. Diaphragm) and FDR p-value.
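The default filter criteria listed in the TAC summary can be sketched in a few lines. This is a minimal illustration, not TAC's own code: it assumes that the linear fold change is derived from the difference of the two log2 average signals and is reported with a negative sign for down-regulation, which is what the "< -2 or > 2" criterion suggests. The function names and default cutoffs are hypothetical helpers.

```python
def linear_fold_change(log2_a, log2_b):
    """Signed linear fold change of condition A vs. B from log2 average signals.

    Assumption: down-regulation is reported as a negative fold change,
    so a two-fold decrease gives -2 rather than 0.5.
    """
    diff = log2_a - log2_b
    return 2 ** diff if diff >= 0 else -(2 ** -diff)


def passes_default_filter(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Keep a gene when |fold change| > 2 and the ANOVA p-value is < 0.05."""
    return abs(fold_change) > fc_cutoff and p_value < p_cutoff


# Illustrative values only: log2 signals of 13.61 and 5.01 differ by 8.6,
# which corresponds to a linear fold change of roughly 388.
fc = linear_fold_change(13.61, 5.01)
keep = passes_default_filter(fc, 4.58e-13)
```

Working on the log2 difference first and converting to the linear scale last keeps the up and down directions symmetric, which is why the filter can use a single absolute cutoff.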
134. s] AND rat[Organism]

Get information about GSE6943 used in this session

We will stick to the simple NCBI GEO search here and look for one particular experiment (GSE6943, used by CLC to build their tutorial). This work was published by van Lunteren E, Spiegler S, Moyer M [5]. The series compares gene expression in the heart left ventricle and the diaphragm of normal young adult Sprague Dawley rats, profiled by array on 12 samples. Full details about this dataset can be found at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943 and are reproduced in the next figure. A lot of important information can be found on this page, including the chip used for the experiment, the number of replicates, and metadata about the experimental setup.

Series GSE6943 (Query DataSets for GSE6943)
Status: Public on Jan 24, 2008
Title: Normal Heart vs Normal Diaphragm
Organism: Rattus norvegicus
Experiment type: Expression profiling by array
Summary: Comparison of gene expression of heart (left vent) and diaphragm of normal Sprague Dawley rats, young adult. Keywords: Cell type comparison
Overall design: 6 diaphragm samples, 6 hear
135. s and gene expression patterns stored in GEO. For more information about various aspects of GEO, please see our documentation (http://www.ncbi.nlm.nih.gov/geo/info/) listings and publications (PMC articles 3013736, 2686538, 2270403, 1669752, 1619900, 1619899, 539976, 99122 at http://www.ncbi.nlm.nih.gov/pmc/).

If you are new to GEO, have a look at their handout (http://www.ncbi.nlm.nih.gov/geo/info/GEOHandoutFinal.pdf) first.

Find GEO datasets relevant for your biological question

You may use the NCBI search page in a very basic way by entering your gene of interest to look for related knock-out experiments, or by searching for a compound or disease name that is relevant to your research. Note that this will sometimes return too many datasets or miss the one you are really looking for.

Other (better) ways of finding out whether GEO data exists

The current NCBI search page (and its advanced counterpart) also allows restricting your queries in smart ways in order to find the best-suited data in the repository. A related How-To page can be found at Find GEO datasets. Please read this page and discover the advantages of adding filters to your queries.

For a good resource on how to build top-notch queries in the GEO advanced search page (http://www.ncbi.nlm.nih.gov/gds/advanced), look at the NCBI tutorial, which gives examples of good syntax to recycle and copy [4]. As an example, search for experiments in rat with 100 to 500 samples with: 100:500[Number of Sample
136. s here

References

1. http://cran.r-project.org
2. http://www.rstudio.com

Category: PUBMA2014

This page was last modified on 20 October 2014, at 09:00.
Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

PubMA Exercise 3

From BioWare WIKI

Clustering using the GEO Dataset browser (only for data with attached GDS ID)

Contents

1 Introduction
2 GEO Dataset browser
  2.1 Start the tool
  2.2 Select GEO Dataset Browser data
  2.3 Download the full data table
3 Walking through the tools
  3.1 Find Genes
  3.2 Compare two sets of samples
    3.2.1 Two tails comparison
      3.2.1.1 Profile data
      3.2.1.2 Profile pathways
    3.2.2 Rank mean difference (4 & either)
      3.2.2.1 Search pathways enriched in the obtained subset
  3.3 Cluster heatmaps
    3.3.1 Hierarchical clustering
    3.3.2 KMean clustering
  3.4 Experiment design and value distribution
4 download exercise files

Introduction

The GEO Dataset Browser takes care of providing users clustered data and he
137. s shows the 250 genes with the lowest p-values, regardless of the significance of their p-values. Sometimes 250 is not enough and you still miss DE genes (as is the case in this example); sometimes 250 is far too many and only a small fraction of these 250 genes is really DE. So always check the adjusted p-values to decide how many of these 250 genes you are going to use for further analysis.

P.Value: raw p-value before multiple testing correction
t: t-statistic of the moderated (shrunken) t-test
B: B-statistic, or log-odds that the gene is differentially expressed
logFC: log2 fold change between the two experimental conditions

This table contains links through which detailed expression information can be retrieved for interesting genes (not further detailed here).

6. Clicking on 'Save all results' will open a new window with the full table, which can be saved to disk as a tab-separated text file using the browser File > Save option.

Screenshot: saved results window (www.ncbi.nlm.nih.gov/geo/geo2r/backend/geo2r.cgi) showing the full table with the columns ID, adj.P.Val, P.Value, t, B, logFC, Gene symbol and Gene title. The top rows include 1388876_at, 1374248_at (Mybpc1, myosin binding protein C, slow type), 1374622_at and 1370033_at (Myl1, myosin, light chain 1).
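To make the advice about adjusted p-values concrete, here is a minimal sketch of the Benjamini & Hochberg adjustment, which is the default multiple-testing option offered by GEO2R. It is a plain re-implementation for illustration, not the code GEO2R or limma runs; the function names and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so that adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues, alpha=0.05):
    """Number of tests whose adjusted p-value is below alpha."""
    return sum(q < alpha for q in benjamini_hochberg(pvalues))


# Illustrative raw p-values, not taken from the GSE6943 results
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
n_kept = count_significant(raw, alpha=0.03)
```

Applied to a saved result table, counting the rows whose adjusted p-value falls below the chosen threshold gives the number of genes worth carrying forward, which may be smaller or larger than 250.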
138. Screenshot: RobiNA (transcriptome data preprocessor) quality check panel, listing pairwise scatter plots between the GSE6943 CEL files (GSM160089.CEL against GSM160092.CEL to GSM160098.CEL, GSM160095.CEL and GSM160096.CEL against GSM160100.CEL, and further pairs).

Quality check results

Click in the list to open a full-size view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the 'Exclude' box.
139. Screenshot: Enrichr results for the list 'GSE6943 DE Robina 238' (238 genes), showing the categories ChEA, TRANSFAC and JASPAR PWMs, Genome Browser PWMs, Histone Modifications ChIP-seq and microRNA, with Table, Grid and Network views. Click the bars to sort; now sorted by combined score.

An example of network view for the enriched TRANSFAC & JASPAR TF motifs

Screenshot: TRANSFAC and JASPAR PWMs network for 'GSE6943 DE Robina 238'. Legible nodes include CEBPA (human), ZNF148 (human), RELB (human) and SRF (human).

WebGestalt

This second tool largely overlaps in data sources with Enrichr, although its tabular reporting format makes it a little less attractive to my eyes. WebGestalt accepts many ID types and allows selecting the exact background based on the array, which is a plus compared to Enrichr and puts it on par with DAVID in that respect.

WebGestalt is a 'WEB-based GEne SeT AnaLysis Toolkit'. It is designed for functional genomic, proteomic and large-scale genetic studies from which large numbers of gene lists (e.g. differentially expressed gene sets, co-expressed gene sets, etc.) are continuously generated. WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense out of gene lists.

Please read the full manual (WebGestalt manual 2013 04 12, PDF, at http://bioinfo.vanderbilt.edu/webgestalt/) for a good introduction to this tool.

The probe list obta
140. software products only jsp productIdz131414 amp categoryId235623 amp productName  Affymetrix  2526 2523174 253B   Expression Console926252696252315390253B Software 1 1  2    http   www  affymetrix com support learning training tutorials tac ec index affx  l 2

Category: Howto

This page was last modified on 19 October 2014, at 20:32.

Normalize CEL files with RMAExpress

From BioWare WIKI

Convert CEL files to normalized text tables

Contents

1 Introduction
2 run with the GSE6943 data
  2.1 Convert CEL data
  2.2 Plot normalized data
  2.3 Convert data with the Convertor tool
  2.4 Final result
3 download exercise files

Introduction

Many programs, like CLC, require normalized microarray data as input and do not support the CEL format. RMAExpress and its companion Convertor tool rapidly produce normalized and log-transformed data from a collection of CEL files and the corresponding CDF annotation database, without the need to install R and Bioconductor packages.

RMA express was cited
141. ss  RT  6  1.0E-3  4.1E-2
GOTERM_BP_FAT  glucan metabolic process  RT  6  1.0E-3  4.1E-2
GOTERM_BP_FAT  glycogen metabolic process  RT  6  1.0E-3  4.1E-2
GOTERM_BP_FAT  energy reserve metabolic process  RT  6  2.1E-3  7.1E-2
GOTERM_BP_FAT  cellular polysaccharide metabolic process  RT  6  2.3E-3  7.9E-2
GOTERM_BP_FAT  polysaccharide metabolic process  RT  6  1.5E-2  2.9E-1

A typical DAVID plot for the top cluster

Screenshot: DAVID cluster heatmap. The gene rows are:
myosin, light polypeptide 1
myosin, heavy polypeptide 2, skeletal muscle, adult
myosin, heavy polypeptide 1, skeletal muscle, adult
leucine rich repeat containing 10
myosin light chain, phosphorylatable, fast skeletal muscle
myosin, heavy chain 4, skeletal muscle
myosin, heavy chain 6, cardiac muscle, alpha
synemin, intermediate filament protein
troponin I type 1 (skeletal, slow)
troponin T type 1 (skeletal, slow)
troponin C type 2 (fast)
troponin I type 3 (cardiac)
myosin binding protein C, cardiac
troponin T type 3 (skeletal, fast)
troponin I type 2 (skeletal, fast)
troponin T type 2 (cardiac)
myosin, light chain 7, regulatory
ankyrin repeat domain 1 (cardiac muscle)
calsequestrin 1 (fast-twitch, skeletal muscle)
ATPase, Ca++ transporting, cardiac muscle, fast twitch 1
ankyrin repeat domain 23
actin, alpha, cardiac muscle 1
smooth muscle alpha-actin
titin
ryanodine receptor 2, cardiac
nebulin
junctophilin 1
keratin 19
PDZ and LIM domain 3
cysteine and glycine-rich protein 3
four and a half LIM
142. st click Recalculate on the GEO2R tab to apply the edits.

When satisfied, go to the GEO2R tab and click the Top250 button to run a limma analysis for identifying DE genes.

When more than two groups are defined, GEO2R selects pairwise contrasts in a triangular or circular way, depending on the number of groups. These contrasts are labelled with arbitrary names (G0, G1, ..., Gn) and do not always reflect the user's expectation, but unfortunately little can be done in GEO2R to control this choice. More can be done when post-processing the code in RStudio, as will be shown in the dedicated tutorial.

Screenshot: GEO2R tab after running Top250 (Quick start; 'Recalculate if you changed any options'), with the columns ID, adj.P.Val, P.Value, t, B, logFC, Gene symbol and Gene title. The top rows include 1388876_at, 1374248_at (Mybpc1), 1374622_at, 1370033_at, 1373697_at (Mybpc2), 1398306_at (Ampd1), 1374672_at (Tnni3k), 1367962_at (Actn3) and 1367964_at (Tnni2).
143. st size: 15923
Used universal gene list size (requiring annotation and one feature per gene only): 8603
Applied filter to reduce features to one per gene: true
Filtered features to one per gene by using: feature with highest IQR
Expression values used when filtering features: Transformed expression values
Minimum category size required: 10
Number categories passing minimum size requirement: 1451
Number permutations for p-value calculation: 10000

settings for the GSEA test for GO MF

Gene set enrichment analysis
Wed Aug 13 14:51:03 CEST 2014
Version: CLC Main Workbench 7.0.3
User: splaisan

Parameters:
Gene identifier column used in tests: Gene symbol
Annotation column used in tests: GO molecular function
The features were ranked on: t-statistic
group comparison: Heart, Diaphragm
Raw universal gene list size: 15923
Used universal gene list size (requiring annotation and one feature per gene only): 8969
Applied filter to reduce features to one per gene: true
Filtered features to one per gene by using: feature with highest IQR
Expression values used when filtering features: Transformed expression values
Minimum category size required: 10
Number categories passing minimum size requirement: 503
Number permutations for p-value calculation: 10000

settings for the GSEA test for Pathways

Gene set enrichment analysis
Wed Aug 13 15:10:25 CEST 2014
Version: CLC Main Workbench 7.0.3
User: splaisan

Pa
144. stalt

User data and parameters
User data: textAreaUpload.txt
Organism: rnorvegicus
Id Type: affy_rae230a
Ref Set: affy_rae230a
Significance Level: Top10
Statistics Test: Hypergeometric
MTC: BH
Minimum: 2

This table lists the enriched KEGG pathways, the number of Entrez IDs in your user data set for each pathway, the corresponding Entrez IDs, and the statistics for the enriched pathway. The statistic column lists:
C: the number of reference genes in the category
O: the number of genes in the gene set and also in the category
E: the expected number in the category
R: ratio of enrichment
rawP: p-value from hypergeometric test
adjP: p-value adjusted by the multiple test adjustment

Finally, the pathway name is linked to KEGG, where the user ids are highlighted; the number of user gene ids is linked to a table with information about the user ids; and the Entrez IDs are linked to Entrez Gene.

PathwayName; Gene; EntrezGene; Statistics
...; 15; 682930 24837 29658 29275 81636 29248 295929 29556 25399 117557 24239 64532 689560 64672 116600; C=70; O=15; E=1.54; R=9.72; rawP=2.19e...; adjP=1.58e-09
Arrhythmogenic right ventricular cardiomyopathy (ARVC); 12; 682930 29658 171009 83501 306871 287925 25399 24239 689560 24392 307505 116600; C=58; O=12; E=1.28; R=9.38; rawP=3.59e-09; adjP=1.29e-07
Hypertrophic cardiomyopathy (HCM); 12; 682930 24837 29658 29275 29248 295929 29556 25399 117557 24239 689560 116600; C=66; O=12; E=1.46; R=8.25; rawP=1.70e-08; adjP=4.08
145. statistics for the enriched pathway. The statistic column lists:
C: the number of reference genes in the category
O: the number of genes in the gene set and also in the category
E: the expected number in the category
R: ratio of enrichment
rawP: p-value from hypergeometric test
adjP: p-value adjusted by the multiple test adjustment

Finally, the gene set is linked to the WikiPathways graph, which is generated dynamically from bioinfo.vanderbilt.edu (wg gsat); the number of user gene ids is linked to a table with information about the user ids; and the Entrez IDs are linked to Entrez Gene.

PathwayName; Gene; EntrezGene; Statistics
Striated Muscle Contraction; 16; 311029 171009 29388 24838 29275 295929 29556 296369 362867 117557 56781 24837 29389 29248 292879 171409; C=35; O=16; E=0.77; R=20.74; rawP=5.13e-18; adjP=1.95e-16
Calcium Regulation in the Cardiac Cell; 12; 85420 60449 64532 24392 64672 686019 682930 81636 24239 689560 24245 24173; C=132; O=12; E=2.91; R=4.12; rawP=3.34e...
... Relaxation and Contraction; 10; 29275 85420 60449 64532 24392 117558 81636 58965 689560 24245; C=134; O=10; E=2.95; R=3.38; rawP=0.0007; adjP=0.0089
Glycogen Metabolism; 4; 64561 29353 24645 25739; C=32; O=4; E=0.71; R=5.67; rawP=0.0051; adjP=0.0485
... regulated genes with circadian ...; 4; 24253 305234 498642 25714; C=43; O=4; E=0.95; R=4.22; rawP=0.0145; adjP=0.1102
Glucuronidation; 2; 24645 25058; C=10; O=2; E=0.22; R=9.07; rawP=0.0194; adjP=0.1229
Integrin mediated cell adhesion; 5; 60352 3
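The relationship between the C, O, E, R and rawP columns can be sketched as follows. This is an illustration of the standard over-representation calculation, not WebGestalt's own code: it assumes E = C x n / N and a one-sided hypergeometric tail for rawP, where n (number of user genes in the reference set) and N (size of the reference set) are quantities the table does not show. The function name and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_stats(C, O, n, N):
    """Over-representation statistics for one category.

    C: reference genes in the category
    O: user genes that are also in the category
    n: user genes present in the reference set (assumed, not shown in the table)
    N: total genes in the reference set (assumed, not shown in the table)
    Returns (E, R, rawP).
    """
    E = C * n / N          # expected number of user genes in the category
    R = O / E              # ratio of enrichment
    # P(X >= O) when drawing n genes without replacement from N,
    # of which C belong to the category
    upper = min(C, n)
    tail = sum(comb(C, k) * comb(N - C, n - k) for k in range(O, upper + 1))
    raw_p = tail / comb(N, n)
    return E, R, raw_p


# Toy numbers chosen for easy checking by hand, not taken from the tutorial
E, R, raw_p = enrichment_stats(C=4, O=2, n=3, N=10)
```

As a sanity check on the table itself, the first row gives O/E = 16/0.77, or about 20.8, which is close to the reported R of 20.74; the small gap is presumably due to E being rounded for display.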
146. sults showing groups of genes and samples that behave in a very similar way. This method offers the advantage of not being biased by prior knowledge: it is directed only by the data itself.

The trade-off is that running KMean over and over again will always return results, and not necessarily the same results. This seems counterintuitive to the average biologist, who is trained towards reproducibility, but it is a reality for the analyst: you will not get exactly the same results as the ones below, BUT you will mostly find the same genes in similar clusters.

KMean clusters can identify core processes represented in sample groups by a small set of genes whose co-regulation is very clear. The user decides arbitrarily on the number of clusters to be found. Too few will lead to mixed profiles, while too many will lead to apparent redundancy.

Screenshot: NCBI > GEO > GDS Browser > GDS Analysis. K-means/K-medians clustering divides genes into k partitions. The best solution in 3 trials is reported. Color Options: low expression level set to Blue. Clustering Options: K method: Mean; Clusters (k, 2-15): 4. GDS3224, Heart left ventricle and diaphragm comparison (Rattus norvegicus).

From the obtained solution, it is clear that the four clusters are grouping genes having clea
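The run-to-run variability described above comes from the random choice of starting centers. The sketch below is a plain k-means (Lloyd's algorithm) with a 'best of several trials' wrapper, written for illustration only; it is not the GDS browser's implementation, and the function names, the toy points and the seed are hypothetical. It shows that fixing the seed of the random generator makes a run repeatable.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=50):
    """One k-means run from random starting centers; returns (labels, cost)."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # update step: each center moves to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, cost


def best_of(points, k, trials=3, seed=0):
    """Keep the lowest-cost solution out of several random starts."""
    rng = random.Random(seed)  # a fixed seed makes the result repeatable
    return min((kmeans(points, k, rng) for _ in range(trials)), key=lambda r: r[1])


# Toy data: two well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, cost = best_of(points, k=2, trials=3, seed=0)
```

With real expression profiles the groups are rarely this well separated, so different seeds can give somewhat different partitions; keeping the best of a few trials reduces, but does not remove, that variability.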
147. t grep("GPL341", attr(gset, "names")) else idx <- 1
    gset <- gset[[idx]]

    # make proper column names to match toptable
    fvarLabels(gset) <- make.names(fvarLabels(gset))

    # group names for all samples
    sml <- c( ... )

    # log2 transform
    ex <- exprs(gset)
    qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=T))
    LogC <- (qx[5] > 100) ||
            (qx[6]-qx[1] > 50 && qx[2] > 0) ||
            (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
    if (LogC) { ex[which(ex <= 0)] <- NaN
      exprs(gset) <- log2(ex) }

download exercise files

Download exercise files here

References

1. https://www.ncbi.nlm.nih.gov/geo/info/geo2r.html
2. Erik van Lunteren, Sarah Spiegler, Michelle Moyer. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol, 2008, 161(1):41-53. PubMed 18207466. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943

This page was last modified on 15 Octob
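The log2 auto-detection step of the GEO2R script can be read as a small heuristic on the quantiles of the expression values. The sketch below ports that logic to Python for illustration; it is not part of GEO2R, the function names are hypothetical, and the quantile helper assumes the linear-interpolation rule that R's quantile() uses by default.

```python
import math


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def needs_log2(values):
    """True when the values look like raw intensities rather than log2 data."""
    vals = sorted(v for v in values if not math.isnan(v))
    # same probabilities as in the R script: min, 25%, 50%, 75%, 99%, max
    qx = [quantile(vals, q) for q in (0.0, 0.25, 0.5, 0.75, 0.99, 1.0)]
    return (qx[4] > 100
            or (qx[5] - qx[0] > 50 and qx[1] > 0)
            or (0 < qx[1] < 1 and 1 < qx[3] < 2))


def log2_transform(values):
    """Apply log2 only when needed; non-positive values become NaN."""
    if not needs_log2(values):
        return list(values)
    return [math.log2(v) if v > 0 else float("nan") for v in values]


# Illustrative values only
raw_like = [10.0, 200.0, 3000.0, 45000.0, 0.0]
log_like = [5.0, 6.5, 7.2, 8.1, 12.3]
transformed = log2_transform([8.0, 1024.0, 0.0])
```

Note that the R vector qx is 1-based, so qx[5] in the script (the 99% quantile) corresponds to qx[4] here.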
148. Screenshot: TAC results table for Heart vs. Diaphragm (columns: Transcript ID, Heart and Diaphragm Bi-weight Avg Signal (log2), Fold Change (linear), ANOVA p-value, FDR p-value, Gene Symbol, Transcript Cluster ID), with the prompt 'Click to Perform Hierarchical Clustering Analysis' and the notice 'This might take a long time to perform hierarchical clustering. Please wait.' Gene Rows: 2002; Selected Rows: 1.

Screenshot: affymetrix Analysis_1.tac, RAE230A Analysis Result, Summary tab (other tabs: Table, Scatter Plot, Volcano Plot, Chromosome Summary).

Heart vs. Diaphragm

Analysis Type: Gene Level Differential Expression Analysis
Array Type: RAE230A
Genome Version: rn5
Annotation File: RAE230A.na34.annot.csv

Summa
149. t samples
Contributor(s): van Lunteren E, Spiegler S, Moyer M
Citation(s): van Lunteren E, Spiegler S, Moyer M. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008 Mar 20;161(1):41-53. PMID: 18207466
Submission date: Feb 02, 2007
Last update date: Jun 21, 2012
Contact name: Erik van Lunteren
E-mail: exv4@cwru.edu
Phone: 216-791-3800
Organization name: Cleveland VA Medical Center
Department: Pulmonary 111J W
Street address: 10701 East Boulevard
City: Cleveland
State/province: OH
ZIP/Postal code: 44106
Country: USA

Platforms (1): GPL341 [RAE230A] Affymetrix Rat Expression 230A Array
Samples (12): GSM160089 Diaphragm 1; GSM160090 Diaphragm 2; GSM160091 Diaphragm 3; More...
Relations: BioProject

Download family: SOFT formatted family file(s); MINiML formatted family file(s); Series Matrix File(s)
Supplementary file: GSE6943_RAW.tar; Size: 16.8 Mb; Download: (http)(custom); File type/resource: TAR (of CEL)
Raw data provided as supplementary file
Processed data included within Sample table

Note the two red boxes that highlight links to two additional resources at GEO:

The first link directs the user to the PRJNA98125 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA98125) BioProject page related to this submission, which includes extra links to the corresponding DataSet. Note that only those submissions pre-processed by the GEO team are linked to a DataSet (read http://www.ncbi.nlm.nih.gov/geo/info/datasets.html). DataSets form the basis of GEO's advanced da
150. t ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol 2008 Mar

Screenshot: GDS3224 record (2008-01-24) with the Cluster Analysis and Download panels. The DataSet full SOFT file contains DataSet information, experiment variable subsets and expression value measurements (plain text, tab-delimited format). Download links: DataSet full SOFT file, DataSet SOFT file, Series family SOFT file, Series family MINiML file, Annotation SOFT file.

    # download the file from the 'DataSet SOFT file' link in the main window
    wget ftp://ftp.ncbi.nlm.nih.gov/geo/datasets/GDS3nnn/GDS3224/soft/GDS3224.soft.gz

    # decompress and replace carriage returns by valid unix line ends (LF)
    gunzip -c GDS3224.soft.gz | tr '\r' '\n' > GDS3224.soft.txt

    # find the line where the data table starts
    grep -n "dataset_table_begin" GDS3224.soft.txt
    49:!dataset_table_begin

    # we need only the number 49
    header_end=$(grep -n "dataset_table_begin" GDS3224.soft.txt | awk 'BEGIN{FS=":"}{print $1}')

    # show the result
    echo ${header_end}
    49

    # split the file in two
    head -${header_end} GDS3224.soft.txt > GDS3224_data_header.txt
    cat GDS3224.soft.txt | sed -e "1,${header_end}d" > GDS3224_data.txt

    # inspect results
    grep "GSM" GDS3224_data_header.txt
    #GSM160089 = Value for GSM160089: Diaphragm 1; src: Diaphragm
    #GSM160090 = Value for GSM160090: Diaphragm 2; src: Diaphragm
    #GSM160091 = Value for GSM160091: Diaphragm 3; src: Diaphragm
    #GSM160092 = Value for GSM160092: Diaphragm 4; src: Di
151. ta display and analysis tools, including gene expression profile charts and clusters, detailed in PubMA_Exercise 3.

The second link, Analyze with GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE6943), opens a new window with the GEO2R submission form, which is further detailed in the next exercise, PubMA_Exercise 2.

download exercise files

Download exercise files here

References

1. http://www.ncbi.nlm.nih.gov/geo/info/faq.html#What
2. http://www.ncbi.nlm.nih.gov/geo/info/GEOHandoutFinal.pdf
3. http://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html
4. http://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html#fields
5. Erik van Lunteren, Sarah Spiegler, Michelle Moyer. Contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism. Respir Physiol Neurobiol, 2008, 161(1):41-53. PubMed 18207466. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943
6. http://www.bioconductor.org/packages/release/data/experiment/html/parathyroidSE.html

This page was last modified on 16 October 2014, at 08:49.
152. tijn Meganck, Cosmin Lazar, David Steenhoff, Alain Coletta, Colin Molter, Robin Duque, Virginie de Schaetzen, David Y Weiss Solís, Hugues Bersini, Ann Nowé. Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages. BMC Bioinformatics, 2012, 13:335. PubMed 23259851.

Alain Coletta, Colin Molter, Robin Duque, David Steenhoff, Jonatan Taminau, Virginie de Schaetzen, Stijn Meganck, Cosmin Lazar, David Venet, Vincent Detours, Ann Nowé, Hugues Bersini, David Y Weiss Solís. InSilico DB genomic datasets hub: an efficient starting point for analyzing genome-wide studies in GenePattern, Integrative Genomics Viewer, and R/Bioconductor. Genome Biol., 2012, 13(11):R104. PubMed 23158523.

Cosmin Lazar, Stijn Meganck, Jonatan Taminau, David Steenhoff, Alain Coletta, Colin Molter, David Y Weiss Solís, Robin Duque, Hugues Bersini, Ann Nowé. Batch effect removal methods for microarray gene expression data integration: a survey. Brief. Bioinformatics, 2013, 14(4):469-90. PubMed 22851511.

Jonatan Taminau, David Steenhoff, Alain Coletta, Stijn Meganck, Cosmin Lazar, Virginie de Schaetzen, Robin Duque, Colin Molter, Hugues Bersini, Ann Nowé, David Y Weiss Solís. inSilicoDb: an R/Bioconductor package for accessing human Affymetrix expert-curated datasets from GEO. Bioinformatics, 2011, 27(22):3204-5. PubMed 21937664.
153. troponin T type 3 (skeletal, fast); chr1:222573326-2225784
9; 1367962_at; Rn.17592.1; 13.23; 4.6; 0.18; 0.26; 395.93; 2.09E-13; 2.28E-10; Actn3; actinin alpha 3; chr1:227051529-2270674
10; 1370033_at; Rn.40120.1; 13.61; 5.01; 0.11; 0.33; 387.67; 4.58E-13; 4.03E-10; Myl1; myosin, light chain 1; chr9:73364156-73369886
11; 1388139_at; Rn.10092.1; 12.6; 4.03; 0.19; 0.2; 379.82; 9.19E-14; 1.33E-10; Myh2; myosin, heavy chain 2, skeletal muscle, adult; chr10:53487290-53491320
16; 1387787_at; Rn.6534.1; 13.94; 5.38; 0.09; 0.16; 377.88; 2.22E-15; 1.17E-11; Mylpf; myosin light chain, phosphorylatable, fast skeletal muscle; chr1:205649549-2056523
17; 1368108_at; Rn.10833.1; 13.69; 5.32; 0.14; 0.35; 330.79; 1.20E-12; 8.63E-10; Atp2a1; ATPase, Ca++ transporting, cardiac muscle, fast twitch 1; chr1:204836392-204854
18; 1367896_at; Rn.1647.1; 12.04; 3.83; 0.17; 0.09; 296.23; 6.88E-15; 1.86E-11; Car3; carbonic anhydrase 3; chr2
19; 1370412_at; Rn.13846.1; 12.26; 4.3; 0.1; 0.12; 249.05; 7.77E-16; 6.17E-12; Tnnt1; troponin T type 1 (skeletal, slow); chr1:75652128-75660589
21; 1367964_at; Rn.9924.1; 13.93; 5.98; 0.11; 0.47; 247.62; 1.42E-11; 4.75E-09; Tnni2; troponin I type 2 (skeletal, fast); chr1:222505097-2225070
22; 1374248_at; Rn.9153.1; 12.67; 4.76; 0.15; 0.12; 240.41; 7.77E-15; 1.86E-11; Mybpc1; myosin binding protein C, slow type; chr7:29192152-29199672
26; 1370214_at; Rn.2005.1; 12.27; 4.66; 0.22; 0.14; 194.36; 1.97E-13; 2.28E-10; Pvalb; parvalbumin; chr7:119420291-1194352
30; 1373697_at; Rn.27586.1; 12.41; 4.99; 0.15; 0.38; 170.81; 8.29E-12; 3.46E-09; Mybpc2; myosin binding pr
154. ude' box.

Screenshot: RobiNA quality check, step 3 of 4. The result list contains a BOXPLOT of the 11 Affymetrix data files, one MAPLOT per CEL file in the GSE6943_CEL folder (GSM160089.CEL to GSM160096.CEL are visible, each with an 'Exclude' checkbox), an RNA plot and a HIST plot of the 11 Affymetrix data files, and pairwise scatter plots starting with GSM160089.CEL vs. GSM160090.CEL.

Quality check results

Click in the list to open a full-size view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the 'Exclude' box.
155. ults GSE6943 CEL GSM160098.CEL

[Screenshot: RobiNA quality check results, step 3 of 4 (continued). The list contains pairwise scatter plots between the GSE6943 CEL files (GSM160095.CEL, GSM160096.CEL, GSM160098.CEL, GSM160099.CEL and GSM160100.CEL visible), followed by a PCA plot and an HCLUST plot of 11 Affymetrix data files.]

Quality check results

Click in the list to open a fullsize view of the results. Chips showing very poor PLM results may be excluded from further analyses by checking the "Exclude" box.
156. umber of Links

PUBLICATIONS: PubMed 1
OTHER DATASETS: GEO DataSets 2

GEO Data Details

Parameter                            Value
Data volume, Spots                   191076
Data volume, Processed Mbytes        4
Data volume, Supplementary Mbytes    17

Select GEO Dataset Browser data

The above link (http://www.ncbi.nlm.nih.gov/bioproject?Db=gds&DbFrom=bioproject&Cmd=Link&LinkName=bioproject_gds&LinkReadableName=GEO%20DataSets&ordinalpos=1&IdsFromResult=98125) brings you to the GDS page shown next. The first reference links to GEO2R, while the second is annotated with a heatmap and links to the GEO Dataset Browser.

Results: 2

1. Normal Heart vs Normal Diaphragm (Submitter supplied)
Comparison of gene expression of heart (left vent) and diaphragm of normal Sprague Dawley rats, young adult. Keywords: Cell type comparison
Organism: Rattus norvegicus
Type: Expression profiling by array
Dataset: GDS3224  Platform: GPL341  12 Samples
Download data: GEO (CEL)
Series Accession: GSE69

2. Analysis of normal heart left ventricle and diaphragm of young adult Sprague Dawley males. Concurrent, rhythmic contractions of the diaphragm and heart are needed to sustain life. Results provide insight into transcriptional strategies for ensuring long term energy supplies in these two musc
157. urce("http://bioconductor.org/biocLite.R")
biocLite()
biocLite("affy")

You also need Bioconductor packages for the annotation of the spots on the arrays. In the example below I use the annotation packages for the Arabidopsis ATH1 array. Of course, you need the annotation packages that correspond to the arrays that you used in your experiment. There are two possibilities for obtaining these packages:

Bioconductor has a list of annotation packages (http://www.bioconductor.org/packages/release/data/annotation/) generated by Affymetrix. In this list, find the name of the cdf file that corresponds to the array that you have used (we used ATH1 arrays, so the cdf file is called ath1121501cdf).

[Screenshot: Bioconductor annotation package list with columns Package, Maintainer and Title. Legible entries include agcdf, ath1121501.db (Affymetrix Arabidopsis ATH1 Genome Array annotation data), ath1121501cdf and ath1121501probe (Probe sequence data for microarrays of type ath1121501).]
158. utorial(s). Please refer to the Hands on Analysis of public microarray datasets page for more information about this dataset.

Load the GSE6943 dataset

Change the search focus to "Experiment", type GSE6943 in the text field, and click SEARCH.

[Screenshot: search panel with the options Gene symbol, Entrez ID, Probe ID, HomoloGene ID, Annotation, Platform, Experiment and Signature; Tables: ALL.]

Four enriched TS (Transcriptome Signatures) are found by the program.

Several options are available here (please explore); we choose to show the results to get details about the four hits.

[Screenshot: Results panel showing 4 signatures, 1 platform (GPL341) and 1 experiment (GSE6943).]

Find Transcriptome Signatures

Select each of the four hits in turn to display its detailed statistical results in the top right window:

first TS (662 genes), with glycolysis as main aspect

[Screenshot: Results panel with the first signature selected and the Heatmap plugin open.]

second TS (778 genes), related to mitochondrial functions

[Screenshot: Info panel listing Platform, Experiment and Organism: Rattus norve
159. ve splicing

[Workflow figure: 1. Import CHP, ARR; 2. Create conditions; 3. Run analysis; then gene results, exon results and splicing results visualization; link out to public databases, such as Ensembl and NCBI, to determine transcript function.]

As summarized in the above picture, we will now perform a full analysis starting from a set of CEL files obtained from the GEO repository. The method can be divided into two steps as detailed below: the first step converts CEL data to a format better suited for differential expression analysis using the Expression Console; the second step computes differential expression based on user-defined sample groups using the Transcriptome Analysis Console. Results presented here correspond to the blue highlights in the above workflow.

The Affymetrix Expression Console (EC)

The EC software allows step-by-step processing of the data by sequentially clicking each tool on the right-hand side of the window.

[Screenshot: EC startup window.] Other "Configuration" tools are not detailed here.

Converting CEL data to the CHP format required for TAC

Using the "Study" tools, the CEL files downloaded from GEO are loaded in the software, then normalized using a chosen method (out of RMA, MAS5 and PLIER). We use RMA as this is the standard method.

[Screenshot: Expression Console, Probe Cell Intensity Data listing the loaded CEL files (GSM160090.CEL to GSM160093.CEL legible).]
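RMA combines background correction, quantile normalization across arrays, and probe set summarization. As a rough illustration of the normalization step only, here is a minimal Python sketch of quantile normalization on a small matrix (rows are probes, columns are samples). The function name and the toy values are assumptions made for this example; this is not the Expression Console implementation, and ties are handled naively.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as a list of rows (probes),
    each row holding one value per sample (column).

    Simplified sketch: tied values are ranked in row order rather than
    averaged, which real implementations usually handle more carefully.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # order[j][r] is the row index holding the r-th smallest value of sample j
    order = [sorted(range(n_rows), key=col.__getitem__) for col in columns]

    # Reference distribution: mean of the r-th smallest values across samples
    rank_means = [
        sum(columns[j][order[j][r]] for j in range(n_cols)) / n_cols
        for r in range(n_rows)
    ]

    # Replace each value by the reference value of its rank within its sample
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for r, i in enumerate(order[j]):
            result[i][j] = rank_means[r]
    return result


# Toy intensities (made up): 4 probes measured on 3 arrays
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 5.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(toy)
for row in normalized:
    print(row)
```

After this step every sample shares the same set of values, which is why box plots of normalized data look alike across samples.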
160. ww.ncbi.nlm.nih.gov/gds
3. http://tagc.univ-mrs.fr/tbrowser/index2.php?option=com_content&task=view&id=19&pop=1&page=0&Itemid=23

This page was last modified on 8 September 2014, at 08:28. This page has been accessed 59 times.
Content is available under Creative Commons Attribution Non-Commercial Share Alike unless otherwise noted.

PubMA Exercise 7

From BioWareWIKI

IPA analysis of the GEO2R DE table

The following content was obtained with the DE table generated by GEO2R. Very similar results are expected with the results from RobiNA or from the Affy console.

Contents

1 Introduction
2 IPA Tutorial material
3 IPA analysis
3.1 Upload data in IPA
3.2 Start core analysis and set filter
3.3 Review the obtained core results
4 download exercise files

Introduction

Ingenuity Pathway Analysis (IPA) is strongly advised for more advanced users. You can use IPA on any computer with Java installed after asking for a personal account (bits@vib.be) and logging in here: https://apps.ingenuity.com/ingsso/login?service=https%3A
161. y, count: 12 samples, ID: 51748101

Ube2d3 - Heart left ventricle and diaphragm comparison
Annotation: Ube2d3, ubiquitin-conjugating enzyme E2D 3
Organism: Rattus norvegicus
Reporter: GPL341, 1367456_at (ID_REF), GDS3224, 81920 (Gene ID), ... 031237
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748105

Arf1 - Heart left ventricle and diaphragm comparison
Annotation: Arf1, ADP-ribosylation factor 1
Organism: Rattus norvegicus
Reporter: GPL341, 1367459_at (ID_REF), GDS3224, 64310 (Gene ID)
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748108

Gdi2 - Heart left ventricle and diaphragm comparison
Annotation: Gdi2, GDP dissociation inhibitor 2
Organism: Rattus norvegicus
Reporter: GPL341, 1367460_at (ID_REF), GDS3224, 29662 (Gene ID), BM387347
DataSet type: Expression profiling by array, count, 12 samples
ID: 51748109

Filters and ... related data are ... detailed ... and left to your curiosity.

Profile data

This lets you download differentially expressed data selected by the tool.

Homologene neighbors
162. y terms

Functional enrichment analysis of the RobiNA DE data

To continue with the most complete and easiest workflow, we use here the RobiNA table obtained by de novo analysis of 11 of the 12 samples, one CEL file being damaged on the GEO repository.

Preparing probe lists for enrichment testing

Web tools require probe or gene lists to compute enrichment; they do not take into account the degree of DE or the confidence in that DE, both of which are left to the user to filter.
We can produce these two lists using Excel (better would be R) in a few easy steps:

import the table in Excel, taking care to protect gene symbols against interpretation
add a column with the absolute value of logFC
filter on abs(logFC) with a minimal cutoff of 2 (four-fold DE)

[Screenshot: Excel sheet with the columns abs(logFC), AveExpr, t and P.Value, and the filter dialog for abs(logFC) set to "Greater Than or Equal To".]
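The same filtering can be scripted instead of done by hand in Excel. The following is a minimal Python sketch, assuming a tab-separated DE table with an "ID" column and a "logFC" column holding log2 fold changes. The column names, the function name and the toy rows are assumptions for illustration, and splitting the hits into an up-regulated and a down-regulated list is one possible reading of "two lists". Because the csv module reads every field as text, gene symbols are not reinterpreted the way a spreadsheet import may do.

```python
import csv
import io


def split_de_probes(tsv_text, id_col="ID", logfc_col="logFC", min_abs_logfc=2.0):
    """Return (up, down) lists of probe IDs with |logFC| >= min_abs_logfc.

    A log2 fold change of 2 corresponds to a four-fold difference.
    Column names are assumptions; adapt them to the exported table.
    """
    up, down = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        logfc = float(row[logfc_col])
        if abs(logfc) < min_abs_logfc:
            continue  # below the cutoff: not kept in either list
        if logfc > 0:
            up.append(row[id_col])
        else:
            down.append(row[id_col])
    return up, down


# Toy table with made-up probe IDs and values (not taken from the dataset)
example = (
    "ID\tlogFC\n"
    "probe_A\t3.1\n"
    "probe_B\t-2.0\n"
    "probe_C\t0.4\n"
    "probe_D\t-1.99\n"
)
up, down = split_de_probes(example)
print("up:", up)
print("down:", down)
```

For a real export, read the file content with open(path, newline="") and pass the text to the function. A filter on the P value column could be added in the same loop.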
    