Analysis of RNA-Seq Data with Partek® Genomics Suite™ 6.6

1. [Figure 28: The ANOVA results sheet showing p-value, Mean Ratio, and Fold Change for each gene. A missing-value marker indicates that values could not be calculated (one group has no reads).]
The format of the ANOVA spreadsheet is similar for all workflows. The description of each column may be found in Table 2.
Table 2: Interpretation of ANOVA results
Column ID: The column number in the gene_rpkm spreadsheet.
Gene Symbol: Gene symbol based on the annotation file used.
p-value (Tissue): The overall p-value for the Tissue factor included in the ANOVA model.
p-value (Muscle vs NOT muscle): The specific p-value for the linear contrast of candidate levels within the factor (in this example there are only two levels).
Mean Ratio: The ratio of the mean RPKM values of the two groups selected in the contrast, where Group_2 is the denominator (reference). Numbers less than one imply down-regulation.
Fold Change (for each contrast defined, with a text description): Linear fold change, with 0 indicating no change, negative numbers indicating down-regulation, and positive numbers indicating up-regulation.
F (Tissue): The F statistic for each factor, essentially the ratio of signal to noise (a high F value corresponds to a low p-value).
SS Tiss
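The Mean Ratio and F statistic described in Table 2 can be illustrated with a small sketch. This is a textbook one-way, fixed-effects ANOVA, not the mixed-model ANOVA that PGS implements; the function names and the RPKM values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Textbook one-way ANOVA F statistic: between-group variance / within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # "Signal": variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # "Noise": variation of the values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if df_within <= 0 or ss_within == 0:
        raise ValueError("F is undefined without within-group variation")
    return (ss_between / df_between) / (ss_within / df_within)


def mean_ratio(group_1, group_2):
    """Ratio of group means with group_2 as the denominator (reference)."""
    return mean(group_1) / mean(group_2)


# Hypothetical RPKM values for one gene, two samples per group.
muscle = [10.0, 12.0]
not_muscle = [4.0, 6.0]
f_value = one_way_anova_f([muscle, not_muscle])
ratio = mean_ratio(muscle, not_muscle)
```

A ratio above one here would read as higher expression in the first group relative to the reference group, matching the interpretation given for the Mean Ratio column.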
2. [spreadsheet screenshot] genomic features. The description of each column in the unexplained_regions spreadsheet is shown in Table 3.
Table 3: Description of annotated columns in unexplained_regions
Overlapping Features: Section of the nearest gene that overlaps with the region (intron, starts before or after a gene, region contained in a gene, gene contained within the region, region overlaps with a gene).
Nearest Feature: Name of the nearest feature and strand (+ or -).
Distance to Nearest Feature (bps): Distance of the detected region to the closest gene. If the detected region is mapped to the intron of a gene, the distance is shown as 0.
Right-clicking on a row header and selecting Browse to Location will show the reads mapped to the chromosome. For this tutorial, a couple of genes are selected to show regions that are located after a known gene or in the intron of a gene.
• With the unexplained_regions spreadsheet open, right-click on Average Coverage (column 6) and select Sort Descending. Select row 45 and Brow
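The "Distance to Nearest Feature" column described in Table 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the PGS implementation: coordinates are treated as inclusive start/stop positions on a single chromosome, strand is ignored, and the gene names and positions are hypothetical.

```python
def distance_to_nearest_feature(region, genes):
    """Return (gene_name, distance_bp) for the gene closest to the region.

    region: (start, stop); genes: iterable of (name, start, stop).
    The distance is 0 when the region overlaps the gene span (for example,
    when the region lies in an intron of that gene).
    """
    start, stop = region
    best = None
    for name, g_start, g_stop in genes:
        if g_start <= stop and start <= g_stop:
            dist = 0                  # region overlaps the gene span
        elif g_stop < start:
            dist = start - g_stop     # gene ends before the region starts
        else:
            dist = g_start - stop     # gene starts after the region ends
        if best is None or dist < best[1]:
            best = (name, dist)
    return best


# Hypothetical gene spans on one chromosome.
genes = [("GENE_A", 100, 200), ("GENE_B", 500, 600)]
inside = distance_to_nearest_feature((150, 160), genes)
between = distance_to_nearest_feature((250, 300), genes)
```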
3. [Figure 36: Viewing the Gene Ontology Browser]
Step 7: Detect Single Nucleotide Variations
To learn how to detect nucleotide differences and annotate SNP calls with overlapping genes, refer to the "Analysis of Nucleotide Variations in NGS Data with Partek GS 6.6" tutorial, using Help > On-line Tutorials > Next Gen Sequencing.
Step 8: Detecting Unexplained Regions
During Step 3, a spreadsheet named unexplained_regions was generated (Figure 37). This spreadsheet contains locations where reads map to the genome but are not annotated by the transcript database (RefSeqGene in this case). The spreadsheet can be sorted by descending Average Coverage (column 6). This sheet is potentially very interesting, as it may contain novel findings.
[spreadsheet screenshot] Rows: 51234
4. [Figure 16: Zooming to look at a gene of interest (MYC)]
Step 3: Analyze Known Genes
The next step in the workflow is to detect differentially expressed genes by performing mRNA quantification. This function creates data tables at the transcript and gene levels and identifies those transcripts that are differentially expressed or spliced across all samples. The raw and normalized reads are also reported for each sample. The normalization method used by PGS is Reads Per Kilobase of exon model per Million mapped reads (RPKM) (Mortazavi et al.).
• Select mRNA quantification from the Analyze Known Genes section of the workflow. The dialog box shown in Figure 17 will appear. Your choices for these options depend on the aims of your experiment.
In the Configure the test section (Figure 17), you are asked if the assay can discriminate between the sense and antisense transcripts. Your answer depends on the method used for sample preparation, as some methods preserve the strand information of the original transcript and some do not. A directional mRNA-Seq sample prep protocol only synthesizes the first strand of cDNA, whereas other methods reverse transcribe the mRNA into double-stranded cDNA. In the latter case, the sequencer reads sequences from both the sense and antisense strands but does not discriminate between them.
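The RPKM normalization named in the excerpt above follows directly from its definition: reads divided by the exon model length in kilobases and by the number of mapped reads in millions. The sketch below uses a hypothetical function name and made-up counts; it shows the arithmetic only and is not taken from PGS.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical transcript: 500 reads on a 2,000 bp exon model
# in a sample with 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
```

Because both the transcript length and the sequencing depth appear in the denominator, values become comparable between long and short transcripts and between samples with different numbers of mapped reads.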
5. [Figure 1: The RNA-Seq workflow]
The RNA-Seq workflow will be used throughout this tutorial to analyze RNA-Seq data. These and other commands for analyzing RNA-Seq data are also available from the command toolbar.
Step 1: Importing the aligned reads
PGS can import large files (several gigabytes) of reads that are already aligned to a reference genome. The data used for this tutorial was already aligned by Partek Flow. The sequence importer can handle the two standard alignment formats, BAM and SAM. Conversion from ELAND .txt files to BAM files is available via the Tools menu. Also note that if a quantification project has been created in Partek Flow, this project can also be imported and analyzed; in this scenario, invoke the workflow from the very bottom of the standard RNA-Seq workflow (Figure 1; Related: Analyze a Partek Flow project).
• Under Import from the RNA-Seq workflow, select Import and manage samples. The Sequence Import window will open (Figure 2).
[Sequence Import dialog screenshot]
6. coded tracks help to visualize differential expression and alternative splicing. The reads that are mapped to a certain tissue and the proportion of the transcript expressed in this tissue are colored the same. In this screenshot, brain is colored green, heart is colored yellow, liver is colored orange, and muscle is colored red.
[Figure 30: Viewing one gene (SLC25A3) in the Genome Viewer]
SLC25A3 was reported by Wang et al. (Nature, 2008) to have mutually exclusive exons (MXEs). The reads mapped to the transcripts of this gene
7. in each of the tissue samples are shown in the browser as different tracks. The relative abundances of the individual transcripts of this gene are shown by the height of the color-coded tracks. Note that the transcript NM_213611 has low expression, while transcripts NM_005888 and NM_002635 have higher expression. The alternative splicing pattern matches the one described in the paper, indicating that different forms of the third exon are used in a tissue-specific manner.
Using the Track panel
At this point, you may find it useful to start altering track properties. Each track can be individually configured to alter the visualization of the reads. For example:
• Select a track and drag it to change the position of the track in the viewer.
• Select the brain track, and a configuration panel will appear at the bottom of the track panel, enabling the configuration of the sequence reads display, the Y-axis, the color, and the labels of the track.
• Select a track and increase its height.
Step 5: Create a Gene List
To create a list of transcripts that are both significantly differentially expressed AND alternatively spliced among the four tissue samples, use the Create gene list function from the workflow to invoke the List Manager (Figure 31). Each of the tabs (Venn Diagram, ANOVA Streamlined, and Advanced) can be used to generate combinations of lists.
[List Manager screenshot]
8. [Figure 24: Viewing the second part of the RNA-Seq_results transcripts spreadsheet (Rows: 31945, Cols: 27)]
If you do not have replicates, it is still possible to derive basic information about differential expression and alternative splicing between your samples from the RNA-Seq_results transcripts spreadsheet using a simple chi-squared or log-likelihood test. Each sample is represented only once, and the null hypothesis is that the transcripts are evenly distributed across all samples. However, the power of PGS resides in its implementation of a mixed-model ANOVA that, when you do have biological replicates, can handle unbalanced and incomplete datasets, nested designs, numerical and categorical variables, any number of factors, and flexible linear contrasts.
Detecting differential expression using the Partek ANOVA
During import, you created a categorical attribute called Tissue and assigned the 4 samples to either the Muscle or the NOT muscle group. This step created replicates within a group, although the grouping is somewhat artificial and is used in this dataset simply to illustrate the ANOVA. Replicates are a prerequisite for analysis of differential expression using the ANOVA test.
• Select Differential expression analysis from the Analyze Known Genes section of the workflow. You have the choice of analyzing at either th
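The no-replicate test mentioned above can be sketched as a Pearson chi-squared statistic for a single transcript. The excerpt does not say how PGS computes expected counts, so this sketch assumes that "evenly distributed" means proportional to each sample's total mapped reads; the function name and the counts are hypothetical.

```python
def chi_square_even_distribution(observed, library_sizes):
    """Pearson chi-squared statistic for one transcript across samples.

    Null hypothesis (assumed here): reads are spread across samples in
    proportion to each sample's total number of mapped reads.
    Returns (statistic, degrees_of_freedom).
    """
    if len(observed) != len(library_sizes):
        raise ValueError("need one library size per sample")
    total_reads = sum(observed)
    total_library = sum(library_sizes)
    if total_reads == 0 or min(library_sizes) <= 0:
        raise ValueError("need at least one read and positive library sizes")
    statistic = 0.0
    for obs, lib in zip(observed, library_sizes):
        expected = total_reads * lib / total_library
        statistic += (obs - expected) ** 2 / expected
    return statistic, len(observed) - 1


# Hypothetical read counts for one transcript in two equally deep samples.
stat, dof = chi_square_even_distribution([30, 10], [1_000_000, 1_000_000])
```

A larger statistic means the observed counts depart further from the even-distribution expectation; converting it to a p-value requires the chi-squared distribution with the returned degrees of freedom.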
9. Overview
RNA-Seq is an emerging technology that uses high-throughput sequencing to generate information about a sample's RNA content. Partek Genomics Suite (PGS) offers convenient visualization and analysis of the high volumes of data generated by RNA-Seq experiments. This tutorial will illustrate how to:
• Import large next-gen data sets
• Add attribute data to your files
• Visualize large next-gen data sets
• Get read counts for each of the transcripts in a database
• Find transcripts that are differentially expressed among phenotypes
• Find genes that are alternatively spliced among phenotypes
• Set up a basic analysis of variance (ANOVA) model
• Detect nucleotide variations across samples or in comparison to a reference genome
• Find non-annotated regions and map them to the genome
Note: the workflow described below is enabled in PGS version 6.6. Please contact the Partek Licensing Team at licensing@partek.com to request this version, or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of PGS.
Description of the Data Set
In this tutorial, you will analyze an RNA-Seq experiment using PGS's RNA-Seq workflow. The data used in this tutorial was generated from mRNA extracted from four diverse human tissues (skeletal muscle, brain, heart, and live
10. [Figure 37: The unexplained_regions spreadsheet (Cols: 6)]
PGS can look for genes that overlap the regions. PGS not only locates reads outside of a known gene; it also takes exon-exon information from the database and locates reads in the introns of a gene. This is helpful for finding potential novel transcripts, exons, sequencing biases, etc.
• Go to the Tools > Find Overlapping Genes option in the command toolbar. In the resulting dialog box (Figure 38), select Add a new column with the gene nearest to the region, and then OK.
[Figure 38: Find Overlapping Genes]
• Specify the database you wish to use. In this example, select RefSeq Transcripts (Figure 39).
• Select OK.
[Output Overlapping Features dialog screenshot]
11. [List Manager screenshot]
• Select the Advanced tab.
• Select the Specify New Criteria button.
• In the Configure Criteria dialog box (Figure 32), provide a name for the list, e.g. Diff Exp. Select the transcripts spreadsheet and the p-value (DiffExpr) column. Set Include p-values significant with FDR of to 0.05. A list of 25,632 transcripts that pass this criterion will be generated. Select OK.
[Figure 32: Configure Criteria dialog box]
• These same steps can be used to create a list of transcripts that are likely alternatively spliced, using the same FDR cutoff and the AltSplice p-value column.
• Select both lists in t
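The "p-values significant with FDR of 0.05" criterion above can be illustrated with the Benjamini-Hochberg step-up procedure. The excerpt does not state which FDR method PGS applies, so Benjamini-Hochberg is an assumption here, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the p-value passes the FDR cutoff.

    Step-up rule: find the largest rank k (1-based, ascending p-values)
    with p_(k) <= k * fdr / m, then accept the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    passed = set(order[:cutoff_rank])
    return [i in passed for i in range(m)]


# Hypothetical p-values for four transcripts.
significant = benjamini_hochberg([0.001, 0.02, 0.03, 0.5], fdr=0.05)
```

Note that the rule is step-up: a p-value that fails its own rank threshold can still pass if a larger p-value further down the sorted list meets its threshold.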
12. In this example, change the first sample to Brain (circled area).
[Figure 5: Using the Assign files to samples dialog to give informative sample names]
• For this tutorial, the default names are suitable, so select Cancel to proceed. However, if you have data from one sample which is split into two or more BAM files, you can also use Manage samples to merge these reads into one sample. Additionally, you may use Add samples or Remove selected samples if needed.
• Click Close to proceed.
If the files are being imported for the first time, they will be sorted in order to quickly visualize and analyze the data. The imported data will appear in a spreadsheet (Figure 6). Each sample is listed in a row, with the number of aligned reads displayed for each sample.
[Figure 6: Viewing the imported data in a spreadsheet]
Notice that the Sample ID names in column 1 are gray, denoting a text fi
13. In column 3, one can see that all imported aligned reads from Figure 6 align exactly once to the human genome. This is dependent on the options used during the alignment process. For other data and alignment options, one might observe more than one alignment per read. In column 2, there are single-end reads with zero alignments per read reported, because BAM files also contain all the reads that were not aligned during the alignment process.
Step 2: Visualization
Once imported, it is possible to visualize the mapped reads along with gene annotation information and cytobands.
• Select the parent spreadsheet (RNA-Seq 4 Tissue Data in this example).
• Select Plot Chromosome View in the Visualization section of the workflow.
Unless you have previously downloaded an annotation file during another experiment, you will be prompted to select an annotation source (Figure 13).
["No default annotation" dialog screenshot]
14. • Select No for Can assay discriminate between sense and antisense transcripts, because the library preparation method did not preserve the strand information.
The dialog also asks if you would like the intronic reads to be compatible with the gene in the gene-level result spreadsheets. By selecting Yes, reads that might correspond to new or extended exons will be considered part of the gene in the gene-level spreadsheets, and the RPKM calculation will include the intron length in the transcript length.
• For In the gene-level result, report intronic reads as compatible with the gene, select No.
The next option, Report results with no reads from any sample, determines whether the results spreadsheets include transcripts or genes that have no reads from any sample. Selecting Yes will include all the genes/transcripts in the transcriptome, even if there are no reads for that transcript/gene from any sample.
• As this tutorial is not concerned with genes that are not expressed in any sample, select No for Report results with no reads from any sample.
Lastly, Report unexplained regions with more than _ reads determines if reads that are mapped to the genome, but not to any transcript, are reported on a separate spreadsheet (unexplained_regions). If checked, you must also specify the minimum number of reads that must be present before the region is reported.
• Make sure Report unexplained regions w
15. antly expressed genes/transcripts and find GO terms that are significantly enriched within the list. For a detailed introduction to GO Enrichment, refer to the GO Enrichment User Guide (Help > On-line Tutorials > User Guides).
• Select Gene set analysis in the Biological Interpretation section of the workflow.
• Select GO Enrichment and select Next. Choose the spreadsheet that was just created (Figure 34) and select Next.
[Figure 34: Selecting the spreadsheet for GO Enrichment]
• The Gene Set Analysis dialog shown in Figure 35 allows you to select either Fisher's Exact test or the Chi-Square test. The Chi-Square test is faster than the Fisher's Exact test but is less accurate for sparse data. The defaults for the rest of the options are acceptable. Be sure to check Invoke gene ontology browser on the result. Select Next.
[Gene Set Analysis dialog screenshot]
• Select Default mapping file and then select Next.
• A GO Enrichment spreadsheet, as well as a browser (Figure 36), will be generated with the enri
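The Fisher's Exact test option above can be sketched as a one-sided hypergeometric tail probability: the chance of seeing at least the observed number of list genes in a functional group. This is a generic illustration, not the PGS implementation. Defining the enrichment score as the negative natural log of that p-value is an assumption made for this sketch, and all names and counts are hypothetical.

```python
from math import comb, log


def fisher_enrichment_p(hits_in_list, list_size, group_size, background_size):
    """One-sided Fisher's exact p-value, P(X >= hits_in_list).

    X is the number of genes from a functional group of size group_size
    found in a gene list of size list_size drawn from background_size genes.
    """
    upper = min(list_size, group_size)
    denominator = comb(background_size, list_size)
    numerator = sum(
        comb(group_size, i) * comb(background_size - group_size, list_size - i)
        for i in range(hits_in_list, upper + 1)
    )
    return numerator / denominator


def enrichment_score(p_value):
    """Assumed definition for this sketch: negative natural log of the p-value."""
    return -log(p_value)


# Hypothetical counts: 5 of 5 listed genes fall in a 5-gene group, 10 genes in total.
p = fisher_enrichment_p(5, 5, 5, 10)
score = enrichment_score(p)
```

Under this assumed definition, a smaller p-value gives a larger score, which is consistent with the tutorial's statement that higher scores indicate stronger overrepresentation.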
16. band and Genomic Label
The Genome Sequence, Cytoband, and Genomic Label tracks are shown at the bottom of the panel. Genome Sequence will display the bases (in different colors, with labels) of the specified reference genome when zoomed in sufficiently. The other two tracks are helpful for navigating about the chromosome.
Using the Genome Viewer
In the Genome Viewer, use the toolbar icons to switch between selection mode and navigation mode. In navigation mode, left-click and draw a box around a region of interest to zoom in for a detailed view of the reads mapped to that region. Alternatively, in the bottom left-hand corner of the browser, drag the slider left to right or click the plus button to zoom in. Drag right to left or click the minus button to zoom back out.
It is also possible to zoom directly to a specific gene of interest using Genome View. For example, type MYC into the search box at the top of the window, and the viewer will show just the MYC transcript in the RefSeq track and the aligned reads (Figure 16). The individual base colors are now visible. To return to the whole-chromosome view, select the corresponding toolbar button.
[Genome Viewer screenshot: MYC, chr8, 128748315 to 128753681]
17. chment score shown for each GO term.
• Browse through the results to find a functional group of interest by examining the enrichment scores. The higher the enrichment score, the more overrepresented this functional group is in the input gene list. Alternatively, you may use the Interactive filter on the GO Enrichment spreadsheet to identify functional groups that have low p-values and perhaps a higher percentage of genes in the group that are present.
[Gene Ontology Browser screenshot]
18. [Figure 19: Viewing the list of results spreadsheets after detecting differential expression and alternative splicing]
The same dialog box that appears when you select Describe results will pop up, unless you have previously turned this off. This dialog box explains the spreadsheets generated by mRNA quantification.
The mapping_summary spreadsheet
The mapping_summary spreadsheet (Figure 20) is a summary of reads that have been mapped to the intronic, exonic, or intergenic regions. This is a QA/QC step to give you a brief idea of how the reads are distributed across the transcriptome. You should expect the results of replicates to resemble each other with respect to the distribution of reads.
[Figure 20: Viewing the RNA-Seq_results read summary spreadsheet]
The gene_reads and gene_rpkm spreadsheets
PGS presents the gene level da
19. cripts spreadsheet details the analysis results of RNA-Seq if there are no replicates (Figure 23 and Figure 24). Each row lists a separate transcript. A description of each column can be found in Table 1.
Table 1: Column descriptions in the transcripts spreadsheet
Chromosome, Start, Stop: Genomic location of the transcript.
Strand, Transcript, Gene: Information about the transcript and gene symbol. + and - indicate that the transcript is transcribed from the forward or reverse strand, respectively. The transcript name is determined by the transcriptome database selected (NCBI mRNA ID in this case).
p-value (DiffExpr): In the absence of replicates, differences in expression between samples are detected using a log-likelihood test at the transcript level. Sorting by ascending p-value shows the most significant differentially expressed transcript at the top of the spreadsheet.
chisq and p-value (AltSplice): In the absence of replicates, splice variants are detected using a Pearson chi-square test at the gene level. Sorting by ascending p-value shows genes with the greatest probability of expressing different RNA isoforms between samples at the top of the spreadsheet.
Read counts (raw) of each sample: Number of reads, directly from the aligned data, assigned to the transcript for each sample.
Read counts (RPKM) of each sample: Raw read counts normalized by RPKM (Mortazavi et al.) assigned to the transcript for each sample.
If the aligner that was used to align this data supported
20. [Figure 18: Specifying a transcriptome database]
Reads will now be assigned to individual transcripts of a gene based on the Expectation/Maximization (E/M) algorithm (Xing et al.). In PGS, the E/M algorithm is modified to accept paired-end reads, junction-aligned reads, and multiple aligned reads if these are present in your data. For a detailed description of the E/M algorithm, refer to the RNA-Seq white paper (Help > On-line Tutorials > White Papers).
• Select OK to perform the RNA-Seq analysis, which will generate a mapping summary spreadsheet and up to six other spreadsheets containing the analyzed results (Figure 19).
[Partek Genomics Suite spreadsheet list screenshot]
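The idea of assigning reads to transcripts by Expectation/Maximization can be shown with a toy sketch. It is a deliberately simplified, generic E/M for isoform proportions and is not the modified algorithm used by PGS: it ignores transcript length, paired-end reads, and junction reads. The function name, transcript labels, and read sets are hypothetical.

```python
def em_isoform_abundance(read_compatibility, transcripts, iterations=100):
    """Estimate relative transcript abundances from ambiguous read assignments.

    read_compatibility: one tuple per read listing the transcripts it could
    have come from. Returns a dict of proportions that sum to 1.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(iterations):
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        expected = {t: 0.0 for t in transcripts}
        for compatible in read_compatibility:
            weight = sum(theta[t] for t in compatible)
            if weight == 0:
                continue
            for t in compatible:
                expected[t] += theta[t] / weight
        total = sum(expected.values())
        if total == 0:
            raise ValueError("no read is compatible with any transcript")
        # M-step: new abundances are the normalized expected read counts.
        theta = {t: expected[t] / total for t in transcripts}
    return theta


# Hypothetical reads: 6 unique to isoform A, 1 unique to B, 3 shared by both.
reads = [("A",)] * 6 + [("B",)] * 1 + [("A", "B")] * 3
abundance = em_isoform_abundance(reads, ["A", "B"])
```

In this toy input the shared reads are pulled mostly toward isoform A, because the uniquely mapped reads already indicate that A is the more abundant isoform.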
21. e Gene or Transcript level. Select Gene-level analysis (Figure 25).
• Make sure the gene_rpkm spreadsheet is selected as the Spreadsheet.
• Select OK to open the ANOVA dialog.
[Figure 25: Differential expression analysis]
• Categorical experimental factors are listed in the Experimental Factor(s) panel on the left. Move Tissue to the ANOVA Factor(s) panel on the right, as shown in Figure 26.
[Figure 26: The ANOVA dialog]
If the ANOVA were now performed without contrasts, a p-value for differential expression would be calculated, but it would only indicate if there are differences within the factor Tissue; it will not inform you which groups are different or give any information on t
22. [Figure 2: Viewing the Sequence Importer window. Choose the files you want to import.]
• Files of type: Select BAM Files (*.bam) from the drop-down list. Browse to the folder where you have stored the BAM files. Select the files to import by checking the box to the left of the data files. For this tutorial, select brain_fa, heart_fa, liver_fa, and muscle_fa.
• Select OK, and the next Sequence Import window will open, as shown in Figure 3.
[Figure 3: Viewing the Sequence Import wizard; specify Output file and directory (using Browse), Species, and Genome]
Configure the dialog as follows:
• Output file: Provide a name for the top-level spreadsheet. Use the Browse button to change the output directory.
• Species: Select Homo sapiens from the drop-down menu, since this data is from human subjects.
• Genome/Transcriptome reference used to align the read
23. eld, but we want to change these such that the sample names are a categorical factor that can be used in the downstream analysis. To change the properties of column 1, right-click on the column header to invoke the contextual menu and select Properties. In the resulting dialog (Figure 7), set Type to categorical and Attribute to factor, and then select OK.
[Figure 7: Setting column properties]
The sample names in column 1 will now be black to denote a categorical variable. Next, we will add attributes for grouping the data, i.e. into replicates or sample groups. From the workflow, select Add sample attribute, choose the Add a categorical attribute option, and select OK (Figure 8).
[Figure 8: Adding sample attributes to describe or group samples by categories]
In this tutorial, we have four samples from different tissues, but to illustrate the statistical analysis options later in the workflow, we will group the tissues into two groups: Muscle (muscle and heart samples) and NOT muscle (liver and brain samples). These two groups will then be compared at a later step.
• In
24. [Figure 13: Selecting an annotation source]
• For this tutorial, select RefSeq Transcripts. PGS will download the relevant file (subject to any firewall/proxy settings) and save it to your default library location. You may also notice a dialog flash as the cytoband file is automatically downloaded.
The Genome Viewer window will open with chromosome 1 displayed (Figure 14). You may choose other chromosomes from the drop-down menu (circled) to change which chromosome is displayed.
[Genome Viewer screenshot]
25. he magnitude of the change between groups (fold change or ratio). To get this more specific information, you need to define linear contrasts. • Select the Contrast button (circled in Figure 26). • For Select Factor/Interaction, Tissue will be the only factor available, as it was the only factor included in the ANOVA model in the previous step; if multiple factors were included, they could be selected here (top red circle). • To define a contrast between the two candidate levels, Muscle and NOT muscle, select and move them as shown in Figure 27. The bottom group is the reference or control group. By defining this contrast, you will produce a specific p-value and a fold change, with the reference group as the denominator in the linear fold change calculation. • Select Add Contrast (circled). • Select OK to return to the ANOVA dialog, and select OK again to perform the ANOVA. Figure 27: Define
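The contrast in excerpt 25 can be illustrated with a short Python sketch. It assumes that the mean ratio is the mean RPKM of the first group divided by the mean RPKM of the reference group, and that the linear fold change equals the ratio when the ratio is at least 1 and the negative reciprocal otherwise. The function names and RPKM values are hypothetical and are not part of PGS.

```python
from statistics import mean


def mean_ratio(group, reference):
    """Mean of `group` divided by the mean of `reference` (the denominator)."""
    reference_mean = mean(reference)
    if reference_mean == 0:
        return None  # undefined when the reference group has no reads
    return mean(group) / reference_mean


def linear_fold_change(ratio):
    """Map a ratio to a signed linear fold change (assumed convention)."""
    if ratio is None or ratio == 0:
        return None  # cannot be calculated when one group has no reads
    if ratio >= 1:
        return ratio       # up-regulation: fold change equals the ratio
    return -1.0 / ratio    # down-regulation: negative reciprocal


# Hypothetical RPKM values for one gene.
muscle = [2.0, 4.0]        # heart_fa, muscle_fa
not_muscle = [10.0, 14.0]  # brain_fa, liver_fa (reference group)

ratio = mean_ratio(muscle, not_muscle)
fold_change = linear_fold_change(ratio)
print(ratio, fold_change)
```

Under this convention, the mean ratio 0.136816 printed for one gene in the Figure 28 sheet corresponds to a fold change of about -7.309, which agrees with the magnitude shown in that sheet.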
26. he right panel under Criteria, and then select Intersection from the left pane of the List Manager. Select OK. A list of 12,369 genes will be generated that includes all the genes that are both significantly differentially expressed and alternatively spliced among the four tissue samples. • Under Manage Criteria, select Save List. • Check the box for the intersection sheet of Diff exp and Alt splice and select OK. This list will now be available when you Close the List Manager (Figure 33). [Figure 33: A list of the differentially expressed and alternatively spliced genes is now available for downstream analysis.] The list of differentially expressed and alternatively spliced transcripts will be used in the next step, Biological Interpretation, for GO Enrichment Analysis. Step 6: Biological Interpretation (GO Enrichment). With the GO Enrichment feature in PGS, you can take a list of signific
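The Intersection criterion in excerpt 26 keeps only the genes that appear in both lists. A minimal sketch of that set operation in Python follows; the function name and gene symbols are hypothetical and chosen only for illustration, and this is not how the List Manager is implemented.

```python
def intersect_gene_lists(diff_expressed, alt_spliced):
    """Return the genes present in both lists, sorted for stable output."""
    return sorted(set(diff_expressed) & set(alt_spliced))


# Hypothetical gene symbols, for illustration only.
diff_exp = ["SLC25A3", "SCARA5", "VASN", "TMEM33"]
alt_splice = ["SLC25A3", "TMEM33", "CD82"]

both = intersect_gene_lists(diff_exp, alt_splice)
print(both)
```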
27. ith more than reads is checked, and specify 5 as the number of reads. [Figure 17: RNA-Seq Quantification options. The dialog asks for a database of genomic features to quantify (mRNA, Genomic Variants, or miRNA), whether the assay can discriminate between sense and antisense transcripts, whether intronic reads are reported as compatible with the gene in the gene-level result, whether results with no reads from any sample are reported, and whether unexplained regions with more than 5 reads are reported.] • In this tutorial, you will use an mRNA database, RefSeq. Select the black arrowhead (circled in Figure 18), select RefSeq Transcripts, and select OK to continue. Specify a database of genomic features to quantify: RefSeq Transcripts: The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. AceView Transcripts: AceView provides a curated, comprehensive, and non-redundant sequence representation of all public mRNA sequences (mRNAs from GenBank or RefSeq, and single pass cDNA sequences from dbEST and Trace). Ensembl Transcripts: Ensembl transcripts are base
28. junction reads: these columns (one per sample) will show the normalized (RPKM) reads assigned to junction reads for this transcript. Incompatible RPKM: Paired reads only; normalized unique paired reads (RPKM) that intersect with the transcript but are considered incompatible because the mate is not found in the same transcript. chisq and p-value (DiffExpr) [Figure 23: Viewing the first part of the transcripts spreadsheet (31,943 rows). The columns shown include chisq (DiffExpr), p-value (DiffExpr), and chisq (AltSplice). The second part of the sheet shows the per-sample RPKM, Junction RPKM, and Incompatible RPKM columns for brain_fa, heart_fa, liver_fa, and muscle_fa.]
29. linear contrasts. Once the ANOVA has been performed on each gene in the dataset, an ANOVA child spreadsheet (ANOVA 1way, ANOVAResults) will appear under the gene_rpkm sheet (Figure 28). [Figure 28 screenshot: ANOVA results sheet with the columns Column #, Column ID, Gene Symbol, p-value (Tissue), p-value (Muscle vs NOT muscle), MeanRatio (Muscle vs NOT muscle), FoldChange (Muscle vs NOT muscle), and FoldChange Description. Legible rows, with fold change given as the printed magnitude: SCARA5, p-value 7.10032e-05, mean ratio 26.873, fold change 26.873, Muscle up vs NOT muscle; TMEM33, p-value 0.00014654, mean ratio 0.243243, fold change 4.11112, Muscle down vs NOT muscle; VASN, p-value 0.000196221, mean ratio 0.368362, fold change 2.71472, Muscle down vs NOT muscle; ANKRD56, p-value 0.000197364, mean ratio 0.00389606, fold change 256.669, Muscle down vs NOT muscle. One row with an illegible gene symbol shows p-value 8.25054e-06, mean ratio 0.136816, fold change 7.30908, Muscle down vs NOT muscle. Rows such as GPR98 and LOC339751 show no ratio and are described as no change between Muscle and NOT muscle.]
30. llowing is a detailed description of each track in the default view. RefSeq Transcripts: The RefSeq Transcripts track shows all genes encoded on the forward strand of chromosome 1. This experiment uses RefSeqGene, which defines genomic sequences of well-characterized genes, as the reference annotation track. Mouse over a particular region in this track, and all genes within this region are shown in the information bar (visible in the top right of Figure 14). Zoom into this track to see individual genes, including alternative isoforms. Zooming in on one track automatically zooms all other visible tracks; thus, you can now see the reads that mapped to this particular gene across all samples. RefSeq Transcripts: The RefSeq Transcripts track shows all the genes encoded on the reverse strand of chromosome 1. Legend: Base Colors: This track indicates the color coding for individual bases. Although included in the default view, the individual bases are only visible once zoomed into a region of interest. Bam Profile (muscle_fa, brain_fa, heart_fa, and liver_fa): These tracks show all the reads that mapped to chromosome 1 from the four tissue samples. The y-axis numbers on the left side of the tracks indicate the raw read counts. The aligned reads are shown in the Genome Viewer in each track, with a different color for each Bam Profile track. Genome Sequence Cyto
31. on of the workflow, select Choose sample ID column. • Use the drop-down menu to select the Sample ID column; the sample names will be shown (Figure 11). • Select OK. [Figure 11: Specifying the correct Sample ID column. Dialog text: The sample ID column is required for integrated analysis; using the filename is not recommended. The specified sample IDs must match the sample IDs from the spreadsheet with which you want to integrate. Sample IDs are case sensitive. If you don't have a column that you want to use as Sample ID, please use Add sample attribute from the workflow to add a Sample ID column. Sample ID preview: brain_fa, heart_fa, liver_fa, muscle_fa.] • For quality assessment, select Alignments per read from the QA/QC portion of the workflow. After analyzing the four samples, a new child spreadsheet named Alignment_Counts is created (Figure 12). [Figure 12: Alignment_Counts spreadsheet. Columns: 0 Single End Alignments Per Read and 1 Single End Alignments Per Read. brain_fa: 298450 and 10984232; heart_fa: 628704 and 11199653; liver_fa: 248819 and 11583902; muscle_fa: 527781 and 14486541.]
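The Alignment_Counts sheet in excerpt 31 can be summarized as a per-sample alignment rate. The sketch below assumes that the two columns count reads with zero alignments and reads with exactly one alignment; that reading of the column headers is an assumption, and the helper name is hypothetical. The counts are copied from Figure 12 as printed.

```python
# (reads with 0 alignments, reads with 1 alignment), as printed in Figure 12.
alignment_counts = {
    "brain_fa": (298450, 10984232),
    "heart_fa": (628704, 11199653),
    "liver_fa": (248819, 11583902),
    "muscle_fa": (527781, 14486541),
}


def fraction_aligned(zero_alignments, one_alignment):
    """Share of reads that aligned once, out of all reads counted."""
    total = zero_alignments + one_alignment
    return one_alignment / total if total else 0.0


fractions = {
    sample: fraction_aligned(*counts)
    for sample, counts in alignment_counts.items()
}

for sample, fraction in fractions.items():
    print(f"{sample}: {fraction:.1%} of reads aligned once")
```

A sample whose rate is far below the others would be worth checking before quantification.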
32. onal assistance with this data set, contact the Partek Technical Support staff at 1-314-878-2329 or email us at support@partek.com. Date last updated: Feb. 2012. Copyright 2012 by Partek Incorporated. All Rights Reserved. Reproduction of this material without express written consent from Partek Incorporated is strictly prohibited. References: Mortazavi A, Williams BA, McCue K, Schaeffer L & Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008; 5: 621-8. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP & Burge CB. Alternative isoform regulation in human tissue transcriptomes. Nature 2008; 456: 470-6. Xing Y, Yu T, Wu YN, Roy M, Kim J, Lee C. An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs. Nucleic Acids Res 2006; 34: 3150-3160.
33. per row with transcript IDs in columns. As shown in Figure 22, you can see that there are 31,946 columns representing 31,943 annotated transcripts. The normalization is by RPKM (Mortazavi et al.). If you have biological replicates in your sample groups and want to do differential expression on the transcript level, this is the spreadsheet you would use. Similarly, if you have biological replicates in your sample groups and want to perform Alternative splicing analysis, this is the spreadsheet to use as input. The same analysis (PCA and ANOVA) can be done on this spreadsheet as described above for the RNA Seq_result gene rpkm spreadsheet. [Figure 22: Viewing the RNA Seq_result transcript rpkm spreadsheet (4 rows, 31,946 columns). The first three columns are Sample ID, Number of Alignments, and Tissue, followed by one RPKM column per transcript.] The transcripts spreadsheet: The trans
34. r from different donors and sequenced on the Illumina Genome Analyzer. The single-end mRNA-Seq reads were mapped to the human genome (hg19), allowing up to two mismatches, using Partek Flow alignment and the default alignment options. The output files of Flow are BAM files, which can be imported directly into Genomics Suite 6.6. BAM or SAM files from other alignment programs like ELAND, CASAVA, Bowtie, BWA, or TopHat are also supported. This same workflow will also work for aligned reads from any sequencing platform in the aligned BAM or SAM file formats. Data and associated files for this tutorial can be downloaded by going to Help > On-line Tutorials from the PGS main menu. Workflow: Open the RNA-Seq workflow within PGS by selecting it from the Workflow drop-down list in the upper right corner of the software. The entire workflow is shown in Figure 1. [Figure 1 screenshot: the RNA-Seq workflow, listing Import and manage samples, Add sample attributes, Choose sample ID column, Alignments per read, Analyze Known Genes (mRNA quantification, Differential expression analysis, Alternative splicing analysis, Create gene list), Allele Specific Analysis (Detect Single Nucleotide Variations), Visualization (Plot chromosome view, Cluster based on significant genes), Gene set analysis, and Pathway]
35. s Select the genome build against which your data was aligned. For this tutorial data, choose hg19, since the data was aligned to the reference genome hg19. • Select OK. This will open the BAM Sample Manager window (Figure 4). The Bam Sample Manager window shows the files to be imported. The Manage sequence names option allows you to check or modify the chromosome name mapping. This is not an issue for human samples, but it may cause problems with esoteric organisms if the chromosome names used by the aligner do not match the chromosome names in the genome annotations. It is also possible to add the samples to a separate experiment, rather than adding them to the existing experiment, by using the Add new experiment button. Samples may be removed from the experiment with Remove selected samples; samples must first be selected by clicking on the row in the list of samples. [Figure 4: Bam Sample Manager window, listing brain_fa.bam, heart_fa.bam, liver_fa.bam, and muscle_fa.bam.] • In this tutorial, the individual file names are short, but in some cases the names may be much longer. Assigning shorter or more informative names will lead to clearer labels and legends later in the workflow. To change the names, select Manage samples to invoke the Assign files to samples dialog (Figure 5). • The path to the file is shown, and the Sample ID is the filename by default
36. [Figure 29: Viewing the transcript results in the Genome Viewer. The track list includes RefSeq Transcripts (hg19), Isoform Proportion, Legend: Base Colors, the Bam Profile tracks for brain_fa, heart_fa, liver_fa, and muscle_fa, Cytoband (hg19), and Genomic Label.] The viewer is almost identical to that seen after importing the aligned reads; the difference is the inclusion of the Isoform Proportions track (highlighted in the Tracks list in Figure 29). Next, you are going to view a single gene, SLC25A3, to understand how the PGS Genome Browser can help you visualize differential expression and alternative splicing results. • Type SLC25A3 in the Search bar at the top of the window. The browser will browse to the gene (Figure 30). • The muscle, brain, heart, liver, and genomic label tracks were described earlier in the tutorial. Here, the focus is on the Isoform Proportion track to explain how those color
37. se to location to show a region within an intron of UNC45B (Figure 41, left panel). This may be a novel exon. Select row 10744 and Browse to location to show a region that starts bp after CD82 (Figure 41, right panel). This peak may represent an extended exon. [Figure 41: A region view of reads mapped to non-overlapping genes. Left panel: UNC45B; right panel: CD82. Each panel shows the RefSeq Transcripts, unexplained_regions, brain_fa, heart_fa, liver_fa, and muscle_fa tracks.] While RefSeq was used to identify overlapping features, the choice of which database to use will depend on the biological context of your experiment. For example, you may wish to utilize promoter or microRNA databases if you are interested in regulation of expression. End of Tutorial: This is the end of the RNA-Seq tutorial. If you need additi
38. [Figure 14: Visualizing reads on a chromosome level in the Genome View. Tracks shown: RefSeq Transcripts, the Bam Profile tracks for brain_fa, heart_fa, liver_fa, and muscle_fa, Cytoband (hg19), Base Colors, and Genomic Label.] The panel on the left shows the ten tracks in the viewer. The New Track button adds a new track into the viewer (Figure 15), and the Remove Track button removes the selected track from the viewer. [Figure 15: Adding a new track to the Genome Viewer. Options: Add an annotation track with genomic features from a selected annotation source; Add a track from spreadsheet (select a spreadsheet, then a track type; the list of types depends on the content of the spreadsheet); Add tracks from a list of samples (add profiles grouped by samples from every spreadsheet); Add tracks from a list of spreadsheets (add one track of the default type from selected spreadsheets); Add a track with the sequence of the reference genome; Add a track with cytobands; Other (Advanced).] The fo
39. ta both before (reads) and after (RPKM) normalization. Samples are listed one per row, with the normalized counts of the reads mapped to the genes in columns. As shown in Figure 21, you can see that there are 18,971 columns representing 18,968 genes; the remaining three columns hold the Sample ID, the number of alignments, and the Tissue attribute. The normalization is by RPKM (Mortazavi et al.). The gene_rpkm spreadsheet is particularly useful when you have biological replicates in your sample groups. You may go to View > Scatter Plot from the PGS toolbar to create a PCA plot and examine how your samples group together. For a detailed introduction to PCA, refer to Chapter 7 of the Partek User's Manual. With replicates, you would also be able to perform Differential expression analysis using ANOVA with the gene_rpkm spreadsheet. [Figure 21: Viewing the RNA Seq_result gene rpkm spreadsheet (4 rows, 18,971 columns).] The transcript_reads and transcript_rpkm spreadsheets: As above, PGS presents the transcript-level data both before and after normalization. The normalized counts of sequencing reads mapped to each transcript are listed as one sample
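The RPKM normalization named in excerpt 39 (reads per kilobase of feature per million mapped reads, after Mortazavi et al.) can be written out as a short Python sketch. The function name and the example numbers are hypothetical, and which reads PGS counts toward the per-sample total is not stated in this excerpt, so the total is treated as a plain input.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying the read count by 1e9 and dividing by both quantities.
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical gene: 100 reads, 2,000 bp long, 10 million mapped reads.
example = rpkm(read_count=100, feature_length_bp=2000,
               total_mapped_reads=10_000_000)
print(example)
```

Because the value is scaled by both feature length and sequencing depth, RPKM values can be compared across genes and across samples, which raw read counts do not allow.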
40. the dialog in Figure 9, change Attribute name to Tissue. • Rename Group_1 to Muscle and Group_2 to NOT muscle. • Drag the samples to the correct windows. The setup is shown in Figure 9. Additional groups can be added as required using the New Group button. • Select OK to proceed. [Figure 9: Create categorical attribute. Attribute name: Tissue. Group Muscle (2 samples): heart_fa, muscle_fa. Group NOT muscle (2 samples): brain_fa, liver_fa.] • When asked if you wish to add another categorical attribute, select No, and when asked if you wish to save the spreadsheet, select Yes. The attribute will now appear as a new column with the heading Tissue and the groups Muscle and NOT muscle (Figure 10). [Figure 10: Tissue, a new categorical attribute, has been added.] It is also important to ensure that the correct column is defined as the Sample ID. This is particularly important when integration of data from different experiments is desired. • From the Import secti
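The grouping set up in excerpt 40 amounts to a mapping from each sample to one level of the Tissue attribute. The sketch below builds that column in Python from the groups shown in Figure 9; the function name is hypothetical, and the duplicate check is an added safeguard rather than documented PGS behavior.

```python
def build_attribute_column(groups):
    """Turn {group name: [samples]} into {sample: group name}."""
    column = {}
    for group_name, samples in groups.items():
        for sample in samples:
            if sample in column:
                # A sample may belong to only one level of a factor.
                raise ValueError(f"{sample} assigned to more than one group")
            column[sample] = group_name
    return column


# Groups as shown in Figure 9.
tissue_groups = {
    "Muscle": ["heart_fa", "muscle_fa"],
    "NOT muscle": ["brain_fa", "liver_fa"],
}

tissue = build_attribute_column(tissue_groups)
for sample in sorted(tissue):
    print(sample, tissue[sample])
```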
41. ublic mRNA sequences (mRNAs from GenBank or RefSeq, and single pass cDNA sequences from dbEST and Trace). Ensembl Transcripts: Ensembl transcripts are based on experimental evidence, and thus the automated pipeline relies on the mRNAs and protein sequences deposited into public databases from the scientific community. Ensembl Transcripts release 62: Ensembl transcripts are based on experimental evidence, and thus the automated pipeline relies on the mRNAs and protein sequences deposited into public databases from the scientific community. Based on Release 62. Download required. Click OK to download the file. [Figure 39: Select the database to search for overlapping features.] The closest overlapping feature and the distance to it are now included (Figure 40) in the unexplained_regions spreadsheet. [Figure 40 screenshot: unexplained_regions spreadsheet with the columns Chromosome, Start, Stop, Sample ID, Length, Overlapping Features, Nearest Feature, and Distance to Nearest Feature (bps).]
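The Distance to Nearest Feature column described for the unexplained_regions sheet can be illustrated with a simple interval calculation. The sketch below returns 0 when a region overlaps a gene (for example, when it lies in an intron) and the size of the gap otherwise. The function name and coordinates are hypothetical, and the exact endpoint convention used by PGS is an assumption.

```python
def distance_to_feature(region_start, region_stop, feature_start, feature_stop):
    """Gap in base pairs between a region and a feature; 0 if they overlap."""
    if region_stop < feature_start:
        return feature_start - region_stop   # region lies before the feature
    if region_start > feature_stop:
        return region_start - feature_stop   # region lies after the feature
    return 0                                 # overlap, e.g. region in an intron


# Hypothetical gene spanning 1,000 to 5,000 bp.
gene = (1000, 5000)

inside = distance_to_feature(2000, 2100, *gene)
after = distance_to_feature(5200, 5300, *gene)
before = distance_to_feature(100, 400, *gene)
print(inside, after, before)
```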
42. ue. The sum of squares for each factor: a rough estimate of variability within the groups. Error is the variability in the data not explained by the factors included in the ANOVA (noise). The F value is always set to ratio of noise to noise. Note that in this tutorial, the overall p-value for the factor (column 4) is the same as the p-value for the linear contrast (column 5), as there are only two levels within the factor. If we had more than two groups, the overall p-value and the linear contrast p-values would most likely differ. You can also see the symbol in the ratio and fold change columns (6 and 7) for several genes that also have a low p-value; this results from zero reads in one of the groups, so ratios and fold changes cannot be calculated. For more detailed examples of setting up the ANOVA, including multiple factors and linear contrasts, please refer to the gene expression tutorials (Down's Syndrome and Breast Cancer) available from Help > On-line Tutorials. The unexplained_regions spreadsheet: The contents of this sheet will be explained in more detail in Step 8. Step 4: Use the Genome Browser. • Select the transcripts spreadsheet and then select Plot Chromosome View under the Visualization tab to view the analyzed results in Partek (Figure 29).
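The note in excerpt 42, that the overall p-value matches the contrast p-value when a factor has only two levels, follows from a standard identity: for two groups, the one-way ANOVA F statistic equals the square of the pooled-variance two-sample t statistic. The sketch below checks this numerically with hypothetical values. It is a plain-Python illustration, not the PGS implementation.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_factor = len(groups) - 1
    df_error = len(values) - len(groups)
    return (ss_factor / df_factor) / (ss_error / df_error)


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (the two-level contrast)."""
    ss = sum((v - mean(a)) ** 2 for v in a) + sum((v - mean(b)) ** 2 for v in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    std_err = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / std_err


# Hypothetical log-scale expression values for one gene.
muscle = [5.0, 7.0]
not_muscle = [1.0, 2.0]

f_value = anova_f([muscle, not_muscle])
t_value = pooled_t(muscle, not_muscle)
print(f_value, t_value ** 2)
```

With three or more levels, the F statistic pools all groups while each contrast compares only the selected levels, so the two p-values would generally differ, as the note says.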
