Analysis of RNA-Seq Data with Partek® Genomics Suite® 6.6

1. [Figure 3: Viewing the Sequence Import wizard; specify Output file and directory using Browse, Species, and Genome]

Configure the dialog as follows:
• Output file: provide a name for the top-level spreadsheet. Use the Browse button to change the output directory.
• Species: select Homo sapiens from the drop-down menu, since this data is from human subjects.
• Genome/Transcriptome reference used to align the reads: select the genome build against which your data was aligned. For this tutorial data, select hg19, since the data was aligned to the reference genome hg19.
• Select OK. This will open the BAM Sample Manager dialog box (Figure 4).

The BAM Sample Manager dialog box shows the files to be imported. The Manage sequence names option allows you to check or modify the chromosome name mapping (not an issue for human samples, but it may cause problems with esoteric organisms if the chromosome names used by the aligner do not match the chromosome names in the genome annotations). Samples may be removed from the experiment with Remove selected samples (samples must first be selected by clicking on the row in the list of samples).

[Figure 4: BAM Sample Manager window]

In this tutorial, the individual file names are short, but in some cases the names may
2. [Figure 1: The RNA-Seq workflow. Sections: Import (Import and manage samples, Add sample attributes, Choose sample ID column); QA/QC (Alignments per read); Analyze Known Genes (mRNA quantification, Differential expression analysis, Alternative splicing analysis, Create gene list, Order TaqMan Assays); Allele Specific Analysis (Detect Single Nucleotide Variations); Visualization (Plot chromosome view, Cluster based on significant genes); Biological Interpretation (Gene set analysis, Pathway analysis); Related (Analyze a Partek Flow project)]

The RNA-Seq workflow will be used throughout this tutorial to analyze RNA-Seq data. These and other commands for analyzing RNA-Seq data are also available from the command toolbar.

Step 1: Importing the aligned reads

Partek Genomics Suite software can import next-generation sequencing data that is already aligned to a reference genome. The data used for this tutorial was already aligned using the Partek Flow software. The sequence importer can handle the two standard alignment formats, BAM and SAM. Conversion from ELAND .txt files to BAM files is available via the Tools menu. Also note that if a quantification project has been created in Partek Flow, this project can also be imported and analyzed; in this scenario, invoke the workflow from the very bottom of the standard RNA-Seq workflow (Figure 1: Related > Analyze a Partek Fl
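To make the idea of "alignments per read" concrete, here is a minimal sketch that counts aligned records per read name in SAM text. It is not how Partek Genomics Suite computes its QA/QC report. It relies only on the SAM text layout (header lines start with "@", the first two tab-separated fields are the read name and the bitwise flag, and flag bit 0x4 marks an unmapped read). The function names and the tiny example records are hypothetical.

```python
from collections import Counter

def alignments_per_read(sam_lines):
    """Count aligned records per read name (QNAME) in SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # flag bit 0x4: the read is unmapped
        counts[qname] += 1
    return counts

def summarize(counts):
    """Map 'number of alignments' to 'number of reads with that many alignments'."""
    return Counter(counts.values())

# Hypothetical records: read2 has a primary and a secondary alignment, read3 is unmapped.
SAM_TEXT = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:249250621",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t256\tchr1\t900\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
])

per_read = alignments_per_read(SAM_TEXT.splitlines())
histogram = summarize(per_read)
```

Note that both mates of a paired-end read share one read name, so for paired-end data this simple count would need to separate the mates (for example, by the first-in-pair and second-in-pair flag bits).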
3. [Figure: The ANOVA results sheet, showing p-value, Mean Ratio, and Fold Change. A marker in a cell indicates that the value could not be calculated (one group has no reads).]

The format of the ANOVA spreadsheet is similar for all workflows. The description of each column may be found in Table 2.

Table 2: Interpretation of ANOVA results

p-value: The specific p-value for the linear contrast of candidate levels within the factor (muscle); in this example there are only two levels.
Mean Ratio: The ratio of the mean values of RPKMs of the two groups selected in the contrast, where Group 2 is the denominator (reference); numbers less than one imply down-regulation.
FoldChange and FoldChange Description: Linear fold change with 0 indicating NO change, negative numbers indicating down-regulation, and positive numbers indicating up-regulation for each contrast defined, and a text description.
F (Tissue): The F statistic (essentially the ratio of signal to noise; high F value = low p-value) for each factor.
SS (Tissue): The sum of squares for each factor; rough estimate of variability within the groups.
Error: the variability in the data not explained by the facto
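The relationship between the sum of squares, the F statistic, and the mean ratio can be sketched with a textbook one-way ANOVA. This is an illustration only: Partek Genomics Suite fits its own (mixed) model, and the RPKM values below are invented for the example. The function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Return (ss_factor, ss_error, f_statistic) for a dict of group name -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Sum of squares attributed to the factor (group means vs. grand mean).
    ss_factor = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Sum of squares left unexplained (values vs. their own group mean).
    ss_error = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_statistic = (ss_factor / df_factor) / (ss_error / df_error)
    return ss_factor, ss_error, f_statistic

def mean_ratio(group_1, group_2):
    """Ratio of group means, with group 2 (the reference) as the denominator."""
    return mean(group_1) / mean(group_2)

# Hypothetical RPKM values for one gene in the two tissue groups.
rpkm = {"Muscle": [10.0, 12.0], "NOT muscle": [4.0, 6.0]}
ss_tissue, ss_error, f_tissue = one_way_anova(rpkm)
ratio = mean_ratio(rpkm["Muscle"], rpkm["NOT muscle"])
```

With these numbers the factor explains most of the variation (a large F), and the ratio is above one because the Muscle mean exceeds the reference mean.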
4. It is also possible to zoom directly to a specific gene of interest using Partek Genome Viewer. For example, type MYC into the search box at the top of the window, and the viewer will show just the MYC gene in the RefSeq track and the aligned reads (Figure 16 A). To return to the whole-chromosome view, simply select [icon].

The following is a detailed description of each track in the default view.

RefSeq Transcripts (+): The RefSeq Transcripts track shows all genes encoded on the forward strand of chromosome 1. This experiment uses RefSeqGene, which defines genomic sequences of well-characterized genes, as the reference annotation track. Mouse over a particular region in this track, and all genes within this region are shown in the information bar (visible in the top right of Figure 14). Zoom into this track to see individual genes, including alternative isoforms. Zooming in on one track automatically zooms all other visible tracks; thus, you can now see the reads that mapped to this particular gene across all samples.

RefSeq Transcripts (-): The RefSeq Transcripts track shows all the genes encoded on the reverse strand of chromosome 1.

Legend: Base Colors: This track indicates the color coding for individual bases. Although included in the default view, the individual bases are only visible once zoomed into a region of interest.

Bam Profile (muscle_fa, brain_fa, heart_fa, and liver_fa): These tracks show all the reads that ma
5. [Figure 20: Viewing the RNA-Seq_results read summary spreadsheet]

The gene_reads and gene_rpkm spreadsheets

Partek Genomics Suite software presents the gene-level data both before (reads) and after (RPKM) normalization. Samples are listed one per row, with the normalized counts of the reads mapped to the genes in columns. As shown in Figure 21, there are 19,969 columns representing 19,966 genes. If you see a different number of columns (genes), this is because a different annotation database was selected in the RNA-Seq Quantification dialog box (Figure 18). The normalization is by RPKM (Mortazavi et al., 2008). Using a different mRNA database will result in a different number of genes.

The gene_rpkm spreadsheet is particularly useful when you have biological replicates in your sample groups. You may go to View > Scatter Plot from the toolbar to create a PCA plot and examine how your samples group together. For a detailed introduction to PCA, please refer to Chapter 7 of the Partek User's Manual. With replicates, you would also be able to perform Differential expression analysis using ANOVA with the gene_rpkm spreadsheet.
6. [Figure 33: Configure Criteria dialog box]

• Repeat the same steps to create a list of transcripts that are likely alternatively spliced, using the same FDR cutoff and the AltSplice p-value column. Name it Alt Splice.
• Select both lists in the right panel under Criteria while holding the Ctrl button on your keyboard, and then select Intersection from the left pane of the List Manager. Select OK. A list of 17,279 genes will be generated that includes all the genes that are both significantly differentially expressed and alternatively spliced among the four tissue samples.
• Under Manage Criteria, select Save List.
• Check the box for the intersection of the spreadsheets Diff Exp and Alt Splice and select OK. This list will now be available when you close the List Manager (Figure 34).

[Figure 34 screenshot: Partek Genomics Suite main window listing the Diff_Exp_and_Alt_Splice spreadsheet alongside the RNA-Seq_results spreadsheets]
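Conceptually, the Intersection criterion keeps only the entries that appear in both lists. A minimal sketch of that set operation follows; the transcript identifiers and the function name are hypothetical and are not taken from the tutorial data.

```python
def intersect_lists(list_a, list_b):
    """Return the sorted identifiers present in both lists."""
    return sorted(set(list_a) & set(list_b))

# Hypothetical transcript IDs passing each criterion.
diff_exp = ["TX_001", "TX_002", "TX_005", "TX_009"]
alt_splice = ["TX_002", "TX_003", "TX_009", "TX_010"]

diff_exp_and_alt_splice = intersect_lists(diff_exp, alt_splice)
```

A union ("Or") of the same two lists would instead use `set(list_a) | set(list_b)`.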
7. [Figure: List Manager dialog, Advanced tab. The left panel offers Specify criteria (Specify New Criteria), Combine criteria (Intersection "And", among others), and Manage criteria; the right panel lists the Criteria.]

• Select the Advanced tab.
• Select the Specify New Criteria button.
• In the Configure Criteria dialog box (Figure 33), provide a name for the list, e.g. Diff Exp.
• Select the transcripts spreadsheet and the p-value (DiffExpr) column.
• Set Include p-values significant with FDR of to 0.05.
• A list of 30,305 transcripts that pass these criteria will be generated. If the settings are changed, this list will automatically update. Try changing the FDR threshold to 0.01 and observe how the number of transcripts changes. Change it back to 0.05.
• Select OK.
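The effect of the FDR threshold can be sketched with the Benjamini-Hochberg step-up procedure. The tutorial does not state which FDR method the List Manager applies, so treating it as Benjamini-Hochberg is an assumption made for this example; the p-values and the function name are hypothetical.

```python
def bh_significant(pvalues, fdr=0.05):
    """Return the indices of p-values that pass Benjamini-Hochberg at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is under rank/m * FDR.
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical p-values for eight transcripts.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]

passing_at_005 = bh_significant(p_values, fdr=0.05)
passing_at_001 = bh_significant(p_values, fdr=0.01)
```

Lowering the threshold from 0.05 to 0.01 shrinks the list, which mirrors what you should observe when changing the FDR setting in the dialog.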
8. Partek Genomics Suite is Reads Per Kilobase of exon model per Million mapped reads (RPKM) (Mortazavi et al., 2008).

• Select mRNA quantification from the Analyze Known Genes section of the workflow. The RNA-Seq Quantification dialog box shown in Figure 17 will appear. Your choices for these options depend on the aims of your experiment.

In the Configure the test section (Figure 17), you are asked about Strand specificity. Your answer depends on the method used for sample preparation, as some methods preserve the strand information of the original transcript and some do not. A directional mRNA-Seq sample prep protocol only synthesizes the first strand of cDNA, whereas other methods reverse transcribe the mRNA into double-stranded cDNA. In the latter case, the sequencer reads sequences from both the forward and reverse strands but does not discriminate between them. When strand information is preserved, it is possible for paired-end sequences to come from a combination of the forward and reverse strands. If in doubt, select Auto-detect from the drop-down list.

• Select No from the Strand specificity drop-down list, because the library preparation method did not preserve the strand information.

The dialog also asks if you would like the intronic reads to be compatible with the gene in the gene-level result spreadsheets. By selecting Yes, reads that might correspond to new or extended exons will
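As a worked illustration of the RPKM definition (reads per kilobase of exon model per million mapped reads), the sketch below applies the standard formula from Mortazavi et al. (2008). The function name and the example numbers are hypothetical, and the exact read total Partek Genomics Suite uses as the denominator is not specified here.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Equivalent to: read_count / (length in kb) / (total reads in millions).
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Hypothetical example: 100 reads on a 1,000 bp transcript, 1,000,000 mapped reads.
example_rpkm = rpkm(read_count=100, feature_length_bp=1000, total_mapped_reads=1_000_000)

# The same read count on a transcript twice as long gives half the RPKM.
longer_transcript_rpkm = rpkm(100, 2000, 1_000_000)
```

Dividing by both the feature length and the library size is what makes values comparable across transcripts of different lengths and across samples sequenced to different depths.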
9. • Select row 45 and Browse to location to show a region within an intron of UNC45B (Figure 42, left panel). This may be a novel exon.
• Select row 10482 and Browse to location to show a region that starts 1 bp after CD82 (Figure 42, right panel). This peak may represent an extended exon.

[Figure 42: A region view of reads mapped to non-overlapping genes. Left panel: chr17:33476009-33480479 (UNC45B); right panel: chr11:44636577-44646913 (CD82). Tracks: RefSeq Transcripts 2014-01-03, unexplained_regions, Base Colors, brain_fa, heart_fa, liver_fa, muscle_fa.]

While RefSeq was used to identify overlapping features, the choice of which database to use will depend on the biological context of your experiment. For example, yo
10. To change the properties of column 1, right-click on the column header to invoke the contextual menu and then select Properties. In the resulting dialog (Figure 7), set Type to categorical and Attribute to factor, and then select OK.

[Figure 7: Setting column properties]

The sample names in column 1 will now be black to denote a categorical variable. Next, we will add attributes for grouping the data, i.e. into replicates or sample groups. From the workflow, select Add sample attributes, then select the Add a categorical attribute option, and then select OK (Figure 8).

[Figure 8: Adding sample attributes to describe or group samples by categories. Options: Add attributes from an existing column; Add a categorical attribute; Add a numeric attribute.]

In this tutorial we have four samples from different tissues, but to illustrate the statistical analysis options later in the workflow, we will group the tissues into two groups: muscle (muscle and heart samples) and NOT muscle (liver and brain samples). These two groups will then be compared at a later step.

• In the Create categorical attribute dialog box (Figure 9), change Attribute name to Tissue.
• Rename Group 1 to Muscle a
11. dialog shown in Figure 36 allows you to select either Fisher's Exact test or the Chi-Square test. The Chi-Square test is faster than the Fisher's Exact test but is less accurate for sparse data. The defaults for the rest of the options are acceptable. Be sure to check Invoke gene ontology browser on the result. Select Next.

[Figure 36: GO Enrichment options]

• Select Default mapping file and then select Next.
• A GO-Enrichment spreadsheet, as well as a browser (Figure 37), will be generated with the enrichment score shown for each GO term.
• Browse through the results to find a functional group of interest by examining the enrichment scores. The higher the enrichment score, the more overrepresented this functional group is in the input gene list. Alternatively, you may use the Interactive filter on the GO-Enrichment spreadsheet to identify functional groups that have low p-values and perhaps a higher percentage of genes in the group that are present.
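To illustrate the difference between the two tests, the sketch below evaluates one functional group with a one-sided Fisher's exact test (a hypergeometric tail probability) and with the Pearson chi-square statistic for the same 2x2 table. Whether Partek Genomics Suite uses a one-sided or two-sided test, and how it derives the enrichment score, is not stated in this excerpt, so those details are assumptions; the counts and function names are hypothetical.

```python
from math import comb

def fisher_exact_enrichment(k, n, K, N):
    """P(X >= k): chance of seeing at least k group members in a list of n genes,
    when K of the N background genes belong to the functional group."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def chi_square_2x2(k, n, K, N):
    """Pearson chi-square statistic (no continuity correction) for the same table."""
    a = k                  # in list, in group
    b = n - k              # in list, not in group
    c = K - k              # not in list, in group
    d = N - n - K + k      # not in list, not in group
    numerator = N * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts: 20 background genes, 5 in the group, 4 in the list, 3 in both.
p_fisher = fisher_exact_enrichment(k=3, n=4, K=5, N=20)
chi2 = chi_square_2x2(k=3, n=4, K=5, N=20)
```

With counts this small, the chi-square approximation is unreliable, which is the sparse-data situation in which the exact test is the safer choice.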
12. needed.

• Select Close to proceed.

If the files are being imported for the first time, they will be sorted in order to quickly visualize and analyze the data. The imported data will appear in a spreadsheet (Figure 6). Each sample is listed in a row, with the number of alignments displayed for each sample.

You may wish to add the samples to a separate experiment. To do so, select Import and manage samples from the workflow again. There you will see the BAM Sample Manager dialog box (Figure 4), but this time there will be an additional Add new experiment button. If you press this button, you will see the Sequence Import dialog box (Figure 2), where you can add samples to a new spreadsheet. For this tutorial, there is no need to create a new experiment, so press Cancel in the Sequence Import dialog box and return to the spreadsheet with the imported BAM files (Figure 6).

[Figure 6: Viewing the imported data in a spreadsheet]

Notice that the Sample ID names in column 1 are gray, denoting a text field, but we want to change these so that the sample names are a categorical factor that can be used in the downstream analysis.
13. no replicates (Figure 24 and Figure 25). Each row lists a separate transcript. A description of each column can be found in Table 1.

Table 1: Column descriptions in the transcripts spreadsheet

Chromosome, Start, Stop: Genomic location of the transcript.
Strand, Transcript, Gene: Information about the transcript and gene symbol; + and - indicate that the transcript is transcribed from the forward or reverse strand, respectively; the transcript name is determined by the transcriptome database selected (NCBI mRNA ID in this case).
chisq and p-value (DiffExpr): In the absence of replicates, differences in expression between samples are detected using a log-likelihood test at the transcript level. To sort by ascending p-value, right-click the column header and select Sort Ascending. This will show the most significant differentially expressed transcript at the top of the spreadsheet.
chisq and p-value (AltSplice): In the absence of replicates, splice variants are detected using a Pearson chi-square test at the gene level. Sorting by ascending p-value shows genes with the greatest probability of expressing different RNA isoforms between samples at the top of the spreadsheet.
Transcript Length
Read counts (raw) of each sample: Number of reads directly from the aligned data assigned to the transcript for each sample.
Read counts (RPKM) of each sample: Raw read counts normalized by RPKM (Mortazavi et al., 2008) assigned to
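The two statistics named in Table 1 can be sketched for a single transcript's read counts across samples. This is a generic illustration, not Partek's implementation: it assumes that, under the null hypothesis, expected counts are proportional to each sample's library size, and the counts and function names are hypothetical.

```python
from math import log

def expected_counts(observed, library_sizes):
    """Expected reads per sample if reads were spread in proportion to library size."""
    total_reads = sum(observed)
    total_library = sum(library_sizes)
    return [total_reads * size / total_library for size in library_sizes]

def g_statistic(observed, expected):
    """Log-likelihood ratio (G) statistic; zero counts contribute nothing."""
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)

def pearson_chi_square(observed, expected):
    """Pearson chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical read counts for one transcript in four samples of equal depth.
observed = [10, 20, 30, 40]
expected = expected_counts(observed, library_sizes=[1, 1, 1, 1])

g_value = g_statistic(observed, expected)
chi2_value = pearson_chi_square(observed, expected)
```

Both statistics would be compared against a chi-square distribution with (number of samples - 1) degrees of freedom to obtain a p-value; for well-populated counts they are usually close to each other, as they are here.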
14. specify 5 as the number of reads.

Lastly, the option Report exon-level results determines whether you would like to get the exon-level expression. If checked, you will get the exon_reads and exon_rpkm spreadsheets.

[Figure 17: RNA-Seq Quantification options. Settings shown: Strand specificity; In the gene-level result, report intronic reads as compatible with the gene (Yes/No); Require strict paired-end compatibility (Yes/No); Report results with no reads from any sample (Yes/No); Report unexplained regions with more than ___ reads; Report exon-level results; Result file: RNA-Seq_results.]

• In this tutorial, you will use an mRNA database, RefSeq. Expand the mRNA section and then select RefSeq Transcripts 2014-01-03. If you would like to get the exon-level results, ensure that you check Report exon-level results. In this tutorial, we select to report exon-level results (Figure 18). Select OK to continue.

Dialog text (Figure 18): Specify a database of genomic features to quantify. Download required: click OK to download the file RefSeq Transcripts 2014-01-03. The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrat
15. the right, as shown in Figure 27.

[Figure 27: The ANOVA dialog]

If the ANOVA were now performed without contrasts, a p-value for differential expression would be calculated, but it would only indicate whether there are differences within the factor Tissue; it will not inform you which groups are different or give any information on the magnitude of the change between groups (fold change or ratio). To get this more specific information, you need to define linear contrasts.

• Select the Contrast button (circled in Figure 27). For Select Factor/Interaction, Tissue will be the only factor available, as it was the only factor included in the ANOVA model in the previous step; if multiple factors were included, they could be selected here (top red circle).
• To define a contrast between the two candidate levels, Muscle and NOT muscle, select and move them as shown in Figure 28. The bottom group is the reference or control group. By defining this contrast, you will produce a specific p-value and a fold change with the reference group as the denominator in the lin
16. the transcript for each sample.
Junction RPKM: If the aligner that was used to align this data supported junction reads, these columns (one per sample) will show the normalized (RPKM) reads assigned to junction reads for this transcript.
Incompatible RPKM: Paired reads only; normalized unique paired reads (RPKM) that intersect with the transcript but are considered incompatible because the mate is not found in the same transcript.

[Figure 24 screenshot: first part of the transcripts spreadsheet, columns 1-15: Chromosome, Start, Stop, Strand, Transcript, Gene, chisq (DiffExpr), p-value (DiffExpr), chisq (AltSplice), p-value (AltSplice), Transcript Length, and the brain_fa, heart_fa, liver_fa, and muscle_fa Reads]
17. [Figure 40: Select the database to search for overlapping features. Dialog text: click OK to download the file RefSeq Transcripts 2014-01-03. The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.]

The closest overlapping feature and the distance to it are now included (Figure 41) in the unexplained_regions spreadsheet.

[Figure 41 screenshot: unexplained_regions spreadsheet with columns Chromosome, Start, Stop, Sample ID, Length, Average Coverage, Overlapping Features, Nearest Feature, and Distance to Nearest Feature (bps)]
18. [Figure 22: Viewing the RNA-Seq_result transcript rpkm spreadsheet]

The exon_reads and exon_rpkm spreadsheets

A further level of detail is available in the exon-level data, presented as raw and normalized (RPKM) counts, just like the gene-level and the transcript-level data. The normalized counts of sequencing reads mapped to each exon are listed with one sample per row and transcript IDs in columns. As shown in Figure 23, there are 244,532 columns representing 244,529 annotated exons. If you have biological replicates in your sample groups and want to do differential expression analysis on the exon level, this is the spreadsheet you would use. The same analysis (PCA and ANOVA) can be done on this spreadsheet as described above for the RNA-Seq_result gene rpkm spreadsheet.

[Figure 23: Viewing the RNA-Seq_result exon rpkm spreadsheet]

The transcripts spreadsheet

The transcripts spreadsheet details the analysis results of RNA-Seq if there
19. [Figure 30: Viewing the transcript results in the Genome Viewer]

The viewer is almost identical to that seen after importing the aligned reads; the difference is the inclusion of the Isoform Proportion track (highlighted in the Tracks list in Figure 30).

Next, you are going to view a single gene, SLC25A3, to understand how the Partek Genome Viewer can help you visualize differential expression and alternative splicing results.

• Type SLC25A3 in the Search bar at the top of the window and hit Enter. The browser will browse to the gene (Figure 31).
• The muscle, brain, heart, liver, and genomic label tracks were described earlier in the tutorial. Here the focus is on the Isoform Proportion track, to explain how those color-coded tracks help to visualize differential expressio
20. [Figure 24: Viewing the first part of the RNA-Seq_result transcripts spreadsheet]

[Figure 25: Viewing the second part of the RNA-Seq_result transcripts spreadsheet. Columns 16-27: RPKM, Junction RPKM, and Incompatible RPKM for brain_fa, heart_fa, liver_fa, and muscle_fa.]

It is possible to derive basic information about differential expression and alternative splicing between your samples, even if you don't have replicates, from the RNA-Seq_result transcripts spreadsheet using simple chi-squared or log-likelihood tests, since each sample is represented only once and the null hypothesis is that the transcripts are evenly distributed across all samples. However, the power of Partek Genomics Suite software resides in the implementation of a mixed-model ANOVA that can ha
21. [Figure 41: The unexplained_regions spreadsheet showing regions mapped to the closest genomic features]

The description of each column in the unexplained_regions spreadsheet is shown in Table 3.

Table 3: Description of annotated columns in unexplained regions

Chromosome, Start, Stop: Genomic location of the region containing the reads.
Sample ID: Sample that contains the reads mapped to this region.
Length: Length of the region in base pairs.
Average Coverage: Average read coverage in the region.
Overlapping Features: Section of the nearest gene that overlaps with the region (intron, starts before or after a gene, region contained in a gene, gene contained within the region, region overlaps with a gene).
Nearest Feature: Name of the nearest feature and strand (+ or -).
Distance to Nearest Feature (bps): Distance of the detected region to the closest gene. If the detected region is mapped to the intron of a gene, the distance is shown as 0.

Right-clicking on a row header and selecting Browse to Location will show the reads mapped to the chromosome. For this tutorial, a couple of genes are selected to show regions that are located after a known gene or in the intron of a gene.

• With the unexplained_regions spreadsheet open, right-click on Average Coverage (column 6) and select Sort Descending.
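The Nearest Feature and Distance to Nearest Feature columns can be pictured as a simple interval comparison. The sketch below is an illustration under assumed conventions (inclusive coordinates on a single chromosome, distance 0 for any overlap); it is not Partek's algorithm, and the gene names, coordinates, and function name are hypothetical.

```python
def nearest_feature(region_start, region_stop, features):
    """Return (name, distance_bp) of the closest feature, or None if there are none.

    features is a list of (name, start, stop) tuples on the same chromosome.
    A region that overlaps a feature (for example, one inside an intron)
    gets a distance of 0.
    """
    best = None
    for name, start, stop in features:
        if region_stop < start:        # region lies entirely before the feature
            distance = start - region_stop
        elif region_start > stop:      # region lies entirely after the feature
            distance = region_start - stop
        else:                          # region and feature overlap
            distance = 0
        if best is None or distance < best[1]:
            best = (name, distance)
    return best

# Hypothetical gene annotations on one chromosome.
genes = [("GENE_A", 200, 900), ("GENE_B", 5000, 6000)]

after_gene = nearest_feature(901, 1000, genes)    # starts 1 bp after GENE_A
inside_gene = nearest_feature(300, 400, genes)    # falls within GENE_A
no_annotation = nearest_feature(300, 400, [])     # nothing to compare against
```

A linear scan is enough for a sketch; for genome-scale annotation sets, sorted intervals with binary search or an interval tree would be the usual choice.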
22. [Figure 21: Viewing the RNA-Seq_result gene rpkm spreadsheet]

The transcript_reads and transcript_rpkm spreadsheets

As above, Partek Genomics Suite software presents the transcript-level data both before and after normalization. The normalized counts of sequencing reads mapped to each transcript are listed with one sample per row and transcript IDs in columns. As shown in Figure 22, there are 39,542 columns representing 39,539 annotated transcripts. The normalization is by RPKM (Mortazavi et al., 2008). If you have biological replicates in your sample groups and want to do differential expression analysis on the transcript level, this is the spreadsheet you would use. Similarly, if you have biological replicates in your sample groups and want to perform Alternative splicing analysis, this is the spreadsheet to use as input. The same analysis (PCA and ANOVA) can be done on this spreadsheet as described above for the RNA-Seq_result gene rpkm spreadsheet.
23. Analysis of RNA-Seq Data with Partek Genomics Suite 6.6 Software

Overview

RNA-Seq is a high-throughput sequencing technology used to generate information about a sample's RNA content. Partek Genomics Suite offers convenient visualization and analysis of the high volumes of data generated by RNA-Seq experiments. This tutorial will illustrate how to:
• Import large next-gen data sets
• Add attribute data to your files
• Visualize large next-gen data sets
• Obtain read counts for each of the transcripts in a database
• Find transcripts that are differentially expressed among phenotypes
• Find genes that are alternatively spliced among phenotypes
• Set up a basic analysis of variance (ANOVA) model
• Detect nucleotide variations across samples or in comparison to a reference genome
• Find non-annotated regions and map them to the genome

Note: the workflow described below is enabled in Partek Genomics Suite version 6.6 software. Please fill out the form at www.partek.com/PartekSupport to request this version, or use the Help > Check for Updates command to check whether you have the latest released version. The screenshots shown within this tutorial may vary across platforms and across different versions of Partek Genomics Suite.

Description of the Data Set

In this tutorial, you will analyze an RNA-Seq experiment using the Partek Genomics Suite software RNA-Seq workflow. The data used in this tutorial was generated from mRNA extracted fro
24. [Figure 37: Viewing the Gene Ontology Browser. Bar chart of Enrichment Score for the biological_process functional groups; other views available: Pie Chart, Gene List, Forest Plot.]

Step 7: Detect Single Nucleotide Variations

To learn how to detect nucleotide differences and annotate SNP calls with overlapping genes, refer to the Analysis of Nucleot
25. any firewall/proxy settings and save it to your default library location. You may also notice a dialog flash as the cytoband file is automatically downloaded.

The Partek Genome Viewer window will open with chromosome 1 displayed (Figure 14). You may choose other chromosomes from the drop-down menu (circled) to change which chromosome is displayed. You may also type a search term, e.g. a gene symbol or transcript ID, directly into the circled box.

[Figure 14: Visualizing reads on a chromosome level in the Genome View. The viewer shows the RefSeq Transcripts 2014-01-03 (hg19) annotation track, Bam Profile tracks for brain_fa, heart_fa, liver_fa and muscle_fa, and the Genome Sequence, Cytoband and Genomic Label tracks for chromosome 1.]

The panel on the left shows th
26. be considered part of the gene in the gene-level spreadsheets, and the RPKM calculation will include the intron length in the transcript length.

• For In the gene-level result, report intronic reads as compatible with the gene, select No.

You might want to Require strict paired-end compatibility, meaning that two alignments from the same read must map to the same transcript to be considered compatible. If you select No, a paired-end read will be compatible with any transcript it overlaps.

• As the data set used for this tutorial consists of single-end reads, select No for Require strict paired-end compatibility.

The next option, Report results with no reads from any sample, determines whether the results spreadsheets include transcripts or genes that have no reads from any sample. Selecting Yes will include all the genes/transcripts in the transcriptome, even if there are no reads for that transcript/gene from any sample.

• As this tutorial is not concerned with genes that are not expressed in any sample, select No for Report results with no reads from any sample.

The option Report unexplained regions with more than ___ reads determines whether reads that are mapped to the genome but not to any transcript are reported on a separate spreadsheet (unexplained_regions). If checked, you must also specify the minimum number of reads that must be present before the region is reported.

• Make sure Report unexplained regions with more than ___ reads is checked and
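The effect of the intronic-reads option on RPKM can be illustrated with a short Python sketch. It uses the standard RPKM definition (reads per kilobase of feature length per million mapped reads, Mortazavi et al. 2008) and is not Partek's implementation; the function name and every number below are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical gene: 2,000 bp of exons and 8,000 bp of introns.
exon_bp, intron_bp = 2_000, 8_000
total_reads = 10_000_000
exonic_reads, intronic_reads = 500, 100

# Intronic reads NOT counted as compatible: only exonic reads and exon length.
exon_only = rpkm(exonic_reads, exon_bp, total_reads)

# Intronic reads counted as compatible: reads and length both include introns.
with_introns = rpkm(exonic_reads + intronic_reads, exon_bp + intron_bp, total_reads)

print(f"exons only: {exon_only:.1f} RPKM, with introns: {with_introns:.1f} RPKM")
```

In this made-up case, including the introns lowers the gene-level value because the added length outweighs the added reads; with a different read distribution the effect could go the other way.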
27. be much longer. Assigning shorter or more informative names will lead to clearer labels and legends later in the workflow. To change the names, select Manage samples to invoke the Assign files to samples dialog box (Figure 5). The path to the file is shown, and the Sample ID is the filename by default. In this example, change the first sample to Brain (circled area).

[Figure 5: Using the Assign files to samples dialog to give informative sample names. The dialog lists one BAM file from the 4TissueBamFiles directory for each Sample ID: Brain, heart_fa, liver_fa and muscle_fa.]

• For this tutorial, the default names are suitable, so select Cancel to proceed.

However, if you have data from one sample which is split into two or more BAM files, you can also use Manage samples to merge these reads into one sample. Additionally, you may use Add samples or Remove selected samples if
28. e ten tracks in the viewer. The New Track button allows addition of a new track into the viewer (Figure 15), and the Remove Track button removes the selected track from the viewer.

[Figure 15: Adding a new track to the Genome Viewer. The Track Wizard offers these options: add an annotation track; add a track from a spreadsheet (select a spreadsheet, then a track type; the list of types depends on the content of the spreadsheet); add tracks from a list of samples (profiles grouped by samples from every spreadsheet); add tracks from a list of spreadsheets (one track of the default type from the selected spreadsheets); add a track with the sequence of the reference genome; add a track with cytobands; Other (Advanced).]

Using the Genome Viewer

In the Genome Viewer, toolbar icons switch between selection mode and navigation mode. In navigation mode, left-click and draw a box around a region of interest to zoom in for a detailed view of the reads mapped to that region. Alternatively, in the bottom right-hand corner of the browser, drag the slider left to right or click the plus button to zoom in. Drag right to left or click the minus button to zoom back out.
29. ear fold change calculation.

• Select Add Contrast (circled).
• Select OK to return to the ANOVA dialog, and select OK again to perform the ANOVA.

[Figure 28: Define linear contrasts. The Configure dialog for the gene_rpkm spreadsheet lists the candidate levels Muscle and NOT muscle, the Add Contrast Level and Remove Contrast Level buttons, and the question of whether the data is already log transformed.]

Once the ANOVA has been performed on each gene in the dataset, an ANOVA child spreadsheet, ANOVA-1way (ANOVAResults), will appear under the gene_rpkm spreadsheet (Figure 29).

[Figure 29: ANOVA results spreadsheet with one row for each gene. The recoverable columns are the gene symbol, two p-value columns, the ratio and fold change for Muscle vs. NOT muscle, and a fold-change description such as "Muscle down vs NOT muscle" or "No change". Example rows pair ratios of 0.136816, 26.873, 0.243243 and 0.368362 with fold-change magnitudes of 7.30908, 26.873, 4.11112 and 2.71472.]
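The ratio and fold-change values recoverable from Figure 29 appear to follow a simple rule: ratios of 1 or more are reported unchanged, while ratios below 1 are reported as the reciprocal. The Python sketch below assumes that convention and further assumes that down-regulation is shown with a negative sign, which is suggested by the "Muscle down" descriptions but not legible in the extracted figure. The function name is hypothetical.

```python
def linear_fold_change(ratio):
    """Convert a ratio (group A / group B) to a signed linear fold change.

    Assumed convention: ratios >= 1 are returned as-is; ratios < 1 are
    returned as the negative reciprocal, so 0.5 becomes -2.0.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive; it is undefined for zero reads")
    return ratio if ratio >= 1 else -1.0 / ratio


# Ratios taken from the Figure 29 rows that could be read.
for ratio in (0.136816, 26.873, 0.243243, 0.368362):
    print(f"ratio {ratio:>9} -> fold change {linear_fold_change(ratio):.5f}")
```

The check for non-positive ratios mirrors the situation where one group has zero reads for a gene, in which case neither a ratio nor a fold change can be calculated.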
30. [Figure 18: Specifying a transcriptome database. The dialog lists the available annotation sources (including RefSeq Transcripts 2014-04-29) and the test configuration: Strand specificity; In the gene-level result, report intronic reads as compatible with the gene (Yes/No); Require strict paired-end compatibility (Yes/No); Report results with no reads from any sample (Yes/No); Report unexplained regions with more than 5 reads; Result file RNA-Seq_results; Describe results.]

Reads will now be assigned to individual transcripts of a gene based on the Expectation-Maximization (E/M) algorithm (Xing et al., 2006). In Partek Genomics Suite software, the E/M algorithm is modified to accept paired-end reads, junction-aligned reads and multiple-aligned reads if these are present in your data. For a detailed description of the E/M algorithm, refer to the RNA-Seq white paper (Help > On-line Tutorials > White Papers).

• Select OK to perform the RNA-Seq analysis, which will generate a mapping summary spreadsheet and several other spreadsheets containing the analyzed results (Figure 19).
31. enome. This is dependent on the options used during the alignment process. For other data and alignment options, one might observe more than one alignment per read. In column 2, there are single-end reads with zero alignments per read reported, because the BAM files also contain all the reads that were not aligned during the alignment process.

Step 2: Visualization

Once imported, it is possible to visualize the mapped reads along with gene annotation information and cytobands.

• Select the parent spreadsheet (MyRNASeqProject in this example).
• Select Plot Chromosome View in the Visualization section of the workflow.

Unless you have previously downloaded an annotation file during another experiment, you will be prompted to select an annotation source (Figure 13).

[Screenshot: the No default annotation dialog. It states "You don't appear to have a recognized default annotation database for this genome build (hg19). Select an annotation source." and lists RefSeq Transcripts and Ensembl Transcripts releases, each with a short description.]
32. [Figure 19: Viewing the list of results spreadsheets after detecting differential expression and alternative splicing. The spreadsheets listed under RNA-Seq_results are exon_reads, exon_rpkm, gene_reads, gene_rpkm, mapping_summary, transcript_reads, transcript_rpkm, transcripts and unexplained_regions.]

The same dialog box that appears if you select Describe results will pop up if you have not previously turned this off. This dialog box explains the spreadsheets generated by mRNA quantification.

The mapping_summary spreadsheet

The mapping_summary spreadsheet (Figure 20) is a summary of reads that have been mapped to the intronic, exonic or intergenic regions. This is a QA/QC step to give you a brief idea of how the reads are distributed across the transcriptome. You should expect the results of replicates to resemble each other with respect to the distribution of reads.

[Figure 20: The mapping_summary spreadsheet (truncated in this copy). For each sample it lists the alignment count, the Tissue group and four percentage columns, beginning with the percentage of reads which fully overlap an exon and the percentage of reads which partially overlap an exon. Recoverable rows: brain_fa, 10984232 alignments, NOT muscle: 62.7474, 1.56686, 5.67301, 30.0127; heart_fa, 11199653 alignments, Muscle: 49.2889, 1.75096, 6.87757, 42.0826.]
33. ide Variations in NGS Data with Partek Genomics Suite 6.6 tutorial, available from Help > On-line Tutorials > Next Generation Sequencing.

Step 8: Detecting Unexplained Regions

During Step 3, a spreadsheet named unexplained_regions was generated (Figure 38). This spreadsheet contains locations where reads map to the genome but are not annotated by the transcript database (RefSeqGene in this case). The spreadsheet can be sorted by descending Average Coverage (column 6). This spreadsheet is potentially very interesting, as it may contain novel findings.

[Figure 38: The unexplained regions spreadsheet. Its columns are Chromosome, Start, an end coordinate, Sample ID, Length and Average Coverage. Rows that can be read include: M, 9030, 9108, heart_fa, 78, 50019.8; M, 6336, 6531, heart_fa, 195, 40096.4; 19, 24184074, 24184165, heart_fa, 91, 38736.7.]

Partek Genomics Suite software can look for genes that overlap the regions. It locates reads not only outside of known genes, but also uses the exon information from the database to locate reads in the intron of a ge
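The structure of the unexplained_regions spreadsheet can be mirrored in a few lines of Python, for example to reproduce the descending sort on Average Coverage outside the software. This is a minimal sketch: the class and field names are hypothetical, the three rows are the ones that could be read from Figure 38, and the relation Length = end - start is inferred from those rows rather than documented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnexplainedRegion:
    chromosome: str
    start: int
    stop: int
    sample_id: str
    average_coverage: float

    @property
    def length(self) -> int:
        # Inferred from the figure: 9108 - 9030 = 78, 6531 - 6336 = 195, ...
        return self.stop - self.start


regions = [
    UnexplainedRegion("M", 6336, 6531, "heart_fa", 40096.4),
    UnexplainedRegion("19", 24184074, 24184165, "heart_fa", 38736.7),
    UnexplainedRegion("M", 9030, 9108, "heart_fa", 50019.8),
]

# Equivalent of sorting the spreadsheet by descending Average Coverage.
ranked = sorted(regions, key=lambda r: r.average_coverage, reverse=True)

for r in ranked:
    print(r.chromosome, r.start, r.stop, r.sample_id, r.length, r.average_coverage)
```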
34. lumn.

• Use the drop-down menu to select the Sample ID column; the sample names will be shown (Figure 11).
• Select OK.

[Figure 11: Specifying the correct Sample ID column. The dialog text reads: "The sample ID column is required for integrated analysis; using the filename is not recommended. The specified sample IDs must match the sample IDs from the spreadsheet with which you want to integrate. Sample IDs are case sensitive. If you don't have a column that you want to use as Sample ID, please use Add sample attributes from the workflow to add a Sample ID column." The preview lists brain_fa, heart_fa, liver_fa and muscle_fa.]

• For quality assessment, select Alignments per read from the QA/QC portion of the workflow.

After analyzing the four samples, a new child spreadsheet named Alignment_Counts is created (Figure 12).

[Figure 12: Alignment Counts spreadsheet.]

In column 3, one can see that all imported aligned reads from Figure 6 align exactly once to the human g
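To make the idea behind an alignments-per-read summary concrete, the Python sketch below tallies how many reads have zero, one or several alignments in SAM-formatted text. It is not how Partek Genomics Suite computes the Alignment_Counts spreadsheet; it relies only on the SAM format itself (header lines start with "@", column 1 is the read name, column 2 is the FLAG, and FLAG bit 0x4 marks an unmapped read). The function name and the example records are hypothetical, and the sketch assumes single-end reads.

```python
from collections import Counter

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def alignments_per_read(sam_lines):
    """Return a histogram {number of alignments: number of reads}."""
    per_read = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        per_read.setdefault(name, 0)  # unmapped reads stay at zero
        if not flag & UNMAPPED:
            per_read[name] += 1
    return Counter(per_read.values())


# Hypothetical single-end records: read1 aligns once, read2 is unaligned,
# read3 has a primary and a secondary (FLAG 256) alignment.
example = [
    "@SQ\tSN:chr1\tLN:249250621",
    "read1\t0\tchr1\t100\t255\t36M\t*\t0\t0\t*\t*",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read3\t0\tchr1\t500\t1\t36M\t*\t0\t0\t*\t*",
    "read3\t256\tchr1\t900\t1\t36M\t*\t0\t0\t*\t*",
]

histogram = alignments_per_read(example)
for n_alignments in sorted(histogram):
    print(f"{n_alignments} alignment(s): {histogram[n_alignments]} read(s)")
```

Reads with zero alignments appear in the tally only because unaligned records are kept in the file, which matches the explanation given for the zero-alignment column in the tutorial data.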
35. lusive exons (MXEs). The reads mapped to the transcripts of this gene in each of the tissue samples are shown in the browser as different tracks. The relative abundances of the individual transcripts of this gene are shown by the height of the color-coded tracks. Note that the transcript NM_213611 has low expression, while transcripts NM_005888 and NM_002635 have higher expression. The alternative splicing pattern matches the one described in the paper, indicating that different forms of the third exon are used in a tissue-specific manner.

Using the Track panel

At this point, you may find it useful to start altering track properties. Each track can be individually configured to alter the visualization of the reads. For example:

• Select a track and drag it to change the position of the track in the viewer.
• Select the brain track, and a configuration panel will appear at the bottom of the track panel, enabling the configuration of the sequence reads display, the Y-axis, the color and the labels of the track.
• Select a track and increase its height.

Step 5: Create a Gene List

To create a list of transcripts that are both significantly differentially expressed AND alternatively spliced among the four tissue samples, use the Create gene list function from the workflow to invoke the List Manager dialog box (Figure 32). Each of the tabs (Venn Diagram, ANOVA Streamlined and Advanced) can be used to generate combinations of lists.
36. m four diverse human tissues (skeletal muscle, brain, heart and liver) from different donors and sequenced on the Illumina Genome Analyzer. The single-end mRNA-Seq reads were mapped to the human genome (hg19), allowing up to two mismatches, using Partek Flow alignment and the default alignment options. The output files of Partek Flow are BAM files, which can be imported directly into Partek Genomics Suite 6.6 software. BAM or SAM files from other alignment programs like ELAND (CASAVA), Bowtie, BWA or TopHat are also supported. This same workflow will also work for aligned reads from any sequencing platform in the aligned BAM or SAM file formats.

Data and associated files for this tutorial can be downloaded by going to Help > On-line Tutorials from the Partek Genomics Suite main menu. The data can also be downloaded directly from http://www.partek.com/Tutorials/NextGen/4TissueBamFiles.zip. Once the zipped data directory has been downloaded to your local drive, right-click the file and select Extract All. Select the directory you wish to work in and select Extract. The data files are now unzipped and you are ready to proceed with the tutorial.

Workflow

Open the RNA-Seq workflow within Partek Genomics Suite software by selecting it from the Workflow drop-down list in the upper right corner of the screen. The entire workflow is shown in Figure 1.
37. [Figure 13: Selecting an annotation source. The dialog lists RefSeq Transcripts releases (including 2013-09-03) and Ensembl Transcripts releases 72, 73 and 74 (built from the GRCh37 GTF files on the Ensembl FTP site), each with a short description, plus the options Do not download any file at this time and Don't show this dialog again.]

• For this tutorial, select RefSeq Transcripts 2014-01-03.

Note: the date beside RefSeq Transcripts is the release date of the annotation database. Partek Genomics Suite software will download the relevant file subject to
38. n and alternative splicing. The reads that are mapped to a certain tissue and the proportion of the transcript expressed in this tissue are colored the same. In this screenshot, brain is colored red, heart is colored green, liver is colored blue and muscle is colored brown.

[Figure 31: Viewing one gene (SLC25A3) in the Genome Viewer. The view covers chr12:98987403-98995779 and shows the RefSeq Transcripts 2014-01-03 (hg19) annotation, an isoform proportion track, and Bam Profile tracks for brain_fa, heart_fa, liver_fa and muscle_fa.]

SLC25A3 was reported by Wang et al. (Nature, 2008) to have mutually exc
39. nd Group 2 to NOT muscle.

• Select and drag the samples from the Unassigned box to the correct group. The setup is shown in Figure 9. Additional groups can be added as required using the New Group button.
• Select OK to proceed.

[Figure 9: Create categorical attribute. The Muscle group contains heart_fa and muscle_fa; the NOT muscle group contains brain_fa and liver_fa.]

• When asked if you wish to add another categorical attribute, select No, and when asked if you wish to save the spreadsheet, select Yes.

The attribute will now appear as a new column with the heading Tissue and the groups Muscle and NOT muscle (Figure 10).

[Figure 10: Tissue, a new categorical attribute, has been added. The spreadsheet lists the Sample ID, Tissue and Number of Alignments for each sample: brain_fa, 10984232; heart_fa, 11199653; liver_fa, 11583902; and a fourth row with 14486541.]

It is also important to ensure that the correct column is defined as the Sample ID. This is particularly important when integration of data from different experiments is desired. From the Import section of the workflow, select Choose sample ID co
40. ndle unbalanced and incomplete datasets, nested designs, numerical and categorical variables, any number of factors, and flexible linear contrasts when you do have biological replicates.

Detecting differential expression using the Partek ANOVA

During import, you created a categorical attribute called Tissue and assigned the four samples to either the Muscle or NOT muscle groups. This step created replicates within each group; the grouping is somewhat artificial and is used in this dataset simply to illustrate the ANOVA. Replicates are a prerequisite for analysis of differential expression using the ANOVA test.

• Select Differential expression analysis from the Analyze Known Genes section of the workflow. You have the choice of analyzing at either the Gene or Transcript level. Select Gene-level analysis (Figure 26).
• Make sure the gene_rpkm spreadsheet is selected as the Spreadsheet.
• Select OK to open the ANOVA dialog box.

[Figure 26: Differential expression analysis. The dialog asks first for the type of result to analyze (Gene-level, Transcript-level or Exon-level) and then for the spreadsheet (gene_rpkm under RNA-Seq_results).]

• Available factors are listed in the Experimental Factor(s) panel on the left. Move Tissue to the ANOVA Factor(s) panel on
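Why replicates are a prerequisite becomes clear when the one-way ANOVA F statistic is written out: the denominator is the within-group variance, which needs at least one group with more than one sample. The following Python sketch computes a textbook one-way F statistic for a single gene; it is not Partek's ANOVA implementation, and the function name and expression values are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Textbook one-way ANOVA F statistic for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Between-group sum of squares: variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: variation of samples around their group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if df_between <= 0 or df_within <= 0 or ss_within == 0:
        raise ValueError("F is undefined: replicates with within-group variation are required")

    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical expression values for one gene, two samples per Tissue group.
muscle = [1.0, 2.0]      # e.g. heart_fa, muscle_fa
not_muscle = [3.0, 4.0]  # e.g. brain_fa, liver_fa

f_value = one_way_anova_f([muscle, not_muscle])
print(f"F = {f_value:.2f}")
```

With one sample per group the within-group degrees of freedom would be zero, and the function raises an error instead of returning a value, which is the situation the artificial Muscle / NOT muscle grouping is designed to avoid.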
41. ne. This is helpful for finding potential novel transcripts, exons, sequencing biases, etc.

• Go to the Tools > Find Overlapping Genes option in the command toolbar.
• In the resulting dialog box (Figure 39), select Add a new column with the gene nearest to the region and then OK.

[Figure 39: Find Overlapping Genes. The dialog asks for a method for annotating regions with genomic features; the alternative option is Create a new spreadsheet with genes that overlap with the regions.]

• Specify the database you wish to use. In this example, select RefSeq Transcripts 2014-01-03 (Figure 40). Please note that it is recommended that you annotate with the same database that you used for mRNA quantification.
• Select OK.

[Screenshot: the Output Overlapping Features dialog, listing RefSeq Transcripts releases (including 2013-05-10 and 2013-09-03) marked "Download required. Click OK to download the file."]
42. ow project.

• Under Import from the RNA-Seq workflow, select Import and manage samples. The Sequence Import dialog box will open (Figure 2).

[Figure 2: Viewing the Sequence Importer window. Choose the files you want to import. The dialog shows the 4TissueBamFiles directory under Partek Training Data, containing brain_fa.bam, heart_fa.bam, liver_fa.bam and muscle_fa.bam.]

• Files of type: select BAM Files (*.bam) from the drop-down list. Browse to the folder where you have stored the BAM files. Select the files to import by checking the box to the left of the data files. For this tutorial, select brain_fa, heart_fa, liver_fa and muscle_fa.
• Select OK, and the next Sequence Import dialog box will be opened, as shown in Figure 3.
43. pped to chromosome one from the four tissue samples. The y-axis numbers on the left side of the tracks indicate the raw read counts. The aligned reads are shown in the Genome Viewer in each track, with a different color for each Bam Profile track.

Genome Sequence, Cytoband and Genomic Label

The Genome Sequence, Cytoband and Genomic Label tracks are shown at the bottom of the panel. When zoomed in sufficiently, Genome Sequence will display the bases of the specified reference genome, with a different color and label for each base. The other two tracks are helpful for navigating about the chromosome.

[Figure 16: Zooming to look at a gene of interest (MYC). The view covers positions 128748315 to 128753681 on chromosome 8 and shows the RefSeq Transcripts 2014-01-03 annotation, the four sample tracks and the hg19 reference.]

Step 3: Analyze Known Genes

The next step in the workflow is to detect differentially expressed genes by performing mRNA quantification. This function creates data tables at the transcript and gene levels and identifies those transcripts that are differentially expressed or spliced across all samples. The raw and normalized reads are also reported for each sample. The normalization method used by
44. rs included in the ANOVA (noise). The F value is always set to 1 (ratio of noise to noise).

Note that in this tutorial, the overall p-value for the factor (column 4) is the same as the p-value for the linear contrast (column 5), as there are only two levels within the factor. If we had more than two groups, the overall p-value and the linear contrast p-values would most likely differ. You can also see a symbol in place of a value in the ratio and fold-change columns (6 and 7) for several genes that also have a low p-value. These genes have zero reads in one of the groups, so ratios and fold changes cannot be calculated.

For more detailed examples of setting up the ANOVA, including multiple factors and linear contrasts, please refer to the gene expression tutorials (Down's Syndrome and Breast Cancer) available from Help > On-line Tutorials.

The unexplained_regions spreadsheet

The contents of this sheet will be explained in more detail in Step 8.

Step 4: Use the Genome Browser

• Select the transcripts spreadsheet and then select Plot Chromosome View under the Visualization tab to view the analyzed results in the Partek Genome Viewer (Figure 30).

[Screenshot: the Partek Genome Viewer with Bam Profile tracks for brain_fa, heart_fa, liver_fa and muscle_fa, plus the Cytoband (hg19), Genomic Label and RefSeq Transcripts tracks.]
45. u may wish to utilize promoter or microRNA databases if you are interested in regulation of expression.

End of Tutorial

This is the end of the RNA-Seq tutorial. If you need additional assistance with this data set, contact the Partek Technical Support staff at +1-314-878-2329 or email us at support@partek.com.

Date last updated: August 2015

Copyright 2015 by Partek Incorporated. All Rights Reserved. Reproduction of this material without express written consent from Partek Incorporated is strictly prohibited.

References

Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 2008; 5: 621-8.

Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P. & Burge, C.B. Alternative isoform regulation in human tissue transcriptomes. Nature 2008; 456: 470-6.

Xing, Y., Yu, T., Wu, Y.N., Roy, M., Kim, J. & Lee, C. An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs. Nucleic Acids Res 2006; 34: 3150-3160.
46. [Figure 34: A list of the differentially expressed and alternatively spliced genes is now available for downstream analysis.]

The list of differentially expressed and alternatively spliced transcripts will be used in the next step, Biological Interpretation, for GO Enrichment analysis.

Step 6: Biological Interpretation

GO Enrichment

With the GO Enrichment feature in Partek Genomics Suite software, you can take a list of significantly expressed genes/transcripts and find GO terms that are significantly enriched within the list. For a detailed introduction to GO Enrichment, refer to the GO Enrichment User Guide (Help > On-line Tutorials > User Guides).

• Select Gene set analysis in the Biological Interpretation section of the workflow.
• Select GO Enrichment and select Next. Choose the spreadsheet that was just created (Figure 35) and select Next.

[Figure 35: Selecting the spreadsheet for GO Enrichment. The Gene Set Analysis dialog lists the spreadsheet Diff_Exp_and_Alt_Splice.]

• The Gene Set Analysis
