5500 Series SOLiD™ Systems RNA Analysis User Guide (PN

Contents

1. Saturation of reads
Because every cell type or tissue expresses only a subset of the total transcriptome, mapping to or detection of 100% of a reference is not seen using RNA from a single cell type. Instead, the following saturation characteristics are observed (see Supplemental data: saturation of reads mapping to a reference, on page 23, for examples with human RNA analysis). As the number of mappable reads in a sequencing run increases, the fraction of an RNA reference that is detected increases until a plateau is reached. After the plateau is reached (saturation is achieved), the rate of detecting additional reference transcripts, genes, or other genomic regions of interest with additional reads is much lower. The number of reads needed to reach the plateau depends on a number of factors, including the RNA sample used, the size of the genome, the percent of the genome transcribed into RNA, the regions of experimental interest (for example, transcripts, exons, or splice junctions), and the threshold used for calling an element detected. In general, greater read depth is required to saturate detection of splice junctions than to saturate detection of exons, because the region spanning a splice junction is a subset of the corresponding exons. Likewise, more reads are required to detect a single exon than its corresponding transcript, if the transcript contains multiple exons. Given these considerations, there are no stric
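The saturation behavior described in this excerpt can be explored on pilot data by subsampling mapped reads and counting how many reference elements pass the detection threshold at each depth. The following is a minimal sketch, not a LifeScope Software feature: the function name, the input format (one reference identifier per uniquely mapped read), and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def saturation_curve(read_targets, n_reference, depths, threshold=1, seed=0):
    """Return (depth, fraction_detected) pairs for nested random subsamples.

    read_targets: one reference ID (transcript, exon, junction) per mapped read.
    n_reference:  total number of elements in the reference.
    threshold:    reads required to call an element detected.
    """
    rng = random.Random(seed)
    shuffled = list(read_targets)
    rng.shuffle(shuffled)              # one shuffle, then growing prefixes

    counts = Counter()
    detected = 0
    position = 0
    curve = []
    for depth in sorted(depths):
        depth = min(depth, len(shuffled))
        for target in shuffled[position:depth]:
            counts[target] += 1
            if counts[target] == threshold:   # element just crossed the threshold
                detected += 1
        position = depth
        curve.append((depth, detected / n_reference))
    return curve


# Toy data (hypothetical): 3 expressed elements out of a 4-element reference.
toy_reads = ["a"] * 5 + ["b"] * 2 + ["c"]
for depth, fraction in saturation_curve(toy_reads, n_reference=4, depths=[2, 4, 8]):
    print(depth, fraction)
```

Because each subsample is a prefix of the same shuffled list, the curve never decreases, and a higher threshold shifts it toward greater depths, which matches the qualitative behavior the text describes.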
2. Related documentation
For a complete list of guides for the 5500 Series SOLiD Systems, see the 5500 Series SOLiD Systems User Documentation Quick Reference (Publication no. 4465102). Search by publication number at www.appliedbiosystems.com.

Document (Publication no.)
• SOLiD Total RNA-Seq Kit Protocol (4452437)
• SOLiD SAGE Kit with Barcoding Adaptor Module Guide (4456596)
• ERCC RNA Spike-In Control Mixes User Guide (4455352)
• 5500 Series SOLiD Sequencers Reagents and Consumables Ordering Information Quick Reference (4465650)

Obtaining support
For the latest services and support information for all locations, go to www.appliedbiosystems.com. At the Applied Biosystems web site, you can:
• Access worldwide telephone and fax numbers to contact Applied Biosystems Technical Support and Sales facilities
• Search through frequently asked questions (FAQs)
• Submit a question directly to Technical Support
• Order Applied Biosystems user documents, MSDSs, certificates of analysis, and other related documents
• Download PDF documents
• Obtain information about customer training
• Download software updates and patches

APPENDIX B Supplemental information
Criteria
• Discrete rRNA bands t
3. [Figure residue: plot of predicted junctions and detected transcripts; axis values not recoverable]

Supplemental data: saturation of reads mapping to a reference

Whole transcriptome analysis
In the example shown in Figure 3, a whole transcriptome library was prepared from poly(A) Human Brain Reference RNA using the SOLiD Total RNA-Seq Kit and sequenced in the SOLiD System. A total of 50 × 10^6 mapped reads was used for analysis. The curves represent thresholds of 1 (red line) and 3 (blue line) reads uniquely mapping to RefSeq. In this experiment, the fraction of known RefSeq sequences detected begins to plateau at about 20 × 10^6 uniquely mapped reads. The more stringent the calling criteria (higher threshold), the more mapped reads are needed to achieve the same fraction of RefSeq hits.

[Figure 3. SOLiD whole transcriptome analysis: detection of human RefSeq. Percent RefSeq detected (n = 29,476) versus uniquely mapped reads, at thresholds of 1 read and 3 reads]

Small RNA analysis
In the example shown in Figure 4, small RNA libraries were prepared from small RNA-containing human RNA from 10 tissues using the SOLiD Small RNA Expression Kit (this kit has been replaced by the SOLiD Total RNA-Seq Kit), sequ
4. See also splicing, alternative splicing.
exome: The compilation of annotated exon sequences in a genome.
F3 tag: Sequencing data derived from the P1 end of the template in the SOLiD templated bead using forward ligation chemistry. The F3 tag is generated using the SOLiD FWD1 Seq Primers or the SOLiD Small RNA Seq Primers. See the 5500 Series SOLiD Sequencers Reagents and Consumables Ordering Information Quick Reference (Part no. 4465650) for an illustration.
F5 tag: Sequencing data derived from the P2 end of the template in the SOLiD templated bead using reverse ligation chemistry. The F5 tag is generated using the SOLiD REV1 DNA Seq Primers or the SOLiD REV1 RNA Seq Primers. See the 5500 Series SOLiD Sequencers Reagents and Consumables Ordering Information Quick Reference (Part no. 4465650) for an illustration.
flowchip: The microfluidics chamber housing six lanes in which SOLiD templated beads are deposited and through which SOLiD sequencing reagents flow during a sequencing run.

Glossary terms listed on this page: coverage, dbSNP, deletion, discovery, downstream/upstream, epigenomics, Exact Call Chemistry (ECC), exon, exome, F3 tag, F5 tag, flowchip.

A region of the flowchip upon which a single sequencing sample is loaded. Each 5500 Series SOLiD Sequencer flowchip contains six lanes.
Sequencing reads in the direction from the P1 to P2 sequence using forward lig
5. libraries can be combined and sequenced in one lane for multiplex sequencing. Table 9 in Appendix B lists example calculations for determining the number of libraries that can be accommodated in a single flowchip lane.

Advantages of multiplex sequencing
• Multiplex sequencing optimizes use of reagents and time on the sequencer.
• Multiplex sequencing lends itself to the strategy of sequencing multiple libraries from biological replicates, which can give more statistical strength for discovery and gene expression analysis than a single library (Auer and Doerge, 2010). For example, sequencing 4 libraries from biological replicates instead of one library at 4X the number of reads may give higher confidence in differential gene expression experiments. See also Sequencing depth, replicates, and differential gene expression, on page 12.
In general, multiplex sequencing enables greater flexibility in experimental design than does singleplex sequencing.

Maintaining color balance with multiplex sequencing
Multiplex sequencing must maintain color balance. Color balance is the relative proportion of beads that are called as each of the four colors in a given cycle. The SOLiD 3′ Primers in the SOLiD RNA Barcoding Kit Modules are optimized to maintain color balance within consecutive groups of 4 primers. To preserve color balance during multiplex sequencing, each flowchip lane must have at least one set of
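The lane-capacity reasoning above (how many barcoded libraries fit in one flowchip lane, and whether at least one full group of 4 color-balanced barcodes is present) can be sketched as simple arithmetic. This is an illustrative sketch only: it does not reproduce Table 9, the function name is hypothetical, and the lane capacity used in the example is a made-up number, not a 5500 Series specification.

```python
def libraries_per_lane(lane_capacity_reads, reads_per_library, barcode_set_size=4):
    """Estimate how many barcoded libraries fit in one flowchip lane.

    lane_capacity_reads: mapped reads expected from one lane (assumed input).
    reads_per_library:   mapped reads required for each library.
    Returns (n_libraries, has_full_barcode_set), where the flag reports whether
    at least one complete group of color-balanced barcodes fits in the lane.
    """
    if lane_capacity_reads < 0 or reads_per_library <= 0:
        raise ValueError("capacity must be >= 0 and reads_per_library must be > 0")
    n_libraries = lane_capacity_reads // reads_per_library
    return n_libraries, n_libraries >= barcode_set_size


# Hypothetical lane yield and per-library requirement, for illustration only.
n, balanced = libraries_per_lane(lane_capacity_reads=60_000_000,
                                 reads_per_library=5_000_000)
print(n, balanced)
```

When the computed count falls below 4, one option consistent with the text is to add biological replicates until a full barcode group is present, accepting fewer reads per library.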
6. primers are available in the SOLiD RNA Barcoding Kits. See Multiplex sequencing on page 13 and Table 9 in Appendix B for further information.

For libraries prepared using the SOLiD Total RNA-Seq Kit, it is important to quantitate the libraries as described in the SOLiD Total RNA-Seq Kit Protocol before proceeding with templated bead preparation. The SOLiD Total RNA-Seq Kit procedure, which uses the Agilent 2100 Bioanalyzer to determine the molar concentration of the library, gives the most reliable and consistent estimates of RNA-Seq library quantity. For the SOLiD SAGE Kit and other library preparation kits, use the SOLiD Library TaqMan Quantitation Kit and real-time PCR as recommended.

Whole transcriptome libraries: using ERCC RNA Spike-In Control Mixes
We recommend including one of the ERCC RNA Spike-In Control Mixes during preparation of a whole transcriptome library. The ERCC RNA Spike-In Control Mixes are 2 sets of 92 polyadenylated RNAs, 250-2000 nt in length, that are transcribed from a set of NIST-certified plasmids (External RNA Controls Consortium, 2005). Each Spike-In Mix is preformulated at defined quantities spanning a 10^6-fold concentration range, with defined Mix 1:Mix 2 transcript molar concentration ratios. When Spike-In Mixes are added at known concentrati
7. [Table 2, continued. Columns: suggested read direction(s) and read length(s); sequencing primers (F3 tag, F5-RNA tag); insert size]

Whole transcriptome analysis (continued)
75 nt, reverse 35 nt: recommended for
• Discovery of novel transcripts
• Mapping all transcripts, coding and non-coding, in a region
• SNP discovery
Paired-end, forward 75 nt, reverse 35 nt: highly recommended for
• Detection of fusion transcripts
• Discovery and mapping of translocations and fusion transcripts
• Detection of alternative splicing
Sequencing primers: SOLiD FWD1 Seq Primers (F3 tag); SOLiD REV1 RNA Seq Primers (F5-RNA tag)
Insert: Inserts are derived from fragmentation of RNA along the entire transcript.

Small RNA analysis (SOLiD Total RNA-Seq Kit, small RNA procedure)
Forward only; read length as appropriate for expected size of the small RNA††
Sequencing primers: SOLiD Small RNA Seq Primers
RNA-derived sequence:
• 16-27 nt
• Insert encompasses entire RNA molecule

SAGE analysis (SOLiD SAGE Kit‡)
Forward only; read length as appropriate for the size of the SAGE tags††
Sequencing primers: SOLiD Small RNA Seq Primers
RNA-derived sequence:
• 27 nt
• Insert is derived from 3′ end of transcript

† For sequencing reads through the cDNA insert only; for all library types, use the FWD2 primers for barcode reads.
‡ See Reagents and kits for RNA analysis on page 17 for ordering information.
§ Does not include P1, P2, IA, or BC sequences.
†† Acquiring reads longer than the insert size allows confirmation that the entire insert has been sequenced, but may require trimming of the adaptor sequ
8. RNA Isolation Kit (AM1830)
• MagMAX-96 for Microarrays Total RNA Isolation Kit (AM1839)
• MagMAX-96 Blood RNA Isolation Kit (AM1837)
• TRIzol Plus RNA Purification System (12183555)
• PureLink RNA Mini Kit (12183020, 12183018A)

Poly(A) selection or rRNA depletion of total RNA
• Poly(A)Purist Kit (AM1916)
• MicroPoly(A)Purist Kit (AM1919)
• Poly(A)Purist MAG Kit (AM1922)
• mRNA Catcher PLUS Kit (K157002)
• RiboMinus Eukaryote Kit for RNA-Seq (A10837)
• RiboMinus Plant Kit for RNA-Seq (A10838)

Small RNA analysis
• mirVana miRNA Isolation Kit (AM1560)
• mirVana PARIS Kit (AM1556)
• RecoverAll Total Nucleic Acid Isolation Kit for FFPE (AM1975)

Description
• Isolation of small RNA from mammalian cells and tissues, plant tissues, yeast, and bacteria
• Isolation of total RNA from animal and plant cells and tissue, bacteria, and yeast

Description
• Single kit for preparation of whole transcriptome or small RNA libraries that maintain genomic strand information for mapping
• Kit for preparation of SOLiD libraries of cDNA tags of 3′ ends of RNAs. For use with a SOLiD RNA Barcoding Kit.
• 3′ PCR Primers with unique barcode sequences for incorporation of barcodes into SOLiD cDNA libraries. For use with the SOLiD Total RNA-Seq Kit or SOLiD SAGE Kit with Barcoding Adaptor Module.
• Sets of syntheti
9. Sequencing depth, replicates, and differential gene expression: Summary

Table 3. SOLiD System analysis of human RNA: suggested number of mapped reads
Columns: type of analysis (human RNA); suggested number of mapped reads†

Whole transcriptome analysis
>50 × 10^6 uniquely mapped reads gives good coverage of human RefSeq at 1 RPKM‡
• Detection of novel transcripts
• Mapping all transcripts, coding and non-coding, in a region
>100 × 10^6 reads to detect rare events at 1 RPKM:
• Alternative splicing
• Fusion transcripts
• Variant transcripts
• SNPs in RNA
• Allele-specific gene expression

Small RNA analysis
• 3-4 × 10^6 mapped reads: detection of known miRNAs
• 10-30 × 10^6 mapped reads: discovery of new small RNAs or iso-miRs

SAGE analysis
• 2-5 × 10^6 mapped reads

† See Appendix B for examples of the experiments used to derive the information in this table.
‡ RPKM: reads per kilobase transcript length per million mapped reads; see Appendix B.

Multiplex sequencing
The 5500 Series SOLiD Sequencer flowchip has 6 lanes that can be individually programmed for a sequencing run. For certain applications, the bead capacity of the flowchip lane is several-fold higher than the number of beads required to acquire all the necessary data. In these cases, multiple barcoded libraries of the same type (for example, all small RNA libraries or all SAGE
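The RPKM unit defined in the Table 3 footnote (reads per kilobase of transcript length per million mapped reads) can be written out as a short calculation. The sketch below is illustrative; the function names are hypothetical, and the second helper simply inverts the same definition to estimate how many reads a transcript at a given RPKM would be expected to receive at a given sequencing depth.

```python
def rpkm(reads_on_transcript, transcript_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript length per million mapped reads."""
    if transcript_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # reads / (length / 1e3) / (total / 1e6)  ==  reads * 1e9 / (length * total)
    return reads_on_transcript * 1e9 / (transcript_length_nt * total_mapped_reads)


def expected_reads(rpkm_value, transcript_length_nt, total_mapped_reads):
    """Invert the RPKM definition: reads expected on one transcript."""
    return rpkm_value * transcript_length_nt * total_mapped_reads / 1e9


# Example inputs chosen for illustration: a 1,000-nt transcript at 1 RPKM
# in a run with 50 million mapped reads.
print(expected_reads(1.0, 1_000, 50_000_000))
```

This makes the link between depth and detection explicit: at a fixed RPKM, the number of reads landing on a transcript scales linearly with both transcript length and total mapped reads.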
10. [Figure: read positions on whole transcriptome and small RNA or SAGE library inserts; recoverable labels include F3, FWD2, BC, RNA-derived sequence, and P2]

Sequencing strategy: read direction(s) and length
Generally, we recommend the following:
• Acquire the longest reads that are allowed by the insert size (see Table 2) and supported by the SOLiD platform.
• For whole transcriptome analysis, we recommend paired-end sequencing, which optimizes the potential for discovery of novel transcripts, fusion transcripts, and alternative splice junctions, even if that is not the primary objective. See Supplemental data: paired-end sequencing for detection of fusion transcripts, on page 22. However, if the purpose of the experiment is only to determine relative levels of known RNAs, then only a single forward read is needed.

Table 2. 5500 Series SOLiD System sequencing strategies and primers for RNA analysis
Columns: analysis strategy (library preparation method†); suggested read direction(s) and read length(s); sequencing primers (F3 tag, F5-RNA tag); insert size

Whole transcriptome analysis (SOLiD Total RNA-Seq Kit‡, whole transcriptome procedure)
RNA-derived sequence§:
• 110 to 200 nt
Forward only, 75 nt: suitable for
• Expression profiling of coding and non-coding RNA
• SNP discovery
Sequencing primers: SOLiD FWD1 Seq Primers
Forward only, 75 nt: OK; paired-end forward SOLID
11. [Table 1, continued. Columns: analysis strategy; applications†]

Whole transcriptome analysis: Global sequence analysis of coding and non-coding RNA transcripts along their entire length.
Discovery§
• all transcripts, coding and non-coding, on a genomic or regional scale
• Discovery of novel transcripts, coding and non-coding
• Discovery and mapping of translocations and fusion transcripts
• Detection of alternative splicing events
• Discovery of alternative transcription start and stop sites
Counting and discovery§
• Expression profiling of coding and non-coding RNA
• Discovery of SNPs and allele-specific expression

Small RNA analysis: Sequence analysis of small RNA species, generally 16-27 nt in length.
Counting and discovery
• Analysis of small RNA expression patterns
• Detection of length and 3′ and 5′ sequence isoforms
• Discovery of novel small RNAs

SAGE analysis (Serial Analysis of Gene Expression): Expression profiling by counting short sequencing reads generated specifically from the 3′ ends of RNA.
Counting§
• Profiling expression of known transcripts in one or more samples relative to another
• Profiling alternative poly(A) site usage in known transcripts

† The applications in this table are not an exhaustive or all-inclusive list.
‡ Detailed information about library preparation is found in the user guide or manual for each kit.
§ The terms counting or discovery are sometimes used for applications whose main fo
12. ††† Each flowchip lane must include at least one full set of 4 color-balanced barcodes. The last set can be incomplete.

Bibliography
Auer PL and Doerge RW. 2010. Statistical Design and Analysis of RNA Sequencing Data. Genetics 185:405-416.
Breu H. 2010. A Theoretical Understanding of 2 Base Color Codes and Its Application to Annotation, Error Detection, and Error Correction. Applied Biosystems white paper. Publication no. 139WP01-02. Search by publication number at www.appliedbiosystems.com.
Costa V, Angelini C, De Feis I, Ciccodicola A. 2010. Uncovering the complexity of transcriptomes with RNA-Seq. J Biomed Biotech, Article ID 853916. doi:10.1155/2010/853916.
External RNA Controls Consortium. 2005. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6:150. Available at www.biomedcentral.com/1471-2164/6/150.
Kassahn K, Waddell N, Grimmond SN. 2011. Sequencing transcriptomes in toto. The Royal Society of Chemistry. DOI: 10.1039/c0ib00062k.
Life Technologies white paper. 2011. SOLiD System accuracy with the Exact Call Chemistry module. Publication no. CO31266. Search for CO31266 at www.appliedbiosystems.com.
Marguerat S and Bahler J. 2010. RNA Seq from technology to b
13. 1 on page 7). Isolation of high-quality, intact RNA is critical for preparing SOLiD System cDNA libraries that are representative of the cellular RNA population.
• Table 5 in Appendix A lists RNA isolation kits suggested for different sample types and applications.
• It is essential to use best practices for handling RNA, from RNA isolation through cDNA library preparation. For more information, see Ambion Technical Bulletin 159, Working with RNA, available at www.invitrogen.com/workingwithrna.
We strongly recommend evaluating the quality of your RNA samples before proceeding with SOLiD System cDNA library preparation. Table 7 in Appendix B lists common criteria for evaluating RNA quality.

Library preparation
Table 1 lists kits optimized for preparation of SOLiD System cDNA libraries. All kits enable preparation of a library that is ready to enter the SOLiD System workflow at the templated bead preparation step. In the SOLiD Total RNA-Seq Kit procedures, the RNA is first ligated to SOLiD-specific adaptor oligonucleotides (this step preserves the strandedness of the RNA in the library) and then reverse-transcribed to cDNA. In the SOLiD SAGE Kit procedure, the RNA is first reverse-transcribed and then ligated to the SOLiD adaptor oligonucleotides. For multiplex sequencing, barcodes can be incorporated with barcoded 3′ PCR primers during the cDNA amplification step of either library preparation procedure. Barcoded
14. Glossary terms listed on the preceding page: NA library, SNP, splicing, splice junction, strandedness.

tag: Tag is used in two ways:
• Sequencing data from a single bead with a single primer set; sometimes used interchangeably with read.
• A length of DNA or cDNA to be sequenced, especially a relatively short stretch of DNA or cDNA that is used to infer information about the longer native molecule from which it is derived, such as in mate-paired library sequencing and SAGE analysis, respectively.
tags (BC, F3, F5, R3): Sequencing data derived from specific locations on the SOLiD templated bead. See the 5500 Series SOLiD Sequencers Reagents and Consumables Ordering Information Quick Reference (Part no. 4465650) for an illustration.
templated bead preparation: Process of clonally amplifying template strands on beads by emulsion PCR, enriching the beads to remove beads without template, then modifying the 3′ end of the template on the beads to prepare for bead deposition and sequencing.
templated bead: A single SOLiD P1 DNA Bead with a clonal population of templates for sequencing. Sometimes called clonal bead.
transcriptome: The compilation of all transcribed sequences from a genome, both coding and non-coding.
uniquely mapped read: A read that is mapped only once in a genome with a given number of mismatches.
variant: A difference in the nucleotide sequence of interest with respect to the reference sequence.
whole transcriptome analysis: Global sequence analysis of coding and non-coding RNA transcripts along their entire length.
whole transcriptome library: A SOLiD System comp
15. 4 color-balanced barcoded libraries. Further information about constructing color-balanced barcoded cDNA libraries is provided in the protocols for the SOLiD Total RNA-Seq Kit, the SOLiD RNA Barcoding Kit Modules, and the SOLiD SAGE Kit with Barcoding Adaptor Module.

Experimental design: putting it all together

Table 4. Examples of 5500 Series SOLiD System sequencing for human RNA analysis
Columns: looking at; analysis strategy; RNA sample†; sequencing strategy and mapped reads; libraries/lane‡ required

Looking at: Changes in expression levels of known transcripts during a treatment or condition
• Whole transcriptome (counting); rRNA-depleted total RNA or poly(A) RNA; forward only, 50 nt, 0x10 mapped reads; 1 library/lane
• SAGE (counting); total RNA or poly(A) RNA; forward only, 25 nt, 2-5 × 10^6 mapped reads; 12-33 libraries/lane

Looking at: Changes in gene expression of known or unknown transcripts during a treatment or condition; looking for alternative splicing or promoter or terminator usage
• Whole transcriptome (mostly counting, with option of discovery); rRNA-depleted total RNA; forward 75 nt, reverse 35 nt, >50 × 10^6 mapped reads; 1 library/lane

Looking at: Discovery of novel transcripts and features of transcripts such as fusion transcripts d
• Whole transcriptome (discovery); rRNA-depleted total RNA or total RNA; forward 75 nt, reverse 35 nt, 100 × 10^6 mapped reads; 1 library/lane
16. USER GUIDE
Applied Biosystems by Life Technologies

5500 Series SOLiD Systems: RNA ANALYSIS
Experimental design and analysis strategies for RNA-Seq applications
Publication Part Number 4460318 Rev. A
Revision Date April 2011

Workflow: design experiment > prepare libraries > prepare beads > run sequencer > analyze data

For Research Use Only. Not intended for any animal or human therapeutic or diagnostic use.

The products in this User Guide may be covered by one or more Limited Use Label License(s). By use of these products, the purchaser accepts the terms and conditions of all applicable Limited Use Label Licenses. These products are sold for research use only, and are not intended for human or animal diagnostic or therapeutic uses unless otherwise specifically indicated in the applicable product documentation or the respective Limited Use Label License(s). For information on obtaining additional rights, please contact outlicensing@lifetech.com or Out Licensing, Life Technologies, 5791 Van Allen Way, Carlsbad, California 92008.

TRADEMARKS
The trademarks mentioned herein are the property of Life Technologies Corporation or their respective owners. NanoDrop is a registered trademark of NanoDrop Technologies. SAGE is a trademark of Genzyme Corporation. TaqMan is a registered trademark of Roche Molecular Systems, Inc. Trizol is a registered trademark of Molecular Resear
17. aneously sequenced in a single flowchip lane. Each bead is assigned to the correct library after the sequencing run according to the sequence of its barcode.
Non-coding RNA: RNA that has not been shown to encode a protein. Many ncRNAs have been shown to have profound effects on the levels of proteins in the cell that are derived from coding RNAs.
Sequencing runs that acquire sequence from each end of the insert in a DNA fragment or whole transcriptome library, using both forward and reverse reads.
A genetic variant in a population of individuals that may or may not be associated with an observable phenotypic trait.
Barcoded libraries that are combined before templated bead preparation, to then be deposited in a single flowchip lane for multiplex sequencing.
In the SOLiD System, the set of primers that are used sequentially to initiate ligation sequence chemistry.
R3 tag: The R3 tag applies only to mate-paired libraries: sequencing data derived from the mate-pair tag closer to the P2 end of the SOLiD templated bead using forward ligation chemistry. The R3 tag initiates in the IA sequence using the SOLiD 2 Seq Primers. See the 5500 Series SOLiD Sequencers Reagents and Consumables Ordering Information Quick Reference (Part no. 4465650) for an illustration.
Sequencing data from a single bead with a single primer set.
A sequence against which sequencing reads are aligned.
A multi-organism database archive of DNA, RNA, and prot
18. at enables identification of the library during multiplex sequencing.
BC tag: Sequencing data derived from the barcode region of the SOLiD templated bead using forward ligation chemistry. The BC tag is generated using the SOLiD FWD2 Seq Primers. See the 5500 Series SOLiD Sequencers Reagents and Consumables Ordering Information Quick Reference (Part no. 4465650) for an illustration.
breakpoint: In genomic or gene expression analysis, the junction(s) of structural variations such as inversions, deletions, or insertions.
calling threshold (call stringency): The criteria for calling a genetic variant or novel RNA present in the biological sample.
color balance: The relative proportion of beads in a given sequencing cycle that are called as each of the four colors.
counting:
• In genomics, sequence analysis that generates read or tag counts for annotated regions of a reference sequence.
• In transcriptomics, expression analysis that focuses on relative or absolute quantification of RNA molecules in a biological specimen.

Glossary terms listed on this page: alignment, allele, alternative splicing, annotated gene, annotation, barcode, barcoded library, BC tag, breakpoint, calling threshold (call stringency), color balance, counting.

• In genomic analysis, the number of aligned or mapped sequencing reads that span a position in the reference genome.
• In RNA analysis, this term is sometimes used to describe the fraction of the reference sequence that has sequencing reads aligne
19. atible library that is prepared from total RNA, rRNA-depleted total RNA, or poly(A) RNA that enables sequence analysis of the transcripts along their entire length.
WTA: See whole transcriptome analysis.

Glossary terms listed on this page: tag; tags (BC, F3, F5, R3); templated bead preparation; templated bead; transcriptome; uniquely mapped read; variant; whole transcriptome analysis; whole transcriptome library; WTA.

Headquarters
5791 Van Allen Way, Carlsbad, CA 92008 USA
Phone 1 760 603 7200, Toll Free in USA 800 955 6288
For support visit www.appliedbiosystems.com/support
www.lifetechnologies.com
20. ation, and they are necessary to give statistical power in experiments designed to detect differential expression caused by the treatment or condition.
• The size of the fold change in expression that you are trying to detect.
• The number of reads. More reads (deeper sequencing) give greater detection sensitivity.

How do these factors affect experimental planning? Generally, if the estimated number of reads that is required to give adequate detection sensitivity (LLD) is well under the capacity of your sequencing run(s), consider including more biological replicates rather than deeper sequencing. This approach gives more power to your experiment for detection of differential gene expression. For further information, see Auer and Doerge (2010).

Number of reads required
Table 3 summarizes guidelines developed at Life Technologies for human RNA analysis, based on our current understanding of the human transcriptome (see Appendix B for supporting data). It is expected that the number of reads scales with the size of the genome or transcriptome; however, we recommend empirically determining the number actually required by your experimental needs by running pilot experiments in your experimental system, similar to those presented in Appendix B. It is always a good idea to generate more rather than less data in pilot experiments.

Estimating the lower limit of detection using ERCC Spike-In Mixes
21. ation chemistry.
A section of the genome that maps to an exon from one gene followed by an exon from another gene. It can occur as the result of a translocation, deletion, or chromosomal inversion. A gene fusion junction excludes exon-to-exon boundaries that arise from alternative splicing of a transcript.
Global analysis of the genome to discern elements involved in regulation of gene activity or expression, with an emphasis on genetic variation such as single-nucleotide polymorphisms, small and large insertions and deletions, and other structural variants such as translocations and inversions. Some use the term genomics as an umbrella term that includes transcriptomics, epigenomics, and analysis of the genome.
An RNA molecule that results from transcription of a gene fusion. See also gene fusion.
The physical size of the genomic DNA segments or RNA molecules represented in a SOLiD library:
• Fragment libraries: the size of the sheared DNA fragments
• Mate-paired libraries: the length of the genomic DNA segment spanned by the corresponding mate-pair tags
• Whole transcriptome libraries: the size of the RNA fragments
An insertion of nucleotide sequence with respect to the reference genome or sequence.
The internal adaptor sequence is incorporated into the template during library construction and provides a common hybridization target for SOLiD sequencing primers. See the 5500 Series SOLiD Sequencers Reagents and Consumables Ord
22. bout RNA-Seq
SOLiD System for RNA analysis
Purpose of this guide
RNA analysis workflow

[Figure: RNA analysis workflow]
1. Design the RNA analysis experiment: whole transcriptome analysis, small RNA analysis, or SAGE analysis.
2. Isolate RNA:
• Whole transcriptome analysis: total RNA, rRNA-depleted total RNA, or poly(A) RNA
• Small RNA analysis: total RNA containing small RNA, enriched small RNA, or purified small RNA
• SAGE analysis: total RNA, rRNA-depleted total RNA, or poly(A) RNA
3. Prepare a SOLiD cDNA library (RNA-derived cDNA; BC = barcode sequence; IA = internal adaptor sequence).
4. Prepare templated beads.
5. Run the SOLiD Sequencer.
6. Analyze data with LifeScope Software tools.

Overview of experimental design
Good experimental design, driven by the biological question and your experimental system, is essential to effectively applying SOLiD System technology to RNA analysis. The following sections cover key considerations for RNA expression analysis on the SOLiD System. It is always a good idea to perform pilot experiments to confirm that your plan will generate the amount of sequence data required for your experimental goal.
• Analysis strategy, RNA isolation, and library preparation (page 7)
• Sequencing strategy: read direction(s) and length (page 9)
• Number of reads required (page 11)
23. c RNA transcripts that are added to purified RNA samples before whole transcriptome library preparation for SOLiD System performance assessment.

[Table 5, continued. Columns: kit; part no.†]
• PureLink miRNA Isolation Kit (K157001)
SAGE analysis
• TRIzol Plus RNA Purification System (12183555)
† Available at www.invitrogen.com.

Table 6. Key kits and reagents for SOLiD library preparation from RNA
Columns: kit; part no.†; application

SOLiD Total RNA-Seq Kit (4445374); optional: SOLiD RNA Barcoding Kits (see below)
Application: whole transcriptome analysis; small RNA analysis

SOLiD SAGE Kit with Barcoding Adaptor Module (4452811); SOLiD RNA Barcoding Kits (see below)
Application: SAGE analysis

SOLiD RNA Barcoding Kits
• SOLiD RNA Barcoding Kit, Modules 1-48 (4461565)
• SOLiD RNA Barcoding Kit, Modules 49-96 (4461546)
• SOLiD RNA Barcoding Kit, Modules 1-96 (4461567)
• 16-barcode modules: 1-16 (4427046), 17-32 (4453189), 33-48 (4453191), 49-64 (4456501), 65-80 (4456502), 81-96 (4456503)
Application: barcoded cDNA libraries

ERCC RNA Spike-In Control Mixes
• ERCC RNA Spike-In Mix (4456740)
• ERCC ExFold RNA Spike-In Mixes (4456739)
Application: external RNA controls

† Available at www.appliedbiosystems.com.
24. ch Center, Inc. © 2011 Life Technologies Corporation. All rights reserved. Part Number 4460318 Rev. A 04/2011

Contents
RNA analysis on the 5500 Series SOLiD System: 5
[illegible entry]: 5
Overview of experimental design: 6
Analysis strategy, RNA isolation, and library preparation: 7
Sequencing strategy: read direction(s) and length: 9
Number of reads required: 11
Multiplex sequencing: 13
Experimental design: putting it all together: 14
Data analysis: 15
[illegible entry]: 15
APPENDIX A Product information and support: 17
Reagents and kits for RNA analysis: 17
Related documentation: 19
Obtaining support: 19
APPENDIX B Supplemental information: 21
RNA sequence analysis concepts: 21
Supplemental data: paired-end sequencing for detection of fusion transcripts: 22
Supplemental data: saturation of reads mapping to a reference: 23
Supplemen
25. cus is relative expression levels or discovery of novel RNAs, respectively.
When using total RNA that includes rRNA, >60% of mapped reads may be rRNA.
The standard SOLiD SAGE Kit library preparation procedure selects poly(A) RNA from total RNA.
Analysis strategy, RNA isolation, and library preparation
In many cases, the most suitable analysis strategy for your experiment is obvious. However, if your primary purpose is gene expression profiling of known transcripts, there is overlap in the suitability of whole transcriptome and SOLiD System SAGE analysis.
Whole transcriptome analysis examines the RNA molecule along its entire length, enabling discovery of alternative splicing or start/stop sites even if the primary objective is relative expression levels. In contrast, SOLiD System SAGE analysis interrogates only a short sequence at the 3' end of each transcript, precluding analysis of transcript structure. SOLiD System SAGE analysis generates data that are less complex than whole transcriptome analysis and more like that generated by microarray analysis, but with the higher sensitivity and broader dynamic range of the SOLiD platform.
RNA isolation
The RNA isolation method is determined by your experimental system and the input requirements of the library preparation method (see Table
26. d or mapped at a certain calling threshold.
A Single Nucleotide Polymorphism database: a repository for single nucleotide polymorphisms and short insertion and deletion polymorphisms, hosted by the NCBI.
A gap in a nucleotide sequence with respect to a reference genome or sequence.
Analysis that focuses on detection of novel genetic variants or RNA species that are not already present, or are present but not annotated, in a reference database.
In transcriptomics, these terms are used with respect to the direction of transcription in a genomic segment. Upstream is to the 5' side of the mRNA; downstream is to the 3' side. In genomics analysis, these terms are used with respect to the 5' side (upstream) or 3' side (downstream) of a specific location on one DNA strand.
Global analysis of changes in the genome that do not involve changes to the nucleotide sequence itself but that result in regulation of gene activity or expression. Examples include methylation of the DNA and changes in chromatin structure.
An optional primer round on the 5500 Series SOLiD Sequencer that enables higher accuracy reads and base sequence output, derived without alignment to a reference sequence, from the instrument; the base sequence is included in the XSQ output file.
In eukaryotic organisms, a segment of a gene that encodes part or all of a protein. Exons may be separated by introns that are spliced out of the primary transcript to produce a mature mRNA for translation into protein.
27. d in a single flowchip lane.
barcoded sample: A barcoded sample contains templated beads from up to 96 barcoded libraries.
sequencing primer: An oligonucleotide that is the reverse complement of a designated site on the template strand and which serves as the initiation point for subsequent ligation-based sequencing cycles.
single read: Sequence data that generates a specific analysis tag.
small RNA analysis: Global sequence analysis of the small RNA population of a cellular RNA sample; small RNA includes microRNAs (miRNAs), short interfering RNAs (siRNAs), piwi-interacting RNAs (piRNAs), and repeat-associated siRNAs (rasiRNAs).
A SOLiD System-compatible library that is prepared from the small RNA fraction of a total RNA sample.
Single Nucleotide Polymorphisms (SNPs): single base pair variants in genomic DNA or the corresponding RNA transcript.
The process whereby introns are removed from a primary mRNA, resulting in a mature mRNA that is ready for translation into protein.
Exon-to-exon boundaries that arise from splicing of a transcript. See also gene fusion.
The polarity or orientation of a nucleic acid strand with respect to being sense or antisense. Libraries prepared using the SOLiD Total RNA-Seq Kit preserve the strandedness of the original RNA molecule, such that F3 tag reads align to the sense strand and F5 tag reads align to the antisense strand.
RPKM; run; SAGE analysis; SAGE library; sample; barcoded sample; sequencing primer; single read; small RNA analysis; small R
28. • Multiplex sequencing (page 13)
Analysis strategy, RNA isolation, and library preparation
Table 1 summarizes the RNA analysis strategies supported by the SOLiD System.
Analysis strategies
The analysis strategy informs the cDNA library preparation method, the amount and type of data generated, the size/complexity of the reference sequences used for alignment, and the analysis algorithm(s).
Library preparation method and input RNA
Method: SOLiD Total RNA-Seq Kit, whole transcriptome procedure
Input RNA:
• Total RNA, rRNA-depleted, for analysis of coding and non-coding transcripts
• Total RNA, not rRNA-depleted: can be appropriate if required for a specific application, or with very limited samples to avoid incurring losses during rRNA depletion
• Poly(A) RNA, for analysis of polyadenylated RNAs (coding and non-coding)
Method: SOLiD Total RNA-Seq Kit, small RNA procedure
Input RNA:
• Total RNA containing small RNA
• Enriched small RNA
• Purified small RNA
Depends upon the percentage of small RNA in the total RNA sample; see the SOLiD Total RNA-Seq Kit Protocol.
Method: SOLiD SAGE Kit procedure
Input RNA:
• Total RNA
• Poly(A) RNA
Table 1. RNA analysis with SOLiD System technology
Purpose: Discovery • Mapping
29. e color coding enables detection of rare transcripts and transcript variants at levels below 1 copy per cell (Breu, 2010).
Library preparation with the SOLiD Total RNA-Seq Kit for whole transcriptome and small RNA analysis preserves the strandedness of the RNA during library preparation. This preparation method simplifies data analysis, allows determination of the directionality of transcription and gene orientation, and facilitates detection of opposing and overlapping transcripts.
This guide discusses key parameters for RNA analysis on the 5500 Series SOLiD System. It also provides general guidelines for designing RNA analysis experiments using SOLiD System technology. The guidelines provided here have been developed as a result of experience with analysis of human RNA expression, both in-house and in the scientific community. While it is expected that these guidelines are adaptable to any organism, they are intended only to provide a framework for discussion as you plan and implement your experiment.
In RNA analysis on the SOLiD System, a cDNA library of the RNA sample is prepared, clonally amplified onto SOLiD beads, and sequenced on the SOLiD sequencer. The sequencing reads (also known as tags) are mapped to one or more reference sequences, and the structure is then deduced using bioinformatic tools. The RNA analysis workflow on the 5500 Series SOLiD System is illustrated in the following figure. Introduction A
30. easurement noise and increase confidence in the accuracy of quantification and detection. Coverage is dependent on read length, transcript length, and the number of mapped reads. For a 1000-nt transcript, 1X coverage requires 20 reads with a 50-nt read length (20 RPK for 50-nt reads); the same fold coverage could be achieved with 10 reads of 100-nt length.
• The lower limit of detection (LLD) is defined as the X value at the point where the regression line for the dose-response data passes the 1X threshold.
These dose-response data demonstrate that more reads give greater detection sensitivity and a lower limit of detection.
Figure 6. Dose response of ERCC transcripts in HeLa RNA. [Plot of log reads per kilobase against log transcript molecules. Series: 163 million reads (R-squared 0.976) and 28 million reads (R-squared 0.9681). The 1X coverage threshold is drawn at 20 RPK; an LLD of 28,820 is marked.]
In Table 8, the LLD calculated in Figure 6 is transformed into an estimate of the number of copies of a transcript that are detectable per cell equivalent.
• The complexity ratio reflects the number of molecules that can be detected as a function of the total RNA molec
31. ein sequences, hosted by the NCBI.
reverse read: Sequencing reads in the P2-to-P1 direction using reverse ligation chemistry.
RNA-Seq: Gene expression analysis using sequence-based approaches. RNA-Seq can include whole transcriptome analysis, small RNA analysis, and SOLiD SAGE analysis.
Glossary
library; mapped read; mapping; miRBase; multiplex sequencing; ncRNA; paired-end sequencing; polymorphism; pooled libraries; primer set; R3 tag; read, sequencing read; reference, reference genome, reference sequence; RefSeq; reverse read; RNA-Seq
RPKM: The number of reads mapping to a transcript per kilobase of transcript length per million mappable reads. RPKM is used to set a threshold for calling a transcript or new RNA species or isoform present. 1 RPKM is equivalent to 20 reads mapping to a 1-kb transcript per 20 × 10^6 mappable reads.
run: Sequencing of beads on one or more flowchips at the same time.
SAGE analysis: Serial Analysis of Gene Expression. Nucleotide sequence analysis seeking to find specific gene expression information using short stretches of cDNA (also known as tags) from the 3' ends of RNA molecules. In the SOLiD System, the SAGE tag is 25 to 27 bp in length.
SAGE library: A SOLiD System-compatible library that is prepared from short cDNA segments generated from the 3' ends of RNA molecules.
sample: In the 5500 Series SOLiD ICS, the set of templated beads that will be sequence
32. ence to map the read effectively.
Exact Call Chemistry
The 5500 Series SOLiD Sequencers offer an optional Exact Call Chemistry (ECC) primer round that enables:
• Higher accuracy reads for enhanced mutation detection
• Base-space sequence in the XSQ output file without the need to map to a reference sequence
For further information on ECC, see publication number CO31266, SOLiD System accuracy with the Exact Call Chemistry module.
Number of reads required
Overview
Two complementary strategies are typically used for estimating the number of sequencing reads needed for an RNA analysis experiment:
• Estimating the number of mapped reads needed to saturate hits in the reference (such as RefSeq or miRBase) at a certain calling threshold.
• Estimating the lower limit of detection using external RNA controls, such as the ERCC RNA Spike-In Control Mixes, that are spiked in and sequenced along with the endogenous RNA. This estimate is based on a given total number of mapped reads at a certain calling threshold.
See "RNA sequence analysis concepts" on page 21 for information about the concepts of mapped reads and calling thresholds.
The following discussion is derived from experiments for detection of human RNAs. It is meant to serve only as a guideline for your application.
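As a worked illustration of the second strategy, the sketch below follows the arithmetic that Appendix B describes under "Supplemental data: estimating the lower limit of detection with ERCC Spike-In Mixes": fold coverage from read count and read length, the complexity ratio, and the detectable copies per cell. The function and parameter names are hypothetical. The defaults (9 × 10^10 native transcripts in 100 ng HeLa poly(A) RNA and 300,000 transcripts per cell) are the assumptions the guide states for its HeLa example and would need to be replaced for other samples.

```python
def fold_coverage(n_reads, read_length_nt, transcript_length_nt):
    """Fold coverage of one transcript: total sequenced bases / transcript length."""
    return n_reads * read_length_nt / transcript_length_nt


def complexity_ratio(lld_molecules, native_transcripts=9e10):
    """Native transcripts in the sample divided by the LLD (in molecules).

    The default of 9e10 is the guide's estimate for 100 ng HeLa poly(A) RNA,
    which assumes a mean transcript length of 2 kb.
    """
    return native_transcripts / lld_molecules


def copies_per_cell(lld_molecules, transcripts_per_cell=300_000,
                    native_transcripts=9e10):
    """Detectable copies of a transcript per cell equivalent.

    Assumes an average of 300,000 transcripts per cell, as in the guide.
    """
    ratio = complexity_ratio(lld_molecules, native_transcripts)
    return transcripts_per_cell / ratio


if __name__ == "__main__":
    # 20 reads of 50 nt on a 1000-nt transcript
    print(fold_coverage(20, 50, 1000))
    # LLD values reported in Table 8 for the two HeLa samples
    for lld in (122_524, 28_820):
        print(lld, round(complexity_ratio(lld)), round(copies_per_cell(lld), 2))
```

Feeding in the LLD measured in your own pilot experiment shows quickly whether the planned read depth is likely to reach the copy number per cell that your experiment needs.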
33. enced on the SOLiD System and mapped against the Sanger miRBase reference. The calling threshold for these data was 3 reads. For most tissues, saturation of the Sanger miRBase reference begins at about 5 million mapped reads.
Based on these data, one might choose to aim for 5 million reads for detection of RNAs in miRBase and about 10 million reads for discovery of new small RNAs. More complex questions, such as discovery of iso-miRs, would require more than 10 million reads.
Figure 4. Small RNA analysis: saturation of miRBase. [Plot of percent miRBase detected against number of reads, 0 to 20,000,000.]
SAGE analysis
In the example shown in Figure 5, a SAGE library was prepared from human brain reference RNA (HBR) using the SOLiD SAGE Kit and sequenced on the SOLiD System. 70 million reads were counted. The number of unique RefSeq hits was plotted against the total of the mapped reads using thresholds of 1, 2, and 3 hits to call a transcript present. While the total number of RefSeq hits continues to increase out to 4 million total mapped reads and beyond, the number of newly detected unique RefSeq hits starts to plateau at 1 million mapped reads. These data indicate that for routine expression profiling of human RNA, 2 to 5 mil
34. ering Information Quick Reference for a schematic of sequencing primers compatible with each type of SOLiD library.
• The IA sequence is different in DNA-source libraries and RNA-source libraries; therefore, sequencing primers specific for RNA and DNA libraries must be used for reverse reads (F5 tag).
• The IA-containing adaptors used during mate-paired library preparation are different from the adaptors used for fragment library preparation, but the SOLiD FWD2 Seq Primers are used for all forward reads originating in the IA sequence (generating the R3 and BC tags).
intron: The genomic sequence between two exons that is spliced out of a primary transcript prior to translation. See exon.
junction: A place where two regions that are not contiguous in the genomic sequence are joined in a single sequenced region under consideration.
lane: See flowchip lane.
flowchip lane; forward read; gene fusion; genomics; fusion transcript; insert size; insertion; internal adaptor (IA); intron; junction; lane
library: A set of DNA/cDNA molecules prepared from the same biological specimen and prepared for sequencing on the SOLiD System.
mapped read: A sequencing read that has been aligned to a reference sequence.
mapping: The process of aligning sequencing reads to a reference genome or sequence.
miRBase: An annotated database archive of miRNA sequences (www.mirbase.org).
multiplex sequencing: Sequencing runs in which multiple barcoded libraries are simult
35. hat is, no trailing of the peaks
• 28S:18S rRNA ratio approaches 2:1†
• RNA Integrity Number (RIN) >7
• Discrete rRNA bands, that is, no significant smearing below each band
• 28S:18S rRNA ratio approaches 2:1†
Criteria for high-quality RNA
Table 7. Criteria for high-quality RNA for SOLiD System library preparation
Evaluation method:
• Spectrophotometry (traditional or NanoDrop)
• Microfluidics analysis (Agilent 2100 Bioanalyzer); requires picogram to nanogram amounts of RNA
• Denaturing agarose gel electrophoresis and nucleic acid staining; requires microgram amounts of RNA
Parameter:
• Purity: RNA samples should be free of contaminating proteins, DNA, and other cellular material, as well as phenol, ethanol, and salts associated with RNA isolation procedures.
• Integrity: RNA samples should have a high proportion of full-length RNA with little or no evidence of degradation.
† For mammalian RNA; RNA from other species may have different ratios.
RNA sequence analysis concepts
About mapped reads
The mappability of an RNA read depends on these parameters:
• The technical quality of the read, that is, the ability of the SOLiD System instrument software to align a read to a given reference sequence with a certain degree of confidence.
• The quality of the reference sequence annotation. Reference sequence databases such as RefSeq and miRBase are continuously changing and expanding as
36. iology. Cell Mol Life Sci 67:569-579.
Mortazavi A, Williams BA, McCue K, Schaeffer L, and Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5:621-628.
Wang Z, Gerstein M, and Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57-63.
Wilhelm BT and Landry JR (2009) RNA-Seq: quantitative measurement of expression through massively parallel RNA-sequencing. Methods 48:249-257.
Glossary
The process of mapping sequencing reads to a reference genome or sequence.
One of two or more alternative nucleotide sequences at the same location on homologous chromosomes.
A process whereby exons in a primary transcript are joined in multiple ways as part of the splicing process, resulting in different mature mRNAs.
Within one of several reference databases, a gene sequence that has biological attributes attached that describe structure or function, such as coding regions or biochemical function.
Biological attributes or metadata that are attached to sequence data or files. Examples include genes and protein-coding features and verified variants.
A short unique sequence that is incorporated into a library and that enables identification of the library during multiplex sequencing.
A library that has a unique barcode sequence incorporated th
37. lion mapped reads per library is likely sufficient.
Figure 5. SOLiD SAGE analysis: detection of human RefSeq. [Plot for the SOLiD SAGE Kit, HBR, 70 × 10^6 tags counted: genes detected (0 to 30,000) against mapped reads (0 to 4,500,000).]
Supplemental data: estimating the lower limit of detection with ERCC Spike-In Mixes
More reads, lower limit of detection (LLD)
In the experiment shown in Figure 6, ERCC Spike-In Mix 1 was added to poly(A) HeLa RNA, and whole transcriptome libraries were prepared and sequenced on the SOLiD System. See "Whole transcriptome libraries using ERCC RNA Spike-In Control Mixes" on page 9 for background information on the ERCC RNA Spike-In Control Mixes.
• The dose-response curves for the ERCC transcripts are shown at 28 million and 163 million total reads.
• The X-axis indicates the level of each ERCC transcript (log2 transcript molecules/100 ng poly(A) RNA), and the Y-axis shows log reads per kilobase (RPK).
• The horizontal line is placed at 1X coverage as a threshold for calling a transcript present. This threshold was determined empirically and is designed to help reduce m
38. newly discovered transcripts are entered and the annotation of known transcripts is improved. This issue is a consideration if your experimental purpose is discovery because, by definition, the novel regions or transcripts being studied have a low probability of being in the reference sequence, or of being in the reference sequence and correctly annotated.
The criteria for calling an RNA present vary according to the analysis method:
• Whole transcriptome data are often normalized for RNA length and for the total read number in the run to facilitate comparisons between and within sequencing runs. One current method uses the concept of reads per kilobase (kb) of exon model per million mapped reads, or RPKM (Mortazavi et al., 2008). As an example, 1 RPKM is equivalent to 20 reads mapping to a 1-kb transcript per 20 × 10^6 (20 million, 20 M) mapped reads. The RPKM threshold for calling a transcript or new isoform present can be set more or less stringently, depending on the quality of the data and the type of application. Setting the RPKM threshold typically involves balancing the need for high accuracy with the cost associated with sequencing depth. With current technology, many researchers set the threshold at 1 RPKM.
• SAGE analysis uses the concept of hits, als
39. o known as the number of mapped reads per transcript. SAGE analysis generally does not use the concept of RPKM, since the technique interrogates only a short read for each transcript. With current technology, calling thresholds are typically set at 1 to 3 hits.
• Small RNA analysis uses the concepts of number of reads or, alternatively, reads per million reads (RPM); small RNA analysis does not use the concept of RPKM because the reads span the entire transcript length. With current technology, calling thresholds are set at 1 RPM.
About calling thresholds
Supplemental data: paired-end sequencing for detection of fusion transcripts
Figure 2 shows a comparison of paired-end (50 nt forward, 35 nt reverse) and forward-only (50 nt) sequencing on the SOLiD 4 System of whole transcriptome libraries prepared from Universal Human Reference RNA (UHR) and Human Brain Reference RNA (HBR), with respect to detection of known exon-exon junctions in the NCBI database. For each library, paired-end sequencing detects more exon-exon junctions for a given number of mapped reads.
Figure 2. Paired-end is superior to forward-only sequencing for detection of exon-exon junctions. [Plot; series: HBR paired-end, HBR single-read, UHR paired-end, UHR single-read; X-axis: uniquely mapped reads, 20,000,000 to 140,000,000.]
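The two normalizations used for calling thresholds in this guide, RPKM for whole transcriptome data and RPM for small RNA data, can be sketched as follows. This is a minimal illustration, not the implementation used by any analysis software; the function names are hypothetical, and treating "present" as "at or above the threshold" is an assumption made here for the example.

```python
def rpkm(reads_on_transcript, transcript_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_nt / 1_000
    millions = total_mapped_reads / 1_000_000
    return reads_on_transcript / (kilobases * millions)


def rpm(reads_on_rna, total_mapped_reads):
    """Reads per million mapped reads (no length normalization)."""
    return reads_on_rna / (total_mapped_reads / 1_000_000)


def is_called_present(value, threshold=1.0):
    """Assumption for this sketch: a value at or above the threshold is called present."""
    return value >= threshold


if __name__ == "__main__":
    # The guide's example: 20 reads on a 1-kb transcript in 20 million mapped reads.
    value = rpkm(20, 1_000, 20_000_000)
    print(value, is_called_present(value))
    # A small RNA with 3 reads in a 5-million-read run.
    print(rpm(3, 5_000_000), is_called_present(rpm(3, 5_000_000)))
```

SAGE hit counts need no normalization function: the threshold of 1 to 3 hits is applied to the raw number of mapped reads per transcript.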
40. ons to RNA samples before library preparation, you can compare observed ERCC transcript amounts and ratios to known amounts and ratios within and between samples. These comparisons can be done with the libraries themselves, using real-time PCR and TaqMan Gene Expression Assays, and in the sequencing data. These assessments can be particularly useful when performing pilot experiments (see "Estimating the lower limit of detection using ERCC Spike-In Mixes" on page 12).
Further information about the ERCC RNA Spike-In Control Mixes and detailed instructions for use are provided in the ERCC RNA Spike-In Control Mixes User Guide.
The ERCC RNA Spike-In Control Mixes are not recommended for small RNA analysis due to the size of the transcripts.
Sequencing strategy: read direction(s) and length
The 5500 Series SOLiD System supports both forward-only and forward/reverse (also called paired-end) sequencing reads for RNA-Seq applications. The 5500 Series SOLiD Sequencer allows variable read lengths (up to 75 nt forward, up to 35 nt reverse) to optimize reagent use. Figure 1 provides a conceptual overview of the primers used in SOLiD System sequencing of RNA-source libraries (FWD1, Small RNA FWD2, and REV1 RNA) and the corresponding data analysis tags (F3, F5 RNA, BC).
Figure 1. 5500 Series SOLiD System sequencing primers for RNA analysis. [Schematic of RNA-derived cDNA with internal adaptor (IA): forward primer FWD1 (F3 tag), reverse primer REV1 RNA (F5 RNA tag), and forward barcode primer FWD2 (BC tag).]
41. ries SOLiD Sequencers
Columns: type of analysis (human RNA); for this number of mapped reads (10^6); and this expected % mapped reads; you need to deposit this number of P2 beads (10^6); and you can combine this number of libraries/lane.
• Whole transcriptome, counting; >50; 80; 63; 1
• Whole transcriptome, discovery; >100; 80; 125; 1
• Small RNA, counting; 5; 35 to 40; 12 to 14; 14 to 16
• Small RNA, discovery; 10 to 30; 35 to 40; 25 to 85; 2 to 8
• SAGE, counting; 2 to 5; 30; 6 to 17; 12 to 33
Notes:
• Mapped reads: see "Supplemental data: saturation of reads mapping to a reference" in this Appendix for examples of the experiments used to derive these numbers.
• Expected % mapped reads: the percentage of all reads that map to the reference sequence. Based on in-house or published experiments with human RNA; the percentage also depends on the quality and rRNA content of the RNA sample and the reference sequence used.
• P2 beads: total P2 beads required to achieve the required number of mapped reads = mapped reads / expected fraction of mapped reads (for example, 30% = 0.3). P2 beads are beads that have a P2-containing template DNA; for these calculations, it is assumed that >95% of the deposited beads are P2.
• Libraries/lane: calculation = bead density / total P2 beads. For 1-micron beads, bead density is assumed to be 200 × 10^6 beads/lane.
• The 80% value for whole transcriptome analysis is based on in-house data using rRNA-depleted RNA samples.
• The value of 1 library/lane for whole transcriptome analysis assumes a 75 nt × 35 nt paired-end sequencing run. No more than 1 human transcriptome per lane is recommended.
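The two calculations given in the Table 9 notes can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the bead density of 200 million beads per lane is the assumption the guide states for 1-micron beads, and the results can differ slightly from the published table because the table rounds intermediate values.

```python
# Assumed bead density for 1-micron beads, in millions of beads per lane (from the Table 9 notes).
BEAD_DENSITY_MILLIONS = 200.0


def p2_beads_required(mapped_reads_millions, mapped_fraction):
    """P2 beads to deposit (millions) = mapped reads / expected fraction of mapped reads."""
    return mapped_reads_millions / mapped_fraction


def libraries_per_lane(p2_beads_millions, bead_density_millions=BEAD_DENSITY_MILLIONS):
    """Theoretical number of libraries per lane = bead density / total P2 beads."""
    return bead_density_millions / p2_beads_millions


if __name__ == "__main__":
    # (label, mapped reads needed in millions, expected fraction of mapped reads)
    scenarios = [
        ("Whole transcriptome, counting", 50, 0.80),
        ("Small RNA, counting", 5, 0.40),
        ("Small RNA, discovery", 10, 0.40),
        ("SAGE, counting", 5, 0.30),
    ]
    for label, reads, fraction in scenarios:
        beads = p2_beads_required(reads, fraction)
        print(f"{label}: {beads:.1f} M P2 beads, "
              f"{libraries_per_lane(beads):.1f} libraries/lane (theoretical)")
```

The whole transcriptome result is a theoretical ceiling only; the guide recommends no more than 1 human transcriptome per lane regardless of the calculated value.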
42. se of your choice. This software requires CSFASTA and QUAL input files. A standalone program is available to convert XSQ files to CSFASTA and QUAL formats. If the XSQ file includes base-space data, the conversion also exports the base-space data into a FASTQ file. For further information, see the LifeScope Genomic Analysis Software Command Shell User Guide.
The ERCC RNA Spike-In Control Mixes User Guide provides information about mapping ERCC Control RNA reads and assessing performance of the 5500 Series SOLiD Sequencer using the ERCC RNA Spike-In Control Mixes.
Information about third-party analysis software can be found at the SOLiD Software Development Community website: info.appliedbiosystems.com/solidsoftwarecommunity. If necessary, use the conversion program described above to convert XSQ output files to CSFASTA and QUAL input formats for third-party software.
Data analysis
Validation of results
Validate your SOLiD System results with TaqMan Assays targeting RNAs of interest, available at www.appliedbiosystems.com:
• TaqMan Gene Expression Assays
• TaqMan MicroRNA Assays
• TaqMan ncRNA Assays
APPENDIX A: Product information and support
Reagents and kits for RNA analysis
• Table 5 lists reagents and kits for high-quality RNA isola
43. t rules. We recommend running a pilot study in your experimental system to determine the saturation curve for the reference sequence of interest.
This approach gives you an estimate of the number of reads needed to maximize detection of known RNA species in the reference. The assumption is that discovery of novel RNAs and sequence variants requires acquisition of a number of mapped reads corresponding to a location well past the plateau of the saturation curve.
In general, the more reads in a sequencing run, the greater the sensitivity to detect RNA species of interest. You can estimate the lower limit of detection (LLD) of a SOLiD System sequencing run using the ERCC RNA Spike-In Control Mixes. See "Supplemental data: estimating the lower limit of detection with ERCC Spike-In Mixes" on page 25 for an example with human RNA.
We recommend running a pilot experiment similar to that described on page 25 to estimate the LLD for your planned sequencing run(s). This approach can give you confidence that rare events that occur at a frequency above the determined LLD should be detectable in your system.
The power to detect differential expression is affected by several factors, including:
• An accurate estimate of the population variation in your experimental system. Biological replicates of a treatment or condition give you an estimate of the vari
44. tal data: estimating the lower limit of detection with ERCC Spike-In Mixes
  Calculations for the number of libraries per flowchip lane
Bibliography
Glossary
USER GUIDE
RNA analysis on the 5500 Series SOLiD System
Sequence-based approaches to RNA expression analysis, known as RNA-Seq, have been enabled by the development of massively parallel, high-throughput sequencing technology such as the SOLiD System. RNA-Seq methodology queries known and previously unknown RNAs in a sample. This hypothesis-neutral discovery approach is advantageous compared to traditional microarray analysis, which interrogates only known RNAs. See Kassahn et al. (2011), Marguerat and Bähler (2010), Costa et al. (2010), Wang et al. (2009), and Wilhelm and Landry (2009) for recent reviews about applying high-throughput sequencing technology to RNA analysis.
SOLiD System sequencing provides several advantages over microarray platforms for RNA expression analysis:
• The SOLiD System enables accurate estimates of relative transcript abundance throughout a dynamic range of detection that is typically greater than that of traditional microarrays and which scales with increased sequencing depth.
• The SOLiD System's deep sequencing capability, combined with the high accuracy of 2-bas
45. tion.
• Table 6 lists reagents and kits for SOLiD library preparation from RNA.
• For products for templated bead preparation, see the SOLiD EZ Bead System product page at www.appliedbiosystems.com.
• For 5500 Series SOLiD System sequencing reagents, see the 5500 Series SOLiD Sequencers Reagents and Consumables Ordering Information Quick Reference (Publication no. 4465650).
Detailed instructions for use and a complete list of required materials are provided in the user guides accompanying each product.
Description:
• Isolation of total RNA from cells or plant/mammalian tissue samples in tubes or 96-well plates
• Isolation of total and viral RNA from mammalian whole blood
• Isolation of total RNA from animal and plant cells and tissue, bacteria, and yeast
• Isolation of total RNA from animal, plant, yeast, bacteria, and blood
• Selection of poly(A)-containing RNA from total RNA preparations
• Depletion of eukaryote 18S, 28S, 5.8S, and 5S rRNA from total RNA
• Depletion of 25/26S and 17/18S rRNA, 23S and 16S chloroplast RNA, and 18S mitochondrion RNA from total RNA
• Isolation of small RNA-containing total RNA from tissues and cells; enrichment of small RNA (optional)
• Isolation of total RNA, including miRNAs, from formaldehyde- or paraformaldehyde-fixed, paraffin-embedded (FFPE) tissues
Table 5. Key reagents and kits for RNA isolation
Kit; Part no.
Whole transcriptome analysis
• MagMAX 96 Total
46. ules. This sensitivity metric is similar in concept to detection in parts per million. The complexity ratio is the number of native transcripts estimated in 100 ng HeLa poly(A) RNA (9 × 10^10; assumes mean transcript length is 2 kb) divided by the LLD.
• The detection, expressed as copies of transcript per cell, is estimated from the complexity ratio based on an average transcript number of 300,000/cell.
A calculation such as that shown in Table 8 is designed to help you decide whether the LLD expected in the sequencing runs is adequate for your experimental needs.
Table 8. Detection sensitivity estimates of ERCC Control transcripts in HeLa poly(A) RNA
• Sample 1: total reads 28,200,852; uniquely mapped reads 17,485,966; LLD (ERCC transcripts detected in 100 ng RNA) 122,524; complexity ratio 1:735,000; detection sensitivity (transcripts per cell equivalent) 0.41 copies/cell
• Sample 2: total reads 163,452,796; uniquely mapped reads 99,165,396; LLD (ERCC transcripts detected in 100 ng RNA) 28,820; complexity ratio 1:3,123,000; detection sensitivity (transcripts per cell equivalent) 0.10 copies/cell
Calculations for the number of libraries per flowchip lane
The following table illustrates detailed calculations to estimate the number of libraries that can be configured on a 5500 Series SOLiD Sequencer.
Table 9. Human RNA analysis: theoretical library configuration on 5500 Se
47. uring cancer progression, allele-specific gene expression, or SNP discovery
• Global expression of small RNAs in a cell line or tissue; small RNA discovery; small RNA-enriched RNA or purified small RNA; forward-only, 35 nt, 10 to 30 × 10^6 mapped reads; 2 to 8
• Changes in expression levels of known miRNAs during a treatment or condition; small RNA counting, with option of discovery; small RNA-enriched RNA or purified small RNA; forward-only, 35 nt, 5 × 10^6 mapped reads; 14 to 16
Notes:
• See Table 1 on page 7 for a summary of input RNA options for each library type. See Table 5 on page 17 for a summary of kits for high-quality RNA isolation from a variety of biological sources.
• Barcoded, color-balanced libraries for >1 library/lane. See Table 9 in Appendix B for example calculations.
Data analysis
Sequencing run data are automatically exported from the 5500 Series SOLiD Sequencer in XSQ binary file format. If an ECC primer round has been performed, the XSQ output includes the sequence information in base space.
LifeScope Genomic Analysis Software incorporates tools for whole transcriptome and small RNA analysis. LifeScope Software supports analysis of data from barcoded libraries.
The SOLiD SAGE Analysis Software v1.10 provides tools for mapping and counting SAGE sequencing reads with a reference databa
