PDF - BioMed Central
Contents
1. [Table 1 residue. Row labels: Identification of known miRNAs; Visualization of analytic results; Discovery of miRNAs; miRNA expression profiling analysis; Batch data processing; Visualization of sequencing quality control. Legend: available functions; unavailable functions.]
Low dependency: As a rule, such a tool should be easy to use and require no prior knowledge of a specific computer programming language. GUIs save time in learning how to use this software. A user-friendly framework allows biological researchers to focus on RNA data analysis and biological interpretation. Preparation of raw data and reference sequences is also simplified in the GUI-based tool: there are no requirements for raw data conversion, reference sequences can be downloaded from public databases and used without further manipulation, and automated format conversion is available.
High-throughput ability: Parallel processing in eRNA allows for the analysis of multiple RNA samples at the same time. This approach efficiently uses computation power by balancing computer performance and running time. The sample management function exempts biological researchers from manually inputting numerous data sets. eRNA can be used for both small- and large-scale RNA-seq data. The package has been successfully tested on a personal computer as well as on an advanced server computer. Biological researchers may customize their own computer platforms at a re
2. experiments. Nucleic Acids Res 2011, 39:W132-W138.
[4] Friedländer MR, Mackowiak SD, Li N, Chen W, Rajewsky N: miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res 2011, 40(1):37-52.
[5] Mathelier A, Carbone A: MIReNA: finding microRNAs with high accuracy and no learning at genome scale and from deep sequencing data. Bioinformatics 2010, 26(18):2226-2234.
[6] Wang W-C, Lin F-M, Chang W-C, Lin K-Y, Huang H-D, Lin N-S: miRExpress: analyzing high-throughput sequencing data for profiling microRNA expression. BMC Bioinforma 2009, 10:328.
[7] Ronen R, Gan I, Modai S, Sukacheov A, Dror G, Halperin E, Shomron N: miRNAkey: a software for microRNA deep sequencing analysis. Bioinformatics 2010, 26(20):2615-2656.
[8] Humphreys DT, Suter CM: miRspring: a compact standalone research tool for analyzing miRNA-seq data. Nucleic Acids Res 2013, 41(15):e147.
[9] Zhu E, Zhao F, Xu G, Hou H, Zhou L, Li X, Sun Z, Wu J: mirTools: microRNA profiling and discovery based on high-throughput sequencing. Nucleic Acids Res 2010, 38:W392-W397.
[10] Pantano L, Estivill X, Martí E: SeqBuster, a bioinformatic tool for the processing and analysis of small RNAs datasets, reveals ubiquitous miRNA modifications in human embryonic cells. Nucleic Acids Res 2009, 38(5):e34.
[11] Li Y, Zhang Z, Liu F, Vongsangnak W, Jing Q, Shen B: Performance comparison and evaluation of software tools for microRNA deep-sequencing data analysis.
3. Nucleic Acids Res 2012, 40(10):4298-4305.
[12] Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 2012, 7(3):562-578.
[13] Goncalves A, Tikhonov A, Brazma A, Kapushesky M: A pipeline for RNA-seq data processing and quality assessment. Bioinformatics 2011, 27(6):867-869.
[14] Kallio MA, Tuimala JT, Hupponen T, Klemelä P, Gentile M, Scheinin I, Koski M, Käki J, Korpelainen EI: Chipster: user-friendly analysis software for microarray and other high-throughput data. BMC Genomics 2011, 12:507.
[15] Friedman BA, Maniatis T: ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data. Genome Biol 2011, 12(7):R69.
[16] Cumbie JS, Kimbrel JA, Di Y, Schafer DW, Wilhelm LJ, Fox SE, Sullivan CM, Curzon AD, Carrington JC, Mockler TC, Chang JH: GENE-counter: a computational pipeline for the analysis of RNA-Seq data for gene expression differences. PLoS One 2011, 6(10):e25279.
[17] Halbritter F, Vaidya HJ, Tomlinson SR: GeneProf: analysis of high-throughput sequencing experiments. Nat Methods 2012, 9(1):7-8.
[18] Givan SA, Bottoms CA, Spollen WG: Computational analysis of RNA-seq. Methods Mol Biol 2012, 883:201-219.
[19] Lohse M, Bolger AM, Nagel A, Fernie AR, Lunn JE, Stitt M, Usadel B: RobiNA: a user-friendly, integrated software solution for
4. [Figure 4B screenshot residue: "Options of differential expression analysis" panel with sample selectors for groups A and B, and "Options of Cuffdiff" panel with fields for the analysis name, number of threads, minimum number of alignments, number of fragment assignment samples, replicates needed for the relative isoform shift test, false discovery rate allowed, maximum iterations allowed for MLE calculation, maximum fragments allowed in a bundle, and number of fragment generation samples.]
Figure 4 The GUIs of Cufflinks in the module mRNA identification. (A) Transcript assembling. (B) Differential expression profiling analysis.
Yuan et al. BMC Genomics 2014, 15:176. http://www.biomedcentral.com/1471-2164/15/176
[Figure 5 screenshot residue: "Sample management" panel reporting that the number of threads is 8 and listing the raw-data FASTQ files (R1 and R2 reads of sample NA 8, index ACTTGA, lane L003) read from the sample information file.]
5. server computers and the security of remote data storage. It is also time-consuming to upload sequencing data and reference sequences. In some cases, additional modifications are required prior to miRNA analysis. For example, users have to trim adapter sequences and convert the inputs from FASTQ to FASTA format when using some miRNA tools, namely mirTools or miRspring [21]. The lack of sample management further complicates data analysis and increases the potential for errors. It is impractical to analyze a large data set from numerous biological samples with different traits, along with their technical and biological replicates, when only one RNA sample can be processed at a time. Furthermore, computation running time for large data sets is another challenge for RNA-seq data analysis: some tools process the data sets from only one RNA sample at a time because of their limits on parallel processing. In addition, for scientists without any programming experience, it is often difficult to set parameters and convert data formats in command-line tools. Although some standalone tools are user-friendly for bioinformaticians and computer scientists, mastering such knowledge is often beyond the comfort level of most research scientists. To meet these challenges, we developed a GUI-based tool called eRNA, which integrates common tools required for RNA-seq analysis and facilitates large-scale data analysis. Implementati
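The FASTQ-to-FASTA conversion mentioned in this excerpt as a manual step for some miRNA tools can be sketched as follows. This is a minimal illustration in Python, not code from eRNA (which is written in Perl); it assumes plain four-line FASTQ records without wrapped sequences, and the function names `read_fastq` and `fastq_to_fasta` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from four-line FASTQ records.

    Assumption: each record is exactly four lines (header, sequence,
    '+' separator, qualities); wrapped sequences are not handled.
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record")
        yield header[1:], seq.strip()


def fastq_to_fasta(lines: Iterable[str]) -> str:
    """Drop the quality lines and emit FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in read_fastq(lines))


example = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "TTGA\n", "+\n", "IIII\n",
]
fasta_text = fastq_to_fasta(example)
```

The quality scores are discarded in this conversion, which is why a tool that accepts FASTQ directly can also offer quality filtering without a separate preprocessing step.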
6. whole analytic work is split into a number of components consistent with the number of multi-threads. eRNA automatically distributes the raw data to the different components as inputs, based on the size of the raw data for each RNA sample. The data in each component are analyzed separately and simultaneously.
[Figure 4 screenshot residue: "Options of transcripts assembling" panel ("Options of Cufflinks") with fields including number of threads, alignment hit counting, length correction for transcript FPKM, suppression of low-abundance isoforms and intra-intronic transcripts, maximum intron length, alpha value of the binomial test, minimum number of fragments needed for new transfrags, minimum intron size allowed in the genome, maximum fraction of allowed multireads, maximum gap size to fill between transfrags, and number of iterations during maximum likelihood estimation; followed by the start of the "Options of differential expression analysis" panel (Select mRNA samples).]
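The size-based distribution of raw data across parallel components described in this excerpt can be sketched as follows. The source states only that data are distributed "based on the size of raw data for each RNA sample"; the greedy largest-first rule used here is an assumption chosen for illustration, and `distribute_by_size` is a hypothetical name, not an eRNA function.

```python
import heapq
from typing import Dict, List


def distribute_by_size(sample_sizes: Dict[str, int], n_threads: int) -> List[List[str]]:
    """Assign samples to n_threads components, balancing total size.

    Assumed rule (not taken from eRNA): process samples from largest
    to smallest and always give the next one to the component with
    the smallest accumulated load.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    # Heap entries are (accumulated size, component index).
    heap = [(0, i) for i in range(n_threads)]
    heapq.heapify(heap)
    components: List[List[str]] = [[] for _ in range(n_threads)]
    ordered = sorted(sample_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        load, idx = heapq.heappop(heap)
        components[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return components


# Hypothetical raw-data sizes in GB for four samples, two threads.
plan = distribute_by_size({"A": 8, "B": 7, "C": 3, "D": 2}, n_threads=2)
```

With these sizes both components end up with 10 GB of input, so neither thread idles while the other finishes a much larger share.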
7. [Figure 3 screenshot residue: "Options of genome mapping" panel with fields including mismatches of read alignments, inner distance between mate pairs and its standard deviation, splice mismatches, minimum and maximum intron length, maximum insertion and deletion length, length of discarded gaps of read alignments, edit distance of read alignments, anchor length, and quality values.]
Figure 3 The GUI of TopHat for genome mapping in the module mRNA identification.
module. The former pipeline utilizes the method implemented in the R package DESeq to reveal differentially expressed genes between two groups of given RNA samples [25]. The latter pipeline utilizes the model implemented in the R package party to predict the importance of expressed genes, as determined by the miRNA or mRNA identification modules, depending on the biological traits within the given RNA samples [26]. In summary, raw data and reference sequence preparation, sample information input, and software parameter settings in eRNA are optimized to ensure a user-friendly environment. The learning time needed to understand RNA data analysis is minimized. The preparation of raw data and references in a successful run is significantly simplifie
8. [Figure 5 screenshot residue: list of raw-data FASTQ files under the directory rawdata_microRNA_20130826 for samples NA 1 (index ATCACG), NB 9 (index GATCAG), and NB 2 (index CGATGT), each on lane L003 with R1 and R2 reads split into numbered parts (001 to 004); followed by the "Differential expression profiling analysis" panel with fields for the p-value cutoff, the cutoff of average transcriptional level, sample traits, trait names, and trait values (sample_name A to G; fastq_names).]
9. [Figure 5B screenshot residue: trait table listing FASTQ name pairs NA 1/NA 8, NB 2/NB 9, NC 3/NC 10, ND 4/ND 11, NE 5/NE 12, NF 6/NF 13, NG 7/NG 14, with ages 34, 40, 70, 61, 62, 55, 75, and input fields for the formulas of groups A and B for differential comparison.]
Figure 5 Sample management in program mode in eRNA. (A) Automatic connections between miRNA samples (sample names) and raw data (FASTQ format files). (B) Automatic connections between miRNA samples and the traits of miRNA samples for differential expression analysis.
Case study on miRNA-seq data analysis: To evaluate the performance of eRNA on miRNA identification, seven miRNA samples and their replicates were extracted from plasma exosomes of 7 human participants. The participants gave written informed consent for the use of their tissue samples for this study. Exosome isolation, RNA extraction, and miRNA library preparation have been previously reported [27]. These samples were sequenced using an Illumina HiSeq 2000 DNA sequencing analyzer. Raw data can be downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53451. Human miRNA sequences were downloaded from miRBase (http://www.mirbase.org/, Release 19, 2,043 entries) [28]. Human genome sequences were downloaded from the NCBI file transfer protocol (FTP) site (ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/, 2 November 2012, Release 104) using the assembly build G
10. RCh37.p10. Bowtie1 (version 0.12.8) was used for sequence alignment [22]. The known mature miRNAs identified by eRNA were compared with the results determined by miRDeep2 [4] and miRspring [8]. Our study showed that there was no significant difference in the identified known miRNAs among eRNA, miRspring, and miRDeep2 when using the same raw miRNA-seq data, sequence aligner (Bowtie1), and miRNA reference sequences (Figure 6). However, with an increase of mismatches (-v) and multiple alignments (-m) in the Bowtie1 options, the miRNA precursors identified by eRNA covered almost all precursors identified by miRspring and miRDeep2, because these tools treat multiple or non-exactly matching alignments differently. To improve the ability of eRNA to analyze large miRNA data sets, we applied multi-thread technology, which assigns CPU resources in a computer with multiple CPUs or CPU cores to different analytic channels for parallel data analysis. Multi-thread processing in eRNA can achieve an optimized balance between the capacity of the computer hardware and the amount of miRNA-seq or mRNA-seq data. The results showed that computation time decreased as the number of threads increased, either in a personal computer (Figure 7A) or in a server computer (Figure 7B); the peaks of memory usage per GB of data in both testing environments are uniform.
The case study on mRNA-seq data analysis: To test the capability of eRNA
11. RNA-Seq-based transcriptomics. Nucleic Acids Res 2012, 40:W622-W627.
[20] Soderlund C, Nelson W, Willer M, Gang DR: TCW: transcriptome computational workbench. PLoS One 2013, 8(7):e69401.
[21] Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM: The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 2009, 38(6):1767-1771.
[22] Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009, 10(3):R25.
[23] Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL: TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 2013, 14(4):R36.
[24] Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L: Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol 2013, 31(1):46-53.
[25] Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol 2010, 11(10):R106.
[26] Strobl C, Malley J, Tutz G: An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol Methods 2009, 14(4):323-348.
[27] Huang XY, Yuan TZ, Tschannen M, Sun Z, Jacob H, Du MJ, Liang MH, Dittmar RL, Liu Y, Kohli M, Thibodeau SN, Boardman L: Characterization of human plasma-derived exosomal RNAs by deep sequen
12. Yuan et al. BMC Genomics 2014, 15:176. http://www.biomedcentral.com/1471-2164/15/176
SOFTWARE. Open Access.
eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing
Tiezheng Yuan, Xiaoyi Huang, Rachel L Dittmar, Meijun Du, Manish Kohli, Lisa Boardman, Stephen N Thibodeau and Liang Wang
Abstract
Background: RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high-throughput sequencers.
Results: We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity for parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sam
13. cing. BMC Genomics 2013, 14:319.
[28] Kozomara A, Griffiths-Jones S: miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 2010, 39:D152-D157.
[29] Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods 2012, 9(4):357-359.
[30] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25(16):2078-2079.
[31] Engström PG, Steijger T, Sipos B, Grant GR, Kahles A, The RGASP Consortium, Rätsch G, Goldman N, Hubbard TJ, Harrow J, Guigó R, Bertone P: Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods 2013, 10(12):1185-1191.
[32] Steijger T, Abril JF, Engström PG, Kokocinski F, The RGASP Consortium, Hubbard TJ, Guigó R, Harrow J, Bertone P: Assessment of transcript reconstruction methods for RNA-seq. Nat Methods 2013, 10(12):1177-1184.
[33] Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D: Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol 2013, 14(9):R95.
[34] Khatri P, Sirota M, Butte AJ: Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol 2012, 8(2):e1002375.
doi:10.1186/1471-2164-15-176
Cite this article as: Yuan et al.: eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughp
14. d.
Results
Sample management: eRNA can automatically establish the connections among RNA samples, raw data (FASTQ format files), and the traits of RNA samples (Figure 5). In auto mode, eRNA recognizes raw data in a given directory and automatically associates them with RNA samples. The association rule between raw data and RNA samples is based on the names of the FASTQ files. This association has no limit on raw data size, and it is able to combine separate FASTQ files for specific RNA samples and identify group relationships among RNA samples. Therefore, the process of data input is simplified at the FASTQ file level, unlike the one-by-one data input offered by other tools. In program mode, eRNA automatically connects RNA samples with raw data based on a text file containing pre-annotated RNA samples (Figure 5A). Program mode includes all functions in auto mode plus more specific functions for customized applications. For example, program mode allows data from biological replicates with the same names, or from different RNA samples, to be combined. This mode can also automatically establish connections between RNA samples and their traits. Therefore, RNA samples can be quickly selected from a large number of samples for expression profiling analysis in the target screening module (Figure 5B). Another feature of sample management in eRNA is the distribution of the data flow in parallel processing. Once parallel analysis is triggered, the
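The name-based association between FASTQ files and RNA samples described in this excerpt can be illustrated with a short Python sketch. The exact rule eRNA applies is not given in the text; the pattern below (sample name, index sequence, lane, read number, part number) is an assumption modeled on Illumina-style file names such as those visible in Figure 5, and the sample names, separators, and the function `group_fastq_by_sample` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming pattern: <sample>_<index>_L<lane>_R<read>_<part>.fastq
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGTN]+)_L(?P<lane>\d+)"
    r"_R(?P<read>[12])_(?P<part>\d+)\.fastq$"
)


def group_fastq_by_sample(file_names: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Group FASTQ file names by sample and by read direction (R1/R2).

    Files that do not match the assumed pattern are ignored.
    """
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"R1": [], "R2": []}
    )
    for name in sorted(file_names):
        match = FASTQ_NAME.match(name)
        if match is None:
            continue
        groups[match["sample"]]["R" + match["read"]].append(name)
    return dict(groups)


# Hypothetical directory listing.
files = [
    "NA-1_ATCACG_L003_R1_002.fastq",
    "NA-1_ATCACG_L003_R1_001.fastq",
    "NA-1_ATCACG_L003_R2_001.fastq",
    "NB-2_CGATGT_L003_R1_002.fastq",
    "notes.txt",
]
samples = group_fastq_by_sample(files)
```

Grouping by file name in this way is what lets several split FASTQ parts be combined per sample without the user entering each file by hand.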
15. [Figure 9 screenshot residue: miRDeep2 GUI options (multiple mapping, miRNA precursor mapping, RNAfold analysis) and the graphic viewers: SD viewer (sample management and number of threads), IL viewer (library insert length distribution), and QC viewer (sequencing quality), each with sample selection for NA 1_ATCACG_L003.]
Figure 9 Plug-in tools for eRNA. (A) GUI of the third-party tools. (B) Graphic viewers for quality control.
Discussion: It is challenging for developers to strike a balance between a user-friendly environment and high efficiency in the processing of RNA-seq data. GUIs and sample management in eRNA provide a user-friendly environment and fulfill the requirements for large data analysis. The use of multi-thread technology makes parallel processing of RNA-seq data possible. The objectives of eRNA are listed as follows.
Table 1 Functional comparison of eRNA, miRspring, and miRDeep2. Columns: Functionality; eRNA; miRspring; miRDeep2.
16. Figure 8 Running time under various CPU usages (A) and memory usages (B) for mRNA-seq data analysis. The area under the solid line shows the changes in running time, CPU usage, and memory usage when multithreading is activated only in TopHat and Cufflinks. The grey area shows the corresponding changes when multithreading is activated only in eRNA.
[Figure 8 plot residue: x-axis, running time (×10,000 s), 0 to 16; annotated total running times 28.8 h and 45.3 h.]
[Figure 9A screenshot residue: graphic user interfaces of Bowtie v1 and Bowtie v2 (running mode, output mode, Bowtie directory, query input file, alignment output directory), of miRspring (adaptor removal and sequence alignment, FASTA file selection, number of threads, 3' adapter sequence, sample name), and the start of the miRDeep2 GUI.]
17. entral
To date, many bioinformatics tools have been developed to support the identification of known RNAs and the analysis of RNA expression profiles. A common workflow for microRNA sequencing (miRNA-seq) analysis includes adapter removal, sequence alignment, and read counting. To complete this process, various tools have been developed, including DSAP [1], E-miR [2], miRanalyzer [3], miRDeep2 [4], MIReNA [5], miRExpress [6], miRNAkey [7], miRspring [8], mirTools [9], and SeqBuster [10] (Additional file 1: Table S1). These miRNA tools perform very well with respect to sensitivity, accuracy, and visualization for miRNA identification [11]. Unlike miRNA-seq, a popular workflow for mRNA sequencing (mRNA-seq) analysis includes genome mapping, transcript assembling, and differential expression analysis, each separately accomplished by a combination of standalone to
© 2014 Yuan et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
18. ers can easily revise any parameters in a given step based on previous results, and the refreshed run will not change the results determined by previous steps in the eRNA pipeline. If a computer lacks support for Perl-Gtk2, eRNA can be operated from the command line. In command-line mode, users can revise all variables in the parameters file "variables.txt", which is always in the result directory set by the user; parameter setup in this mode, especially for the third-party tools integrated in eRNA, is more flexible.
Self-testing: This is the initial step in the RNA-seq analysis pipeline and should be performed before any other module. This module guides all analytic steps for a successful run in eRNA, including directory setup, sample management, third-party tools checking, and package dependency checking (Figure 1A). The directory setup allows raw data and results to be allocated across more than one hard drive in a computer. Sample management is used for task assignment in parallel processing and for creating associations among raw data, RNA samples, and biological traits. The third-party tools and package dependency checks detect the third-party RNA analytic tools and Perl packages required by eRNA. With raw data in FASTQ format and reference sequences in FASTA format as data input, the eRNA software package integrated
19. ervised the approach. XYH prepared RNA-seq libraries. MK, SNT, and LB identified biological case studies. LW, RLD, XYH, MJD, MK, LB, and SNT revised the final manuscript. All authors read and approved the final manuscript.
Acknowledgements: We would like to thank the Human and Molecular Genetics Center at Medical College of Wisconsin for sequencing consultation and support.
Funding: This work was supported by the Advancing a Healthier Wisconsin fund (Project 5520227) and by the National Institutes of Health (R01CA157881 to LW).
Author details: 1 Department of Pathology and MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA. 2 Department of Oncology, Mayo Clinic, Rochester, MN 55905, USA. 3 Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN 55905, USA.
Received: 30 October 2013. Accepted: 26 February 2014. Published: 5 March 2014.
References
[1] Huang P-J, Liu Y-C, Lee CC, Lin W-C, Gan RR, Lyu PC, Tang P: DSAP: deep-sequencing small RNA analysis pipeline. Nucleic Acids Res 2010, 38:W385-W391.
[2] Buermans HP, Ariyurek Y, van Ommen G, den Dunnen JT, 't Hoen PA: New methods for next generation sequencing based microRNA expression profiling. BMC Genomics 2010, 11:716.
[3] Hackenberg M, Rodriguez-Ezpeleta N, Aransay AM: miRanalyzer: an update on the detection and analysis of microRNAs in high-throughput sequencing
20. for mRNA data analysis, ten mRNA samples were extracted from normal human prostate tissue samples and sequenced. The participants gave written informed consent for the use of their tissue samples for this study. The use of these bio-specimens was approved by the Institutional Review Boards at the Medical College of Wisconsin (Milwaukee, WI) and Mayo Clinic (Rochester, MN). Raw data can be downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53452. The whole mRNA-seq analysis process was completed by eRNA integrated with the third-party tools Bowtie2 (version 2.1.0) [29], SAMtools (version 0.1.19) [30], TopHat (version 2.0.8) [23], and Cufflinks (version 2.1.1) [24]. The use of GUIs for the parameter settings in eRNA is more intuitive than the long, laborious command-line arguments in TopHat and Cufflinks. Furthermore, eRNA has optimized the use of multi-threading in mRNA-seq data analysis. Similar to TopHat, Bowtie, and Cufflinks, it takes advantage of multi-threads to speed up mRNA-seq data analysis. However, eRNA applies multi-threads to the entire analytic process, whereas those tools use multi-threads only in some of their steps. Within the maximum allowed system load and memory usage of the computers (the highest number of multi-threads is eight), the running time declined 36% from 45.3 hours to 28.8 hours, at the cost of higher CPU usage (Figure 8A) and memory usage (Figure 8B), after multi-threads opti
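The reported decline in running time can be checked directly from the two figures quoted in this excerpt (45.3 hours and 28.8 hours). The short Python sketch below only reproduces that arithmetic; the function name is hypothetical.

```python
def relative_reduction(before_hours: float, after_hours: float) -> float:
    """Fraction of the original running time that was saved."""
    return (before_hours - after_hours) / before_hours


before, after = 45.3, 28.8           # hours, as quoted in the text
saved = relative_reduction(before, after)   # (45.3 - 28.8) / 45.3
speedup = before / after                    # fold change in throughput

print(f"reduction: {saved:.1%}, speed-up: {speedup:.2f}x")
```

The saved fraction works out to about 36%, matching the figure quoted in the text, and corresponds to a speed-up of roughly 1.57-fold.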
21. [Figure 9 screenshot residue: GUI of miRDeep2, with fields including seed length, genome sequences in FASTA format, miRBase mature miRNA file, miRDeep directory, mismatch in the seed in alignment with bowtie, miRNA precursor and mature miRNA sequences from miRBase in FASTA format, miRNA species, three-letter prefix for reads, minimum discarded length of reads, maximum precursors in automatic excision gearing, minimum number of reads aligned to references, and minimum score for prediction of novel miRNAs; and the insert-length (IL) plot viewer for sample NA 1_ATCACG_L003.]
22. l result in the iterative alignment. For example, the final results will be significantly affected when precursors and mature miRNAs are mapped in a different order.
mRNA identification: The pipeline in this module is categorized into "Reference sequences", "Genome mapping", "Transcripts assembling", and "Differential expression" (Figure 1C). Except for the
[Figure 2 screenshot residue: option panels for the four steps of miRNA identification. Step I, raw data reading (paired-end R1 and R2, quality control filter Q value, compression of Q value). Step II, adapter removal (sequences of the 3' and 5' adapters, length of adapter sequence with exact match, seed length, minimum query length, one mismatch allowed in matching sequences). Step III, sequence alignment (aligner Bowtie v1, maximum alignment mismatches -v, cutoff of suppressed multiple alignments -m, maximum of valid alignments reported, -a --best --strata). Step IV, reads counting (reference selection).]
23. [Figure 1 screenshot residue: four eRNA main windows ("eRNA: A tool for data analysis from high throughput RNA sequencing"), each with menus Outer tools, Quality viewers, and Help; tabs Self-testing, miRNA identification, mRNA identification, and Target screening; and controls Apply and Run, Output viewing, Progress bar, Refresh output, and Stop and quit. Step flows shown: Self-testing (Directory setup, Sample management, Outer tools checking, Package dependency checking); miRNA identification (Raw data reading, Adapter removal, Sequence alignment, Reads counting); mRNA identification (Reference selection, Genome mapping, Transcripts assembling, Differential expression); Target screening.]
Figure 1 The modules in eRNA. (A) The module Self-testing. (B) The module miRNA identification. (C) The module mRNA identification. (D) The module Target screening.
24. latively low cost.
Integration: eRNA aims to help users gain insight into the underlying biology of the expressed RNAs determined by RNA-seq. The current version of eRNA has been integrated with other tools for the identification, differential expression profiling analysis, and visualization of known miRNAs and mRNAs, as well as the discovery of novel miRNAs, target gene screening using recursive partitioning analysis, sequence alignment, and the visualization of sequencing quality control. Additional mRNA-seq tools [31,32] besides the TopHat-Cufflinks pipeline used in the module mRNA identification, more differential gene expression methods [33] besides the R package DESeq used in the module Target screening, and enrichment tools for pathway analysis [34] will be incorporated into future versions of eRNA.
Conclusions: eRNA can be used for the identification of RNAs and expression profiling analysis of miRNA-seq and mRNA-seq data. It is easy to use and requires no prior specific computer science knowledge. A user-friendly framework allows biological researchers to focus on biological interpretation. Parameter settings and preparations of raw data and reference sequences are simplified. Parallel processing in eRNA allows for the analysis of multiple RNA samples at the same time. The sample management function exempts biological researchers from manually inputting numerous data sets.
Availabilit
25. mber of multi threads for miRNA-seq data analysis and decreased running times. (A) The GUI mode in the personal computer (1 CPU). (B) Command-line mode in the server computer (4 CPUs). [Plot residue: x-axis Number of threads; y-axes Running time (x100 min) and Peak of memory usage (GB); legend Running time, memory usage.]

Function extension
To extend the applications of eRNA, we developed plug-in tools, which can be run independently from the modules of eRNA. Of these plug-in tools, GUIs for third-party tools are listed in the menu Edit, and graphic viewers for sequencing quality control are listed in the menu View (Figure 9).

GUIs of the aligners
The GUIs allow users to apply the aligners Bowtie1 [22] and Bowtie2 [29] for sequence alignment, including index building, separately from the other pipelines provided by eRNA (Figure 9A). Fourteen of the 64 optional parameters in Bowtie v.1 are exposed in the Bowtie1 GUI, and 22 of the 73 in Bowtie v.2 are exposed in the Bowtie2 GUI.

GUIs of the third-party miRNA tools
The miRspring [8] and miRDeep2 [4] GUIs, along with the miRNA module, provide common tools for miRNA-seq analysis (Table 1). Operations through the miRspring and miRDeep2 GUIs are more user friendly and less restrictive than operating these miRNA tools through command lines (Figure 9A). miRspring is good at visualization, calculation, and reporting on the complexities of miRNA processing. Howe
26. mization in eRNA (the numbers of threads in eRNA, TopHat, and Cufflinks are 8, 1, and 1) when compared to memory usage without multi-thread optimization (the numbers of threads in eRNA, TopHat, and Cufflinks are 1, 8, and 8). This result reveals that multi-thread utilization in eRNA can expedite mRNA analysis during parallel processing compared with separate runs of these third-party tools (TopHat, Bowtie, and Cufflinks).

Figure 6 Venn diagram of eRNA, miRspring, and miRDeep2 on known miRNA precursors identification. Options of Bowtie1 used in eRNA: (A) -v 0 -m 1 -a --best --strata; (B) -v 1 -m 2 -a --best --strata; (C) -v 1 -m 5 -a --best --strata. Default options from Bowtie1 used in miRspring: -v 1 -a --best --strata. Default options from Bowtie1 used in miRDeep2: -v 1 -a --best --strata --norc. -v: the maximum number of mismatches in the reported alignments; -m: the maximum number of reportable alignments a read may have before all of its alignments are suppressed; -a --best --strata: Bowtie reports only those alignments in the best stratum.

Yuan et al. BMC Genomics 2014, 15:176 http://www.biomedcentral.com/1471-2164/15/176

Figure 7 Correlation between the increased nu
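The Bowtie1 option sets listed in the Figure 6 caption can be written out as command lines. The following is a minimal Python sketch, not part of eRNA (which is written in Perl): the function name `bowtie1_command`, the dictionary keys, and the index, read, and output paths are hypothetical, and the sketch only assembles argument lists without running Bowtie.

```python
import shlex

# Option sets transcribed from the Figure 6 caption (Bowtie 1 flags).
# The dictionary keys are illustrative labels, not names used by eRNA.
OPTION_SETS = {
    "eRNA_A": ["-v", "0", "-m", "1", "-a", "--best", "--strata"],
    "eRNA_B": ["-v", "1", "-m", "2", "-a", "--best", "--strata"],
    "eRNA_C": ["-v", "1", "-m", "5", "-a", "--best", "--strata"],
    "miRspring_default": ["-v", "1", "-a", "--best", "--strata"],
    "miRDeep2_default": ["-v", "1", "-a", "--best", "--strata", "--norc"],
}


def bowtie1_command(option_set, index_prefix, reads_fastq, output_path):
    """Return the argv list for one Bowtie 1 run; nothing is executed here."""
    # Bowtie 1 takes its options first, then <index prefix> <reads> [<output>].
    return ["bowtie", *OPTION_SETS[option_set], index_prefix, reads_fastq, output_path]


if __name__ == "__main__":
    # Hypothetical index prefix and file names, for illustration only.
    for name in OPTION_SETS:
        cmd = bowtie1_command(name, "hsa_precursor_miRNA", "sample.fastq", f"{name}.map")
        print(shlex.join(cmd))
```

Keeping the option sets in one table makes it easy to see that the three eRNA settings differ only in the mismatch limit (-v) and the suppression threshold (-m), while miRDeep2 additionally restricts alignment to the forward strand with --norc.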
27. ols, namely a combination of Bowtie, SAMtools, TopHat, and Cufflinks, and R packages in R environments [12]. Some open-source analytic workbenches or software solutions have been developed to integrate these different third-party tools, such as ArrayExpressHTS [13], Chipster [14], ExpressionPlot [15], GENE-Counter [16], GenePattern (www.broadinstitute.org/cancer/software/genepattern/modules/RNA-seq), GeneProf [17], RNA-seq Toolkit (RST) [18], RobiNA [19], and TCW [20] (Additional file 1: Table S2). Of these, the web-based tools provide a GUI-based computing platform. User-friendly access through web browsers makes RNA-seq data analysis accessible to a broad range of research scientists. The standalone tools, however, are more flexible than the web-based tools. Because they are installed and operated locally, users may adjust the parameters or even write a program using command codes to meet their specific requirements. For some open-source tools, users may revise the code and integrate it into their own workflow for RNA-seq data analysis. Although there are few limits on sequencing data outputs and sample sizes, the use of current bioinformatics tools remains challenging for a broad range of research scientists because of insufficient ability to process large data, as well as limitations on data inputs and sample management. Large data analysis and multiple RNA sample management through the web-based tools are not practical due to the limits of network connection, the ability of
28. on eRNA can be operated in a user-friendly running environment. eRNA's interface has a main cascade graphic user interface (GUI) in which multiple button operations trigger three cascade sub-windows. All of the operations for each module are accompanied by step-by-step guides, and all parameters required for data analysis can be set through the GUIs. eRNA is divided into several functional GUI-based modules, which can be used in any combination or operated separately, depending on the requirements of the data analysis. The modules Self testing, miRNA identification, mRNA identification, and Target screening are presented as notebook pages in the main graphic interface (Figure 1). After successfully running Self testing, users can follow the parameter setup, guided by the arrows from left to right within the notebook pages, to perform data analysis of miRNA identification, mRNA identification, or RNA expression profiles. This design allows a new user to start RNA-seq data analysis with minimal self-teaching. eRNA's fast parameter setup and error-free input format are superior to those of command-line tools. Because data analysis takes a long time, a progress bar shows the running status of the program, and at the same time a monitoring GUI is activated to show the analytic status of each sample as well as the computer system load. eRNA also supports a refreshing run after the first-time setup to save time. Us
29. ple management and a check for third-party package dependency. Integration of other GUIs, including Bowtie, miRDeep2, and miRspring, extends the program's functionality.

Conclusions: eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

Keywords: RNA sequencing, Bioinformatics tool, Graphic user interface, Parallel processing

Background
Advances in high-throughput sequencing (HTS) technologies have enabled the analysis of genome-wide RNA profiles with high accuracy and unprecedentedly deep coverage, while costs continue to decrease. The Illumina HiSeq 2500 sequencing system is able to sequence 192 RNA samples (24 multiplexed samples in a single lane), up to six billion paired-end reads in a run (http://support.illumina.com). Due to its high capacity, RNA sequencing (RNA-seq) has become a necessary research approach for transcriptomic studies and integrated systems analyses.

Correspondence: liwang@mcw.edu. Department of Pathology and MCW Cancer Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA. Full list of author information is available at the end of the article.
30. with the third-party tools is able to perform miRNA-seq or mRNA-seq data analysis.

miRNA identification
The pipeline in this module (Figure 1B) is organized into Raw data reading, Adapter removal, Sequence alignment, and Reads counting as a GUI-based, step-by-step approach (Figure 2). A third-party aligner (the default is Bowtie1) is required in this module [22]. The accurate identification of mature miRNAs based on their sequences alone is often difficult because miRNAs are short (21-23 nt) and sometimes have similar or even identical sequences. For example, hsa-miR-519c-5p and hsa-miR-523-5p have the same sequence, and members of the let-7 family have similar sequences. Several miRNA families (e.g., hsa-let-7 and hsa-miR-30) consist of highly homologous miRNAs that differ by only a single or a few nucleotides. Sequence alignment in eRNA therefore includes two methods, separate and iterative alignment (the panel Step III in Figure 2). The separate alignment option aligns all sequences against each of the different references separately in a run. The iterative alignment option aligns sequences against the reference sequences in a pre-determined order: the sequences left unmapped by one alignment are used as query sequences in the next alignment. Besides the parameter settings of the aligner, the sequential order of references, in particular those with closely related sequences, may affect the fina
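The difference between the separate and iterative alignment options described above can be illustrated with a small Python sketch. This is not eRNA's implementation: the exact-substring matcher stands in for a real aligner such as Bowtie1, and all function names, reference names, and sequences are hypothetical toy values.

```python
def exact_match_aligner(read, reference):
    """Toy stand-in for a real aligner: names of reference sequences containing the read."""
    return [name for name, seq in reference.items() if read in seq]


def separate_alignment(reads, references, aligner=exact_match_aligner):
    """Align every read against every reference independently."""
    results = {}
    for ref_name, ref in references.items():
        hits_for_ref = {}
        for read in reads:
            hits = aligner(read, ref)
            if hits:
                hits_for_ref[read] = hits
        results[ref_name] = hits_for_ref
    return results


def iterative_alignment(reads, ordered_references, aligner=exact_match_aligner):
    """Align against references in the given order; only unmapped reads move on."""
    results = {}
    remaining = list(reads)
    for ref_name, ref in ordered_references:
        mapped, unmapped = {}, []
        for read in remaining:
            hits = aligner(read, ref)
            if hits:
                mapped[read] = hits
            else:
                unmapped.append(read)
        results[ref_name] = mapped
        remaining = unmapped  # query set for the next reference
    return results, remaining


if __name__ == "__main__":
    # Hypothetical toy references and reads.
    mature = {"miR-a": "ACGTACGT"}
    precursor = {"pre-a": "TTACGTACGTTTGGCC"}
    reads = ["ACGTACGT", "TTGGCC", "GGGGGG"]
    print(separate_alignment(reads, {"mature": mature, "precursor": precursor}))
    print(iterative_alignment(reads, [("mature", mature), ("precursor", precursor)]))
```

In the toy data, the read matching the mature sequence is reported against both references under separate alignment, but only against the first reference under iterative alignment, which is why the order of references matters in the iterative mode.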
31. [Figure 2: The GUIs of the module miRNA identification. Panels shown: Separate alignment references selection, Iterative alignment references selection, Normalization of transcriptional level (RC: read counts per sequence), Sort candidates, and RNA sample selection. References listed in the panels: hs_ref_GRCh37.p5, hsa_matured_miRNA, and hsa_precursor_miRNA.]

all parameters presented in the GUIs are the same as those in TopHat [23] and Cufflinks [24]. Due to GUI-based parameter setup and RNA sample selection, eRNA requires less time than command-line-based TopHat (Figure 3) and Cufflinks (Figure 4). In addition, eRNA allows for parallel processing to maximize computation capacity, which differs from the serial runs of the commands provided by TopHat and Cufflinks.

Target screening
This module performs differential expression profiling analysis and recursive partitioning analysis (Figure 1D). The R environment and R packages are required for this

[GUI residue: Number of multiple threads; Select sample names; Sort candidates.]
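The contrast between parallel processing of samples and serial runs can be sketched in Python as follows. This is an illustrative assumption, not eRNA's Perl implementation: `analyze_sample` is a hypothetical placeholder for one sample's pipeline (here it only counts reads per sequence), and the sample names and reads are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_sample(sample_name, reads):
    """Placeholder for one sample's pipeline; here it only counts reads per sequence."""
    counts = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    return sample_name, counts


def run_serial(samples):
    """Process samples one after another."""
    return dict(analyze_sample(name, reads) for name, reads in samples.items())


def run_parallel(samples, max_workers=4):
    """Process several samples at the same time with a pool of workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(analyze_sample, name, reads)
                   for name, reads in samples.items()]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    samples = {
        "sample_1": ["ACGT", "ACGT", "TTGG"],
        "sample_2": ["TTGG", "CCAA"],
    }
    print(run_parallel(samples) == run_serial(samples))
```

A thread pool is sufficient when each worker mostly waits on an external program launched as a child process; for CPU-bound work done in Python itself, a process pool would be the more usual choice.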
32. ut RNA sequencing. BMC Genomics 2014, 15:176.
33. ver it does not support FASTQ-format inputs, adapter removal, or sequence alignment, which must be finished in advance. The miRspring GUI in eRNA supports raw data input and provides the whole pipeline of miRNA-seq data analysis, including the pipelines finished by miRspring and the pipeline of raw data reading, adapter removal, and sequence alignment finished by eRNA. miRDeep2 is designed for the identification and discovery of known and novel miRNA genes. It also improves the identification algorithms for canonical and non-canonical miRNAs. The miRDeep2 GUI in eRNA supports all operations provided by miRDeep2 and simplifies raw data input, adapter removal, and miRNA reference sequence preparation.

Graphic viewers for quality control
Graphic viewers known as QS Viewer, SD Viewer, and IL Viewer are used for sequencing quality control in RNA-seq experiments (Figure 9B). QS Viewer plots the distribution of quality scores (Q-scores) per sequencing cycle for each miRNA sample, which can be used for sequencing quality testing [21]. SD Viewer plots RNAs against certain reference sequences to display sequencing depth, indicating transcript abundance. IL Viewer plots the distribution of insert lengths from the sequencing library to show the general quality of RNA sequencing library construction.

[Plot residue: CPU usage (x100) against Running time (x10,000 s); annotated values 28.8 h and 45.3 h.] their changes when the multithrea
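The per-cycle quality summary that QS Viewer plots can be approximated with a short Python sketch. This is not the viewer's code: it assumes 4-line FASTQ records and Phred+33 quality encoding (an assumption, since the encoding is not stated here), and the function names are hypothetical.

```python
def read_fastq_qualities(lines):
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def mean_quality_per_cycle(quality_strings, offset=33):
    """Mean Q-score at each sequencing cycle; offset=33 assumes Phred+33 encoding."""
    totals, counts = [], []
    for qual in quality_strings:
        for cycle, char in enumerate(qual):
            if cycle == len(totals):
                # First read long enough to reach this cycle.
                totals.append(0)
                counts.append(0)
            totals[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [total / count for total, count in zip(totals, counts)]


if __name__ == "__main__":
    # Two hypothetical reads of different lengths.
    fastq = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
             "@r2\n", "AC\n", "+\n", "55\n"]
    print(mean_quality_per_cycle(read_fastq_qualities(fastq)))
```

Reads of different lengths are handled by averaging each cycle over only the reads that reach it, which suits miRNA-seq data after adapter removal.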
34. y and requirements
eRNA is available for free download and use at https://sourceforge.net/projects/erna/?source=directory according to the GNU Public License. The user manual, including installation instructions and the required running environments, is also included in the eRNA package. Any use by non-academics requires a license. We developed eRNA in the Perl programming language on the Linux operating system. The developing and testing environments were Fedora Linux 17 (x86_64, 64-bit) on a personal computer equipped with one Intel Core i7-3770K CPU (3.5 GHz, 4 cores per CPU) and 32 GB memory, and Red Hat Enterprise Linux Server release 5.9 (x86_64, 64-bit) on a server equipped with four Intel Xeon X5687 CPUs (3.6 GHz, 4 cores per CPU) and 96 GB memory. Other software environments included Perl version 5.14, Perl Gtk2 version 1.241, BioPerl version 1.6 (http://www.bioperl.org), and R version 2.15 (http://www.r-project.org).

Additional file
Additional file 1: Table 1. Comparison of the pipelines on the identification of miRNAs. Table S2. Comparison of the open-source pipelines on the identification of mRNAs.

Abbreviations
GUI: Graphic user interface; HTS: High-throughput sequencing; miRNA: microRNA; miRNA-seq: microRNA sequencing; mRNA-seq: mRNA sequencing; RNA-seq: RNA sequencing.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
TZY developed the software and wrote the manuscript. LW sup